Some thoughts on the problem, not intended as a complete proposal or argument:
* Indexing is expensive. If there's a shared public index, that'd make it a lot easier for people to try new ranking algorithms. Maybe the index can be built into the way the new internet works, like DNS or routing, so the cost is shared.
* How fast a ranking algorithm is depends on how the indexing is done. Is there some common set of features we could agree on that we'd want to build the shared index on? Any ranking that wants something not in the public index would need either a private index or a slow sequential crawl. Sometimes you could do a rough search using the public index and then re-rank by crawling the top N, so maybe the public index just needs to be good enough that some ranker can get the best result within the top 1000.
* Maybe the indexing servers execute the ranking algorithm? (An equation or SQL-like thing, not something written in a Turing-complete language.) Then they might be able to examine the query to figure out where else in the network to look, or where to give up because the score will be too low.
* Maybe the way things are organized and indexed is influenced by the ranking algorithms used. If indexing servers are constantly receiving queries that split a certain way, they can cache / index / shard on that. This might make deciding what goes into a shared index easier.
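The coarse-then-rerank idea in the second bullet could be sketched roughly like this. Everything here is hypothetical: a toy "shared index" that stores only term sets, a private re-ranker that crawls the shortlist.

```python
# Sketch: coarse retrieval over a shared public index of cheap features,
# followed by a custom re-rank that crawls only the top N candidates.
# All names and data structures are hypothetical.

def coarse_search(shared_index, query_terms, n=1000):
    """Rank by term overlap -- the only feature this toy public index stores."""
    scored = []
    for doc_id, terms in shared_index.items():
        overlap = len(query_terms & terms)
        if overlap:
            scored.append((overlap, doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:n]]

def rerank(candidates, fetch_page, score_fn):
    """Crawl just the candidates and apply a private scoring function."""
    return sorted(candidates, key=lambda d: score_fn(fetch_page(d)), reverse=True)
```

The public index only has to be good enough that the right answer lands somewhere in the candidate list; the expensive, ranker-specific work happens on N documents instead of the whole corpus.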
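One way to picture the third bullet: if the ranking is a declarative expression over indexed features rather than arbitrary code, an index server can evaluate it itself, and also reason about it, e.g. bound the best possible score to decide whether a shard is worth searching. This is a made-up expression format, not a proposal:

```python
# Sketch: a ranking "algorithm" expressed as a nested tuple of arithmetic
# ops over indexed features -- deliberately not Turing complete, so an
# index server can both evaluate it and analyze it. Hypothetical throughout.
import operator

OPS = {"+": operator.add, "*": operator.mul, "max": max}

def evaluate(expr, features):
    """Evaluate an expression like ("+", "tf", ("*", 2.0, "freshness"))."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return features[expr]          # look up a feature by name
    op, *args = expr
    return OPS[op](*(evaluate(a, features) for a in args))

def upper_bound(expr, max_features):
    """Best possible score for a shard, given per-feature maxima.
    Only valid when all ops are monotone and features are non-negative."""
    return evaluate(expr, max_features)
```

If `upper_bound` for a shard falls below the score cutoff already established elsewhere, the server can skip that shard entirely, which is the "give up because the score will be too low" case.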
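And the last bullet, adapting the index to observed query patterns, might start as simply as tallying which features queries filter on and sharding on the most common one. A minimal sketch, with invented names:

```python
# Sketch: an index server recording how incoming queries "split" the
# corpus, so it can pick a cache/shard key. All names are hypothetical.
from collections import Counter

class QueryStats:
    def __init__(self):
        self.filter_counts = Counter()

    def record(self, query_filters):
        """Note which features this query filtered or split on."""
        self.filter_counts.update(query_filters)

    def suggest_shard_key(self):
        """Shard on the feature queries filter by most often."""
        if not self.filter_counts:
            return None
        return self.filter_counts.most_common(1)[0][0]
```

A feedback loop like this could also inform the shared-index debate: features that queries actually split on earn their place in the public index.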
> Indexing is expensive. If there's a shared public index, that'd make it a lot easier for people to try new ranking algorithms. Maybe the index can be built into the way the new internet works, like DNS or routing, so the cost is shared.
But what are you storing in your index? The content your ranking considers will vary wildly with your ranking method. (For example, early indexes cared only about the presence of words. Then we started to care about word counts, then about the relationships between words and their context, then about figuring out whether a site was scammy, or slow.)
The only way to store an index of all content (to cover all the options) is to...store the internet.
I'm not trying to be negative - I feel very poorly served by the rankings that are out there, since on 99% of issues I'm in the long tail rather than the audience they target. But I can't see how a "shared index" could be practical for all the kinds of ranking algorithms, both present and future.