Do we need a Public Web Index?

Johannes Beus

(Author)

Published: 23.04.2019

Barely a week goes by without Google’s use of its market strength being criticised. Dirk Lewandowski, Professor at HAW Hamburg, has now proposed a public web index as an alternative to Google. Could this work?

Contents

Google owns the whole 'stack.'
Search in the era of Machine Learning: Strength in Numbers.
Change is Difficult
Summary

Google Search is currently in the news more often than it should be. In the last week alone, VG Media, a media collecting society in Germany, demanded 1.24 billion euros from Google; Android is implementing the European Commission’s advice to open the operating system to other search engines; and Idealo is suing for 500 million euros.

Against this background, Dirk Lewandowski, Professor of Information Research & Information Retrieval at HAW Hamburg (Hamburg University of Applied Sciences) and one of Europe’s “search engine professors”, has proposed (PDF version) operating the individual parts of a search engine separately:

A proposal for building an index of the Web that separates the infrastructure part of the search engine – the index – from the services part that will form the basis for myriad search engines and other services utilising Web data on top of a public infrastructure open to everyone.

https://arxiv.org/abs/1903.03846

The core problem, he says, is that the world view of its creators always finds its way into a ranking algorithm, so truly neutral and unbiased results are not possible. Given Google’s market share of over 90% in Europe, this is a problem.

The solution: separate the crawling and index of a search engine from its frontend and ranking algorithm. On the basis of a public infrastructure, a whole range of different search providers should emerge and enable much-needed diversity.
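To make the proposed split concrete, here is a minimal sketch of one shared index with two independent rankers on top. It is only an illustration of the idea, not anything taken from Lewandowski’s paper: the class `PublicIndex`, both ranking functions and the example URLs are hypothetical names made up for this example.

```python
from collections import defaultdict


class PublicIndex:
    """Hypothetical shared infrastructure: crawled pages plus an inverted index."""

    def __init__(self):
        self.pages = {}                   # url -> page text
        self.postings = defaultdict(set)  # term -> urls containing the term

    def add(self, url, text):
        self.pages[url] = text
        for term in text.lower().split():
            self.postings[term].add(url)

    def lookup(self, query):
        """Return the unranked set of URLs that contain every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in terms))


def rank_by_term_frequency(index, query):
    """Search service A (assumed): pages that repeat the query terms rank first."""
    terms = query.lower().split()

    def score(url):
        words = index.pages[url].lower().split()
        return sum(words.count(t) for t in terms)

    return sorted(index.lookup(query), key=lambda u: (-score(u), u))


def rank_by_brevity(index, query):
    """Search service B (assumed): shorter pages rank first."""
    return sorted(index.lookup(query), key=lambda u: (len(index.pages[u].split()), u))


# One index, filled once by the public crawler (example data is invented).
index = PublicIndex()
index.add("a.example", "search search engines explained in great detail")
index.add("b.example", "search basics")
index.add("c.example", "cooking recipes")

print(rank_by_term_frequency(index, "search"))
print(rank_by_brevity(index, "search"))
```

Both services read the same data but return the results in a different order, which is exactly the point of the proposal: the world view sits in the ranking function, not in the index.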

Could this work? I’m rather sceptical for the following reasons.

Google owns the whole ‘stack.’

Google has, in the truest sense of the word, taken control of a large part of the Internet. Starting at the bottom, Google’s public and free DNS server is the most widely used DNS server, and Android is by far the most successful smartphone OS. The market share of Chrome on the desktop regularly reaches new highs, and there are no serious competitors to Google Analytics. The list continues, and if Google doesn’t yet own a necessary piece of the puzzle, it buys it, as seen with the search deals with Apple and Firefox.

Search in the era of Machine Learning: Strength in Numbers.

Whether data is the new oil or the new gold is not yet resolved (the only thing that is settled is that I can’t bear to hear either anymore), but the essence of the statement is already true. In the age of machine learning, hardware and software are largely public. The difference between a good and a bad result is the training data used to create the algorithm. And here Google has a huge lead. On the one hand, Google has, as described above, access to much more than just search data. On the other hand, its search volume is large enough to produce meaningful results even in long-tail areas. Lewandowski suggests feeding user data from all the individual search engines back into the core system. I doubt that this can compensate for the structural disadvantages.

Change is Difficult

There are alternatives to Google; many, in fact. They’re good and objectively oriented, and yet almost everyone still wants the Google search engine. The trust in the Google brand, which has grown over the years, is so great in the search market that it is difficult to convince users to change. Users prefer Bing results in the Google layout to Google hits in the Bing design. A successful alternative must be significantly better than Google to get users to switch.

Summary

Every push for more diversity in the European search engine landscape is helpful and welcome. The idea of providing crawlers and indexes as a kind of basic public service is understandable from an academic point of view. However, I am sceptical that it will work on the real-world Internet. Then again, the EU has already funded plenty of other things, so who knows? It’s worth a try.
