Upload filters were front and centre in the debate about the EU directive on copyright in the Digital Single Market. This led many to almost overlook another provision: the Copyright Law for Press Publishers in the EU. In this article we show how relevant journalistic content is for the result pages on Google. The results show that the overwhelming majority of Google’s business is realised without content from press publishers.
The subject of copyright law for press publishers has reached Europe through the new copyright directive. Among other things, the directive raises the question of whether showing snippets of journalistic content will require royalties.
In order to evaluate how important journalistic content is for Google, we used our databases and analysed both the size and relevance that this section of the search results actually has for Google.
Our Data Evaluation
As an initial step, we defined which results would be affected by this directive. Since there are no authoritative, public lists, we decided to include any domain that had published at least five articles on Google News within the previous week. Next, we removed domains that obviously did not offer journalistic content as the main part of their business. This resulted in a list of 4,075 domains, on which we based our evaluation.
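The selection rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `build_domain_list`, the input format (one domain entry per article seen on Google News in the previous week) and the sample domains are all hypothetical and not taken from our actual pipeline.

```python
from collections import Counter

MIN_ARTICLES = 5  # threshold from the text: at least five articles in the previous week


def build_domain_list(news_articles, non_journalistic):
    """Return domains with at least MIN_ARTICLES articles, minus manual exclusions.

    news_articles: iterable of domain names, one entry per article
                   (hypothetical input format).
    non_journalistic: set of domains removed by manual review (hypothetical).
    """
    counts = Counter(news_articles)
    return {
        domain
        for domain, n in counts.items()
        if n >= MIN_ARTICLES and domain not in non_journalistic
    }


# Made-up sample data, for illustration only.
articles = ["news.example"] * 6 + ["shop.example"] * 9 + ["blog.example"] * 2
print(sorted(build_domain_list(articles, {"shop.example"})))  # ['news.example']
```

In this toy example, `shop.example` passes the article threshold but is dropped by the manual review step, and `blog.example` has too few articles.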
Only 0.11% of search queries are both journalistic in nature and commercially relevant
We started out with the obvious question: how relevant is journalistic content for the search queries on which Google actually makes money in its web search? (Google News does not contain ads.)
For this, we measured the number of search queries in which domains from our list of 4,075 domains took up at least five of the usual ten organic search results on the first page. As an added indicator of how important a search is for Google, these search queries also needed to show ads (Google Ads or Google Shopping).
The result was surprisingly low: of the roughly 20 million search queries we evaluated (20,949,557 to be exact), only 22,923 had a journalistic character and also showed ads. This means that only 0.11% of all evaluated search queries are both journalistic in character and commercially relevant for Google.
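As a rough sketch of the criterion used here, a query counts when at least five of the first ten organic results come from listed domains and the result page also shows ads. The function names, the subdomain matching rule and the sample URLs below are assumptions made for illustration; only the threshold and the two totals come from the text.

```python
from urllib.parse import urlsplit


def count_journalistic(result_urls, journalistic_domains):
    """Count how many of the first ten organic results come from listed domains."""
    hits = 0
    for url in result_urls[:10]:
        hostname = (urlsplit(url).hostname or "").lower()
        # Assumption: a listed domain also covers its subdomains.
        if any(hostname == d or hostname.endswith("." + d) for d in journalistic_domains):
            hits += 1
    return hits


def is_commercial_journalistic(result_urls, shows_ads, journalistic_domains, threshold=5):
    """True if the query shows ads and at least `threshold` results are journalistic."""
    return shows_ads and count_journalistic(result_urls, journalistic_domains) >= threshold


# Made-up example: six of ten results come from a listed domain.
domains = {"news.example"}
serp = [f"https://www.news.example/story-{i}" for i in range(6)] + [
    f"https://shop.example/item-{i}" for i in range(4)
]
print(is_commercial_journalistic(serp, True, domains))   # True
print(is_commercial_journalistic(serp, False, domains))  # False

# Share reported in the text: 22,923 of 20,949,557 evaluated queries.
share = 100 * 22_923 / 20_949_557
print(f"{share:.2f}%")  # 0.11%
```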
2.65% of search queries can be characterised as journalistic
Our next evaluation went after the question of how many search queries on Google could be characterised as being journalistic in nature. For this, we measured the search queries in which domains from our list made up at least five of the usual ten results on the first result page.
Of the roughly 20 million search queries, 554,350 met this criterion. This means that 2.65% of search queries on Google can be characterised as journalistic. We then ran the same evaluation again and looked at how many search queries showed at least seven of the ten results from domains on our list. That left us with 0.66% of search queries, which can be characterised as strongly journalistic in nature.
10.18% of all results come from journalistic domains
Finally, we wanted to know how many results come from journalistic domains in general. For this we evaluated about 1.1 billion Google results (1,122,469,731 to be precise). These Google results paint a representative picture of the search behaviour in the United Kingdom on Google.co.uk.
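The effect of moving the threshold from five to seven results can be illustrated with a small sketch. The helper `share_at_threshold` and the list of hit counts are hypothetical; the only real figures are the two totals quoted above.

```python
def share_at_threshold(hit_counts, threshold):
    """Percentage of queries with at least `threshold` journalistic results."""
    matching = sum(1 for hits in hit_counts if hits >= threshold)
    return 100 * matching / len(hit_counts)


# Made-up hit counts (journalistic results on page one) for ten queries.
hit_counts = [0, 1, 0, 5, 2, 7, 0, 3, 9, 0]
print(share_at_threshold(hit_counts, 5))  # 30.0
print(share_at_threshold(hit_counts, 7))  # 20.0

# Figure reported in the text: 554,350 of 20,949,557 queries at threshold five.
reported = 100 * 554_350 / 20_949_557
print(f"{reported:.2f}%")  # 2.65%
```

A stricter threshold can only keep the share the same or lower it, which is consistent with the drop from 2.65% to 0.66%.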
Of those 1.1 billion results, 114 million (114,293,863 to be precise) came from domains on the list we defined in the beginning. This means that, based on our data analysis, 10.18% of Google results would currently be affected by the new EU directive.
Interestingly, if we only considered the 50 most successful journalistic domains on Google, they would make up 3.01% of the Google results by themselves. That makes them responsible for roughly 30% of the results affected by the directive.
The Copyright Law for Press Publishers has been around in Germany since 2013 and has not led Google to pay any royalties to the publishers. A similar law in Spain resulted in Google News and a number of similar services shutting down. This led the majority of publishers, especially small ones, to register heavy losses (study), and even big newspapers like El Pais took a public stand against the law.
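The arithmetic behind these shares is easy to check. The sketch below uses only the figures quoted in the text; the variable names are ours, and the top-50 ratio is approximate because it starts from the rounded 3.01% figure.

```python
total_results = 1_122_469_731        # all evaluated Google results
journalistic_results = 114_293_863   # results from the 4,075 listed domains

# Share of all results that come from listed domains.
share_all = 100 * journalistic_results / total_results
print(f"{share_all:.2f}%")  # 10.18%

# Share of all results held by the 50 most successful journalistic domains,
# as reported in the text (already rounded).
top50_share = 3.01

# Portion of the affected results that the top 50 domains account for.
top50_of_affected = 100 * top50_share / share_all
print(f"{top50_of_affected:.1f}%")  # 29.6%
```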
Website owners already have extensive control over both indexing and what can be shown as results on Google’s result pages. Every website owner can shut out Google’s crawler completely (robots.txt) or decide, on a granular level, whether each document is indexed and how its snippet should look (meta robots).
Trying to square this circle will not work: publishers cannot be shown in Google’s results with a snippet, profit from the resulting traffic, and collect royalties on top of that. Our evaluation makes clear that journalistic content is not commercially relevant enough for Google. When push comes to shove, Google will be able to live without these results in its hit lists.
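To make these controls concrete, here is a minimal sketch that evaluates a robots.txt file with Python's standard `urllib.robotparser` module. The robots.txt content and the `news.example` URL are made up for illustration; the meta robots values in the final comment (`nosnippet`, `noindex`) are directives that Google documents.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts out Google's crawler but allows everyone else.
robots_txt = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

url = "https://news.example/article.html"
print(parser.can_fetch("Googlebot", url))  # False
print(parser.can_fetch("OtherBot", url))   # True

# Per-document control works through a meta robots tag in the page's HTML, e.g.
#   <meta name="robots" content="nosnippet">  (show the result without a text snippet)
#   <meta name="robots" content="noindex">    (keep the page out of the index)
```

A publisher that objects to snippets could therefore opt out of them page by page, without leaving the index altogether.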