Confusion about First Click Free

Ever since the Hamburger Abendblatt and the Berliner Morgenpost started serving as guinea pigs for paid content on their publishing house's online presence at the beginning of the week, my feed reader has been bristling with opinions and evaluations on the issue. I do not want to comment on the substance, but I do want to write about the technical implementation, since there are obviously still some uncertainties here.

As far as publishing houses go, Google accommodates them more than any other content distributor on the Internet. Danny Sullivan (of Search Engine Land) summarized this nicely a while ago. Part of this accommodation goes by the name of “First Click Free”: when a user is sent to a newspaper's site by Google, they can read the first article for free and have to pay for any further ones. This is usually implemented by identifying users who come from Google through their referrer and then requiring payment from the second pageview on.
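The referrer check described above could look roughly like the following. This is only a minimal sketch: the function names and the host pattern are assumptions made for illustration, and it assumes that every article is otherwise behind the paywall. It is not how any of the newspapers mentioned here actually implement it.

```python
import re
from urllib.parse import urlparse

# Assumption: any host such as google.com, www.google.de or news.google.co.uk
# counts as "coming from Google". The pattern is illustrative, not exhaustive.
GOOGLE_HOST = re.compile(r"(^|\.)google\.[a-z]{2,3}(\.[a-z]{2})?$")


def came_from_google(referrer: str) -> bool:
    """Return True if the Referer header value points at a Google host."""
    host = (urlparse(referrer).hostname or "").lower()
    return bool(GOOGLE_HOST.search(host))


def needs_payment(referrer: str) -> bool:
    """First Click Free: the click arriving from Google is free.

    The next pageview carries the newspaper's own site as referrer,
    so it falls behind the paywall again.
    """
    return not came_from_google(referrer)
```

Note that the referrer is sent by the browser and can be changed by the user, so a check like this is a convenience and not an access control.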

Evil Cloaking?
For all of the pages to be included in Google's web search as well as Google News, the Googlebot needs to be able to crawl all of those pages without restrictions. For this, the search engine crawler needs to get a different version of the page than the human visitor: this is called cloaking. In the past, cloaking was a rather widespread phenomenon in the SEO scene, but for the past few years the advantages have been so slim or nonexistent that most have simply abandoned the practice. Now, when netzpolitik.org pushes the cloaking of both sites into the gray corner in a posting (“... I would have thought that 'cloaking' would still lead to a site's exclusion from the search engine result pages (SERPs).”), they are not thinking far enough ahead: in this case, the cloaking is not done to gain some advantage in the SERPs. Instead, the sites are using a feature for a potential monetization of the site with the express permission of Google.

Clumsy Cloaking
Another batch of postings (for example from Chip.de or Carta.info) addresses the technical realization of the bot/human detection. At the moment, the Googlebot's footprint in the log files looks like this:

66.249.71.13 - - [16/Dec/2009:13:05:13 +0100] "GET /news/ HTTP/1.1" 200 16199 "-"
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
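A log entry like this one has the shape of the common "combined" access log format: client IP address, two identity fields, timestamp, request line, status, size, referrer and User-Agent. As a minimal sketch, assuming that format and a hypothetical helper name `parse_log_line`, the fields could be pulled out like this:

```python
import re

# Assumed layout: ip ident user [time] "request" status size "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"$'
)


def parse_log_line(line):
    """Return a dict of the log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None
```

The pattern expects the entry on a single line, as it appears in the log file itself.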


First comes the IP address, then some fields that are of no importance to us, and at the end the User-Agent. Cloaking can be implemented based on the IP address, the User-Agent, or a combination of both. The Abendblatt, at least, seems to have decided to consider only the User-Agent. This is somewhat clumsy because users can change this value as they wish (for example with a Firefox plugin), which then lets them browse all of the content on the site. It would be better to use a combination of IP address and User-Agent: the big search engines have been offering a well-established procedure for this for years. For every request where the User-Agent indicates a visit by a search engine, the first step is to check the reverse DNS entry for the IP address:

beus@helios:~$ host 66.249.71.13
Name: crawl-66-249-71-13.googlebot.com


If this returns a hostname in the googlebot.com domain, that hostname is then resolved back to an IP address:

beus@helios:~$ host crawl-66-249-71-13.googlebot.com
crawl-66-249-71-13.googlebot.com A 66.249.71.13


In this case, everything looks fine: the lookup ends at the same IP address it started with. This check can be used to keep users from reaching pages that are meant only for the search engine just by changing their User-Agent. Incidentally, this subject is not all that new …
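The two lookups can also be done from application code instead of by hand. The following is a minimal sketch using Python's standard `socket` module. The function name, the simple User-Agent test and the restriction to the `googlebot.com` suffix are assumptions for illustration. The two lookup parameters exist only so the check can be tried out without real DNS queries.

```python
import socket

# Assumption: only hostnames under googlebot.com are accepted, as in the example above.
CRAWLER_SUFFIXES = (".googlebot.com",)


def is_verified_googlebot(ip, user_agent,
                          reverse_lookup=socket.gethostbyaddr,
                          forward_lookup=socket.gethostbyname_ex):
    """Check a claimed Googlebot visit with a reverse and a forward DNS lookup."""
    # Only requests that claim to be Googlebot need to be verified at all.
    if "googlebot" not in user_agent.lower():
        return False
    try:
        # Step 1: reverse lookup, IP address -> hostname.
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False
    if not hostname.lower().rstrip(".").endswith(CRAWLER_SUFFIXES):
        return False
    try:
        # Step 2: forward lookup, hostname -> list of IP addresses.
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False
    # Step 3: the original IP address must be among the results.
    return ip in addresses
```

Since DNS lookups are slow compared to serving a page, a real setup would probably cache the result per IP address rather than repeat the check on every request.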
Johannes Beus - on Wed (12/16/2009) at 14:17 PM
