Brave new Signal-World

From: Johannes Beus

Published: 29.02.2012

Modified: 13.09.2022

We had only just gotten used to the idea that SEO means more than the mandatory listing of meta keywords and also includes link building, and already the world has turned: new signals such as user behavior and social media data now take center stage in the public's perception. As if that weren't enough, Google has created a smokescreen with its monthly blog posts, which regularly makes it harder to focus on what's really important. This has led to interesting discussions in numerous blogs and networks. I want to use this post to add a few points to that discussion.

With all the new features and verticals coming out all the time, it is easy to forget that Google is still a full-text search engine. I don't want to go on and on about the basics, but I believe they are quite helpful in understanding certain relationships. Google uses a web crawler that goes through large parts of the public Internet and fills its index with the words it finds there. When someone comes to Google with a query, Google first looks in its index to find the pages on which the queried word actually appears. Depending on the query, this may be a list of a few million URLs. Only then, in a second step, does Google apply its ominous algorithm, the one we deal with on a daily basis. Google sorts the list of URLs from step one with the help of a presumably huge set of rules and processes, just to show us the first 10 results.
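The two steps described above can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not Google's implementation: the corpus `PAGES`, the function names and the made-up scores in `SCORES` are all hypothetical. The index lookup narrows the corpus down to candidate URLs first, and only those candidates are sorted afterwards.

```python
from collections import defaultdict

# Hypothetical toy corpus (URL -> page text); not real data.
PAGES = {
    "example.com/a": "cheap flights to berlin",
    "example.com/b": "berlin hotels and flights",
    "example.com/c": "seo basics for beginners",
}

# Made-up ranking scores; the real sorting rules are unknown.
SCORES = {"example.com/a": 0.4, "example.com/b": 0.9, "example.com/c": 0.7}


def build_index(pages):
    """Crawl result -> inverted index: every word maps to the URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def candidates(index, query):
    """Step 1: keep only URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


def rank(urls, scores, limit=10):
    """Step 2: sort the candidates by score (highest first) and cut off the list."""
    return sorted(urls, key=lambda url: (-scores.get(url, 0.0), url))[:limit]


index = build_index(PAGES)
hits = candidates(index, "berlin flights")
top = rank(hits, SCORES)
```

Note that `example.com/c` never reaches the sorting step for this query, no matter how good its score is, because it does not contain the queried words.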

To be considered for the algorithmic sorting at all, a site has to meet two preconditions: first, Google needs to have crawled it and saved it in its index; second, Google needs to classify it as relevant for the particular search query. The first condition can usually be met with a solid site structure: use an orderly information architecture and sensible internal linking to show the Google crawler the way. For the second condition, Google will use a rather simple indicator 99% of the time: the word being searched for (or a synonym) can be found on the page or within the title of the page. Only once these conditions are met do we get to the sorting and ranking of URLs. So how do user behavior and social network signals fit into this system?
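As a minimal sketch of these two preconditions, assuming a simple `Page` record and a tiny hand-made synonym table (both hypothetical and only there for illustration), the check could look like this:

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    title: str
    body: str
    indexed: bool  # condition 1: crawled and stored in the index


# Hypothetical synonym table, for illustration only.
SYNONYMS = {"car": {"car", "automobile"}}


def is_candidate(page, term):
    """Return True if the page meets both preconditions for the given search term."""
    if not page.indexed:
        return False  # never crawled or not in the index: no chance to rank
    term = term.lower()
    variants = SYNONYMS.get(term, {term})
    words = set(page.title.lower().split()) | set(page.body.lower().split())
    # Condition 2: the term or one of its synonyms appears in the title or body.
    return bool(variants & words)


page = Page("example.com/cars", "Used automobile market", "prices and offers", True)
```

Here `is_candidate(page, "car")` is true only because of the synonym in the title, while the same page with `indexed=False` would be rejected before the text is even looked at.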

I am rather certain that Google will only use these two signals during the last step, the sorting of results. And even there we see obvious difficulties, which is likely the reason why these two factors do not carry much weight in the algorithm at the moment. User behavior data only becomes interesting once you put it in relation to the actual search query: a bounce rate for one URL for one keyword, rather than a global bounce rate for the domain. If we look at the click rates on Google's results pages, it quickly becomes apparent that the click rate takes a massive plunge once you are past the first page of results. This means that Google cannot get much meaningful user data from there, and the further we go towards the long tail, the thinner the coverage becomes. By implication, this signal could be used to decide whether to rank a site at position 3 or 4 (re-ranking), but it clearly cannot help decide whether the site belongs in the top 10 or the top 1,000 in the first place.
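To make the re-ranking argument concrete, here is a rough sketch under my own assumptions: click data is stored per query and URL as `(clicks, impressions)`, only the first results page is touched, and a URL needs a minimum number of impressions before its click rate counts. The function name, the threshold and the data layout are hypothetical; nothing here is known about how Google actually handles this.

```python
def rerank(query, results, stats, min_impressions=100, page_size=10):
    """Reorder first-page results by click rate where enough data exists.

    results: URLs in the order produced by the base ranking.
    stats:   {(query, url): (clicks, impressions)}, measured per query and URL.
    """
    first_page, rest = results[:page_size], results[page_size:]

    def click_rate(url):
        clicks, impressions = stats.get((query, url), (0, 0))
        if impressions < min_impressions:
            return None  # too little data to say anything
        return clicks / impressions

    # Positions on the first page whose URL has a usable click rate.
    slots = [i for i, url in enumerate(first_page) if click_rate(url) is not None]
    measured = sorted((first_page[i] for i in slots), key=click_rate, reverse=True)

    reordered = list(first_page)
    for i, url in zip(slots, measured):
        reordered[i] = url
    # Everything beyond the first page keeps its base ranking.
    return reordered + rest


stats = {
    ("q", "a"): (10, 1000),
    ("q", "b"): (300, 1000),
    ("q", "c"): (5, 10),  # below the impression threshold
    ("q", "d"): (200, 1000),
}
reranked = rerank("q", ["a", "b", "c", "d"], stats)
```

URLs without enough impressions stay exactly where the base ranking put them, which is the point made above: the signal can shuffle positions among results that already get seen, but it cannot lift an unseen URL onto the first page.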

With social signals, the situation is even more deadlocked: at the moment, Google does not have a reliable source for this data. After the contract with Twitter for a complete feed of all tweets was canceled, Twitter changed its system so that all URLs on its publicly accessible pages are replaced with its own URL shortener and set to 'nofollow'. As for the relationship between Facebook and Google, you could hardly call it so friendly that Facebook would home-deliver the necessary data to its competitor. All that is left as a possible source is Google+. We have been gathering these signals for URLs for a while now, and we cannot make out any trend towards Google+ actually being used more. A new Spiegel Online article, for example, has 1,608 mentions on Facebook, 227 tweets and a whopping 3 Google+ votes. Not exactly what you would call a solid foundation for an elementary part of an algorithm that is responsible for 95% of the revenue of a publicly traded company. So, how can we measure the significance a ranking signal has in Google's algorithm? When Google starts to publicly warn people against manipulating these signals, it is about time to start giving them some thought …
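To put the Spiegel Online numbers above into proportion, here is a quick back-of-the-envelope calculation. It uses only the three counts quoted in the text; the variable names are my own.

```python
# Counts quoted above for a single Spiegel Online article.
mentions = {"Facebook": 1608, "Twitter": 227, "Google+": 3}

total = sum(mentions.values())

# Share of each network in all social mentions of the article, in percent.
shares = {network: 100 * count / total for network, count in mentions.items()}

for network, share in shares.items():
    print(f"{network}: {share:.2f}%")
```

Google+ accounts for roughly 0.16 percent of the mentions in this one example, which is hardly enough data to build a ranking signal on.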

From:

Johannes Beus
