Johannes Beus
The quality of its search results decides the success of a search engine: if users repeatedly get irrelevant pages, they will quickly switch to a search engine that serves them better results. Besides trying to keep quality as high as possible by tuning the many adjusting screws, meaning the ranking factors, there is a second approach: reducing the probability of displaying pages that are clearly unsuitable. The favoritism towards Wikipedia in the SERPs, which has been confirmed at least unofficially, can be explained this way: Wikipedia articles may not always be the best on a subject, but they are very seldom really bad.

In the following I want to look at one possible way of identifying these potential “problem websites”. With a few billion pages in the index, as large search engines have, regularities tend to emerge. Since I currently have this data in a database and it makes a good example, here is how the length of the title tag is distributed across German websites:

You can quickly spot that for the majority of pages the title is between 10 and 130 characters long. Similar distributions exist for a multitude of other website characteristics. Some are as simple and easy to spot as the length of the title, but many of them are much more complex and only emerge from a combination of several characteristics. For example, the number of external links in relation to the available content, or the number of headlines in relation to the length of the text, could yield interesting clues.
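To make such characteristics concrete, here is a minimal sketch of how they could be measured for a single page. It uses only Python's standard `html.parser`. The class name `PageFeatures`, the function `extract_features` and the "per 1,000 characters" scaling are assumptions made for this illustration, not something a search engine is known to use.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class PageFeatures(HTMLParser):
    """Collects a few simple on-page counts from one HTML document."""

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host      # host of the page being analysed
        self.title = ""
        self.text_length = 0          # characters of text outside <title>
        self.external_links = 0
        self.headlines = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.headlines += 1
        elif tag == "a":
            # Relative links have no host and therefore count as internal.
            host = urlparse(dict(attrs).get("href") or "").netloc
            if host and host != self.own_host:
                self.external_links += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        else:
            self.text_length += len(data.strip())


def extract_features(html, own_host):
    """Return the title length and two ratios relative to the text length."""
    parser = PageFeatures(own_host)
    parser.feed(html)
    parser.close()
    per_1000 = 1000 / max(parser.text_length, 1)  # avoid division by zero
    return {
        "title_length": len(parser.title.strip()),
        "external_links_per_1000_chars": parser.external_links * per_1000,
        "headlines_per_1000_chars": parser.headlines * per_1000,
    }
```

Calling `extract_features(html, "example.com")` returns a small dictionary of values for one page. The sketch is deliberately crude: it also counts text inside `<script>` or `<style>` as content, which a more careful version would skip.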
If a website shows enough signs that it deviates from what is “usual” on the web, a search engine may single it out for special treatment and take a closer look at whether it is just a harmless outlier or whether the site's quality does not meet the search engine's expectations. Up to here it sounds rather simple: just build your page to the specifications that most other sites adhere to, so as not to attract attention. However, search engines no longer rely mainly on on-page factors; they focus strongly on link structures, and that is where things get really interesting. More on that in the next part of this series.
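The idea of "enough clues" from the paragraph above can be sketched roughly as follows. This is an illustration under stated assumptions, not a description of how any search engine actually works. Only the 10 to 130 character range for the title comes from the text; the other ranges, the feature names, the 5th/95th percentile cut-off and the threshold of two clues are all made up for this example.

```python
from statistics import quantiles


def usual_range(values):
    """Estimate the 'usual' range of a feature as its 5th to 95th percentile.

    The percentile choice is an assumption; with n=20, quantiles() returns
    19 cut points, the first and last being the 5% and 95% marks.
    """
    cuts = quantiles(sorted(values), n=20)
    return cuts[0], cuts[-1]


# Hypothetical ranges: only the title range (10-130) is taken from the text,
# the two ratio ranges are placeholders that would normally be derived from
# the index, for example with usual_range().
USUAL_RANGES = {
    "title_length": (10, 130),
    "external_links_per_1000_chars": (0.0, 5.0),
    "headlines_per_1000_chars": (0.0, 10.0),
}


def unusual_features(features, ranges=USUAL_RANGES):
    """Return the names of all features that fall outside their usual range."""
    return sorted(
        name
        for name, (low, high) in ranges.items()
        if name in features and not low <= features[name] <= high
    )


def needs_closer_look(features, min_clues=2):
    """Flag a page once at least min_clues features are unusual."""
    return len(unusual_features(features)) >= min_clues


page = {
    "title_length": 250,
    "external_links_per_1000_chars": 40.0,
    "headlines_per_1000_chars": 2.0,
}
print(unusual_features(page))   # ['external_links_per_1000_chars', 'title_length']
print(needs_closer_look(page))  # True
```

A single unusual value does not flag the page here, which mirrors the point that one deviation may just be a harmless outlier, while several together justify a closer look.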
Is this still normal? – part I
Is this still normal? – part II
Is this still normal? – part III