Ranking data errors, and how we handle them

Measuring rankings correctly is complex. More than ten years of experience have helped us steer clear of most hazards, but mistakes do sometimes happen. In this document we explain how we handle them.

Search engine rankings are collected in two steps. In the first step, we collect the raw data. In the second step, this raw data is processed.

When processing, we analyse not only the organic rankings, but many other features of the search results too. For most search terms, there are hundreds of fields that need to be identified correctly.

The errors described here usually occur when search engines change the appearance of the search results or introduce new result types. They are not avoidable errors in the development of the Toolbox, but necessary adaptations to an updated SERP layout.

Automatically detected errors

Since we publish daily data for all important markets, this process has to run fully automatically, without manual intervention. Deviations from the normal results of previous days are detected automatically, and the affected data is then held back from publication in the Toolbox. In this case, an employee first checks the cause and applies fixes if necessary.

If the daily data is not available at the beginning of the day, this is usually the reason. The checked and, where necessary, corrected data is published as soon as possible, often on the same day.
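The check described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not our actual implementation: the function names, the per-feature share metric and the three-sigma threshold are all hypothetical.

```python
from statistics import mean, stdev


def deviates_from_baseline(today: float, previous_days: list[float],
                           max_sigma: float = 3.0) -> bool:
    """Return True if today's value is unusually far from the recent baseline.

    Assumption: a simple "mean plus/minus max_sigma standard deviations" rule.
    """
    if len(previous_days) < 2:
        return False  # not enough history to judge
    baseline = mean(previous_days)
    spread = stdev(previous_days)
    if spread == 0:
        return today != baseline
    return abs(today - baseline) > max_sigma * spread


def features_to_review(today: dict[str, float],
                       history: dict[str, list[float]]) -> list[str]:
    """List the features whose values deviate from previous days.

    A non-empty result would mean: hold back publication and check manually.
    """
    return [name for name, value in today.items()
            if deviates_from_baseline(value, history.get(name, []))]
```

In this sketch, a feature that appeared in roughly 12% of results on each of the previous five days and suddenly appears in only 2% would be listed for review, while ordinary day-to-day fluctuation would not.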

Errors not detected automatically

The automatic detection of errors only works if a sufficient number of results are affected. If the affected fields or features appear for only a few domains, we investigate once we notice inconsistencies ourselves or our users report them.

If we can determine the cause of the error, we implement a fix in the second step of the ranking measurement process. From then on, all newly processed data is correct.

Since we provide new data every day for all relevant markets, these errors are often still visible in the data of the previous few days. If we consider the impact of the error significant enough for Toolbox users as a whole, we can correct it retrospectively.

This is possible because we archive the raw data from the first step of the ranking measurement, described above, for over four weeks. Only the processing of the raw data is repeated; the original, correct raw data for each date continues to be used.

As a result, ranking and visibility data can change. We understand that such changes to data from the last few days are inconvenient, but we consider this preferable to leaving incorrect data in place. As a rule, only the daily Toolbox data is affected, not the weekly Toolbox data.
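The retrospective correction described above can be sketched in a few lines of Python. This is a simplified illustration, not our production code: the archive structure, the function names and the toy parser are hypothetical and only show that the second step is repeated while the archived raw data stays unchanged.

```python
from datetime import date, timedelta
from typing import Callable

# Hypothetical archive: raw data from the first step, keyed by collection date.
RawArchive = dict[date, str]


def parse_rankings(raw: str) -> dict[str, int]:
    """Toy stand-in for the (fixed) second step: map each domain to its position."""
    return {domain: position
            for position, domain in enumerate(raw.split(), start=1)}


def reprocess(archive: RawArchive,
              parse: Callable[[str], dict[str, int]],
              first_day: date,
              last_day: date) -> dict[date, dict[str, int]]:
    """Re-run only the processing step for every archived day in the range."""
    corrected: dict[date, dict[str, int]] = {}
    day = first_day
    while day <= last_day:
        raw = archive.get(day)
        if raw is not None:  # days no longer in the archive cannot be corrected
            corrected[day] = parse(raw)
        day += timedelta(days=1)
    return corrected
```

Because the raw data is read but never modified, the corrected parser can be run over the affected dates as often as needed, and each date is always processed from the raw data originally collected on that date.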