Problems of backlink analysis
Backlinks, meaning incoming links, are in one way or another the main ranking criterion for all current search engines. So it is reasonable that a large part of SEO work usually falls within this area. While you can obtain the backlinks for your own projects relatively easily and completely through the CSV export of Google's Webmaster Tools or by evaluating the referrer information in your server log files, this is impossible for competing projects. In such cases, so-called “backlink checkers” are often used. These tools have the task of querying possible sources of backlink data and preparing the data they receive. Since we use a self-developed solution internally, part of which we made public a few weeks ago, I have some thoughts on such tools. I believe that even pure users should know about the possibilities and limitations of such tools in order to interpret the results correctly.
The biggest problem for all backlink checkers is acquiring the underlying data. The most obvious and most widely used way to get this information is to query search engines. While they were quite generous in exposing this data in the past, that has changed noticeably over the last few years: Google shows rather useless links, if any at all; MSN has shut down its (formerly excellent) backlink query completely; that leaves only Yahoo among the big three. While Yahoo reports impressive numbers (~340,000 backlinks for sistrix.com), it only shows about 1,000 of them, so it may well be that the relevant backlinks, those responsible for the ranking of the domain, are not visible here. The problematic part is that all subsequent calculations of domain or IP popularity, PageRank, or other key figures can only build on the links that the tool actually knows about.
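To make the dependency on the known links concrete, here is a minimal Python sketch of how such key figures might be derived. It assumes that domain popularity means the number of distinct linking hosts and IP popularity the number of distinct linking IP addresses; the function name, the input format, and the crude "www." stripping are illustrative assumptions, not the way any particular tool works.

```python
from urllib.parse import urlsplit


def link_metrics(backlinks):
    """Compute simple popularity figures from the backlinks a tool knows.

    `backlinks` is an iterable of (source_url, source_ip) pairs. This input
    format is a hypothetical one chosen for the example.
    """
    hosts = set()
    ips = set()
    count = 0
    for url, ip in backlinks:
        host = (urlsplit(url).hostname or "").lower()
        if not host:
            continue  # skip entries without a usable host name
        # Crude normalization: treat www.example.org and example.org as one host.
        if host.startswith("www."):
            host = host[4:]
        hosts.add(host)
        ips.add(ip)
        count += 1
    return {
        "backlinks": count,
        "domain_popularity": len(hosts),
        "ip_popularity": len(ips),
    }


# Made-up sample data using documentation domains and addresses.
known = [
    ("http://www.example.org/a", "192.0.2.1"),
    ("http://example.org/b", "192.0.2.1"),
    ("http://blog.example.net/", "192.0.2.2"),
    ("http://example.com/", "192.0.2.2"),
]
print(link_metrics(known))
```

Whatever the search engine does not show never enters `known`, so every figure returned here is a lower bound on the real value rather than a measurement of it.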
There are a few different approaches to broadening the data pool. One is to query other search engines in addition to the Yahoo data and throw all returned links into one pot. The problem here is that the number of genuinely separate indexes has shrunk noticeably. There may be a multitude of front ends, but in most cases they just query Google's, Yahoo's, or MSN's index. For this reason, and because we assume this will not improve in the future, we decided to build our own backlink index for the German-speaking Internet. Our own crawler searches the parts of the Internet that are relevant to us and uses the results to build a database of link information. This index is still in its infancy, but for the most part it already works well. For well-linked sites like wikipedia.org, for example, we get considerably more backlinks than the search engines (want to) provide. Things do not look as rosy for sites that are less well linked, but I think we will see noticeable improvements in the next few months.
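The basic idea of a crawler-fed backlink index can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not our actual crawler: the pages are passed in as HTML strings instead of being fetched, the index lives in memory, links are grouped by target host, and the names `LinkExtractor` and `add_page` are made up for the example.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collect the absolute http(s) targets of all <a href> tags on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL and drop #fragments.
        absolute, _fragment = urldefrag(urljoin(self.base_url, href))
        if urlsplit(absolute).scheme in ("http", "https"):
            self.links.append(absolute)


def add_page(index, page_url, html):
    """Record every external link on a crawled page as target host -> source URL."""
    parser = LinkExtractor(page_url)
    parser.feed(html)
    source_host = urlsplit(page_url).hostname
    for target in parser.links:
        target_host = urlsplit(target).hostname
        # Internal links say nothing about backlinks, so skip them.
        if target_host and target_host != source_host:
            index[target_host].add(page_url)


index = defaultdict(set)
add_page(
    index,
    "http://example.org/post",
    '<a href="http://en.wikipedia.org/wiki/Backlink">wiki</a> '
    '<a href="/about">internal</a> <a href="mailto:a@example.org">mail</a>',
)
print(sorted(index["en.wikipedia.org"]))
```

Looking up a host in `index` then returns every crawled page known to link to it. How complete that answer is depends entirely on how much of the web the crawler has seen, which is why well-linked sites show up much earlier than sparsely linked ones.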
Another problem that all public backlink checkers share is scalability. While search engines might have no problem with ten or twenty link queries, the story changes with the few thousand or even tens of thousands of queries that a popular backlink checker issues per day. In this case the operator has to find ways to handle that volume, and most of the time this is neither technically nor financially an easy feat. It becomes even more complicated if you want not only the backlinks but also additional information from external sources for every backlink found, for example the Google PageRank or the Alexa rank. The number of requests behind a single backlink query can then quickly reach the thousands, which is a reason for us to keep this extension out of the public version rather than make it accessible to everyone.
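A rough back-of-the-envelope calculation shows how quickly this adds up. The Python sketch below is only an estimate under stated assumptions: one request per external metric and backlink, no caching, and input numbers that are illustrative rather than measured (1,000 visible backlinks, two metrics such as PageRank and Alexa rank, and 5,000 checks per day). The function and its parameters are hypothetical.

```python
def external_requests(backlinks_per_check, metrics_per_backlink,
                      checks_per_day=1, link_queries_per_check=1):
    """Estimate the outgoing requests caused by backlink checks.

    Assumes one request per metric and backlink, plus the link queries
    needed to fetch the backlink list itself, and no caching at all.
    """
    per_check = link_queries_per_check + backlinks_per_check * metrics_per_backlink
    return per_check * checks_per_day


# One check: 1 link query + 1,000 backlinks * 2 metrics.
print(external_requests(1000, 2))
# The same check run 5,000 times a day.
print(external_requests(1000, 2, checks_per_day=5000))
```

Caching metric values per linking page would lower the real number, but the multiplication of backlinks by metrics by daily checks is what drives the cost.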