Search Engines & SEO Blog
OpenLinkGraph: Index-size & Benchmark
Johannes Beus
How large is the current OpenLinkGraph link index, and how good is it? Answering this question is more complicated than it might appear at first sight. Let's start with just the facts: for the current index we crawled 4.2 billion websites and detected approximately 45 billion links, from which we put together our first public index. Putting these numbers into perspective is not easy when you consider that, not too long ago, an index size of about 100 million sites was considered adequate for a German search engine, while the Seomoz start page, for example, states that they take 400 billion websites into account for their index.
To get a better feel for how the quality and quantity of our data compare with other services, we compared both the total number of links found and the number of unique linking domains for a number of domains. The end result gives a good estimate of how deep our crawler goes. The chart below shows our internal benchmarking:
While Seomoz has the smallest index in this comparison, I feel that the results for the OpenLinkGraph are definitely presentable for a first trial run. For me, it was especially important to get good results for "daily use" domains, like the vertical portals in the second group and the Bonn-centric domains in the third group. That Seomoz came out on top only for the SEO domains (last group) suggests, in my mind, that their crawling is not being controlled enough. I was a little surprised by the large discrepancy between our data for amazon.com (474 million) and those of Seomoz (137 million) and Majesticseo (247 million). Especially for these US sites I would have expected both to be stronger.
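The two measures used in these benchmarks, the total number of links pointing at a domain and the number of unique linking domains, can be illustrated with a small sketch. This is not how the OpenLinkGraph crawler or index works internally; the function names `host_of` and `link_metrics` are made up for illustration, the input is assumed to be a plain list of (source URL, target URL) pairs, internal links are assumed not to count, and the "domain" is approximated by the hostname with a leading `www.` removed (a real tool would more likely use a public suffix list).

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercased hostname of a URL, without a leading 'www.'.

    Simplifying assumption: the hostname stands in for the domain.
    """
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def link_metrics(links, target_domain):
    """Count total external links and unique linking domains for a target.

    links: iterable of (source_url, target_url) pairs (hypothetical input format).
    Returns a tuple (total_links, unique_linking_domains).
    """
    total_links = 0
    linking_domains = set()
    for source_url, target_url in links:
        if host_of(target_url) != target_domain:
            continue  # link points somewhere else
        source_domain = host_of(source_url)
        if source_domain == target_domain:
            continue  # assumption: internal links are not counted
        total_links += 1
        linking_domains.add(source_domain)
    return total_links, len(linking_domains)


if __name__ == "__main__":
    sample_links = [
        ("http://a.example/page1", "http://www.target.example/"),
        ("http://a.example/page2", "http://target.example/product"),
        ("http://b.example/", "http://target.example/"),
        ("http://target.example/a", "http://target.example/b"),
        ("http://c.example/", "http://other.example/"),
    ]
    total, domains = link_metrics(sample_links, "target.example")
    print(f"total links: {total}, unique linking domains: {domains}")
```

A domain with many links from only a few sites scores high on the first number and low on the second, which is why the two benchmarks can rank the same tools differently.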
For our second benchmark, we used domain popularity, which is the number of different domains that link to the target domain:
For this benchmark, Majesticseo is the one trailing behind: it seems that they are not crawling widely enough to cover enough unique domains. Seomoz, on the other hand, does noticeably better in this benchmark. All in all, though, I am not without pride to be able to confirm that our data also comes out on top in this round of benchmarks. Especially for the domains that are important to me, I am happy to see so much green smiling back at me. There will surely be plenty of examples, some of which we have already seen in this comparison, where the Seomoz and/or Majesticseo data are more comprehensive. Since both offer a free-of-charge basic version of their tools, it might be worth comparing them for yourself.
Comments & Trackbacks (1)
10/03/2011 at 15:10
Great Info. Excited to see the changes, I want the web to be more relevant for me and for the folks searching, too. Looking forward to the top 100 list from you. THANKS!














