Google-Index, Google-Bot and the Crawling Process

A website can only appear in Google Search results after it has been added to Google’s index, and there are a number of ways to influence that. Understanding and controlling the process is extremely important, as mistakes can have a huge negative impact. This article gives a quick overview; more detailed articles are linked.

Lily Ray talks about the crawling and indexing process

Lily Ray is a member of the SISTRIX data journalism team

What is Crawling and why is it needed?

The only way you’ll get into the Google Index, the source of all Google search results, is if you let the ‘Googlebot’ crawl your website.

Googlebot is Google’s web crawling bot (sometimes also called a “spider”). Crawling is the process by which Googlebot discovers new and updated pages to be added to the Google index.

Google Search Console-Help

This article from Google, the Basics of the Google-Bot, will help you understand how the crawling process feeds into the Google Index and how the ranking algorithm uses the index to sort and present search results to users. We’ve summarised it in this image.

Image: Google crawl and index process.

Is crawling important for SEO?

Without a crawler taking a look at your website, there’s no chance of appearing in Google search results. It’s as simple as that.

If you’re lucky, Google will find your website through a link on another site, then crawl and index it without you doing anything. But that is a hit-and-miss process, and it’s also important to know when it happens and how much of the site was crawled and indexed. This is where the SEO’s most important tool comes into play: the Google Search Console. GSC, as it is commonly called, provides tools for submitting sites, checking crawling and indexing, and viewing potential issues.

How do I get Google to crawl my website?

The simple answer is to connect the website to GSC and use the ‘submit to index’ feature. There are a few other ways too. Your site can be found through links from other sites, which is difficult to track, can take time and is no guarantee that the site gets crawled, or you can ‘ping’ a sitemap to Google.
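
For readers who have not seen a sitemap before, here is a minimal sketch of how one could be generated with Python's standard library. The function name build_sitemap and the example.com URLs are placeholders chosen for illustration. Only the required "loc" element of the sitemap protocol is written, and a real sitemap would usually be saved to a file and served from the site itself.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages on a placeholder domain.
sitemap_xml = build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/blog/",
])
print(sitemap_xml)
```

The resulting XML can then be submitted in GSC, which also reports how many of the listed URLs were actually indexed.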

Are there crawling and indexing issues I should be aware of?

One of the most important considerations is mobile-first indexing, which takes a smartphone view of your site and indexes the content it finds for both desktop and mobile searches. If your site hides certain content on mobile phones, that content won’t appear in either the mobile or the desktop search results.

There are considerations and controls you can use to guide Google. For example, you can prevent Google from following links on your website, prevent it from crawling certain directories and tell it not to index certain HTML pages or other page types it finds. If you want Google to stay away from your site entirely, you can do that too, but beware that Google might make up its own mind and index some pages anyway, based on incoming links.
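
Blocking directories is typically done in the robots.txt file. The sketch below uses Python's standard urllib.robotparser module to test which URLs a given set of rules would allow. The robots.txt content, the can_crawl helper and the example.com URLs are all made up for illustration, and the standard-library parser is only an approximation: it does not promise to match every detail of how Googlebot itself interprets the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything except /internal/,
# all other crawlers are kept out completely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /internal/

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt, user_agent, url):
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(can_crawl(ROBOTS_TXT, "Googlebot", "https://www.example.com/blog/post"))
print(can_crawl(ROBOTS_TXT, "Googlebot", "https://www.example.com/internal/report"))
print(can_crawl(ROBOTS_TXT, "OtherBot", "https://www.example.com/blog/post"))
```

Note that robots.txt only controls crawling. Keeping a page out of the index or stopping links from being followed is handled separately, on the page itself.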

Making sure your website is only accessible through one domain or hostname is important too. You don’t want two versions of your site being available via, for example, a www and non-www version.
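
As a rough illustration of the idea, the following sketch rewrites any incoming URL to a single preferred hostname and reports whether a redirect would be needed. CANONICAL_HOST, canonical_url and needs_redirect are hypothetical names, and www.example.com stands in for whichever hostname you choose. In practice this logic usually lives in the web server or CDN configuration rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the one hostname the site should be reachable under.
CANONICAL_HOST = "www.example.com"


def canonical_url(url, canonical_host=CANONICAL_HOST):
    """Return url with its hostname replaced by the canonical one."""
    parts = urlsplit(url)
    # Scheme, path, query and fragment are kept; only the host changes.
    return urlunsplit(
        (parts.scheme, canonical_host, parts.path, parts.query, parts.fragment)
    )


def needs_redirect(url, canonical_host=CANONICAL_HOST):
    """Return True if url is not already on the canonical hostname."""
    return urlsplit(url).hostname != canonical_host


print(canonical_url("https://example.com/page?x=1"))
print(needs_redirect("https://example.com/page?x=1"))
print(needs_redirect("https://www.example.com/page?x=1"))
```

A request to the non-canonical hostname would then be answered with a permanent (301) redirect to the URL returned by canonical_url.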

If your website was in the search results and suddenly disappears, here’s a guide to tracking down the problem. It could be that you’ve been manually removed from Google because of bad practices or, more likely, there’s a technical issue such as a misconfiguration in the robots.txt file or header tags.
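
One technical cause that is easy to check for is a robots meta tag that accidentally tells search engines not to index a page. The sketch below scans a page's HTML for such a tag using Python's standard html.parser module. The class and function names are illustrative, the sample HTML is invented, and the check covers only meta tags in the HTML, not directives sent as HTTP response headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in ("robots", "googlebot"):
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def is_noindex(html):
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives


# Hypothetical page that was left on noindex after a relaunch.
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(is_noindex(page))
```

Running a check like this against the most important pages after every deployment is a cheap way to catch the problem before rankings are affected.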

You can measure crawler activity in GSC, or in your website logs by looking for the crawler’s user-agent. The bot has limitations too. Think about forms. Can Google’s crawler fill out forms and submit them? What happens when a site uses JavaScript to create HTML? Will Google see that? (Answer: In most cases yes, but possibly not immediately.)
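
To show what looking for the user-agent in the logs can mean in practice, here is a small sketch that counts Googlebot requests per URL. It assumes the web server writes the widely used Combined Log Format; the log lines, the pattern and the function name googlebot_hits are sample material for illustration. Keep in mind that a user-agent string can be faked, so counts taken this way are an estimate rather than proof of real Googlebot visits.

```python
import re
from collections import Counter

# Sample access log lines (Combined Log Format), invented for this example.
LOG_LINES = [
    '66.249.66.1 - - [10/Mar/2024:06:25:14 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Mar/2024:06:26:02 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '66.249.66.1 - - [10/Mar/2024:06:27:40 +0000] "GET /contact HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

# Captures the requested path and the user-agent field of each line.
LOG_PATTERN = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count requests per path whose user-agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("path")] += 1
    return hits


hits = googlebot_hits(LOG_LINES)
for path, count in sorted(hits.items()):
    print(path, count)
```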

If you have a very large website you’ll need to consider the crawling budget as Google won’t spend unlimited time crawling through millions of pages. Crawling for extensive websites is covered in this article.

To make it easier for the Google Bot to crawl and understand your website, it is important to practice good OnPage optimization, use a solid page structure (sitemaps) and keep the internal link structure in mind.

How search works – What Google says

The life span of a Google query is less than half a second, and involves quite a few steps before you see the most relevant results. This overview video is a good starting point. If you want to know more, detailed articles from Google are listed here.

Steve Paine