What is the noindex tag?

The noindex tag in the source code of a page asks search engines not to include that page in the search index.

What does ‘noindex’ mean, and when are noindex instructions used?

Noindex instructions are used by website operators to inform search engine crawlers that a specific page on a website should not be included in the search engine’s index.

Noindex is the only reliable means of ensuring that a URL does not appear in the search results pages (SERPs) under any circumstances. It does consume crawl budget, because the search engine has to retrieve and evaluate the page in order to read the instruction, but it does not consume index budget.

How can I implement the noindex tag?

There are two ways to stop pages from being indexed with a noindex instruction: either with a meta tag in the HTML of the page or with a field in the HTTP response header.

Via meta tag in the head section of the source code

<head>
<meta name="robots" content="noindex">
</head>

This tag asks all search engines not to index the page. It is also possible to stop only certain bots from indexing; to do so, replace the value of the “name” attribute with the name of the desired bot.

For the general Google bot:

<meta name="googlebot" content="noindex">

For the general Bing bot:

<meta name="bingbot" content="noindex">

Important: When you use individual bot names, remember that only those bots are addressed; all other crawlers will ignore the tag. Google itself has compiled a list of all possible Googlebot user agents.
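
To check which bots a page's meta tags actually address, the logic can be sketched in a few lines of Python. This is a minimal sketch that uses only the standard library; the names RobotsMetaParser and meta_noindex are made up for this example, and it assumes a crawler honours both the generic "robots" tag and a tag carrying its own name.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the comma-separated content tokens of every named meta tag."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").strip().lower()
        if not name:
            return
        tokens = {t.strip().lower() for t in attr.get("content", "").split(",") if t.strip()}
        self.directives.setdefault(name, set()).update(tokens)


def meta_noindex(html, bot="googlebot"):
    """Return True if the generic robots tag or a tag naming `bot` says noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # Assumption: a bot reads the generic tag as well as the one with its name.
    return any("noindex" in parser.directives.get(name, set())
               for name in ("robots", bot.lower()))


page = '<head><meta name="bingbot" content="noindex"></head>'
print(meta_noindex(page, bot="bingbot"))    # the tag addresses bingbot
print(meta_noindex(page, bot="googlebot"))  # googlebot is not addressed
```

A tag that names only one bot leaves every other crawler free to index the page, which is exactly the pitfall described above.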

Via HTTP response header

Another way to prevent pages from being indexed is the X-Robots-Tag header. Here, the indexing instruction is sent directly in the header of the HTTP response.

A response could look like this:

HTTP/1.1 200 OK
(...)
X-Robots-Tag: noindex
(...)

This variant also allows individual bots from different search engines to be directly addressed and excluded. A complete list of all of the possible instructions that Google adheres to can be found on Google’s developer page.
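
When a single bot is addressed in the header, its name is placed in front of the instruction, for example "X-Robots-Tag: googlebot: noindex". The following Python sketch shows how such header values could be interpreted. The function name header_noindex and the short VALUED_DIRECTIVES set are assumptions made for this illustration, not part of any official library, and the set is not exhaustive.

```python
# Directive names that can be followed by a colon and a value, so they are
# not mistaken for a bot name (illustrative, not a complete list).
VALUED_DIRECTIVES = {
    "unavailable_after", "max-snippet", "max-image-preview", "max-video-preview",
}


def header_noindex(values, bot="googlebot"):
    """Return True if any X-Robots-Tag value tells `bot` not to index.

    `values` is the list of X-Robots-Tag header values of one response.
    """
    for value in values:
        prefix, sep, rest = value.partition(":")
        agent = prefix.strip().lower()
        # A colon-terminated prefix that is neither a directive list nor a
        # valued directive is treated as the name of the addressed bot.
        scoped = bool(sep) and "," not in agent and agent not in VALUED_DIRECTIVES
        if scoped and agent != bot.lower():
            continue  # rule is addressed to a different crawler
        rules = rest if scoped else value
        tokens = {t.strip().lower() for t in rules.split(",")}
        if "noindex" in tokens:
            return True
    return False


print(header_noindex(["noindex"]))                        # applies to every bot
print(header_noindex(["bingbot: noindex"], "googlebot"))  # only bingbot is addressed
```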

What is the difference between the two types of implementation?

Both forms of implementation lead to the same result: the URL is not included in the search engine’s index.

The advantage of the X-Robots-Tag header is that documents without HTML source code, such as PDFs and other files, can also be excluded from indexing.
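
How the header gets attached depends entirely on the web server or framework in use. As a rough, framework-independent sketch in Python, the decision could look like this; the function extra_headers and the NON_HTML_EXTENSIONS set are hypothetical names chosen for this example, and the list of file types is only a placeholder.

```python
import posixpath

# Hypothetical list of file types that should stay out of the index.
NON_HTML_EXTENSIONS = {".pdf", ".doc", ".docx"}


def extra_headers(request_path):
    """Return the additional response headers for a requested path."""
    path = request_path.split("?", 1)[0]          # ignore any query string
    extension = posixpath.splitext(path)[1].lower()
    if extension in NON_HTML_EXTENSIONS:
        # These files cannot carry a meta tag, so the header does the job.
        return {"X-Robots-Tag": "noindex"}
    return {}


print(extra_headers("/downloads/report.pdf"))
print(extra_headers("/index.html"))
```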

A heads up on robots.txt

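In order for the noindex instruction to be recognised and followed by crawlers, the pages must be crawlable. If the robots.txt file blocks a URL, the crawler never retrieves the page and therefore never sees the instruction. It is important to make sure that crawling of these pages is not prevented by the robots.txt file.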
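
Whether a robots.txt file stands in the way can be tested before relying on noindex. The sketch below uses Python's standard urllib.robotparser module; the helper name is_crawlable, the sample rules and the example.com URLs are assumptions for illustration. Note that Python's parser does not necessarily match every detail of how a particular search engine interprets robots.txt, so treat the result as a first check only.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt, url, bot="Googlebot"):
    """Return True if the given robots.txt text lets `bot` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, url)


# Sample rules, made up for this example.
rules = "User-agent: *\nDisallow: /private/\n"

# A noindex on this page would never be seen, because crawling is blocked.
print(is_crawlable(rules, "https://example.com/private/page.html"))
# This page can be crawled, so a noindex on it can take effect.
print(is_crawlable(rules, "https://example.com/public/page.html"))
```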

Steve Paine
09.07.2021