How can I prevent Googlebot from crawling my website?

Whatever your reasons may be for blocking Google from crawling all or parts of your domain, you can do so with the robots.txt file.

Blocking Googlebot using the robots.txt

The robots.txt is a plain text file named “robots.txt”. It must be placed in the root directory of a website for search engines to follow its directives.

If a website has a robots.txt, it can be accessed through the following path (using an example domain):

https://www.example.com/robots.txt

The contents of the robots.txt

To tell Googlebot to stay away from the entire domain, add the following to your robots.txt:

User-agent: Googlebot
Disallow: /
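If you want to sanity-check such a rule set before deploying it, Python's built-in urllib.robotparser applies robots.txt rules the same way a well-behaved crawler would. This is a minimal sketch; the domain www.example.com is a placeholder, not part of the original article:

```python
from urllib.robotparser import RobotFileParser

# The same rules as above, fed to the parser directly
# instead of being fetched from a live site.
rules = [
    "User-agent: Googlebot",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(rules)

# Googlebot is blocked from every URL on the (hypothetical) domain...
print(parser.can_fetch("Googlebot", "https://www.example.com/"))  # False

# ...while crawlers that match no rule, such as Bingbot, stay allowed.
print(parser.can_fetch("Bingbot", "https://www.example.com/"))    # True
```

Because the rules are parsed from a list of lines, you can test changes locally without touching the live file.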

If you only want to restrict access to certain directories or files instead of the entire website, the robots.txt has to contain rules like the following, which forbid Googlebot from accessing the directory “a-directory” and the file “one-file.pdf”:

User-agent: Googlebot
Disallow: /a-directory/
Disallow: /one-file.pdf
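Disallow rules are prefix matches: everything under “/a-directory/” is blocked, but sibling paths stay crawlable. The following sketch checks a few paths against the rules above (again with a placeholder domain, not one from the article):

```python
from urllib.robotparser import RobotFileParser

# The directory-and-file rules from above.
rules = [
    "User-agent: Googlebot",
    "Disallow: /a-directory/",
    "Disallow: /one-file.pdf",
]

parser = RobotFileParser()
parser.parse(rules)

base = "https://www.example.com"  # hypothetical domain
for path in ["/a-directory/page.html", "/one-file.pdf", "/another-page.html"]:
    # Blocked paths print False, everything else True.
    print(path, parser.can_fetch("Googlebot", base + path))
```

Only the first two paths are disallowed; “/another-page.html” matches no rule and remains accessible to Googlebot.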

Some URLs may still be indexed

The examples shown here only apply to Googlebot. Crawlers from other search engines, such as Bingbot, will not be blocked.

Restricting access for a specific crawler does not guarantee that the website or individual URLs will never appear in the search results (SERPs). You can find additional information on this in our article “Why does a URL that is blocked through robots.txt show up in the search results?”

More information from Google on the subject