Robots.txt

Robots.txt is a plain text file placed in the root directory of a website (e.g., example.com/robots.txt) that tells crawlers which parts of the site they may and may not access.

This file is not a foolproof way to keep a page out of search results, but it can significantly reduce the chance that the page is crawled and indexed. To help you better understand, imagine your website is a bank:

  • The lobby (aka the homepage) is where everyone enters.
  • The vault (aka important, sensitive data) holds the money and confidential customer information, so it is heavily restricted.
  • The staff-only areas (aka private pages) are where customers aren't allowed.

Crawlers are the people coming to the bank. Some are regular customers (good bots like Googlebot); others might be suspicious characters (spammy or AI scraper bots).

The robots.txt file is like a sign posted in the bank lobby politely telling every visitor where they are and are not allowed to go.

```txt
User-agent: *
Disallow: /vault/
Disallow: /staff-only/
```

This tells every crawler (the * wildcard matches all user agents) to stay out of the vault and the staff-only areas, because you do not want the public or search engines to see them.

```txt
User-agent: content-scraper
Disallow: /
```

This tells the specific bot 'content-scraper' not to enter the bank at all, because you do not want this particular bot visiting any page on the site.
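Compliance is voluntary: a well-behaved crawler fetches the file and checks every URL against it before crawling. As a minimal sketch of that check, Python's standard-library urllib.robotparser can evaluate the two rule sets above (the bank paths are the hypothetical ones from the examples):

```python
from urllib.robotparser import RobotFileParser

# The rules from the examples above, as the bank's hypothetical robots.txt.
rules = """\
User-agent: *
Disallow: /vault/
Disallow: /staff-only/

User-agent: content-scraper
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# A polite crawler asks before fetching each URL.
print(rp.can_fetch("Googlebot", "https://example.com/"))        # True: the lobby is open
print(rp.can_fetch("Googlebot", "https://example.com/vault/"))  # False: the vault is restricted
print(rp.can_fetch("content-scraper", "https://example.com/"))  # False: banned from the whole bank
```

Note that this is only a courtesy check: nothing physically stops a rogue bot from ignoring the sign and crawling the disallowed paths anyway.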


Frequently Asked Questions

Does robots.txt allow Google to crawl my website?

Yes, robots.txt allows Google to crawl your website. However, it tells Google which pages on your website it can and cannot crawl, and Google's crawlers respect the instructions in your robots.txt file.
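For example, a rule group aimed only at Googlebot might look like this (the /private/ path is just a placeholder):

```txt
User-agent: Googlebot
Disallow: /private/
```

Other crawlers ignore this group and fall back to any User-agent: * rules in the same file.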

What happens if I don't have a robots.txt file?

If you don't have a robots.txt file, search engines will typically crawl and index every accessible page on your website, since there are no instructions telling them which pages they can or cannot crawl. It's generally recommended to have a robots.txt file in place so you retain more control over how search engines interact with your site.
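In practice, having no file at all behaves the same as a fully permissive one; an empty Disallow value disallows nothing:

```txt
User-agent: *
Disallow:
```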