Robots.txt
Robots.txt is a text file placed in the root directory of a website that tells crawlers which parts of the site they may and may not access.
This file is not a foolproof way to keep a page out of search results: compliant crawlers honor it, but a disallowed page can still be indexed if other sites link to it. It does, however, significantly decrease the chances that the page will be indexed. To help you better understand, imagine your website is a bank:
- The lobby (aka homepage) is where everyone enters.
- The vault (aka important, sensitive data) contains all the money and confidential customer information; hence, it is heavily restricted.
- Staff-only areas (aka private pages) are where customers aren't allowed.
Crawlers are people coming to the bank. Some are regular customers (good bots like Googlebot); others might be suspicious characters (spammy/A.I. bots).
The robots.txt file is like a sign posted in the bank lobby, politely telling everyone where they are and are not allowed to go.
```txt
User-agent: *
Disallow: /vault/
Disallow: /staff-only/
```
This tells every crawler to stay out of the vault and the staff-only areas, since you do not want the public or search engines to see them.
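You can check how a crawler would interpret these rules with Python's standard-library robots.txt parser. This is a minimal sketch; `example.com` is a placeholder domain, not part of the original rules.

```python
from urllib.robotparser import RobotFileParser

# The same rules as above, fed to the parser directly.
rules = """\
User-agent: *
Disallow: /vault/
Disallow: /staff-only/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# The lobby (homepage) is open to everyone; the vault is off-limits.
lobby_ok = parser.can_fetch("Googlebot", "https://example.com/")
vault_ok = parser.can_fetch("Googlebot", "https://example.com/vault/")
print(lobby_ok, vault_ok)  # True False
```

Any compliant crawler performs essentially this check before fetching a URL.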
```txt
User-agent: content-scraper
Disallow: /
```
This tells the specific bot 'content-scraper' that it is not allowed to enter the bank at all, while leaving every other crawler unrestricted.
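The same parser shows that this rule only affects the named bot. Again a minimal sketch with `example.com` as a placeholder domain:

```python
from urllib.robotparser import RobotFileParser

# A rule set that bans only the 'content-scraper' bot.
rules = """\
User-agent: content-scraper
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# 'content-scraper' is banned everywhere; other bots are unaffected
# because no rule matches them.
scraper_ok = parser.can_fetch("content-scraper", "https://example.com/")
google_ok = parser.can_fetch("Googlebot", "https://example.com/")
print(scraper_ok, google_ok)  # False True
```

Note that this is a sign, not a lock: a misbehaving bot can simply ignore robots.txt, so truly sensitive pages need real access control.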
Published: October 16, 2023
Updated: February 17, 2025