Free tools

Robots.txt Examples

Robots.txt file content for huffpost.com.

Robot.txt file for: huffpost.com

      # Cambria robots

User-agent: grapeshot
Disallow: /member
Disallow: /*?*err_code=404
Disallow: /search
Disallow: /search/?*

User-agent: *
Crawl-delay: 4
Disallow: /*?*page=
Disallow: /member
Disallow: /*?*err_code=404
Disallow: /search
Disallow: /search/?*
Disallow: /mapi/v4/*/user/*
Disallow: /embed

User-agent: Amazonbot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: google-extended
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
Disallow: /*?*err_code=404
Disallow: /search
Disallow: /search/?*

User-agent: GPTBot
Disallow: /

User-agent: magpie-crawler
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: TurnitinBot
Disallow: /

# archives
Sitemap: https://www.huffpost.com/sitemaps/sitemap-v1.xml
Sitemap: https://www.huffpost.com/sitemaps/sitemap-google-news.xml
Sitemap: https://www.huffpost.com/sitemaps/sitemap-google-video.xml
Sitemap: https://www.huffpost.com/sitemaps/sections.xml

# huffingtonpost.com archive sitemaps
Sitemap: https://www.huffpost.com/sitemaps-huffingtonpost/sitemap.xml
Sitemap: https://www.huffpost.com/sitemaps-huffingtonpost/sections.xml