
Robots.txt Examples

Robots.txt file content for crunchbase.com.

Robots.txt file for: crunchbase.com

User-agent: *

# Allow API and JS paths to be requested by crawlers
Allow: /v4/md/applications/crunchbase
Allow: /*.js$

Disallow: /login
Disallow: /register
Disallow: /account
Disallow: /account/invite
Disallow: /reset-password
Disallow: /subscriptions
Disallow: /contribute
Disallow: /add-new
Disallow: /edit
Disallow: /edit/success
Disallow: /edit/review
Disallow: /buy
Allow: /buy/select-product
Disallow: /account-setup
Disallow: /verify
Disallow: /admin
Disallow: /v4
Disallow: /home
Disallow: /search
Disallow: /discover

# AI and LLM Crawling
User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: Omgili
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: ImagesiftBot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: PerplexityBot
Disallow: /

Sitemap: https://www.crunchbase.com/www-sitemaps/sitemap-index.xml
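
The file above mixes Allow and Disallow rules that overlap (for example `Disallow: /buy` next to `Allow: /buy/select-product`, and `Allow: /*.js$` next to `Disallow: /v4`). The following is a minimal sketch of how a crawler might resolve such overlaps, assuming the longest-match precedence described in RFC 9309, where the most specific matching pattern wins and Allow wins a tie. It is hand-rolled to make the precedence visible, not a full parser: the rule set is an abbreviated copy of the listing, user-agent lookup is a simple exact match on the token, and the `SomeBot` agent name and the `/organization/acme`, `/buy/checkout` and `/v4/app.js` paths are hypothetical values chosen for illustration.

```python
import re

# Abbreviated copy of the rules listed above (not the full file).
ROBOTS_TXT = """\
User-agent: *
Allow: /v4/md/applications/crunchbase
Allow: /*.js$
Disallow: /login
Disallow: /buy
Allow: /buy/select-product
Disallow: /v4
Disallow: /search

User-agent: GPTBot
Disallow: /
"""


def parse_groups(text):
    """Return {lowercased user-agent token: [(is_allow, pattern), ...]}."""
    groups = {}
    agents = []             # user-agents of the group currently being read
    reading_agents = False  # True while consecutive User-agent lines are seen
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                agents = []  # a new group starts here
            reading_agents = True
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            reading_agents = False
            if value:  # an empty pattern matches nothing, so skip it
                for agent in agents:
                    groups[agent].append((field == "allow", value))
        # Other fields (for example Sitemap) are ignored in this sketch.
    return groups


def pattern_to_regex(pattern):
    """'*' matches any run of characters; a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(groups, user_agent, path):
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    rules = groups.get(user_agent.lower(), groups.get("*", []))
    best = None  # (pattern length, is_allow)
    for is_allow, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), is_allow)
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


groups = parse_groups(ROBOTS_TXT)
print(is_allowed(groups, "SomeBot", "/login"))               # False
print(is_allowed(groups, "SomeBot", "/buy/checkout"))        # False
print(is_allowed(groups, "SomeBot", "/buy/select-product"))  # True
print(is_allowed(groups, "SomeBot", "/v4/app.js"))           # True
print(is_allowed(groups, "SomeBot", "/organization/acme"))   # True
print(is_allowed(groups, "GPTBot", "/organization/acme"))    # False
```

Under this reading, `/buy/select-product` stays crawlable because its Allow pattern is longer than `Disallow: /buy`, while any agent with its own `Disallow: /` group, such as GPTBot, is blocked from every path regardless of the rules in the `*` group.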