
Robots.txt Examples

Robots.txt file content for cornell.edu.

Robots.txt file for: cornell.edu

User-agent: *
Crawl-Delay: 6
Disallow: /_dynamic_files/
Disallow: /_tasks/
Disallow: /test/
Disallow: /tools/
Disallow: /template/
Disallow: /search/
Disallow: /visit/plan/
Disallow: /video/kaltura/
Disallow: /video/tasks/
Disallow: /server-health-check/


# SiteImprove should ignore these page particularly because they aren't actually used, but are still linked for historical reasons
User-agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0) SiteCheck-sitecrawl by Siteimprove.com
Disallow: /cuinfo/specialconditions/
Disallow: /_includes/header.cfm

User-agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0) LinkCheck by Siteimprove.com
Disallow: /cuinfo/specialconditions/
Disallow: /_includes/header.cfm

User-agent: HTML validator: Siteimprove_W3C_Validator/1.3
Disallow: /cuinfo/specialconditions/
Disallow: /_includes/header.cfm

User-agent: CSS Validator: Jigsaw/2.3.0 W3C_CSS_Validator_JFouffa/2.0
Disallow: /cuinfo/specialconditions/
Disallow: /_includes/header.cfm
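
As a rough illustration of how a crawler might interpret the rules above, the sketch below feeds a shortened copy of the wildcard group to Python's standard urllib.robotparser. The crawler name "ExampleBot" and the /about/ path are hypothetical and chosen only for the example; the rules are abbreviated from the listing above, and other parsers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the "User-agent: *" group from the listing above.
ROBOTS_TXT = """\
User-agent: *
Crawl-Delay: 6
Disallow: /_dynamic_files/
Disallow: /test/
Disallow: /search/
Disallow: /visit/plan/
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network request is made here.
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is a hypothetical crawler name; it matches no named group,
# so the wildcard (*) group applies to it.
blocked = parser.can_fetch("ExampleBot", "https://cornell.edu/search/")
allowed = parser.can_fetch("ExampleBot", "https://cornell.edu/about/")
delay = parser.crawl_delay("ExampleBot")

print(blocked)  # False: /search/ falls under a Disallow rule
print(allowed)  # True: no rule matches the hypothetical /about/ path
print(delay)    # 6: seconds requested between fetches
```

A well-behaved crawler would typically call can_fetch() before each request and wait at least crawl_delay() seconds between requests to the same host.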