
Robots.txt Examples

Robots.txt file content for columbia.edu.

Robots.txt file for: columbia.edu

# ignore this line - 1
# for info on robots.txt syntax see
# http://www.searchtools.com/robots/robots-txt.html

User-agent: *
Disallow: /cgi-bin/
Disallow: /acis/whatsnew.html
Disallow: /httpd/reports/
Disallow: /itc/ccnmtl/assets/

## New Homepage ##
# Directories
Disallow: /content/core/
Disallow: /content/profiles/
# Files
Disallow: /content/README.txt
Disallow: /content/web.config
# Paths (clean URLs)
Disallow: /content/admin/
Disallow: /content/comment/reply/
Disallow: /content/filter/tips/
Disallow: /content/node/add/
Disallow: /content/search/
Disallow: /content/user/register/
Disallow: /content/user/password/
Disallow: /content/user/login/
Disallow: /content/user/logout/
# Paths (no clean URLs)
Disallow: /content/index.php/admin/
Disallow: /content/index.php/comment/reply/
Disallow: /content/index.php/filter/tips/
Disallow: /content/index.php/node/add/
Disallow: /content/index.php/search/
Disallow: /content/index.php/user/password/
Disallow: /content/index.php/user/register/
Disallow: /content/index.php/user/login/
Disallow: /content/index.php/user/logout/
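
The rules above can be checked programmatically. The following is a minimal sketch, not part of the original file: it uses Python's standard `urllib.robotparser` to test a few paths against a subset of the listed rules. The crawler name `ExampleBot`, the helper names `build_parser` and `is_allowed`, and the sample paths are hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# A subset of the rules listed above, copied verbatim.
# The rules are kept in one block with no empty lines, because
# Python's parser treats an empty line as the end of a record.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /acis/whatsnew.html
Disallow: /content/admin/
Disallow: /content/index.php/admin/
"""


def build_parser(rules_text: str) -> RobotFileParser:
    """Parse robots.txt text from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if the (hypothetical) agent may fetch the given path."""
    return parser.can_fetch(agent, path)


parser = build_parser(RULES)

# Hypothetical sample paths, used only to exercise the rules.
for path in ["/cgi-bin/test.cgi", "/content/about", "/content/admin/", "/content/admin"]:
    status = "allowed" if is_allowed(parser, path) else "disallowed"
    print(f"{path}: {status}")
```

Matching in this parser is by path prefix, so `Disallow: /content/admin/` covers everything under that directory but not the bare path `/content/admin` without the trailing slash. Other crawlers may interpret the same file slightly differently, so this sketch reflects only how Python's standard library reads it.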