Machine-Readable Infrastructure

robots.txt Patterns

Last reviewed:

Serve one robots.txt file at the root of each host (for example, https://example.com/robots.txt; subdomains need their own). Adjust the relevant block, then verify your CDN or WAF isn’t enforcing a conflicting policy: robots.txt is crawler guidance, not access control, and infrastructure rules can override it in either direction.

Allow all crawlers

User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml

Block common internal and low-value SEO paths

User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /login/
Disallow: /search

Sitemap: https://example.com/sitemap.xml
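Disallow values are path prefixes, so these rules can block more or less than the directory names suggest: /search also catches /searchable-guide, while /admin/ does not cover /admin without the trailing slash. Python's standard urllib.robotparser is a quick way to check plain prefix rules like these. The agent name ExampleBot and the test URLs below are hypothetical, and because the standard parser does not implement the * and $ wildcards, it is only a fair model for simple blocks like this one.

```python
from urllib.robotparser import RobotFileParser

# The "block internal paths" example from above, verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /login/
Disallow: /search

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical crawler name; any unlisted agent falls through to the "*" group.
AGENT = "ExampleBot"

for url in [
    "https://example.com/search?q=shoes",    # blocked: starts with /search
    "https://example.com/searchable-guide",  # blocked: also starts with /search
    "https://example.com/admin",             # allowed: /admin/ needs the slash
    "https://example.com/products/shoes",    # allowed: no rule matches
]:
    print(url, parser.can_fetch(AGENT, url))

# Sitemap lines are collected separately (site_maps() needs Python 3.8+).
print(parser.site_maps())
```

If /admin itself should also be blocked, a rule without the trailing slash (Disallow: /admin) covers both, at the cost of also matching paths such as /administration.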

Block a directory but allow required assets

User-agent: *
Disallow: /private-reports/
Allow: /private-reports/public-summary.pdf
Allow: /private-reports/assets/

Sitemap: https://example.com/sitemap.xml
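The Allow lines work because RFC 9309 and Google resolve conflicts by specificity rather than by order: the matching rule with the longest path wins, and Allow wins a tie. The sketch below models that precedence for plain prefixes only (no * or $). The helper names parse_rules and is_allowed and the sample paths are hypothetical; this illustrates the rule, not any crawler's actual code.

```python
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-reports/
Allow: /private-reports/public-summary.pdf
Allow: /private-reports/assets/
"""


def parse_rules(text):
    """Collect (directive, path) pairs; assumes a single group, for brevity."""
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field = field.strip().lower()
        if field in ("allow", "disallow"):
            rules.append((field, value.strip()))
    return rules


def is_allowed(rules, path):
    """Longest matching prefix wins; Allow wins ties; no match means allowed."""
    best_len, allowed = -1, True
    for field, value in rules:
        if not value or not path.startswith(value):
            continue  # an empty value matches nothing
        is_allow = field == "allow"
        if len(value) > best_len or (len(value) == best_len and is_allow):
            best_len, allowed = len(value), is_allow
    return allowed


rules = parse_rules(ROBOTS_TXT)
for path in [
    "/private-reports/q3-board-deck.pdf",   # only the Disallow matches
    "/private-reports/public-summary.pdf",  # longer Allow beats Disallow
    "/private-reports/assets/chart.png",    # longer Allow beats Disallow
]:
    print(path, is_allowed(rules, path))
```

Order-based parsers, including Python's urllib.robotparser, take the first matching line instead and would report the summary PDF as blocked for this file; placing the Allow lines above the Disallow line keeps both kinds of parser in agreement.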

Block parameter and file-type patterns for supported bots

User-agent: Googlebot
Disallow: /*?sort=
Disallow: /*&sort=
Disallow: /*?filter=
Disallow: /*&filter=
Disallow: /*.pdf$

User-agent: Bingbot
Disallow: /*?sort=
Disallow: /*&sort=
Disallow: /*?filter=
Disallow: /*&filter=
Disallow: /*.pdf$
Sitemap: https://example.com/sitemap.xml
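Google and Bing treat * as "any sequence of characters" and a trailing $ as "end of URL", matched from the start of the path plus query string. The sketch below translates a pattern into a Python regular expression under exactly those two assumptions; the function names and sample URLs are hypothetical. It shows two easy misses: /*?sort= only fires when sort is the first query parameter (a companion /*&sort= rule covers later positions), and /*.pdf$ does not match a PDF URL that carries a query string.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumes Google/Bing semantics: '*' matches any run of characters,
    a trailing '$' anchors the end, and everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def rule_matches(pattern, path_and_query):
    # re.match anchors at the start of the string, mirroring prefix matching.
    return pattern_to_regex(pattern).match(path_and_query) is not None


checks = [
    ("/*?sort=", "/shoes?sort=price"),         # matches
    ("/*?sort=", "/shoes?page=2&sort=price"),  # no match: sort is not first
    ("/*&sort=", "/shoes?page=2&sort=price"),  # matches the later position
    ("/*.pdf$", "/guides/setup.pdf"),          # matches
    ("/*.pdf$", "/guides/setup.pdf?dl=1"),     # no match: $ needs the URL to end
]
for pattern, url in checks:
    print(f"{pattern:<10} {url:<28} {rule_matches(pattern, url)}")
```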

Block AI training, keep search access open

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

Sitemap: https://example.com/sitemap.xml
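Under RFC 9309, a crawler obeys only the group whose User-agent line names its product token (case-insensitively), merging groups that name the same token; it falls back to a * group if none names it, and applies no rules at all if neither exists. That is why this file leaves Googlebot and every other unlisted crawler unrestricted, and why a crawler with its own group ignores whatever a * group says. The sketch below follows that selection logic with simple longest-prefix matching. The helper names are hypothetical, and Google-Extended is evaluated here as a plain token even though Google applies it through its existing crawlers rather than a separate bot.

```python
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def parse_groups(text):
    """Return a list of (agent_tokens, rules) groups, tokens lowercased."""
    groups, agents, rules = [], [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


def rules_for(groups, token):
    """Merge every group naming the token; else use '*' groups; else none."""
    token = token.lower()
    chosen = [rules for agents, rules in groups if token in agents]
    if not chosen:
        chosen = [rules for agents, rules in groups if "*" in agents]
    return [rule for rules in chosen for rule in rules]


def is_allowed(rules, path):
    """Longest matching prefix wins, Allow wins ties, no match means allowed."""
    best_len, allowed = -1, True
    for field, value in rules:
        if value and path.startswith(value):
            is_allow = field == "allow"
            if len(value) > best_len or (len(value) == best_len and is_allow):
                best_len, allowed = len(value), is_allow
    return allowed


groups = parse_groups(ROBOTS_TXT)
for token in ["GPTBot", "ClaudeBot", "OAI-SearchBot", "Googlebot"]:
    print(token, is_allowed(rules_for(groups, token), "/blog/post"))
```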

Block ChatGPT Search specifically

User-agent: OAI-SearchBot
Disallow: /

Sitemap: https://example.com/sitemap.xml

Field notes

  • GPTBot and OAI-SearchBot are separate tokens: blocking one does not block the other. Google-Extended is a control token rather than a separate crawler; it governs whether Google may use your content for Gemini model training and grounding, and it does not disable Google Search crawling, AI Overviews, or AI Mode. Bingbot covers both Bing Search and Bing-powered AI surfaces, so one block affects both.
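  • robots.txt manages crawl access to low-value or sensitive paths; it does not deindex a public URL, and a blocked URL can still be indexed without its content if other pages link to it. Use noindex, redirects, or removal workflows for that, and keep the URL crawlable while relying on noindex, because a crawler that cannot fetch the page never sees the directive.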
  • Wildcard * and end-match $ rules are reliable for Google and Bing, not a universal baseline for every bot.
  • When publishing multiple sitemaps, add one Sitemap: line per file, each with an absolute URL. Sitemap lines are not tied to any User-agent group.
  • Test the actual bot path through your CDN, firewall, and origin after any change — the text file is not the whole policy.
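A crawler's real experience depends on every layer it passes through, so it can help to request the same URL with different User-Agent strings and compare status codes. The sketch below uses a hypothetical helper, probe, that sends one GET with Python's http.client and deliberately does not follow redirects, so a 301 or 403 issued at the CDN or WAF shows up directly. It only exercises rules keyed on the User-Agent header: many CDNs and WAFs verify major crawlers by IP range or reverse DNS, so a spoofed request from your own machine may be handled differently from the real bot, and the operator's own tools (such as the URL Inspection tool in Google Search Console) remain the stronger check.

```python
import http.client
from urllib.parse import urlsplit


def probe(url, user_agent, timeout=10.0):
    """Send one GET with the given User-Agent and return the HTTP status.

    Redirects are deliberately not followed, so a 301/302/403 issued by a
    CDN or WAF in front of the origin is reported as-is.
    """
    parts = urlsplit(url)
    conn_cls = (
        http.client.HTTPSConnection
        if parts.scheme == "https"
        else http.client.HTTPConnection
    )
    conn = conn_cls(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("GET", target, headers={"User-Agent": user_agent})
        response = conn.getresponse()
        response.read()  # drain the body before closing the connection
        return response.status
    finally:
        conn.close()
```

In practice you might call probe on both https://example.com/robots.txt and a representative page, once per User-Agent string that contains a token you care about (GPTBot, OAI-SearchBot, and so on), and compare the codes against what the file intends.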