robots.txt Patterns
Last reviewed:
Serve one robots.txt file at the site root. Adjust the relevant block, then verify that your CDN or WAF isn't enforcing a conflicting policy: robots.txt is crawler guidance, not access control, and infrastructure rules can override it in either direction.
Allow all crawlers
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml
Block common internal and low-value SEO paths
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /login/
Disallow: /search
Sitemap: https://example.com/sitemap.xml
Block a directory but allow required assets
User-agent: *
Disallow: /private-reports/
Allow: /private-reports/public-summary.pdf
Allow: /private-reports/assets/
Sitemap: https://example.com/sitemap.xml
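The Allow lines work inside the blocked directory because crawlers that follow RFC 9309 (Google among them) apply the most specific rule, meaning the longest matching path, and prefer Allow when two rules tie. The following is a minimal Python sketch of that precedence under simplifying assumptions: plain prefix matching only, no wildcards, and a single user-agent group. The names `is_allowed` and `RULES` are illustrative and not part of any library.

```python
def is_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, prefix) pairs from one user-agent
    group. The longest matching prefix wins; Allow wins a tie.
    """
    best_len = -1
    best_allow = True  # no matching rule means the path is crawlable
    for directive, prefix in rules:
        # An empty value (e.g. "Disallow:") matches nothing.
        if not prefix or not path.startswith(prefix):
            continue
        allow = directive.lower() == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and allow):
            best_len, best_allow = len(prefix), allow
    return best_allow


# The rules from the block above, in file order.
RULES = [
    ("Disallow", "/private-reports/"),
    ("Allow", "/private-reports/public-summary.pdf"),
    ("Allow", "/private-reports/assets/"),
]

if __name__ == "__main__":
    for path in (
        "/private-reports/q3.pdf",
        "/private-reports/public-summary.pdf",
        "/private-reports/assets/logo.png",
        "/blog/",
    ):
        print(path, "allowed" if is_allowed(path, RULES) else "blocked")
```

A parser that applies rules in file order (first match wins) would never reach the Allow lines in this block, so confirm how each crawler you care about resolves conflicts before relying on this pattern.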
Block parameter and file-type patterns for supported bots
User-agent: Googlebot
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*.pdf$
User-agent: Bingbot
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*.pdf$
Sitemap: https://example.com/sitemap.xml
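To sanity-check which URLs these patterns catch, the wildcard syntax can be translated to a regular expression: `*` matches any run of characters and a trailing `$` anchors the end of the URL. This is a rough Python sketch of that matching as documented by Google and Bing, not a full robots.txt parser; `matches` is a hypothetical helper and the sample URLs are made up.

```python
import re


def matches(pattern, path_and_query):
    """Return True if a robots.txt path pattern matches the URL path + query.

    '*' matches any sequence of characters. A trailing '$' requires the
    match to end exactly where the URL ends; otherwise the pattern only
    has to match from the start of the path (prefix match).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    regex = re.compile(body)
    if anchored:
        return regex.fullmatch(path_and_query) is not None
    return regex.match(path_and_query) is not None


if __name__ == "__main__":
    checks = [
        ("/*?sort=", "/shoes?sort=price"),
        ("/*?sort=", "/shoes?color=red&sort=price"),
        ("/*.pdf$", "/docs/report.pdf"),
        ("/*.pdf$", "/docs/report.pdf?download=1"),
    ]
    for pattern, url in checks:
        print(pattern, url, matches(pattern, url))
```

Note what the second and fourth checks show: `/*?sort=` only catches `sort` when it is the first query parameter, and `/*.pdf$` stops matching once a query string follows the file name. Add a pattern such as `/*&sort=` if later positions matter.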
Block AI training, keep search access open
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: OAI-SearchBot
Allow: /
Sitemap: https://example.com/sitemap.xml
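A quick offline check of this block can be sketched with Python's standard-library `urllib.robotparser`, which reports what each user-agent token is permitted to fetch. The test URL and the helper name `crawl_access` are assumptions for illustration. To my understanding this parser does not expand `*` or `$` path wildcards and applies rules in file order, so it suits simple whole-site rules like these rather than the pattern blocks earlier in this file.

```python
from urllib.robotparser import RobotFileParser

# The block above, verbatim.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: OAI-SearchBot
Allow: /
Sitemap: https://example.com/sitemap.xml
"""


def crawl_access(robots_txt, agents, url):
    """Map each user-agent token to True (may fetch) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


AGENTS = ["GPTBot", "Google-Extended", "ClaudeBot", "CCBot",
          "OAI-SearchBot", "Googlebot"]
ACCESS = crawl_access(ROBOTS_TXT, AGENTS, "https://example.com/articles/")

if __name__ == "__main__":
    for agent, allowed in ACCESS.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Googlebot has no group in this file and there is no `User-agent: *` fallback, so it comes back as allowed, which is the intent of keeping search access open. This only checks the text of the file; it says nothing about what a CDN or WAF does to the same bots.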
Block ChatGPT Search specifically
User-agent: OAI-SearchBot
Disallow: /
Sitemap: https://example.com/sitemap.xml
Field notes
- GPTBot and OAI-SearchBot are separate tokens; blocking one does not block the other.
- Google-Extended controls training use only; it does not disable Google Search crawling, AI Overviews, or AI Mode.
- Bingbot covers both Bing Search and Bing-powered AI surfaces, so one block affects both.
- robots.txt manages crawl access to low-value or sensitive paths; it does not deindex a public URL. Use noindex, redirects, or removal workflows for that.
- Wildcard * and end-match $ rules are reliable for Google and Bing, not a universal baseline for every bot.
- One Sitemap: line per sitemap file when publishing multiple sitemaps.
- Test the actual bot path through your CDN, firewall, and origin after any change; the text file is not the whole policy.