Edge Worker Bot Management
Last reviewed:
Edge rules execute before the request reaches origin, and they take effect whether or not the crawler has fetched or honored robots.txt. A 403 returned here is final and overrides any robots.txt allow directive, regardless of what the file says. Use edge rules deliberately, and only when robots.txt genuinely isn’t enough: bots that ignore robots.txt, rate-limiting aggressive crawlers before they degrade origin performance, serving different responses to bot user agents, or routing bots to a cached version.
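Rate-limiting at the edge is one of the cases above. As a minimal sketch, assuming a simple in-memory token bucket keyed by the matched bot token (the TokenBucketLimiter class and its capacity/refillPerSec parameters are hypothetical, not a Cloudflare API), the core accounting could look like this. A Worker's in-memory state is per-isolate, so these counts are not shared across locations; production limits would more likely rely on Cloudflare's rate-limiting rules or a shared store such as Durable Objects.

```javascript
// Hypothetical in-memory token bucket, keyed by an arbitrary string (e.g. the
// matched bot token). Each key may burst up to `capacity` requests, then is
// refilled at `refillPerSec` tokens per second.
// Caveat: in a Worker this Map is per-isolate, not shared network-wide.
class TokenBucketLimiter {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.buckets = new Map(); // key -> { tokens, last }
  }

  // Returns true if a request for `key` at time `nowMs` may proceed.
  allow(key, nowMs = Date.now()) {
    let bucket = this.buckets.get(key);
    if (!bucket) {
      bucket = { tokens: this.capacity, last: nowMs };
      this.buckets.set(key, bucket);
    }
    // Refill in proportion to elapsed time, never above capacity.
    const elapsedSec = Math.max(0, nowMs - bucket.last) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSec * this.refillPerSec);
    bucket.last = nowMs;
    if (bucket.tokens >= 1) {
      bucket.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Usage: burst of 5, then 1 request per second for each key.
// In a Worker, a false result would map to a 429 Too Many Requests response.
const limiter = new TokenBucketLimiter(5, 1);
const burst = [];
for (let i = 0; i < 6; i++) burst.push(limiter.allow("GPTBot", 0));
// burst -> [true, true, true, true, true, false]
```

The bucket allows short bursts while capping the sustained rate, which suits crawlers that fetch in waves.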
Cloudflare Worker
// Edge Worker Bot Management
// Deploy as a Cloudflare Worker (module syntax). Adapt for Fastly VCL or Akamai EdgeWorkers.

// CONFIGURE: UA substrings for training crawlers you want to block.
// Note: "Google-Extended" and "Applebot-Extended" are robots.txt product tokens only.
// Google and Apple never send them in the User-Agent header (crawling is done by
// Googlebot and Applebot), so UA matching cannot target them; control them in robots.txt.
const TRAINING_BOTS = [
  "GPTBot",
  "ClaudeBot",
  "CCBot",
  "FacebookBot",
  "anthropic-ai"
];

// CONFIGURE: UA substrings for AI search crawlers you want to allow
const SEARCH_BOTS = [
  "OAI-SearchBot",
  "PerplexityBot",
  "YouBot",
  "Amazonbot"
];

// CONFIGURE: "block" | "allow" | "redirect"
const TRAINING_ACTION = "block";
const SEARCH_ACTION = "allow";
const REDIRECT_URL = "https://example.com/bot-policy"; // used only when action = "redirect"

export default {
  async fetch(request) {
    const ua = request.headers.get("User-Agent") || "";
    const isTraining = TRAINING_BOTS.some(bot => ua.includes(bot));
    const isSearch = SEARCH_BOTS.some(bot => ua.includes(bot));

    if (isTraining) {
      if (TRAINING_ACTION === "block") return new Response("Forbidden", { status: 403 });
      if (TRAINING_ACTION === "redirect") return Response.redirect(REDIRECT_URL, 301);
    }

    if (isSearch) {
      if (SEARCH_ACTION === "block") return new Response("Forbidden", { status: 403 });
      if (SEARCH_ACTION === "redirect") return Response.redirect(REDIRECT_URL, 301);
    }

    // Default ("allow" or no match): pass through to origin
    return fetch(request);
  }
};
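The matching logic can be factored into a pure function so it can be unit-tested outside the Workers runtime. This sketch assumes case-insensitive matching (the Worker above uses case-sensitive `includes`) and uses a hypothetical `classifyUserAgent` helper with an illustrative subset of tokens; the sample UA strings in the usage lines are abbreviated, not verbatim crawler headers.

```javascript
// Illustrative subset of tokens; keep in sync with the Worker's lists.
const TRAINING_TOKENS = ["GPTBot", "ClaudeBot", "CCBot"];
const SEARCH_TOKENS = ["OAI-SearchBot", "PerplexityBot"];

// Case-insensitive substring match: lowercase both sides before comparing.
function matchesAny(ua, tokens) {
  const lower = ua.toLowerCase();
  return tokens.some(token => lower.includes(token.toLowerCase()));
}

// Hypothetical helper: classify a User-Agent value as "training", "search", or "other".
// GPTBot and OAI-SearchBot land in different buckets, so each gets its own decision.
function classifyUserAgent(ua) {
  if (!ua) return "other";
  if (matchesAny(ua, TRAINING_TOKENS)) return "training";
  if (matchesAny(ua, SEARCH_TOKENS)) return "search";
  return "other";
}

// Usage with abbreviated, illustrative UA strings:
console.log(classifyUserAgent("Mozilla/5.0 (compatible; GPTBot/1.2)"));        // training
console.log(classifyUserAgent("Mozilla/5.0 (compatible; OAI-SearchBot/1.0)")); // search
console.log(classifyUserAgent("Mozilla/5.0 (Windows NT 10.0)"));               // other
```

Inside the Worker, `fetch(request)` would call `classifyUserAgent(ua)` once and branch on the result, which keeps the training/search separation explicit.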
Fastly VCL snippet
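// Fastly VCL equivalent: add to vcl_recv (or paste just the if-block into a VCL snippet of type "recv")
sub vcl_recv {
  // Block training crawlers. Google-Extended and Applebot-Extended are omitted:
  // they are robots.txt tokens and never appear in the User-Agent header.
  if (req.http.User-Agent ~ "GPTBot|ClaudeBot|CCBot") {
    error 403 "Forbidden";
  }
}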
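Maintaining the same token list in the Worker and in VCL invites drift. One option, sketched here under the assumption that your VCL is generated or pasted from a build step, is to derive the VCL condition from the JavaScript list. `escapeRegex` and `buildVclCondition` are hypothetical helpers, and the `(?i)` prefix assumes Fastly's PCRE-style regex syntax, making the match case-insensitive (the snippet above is case-sensitive).

```javascript
// Hypothetical helper: escape regex metacharacters so each token matches literally.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build the `if` condition for vcl_recv from a token list.
// Tokens must not contain double quotes, which would end the VCL string literal.
function buildVclCondition(tokens) {
  const alternation = tokens.map(escapeRegex).join("|");
  return `req.http.User-Agent ~ "(?i)(${alternation})"`;
}

const trainingTokens = ["GPTBot", "ClaudeBot", "CCBot"];
console.log(buildVclCondition(trainingTokens));
// prints: req.http.User-Agent ~ "(?i)(GPTBot|ClaudeBot|CCBot)"
```

Escaping only matters if a token contains characters such as `.` or `+`; current crawler tokens generally do not, but the helper keeps a future addition from silently widening the match.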
Field notes
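- UA-only matching is spoofable: any client can send a GPTBot User-Agent. For high-trust decisions, combine UA matching with IP range verification; many major AI crawler operators publish IP range files (OpenAI does for GPTBot and OAI-SearchBot, for example) or document a reverse-DNS verification procedure.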
- Separate training and search bot lists: GPTBot (training) and OAI-SearchBot (ChatGPT search) are different tokens requiring independent decisions; conflating them is a common misconfiguration.
- Keep the UA string list current: outdated lists silently fail to catch new or renamed crawlers.
- Log 403s from this layer separately from origin 403s: requests blocked at the edge never reach origin, so they are invisible to server-side log analysis unless explicitly forwarded to a logging sink.
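- Akamai EdgeWorkers uses an event-handler model rather than a fetch handler: export an onClientRequest(request) function, read the UA with request.getHeader("User-Agent") (which returns an array of values, or undefined if the header is absent), and short-circuit with request.respondWith(403, {}, "Forbidden").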
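The IP verification note can be sketched as a CIDR check layered on top of UA matching. This is a minimal IPv4-only sketch: `PUBLISHED_RANGES`, `inCidr`, and `isVerifiedBot` are hypothetical, and the ranges are RFC 5737 documentation prefixes standing in for each operator's real published list. In a Cloudflare Worker the client IP is available from the `CF-Connecting-IP` request header.

```javascript
// Parse a dotted-quad IPv4 address into an unsigned integer; null if invalid.
function ipv4ToInt(ip) {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let n = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    n = n * 256 + octet; // arithmetic instead of bit shifts avoids signed 32-bit overflow
  }
  return n;
}

// True if `ip` falls inside the IPv4 CIDR block `cidr`, e.g. "192.0.2.0/24".
function inCidr(ip, cidr) {
  const [base, bitsStr] = cidr.split("/");
  const bits = Number(bitsStr);
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(base);
  if (ipInt === null || baseInt === null || !(bits >= 0 && bits <= 32)) return false;
  const blockSize = 2 ** (32 - bits); // number of addresses in the block
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}

// HYPOTHETICAL ranges (documentation prefixes); replace with the operators' published lists.
const PUBLISHED_RANGES = {
  "GPTBot": ["192.0.2.0/24"],
  "OAI-SearchBot": ["198.51.100.0/24"]
};

// A UA claim counts only if the client IP is inside that bot's published ranges.
function isVerifiedBot(ua, ip) {
  for (const [token, ranges] of Object.entries(PUBLISHED_RANGES)) {
    if (ua.includes(token)) return ranges.some(range => inCidr(ip, range));
  }
  return false; // no known crawler token claimed
}

console.log(isVerifiedBot("Mozilla/5.0 (compatible; GPTBot/1.2)", "192.0.2.10"));  // true
console.log(isVerifiedBot("Mozilla/5.0 (compatible; GPTBot/1.2)", "203.0.113.5")); // false
```

A request whose UA claims a crawler token but fails this check is best treated as unverified traffic rather than as the named crawler. IPv6 ranges would need a separate parser.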