Sources
Curated links move to the Sources block on the page that actually uses them — every playbook, template, briefing, and news issue carries one. This page is for the genuinely cross-cutting libraries: docs, crawler identity, and tools that don’t belong to any single page.
Official documentation
Search engines
- Google Search Central — the canonical Google Search doc tree.
- Google Search Essentials — the baseline technical, spam, and key-practice rules for eligibility.
- How Google Search Works — the crawl, index, and ranking pipeline at the system level.
- Google Search Central Blog — official announcements, deprecations, and clarifications.
- Google Search ranking systems guide — the canonical list of active ranking systems; don’t conflate “systems,” “spam policies,” and “core updates.”
- Google spam policies — cloaking, doorway pages, link spam, machine-generated content, scraped content.
- Google Search update history — confirmed dates for core, spam, and product-review updates.
- Google manual actions documentation — how to identify a manual action and file reconsideration.
- Google Search Quality Rater Guidelines (PDF) — the human-evaluation framework in which E-E-A-T and YMYL are defined.
- Google Search Status Dashboard — check before treating a site issue as a Google-side incident.
- Bing Webmaster Guidelines — Microsoft’s core search rules and quality expectations.
- Bing Webmaster Blog — Microsoft’s official announcement channel.
AI features and crawler documentation
- Google AI features and your website — AI Overviews, AI Mode, and the site controls that actually apply.
- About AI Overviews — Google’s help-center explanation of how AI Overviews work.
- Google: creating helpful, reliable, people-first content — the closest official statement of what Google rewards at the content level.
- Google: common crawlers, including Google-Extended — Googlebot variants and the AI-training opt-out token, explained.
- OpenAI bot documentation — GPTBot, OAI-SearchBot, OAI-AdsBot, and ChatGPT-User, and which pipeline each controls.
- Anthropic: does Anthropic crawl data from the web — ClaudeBot policy and opt-out.
- PerplexityBot documentation — user-agent, IP ranges, and robots.txt handling.
- About Applebot — Apple’s crawler documentation for Siri and Apple Search.
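The crawler tokens documented above are matched independently in robots.txt, so opting out of a training token does not block the matching search crawler. Below is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content and the example.com URL are illustrative assumptions, and Python's matcher is simpler than each vendor's own parser, so treat the result as directional and confirm against the vendor docs listed above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of two AI-training tokens, allow everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def allowed_tokens(robots_txt: str, tokens: list[str], url: str) -> dict[str, bool]:
    """Return, per user-agent token, whether the given URL may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in tokens}


if __name__ == "__main__":
    tokens = ["GPTBot", "OAI-SearchBot", "Google-Extended", "Googlebot"]
    for token, ok in allowed_tokens(ROBOTS_TXT, tokens, "https://example.com/guide").items():
        print(f"{token}: {'allowed' if ok else 'blocked'}")
```

In this sketch GPTBot and Google-Extended are blocked while OAI-SearchBot and Googlebot fall through to the wildcard group, which is the separation the vendor documentation describes.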
Standards and structured data
- Schema.org — the root vocabulary for structured data types and properties.
- Google: general structured data guidelines — technical and quality requirements for rich-result eligibility.
- RFC 9309 — Robots Exclusion Protocol — the formal standard for robots.txt behavior.
- Google: robots.txt introduction — Google’s crawler-specific interpretation.
- Google: robots meta tag and X-Robots-Tag — page-level and header-level crawl and snippet directives.
- Google: JavaScript SEO basics — crawl, render, index as distinct phases.
- Sitemaps Protocol — the reference spec for sitemap and sitemap-index XML.
- IndexNow documentation — implementation details for keys, endpoints, and submission mechanics.
- Google: redirects and Google Search — permanent vs. temporary redirect handling.
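As one illustration of the IndexNow mechanics mentioned above (keys, endpoints, submission), here is a hedged sketch that builds the JSON body for a bulk submission without sending it. The field names follow the IndexNow documentation as I understand it; the host, key, and URLs are hypothetical, and the key file is assumed to sit at the site root.

```python
import json
from urllib.parse import urlsplit

# Shared endpoint named in the IndexNow docs; participating engines also expose their own.
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_payload(host: str, key: str, urls: list[str]) -> dict:
    """Build the JSON body for a bulk IndexNow submission (nothing is sent)."""
    # Every submitted URL must belong to the host that owns the key.
    foreign = [u for u in urls if urlsplit(u).hostname != host]
    if foreign:
        raise ValueError(f"URLs outside {host}: {foreign}")
    return {
        "host": host,
        "key": key,
        # Assumption: the key file is hosted at the root as <key>.txt.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }


if __name__ == "__main__":
    payload = build_indexnow_payload(
        "www.example.com",
        "0123456789abcdef0123456789abcdef",  # hypothetical key
        ["https://www.example.com/new-page", "https://www.example.com/updated-page"],
    )
    print(f"POST {INDEXNOW_ENDPOINT}")
    print(json.dumps(payload, indent=2))
```

The body would be sent as a POST with a JSON content type; check the current IndexNow documentation for response codes and per-request URL limits before wiring this into a publish hook.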
Entity and knowledge graph
- Google Knowledge Graph API — authentication, query syntax, and response format for kgsearch.googleapis.com.
- Wikidata — structured knowledge base feeding Google’s Knowledge Graph and LLM grounding.
- Wikidata SPARQL Query Service — full SPARQL endpoint for structured entity queries.
- Wikidata notability policy — required before creating a new entity entry.
- Google: claim a Knowledge Panel — the correction pathway for verified owners.
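For the Knowledge Graph API entry above, a minimal sketch of the query syntax and response shape may help. It assumes the v1 entities:search method with query, limit, key, and types parameters, and a response whose itemListElement entries each wrap a result object. The query string and API key are placeholders, and no request is sent.

```python
from urllib.parse import urlencode

KG_SEARCH_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def build_kg_search_url(query: str, api_key: str, limit: int = 5,
                        types: tuple[str, ...] = ()) -> str:
    """Build an entity-search URL; the caller performs the GET request."""
    params = [("query", query), ("limit", limit), ("key", api_key)]
    params += [("types", t) for t in types]  # repeated parameter, one per type
    return f"{KG_SEARCH_ENDPOINT}?{urlencode(params)}"


def entity_names(response: dict) -> list[str]:
    """Extract entity names from a parsed JSON response body."""
    return [
        item.get("result", {}).get("name", "")
        for item in response.get("itemListElement", [])
    ]


if __name__ == "__main__":
    # "Example Corp" and YOUR_API_KEY are placeholders, not real values.
    print(build_kg_search_url("Example Corp", "YOUR_API_KEY", limit=3, types=("Organization",)))
```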
Commerce
- Google Merchant Center product feed spec — the core spec for Google Shopping and merchant data ingestion.
- Schema.org Product — the on-page product vocabulary a feed should agree with.
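Since the on-page Product markup should agree with the feed, a small consistency check can catch drift. The sketch below compares one feed row with one JSON-LD Product object. It assumes the feed price is formatted as "19.99 USD", that availability uses values such as in_stock, and that the markup carries a single Offer object rather than a list; the sample field values used with it are hypothetical.

```python
from decimal import Decimal, InvalidOperation

# Assumed mapping between feed availability values and Schema.org URLs.
AVAILABILITY_MAP = {
    "in_stock": "https://schema.org/InStock",
    "out_of_stock": "https://schema.org/OutOfStock",
    "preorder": "https://schema.org/PreOrder",
}


def _to_decimal(value):
    """Parse a price-like value, returning None when it is missing or malformed."""
    try:
        return Decimal(str(value))
    except InvalidOperation:
        return None


def feed_mismatches(feed_row: dict, json_ld: dict) -> list[str]:
    """List the fields where the feed row and the on-page Offer disagree."""
    offer = json_ld.get("offers", {})
    amount, _, currency = feed_row.get("price", "").partition(" ")
    problems = []
    if _to_decimal(amount) is None or _to_decimal(amount) != _to_decimal(offer.get("price")):
        problems.append("price")
    if currency != offer.get("priceCurrency"):
        problems.append("currency")
    if AVAILABILITY_MAP.get(feed_row.get("availability")) != offer.get("availability"):
        problems.append("availability")
    return problems
```

An empty list means the three compared fields agree; anything else names the field to reconcile before the next feed upload.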
Crawler user-agents & verified IP ranges
- OAI-SearchBot (ChatGPT Search), GPTBot (training), OAI-AdsBot (ad-page verification), ChatGPT-User (user-triggered) — four independent OpenAI tokens; see OpenAI bot documentation for current IP ranges.
- Googlebot (Search indexing, drives AI Overviews/AI Mode) vs. Google-Extended (Gemini training opt-out only, no Search effect) — see Google’s common crawlers.
- Googlebot IP ranges (JSON) — published, updated by Google.
- Bingbot (Bing Search and Copilot, one token for both) — see Bing: which crawlers does Bing use; no comprehensive static IP file, verify via reverse DNS to *.search.msn.com.
- Verify Bingbot — Microsoft’s authenticity-check tool; don’t trust the user-agent string alone.
- PerplexityBot — see PerplexityBot documentation for published IP ranges.
- ClaudeBot — Anthropic publishes crawler policy but not a stable standalone IP file; combine UA validation with log review.
- Applebot — see About Applebot; Apple publishes an IP file.
- ASN reference: Google is AS15169, Microsoft is AS8075. ASN matching is useful for triage, not proof of bot identity — pair it with UA and, where available, reverse-DNS or published-IP verification. IP ranges move; a WAF rule with no review cadence becomes a silent outage.
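The two verification paths described above, published IP ranges and reverse DNS, can be sketched as follows. This is an illustration under assumptions, not a hardened implementation: the range file is assumed to be shaped like Google's JSON (a prefixes list with ipv4Prefix or ipv6Prefix keys), the sample addresses in the usage come from reserved documentation ranges rather than any real crawler, and the resolver functions are injectable so the logic can be exercised without network access.

```python
import ipaddress
import socket


def ip_in_published_ranges(ip: str, published: dict) -> bool:
    """Check an IP against a range file with a 'prefixes' list of CIDR entries."""
    addr = ipaddress.ip_address(ip)
    for entry in published.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        # Membership is False when the address and network versions differ.
        if cidr and addr in ipaddress.ip_network(cidr):
            return True
    return False


def hostname_has_suffix(hostname: str, suffixes: tuple[str, ...]) -> bool:
    """True when the hostname ends with one of the trusted suffixes."""
    host = hostname.rstrip(".").lower()
    return any(host.endswith(suffix) for suffix in suffixes)


def forward_confirmed_rdns(ip: str, suffixes: tuple[str, ...],
                           reverse=socket.gethostbyaddr,
                           forward=socket.gethostbyname_ex) -> bool:
    """Reverse-resolve the IP, check the suffix, then confirm the forward lookup.

    The default forward resolver handles IPv4 only; pass another for IPv6.
    """
    try:
        hostname = reverse(ip)[0]
        if not hostname_has_suffix(hostname, suffixes):
            return False
        return ip in forward(hostname)[2]
    except OSError:  # covers socket.herror and socket.gaierror
        return False


if __name__ == "__main__":
    # Documentation-only ranges (RFC 5737 / RFC 3849), not real crawler IPs.
    sample = {"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv6Prefix": "2001:db8::/32"}]}
    print(ip_in_published_ranges("192.0.2.10", sample))
    print(hostname_has_suffix("search.msn.com.evil.example", (".search.msn.com",)))
```

The suffix check requires the leading dot so that a hostname merely containing the trusted domain does not pass, and the forward lookup guards against spoofed PTR records. Cached range files need the same review cadence as the WAF rules they feed.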
Tool landscape
First-party reporting
- Google Search Console — Google reflects AI-feature activity inside standard web reporting, not a standalone AI dashboard.
- Google Search Console: Performance report documentation — what impressions, clicks, and position include and exclude.
- Google Analytics — for judging whether a visibility change turned into actual visits or business behavior.
- Bing Webmaster Tools — currently the strongest first-party AI citation reporting surface (AI Performance).
AI visibility monitoring (third-party, directional — not ground truth)
- Profound — repeated brand and citation monitoring across AI answer environments.
- seoClarity — AI visibility inside a broader enterprise SEO workflow.
- Ahrefs Brand Radar — brand mention and visibility-pattern tracking.
- Semrush — AI visibility alongside conventional competitive and keyword workflows.
- OtterlyAI — lighter prompt tracking and recurring answer checks.
- Peec AI — narrower monitoring around AI answer visibility and citation movement.
- Scrunch — directional monitoring for repeated prompt and answer checks.
Crawling, validation, and monitoring
- Screaming Frog SEO Spider — the default crawler for hands-on technical audits and extraction diffing.
- Screaming Frog Log File Analyser — inspect crawl behavior from logs.
- Botify — enterprise crawl and log-analysis tooling for sites past desktop-crawler scale.
- Google Rich Results Test — validator for rich-result eligible markup.
- Schema Markup Validator — broader Schema.org syntax and structure validation, independent of Google’s parser.
- Google PageSpeed Insights — Core Web Vitals field and lab comparison for a single URL.
- Chrome UX Report (CrUX) — public field performance data at origin and URL-pattern level; field data decides Core Web Vitals pass/fail, lab tools are for debugging.
- WebPageTest — request-level detail, filmstrips, repeatable throttled tests.
- Lighthouse — repeatable lab audits; not a substitute for field data.
- web-vitals JavaScript library — Google’s own real-user-monitoring measurement library.
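Because field data decides Core Web Vitals pass/fail, it can help to rate 75th-percentile field values directly. The sketch below assumes the commonly published thresholds (LCP 2.5 s / 4 s, INP 200 ms / 500 ms, CLS 0.1 / 0.25) evaluated at p75; the metric key names are hypothetical, and the thresholds should be confirmed against Google's current documentation.

```python
# (good upper bound, needs-improvement upper bound) per metric; assumed thresholds.
THRESHOLDS = {
    "lcp_ms": (2500, 4000),
    "inp_ms": (200, 500),
    "cls": (0.1, 0.25),
}


def rate(metric: str, p75: float) -> str:
    """Rate a 75th-percentile field value as good, needs improvement, or poor."""
    good_max, needs_improvement_max = THRESHOLDS[metric]
    if p75 <= good_max:
        return "good"
    if p75 <= needs_improvement_max:
        return "needs improvement"
    return "poor"


def passes_core_web_vitals(p75_values: dict[str, float]) -> bool:
    """Pass only when every supplied metric rates 'good' at p75."""
    return all(rate(metric, value) == "good" for metric, value in p75_values.items())


if __name__ == "__main__":
    field = {"lcp_ms": 2300, "inp_ms": 350, "cls": 0.05}  # hypothetical p75 values
    for metric, value in field.items():
        print(metric, rate(metric, value))
    print("pass:", passes_core_web_vitals(field))
```

Feed this with field values such as CrUX or web-vitals output, not Lighthouse lab numbers, which are for debugging.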
Research and competitive tools
- Ahrefs — backlinks, content-gap work, competitive research.
- SISTRIX — particularly useful in some international markets.
- Similarweb — market share, traffic-shape, and category movement rather than page-level diagnosis.
- Google Trends — directional demand shifts, not absolute keyword volume.
Answer engines and AI search surfaces worth tracking
- ChatGPT Search, Perplexity, Microsoft Copilot, Google AI Mode — the primary surfaces most citation and extraction work targets.
- Brave Search, DuckDuckGo, You.com — worth checking when independent-index or privacy-oriented behavior is part of the audience mix, not a default priority.
- International: Baidu (China), Yandex (Russia and neighboring markets), Naver (South Korea) — relevant only when geography or language makes them real traffic surfaces.
Industry publications (context and synthesis — verify against primary sources above)
- Search Engine Land, Search Engine Roundtable, Search Engine Journal — fast reporting; a good first stop, not the final word.
- Ahrefs Blog — practical studies and hands-on experimentation.
Rule of readmission: a link removed from this page returns only with evidence — a request, a citation, or usage data — never for catalog completeness.