Technical SEO
Search Foundations and Governance
Core Manuals
- Google Search Essentials: Technical Requirements Overview
- Indexability Requirements: HTTP Status, Robots Rules, and Content Accessibility
- Google Search Spam Policies and Quality Guidelines
- How Google Search Works: Crawling, Rendering, Indexing, and Serving
- Google Search Console: Setup, Verification, and Baseline Monitoring
- Google Search Console URL Inspection Tool: Debugging Crawl and Indexing Issues
- Bing Webmaster Guidelines and Indexing Requirements
- Bing Webmaster Tools: Setup, Verification, and Baseline Monitoring
- Bing Webmaster Tools URL Inspection and Submission Tool
Crawling, Discovery, and URL Control
XML Sitemaps and Discovery
- XML Sitemap Creation and Submission
- Sitemap Index Files for Large Sites
- Image Sitemaps for Content Discovery
- News Sitemaps for Google News Eligibility
Robots.txt and Crawl Control
- robots.txt directives: User-agent, Disallow, and Allow are standardized in RFC 9309; Sitemap and Crawl-delay are non-standard extensions — Crawl-delay is honored by Bing but ignored by Google
- Meta robots directives: noindex, nofollow, nosnippet, noarchive, max-snippet:[n], max-image-preview:[setting], max-video-preview:[n] — placed in <meta name="robots"> or per-bot (googlebot, bingbot)
- X-Robots-Tag: HTTP header equivalent of meta robots — use for PDFs, images, and other non-HTML resources
- Crawl budget: Prioritize unique, valuable URLs; reduce crawl waste from parameterized duplicates, infinite scroll traps, and session-ID URLs
- Faceted navigation: Block or noindex low-value filter combinations; canonical parameterized pages to the primary category URL; keep core category paths crawlable
- Internal link crawlability: Use real <a href> links in navigation — JavaScript-only links, onclick handlers, and CSS-rendered menus may not be followed by crawlers
- data-snippet (Bing-specific): HTML attribute applied to any element to designate exactly which text Bing may display or cite in Copilot answers — more precise than NOSNIPPET, which blocks snippets entirely
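The crawl-control directives above can be illustrated with a minimal sketch — paths and the sitemap URL are hypothetical placeholders:

```text
# robots.txt — illustrative directives (RFC 9309 syntax)
User-agent: *
Disallow: /search          # block internal search results
Disallow: /*?sessionid=    # block session-ID URLs (crawl waste)
Allow: /search/help        # a more specific Allow overrides the broader Disallow
Sitemap: https://www.example.com/sitemap.xml

# HTTP header equivalent of meta robots, e.g. for a PDF:
# X-Robots-Tag: noindex, nosnippet
```

Note that robots.txt controls crawling, not indexing — a disallowed URL can still be indexed from external links; use noindex (meta robots or X-Robots-Tag) to keep a URL out of the index.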
IndexNow Protocol
- IndexNow: Real-Time URL Submission to Search Engines
- Submit changed URLs immediately to Bing, Yandex, and other participating search engines
- Faster discovery than waiting for next crawl
- Use accurate lastmod values in XML sitemaps — sitemaps should reflect the last meaningful content change, not the date the sitemap was regenerated
- Support HTTP freshness signals (ETag, Last-Modified) where available; these allow crawlers to detect changes efficiently without full re-fetching
- See also: AI SEO Crawlability for AI-specific submission workflows
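A batch IndexNow submission is a single JSON POST. The sketch below builds the payload and submits it; the host, key, and URL are placeholders — the key must match the key file hosted on your own site:

```python
import json
from urllib import request

def build_indexnow_payload(host, key, urls, key_location=None):
    """Build the JSON body for an IndexNow batch submission."""
    payload = {"host": host, "key": key, "urlList": list(urls)}
    if key_location:
        payload["keyLocation"] = key_location
    return payload

def submit(payload, endpoint="https://api.indexnow.org/indexnow"):
    """POST the payload; a 200/202 response means the batch was accepted."""
    req = request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
    )
    with request.urlopen(req) as resp:
        return resp.status

# Placeholder host, key, and URL — substitute your own values.
payload = build_indexnow_payload(
    "www.example.com",
    "your-indexnow-key",
    ["https://www.example.com/updated-page"],
)
```

Submitting to the shared api.indexnow.org endpoint propagates the URLs to all participating engines, so one POST covers Bing, Yandex, and others.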
Canonicalization and URL Consolidation
- Self-referencing canonical: Every indexable page should include <link rel="canonical" href="..."> pointing to itself
- One canonical per page: Multiple conflicting canonicals cause engines to ignore all of them
- HTTPS preferred: Canonical URL should always be the HTTPS version; HTTP should 301 to HTTPS
- Consistent URL format: Match trailing slashes, lowercase, and www/non-www across canonical, internal links, and sitemap
- Don't canonical to redirects: The canonical target must return 200, not a redirect
- Cross-domain canonical: Supported — useful for syndicated content, but the non-canonical URL may still be indexed
- HTTP header canonical: Use Link: <url>; rel="canonical" for PDFs, images, and non-HTML resources
- Stable, consistent canonical URLs support long-term citation continuity in Bing/Copilot and other answer engines — URL instability breaks grounding references even when content is unchanged
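The "consistent URL format" rules can be enforced with a small normalizer. This is a sketch — the specific policy (HTTPS, lowercase host, no trailing slash on non-root paths) is a site-level choice, not a universal standard:

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_canonical(url):
    """Normalize a URL to one consistent canonical form:
    force HTTPS, lowercase the host, drop the port and fragment,
    and strip trailing slashes on non-root paths."""
    parts = urlsplit(url)
    host = parts.hostname.lower() if parts.hostname else ""
    path = parts.path or "/"
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/")
    return urlunsplit(("https", host, path, parts.query, ""))
```

Running every URL emitted in canonicals, internal links, and sitemaps through one function like this prevents the format drift that splits indexing signals across duplicates.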
HTTP Redirects and Status Codes
- HTTP Redirects for SEO: 301, 302, 307, 308, Meta Refresh, and JavaScript Redirects
- Site Moves and Domain Migrations
- HTTP Status Codes for SEO
- Soft 404 Detection and Prevention
- Removed Content Handling: 404, 410, and noindex
Indexing and Snippet Controls
- noindex placement: Add <meta name="robots" content="noindex"> in HTML <head>, or send X-Robots-Tag: noindex as an HTTP header for non-HTML resources
- noindex vs. canonical: Use noindex to remove a page from the index entirely; use rel="canonical" to consolidate signals toward a preferred duplicate — do not combine both on the same page
- Pagination: Ensure all paginated pages are linked via crawlable <a href> links; rel="next/prev" is deprecated but still treated as a hint by some engines; avoid noindexing deep pages that contain unique products or content
- Snippet-control directives (NOSNIPPET, data-nosnippet, NOCACHE, NOARCHIVE) affect AI answer preview richness and citation depth, especially in Bing/Copilot — review these controls specifically for grounding eligibility, not only for standard snippet suppression
- Google Query Fan-Out: AI Overviews and AI Mode may issue multiple related sub-queries across subtopics; well-interlinked subtopic pages are more discoverable through fan-out than through single-query indexing alone
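The snippet controls above look like this in markup — a hedged sketch with placeholder text:

```html
<!-- Exclude boilerplate from snippets while leaving the page indexable -->
<p data-nosnippet>Legal boilerplate that should not appear in snippets.</p>

<!-- Bing-specific: designate the exact text Bing may quote in Copilot answers -->
<p data-snippet>Concise, citable summary of the page's main answer.</p>

<!-- Page-level snippet limits via meta robots -->
<meta name="robots" content="max-snippet:160, max-image-preview:large">
```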
Implementation and Migration Guides
- New Site Launch SEO Playbook: Pre-Launch Checklist and Go-Live Process
- Site Migration SEO Playbook: Redirects, Canonicals, Crawl, and Monitoring
Crawling and Site Auditing Tools
Rendering, JavaScript, Mobile, and Performance
JavaScript SEO
- Rendering pipeline: Google crawls HTML first, then queues pages for rendering (can take seconds to days) — content requiring JS to appear is not indexed until rendered
- SSR / SSG preferred: Server-side rendering or static-site generation ensures content is in the initial HTML response; client-side rendering delays discovery
- Dynamic rendering: Serve pre-rendered HTML to bots if SSR is not feasible — treat as a temporary workaround, not a long-term solution
- Critical content check: Primary text, links, canonicals, meta robots, and structured data must all appear in the rendered HTML — test with URL Inspection → "View Tested Page"
- Lazy-loading risks: Content behind scroll-triggered or click-triggered lazy loading may never be rendered by Googlebot — use native loading="lazy" on images only, not on primary text content
- JS-generated SEO signals: Canonicals, hreflang, and structured data injected by JavaScript are supported but riskier — render delays and errors can cause them to be missed
- SPA routing: Use History API with real URLs (not hash fragments); each route must return a unique, fully-rendered page with correct status codes
- Testing: Compare raw HTML source vs. rendered HTML in URL Inspection; use Puppeteer, Rendertron, or DevTools to debug render issues locally
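The raw-vs-rendered comparison can be partially automated. This sketch takes the raw HTML response and the rendered DOM (e.g. captured via Puppeteer) as strings and reports which SEO signals only exist after JavaScript runs — the regex patterns are deliberately simplified illustrations:

```python
import re

# SEO-critical signals to look for in both the raw HTML response
# and the browser-rendered DOM (simplified patterns for illustration).
SIGNALS = {
    "canonical": re.compile(r'rel=["\']canonical["\']', re.I),
    "meta_robots": re.compile(r'name=["\']robots["\']', re.I),
    "json_ld": re.compile(r'application/ld\+json', re.I),
}

def js_dependent_signals(raw_html, rendered_html):
    """Return signals present only after JavaScript rendering —
    these risk being missed if rendering fails or is delayed."""
    return sorted(
        name for name, pat in SIGNALS.items()
        if pat.search(rendered_html) and not pat.search(raw_html)
    )
```

Any signal this flags is a candidate for moving into the server-rendered HTML, per the SSR/SSG guidance above.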
HTML5 and Accessibility
- Semantic landmarks: Use <main> (one per page — primary content), <article> (self-contained piece), <section> (thematic grouping with heading), <nav> (navigation blocks), <header>/<footer> (page or section level), <aside> (tangentially related content)
- Why it matters for SEO: Semantic elements help search engines identify primary content vs. boilerplate — <main> signals the core content area, <nav> identifies navigation links, <aside> marks secondary content
- ARIA landmarks: Use role="banner", role="navigation", role="main", role="contentinfo" when semantic HTML elements are not available — avoid redundant ARIA on elements that already have implicit roles
- Heading hierarchy: One <h1> per page, logical nesting (h2 → h3 → h4), no skipped levels — headings are strong relevance signals
- Cookie banner / CMP risks: Consent banners rendered via JavaScript can inject content above <main> that shifts layout (CLS), block rendering of primary content, or add noindex-like behavior if misconfigured — test rendered HTML with and without consent
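The heading-hierarchy rules are mechanical enough to check in an audit script — a minimal sketch using only the standard library:

```python
from html.parser import HTMLParser

class HeadingAudit(HTMLParser):
    """Collect heading levels (h1–h6) in document order."""
    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
            self.levels.append(int(tag[1]))

def heading_issues(html):
    """Flag multiple (or missing) h1s and skipped levels (e.g. h2 -> h4)."""
    parser = HeadingAudit()
    parser.feed(html)
    issues = []
    if parser.levels.count(1) != 1:
        issues.append("expected exactly one <h1>")
    for prev, cur in zip(parser.levels, parser.levels[1:]):
        if cur > prev + 1:
            issues.append(f"skipped level: h{prev} -> h{cur}")
    return issues
```

Checks like this fit naturally into the CI/CD regression tests described later in this document.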
Mobile-First Indexing
- Default crawl agent: Google uses the mobile Googlebot (smartphone user agent) for all crawling and indexing — the desktop version of your site is not what gets indexed
- Content parity required: All content, links, structured data, meta tags, and alt text must be present on the mobile version — desktop-only content will not be indexed
- Responsive design preferred: Single URL, same HTML, CSS adapts layout — simplest to maintain and least error-prone
- Dynamic serving: Same URL but different HTML per user agent — must use Vary: User-Agent header; higher risk of parity drift
- Separate mobile URLs (m.*): Supported but complex — requires bidirectional annotations (rel="alternate" / rel="canonical"), easy to misconfigure
- Verification: Use URL Inspection in GSC to confirm the mobile render matches expectations — check that no content, links, or structured data are missing from the mobile version
Core Web Vitals and Performance
- LCP (Largest Contentful Paint) — target < 2.5 s: Optimize hero image (compress, use modern format, preload), reduce server response time (TTFB), eliminate render-blocking CSS/JS, use CDN for static assets
- INP (Interaction to Next Paint) — target < 200 ms: Break long JavaScript tasks into smaller chunks, defer non-critical scripts, reduce main-thread work, use requestIdleCallback for low-priority work, minimize DOM size
- CLS (Cumulative Layout Shift) — target < 0.1: Set explicit width / height on images and embeds, reserve space for ads and dynamic content, avoid injecting content above the fold after load, use font-display: swap with size-adjusted fallback fonts
- Field vs. lab data: Field data (CrUX / real users) determines ranking impact — lab data (Lighthouse / WebPageTest) is for debugging only; a page can pass lab tests but fail in the field due to real-world device and network conditions
- Page experience signals: CWV + HTTPS + no intrusive interstitials + mobile-friendly = page experience; these are tie-breaker signals, not dominant ranking factors
- Crawl efficiency: Server response time affects crawl rate — slow TTFB reduces how many pages Googlebot can crawl per session; 5xx errors and timeouts waste crawl budget
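Several of the LCP and CLS tactics above reduce to a few lines of markup — a sketch with hypothetical asset paths:

```html
<!-- LCP: preload the hero image and declare dimensions (also prevents CLS) -->
<link rel="preload" as="image" href="/hero.avif" fetchpriority="high">
<img src="/hero.avif" width="1200" height="630" alt="Product hero">

<!-- Lazy-load below-the-fold images only, never the LCP image -->
<img src="/gallery-1.avif" width="600" height="400" loading="lazy" alt="Gallery">

<!-- CLS: avoid invisible or shifting text during web-font load -->
<style>
  @font-face {
    font-family: "BodyFont";
    src: url("/fonts/body.woff2") format("woff2");
    font-display: swap;
  }
</style>
```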
Performance and Rendering Tools
International and Multi-Regional SEO
Hreflang and International SEO
- Hreflang syntax: <link rel="alternate" hreflang="en-US" href="…" /> — language code (ISO 639-1) optionally followed by region code (ISO 3166-1 Alpha 2)
- x-default: Use hreflang="x-default" for the fallback page (usually language selector or global homepage) — required for complete hreflang sets
- Placement options: HTML <head> (most common), HTTP Link header (for non-HTML like PDFs), or XML sitemap (best for large-scale sites) — pick one method per URL set
- Return links required: Every hreflang annotation must be reciprocal — if page A references page B, page B must reference page A; missing return links cause the annotation to be ignored
- URL structure options: ccTLDs (strongest geo signal, highest cost), subdomains (gTLD + subdomain, moderate signal), subdirectories (simplest, relies on GSC geotargeting) — all are valid, choose based on infrastructure
- Canonical + hreflang alignment: The canonical URL of each page must be the URL used in hreflang annotations — if a page canonicalizes to a different URL, its hreflang is ignored
- Common mistakes: Mixing language-only and language+region codes inconsistently, using country codes where language codes are needed (e.g., hreflang="uk" is wrong — use hreflang="en-GB"), forgetting self-referencing hreflang
- Avoid IP-based redirects: Do not redirect crawlers based on IP geolocation — Googlebot crawls primarily from the US; geo-redirects can prevent non-local versions from being indexed
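The return-link requirement is the most common hreflang failure and is easy to audit. A sketch: given each page's hreflang annotations (as a mapping from page URL to its hreflang→URL set), find annotations with no reciprocal link back:

```python
def missing_return_links(annotations):
    """annotations: {page_url: {hreflang_code: target_url}}.
    Returns (source, target) pairs where the target page does not
    annotate back to the source — engines ignore such annotations."""
    missing = []
    for page, alternates in annotations.items():
        for target in alternates.values():
            if target == page:
                continue  # self-reference needs no return link
            back = annotations.get(target, {})
            if page not in back.values():
                missing.append((page, target))
    return missing
```

In practice the annotations dict would be built from a crawl of the live <head> tags or from the hreflang sitemap, then this check runs over the full set.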
Monitoring, QA, Log Analysis, and SEO Operations
Monitoring and QA Workflows
- GSC Performance report: Monitor clicks, impressions, CTR, and position by query, page, country, device — check weekly for sudden drops; compare date ranges to detect regressions
- GSC Page Indexing report: Review "Not indexed" reasons (crawled but not indexed, discovered but not crawled, noindex, redirect, 404) — prioritize fixing pages that should be indexed
- GSC Rich Results report: Monitor valid/invalid structured data items — errors here mean lost rich result eligibility; check after any schema deployment
- GSC Crawl Stats report: Review total crawl requests, response codes, file types, crawl purpose (discovery vs. refresh) — sudden crawl drops may signal server issues or robots.txt changes
- GSC Sitemaps report: Confirm submitted sitemaps are processed, track discovered vs. indexed URL counts, watch for errors in sitemap parsing
- Bing WMT equivalents: Search Performance (clicks, impressions, CTR), Crawl Information (crawl activity, errors), SEO Reports (site scan for common issues), URL Inspection
- Video Indexing report: Check coverage for pages with VideoObject schema — specific issues include "video not detected" and "thumbnail missing"
- Structured data QA: After deploying schema changes, validate with Rich Results Test, then monitor GSC Rich Results report for 2–4 weeks for new errors
- Recrawl request workflow: Use URL Inspection → "Request Indexing" in GSC for urgent pages; use IndexNow for Bing; submit updated sitemaps for bulk changes
Regression Testing and QA
- Post-deploy checks: After every release, verify: canonical tags intact, meta robots unchanged, structured data still valid, redirects still working, rendered HTML matches expectations
- Staging environment controls: Block staging from crawlers using noindex + robots.txt Disallow + password protection (belt-and-suspenders) — a single method alone is not reliable enough
- CMS template governance: Embed SEO fields (title, meta description, canonical, schema, robots) directly into page templates — do not rely on manual entry per page; template changes should trigger SEO review
- Automated regression tests: Add SEO assertions to CI/CD: check for <title>, canonical, noindex absence on production pages, valid JSON-LD, correct HTTP status codes
- Post-redesign validation: Compare pre/post crawl data (Screaming Frog diff) for canonical changes, missing pages, orphaned URLs, broken internal links, lost structured data
- Component library QA: If structured data is embedded in reusable components, test each component variant in isolation — a bug in a shared component affects every page using it
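The CI/CD assertions above can be sketched as one check function. This uses simplified regexes for illustration — a production check would parse the HTML properly, and it assumes the canonical link writes rel before href:

```python
import json
import re

def seo_regressions(html, expected_canonical):
    """Minimal post-deploy checks for one production page: title present,
    canonical correct, no stray noindex, JSON-LD blocks parse."""
    problems = []
    if not re.search(r"<title>[^<]+</title>", html, re.I):
        problems.append("missing <title>")
    canonical = re.search(
        r'<link[^>]+rel=["\']canonical["\'][^>]*href=["\']([^"\']+)', html, re.I)
    if not canonical or canonical.group(1) != expected_canonical:
        problems.append("canonical missing or wrong")
    if re.search(r'<meta[^>]+name=["\']robots["\'][^>]*noindex', html, re.I):
        problems.append("noindex present on production page")
    for block in re.findall(r'<script[^>]+ld\+json[^>]*>(.*?)</script>',
                            html, re.S | re.I):
        try:
            json.loads(block)
        except ValueError:
            problems.append("invalid JSON-LD")
    return problems
```

Wired into CI, a non-empty return list fails the build before the regression reaches crawlers.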
Log File Analysis
- What to extract: Bot user agent, requested URL, response code, response time, bytes sent, timestamp — filter to known search bot user agents for SEO analysis
- Bot segmentation: Separate verified Googlebot, Bingbot, OAI-SearchBot, Applebot, PerplexityBot from unknown bots and scrapers — use reverse DNS verification (not just user-agent strings) for accurate segmentation
- Crawl frequency analysis: Group crawl hits by URL pattern (template type, directory, depth) — identify which sections get crawled most/least; compare against sitemap priority
- Crawl waste identification: Flag URLs that consume crawl budget without value: parameterized duplicates, paginated archives, soft 404s, trapped faceted navigation, calendar/infinite scroll pages
- Correlate with index coverage: Cross-reference log data with sitemap URLs and GSC Page Indexing report — pages in sitemap but never crawled indicate discovery problems; frequently crawled but not indexed suggests quality issues
- Migration and launch monitoring: During site migrations, monitor logs in real time for: old URL crawl drop-off, new URL crawl pickup, redirect chain hits, unexpected 404/410 spikes, crawl rate changes
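A minimal version of the extraction and grouping steps above, assuming Combined Log Format access logs (the regex is a sketch and may need adjusting for your server's format):

```python
import re
from collections import Counter

# Combined Log Format, e.g.:
# 66.249.66.1 - - [10/Jan/2025:12:00:00 +0000] "GET /shop/widgets HTTP/1.1" 200 5120 "-" "Googlebot/2.1"
LINE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"')

BOT_TOKENS = ("Googlebot", "bingbot", "OAI-SearchBot", "Applebot", "PerplexityBot")

def bot_hits_by_section(log_lines):
    """Count search-bot requests per top-level directory.
    User-agent matching alone is not proof of a real bot —
    pair this with the DNS verification described below."""
    counts = Counter()
    for line in log_lines:
        m = LINE.search(line)
        if not m or not any(tok in m.group("ua") for tok in BOT_TOKENS):
            continue
        section = "/" + m.group("url").lstrip("/").split("/", 1)[0]
        counts[section] += 1
    return counts
```

Grouping by template or directory like this is what surfaces crawl-waste patterns (e.g. a faceted-navigation path absorbing most of the crawl budget).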
Bot Validation and Management
- Googlebot verification: Reverse DNS lookup on the IP must resolve to *.googlebot.com or *.google.com, then forward DNS must resolve back to the original IP
- Bingbot verification: Reverse DNS must resolve to *.search.msn.com; Bing also publishes IP ranges in Bing Webmaster Tools documentation
- AI search bot IPs: OAI-SearchBot publishes ranges in searchbot.json; Applebot publishes ranges in applebot.json — allowlist these in WAF/CDN rules
- Spoofed user-agent detection: Any bot claiming to be Googlebot or Bingbot but failing reverse+forward DNS is spoofed — block or deprioritize
- WAF/CDN allowlisting: Verify that rate limiting, bot mitigation, and CAPTCHA rules do not accidentally block legitimate search and AI crawlers
- Log segmentation: Segment server logs by verified bot (Googlebot, Bingbot, OAI-SearchBot, Applebot, PerplexityBot) vs. unknown bots to distinguish valuable crawl activity from scraping and abuse
- Bing explicitly ties crawl waste (duplicate low-value URLs, thin pages, excessive pagination) to reduced indexing depth and lower grounding eligibility — crawl budget is not only a traditional SEO concern but also an AI visibility concern
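The reverse+forward DNS check can be sketched as follows; the domain suffixes are illustrative — confirm current values against each engine's documentation before relying on them:

```python
import socket

# Expected reverse-DNS suffixes per crawler (illustrative — verify
# against each engine's published documentation).
BOT_DOMAINS = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}

def hostname_matches(hostname, suffixes):
    """Suffix check applied to the reverse-DNS hostname."""
    host = hostname.rstrip(".").lower()
    return host.endswith(tuple(s.lower() for s in suffixes))

def verify_bot_ip(ip, bot):
    """Reverse DNS -> suffix check -> forward DNS back to the same IP."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except socket.herror:
        return False
    if not hostname_matches(hostname, BOT_DOMAINS[bot]):
        return False
    return ip in socket.gethostbyname_ex(hostname)[2]
```

The suffix check alone is not sufficient: the forward lookup is what defeats spoofers who control reverse DNS for their own IPs (e.g. `crawl.googlebot.com.evil.net`).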
CDN and Caching for SEO
- Cache-Control: Use Cache-Control: public, max-age= for static assets; for HTML pages, use short TTLs or no-cache with ETag / Last-Modified to balance freshness and performance
- Vary header: Use Vary: User-Agent if serving different HTML per device (dynamic serving) — without it, CDN may serve mobile HTML to Googlebot desktop or vice versa
- ETag / Last-Modified: Enable conditional requests (If-None-Match, If-Modified-Since) so crawlers get 304 Not Modified for unchanged content — reduces crawl load
- Stale content risk: Aggressive caching can serve outdated HTML (old canonicals, removed noindex, stale structured data) to crawlers — purge cache after SEO-critical changes
- Cache invalidation: After publishing content updates, redirects, or schema changes, invalidate CDN cache for affected URLs immediately — delayed purging can cause crawlers to see old content for hours or days
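The conditional-request decision reduces to a small function. This sketch compares validators as plain strings for clarity — a real server parses If-Modified-Since as an HTTP date — but it shows the precedence: ETag wins over Last-Modified:

```python
def conditional_response(request_headers, resource_etag, resource_last_modified):
    """Decide between 200 and 304 for a crawler revalidation request.
    If-None-Match (ETag) takes precedence over If-Modified-Since."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        return 304 if if_none_match == resource_etag else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        return 304 if if_modified_since == resource_last_modified else 200
    return 200  # no validators supplied: full response
```

Every 304 served to a crawler is a full fetch saved, which is exactly the crawl-load reduction the ETag / Last-Modified bullet above describes.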
Edge Routing and Redirects
- Header forwarding: Ensure origin SEO headers (X-Robots-Tag, Link rel="canonical", Vary) pass through edge/CDN layers unchanged — CDN stripping or overwriting headers is a common silent failure
- Canonical preservation across redirects: When edge rules add/remove trailing slashes, force HTTPS, or normalize domains, ensure the final canonical URL matches the redirect destination — mismatches confuse indexing
- Redirect chain management: Edge redirects stacked on origin redirects create chains — audit total hops regularly; aim for single-hop redirects from any old URL to the current canonical
- Geo-based routing risks: IP-based geo-redirects at the edge can block Googlebot (US IPs) from reaching non-US content — use hreflang instead; if geo-routing is required, exempt known bot IPs
- Query parameter normalization: Edge rules that strip, sort, or rewrite query parameters must preserve parameters needed for tracking and canonical consistency — test with crawl tools after deploying new rules
- Edge rewrite validation: URL rewrites at the edge (e.g., path mapping, vanity URLs) must produce correct status codes and consistent canonical signals — test both browser and bot user agents
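A chain audit can run offline against the combined redirect rules. The sketch below takes a flattened {source: target} map (edge plus origin rules merged) and flags any source needing more than one hop, plus detects loops:

```python
def redirect_chains(redirect_map, max_hops=10):
    """redirect_map: {source_url: target_url}, combining edge and origin
    rules. Returns chains longer than one hop as (source, final, hops)."""
    chains = []
    for source in redirect_map:
        current, hops, seen = source, 0, {source}
        while current in redirect_map and hops < max_hops:
            current = redirect_map[current]
            hops += 1
            if current in seen:  # redirect loop — stop following
                break
            seen.add(current)
        if hops > 1:
            chains.append((source, current, hops))
    return chains
```

Each flagged source should be rewritten to point directly at its final target, restoring the single-hop goal stated above.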
Response Debugging for SEO
- Origin vs. edge comparison: Fetch the same URL directly from origin and through CDN — compare response headers (especially X-Robots-Tag, canonical, status code) and HTML body to detect CDN-introduced discrepancies
- 5xx diagnosis: Intermittent 5xx errors in GSC Crawl Stats indicate server or edge instability — correlate with CDN logs, origin health checks, and deployment timestamps; Googlebot may reduce crawl rate after repeated 5xx
- Header and HTML parity: Compare responses across browser, Googlebot user agent, and CDN edge — look for differences in rendered content, meta tags, status codes, and headers that could cause differential indexing
- Render blocking: CDN or WAF rules that block JavaScript, CSS, or font resources from Googlebot cause incomplete rendering — check GSC URL Inspection "Page resources" for blocked resource errors
- Incident response for crawl drops: If GSC shows sudden crawl or indexing drops, check in order: CDN/edge config changes, robots.txt accessibility, DNS resolution, origin server health, recent deploys that may have introduced noindex or redirect loops
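The origin-vs-edge header comparison is a simple diff once both responses are in hand. A sketch, assuming the two header sets have already been fetched (the header shortlist is illustrative):

```python
# SEO-relevant headers to compare (illustrative shortlist)
SEO_HEADERS = ("x-robots-tag", "link", "cache-control", "vary")

def header_discrepancies(origin_headers, edge_headers):
    """Compare SEO-relevant response headers between a direct origin
    fetch and the same URL via the CDN edge.
    Returns {header: (origin_value, edge_value)} for mismatches."""
    o = {k.lower(): v for k, v in origin_headers.items()}
    e = {k.lower(): v for k, v in edge_headers.items()}
    return {
        h: (o.get(h), e.get(h))
        for h in SEO_HEADERS
        if o.get(h) != e.get(h)
    }
```

A header present at origin but None at the edge is the "CDN stripping headers" silent failure described in Edge Routing and Redirects above.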
Quick Actions
URL Not Indexing?
- Run URL Inspection in GSC
- Check HTTP status (must be 200)
- Review robots.txt rules
- Check for noindex tag
- Verify canonical points to itself
- Test rendered HTML
- Request indexing
Site Migration Checklist
- Map all old URLs to new URLs
- Implement 301 redirects
- Update internal links
- Update XML sitemap
- Submit new sitemap to GSC/Bing
- Monitor crawl stats daily
- Track indexation status
Core Web Vitals Issues?
- Run PageSpeed Insights
- Check LCP (image optimization)
- Check INP (JavaScript execution)
- Check CLS (layout shifts)
- Review field data in CrUX
- Test on real devices
- Monitor GSC Core Web Vitals report