AI Platform Rules Center

Official guidance, normalized into one operational playbook.

This page tracks platform-level recommendations that affect AEO, agent visibility, and executable web actions. We only include primary sources and map them to implementation steps teams can ship this sprint.

Platforms Tracked

16

Bots / Surfaces

30

Official Sources

31

Change Events Logged

16

Last Global Verification

2026-02-18

Maintained By

AgentSurge Rules Engine

Update Policy

Primary-source docs are re-validated weekly. Critical bot-policy shifts are pushed as immediate alerts.

Search / AEO · Weekly

Google Search + AI Features

Eligibility and readability in AI Overviews / AI Mode are driven by core Search quality, crawlability, and snippet controls.

Last Verified

2026-02-18

Official Guidance

  • Google states no extra AI-specific technical requirement is needed beyond standard Search guidance.
  • Snippet and indexing controls (noindex, nosnippet, max-snippet) apply to AI features too.
  • Google-Extended can be controlled separately for Gemini training/grounding use cases.

AgentSurge Implementation Checklist

  • Keep key pages crawlable and indexable (products, docs, policy pages).
  • Align structured data with visible content to avoid trust loss.
  • Track snippet directives and noindex drift after deploys.

Robots / Access Notes

  • Use explicit allow/deny blocks per bot family.
  • Avoid blanket Disallow rules on core commerce and docs paths.
  • Audit robots and meta robots together to prevent contradictory states.
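The per-bot-family pattern above can be sketched as a robots.txt fragment; the paths and the decision to opt out of Google-Extended are illustrative, not a recommendation:

```text
# Illustrative per-bot-family robots.txt layout; paths are placeholders.
User-agent: Googlebot
User-agent: Googlebot-Image
Allow: /

# Example of opting content out of Gemini training/grounding
# without touching Search eligibility.
User-agent: Google-Extended
Disallow: /
```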

Bot/IP note: Use documented Google crawler verification methods where bot authenticity is required at the edge.

Googlebot · Googlebot-Image · Google-Extended

Rule Change Log

  • 2026-02-18

    High

    Re-validated AI features guidance: no additional AI-only file requirements introduced.

Model + Search Surfaces · Weekly

OpenAI (ChatGPT + GPTBot)

Separate visibility control for ChatGPT search surfaces and model-training crawls, with deterministic robots policy.

Last Verified

2026-02-18

Official Guidance

  • OpenAI documents multiple bots with distinct purposes (search, training, user-triggered fetch).
  • Bots are expected to respect robots.txt directives.
  • Official bot/IP references are published for infrastructure filtering and monitoring.

AgentSurge Implementation Checklist

  • Allow OAI-SearchBot on high-intent pages if ChatGPT visibility is desired.
  • Configure GPTBot policy separately based on training preference.
  • Log bot behavior and cache freshness to detect stale retrieval risk.
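The split-control policy above can be expressed as two robots.txt records and sanity-checked locally with Python's standard-library parser; the paths are placeholders:

```python
from urllib import robotparser

# Illustrative policy: allow ChatGPT search retrieval, opt out of training crawls.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Search surface stays visible; the training crawler is refused.
print(parser.can_fetch("OAI-SearchBot", "/products/widget"))  # True
print(parser.can_fetch("GPTBot", "/products/widget"))         # False
```

Running the same check in CI after robots.txt changes is a cheap guard against the "noindex drift" failure mode described elsewhere on this page.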

Robots / Access Notes

  • Keep policy explicit: do not rely on ambiguous wildcard-only bot rules.
  • Review robots after major route changes and checkout flow updates.
  • Avoid accidental block of API-backed public product content.

Bot/IP note: Prefer official OpenAI network guidance for WAF allowlists where strict bot origin validation is required.

OAI-SearchBot · GPTBot · ChatGPT-User

Rule Change Log

  • 2026-02-18

    High

    Confirmed split-control guidance between OAI-SearchBot and GPTBot remains intact.

Model + Search Surfaces · Weekly

Anthropic (Claude)

Consistent crawler policy and canonical content structure improve long-context answer fidelity in Claude experiences.

Last Verified

2026-02-18

Official Guidance

  • Anthropic documents bot families and indicates robots directives are respected.
  • Crawler controls are intended to be managed via robots.txt and standard web controls.
  • Different bot intents (training/search/user request) should be governed separately.

AgentSurge Implementation Checklist

  • Keep canonical policy and FAQ pages stable with clear heading hierarchy.
  • Provide concise machine-readable summaries for policy-critical pages.
  • Separate allow rules for search retrieval vs broader crawl preferences.

Robots / Access Notes

  • Set per-bot rules instead of broad allow/deny where possible.
  • Preserve access to product/legal/support pages when optimization is intended.
  • Re-check robots after CDN, firewall, or geo rules are changed.
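A per-bot layout matching the notes above might look like the following sketch; the bot tokens come from Anthropic's documentation, while the paths are placeholders:

```text
# Illustrative robots.txt sketch for Claude bot families; paths are placeholders.
User-agent: Claude-SearchBot
Allow: /

User-agent: ClaudeBot
Disallow: /drafts/
Allow: /
```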

Bot/IP note: Do not rely solely on static IP assumptions; pair robots policy with runtime traffic validation.

ClaudeBot · Claude-SearchBot · Claude-User

Rule Change Log

  • 2026-02-18

    Medium

    Re-checked crawler policy and bot taxonomy for Claude surfaces.

Agent Execution Layer · Biweekly

Chrome + WebMCP Early Preview

Expose safe, declarative, and deterministic action surfaces so compatible browser agents can execute workflows reliably.

Last Verified

2026-02-18

Official Guidance

  • Chrome announced WebMCP early preview support to improve model-to-web interactions.
  • Execution reliability depends on explicit form/action contracts rather than DOM guesswork.
  • WebMCP direction emphasizes security boundaries, capability declarations, and predictable calls.

AgentSurge Implementation Checklist

  • Map high-value actions (search, add-to-cart, checkout) to stable endpoints.
  • Validate action schemas and replay behavior before exposing tool paths.
  • Instrument request signing, rate limits, and auditable action logs.

Robots / Access Notes

  • WebMCP readiness is not a replacement for crawl/index hygiene; both layers matter.
  • Keep public discovery surfaces crawlable while protecting write actions with auth and signatures.

Bot/IP note: Treat action endpoints as security-sensitive interfaces regardless of client channel.

Browser agent runtime (tool invocation path)

Rule Change Log

  • 2026-02-18

    Medium

    Confirmed WebMCP remains early-stage; deployment messaging should remain 'forward-compatible'.

Recrawl Freshness · Monthly

Microsoft Bing + IndexNow

Fast URL submission and freshness signaling reduce lag between content updates and model/search retrieval.

Last Verified

2026-02-18

Official Guidance

  • Bing promotes IndexNow for near-real-time URL update notifications.
  • Automated submission is recommended for adds/updates/deletes.
  • Webmaster tooling should be used for coverage and recrawl diagnostics.

AgentSurge Implementation Checklist

  • Trigger IndexNow submissions on content publish/update/delete events.
  • Track submission success and retry failures via queue workers.
  • Align sitemap and IndexNow feed for consistency across discovery channels.
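The publish-event trigger above can be sketched as a minimal IndexNow submitter; the endpoint, field names, and the default key-file location follow the IndexNow protocol docs, but verify the current spec and your engine's quota before relying on this:

```python
import json
from urllib import request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"  # shared protocol endpoint

def build_indexnow_payload(host: str, key: str, urls: list[str]) -> dict:
    """Batch payload: host, key, key-file location, and up to 10,000 URLs."""
    return {
        "host": host,
        "key": key,
        "keyLocation": f"https://{host}/{key}.txt",  # default key-file convention
        "urlList": urls[:10000],
    }

def submit(payload: dict) -> int:
    """POST the batch; 200/202 indicate the submission was accepted."""
    req = request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status
```

In a queue-worker setup, `submit` would run with retry/backoff on non-2xx responses, matching the retry item in the checklist.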

Robots / Access Notes

  • Ensure critical pages are crawlable by bingbot when AI/search visibility is required.
  • Avoid conflicting disallow rules between static and generated robots sections.

Bot/IP note: Validate endpoint authenticity and submission quotas for high-volume sites.

bingbot · IndexNow ingestion

Rule Change Log

  • 2026-02-18

    Medium

    Validated IndexNow guidance and API-driven submission positioning.

Commerce Agent Channel · Biweekly

Shopify Agentic Storefronts

Connect agent discovery and checkout execution in a commerce-native flow with measurable attribution.

Last Verified

2026-02-18

Official Guidance

  • Shopify documents agentic storefront capabilities and early-access deployment guidance.
  • Storefront MCP integration defines contract boundaries for product and transaction actions.
  • Event-driven sync is critical for inventory, price, and policy freshness.

AgentSurge Implementation Checklist

  • Keep storefront product/policy data synchronized to action and retrieval layers.
  • Use webhook-driven updates for product create/update/delete events.
  • Instrument checkout attribution by provider/channel for value reporting.
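Webhook-driven sync only works if deliveries are authenticated. Shopify signs the raw request body with HMAC-SHA256 and sends the base64-encoded digest in the `X-Shopify-Hmac-Sha256` header; a minimal verifier (secret and body below are placeholders):

```python
import base64
import hashlib
import hmac

def verify_shopify_webhook(raw_body: bytes, header_hmac: str, secret: bytes) -> bool:
    """Recompute the base64 HMAC-SHA256 of the raw body and compare in constant time."""
    digest = hmac.new(secret, raw_body, hashlib.sha256).digest()
    expected = base64.b64encode(digest).decode()
    return hmac.compare_digest(expected, header_hmac)

# Simulated delivery (placeholder secret and body):
secret = b"shpss_placeholder"
body = b'{"id": 1001, "topic": "products/update"}'
header = base64.b64encode(hmac.new(secret, body, hashlib.sha256).digest()).decode()
print(verify_shopify_webhook(body, header, secret))         # True
print(verify_shopify_webhook(body + b" ", header, secret))  # False
```

Note the verification runs over the raw bytes as received; parsing and re-serializing JSON first will break the signature.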

Robots / Access Notes

  • Do not block key product and policy pages needed for assistant retrieval.
  • Separate storefront public access from authenticated write-action endpoints.

Bot/IP note: Use signed callbacks and event verification for all webhook and action endpoints.

Shopify Storefront MCP clients · Agentic storefront integrations

Rule Change Log

  • 2026-02-18

    Medium

    Re-validated storefront MCP references and early-access positioning.

Model Platform / Policy Watch · Weekly

xAI / Grok (Public crawler policy status)

xAI provides API/model documentation, but does not currently publish a clear, first-party web-crawler token policy comparable to GPTBot or ClaudeBot docs.

Last Verified

2026-02-18

Official Guidance

  • xAI official docs focus on API and product capabilities, not a dedicated crawler user-agent policy page.
  • X Terms of Service explicitly prohibit unauthorized crawling/scraping of X services.
  • For external websites, treat Grok-discovery assumptions as unverified until a primary crawler policy is published.

AgentSurge Implementation Checklist

  • Do not hardcode non-official 'Grok bot' directives in robots policy templates.
  • Track suspicious and unknown assistant-like traffic via behavior + attribution logs.
  • Re-check xAI and X policy docs on a weekly cadence until a formal crawler spec appears.

Robots / Access Notes

  • Prefer standards-based coverage (structured data, canonical quality, crawl hygiene) over speculative bot targeting.
  • Label xAI-specific controls as 'policy pending' in customer-facing reports.
  • Avoid claiming official Grok crawler support unless a primary source exists.

Bot/IP note: No official xAI crawler IP allowlist source was found in current first-party docs; use generic abuse controls and observability.

Not publicly documented (as of 2026-02-18)

Rule Change Log

  • 2026-02-18

    High

    Added explicit 'policy not published' status to prevent non-official Grok crawler claims.

Enforcement / Control Plane · Biweekly

Cloudflare AI Crawl Control

Robots directives are advisory; Cloudflare adds enforcement, compliance monitoring, and per-crawler control for AI traffic.

Last Verified

2026-02-18

Official Guidance

  • Cloudflare provides AI Crawl Control for monitoring and controlling AI crawler access policies.
  • Cloudflare documents robots compliance tracking and notes robots.txt is not a hard technical block.
  • Managed robots.txt can be combined with WAF rules for stronger enforcement.

AgentSurge Implementation Checklist

  • Mirror your AgentSurge bot policy to Cloudflare allow/block controls.
  • Enable compliance monitoring to detect crawlers violating robots directives.
  • Use WAF custom rules for high-risk paths and non-compliant AI traffic.

Robots / Access Notes

  • Keep robots.txt as the declared policy, but enforce critical restrictions at edge controls.
  • Review preview/staging environments to avoid accidental AI crawl of non-production content.
  • Validate allow/block behavior by crawler family using Cloudflare metrics.

Bot/IP note: Use crawler identity plus IP/network validation where possible; do not rely on user-agent alone for enforcement.

Control layer (crawler-agnostic)

Rule Change Log

  • 2026-02-18

    Medium

    Added enforcement-layer profile for teams that need stronger controls than robots directives alone.

AI Search / Answer Engine · Weekly

Perplexity

Control eligibility in Perplexity search surfaces while keeping user-triggered fetches observable and policy-safe.

Last Verified

2026-02-18

Official Guidance

  • PerplexityBot is for search indexing and is not used for foundation-model training.
  • Perplexity-User is for user-initiated fetches and generally ignores robots.txt because the request is user-driven.
  • Perplexity publishes JSON IP ranges and recommends combining User-Agent and IP checks in WAF.

AgentSurge Implementation Checklist

  • Allow PerplexityBot for visibility pages (products, docs, policy summaries).
  • Treat Perplexity-User requests as user-channel traffic in logs and attribution.
  • Auto-refresh WAF allowlists from Perplexity IP JSON endpoints.
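The allowlist-refresh step above can be sketched with the stdlib; the feed URL and JSON shape here are assumptions modeled on common crawler IP feeds (`{"prefixes": [{"ipv4Prefix": ...} | {"ipv6Prefix": ...}]}`), so confirm both against Perplexity's current documentation before wiring this into a WAF:

```python
import ipaddress
import json
from urllib import request

# Assumed feed location; verify against Perplexity's published docs.
FEED_URL = "https://www.perplexity.ai/perplexitybot.json"

def parse_prefixes(feed: dict) -> list:
    """Collect IPv4/IPv6 networks from an assumed prefixes-style feed."""
    nets = []
    for entry in feed.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            nets.append(ipaddress.ip_network(prefix))
    return nets

def is_allowed(ip: str, nets: list) -> bool:
    """True if the address falls inside any published network of the same IP version."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in nets if addr.version == net.version)

# Local demo using a documentation range (TEST-NET-1), not real feed data:
sample = json.loads('{"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}]}')
nets = parse_prefixes(sample)
print(is_allowed("192.0.2.10", nets))  # True
```

In production the sample would be replaced by a scheduled fetch of `FEED_URL`, with the parsed networks pushed to the WAF allowlist.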

Robots / Access Notes

  • Set explicit directives for both PerplexityBot and Perplexity-User.
  • Do not assume search-crawler rules apply to user-triggered fetch behavior.
  • Monitor policy changes with a short verification cadence.

Bot/IP note: Use official Perplexity IP feeds as source of truth and keep automated sync in place.

PerplexityBot · Perplexity-User

Rule Change Log

  • 2026-02-18

    High

    Added dual-bot policy model (search vs user fetch) with IP-feed-based WAF controls.

AI Assistant / User Fetch · Biweekly

Mistral (Le Chat)

Govern Mistral user-triggered retrieval while maintaining predictable access and observability.

Last Verified

2026-02-18

Official Guidance

  • MistralAI-User is documented for user actions in Le Chat.
  • Mistral states MistralAI-User does not crawl automatically and is not used for generative-training crawls.
  • Mistral publishes crawler IP ranges in JSON for network-level controls.

AgentSurge Implementation Checklist

  • Add explicit robots rules for MistralAI-User where channel visibility is desired.
  • Track request path and conversion intent from Mistral-driven sessions separately.
  • Refresh and validate Mistral IP list in firewall policy automation.

Robots / Access Notes

  • Keep MistralAI-User handling explicit in robots and server logs.
  • Avoid over-broad deny rules that block key factual and policy URLs.
  • Re-verify behavior after major docs and legal content updates.

Bot/IP note: Mistral publishes an IP JSON file intended for verification and WAF policy alignment.

MistralAI-User

Rule Change Log

  • 2026-02-18

    Medium

    Added MistralAI-User policy profile and IP feed verification guidance.

AI + Social Fetchers · Biweekly

Meta Web Crawlers

Separate training/indexing crawler policy from user-triggered fetchers and social preview crawls.

Last Verified

2026-02-18

Official Guidance

  • Meta-ExternalAgent is documented for web crawling including AI-model and indexing use cases.
  • Meta-ExternalFetcher is user-initiated and may bypass robots.txt.
  • facebookexternalhit may bypass robots for security/integrity checks.

AgentSurge Implementation Checklist

  • Publish clear robots rules for meta-externalagent and social preview paths.
  • Treat externalfetcher and preview checks as special traffic in observability and rate policy.
  • Maintain Open Graph tags within early response bytes for preview quality.
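The Open Graph item above amounts to a small head fragment served early in the response; property values here are placeholders:

```html
<!-- Illustrative Open Graph tags; all values are placeholders. -->
<meta property="og:title" content="Example Product" />
<meta property="og:description" content="Short, preview-safe summary." />
<meta property="og:image" content="https://example.com/og/product.png" />
<meta property="og:url" content="https://example.com/products/example" />
```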

Robots / Access Notes

  • Do not assume robots blocks apply equally to all Meta fetchers.
  • Use explicit per-bot directives instead of wildcard-only controls.
  • Pair user-agent checks with network validation for spoof resistance.

Bot/IP note: Meta documents AS32934-based verification and publishes peering/network references for crawler-origin checks.

meta-externalagent · meta-externalfetcher · facebookexternalhit

Rule Change Log

  • 2026-02-18

    High

    Added multi-bot Meta policy with explicit bypass caveats for user-initiated and security fetches.

Ecosystem Search + AI Training Control · Biweekly

Applebot + Applebot-Extended

Control Apple search discoverability and generative-model usage separately through Applebot tokens.

Last Verified

2026-02-18

Official Guidance

  • Applebot powers search experiences (Spotlight, Siri, Safari) and can also support Apple AI foundations.
  • Applebot-Extended is a control token for data usage in generative model training and does not crawl pages.
  • If Applebot is not specified but Googlebot is, Apple states Applebot follows Googlebot instructions.

AgentSurge Implementation Checklist

  • Configure Applebot and Applebot-Extended directives independently in robots.
  • Keep rendering resources (JS/CSS/XHR) accessible for reliable Applebot rendering.
  • Use robots meta directives (noindex, nosnippet, nofollow, none) where policy-sensitive.
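Reverse-DNS verification (noted below for Applebot) can be sketched as forward-confirmed reverse DNS; the `applebot.apple.com` suffix is illustrative, so take the authoritative value from Apple's own verification docs:

```python
import socket

def hostname_matches(hostname: str, allowed_suffixes: tuple[str, ...]) -> bool:
    """Suffix check with a leading-dot guard so 'fakeapplebot.apple.com' fails."""
    host = hostname.rstrip(".").lower()
    return any(host == s or host.endswith("." + s) for s in allowed_suffixes)

def fcrdns_verify(ip: str, allowed_suffixes: tuple[str, ...]) -> bool:
    """PTR lookup, suffix check, then forward resolution back to the original IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)              # reverse (PTR) lookup
    except OSError:
        return False
    if not hostname_matches(hostname, allowed_suffixes):
        return False
    try:
        _, _, forward_ips = socket.gethostbyname_ex(hostname)  # forward confirmation
    except OSError:
        return False
    return ip in forward_ips
```

The forward-confirmation step is what defeats spoofed PTR records: an attacker can set any reverse name on their own IP, but cannot make the vendor's DNS resolve that name back to it.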

Robots / Access Notes

  • Use host-level robots for each subdomain and validate separately.
  • Treat discovery controls and training controls as separate policy decisions.
  • Do not assume blocking Applebot-Extended removes search discoverability.

Bot/IP note: Apple documents reverse-DNS verification and publishes Applebot CIDR references.

Applebot · Applebot-Extended

Rule Change Log

  • 2026-02-18

    High

    Added Applebot-Extended split-control policy for discoverability vs generative training.


Commerce + AI Surfaces · Weekly

Amazon Crawler Family

Manage Amazon training, search, and user-action crawlers with independent controls and IP-verified policy.

Last Verified

2026-02-18

Official Guidance

  • Amazonbot may be used to improve products/services and can be used for Amazon AI model training.
  • Amzn-SearchBot is for Amazon search experiences and is documented as non-training crawl.
  • Amzn-User supports user actions and is documented as non-training crawl.

AgentSurge Implementation Checklist

  • Set independent robots rules for Amazonbot, Amzn-SearchBot, and Amzn-User.
  • Use noarchive/noindex/none directives where training and indexing constraints differ.
  • Automate WAF updates from Amazon IP feeds for each crawler family.

Robots / Access Notes

  • Amazon states settings are independent and can take around 24h to apply.
  • Amazon uses host-level robots and can fall back to cached robots copies.
  • Amazon docs state crawl-delay is not supported.

Bot/IP note: Amazon publishes distinct IP lists for Amazonbot, Amzn-SearchBot, and Amzn-User.

Amazonbot · Amzn-SearchBot · Amzn-User

Rule Change Log

  • 2026-02-18

    High

    Added Amazon multi-crawler profile with independent training/search/user-action controls.

AI-assisted Answers · Biweekly

DuckDuckGo DuckAssistBot

Control inclusion in DuckDuckGo AI-assisted answers without impacting organic result eligibility.

Last Verified

2026-02-18

Official Guidance

  • DuckAssistBot crawls in real time for AI-assisted answers with source citation.
  • DuckDuckGo states this crawler data is not used to train AI models.
  • Opting out of DuckAssistBot is documented as not impacting organic rankings.

AgentSurge Implementation Checklist

  • Apply explicit robots policy for DuckAssistBot by intent (allow/disallow).
  • Track response and conversion behavior from DuckAssistBot-origin sessions.
  • Use published JSON IP list for edge validation and anomaly detection.

Robots / Access Notes

  • DuckDuckGo documents up to ~72 hours for opt-out changes to take effect.
  • Set per-domain rules and verify reflected behavior in access logs.
  • Use explicit bot policy rather than wildcard-only directives.

Bot/IP note: DuckDuckGo publishes DuckAssistBot IPs and a JSON endpoint for verification.

DuckAssistBot

Rule Change Log

  • 2026-02-18

    Medium

    Added DuckAssistBot policy profile for AI-assisted answer inclusion control.

Search / Answer Layer · Monthly

Brave Search

Maintain discoverability by preserving Googlebot crawlability and using noindex for delisting semantics.

Last Verified

2026-02-18

Official Guidance

  • Brave documents no differentiated crawler user agent.
  • Brave states pages not crawlable by Googlebot are not crawled by Brave Search.
  • Brave documents noindex-based delisting workflow rather than robots-based delisting.

AgentSurge Implementation Checklist

  • Ensure core pages are crawlable by Googlebot-equivalent rules if Brave visibility is desired.
  • Use noindex for delisting behavior and verify after re-fetch.
  • Track Brave referral and answer-surface traffic in attribution layer.

Robots / Access Notes

  • Do not rely on UA-specific allowlists for Brave, since its crawler is not differentiated by user agent.
  • Manage delisting through noindex and content lifecycle endpoints.

Bot/IP note: Prioritize behavior-based validation and source analytics over strict UA matching for Brave crawler traffic.

Brave Search crawler

Rule Change Log

  • 2026-02-18

    Medium

    Added Brave crawler handling model based on Googlebot crawlability dependency.

Open Data Crawl Layer · Monthly

Common Crawl (CCBot)

Manage inclusion in large open crawl datasets that are widely reused across research and model pipelines.

Last Verified

2026-02-18

Official Guidance

  • CCBot identifies itself explicitly and supports robots-based opt-out.
  • Common Crawl states CCBot honors robots.txt and supports crawl-delay behavior.
  • Blocking CCBot is done via direct User-agent directive in robots.txt.

AgentSurge Implementation Checklist

  • Set explicit CCBot directives by content class (public docs vs sensitive sections).
  • Tune crawl-delay if server load requires rate shaping.
  • Monitor if critical brand pages are unintentionally excluded from open web corpora.
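A dedicated CCBot block with rate shaping, as the checklist describes, might look like this sketch; the paths and delay value are placeholders:

```text
# Illustrative CCBot block: exclude sensitive sections, rate-shape the rest.
User-agent: CCBot
Disallow: /account/
Disallow: /internal/
Crawl-delay: 5
```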

Robots / Access Notes

  • Use dedicated CCBot blocks rather than broad global disallow where possible.
  • Keep in mind CCBot policy can influence downstream model/data availability.

Bot/IP note: Primary control surface is robots policy and crawl behavior management documented by Common Crawl.

CCBot

Rule Change Log

  • 2026-02-18

    Medium

    Added CCBot controls for opt-out and crawl-delay policy management.

Operational Note

Why this exists

Rule drift is one of the fastest ways teams lose AI visibility. The Rules Center keeps policy updates centralized, links every claim to primary documentation, and translates guideline changes into implementation actions your engineering and growth teams can execute.