ChatGPT-User vs GPTBot: The Two OpenAI Crawlers You Need to Understand
OpenAI runs multiple crawlers and they do very different things. This guide explains the difference between GPTBot, ChatGPT-User, and OAI-SearchBot — and which ones to allow, block, or monitor based on your goals.
Why OpenAI runs multiple crawlers
OpenAI's crawling infrastructure is deliberately split into multiple named bots so that site owners can make different consent decisions for different use cases. The three main bots — GPTBot, ChatGPT-User, and OAI-SearchBot — serve distinct purposes, and treating them as interchangeable leads to the wrong robots.txt rules.
The split reflects how AI products actually work. Training data collection is a one-to-many operation: a single crawl feeds a model that will answer millions of future questions. Real-time browsing is a one-to-one operation: a single fetch happens in response to a specific user asking a specific question. Search indexing is a third pattern: continuous recrawling to maintain a queryable index. Each has different bandwidth, freshness, and consent characteristics, and each needs its own identity so you can control it independently.
Understanding which bot does what is the difference between accidentally blocking all of ChatGPT and deliberately making a nuanced decision about which parts of ChatGPT see your site.
GPTBot: the training crawler
GPTBot is the bot that collects content for training future OpenAI models. When GPTBot crawls your site, the resulting pages may be incorporated into the dataset used to train the next version of GPT-4, GPT-5, or whatever model OpenAI trains next. It is slow, batched, and not linked to any specific user query.
The token for robots.txt: 'GPTBot'. The full user agent includes 'GPTBot/1.2' (or whatever version is current). GPTBot is the bot most site owners think of when they say 'block AI' or 'allow AI' — it's the one associated with content licensing debates and opt-out discussions. Blocking GPTBot prevents your future content from being absorbed into training data, but does not affect content that was already crawled in earlier sweeps.
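As a sketch, an opt-out from training collection alone is a single named section; the other OpenAI bots ignore it because they only obey their own sections:

```txt
# robots.txt -- block training collection only; other OpenAI bots unaffected
User-agent: GPTBot
Disallow: /
```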
Whether to allow GPTBot depends on your content strategy. Allowing it means your content contributes to the shared knowledge ChatGPT uses when answering questions, which increases the odds you'll be mentioned in 'cold' answers (answers generated without live browsing). Blocking it preserves your content for exclusive use but removes you from that shared knowledge pool over time as the model is retrained.
ChatGPT-User: the live browsing agent
ChatGPT-User is a completely different bot with a completely different job. It is the fetcher that makes live HTTP requests during an active ChatGPT conversation when a user asks a question that triggers web browsing. Unlike GPTBot, its crawls are small, one-shot, and directly tied to a user's specific request.
The token for robots.txt: 'ChatGPT-User' (note the hyphen; under the Robots Exclusion Protocol, user-agent matching is case-insensitive, but the hyphen is part of the token). When a ChatGPT user clicks a citation link or asks ChatGPT to browse, this is the bot that hits your site. Blocking it means your content cannot be cited in real-time answers, even if GPTBot is allowed and your content is already in training data.
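A minimal robots.txt section that grants this bot site-wide access explicitly, rather than relying on a wildcard, might look like:

```txt
# robots.txt -- explicitly permit live browsing fetches
User-agent: ChatGPT-User
Allow: /
```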
This is the bot most businesses should focus on allowing. Its behavior directly determines whether your site can be mentioned in the current ChatGPT conversation, which is the moment users are actively looking for answers. Blocking ChatGPT-User while allowing GPTBot is usually a mistake — it means you're donating content to training but blocking the retrieval surface that would turn training into visibility.
OAI-SearchBot: the search index crawler
OAI-SearchBot is OpenAI's third crawler, introduced more recently than the other two. Its job is to build and maintain the search index that powers SearchGPT — OpenAI's search feature inside ChatGPT. Unlike ChatGPT-User (which fetches on demand), OAI-SearchBot recrawls sites on a schedule to keep an index fresh.
The token for robots.txt: 'OAI-SearchBot'. This bot behaves more like Googlebot than like the other OpenAI crawlers — it cares about freshness, sitemaps, and comprehensive coverage. Allowing it is the path to appearing in SearchGPT results, which are shown to ChatGPT users when they search the web from inside the ChatGPT interface.
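Because this crawler cares about sitemaps, the corresponding robots.txt section is worth pairing with a Sitemap directive (the domain below is a placeholder):

```txt
# robots.txt -- permit search indexing; the Sitemap line aids coverage
User-agent: OAI-SearchBot
Allow: /

Sitemap: https://example.com/sitemap.xml
```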
OAI-SearchBot is often the most important of the three to allow for ecommerce and content sites. SearchGPT results drive clicks the same way Google results do, but with newer infrastructure and fewer advertising intermediaries. Sites that get into the SearchGPT index see measurable referral traffic in a way that training-data presence alone doesn't deliver.
The three-bot decision matrix
With three bots, there are eight possible allow/block combinations. Most of them are either redundant or nonsensical; a few are common configurations worth naming.
- Allow all three (recommended for most) — Your content is used for training, can be cited live, and appears in SearchGPT. This is the default for businesses that benefit from AI visibility.
- Block all three — You opt out of OpenAI entirely. Use this only if you have licensing restrictions or a principled opt-out; it costs you all OpenAI-driven visibility.
- Allow ChatGPT-User and OAI-SearchBot, block GPTBot — Preserve content from training use while remaining citable in real time and indexed in search. A middle ground some publishers use.
- Allow GPTBot only — You donate to training but cannot be cited live or appear in search. This is almost always a mistake unless you specifically want to influence future model behavior without user-facing visibility.
- Allow OAI-SearchBot only — You appear in SearchGPT results but not in citations or training. Reasonable for sites that want the search channel specifically and don't care about the others.
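Putting the matrix into practice, the "block training, keep retrieval and search" configuration can be written as three explicit named sections. The trailing wildcard section for all other crawlers is illustrative, not required:

```txt
# Block training collection
User-agent: GPTBot
Disallow: /

# Allow live browsing citations
User-agent: ChatGPT-User
Allow: /

# Allow SearchGPT indexing
User-agent: OAI-SearchBot
Allow: /

# Everything else: default policy (illustrative)
User-agent: *
Allow: /
```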
Per-bot testing and verification
Because the three bots send distinct user agents, you can test each one independently and confirm your intended configuration is working. Curl a content page once per user agent and check the response: a bot you allow at the CDN and server level should get a 200 with your actual HTML, while a 403 or a challenge page means something upstream is blocking that user agent. Keep in mind that robots.txt is advisory and does not change the HTTP response to a curl request, so this test verifies CDN- and server-level rules; robots.txt rules have to be checked by reading the file itself.
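For the robots.txt side specifically, you can sanity-check your rules per bot offline before touching the live site. A sketch using Python's standard-library parser; the robots.txt contents and the URL are placeholders for your own:

```python
# Check which OpenAI bots a robots.txt file allows, without any network calls.
# ROBOTS_TXT and the test URL are placeholders -- substitute your own.
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

for bot in ("GPTBot", "ChatGPT-User", "OAI-SearchBot"):
    verdict = "allowed" if rp.can_fetch(bot, "https://example.com/blog/post/") else "blocked"
    print(f"{bot}: {verdict}")
```

With the sample rules above, GPTBot reports blocked and the other two report allowed, matching the "block training, keep retrieval and search" configuration.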
A subtlety: some CDN bot-management tools lump all OpenAI crawlers into one category. When you configure a skip rule for 'OpenAI bots', it may be applied to all three regardless of what you intend. If you want to allow some and block others, you may need separate CDN rules that match the exact user agent strings rather than a category.
Access logs are the other side of verification. Over the course of a week or two, you should see requests from all three user agents (assuming you've allowed them). If one is missing, something is blocking it at a layer you haven't looked at — most commonly the CDN or a security plugin.
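A quick way to run that check is to count hits per user agent. A sketch in Python, with hypothetical common-log-format lines standing in for your real access log (the IPs, paths, and version numbers are made up):

```python
# Count requests per OpenAI crawler in an access log.
# SAMPLE_LOG stands in for your real log file; the lines are fabricated.
import re
from collections import Counter

SAMPLE_LOG = """\
203.0.113.7 - - [10/Jan/2025:10:00:01 +0000] "GET /post/ HTTP/1.1" 200 5123 "-" "Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot"
203.0.113.8 - - [10/Jan/2025:10:01:12 +0000] "GET /post/ HTTP/1.1" 200 5123 "-" "Mozilla/5.0; compatible; ChatGPT-User/1.0; +https://openai.com/bot"
203.0.113.9 - - [10/Jan/2025:10:02:30 +0000] "GET /sitemap.xml HTTP/1.1" 200 812 "-" "Mozilla/5.0; compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot"
"""

BOTS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot")
counts = Counter()
for line in SAMPLE_LOG.splitlines():
    for bot in BOTS:
        # Whole-token match so "GPTBot" never fires on a "ChatGPT-User" line.
        if re.search(rf"\b{re.escape(bot)}\b", line):
            counts[bot] += 1
            break

for bot in BOTS:
    print(f"{bot}: {counts[bot]} request(s)")
```

A zero count for a bot you intended to allow is the signal to start digging through CDN and security-plugin layers.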
Execution Checklist
- Learn the three OpenAI bots and what each one does (training, live browsing, search indexing).
- Decide intentionally which of the three to allow based on your content strategy.
- Write explicit named sections in robots.txt for each bot (don't rely on wildcards).
- Test each bot's access individually with curl and the exact user agent string.
- Check CDN bot-management settings for category-level rules that affect multiple bots at once.
- Review access logs to confirm you're seeing traffic from each allowed bot.
- Re-audit your bot configuration every quarter — OpenAI occasionally adds new bots or changes behavior.
FAQ
If I only allow one OpenAI bot, which should it be?
ChatGPT-User. It is the bot directly tied to live citations in active ChatGPT conversations, which is the moment users are actively looking for answers. Allowing ChatGPT-User without GPTBot or OAI-SearchBot means you can be mentioned in retrieval-based responses without contributing content to training data or the SearchGPT index.
Does blocking GPTBot affect ChatGPT-User or OAI-SearchBot?
No — each bot obeys only its own named section in robots.txt. Blocking GPTBot does not block ChatGPT-User or OAI-SearchBot. This is the whole point of having three separate bots: you can make different decisions for each. Make sure to write explicit named sections if you want per-bot control.
Is there a way to rate-limit ChatGPT-User without blocking it?
Not through robots.txt. Crawl-delay directives are generally ignored by OpenAI crawlers. If ChatGPT-User traffic is causing load issues, rate-limit at your CDN or server layer by user agent. This is rarely necessary — ChatGPT-User fetches are small and one-shot, tied to individual user conversations, so volume is usually modest compared to training crawlers.
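If you do need to throttle at the server layer, one approach is to key a request-rate limit off the user agent. A sketch for nginx, using its standard limit_req module; the rate, zone size, and burst values are illustrative, and most CDNs offer an equivalent rule type:

```nginx
# Sketch: throttle ChatGPT-User at the nginx layer instead of blocking it.
# The 2r/s rate and zone size are illustrative; tune for your traffic.
map $http_user_agent $openai_browse_key {
    default         "";                   # empty key = request is not limited
    ~*ChatGPT-User  $binary_remote_addr;  # limit only this user agent
}

limit_req_zone $openai_browse_key zone=openai_browse:10m rate=2r/s;

server {
    listen 80;
    location / {
        limit_req zone=openai_browse burst=10 nodelay;
        # ...normal request handling...
    }
}
```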
Do other AI providers have similar multi-bot splits?
Partially. Google splits Googlebot (for search) from Google-Extended (for generative AI). Apple splits Applebot from Applebot-Extended. Anthropic's ClaudeBot is currently a single bot that handles both training and retrieval — no split today. Perplexity uses PerplexityBot for everything. The pattern is moving toward more granular bots over time as publishers demand more control, so expect more splits in 2026 and beyond.