
Why GPTBot Is Blocked by Your robots.txt (and You Probably Didn't Do It)

A diagnostic walkthrough of the six most common ways GPTBot ends up blocked — inherited defaults, security plugins, hosting providers, CDN settings, and copy-pasted snippets — and how to identify which one is affecting your site.

Apr 11, 2026 · 12 min read

You almost certainly did not type 'User-agent: GPTBot' followed by 'Disallow: /'

Most of the sites we audit with GPTBot blocks have one thing in common: the owner has no memory of adding the rule. They didn't write it, they didn't authorize it, and they're often surprised the rule is there at all. Yet there it sits in robots.txt, quietly keeping ChatGPT from ever seeing their content.

This is because robots.txt is rarely a file that one person writes and maintains. It is assembled by a combination of CMS defaults, plugins, hosting providers, templates, and copy-pasted snippets from blog posts. Over time, layers accumulate and one of them — usually without clear attribution — starts blocking AI bots.

This post is about finding out who or what added the rule, so you can remove it at the source instead of just patching the file. If you remove the rule without finding its origin, the next CMS update or plugin sync will put it right back.

Suspect #1: a security or SEO plugin you installed last year

This is the single most common source. WordPress plugins like Wordfence, Sucuri, iThemes Security, and All in One SEO have added 'block aggressive crawlers' features over the last two years, and in several cases these features started blocking AI crawlers in updates that weren't advertised as touching bot rules.

Shopify merchants see the same thing with apps that promise 'content protection' or 'competitor scraper blocking' — these apps often inject GPTBot and ClaudeBot into a block list by default. Because the blocks are added by the app rather than by you, editing robots.txt manually doesn't help: the app re-adds them every time its settings are re-applied.

To confirm this is your cause, check the settings panels of every security, SEO, and bot-management plugin on your site. Look for a 'crawler control', 'bot protection', or 'AI opt-out' section. If there's a toggle labeled something like 'block AI training crawlers', it's almost certainly the culprit.

Suspect #2: a hosting provider default

Some managed hosting platforms add bot rules to all customer sites by default. WP Engine, Kinsta, Pantheon, and a handful of Shopify Plus wrapper agencies have all shipped defaults at various points that block or challenge GPTBot, typically framed as 'performance protection' or 'DDoS mitigation'.

These rules can live in several places: in a server-level robots.txt that merges with your own, in an nginx config that rewrites responses to known bot user agents, or in a CDN configuration the host manages on your behalf. Because you can't see or edit these rules from your site's admin panel, they are the hardest class of blocks to diagnose.

The diagnostic: curl your robots.txt from outside your control panel. Compare the response to the version of the file you think you have. If they differ, something between your origin and the public internet is rewriting the file. Open a ticket with your host asking whether they block GPTBot by default, and whether that can be disabled for your account.
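If you prefer to script that comparison, here is a minimal Python sketch. The function names and the example site URL are illustrative, not part of any real tool: it fetches the robots.txt the public internet sees and lists any lines that aren't in your source-of-truth copy.

```python
import urllib.request


def fetch_robots(site: str) -> str:
    """Fetch the publicly served robots.txt, as GPTBot would see it."""
    with urllib.request.urlopen(f"{site}/robots.txt") as resp:
        return resp.read().decode("utf-8", errors="replace")


def unexpected_lines(expected: str, live: str) -> list:
    """Return non-empty lines in the live file that your own copy doesn't contain."""
    known = {line.strip() for line in expected.splitlines()}
    return [line.strip() for line in live.splitlines()
            if line.strip() and line.strip() not in known]
```

Any lines this surfaces — a GPTBot block you never wrote, for example — were injected somewhere between your origin and the edge, which is exactly the evidence to attach to that hosting ticket.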

Suspect #3: a 2023-era blog post you copy-pasted

When GPTBot first launched in August 2023, a wave of blog posts and publisher guides recommended blocking it on principle. Many of them included drop-in robots.txt snippets that readers pasted directly into their files. Two years later, those snippets are still there, usually on sites whose owners no longer remember why they added them.

The tell for this cause: a 'Disallow: /' under 'User-agent: GPTBot' that is the only AI bot rule in your file, with no equivalent rule for ClaudeBot or PerplexityBot (because those bots didn't exist yet when the blog post was written). If your file blocks only GPTBot and nothing newer, you probably inherited a 2023 snippet.
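That tell can be turned into a rough heuristic. This is only a sketch — the function name and the list of post-2023 bots are my own choices, and a real audit should read the file's groups properly:

```python
# AI crawlers that launched (or became widely blocked) after the original
# 2023 wave of block-GPTBot guides. List is illustrative, not exhaustive.
AI_BOTS_SINCE_2023 = ("claudebot", "perplexitybot", "google-extended", "bytespider")


def looks_like_2023_snippet(robots_txt: str) -> bool:
    """Rough heuristic: the file names GPTBot but none of the AI bots
    that appeared after the 2023 copy-paste snippets circulated."""
    text = robots_txt.lower()
    return "gptbot" in text and not any(bot in text for bot in AI_BOTS_SINCE_2023)
```

A True result doesn't prove the snippet's origin, but it's a strong hint that the rule predates your newer AI-bot decisions and deserves a fresh look.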

This is the easiest class to fix — just remove the lines. But do the removal in your source of truth (your CMS, your Git-managed config, or wherever robots.txt is generated from) rather than in the deployed file, or your next deploy will restore the block.

Suspect #4: a wildcard rule that accidentally matches GPTBot

Sometimes the rule blocking GPTBot isn't about GPTBot at all. A broader rule — 'User-agent: *' with 'Disallow: /' or 'Disallow: /api/' — applies to GPTBot along with everything else. The intent was to block content scrapers or hide internal APIs, but GPTBot happens to be caught in the same net.

The fix for this case is different from removing a GPTBot-specific block. You don't want to remove the wildcard; you want to add a named GPTBot section that overrides the wildcard for this one bot. When robots.txt parsers see both 'User-agent: *' and 'User-agent: GPTBot', they apply only the GPTBot section for GPTBot traffic, so a bot-specific Allow wins cleanly.
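You can sanity-check that override behavior with Python's standard-library urllib.robotparser — a minimal demonstration, not a full audit:

```python
from urllib import robotparser

# A wildcard block plus a named GPTBot section that overrides it.
ROBOTS = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# GPTBot matches its named group, so the Allow wins.
print(rp.can_fetch("GPTBot", "https://example.com/page"))        # True
# Any other bot falls back to the wildcard group and is blocked.
print(rp.can_fetch("RandomScraper", "https://example.com/page"))  # False
```

This mirrors how compliant crawlers behave: a bot obeys only the most specific group that names it, so the wildcard Disallow never applies to GPTBot at all.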

Watch out for wildcards in path patterns too. 'Disallow: /*?' (intended to block query-string URLs) will block anything GPTBot tries to crawl that has a query parameter. 'Disallow: /*.json$' blocks structured data files. These wildcard path rules are a quieter source of AI blocks that don't show up when you search the file for 'GPTBot'.
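Note that Python's built-in robotparser treats these patterns as literal prefixes, so it won't reveal this class of block. Here is a small sketch of Google-style path matching ('*' matches any characters, '$' anchors the end) you can use to test suspect patterns — the function name is my own:

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Google-style robots.txt path matching: '*' is any run of
    characters, a trailing '$' anchors the pattern to the path's end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything except '*', which becomes '.*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


print(rule_matches("/*?", "/products?page=2"))         # True  — query URLs caught
print(rule_matches("/*.json$", "/data/feed.json"))     # True  — structured data caught
print(rule_matches("/*.json$", "/data/feed.json?v=1")) # False — '$' no longer matches
```

Running your live URLs through a matcher like this catches the quiet blocks that a plain text search for 'GPTBot' never will.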

Suspect #5: your CDN is the one doing the blocking

Not every GPTBot block lives in robots.txt. If your file looks correct but GPTBot is still not reaching you, the block has moved one layer up — to your CDN, WAF, or bot management service. In these cases the bot never sees your robots.txt at all; the CDN returns a 403 or challenge page before the request reaches your server.

Cloudflare's Bot Fight Mode is the most common network-layer blocker. AWS WAF and Akamai Bot Manager have similar effects. The diagnostic is to check your access logs (or your CDN's analytics dashboard) for requests from 'GPTBot'. If you see no requests at all, the CDN is silently dropping them before they ever hit your origin. If you see 403 or 429 status codes, the CDN is actively rejecting them.
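If you have raw access logs, a few lines of Python can run that tally. This assumes combined log format (status code follows the quoted request line); the function name is illustrative:

```python
import re
from collections import Counter


def gptbot_status_counts(log_lines) -> Counter:
    """Tally HTTP status codes for GPTBot requests in
    combined-format access logs."""
    counts = Counter()
    for line in log_lines:
        if "GPTBot" not in line:
            continue
        # Combined format: ... "GET /path HTTP/1.1" 403 512 ...
        m = re.search(r'" (\d{3}) ', line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

Read the result the way the paragraph above describes: an empty tally means GPTBot requests never reach your origin (the CDN is dropping them), while a pile of 403s or 429s means they arrive and are actively rejected.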

The fix is a CDN allow rule for AI crawler user agents, not a robots.txt change. See our dedicated guide on allowing AI crawlers for the exact Cloudflare and AWS configurations.

Suspect #6: a recent plugin update or CMS migration

GPTBot rules sometimes appear suddenly, between one week and the next, because a routine update changed the default behavior of a plugin you already had. This is especially common after WordPress core upgrades, Shopify theme updates, or CMS migrations.

The evidence for this cause: your Perplexity or ChatGPT citations were fine a month ago and have dropped off recently. Cross-reference the timing against your change log — the plugin update closest in time to the visibility drop is your prime suspect. In many cases you can verify by checking the plugin's release notes for any mention of bot management or crawler changes.

Preventing this going forward usually means auditing plugins after every major update, and maintaining a simple monitoring script that checks robots.txt and your key pages' crawler accessibility on a schedule. It's the most common way we see sites go from 'working' back to 'blocked' after they fix the issue once.
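A minimal scheduled check might look like this sketch, built on the standard-library robotparser. The bot list is an assumption, and the alerting (email, Slack, whatever you use) is left to you:

```python
from urllib import robotparser


def check_bot_access(robots_txt: str,
                     bots=("GPTBot", "ChatGPT-User"),
                     paths=("/",)) -> list:
    """Return the (bot, path) pairs this robots.txt would block.
    Run on a schedule; alert whenever the result is non-empty."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [(bot, path) for bot in bots for path in paths
            if not rp.can_fetch(bot, path)]
```

Fetch your live robots.txt daily, pass it through a check like this, and a plugin update that silently re-adds a block becomes a same-day alert instead of a months-later citation drop.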

Execution Checklist

  • Check every security, SEO, and bot-management plugin for 'block AI crawlers' toggles.
  • Compare your hand-edited robots.txt against the live file served from your public URL.
  • Ask your managed host whether they block GPTBot by default and how to disable it for your account.
  • Look for a lone 'User-agent: GPTBot / Disallow: /' — likely a 2023-era copy-paste.
  • Check wildcard path rules ('Disallow: /*?', 'Disallow: /*.json$') for accidental AI matches.
  • Look in CDN logs or dashboards for GPTBot requests — missing requests mean a network-layer block.
  • After fixing the root cause, set a monitoring check so a future update doesn't silently reintroduce the block.

FAQ

Is there a tool that tells me exactly which line is blocking GPTBot?

Yes. Any AI visibility audit tool (including ours) can tell you whether GPTBot is currently blocked and show the specific robots.txt rule or CDN response causing the block. For a manual check, load your robots.txt into a strict parser (Python's built-in urllib.robotparser module works) and pass 'GPTBot' as the user agent — it will tell you whether GPTBot may fetch a given path, though you'll need to read the file yourself to see which rule is responsible.

I edited robots.txt to unblock GPTBot, but it's still blocked. Why?

Three common reasons: (1) your CMS or plugin regenerated the file and reverted your edits, (2) your CDN is blocking at the network layer regardless of robots.txt, or (3) the file you edited is not the one being served (some hosts use a different canonical file for bots). Curl your robots.txt from outside your server to see what GPTBot actually sees.

Does blocking GPTBot protect my content from being used for training?

Only partially. GPTBot honors robots.txt, so a correct block prevents OpenAI from crawling your site. However, your content may already be in the training set from a previous crawl, and it may also appear on third-party sites that are themselves crawled. robots.txt is a forward-looking signal, not a retroactive deletion. If content removal is your goal, you also need to submit opt-out requests to OpenAI and other providers.

Will unblocking GPTBot improve my traffic right away?

Not immediately. GPTBot collects content for training, so its crawl only matters when OpenAI next trains or refreshes its knowledge — which happens on a cadence of weeks to months. For faster results, also make sure ChatGPT-User (the live browsing agent) can reach your site. That bot fetches pages in real time during active ChatGPT sessions, and changes there show up in answers within days.
