Blocked GPTBot to protect training data. Just realized I also blocked ChatGPT's live search.

Milan Novák
🔧 Schema, llms.txt, technical fundamentals

Found this the embarrassing way. Was looking at why my GEO score on ChatGPT had flatlined for 8 weeks while Perplexity kept improving. Finally checked my robots.txt.

In 2024, I added a blanket Disallow: / under User-agent: GPTBot. Made sense at the time — I didn't want OpenAI training on my content.

Problem: GPTBot is the training scraper. OAI-SearchBot is the live-search retrieval bot. They're different user-agents. My block was killing both.

Same issue with Anthropic: ClaudeBot (training, fine to block) vs Claude-Web (retrieval, should allow). I had them listed as one.

The surgical 2026 robots.txt I ended up with:

```
# Block training scrapers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Allow retrieval bots
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: PerplexityBot
Allow: /
```

I also added an Allow group for User-agent: ChatGPT-User (the agent ChatGPT uses when a user asks it to fetch a page), since that's another retrieval path people miss.
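
If you want to sanity-check rules like these offline, here is a minimal sketch using Python's standard urllib.robotparser. The ROBOTS_TXT string is my transcription of the groups described in this post (including the ChatGPT-User group), and audit() and the example.com URL are just illustrative names, not anything official. Python's parser is only one interpretation of robots.txt, so treat the result as a hint about how a well-behaved crawler would read the file, not as proof of what each bot does.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the groups described in the post, written out one directive per line.
ROBOTS_TXT = """\
# Block training scrapers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Allow retrieval bots
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ChatGPT-User
Allow: /
"""

TRAINING = ["GPTBot", "ClaudeBot", "Google-Extended"]
RETRIEVAL = ["OAI-SearchBot", "Claude-Web", "PerplexityBot", "ChatGPT-User"]


def audit(robots_txt: str, agents: list[str],
          url: str = "https://example.com/") -> dict[str, bool]:
    """Return {agent: True if the rules let that agent fetch url}."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


results = audit(ROBOTS_TXT, TRAINING + RETRIEVAL)
for agent, allowed in results.items():
    print(f"{agent:16} {'allowed' if allowed else 'blocked'}")
```

One thing this makes easy to test: an agent with no group of its own falls back to a User-agent: * group if the file has one, so a retrieval bot can end up blocked without ever being named.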

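Verify your setup with curl -i -A 'OAI-SearchBot' https://yourdomain.com/ (the -i prints the status line and headers). It should return 200 with full HTML, not a redirect or error. Keep in mind that curl ignores robots.txt, so this check catches server- or WAF-level blocks on the user-agent string, not robots.txt rules.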
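
To run the same check across several agents at once, something like the following sketch works with only the standard library. fetch_status() and check_agents() are hypothetical helper names, and https://yourdomain.com/ is a placeholder. It sends a spoofed User-Agent header and does not follow redirects, so a 301/302 is reported as-is. A firewall that validates bots by IP range rather than by user-agent string may respond differently to the real crawler than to this test.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so 3xx codes are visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx status


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> tuple[int, int]:
    """Return (HTTP status, body length in bytes) for a GET with this User-Agent."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, len(response.read())
    except urllib.error.HTTPError as err:
        # Redirects, 403s, 5xx and similar all end up here.
        return err.code, 0


def check_agents(url: str, agents: list[str]) -> dict[str, tuple[int, int]]:
    """Fetch url once per agent and collect (status, body length)."""
    return {agent: fetch_status(url, agent) for agent in agents}


# Example usage (placeholder domain, so left commented out):
# for agent, (status, size) in check_agents(
#         "https://yourdomain.com/",
#         ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]).items():
#     print(f"{agent:16} status={status} bytes={size}")
```

A retrieval agent that comes back with a non-200 status, or a 200 with a suspiciously small body, is the one to chase through your WAF and server rules.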

If you added any GPTBot blocks in 2023-2024, worth auditing whether you also killed retrieval.

3 replies

  1. Ada K.

    I made the exact same mistake. Worse actually — I had a WAF rule that blocked all crawlers matching 'GPT' in the user-agent string, which caught OAI-SearchBot too since the string appears there. Took me weeks to find it. Your curl diagnostic is the right first check.

  2. Dave A.

    For anyone considering the WAF route to block training scrapers: Cloudflare's 'AI scrapers' toggle is actually fairly well maintained and handles the training-vs-retrieval distinction better than most manual robots.txt configs I've seen. The manual route Milan describes is right, but it's also a maintenance burden as new bots spin up.

  3. Leo H.

    wait, ChatGPT-User is a different user-agent from OAI-SearchBot? I only had OAI-SearchBot in my Allow list. Going to check if ChatGPT-User is being blocked by a wildcard rule.
