PSA: blocking GPTBot ≠ blocking ChatGPT live search. They're different bots. Check your robots.txt.

Petr Vlček, Founder
🔧 Schema, llms.txt, technical fundamentals

Petr here. I see this mistake in about 30% of the domains we onboard, so worth saying clearly.

If you added User-agent: GPTBot / Disallow: / in 2023 or 2024 to protect your content from training data scraping — that's completely reasonable. But GPTBot (training) and OAI-SearchBot (live search retrieval) are separate user-agents. One block does not equal two blocks.

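Same thing with Anthropic: ClaudeBot (training scraper, block if you want) and Claude-Web (retrieval for Claude's browsing, allow if you want citations) are different strings. Anthropic's published crawler list also names Claude-User and Claude-SearchBot for retrieval, so check for those strings as well. Most robots.txt files I see treat them all as one.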
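To make the separate-strings point concrete, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content below is a hypothetical example, not a recommended policy, and the helper name is_allowed is mine. The bot names are the ones mentioned in this thread; confirm the current strings in each vendor's documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the two training crawlers by name,
# leave everything else (including retrieval bots) allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, agent: str, path: str = "/") -> bool:
    """Return True if these robots.txt rules let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Each user-agent string is matched against its own group of rules.
for agent in ("GPTBot", "OAI-SearchBot", "ClaudeBot", "Claude-Web"):
    print(f"{agent:15} allowed: {is_allowed(ROBOTS_TXT, agent)}")
```

With this file, GPTBot and ClaudeBot come back disallowed while OAI-SearchBot and Claude-Web fall through to the wildcard group and are allowed. Paste your own robots.txt into ROBOTS_TXT to see which group each bot actually lands in.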

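Quick diagnostic: run curl -s -A 'OAI-SearchBot' https://yourdomain.com/ -o /dev/null -w '%{http_code}'. If you get anything other than 200, your server, CDN, or firewall is likely rejecting ChatGPT's live search bot. A 200 only rules out a server-level block: robots.txt is advisory and never changes the status code, so also open https://yourdomain.com/robots.txt and look for a Disallow under OAI-SearchBot or under the * wildcard.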
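If you want to run the same status check for several bots at once, a rough Python equivalent of the curl command could look like the sketch below. The function names and the agent list are my own choices for illustration. Real crawlers send a longer User-Agent header and come from their own IP ranges, so a spoofed string is only an approximation of what they see.

```python
import urllib.error
import urllib.request


def status_for(url: str, agent: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status the server gives this User-Agent string."""
    request = urllib.request.Request(url, headers={"User-Agent": agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; the status code is still on the exception.
        return error.code


def check_site(url: str, agents: list) -> dict:
    """Map each agent name to the status code the site returns for it."""
    return {agent: status_for(url, agent) for agent in agents}


def flag_blocked(statuses: dict) -> list:
    """Agents that did not get a 200, i.e. candidates for a server-side block."""
    return [agent for agent, code in statuses.items() if code != 200]
```

Usage would be something like flag_blocked(check_site("https://yourdomain.com/", ["GPTBot", "OAI-SearchBot", "ClaudeBot"])). An empty list means none of those strings is rejected at the server level; it says nothing about what robots.txt asks the bots to do.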

Worth 5 minutes to check. Seen teams spend weeks troubleshooting ChatGPT citation gaps with the root cause sitting in robots.txt the whole time.

1 reply

  1. Milan Novák

    Can confirm. Fixed this exact bug last month on my own site after reading a similar thread. The curl test you mention is the fastest way to verify. Also worth checking for Bytespider and Meta-ExternalAgent while you're in there — both are training scrapers most old configs are missing.
