Free tool · No signup

robots.txt builder for AI bots

Pick a stance, toggle individual bots, get a ready-to-paste file. Covers GPTBot, ClaudeBot, PerplexityBot, Google-Extended, CCBot, Bytespider, and 10+ more — with honest notes on what each one does. Runs in your browser, built by GEO Tracker AI.

Three presets — allow / block training / block all
Client-side only · no data leaves your browser
Copy or download · drop in your site root

Use this tool if

  • You don't have a robots.txt yet, or yours hasn't been reviewed in 12+ months.
  • You want a deliberate stance on whether your content trains AI models vs gets cited in live AI search.
  • You're on WordPress / Webflow / Framer / Next.js / static and want a copy-paste file.

Skip this tool if

  • You already have a carefully tuned robots.txt that was reviewed recently — keep yours.
  • You need IP-level blocking or rate-limiting, not just a polite request — handle that at the CDN / WAF layer.
  • You need environment-conditional rules (e.g. block staging, allow production): see the dynamic-route pattern in the guide, sketched below.
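
For the third case, here is a minimal sketch of an environment-conditional robots route in Next.js (app router); VERCEL_ENV and the example rules are assumptions, so adapt them to your framework and stance:

    // app/robots.ts (Next.js 13.3+ app router)
    import type { MetadataRoute } from 'next'

    export default function robots(): MetadataRoute.Robots {
      // VERCEL_ENV is an assumption; substitute your own environment flag
      const isProd = process.env.VERCEL_ENV === 'production'
      if (!isProd) {
        // Preview / staging: block every crawler
        return { rules: [{ userAgent: '*', disallow: '/' }] }
      }
      // Production: example "block training, allow citation" stance
      return {
        rules: [
          { userAgent: 'GPTBot', disallow: '/' },
          { userAgent: '*', allow: '/' },
        ],
        sitemap: 'https://yourdomain.com/sitemap.xml',
      }
    }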

AI bots — pick or toggle (0 blocked / 18)

First time? Read this 30-second guide

What is robots.txt? A plain-text file you host at the root of your site that tells crawlers which paths they may or may not fetch. It is the main lever for controlling whether your content trains AI models, appears in AI citations, or both.

  1. Pick a stance — "Allow everything" (most B2B SaaS), "Block training, allow citation" (publishers, IP-sensitive), or "Block all AI" (paid content). Click Load preset to apply.
  2. Toggle individual bots if you want more nuanced rules. Hover the info hint on each bot to see what it does.
  3. Add your sitemap URL and any site-specific paths you want blocked from all crawlers (e.g. /admin, /api/private).
  4. Copy or download — drop the file at yourdomain.com/robots.txt.
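
As a sketch, the "Block training, allow citation" preset produces a file shaped roughly like this (the exact bot list depends on your toggles):

    User-agent: GPTBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: *
    Allow: /

    Sitemap: https://yourdomain.com/sitemap.xml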

Need more detail? The full setup guide covers per-bot context, framework setup, common mistakes, and how to verify the rules take effect.

OpenAI

Anthropic

Perplexity

Google & Apple

Other major AI crawlers

Extras

Optional fields applied to the wildcard User-agent: * rule.

Recommended. Google, Bing, and several AI crawlers use it for discovery.

Applied to all crawlers under the default User-agent: * rule. Don't use this for secret URLs — they will be publicly listed.
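
With the example paths from step 3 filled in, the wildcard block comes out something like:

    User-agent: *
    Disallow: /admin
    Disallow: /api/private

    Sitemap: https://yourdomain.com/sitemap.xml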

Generated robots.txt

Drop in your site root as /robots.txt

Output is a standard robots.txt with one rule block per blocked AI bot plus your wildcard. No backend call; generated in your browser. Inputs auto-save to this browser's local storage so an accidental refresh won't lose your work — use Reset to clear.

The honest part — what this controls and what it does not

robots.txt is a 30-year-old convention. It is the primary lever for telling polite crawlers what they may fetch, but it is not enforcement and it is not a ranking signal. A few clarifications worth getting right before deploying:

Does blocking GPTBot stop me from being cited in ChatGPT search?

No. GPTBot is OpenAI's training crawler. Citations in ChatGPT search use a separate bot called OAI-SearchBot. To stop being cited in ChatGPT search you would also need to disallow OAI-SearchBot and ChatGPT-User. The split is intentional — OpenAI lets you opt out of training without losing citation eligibility.
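
If you want out of both training and ChatGPT search citations, the rules would look roughly like this:

    User-agent: GPTBot
    Disallow: /

    User-agent: OAI-SearchBot
    Disallow: /

    User-agent: ChatGPT-User
    Disallow: /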

Does Google-Extended block Google AI Mode or AI Overviews?

No. Google-Extended is an opt-out token for Bard / Gemini / Vertex AI training. AI Mode and AI Overviews are driven by Googlebot, which is unaffected by the Google-Extended rule. You cannot cleanly opt out of AI Mode / AI Overviews without also opting out of all of Google Search.
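
In robots.txt terms, the opt-out is a single rule block, and it leaves Googlebot untouched:

    # Opts out of Gemini / Vertex AI training only; Googlebot keeps crawling
    User-agent: Google-Extended
    Disallow: /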

How long does it take for new rules to take effect?

Polite crawlers re-fetch robots.txt on their own schedule — typically daily, sometimes weekly. New rules generally take 24–72 hours to take effect across major engines. Some less polite crawlers (e.g. Bytespider) ignore robots.txt entirely; for those, rate-limit at the CDN / WAF layer.

Is robots.txt security?

No. It is a publicly readable file. Pages listed as Disallow are more visible to attackers, not less, because they are explicitly listed in the file. Use real authentication for anything sensitive.
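
As an illustration (the path is hypothetical), a rule like this is an advertisement, not a lock:

    # Anti-pattern: anyone reading the file now knows this path exists
    User-agent: *
    Disallow: /internal-billing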

How do I verify the rules actually work?

Three checks: (1) curl -I yoursite.com/robots.txt to confirm the file is served at HTTP 200 with Content-Type: text/plain. (2) Google Search Console → Settings → Crawling → robots.txt report shows Google's cached version and lets you re-fetch. (3) Your own server logs, filtered by User-Agent on the blocked bot names — polite bots should stop fetching disallowed paths within 24–72 hours. None of that tells you whether AI engines cite you more or less afterwards — for that, run a controlled audit on a buyer-question panel.
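
The first and third checks as concrete commands (the log path is an assumption; substitute your server's access log):

    # Check 1: confirm the file is served with HTTP 200 and text/plain
    curl -I https://yoursite.com/robots.txt

    # Check 3: watch for blocked bots in recent traffic (nginx-style log assumed)
    grep -E "GPTBot|ClaudeBot|Bytespider" /var/log/nginx/access.log | tail -n 20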

Disclaimer & limits

This is a free helper that produces a standard robots.txt file; the bot list reflects our most recent review (2026-05-13). A few honest notes so expectations are right from day one:

  • robots.txt is a polite request, not enforcement. Well-behaved crawlers (OpenAI, Anthropic, Google, Perplexity, Common Crawl) honour it. Less polite ones (some training scrapers, low-quality data brokers, Bytespider on a bad day) ignore it. For real blocking use IP-level rules at your CDN / WAF in addition to the file.
  • Bot landscape changes faster than tools. Vendors add, rename, and split bots several times a year — OpenAI introduced OAI-SearchBot separately from GPTBot in 2024, Anthropic added Claude-User and Claude-SearchBot in 2024–2025, Apple added Applebot-Extended. We track these and update the tool when they happen, but cannot promise the tool always reflects the absolute latest practice — verify vendor docs for anything mission-critical.
  • Output is not validated. We don't check the sitemap URL or extra paths you enter. Test the resulting file with Google's robots.txt Tester before deploying, and re-curl yoursite.com/robots.txt after deploy to confirm it is reachable.
  • Blocking AI bots does not improve AI search visibility. In most cases it reduces it — fewer bots means less content available for citations. Reasons to block are about IP, licensing, and brand control, not ranking. If your goal is more AI citations, blocking is usually the wrong lever; the right one is content quality, entity clarity, and a citation footprint in third-party sources.
  • Provided as-is. The tool is provided free of charge without warranty, express or implied. We accept no liability for outcomes related to AI search visibility, organic ranking, training inclusion, or business results that follow from using or not using the generated file.

The reliable way to know whether your robots.txt changes moved AI citation behaviour is controlled measurement — before vs after, on a fixed buyer-question panel.

Next step

You shipped robots.txt. Now what does AI see?

Blocking is half the story — the other half is whether the bots you did let through are actually citing your brand on the buyer-questions that matter. Free 60-second audit, no card required: