How ChatGPT actually picks citations (2026)

ChatGPT's answers come from two distinct sources, and the citation behaviour differs between them:

Trained knowledge. The base model's weights encode what it learned from OpenAI's training corpus up to the cut-off date. ChatGPT often answers from this knowledge without any web access: no citation shown, no source URL displayed. If your brand is "baked in", you get mentioned in the prose without a clickable link.
Live retrieval (search mode). When the model decides current information is needed, it triggers a web search. Citations show up as clickable footnote-style links. This is where source quality, freshness, and technical eligibility matter most.

Both paths matter, but the levers you can pull are different. Training data is locked until the next checkpoint; live retrieval reacts to the open web in days to weeks.

The three bots that fetch your site for ChatGPT

OpenAI split its crawlers into three user-agents, each with a different purpose:

GPTBot: training data collection. Blocking this opts you out of future OpenAI model training. Does not affect live citations.
OAI-SearchBot: fetches pages cited in ChatGPT's search-mode answers. Does not train. Blocking this stops your site appearing as a citation source.
ChatGPT-User: live retrieval triggered by a user action (clicking "browse", asking a real-time question, or certain product actions). Does not train. OpenAI states that because the fetch is user-initiated, robots.txt rules may not always apply.

Practical implication: if you want to opt out of training but still be cited, block GPTBot and allow OAI-SearchBot + ChatGPT-User. The free robots.txt builder covers exactly this preset.
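As a minimal sketch of that preset, the Python below holds one possible robots.txt and checks it with the standard library's urllib.robotparser. The example.com URL and the check_access helper are illustrative assumptions, not part of any OpenAI tooling, and the catch-all rule at the end is just one reasonable default.

```python
from urllib.robotparser import RobotFileParser

# One possible preset: opt out of training, stay eligible for citation.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: *
Allow: /
"""

OPENAI_BOTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")


def check_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Return, per OpenAI user-agent, whether robots_txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in OPENAI_BOTS}


# Hypothetical page used only for the check.
print(check_access(ROBOTS_TXT, "https://example.com/pricing"))
```

This only confirms what the file says. As noted above, ChatGPT-User fetches are user-initiated, so robots.txt rules may not always apply to them.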

The Bing backbone — why classic SEO still matters

ChatGPT's search mode runs on a combination of OpenAI's own index and Microsoft Bing. For most queries that involve buyer-intent product comparisons or category lookups ("best X for Y"), Bing is the primary source of candidate URLs. ChatGPT then ranks and cites a subset.

This has two concrete consequences for optimisation:

Bing indexing is the floor. If Bing doesn't index your page, ChatGPT cannot cite it in search mode. Submit your sitemap via Bing Webmaster Tools, fix Bing-specific crawl errors, and treat Bing as a first-class search engine, not an afterthought.
Classic SEO signals carry through. Domain authority, on-page structure, page speed, schema.org markup, and internal linking all influence Bing rank, which influences ChatGPT citation eligibility. "AI SEO" is not a separate discipline — it's classic SEO with a few extra signals layered on top.

What ChatGPT cites most (data, not opinion)

Tinuiti's Q1 2026 AI Citation Trends Report (the largest publicly available study of AI citation patterns) measured ChatGPT's citation distribution across nine verticals. Useful aggregate signals:

Reddit appears in > 5% of ChatGPT responses (vs 0.1% for Gemini, 24% for Perplexity). Less concentrated than Perplexity but still a non-trivial source.
Wikipedia, YouTube, and major reference sites dominate the high-authority slot — these come from trained knowledge plus Bing's authority signals.
Government, academic, and institutional domains are over-represented vs general SERP. ChatGPT skews conservative on authority: your brand site competes more against nih.gov than against a peer SaaS's blog.
Structured product information (pricing, features, comparisons) gets pulled preferentially from pages with clean schema.org markup vs prose-only product pages.

For the engine-by-engine breakdown, see our analysis AI Visibility Is Not One Channel and ChatGPT vs Perplexity citations.

The 8-step action plan

1. Verify Bing indexing. Submit sitemap.xml via Bing Webmaster Tools. Resolve crawl errors. This is the floor: without it, no search-mode citation is possible.
2. Ship clean Organization JSON-LD. Consistent name, url, sameAs across every page. ChatGPT pulls brand-entity facts preferentially from this block. Use our JSON-LD generator if you don't have one yet.
3. Allow OAI-SearchBot and ChatGPT-User in robots.txt. Block GPTBot only if you have a deliberate training-opt-out stance.
4. Ship an llms.txt pointing at your most important buyer-question pages. Not a ranking signal, but a clean nav map for AI agents.
5. Get into Reddit conversations in your niche. Reddit appears in > 5% of ChatGPT responses. Comment with disclosure on threads about category problems your product solves: value-first, no link-spam. See Reddit citation strategy.
6. Publish opinionated comparison pages. ChatGPT cites "X vs Y" and "X alternatives" pages heavily when a user asks for category comparisons. Aim for vendor-neutral honesty; biased pages get filtered.
7. Refresh your top 5 pages every quarter. ChatGPT's live retrieval favours fresh content. A page updated last week beats one from 2024 for the same query.
8. Measure Share of Voice on a buyer-question panel. Without measurement none of the above is verifiable. Free 60-second audit at /grader.
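The Organization JSON-LD step above can be sketched in a few lines of Python. This is an illustrative helper, not the output of the JSON-LD generator mentioned in the plan: the function name and the "Example Co" values are placeholders, while the property names (name, url, sameAs) come from schema.org's Organization type.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Build an Organization JSON-LD <script> tag to embed on every page."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": same_as,
    }
    body = json.dumps(data, indent=2)
    # Escape "</" so a stray "</script>" inside a value cannot close the tag early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder brand details; swap in your own and keep them identical site-wide.
print(organization_jsonld(
    "Example Co",
    "https://example.com",
    ["https://www.linkedin.com/company/example-co"],
))
```

Generating the block from one function (or one template partial) is a simple way to keep name, url, and sameAs consistent across pages.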

Common mistakes

Blocking GPTBot assuming you opt out of citations. You don't. GPTBot is training-only; citation eligibility is controlled by OAI-SearchBot.
Ignoring Bing. "We rank on Google so we're fine" does not hold, because ChatGPT's search mode pulls primarily from Bing for many queries. Get indexed there.
No structured product information. Pricing buried in image-only marketing copy doesn't get extracted. Plain text + schema.org Offer pulls cleanly.
Hoping for training inclusion to fix bad live retrieval. Trained knowledge is typically 12+ months stale; live retrieval is what the user sees today. Fix live retrieval first.
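To check a page for the structured-pricing mistake above, a rough audit can extract JSON-LD blocks and look for a schema.org Offer. The sketch below uses only Python's standard library. The sample page, the class, and the find_offers helper are hypothetical, and it only inspects a top-level "offers" property, so treat it as a starting point rather than a full validator.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed blocks instead of failing the audit.


def find_offers(html: str) -> list[dict]:
    """Return every Offer found under a top-level "offers" property."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    offers = []
    for block in parser.blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            if not isinstance(item, dict):
                continue
            found = item.get("offers", [])
            found = found if isinstance(found, list) else [found]
            offers.extend(
                o for o in found
                if isinstance(o, dict) and o.get("@type") == "Offer"
            )
    return offers


# Hypothetical product page with plain-text pricing plus an Offer block.
PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Plan",
 "offers": {"@type": "Offer", "price": "49.00", "priceCurrency": "USD"}}
</script>
</head><body><p>Example Plan: $49 per month</p></body></html>
"""

for offer in find_offers(PAGE):
    print(offer["price"], offer["priceCurrency"])
```

An empty result for a pricing page suggests the price exists only in prose or images.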

How to verify it actually works

Ask ChatGPT directly on your fixed buyer-question panel. Note which competitors are cited alongside you (or instead of you). Do this in a fresh session — chat history skews results.
Run a controlled audit. Our free 60-second audit runs a buyer-question panel through Perplexity (which uses a similar citation model). The full GEO Tracker AI product covers ChatGPT search mode directly.
Track Share of Voice over time. Single scans are noise; trend over 30+ days is signal.
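The article does not give a formula for Share of Voice, so the sketch below assumes a simple one: the fraction of panel questions whose answer cites a given domain at least once. The panel data, domains, and function names are made up for illustration. In practice you would record one such score per scan and watch the trend.

```python
from urllib.parse import urlparse


def cited_domains(citation_urls: list[str]) -> set[str]:
    """Normalise citation URLs to bare hostnames without a leading 'www.'."""
    domains = set()
    for url in citation_urls:
        host = urlparse(url).hostname or ""
        domains.add(host.removeprefix("www."))
    return domains


def share_of_voice(panel_results: dict[str, list[str]], domain: str) -> float:
    """Fraction of panel questions whose citations include the given domain."""
    if not panel_results:
        return 0.0
    hits = sum(
        1 for urls in panel_results.values()
        if domain in cited_domains(urls)
    )
    return hits / len(panel_results)


# Hypothetical scan: buyer question -> URLs cited in the answer.
PANEL = {
    "best invoicing tool for freelancers": [
        "https://www.example.com/invoicing",
        "https://rival.example/compare",
    ],
    "example co alternatives": ["https://rival.example/alternatives"],
    "invoicing software pricing": ["https://example.com/pricing"],
    "how to automate invoices": ["https://en.wikipedia.org/wiki/Invoice"],
}

print(share_of_voice(PANEL, "example.com"))  # 0.5
```

Running the same fixed panel on each scan keeps the scores comparable over the 30+ day window.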

What to ship next

How to rank in Perplexity — different retrieval model, different sources.
Google AI Mode optimisation — bigger surface, different citation behaviour.
How to measure AI visibility — Share of Voice, citation tracking, Outcome Loop.