Why measurement is the actual lever
Every previous guide in this collection — llms.txt, robots.txt, JSON-LD, ChatGPT, Perplexity, AI Mode, Reddit — describes a thing you can ship. Without measurement, none of what you ship produces a decision you can defend in a meeting. Did the Reddit comment move the needle? Did the JSON-LD change citation accuracy? Did the robots.txt update accidentally kill citations? You cannot know without measuring.
Three numbers cover ~95% of the meaningful answers:
- Share of Voice (also Share of Model) — of all the buyer-questions your prospects ask AI engines, what percentage now cite your brand?
- AI citation count over time — is the raw count rising, flat, or falling per engine (ChatGPT / Perplexity / Google AI Mode)?
- Outcome of specific actions — when you ship llms.txt or a Reddit comment, does Share of Voice on the target prompts actually move 14 days later, or is it noise?
The vocabulary AI search tools converged on
AI search is a young enough category that until late 2025 every vendor used slightly different names for the same numbers. The category has now converged on a shared vocabulary — every major GEO / AEO tool (Otterly, Profound, Peec AI, AthenaHQ, BrightEdge, Tinuiti, and GEO Tracker AI) uses the same words:
- Share of Voice (sometimes Share of Model) — share of tracked buyer-prompts where the AI engine cites your brand.
- AI citation — an individual brand appearance inside an AI engine's answer.
- Citation position — average rank of your brand within the AI answer (top of list, mid, bottom).
- Visibility score (or composite GEO Score) — vendor-specific 0–100 number combining the three metrics above.
If you read a Tinuiti report and an Otterly homepage and our docs, you should now see the same vocabulary everywhere. If a tool still uses "mention rate" or a non-standard term, that's a flag the team hasn't updated their copy.
Build a buyer-question panel (the foundation)
Every AI visibility measurement starts with a fixed list of questions your prospective customers actually ask AI engines. This panel is the denominator for Share of Voice and the X-axis for citation tracking over time.
How to build it (a code sketch of a panel follows the list):
- Start with 5–15 questions. More is noise; fewer is too narrow. Mix high-intent ("best X for Y"), category ("what is X"), and competitive ("X vs Y") shapes.
- Use the exact phrasing buyers use. Not your marketing wording. Pull from sales call recordings, support tickets, search-console queries.
- Re-run on a fixed cadence. Weekly minimum, daily for high-volatility categories. Single-shot scans are anecdote; trend lines are data.
- Keep the panel stable for 90 days. Resist the urge to add or remove questions mid-period. Stable panel = comparable Share of Voice across time.
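A panel is just data, and keeping it in version control makes the 90-day freeze auditable: a diff on the file is a diff on your denominator. A minimal sketch in Python, with hypothetical questions for an imaginary invoicing product (the field names are ours, not any particular tool's schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: the panel must not mutate mid-period
class PanelQuestion:
    text: str    # exact buyer phrasing, not marketing copy
    intent: str  # "high-intent", "category", or "competitive"

# Hypothetical 5-question panel for an imaginary invoicing product.
PANEL = [
    PanelQuestion("best invoicing software for freelancers", "high-intent"),
    PanelQuestion("what is e-invoicing", "category"),
    PanelQuestion("how do freelancers chase late invoices", "category"),
    PanelQuestion("AcmeInvoice vs BillCo", "competitive"),
    PanelQuestion("AcmeInvoice alternatives", "competitive"),
]
```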
Share of Voice — the headline metric
Definition: of the prompts in your panel, the percentage where the AI engine cites your brand in its answer. Numerator = scans where you appear. Denominator = total scans (panel size × engines × scan count).
Same math across every major tool in 2026. The math is trivial; the discipline is in keeping the panel stable and re-running on a fixed cadence.
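As code, the whole calculation fits in a few lines. A minimal sketch assuming each scan is logged as a (prompt, engine, cited) record — the record shape is illustrative:

```python
def share_of_voice(scans: list[tuple[str, str, bool]]) -> float:
    """Percentage of scans in which the brand was cited.

    Each scan is (prompt, engine, cited). Denominator = every scan in
    the period (panel size x engines x scan count); numerator = scans
    where cited is True.
    """
    if not scans:
        return 0.0
    cited = sum(1 for _, _, was_cited in scans if was_cited)
    return 100.0 * cited / len(scans)

# 3 prompts x 2 engines x 1 scan each = 6 scans, 2 citations -> 33.3%
scans = [
    ("best invoicing software for freelancers", "chatgpt", True),
    ("best invoicing software for freelancers", "perplexity", False),
    ("what is e-invoicing", "chatgpt", False),
    ("what is e-invoicing", "perplexity", False),
    ("AcmeInvoice vs BillCo", "chatgpt", True),
    ("AcmeInvoice vs BillCo", "perplexity", False),
]
print(f"{share_of_voice(scans):.1f}% SoV")  # -> 33.3% SoV
```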
Practical interpretation bands:
- 0–10% SoV — invisible. Your brand is not in the AI consideration set for the questions buyers ask. The technical hygiene guides (llms.txt, robots.txt, JSON-LD) are step 1.
- 10–35% SoV — appearing occasionally, inconsistently. Step 2 is content density and Reddit / community presence.
- 35–65% SoV — meaningful share but losing key prompts. Step 3 is competitive analysis — why are you losing the prompts you lose?
- 65%+ SoV — dominant. Step 4 is defending against new entrants and protecting against drift.
AI citations and citation position
Share of Voice answers "am I cited?"; citation-level metrics answer "how am I cited?". Three specifics are worth tracking (a short code sketch follows the list):
- Raw citation count — total mentions per engine per week. Less elegant than SoV but useful for catching big jumps or drops faster.
- Citation position — when cited, are you the first, second, or fifth brand named? Tinuiti's Gemini measurement work found an average citation position of 1.6 in their data; ChatGPT trends similar. Top-3 position drives most downstream click-through and brand-search lift.
- Citation source (3rd-party domains) — when the AI cites your brand, which 3rd-party URLs does it pull from? G2 review? Reddit thread? Your blog? This tells you where to invest next.
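A minimal sketch of both citation-level views, assuming each citation is logged as (engine, position, source_domain) — again an illustrative record shape, not any vendor's schema:

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical citation log: (engine, position, source_domain).
citations = [
    ("chatgpt", 1, "g2.com"),
    ("chatgpt", 3, "reddit.com"),
    ("perplexity", 2, "example-blog.com"),
    ("perplexity", 1, "g2.com"),
]

# Average citation position per engine (lower is better; top-3 is the goal).
positions = defaultdict(list)
for engine, pos, _ in citations:
    positions[engine].append(pos)
for engine, ranks in sorted(positions.items()):
    print(f"{engine}: avg position {mean(ranks):.1f}")

# Which 3rd-party domains the engines cite you through — where to invest next.
print(Counter(domain for _, _, domain in citations).most_common())
```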
The 14-day Outcome Loop — attribute actions to deltas
The hardest measurement question in AI search is causation. SoV moved last week — was that your Reddit comment, the AI engine's model update, or noise? You can't fully eliminate noise, but you can isolate the action.
The Outcome Loop pattern (a runnable sketch follows the list):
- Mark the action. When you ship something (a Reddit comment, a comparison page, a schema fix), record the timestamp.
- Anchor a 14-day measurement window. Pre-window: 14 days before the action. Post-window: 14 days after.
- Compare Share of Voice and citation count on the target prompts only. Not your whole panel — the prompts the action was meant to affect.
- Tag with confidence band. Low / medium / high based on sample size and effect size. A 5pp SoV shift on 3 scans is low confidence; the same shift on 30 scans is high.
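Run manually, the loop is a pre/post comparison plus a labeling rule. A minimal sketch assuming scans are logged as (timestamp, prompt, cited), with illustrative confidence thresholds (not GEO Tracker AI's internal ones):

```python
from datetime import datetime, timedelta

def outcome_loop(scans, action_at: datetime, target_prompts: set[str],
                 window_days: int = 14) -> tuple[float, str]:
    """Pre/post Share-of-Voice delta on target prompts around an action.

    scans: list of (timestamp, prompt, cited) records.
    Returns (delta in percentage points, confidence band).
    """
    def window_sov(lo: datetime, hi: datetime) -> tuple[float, int]:
        hits = [cited for ts, prompt, cited in scans
                if lo <= ts < hi and prompt in target_prompts]
        if not hits:
            return 0.0, 0
        return 100.0 * sum(hits) / len(hits), len(hits)

    pre_sov, n_pre = window_sov(action_at - timedelta(days=window_days), action_at)
    post_sov, n_post = window_sov(action_at, action_at + timedelta(days=window_days))
    delta = post_sov - pre_sov
    n = min(n_pre, n_post)
    # Illustrative bands: bigger samples and bigger effects earn more trust.
    if n >= 30 and abs(delta) >= 5:
        band = "high"
    elif n >= 10:
        band = "medium"
    else:
        band = "low"
    return delta, band
```

Note how the same 5pp delta lands in different bands depending purely on scan count, matching the rule of thumb above.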
This is the loop GEO Tracker AI's product automates. It's also the loop you can run manually if you have the discipline — the math is trivial; the discipline is hard.
Tools — free vs paid, what each does
Practical 2026 landscape:
- Free 60-second audit (GEO Tracker AI /grader) — single Perplexity scan on a 3-question panel. Good for first directional read. No history, no multi-engine, no outcome loop.
- Manual ChatGPT / Perplexity / AI Mode probing — works but doesn't scale beyond 5–10 prompts and has session-bias problems (history skews answers).
- Otterly.ai, Peec AI, Profound, AthenaHQ — paid tools, $99-$595/mo, single-engine to multi-engine Share of Voice tracking. Different strengths — see our /compare pages for the breakdown.
- BrightEdge / Semrush AI Toolkit / Ahrefs Brand Radar — enterprise tools, broader feature coverage, higher price points. Right for marketing teams with existing tooling budgets.
- GEO Tracker AI (full product, Pro $129 / Business $299) — what we build. Multi-engine (ChatGPT + Perplexity + Google AI Mode), 14-day Outcome Loop, action layer with paste-ready outreach drafts. Self-serve, no demo call.
For Phase 1 (figuring out if you need to measure at all): start with the free audit. For Phase 2 (consistent tracking with attribution): pick a paid tool.
Common measurement mistakes
- Asking your personal ChatGPT. Chat history skews answers heavily. Use a fresh session or controlled-environment tool.
- Changing the panel mid-period. Adding or removing prompts breaks period-over-period comparability. Lock the panel for 90 days minimum.
- Measuring once and concluding. Single scans are noise. AI engines re-rank weekly; you need at least 4 scans before any conclusion is defensible.
- Attributing to one variable. If you shipped a Reddit comment, a JSON-LD update, and a new blog post in the same week, you can't cleanly attribute SoV movement to any one. Stagger actions for clean attribution.
- Confusing Share of Voice with rank. SoV is binary per scan (cited or not). Position measures rank when cited. They're different metrics — track both.
- Ignoring the engines you don't use. Your buyers may use Claude or Gemini even if you don't. Multi-engine measurement catches the gap you can't see from your own usage.