Why measurement is the actual lever
Every previous guide in this collection — llms.txt, robots.txt, JSON-LD, ChatGPT, Perplexity, AI Mode, Reddit — describes a thing you can ship. Without measurement, none of what you ship produces a decision you can defend in a meeting. Did the Reddit comment move the needle? Did the JSON-LD change citation accuracy? Did the robots.txt update accidentally kill citations? You cannot know without measuring.
Three numbers cover ~95% of the meaningful answers:
- Share of Voice (also Share of Model) — of all the buyer-questions your prospects ask AI engines, what percentage now cite your brand?
- AI citation count over time — is the raw count rising, flat, or falling per engine (ChatGPT / Perplexity / Google AI Mode)?
- Outcome of specific actions — when you ship llms.txt or a Reddit comment, does Share of Voice on the target prompts actually move 14 days later, or is it noise?
The vocabulary AI search tools converged on
AI search is a young enough category that until late 2025 every vendor used slightly different names for the same numbers. The category has now converged on a shared vocabulary — every major GEO / AEO tool (Otterly, Profound, Peec AI, AthenaHQ, BrightEdge, Tinuiti, and GEO Tracker AI) uses the same words:
- Share of Voice (sometimes Share of Model) — share of tracked buyer-prompts where the AI engine cites your brand.
- AI citation — an individual brand appearance inside an AI engine's answer.
- Citation position — average rank of your brand within the AI answer (top of list, mid, bottom).
- Visibility score (or composite GEO Score) — vendor-specific 0–100 number combining the three metrics above.
If you read a Tinuiti report and an Otterly homepage and our docs, you should now see the same vocabulary everywhere. If a tool still uses "mention rate" or a non-standard term, that's a flag the team hasn't updated their copy.
Build a buyer-question panel (the foundation)
Every AI visibility measurement starts with a fixed list of questions your prospective customers actually ask AI engines. This panel is the denominator for Share of Voice and the X-axis for citation tracking over time.
How to build it (a code sketch of a panel follows the list):
- Start with 5–15 questions. More is noise; fewer is too narrow. Mix high-intent ("best X for Y"), category ("what is X"), and competitive ("X vs Y") shapes.
- Use the exact phrasing buyers use. Not your marketing wording. Pull from sales call recordings, support tickets, search-console queries.
- Re-run on a fixed cadence. Weekly minimum, daily for high-volatility categories. Single-shot scans are anecdote; trend lines are data.
- Keep the panel stable for 90 days. Resist the urge to add or remove questions mid-period. Stable panel = comparable Share of Voice across time.
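A panel is just data, and keeping it in version control makes the 90-day freeze auditable: a diff on the file is a diff on your denominator. A minimal sketch in Python, with hypothetical questions for an imaginary invoicing product (the field names are ours, not any particular tool's schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: the panel must not mutate mid-period
class PanelQuestion:
    text: str    # exact buyer phrasing, not marketing copy
    intent: str  # "high-intent", "category", or "competitive"

# Hypothetical 5-question panel for an imaginary invoicing product.
PANEL = [
    PanelQuestion("best invoicing software for freelancers", "high-intent"),
    PanelQuestion("what is e-invoicing", "category"),
    PanelQuestion("how do freelancers chase late invoices", "category"),
    PanelQuestion("AcmeInvoice vs BillCo", "competitive"),
    PanelQuestion("AcmeInvoice alternatives", "competitive"),
]
```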
Share of Voice — the headline metric
Definition: of the prompts in your panel, the percentage where the AI engine cites your brand in its answer. Numerator = scans where you appear. Denominator = total scans (panel size × engines × scan count).
Same math across every major tool in 2026. The math is trivial; the discipline is in keeping the panel stable and re-running on a fixed cadence.
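As code, the whole calculation fits in a few lines. A minimal sketch assuming each scan is logged as a (prompt, engine, cited) record — the record shape is illustrative:

```python
def share_of_voice(scans: list[tuple[str, str, bool]]) -> float:
    """Percentage of scans in which the brand was cited.

    Each scan is (prompt, engine, cited). Denominator = every scan in
    the period (panel size x engines x scan count); numerator = scans
    where cited is True.
    """
    if not scans:
        return 0.0
    cited = sum(1 for _, _, was_cited in scans if was_cited)
    return 100.0 * cited / len(scans)

# 3 prompts x 2 engines x 1 scan each = 6 scans, 2 citations -> 33.3%
scans = [
    ("best invoicing software for freelancers", "chatgpt", True),
    ("best invoicing software for freelancers", "perplexity", False),
    ("what is e-invoicing", "chatgpt", False),
    ("what is e-invoicing", "perplexity", False),
    ("AcmeInvoice vs BillCo", "chatgpt", True),
    ("AcmeInvoice vs BillCo", "perplexity", False),
]
print(f"{share_of_voice(scans):.1f}% SoV")  # -> 33.3% SoV
```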
Practical interpretation bands:
- 0–10% SoV — invisible. Your brand is not in the AI consideration set for the questions buyers ask. The technical hygiene guides (llms.txt, robots.txt, JSON-LD) are step 1.
- 10–35% SoV — appearing occasionally, inconsistently. Step 2 is content density and Reddit / community presence.
- 35–65% SoV — meaningful share but losing key prompts. Step 3 is competitive analysis — why are you losing the prompts you lose?
- 65%+ SoV — dominant. Step 4 is defending against new entrants and protecting against drift.
AI citations and citation position
Share of Voice answers "am I cited?"; citation-level metrics answer "how am I cited?". Three specifics are worth tracking (a short code sketch follows the list):
- Raw citation count — total mentions per engine per week. Less elegant than SoV but useful for catching big jumps or drops faster.
- Citation position — when cited, are you the first, second, or fifth brand named? Tinuiti's Gemini measurement work found an average citation position of 1.6 in their data; ChatGPT trends similar. Top-3 position drives most downstream click-through and brand-search lift.
- Citation source (3rd-party domains) — when the AI cites your brand, which 3rd-party URLs does it pull from? G2 review? Reddit thread? Your blog? This tells you where to invest next.
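A minimal sketch of both citation-level views, assuming each citation is logged as (engine, position, source_domain) — again an illustrative record shape, not any vendor's schema:

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical citation log: (engine, position, source_domain).
citations = [
    ("chatgpt", 1, "g2.com"),
    ("chatgpt", 3, "reddit.com"),
    ("perplexity", 2, "example-blog.com"),
    ("perplexity", 1, "g2.com"),
]

# Average citation position per engine (lower is better; top-3 is the goal).
positions = defaultdict(list)
for engine, pos, _ in citations:
    positions[engine].append(pos)
for engine, ranks in sorted(positions.items()):
    print(f"{engine}: avg position {mean(ranks):.1f}")

# Which 3rd-party domains the engines cite you through — where to invest next.
print(Counter(domain for _, _, domain in citations).most_common())
```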
The 14-day Outcome Loop — attribute actions to deltas
The hardest measurement question in AI search is causation. SoV moved last week — was that your Reddit comment, the AI engine's model update, or noise? You can't fully eliminate noise, but you can isolate the action.
The Outcome Loop pattern (a runnable sketch follows the list):
- Mark the action. When you ship something (a Reddit comment, a comparison page, a schema fix), record the timestamp.
- Anchor a 14-day measurement window. Pre-window: 14 days before the action. Post-window: 14 days after.
- Compare Share of Voice and citation count on the target prompts only. Not your whole panel — the prompts the action was meant to affect.
- Tag with confidence band. Low / medium / high based on sample size and effect size. A 5pp SoV shift on 3 scans is low confidence; the same shift on 30 scans is high.
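Run manually, the loop is a pre/post comparison plus a labeling rule. A minimal sketch assuming scans are logged as (timestamp, prompt, cited), with illustrative confidence thresholds (not GEO Tracker AI's internal ones):

```python
from datetime import datetime, timedelta

def outcome_loop(scans, action_at: datetime, target_prompts: set[str],
                 window_days: int = 14) -> tuple[float, str]:
    """Pre/post Share-of-Voice delta on target prompts around an action.

    scans: list of (timestamp, prompt, cited) records.
    Returns (delta in percentage points, confidence band).
    """
    def window_sov(lo: datetime, hi: datetime) -> tuple[float, int]:
        hits = [cited for ts, prompt, cited in scans
                if lo <= ts < hi and prompt in target_prompts]
        if not hits:
            return 0.0, 0
        return 100.0 * sum(hits) / len(hits), len(hits)

    pre_sov, n_pre = window_sov(action_at - timedelta(days=window_days), action_at)
    post_sov, n_post = window_sov(action_at, action_at + timedelta(days=window_days))
    delta = post_sov - pre_sov
    n = min(n_pre, n_post)
    # Illustrative bands: bigger samples and bigger effects earn more trust.
    if n >= 30 and abs(delta) >= 5:
        band = "high"
    elif n >= 10:
        band = "medium"
    else:
        band = "low"
    return delta, band
```

Note how the same 5pp delta lands in different bands depending purely on scan count, matching the rule of thumb above.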
This is the loop GEO Tracker AI's product automates. It's also the loop you can run manually if you have the discipline — the math is trivial; the discipline is hard.
Tools — free vs paid, what each does
Practical 2026 landscape:
- Free 60-second audit (GEO Tracker AI /grader) — single Perplexity scan on a 3-question panel. Good for first directional read. No history, no multi-engine, no outcome loop.
- Manual ChatGPT / Perplexity / AI Mode probing — works but doesn't scale beyond 5–10 prompts and has session-bias problems (history skews answers).
- Otterly.ai, Peec AI, Profound, AthenaHQ — paid tools, $99-$595/mo, single-engine to multi-engine Share of Voice tracking. Different strengths — see our /compare pages for the breakdown.
- BrightEdge / Semrush AI Toolkit / Ahrefs Brand Radar — enterprise tools, broader feature coverage, higher price points. Right for marketing teams with existing tooling budgets.
- GEO Tracker AI (full product, Pro $129 / Business $299) — what we build. Multi-engine (ChatGPT + Perplexity + Google AI Mode), 14-day Outcome Loop, action layer with paste-ready outreach drafts. Self-serve, no demo call.
For Phase 1 (figuring out if you need to measure at all): start with the free audit. For Phase 2 (consistent tracking with attribution): pick a paid tool.
Common measurement mistakes
- Asking your personal ChatGPT. Chat history skews answers heavily. Use a fresh session or controlled-environment tool.
- Changing the panel mid-period. Adding or removing prompts breaks period-over-period comparability. Lock the panel for 90 days minimum.
- Measuring once and concluding. Single scans are noise. AI engines re-rank weekly; you need at least 4 scans before any conclusion is defensible.
- Attributing to one variable. If you shipped a Reddit comment, a JSON-LD update, and a new blog post in the same week, you can't cleanly attribute SoV movement to any one. Stagger actions for clean attribution.
- Confusing Share of Voice with rank. SoV is binary per scan (cited or not). Position measures rank when cited. They're different metrics — track both.
- Ignoring the engines you don't use. Your buyers may use Claude or Gemini even if you don't. Multi-engine measurement catches the gap you can't see from your own usage.