Methodology · Public reference

How we measure AI search visibility — transparently

Every number you see in GEO Tracker AI (GEO Score, Share of Voice, mention rate) comes from a real prompt sent to a real AI engine, parsed by code you can audit. No traffic-lift claims, no black-box “AI optimization” magic, no “we got you cited” attribution. Here is exactly what we run, how we score it, and what we deliberately do not promise.

Section 1

The five engines we monitor

We run five live engines that cover the overwhelming majority of AI answer surface area today. Each one runs against a documented, vendor-supported endpoint — no scraping of consumer UIs, no “simulated” chat sessions. Which engines a given scan uses depends on your plan: Free runs Perplexity, Starter adds ChatGPT, Pro adds Google AI Mode and Gemini, and Business and Scale add Google AI Overview.

Engine | What we capture | Why it matters
ChatGPT | The web-grounded answer ChatGPT users see, captured with web search forced on, at a flat, predictable per-query cost. | Default answer engine for hundreds of millions of weekly users. Highest weight in the composite GEO Score.
Perplexity | Native Perplexity Sonar API with its citation-first response model. | Every response ships with the source URLs we parse into the dashboard. On every plan, including Free.
Google AI Mode | Live capture of the Google AI Mode answer: the dedicated AI tab on google.com, not the old AI Overview panel. | The fastest-growing AI search surface in 2026. From Pro up.
Gemini | Google’s flagship model, captured live at a flat per-query cost. | A distinct answer surface from AI Mode, with its own citation behaviour. From Pro up.
Google AI Overview | The real AI Overview panel above the organic SERP, captured live. When no overview renders, that absence is itself a recorded visibility signal. | The answer block billions of classic Google searches now see first. Business and Scale.

Engines that return no result for a given scan are excluded from the score denominator: we never penalize a brand for an outage on our side or an empty response from a vendor.
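To make the plan gating and the exclusion rule concrete, here is a minimal Python sketch. It is illustrative only, not the production code: the identifiers (`PLAN_ENGINES`, `scored_engines`, the engine keys) are hypothetical, and only the plan-to-engine mapping itself is taken from the text above.

```python
# Hypothetical names; only the plan-to-engine mapping comes from this page.
PLAN_ENGINES = {
    "free": ["perplexity"],
    "starter": ["perplexity", "chatgpt"],
    "pro": ["perplexity", "chatgpt", "google_ai_mode", "gemini"],
    "business": ["perplexity", "chatgpt", "google_ai_mode", "gemini",
                 "google_ai_overview"],
    "scale": ["perplexity", "chatgpt", "google_ai_mode", "gemini",
              "google_ai_overview"],
}


def scored_engines(plan, results_by_engine):
    """Return the plan's engines that produced at least one result.

    Engines with no returned result (outage, empty vendor response) are
    left out, so they never enter the score denominator.
    """
    return [
        engine
        for engine in PLAN_ENGINES[plan]
        if results_by_engine.get(engine)  # missing key or empty list -> skipped
    ]


if __name__ == "__main__":
    scan = {"perplexity": ["answer text"], "chatgpt": [], "gemini": ["answer text"]}
    print(scored_engines("pro", scan))
```

In this sketch an engine is skipped both when it is missing from the scan and when it returned an empty list, which mirrors the "no returned result" rule above.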

Section 2

How a scan actually works

One scan = one tracked question, run against every engine your tier covers, parsed by deterministic code, persisted as auditable rows in our database. Four steps:

  1. Send the prompt

    We send the exact question you tracked, verbatim, at low temperature for parsing consistency. The only addition is a concise system framing (“answer in 2–4 sentences, mention specific products by name”) that keeps responses comparable across engines.

  2. Parse the response

    Layer 1 is regex-based: did the response contain the brand hostname or token? Layer 2 is deterministic interpretation: tone, top-list position, citation-by-domain. Long-tail ambiguous cases get an optional LLM refine pass. If that LLM pass errors, we fall back to the deterministic verdict; a result is never silently dropped.

  3. Extract citations

    We pull every source URL the engine cited and surface them as a 30-day Citations view per question. A separate brand-level extractor names the products the engine actually recommended in prose (“HubSpot”, “Pipedrive”), not just URL hosts.

  4. Persist for replay

    Every scan, every response excerpt, every cited domain is stored. You can re-derive the score from raw rows months later. We track our own LLM spend per scan in the same ledger.
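The parse and citation steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the shipped parser: the function names, the URL regex, and the `refine` callback are all hypothetical, and the sketch only distinguishes mentioned from not_mentioned (the real Layer 2 also reads tone and list position).

```python
import re
from urllib.parse import urlparse

# Hypothetical URL pattern: stops at whitespace, brackets, and quotes.
URL_RE = re.compile(r"https?://[^\s)\]>\"']+")


def mentions_brand(text, hostname, tokens):
    """Layer 1: regex check for the brand hostname or any brand token."""
    patterns = [re.escape(hostname)]
    patterns += [rf"\b{re.escape(token)}\b" for token in tokens]
    return any(re.search(p, text, re.IGNORECASE) for p in patterns)


def cited_domains(text):
    """Unique hostnames of cited URLs, in order of first appearance."""
    domains = []
    for url in URL_RE.findall(text):
        host = urlparse(url.rstrip(".,;")).hostname or ""
        if host.startswith("www."):
            host = host[4:]
        if host and host not in domains:
            domains.append(host)
    return domains


def verdict(text, hostname, tokens, refine=None):
    """Deterministic verdict, optionally refined by an LLM callback.

    Any error in the (assumed) refine callback falls back to the
    deterministic verdict, so a result is never dropped.
    """
    deterministic = (
        "mentioned" if mentions_brand(text, hostname, tokens) else "not_mentioned"
    )
    if refine is None:
        return deterministic
    try:
        return refine(text, deterministic)
    except Exception:
        return deterministic


if __name__ == "__main__":
    answer = ("We recommend HubSpot (https://www.hubspot.com/crm) and "
              "Pipedrive, see https://example.org/review.")
    print(verdict(answer, "hubspot.com", ["HubSpot"]), cited_domains(answer))
```

Keeping the deterministic verdict as the fallback means the LLM pass can only refine a result, never remove one.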

Section 3

GEO Score = mention rate × quality, weighted by engine

The GEO Score is a 0–100 composite computed per engine, then combined as a weighted average across engines. Four pieces go into it:

  • Mention rate: what fraction of your tracked questions returned an answer that named your brand at all.
  • Quality: when you were named, how prominently. We snap quality to four bands {0, 40, 70, 90} mapped to not_mentioned / mentioned / recommended / top_recommended.
  • Baseline lift: a 0.40 floor on quality, so a single weak mention still gets partial credit, ramping up linearly toward 1.0 as quality improves.
  • Engine blend: ChatGPT 0.35 · Google AI Mode 0.25 · Gemini 0.15 · Perplexity 0.15 · Google AI Overview 0.10. The weights renormalize across the engines your plan runs that returned a result, so engines with no data in the scan window are dropped from the denominator.
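One plausible reading of these four pieces, written as a Python sketch. The engine weights and the band values come from the list above; everything else is an assumption made for illustration, in particular the linear mapping of bands 40..90 onto 0.40..1.0 and the function names. Confidence weighting and edge cases are left out, and the exact formula may differ.

```python
# Engine weights as listed above; the dictionary keys are hypothetical.
ENGINE_WEIGHTS = {
    "chatgpt": 0.35,
    "google_ai_mode": 0.25,
    "gemini": 0.15,
    "perplexity": 0.15,
    "google_ai_overview": 0.10,
}


def quality_factor(band):
    """ASSUMPTION: band 0 scores 0; bands 40..90 map linearly to 0.40..1.0."""
    if band <= 0:
        return 0.0
    return 0.40 + 0.60 * (band - 40) / 50


def engine_score(bands):
    """0-100 score for one engine.

    `bands` holds one quality band per tracked question (0 = not mentioned).
    Averaging the quality factor over all questions equals
    mention rate x mean quality of the answers that did mention the brand.
    """
    if not bands:
        raise ValueError("engine has no results and should be excluded")
    return 100 * sum(quality_factor(b) for b in bands) / len(bands)


def geo_score(bands_by_engine):
    """Weighted average over engines with data; weights renormalize."""
    scored = {e: engine_score(b) for e, b in bands_by_engine.items() if b}
    total_weight = sum(ENGINE_WEIGHTS[e] for e in scored)
    if total_weight == 0:
        return None  # nothing to score in this scan window
    return sum(ENGINE_WEIGHTS[e] * s for e, s in scored.items()) / total_weight


if __name__ == "__main__":
    print(geo_score({"perplexity": [90, 90], "chatgpt": [0, 0], "gemini": []}))
```

In the usage line, Gemini returned nothing and is dropped, so the remaining weights (0.15 and 0.35) are rescaled to sum to 1 before blending.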

The exact formula (confidence weighting, edge cases, MVP quality bands) lives in the source — read the full reference in /docs or browse the engineering blog for a deeper walkthrough.

Section 4

Smart cadence + confidence weighting

Asking an AI once is a coin flip, and running every tracked question against every engine every day would burn LLM budget on noise you cannot act on. So we check monitored questions weekly, ask each engine multiple times per question, and average the results. Four safeguards keep cost and noise honest:

  • Weekly cadence + reps by tier. Monitored questions run weekly, with each one measured several times per engine and averaged into a mention rate with a confidence band, plus a weekly full re-scan with longer response budgets. Unmonitored questions sit at on-demand cadence until you flip them on.
  • Question Confidence buckets. Every tracked question gets a 5-signal trust score (LLM domain coherence + real People Also Ask demand + Search Console token overlap + AI scan baseline mention + how many sources AI cites for it), bucketed as Verified / Worth tracking / Off-target.
  • Confidence-weighted scoring. The GEO Score weights each scan result by its source bucket — Verified 1.0, Worth tracking 0.7, Off-target 0.3. A noisy question cannot drag a real signal into the floor.
  • Outcome Loop measurement. When you mark an action done, we re-scan the affected questions 14 days later and report the actual GEO Score delta. Lift is measured, not promised.
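The arithmetic behind these safeguards can be sketched in Python. The bucket weights and the 14-day re-scan delay come from the list above. The rest is assumed for illustration: the function names are hypothetical, and the Wilson score interval is only one common way to put a confidence band around a rate from a handful of repetitions; this page does not say which method is used.

```python
import math
from datetime import date, timedelta

# Bucket weights as stated above; the keys are hypothetical identifiers.
BUCKET_WEIGHTS = {"verified": 1.0, "worth_tracking": 0.7, "off_target": 0.3}


def wilson_band(mentions, reps, z=1.96):
    """ASSUMED method: Wilson score interval for `mentions` out of `reps`."""
    if reps <= 0:
        raise ValueError("need at least one repetition")
    p = mentions / reps
    denom = 1 + z * z / reps
    centre = (p + z * z / (2 * reps)) / denom
    half = z * math.sqrt(p * (1 - p) / reps + z * z / (4 * reps * reps)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def weighted_mention_rate(results):
    """Confidence-weighted mean of per-question mention rates.

    `results` is a list of (bucket, mention_rate) pairs, one per question.
    """
    total = sum(BUCKET_WEIGHTS[bucket] for bucket, _ in results)
    if total == 0:
        raise ValueError("no questions to score")
    return sum(BUCKET_WEIGHTS[bucket] * rate for bucket, rate in results) / total


def rescan_date(action_done_on):
    """Outcome Loop: affected questions are re-scanned 14 days later."""
    return action_done_on + timedelta(days=14)


if __name__ == "__main__":
    print(wilson_band(3, 5))
    print(weighted_mention_rate([("verified", 1.0), ("off_target", 0.0)]))
    print(rescan_date(date(2026, 1, 1)))
```

With these weights, an Off-target question that never mentions the brand pulls the rate down far less than a Verified one would, which is the point of the weighting.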

Section 5

What we do not claim

Most “rank in ChatGPT” vendors lead with a traffic-lift number: “216% average traffic increase”, “3× faster citations”, “dominate GEO in 30 days”. None of those numbers is falsifiable, because there is no public instrument to audit them with. We sell the instrument, so we hold ourselves to a higher bar:

What we do | What we deliberately do not do
Show you exactly which buyer questions name you and which name competitors instead. | Promise we “got you cited”. Citation drift is correlated with content work, never causally attributed.
Report the actual GEO Score delta 14 days after each action. | Quote a specific organic-traffic percentage lift. We don't run your analytics.
Surface the engine response excerpt verbatim with source URLs. | Auto-publish anything to your CMS, your social channels, or anywhere else without explicit per-post approval.
Persist raw scans so you can replay any score months later. | Hide the math behind a black box. Every formula, weight, and bucket is documented.

If the answer to “did this actually work?” matters to you, measurement-grounded reporting beats SEO-by-promise every time. That is the entire thesis of this product.

See your own measurement, free

The free GEO Snapshot runs three real Perplexity Sonar scans against your domain, with the same engine, same parser, and same scoring as the paid product. No credit card, no email gate beyond the shareable result. Upgrade later to add ChatGPT + Google AI Mode and weekly monitored cadence.

Pro starts a 14-day trial with full ChatGPT + Perplexity + Google AI Mode + Gemini coverage and the Mission Control cockpit. Cancel anytime in the customer portal.