Citation Source Intelligence — what it is and the three layers

Why we don't treat AI citations as undifferentiated URLs — and the three-layer architecture (extract, enrich, act) that turns a list of cited pages into a queue of paste-ready actions.

5 min read

Most AI visibility tools stop at "ChatGPT cited this URL". That is the floor, not the ceiling. Citation Source Intelligence (CSI) is the discipline of treating every cited URL as a structured object with an authority score, a source type, an actionability state, and — for the ones still open — a paste-ready draft for the comment, pitch, or PR the user can send.

The premise is simple: a Wikipedia article, an archived Reddit thread, a 2022 listicle, and a Hacker News story two days old are all "AI citations". Treating them as the same row in a dashboard is the single biggest reason teams stare at GEO reports without ever taking action. CSI is the structure that makes them different.

What CSI actually is

CSI is not a single feature. It is the pipeline that runs over every URL the tracked AI engines reference for your queries. Three layers, each adding a different kind of structure on top of the raw citation:

Layer 1 — Extract

For every scan result, we parse the AI's response, identify URLs that were cited (linked, footnoted, or named as the source of a fact), and record them in citation_sources with the query, engine, and timestamp they came from. This is the cheapest layer — pattern matching, no LLM, no remote fetch. The output is a clean list: "these are the URLs the engine pulled from for your queries this week."
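The extraction step above can be sketched in a few lines. This is a minimal illustration, not the production parser: the regex, the row shape, and the function name are assumptions, but the point stands that Layer 1 is pure pattern matching with no LLM call and no remote fetch.

```python
import re
from datetime import datetime, timezone

# Hypothetical URL matcher; the real parser also handles footnotes
# and "named source" mentions that carry no explicit link.
URL_PATTERN = re.compile(r'https?://[^\s)\]">]+')

def extract_citations(response_text, query, engine):
    """Layer 1 sketch: pull distinct cited URLs out of a raw AI
    response and shape them as rows for citation_sources."""
    seen = set()
    rows = []
    for url in URL_PATTERN.findall(response_text):
        url = url.rstrip('.,;')  # strip trailing sentence punctuation
        if url in seen:
            continue
        seen.add(url)
        rows.append({
            "url": url,
            "query": query,
            "engine": engine,
            "cited_at": datetime.now(timezone.utc).isoformat(),
        })
    return rows
```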

This is where most other AI visibility tools stop. CSI uses it as input.

Layer 2 — Enrich

For every cited URL, we fetch the page, classify the source type (reddit_thread, hn_story, listicle, news_article, awesome_list, docs_page, podcast, youtube_video, forum_thread, comparison_page, …), pull platform-specific signals (Reddit archived / locked flags, publication date, author, comment count, score), and run an actionability classifier that decides whether the venue is open for input. The four states are described in detail in the Citation actionability concept.

The output of Layer 2 is the difference between "ChatGPT cited a Reddit thread" and "ChatGPT cited a Reddit thread that was archived eight months ago — there is no comment box."
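A toy version of the actionability classifier makes the distinction concrete. The thresholds below (the 14-day HN reply window, Reddit archiving) come from this article; the exact decision rules, field names, and state labels in the real classifier are assumptions here.

```python
from dataclasses import dataclass

@dataclass
class Enrichment:
    source_type: str   # e.g. "reddit_thread", "hn_story", "listicle"
    age_days: int
    archived: bool = False
    locked: bool = False

def classify_actionability(e: Enrichment) -> str:
    """Layer 2 sketch: map platform signals to one of the four
    actionability states (live, limited, frozen, manual)."""
    if e.source_type == "reddit_thread":
        if e.archived or e.locked:
            return "frozen"  # no comment box to post into
        return "live" if e.age_days <= 30 else "limited"
    if e.source_type == "hn_story":
        # HN replies are effectively closed after the 14-day window
        return "live" if e.age_days <= 14 else "frozen"
    # Listicles, news articles, comparison pages: the venue accepts
    # input, but only through a human channel such as an email pitch.
    return "manual"
```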

Layer 3 — Act

For URLs classified as Live or Limited in Layer 2, we generate a tailored draft for the action that fits the source type:

  • reddit_thread → a value-first Reddit comment with affiliation disclosure, mention of two or three real alternatives, and the etiquette guards (no superlatives, ≤160 words, single link) that keep the comment from being auto-removed.
  • hn_story → a Hacker News comment sized for the 14-day reply window and the HN tone (concrete, technical, no marketing voice).
  • listicle / comparison_page / news_article → an email pitch to the author, framed as a useful update for a future piece rather than a request to be added.
  • awesome_list / github_repo → a pull request description for the README entry.
  • podcast → a guest pitch matched to the show's format.
  • youtube_video → partnership outreach rather than a comment (comments on stale uploads have low visibility).

For URLs classified as Frozen or Manual, the generator does not run. Drafting a comment for an archived thread is worse than drafting nothing — it wastes tokens, looks naïve, and trains the user to ignore the output.
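The source-type dispatch and the actionability gate can be sketched together. The mapping mirrors the list above; the table values and function name are illustrative, and in the real pipeline each entry points at an LLM drafting routine rather than a string.

```python
# Which draft kind fits which source type (mirrors the list above).
DRAFTERS = {
    "reddit_thread": "reddit_comment",       # value-first, disclosed, <=160 words
    "hn_story": "hn_comment",                # concrete, technical, no marketing voice
    "listicle": "author_pitch",
    "comparison_page": "author_pitch",
    "news_article": "author_pitch",
    "awesome_list": "pull_request",
    "github_repo": "pull_request",
    "podcast": "guest_pitch",
    "youtube_video": "partnership_outreach",
}

def draft_action(source_type, actionability):
    """Layer 3 sketch: return the draft kind to generate, or None.

    The gate guards the expensive step: only live and limited
    citations ever reach the LLM drafting call."""
    if actionability not in ("live", "limited"):
        return None  # frozen / manual: generator does not run
    return DRAFTERS.get(source_type)
```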

How the layers connect

The three layers share data through the citation_sources and citation_actions tables. A given URL flows through them in order, but each layer is independently re-runnable: a refreshed actionability classification (Layer 2) does not require a new draft (Layer 3) unless the state changed; a new draft does not require a re-fetch unless the source content drifted.

This separation matters because the layers have different cost profiles. Layer 1 is free. Layer 2 costs one HTTP fetch per URL. Layer 3 costs an LLM call per draft — the largest cost in the pipeline — and is the one we most aggressively gate on actionability state.
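The re-run logic described above, re-classify freely, re-draft only on a state change, can be sketched as a single refresh step. The hook parameters (fetch_page, classify, generate_draft) are hypothetical stand-ins for the real layer implementations.

```python
def refresh(citation, fetch_page, classify, generate_draft):
    """Sketch of an independent Layer 2 re-run: one HTTP fetch,
    one re-classification, and a Layer 3 draft only when the
    actionability state actually changed."""
    page = fetch_page(citation["url"])      # Layer 2 cost: one fetch
    new_state = classify(page)
    if new_state != citation.get("actionability"):
        citation["actionability"] = new_state
        if new_state in ("live", "limited"):
            citation["draft"] = generate_draft(citation)  # Layer 3: LLM call
        else:
            citation["draft"] = None        # gate: no draft for frozen / manual
    return citation
```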

Why this is its own concept (not just a feature)

Most product surfaces — the Citations tab in the dashboard, the action queue in the weekly insights digest, the per-source detail view — are read views over the CSI pipeline. Understanding the pipeline as a concept makes the UI behavior predictable: why a Reddit thread shows a "Frozen" badge and no draft button; why a fresh HN story shows a draft within minutes; why actionability re-classifies on its own when a thread crosses the 30-day or 180-day boundary; why some URLs show enrichment data (comment count, age, author) and others only show the URL.

Concepts that pair with this one:

For the dashboard view of CSI, see Citation Source Intelligence in the Citations tab. For the strategic argument for why the action layer matters in the broader GEO category, the blog post "22 AI Visibility Tools, $25 to $699 a Month. Not One Tells You What to Do Next." maps where this fits in the competitive landscape.