Citation actionability — Live, Limited, Frozen, Manual

Why we classify every cited URL into four states before generating a draft, and what each state means for the action you can take.

5 min read

A citation tool that drafts a Reddit comment for an archived thread is worse than one that drafts nothing: it wastes tokens, looks naïve, and trains the user to ignore the output. Citation Source Intelligence (CSI) classifies every cited URL into one of four actionability states before the LLM is ever called.
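The ordering is the point: classification happens first, and only Live and Limited citations are ever handed to the model. A minimal TypeScript sketch of that gate follows. `Actionability`, `Citation`, and `maybeDraft` are hypothetical names, not CSI's actual API, and `draft` stands in for whatever client really calls the LLM.

```typescript
// Hypothetical type: the article names the four states but not CSI's identifiers.
type Actionability = "live" | "limited" | "frozen" | "manual";

interface Citation {
  url: string;
  state: Actionability; // already computed by the classifier, no LLM involved
}

// Gate in front of the LLM. `draft` is any caller-supplied function (sync or
// async) that actually talks to the model; it is never invoked for Frozen or
// Manual citations.
function maybeDraft<T>(citation: Citation, draft: (url: string) => T): T | null {
  if (citation.state === "frozen" || citation.state === "manual") {
    return null;
  }
  return draft(citation.url);
}
```

Because the check runs before `draft` is called, a Frozen or Manual URL costs a string comparison rather than a model call.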

The four states

Live

The venue is open and recent — a thread accepting new comments, a listicle from the last 12 months whose author is reachable, an awesome-list repo with an active CONTRIBUTING.md. Live sources get a draft generated on demand.

Limited

The venue is technically open but degraded:

  • A Reddit thread older than 90 days but not yet archived.
  • A listicle / blog post / comparison page over 12 months old (authors rarely update legacy roundups — sequel pitches usually beat asking for an edit).
  • A news article over 30 days old (most outlets close comment threads after a month, but the author may still reply by email).
  • Forum threads over 12 months old (necroposting is poorly received on most platforms).
  • Documentation pages (no comment venue, but a maintainer pitch or PR is plausible).
  • Podcasts whose last episode is over 6 months old (the feed is likely defunct).

Limited sources still produce a draft, with a yellow warning above the Generate button explaining the degraded reach.

Frozen

The venue does not exist or is permanently closed:

  • Reddit threads whose JSON API data has archived or locked set, a non-null removed_by_category, or selftext / author replaced by [removed] / [deleted].
  • Hacker News stories older than ~14 days (HN auto-locks comments after 2 weeks).
  • HTTP 404 / 410 / 451: the page no longer exists (404, 410) or has been withdrawn for legal reasons (451).
  • Wikipedia, Mozilla Developer Network, IETF, RFC Editor, W3C, ISO, IANA, kernel.org, cppreference, man7 — informational references with no public comment venue and (in Wikipedia's case) a strict conflict-of-interest policy that turns brand-side editing into a reputational liability.
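The Reddit and informational-reference checks in the list above can be sketched as two small signal extractors. This assumes the post object has already been pulled out of the thread's `.json` response (the first listing's `children[0].data`). The field names `archived`, `locked`, `removed_by_category`, `selftext`, and `author` are Reddit's; `redditFrozenReason`, `isInformationalHost`, and the exact domain list are illustrative assumptions.

```typescript
// Subset of a Reddit post object from the public `.json` endpoint.
// Only the fields this check reads are typed here.
interface RedditPost {
  archived?: boolean;
  locked?: boolean;
  removed_by_category?: string | null;
  selftext?: string;
  author?: string;
}

type RedditFrozenReason = "removed" | "locked" | "archived";

// Returns why a thread is Frozen, or null if the flags say it is still open.
function redditFrozenReason(post: RedditPost): RedditFrozenReason | null {
  const isPlaceholder = (s?: string) => s === "[removed]" || s === "[deleted]";
  if (post.removed_by_category != null || isPlaceholder(post.selftext) || isPlaceholder(post.author)) {
    return "removed";
  }
  if (post.locked) return "locked";
  if (post.archived) return "archived";
  return null;
}

// Assumed hostnames for the informational references named above.
const INFORMATIONAL_HOSTS = [
  "wikipedia.org",
  "developer.mozilla.org",
  "ietf.org",
  "rfc-editor.org",
  "w3.org",
  "iso.org",
  "iana.org",
  "kernel.org",
  "cppreference.com",
  "man7.org",
];

// Matches the domain itself or any subdomain (en.wikipedia.org, docs.kernel.org),
// but not lookalikes such as notwikipedia.org.
function isInformationalHost(url: string): boolean {
  const host = new URL(url).hostname.toLowerCase();
  return INFORMATIONAL_HOSTS.some((d) => host === d || host.endsWith("." + d));
}
```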

Frozen sources never reach the LLM. Instead, the dashboard renders a replacement strategy card with a one-click URL where applicable:

  • "Search r/SaaS for fresh threads" — opens the subreddit's search page filtered to recent posts on the same topic.
  • "Submit your own Show HN" — opens news.ycombinator.com/submit.
  • "Find an active podcast in the same beat" — opens a Listen Notes search.
  • "Build a canonical resource on your own domain" — explanatory copy, no URL.
  • "AI will drop this citation as it re-indexes": for defunct pages (HTTP 404 / 410 / 451), where no action is needed.

Frozen does not equal done — it just means the action moves off-source.
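A sketch of how the replacement cards might be built, assuming hypothetical card keys (`CardKind`) and a hypothetical `strategyCard` helper. The Hacker News submit URL is the one named above. `restrict_sr` and `sort` are standard Reddit search parameters, while the Listen Notes search URL pattern is an assumption.

```typescript
// Hypothetical card identifiers; the article lists the cards but not their keys.
type CardKind = "fresh-reddit" | "show-hn" | "podcast-search" | "own-resource" | "reindex";

interface StrategyCard {
  title: string;
  url?: string; // absent for explanatory-copy cards
}

function strategyCard(kind: CardKind, topic: string, subreddit = "SaaS"): StrategyCard {
  const q = encodeURIComponent(topic);
  switch (kind) {
    case "fresh-reddit":
      return {
        title: `Search r/${subreddit} for fresh threads`,
        // restrict_sr=1 limits results to the subreddit; sort=new puts recent posts first.
        url: `https://www.reddit.com/r/${subreddit}/search/?q=${q}&restrict_sr=1&sort=new`,
      };
    case "show-hn":
      return { title: "Submit your own Show HN", url: "https://news.ycombinator.com/submit" };
    case "podcast-search":
      // Assumed Listen Notes search URL pattern.
      return {
        title: "Find an active podcast in the same beat",
        url: `https://www.listennotes.com/search/?q=${q}`,
      };
    case "own-resource":
      return { title: "Build a canonical resource on your own domain" };
    case "reindex":
      return { title: "AI will drop this citation as it re-indexes" };
  }
}
```

Which Frozen reason maps to which card is left to the caller here, since the article only pins down that defunct pages get the re-index card.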

Manual

The venue is open but requires a logged-in identity we cannot automate:

  • Twitter / X replies (need an authenticated account).
  • YouTube comments (need a logged-in Google account).

Manual sources do not produce a draft — instead the action panel hands the user a direct link to the venue and a short reminder to keep tone consistent with the etiquette guards we apply elsewhere.
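Detecting a Manual venue is mostly a hostname check. The sketch below assumes that `twitter.com`, `x.com`, `youtube.com`, and `youtu.be` (plus their subdomains) cover the cases; `manualVenue`, `manualHandoff`, and the reminder copy are hypothetical. Nothing here posts anything: the hand-off is only a link and a reminder.

```typescript
type ManualVenue = "twitter" | "youtube";

// Assumed host lists; CSI's real matching rules are not documented here.
const MANUAL_HOSTS: Record<ManualVenue, string[]> = {
  twitter: ["twitter.com", "x.com"],
  youtube: ["youtube.com", "youtu.be"],
};

function manualVenue(url: string): ManualVenue | null {
  const host = new URL(url).hostname.toLowerCase();
  for (const venue of Object.keys(MANUAL_HOSTS) as ManualVenue[]) {
    // Exact host or a subdomain (mobile.twitter.com, www.youtube.com), never a suffix
    // lookalike such as box.com.
    if (MANUAL_HOSTS[venue].some((d) => host === d || host.endsWith("." + d))) {
      return venue;
    }
  }
  return null;
}

interface ManualHandoff {
  venue: ManualVenue;
  openUrl: string;
  reminder: string;
}

function manualHandoff(url: string): ManualHandoff | null {
  const venue = manualVenue(url);
  if (venue === null) return null;
  return {
    venue,
    openUrl: url, // link straight to the tweet or video
    reminder: "We cannot post for you. Keep the tone consistent with our etiquette guards.",
  };
}
```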

Why four buckets and not three

Earlier versions of CSI used three states (Ready / Limited / Frozen) and put Twitter and YouTube under Limited. That conflated two genuinely different problems: "the venue is degraded" (sequel pitch helps) versus "the venue is open but we cannot drive a keyboard" (login is required, period).

Splitting them out makes the UX honest. A Limited podcast feed gets a sequel-pitch hint; a Manual Twitter reply gets a "we cannot post for you" hand-off. The action plan looks different for each.
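In a typed codebase, one way to keep the four buckets honest is a discriminated union in which each state carries only the data its action needs, plus an exhaustive `switch` so a new state cannot be added without deciding its action plan. The shape below (`Classification`, `actionPlan`) is a hypothetical illustration, not CSI's actual types.

```typescript
// Hypothetical shape: each state carries only what its action needs.
type Classification =
  | { state: "live" }
  | { state: "limited"; warning: string }
  | { state: "frozen"; reason: string }
  | { state: "manual"; venueUrl: string };

function actionPlan(c: Classification): string {
  switch (c.state) {
    case "live":
      return "Generate draft";
    case "limited":
      return `Generate draft (warning: ${c.warning})`;
    case "frozen":
      return `Show replacement strategy (${c.reason})`;
    case "manual":
      return `Open ${c.venueUrl} and post manually`;
    default: {
      // Compile-time check: adding a fifth state without handling it fails here.
      const unreachable: never = c;
      throw new Error(`Unhandled state: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Under the old three-state model, Twitter would have needed a `warning` it could never act on; with a separate `manual` variant the type itself says "hand off, do not draft".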

How the classification is computed

The classifier is a pure function over already-extracted signals — no LLM, no remote calls. The decision tree:

  1. Hard 4xx (404 / 410 / 451) → Frozen, with reason "Source returned HTTP <status> — page no longer exists."
  2. Other 4xx (403, 429, etc.) → Limited, with reason "We could not crawl this source — open the URL manually before drafting."
  3. Informational hostname blocklist → Frozen.
  4. Reddit JSON flags (removed, locked, or archived) → Frozen with the matching reason. Otherwise, age > 90 days → Limited; else Live.
  5. HN age > 14 days → Frozen (auto-locks comments after 2 weeks).
  6. Listicle / blog / comparison age > 12 months → Limited.
  7. News article age > 30 days → Limited.
  8. Podcast last episode > 6 months → Limited (likely defunct feed).
  9. Twitter / YouTube → Manual.
  10. Docs page → Limited (no comment venue, PR / outreach instead).
  11. Forum thread age > 12 months → Limited (necroposting).
  12. Default → Live.
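The decision tree above translates almost line for line into a pure function. This is a sketch, not the contents of lib/citations/actionability-classifier.ts: the `Signals` shape, the `SourceType` values, the reason strings, and the choice that an unknown age never demotes a source are all assumptions, and "6 months" is approximated as 180 days. Numbered comments map back to the steps.

```typescript
// Hypothetical signal and state names; the real classifier's types are not shown here.
type State = "live" | "limited" | "frozen" | "manual";

type SourceType =
  | "reddit" | "hn" | "listicle" | "blog" | "comparison" | "news"
  | "podcast" | "twitter" | "youtube" | "docs" | "forum" | "other";

interface Signals {
  sourceType: SourceType;
  httpStatus?: number;         // crawl status, if a crawl happened
  informationalHost: boolean;  // hostname on the informational blocklist
  ageDays?: number;            // thread or article age; undefined = unknown
  lastEpisodeAgeDays?: number; // podcasts only
  reddit?: { removed: boolean; locked: boolean; archived: boolean };
}

interface Result {
  state: State;
  reason?: string;
}

const HARD_GONE = new Set([404, 410, 451]);

// Unknown ages never demote a source (an assumption, not stated in the article).
const olderThan = (age: number | undefined, days: number): boolean =>
  age !== undefined && age > days;

function classify(s: Signals): Result {
  // 1. Hard 4xx: the page is gone.
  if (s.httpStatus !== undefined && HARD_GONE.has(s.httpStatus)) {
    return { state: "frozen", reason: `Source returned HTTP ${s.httpStatus}.` };
  }
  // 2. Any other 4xx: we could not crawl, so the user must check by hand.
  if (s.httpStatus !== undefined && s.httpStatus >= 400 && s.httpStatus < 500) {
    return { state: "limited", reason: "Could not crawl; open the URL manually before drafting." };
  }
  // 3. Informational references (Wikipedia, MDN, RFCs, ...).
  if (s.informationalHost) {
    return { state: "frozen", reason: "Informational reference with no comment venue." };
  }
  switch (s.sourceType) {
    case "reddit": {
      // 4. Reddit flags take priority over age.
      if (s.reddit?.removed) return { state: "frozen", reason: "Thread was removed." };
      if (s.reddit?.locked) return { state: "frozen", reason: "Thread is locked." };
      if (s.reddit?.archived) return { state: "frozen", reason: "Thread is archived." };
      return olderThan(s.ageDays, 90)
        ? { state: "limited", reason: "Thread is older than 90 days." }
        : { state: "live" };
    }
    case "hn": // 5.
      if (olderThan(s.ageDays, 14)) return { state: "frozen", reason: "HN locks comments after 2 weeks." };
      break;
    case "listicle":
    case "blog":
    case "comparison": // 6.
      if (olderThan(s.ageDays, 365)) return { state: "limited", reason: "Over 12 months old; pitch a sequel." };
      break;
    case "news": // 7.
      if (olderThan(s.ageDays, 30)) return { state: "limited", reason: "Over 30 days old; comments likely closed." };
      break;
    case "podcast": // 8. ~6 months, approximated as 180 days.
      if (olderThan(s.lastEpisodeAgeDays, 180)) return { state: "limited", reason: "No episode in ~6 months." };
      break;
    case "twitter":
    case "youtube": // 9.
      return { state: "manual", reason: "Requires a logged-in account." };
    case "docs": // 10.
      return { state: "limited", reason: "No comment venue; pitch a PR or the maintainers." };
    case "forum": // 11.
      if (olderThan(s.ageDays, 365)) return { state: "limited", reason: "Over 12 months old; necroposting." };
      break;
  }
  // 12. Nothing above applied.
  return { state: "live" };
}
```

Because the function reads only pre-extracted signals, each numbered step can be covered by one or two fixture objects, with no network or LLM mocking.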

The full rule table lives in lib/citations/actionability-classifier.ts in the codebase and is exercised by ~25 unit tests.

What this means for cost

Frozen sources skip the LLM entirely. On a typical Pro-tier dataset, roughly 10–20% of cited URLs land in Frozen at any given moment (archived Reddit threads, defunct domains, Wikipedia / MDN references). Assuming draft cost is roughly the same per URL, that is 10–20% of drafting cost we never pay, because the system refuses to draft into a wall.
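The cost claim is simple arithmetic. A sketch under stated assumptions: each Frozen URL would otherwise have been drafted once, every draft costs the same, and the counts passed in are illustrative rather than real Pro-tier data. `frozenSavings` is a hypothetical helper.

```typescript
type State = "live" | "limited" | "frozen" | "manual";

interface Savings {
  frozenShare: number;   // fraction of cited URLs in Frozen
  skippedDrafts: number; // drafts never requested from the LLM
  savedCost: number;     // skippedDrafts * costPerDraft
}

// Assumes each Frozen URL would otherwise have been drafted once at a uniform
// cost. Manual URLs also skip the LLM but are left out to match the Frozen-only figure.
function frozenSavings(counts: Record<State, number>, costPerDraft: number): Savings {
  const total = counts.live + counts.limited + counts.frozen + counts.manual;
  const frozenShare = total === 0 ? 0 : counts.frozen / total;
  return {
    frozenShare,
    skippedDrafts: counts.frozen,
    savedCost: counts.frozen * costPerDraft,
  };
}
```

For example, `frozenSavings({ live: 600, limited: 220, frozen: 150, manual: 30 }, 0.01)` reports a Frozen share of 0.15, inside the 10–20% band quoted above.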