Content audit — what we crawl, score, and generate

The multi-page audit that grades how AI-discoverable your site is, plus the JSON-LD and FAQ schema generators that fix the gaps.

5 min read

The content audit is the part of GEO Tracker that grades your own site, not the AI's answers. Once a week (and on demand) we crawl up to your plan's page cap, parse each page, and score four dimensions of "is this page set up for AI to cite?". We then generate copy-ready JSON-LD and FAQ schema wherever the score says you're missing it.

What we crawl

The audit picks pages from your sitemap, ranked by signals AI engines use to weight which pages "represent" your site:

  • Homepage — always included, it's the canonical entry.
  • Pricing — high citation pull when AI is asked about cost.
  • About / company — primary source for brand-identity claims.
  • Docs / guides — where AI typically finds technical accuracy.
  • Feature / product pages — the noun cards AI uses for "X for use case Y" sentences.
  • Blog posts ranked by lastmod priority — recent, well-described content scores higher.

Plan caps: 10 pages per weekly audit on Pro, 25 on Business. The homepage is always one of those slots. Free plan has no automated audit — see Pricing.

The four-dimensional readiness score

Each page is graded on:

| Dimension | Weight | What it checks |
| --- | --- | --- |
| Crawlability | 25 % | HTTP status, render mode, canonical, robots, language tags |
| Semantic markup | 30 % | JSON-LD coverage, schema.org type fit, structured data validity |
| Content clarity | 25 % | Title + meta description fitness, H1, semantic heading flow |
| Citation friendliness | 20 % | Author + date metadata, source links, FAQ structure, alt text on images |

The four sub-scores combine into a 0–100 readiness number per page. The homepage's number powers the Discovery Readiness card on Overview; all pages roll up to a domain-wide trend chart.
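The weighted roll-up can be sketched in a few lines. This is a minimal illustration of the combination described above, not the product's actual code; the interface and function names are assumptions.

```typescript
// Sketch: combine four 0-100 sub-scores into a 0-100 readiness number
// using the weights from the table above. Names are illustrative.
interface SubScores {
  crawlability: number;         // 0-100
  semanticMarkup: number;       // 0-100
  contentClarity: number;       // 0-100
  citationFriendliness: number; // 0-100
}

function readinessScore(s: SubScores): number {
  const weighted =
    s.crawlability * 0.25 +
    s.semanticMarkup * 0.30 +
    s.contentClarity * 0.25 +
    s.citationFriendliness * 0.20;
  return Math.round(weighted); // report as a whole number per page
}
```

A page strong on markup but weak on citation signals still lands mid-range, which is why the weights sum to 1 rather than any one dimension dominating.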

What you actually do with the score

Each row in the audit list shows the readiness number plus a small set of actionable issues — for example, "Pricing page has no Product JSON-LD", "Blog post is missing an author block", "Feature page meta description is 36 characters (target 120–160)".
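One of those checks, the meta description length rule, is simple enough to sketch. This is a hypothetical version of a single audit check, assuming the 120–160 character target quoted above; the function and its message strings are illustrative.

```typescript
// Sketch of one audit check: flag meta descriptions that are missing
// or outside the 120-160 character target. Returns null when the
// page passes, or a human-readable issue string otherwise.
function checkMetaDescription(desc: string | null): string | null {
  if (!desc || desc.length === 0) {
    return "Page has no meta description";
  }
  if (desc.length < 120 || desc.length > 160) {
    return `Meta description is ${desc.length} characters (target 120\u2013160)`;
  }
  return null; // within target: no issue to report
}
```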

For schema-related issues we offer a one-click JSON-LD generator: the dialog produces a copy-ready block typed for the page's URL pattern and pre-filled with the page's actual title, description, OG image, and canonical URL.
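The shape of that output, for a blog post, looks roughly like this. A minimal sketch assuming scraped metadata is already available; the `PageMeta` shape and function name are assumptions, while the field names follow schema.org conventions.

```typescript
// Sketch: build a copy-ready BlogPosting JSON-LD <script> block from
// a page's scraped title, description, OG image, and canonical URL.
interface PageMeta {
  url: string;         // canonical URL
  title: string;
  description: string;
  ogImage?: string;
}

function blogPostingJsonLd(page: PageMeta): string {
  const block = {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    headline: page.title,
    description: page.description,
    image: page.ogImage,
    mainEntityOfPage: page.url,
  };
  // Emit as a paste-ready script tag for the page <head>.
  return `<script type="application/ld+json">\n${JSON.stringify(block, null, 2)}\n</script>`;
}
```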

JSON-LD types we generate

We type the schema by URL pattern so what we hand you matches what AI parsers expect:

| URL pattern | Schema type generated |
| --- | --- |
| / (homepage) | Organization + WebSite |
| /about | AboutPage + Organization |
| /pricing | Product (with Offer items) |
| /blog/* | BlogPosting |
| /docs/* | TechArticle |
| /enterprise, /business | Service |
| /careers/* | JobPosting |
| Anything else | WebPage (safe default) |
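The mapping above amounts to an ordered prefix match with a safe fallback. A sketch under that assumption; the function is illustrative and ordering matters, since more specific prefixes must win before the default.

```typescript
// Sketch: map a URL path to the schema type from the table above.
// Checked in order of specificity; WebPage is the safe default.
function schemaTypeFor(path: string): string {
  if (path === "/") return "Organization + WebSite";
  if (path.startsWith("/about")) return "AboutPage + Organization";
  if (path.startsWith("/pricing")) return "Product";
  if (path.startsWith("/blog/")) return "BlogPosting";
  if (path.startsWith("/docs/")) return "TechArticle";
  if (path.startsWith("/enterprise") || path.startsWith("/business")) return "Service";
  if (path.startsWith("/careers/")) return "JobPosting";
  return "WebPage"; // anything else
}
```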

If the page is missing the data we'd need to generate honest schema (e.g. a blog post with no description), we refuse to generate and show an amber warning row instead. Filling JSON-LD with placeholders would make AI parsers worse at trusting your site, not better.
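That refusal rule is a guard in front of the generator, not a property of the schema itself. A sketch of the idea, with an assumed result shape; the warning text is illustrative.

```typescript
// Sketch: only generate schema when the required source fields exist;
// otherwise return a warning instead of emitting placeholder values.
type GenerateResult =
  | { ok: true; jsonLd: object }
  | { ok: false; warning: string };

function generateOrRefuse(meta: { title?: string; description?: string }): GenerateResult {
  if (!meta.title || !meta.description) {
    // Refuse rather than fabricate: placeholder schema erodes parser trust.
    return { ok: false, warning: "Missing title or description; not generating placeholder schema" };
  }
  return {
    ok: true,
    jsonLd: { "@type": "BlogPosting", headline: meta.title, description: meta.description },
  };
}
```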

FAQ schema generator

Separately from JSON-LD, the FAQ generator looks at any page with a <h3>Q…</h3> / <p>A…</p> pattern (or markdown equivalent) and produces a FAQPage schema block with the questions and answers preserved verbatim. This is the highest-leverage schema type for AI visibility today — engines use FAQ blocks heavily as direct answer sources.
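Once Q/A pairs are extracted from the page, assembling the FAQPage block is mechanical. This sketch assumes the pair-extraction step has already run; the `QA` shape is an assumption, and answers are carried through verbatim as described above.

```typescript
// Sketch: turn extracted question/answer pairs into a FAQPage
// JSON-LD object, preserving the answer text verbatim.
interface QA {
  question: string;
  answer: string;
}

function faqPageJsonLd(pairs: QA[]): object {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: pairs.map((p) => ({
      "@type": "Question",
      name: p.question,
      acceptedAnswer: { "@type": "Answer", text: p.answer },
    })),
  };
}
```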

Re-running an audit

Click "Re-run audit" on the dashboard hero at any time. The same scrape pass populates both the readiness score (Page health) and the multi-page audit list — they share one fetch per page, so they can never disagree.

A single audit takes 5–30 seconds depending on page count and your sitemap discovery speed. The cron runs Mondays 04:00 UTC by default.

What sits one layer below this

Content audit grades page-level signals, but it assumes AI bots can actually fetch the pages in the first place. If your robots.txt blocks OAI-SearchBot or your homepage sends X-Robots-Tag: noindex, the highest-quality JSON-LD in the world won't help. Run the AI Crawlability Monitor first — see how AI bots discover your site for the underlying taxonomy.