Evidence / lab notes, pre-registered

The measurements, the thresholds, and the dates, committed before the data.

This page exists so you can check our claims against what we said before we knew the answer. Everything here is archived append-only from run #1.

What does CiteGround measure?

Basket share of voice, grounding concentration, and brand-mention share in cited sources, each with its variance band. Never per-query rank.

Share of voice runs on baskets of 20-30 buyer prompts, 3 runs per engine per week, read over a rolling 4-week window. Grounding concentration counts the distinct domains an answer cites, which is the winnability signal: few domains, clear target list. Brand-mention share in cited sources is the intermediate metric, the fraction of cited pages per prompt that carry the brand, because it moves weeks before share of voice does. Per-query rank is excluded on principle: published tests put its repeatability under 1 percent.
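A minimal sketch of how the three metrics could be computed from archived runs. The `Run` record and its field names are hypothetical, not the harness's actual schema, and treating share of voice as the percentage of runs whose answer names the brand is an assumption; the page does not spell that formula out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One archived engine run (hypothetical record shape)."""
    prompt: str
    brand_in_answer: bool
    cited_pages: tuple  # (domain, page_mentions_brand) pairs


def share_of_voice(runs):
    """Percent of runs whose answer mentions the brand (assumed definition)."""
    if not runs:
        return 0.0
    return 100.0 * sum(r.brand_in_answer for r in runs) / len(runs)


def grounding_concentration(run):
    """Number of distinct domains cited by a single answer."""
    return len({domain for domain, _ in run.cited_pages})


def brand_mention_share(runs, prompt):
    """Percent of pages cited for one prompt that carry the brand."""
    pages = [page for r in runs if r.prompt == prompt for page in r.cited_pages]
    if not pages:
        return 0.0
    return 100.0 * sum(has_brand for _, has_brand in pages) / len(pages)
```

To read a rolling 4-week window, pass only the runs from those four weeks to `share_of_voice`; the functions themselves hold no notion of time.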

What is Sprint 0?

A pre-registered experiment on our own product: matched intervention and control prompt sets, thresholds and read dates committed before data.

The test property is citeground.com itself, our own tool in its own category, which starts at 0% share of voice pre-launch (the property history is in the log below). Two questions are under test: does our placement playbook move AI answers on a live property (Set A), and does the cheap commodity variant work at all (Set B). Set B is an experiment cell, not a client delivery method: client placements are editorial and disclosed, and paid placement "into AI answers" stays on the refuse list because it does not exist. Running this on our own product means we can publish everything, including a miss.

What did we pre-register?

Four cells across two footholds: white-hat placements vs control, and cheap paid placements vs control (the paid cell was later re-targeted to our own surfaces; see entry #004). Thresholds and verdict dates fixed in advance.

Fig. 01 / Sprint 0 pre-registration summary

Cells

Foothold 1: Set A (white-hat placements + comparison pages + owner-posted answers) vs Set C (control). Foothold 2: Set B (2-4 commodity paid niche edits) vs Set C2 (control). 15-20+ prompts per cell; grounding-page sets verified disjoint at baseline within each foothold.
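The disjointness check within a foothold could look like the sketch below. The function name and the input shape (cell name mapped to a set of grounding-page URLs) are assumptions for illustration, not the harness's real interface.

```python
from itertools import combinations


def overlapping_cells(cells):
    """Return the grounding pages shared by each pair of cells.

    cells: mapping of cell name -> set of grounding-page URLs (hypothetical shape).
    An empty result means the cells are pairwise disjoint.
    """
    overlaps = {}
    for (name_a, pages_a), (name_b, pages_b) in combinations(sorted(cells.items()), 2):
        shared = pages_a & pages_b
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps
```

An empty dictionary at baseline is the pass condition; any shared page would let an intervention leak into its control.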

Thresholds, fixed in advance

Set A reads positive on a share-of-voice delta vs control of at least +5 points, sustained across 3+ consecutive weekly runs, or on brand-mention share in cited sources up 10+ points vs control. Set B is judged on reach only: 50%+ of placed pages entering the cited set for their prompts. Set B makes no causal share-of-voice claim; the sample is too small and we say so now.
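The thresholds above can be written down as a small decision rule. This is a sketch under stated assumptions: the function names are hypothetical, deltas are in percentage points (intervention minus control), and the brand-mention delta is treated as a single number at the read, which the pre-registration text does not specify.

```python
def longest_streak(weekly_deltas, threshold):
    """Longest run of consecutive weeks with delta >= threshold."""
    best = current = 0
    for delta in weekly_deltas:
        current = current + 1 if delta >= threshold else 0
        best = max(best, current)
    return best


def set_a_positive(sov_deltas, mention_delta,
                   sov_threshold=5.0, weeks=3, mention_threshold=10.0):
    """Positive if either pre-registered condition holds."""
    sustained = longest_streak(sov_deltas, sov_threshold) >= weeks
    return sustained or mention_delta >= mention_threshold


def set_b_reach(entered_cited_set, min_share=0.5):
    """entered_cited_set: placed page URL -> whether it entered the cited set
    for its prompts (hypothetical shape). Reach only, no share-of-voice claim."""
    if not entered_cited_set:
        return False
    return sum(entered_cited_set.values()) / len(entered_cited_set) >= min_share
```

Note that a streak resets on any week below threshold, so +6, +2, +6, +2 does not count as sustained even though the average is positive.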

Read schedule

Days 14 and 30: process notes, verdict withheld. Reach read: provisional at day 30-45, confirmatory at placements-live + 6 weeks. Outcome read: no earlier than 8 weeks after interventions go live. Verdicts publish at those reads and only at those reads.
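For illustration, the schedule can be derived from one go-live date. The sketch assumes every offset counts from the same date and uses hypothetical key names; the live date passed in is whatever the log records when interventions ship.

```python
from datetime import date, timedelta


def read_schedule(live: date) -> dict:
    """Read dates as offsets from the go-live date (single-clock assumption)."""
    return {
        "process_notes": [live + timedelta(days=14), live + timedelta(days=30)],
        "reach_provisional_window": (live + timedelta(days=30),
                                     live + timedelta(days=45)),
        "reach_confirmatory": live + timedelta(weeks=6),
        "outcome_earliest": live + timedelta(weeks=8),
    }
```

The outcome entry is a floor, not a fixed date: the pre-registration says "no earlier than" 8 weeks.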

Will you publish a null result?

Yes. Results publish at the pre-registered read dates whichever way they point. A null here is a finding, and it ships on schedule.

A tool vendor cannot publish "mentions did not move citations" against its own category, and an agency selling certainty cannot publish a miss. This practice can, because the product is honest measurement plus the human fix, and a documented null is worth more to the next client than a massaged win. Holding a result back to wait for a balancing win would end this page's reason to exist.

Is llms.txt worth doing?

It is hygiene, not a lever. We ship one because it costs nothing, and we never bill for it. Third-party citations are where movement lives.

This site serves its own llms.txt, and our own checks show the file sitting on domains that are still absent from every answer that matters to them. Ship it, spend nothing on it, and put the effort where the evidence points: the third-party pages AI answers actually cite. Any vendor lining up schema and llms.txt as billable line items is billing you for hygiene.

Where is the weekly log?

Below, newest first. Entries land weekly; verdicts only land on the pre-registered dates. Thin weeks say so instead of padding.

#006
2026-07-03

Multi-engine day-0 baseline complete: 118 archived runs across ChatGPT, Perplexity, and Google AI Overviews. CiteGround is absent on 12 of 12 category prompts on every engine, 0% share of voice. The fuller read also re-ordered the category leaderboard (Profound at 28% edges Otterly at 26%; AI Overviews alone had read Otterly at 35%). Engines ground differently, measurably, again. The clock the standing test runs on starts here.

#005
2026-07-03

Test property changed to citeground.com, our own tool, before any intervention shipped: the prior property has other stakeholders and this experiment's subject should be fully ours. Day-0 baseline archived: absent on 12 of 12 category prompts, 0% share of voice; the category is fragmented (leader at 35%) and two prompts ground on just 8 distinct domains, which our winnability metric reads as takeable. Google AI Overviews baseline complete; ChatGPT and Perplexity baselines due within days. The standing test: if the playbook cannot move our own number, it does not deserve your retainer.

#004
2026-07-03

Pre-registration amendment, made before any Set B order was placed: Set B (the cheap commodity-placement cell) re-targets our own directory-listing surfaces instead of the test property. Reason: the test property has other stakeholders, and the experiment's risk should sit with the practice that benefits from it. Set A (editorial placements for the test property) is unchanged.

#003
scheduled

Day-14 checkpoint: process notes publish here, verdict withheld by design. Reading outcomes this early would report noise as signal.

#003
2026-07-03

ChatGPT and Perplexity engines added to the harness (web-grounded, API-native citations). Multi-engine baseline re-run: 242 archived runs. The verdict moved: AI-Overviews-only showed Tendro absent 20 of 20; across three engines it is absent 14 of 20 with 4% share of voice, because engines ground differently. Adding engines revised our own number, which is the point of measuring.

#002
scheduled

Sprint 0 interventions go live per cell; the ladder clock starts. Placement orders and owned content logged with timestamps.

#002
2026-07-03

Full baseline shipped: 20 prompts, 3 runs each, 63 archived runs total. Tendro absent on 20 of 20. Supersedes the 3-prompt smoke reading in #001, which stays below because the archive is append-only.

#001
2026-07-02

Pre-registration published. Baseline run archived on tendro.com: absent on 3/3 checked buyer prompts, single-run, labeled directional only. The teardown is public at /t/tendro.

Baseline artifact: the citeground.com day-0 teardown.