How every number on this site is made.

A measurement practice owes you its methodology. This page is the full disclosure: what we query, how readings are built, what the labels mean, what we refuse to measure, and what happens to your data.

What exactly do we query?

The model vendors' APIs with web grounding enabled, accessed through a data provider. Consumer apps can differ; we spot-check them and say where it matters.

Our readings query the AI vendors' models with web search enabled, accessed through DataForSEO's API layer. That is the honest answer to "is this actually ChatGPT?": these are the ChatGPT models answering with live web grounding, through an API, which is the only way answers can be archived, repeated, and compared. The consumer apps add personalization and memory on top; where a consumer surface diverges from the API reading, we run manual spot-checks and report that engine's reading as directional rather than metric-bearing. We do not use proxies that pose as human users.

Display name	Engine id	What it queries
ChatGPT (API, web-grounded)	chatgpt_d4s	OpenAI's ChatGPT models with web search enabled, via the DataForSEO LLM Responses API
Perplexity (API)	perplexity_d4s	Perplexity's sonar models (web-grounded by default), via the DataForSEO LLM Responses API
Google AI Overviews	google_aio	Live Google SERPs with AI Overviews loaded, via the DataForSEO SERP API

How is a reading built?

A basket of 12-30 buyer prompts, 3 runs per prompt per engine, rolling windows. Every run lands in an append-only archive, from run #1.

One prompt, once, is an anecdote. A reading is a basket of 12-30 prompts your buyers actually ask, each run 3 times per engine, aggregated over a rolling window. Every run is archived append-only with its timestamp: numbers get superseded by newer entries, never rewritten. The evidence log shows this in practice, including the entries where new data revised our own conclusions.
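As a rough illustration of the structure described above, here is a minimal Python sketch of a run plan and an append-only archive. It is not our production code: the names `Run`, `Archive`, and `plan_runs` are hypothetical, the prompts are placeholders, and the rolling-window aggregation is left out because its length is not specified on this page. Only the basket size (12-30), the 3 runs per prompt per engine, and the engine ids come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, List, Tuple

RUNS_PER_PROMPT = 3                  # from the text: 3 runs per prompt per engine
MIN_PROMPTS, MAX_PROMPTS = 12, 30    # from the text: basket of 12-30 prompts


@dataclass(frozen=True)              # frozen: an archived run cannot be mutated
class Run:
    engine_id: str
    prompt: str
    run_index: int
    brands: FrozenSet[str]           # brands named in the answer (hypothetical field)
    recorded_at: datetime


class Archive:
    """Append-only store: runs can be added, never edited or deleted."""

    def __init__(self) -> None:
        self._runs: List[Run] = []

    def append(self, run: Run) -> int:
        self._runs.append(run)
        return len(self._runs)       # 1-based entry number, starting at run #1

    def all(self) -> Tuple[Run, ...]:
        return tuple(self._runs)     # read-only view of the log


def plan_runs(prompts, engine_ids, runs_per_prompt=RUNS_PER_PROMPT):
    """List every (engine, prompt, run index) combination for one basket."""
    if not MIN_PROMPTS <= len(prompts) <= MAX_PROMPTS:
        raise ValueError("a basket holds 12-30 prompts")
    return [
        (engine_id, prompt, i)
        for engine_id in engine_ids
        for prompt in prompts
        for i in range(1, runs_per_prompt + 1)
    ]


# Placeholder basket: 12 prompts across the three engines listed above.
prompts = [f"hypothetical buyer prompt {n}" for n in range(1, 13)]
engines = ["chatgpt_d4s", "perplexity_d4s", "google_aio"]
plan = plan_runs(prompts, engines)

archive = Archive()
for engine_id, prompt, run_index in plan:
    # A real run would record the brands named in the answer; empty here.
    archive.append(
        Run(engine_id, prompt, run_index, frozenset(), datetime.now(timezone.utc))
    )
```

The smallest basket already produces 12 × 3 × 3 = 108 archived runs per pass, which is the point: the reading is an aggregate, not a single answer.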

What does the variance band measure?

Run-to-run agreement of the named-brand set per prompt. A brand can be stably present while the full set churns; presence and set-stability are different measurements.

Presence counts ("named in 15 of 15 runs") and the band ("set agreement 22%") can sit side by side without contradiction: the first tracks one brand, the second tracks whether the whole roster of named brands repeats run to run. A stable winner inside a churning answer is common, and it is exactly the nuance a single fake-precise number hides.
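To make the distinction concrete, here is a small Python sketch of the two measurements on made-up data. The exact agreement formula is not stated on this page, so the definition used here (the share of run pairs whose named-brand sets are identical) is an assumption for illustration, as are the function names and the brand names.

```python
from itertools import combinations


def presence(runs, brand):
    """Count the runs that name `brand`; returns (named, total)."""
    named = sum(1 for brands in runs if brand in brands)
    return named, len(runs)


def set_agreement(runs):
    """Share of run pairs with identical named-brand sets.

    Assumed definition for illustration only, not necessarily the
    formula behind the published band.
    """
    pairs = list(combinations(runs, 2))
    if not pairs:
        return 1.0                   # a single run trivially agrees with itself
    identical = sum(1 for a, b in pairs if a == b)
    return identical / len(pairs)


# Three hypothetical runs of one prompt on one engine.
runs = [
    frozenset({"Acme", "Beta"}),
    frozenset({"Acme", "Gamma"}),
    frozenset({"Acme", "Beta"}),
]

acme_presence = presence(runs, "Acme")   # (3, 3): named in every run
agreement = set_agreement(runs)          # 1 of 3 pairs identical
```

In this toy case "Acme" is named in 3 of 3 runs while only one of the three run pairs returns the same roster: a stable winner inside a churning answer.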

What don't we measure?

Per-query AI rank (under 1% repeatable in the SparkToro/Gumshoe study), personalization effects, and anything we cannot archive and re-run.

The SparkToro/Gumshoe study (2,961 runs of identical prompts, late 2025) found that the same brand list was returned less than 1% of the time. We will not sell a metric that fails that test. Nor do we model per-user personalization: our readings are the unpersonalized web-grounded answer, which is the shared baseline every buyer variation departs from.

How is your data handled?

Prospect and client readings stay private. We never publish a reading of a company in our pipeline; published teardowns are our own properties or labeled category readings.

GapCheck submissions and client baskets are working data, not content. Published teardowns are limited to our own properties and clearly labeled category readings of public answers. Client results are published only when the contract explicitly includes naming rights.