Deepak Gupta

By Deepak Gupta | Published June 1, 2026 | GEO

The GEO Measurement Study: 50,000 AI Citations in 90 Days, What Actually Moves Citation Share

I tracked 50,000 citations across ChatGPT Search, Perplexity, Claude, Gemini, Google AI Overviews, and Bing Copilot for 90 days. What actually moved citation share, and what didn't.

The tracking window opened in February 2026 and covered the six major grounded engines. Here is the methodology, the per-engine results, and the patterns that actually moved citation share.

TL;DR

240 pages, 200 prompts, six engines, twice a week for 13 weeks. 50,431 citations recorded.
Deep sameAs on Person and Organization schema produced the largest lift: +34% overall, +52% on Gemini and Bing.
Visible dates with matching dateModified came second: +22% overall, +41% on Claude.
Backlinks, keyword density, and length for its own sake moved nothing inside the retrieved set.
Gated pages earned 14 citations against 1,847 for their ungated equivalents.
This is first-party observation on one corpus, with changes rolled out sequentially rather than isolated. Read the lifts as direction, not as effect sizes.

Most GEO advice still reads like 2024 SEO advice with the words swapped. "Write helpful content." "Build authority." "Optimise for entities." Useful as far as it goes, but unmeasured. So I ran an experiment with a fixed corpus, a fixed query set, and a tracker that pulled live responses from six engines twice a week for 13 weeks. What follows is what the data showed, not what the playbooks say.

The setup, in plain terms

The corpus was 240 pages spread across CIAM Compass, GEO Compass, Hash Lab, and the apex blog at guptadeepak.com. The query set was 200 prompts, partitioned five ways: 40 definitional ("what is X"), 40 comparison ("X vs Y"), 40 implementation ("how do I implement X"), 40 buyer-intent ("best X for Y stage"), and 40 freshness-sensitive ("latest changes to X in 2026").

Each prompt was issued twice a week to six engines: ChatGPT Search, Perplexity, Claude (with web search on), Gemini, Google AI Overviews, and Bing Copilot. A citation counted as either an inline link or a citation block referencing one of the 240 tracked URLs. The methodology mirrors the one I described in measuring AI visibility and is consistent with the broader benchmark I keep updated at geo-compass/methodology.
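The counting rule above comes down to URL matching. This is a minimal sketch, not the tracker used in the study. It assumes each engine response has already been reduced to a list of cited URLs (inline links plus citation-block links). The normalisation rules are my assumptions about what "referencing a tracked URL" should mean: lowercase host, no leading `www.`, and query strings, fragments and trailing slashes ignored. So is counting each tracked page at most once per response.

```python
from urllib.parse import urlsplit


def normalise(url: str) -> str:
    """Reduce a URL to host + path so trivially different forms compare equal.

    Assumed rules: case-insensitive host, no leading "www.", and query
    strings, fragments, and trailing slashes ignored.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    return f"{host}{path}"


def count_tracked_citations(cited_urls: list[str], tracked_urls: set[str]) -> int:
    """Count distinct tracked pages cited in one engine response.

    A page linked both inline and in the citation block counts once
    (an assumption, not necessarily the study's rule).
    """
    tracked = {normalise(u) for u in tracked_urls}
    hits = {normalise(u) for u in cited_urls} & tracked
    return len(hits)


# Hypothetical example: example.com URLs stand in for the tracked corpus.
tracked = {"https://example.com/guides/ciam", "https://example.com/geo/methodology/"}
response = [
    "https://www.example.com/guides/ciam?utm_source=chatgpt",
    "https://example.com/guides/ciam#section-2",
    "https://other.example.org/post",
]
print(count_tracked_citations(response, tracked))  # 1
```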

Total citations across the 90 days: 50,431. The split by engine is the first interesting result.

Parameter	Value
Corpus	240 pages across CIAM Compass, GEO Compass, Hash Lab, and the apex blog
Query set	200 prompts: 40 definitional, 40 comparison, 40 implementation, 40 buyer-intent, 40 freshness-sensitive
Engines	ChatGPT Search, Perplexity, Claude with web search, Gemini, Google AI Overviews, Bing Copilot
Cadence	Twice weekly for 13 weeks, starting February 2026
Citation definition	Inline link or citation block referencing one of the 240 tracked URLs
Total recorded	50,431 citations
Design limitation	Changes rolled out sequentially, not isolated. Lifts are observational, not causal
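A rough scale check helps put the total in context. This assumes every prompt ran on every engine at every scheduled pass with no failed pulls, which the study does not state; it reports only the totals. Under that assumption, the run count and the average number of tracked citations per response work out as:

```latex
\[
200 \text{ prompts} \times 6 \text{ engines} \times 2 \text{ runs/week} \times 13 \text{ weeks} = 31{,}200 \text{ responses}
\]
\[
\frac{50{,}431 \text{ citations}}{31{,}200 \text{ responses}} \approx 1.62 \text{ tracked citations per response}
\]
```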

Per-engine behaviour differs more than the playbooks admit

ChatGPT Search produced 18,200 citations (36%). It is the most generous citer on definitional and implementation queries and the most concentrated on the top three sources per answer. If you only optimise for one engine, optimise for the one your buyers use most, which is still ChatGPT in 2026.

Perplexity produced 12,900 citations (26%). Perplexity rewards breadth: it routinely cites 6 to 12 sources per answer, which means it is the easiest engine to enter and the hardest engine to dominate. A property with 8 citation-eligible pages on a topic gets cited as a side reference even when it is not the lead source.

Google AI Overviews produced 8,400 citations (17%) but with a heavy tilt toward properties that already rank in the top 5 of classic Google for the same query. Backlinks still matter here, indirectly, because they shape what gets into the retrieval window.

Claude (with web search on) produced 5,600 citations (11%). Claude is the strictest about freshness: it actively penalises content without a visible dateModified within the last 6 months on time-sensitive queries.

Gemini produced 3,500 citations (7%) and Bing Copilot produced 1,831 (3.6%). These two are the engines where entity authority moves the needle most, plausibly because each leans heavily on a knowledge graph lookup (Google's for Gemini, Microsoft's for Bing Copilot) before grounding.

Engine	Citations	Share	Behaviour
ChatGPT Search	18,200	36%	Generous on definitional and implementation, concentrated on top three sources
Perplexity	12,900	26%	Rewards breadth, 6 to 12 sources per answer. Easy to enter, hard to dominate
Google AI Overviews	8,400	17%	Tilted toward properties already ranking top 5 in classic Google
Claude	5,600	11%	Strictest on freshness. Penalises stale dateModified on time-sensitive queries
Gemini	3,500	7%	Heavy knowledge-graph lookup before grounding. Entity authority moves it most
Bing Copilot	1,831	3.6%	Same pattern as Gemini. Entity authority moves it most

The takeaway is that share-of-answer is not a single number. It is six numbers, and the same content can rank top-3 on one engine and be invisible on another. State of AI Search 2026 covers the engine differences in more depth.
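Reporting share as six numbers is a small aggregation. This is a minimal sketch that assumes citations have already been tallied per engine. The counts are the study's per-engine totals from the table above, and the function name is illustrative.

```python
def citation_share(counts: dict[str, int]) -> dict[str, float]:
    """Return each engine's share of all tracked citations, in percent (1 d.p.)."""
    total = sum(counts.values())
    if total == 0:
        return {engine: 0.0 for engine in counts}
    return {engine: round(100 * n / total, 1) for engine, n in counts.items()}


# Per-engine totals reported in this study.
study_counts = {
    "ChatGPT Search": 18_200,
    "Perplexity": 12_900,
    "Google AI Overviews": 8_400,
    "Claude": 5_600,
    "Gemini": 3_500,
    "Bing Copilot": 1_831,
}

for engine, share in citation_share(study_counts).items():
    print(f"{engine:<20} {share:>5.1f}%")
```

This reproduces the table's shares at one decimal place: 36.1, 25.6, 16.7, 11.1, 6.9 and 3.6 percent.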

I made 12 discrete changes over the 90 days, one or two per week, and watched what moved. Five categories of change produced statistically meaningful lift. Most of the rest were noise.

1. Person and Organization schema with sameAs depth

The single highest-impact change was adding deep sameAs arrays to Person and Organization JSON-LD blocks across the corpus. "Deep" means at least 8 entries per entity: Wikidata, Crunchbase, LinkedIn, GitHub, the company About page, the personal site, the Twitter or X profile, and a Mastodon or Bluesky profile URL.

Citation lift after rollout: +34% across all engines, +52% on Gemini and Bing specifically. The likely mechanism: both engines appear to do entity-linking before grounding, and an entity that resolves to the same Wikidata QID across 8+ external sources seems to be treated as more authoritative than one that resolves through a single thin About page. I covered the implementation in entity authority for AI engines.
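A deep sameAs block for a Person might look like the following, and the same pattern applies to Organization. This is an illustrative sketch. Every URL and the Wikidata QID are hypothetical placeholders, not the real profiles used in the study. The eight-entry threshold is the study's working definition of "deep", not a documented requirement of any engine, and `check_same_as` is a hypothetical helper.

```python
import json

MIN_SAME_AS = 8  # the study's working definition of "deep"; not an engine rule

# Hypothetical placeholder profiles; replace with the entity's real URLs.
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Jane Example",
    "url": "https://jane.example.com/",
    "sameAs": [
        "https://www.wikidata.org/wiki/Q00000000",  # placeholder QID
        "https://www.crunchbase.com/person/jane-example",
        "https://www.linkedin.com/in/jane-example",
        "https://github.com/jane-example",
        "https://example.com/about",
        "https://jane.example.com/",
        "https://x.com/jane_example",
        "https://bsky.app/profile/jane.example.com",
    ],
}


def check_same_as(entity: dict, minimum: int = MIN_SAME_AS) -> list[str]:
    """Return human-readable problems with an entity's sameAs array."""
    links = entity.get("sameAs", [])
    problems = []
    distinct = len(set(links))
    if distinct < minimum:
        problems.append(f"only {distinct} distinct sameAs entries (< {minimum})")
    if not any("wikidata.org/wiki/Q" in link for link in links):
        problems.append("no Wikidata QID link")
    return problems


print(check_same_as(person))  # []
# Embed the output in a <script type="application/ld+json"> element.
print(json.dumps(person, indent=2))
```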

2. Explicit dating and dateModified discipline

Adding visible "Published" and "Last updated" dates to every page, with matching datePublished and dateModified in JSON-LD, produced the second-largest lift.
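One way to keep the visible dates and the JSON-LD from drifting apart is to render both from the same two values. This is a minimal sketch. The `@type` of Article and the HTML markup are illustrative assumptions, not a required format.

```python
import json
from datetime import date


def human(d: date) -> str:
    """Format a date for readers, e.g. 'June 1, 2026'."""
    return f"{d:%B} {d.day}, {d.year}"


def render_dates(published: date, modified: date) -> tuple[str, str]:
    """Build the visible date line and the JSON-LD fragment from the same two values."""
    if modified < published:
        raise ValueError("dateModified cannot precede datePublished")
    visible = (
        f'<p>Published <time datetime="{published.isoformat()}">{human(published)}</time>. '
        f'Last updated <time datetime="{modified.isoformat()}">{human(modified)}</time>.</p>'
    )
    json_ld = json.dumps({
        "@context": "https://schema.org",
        "@type": "Article",
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    })
    return visible, json_ld


html, ld = render_dates(date(2026, 6, 1), date(2026, 7, 15))
print(html)
print(ld)
```

Because both strings come from one call, a page cannot show one date in the header and a different one in its structured data.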

Citation lift after rollout: +22% overall, +41% on Claude (the strictest engine on freshness), +18% on Perplexity. Claude likely moves so much because it appears to down-weight time-sensitive content with a stale or missing dateModified, and most properties simply don't expose dateModified at all.

Important nuance: dateModified that lies will get you penalised once detected. I refreshed the actual content quarterly on the 60 most-cited pages. Bumping the dateModified field without changing the content produced no measurable lift on the second pass.
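Bumping the field without changing the content did nothing, so it is worth making the build refuse to do it. This is a minimal sketch. It assumes a hypothetical store that keeps each page's last content hash alongside its dateModified. What counts as "content" here, the body text with whitespace collapsed, is also an assumption.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class PageState:
    content_hash: str
    date_modified: date


def body_hash(body_text: str) -> str:
    """Hash the body with whitespace collapsed so reflowing lines is not a 'change'."""
    normalised = " ".join(body_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def next_state(prev: PageState, body_text: str, today: date) -> PageState:
    """Advance dateModified only when the body text actually changed."""
    new_hash = body_hash(body_text)
    if new_hash == prev.content_hash:
        return prev  # no real edit, so keep the existing dateModified
    return PageState(content_hash=new_hash, date_modified=today)


state = PageState(body_hash("Original analysis."), date(2026, 3, 1))
same = next_state(state, "Original   analysis.", date(2026, 6, 1))
edited = next_state(state, "Original analysis, with Q2 data added.", date(2026, 6, 1))
print(same.date_modified, edited.date_modified)  # 2026-03-01 2026-06-01
```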

3. llms.txt and llms-full.txt presence

Publishing /llms.txt and /llms-full.txt at the root of each portal produced a clear lift, but smaller than the schema and date changes.
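For reference, the llms.txt proposal (llmstxt.org) describes a Markdown file with an H1 name, a blockquote summary, and H2 sections of link lists. This is a minimal generator sketch. The site name, section name and URLs are hypothetical placeholders, not the portals' real files.

```python
def build_llms_txt(name: str, summary: str,
                   sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render an llms.txt body: H1 name, blockquote summary, H2 link-list sections.

    Each section holds (title, url, note) tuples rendered as "- [title](url): note";
    an empty note drops the ": note" suffix.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            entry = f"- [{title}]({url})"
            lines.append(f"{entry}: {note}" if note else entry)
        lines.append("")
    return "\n".join(lines)


# Hypothetical placeholder content.
print(build_llms_txt(
    "Example Compass",
    "Independent benchmarks and vendor scoring for customer identity platforms.",
    {
        "Reference": [
            ("Scoring methodology", "https://example.com/methodology", "How vendors are scored"),
            ("Vendor index", "https://example.com/vendors", ""),
        ],
    },
))
```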

Citation lift after rollout: +11% overall, concentrated almost entirely on Claude and Perplexity. ChatGPT Search and Google AI Overviews showed no measurable change. My working hypothesis at the time was that some engines prioritise llms.txt-listed URLs in their crawl frontier, and that Claude's web tool reads llms.txt when one is present.

Update, July 2026. Large-scale log evidence published since this experiment does not support a general llms.txt effect. Ahrefs analysed 137,000 domains and found 97% of llms.txt files received zero requests in May 2026, with AI retrieval bots accounting for roughly 1.1% of requests to the files that were fetched. My +11% observation came from a window in which several changes shipped together, so the design does not support attributing it cleanly to llms.txt. Treat it as unreplicated. The fuller picture is in the llms.txt guide.
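Given the log evidence, the cheapest way to test the llms.txt claim on your own property is to check whether anything fetches the file at all. This is a minimal sketch. It assumes access logs in the Combined Log Format. The user-agent tokens are an illustrative, incomplete list of AI crawlers and retrieval agents, so check each vendor's published bot documentation before relying on it. The sample log lines are synthetic.

```python
import re
from collections import Counter

# Combined Log Format: ... "GET /path HTTP/1.1" status bytes "referer" "user-agent"
LINE_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"')

# Illustrative, incomplete list of AI crawler / retrieval user-agent tokens.
AI_AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "Claude-User",
             "PerplexityBot", "Perplexity-User")
TARGETS = ("/llms.txt", "/llms-full.txt")


def llms_txt_fetches(log_lines) -> Counter:
    """Count requests for /llms.txt and /llms-full.txt, grouped by AI agent or 'other'."""
    counts = Counter()
    for line in log_lines:
        m = LINE_RE.search(line)
        if not m or m.group("path").split("?")[0] not in TARGETS:
            continue
        agent = next((a for a in AI_AGENTS if a in m.group("ua")), "other")
        counts[agent] += 1
    return counts


# Synthetic sample lines (documentation IP ranges).
sample = [
    '203.0.113.5 - - [12/May/2026:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 812 "-" '
    '"Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"',
    '198.51.100.7 - - [12/May/2026:10:05:00 +0000] "GET /llms.txt HTTP/1.1" 200 812 "-" "curl/8.5.0"',
    '198.51.100.7 - - [12/May/2026:10:06:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "curl/8.5.0"',
]
print(dict(llms_txt_fetches(sample)))  # {'GPTBot': 1, 'other': 1}
```

If the AI-agent rows stay empty over a few weeks of real logs, that is a strong hint that any observed lift came from something else.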

4. Chunk-level structure with citable sentences

I rewrote 30 high-value pages so that each H2 and H3 was followed by a single-sentence answer (the "claim"), then the supporting paragraphs. The pattern matches what I described in citation-worthy content patterns.

Citation lift after the 30-page rewrite: +18% on the rewritten pages, with no measurable carryover to the rest of the corpus. The lift was concentrated on definitional and implementation queries. This is the change that has the biggest per-page ROI, but it doesn't scale through a sitewide template change. You have to actually rewrite the prose.
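Because this change has to be made by hand, a lint that flags headings without a one-sentence claim helps keep rewritten pages from drifting. This is a minimal sketch. It assumes pages are authored in Markdown with `##` and `###` headings. The sentence splitter is a crude heuristic that splits on `.`, `?` or `!` followed by whitespace, so abbreviations like "e.g." will trip it. Treat the output as a review prompt, not a verdict.

```python
import re

HEADING_RE = re.compile(r"^#{2,3}\s+(.*)$")  # H2 and H3 only
# Crude sentence boundary: terminal punctuation followed by whitespace.
SENTENCE_END_RE = re.compile(r"[.!?](?=\s)")


def first_paragraph_after(lines: list[str], start: int) -> str:
    """Return the first non-empty paragraph after line index `start`, joined into one string."""
    para: list[str] = []
    for line in lines[start + 1:]:
        if not line.strip():
            if para:
                break
            continue
        if line.startswith("#"):
            break  # next heading reached before any paragraph text
        para.append(line.strip())
    return " ".join(para)


def lint_claims(markdown: str) -> list[str]:
    """Flag H2/H3 headings whose first paragraph is missing or longer than one sentence."""
    lines = markdown.splitlines()
    problems = []
    for i, line in enumerate(lines):
        m = HEADING_RE.match(line)
        if not m:
            continue
        para = first_paragraph_after(lines, i)
        if not para:
            problems.append(f"{m.group(1)}: no claim sentence")
        elif len(SENTENCE_END_RE.findall(para + " ")) > 1:
            problems.append(f"{m.group(1)}: first paragraph has more than one sentence")
    return problems


doc = """## What is CIAM?

CIAM is identity management for customer-facing applications.

Supporting detail goes here.

## CIAM vs IAM

The scope differs. So do the scale requirements.
"""
print(lint_claims(doc))  # ['CIAM vs IAM: first paragraph has more than one sentence']
```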

5. Methodology pages

Adding visible methodology pages for benchmarks, vendor scoring, and any quantitative claim produced a lift specifically on "how was X measured" and "is X trustworthy" types of grounding follow-ups.

Citation lift after rollout: +9% overall, +24% on buyer-intent queries. Buyer-intent queries are the highest-conversion category, so this is the change I would prioritise if I were starting over. The pattern is simple: every vendor scoring page now links to a top-level methodology page that explains the rubric. See geo-compass/methodology for the model.
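The rule that every scoring page links to the methodology page is easy to enforce in CI. This is a minimal sketch that uses only the standard library's HTML parser. It assumes the methodology page lives at a hypothetical `/methodology` path on the same host, so adjust the target to your own URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect absolute href targets from <a> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(urljoin(self.base_url, href))


def links_to_methodology(page_url: str, html: str, methodology_path: str = "/methodology") -> bool:
    """True if the page links to the methodology path on its own host."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    collector.close()
    host = urlsplit(page_url).netloc
    return any(
        urlsplit(link).netloc == host and urlsplit(link).path.rstrip("/") == methodology_path
        for link in collector.links
    )


page = '<p>Scores follow our <a href="/methodology/">published rubric</a>.</p>'
print(links_to_methodology("https://example.com/vendors/acme", page))  # True
```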

What did not move citation share is the part of the experiment that surprised me most.

Backlinks did not move citation share within the retrieved set. They still matter for SEO retrieval (Google AI Overviews and Bing both weight them), and they correlate with citation share because authoritative properties tend to have both. But within a controlled corpus, acquiring 40 new backlinks to 10 of the tracked pages produced no measurable lift in AI citations to those pages, while the same pages did move on Google's classic results.

Generic FAQ schema is now actively devalued. Three engines (Claude, Gemini, ChatGPT) appear to discount FAQ schema generated from boilerplate questions that don't match the body content. I removed FAQ schema from 12 pages where the FAQ was a tacked-on appendix. Citation share on those pages went up by 7% on average over the following 4 weeks. The lesson: FAQ schema should only wrap genuinely answer-shaped content that's already in the body, not a separate list bolted on.
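A quick audit catches the tacked-on case: flag FAQPage entries whose answer text does not appear in the visible body. This is a minimal sketch. It assumes you have already extracted the page's FAQPage JSON-LD and its visible body text. Exact substring matching after case and whitespace normalisation is a deliberately strict, crude proxy for "already in the body". The sample questions and answers are hypothetical.

```python
def _norm(text: str) -> str:
    """Lowercase and collapse whitespace for a crude containment test."""
    return " ".join(text.lower().split())


def bolted_on_faqs(faq_jsonld: dict, body_text: str) -> list[str]:
    """Return questions whose accepted answer text is not present in the page body."""
    body = _norm(body_text)
    flagged = []
    for item in faq_jsonld.get("mainEntity", []):
        answer = item.get("acceptedAnswer", {}).get("text", "")
        if not answer or _norm(answer) not in body:
            flagged.append(item.get("name", "(unnamed question)"))
    return flagged


faq = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {"@type": "Question", "name": "What does the tracker count?",
         "acceptedAnswer": {"@type": "Answer",
                            "text": "An inline link or citation block referencing a tracked URL."}},
        {"@type": "Question", "name": "Why choose us?",
         "acceptedAnswer": {"@type": "Answer", "text": "We are the best."}},
    ],
}
body = "The tracker counts an inline link or citation block   referencing a tracked URL. Nothing else counts."
print(bolted_on_faqs(faq, body))  # ['Why choose us?']
```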

Length for its own sake did not move citation share. I expanded 8 pages from ~1,500 words to ~3,500 words by adding examples, edge cases, and longer explanations. Citation share on those pages was statistically flat. The pages that did move on length were the ones where the added content was a new H2 answering a query not previously addressed at all.

Keyword density did not move citation share. This should be obvious in 2026 but is still in some playbooks. I varied keyword frequency by 3x on a test set of 6 pages. Zero correlation with citation share on any engine.

The surprising finding: gated content has effectively zero citation share

Two of the 240 tracked URLs were gated whitepaper landing pages (form-fill required to access the PDF). Across 90 days, they earned 14 citations combined, or 0.03% of the total. The non-gated equivalents (open analyses on the same topics) earned 1,847 citations combined.

This confirms the argument in why gated whitepapers are killing your AI visibility. The grounding pipelines simply do not see what's behind a form. If your best analysis is gated, it might as well not exist as far as AI engines are concerned.

What this means for next quarter's content investment

If I had to pick three things for a B2B SaaS team starting from zero:

Fix entity authority first. Deep sameAs on Person and Organization schema, consistent canonical names across the property, methodology pages. The lift is large and the work takes one engineer-week. Vendor selection: see GEO Compass vendor index.
Add visible dates and dateModified discipline second. Every page gets its dates rendered in the header and matched in JSON-LD, with a quarterly content refresh on the top 60.
Rewrite the top 20 pages for chunk-level structure third. Claim first, support after, one idea per sentence on technical content. Citation share goes up roughly 15 to 20% on the rewritten pages within 60 days.

Everything else (llms.txt, schema beyond Person/Org, FAQ rewrites) is incremental. Useful, but not the headline. Citation share is the metric you should be reporting to your board, and the work above is what moves it.

The broader picture

Two pieces of context for anyone reading this and wondering whether to bother. First, my apex piece on the 70% product content advantage still holds: product content earns the citations, blogs don't. The 50,000-citation data confirmed it. Of the 50,431 citations, 38,200 (76%) landed on product-style pages: vendor profiles, comparison tables, algorithm reference pages, methodology pages. Blog posts earned 12,231 citations (24%) despite making up roughly 40% of the tracked corpus.

Second, the architecture matters. The grounding pipelines are RAG pipelines, and the same retrieval-quality principles apply. Understanding RAG architecture is the technical foundation if you want to reason from first principles rather than copy a checklist.

I will refresh this study in August 2026 with another 90-day window and publish the deltas. If you want to replicate it on your own corpus, the methodology is open at geo-compass/methodology.
