Deepak Gupta

By Deepak Gupta · Published June 27, 2026 · GEO

GEO Is Experimental Science: How to Run AI Visibility Like a Product Team

SEO became a craft because you could reverse-engineer one ranking function. GEO can't work that way: you're optimizing against many non-deterministic models with probabilistic citation behavior. That makes GEO experimental science, and it should be run like a product team runs experiments.

SEO eventually became a craft. A set of known techniques, applied with skill, producing reasonably predictable results. Do these things to a page, and rankings tend to move in knowable ways. The reason it could become a craft is that you were optimizing against a single ranking function with observable, stable-enough signals that you could reverse-engineer cause and effect.

GEO cannot become that kind of craft, and the people treating it like one are going to keep being frustrated. You are not optimizing against one ranking function. You are optimizing against many different models, each non-deterministic, each with its own citation behavior, none of them offering a rank position you can reverse-engineer. That is not a craft problem. It is an experimental science problem, and it should be approached the way a good product team approaches experiments: with hypotheses, controlled tests, measurement, and iteration.

I have argued throughout this series that GEO is a product discipline. This is the methodological half of that argument. Not just that GEO lives in the product, but that GEO is done the way product work is done, through experimentation against an uncertain system, rather than through executing a fixed playbook. Here is what that means in practice.

Why GEO Resists the Playbook

Start with why the deterministic, playbook approach that worked for mature SEO breaks down for GEO. There are three structural reasons.

You are optimizing against many systems, not one. ChatGPT, Perplexity, Claude, Gemini, Google AI Overviews, and the rest each ground and cite differently. They have different training, different retrieval mechanisms, different citation rates, and famously low overlap in what they cite. A change that improves your standing on one engine may do nothing, or the opposite, on another. There is no single target to optimize against, which means there is no single playbook that works across all of them.

The systems are non-deterministic. The same prompt can produce different answers on different runs. Citation is probabilistic, not deterministic: you have a probability of being included in an answer, not a fixed position you either hold or do not. This means you cannot read a single result as ground truth the way you could read a Google rank. You have to think in distributions and probabilities, which is the language of experiments, not of checklists.

You cannot reverse-engineer a position that does not exist. Mature SEO worked partly because you could observe your rank, change something, and observe the rank move, establishing cause and effect. There is no equivalent in GEO. There is no rank to watch. The citation behavior emerges from model internals and retrieval processes you cannot inspect, and it shifts as models update. You are optimizing a black box that changes underneath you, against a probabilistic outcome, across multiple different systems. No amount of accumulated technique turns that into a deterministic craft.

Put together, these mean GEO has the essential character of experimental science: an uncertain, changing system that you can only understand through structured testing and measurement, never through a fixed set of known-correct moves.

The Experimental Loop

If GEO is experimental science, then doing it well means running the experimental loop deliberately, the same loop a product team runs when optimizing against uncertain user behavior. Four steps, repeated.

Form a hypothesis. Start with a specific, testable belief about what drives citation in your category, on the engines that matter to you. Not "better content gets cited more," which is too vague to test, but something like "publishing a structured comparison page that directly answers the top three constraint-laden queries our buyers ask will increase our citation share on Perplexity for those queries." A good GEO hypothesis names the change, the engine, the queries, and the expected direction of the effect. Specificity is what makes it testable.
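
One way to enforce that specificity is to write each hypothesis down as a structured record and refuse to run it until every part is filled in. The sketch below is only an illustration: the class name, field names, and the placeholder queries are all hypothetical, not part of any GEO tool.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class GeoHypothesis:
    """One testable GEO hypothesis. All field names are illustrative."""
    change: str                    # the single change being made
    engine: str                    # the engine the prediction applies to
    queries: tuple[str, ...]       # the tracked queries the change targets
    expected_direction: Literal["increase", "decrease"]

    def missing_parts(self) -> list[str]:
        """Return the names of the required parts that are still empty."""
        missing = []
        if not self.change.strip():
            missing.append("change")
        if not self.engine.strip():
            missing.append("engine")
        if not self.queries:
            missing.append("queries")
        if self.expected_direction not in ("increase", "decrease"):
            missing.append("expected_direction")
        return missing

    def is_testable(self) -> bool:
        return not self.missing_parts()


# "Better content gets cited more" names no engine and no queries.
vague = GeoHypothesis(
    change="better content",
    engine="",
    queries=(),
    expected_direction="increase",
)

# The comparison-page hypothesis names all four parts (queries are placeholders).
specific = GeoHypothesis(
    change="publish a structured comparison page",
    engine="Perplexity",
    queries=("constraint query 1", "constraint query 2", "constraint query 3"),
    expected_direction="increase",
)

print(vague.missing_parts())   # the parts the vague version leaves out
print(specific.is_testable())
```

The check is deliberately shallow. It cannot tell whether a hypothesis is a good one, only whether it is complete enough to test.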

Test in a controlled way. Make the change and isolate it as much as you can, so that any movement can be attributed to the change rather than to noise or to a dozen simultaneous edits. This is harder in GEO than in a classic A/B test because you often cannot run a true controlled experiment on a live model. But you can approximate control: change one meaningful thing at a time, track a defined set of queries before and after, and watch a holdout set of queries you did not touch to gauge background variation. The discipline of changing one thing and measuring is what separates experimentation from guessing.
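
The holdout idea can be sketched as a simple difference-in-differences calculation: take the before/after change in citation rate on the queries you targeted, then subtract the before/after change on the queries you left alone. The function names and the sample counts below are made up for illustration, and the sketch assumes each observation is one sampled run recorded as cited or not cited.

```python
def citation_rate(observations: list[bool]) -> float:
    """Share of sampled runs in which the brand was cited."""
    if not observations:
        raise ValueError("need at least one observation")
    return sum(observations) / len(observations)


def holdout_adjusted_lift(
    treated_before: list[bool],
    treated_after: list[bool],
    holdout_before: list[bool],
    holdout_after: list[bool],
) -> float:
    """Change on the treated queries minus the background change on the holdout."""
    treated_delta = citation_rate(treated_after) - citation_rate(treated_before)
    holdout_delta = citation_rate(holdout_after) - citation_rate(holdout_before)
    return treated_delta - holdout_delta


def runs(cited: int, total: int) -> list[bool]:
    """Build a list of run outcomes from counts (helper for the example)."""
    return [True] * cited + [False] * (total - cited)


# Hypothetical counts: treated queries go from 10/50 to 20/50 cited,
# while untouched holdout queries drift from 10/50 to 15/50 on their own.
lift = holdout_adjusted_lift(
    treated_before=runs(10, 50),
    treated_after=runs(20, 50),
    holdout_before=runs(10, 50),
    holdout_after=runs(15, 50),
)
print(f"{lift:.2f}")  # prints 0.10
```

In this made-up example the raw gain on the treated queries is 20 points, but half of it also shows up on the holdout, so only about 10 points can plausibly be credited to the change. The adjustment is only as good as the holdout: it assumes the untouched queries are subject to the same background variation as the treated ones.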

Measure the probabilistic outcome. Because citation is probabilistic and varies by run and by engine, measurement means tracking citation behavior across a defined query set, on each engine separately, over enough observations to see a real signal through the noise. A single check tells you almost nothing; a tracked query set sampled repeatedly tells you whether your citation probability actually shifted. This is why measurement infrastructure matters in GEO: you are estimating a distribution, not reading a position, and that requires sampling over time rather than a one-shot look.
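
To make "a single check tells you almost nothing" concrete, here is a minimal sketch that puts an uncertainty range around an observed citation rate using the Wilson score interval. The choice of interval is mine, not something the argument depends on, and the counts are hypothetical. It also assumes the sampled runs are roughly independent, which repeated runs of the same query may not be.

```python
import math


def wilson_interval(cited: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a citation rate (z = 1.96)."""
    if runs <= 0 or not 0 <= cited <= runs:
        raise ValueError("need runs > 0 and 0 <= cited <= runs")
    p = cited / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half_width = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# One check in which the brand happened to be cited.
single_low, single_high = wilson_interval(cited=1, runs=1)

# A tracked query set sampled 200 times on one engine, cited in 40 runs.
tracked_low, tracked_high = wilson_interval(cited=40, runs=200)

print(f"single check: {single_low:.2f} to {single_high:.2f}")
print(f"200 samples:  {tracked_low:.2f} to {tracked_high:.2f}")
```

With one observation the plausible range covers most of the scale, so the result is compatible with nearly any underlying citation probability. With 200 samples the range narrows to roughly 15% to 26%, which is tight enough to compare against a later measurement.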

Iterate on what you learn. Feed the result back into the next hypothesis. A change that worked on Perplexity but not ChatGPT teaches you something about how those engines differ for your content, which sharpens your next experiment. A change that did nothing rules out a hypothesis and frees you to test a better one. Over many cycles, you build a model, not of a fixed algorithm, but of how citation tends to behave in your specific category, on the engines your buyers use. That accumulated, tested understanding is the real asset, and it is earned through iteration, not handed over in a playbook.
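
A rough way to read one experiment engine by engine is a two-proportion z-test on the before and after counts for each engine. This is a sketch under loud assumptions: the counts are invented, the 1.96 threshold is a convention rather than a rule, and the test treats runs as independent, so treat the verdicts as a prompt for the next hypothesis rather than a proof.

```python
import math


def two_proportion_z(
    cited_before: int, runs_before: int, cited_after: int, runs_after: int
) -> float:
    """z statistic for the change in citation rate between two sampling periods."""
    p_before = cited_before / runs_before
    p_after = cited_after / runs_after
    pooled = (cited_before + cited_after) / (runs_before + runs_after)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / runs_before + 1 / runs_after))
    if std_err == 0:
        return 0.0  # both periods were all-cited or all-uncited
    return (p_after - p_before) / std_err


def verdict_by_engine(
    results: dict[str, tuple[int, int, int, int]], z_threshold: float = 1.96
) -> dict[str, str]:
    """Label each engine's result as an increase, a decrease, or no clear change."""
    verdicts = {}
    for engine, counts in results.items():
        z = two_proportion_z(*counts)
        if z >= z_threshold:
            verdicts[engine] = "increase"
        elif z <= -z_threshold:
            verdicts[engine] = "decrease"
        else:
            verdicts[engine] = "no clear change"
    return verdicts


# Hypothetical counts: (cited_before, runs_before, cited_after, runs_after)
experiment = {
    "Perplexity": (30, 200, 60, 200),
    "ChatGPT": (40, 200, 44, 200),
}
verdicts = verdict_by_engine(experiment)
print(verdicts)
```

In this invented example the change clears the threshold on Perplexity and not on ChatGPT, which is exactly the kind of split result that should shape the next experiment.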

Why This Is Product Work, Not Marketing Work

The experimental loop I just described is exactly how good product teams operate. They form hypotheses about user behavior, ship controlled changes, measure outcomes that are inherently probabilistic, and iterate toward understanding. They are comfortable with uncertainty, with measuring distributions, with the idea that you learn the system by testing it rather than by reading its manual, because there is no manual.

This is a different posture from how marketing has often approached SEO, which over time became more execution-of-known-techniques than experimentation-against-uncertainty. The mature SEO playbook was real and worked because the underlying system was stable enough to support it. GEO does not offer that stability, which is why importing the playbook mindset fails and why the experimental mindset succeeds.

It is also why the people doing GEO well increasingly look like product people running experiments rather than marketers executing a checklist. The work rewards hypothesis-driven iteration on signals, many of which are product-resident (the names, the docs, the structured data, the self-description), tested against an uncertain, multi-engine, probabilistic target. That is product methodology applied to product signals, which is about as clean a case as you can make for why GEO is a product discipline rather than the next chapter of marketing.

What This Means for How You Staff and Run GEO

If GEO is experimental science done on product signals, a few practical implications follow for how you actually run it.

Staff for experimentation, not execution. The most valuable GEO people are the ones comfortable forming hypotheses, designing tests, and reading probabilistic results: the experimental temperament, not the checklist temperament. This is a different hire than a traditional SEO executor, and it overlaps heavily with how you would staff a product growth team.

Invest in measurement infrastructure. Because you are estimating distributions across engines over time, you need the ability to track defined query sets repeatedly, per engine, and to see signal through noise. Without that, you are guessing rather than experimenting. This is why the GEO tooling category exists, and why measurement is foundational rather than optional.
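
At its smallest, that infrastructure is a log of sampled runs plus an aggregation per engine and per day. The sketch below shows one possible shape for it; the record fields, the function name, and the sample rows are all hypothetical, and a real setup would also need the part that actually queries each engine, which is out of scope here.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """One sampled run of one tracked query on one engine (illustrative schema)."""
    day: date
    engine: str
    query: str
    cited: bool


def citation_rate_by_engine_and_day(
    observations: list[Observation],
) -> dict[tuple[str, date], float]:
    """Aggregate raw runs into a citation rate per (engine, day)."""
    counts: dict[tuple[str, date], list[int]] = defaultdict(lambda: [0, 0])
    for obs in observations:
        bucket = counts[(obs.engine, obs.day)]
        bucket[0] += int(obs.cited)  # runs in which the brand was cited
        bucket[1] += 1               # total runs sampled
    return {key: cited / total for key, (cited, total) in counts.items()}


# Made-up log rows: two engines, one day, repeated runs of the same query.
day = date(2026, 6, 1)
log = [
    Observation(day, "Perplexity", "placeholder query", True),
    Observation(day, "Perplexity", "placeholder query", False),
    Observation(day, "Perplexity", "placeholder query", True),
    Observation(day, "Perplexity", "placeholder query", True),
    Observation(day, "ChatGPT", "placeholder query", False),
    Observation(day, "ChatGPT", "placeholder query", True),
]
rates = citation_rate_by_engine_and_day(log)
print(rates[("Perplexity", day)], rates[("ChatGPT", day)])
```

Keeping the raw runs rather than only the daily rates is the design choice that matters: it lets you re-slice by query, engine, or time window when a later hypothesis needs a different cut.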

Expect to learn your own category, because no one can hand it to you. The output of running the experimental loop is a tested understanding of how citation behaves in your specific vertical, on your buyers' engines, for your kind of content and product. That understanding is genuinely valuable precisely because it cannot be bought as a playbook; it has to be earned through experimentation, which means a competitor cannot simply copy it.

Treat model changes as a permanent condition, not a disruption. The engines update, and your hard-won understanding partially decays each time they do. This is not a bug in your process; it is the nature of optimizing against a moving system. The experimental loop is what lets you re-learn quickly when the ground shifts, which is exactly why the loop, rather than any fixed set of conclusions, is the durable asset.

The companies that win at GEO will not be the ones who found the right playbook, because there is no fixed playbook to find. They will be the ones who built the capacity to run the experimental loop well, on product signals, against an uncertain multi-engine target, and to keep running it as the engines change. That is a scientific and product capability, not a marketing checklist, and recognizing the difference is most of what it takes to do GEO seriously.

Related reading:

Crawl Budget Is Now an AI Visibility Problem: the technical floor under all AI visibility
GEO Is a Product Discipline, Not a Marketing Tactic: the thesis this methodology completes
Why GEO Has to Be Vertical: why your category understanding can't be bought
The GEO Org Chart: who runs the experimental loop
GEO Compass: vendor-neutral resource including measurement methodology

Deepak Gupta is a serial entrepreneur and cybersecurity expert who co-founded and scaled a CIAM platform to serve over 1 billion users globally. He leads GrackerAI, a GEO platform built specifically for B2B SaaS and cybersecurity companies to achieve visibility in LLM search engines like ChatGPT, Perplexity, and Google AI Overviews. He writes about AI, cybersecurity, and B2B growth at guptadeepak.com.