The 8 GEO Metrics We Report to Clients (and Why Most Agencies Track the Wrong Ones)

A folder of ChatGPT screenshots is a slide, not a report. These are the eight GEO metrics that survive the reliability test and the revenue test.

You are paying a GEO (generative engine optimization) agency, and every month you get proof that the work is happening: a set of screenshots showing your brand inside a ChatGPT answer. Most marketing leads are in this situation, and it looks like reporting without being reporting.

A screenshot captures one answer, on one run, for one phrasing of one question. It tells you that you appeared, and nothing more. It does not tell you whether the appearance repeats, whether a buyer ever saw it, or whether it produced a demo.

What most teams miss is that AI visibility without those three answers is a vanity metric. This piece lays out the eight metrics that belong in a GEO report, and the vanity metric each one replaces. After reading it, you will be able to look at any GEO report, including the one you already pay for, and tell within a minute whether it measures real progress or just decorates a dashboard. Start with the two questions the screenshots quietly dodge.


Why ChatGPT screenshots aren’t a real GEO report

A real GEO report has to answer two questions a screenshot cannot: did you measure this enough times to believe it, and did any of it produce revenue? Call them the reliability test and the revenue test. Most weak reports fail both.

The reliability test matters because AI answers are not stable the way a Google ranking is. Ask the same buying question twice, and the brands in the answer can change between runs. A screenshot freezes one lucky moment and sells it as a trend.

Teams building AI visibility tools have started making this point formally: a single observation is close to meaningless, and visibility should be read as a range built from many runs rather than as one number.

The revenue test matters because visibility is not the goal; the pipeline is the goal. A report can show rising mentions all quarter and still hide the one fact a founder cares about, which is whether AI sent anyone who became a customer.

The eight metrics below are ordered to move from the first question a buyer asks an AI model to the demo that lands in your CRM. Here are the eight GEO KPIs at a glance, then one by one.


The 8-metric standard at a glance

| GEO metric | What it tells you | The vanity metric it replaces |
|---|---|---|
| AI Visibility Score (AVS) | Whether your presence in AI answers is rising or falling, on one comparable number | “You appeared in ChatGPT” screenshots and raw mention counts |
| Citation share and position | How much of the citation pie you own, and whether AI cites you early or buries you | Backlink counts and domain rating as the headline |
| Recommendation rate | Whether AI recommends you as a buy-it option, not just names you in passing | Passive brand-mention frequency |
| Commercial-intent prompt coverage | Whether you show up on the prompts that come right before a purchase | Total impressions and informational-only coverage |
| Answer accuracy and brand representation | Whether AI describes your pricing, positioning, and category correctly | Skipped entirely in most reports |
| Share of voice vs named competitors | Whether you win or lose the head-to-head buyers actually run | Tracking yourself in isolation |
| AI referral traffic and assisted conversions | What AI visibility sends you, and what it does after it lands | Zero-click “estimated reach” |
| Pipeline and revenue influenced by AI | Whether any of the visibility work created money | Traffic treated as the finish line |

1. AI Visibility Score (AVS): one number instead of a screenshot folder

AI Visibility Score (AVS) is a single 0 to 100 score for how often and how prominently AI models name your brand across a fixed set of buyer questions, measured the same way every week. We score each appearance by prominence:

  • Named directly in the answer: 5 points
  • Linked as a source: 3 points
  • Mentioned only in passing: 1 point
  • Brand absent from the response entirely: 0 points

The points are summed across the prompt set and across ChatGPT, Perplexity, Claude, and Gemini, then normalized to a 0 to 100 scale you can track on a single line. 
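Here is the scoring rule above as a short Python sketch. It illustrates the rule under stated assumptions and is not DerivateX's production scorer. It assumes that each observation is one prompt run on one model. It also assumes that a run records only the highest prominence level it reached, so a brand that is both named and linked counts as 5, not 8. Finally, it assumes the 0 to 100 normalization divides by the best possible total. The prompts in the example are hypothetical.

```python
from dataclasses import dataclass

# Prominence points from the AVS definition above.
PROMINENCE_POINTS = {
    "named": 5,    # named directly in the answer
    "linked": 3,   # linked as a source
    "passing": 1,  # mentioned only in passing
    "absent": 0,   # brand absent from the response
}


@dataclass
class Observation:
    prompt: str
    model: str       # e.g. "ChatGPT", "Perplexity", "Claude", "Gemini"
    prominence: str  # one key of PROMINENCE_POINTS (assumed: highest level reached)


def ai_visibility_score(observations):
    """Sum prominence points across prompts and models, scaled to 0-100.

    Assumption: the scale divides by the best possible total, i.e. every
    observation earning the top 5 points.
    """
    if not observations:
        return 0.0
    earned = sum(PROMINENCE_POINTS[o.prominence] for o in observations)
    best_possible = PROMINENCE_POINTS["named"] * len(observations)
    return round(100 * earned / best_possible, 1)


week = [
    Observation("best CRM for startups", "ChatGPT", "named"),
    Observation("best CRM for startups", "Perplexity", "linked"),
    Observation("CRM pricing comparison", "Claude", "passing"),
    Observation("CRM pricing comparison", "Gemini", "absent"),
]
print(ai_visibility_score(week))  # 9 of 20 possible points -> 45.0
```

The denominator grows with the prompt set. The score therefore stays comparable from week to week only if the prompt set and the models stay fixed, which is why it has to be measured the same way every week.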

Google AI Overviews now sit on top of a large share of buyer searches, so a complete report watches that surface too, not only the chat tools.

DerivateX coined the AVS metric for a plain reason. Clients kept asking how they would know the work was paying off, and a folder of screenshots was not an answer. A score replaces the folder, and it tells you whether your standing in AI answers rose or fell this month.

2. Citation share and position: who AI trusts, not who has the most links

Citation share is the percentage of all the sources AI cites in your category that point to you, and citation position is whether AI leans on you among its first sources or buries you near the bottom. Splitting owned citations from third-party citations shows whether AI trusts your own pages or trusts other people talking about you.
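A minimal sketch of both numbers follows. It assumes that for every sampled answer you already log each cited source in order, along with a flag for whether that page discusses your brand. The domains and the `about_us` tag are hypothetical. How you tag third-party pages, whether by manual review or a classifier, depends on your setup.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical domains you own; everything else counts as third-party.
OWNED_DOMAINS = {"example.com", "docs.example.com"}


@dataclass
class Citation:
    domain: str
    position: int   # 1 = the first source the answer cites
    about_us: bool  # does the cited page discuss our brand? (tagged upstream)


def citation_metrics(answers):
    """answers: one list of Citation records per sampled AI answer."""
    cites = [c for answer in answers for c in answer]
    if not cites:
        return {"citation_share": 0.0, "owned_share": 0.0,
                "third_party_share": 0.0, "avg_position": None}
    owned = [c for c in cites if c.domain in OWNED_DOMAINS]
    third_party = [c for c in cites
                   if c.domain not in OWNED_DOMAINS and c.about_us]
    ours = owned + third_party
    total = len(cites)
    return {
        "citation_share": len(ours) / total,
        "owned_share": len(owned) / total,
        "third_party_share": len(third_party) / total,
        # Lower is better: 1.0 means AI cites you first every time.
        "avg_position": mean(c.position for c in ours) if ours else None,
    }


answers = [
    [Citation("g2.com", 1, True), Citation("example.com", 2, False),
     Citation("rival.io", 3, False)],
    [Citation("rival.io", 1, False), Citation("example.com", 2, True)],
]
# 3 of 5 citations point to us: share 0.6 (0.4 owned, 0.2 third-party).
print(citation_metrics(answers))
```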

Earning citation share comes down to two jobs done together: clean entity and schema signals so AI knows exactly who you are, and third-party coverage so independent sources back it up. A backlink count tells you how many links exist. Citation share, what some tools call your citation rate, tells you whether AI decided you were worth quoting.

3. Recommendation rate: being chosen, not just named

Recommendation rate measures how often AI actively recommends you, not how often it mentions you. There is a real gap between an answer that lists your category and happens to include your name, and an answer to “best [category] for [use case]” that puts you on the shortlist.

The second one is a buyer being handed your name at the moment of choice. Tracking recommendation rate separately from mentions stops a report from turning a few incidental mentions into the impression that AI is selling you. Often it is not, and that is the thing you want to find out.
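Separating the two rates takes nothing more than a label on each sampled response. The sketch assumes that a reviewer or a classifier has already labeled each answer as recommended, mentioned, or absent. The label names are hypothetical.

```python
from collections import Counter

LABELS = {"recommended", "mentioned", "absent"}


def mention_and_recommendation_rates(labels):
    """labels: one label per sampled response to a 'best X for Y' prompt.

    'recommended' = placed on the shortlist, 'mentioned' = named in
    passing, 'absent' = not in the answer. Labels come from upstream review.
    """
    unknown = set(labels) - LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {unknown}")
    if not labels:
        return 0.0, 0.0
    counts = Counter(labels)
    total = len(labels)
    mention_rate = (counts["recommended"] + counts["mentioned"]) / total
    recommendation_rate = counts["recommended"] / total
    return mention_rate, recommendation_rate


runs = ["mentioned"] * 6 + ["recommended"] * 2 + ["absent"] * 2
print(mention_and_recommendation_rates(runs))  # (0.8, 0.2)
```

In the example, the brand appears in 80% of answers but is recommended in only 20%. That is exactly the gap a mention count hides.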

4. Commercial-intent prompt coverage: visibility where buyers actually decide

Commercial-intent prompt coverage is your visibility on the prompts that sit right before a purchase, mapped to buyer-journey stage, rather than the informational prompts far above it.

Showing up for “what is [category]” is pleasant. Showing up for “[your product] vs [competitor] pricing” is the visibility that moves a deal.
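Coverage by stage is a grouped ratio. Tag every prompt with a buyer-journey stage, then compute the share of runs per stage in which you appeared. The stage names, prompts, and brands below are hypothetical, and the three-stage taxonomy is an assumption, not a fixed standard.

```python
from collections import defaultdict

# Hypothetical prompt set, each prompt tagged with a buyer-journey stage.
PROMPT_STAGE = {
    "what is a CRM": "awareness",
    "best CRM for a 10-person sales team": "consideration",
    "AcmeCRM vs RivalCRM pricing": "decision",
}


def coverage_by_stage(runs):
    """runs: (prompt, appeared) pairs, one per prompt-and-model run.

    Returns the share of runs in each stage where the brand appeared.
    """
    seen = defaultdict(int)
    hits = defaultdict(int)
    for prompt, appeared in runs:
        stage = PROMPT_STAGE[prompt]
        seen[stage] += 1
        hits[stage] += int(appeared)
    return {stage: hits[stage] / seen[stage] for stage in seen}


runs = [
    ("what is a CRM", True), ("what is a CRM", True),
    ("best CRM for a 10-person sales team", True),
    ("best CRM for a 10-person sales team", False),
    ("AcmeCRM vs RivalCRM pricing", False),
    ("AcmeCRM vs RivalCRM pricing", False),
    ("AcmeCRM vs RivalCRM pricing", True),
    ("AcmeCRM vs RivalCRM pricing", False),
]
print(coverage_by_stage(runs))
# {'awareness': 1.0, 'consideration': 0.5, 'decision': 0.25}
```

A blended average of these runs would read 50%, which sounds healthy. The decision-stage number, the one that moves deals, sits at 25%.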

There is a newer surface to track here as well. Buyers are starting to let AI agents browse, compare, and shortlist for them through tools like ChatGPT agents, Perplexity Comet, and Claude for Chrome. Agent Search Optimization is the work of being surfaced and picked when those agents do the comparing.

A report built on total impressions misses all of this, because impressions count the top of the funnel and ignore the bottom.

5. Answer accuracy and brand representation: what AI tells buyers about you

Answer accuracy and brand representation track whether AI describes you correctly: the rate of made-up claims, the rate of outdated information, and whether your pricing, positioning, and category come through right.

Most reports skip this metric, which is the reason it belongs in yours.

Visibility with the wrong story attached can lose you a deal before a human reads your site. If a model tells a buyer you charge double your real price, or files you under the wrong category, the damage is done quietly. One head of growth we work with found that three of four models still described her product with a feature it had retired a year earlier.

Catching that is technical work, and a screenshot will never show it, because the screenshot only captures the runs where you looked good.
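One way to turn this into rates is to check each claim an AI answer makes against a fact sheet you maintain. The sketch makes two assumptions. First, claims have already been extracted from answers as field-value pairs, by a reviewer or a separate parser. Second, the fact sheet lists both current values and values that used to be true, so an outdated claim can be told apart from an invented one. All field names and values are hypothetical.

```python
from collections import Counter

# Hypothetical fact sheet: what is true today, and what used to be true.
CURRENT = {
    "starting_price": "$49/month",
    "category": "video hosting",
    "has_legacy_editor": "no",  # feature retired last year
}
OUTDATED = {
    "starting_price": {"$29/month"},
    "has_legacy_editor": {"yes"},
}
CHECKED = ("accurate", "outdated", "fabricated")


def classify_claim(field, value):
    if field not in CURRENT:
        return "unverified"  # nothing on the fact sheet to check against
    if value == CURRENT[field]:
        return "accurate"
    if value in OUTDATED.get(field, set()):
        return "outdated"
    return "fabricated"


def accuracy_rates(claims):
    """claims: (field, value) pairs extracted from AI answers."""
    counts = Counter(classify_claim(f, v) for f, v in claims)
    checked = sum(counts[k] for k in CHECKED)
    if checked == 0:
        return {k: 0.0 for k in CHECKED}
    return {k: counts[k] / checked for k in CHECKED}


claims = [
    ("starting_price", "$49/month"),
    ("starting_price", "$98/month"),  # double the real price
    ("has_legacy_editor", "yes"),     # describes a retired feature
    ("category", "video hosting"),
    ("founded", "2015"),              # not on the fact sheet
]
print(accuracy_rates(claims))
# {'accurate': 0.5, 'outdated': 0.25, 'fabricated': 0.25}
```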

6. Share of voice vs named competitors: the head-to-head that decides deals

Share of voice against named competitors is your visibility relative to the specific rivals your buyers compare you to, tracked over time, including the moments you take a competitor’s citation or they take yours.

Tracking yourself in isolation feels productive and tells you almost nothing.

A rising mention count means little if the two competitors in every deal are rising faster. Buyers rarely evaluate you alone. They ask AI to compare three options, and the answer is a relative ranking.

A report that does not name your real competitors and chart you against them is timing a race by watching one runner.
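Share of voice against a named set comes down to counting which tracked brands appear in each answer. The sketch assumes each sampled answer has been reduced to the set of brands it mentions, and it counts each brand at most once per answer. The brand names are placeholders.

```python
from collections import Counter


def share_of_voice(answers, tracked):
    """answers: one set of brand names per sampled AI answer.
    tracked: your brand plus the named competitors buyers compare you to.

    Each brand counts at most once per answer; untracked brands are ignored.
    """
    counts = Counter()
    for brands in answers:
        counts.update(brands & tracked)
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in sorted(tracked)}


tracked = {"us", "rival_a", "rival_b"}
this_week = [
    {"us", "rival_a"},
    {"rival_a", "rival_b"},
    {"rival_a", "some_other_tool"},
    {"us", "rival_a", "rival_b"},
]
print(share_of_voice(this_week, tracked))
# {'rival_a': 0.5, 'rival_b': 0.25, 'us': 0.25}
```

Run it on each week's sample and compare the outputs. If your mention count rises while your share falls, the competitors in your deals are rising faster.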

7. AI referral traffic and assisted conversions: what actually shows up

AI referral traffic and assisted conversions measure the AI sessions tagged in GA4 by source, followed through to signups and demos, instead of a zero-click “estimated reach” number nobody can verify.
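Tagging AI sessions by source comes down to matching referrer hostnames. In GA4 itself, this is typically set up as a custom channel group with a regex condition on session source. The Python sketch below applies the same matching logic to exported session data, then computes a conversion rate per channel. The hostname list is an assumption to verify against your own property. Some AI visits may also arrive with no referrer at all, so treat the counts as a floor, not a total.

```python
from collections import defaultdict
from typing import Optional

# Assumed referrer hostnames for AI assistants. Verify against the session
# sources in your own GA4 property; these change as products change.
AI_REFERRERS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "claude.ai": "Claude",
    "gemini.google.com": "Gemini",
}


def ai_source(referrer_host: str) -> Optional[str]:
    """Map a referrer hostname to an AI platform, or None if it is not one."""
    host = referrer_host.lower()
    for domain, platform in AI_REFERRERS.items():
        # Match the domain itself or any subdomain (e.g. www.perplexity.ai).
        if host == domain or host.endswith("." + domain):
            return platform
    return None


def conversion_by_channel(sessions):
    """sessions: (referrer_host, converted) pairs from exported session data."""
    totals = defaultdict(int)
    conversions = defaultdict(int)
    for host, converted in sessions:
        channel = ai_source(host) or "Non-AI"
        totals[channel] += 1
        conversions[channel] += int(converted)
    return {ch: conversions[ch] / totals[ch] for ch in totals}


sessions = [
    ("chatgpt.com", True), ("chatgpt.com", False),
    ("www.perplexity.ai", True),
    ("google.com", False), ("google.com", False),
    ("google.com", True), ("google.com", False),
]
print(conversion_by_channel(sessions))
# {'ChatGPT': 0.5, 'Perplexity': 1.0, 'Non-AI': 0.25}
```

The sample sessions are made up. With real data, compare the AI rows against your other channels to check whether a conversion premium like the one discussed below holds for your own site.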

The volume looks small at first. It is also the highest-intent traffic most teams are not measuring at all. The gap shows up in our surveys: a large share of marketers say they optimize for AI search, while only a small fraction actually track how they perform in it.

There is a reason the low volume matters less than it looks. In a study of more than 500 high-value topics, Semrush found that the average visitor from AI search is worth about 4.4 times a traditional organic visitor, measured by conversion rate, because the model has already done the comparison work before the click.

That premium is strongest in B2B and research-heavy buying, and thinner in e-commerce, so the honest version of the claim is that AI traffic is unusually valuable for SaaS in particular. A good report shows you the real tagged sessions and what they did next, not an estimate. For the full method, see our breakdown of measuring AI search ROI.

8. Pipeline and revenue influenced by AI: the metric that justifies the spend

Pipeline and revenue influenced by AI is board-ready attribution that runs from AI-sourced sessions through to demos, signups, and revenue. This is the metric that ends the argument about whether GEO is working, and the one most agencies cannot produce.

Treating traffic as the finish line is how a report stays comfortable and useless at the same time. It can be produced, though. Gumlet attributes about 20% of its direct monthly inbound revenue to AI search across ChatGPT, Claude, and Perplexity, a figure their co-founder Divyesh Patel can point to in an attribution dashboard rather than a slide.

The work that gets a client to that number is the same work the other seven metrics measure. The eighth is where it shows up as money.
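Below is a minimal sketch of the join behind that kind of dashboard. It assumes each analytics session carries an identifier you can match to a CRM lead. The `lead_id` field is hypothetical; in practice, an identifier like this is often captured at form submission. The source does not say which attribution model Gumlet or DerivateX uses, so the sketch reports two common ones side by side. First touch credits a deal when the lead's earliest tracked session came from AI. Any touch credits it when any of the lead's sessions did.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    lead_id: str              # hypothetical key linking analytics to CRM
    timestamp: int            # e.g. Unix seconds
    ai_source: Optional[str]  # "ChatGPT", "Perplexity", ... or None


@dataclass
class Deal:
    lead_id: str
    amount: float
    won: bool


def ai_attribution(sessions, deals):
    """Sum pipeline and revenue for deals whose leads touched AI search."""
    touches = defaultdict(list)
    for s in sessions:
        touches[s.lead_id].append(s)
    totals = {"ai_first_touch_pipeline": 0.0,
              "ai_influenced_pipeline": 0.0,
              "ai_influenced_revenue": 0.0}
    for deal in deals:
        path = sorted(touches.get(deal.lead_id, []), key=lambda s: s.timestamp)
        if not path:
            continue  # no tracked sessions for this lead
        if path[0].ai_source:
            totals["ai_first_touch_pipeline"] += deal.amount
        if any(s.ai_source for s in path):
            totals["ai_influenced_pipeline"] += deal.amount
            if deal.won:
                totals["ai_influenced_revenue"] += deal.amount
    return totals


sessions = [
    Session("A", 100, "ChatGPT"), Session("A", 200, None),
    Session("B", 100, None), Session("B", 300, "Perplexity"),
    Session("C", 150, None),
]
deals = [Deal("A", 10000.0, True), Deal("B", 5000.0, False),
         Deal("C", 8000.0, True)]
print(ai_attribution(sessions, deals))
# first touch 10000.0, influenced pipeline 15000.0, influenced revenue 10000.0
```

Reporting both keeps the number honest. First touch is the conservative figure, and any touch shows how much pipeline AI played some part in.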


How to know the GEO numbers in your report are real

Reliability is not a ninth metric; it is the thing that makes the other eight worth reading. Any of these numbers is worthless if it came from a single manual check, because AI answers move with the run, with small changes in wording, and with model updates. The research mentioned earlier reached the same conclusion: one look cannot be trusted, and visibility should be reported as a range from repeated runs rather than a single figure.

A report you can trust shows its method on its face:

  • The sampling cadence. We run client prompts Monday, Wednesday, and Friday across all four models, so a number reflects a pattern and not a good day.
  • The baseline and the before-and-after delta, so a change has something to be measured against.
  • The variance, so you can see how stable a result actually is.
  • The prompt set and any model-version changes, so a drop reads as a real shift and not a measurement glitch.
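The range described above can be computed from the runs a report already collects. The sketch shows two pieces, both under assumptions. The first is the spread of a score such as AVS across a week's runs. The second is an approximate 95% interval for an appearance rate. The source does not name a statistical method. The Wilson score interval used here is a standard choice for proportions that stays sensible with a few dozen runs, and it is a substitution for illustration, not a method the research specifies. The numbers are hypothetical.

```python
import math
from statistics import mean, stdev


def summarize_runs(scores):
    """Spread of a score (e.g. AVS) across repeated sampling runs."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate variance")
    return {"mean": mean(scores), "stdev": stdev(scores),
            "low": min(scores), "high": max(scores)}


def wilson_interval(appearances, runs, z=1.96):
    """Approximate 95% Wilson score interval for an appearance rate.

    appearances: runs in which the brand appeared; runs: total runs.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = appearances / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs))
    return center - half, center + half


# Hypothetical week: AVS from Monday, Wednesday, and Friday runs.
print(summarize_runs([62, 58, 66]))  # mean 62, stdev 4.0

# Brand appeared in 12 of 40 prompt-model runs: 30%, read as a range.
low, high = wilson_interval(12, 40)
print(f"{low:.2f} to {high:.2f}")  # 0.18 to 0.45
```

An appearance rate of 30% from 40 runs is consistent with anything from roughly 18% to 45%. A week-over-week move inside that band is not yet a trend.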

Ask one question of any GEO report in front of you: if I re-ran this tomorrow, would the number hold? If the report cannot answer that, the metrics on it are decoration.


Frequently asked questions

What metrics should a GEO agency actually report to me?

A GEO agency should report eight GEO KPIs, each replacing a vanity one. Those are AI Visibility Score, citation share and position, recommendation rate, commercial-intent prompt coverage, answer accuracy, share of voice against named competitors, AI referral traffic with assisted conversions, and pipeline influenced by AI. 

The first seven describe where and how you appear in AI answers. The eighth ties that appearance to demos and revenue.

If a report shows mention counts and screenshots but never reaches pipeline, it is measuring activity, not outcomes, and it leaves the only question a founder cares about unanswered.

Why isn’t a screenshot of my brand in ChatGPT a real report?

A screenshot captures one answer, on one run, for one wording of one question. AI answers change between runs, so a single capture proves nothing about whether the result repeats or whether a real buyer saw it. It also says nothing about pipeline. 

A real report measures the same set of prompts repeatedly across ChatGPT, Perplexity, Claude, and Gemini, shows the trend and the variance, and connects that visibility to signups and revenue. The screenshot is a snapshot of a lucky moment. The report is evidence of a pattern that produced something.

How do I know if my GEO agency is actually working?

Hold the report to two questions. 

First, could the numbers be re-run and hold up? A trustworthy report states its sampling cadence, baseline, before-and-after delta, and variance, rather than showing a single check. 

Second, does it reach pipeline? Look for AI sessions tagged in your analytics and followed through to demos and revenue, not just a rising mention count. 

If the report cannot survive a re-run and never connects to pipeline, the agency is showing you activity. A working engagement shows movement on visibility and a clear line to revenue.

What questions should I ask my GEO agency?

Ask three. 

First, how do you measure performance, and can you show the sampling cadence, baseline, and variance behind every number? 

Second, how many AI surfaces do you track, and which ones? A program that only watches ChatGPT is half a program. 

Third, can you draw the line from AI visibility to pipeline inside my own analytics? If the answers are vague, or they stop at rankings and screenshots, you are buying SEO with a GEO label.

Isn’t AI search traffic too small to bother tracking this closely?

AI search traffic is small today, often around one percent of total visits, so measuring it can look like overkill. The volume is not the point. Visitors arriving from AI search convert at a much higher rate than traditional organic, because the model has already compared options before they click, which makes that small slice unusually valuable for B2B SaaS. 

The share is also growing fast. Measuring it now gives you a baseline and a method before the channel gets big, instead of scrambling to build attribution after a competitor already owns the category.

How do you attribute revenue to AI search?

Tag AI sessions in GA4 by source, then follow those sessions through to signups, demos, and closed revenue in your CRM. Done properly, this produces board-ready attribution rather than a guess. The model that referred the visitor often pre-qualified them, so AI-sourced sessions tend to convert better than their volume suggests. 

As a real example, Gumlet attributes about 20% of its direct monthly inbound revenue to AI search across ChatGPT, Claude, and Perplexity. That figure comes from an attribution dashboard, which is what separates real revenue measurement from an estimate on a slide.

Can GA4 track AI search visibility on its own?

No. GA4 can capture referral traffic from AI platforms once you tag the sources, but it cannot measure citation rate, share of voice, recommendation rate, or AI Visibility Score. Those signals live inside the AI answers themselves, not in your traffic logs, so they need prompt sampling across models or a dedicated monitoring setup. 

GA4 tells you what arrived on your site. It cannot tell you how often you appeared in AI answers, how you ranked against competitors, or whether the description of your brand was accurate. You need both layers to see the full picture.


The standard to hold your next report to

The line between a GEO report and a screenshot folder is whether the numbers can be re-run and whether they reach revenue. Everything in a credible report serves those two tests, and every vanity metric exists to avoid them.

Pull up your most recent GEO report and run it against the eight metrics here. If it stops at mentions and screenshots, ask your agency for the sampling method behind every number, and for the line from AI sessions to pipeline. You can see where you stand on the first metric in a few minutes:

See your AI Visibility Score across ChatGPT, Perplexity, Claude, and Gemini.

As buyers hand more of the comparison work to AI agents, the distance between brands that can prove AI-sourced pipeline and brands that can only show screenshots is what will decide who makes the shortlist.

Written by Shivanshi Bhatia
Co-founder, DerivateX