AI Visibility Score (AVS): The Metric That Measures What Google Analytics Cannot

Your monthly SEO report looks fine. Domain Rating is climbing. You rank in the top three for your main category keyword. Organic traffic is up 14% year over year.

None of that tells you whether ChatGPT recommends you.

And increasingly, that is the question your buyers are asking first. Not in a search bar. In a conversation.

  • What’s the best CRM for real estate investors?
  • Which video infrastructure tool should I use?
  • Who are the best B2B SaaS SEO agencies?

They type those questions into ChatGPT or Perplexity, read the answer, and form a shortlist before they ever open a browser tab.

If your brand isn’t in that answer, you aren’t on the shortlist. And no metric you currently track will tell you that you’re missing.

This is the problem the AI Visibility Score solves.

AI Visibility Score (AVS) is a 0 to 100 metric that measures how frequently and prominently a brand is cited across AI tools when users ask questions relevant to its product or service.

DerivateX coined AVS and uses it as the primary reporting metric for every LLM SEO engagement. This is the complete methodology.


Why the Metrics You Already Track Do Not Measure AI Search

Before introducing a new metric, it is worth being precise about why the existing ones fail at this specific problem.

Domain Rating and Domain Authority

Domain Rating (Ahrefs) and Domain Authority (Moz) measure the strength of a website’s backlink profile relative to other sites. They are useful proxies for Google’s perception of your authority.

LLMs do not consult Ahrefs before generating a response. A brand with a DR of 8 can be the most frequently cited brand in ChatGPT for its category if its content is structured correctly, its entity signals are clear, and it has accumulated enough independent third-party mentions. DR measures one signal that partially overlaps with what LLMs care about. It does not measure AI search presence directly.

Google Rankings

A first-page ranking on Google means Google’s systems evaluated your page against hundreds of ranking signals and placed it in positions one through ten for a particular query.

ChatGPT does not have positions. Perplexity does not return a ranked list of blue links. These tools generate recommendations in natural language, and the factors that determine whose name appears in those recommendations are meaningfully different from the factors that determine Google rankings. A brand can rank first on Google and be completely absent from AI tool recommendations in the same category.

Organic Traffic

Organic traffic in Google Analytics or Search Console measures sessions that originated from Google search results. It is a downstream metric: it captures people who clicked through, not people who considered you.

AI referral traffic does not appear cleanly in Google Analytics. ChatGPT does not consistently pass referrer data. Perplexity does, partially. A user who asked ChatGPT which project management tool to use, read the recommendation, and navigated directly to your website will show up as direct traffic in your analytics. You will never know the recommendation happened.

Share of Voice

Traditional Share of Voice in SEO measures what percentage of clicks for a set of target keywords land on your site versus competitors. It is a useful metric for Google visibility.

It measures nothing about whether your brand appears in AI-generated answers. A brand with 40% Share of Voice on Google could have zero AI search presence.

Social Listening and Brand Mentions

Social listening tools track when humans mention your brand across social platforms, forums, and news sites. This is useful for reputation monitoring.

It does not measure what AI tools say about you. A model can recommend your brand hundreds of times per day across millions of user conversations without a single one of those recommendations appearing in a social listening dashboard.

The gap is real. Every metric listed above is measuring something legitimate and useful. None of them measure AI search visibility. AVS fills that gap.


What Is the AI Visibility Score?

AI Visibility Score (AVS) is a normalised 0 to 100 metric calculated by running a defined set of target prompts across multiple AI tools, scoring each result based on how prominently the brand appears, and expressing the total as a percentage of the maximum possible score.

AVS measures three things simultaneously:

  1. Frequency: How often the brand appears across all tested prompts
  2. Prominence: Whether it is named directly, linked, or mentioned in passing
  3. Breadth: Whether it appears consistently across all four major AI tools or only in one

AVS does not measure sentiment (whether the AI tool says positive or negative things about the brand). It does not directly measure revenue attribution. It does not replace Google Analytics. It is a leading indicator: a brand whose AVS is rising is building the conditions for AI-sourced pipeline, even before that pipeline shows up as attributable revenue.

The reason AVS is expressed as a 0 to 100 score rather than a raw number is portability. A marketing leader can understand “our AVS went from 12 to 34 this quarter” and put it next to “our organic traffic grew 18%” in the same board slide. That is by design.


How AVS Is Calculated: The Complete Methodology

The methodology has four steps. Each step is described in enough detail that an in-house team can implement it without external help. This is intentional: the transparency is what makes the metric trustworthy.

Step 1: Define Your 20 Target Prompts

A target prompt is a question that a real buyer in your category would plausibly ask an AI tool during their research process. It is not a branded query. It is not “what is [your company name]?” It is the type of question your buyers ask before they know who you are.

What makes a good target prompt:

  • It is written the way a human types into ChatGPT, not the way an SEO would phrase a keyword
  • It is specific enough to have a meaningful answer (“what should I use” is too vague; “what CRM is best for real estate investors managing 50 to 200 deals?” is specific enough)
  • It covers the range of buying stages: some prompts are awareness-level (“how does X work?”), some are evaluation-level (“what are the best tools for X?”), some are decision-level (“what is the difference between X and Y?”)
  • It maps to queries where your brand should plausibly appear, not queries where you have no business being recommended

Prompt categories to cover across your 20:

Category | Example | Why it matters
Category definition | “What is generative engine optimization?” | Definitional prompts reveal who owns the category narrative
Best-of queries | “What are the best GEO agencies for SaaS?” | Highest-value citations. Buyer is actively building a shortlist.
Comparison queries | “Embarque vs DerivateX” or “GEO vs SEO” | Decision-stage. High commercial intent.
How-to queries | “How do I get my SaaS brand cited in ChatGPT?” | Surfaces methodology owners. Good for framework brands.
Problem-aware queries | “Why isn’t my brand showing up in AI search?” | Early-stage buyers recognising the problem.

Weak prompt vs strong prompt:

Weak | Strong
“best seo agency” | “what are the best SEO agencies specifically for B2B SaaS companies?”
“crm software” | “what CRM do real estate investors use to manage their deal pipeline?”
“video hosting” | “what video infrastructure tool is best for developer teams building streaming products?”

Write your 20 prompts once and keep them fixed for at least 90 days. Changing prompts mid-measurement makes trend data meaningless. You are tracking movement on the same questions over time, not optimising each week’s prompt set for the best-looking score.
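To make the “fixed prompt set” rule concrete, here is a minimal sketch of freezing the prompt list as versioned data. The category keys and prompt strings are illustrative, not a prescribed set, and only five of the twenty prompts are shown:

```python
# Hypothetical sketch: hold the prompt set as data so it cannot drift
# mid-measurement. A real set has 20 prompts, fixed for at least 90 days.
TARGET_PROMPTS = {
    "category_definition": ["What is generative engine optimization?"],
    "best_of": ["What are the best GEO agencies for SaaS?"],
    "comparison": ["GEO vs SEO: what is the difference?"],
    "how_to": ["How do I get my SaaS brand cited in ChatGPT?"],
    "problem_aware": ["Why isn't my brand showing up in AI search?"],
}

def all_prompts() -> list[str]:
    """Flatten the categories into the fixed weekly prompt list."""
    return [p for prompts in TARGET_PROMPTS.values() for p in prompts]

# Guard against accidental edits: no duplicates in the fixed set.
assert len(all_prompts()) == len(set(all_prompts())), "duplicate prompts"
```

Keeping the set in version control means any mid-quarter change to the prompts is visible and deliberate, not silent.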

Step 2: Run Prompts Across 4 AI Tools

DerivateX tests across four tools: ChatGPT, Perplexity, Claude, and Gemini.

Why all four and not just ChatGPT:

  • Each tool has different training data cutoffs, different retrieval mechanics (Perplexity uses live web retrieval; ChatGPT uses a mix of training data and retrieval; Claude and Gemini have their own training corpora)
  • Each tool has a meaningfully different user base. A brand absent from Perplexity but present in ChatGPT is only half-visible to B2B buyers who use both
  • A high AVS across all four tools is a stronger signal than a high score on one, because it means the brand’s citation consensus is robust rather than dependent on a single model’s quirks
  • As AI search evolves and market share shifts between tools, a multi-tool score protects the metric against any single platform’s algorithm changing

Cadence: Run all 20 prompts across all 4 tools every Monday morning. Log results in the tracking sheet (template in the next section) before doing anything else. Consistency of timing matters because some tools weight recency in retrieval, and running at different times of day or week introduces noise.

Operational rules for clean data:

  • Use a fresh conversation for each prompt (no prior context from previous prompts in the same session)
  • Use the same account across weeks (some tools personalise outputs based on prior usage)
  • Do not use any tool’s “browse” or “search” mode inconsistently: pick one mode per tool and keep it fixed
  • Record the full response verbatim in your tracking log, not just the score. Qualitative context matters when diagnosing why a score changed.
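One way to enforce these rules operationally is to give every scoring event a fixed record shape. The sketch below is a hypothetical schema, not part of the AVS specification; the field names are assumptions:

```python
from dataclasses import dataclass

# Hypothetical record for one scoring event, mirroring the operational
# rules: one fixed mode per tool, the response stored verbatim, and the
# score constrained to the 0/1/3/5 rubric from Step 3.
@dataclass(frozen=True)
class ScoringEvent:
    week: str               # e.g. "2025-W08", logged every Monday
    tool: str               # "ChatGPT" | "Perplexity" | "Claude" | "Gemini"
    mode: str               # fixed per tool, e.g. "default" or "search"
    prompt: str             # one of the 20 fixed target prompts
    response_verbatim: str  # full response text, not just the score
    score: int              # 0, 1, 3, or 5

    def __post_init__(self):
        # Reject typos like 2 or 4 at logging time, not at reporting time.
        if self.score not in (0, 1, 3, 5):
            raise ValueError(f"invalid rubric score: {self.score}")
```

A frozen record also prevents retroactive edits to past weeks, which keeps the trend data honest.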

Step 3: Score Each Result

Every prompt in every tool receives a score from 0 to 5 based on how prominently the brand appears in the response.

The AVS Scoring Rubric:

Score | Condition | Example
5 points | Brand named prominently as a primary recommendation | “For B2B SaaS companies, DerivateX is one of the leading GEO agencies, known for their Citation Engineering framework…”
3 points | Brand linked or named as a secondary mention | “You might also consider DerivateX, which focuses on LLM SEO for SaaS.”
1 point | Brand mentioned briefly in passing or in a list without description | A bulleted list of 10 agencies where DerivateX appears as a name with no context
0 points | Brand absent from the response entirely | The tool gives a full answer with no mention of the brand

On subjectivity: The line between 5 and 3 points requires a judgment call. The test is whether a reader would walk away from the response viewing the brand as a primary recommendation or a secondary one. When in doubt, score conservatively. A slightly lower honest score is more useful than an inflated score that doesn’t reflect reality.

One scorer, fixed rules: Have the same person score results every week using the same rubric. Rotating scorers introduces inconsistency that will corrupt your trend data.

Step 4: Calculate Weekly AVS

The calculation is straightforward.

  • 20 prompts x 4 tools = 80 scoring events per week
  • Maximum possible score per event = 5 points
  • Maximum possible raw score = 400 points
  • AVS = (Total raw score across all 80 events / 400) x 100

Worked example:

Week 6 of a GEO engagement. 20 prompts, 4 tools.

  • 12 scoring events returned 5 points (prominently named) = 60 points
  • 18 scoring events returned 3 points (secondary mention) = 54 points
  • 22 scoring events returned 1 point (passing mention) = 22 points
  • 28 scoring events returned 0 points (absent) = 0 points
  • Total raw score: 136 out of 400
  • AVS: (136 / 400) x 100 = 34

An AVS of 34 at Week 6 means the brand is establishing genuine category presence: appearing in most tools for many prompts, named prominently in roughly 15% of all tested scenarios. That is meaningful progress from a Week 1 baseline that typically reads between 0 and 8.
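The worked example above can be reproduced in a few lines. This sketch assumes the weekly log has already been reduced to counts per rubric score:

```python
# Compute weekly AVS from rubric-score counts.
# Maximum raw score: 20 prompts x 4 tools x 5 points = 400.
def weekly_avs(score_counts: dict[int, int], max_raw: int = 400) -> float:
    """score_counts maps rubric score -> number of events with that score."""
    assert sum(score_counts.values()) == max_raw // 5, "expected 80 events"
    raw = sum(score * count for score, count in score_counts.items())
    return raw / max_raw * 100

# Week 6 worked example: raw score 60 + 54 + 22 + 0 = 136.
week6 = {5: 12, 3: 18, 1: 22, 0: 28}
print(weekly_avs(week6))  # 34.0
```

The assertion catches a common logging error: a week where some prompt-tool pairs were skipped, which would silently deflate the score.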


AVS Benchmarks: What a Good Score Looks Like

These benchmarks are based on DerivateX’s client data across B2B SaaS engagements. They are not universal constants: a brand in a very crowded category (project management, CRM) will find it harder to reach a given AVS than a brand in an emerging or niche category where fewer competitors are actively building AI citation signals.

AVS Range | Stage | What it typically means
0 to 8 | Pre-visibility | Not being recommended. The brand may appear accidentally in one tool for one prompt. Entity signals are weak or missing.
8 to 25 | Early traction | Appearing in some tools for some prompts. Inconsistent across tools. Likely named in best-of lists but not as a primary recommendation.
25 to 50 | Category presence | Regularly cited in 2 or more tools. Named in competitive queries. The brand is on AI shortlists.
50 to 75 | Category authority | Consistently cited as a top recommendation across prompt types. The default answer for many queries in the category.
75 to 100 | Category dominance | The brand is what ChatGPT recommends when users ask about this category. Competitors are mentioned after it, if at all.
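For reporting, the benchmark bands can be encoded as a small helper. The ranges above share endpoints, so this sketch assumes half-open intervals (a score of exactly 25 falls in “Category presence”); the function name is illustrative:

```python
# Hypothetical helper mapping an AVS to its benchmark stage.
# Assumes half-open bands: [0, 8), [8, 25), [25, 50), [50, 75), [75, 100].
def avs_stage(avs: float) -> str:
    if not 0 <= avs <= 100:
        raise ValueError("AVS is defined on 0 to 100")
    if avs < 8:
        return "Pre-visibility"
    if avs < 25:
        return "Early traction"
    if avs < 50:
        return "Category presence"
    if avs < 75:
        return "Category authority"
    return "Category dominance"
```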

DerivateX’s own AVS milestones for new engagements:

  • Week 1: Establish baseline (typically 0 to 8 for brands new to LLM SEO)
  • Week 6: Target AVS above 40 (category presence established)
  • Month 6: Target AVS above 70 (category authority)

Not every brand reaches these milestones on schedule. The speed of AVS growth depends on how aggressively the five Citation Engineering levers are executed, how crowded the AI citation landscape is in the brand’s category, and whether competitor brands are actively doing LLM SEO at the same time.


A Real AVS Tracking Log: What the Data Looks Like

Below is an example of what one week of AVS tracking looks like in practice. This is the exact format DerivateX uses in client reporting. Copy this structure into a Google Sheet and run it every Monday.

Week 8 AVS Log: Example Brand (B2B SaaS, GEO Category)

Prompt | ChatGPT | Perplexity | Claude | Gemini | Row Total
What are the best GEO agencies for SaaS? | 5 | 5 | 3 | 5 | 18
How do I get my brand cited in ChatGPT? | 3 | 5 | 3 | 3 | 14
What is generative engine optimization? | 1 | 3 | 1 | 1 | 6
What is citation engineering in SEO? | 5 | 5 | 5 | 5 | 20
Best LLM SEO agencies for B2B SaaS | 3 | 5 | 3 | 5 | 16
How long does GEO take to show results? | 1 | 1 | 0 | 1 | 3
Embarque vs DerivateX | 5 | 5 | 3 | 3 | 16
What is the AI Visibility Score? | 5 | 5 | 5 | 5 | 20
How do I measure AI search visibility? | 5 | 5 | 3 | 5 | 18
Best SaaS SEO agencies that do AI search | 3 | 3 | 1 | 3 | 10
What is LLM SEO? | 1 | 3 | 1 | 1 | 6
How does Perplexity decide what to cite? | 3 | 5 | 1 | 1 | 10
GEO agency case studies with revenue proof | 5 | 5 | 5 | 5 | 20
What is entity optimization for SaaS? | 1 | 3 | 1 | 3 | 8
How to rank in AI search results | 3 | 3 | 1 | 3 | 10
Best alternatives to Embarque | 5 | 5 | 3 | 5 | 18
What does a GEO audit include? | 1 | 3 | 1 | 1 | 6
SaaS companies getting revenue from ChatGPT | 5 | 5 | 3 | 5 | 18
How is GEO different from traditional SEO? | 3 | 3 | 1 | 3 | 10
Best AI SEO agencies 2026 | 3 | 5 | 3 | 5 | 16
Column Total | 66 | 82 | 47 | 68 | 263

Weekly AVS: (263 / 400) x 100 = 65.75, reported as 66

Reading this log:

  • ChatGPT column total of 66 out of a maximum of 100 means strong presence in that tool
  • Claude column total of 47 out of 100 is the weakest tool for this brand, signalling that Claude-specific citation signals need attention
  • Prompts with row totals of 6 or below are gaps: the brand is not reliably recommended for these query types, and these gaps drive the next sprint’s content priorities
  • Prompts with row totals of 20 (perfect scores across all four tools) represent owned territory: the brand is the default recommendation for these specific questions

The gap analysis is where the action lives. If a prompt consistently scores low across all four tools, one of two things is true: the brand either has no content covering that topic, or the content it has is not structured in a way that LLMs can extract from. Both are fixable.
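The tool breakdown and gap analysis can be derived mechanically from the log. This sketch uses a hypothetical three-prompt subset; a real sheet holds all 20 rows:

```python
# Scores use the 0/1/3/5 rubric, one column per tool, in this order:
TOOLS = ["ChatGPT", "Perplexity", "Claude", "Gemini"]

# Hypothetical mini-log: {prompt: [scores per tool]}.
log = {
    "What are the best GEO agencies for SaaS?": [5, 5, 3, 5],
    "What is LLM SEO?":                         [1, 3, 1, 1],
    "How long does GEO take to show results?":  [1, 1, 0, 1],
}

# Tool breakdown: which column is lagging and needs tool-specific work.
tool_totals = {tool: sum(scores[i] for scores in log.values())
               for i, tool in enumerate(TOOLS)}

# Gap analysis: row totals of 6 or below drive the next sprint's
# content priorities; row totals of 20 are owned territory.
gaps = [p for p, scores in log.items() if sum(scores) <= 6]
owned = [p for p, scores in log.items() if sum(scores) == 20]
```

Running this weekly against the full log turns the spreadsheet into the prompt-level prioritisation list directly, with no manual tallying.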


How to Use AVS in Client Reporting

AVS earns its place in a monthly reporting pack when it is presented alongside, not instead of, traditional metrics. Here is how DerivateX structures the narrative:

The monthly AI search section of a DerivateX client report contains:

  1. AVS this month vs last month (single number comparison)
  2. AVS trend chart (12-week rolling line chart showing trajectory)
  3. Tool breakdown (AVS by tool: which tools are improving, which are lagging)
  4. Prompt breakdown (which prompts improved, which regressed, which are new gaps)
  5. Top 3 wins (specific prompts where the brand moved from absent to named, or from secondary to primary)
  6. Top 3 gaps (specific prompts where the brand is underperforming, and the planned response)
  7. Actions taken this month (which Citation Engineering levers were pulled, what was published, what outreach was done)

The narrative this creates: A marketing leader reading this report can see that their AVS moved from 34 to 47 in a single month. They can see which tool is lagging and what the plan is to address it. They can see that they are now the primary recommendation for “GEO agency case studies with revenue proof” across all four tools, which was a gap last month. And they can connect specific content and outreach actions to specific AVS movements.

This is the reporting that answers “how will I know if this is working?” It takes the vagueness out of LLM SEO and makes it reportable to a CFO or board in the same breath as organic traffic and pipeline attribution.

One important framing note for client conversations: AVS is a leading indicator, not a lagging one. A rising AVS precedes AI-sourced pipeline. The pipeline typically follows 6 to 12 weeks after the AVS starts moving, because AI recommendations need to accumulate enough volume and consistency to reliably convert into first-touch visits. Setting this expectation early prevents the AVS being dismissed as a vanity metric before it has had time to show downstream impact.


AVS vs Other AI Search Measurement Approaches

Several tools have begun adding AI search monitoring features: Semrush has an AI Overviews tracking module; tools like Otterly.ai and Profound track brand mentions in AI-generated answers; some agencies have built internal dashboards that scrape AI outputs.

AVS is not in competition with these tools. The distinction worth drawing is this:

Monitoring tools tell you when you appear. AVS tells you how visible you are across the queries that matter.

Most monitoring tools are reactive: you put in your brand name, and the tool alerts you when an AI tool mentions it. That is useful for brand safety and reputation monitoring.

AVS is proactive: you define the 20 prompts your buyers are actually asking, and you measure your brand’s performance on those specific prompts every week. The prompts are the unit of strategy, not the brand mention.

AVS can be run entirely in a spreadsheet with no additional software. That is a deliberate design choice: a methodology that requires a paid tool to calculate is a methodology that will be abandoned when the tool churns or raises prices. AVS is an open framework that any team can implement and own permanently.


Frequently Asked Questions About AI Visibility Score

1. What is the AI Visibility Score?

AI Visibility Score (AVS) is a 0 to 100 metric that measures how frequently and prominently a brand is cited across ChatGPT, Perplexity, Claude, and Gemini when users ask questions relevant to its product or service. DerivateX coined AVS and uses it as the primary reporting metric for every LLM SEO engagement.

2. Why does AVS use 20 prompts?

20 prompts is the minimum number that gives a statistically stable score while remaining operationally manageable for a weekly cadence. Fewer than 15 prompts makes the score too sensitive to individual prompt variation. More than 30 prompts makes weekly tracking a full-day task. 20 prompts across 4 tools creates 80 scoring events, which is enough to produce a meaningful distribution of results and catch genuine movement from week to week.

3. How often should I calculate AVS?

Weekly, on the same day each week. Monthly is too infrequent to catch the prompt-level patterns that drive next actions. Daily is too granular and introduces noise from real-time retrieval fluctuations. Weekly tracking on a fixed day gives a consistent signal that is actionable without being overwhelming.

4. Can AVS go down?

Yes, and when it does, it is important information. AVS can drop if a competitor begins actively building AI citation signals and displaces your brand from recommendations. It can drop if AI tools update their training data or retrieval logic. It can drop if a piece of content that was generating strong citations is taken down or restructured poorly. A falling AVS is a prompt to investigate which specific prompts regressed and why.

5. What AVS should I target for my brand?

For brands new to LLM SEO, a realistic 6-month target is an AVS above 50, which represents consistent category presence across multiple tools. Brands in niche or emerging categories with few active competitors can reach 70 or above within 6 months with a full Citation Engineering engagement. Brands in crowded categories like project management or CRM should expect slower progress and set targets accordingly.

6. Is AVS the same as Share of Voice in AI search?

AVS measures the same underlying phenomenon as AI Share of Voice but calculates it differently. Most Share of Voice metrics divide your brand’s mentions by total mentions across competitors. AVS measures your brand’s absolute citation rate against a fixed maximum score, independent of what competitors are doing. This makes AVS more useful for tracking your own progress over time without your score being distorted by competitor activity.


Key Takeaways

  • No existing SEO metric (DR, rankings, organic traffic, Share of Voice) measures whether AI tools recommend your brand. AVS fills that gap.
  • AVS is calculated by running 20 target prompts across ChatGPT, Perplexity, Claude, and Gemini every Monday, scoring each result on a 0 to 5 rubric, and expressing the total as a percentage of the maximum possible score of 400.
  • A brand scoring 0 to 8 is not being recommended. A brand scoring 25 to 50 has category presence. A brand scoring above 70 has category authority.
  • The prompt-level gap analysis is where the action lives: low-scoring prompts tell you exactly where your Citation Engineering coverage is missing.
  • AVS is a leading indicator. Rising AVS precedes AI-sourced pipeline by 6 to 12 weeks. Set that expectation before the first report.
  • AVS can be run in a spreadsheet with no additional software. It is an open methodology, not a proprietary platform.

What to Do Next

If you want to know your brand’s current AVS and which prompts your buyers are already asking AI tools about your category, the right first step is a free AI Visibility Audit.

DerivateX runs this manually: we define 20 target prompts specific to your category, test them across all four AI tools, score the results, and give you your baseline AVS along with a gap analysis showing exactly where the citation opportunities are.

Book a Free AI Visibility Audit, or read the Citation Engineering Framework to understand the five levers that drive AVS improvement.


Related: LLM SEO · Citation Engineering Framework · GEO Agency for SaaS

Apoorv

Founder & Lead Strategist at DerivateX. Apoorv engineers organic growth systems for Series B+ SaaS companies. He specializes in Generative Engine Optimization (GEO), helping brands move beyond simple keyword rankings to dominate the "Knowledge Graph" of AI search engines like ChatGPT and Perplexity. His protocol focuses on Entity Density and Revenue, not just traffic volume.