HubSpot AI Search Grader: What It Measures and What to Do With Your Score
The HubSpot AI Search Grader is a free tool that measures how visible your brand is in AI-generated answers from ChatGPT, Perplexity, and other assistants built on large language models. It scores your share of voice, brand sentiment, and overall visibility when AI assistants answer questions in your category, then benchmarks you against competitors. It takes about two minutes to run and requires no HubSpot subscription.
That's the short version. The longer version — and the part most marketing teams get wrong — is what to actually do with the number it spits out. We've run the grader for dozens of client brands as part of broader HubSpot audits, and the pattern is consistent: teams treat the score like a vanity metric, screenshot it for a slide deck, and move on. This article covers what the tool genuinely measures, where it's blind, and a concrete action plan for every score band.
What Is the HubSpot AI Search Grader?
The HubSpot AI Search Grader (sometimes called the HubSpot AEO Grader, for Answer Engine Optimization) is a free web tool that tests how often and how favorably large language models mention your brand when answering buyer-intent questions. You enter your brand name, industry, and target audience; the tool queries LLMs with representative prompts and grades the responses. The output is a 0–100 score built from share of voice, sentiment, and brand visibility.
Think of it as a rank tracker for a world without ranks. Classic SEO tools tell you where you sit on a results page. In AI search there is no page — there's a synthesized answer, and your brand is either in it or it isn't. HubSpot built the grader to make that binary visible and measurable, the same way its Website Grader did for technical SEO a decade ago.
A few things worth knowing before you run it:
- It's genuinely free. No Marketing Hub subscription needed. You'll trade an email address for the report, which is fair — you're using a lead magnet built by the company that popularized lead magnets.
- It's brand-level, not page-level. The grader evaluates how LLMs talk about your company, not whether a specific blog post gets cited.
- It's a snapshot, not a monitor. One run tells you where you stand today. LLM outputs shift as models update, so re-run it monthly if AI visibility matters to your pipeline.
Book a free HubSpot audit. No onboarding calls, no meetings — click our invitation link to grant partner access to your portal, and we'll send you a full list of improvements within days.
How the AI Search Grader Works: Share of Voice, Sentiment, and Visibility
The AI Search Grader works by sending category-relevant prompts to large language models and analyzing the responses for three signals: share of voice (how often your brand appears versus competitors), sentiment (whether mentions are positive, neutral, or negative), and brand visibility (how prominently and accurately you're described). These are blended into a single 0–100 score with sub-grades for each dimension.
Here's what each component actually tells you:
Share of voice. When someone asks ChatGPT "what's the best mid-market CRM for a manufacturing company?", a handful of brands get named. Share of voice measures how often you're one of them across a batch of prompts in your category. It's the closest AI-search equivalent to ranking position, and in our experience it's the sub-score that tracks most closely with actual referral traffic from AI assistants.
Sentiment. LLMs don't just name brands — they characterize them. "X is powerful but notoriously expensive and hard to implement" is a mention, but not one you want. The grader classifies the tone of your mentions. In our experience this is where B2B brands get surprised: a 45-person fintech client of ours scored well on share of voice but discovered the models consistently described their product as "legacy," language scraped from a competitor's comparison pages that dominated the training data.
Brand visibility. This is the broader composite: are you mentioned at all, are the facts about you correct, and do you appear for the high-intent prompts (comparisons, "best X for Y," alternatives queries) rather than only informational ones?
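To make share of voice concrete, here is a minimal sketch of a mention-rate calculation over a batch of assistant responses. It is an illustrative assumption, not HubSpot's scoring formula: the function name `mention_rates`, the whole-word matching rule, and the sample responses (with fictional brands Acme, Globex, and Initech) are all hypothetical.

```python
import re


def mention_rates(responses: list[str], brands: list[str]) -> dict[str, float]:
    """Fraction of responses that mention each brand (hypothetical metric)."""
    rates = {}
    for brand in brands:
        # Whole-word, case-insensitive match so "Acme" does not match "Acmeville".
        pattern = re.compile(rf"\b{re.escape(brand)}\b", re.IGNORECASE)
        hits = sum(1 for text in responses if pattern.search(text))
        rates[brand] = hits / len(responses) if responses else 0.0
    return rates


# Hypothetical answers to four category prompts.
sample = [
    "For mid-market manufacturers, Acme and Globex are common picks.",
    "Globex is the usual default; Initech is cheaper.",
    "Many teams shortlist Globex, Acme, and Initech.",
    "Initech works well for small teams.",
]
print(mention_rates(sample, ["Acme", "Globex", "Initech"]))
# {'Acme': 0.5, 'Globex': 0.75, 'Initech': 0.75}
```

A raw mention rate ignores where in the answer you appear and how you're described, which is why the grader pairs it with sentiment and visibility rather than reporting it alone.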
The mechanics matter because they tell you what levers exist. You can't buy your way into an LLM's answer. What you can influence: the volume and consistency of third-party content that describes your brand, structured comparison content on your own domain, review-site presence (G2, Capterra, Reddit threads — LLMs lean on these heavily), and unambiguous positioning language that models can repeat verbatim.
Running the Grader: A Two-Minute Walkthrough
Running the HubSpot AI Search Grader takes two to three minutes: you provide your brand name, website, industry, and a description of your target buyer, and the tool returns a scored report by email and on screen. The most important step is the one people rush — describing your audience accurately — because the prompts the tool generates are only as relevant as the inputs you give it.
The flow, step by step:
- Go to HubSpot's AI Search Grader page (it lives in HubSpot's free tools directory alongside Website Grader).
- Enter your brand name exactly as customers say it. If you're "Acme Software" but everyone calls you "Acme," test the common usage — that's what appears in prompts.
- Enter your website URL so the tool can associate your domain with the brand.
- Select your industry and describe your target audience. Be specific: "ops leaders at 50–500 employee logistics companies" generates far more useful prompts than "businesses."
- Submit and wait for the report — usually under a minute.
- Review the overall score plus the sub-scores for share of voice, sentiment, and visibility. Read the actual sample responses if the report surfaces them; the qualitative language is often more actionable than the number.
- Re-run quarterly (monthly if you're actively investing in AI visibility) and log the scores. Trend matters more than any single reading.
One practical tip: run it twice — once with your exact positioning and once with a broader category description. The gap between the two scores tells you whether you're visible only in your niche or in the wider market your competitors play in.
Interpreting Your Score: What Each Band Means
There is no official pass/fail threshold, but after running the grader across client portfolios, clear bands emerge: at 25 or below, AI assistants effectively don't know you exist; from 26 to 70, you appear inconsistently, behind the leaders, or with weak framing; above 70, you're a default answer in your category. Your priority actions differ sharply by band, so diagnose before you act.
| Score band | What it means | Priority actions |
|---|---|---|
| 0–25 (Invisible) | LLMs don't mention you even for niche prompts. Common for young brands, rebrands, and companies whose category language doesn't match how buyers ask. | Fix foundational presence: G2/Capterra profiles, consistent brand descriptions everywhere, comparison pages on your own site, earn mentions on third-party sites LLMs actually cite. |
| 26–45 (Occasional) | You surface for narrow, exact-match prompts but never for broad "best X" queries. | Publish head-to-head comparison content, target listicle inclusions ("top 10 X tools"), build Reddit/community presence, tighten one-line positioning so models can repeat it. |
| 46–70 (Contender) | You appear regularly but behind category leaders, or with lukewarm/mixed sentiment. | Attack sentiment: audit what review sites and forums say, address the recurring criticism at the source, push customer proof (case studies, named reviews) that reframes the narrative. |
| 71–85 (Established) | You're a default recommendation for your core use case with positive sentiment. | Defend and extend: monitor monthly, expand into adjacent-category prompts, and check that what models say about you is still accurate (your features and positioning change faster than models are updated). |
| 86–100 (Category leader) | You're cited first, framed favorably, across broad and niche prompts. | Maintenance mode plus conversion focus — your problem is no longer visibility, it's what happens after the click (more on that below). |
Treat the bands as a triage tool. A 38 with positive sentiment is a better position than a 55 where every mention includes "expensive" or "outdated" — visibility with bad framing actively costs you deals you never see.
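If you log scores for several brands or business units, the table can be encoded as a small lookup. This is a minimal sketch assuming integer 0–100 scores and a sentiment label you read off the report by hand; the names `score_band` and `triage` and the sentiment strings are hypothetical, not part of the grader's output.

```python
# Upper bound (inclusive) of each band, copied from the score-band table.
BANDS = [
    (25, "Invisible"),
    (45, "Occasional"),
    (70, "Contender"),
    (85, "Established"),
    (100, "Category leader"),
]


def score_band(score: float) -> str:
    """Map a 0-100 grader score to its band label."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    for upper, label in BANDS:
        if score <= upper:
            return label
    raise AssertionError("unreachable: 100 is covered by the last band")


def triage(score: float, sentiment: str) -> str:
    """Combine band and sentiment; 'sentiment' is a hypothetical label such as
    'positive', 'neutral', or 'negative'."""
    band = score_band(score)
    if sentiment == "negative":
        # Visibility with bad framing costs deals, so sentiment work comes first.
        return f"{band}: fix sentiment at the source before chasing more mentions"
    return f"{band}: follow the priority actions for this band"


print(triage(38, "positive"))  # Occasional: follow the priority actions for this band
print(triage(55, "negative"))  # Contender: fix sentiment at the source before chasing more mentions
```

Checking sentiment before the band's default actions mirrors the point above: a higher score with negative framing can be the weaker position.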
Limitations: What the Grader Doesn't Tell You
The AI Search Grader is a useful directional snapshot, but it has real blind spots: it samples a limited set of prompts, LLM outputs are non-deterministic so scores fluctuate between runs, and it measures visibility rather than traffic or revenue. Use it to prioritize effort, not to report ROI.
Be honest with your leadership team about these constraints:
- Prompt sampling. The tool tests a generated batch of prompts, not the infinite space of real user questions. Your actual buyers may phrase things the grader never tests.
- Non-determinism. Ask an LLM the same question twice and you may get two different brand lists. A 5–10 point swing between runs is noise, not signal. Don't celebrate or panic over single-run deltas.
- Model coverage lags reality. Which models the tool queries, and how current they are, changes over time. Your score against one model version says little about the assistant a buyer used yesterday, especially assistants with live web browsing.
- No revenue linkage. The grader can't tell you whether AI mentions produce pipeline. For that you need proper source tracking in your CRM — if your attribution reporting can't isolate AI-assistant referrals as a channel, fix that before investing heavily in AEO.
- Brand-level granularity. It won't tell you which content earned the mention, so you still need to reverse-engineer citations manually (ask the assistants directly and check the sources they cite).
None of this makes the tool worthless. It makes it a compass, not a speedometer.
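One practical way to respect the non-determinism is to run the grader a few times per period and compare averages rather than single readings. The sketch below assumes you log each run's score yourself; the function name `compare_periods` is hypothetical, and the 10-point threshold is this article's rule of thumb, applied conservatively to averages even though averaging already reduces some noise.

```python
from statistics import mean

NOISE_POINTS = 10  # swings smaller than this are treated as run-to-run noise


def compare_periods(previous_runs: list[float], current_runs: list[float]) -> str:
    """Compare the average of several grader runs per period, not single readings."""
    if not previous_runs or not current_runs:
        raise ValueError("need at least one run in each period")
    delta = mean(current_runs) - mean(previous_runs)
    if abs(delta) < NOISE_POINTS:
        return f"no meaningful change ({delta:+.1f} points, within noise)"
    direction = "up" if delta > 0 else "down"
    return f"trend {direction} {abs(delta):.1f} points"


# Hypothetical logged runs: three per month.
print(compare_periods([41, 47, 44], [49, 52, 46]))  # no meaningful change (+5.0 points, within noise)
print(compare_periods([31, 35, 33], [58, 62, 60]))  # trend up 27.0 points
```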
Your Action Plan by Score Band
The fastest way to move an AI Search Grader score is to fix the sources LLMs learn from: review platforms, third-party comparisons, community discussions, and clear positioning language on your own domain. Work the checklist below top to bottom — the early items move low scores, the later items move high ones.
- Standardize your one-line brand description (what you do, for whom, differentiator) and deploy it identically across your homepage, LinkedIn, G2, Crunchbase, and press boilerplate. Models repeat what they read consistently.
- Claim and complete review-platform profiles (G2, Capterra, TrustRadius for B2B software; industry equivalents otherwise) and run a systematic review-generation campaign. These sites are disproportionately cited by LLMs.
- Publish comparison and alternatives pages on your own domain — "You vs. Competitor," "Best X for Y" — with honest, structured content. If you don't frame the comparison, someone else's framing becomes the model's answer.
- Get into third-party listicles and analyst roundups for your category. One inclusion in a widely scraped "top 10" post does more for share of voice than ten posts on your own blog.
- Address the sentiment root cause. If models call you "hard to implement," that language lives somewhere — old reviews, a Reddit thread, a competitor's page. Counter it at the source with updated reviews and public proof, not just new marketing copy.
- Build community presence where real discussions happen (Reddit, industry Slacks that get indexed, niche forums). Authentic practitioner mentions are high-trust signals for models.
- Add FAQ-style, direct-answer content to key pages so assistants can quote you cleanly — the same discipline that wins featured snippets wins AI citations.
- Re-run the grader monthly, log scores in a spreadsheet, and track alongside AI-referral sessions in your analytics so you can correlate effort with movement.
- Instrument your CRM to tag AI-sourced leads so the visibility work connects to pipeline — HubSpot's own Breeze AI tooling and custom source properties both work here.
The Part the Grader Can't Score: What Happens After the Click
AI visibility is a top-of-funnel win, and the AI Search Grader only measures that top. Whether those AI-referred visitors become revenue depends entirely on what's behind the click: your forms, routing, lifecycle logic, and CRM data quality. A 90 grader score feeding a broken portal is an expensive way to generate leads nobody follows up with.
We see this constantly. A 40-person SaaS client of ours invested two quarters in AEO, moved their grader score from 31 to 68, and saw AI-referred demo requests triple. Revenue barely moved — because those leads landed in a portal with no lead scoring, lifecycle stages nobody maintained, and a "speed to lead" of four business days. The marketing team fixed AI search; the CRM quietly discarded the results.
The uncomfortable truth: for most small and mid-sized companies, one week spent fixing CRM fundamentals returns more pipeline than one month spent on AI visibility. Ideally you do both — but sequence matters. Run the grader, note your band, start the checklist. Then run a HubSpot audit to make sure the portal on the receiving end deserves the traffic.
FAQ
Is the HubSpot AI Search Grader really free?
Yes. It's a free lead-generation tool from HubSpot, like Website Grader. You provide an email address to receive the report, but no Marketing Hub subscription or credit card is required.
How often should I run the AI Search Grader?
Monthly if AI visibility is an active initiative, quarterly otherwise. LLM outputs are non-deterministic, so ignore swings under ~10 points and focus on the multi-run trend.
What's a good AI Search Grader score?
Above 70 generally means you're a default recommendation in your category. But context beats the number: a 40 with positive sentiment in a crowded category can be a stronger position than a 60 with negative framing. Diagnose the sub-scores, not just the headline.
Can I directly improve my score by publishing more blog posts?
Not directly. Your own blog matters less than third-party sources — review platforms, listicles, community discussions — because LLMs weight independent mentions heavily. Own-domain comparison pages and consistent positioning language help, but the biggest lever is earning credible external mentions.
Does a high AI Search Grader score mean more revenue?
No — it means more visibility. Whether that converts depends on your funnel: routing, follow-up speed, lead scoring, and lifecycle hygiene in your CRM. Pair AEO work with a portal audit so the leads you win actually get worked.