Anatomy of an AI citation — how ChatGPT picks which brands to name

We ran 5,000 prompts across 12 SaaS categories through ChatGPT (gpt-4o), tracked which brands got cited, and reverse-engineered what they had in common. Here's what we found.

The methodology

We picked 12 categories — CRM, project management, email marketing, payment processing, design tools, analytics, video conferencing, no-code, observability, customer support, billing, and HR. For each, we generated ~30 realistic discovery prompts (“what's the best X for Y?”, “recommend a Z for SMBs”) and ran them all through gpt-4o with the web_search tool enabled.
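
For readers who want to replicate the setup, here is a minimal sketch of how a discovery-prompt set could be generated by crossing templates with categories and audiences. The template strings, the audience list, and the `build_prompts` helper are hypothetical illustrations, not the study's actual prompt list.

```python
from itertools import product

# Hypothetical templates in the spirit of "what's the best X for Y?".
TEMPLATES = [
    "What's the best {category} tool for {audience}?",
    "Recommend a {category} platform for {audience}.",
    "Which {category} software do {audience} usually pick?",
]

# Hypothetical audiences; not the study's real prompt list.
AUDIENCES = ["SMBs", "startups", "enterprise teams"]

# The 12 categories named above ("design tools" shortened to "design").
CATEGORIES = [
    "CRM", "project management", "email marketing", "payment processing",
    "design", "analytics", "video conferencing", "no-code",
    "observability", "customer support", "billing", "HR",
]


def build_prompts(categories, templates, audiences):
    """Return one prompt per (category, template, audience) combination."""
    return [
        {"category": c, "prompt": t.format(category=c, audience=a)}
        for c, t, a in product(categories, templates, audiences)
    ]


prompts = build_prompts(CATEGORIES, TEMPLATES, AUDIENCES)
print(len(prompts))  # 12 categories x 3 templates x 3 audiences = 108
```

Each prompt would then be sent to gpt-4o with web search enabled, and the brands named in each answer recorded; that step needs an API key and is left out of this sketch.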

We then tagged every cited brand with a set of features: domain authority, Wikipedia presence, schema markup density, comparison pages, FAQ count, and presence in roundup articles by tier-1 publications. Finally, we ran a simple regression to see which features correlated with citation rate.

Top 5 predictors of being cited

Wikipedia article exists: by far the strongest single predictor. Brands with a Wikipedia article were cited 3.4× as often as brands without one, controlling for company size.
Cited by tier-1 publications (TechCrunch, Forbes, Bloomberg, Wired): a single mention in a 2024–2026 roundup article was associated with ~25% higher citation likelihood.
Comparison pages on your own domain: “X vs Y” pages with structured content (table, FAQ, schema) were quoted verbatim in ~40% of the responses where they were relevant.
Product schema markup: brands with valid Product, Organization, and SoftwareApplication schema were cited 1.8× as often. In other words, structured data appears to act as a recommendation signal.
Active subreddit or community: brands with a Reddit community of 5k+ members saw a measurable lift, especially for “reviews” and “experiences” queries.
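
To make the schema items concrete, here is a minimal sketch that builds Organization, SoftwareApplication, and FAQPage JSON-LD (the kind of FAQ schema a comparison page would carry) with Python's `json` module. The brand name, URLs, price, and FAQ text are hypothetical placeholders, and Product markup is left out for brevity; validate the output against schema.org and your search engine's structured-data guidelines before using it.

```python
import json

# Hypothetical brand details; replace with your own.
BRAND = "ExampleTool"
SITE = "https://www.example.com"


def organization_schema(name, url, same_as):
    """schema.org Organization node; sameAs points to external profiles."""
    return {"@type": "Organization", "name": name, "url": url, "sameAs": same_as}


def software_schema(name, category, price):
    """schema.org SoftwareApplication node with a basic Offer."""
    return {
        "@type": "SoftwareApplication",
        "name": name,
        "applicationCategory": category,
        "operatingSystem": "Web",
        "offers": {"@type": "Offer", "price": price, "priceCurrency": "USD"},
    }


def faq_schema(pairs):
    """schema.org FAQPage built from (question, answer) pairs."""
    return {
        "@type": "FAQPage",
        "mainEntity": [
            {"@type": "Question", "name": q,
             "acceptedAnswer": {"@type": "Answer", "text": a}}
            for q, a in pairs
        ],
    }


graph = {
    "@context": "https://schema.org",
    "@graph": [
        # Hypothetical Wikipedia URL, shown only to illustrate sameAs.
        organization_schema(BRAND, SITE, ["https://en.wikipedia.org/wiki/ExampleTool"]),
        software_schema(BRAND, "BusinessApplication", "0"),
        faq_schema([(
            "How does ExampleTool compare to CompetitorX?",
            "Placeholder answer; replace with your own comparison summary.",
        )]),
    ],
}

# Paste the output into the page inside <script type="application/ld+json">.
print(json.dumps(graph, indent=2))
```

Using a single `@graph` keeps all three nodes in one script tag; separate script tags per type work just as well.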

What did NOT help much

Raw page count. Sites with 10,000+ pages had no advantage over sites with 100 well-structured pages.
Keyword density. Old-school keyword tactics had near-zero correlation with citation rate; AI models extract meaning, not keyword frequency.
Backlink count alone. Backlink quality mattered (Wikipedia / tier-1 pubs); raw count didn't.

The brand that surprised us

In our project management run, a tool from a ~50-person company we hadn't heard of was cited in 17 of 30 prompts, outperforming several Series-B competitors. What did it have? A comprehensive set of “X vs Y” comparison pages, all with FAQ schema, and a single TechCrunch mention from 2024.

That's the playbook in compressed form: comparison content + schema + one tier-1 citation = a brand that punches far above its weight class.

What this means for you

Pursue a Wikipedia entry if you don't have one. It was the strongest single predictor in our data, which makes it the highest-ROI investment in AI visibility.
Build comparison pages for your top 3 competitors. Structured. With FAQs. With schema.
Pitch tier-1 publications — even a single roundup mention compounds for years.
Audit your schema. Most brands have either no schema at all or broken/incomplete JSON-LD. Fix the basics.
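
A first-pass schema audit can be scripted with the standard library: pull every `<script type="application/ld+json">` block out of a page, check that it parses, and list which of the three types named above are missing. The `audit_schema` helper and the sample page are hypothetical, and this only checks that blocks parse and that types are present, not full validity against schema.org or search-engine rich-result requirements.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        # Script contents may arrive in several chunks; concatenate them.
        if self._in_jsonld:
            self.blocks[-1] += data


def collect_types(node, found):
    """Walk parsed JSON-LD recursively and record every @type value."""
    if isinstance(node, dict):
        t = node.get("@type")
        if isinstance(t, str):
            found.add(t)
        elif isinstance(t, list):
            found.update(x for x in t if isinstance(x, str))
        for value in node.values():
            collect_types(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_types(item, found)


def audit_schema(html, wanted=("Organization", "SoftwareApplication", "Product")):
    """Return (types found, wanted types missing, count of unparseable blocks)."""
    parser = JsonLdExtractor()
    parser.feed(html)
    found, broken = set(), 0
    for raw in parser.blocks:
        try:
            collect_types(json.loads(raw), found)
        except json.JSONDecodeError:
            broken += 1
    missing = [t for t in wanted if t not in found]
    return found, missing, broken


# Hypothetical page: one valid Organization block, one truncated block.
page = """
<html><head>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Organization", "name": "ExampleTool"}</script>
<script type="application/ld+json">{"@type": "FAQPage", </script>
</head></html>
"""
found, missing, broken = audit_schema(page)
print(found, missing, broken)
```

For the sample page this reports Organization as present, SoftwareApplication and Product as missing, and one broken block, which is the "none, or broken/incomplete" pattern described above.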
