How Generative AI Search Works: What I Learned from Testing 7 AI Tools on Real Buyer Prompts

Co-Founder of GTMDialogues & CEO of Inbound Marketing Practice.

Until recently, SEO meant optimizing for Google’s algorithm. 

You’d chase rankings, tweak metadata, and fight your way to the top of Page 1.

But generative AI tools are gradually changing that.

Google is still the dominant force in search, with an estimated 5 trillion searches in 2024, while ChatGPT handles around 5 billion search-like prompts every month. The gap is huge, but a growing share of your buyers now do their research in ChatGPT rather than visiting dozens of websites to find information about you.

To strengthen the point further: Ahrefs recently rolled out an AI citations report that shows which websites are being cited by AI tools like ChatGPT and Perplexity.

[Ahrefs screenshot]

If your site isn’t showing up there, you may not exist in your buyer’s journey at all.

To understand this shift, I ran an experiment.

I tested how seven popular AI tools (ChatGPT, Gemini, Perplexity, Claude, Grok, Copilot, and DeepSeek) handle source citations across five types of real-world prompts.

I wanted to find out:

  • Do these tools cite sources? If so, when?
  • Which tools give credit to websites—and which don’t?
  • What happens when you ask for stats, definitions, product recommendations, or content?
  • What should marketers and content teams do differently?

Let’s break it down.

The Hypothesis & Experiment Design

Before I get into prompt testing and citations, let’s talk about the tools themselves.

Each of these generative AI tools has a growing, loyal user base, and a role in shaping how information is discovered online.

Tool | Monthly Users / Reach | Positioning
ChatGPT | 5B+ prompt requests/month | Market leader in Gen-AI; OpenAI’s flagship
Gemini | Integrated into Google ecosystem | Built into Android and Google Workspace
Microsoft Copilot | Native in Windows 11, Office apps | Quietly reaching millions through defaults
Perplexity | ~50M visits/month (April 2024) | Fast-growing, research-focused, real-time web
Claude | Enterprise-heavy usage via Slack | Known for summarization and safer responses
Grok | Native to X (formerly Twitter) | Elon-backed, integrated with real-time posts
DeepSeek | China’s leading open LLM | Fast-rising global challenger with English/Chinese support

Each tool is used differently:

  • Some aim to replace search (Perplexity, Gemini)
  • Some aim to embed assistance in workflows (ChatGPT, Copilot)
  • Some act as real-time fact checkers (Grok)
  • Others offer safe summarization (Claude) or local AI options (DeepSeek)

But none of them work like traditional search engines.

The Hypothesis

I noticed that AI tools don’t cite consistently. Instead, they behave based on the intent of the query, their design philosophy, and perhaps even legal risk.

My core hypothesis: The way AI tools cite (or don’t cite) sources depends on the type of question you ask, not just the tool itself.

In other words:

  • Some prompts trigger citations (like stats)
  • Others don’t (like “write an article”)

And marketers optimizing for SEO need to understand this new intent-citation dynamic if they want to stay visible.

Experiment Setup

To test this, I ran the same five prompts across all seven tools. Each prompt reflected a different search or task intent:

  1. Definition: What is B2B marketing?
  2. Process: How to choose tools for B2B marketing?
  3. Product Comparison: Which tool is better - Ahrefs or Semrush?
  4. Statistics: How many people in B2B use generative AI?
  5. Task: Write me a 300-word article on B2B marketing

For each response, I recorded:

  • Whether sources were cited
  • How many links were included
  • If links pointed to brand sites, third-party blogs, or aggregators
  • Whether the tool offered clickable or just textual citations

I also took screenshots (shared as placeholders below) to show how responses varied.
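
If you want to replicate the setup, here’s a minimal sketch of how those fields could be tracked. The field names and the example row are illustrative, not the exact sheet I used:

```python
import csv

# One row per (tool, prompt) pair; field names are illustrative.
FIELDS = ["tool", "prompt_type", "cites_sources", "num_links",
          "link_targets", "citation_style"]

observations = [
    # Illustrative row, loosely matching Perplexity on the definition prompt below.
    {"tool": "Perplexity", "prompt_type": "definition",
     "cites_sources": True, "num_links": 5,
     "link_targets": "third-party blogs",        # brand site / blog / aggregator
     "citation_style": "inline + source list"},  # clickable vs. textual
]

with open("ai_citation_test.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(observations)
```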

Prompt 1: What is B2B Marketing?

This is the kind of question that shows up at the very top of the funnel.

If AI tools are trying to help users “understand a concept,” I expected at least some of them to cite sources, especially those that position themselves as research assistants.

Here’s what happened:

1. ChatGPT

Prompt: What is B2B Marketing?

Citation: No sources cited

Behavior: Delivered a well-structured, 3-paragraph response without any links or references.

Takeaway: Acts like a teacher, not a librarian. It’s trying to explain, not refer.

2. Gemini (Google)

Prompt: What is B2B Marketing?

Citation: Yes

Behavior: Cited multiple third-party marketing blogs and educational websites. Links appeared below the summary.

Takeaway: Surprisingly transparent—especially for a basic question. Likely pulling from indexed Google Search results.

3. Grok (X / Twitter AI)

Prompt: What is B2B Marketing?

Citation: No citations

Behavior: Delivered a short, conversational explanation. No links, footnotes, or references.

Takeaway: Grok acts like Twitter. It speaks in opinions, not footnotes.

4. Claude (Anthropic)

Prompt: What is B2B Marketing?

Citation: No sources cited

Behavior: Response felt formal and articulate, but no citation or attribution.

Takeaway: Leans toward safety and accuracy, but avoids external references.

5. Perplexity

Prompt: What is B2B Marketing?

Citation: Yes

Behavior: Cited multiple sources inline and included a “Sources” box at the bottom; links also appear at the top of the output.

Takeaway: Perplexity behaves like a hybrid between search engine and AI. Best performer in this category.

6. Microsoft Copilot

Prompt: What is B2B Marketing?

Citation: No sources

Behavior: Short paragraph, no citations. No indication where the information came from.

Takeaway: Despite being backed by Bing, it doesn’t act like a search engine here.

7. DeepSeek

Prompt: What is B2B Marketing?

Citation: No sources

Behavior: Provided a paragraph-length answer with no links or citations.

Takeaway: Similar to Claude - safe, broad, and self-contained.

Here’s a quick summary of the citation pattern for the “definition” prompt:

Tool | Cites Sources? | Notes
ChatGPT | No | Clean explanation, no citation
Gemini | Yes | Cited Google-indexed sources
Grok | No | Conversational, citation-free
Claude | No | Structured response, no links
Perplexity | Yes | Inline + list citations, very strong
Copilot | No | No links, no source reference
DeepSeek | No | Factual tone, no citations

Out of seven tools, only Perplexity and Gemini cited sources for a basic definition. Everyone else treated the answer as general knowledge.

Prompt 2: How to Choose Tools for B2B Marketing?

This prompt reflects a very common real-world use case: a buyer trying to evaluate tools, build a shortlist, or understand selection criteria. I expected this question to trigger more citations than the previous definition-based prompt.

But the results were inconsistent, and surprising in some cases.

1. ChatGPT

Prompt: How to choose tools for B2B marketing?

Citation: No

Behavior: Gave a detailed, article-style answer. It outlined factors like budget, integrations, features, and scalability—but didn’t reference any external source.

Takeaway: Strong content, but no citations. Operates more like a content generator than a curator.

2. Gemini

Prompt: How to choose tools for B2B marketing?

Citation: No

Behavior: Listed out 6–7 evaluation criteria and examples of B2B tools, but did not cite or link to any sources.

Takeaway: Unlike the previous definition prompt, Gemini didn’t offer citations here. Possibly because the answer was framed as a framework rather than a fact.

3. Grok

Prompt: How to choose tools for B2B marketing?

Citation: No

Behavior: Answered with bullet points and marketing considerations. Zero links or source attribution.

Takeaway: Treats this as subjective advice rather than a data-driven question.

4. Claude

Prompt: How to choose tools for B2B marketing?

Citation: No

Behavior: Generated a thoughtful, structured response covering needs assessment, team input, and trial usage. No external references.

Takeaway: Claude is consistent. Well-articulated but internally sourced.

5. Perplexity

Prompt: How to choose tools for B2B marketing?

Citation: Yes

Behavior: Cited sources inline and included a reference box with multiple blog posts, comparison pages, and marketing tool roundups.

Takeaway: The only tool that treats this as a search-like query. Reads like a summary of top articles on the web.

6. Microsoft Copilot

Prompt: How to choose tools for B2B marketing?

Citation: No

Behavior: Gave a short summary with suggestions but didn’t offer any links or source context.

Takeaway: Like ChatGPT, it delivers clean advice but avoids sourcing.

7. DeepSeek

Prompt: How to choose tools for B2B marketing?

Citation: Yes

Behavior: Provided a response and listed 4–5 clickable sources on the right panel.

Takeaway: Surprisingly helpful here. DeepSeek leaned more toward research assistant than generative tool.

Here’s a quick summary of the citation pattern for the “process” prompt:

Tool | Cites Sources? | Notes
ChatGPT | No | Long-form response, no external links
Gemini | No | Advice-driven, no sources listed
Grok | No | High-level tips, no references
Claude | No | Strategic but closed response
Perplexity | Yes | Rich citations, strong web indexing
Copilot | No | Brief and source-free
DeepSeek | Yes | Listed links clearly, acted like a search layer

Only Perplexity and DeepSeek cited sources for a process-oriented prompt. Gemini, which cited sources in the definition prompt, offered none here, suggesting tool behavior may shift based on how “subjective” a query feels.

Prompt 3: Which Tool is Better - Ahrefs or Semrush?

This time, I gave the tools a more commercially sensitive prompt: a comparison between two well-known SEO platforms. The goal was to see how AI tools handle product/brand mentions.

Do they cite only brand websites? Do they reference third-party reviews? Or do they dodge citations entirely?

1. ChatGPT

Prompt: Which tool is better - Ahrefs or Semrush?

Citation: No

Behavior: Gave a neutral, comparative overview—discussing use cases, strengths, and features of both tools. No sources or links included.

Takeaway: A well-balanced response, but again: no citations.

2. Gemini

Prompt: Which tool is better - Ahrefs or Semrush?

Citation: Yes

Behavior: Cited third-party blog comparisons (like G2, Capterra, and SEO blogs). Displayed clickable links below the response.

Takeaway: Returned to citing behavior, likely because this is a branded query.

3. Grok

Prompt: Which tool is better - Ahrefs or Semrush?

Citation: Yes

Behavior: For the first time, Grok included external citations—linking to blog posts and SEO forums.

Takeaway: Grok treats product comparisons with more scrutiny than abstract prompts.

4. Claude

Prompt: Which tool is better - Ahrefs or Semrush?

Citation: Yes

Behavior: Cited multiple sources inline and explained the differences in pricing, UI, and user base.

Takeaway: Claude becomes more source-friendly when you introduce brand comparisons.

5. Perplexity

Prompt: Which tool is better - Ahrefs or Semrush?

Citation: Yes

Behavior: As expected, Perplexity pulled data from top-ranking comparison articles. Included citations both inline and in a dedicated source list.

Takeaway: Still the most consistent citation performer across all prompts.

6. Microsoft Copilot

Prompt: Which tool is better - Ahrefs or Semrush?

Citation: No

Behavior: Provided a short side-by-side description, but no links or references.

Takeaway: Copilot continues to keep sources invisible.

7. DeepSeek

Prompt: Which tool is better - Ahrefs or Semrush?

Citation: Yes

Behavior: Included a list of source URLs at the end, many of which were blog-based comparisons.

Takeaway: Reinforced the pattern from the previous prompt: DeepSeek favors citing on brand-specific queries.

Here’s a quick summary of the citation pattern for the “product comparison” prompt:

Tool | Cites Sources? | Notes
ChatGPT | No | Balanced but no source attribution
Gemini | Yes | Third-party reviews, SEO blogs
Grok | Yes | First instance of citations
Claude | Yes | Cited inline, very clear
Perplexity | Yes | Consistent and source-rich
Copilot | No | Informative, but zero external references
DeepSeek | Yes | Multiple sources, mainly comparison blogs

This was the first prompt where five out of seven tools cited sources, likely because the prompt mentioned brands and implied decision-making.

Prompt 4: How Many People in B2B Use Generative AI?

This question tests whether the tools cite sources when asked for a statistic, something that typically requires a verifiable reference.

My expectation: If any category should force AI tools to cite, it’s this one. Stats should be backed by real, timely data. Let’s see how they did.

1. ChatGPT

Prompt: How many people in B2B use generative AI?

Citation: Yes

Behavior: This was the first prompt where ChatGPT cited sources, though only after stating the number. It referenced a 2023 McKinsey study.

Takeaway: It’s willing to cite when prompted for data, but even then, just barely.

2. Gemini

Prompt: How many people in B2B use generative AI?

Citation: Yes

Behavior: Cited data from research reports and industry articles. Sources appeared along with the main response.

Takeaway: Predictable return to citation behavior when asked for numbers.

3. Grok

Prompt: How many people in B2B use generative AI?

Citation: Yes

Behavior: Referenced real-time posts and research snippets. Included sources at the end.

Takeaway: Grok becomes more transparent when data credibility is at stake.

4. Claude

Prompt: How many people in B2B use generative AI?

Citation: Yes

Behavior: Cited 3–4 recent studies, mostly from Gartner and Deloitte. Listed links clearly.

Takeaway: Switched from closed response to fully referenced when asked for stats.

5. Perplexity

Prompt: How many people in B2B use generative AI?

Citation: Yes

Behavior: Strongest of all. Listed five sources with dates and clickable references.

Takeaway: Perplexity treats every stat like a search result. Reliable.

6. Microsoft Copilot

Prompt: How many people in B2B use generative AI?

Citation: Yes

Behavior: For the first time, Copilot cited sources. They were shown along with the response.

Takeaway: Even Copilot can’t ignore citation responsibility when numbers are involved.

7. DeepSeek

Prompt: How many people in B2B use generative AI?

Citation: Yes

Behavior: Shared links to Chinese and English research papers and industry articles.

Takeaway: Maintained consistency with prompt type. Solid showing.

Here’s a quick summary of the citation pattern for the “statistics” prompt:

Tool | Cites Sources? | Notes
ChatGPT | Yes | First instance of citation, minimal references
Gemini | Yes | Reliable citations for numbers
Grok | Yes | Cited real-time research sources
Claude | Yes | Industry-standard citations
Perplexity | Yes | Multiple sources with URLs
Copilot | Yes | First time showing citations
DeepSeek | Yes | Cited both global and regional sources

This was the only prompt where every tool cited sources.

Key insight: When numbers are involved, citations become unavoidable. AI tools don’t want to get caught making up data, especially not in enterprise use cases.

Prompt 5: Write Me a 300-Word Article on B2B Marketing

This was the purest content-generation task of the five prompts. It mimics how founders, marketers, and freelancers often use AI today: to quickly generate drafts, intros, or short-form articles.

I wasn’t expecting much citation behavior here, but I still wanted to test whether any tool acknowledged sources when creating something original.

Here’s how each one responded.

1. ChatGPT

Prompt: Write me a 300-word article on B2B marketing

Citation: No

Behavior: Generated a full article with an introduction, body, and conclusion. Polished tone, no links or references included.

Takeaway: Pure generation mode. No attribution, no mention of where facts or stats came from.

2. Gemini

Prompt: Write me a 300-word article on B2B marketing

Citation: Yes

Behavior: Included 2–3 clickable citations below the generated content. Links pointed to marketing blogs and Google-indexed sources.

Takeaway: Stands out by still offering citations even when asked to write content, possibly due to deeper Google integration.

3. Grok

Prompt: Write me a 300-word article on B2B marketing

Citation: No

Behavior: Generated a concise blog-style post. Focused on voice and brevity. No sources or footnotes included.

Takeaway: Treats task prompts as creative, not factual—no citations provided.

4. Claude

Prompt: Write me a 300-word article on B2B marketing

Citation: No

Behavior: Delivered a well-structured article with clear segmentation and professional tone. No references given.

Takeaway: Claude takes the prompt literally: write something, don’t research something.

5. Perplexity

Prompt: Write me a 300-word article on B2B marketing

Citation: Yes

Behavior: Cited multiple sources below the article, mostly from marketing blogs, SEO guides, and thought leadership sites.

Takeaway: Even in generation mode, Perplexity behaves like a research assistant.

6. Microsoft Copilot

Prompt: Write me a 300-word article on B2B marketing

Citation: No

Behavior: Generated a basic post with no links or source attribution.

Takeaway: Copilot sticks to creation mode without pulling references—even though it draws from Bing.

7. DeepSeek

Prompt: Write me a 300-word article on B2B marketing

Citation: No

Behavior: Wrote a fluent short article in one block. No citations or link suggestions offered.

Takeaway: For DeepSeek, generation ≠ research.

Here’s a quick summary of the citation pattern for the “task” prompt:

Tool | Cites Sources? | Notes
ChatGPT | No | Clean article, no attribution
Gemini | Yes | Added sources after article
Grok | No | Quick content generation, no links
Claude | No | Structured output, citation-free
Perplexity | Yes | Cited real sources despite being in gen mode
Copilot | No | Straight output, no references
DeepSeek | No | No citations, pure content

Only Gemini and Perplexity cited sources when asked to generate original content.

What’s interesting is that some tools that cited earlier stopped citing here. When you switch from “ask and learn” to “write and create,” most tools flip into self-contained mode.

Finally: How Do AI Tools Cite Sources?

Now that the test is complete, let’s recap how each tool behaved across all five prompt types, and what that means for your content strategy. 

Here’s a quick summary of the patterns worth noticing:

  1. Perplexity is the only tool that cited sources in all five prompts.
    It behaves like a citation-first search engine, regardless of task type.

  2. Statistical prompts triggered the most consistent citations.
    Every tool cited sources when numbers were involved. This was the only prompt where Copilot and ChatGPT showed citations.

  3. Most tools skipped citations on task-based prompts.
    When asked to “write an article,” tools shifted to generation mode—even if the content was factual.

  4. Gemini and Claude behave inconsistently.
    Gemini cited in four of the five prompts and Claude in two, and both skipped prompts where you might expect sources (e.g. process frameworks).

  5. Grok only cited when the prompt referenced brands or data.
    It tends to treat open-ended or “opinion-style” questions as conversational, not research-driven.

  6. DeepSeek behaves more like a search aggregator than a pure LLM.
    Especially strong on process and product queries—weak on task prompts.

So, Do Citations Matter?

They do, but not the way they used to.

In traditional SEO, your goal was to earn clicks. In AI search, your goal is to get mentioned, summarized, or linked, even if nobody clicks.

Buyers may never visit your site. But your narrative might still shape their decision, if you show up in the right AI answers.

What This Means for Marketers: Visibility ≠ Clicks Anymore

Traditional SEO rewarded one thing: click-throughs. If your blog ranked on Google’s Page 1, you had a chance to convert that traffic.

But AI tools don’t work that way.

They don’t always show your link. They don’t always cite your site. And in many cases, they summarize your content without ever sending traffic back.

So if you’re still measuring success by organic traffic alone, you’re missing the bigger picture.

The New Visibility Stack for Generative AI Search

To stay visible in this new era, you need to optimize across three layers:

1. Indexability

  • Are your pages being crawled and stored in sources that AI tools trust (Google, Bing, Perplexity’s index)?
  • Do you have structured data (schema.org, Open Graph tags)? (See the sample snippet after this section.)
  • Is your content being referenced on third-party platforms (Quora, Reddit, news)?

2. Contextual Relevance

  • Are your brand or product names showing up in content that AI models are summarizing?
  • Is your value prop clear enough that an AI tool can extract and rephrase it accurately?

3. Generative Framing

  • Are you writing in a way that makes it easy for AI tools to summarize you?
  • Are your stats, frameworks, and conclusions AI-friendly (short, declarative, logically chunked)?
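
To make the Indexability point concrete, here’s a minimal schema.org Article snippet in JSON-LD. All names, dates, and URLs here are placeholders; swap in your own:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How Generative AI Search Works",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "publisher": { "@type": "Organization", "name": "Example Co" },
  "datePublished": "2025-01-15",
  "mainEntityOfPage": "https://example.com/blog/generative-ai-search"
}
</script>
```

Structured data like this doesn’t guarantee citations, but it gives crawlers, and the AI systems built on top of them, a clean, machine-readable statement of who wrote what and when.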

Should You Chase Citations?

Not directly. Most users don’t click them anyway, especially inside tools like ChatGPT or Claude.

But you should absolutely:

  • Make your content easily attributable (use brand names, author bios, dates)
  • Seed your insights into communities and open platforms that AI tools scan
  • Publish in formats that surface in search-like tools (lists, FAQs, summaries)

Perplexity, Gemini, and DeepSeek are already pulling from those structures. The rest will follow.

What This Means for Content Teams

You don’t need to write more; you need to write smarter.

  • Target AI discovery, not just search volume
  • Create summary-ready content that answers with clarity
  • Focus on brand imprint, even if you never get a backlink

TL;DR — What You Should Do Differently Starting Today

You’re not losing visibility because you’re not ranking.

You’re losing it because you’re not being mentioned, cited, or summarized inside the tools your buyers are already using.

Here’s what to take away from this experiment:

  • Only Perplexity consistently cites across all prompt types.
  • Statistics prompts trigger the most citations across tools.
  • Definition or task-based prompts almost never cite sources.
  • Being mentioned inside an AI-generated response is often more powerful than being linked.

Here are five things B2B SaaS marketers should start doing to crack AI-native discoverability:

1. Prioritize Structured Content That’s Easy to Summarize

  • Use bold headers, numbered lists, and FAQs.
  • Make your conclusions easy for machines to lift.

2. Get Referenced on Trusted, Crawled Platforms

  • Don’t rely solely on your blog. Aim for community posts, roundups, interviews, and guest articles.
  • Tools like Gemini and Perplexity often cite sources from Reddit, Medium, and industry review sites.

3. Treat Perplexity Like the New SEO Frontier

  • It behaves like a search engine but cites better than Google.
  • Reverse-engineer what it shows for your category—and be part of that conversation.

4. Use Your Brand and Author Names Intentionally

  • AI models often repeat branded phrasing they’ve seen consistently.
  • Make it easy for them to connect your message to your name.

5. Monitor AI Visibility, Not Just Traffic

  • Tools like Ahrefs now show whether your site is being cited in AI tools.
  • Treat those citations as early signs of influence, even if they don’t drive traffic yet.

Frequently Asked Questions

Should I optimize my content for AI tools the same way I do for Google?

Not exactly. Google SEO still matters, but AI tools often pull from different sources (like forums or Reddit threads) and use different intent logic. You should optimize for clarity, citation-worthiness, and context, not just keyword targeting.

Why don’t most AI tools cite sources when generating content?

Because they treat generation as synthesis, not research. Unless prompted for facts, data, or comparisons, most tools see no need to cite. They’re trying to complete your task, not justify their response.

If users rarely click on citations, do they even matter?

Yes. While click-through rates may be low, being cited or mentioned builds brand trust, category association, and perception of authority—especially when tools like Perplexity or Gemini surface your name repeatedly across answers.

Can I track which AI tools are citing my content?

Yes. Ahrefs now includes an AI Citations report, and Perplexity shows inline references that can be tracked manually. Also monitor brand mentions on web properties like Reddit, Quora, LinkedIn, and third-party blogs.
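
If you want to go a level deeper, you can also scan your own server logs for visits from AI crawlers. Here’s a minimal sketch in Python; the user-agent tokens (GPTBot for OpenAI, PerplexityBot, ClaudeBot for Anthropic) are real but change over time, so verify them against each vendor’s documentation:

```python
import re
from collections import Counter

# Known AI crawler user-agent substrings (verify against vendor docs; these change).
AI_BOTS = ["GPTBot", "PerplexityBot", "ClaudeBot"]

pattern = re.compile("|".join(AI_BOTS))
hits = Counter()

# Point this at your web server's access log.
with open("access.log") as log:
    for line in log:
        match = pattern.search(line)
        if match:
            hits[match.group(0)] += 1

for bot, count in hits.most_common():
    print(f"{bot}: {count} requests")
```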

What content formats are more likely to be cited?

Tools tend to cite:

  • Lists and rankings (e.g. “Top 5 B2B CRMs”)
  • Product comparisons
  • How-to guides and frameworks
  • Recent statistics with sources
  • Authoritative opinion pieces with unique POVs

Is generative AI search replacing traditional SEO?

Not yet. But it’s eroding the top-of-funnel traffic that used to land on blogs and resource pages. AI tools now provide instant summaries where Google used to serve you clicks. That’s why you should treat traditional SEO and AI discoverability as parallel efforts, not replacements.

How do I get my content to appear in Perplexity, Gemini, or Claude?

There’s no direct submission, but here’s what helps:

  • Publish on domains that get crawled frequently (your site, Medium, Substack, Quora), and don’t block AI crawlers (see the robots.txt snippet below)
  • Use schema markup
  • Get mentioned by other sources AI tools already trust
  • Keep titles and intros declarative, summary-friendly, and fact-rich
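
On the crawling point: make sure your own robots.txt isn’t blocking the AI crawlers you want. A minimal example follows; the user-agent tokens change over time, so check each vendor’s documentation:

```
# robots.txt: explicitly allow the AI crawlers you want reading your content
User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
```

Most sites allow everything by default, but some CDN and firewall presets now block these bots, so it’s worth checking.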
