AI Writing · April 15, 2026 · 16 min read

How to identify keywords for SEO with real search data

by Ivaylo, with help from Dipflow

Real keyword research starts the moment you stop trusting keyword tools at face value. If you are trying to identify keywords for SEO, the fastest way to waste a quarter is to grab a list from one platform, treat the volume column like physics, and ship a content plan that never had a chance.

We learned that the annoying way: a client asked for “real search data” after an agency sold them 60 AI-generated topics with confident-looking numbers. We did what scrappy teams do. We picked 10 of those keywords, checked the SERPs manually, cross-checked two tools, and then tried to map each query to an actual page type. Half the terms had either mismatched intent or SERPs dominated by features that ate the clicks. A few had “volume,” but no visible demand in the form of question variants, long-tail expansions, or consistent competitor coverage. We scrapped the plan.

What follows is our field-tested method: how we define real search data, how we turn a vague topic into seed keywords that match human phrasing, how we mine Google’s own suggestion surfaces without drowning, and how we prioritize with a scoring model that forces trade-offs between volume, difficulty, relevancy, and search intent.

What “real search data” actually means (and why tools disagree)

When people say “real search data,” they usually mean: “Show me numbers that came from actual search behavior, not vibes.” That’s fair. It’s also where people get tricked.

In practice, “real” is a bundle of imperfect proxies:

Monthly search volume is how often a term is searched over a given period, usually a month. It’s not your traffic. It’s not your potential clicks. It’s a count estimate.

Competition or keyword difficulty is a scale that indicates how hard it is to rank. Every vendor computes it differently, often by looking at the current top results and their authority signals.

Search intent is the classification of what the searcher is trying to accomplish. Informational, commercial, transactional, navigational. Tools guess. The SERP tells the truth.

CPC (cost-per-click) is a paid metric that sometimes acts like a proxy for commercial value. Sometimes. It can also just mean advertisers are irrational in that niche.

Here’s the part that trips smart people up: these metrics are not measured the same way across platforms, and even within the same tool the number is often a modeled estimate. That’s why you can see 90 searches in one tool and 260 in another, for the same keyword, same country.

Where this falls apart is when you treat any one number as ground truth, then build a whole calendar around it. You will overproduce content for terms that never drive clicks, and underinvest in terms that are quietly valuable.

Our definition of “real search data” is narrower and more useful: a keyword is real enough to bet on when you can triangulate demand from at least two of these sources.

First, a tool reports non-zero monthly volume in your target location.

Second, Google suggests meaningful variants via Autocomplete, People Also Ask, or Related Searches.

Third, at least a few non-mega sites rank for it or for very close variants, which implies the SERP is not locked behind brand authority.

If all you have is a number in a tool, we treat it as a hypothesis, not a target.
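The two-of-three triangulation rule above can be sketched as a small check. The input names (tool volume, captured SERP variants, count of non-mega sites on page one) are illustrative, not from any specific tool’s API, and the thresholds are a rough reading of “a few”:

```python
# Minimal sketch of the two-of-three triangulation rule from this section.
# Inputs and thresholds are illustrative assumptions, not a tool's API.

def is_real_enough(tool_volume: int, serp_variants: list[str],
                   small_site_rankings: int) -> bool:
    """A keyword is a bet, not a hypothesis, when >= 2 signals fire."""
    signals = [
        tool_volume > 0,            # non-zero volume in the target location
        len(serp_variants) > 0,     # Autocomplete / PAA / Related variants exist
        small_site_rankings >= 3,   # a few non-mega sites rank on page one
    ]
    return sum(signals) >= 2

print(is_real_enough(90, ["how long does verification take"], 1))  # True
print(is_real_enough(260, [], 0))  # False: a volume number alone is a hypothesis
```

The point of coding it, even loosely, is that the rule stops being negotiable in planning meetings.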

How we identify keywords for SEO: turning a vague topic into seeds that match human phrasing

Seed keywords are not “topics.” They are the starting queries you feed into suggestion surfaces and research tools so you can expand into the real language people type.

Most seed lists fail for one of two reasons. Either they are too broad, so the SERP intent goes sideways and the competition spikes. Or they are copied from competitors, so they reflect the competitor’s positioning, not your audience’s problem.

The broad-head trap is easy to spot. Someone starts with “chicken” and wonders why they’re competing with Wikipedia, national magazines, recipe sites, and e-commerce giants. A better seed is “backyard chickens,” because it’s more specific, tends to have lower competition, and it pulls a SERP that’s closer to a real content angle.

The hard part is not knowing that. It’s producing seeds consistently.

We use a five-minute input checklist. It’s boring. It works.

Write this down before you touch any tool:

  • One job-to-be-done statement in plain English: “When I [situation], I want to [motivation], so I can [outcome].”
  • Three pain phrases your audience would say verbatim, including the emotional word. “I’m stuck,” “it’s confusing,” “I keep wasting money,” “it takes forever.”
  • Three desired outcome phrases. “fast,” “cheap,” “step-by-step,” “for beginners,” “without [thing they hate].”
  • Three constraint phrases: budget, location, timeframe. “under $500,” “near me,” “same day,” “for small apartment,” “in [city].”
  • Five question stems: “how to,” “best,” “what is,” “vs,” “why does.”

Now turn that into seeds by combining one element from each bucket. You are manufacturing phrasing that sounds like a search box.

Example, if your service is local bookkeeping for freelancers:

Job-to-be-done: “When tax time is coming and my books are a mess, I want to get caught up fast so I don’t overpay and panic.”

Pain phrase: “books are a mess.” Desired outcome: “caught up fast.” Constraint: “for freelancers.” Question stem: “how to.”

Your seeds become:

“How to catch up bookkeeping fast for freelancers” and “bookkeeping cleanup for freelancers” and “how much does bookkeeping cost for freelancers” and “bookkeeper near me for freelancers.”

Not elegant. Real.
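The bucket-combination step is mechanical enough to script. A sketch, using the bookkeeping phrases from the example above; the buckets and their contents come from the five-minute checklist, and you still prune ungrammatical combos by hand:

```python
# Sketch: manufacture raw seeds by combining one element from each bucket.
# Bucket contents are the bookkeeping-for-freelancers examples from the text.
from itertools import product

stems = ["how to", "how much does"]
pains_outcomes = ["catch up bookkeeping", "bookkeeping cleanup", "bookkeeping cost"]
constraints = ["for freelancers", "near me"]

seeds = [" ".join(combo) for combo in product(stems, pains_outcomes, constraints)]
# 2 x 3 x 2 = 12 raw seeds; delete the ungrammatical ones by eye.
for seed in seeds[:3]:
    print(seed)
```

Half the output will read badly. That’s fine: the survivors sound like a search box, which is the goal.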

What nobody mentions: your best seed keywords often come from places that are not SEO tools.

We pull language from customer calls, chat logs, onboarding forms, and support tickets. We also lurk in Facebook Groups, YouTube comments, and Reddit threads, not because those platforms are “SEO,” but because they reveal the phrasing people use when they are not trying to impress anyone. If you only use professional jargon, you will miss the queries beginners actually type.

We’ve made this mistake ourselves. We once built a seed list around “identity verification workflow” because that’s what the product team called it. The audience searched “why is my verification stuck” and “how long does verification take.” That mismatch cost us weeks.

Anyway, back to the point.

Once we have 10 to 20 seeds that sound like real questions, we expand.

SERP mining that doesn’t melt your brain: Autocomplete, People Also Ask, Related Searches

SERP suggestion mining is the cheapest way to find query patterns that are already proven to exist. You do not need a subscription to start.

Our workflow is simple and a little obsessive.

We open an incognito window, set our location if possible (or use a location parameter via tools later), and type each seed slowly into Google. We capture Autocomplete suggestions as they appear. Then we run the search, open People Also Ask, expand it a few times, and copy the questions. Then we scroll to Related Searches and capture those.

We do this in a plain text doc first, not a spreadsheet. Spreadsheets make you pretend the data is clean. It isn’t.

What trips people up is copying suggestions without labeling intent. Autocomplete happily mixes informational, commercial, and navigational variants in the same set. People Also Ask tends to skew informational. Related Searches often includes adjacent topics that are not actually part of the same intent.

Our fix: we tag each captured phrase with a single letter while we collect it.

I = informational, the person wants to understand or solve.

C = commercial, the person is evaluating options.

T = transactional, the person is ready to buy, book, download, sign up.

N = navigational, the person wants a specific site or brand.

Do not overthink the tag. The value is speed.
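If you are capturing hundreds of phrases, a rough first-pass tagger can pre-fill the letter before you override it by eye. This is a sketch, not our actual tooling; the cue lists are illustrative guesses built from the stems named in this section:

```python
# Rough first-pass intent tagger. Cue lists are illustrative assumptions;
# the SERP, not this function, tells the truth. Always override by eye.

CUES = {
    "T": ["buy", "price", "cost", "booking", "download", "sign up", "near me"],
    "C": ["best", "vs", "review", "top", "alternative"],
    "I": ["how to", "what is", "why", "guide", "checklist"],
}

def guess_intent(query: str, brands: set[str] = frozenset()) -> str:
    q = query.lower()
    if any(b in q for b in brands):
        return "N"                      # navigational: a known brand in the query
    for tag, cues in CUES.items():      # first match wins: T, then C, then I
        if any(cue in q for cue in cues):
            return tag
    return "I"                          # default: treat unknowns as informational

print(guess_intent("bookkeeper near me"))                    # T
print(guess_intent("best payroll software vs quickbooks"))   # C
print(guess_intent("dipflow login", brands={"dipflow"}))     # N
```

Treat the output as a pencil mark, not a verdict. The value, as above, is speed.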

Then we do a quick “noise pass.” We delete items that are pure definitions if we cannot produce a credible educational page. We delete items that are off-topic, like celebrity results, or queries that clearly imply a different audience. We also delete anything that would require regulated advice we cannot responsibly publish.

One more annoyance: SERP features can steal clicks. If the query triggers a giant featured snippet, AI overview, map pack, or an answer box that fully satisfies the question, the keyword can still be worth targeting, but the traffic model changes. We note “SERP crowded” in the margin.

This is why “real search data” is not a single number. The SERP itself is data.

Prioritizing with real metrics: volume, difficulty, relevancy, intent (without the high-volume trap)

This is the messy middle. You will see keywords with high volume and brutal difficulty, or low difficulty and unclear demand, or perfect intent but tiny volume. Tools will not decide for you.

We use a scoring model because it forces us to be honest about trade-offs. It also helps when a stakeholder tries to overrule everything with “but the volume is bigger.”

Step one: put keywords into bands, not precise numbers

Treating volume as a precise value is how people get lied to. Many tools display exact volumes, but the underlying reality is fuzzy. We band everything.

Volume band (0 to 3):

0 = no measurable volume or only appears in one tool with no SERP evidence.

1 = low volume, but consistent variants exist (often long-tail).

2 = moderate volume, multiple variants, consistent SERP coverage.

3 = high volume, head or chunky mid-tail.

Difficulty band (0 to 3):

0 = very low difficulty, SERP includes forums, small blogs, niche sites.

1 = low to moderate, some authority sites but also openings.

2 = high, SERP filled with strong domains, well-optimized pages.

3 = very high, brands and entrenched results, heavy SERP features.

Intent fit band (0 to 3):

0 = wrong page type, wrong stage, or SERP intent doesn’t match what we can publish.

1 = partial fit, requires angle gymnastics.

2 = good fit, we can match the content type.

3 = perfect fit, we can satisfy the query cleanly.

Business relevancy band (0 to 3):

0 = not tied to what we offer or to a meaningful audience.

1 = tangential, brand awareness only.

2 = relevant, can support conversion or retention.

3 = directly tied to revenue or core activation.

Step two: compute a simple priority score

We compute: Priority = (Volume + Intent + Relevancy) - Difficulty.

That gives a range from -3 to 9.

We then assign tiers.

Tier 1 (6 to 9): publish or refresh soon.

Tier 2 (3 to 5): publish when capacity opens, or bundle into a cluster.

Tier 3 (0 to 2): only if it supports a bigger cluster or fills a gap.

Tier 4 (below 0): park it.

You can argue with any scoring model, and we do. The point is not mathematical truth. The point is consistency.
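The whole model fits in a few lines, which is part of why it survives stakeholder debates. A sketch of the bands, score, and tiers exactly as defined above; the example band values for the two keywords are our judgment calls, not tool output:

```python
# The banding model from this section: every input is a 0-3 band,
# priority = (volume + intent + relevancy) - difficulty, range -3..9.

def priority(volume: int, difficulty: int, intent: int, relevancy: int) -> int:
    for band in (volume, difficulty, intent, relevancy):
        assert 0 <= band <= 3, "bands are 0-3, not raw tool numbers"
    return (volume + intent + relevancy) - difficulty

def tier(score: int) -> int:
    if score >= 6:
        return 1   # publish or refresh soon
    if score >= 3:
        return 2   # when capacity opens, or bundled into a cluster
    if score >= 0:
        return 3   # only if it supports a bigger cluster or fills a gap
    return 4       # park it

# "project management": huge volume, brutal difficulty, mixed intent
print(tier(priority(volume=3, difficulty=3, intent=1, relevancy=1)))  # 3
# "project management checklist for client onboarding": small but clean
print(tier(priority(volume=1, difficulty=1, intent=3, relevancy=3)))  # 1
```

Note how the big term loses: its volume band cannot buy back what difficulty and fuzzy intent cost it.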

Decision rules we use when metrics conflict

A high volume keyword with a difficulty of 3 is not “ambitious.” It’s usually a trap unless you have a strong domain, a unique angle, and patience.

A low volume keyword can be gold if it is specific, intent-clean, and sits close to a conversion action. Higher specificity often lowers competition. That is the whole long-tail advantage.

Low volume is a red flag when two things are true: the keyword has no meaningful variants in Autocomplete or People Also Ask, and the SERP results are thin or irrelevant. That usually means the demand is not there, or people phrase it differently.

Here’s a concrete example of good vs bad targeting logic.

Bad target: “project management.” Huge volume, difficulty through the roof, intent mixed between definitions, software, certifications, templates. You can write something. You probably won’t rank.

Better target: “project management template for marketing team” or “project management checklist for client onboarding.” Lower volume, but the intent is crisp, the content type is obvious, and you can actually satisfy it.

The catch is that stakeholders love the big terms because they look important. We have had to show SERP screenshots to end the debate.

How we sanity-check difficulty without paying for ten tools

Difficulty scores are helpful, but they are not gospel.

We do a manual check on the first page.

If the top results are mostly high-authority domains and every page is a fully-built guide with fresh dates, original images, and backlinks, the difficulty is functionally high even if the tool score is moderate.

If the top results include weak pages, outdated content, thin listicles, or irrelevant results that only rank due to domain strength, that’s an opening.

We also look at SERP features. If the query triggers a map pack, shopping results, or a dense People Also Ask box, you might rank and still get fewer clicks.

That is still fine. You just plan for it.

Local and geo-sensitive keyword evaluation: choose a target area or you’re guessing

If your business serves a region, national keyword volumes are a nice story and a bad plan.

Local research requires you to pick a target area in whatever tool you use, because localized volume and SERP composition change by city, state, and country. Some platforms cover a huge number of locations, down to cities and districts.

The friction point is simple: people forget to set location, then wonder why the content underperforms locally.

We handle local evaluation in three passes.

First, we collect modifiers: “near me,” city names, neighborhoods, “open now,” “best in [city],” “cost in [state].” These modifiers can flip intent from informational to transactional.

Second, we check the localized SERP. A keyword that looks like a blog query nationally might be dominated by map results locally. That changes what you should build. Sometimes the best “keyword play” is not a blog post, it’s a service page plus a well-built Google Business Profile.

Third, we interpret volume carefully. Local volumes are often smaller and noisier. That does not mean the keyword is worthless. It means you should look for families of related queries you can cover on one strong page.

If you are doing keyword research for a multi-location business, you will be tempted to clone pages for every city. Resist the urge until you confirm there’s distinct demand and distinct intent by location. Duplicate pages are a fast route to thin content and internal competition.

Competitor and existing-ranking shortcuts that actually save time

Keyword research does not have to start from scratch, especially if you already have a site with content.

Extracting keywords from competitor URLs without copying them blindly

Competitor discovery is powerful because it reveals what already works in your niche. Most tools let you enter a domain or a specific URL and pull the keywords it ranks for.

The annoying part: copying competitor keywords blindly.

A competitor might rank because they have brand authority, backlinks, or a long history. You can copy the keyword and still fail.

We only keep competitor keywords if they pass three tests.

Intent test: does the SERP match a page type we can create, and can we satisfy it better or differently?

SERP crowding test: are clicks available, or is the page buried under features?

Fit test: is the keyword actually relevant to our offering, or is it a side topic the competitor can afford to cover because they are huge?

Sometimes we find “ghost wins” here: competitor pages ranking for long-tail variants they did not explicitly target. That’s a clue that Google associates the topic cluster. Those variants are often easier wins for us.

Finding “striking distance” keywords on your own site

If your site already ranks, your fastest wins are often not new pages. They’re updates.

We look for keywords where we rank just outside the top 10. That range is where a refresh can move you onto page one faster than a brand-new URL.

This step is humbling because it reveals how often we were “close” but not satisfying the query fully.
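A sketch of how we pull candidates from a Search Console performance export. The row shape mirrors the standard export columns, but the 11-20 position window and the impression floor are illustrative defaults, not numbers from this article:

```python
# Sketch: filter a Search Console performance export for keywords
# ranking just outside the top 10. Thresholds are illustrative defaults.

def striking_distance(rows: list[dict], lo: float = 11, hi: float = 20,
                      min_impressions: int = 50) -> list[dict]:
    """Keep queries just off page one that still earn real impressions."""
    hits = [r for r in rows
            if lo <= r["position"] <= hi and r["impressions"] >= min_impressions]
    return sorted(hits, key=lambda r: r["impressions"], reverse=True)

rows = [
    {"query": "bookkeeping cleanup for freelancers", "position": 12.4, "impressions": 310},
    {"query": "what is double entry", "position": 34.0, "impressions": 900},
]
for r in striking_distance(rows):
    print(r["query"])  # bookkeeping cleanup for freelancers
```

Whatever thresholds you pick, pick them once and keep them, so the refresh queue stays comparable month to month.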

When we see a striking-distance keyword, we:

Check if the page matches intent. If the SERP is asking for a checklist and we wrote a narrative essay, we rewrite.

Expand the content to cover missing sub-questions shown in People Also Ask.

Improve the title and headings to reflect the exact phrasing, without stuffing.

Tighten the intro so it answers the question quickly.

Sometimes we also prune. Yes, prune. If the page is trying to rank for five different intents, it ranks for none.

From keyword lists to a content map (without cannibalizing yourself)

At some point you have enough keywords. The work shifts from discovery to restraint.

We cluster by intent first, not by wording. Two queries that sound different can have identical intent, and should usually be one page. Two queries that sound similar can have different intent, and should be separate.

We assign one primary keyword per page, then keep a small set of supporting variants that the page can naturally answer. We also decide the page type upfront: guide, checklist, comparison, service page, FAQ, tools page.

Cannibalization happens when you publish multiple pages that target the same keyword family and the same intent. Google then has to guess which page to rank, and it often picks the weaker one. One sentence rule we use: if two pages would have the same outline, they should be one page.
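That one-sentence rule can be enforced mechanically once every planned page has a keyword family and an intent tag. A sketch of the check; the planning rows are a hypothetical shape we use here for illustration, not any tool’s export:

```python
# Sketch of a cannibalization check: flag any keyword-family + intent
# pair assigned to more than one page. Row shape is hypothetical.
from collections import defaultdict

plan = [
    {"page": "/bookkeeping-cleanup", "cluster": "cleanup", "intent": "T"},
    {"page": "/catch-up-bookkeeping", "cluster": "cleanup", "intent": "T"},
    {"page": "/bookkeeping-cost", "cluster": "pricing", "intent": "C"},
]

pages_by_target = defaultdict(set)
for row in plan:
    pages_by_target[(row["cluster"], row["intent"])].add(row["page"])

conflicts = {k: v for k, v in pages_by_target.items() if len(v) > 1}
for (cluster, intent), pages in conflicts.items():
    print(f"cannibalization risk: {cluster}/{intent} -> {sorted(pages)}")
```

Two pages colliding on the same family and intent is exactly the same-outline case: merge them into one page.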

The practical checklist we keep taped to the wall

We’ve tried fancy systems. This is what survives.

Pick a target location first if you have local intent.

Build seed keywords from audience language, not industry jargon.

Mine Autocomplete, People Also Ask, and Related Searches, and tag intent as you capture.

Triangulate “real” demand using at least two signals: tool volume plus SERP evidence.

Prioritize with a scoring model that forces trade-offs between volume, difficulty, relevancy, and intent.

Start execution with striking-distance refreshes, then Tier 1 new pages.

If you do nothing else, do this: open the SERP before you commit. Every time.

That one habit saves months.

FAQ

How do I figure out what keywords to use for SEO?

Start with 10 to 20 seed queries based on customer language, then expand them using Google Autocomplete, People Also Ask, and Related Searches. Keep only the keywords that show tool volume and clear SERP evidence of real variants and clickable results.

What counts as real search data for keyword research?

Real search data is not one metric, it is triangulation. Look for non-zero tool volume plus SERP validation like suggestions and competitor coverage that is not dominated by mega brands.

How do I avoid the high-volume keyword trap?

Check the SERP before you commit: high volume with mixed intent, entrenched results, or heavy SERP features usually underdelivers on clicks. Prioritize intent fit and relevancy first, then treat volume as one input in a scoring model.

What is the 80/20 rule in SEO keyword research?

It means a small set of pages and keyword clusters typically drive most results. Focus on Tier 1 opportunities, plus striking-distance updates, before you spend time chasing broad head terms.

Tags: competitor keyword gap, google search console, keyword prioritization, long tail keywords, search intent, serp analysis