How to categorise 100,000 queries in 15 minutes
Richard Lawrence
Rise at Seven
@richlawre
WHAT YOU WILL LEARN
● How to grab thousands of queries
● How to categorise them quickly
● How to train a text classifier
● How to make a tool anyone can use to do all this
● How to gain useful insight
Loads of code examples throughout!
GRABBING THE QUERIES
USING PYTHON
● Set up the API: bit.ly/gsc-api-setup
● Use this Colab notebook: bit.ly/gsc-api-python
WHAT YOU GET
● List of queries and associated metrics
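A minimal sketch of pulling "a list of queries and associated metrics" at volume. The Search Console searchanalytics.query method caps each response at 25,000 rows, so 100K queries means paging with startRow. The helper names (`build_query_requests`, `flatten_rows`), the dates and the sample response are illustrative assumptions; the linked Colab notebook may do this differently.

```python
from typing import Dict, Iterator, List

MAX_ROWS_PER_REQUEST = 25_000  # per-request row cap of searchanalytics.query


def build_query_requests(start_date: str, end_date: str, total_rows: int) -> Iterator[Dict]:
    """Yield one request body per page until total_rows are covered."""
    for start_row in range(0, total_rows, MAX_ROWS_PER_REQUEST):
        yield {
            "startDate": start_date,
            "endDate": end_date,
            "dimensions": ["query"],
            "rowLimit": min(MAX_ROWS_PER_REQUEST, total_rows - start_row),
            "startRow": start_row,
        }


def flatten_rows(response: Dict) -> List[Dict]:
    """Turn API rows into flat dicts of query plus metrics."""
    return [
        {
            "query": row["keys"][0],
            "clicks": row["clicks"],
            "impressions": row["impressions"],
            "ctr": row["ctr"],
            "position": row["position"],
        }
        for row in response.get("rows", [])
    ]


# Hypothetical date range and a hand-written response, for illustration only.
requests_to_send = list(build_query_requests("2021-01-01", "2021-03-31", 100_000))
sample_response = {
    "rows": [
        {"keys": ["buy shirts"], "clicks": 12, "impressions": 340, "ctr": 0.035, "position": 4.2}
    ]
}
flat_rows = flatten_rows(sample_response)
```

Each request body would be sent to the API in turn and the flattened rows appended to one table.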
USING DATA STUDIO
● Use this dashboard: bit.ly/aleyda-gsc-dash
How to categorise 100K search queries in 15 minutes - MeasureFest
CATEGORISING THE QUERIES
THREE POTENTIAL WAYS
● Keyword mentions
● Topic similarity
● Intent
BY KEYWORD MENTIONS
USING AN APRIORI ALGORITHM
● Creates categories for commonly mentioned keywords
● Specify number of keywords needed to create a category
● Specify around 1,000 for 100K queries
QUERY | KEYWORD MENTION CATEGORY
BUY SHIRTS | SHIRTS
SHIRTS IDEAS | SHIRTS
SHIRTS SALE | SHIRTS
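A rough sketch of the idea behind the table above. It only does the first pass of apriori (counting single words against a minimum support threshold), not the full frequent-itemset search. The function name `categorise_by_keyword`, the `min_support` parameter name and the "uncategorised" fallback are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List


def categorise_by_keyword(queries: List[str], min_support: int) -> Dict[str, str]:
    """Assign each query to its most widely mentioned keyword."""
    # Support = number of queries mentioning a word (each word counted once per query).
    support = Counter(word for q in queries for word in set(q.lower().split()))
    # Keep only words mentioned in at least min_support queries.
    frequent = {word: count for word, count in support.items() if count >= min_support}

    categories = {}
    for q in queries:
        candidates = [word for word in q.lower().split() if word in frequent]
        if candidates:
            # Highest support wins; ties fall back to comparing the words
            # themselves so the result is deterministic.
            categories[q] = max(candidates, key=lambda w: (frequent[w], w))
        else:
            categories[q] = "uncategorised"
    return categories


queries = ["buy shirts", "shirts ideas", "shirts sale", "red socks"]
categories = categorise_by_keyword(queries, min_support=3)
```

With only four toy queries the threshold is 3. On 100K queries the slide's suggestion of around 1,000 would play the same role.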
A FULL EXPLANATION
● bit.ly/apriori-explanation
GETTING HANDS ON
● Use this Colab notebook: bit.ly/apriori-colab
BY TOPIC SIMILARITY
EXAMPLE TOPIC CLUSTERS
● Cats, Kittens, Feline
● Dogs, Puppies, Canine
THE PROCESS
● Turn words into numbers plotted on a graph by meaning
● Reduce dimensions of graph
● Find clusters of words on graph to represent topics
TURNING WORDS INTO NUMBERS
Example embeddings:
● BERT
● Word2vec
● GloVe
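A small sketch of what "numbers plotted by meaning" buys you: once words are vectors, closeness in meaning can be measured with cosine similarity. The three-dimensional vectors below are made up for illustration. Real embeddings from BERT, Word2vec or GloVe have hundreds of dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, not output from any real model.
toy_embeddings = {
    "cats": [0.9, 0.1, 0.2],
    "kittens": [0.8, 0.2, 0.3],
    "dogs": [0.1, 0.9, 0.2],
}

cats_vs_kittens = cosine_similarity(toy_embeddings["cats"], toy_embeddings["kittens"])
cats_vs_dogs = cosine_similarity(toy_embeddings["cats"], toy_embeddings["dogs"])
```

In this toy setup "cats" scores closer to "kittens" than to "dogs", which is the property the clustering step relies on.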
REDUCING DIMENSIONALITY
Example algorithms:
● PCA
● SVD
● UMAP
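A bare-bones sketch of the idea behind PCA: find the direction along which the points vary most, then describe each point by its position along that direction. This version finds only the first component using power iteration, and the toy points are invented. A real pipeline would use a library implementation of PCA, SVD or UMAP.

```python
import math
from typing import List


def first_principal_component(points: List[List[float]], iterations: int = 100) -> List[float]:
    """Return the unit vector along which the points vary most."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    centred = [[p[d] - means[d] for d in range(dims)] for p in points]
    # Sample covariance matrix (dims x dims).
    cov = [
        [sum(row[i] * row[j] for row in centred) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]
    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalise, which converges on the dominant eigenvector.
    vec = [1.0] * dims
    for _ in range(iterations):
        vec = [sum(cov[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec


def project(points: List[List[float]], component: List[float]) -> List[float]:
    """Reduce each point to one number: its position along the component."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    return [sum((p[d] - means[d]) * component[d] for d in range(dims)) for p in points]


# Toy 3-D points that only vary along the direction (1, 2, 0).
toy_points = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [3.0, 6.0, 0.0], [4.0, 8.0, 0.0]]
component = first_principal_component(toy_points)
reduced = project(toy_points, component)
```

Here three dimensions collapse to one without losing the ordering of the points.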
CLUSTERING INTO TOPICS
Example algorithms:
● K-Means
● BIRCH
● HDBSCAN
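A compact sketch of K-Means, the simplest of the three algorithms listed. The 2-D vectors stand in for embeddings after dimensionality reduction and are invented for illustration. The deterministic farthest-point start is a choice made here for reproducibility, not how library implementations usually initialise.

```python
import math
from typing import List


def k_means(vectors: List[List[float]], k: int, iterations: int = 10) -> List[int]:
    """Return a cluster label (0..k-1) for each vector."""
    # Deterministic start: first point, then the point farthest from
    # the centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        )

    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda i: math.dist(v, centroids[i])) for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for i in range(k):
            members = [v for v, label in zip(vectors, labels) if label == i]
            if members:
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


words = ["cats", "kittens", "feline", "dogs", "puppies", "canine"]
# Hypothetical reduced embeddings: cat words near (1, 1), dog words near (5, 5).
vectors = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.9], [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
labels = k_means(vectors, k=2)
topics = dict(zip(words, labels))
```

K-Means needs the number of clusters up front, which is one reason density-based options such as HDBSCAN are often preferred when the number of topics is unknown.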
GETTING HANDS ON
● Use this Colab notebook: bit.ly/topic-similarity
USING BERTOPIC FOR ALL
● Embeddings: BERT
● Reduce dimensions: UMAP
● Clustering: HDBSCAN
● Also summarises topics!
GETTING HANDS ON
● Use this Colab notebook: bit.ly/bertopic
BY INTENT
SOME EXAMPLE INTENT CLUSTERS
● Definition, Intro, Explain
● Setup, Create, Make
THE PROCESS
● Categorise by keyword mention with the apriori algorithm
QUERY | KEYWORD MENTION CATEGORY
BUY SHIRTS | SHIRTS
SHIRTS SALE | SHIRTS
SHIRTS IDEAS | SHIRTS
● Use TF-IDF to find rare words at the category level
QUERY | KEYWORD MENTION CATEGORY | TF-IDF WORD
BUY SHIRTS | SHIRTS | BUY
SHIRTS SALE | SHIRTS | SALE
SHIRTS IDEAS | SHIRTS | IDEAS
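A minimal sketch of the TF-IDF step, treating every query inside one keyword category as a document. The category word ("shirts") appears in every query so it scores zero, and the rarer word surfaces as the intent signal. This is plain unsmoothed TF-IDF. The function name `top_tfidf_word` is an assumption, and library implementations typically apply smoothing so their scores will differ.

```python
import math
from collections import Counter
from typing import Dict, List


def top_tfidf_word(queries: List[str]) -> Dict[str, str]:
    """For each (non-empty) query in a category, return its highest TF-IDF word."""
    docs = [q.lower().split() for q in queries]
    # Document frequency: how many queries in the category mention each word.
    doc_freq = Counter(word for doc in docs for word in set(doc))

    result = {}
    for query, doc in zip(queries, docs):
        counts = Counter(doc)
        scores = {
            word: (count / len(doc)) * math.log(len(docs) / doc_freq[word])
            for word, count in counts.items()
        }
        result[query] = max(scores, key=scores.get)
    return result


shirts_category = ["buy shirts", "shirts sale", "shirts ideas"]
tfidf_words = top_tfidf_word(shirts_category)
```

The resulting words (buy, sale, ideas) are what the next step clusters by similarity.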
● Cluster TF-IDF words by similarity
QUERY | KEYWORD MENTION CATEGORY | TF-IDF WORD | CLUSTER
BUY SHIRTS | SHIRTS | BUY | PURCHASE
SHIRTS SALE | SHIRTS | SALE | PURCHASE
SHIRTS IDEAS | SHIRTS | IDEAS | INSPIRATION
GETTING HANDS ON
● Use this Colab notebook: bit.ly/intent-cluster
TRAINING A CLASSIFIER
HOW IT WORKS
● Teach it with labelled data and it then labels new data
YOU MIGHT NOT NEED ONE
Yes:
● Keyword mentions
● If you make any manual changes to labels
No:
● Similarity (built in)
● Intent (built in)
WHAT YOU CAN USE IT FOR
● Classifying new Search Console data
● Classifying new keyword research
QUERY | CLASSIFICATION
SHIRT PRICES | PURCHASE
SHIRT SIZES | GUIDANCE
SHIRT COLOURS | INSPIRATION
THE PROCESS
● Turn words into word embeddings
● Split labelled data into test & training sets
● Train neural network with training set
● Use test set to evaluate performance
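A sketch of the split, train and evaluate loop from the process above. To stay dependency-free, a simple word-vote classifier stands in for the neural network and raw words stand in for embeddings, so this shows the workflow rather than the model. The labelled queries, function names and 25% test fraction are all illustrative assumptions.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str]  # (query, label)


def train_test_split(rows: List[Row], test_fraction: float = 0.25, seed: int = 42):
    """Shuffle a copy of the rows and hold back a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def train(rows: List[Row]) -> Dict[str, Counter]:
    """Stand-in model: count which labels each word was seen with."""
    word_labels: Dict[str, Counter] = defaultdict(Counter)
    for query, label in rows:
        for word in query.split():
            word_labels[word][label] += 1
    return word_labels


def predict(model: Dict[str, Counter], query: str) -> Optional[str]:
    """Each known word votes for labels in proportion to how it was seen."""
    votes: Counter = Counter()
    for word in query.split():
        label_counts = model.get(word, {})
        total = sum(label_counts.values())
        for label, count in label_counts.items():
            votes[label] += count / total
    return votes.most_common(1)[0][0] if votes else None


labelled = [
    ("shirt prices", "PURCHASE"), ("polo prices", "PURCHASE"),
    ("jumper prices", "PURCHASE"), ("coat prices", "PURCHASE"),
    ("shirt sizes", "GUIDANCE"), ("polo sizes", "GUIDANCE"),
    ("jumper sizes", "GUIDANCE"), ("coat sizes", "GUIDANCE"),
    ("shirt colours", "INSPIRATION"), ("polo colours", "INSPIRATION"),
    ("jumper colours", "INSPIRATION"), ("coat colours", "INSPIRATION"),
]

train_rows, test_rows = train_test_split(labelled)
model = train(train_rows)
correct = sum(predict(model, query) == label for query, label in test_rows)
accuracy = correct / len(test_rows)
```

The test rows are never seen during training, so the accuracy figure reflects how the model handles new queries.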
USING BERT
● You use its word embeddings to better understand the labels
USING BERT
Pros:
● Accurate (fine tune)
● Can use smaller dataset
Cons:
● Requires GPUs
● Takes a bit longer
GETTING HANDS ON
● Use this Colab notebook: bit.ly/bert-classifier
REMEMBER TO USE GPUS FOR BERT!
● Runtime > Change runtime type
USING FASTTEXT
Pros:
● No GPUs required
● Bit quicker
Cons:
● Less accurate
● Requires more data
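A small sketch of preparing labelled queries for fastText, whose supervised mode reads one example per line with the label prefixed by `__label__`. The helper name `to_fasttext_lines` and the lowercasing and whitespace cleanup are assumptions. The linked notebook may prepare data differently.

```python
from typing import List, Tuple


def to_fasttext_lines(rows: List[Tuple[str, str]]) -> List[str]:
    """Format (query, label) pairs as fastText supervised training lines."""
    lines = []
    for query, label in rows:
        # Labels must be a single token, so replace spaces with underscores.
        clean_label = label.strip().lower().replace(" ", "_")
        # Collapse repeated whitespace and lowercase the query text.
        clean_query = " ".join(query.lower().split())
        lines.append(f"__label__{clean_label} {clean_query}")
    return lines


labelled_queries = [
    ("Shirt Prices", "PURCHASE"),
    ("shirt  sizes", "GUIDANCE"),
    ("shirt colours", "INSPIRATION"),
]
training_lines = to_fasttext_lines(labelled_queries)
```

These lines would be written to a text file and passed to fastText's supervised training (`train_supervised` in the official Python bindings), which is not run here.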
GETTING HANDS ON
● Use this Colab notebook: bit.ly/fasttext-classifier
MAKING A TOOL ANYONE CAN USE
INTRODUCING STREAMLIT
● Easily put a front end on your code with Python
GET STARTED WITH STREAMLIT
● Get started here: bit.ly/start-streamlit
EXAMPLE INSIGHT
SUMMARY
● Use the GSC API or GDS to grab thousands of queries
● Categorise them by keyword mention, topic similarity or intent with Python
● Train a classifier to label future data (but you might not need to)
● Use Streamlit to build a tool & allow anyone to follow this workflow
● Gain insight on how people search at different stages of their journey
Follow me @richlawre
THANKS
