How to create your own search quality evaluation algorithms
Richard Lawrence
Sanity.io
@richlawre
Who the hell is this guy anyway?
● Principal SEO at Sanity
● Sanity is a headless CMS and more!
● Doing a Data Science degree in my spare time
Onto some context
The ‘helpful content update’ might have been a bit of a damp squib…
…but Google is always working towards ranking helpful content more highly
So wouldn’t it be great to know if your content is helping your audience - at scale?
The search rater guidelines hold the key
A 167-page document that says what good looks like!
Google says it doesn’t directly use the ratings in its ranking algorithms
“We use responses from Raters to evaluate changes, but they don’t directly impact how our search results are ranked.”
bit.ly/ratings-answer
But it will use the rated content to help find features of what ‘good’ looks like
Similar methods have been used for years in various areas - like counterfeit notes
Features are found that best separate authentic and counterfeit notes
[Scatter plot: distance between edge & watermark against width of shaded area, with points labelled Counterfeit or Authentic]
Features for high vs. low quality content will likely be more complex
In 2019, Bing confirmed this is how it works
bit.ly/bing-confirmation
With 90% of its algorithms being ML based
bit.ly/bing-features
Plus it revealed its process
bit.ly/bing-process
So how can we harness this as an industry?
We can try to create our own!
What we need to do
1. Label the content
2. Create a ‘Needs Met’ algorithm
3. Create a ‘Page Quality’ algorithm
Labelling the content
Get a representative sample of searches
448 million search queries
bit.ly/448-million
Here’s how to play around with the file
bit.ly/large-file
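One way to pull a uniform sample out of a file too big to load into memory is reservoir sampling. This is a minimal sketch, assuming the file holds one query per line; the function name `sample_queries` and its parameters are illustrative and not taken from the talk or the linked guide.

```python
import random
from typing import Iterable, List, Optional


def sample_queries(lines: Iterable[str], k: int,
                   seed: Optional[int] = None) -> List[str]:
    """Return up to k lines chosen uniformly at random in a single pass."""
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        query = line.strip()
        if i < k:
            # Fill the reservoir with the first k queries.
            reservoir.append(query)
        else:
            # Keep the new query with probability k / (i + 1),
            # overwriting a randomly chosen earlier pick.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = query
    return reservoir
```

Passing an open file handle as `lines` streams the file one line at a time, so memory use stays proportional to `k` rather than to the file size.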
Then gather the top 20 rankings for each sample query
Likely an available feature of your favourite rank tracking software
Use some search raters to rate the content
● Choose provider
● Create guidelines (must not be identical to Google’s…)
● Collect labels: Needs Met & Page Quality
● 2 search raters, with a 3rd called in for disagreements
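The two-raters-plus-tiebreak rule can be sketched as a small function. This assumes the third rater's label is treated as final whenever the first two disagree, which the slide does not spell out; `adjudicate` and `third_rater` are hypothetical names.

```python
from typing import Callable


def adjudicate(rating_a: str, rating_b: str,
               third_rater: Callable[[], str]) -> str:
    """Return the agreed label, asking a third rater only on disagreement."""
    if rating_a == rating_b:
        return rating_a
    # Assumption: the third rater's label settles the disagreement.
    return third_rater()
```

Passing the third rater as a callable means that rater is only asked (and paid) for the pages where the first two labels differ.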
Creating a Needs Met algorithm
This measures how well content fulfils search intent
Features will mainly relate to relevance and structure
GPT language models are perfect for this
The open source option
GPT-3 became cheaper in September too
We need to create a pattern for GPT-J to learn
Content:
<h1>Compare car insurance quotes</h1>
<p>It's quick and easy to compare car insurance and find cheaper cover – we just need a few details about you and your vehicle.</p>
Target query: car insurance
Needs Met rating: Good
It will then rate new content
Content:
<h1>Car insurance</h1>
<p>From theft to write-offs and even lost keys, you'll be covered with us. Here's what you'll like about our comprehensive cover</p>
Target query: car insurance
Needs Met rating: ?????
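The Content / Target query / Needs Met rating pattern can be generated with a small helper, so that labelled and unlabelled examples share exactly the same layout. A minimal sketch; the function name `format_example` is an assumption for illustration.

```python
from typing import Optional


def format_example(content: str, query: str,
                   rating: Optional[str] = None) -> str:
    """Lay out one example in the Content / Target query / rating pattern."""
    text = (
        f"Content:\n{content.strip()}\n"
        f"Target query: {query}\n"
        "Needs Met rating:"
    )
    # Leave the rating blank for new content so the model completes it.
    return f"{text} {rating}" if rating is not None else text
```

Calling it with a rating produces a training example; calling it without one produces the text the model is asked to complete.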
We need to scrape the content from each page and give it to the language model, along with its rating
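A minimal sketch of the extraction step, using only Python's standard-library `html.parser`. Keeping just `<h1>` to `<h3>` and `<p>` text is an assumption made here to match the shape of the example content; the talk does not say which elements were kept, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional


class HeadingAndParagraphExtractor(HTMLParser):
    """Collect heading and paragraph text, re-wrapped in bare tags."""

    KEEP = {"h1", "h2", "h3", "p"}

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[str] = []
        self._tag: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Start capturing when a kept element opens (attributes are dropped).
        if tag in self.KEEP and self._tag is None:
            self._tag, self._text = tag, []

    def handle_data(self, data):
        if self._tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            # Collapse runs of whitespace left over from the page markup.
            text = " ".join("".join(self._text).split())
            if text:
                self.blocks.append(f"<{tag}>{text}</{tag}>")
            self._tag = None


def extract_content(html: str) -> str:
    parser = HeadingAndParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.blocks)
```

Navigation, scripts and inline formatting tags are ignored, so the output stays close to the short `<h1>`/`<p>` snippets shown in the pattern.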
Then use this info to fine-tune GPT-J
bit.ly/finetune-gptj
You can also use existing services
NLP Cloud
Forefront.ai
NLP Cloud also became cheaper!
Validate performance with a test set
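Holding out a test set can be as simple as shuffling the labelled examples and slicing off a fraction. A minimal sketch; the 20% default and the name `train_test_split` are assumptions for illustration, not figures from the talk.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(items: Sequence[T], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the examples and hold out a fraction for testing."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # Returns (training examples, test examples).
    return shuffled[n_test:], shuffled[:n_test]
```

If several pages were rated for the same query, splitting by query rather than by page may give a fairer test, since the model then never sees the test queries during training.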
Judge performance with a Confusion Matrix
                   Prediction: Correct    Prediction: Wrong
Actual: Correct    True positive          False negative
Actual: Wrong      False positive         True negative
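The four cells of the confusion matrix can be counted directly from the test set's actual and predicted ratings. A minimal sketch, assuming one rating (for example "Good") is treated as the positive class; the function names are illustrative.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(actual: Sequence[str], predicted: Sequence[str],
                     positive: str) -> Dict[str, int]:
    """Count TP / FN / FP / TN, treating one rating as the positive class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    counts = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["tp" if p == positive else "fn"] += 1
        else:
            counts["fp" if p == positive else "tn"] += 1
    return counts


def precision_recall(counts: Dict[str, int]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

With more than two rating levels, the same counts can be computed once per level, each time treating that level as the positive class.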
Few-shot learning can help improve performance
Prompt:
Example 1
Rating: Excellent
Example 2
Rating: Poor
Example 3
Rating: ????
GPT-J: Good
As can explaining to the model what it needs to do!
Consider the content to rate. Rate it according to how well it fits the search query.
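Putting the instruction and the few-shot examples together gives a single prompt string. This is a sketch of one plausible layout, not the exact prompt used in the talk; `build_prompt` and the blank-line separator are assumptions.

```python
from typing import List, Tuple

INSTRUCTION = (
    "Consider the content to rate. "
    "Rate it according to how well it fits the search query."
)


def build_prompt(examples: List[Tuple[str, str, str]],
                 content: str, query: str) -> str:
    """Join an instruction, labelled examples and one unlabelled example."""
    parts = [INSTRUCTION]
    for ex_content, ex_query, ex_rating in examples:
        parts.append(
            f"Content:\n{ex_content}\n"
            f"Target query: {ex_query}\n"
            f"Rating: {ex_rating}"
        )
    # The final block stops at "Rating:" so the model supplies the label.
    parts.append(f"Content:\n{content}\nTarget query: {query}\nRating:")
    return "\n\n".join(parts)
```

Each example is a `(content, query, rating)` tuple, so the same labelled data gathered for training can be reused as few-shot examples.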
We’ve done this for you within Sanity Studio
And lots of other great features
Contact us for more info about the beta for these features:
bit.ly/sanity-beta
This isn’t perfect of course - though still very useful
● Only text content
● Useful indication only
● Great at scale
Creating a Page Quality algorithm
This is much more difficult!
It measures how well a page achieves its purpose
This is about quality of content, independent of search queries
So features can relate to a large number of areas!
● ‘Main Content’ vs ‘Supplementary Content’: amount of Main Content, position of Main Content
● Website background information: depth of ‘about’ info, Wikipedia presence
And you have to work out how to measure them
Amount of Main Content:
● Length of Main Content area
● Number of words in Main Content
It becomes a huge multivariate challenge
Page      Length of MC area    ‘About us’ word count    Clicks to ‘About us’
Page 1    17cm                 500                      2
Page 2    20cm                 300                      1
Page 3    15cm                 1000                     2
Page 4    25cm                 750                      3
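Because the features are measured in different units (centimetres, words, clicks), a common first step is to put them on the same scale. The sketch below standardises the four example pages to z-scores; standardisation is a suggestion added here rather than a step named in the talk, and `standardise` is an illustrative name.

```python
from statistics import mean, pstdev
from typing import Dict, List

# Values copied from the slide's table: MC area length (cm),
# 'About us' word count, clicks to 'About us'.
PAGES: Dict[str, List[float]] = {
    "Page 1": [17, 500, 2],
    "Page 2": [20, 300, 1],
    "Page 3": [15, 1000, 2],
    "Page 4": [25, 750, 3],
}


def standardise(rows: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Rescale each feature column to zero mean and unit variance."""
    columns = list(zip(*rows.values()))
    means = [mean(col) for col in columns]
    # Guard against constant columns, which have zero spread.
    stdevs = [pstdev(col) or 1.0 for col in columns]
    return {
        page: [(v - m) / s for v, m, s in zip(values, means, stdevs)]
        for page, values in rows.items()
    }
```

After scaling, a word count in the hundreds no longer swamps a click count of one to three when the features are compared or fed to a model.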
Then we need to find features that best separate the groups
[Scatter plot: number of words in ‘About’ section against length of ‘Main Content’ area, with points labelled High quality or Low quality]
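One simple way to score how well a single feature separates the two groups is the Fisher score: the squared gap between the group means divided by the sum of the group variances. This is a common screening heuristic offered here as a sketch, not something the talk prescribes; `fisher_score` is an illustrative name.

```python
from statistics import mean, pvariance
from typing import Sequence


def fisher_score(high: Sequence[float], low: Sequence[float]) -> float:
    """Squared gap between class means over the summed class variances."""
    gap = (mean(high) - mean(low)) ** 2
    spread = pvariance(high) + pvariance(low)
    if spread == 0:
        # No within-group spread: perfectly separated or identical groups.
        return float("inf") if gap else 0.0
    return gap / spread
```

A higher score means the high and low quality pages overlap less on that feature, so features can be ranked by score before any model is fitted.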
But with a large number of features!
This can be explored with a number of potential models
● Linear Discriminant Analysis
● Random Forest
● Neural Network
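To make the first of these models concrete, here is a minimal two-feature, two-class Linear Discriminant Analysis (Fisher's discriminant) in plain Python. The data points are made-up numbers for illustration only, not results from the talk, and the midpoint threshold assumes both classes are equally likely.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # ('About' word count, Main Content length in cm)

# Made-up illustrative data, not real measurements.
HIGH: List[Point] = [(900, 22), (1000, 25), (1100, 24), (950, 20)]
LOW: List[Point] = [(200, 15), (300, 17), (400, 14), (350, 18)]


def _mean(points: Sequence[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def _scatter(points: Sequence[Point], centre: Point) -> List[List[float]]:
    """2x2 within-class scatter: sum of outer products of deviations."""
    sxx = sum((p[0] - centre[0]) ** 2 for p in points)
    syy = sum((p[1] - centre[1]) ** 2 for p in points)
    sxy = sum((p[0] - centre[0]) * (p[1] - centre[1]) for p in points)
    return [[sxx, sxy], [sxy, syy]]


def fit_lda(high: Sequence[Point], low: Sequence[Point]):
    """Return (weights, threshold) for Fisher's two-class discriminant."""
    m_high, m_low = _mean(high), _mean(low)
    s_high, s_low = _scatter(high, m_high), _scatter(low, m_low)
    a = s_high[0][0] + s_low[0][0]
    b = s_high[0][1] + s_low[0][1]
    d = s_high[1][1] + s_low[1][1]
    det = a * d - b * b
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = m_high[0] - m_low[0], m_high[1] - m_low[1]
    # w = S_w^{-1} (m_high - m_low), using the closed-form 2x2 inverse.
    w = ((d * dx - b * dy) / det, (a * dy - b * dx) / det)
    # Put the cut halfway between the two projected class means.
    threshold = 0.5 * (w[0] * (m_high[0] + m_low[0])
                       + w[1] * (m_high[1] + m_low[1]))
    return w, threshold


def predict(w: Point, threshold: float, point: Point) -> str:
    score = w[0] * point[0] + w[1] * point[1]
    return "High quality" if score > threshold else "Low quality"
```

With many features the same idea applies, but the matrix inverse is better left to a numerical library than written out by hand.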
This is a huge challenge!
Which features?
How to measure them?
Which model?
The work is ongoing here!
Let’s sum up
Google likely uses its raters to gather labelled data on content quality
It will then likely use that to find features of ‘good’ and ‘bad’ content
And create algorithms to distinguish between the two
You can do the same!
Get your own labelled content and create your own scoring algorithms
We have created a ‘Needs Met’ score within Sanity Studio
So that you can get an indication of content calibre directly in your publishing workflow
Contact us to get more info about the beta here:
bit.ly/sanity-beta
Richard Lawrence
Principal SEO at Sanity.io
@richlawre

Creating Search Quality Algorithms - Richard Lawrence - BrightonSEO.pdf
