Creating Search Quality Algorithms - Richard Lawrence - BrightonSEO.pdf

How to create your own
search quality
evaluation algorithms
Richard Lawrence
Sanity.io
@richlawre

Who the hell is this guy anyway?
● Principal SEO at Sanity
● Sanity is a headless CMS and more!
● Doing a Data Science degree in my spare time

The ‘helpful content update’ might have
been a bit of a damp squib…

…but Google is always working towards
ranking helpful content more highly

So wouldn’t it be great to know if your
content is helping your audience - at scale?

The search rater guidelines hold the key
A 167-page document that says what good looks like!

Google says it doesn’t directly use the
ratings in its ranking algorithms
“We use responses from Raters
to evaluate changes, but they
don’t directly impact how our
search results are ranked.”
bit.ly/ratings-answer

But it will use the rated content to help find
features of what ‘good’ looks like

Similar methods have been used for years
in various areas - like counterfeit notes

Features are found that best separate
authentic and counterfeit notes
[Scatter plot: distance between edge & watermark vs. width of shaded area; points labelled Counterfeit and Authentic]

Features for high vs. low quality content will
likely be more complex

Bing confirmed this is how it works in 2019
bit.ly/bing-confirmation

With 90% of its algorithms being ML based
bit.ly/bing-features

Plus it revealed its process
bit.ly/bing-process

So how can we harness this as an industry?

We can try to create our own!

What we need to do
1. Label the content
2. Create a ‘Needs Met’ algorithm
3. Create a ‘Page Quality’ algorithm

Labelling the content

Get a representative sample of searches
448 million search queries
bit.ly/448-million

Here’s how to play around with the file
bit.ly/large-file
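
A file with hundreds of millions of queries will not fit comfortably in memory, so one way to draw the sample is to stream it. This is a minimal sketch using reservoir sampling. It assumes one query per line, which may not match the real file, and the function name and seed parameter are made up for illustration.

```python
import random
from typing import Iterable, List


def sample_queries(lines: Iterable[str], k: int, seed: int = 0) -> List[str]:
    """Reservoir-sample up to k non-empty lines from a stream of unknown length."""
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    reservoir: List[str] = []
    seen = 0
    for line in lines:
        query = line.strip()
        if not query:
            continue  # skip blank lines
        seen += 1
        if len(reservoir) < k:
            reservoir.append(query)
        else:
            # Replace an existing entry with probability k / seen
            j = rng.randrange(seen)
            if j < k:
                reservoir[j] = query
    return reservoir
```

To use it on a file, pass the open file handle straight in, for example `sample_queries(open("queries.txt", encoding="utf-8"), 1000)`, where `queries.txt` is a placeholder name.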

Then gather the top 20 rankings for each
sample query
Likely an available feature of your favourite rank tracking software

Use some search raters to rate the content
1. Create guidelines
2. Choose provider
3. Collect labels
● Must not be identical to Google’s…
● Needs Met & Page Quality
● 2 search raters, with a 3rd called in for disagreements
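
The "2 raters plus a 3rd for disagreements" rule can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and it assumes the third rater's label is taken as final when the first two disagree, which the slides do not spell out.

```python
from typing import Optional


def resolve_label(rater_a: str, rater_b: str,
                  tie_breaker: Optional[str] = None) -> Optional[str]:
    """Return the agreed label, or the third rater's label on disagreement.

    Returns None when the first two raters disagree and no third rating
    has been collected yet, so the item can be queued for a third rater.
    """
    if rater_a == rater_b:
        return rater_a
    return tie_breaker  # assumption: the third rater decides


# The first two raters agree, so no third rating is needed
print(resolve_label("Good", "Good"))
# Disagreement, still waiting on a third rater
print(resolve_label("Good", "Poor"))
# Disagreement settled by the third rater
print(resolve_label("Good", "Poor", tie_breaker="Poor"))
```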

Creating a Needs Met algorithm

This measures how well content fulfils search intent
Features will mainly relate to relevance and structure

GPT language models are perfect for this
GPT-J is the open source option

GPT-3 became cheaper in September too

We need to create a pattern for GPT-J to learn
Content:
<h1>Compare car insurance quotes</h1>
<p>It's quick and easy to compare car insurance
and find cheaper cover – we just need a few
details about you and your vehicle.</p>
Target query: car insurance
Needs Met rating: Good

It will then rate new content
Content:
<h1>Car insurance</h1>
<p>From theft to write-offs and even lost keys,
you'll be covered with us. Here's what you'll like
about our comprehensive cover</p>
Target query: car insurance
Needs Met rating: ?????
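
The Content / Target query / Needs Met rating pattern on these slides can be produced with a small helper. This is a sketch only: the function name is hypothetical, and leaving the rating blank for unrated content is an assumption about how the model would be prompted to fill it in.

```python
from typing import Optional


def format_example(content: str, query: str, rating: Optional[str] = None) -> str:
    """Lay out one example in the Content / Target query / rating pattern."""
    lines = ["Content:", content.strip(), f"Target query: {query}"]
    if rating is None:
        # Unrated content: leave the rating open for the model to complete
        lines.append("Needs Met rating:")
    else:
        lines.append(f"Needs Met rating: {rating}")
    return "\n".join(lines)


labelled = format_example(
    "<h1>Compare car insurance quotes</h1>", "car insurance", "Good"
)
unlabelled = format_example("<h1>Car insurance</h1>", "car insurance")
print(labelled)
print(unlabelled)
```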

We need to scrape content from each page to
give to the language model - with the rating
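
One rough way to pull the text out of a page is to keep only headings and paragraphs. The sketch below uses Python's standard-library HTML parser. The choice of tags to keep and the class and function names are assumptions for illustration, and real pages will need more careful handling of navigation, footers and scripts.

```python
from html.parser import HTMLParser

KEEP_TAGS = {"h1", "h2", "h3", "p"}  # assumption: headings and paragraphs only


class MainTextExtractor(HTMLParser):
    """Collects the text of heading and paragraph elements, in page order."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._tag = None      # the kept tag we are currently inside, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in KEEP_TAGS and self._tag is None:
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            # Collapse runs of whitespace and drop empty elements
            text = " ".join("".join(self._buffer).split())
            if text:
                self.blocks.append(f"<{tag}>{text}</{tag}>")
            self._tag = None


def extract_content(html: str) -> str:
    """Return the kept elements as simplified HTML, one element per line."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.blocks)
```

Inline tags such as `<b>` are dropped but their text is kept, so the output stays close to the simplified snippets shown in the prompt pattern.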

Then use this info to train GPT-J
bit.ly/finetune-gptj

You can also use existing services
NLP Cloud, Forefront.ai

NLP Cloud also became cheaper!

Validate performance with a test set
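
Holding back a test set just means setting aside some labelled examples that the model never sees during training. A minimal sketch, assuming a simple random 80/20 split (the fraction, seed and function name are illustrative choices, not from the talk):

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(examples: Sequence[T], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the labelled examples and hold back a fraction for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

With few examples per rating it may be worth splitting each rating separately so that every rating appears in the test set.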

Judge performance with a Confusion Matrix
                  Prediction: Correct   Prediction: Wrong
Actual: Correct   True positive         False negative
Actual: Wrong     False positive        True negative
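
The four cells of the confusion matrix can be counted directly from the test set. This sketch treats one rating as the "positive" class. Which rating that is, and the function name, are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[str], predicted: Sequence[str],
                     positive: str = "Good") -> Dict[str, int]:
    """Count true/false positives and negatives for one chosen positive rating."""
    counts: Counter = Counter()
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["TP" if p == positive else "FN"] += 1
        else:
            counts["FP" if p == positive else "TN"] += 1
    return {cell: counts[cell] for cell in ("TP", "FN", "FP", "TN")}


actual = ["Good", "Good", "Poor", "Poor", "Good"]
predicted = ["Good", "Poor", "Good", "Poor", "Good"]
cells = confusion_counts(actual, predicted)
accuracy = (cells["TP"] + cells["TN"]) / len(actual)
print(cells, accuracy)
```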

Few-shot learning can help improve performance
Prompt:
Example 1
Rating: Excellent
Example 2
Rating: Poor
Example 3
Rating: ????
GPT-J output: Good

As can explaining to the model what it
needs to do!
Consider the content to rate.
Rate it according to how well it
fits the search query.
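
Putting the instruction and a few rated examples in front of the content to rate might look like the sketch below. The function name, the dictionary keys and the blank-line separator are assumptions, and the exact prompt layout that works best will depend on the model.

```python
from typing import Dict, List

INSTRUCTION = ("Consider the content to rate. "
               "Rate it according to how well it fits the search query.")


def build_prompt(examples: List[Dict[str, str]], content: str, query: str,
                 instruction: str = INSTRUCTION) -> str:
    """Instruction first, then rated examples, then the item left open to rate."""
    parts = [instruction]
    for ex in examples:
        parts.append(
            f"Content:\n{ex['content']}\n"
            f"Target query: {ex['query']}\n"
            f"Needs Met rating: {ex['rating']}"
        )
    # The final block has no rating, so the model completes it
    parts.append(f"Content:\n{content}\nTarget query: {query}\nNeeds Met rating:")
    return "\n\n".join(parts)


shots = [
    {"content": "<h1>Compare car insurance quotes</h1>",
     "query": "car insurance", "rating": "Good"},
]
prompt = build_prompt(shots, "<h1>Car insurance</h1>", "car insurance")
print(prompt)
```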

We’ve done this for you within Sanity Studio

And lots of other great features

Contact us for more info about the beta for
these features:
bit.ly/sanity-beta

This isn’t perfect of course - though still very
useful
● Only text content
● Useful indication only
● Great at scale

Creating a Page Quality algorithm

This is much more difficult!

It measures how well a page achieves its
purpose
This is about quality of
content, independent
of search queries

So features can relate to a large number of
areas!
● ‘Main Content’ vs ‘Supplementary Content’: amount of Main Content, position of Main Content
● Website background information: depth of ‘about’ info, Wikipedia presence

And you have to work out how to measure
them
Amount of Main Content:
● Length of Main Content area
● Number of words in Main Content
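
Of the two measures, the word count is the easier one to get without rendering the page. A minimal sketch, assuming the Main Content has already been isolated as an HTML fragment (how to isolate it is the hard part and is not covered here). The function name and the definition of a "word" are illustrative choices.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Gathers all text nodes from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def main_content_word_count(main_content_html: str) -> int:
    """Count words in a Main Content fragment, ignoring the markup."""
    collector = _TextCollector()
    collector.feed(main_content_html)
    collector.close()
    # Join with spaces so text from adjacent elements does not run together
    text = " ".join(collector.chunks)
    # A word is a run of letters or digits, allowing internal ' or - characters
    return len(re.findall(r"[A-Za-z0-9]+(?:['’-][A-Za-z0-9]+)*", text))
```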

It becomes a huge multivariate challenge
Page     Length of MC area   ‘About us’ word count   Clicks to ‘About us’
Page 1   17cm                500                     2
Page 2   20cm                300                     1
Page 3   15cm                1000                    2
Page 4   25cm                750                     3
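
The columns in this table are on very different scales (centimetres, word counts, clicks), so a common first step before modelling is to standardise each one. This is a sketch using the four rows from the slide. Standardising with z-scores is a suggestion here, not something the talk prescribes, and the names are made up.

```python
from statistics import mean, pstdev
from typing import List, Tuple

# Values from the slide: (length of MC area in cm, 'About us' word count,
# clicks to 'About us')
PAGES = {
    "Page 1": (17.0, 500.0, 2.0),
    "Page 2": (20.0, 300.0, 1.0),
    "Page 3": (15.0, 1000.0, 2.0),
    "Page 4": (25.0, 750.0, 3.0),
}


def standardise(rows: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Rescale every column to mean 0 and standard deviation 1 (z-scores)."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]
    return [
        # A constant column has zero spread, so map it to 0.0
        tuple((v - m) / s if s else 0.0 for v, m, s in zip(row, mus, sigmas))
        for row in rows
    ]


for name, z in zip(PAGES, standardise(list(PAGES.values()))):
    print(name, [round(v, 2) for v in z])
```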

Then we need to find features that best
separate the groups
[Scatter plot: number of words in ‘About’ section vs. length of ‘Main Content’ area; points labelled High quality and Low quality]
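
One simple way to ask "which single feature separates the groups best?" is a Fisher-style score: the squared gap between the group means divided by the combined spread. The sketch below is one possible measure, not the method from the talk, and the numbers in it are made up purely to show the calculation.

```python
from statistics import mean, pvariance
from typing import List, Sequence, Tuple


def fisher_score(high: Sequence[float], low: Sequence[float]) -> float:
    """Squared difference of group means over the sum of group variances."""
    spread = pvariance(high) + pvariance(low)
    gap = (mean(high) - mean(low)) ** 2
    if spread == 0:
        return float("inf") if gap else 0.0
    return gap / spread


def rank_features(high_rows: List[Sequence[float]],
                  low_rows: List[Sequence[float]],
                  names: List[str]) -> List[Tuple[str, float]]:
    """Score every feature column and sort from most to least separating."""
    scores = {
        name: fisher_score([r[i] for r in high_rows], [r[i] for r in low_rows])
        for i, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up rows: ('About' word count, clicks to 'About us')
high_quality = [(900, 2), (1000, 1), (800, 2)]
low_quality = [(200, 2), (300, 1), (250, 3)]
print(rank_features(high_quality, low_quality, ["about_words", "clicks_to_about"]))
```

A score like this only looks at one feature at a time, so it will miss features that are useful only in combination.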

But with a large number of features!

This can be explored with a number of potential models
● Linear Discriminant Analysis
● Random Forest
● Neural Network
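
To give a feel for the first of these, here is a bare-bones two-class Linear Discriminant Analysis on two features, written without any libraries. It is a sketch under stated assumptions: equal class priors, a shared covariance, and made-up data points standing in for (‘About’ word count, length of Main Content area in cm). A real project would more likely use an established library and many more features.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _centroid(points: List[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def _scatter(points: List[Point], c: Point) -> Tuple[float, float, float]:
    """Within-class scatter entries (sxx, sxy, syy) around centroid c."""
    sxx = sum((p[0] - c[0]) ** 2 for p in points)
    syy = sum((p[1] - c[1]) ** 2 for p in points)
    sxy = sum((p[0] - c[0]) * (p[1] - c[1]) for p in points)
    return sxx, sxy, syy


def fit_lda(high: List[Point], low: List[Point]) -> Tuple[Point, float]:
    """Fisher's discriminant: w = Sw^-1 (m_high - m_low), midpoint threshold."""
    m_h, m_l = _centroid(high), _centroid(low)
    sxx, sxy, syy = (a + b for a, b in zip(_scatter(high, m_h), _scatter(low, m_l)))
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("pooled scatter matrix is singular")
    dx, dy = m_h[0] - m_l[0], m_h[1] - m_l[1]
    # Multiply the inverse of the 2x2 pooled scatter matrix by the mean gap
    w = ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)
    mid = ((m_h[0] + m_l[0]) / 2, (m_h[1] + m_l[1]) / 2)
    threshold = w[0] * mid[0] + w[1] * mid[1]  # assumes equal class priors
    return w, threshold


def predict(model: Tuple[Point, float], point: Point) -> str:
    w, threshold = model
    score = w[0] * point[0] + w[1] * point[1]
    return "High quality" if score > threshold else "Low quality"


# Made-up labelled pages: ('About' word count, Main Content length in cm)
HIGH = [(900.0, 22.0), (1000.0, 25.0), (800.0, 24.0), (950.0, 20.0)]
LOW = [(200.0, 14.0), (300.0, 17.0), (250.0, 15.0), (350.0, 12.0)]
model = fit_lda(HIGH, LOW)
print(predict(model, (850.0, 21.0)))
```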

This is a huge challenge!

How to measure them?

The work is ongoing here!

Google likely uses its raters to gather
labelled data on content quality

It will then likely use that to find features of
‘good’ and ‘bad’ content

And create algorithms to distinguish
between the two

You can do the same!

Get your own labelled content and create
your own scoring algorithms

We have created a ‘Needs Met’ score within
Sanity Studio

So that you can get an indication of content
calibre directly in your publishing workflow

Contact us to get more info about the beta
here:
bit.ly/sanity-beta

Richard Lawrence
Principal SEO at Sanity.io
@richlawre
