Audience segmentation with machine learning
Richard Lawrence
Rise at Seven
@richlawre
About me
SEO background, studying a Data Science degree in spare time.
Follow me on Twitter: @richlawre
What we’re going to cover
The agenda
1. A bit of context about machine learning
2. An overview of how audience segmentation works
3. Some detail about how to do it
4. How to take things further
A bit of context
What is supervised machine learning?
It learns with labelled data.
What is unsupervised machine learning?
It finds patterns in data.
Audience segmentation in a nutshell
We extract data about individual sessions from web analytics.
Extracting the data
Instead of grouping sessions by channel or section...
CHANNEL          SESSIONS   TRANSACTIONS   REVENUE
Organic search   1000       50             £12,000
Paid search      700        30             £3,000
Direct           500        25             £6,000
Referral         300        30             £4,000
...we extract details about individual sessions.
SESSION ID   PAGEVIEWS   TIME PER PAGE   REVENUE
Session 1    7           30 seconds      £77.50
Session 2    10          20 seconds      £27.50
Session 3    5           23 seconds      £36.50
Session 4    8           18 seconds      £45.30
We then use unsupervised machine learning to find interesting patterns.
Finding patterns
Instead of analysing sessions grouped together in some way, we use machine learning to find patterns in user behaviour.
[Figure: sessions plotted by pageviews and transaction revenue, grouped into Audience 1, Audience 2 and Audience 3]
This results in actionable audience segments.
The Gatherer
Example segment from: Car manufacturer
Likely onsite journey:
Landing section: Homepage
Second section: Car Models
Exit section: Car Models
Behaviour:
Least time per page
Highest number of pages viewed
Highest number of conversions per session
Most likely to download a brochure
Description:
The Gatherer comes directly to the website's homepage, then visits multiple car models to download a brochure for each to look at offline later.
Example CRO Test:
Link to a model comparison table from the homepage, with the option to download a brochure for each model.
The Skipper
Example segment from: Train operator
Behaviour:
Slightly more time per page than average
More likely to buy in the evening or at night
Fewest days since last session
Fewest pages per visit
Over-indexes for visiting via tablet
Over-indexes for visiting via email
Description:
The Skipper has likely already done their travel research (when and where to travel) multiple times without buying, and is simply returning, likely at the last minute, to finally finish the task.
Example CRO Test:
Use a cookie to add a banner to the homepage that takes a returning user back to where they left off in the transaction process.
Why do you need to do this?
1. Find behaviours you may not have realised existed
2. Generate test hypotheses for CRO
3. Track behaviours over time (more about this later)
How to do it
The key steps
1. Extract the data
2. Process the data
3. Select features
4. Cluster the data
5. Manually explore the segments
1. Extracting the data
Using Google Analytics API
Extract by Session ID or Client ID.
Useful dimensions:
landingPagePath
secondPagePath
exitPagePath
Useful metrics:
pageviewsPerSession
revenuePerTransaction
goalXXCompletions
There is a limit on the number of metrics/dimensions: 10
There is also a limit on the number of rows per call: 25,000
The answer is to loop over days, metrics and dimensions, then merge!
https://www.jcchouinard.com/google-analytics-api-using-python/
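The loop-and-merge idea can be sketched roughly as follows. This is only an illustration: `fetch_report` is a hypothetical stand-in for a real Google Analytics API call, and the metric names and session IDs are placeholders. The batch size of 10 is the limit quoted on the slide, and fetching one day at a time is one way to keep each call's row count down.

```python
from datetime import date, timedelta

MAX_METRICS_PER_CALL = 10  # limit quoted on the slide


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def fetch_report(day, metrics):
    """Hypothetical stand-in for one API call (not a real client function).

    Returns {session_id: {metric: value}} for a single day. A real version
    would query the API and page through the results.
    """
    sessions = [f"{day.isoformat()}-s{n}" for n in range(3)]
    return {sid: {m: 0 for m in metrics} for sid in sessions}


def extract(start, end, metrics):
    """Loop over days and metric batches, merging rows on the session ID."""
    merged = {}
    day = start
    while day <= end:
        for batch in chunk(metrics, MAX_METRICS_PER_CALL):
            for sid, values in fetch_report(day, batch).items():
                # The first batch creates the row; later batches add columns.
                merged.setdefault(sid, {"date": day.isoformat()}).update(values)
        day += timedelta(days=1)
    return merged


metrics = [f"metric{n}" for n in range(1, 24)]  # 23 placeholder metric names
rows = extract(date(2021, 7, 1), date(2021, 7, 3), metrics)
```

Each entry in `rows` ends up with all 23 metrics even though no single call asked for more than 10.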
Using BigQuery
Data is nested. I’ve found it makes things more difficult at the session level.
However, it is possible to do and there is some great information around.
You can also run the unsupervised machine learning algorithm directly in SQL.
I previously used 1M sessions with Python & Google Colab and BigQuery wasn’t necessary.
Choose days at random to ensure variation.
https://adswerve.com/blog/google-analytics-queries-in-bigquery-part-two-users-sessions-unnesting-hits/
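Choosing days at random could look something like the sketch below. The date range, the number of days and the seed are arbitrary assumptions for illustration; `sample_days` is a hypothetical helper, not part of any analytics library.

```python
import random
from datetime import date, timedelta


def sample_days(start, end, n, seed=None):
    """Pick `n` distinct days at random between `start` and `end` inclusive."""
    span = (end - start).days + 1
    rng = random.Random(seed)  # a seed makes the sample repeatable
    offsets = rng.sample(range(span), n)
    return sorted(start + timedelta(days=o) for o in offsets)


# Assumed range and sample size, purely for illustration.
days = sample_days(date(2021, 1, 1), date(2021, 6, 30), n=14, seed=7)
```

The resulting list of dates can then drive the extraction loop, one call (or set of calls) per sampled day.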
2. Processing the data
Useful data transformations
Change hours of the day to morning, afternoon, evening, night
Change days to weekday & weekend
SESSION ID   DAY         DAY TYPE
Session 1    Monday      Weekday
Session 2    Tuesday     Weekday
Session 3    Saturday    Weekend
Session 4    Wednesday   Weekday
Change pages to sections
Combine certain conversion points
Find and replace in Python & Pandas can handle these changes (the original slide links to a guide).
You could use Google DataPrep instead.
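A minimal sketch of three of these transformations in plain Python is shown below. The hour cut-offs, the rule that a section is the first directory of the path, and the field names in `raw` are all assumptions made for illustration; the slides do not specify them.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def part_of_day(hour):
    """Map an hour (0-23) to a coarse label; the cut-offs are assumptions."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 23:
        return "evening"
    return "night"


def day_type(day_name):
    """Collapse a day name into Weekday or Weekend."""
    return "Weekend" if day_name in ("Saturday", "Sunday") else "Weekday"


def section(path):
    """Reduce a page path to its first directory (assumed site structure)."""
    first = path.split("?")[0].strip("/").split("/")[0]
    return first or "homepage"


# Hypothetical raw session record.
raw = {
    "session_id": "Session 3",
    "timestamp": "2021-07-03T20:15:00",
    "landing_page": "/car-models/hatchback/",
}
ts = datetime.fromisoformat(raw["timestamp"])
clean = {
    "session_id": raw["session_id"],
    "time_of_day": part_of_day(ts.hour),
    "day_type": day_type(DAY_NAMES[ts.weekday()]),
    "landing_section": section(raw["landing_page"]),
}
```

Coarser categories like these keep the number of distinct values small, which matters once the columns are one hot encoded.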
One hot encoding
Converts categories to 1s & 0s.
SESSION ID   CHANNEL
Session 1    Organic Search
Session 2    Paid Search
Session 3    Direct
Session 4    Direct
SESSION ID   ORGANIC SEARCH   PAID SEARCH   DIRECT
Session 1    1                0             0
Session 2    0                1             0
Session 3    0                0             1
Session 4    0                0             1
The values aren’t increasing, so they don’t skew the clustering algorithm.
Use for numerical as well as categorical data.
The original slide links to a guide on how to do it with Python.
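The channel table above can be reproduced with pandas, as one possible approach (the slides link to a guide that is not reproduced here, so this is not necessarily the same method). The column names `session_id` and `channel` are assumptions.

```python
import pandas as pd

sessions = pd.DataFrame({
    "session_id": ["Session 1", "Session 2", "Session 3", "Session 4"],
    "channel": ["Organic Search", "Paid Search", "Direct", "Direct"],
})

# One column of 1s and 0s per distinct channel value.
dummies = pd.get_dummies(sessions["channel"], dtype=int)

# Put the session ID back alongside the encoded columns.
encoded = pd.concat([sessions[["session_id"]], dummies], axis=1)
```

Each row has exactly one 1 across the channel columns, so no channel is treated as "larger" than another.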
3. Selecting features
Best subset regression
Choose the desired response variable & find potential explanatory variables.
It runs regression analysis for combinations of variables at once to find correlation.
This will help you narrow down the features to find useful patterns within.
The original slide links to a Python walkthrough.
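As a rough illustration of best subset regression, the sketch below fits an ordinary least squares model for every combination of candidate features and keeps the best combination of each size by adjusted R². The data is synthetic and the feature names are hypothetical; the walkthrough linked on the slide may use a different library or scoring rule.

```python
from itertools import combinations

import numpy as np


def r_squared(X, y):
    """R-squared of an ordinary least squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centred = y - y.mean()
    return 1 - (resid @ resid) / (centred @ centred)


def best_subsets(X, y, names):
    """For each subset size, return (adjusted R-squared, feature names)."""
    n = len(y)
    best = {}
    for k in range(1, len(names) + 1):
        for cols in combinations(range(len(names)), k):
            r2 = r_squared(X[:, list(cols)], y)
            adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
            if k not in best or adj > best[k][0]:
                best[k] = (adj, [names[c] for c in cols])
    return best


# Synthetic data: the response depends on the first and third features only.
rng = np.random.default_rng(0)
names = ["pageviews", "time_per_page", "days_since_last_session", "hour"]
X = rng.normal(size=(200, 4))
revenue = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.1, size=200)

best = best_subsets(X, revenue, names)
```

Because every combination is fitted, the number of models doubles with each extra candidate feature, so this only suits a modest number of candidates.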
4. Clustering the data
Principal Component Analysis
Transforms a large set of variables into a smaller one without much loss of information.
The original slide links to a walkthrough.
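A small sketch of PCA with scikit-learn follows, using synthetic data in which six columns are driven by two underlying signals. The 95% variance threshold and the scaling step are assumptions for illustration, not settings taken from the talk.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Six correlated columns built from two underlying signals plus a little noise.
base = rng.normal(size=(500, 2))
X = np.hstack([base, base @ rng.normal(size=(2, 4))])
X = X + rng.normal(scale=0.05, size=X.shape)

# Standardise first so no column dominates purely because of its units.
X_scaled = StandardScaler().fit_transform(X)

# Keep just enough components to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)
```

`X_reduced` has fewer columns than `X`, and `pca.explained_variance_ratio_` shows how much of the variance each retained component carries.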
Using a KMeans algorithm
The
unsupervised
machine learning
algorithm to find
patterns
@richlawre
Using a KMeans algorithm
See a full
walkthrough here
with Python.
@richlawre
Using a KMeans algorithm
You can also do this
directly in BigQuery.
@richlawre
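For a sense of what the clustering step looks like in Python, here is a minimal scikit-learn sketch. The three groups of sessions are synthetic, the two features (pageviews and revenue) and their centres are made up, and the choice of three clusters is assumed rather than derived.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Made-up centres for three kinds of session: (pageviews, revenue).
centres = np.array([[3.0, 20.0], [8.0, 60.0], [15.0, 5.0]])
X = np.vstack([c + rng.normal(scale=[1.0, 4.0], size=(100, 2)) for c in centres])

# KMeans is distance based, so put the features on a comparable scale.
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)  # one cluster label per session
```

The `labels` array is what gets explored manually afterwards: group the original sessions by label and compare their averages to describe each segment.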
Using a silhouette score
A way of finding the optimum number of clusters.
The optimal number is at the elbow in the graph; there is not much gain after this.
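One way to compute the silhouette score for a range of cluster counts is sketched below with scikit-learn, on synthetic data with three obvious groups. Note that this sketch simply picks the cluster count with the highest score, which is a common alternative to reading an elbow off the graph; the range of 2 to 7 clusters is an arbitrary assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)

# Synthetic sessions drawn around three well-separated centres.
centres = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 9.0]])
X = np.vstack([c + rng.normal(size=(100, 2)) for c in centres])

scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Ranges from -1 to 1; higher means tighter, better separated clusters.
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
```

Plotting `scores` against the number of clusters gives the graph the slide refers to.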
5. Always manually explore the segments!
Taking it to the next level
Classify any future session
Use the labelled data to train a supervised machine learning algorithm. We use deep learning.
The better defined your segments, the better this will perform.
Push the labelled sessions back into Google Analytics via Data Import.
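The idea of turning cluster labels into training data can be sketched as below. The talk mentions deep learning without giving details, so this example swaps in scikit-learn's small `MLPClassifier` neural network as a stand-in; the data, network size and the `new_session` values are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)

# Synthetic sessions with three clear behaviour groups.
centres = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 9.0]])
X = np.vstack([c + rng.normal(size=(100, 2)) for c in centres])

# Step 1: unsupervised clustering produces a segment label per session.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
segments = kmeans.fit_predict(X)

# Step 2: treat those labels as the target for a supervised model.
X_train, X_test, y_train, y_test = train_test_split(
    X, segments, test_size=0.25, random_state=0, stratify=segments
)
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Step 3: classify a session the model has never seen.
new_session = np.array([[9.5, 0.5]])
predicted_segment = clf.predict(new_session)[0]
```

How well this works depends on how cleanly the segments separate, which matches the point that better defined segments lead to better classification.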
Visualise in Streamlit
[Figure: Streamlit visualisation of CRM segments 1 to 5]
Summary
Unsupervised machine learning finds interesting patterns in data.
Apply this to individual sessions from Google Analytics to create behaviour segments.
This can be a great source of ideas for CRO hypotheses.
There are 5 steps for the analysis: extract, process, select features, cluster, manually explore.
You can use Python or other toolsets (Google Cloud) to do the analysis.
You can use the segments to label any future session on the website.
Thanks!

MeasureFest July 2021 - Session Segmentation with Machine Learning
