Customer segmentation
an excuse to use Machine Learning ;-)
Customer segmentation scbcn17
Who am I?
● Julio Martinez
● Web developer since 2001
● 2 years working at Ulabox
● Machine Learning hobbyist
● Find me: @liopic
Preparing the workshop
1. docker pull jupyter/scipy-notebook
2. git clone git@github.com:ulabox/datasets
3. git clone git@github.com:liopic/scbcn17-customer-segmentation
4. cp datasets/data/*.csv scbcn17-customer-segmentation/
My 2017 objective: M.L.
● Motivation
○ It’s the new hot thing
○ AlphaGo beat Lee Sedol, March 2016
● I have some background, but need to learn more
Learning about Machine Learning
1. Choose the way
○ Coursera courses vs. books vs. workshops vs. blog posts
2. Find an excuse to apply it
○ @work is better than @home
Customer clusters @work, aka “the excuse”
● There is a non-programmer Business Analysis Department
● Groups of customers based on periodicity + amount spent
○ Example: people who buy once a month, with a 100€ ticket
○ Useful for business reports
○ Not so useful for UX, CRM
● Groups by behavior? Clustering orders!
Boring!
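For reference, the existing "periodicity + amount spent" grouping can be computed straight from raw orders. The following is a minimal sketch. It assumes orders arrive as hypothetical (customer_id, order_date, total) tuples. The thresholds and segment names in `segment()` are invented for illustration and are not the Business Analysis Department's actual rules.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def customer_features(orders):
    """Per customer: mean days between orders and mean ticket (amount per order).

    `orders` is an iterable of (customer_id, order_date, total) tuples (assumed format).
    """
    by_customer = defaultdict(list)
    for customer_id, order_date, total in orders:
        by_customer[customer_id].append((order_date, total))

    features = {}
    for customer_id, rows in by_customer.items():
        rows.sort()  # chronological order
        dates = [d for d, _ in rows]
        gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
        features[customer_id] = {
            "days_between_orders": mean(gaps) if gaps else None,  # None: a single order
            "avg_ticket": mean(t for _, t in rows),
        }
    return features


def segment(f):
    """Hypothetical rule-based buckets; thresholds chosen only for illustration."""
    gap, ticket = f["days_between_orders"], f["avg_ticket"]
    if gap is None:
        return "one-off"
    frequency = "weekly" if gap <= 10 else "monthly" if gap <= 40 else "occasional"
    size = "big ticket" if ticket >= 100 else "small ticket"
    return f"{frequency}, {size}"


# Made-up orders, just to exercise the functions.
orders = [
    ("c1", date(2017, 1, 5), 95.0),
    ("c1", date(2017, 2, 4), 110.0),
    ("c1", date(2017, 3, 6), 105.0),
    ("c2", date(2017, 1, 9), 40.0),
]
for customer_id, f in customer_features(orders).items():
    print(customer_id, f, segment(f))
```

Hand-written rules like these are what the clustering approach replaces: the groups come out of the data instead of being chosen up front.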
101 Machine Learning: the method
1. With past data -> make an ML model
○ clean the data
○ choose one or more ML algorithms
○ tune the algorithm, with testing
2. With new data -> use the model to predict (or to give new information)
○ deploy the pipeline
○ update the model
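The two phases can be illustrated with a deliberately trivial model, so that the workflow stays in focus. This is a sketch under stated assumptions. `holdout_split`, `MeanBaseline` and the ticket values are hypothetical and were swapped in for illustration. In the notebook, scikit-learn's `train_test_split` would normally do the splitting.

```python
import random
from statistics import mean


def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle past data and set a fraction aside for testing; returns (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


class MeanBaseline:
    """Hypothetical toy model: always predicts the mean value seen during training."""

    def fit(self, values):
        self.mean_ = mean(values)
        return self

    def predict(self, n):
        return [self.mean_] * n


def mean_absolute_error(y_true, y_pred):
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# 1. With past data: make a model, then test it on data it has not seen.
past_tickets = [42.0, 55.5, 61.0, 38.0, 120.0, 47.5, 52.0, 70.0]  # made-up values
train, test = holdout_split(past_tickets)
model = MeanBaseline().fit(train)
print("test MAE:", mean_absolute_error(test, model.predict(len(test))))

# 2. With new data: reuse the already fitted model (in production, inside the deployed pipeline).
print("predictions for 3 new orders:", model.predict(3))
```

"Update the model" then means rerunning step 1 on fresh past data and redeploying the result.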
101 Machine Learning: types of problems
● Supervised
○ data + labels (results)
● Unsupervised
○ just data
● Reinforcement
○ a function to optimize
Supervised learning
● Training set: labeled examples, e.g. images labeled “cat”, “cat”, “person”
● Test set: new examples whose label is unknown (???)
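The simplest supervised model is a 1-nearest-neighbour classifier, which gives a new example the label of the most similar training example. The slides do not name this technique. It is swapped in here as an illustration. The 2-D feature vectors are hypothetical stand-ins for the images.

```python
import math


def nearest_neighbor_predict(train_x, train_y, query):
    """1-NN: return the label of the closest training example (Euclidean distance)."""
    distances = [math.dist(x, query) for x in train_x]
    return train_y[distances.index(min(distances))]


# Hypothetical features standing in for the images: [height_m, has_whiskers].
train_x = [[0.25, 1.0], [0.30, 1.0], [1.75, 0.0]]
train_y = ["cat", "cat", "person"]

test_x = [[0.28, 1.0], [1.60, 0.0]]
print([nearest_neighbor_predict(train_x, train_y, q) for q in test_x])  # ['cat', 'person']
```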
Unsupervised learning
● Training set: just data, no labels
● Test set: none. There is NO test (no labels to check against)
Unsupervised learning
● Tries to extract features (information, shapes): what is similar and what is different
● Uses:
○ Clustering
○ Anomaly detection (it doesn’t look “normal”)
○ Dimensionality reduction
○ Transfer features, projections ...
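The "it doesn't look normal" idea can be sketched with z-scores: flag values that lie far from the mean, measured in standard deviations. This is only one simple technique, chosen for illustration because the slides do not name one. The ticket values are made up. The threshold of 2.0 is an assumption. With only n values, a z-score (using the population standard deviation) can never exceed sqrt(n - 1), so the common cutoff of 3 would miss the outlier in this 8-value example.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=3.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


tickets = [48.0, 52.5, 50.0, 47.0, 55.0, 49.5, 51.0, 900.0]  # made-up order totals
print(zscore_anomalies(tickets, threshold=2.0))  # [7]
```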
Clustering
● Uses:
○ grouping
○ quantization
● Algorithms:
○ k-means
○ DBSCAN
k-means
● Needs: the number of clusters (k), chosen up front
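k-means alternates between two steps. First it assigns every point to its nearest centroid. Then it moves each centroid to the mean of its points. This is a minimal pure-Python sketch of that loop (Lloyd's algorithm), given for illustration. The workshop notebook would more likely use scikit-learn's `KMeans(n_clusters=...)`. The 2-D points are made-up feature vectors.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm). Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # init: k random points from the data
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append([sum(axis) / len(members) for axis in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, labels


points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8)]  # made-up feature vectors
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The first three points end up with one label and the last three with the other. Which blob gets label 0 depends on the random initialisation, which is why real implementations usually try several starts.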
DBSCAN: Density-based spatial clustering of applications with noise
● Needs: the minimum number of samples that forms a dense region; other parameters must be tuned too (e.g. the neighborhood radius, eps)
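DBSCAN grows clusters outward from "core" points, which are points with at least `min_samples` neighbours within distance `eps`. Points that cannot be reached from any core point are labeled as noise. The following is a minimal pure-Python sketch for illustration. The `min_samples` count includes the point itself, following scikit-learn's `DBSCAN(eps=..., min_samples=...)` convention. The points and parameter values are made up.

```python
import math


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one label per point; -1 marks noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices within distance eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster += 1
    return labels


points = [(1, 1), (1.2, 1.1), (0.9, 1.0), (5, 5), (5.1, 5.2), (4.9, 5.0), (10, 0)]
print(dbscan(points, eps=0.5, min_samples=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, the number of clusters is not given in advance, and the isolated point (10, 0) is reported as noise instead of being forced into a cluster.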
So, ready to hack?
But wait a moment!
Before algorithms: data!
● Data preparation
○ Scale features to the same order of magnitude, usually [0, 1]
○ Remove noise
○ Other processes
■ Binarize categorical features (one-hot encoding)
● e.g. weekday 4 -> 0, 0, 0, 1, 0, 0, 0
■ Handle missing data
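The three preparation steps can be sketched in a few lines of plain Python. Weekdays are assumed to be numbered 1 to 7, which matches the slide's example where 4 becomes the fourth position. Filling gaps with the column mean is just one hypothetical choice among several. In the notebook, scikit-learn's `MinMaxScaler` and `pandas.get_dummies` cover the scaling and one-hot steps.

```python
def fill_missing(values):
    """One simple option for missing data: replace None with the mean of the known values."""
    present = [v for v in values if v is not None]
    fill = sum(present) / len(present)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale a numeric column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_weekday(weekday):
    """Binarize a 1-based weekday (1..7), e.g. 4 -> [0, 0, 0, 1, 0, 0, 0]."""
    if not 1 <= weekday <= 7:
        raise ValueError(f"weekday must be in 1..7, got {weekday}")
    return [1 if day == weekday else 0 for day in range(1, 8)]


tickets = [20.0, None, 120.0, 70.0]  # made-up column with a missing value
print(min_max_scale(fill_missing(tickets)))  # [0.0, 0.5, 1.0, 0.5]
print(one_hot_weekday(4))                    # [0, 0, 0, 1, 0, 0, 0]
```

Missing values are filled before scaling because `min` and `max` cannot compare `None` with numbers.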
Before algorithms: data!
● Explore the data
○ Charts are richer than numbers
■ “We get more orders at 22h” vs. [chart of orders by hour]
● Ask domain experts
○ Understand normal and edge cases
■ The step at 14h is the web cutoff time
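A first exploration step is to count orders per hour and look at the shape, which is where a step like the one at 14h shows up. This sketch prints a crude text histogram as a stand-in for a real plot such as one made with matplotlib. The timestamps are made up for illustration and do not reproduce the real data.

```python
from collections import Counter
from datetime import datetime


def orders_per_hour(timestamps):
    """Number of orders placed in each hour of the day, as a 24-item list."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts[hour] for hour in range(24)]


def print_hour_histogram(counts, width=30):
    """Crude text bar chart; in the notebook a real plot is more informative."""
    peak = max(counts) or 1
    for hour, n in enumerate(counts):
        print(f"{hour:02d}h {'#' * round(width * n / peak):<{width}} {n}")


# Made-up order timestamps, just to exercise the functions.
timestamps = [datetime(2017, 3, 1, h, 30) for h in (9, 13, 13, 13, 14, 22, 22, 22, 22)]
counts = orders_per_hour(timestamps)
print_hour_histogram(counts)
```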
Lessons learned
● Explore and optimize the data
○ Keep the features that count: feature engineering
○ Avoid the “curse of dimensionality”
● Start small, understandable and useful
● Find excuses to try it, and sell it!
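One way to see the curse of dimensionality is to add features and watch distances. As the number of features grows, the nearest and farthest points become almost equally far away. That hurts distance-based methods such as k-means and DBSCAN. This sketch measures the relative gap between the farthest and nearest of some random points, seen from a random query point. The uniform random data and the `distance_contrast` helper are illustrative assumptions, not workshop code.

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """(max - min) / min distance from a random query to random points in [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)]) for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)


for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(dim), 2))
```

The contrast drops sharply as `dim` grows. This is one reason to keep only the features that count instead of feeding every available column to the algorithm.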
Now, let’s hack!
Let’s hack
1. docker pull jupyter/scipy-notebook
2. git clone git@github.com:ulabox/datasets
3. git clone git@github.com:liopic/scbcn17-customer-segmentation
4. cp datasets/data/*.csv scbcn17-customer-segmentation/
5. cd scbcn17-customer-segmentation
6. ./jupyter.sh
7. Open the link in your browser and open the Workshop.ipynb file
Thank you!
