1st edition
November 4-5, 2018
Machine Learning School in Doha
BigML, Inc · @bigmlcom · @QatarComputing · #MLSD18 ·
Build Your Own
Unsupervised Models
Gregory Antell, Ph.D.
Machine Learning Architect, BigML Inc.
The impact of hyperparameter choices
https://bigml.com/gallery/datasets
Professors Salaries: http://bml.io/13s39KL
(search under “Higher Education and Research”)
Reading Habits: http://bml.io/XGXCg3
(search under “Demographics and Surveys”)
TED Talks: https://bigml.com/user/czuriaga/gallery/dataset/5ab8fabc92fb562acb00170a
(search under “Human Resources and Psychology”)
Diabetes: http://bml.io/QdJPdb
(search under “Healthcare”)
Unsupervised Learning
• Unsupervised methods seek to identify and uncover the
hidden structure of complicated datasets
• In general, these models require higher levels of domain
knowledge and interpretation in order to validate
• Selection of features and initialization can have a massive
impact on the results of unsupervised algorithms
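The effect of initialization can be seen in a minimal sketch, assuming plain k-means with k = 2 rather than any BigML implementation. The four points and both sets of starting centroids are invented for illustration. The same data ends up in two different stable clusterings, one much tighter than the other.

```python
# Toy k-means showing that the starting centroids decide the final clusters.
# Points and initial centroids are invented for illustration only.

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, centroids, iterations=10):
    """Run Lloyd's algorithm from the given centroids and return the clusters."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean_point(c) if c else old
                     for c, old in zip(clusters, centroids)]
    return clusters

def inertia(clusters):
    """Total squared distance of every point to its cluster mean."""
    return sum(squared_distance(p, mean_point(c)) for c in clusters for p in c)

# Corners of a wide rectangle.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

left_right = kmeans(points, [(0.0, 0.0), (4.0, 1.0)])
top_bottom = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])

print(left_right, inertia(left_right))
print(top_bottom, inertia(top_bottom))
```

Both runs converge, but the second start settles into a poorer grouping (top row versus bottom row), which is why results from unsupervised methods need checking against domain knowledge.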
BigML Unsupervised Options
• Anomaly detection
• Association discovery
• Cluster
• Topic model
Exercise 1 - Anomaly Detection
Professors salaries dataset: http://bml.io/13s39KL
(search under “Higher Education and Research”)
• Clone the dataset
• Create an anomaly detector returning at least the top 10
anomalous instances
• What are the gender, salary, and years since Ph.D. for the
top anomaly?
• What are the rank and years of service for the #2 anomaly?
Answers:
• Top anomaly: female, salary $62,884, 25 years since Ph.D.
• #2 anomaly: Associate Professor, 53 years of service
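Ranking instances by an anomaly score and keeping the top N can be sketched as follows. This is a stand-in that scores each row by its average absolute z-score. It is not BigML's anomaly detection algorithm, and the rows and field names are invented rather than taken from the Professors Salaries dataset.

```python
from statistics import mean, pstdev

def anomaly_scores(rows, fields):
    """Score each row by its mean absolute z-score over the given fields."""
    stats = {}
    for f in fields:
        values = [row[f] for row in rows]
        stats[f] = (mean(values), pstdev(values))
    scores = []
    for row in rows:
        z = [abs(row[f] - stats[f][0]) / stats[f][1]
             for f in fields if stats[f][1] > 0]  # skip constant fields
        scores.append(mean(z) if z else 0.0)
    return scores

def top_anomalies(rows, fields, top_n=10):
    """Return (row index, score) pairs, most anomalous first."""
    scores = anomaly_scores(rows, fields)
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]

# Hypothetical rows: salary in thousands and years of service.
rows = [
    {"salary": 100, "years_service": 10},
    {"salary": 110, "years_service": 12},
    {"salary": 105, "years_service": 9},
    {"salary": 60, "years_service": 50},   # low pay, very long service
    {"salary": 115, "years_service": 14},
    {"salary": 108, "years_service": 11},
]

top = top_anomalies(rows, ["salary", "years_service"], top_n=3)
print(top)
```

The row with low salary and very long service ranks first because it is far from the rest on both fields, which mirrors the kind of instance the exercise asks about.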
Exercise 2 - Clustering
• Clone the dataset
• Create a cluster using the G-means option
• How many different clusters are returned?
• Repeat but remove race, marital status, and fields regarding
newspapers and magazines. How many clusters now?
Reading Habits: http://bml.io/XGXCg3
(search under “Demographics and Surveys”)
Answers:
• All fields: 8 clusters
• After removing the listed fields: 2 clusters
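One way to see why dropping fields can change the clustering: instances are grouped by distance, and every included field contributes to that distance. The sketch below uses plain Euclidean distance on invented, pre-scaled records. The field names are hypothetical, and this is not G-means itself.

```python
from math import sqrt

def distance(a, b, fields):
    """Euclidean distance between two records over the chosen fields."""
    return sqrt(sum((a[f] - b[f]) ** 2 for f in fields))

def nearest(target, candidates, fields):
    """Name of the candidate record closest to the target."""
    return min(candidates,
               key=lambda name: distance(target, candidates[name], fields))

# Hypothetical readers, numeric fields already scaled to [0, 1].
reader_a = {"age": 0.20, "books_read": 0.3, "newspapers": 1, "magazines": 1}
others = {
    "b": {"age": 0.25, "books_read": 0.3, "newspapers": 0, "magazines": 0},
    "c": {"age": 0.90, "books_read": 0.9, "newspapers": 1, "magazines": 1},
}

all_fields = ["age", "books_read", "newspapers", "magazines"]
reduced_fields = ["age", "books_read"]

print(nearest(reader_a, others, all_fields))      # neighbour with every field
print(nearest(reader_a, others, reduced_fields))  # neighbour after dropping two
```

With every field included, the newspaper and magazine flags dominate and reader c is the closest match. Once they are dropped, reader b is closest, so any distance-based grouping can shift.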
Exercise 3 - Topic modeling
• Clone the dataset
• Create a topic model using the default options and only the
fields “title” and “description”
• How many different topics are returned?
• What are some other terms highly associated with topics
that deal with healthcare?
TED Talks: https://bigml.com/user/czuriaga/gallery/dataset/5ab8fabc92fb562acb00170a
(search under “Human Resources and Psychology”)
Answers:
• 40 topics
• Healthcare-related terms include research, disease, cancer, and drug
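Finding the other terms in a healthcare topic amounts to looking up which topics contain a seed term and listing their remaining terms by weight. A minimal sketch, assuming a topic model exposed as a plain dictionary of term probabilities. The topic names and all probabilities are invented, and this is not the format BigML returns.

```python
def related_terms(topics, term):
    """For every topic containing `term`, list its other terms by probability."""
    related = {}
    for name, term_probs in topics.items():
        if term in term_probs:
            others = [t for t in term_probs if t != term]
            related[name] = sorted(others, key=term_probs.get, reverse=True)
    return related

# Hypothetical topics with made-up term probabilities.
toy_topics = {
    "topic_a": {"health": 0.08, "disease": 0.06, "cancer": 0.05,
                "drug": 0.04, "research": 0.03},
    "topic_b": {"music": 0.09, "art": 0.05, "design": 0.04},
}

print(related_terms(toy_topics, "health"))
```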
Exercise 4 - Associations
• Clone the dataset
• Create an association discovery
• Sort associations by lift. What features are compared in the
top association?
• Sort associations by leverage. Do the top feature
associations change?
Diabetes: http://bml.io/QdJPdb
(search under “Healthcare”)
Answers:
• Yes, the top association changes: triceps skin thickness and insulin;
triceps skin thickness and BMI
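Why the ranking changes between the two measures can be sketched with the standard definitions: lift is P(A and B) / (P(A) · P(B)), and leverage is P(A and B) − P(A) · P(B). The counts below are invented and are not from the Diabetes dataset. Lift favors rare items that almost always occur together, while leverage favors associations that cover many instances.

```python
def lift(n_a, n_b, n_ab, n):
    """P(A and B) / (P(A) * P(B)); above 1 means a positive association."""
    return (n_ab * n) / (n_a * n_b)

def leverage(n_a, n_b, n_ab, n):
    """P(A and B) - P(A) * P(B); absolute gap from independence."""
    return n_ab / n - (n_a / n) * (n_b / n)

# Hypothetical counts: instances with A, with B, with both, and in total.
rules = {
    "rare_pair": dict(n_a=10, n_b=10, n_ab=8, n=1000),
    "common_pair": dict(n_a=500, n_b=500, n_ab=400, n=1000),
}

by_lift = max(rules, key=lambda r: lift(**rules[r]))
by_leverage = max(rules, key=lambda r: leverage(**rules[r]))

print("top by lift:", by_lift)
print("top by leverage:", by_leverage)
```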
MLSD18. Unsupervised Workshop