1st edition | July 8-11, 2019
BigML, Inc #DutchMLSchool
My First BigML Model
Mercè Martín
VP of Applications, BigML
• Lots of decisions
• Lots of potentially related variables
• Uncertain correlations
ML CAN HELP
Do I really need a model?
New data arrives → The model labels it → We decide the action
Maybe I too could use a model…
The challenge
Credit delinquency
I WANT TO MINIMIZE RISK BY PREDICTING DEFAULTS
https://www.kaggle.com/c/GiveMeSomeCredit
First step
Defining the question
Defining the real question
When do I consider a customer to be in default?
When the customer misses payments?
What if the customer pays late?
What is the maximum delinquency I allow?
Defining the contest goal
Predicting who will be 90 days past due or worse, to act only on them
And now…
The First Decision
https://bigml.com/accounts/register
The Data Dictionary
Variable name | Description | Type
SeriousDlqin2yrs | Person experienced 90 days past due delinquency or worse | Y/N
RevolvingUtilizationOfUnsecuredLines | Total balance on credit cards and personal lines of credit, excluding real estate and installment debt such as car loans, divided by the sum of credit limits | percentage
age | Age of borrower in years | integer
NumberOfTime30-59DaysPastDueNotWorse | Number of times borrower has been 30-59 days past due but no worse in the last 2 years | integer
DebtRatio | Monthly debt payments, alimony, and living costs divided by monthly gross income | percentage
MonthlyIncome | Monthly income | real
NumberOfOpenCreditLinesAndLoans | Number of open loans (installment loans such as a car loan or mortgage) and lines of credit (e.g. credit cards) | integer
NumberOfTimes90DaysLate | Number of times borrower has been 90 days or more past due | integer
NumberRealEstateLoansOrLines | Number of mortgage and real estate loans, including home equity lines of credit | integer
NumberOfTime60-89DaysPastDueNotWorse | Number of times borrower has been 60-89 days past due but no worse in the last 2 years | integer
NumberOfDependents | Number of dependents in family excluding themselves (spouse, children, etc.) | integer
10 predictors (plus the SeriousDlqin2yrs objective field)
The Data
The Source
How to interpret your data?
• Field types
• Locale (decimals)
• Missing tokens
• Text / Items parsing
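To illustrate the parsing decisions listed above, here is a minimal sketch in plain Python. It is not BigML's own parser. It shows how a locale's decimal separator and a set of missing tokens change the way raw CSV cells are read. The token set, the `parse_number` helper and the sample rows are hypothetical.

```python
import csv
import io

# Hypothetical strings to treat as "missing"; a real source lets you configure these.
MISSING_TOKENS = {"", "NA", "N/A", "?"}


def parse_number(raw, decimal_sep="."):
    """Parse a numeric cell, honoring the locale's decimal separator.

    Returns None for missing tokens; raises ValueError for malformed values.
    """
    value = raw.strip()
    if value in MISSING_TOKENS:
        return None
    if decimal_sep != ".":
        # Assumption: locales with a comma decimal use "." as thousands separator,
        # e.g. "1.234,5" -> "1234.5".
        value = value.replace(".", "").replace(decimal_sep, ".")
    return float(value)


# Made-up rows written in a European style (";" delimiter, "," decimals).
raw_csv = "age;MonthlyIncome;DebtRatio\n45;9120;0,80\n40;NA;0,12\n"
rows = list(csv.DictReader(io.StringIO(raw_csv), delimiter=";"))
parsed = [{k: parse_number(v, decimal_sep=",") for k, v in row.items()} for row in rows]
print(parsed)
# [{'age': 45.0, 'MonthlyIncome': 9120.0, 'DebtRatio': 0.8}, {'age': 40.0, 'MonthlyIncome': None, 'DebtRatio': 0.12}]
```

If you read "0,80" with the default "." separator, you get a `ValueError`. A wrong locale setting therefore turns valid numbers into errors.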
The Dataset
How is data distributed?
• Histograms
• Statistics
• Number of missings
• Number of errors
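A few lines of standard-library Python can approximate the kind of per-field summary a dataset shows. This sketch is not BigML's implementation. It assumes numeric values with `None` marking missing cells and uses equal-width histogram bins. The `summarize` helper and the ages are hypothetical.

```python
import statistics


def summarize(values, bins=4):
    """Summarize one numeric field: missing count, basic statistics and a histogram.

    `values` may contain None for missing cells; at least one value must be present.
    """
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in present:
        # The maximum value would fall one bin past the end; clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return {
        "missing": len(values) - len(present),
        "min": lo,
        "max": hi,
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "histogram": counts,
    }


ages = [22, 35, 41, None, 58, 63, 70, None]  # made-up borrower ages
print(summarize(ages))
```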
And now… The Model
The Model
What insights will the model extract?
• Patterns
• Importance
• and…
The Prediction
What label corresponds to this loan?
• Predictions (labels)
• Confidence
• Explanations
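For a decision tree, the confidence of a prediction comes from the training instances that reached the leaf. To our understanding, BigML reports a pessimistic estimate based on the lower bound of the Wilson score interval. The sketch below assumes that formula with z = 1.96 (about 95%) and uses hypothetical leaf counts.

```python
import math


def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval for a proportion.

    positive: training instances in the leaf that carry the predicted label
    total:    all training instances that reached the leaf
    z:        normal quantile (1.96 is roughly a 95% two-sided interval)
    """
    if total == 0:
        return 0.0
    p = positive / total
    z2 = z * z
    center = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (center - margin) / (1 + z2 / total)


# Two hypothetical leaves that both predict their label 90% of the time.
print(round(wilson_lower_bound(9, 10), 3))
print(round(wilson_lower_bound(900, 1000), 3))
```

Both leaves agree 90% of the time. The leaf backed by more evidence gets a clearly higher confidence (roughly 0.60 versus 0.88).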
Are predictions correct?
The Evaluation
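[Diagram: the TRAINING data builds the MODEL; the model's PREDICTION and CONFIDENCE on the TEST data are compared with the real values to produce the EVALUATION (%)]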
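The diagram relies on keeping the test rows out of training. A minimal sketch of a reproducible random holdout split follows. The 80/20 proportion, the seed and the `train_test_split` helper name are assumptions for illustration.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Randomly split rows into (training, test) lists, reproducibly."""
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(150))  # stand-ins for 150 loan records
training, test = train_test_split(rows)
print(len(training), len(test))  # 120 30
```

Fixing the seed means the same rows always land in the test set. This makes evaluations of different models comparable.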
And now… The Evaluation
The Evaluation
Do predictions match the real values?
Hey! Great accuracy!!! Right?
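The catch behind that accuracy is class imbalance. Consider a minimal sketch with a hypothetical test set in which 7 of 100 loans default. A "model" that never predicts a default already scores 93% accuracy while catching no defaulters at all.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the real values."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


# Hypothetical test set: 7 defaulters (1) among 100 loans.
actual = [1] * 7 + [0] * 93
# A "model" that never predicts a default.
always_no = [0] * 100
print(accuracy(always_no, actual))  # 0.93
```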
I wish to make a complaint!
The Evaluation
Do predictions match the real values?
• Positive class: 1
Predicted / Actual
TP: 1 / 1
FN: 0 / 1
FP: 1 / 0
TN: 0 / 0
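The four cells above can be counted directly from predictions and real values, taking 1 as the positive class. This sketch also derives recall and precision from them. The predictions are hypothetical.

```python
from collections import Counter


def confusion_matrix(predicted, actual, positive=1):
    """Count TP, FP, FN and TN for the given positive class."""
    counts = Counter()
    for p, a in zip(predicted, actual):
        if p == positive and a == positive:
            counts["TP"] += 1
        elif p == positive:  # predicted positive, actually negative
            counts["FP"] += 1
        elif a == positive:  # predicted negative, actually positive
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def recall(cm):
    """Share of real positives the model catches: TP / (TP + FN)."""
    real = cm["TP"] + cm["FN"]
    return cm["TP"] / real if real else 0.0


def precision(cm):
    """Share of predicted positives that are real: TP / (TP + FP)."""
    flagged = cm["TP"] + cm["FP"]
    return cm["TP"] / flagged if flagged else 0.0


# Hypothetical predictions for 10 loans (1 = default).
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
cm = confusion_matrix(predicted, actual)
print(dict(cm), recall(cm), precision(cm))
```

Here the model finds only one of the three defaulters, so its recall is 1/3.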
The Costs
Predicting who will be 90 days past due or worse, to act only on them
• Always remember the goal
TO MINIMIZE COST WE SHOULD MAXIMIZE THE RECALL
• And the costs of failing!!!
[Figure: evaluation of the Unbalanced model]
And now… Model Tuning
Compensating for imbalance
The percentage of examples of the class we are interested in is very low.
Increasing their frequency could help the model learn better.
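Below is a sketch of two common ways to increase the weight of the minority class. The first gives each class a weight inversely proportional to its frequency. The second oversamples minority rows with replacement. BigML offers its own balancing option for models, and this code does not reproduce its internals. The helper names and the 7% default rate are hypothetical.

```python
import random
from collections import Counter


def balanced_weights(labels):
    """Weight per class so that every class contributes the same total weight.

    weight(c) = N / (k * n_c), where N is the number of rows,
    k the number of classes and n_c the number of rows of class c.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until all classes are equally frequent."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


labels = [1] * 7 + [0] * 93  # hypothetical: 7% defaulters
print(balanced_weights(labels))
```

Weighting keeps the dataset size unchanged. Oversampling duplicates rows, so it can make a tree overfit the few real defaulters.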
Choosing according to Costs
THE BALANCED MODEL WORKS BETTER
[Figure: evaluations of the Unbalanced vs. Balanced model]
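To choose according to costs, each kind of error gets its price and the totals are compared. The sketch below assumes that a missed defaulter (false negative) costs far more than acting on a customer who would have paid (false positive). The confusion-matrix counts and the costs are hypothetical, not the figures from the slide.

```python
def total_cost(cm, cost_fn, cost_fp):
    """Business cost of a model's errors on the test set.

    cost_fn: cost of missing a real defaulter (false negative)
    cost_fp: cost of acting on a customer who would have paid (false positive)
    """
    return cm["FN"] * cost_fn + cm["FP"] * cost_fp


# Hypothetical confusion-matrix counts on 1000 test loans, and hypothetical costs.
unbalanced = {"TP": 20, "FP": 10, "FN": 80, "TN": 890}
balanced = {"TP": 70, "FP": 150, "FN": 30, "TN": 750}
COST_FN, COST_FP = 1000, 50

for name, cm in [("unbalanced", unbalanced), ("balanced", balanced)]:
    print(name, total_cost(cm, COST_FN, COST_FP))
```

With these made-up numbers, the balanced model is less accurate (0.82 vs. 0.91). Yet it costs less than half as much (37500 vs. 80500), because it misses far fewer defaulters.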
And now… Automating
The OptiML
Automating tuning
Smart search for the best-performing configuration
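OptiML's own search is smarter than this; to our understanding it uses Bayesian parameter optimization. As a deliberately simpler stand-in, the sketch below shows plain random search. It samples configurations, scores each one and keeps the best. The search space, the `toy_score` function and the helper name are hypothetical. In practice, scoring a configuration means training a model with it and evaluating that model on held-out data.

```python
import random


def random_search(evaluate, space, n_trials=20, seed=7):
    """Try random configurations and keep the one with the highest score.

    evaluate: function mapping a configuration dict to a score (higher is better)
    space:    dict mapping each parameter name to the list of values to sample from
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical search space and a toy score standing in for "train and evaluate".
space = {"max_depth": [4, 8, 16, 32], "balance": [False, True]}


def toy_score(config):
    return (0.5 if config["balance"] else 0.2) - abs(config["max_depth"] - 8) / 100


best, score = random_search(toy_score, space, n_trials=50)
print(best, score)
```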
And the winner is…
A simple decision tree!!!
• 19-node
• balanced
• pruned
Operating the model
Pick the probability threshold to decide when to accept your prediction
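Operating a model at a chosen threshold is a one-line rule on the predicted probability of the positive class. This sketch uses hypothetical probabilities and outcomes to show the effect. Lowering the threshold raises recall at the price of more false positives.

```python
def apply_threshold(probabilities, threshold):
    """Label a loan as a default (1) when its predicted probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def recall_at(probabilities, actual, threshold):
    """Recall, TP / (TP + FN), obtained when operating at the given threshold."""
    predicted = apply_threshold(probabilities, threshold)
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    positives = sum(actual)
    return tp / positives if positives else 0.0


# Hypothetical probabilities of default for six loans and their real outcomes.
probs = [0.92, 0.61, 0.35, 0.20, 0.15, 0.05]
actual = [1, 1, 1, 0, 0, 0]
for t in (0.5, 0.3, 0.1):
    print(t, apply_threshold(probs, t), recall_at(probs, actual, t))
```

At 0.3 this toy model already catches every defaulter. Going down to 0.1 adds two false positives without improving recall.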
So are we ready?
[Diagram: an ITERATIVE cycle of PREPARING AND TRANSFORMING DATA, MODELING and OPERATING]
Going to production
What does production mean for you? Well, I need to:
• Predict a bunch of data periodically: batch predictions
• Check a customer from a call center while they are on the phone: single predictions
• Use the predicted value immediately on my website: single local predictions
• Integrate groups of predictions into my app: batch local predictions
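Local predictions work because a decision tree is a whitebox. Once downloaded, it can be walked without calling any server. BigML's bindings (for example the `bigml` Python package) provide this for real models. The sketch below does not use them. Instead, it walks a hypothetical tree stored as a nested dict, one row at a time or as a batch. The field names come from the data dictionary, but the splits, thresholds and customers are made up. Missing values are not handled.

```python
# Hypothetical exported tree: internal nodes test one field against a threshold,
# leaves carry the predicted label (1 = will be 90 days past due or worse).
TREE = {
    "field": "NumberOfTimes90DaysLate", "threshold": 0.5,
    "left": {"output": 0},
    "right": {
        "field": "RevolvingUtilizationOfUnsecuredLines", "threshold": 0.8,
        "left": {"output": 0},
        "right": {"output": 1},
    },
}


def predict_one(node, row):
    """Single local prediction: follow the splits until a leaf is reached."""
    while "output" not in node:
        branch = "left" if row[node["field"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["output"]


def predict_batch(node, rows):
    """Batch local prediction: apply the same tree to every row."""
    return [predict_one(node, row) for row in rows]


customers = [
    {"NumberOfTimes90DaysLate": 0, "RevolvingUtilizationOfUnsecuredLines": 0.95},
    {"NumberOfTimes90DaysLate": 2, "RevolvingUtilizationOfUnsecuredLines": 0.95},
    {"NumberOfTimes90DaysLate": 2, "RevolvingUtilizationOfUnsecuredLines": 0.30},
]
print(predict_one(TREE, customers[1]))  # 1
print(predict_batch(TREE, customers))   # [0, 1, 0]
```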
Whitebox models & bindings
Predictions should be easy to integrate into any widget or software:
• IT systems
• Mobiles
• Tablets
• ATMs
• Amazon Echo
• Google Sheets
• Web sites
Going to production
Local production environment
https://bigml.com/tools

