All Models Are Wrong, But Some Are Useful:
6 Lessons For Making Predictive Analytics Work
Dr. Brian Mac Namee
brian.macnamee@ucd.ie
@brianmacnamee
[Word cloud: machine learning, artificial intelligence, data science, cognitive computing, big data, deep learning]
Inspired by Brendan Tierney: http://www.oralytics.com/2012/06/data-science-is-multidisciplinary.html
if LOAN-SALARY RATIO < 1.5 then
    OUTCOME = 'repay'
else if LOAN-SALARY RATIO > 4 then
    OUTCOME = 'default'
else if AGE < 40 and OCCUPATION = 'industrial' then
    OUTCOME = 'default'
else
    OUTCOME = 'repay'
end if
Fundamentals of Machine Learning for Predictive Data Analytics
John Kelleher, Brian Mac Namee, and Aoife D'Arcy
www.machinelearningbook.com
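The rule set above is itself a small predictive model. As a sketch, here is a direct Python translation of those rules; the function name `predict_outcome` and its parameter names are hypothetical, while the thresholds and category values are copied from the rules.

```python
def predict_outcome(loan_salary_ratio: float, age: int, occupation: str) -> str:
    """Apply the hand-written loan rules (hypothetical names, thresholds from the slide)."""
    if loan_salary_ratio < 1.5:
        return "repay"
    if loan_salary_ratio > 4:
        return "default"
    if age < 40 and occupation == "industrial":
        return "default"
    # Everything not caught by an earlier rule is predicted to repay.
    return "repay"


print(predict_outcome(2.0, 35, "industrial"))
```

The rules are checked in order, so a low loan-salary ratio wins even for a young industrial worker.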
1. Prediction is a lot of things
2. There is no such thing as a free lunch
3. Look for Goldilocks
4. Better data usually beats bigger models
5. Choose your evaluation carefully
6. Remember Occam’s Razor
1. Prediction Is A Lot Of Things
Forecast: predicting the value of an unknown variable at a time in the future.
[Figure: two panels plotting values (0 to 110) by month, July to May]
Label: predicting the value of an unknown variable associated with an object.
[Figure: an image set split into images containing nerves and images not containing nerves]
Rank: predicting the propensity of somebody to take an action at a time in the future.
[Figure: a population ordered from least likely to respond to most likely to respond]
"In data analytics a prediction is an assignment of a value to an unknown variable."
Fundamentals of Machine Learning for Predictive Data Analytics
John Kelleher, Brian Mac Namee, and Aoife D'Arcy
www.machinelearningbook.com
Lesson 1: Prediction means a lot of different things, so predictive modelling can be applied to many different problems. Think carefully about what type of decision you want to make (label, rank, or forecast), and then design a predictive modelling solution that best supports that decision.
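The three kinds of prediction differ mainly in what you do with a model's output. The sketch below assumes a model has already produced propensity scores (for labelling and ranking), or that you have a history of past values (for forecasting). The helper names, the 0.5 threshold, and the moving-average forecaster are illustrative assumptions, not techniques taken from the talk.

```python
from statistics import mean


def label(scores: list[float], threshold: float = 0.5) -> list[bool]:
    """Label: turn each object's score into a yes/no class (threshold is an assumption)."""
    return [s >= threshold for s in scores]


def rank(ids: list[str], scores: list[float]) -> list[str]:
    """Rank: order people from most to least likely to take the action."""
    ordered = sorted(zip(ids, scores), key=lambda pair: pair[1], reverse=True)
    return [person for person, _ in ordered]


def forecast(history: list[float], horizon: int, window: int = 3) -> list[float]:
    """Forecast: predict the next `horizon` values with a naive moving average.

    Each new value is the mean of the previous `window` values (observed or
    already forecast); this is one simple choice, not a recommended method.
    """
    series = list(history)
    for _ in range(horizon):
        series.append(mean(series[-window:]))
    return series[len(history):]


print(label([0.2, 0.7, 0.5]))
print(rank(["ann", "bob", "cat"], [0.2, 0.9, 0.5]))
print(forecast([3, 6, 9], horizon=2))
```

The same scores can drive either a label or a rank; which one you need depends on the decision being made.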
2. There Is No Such Thing As A Free Lunch
www.rapidminer.com
"We have dubbed the associated results No Free Lunch theorems because they demonstrate that if an algorithm performs well on a certain class of problems then it necessarily pays for that with degraded performance on the set of all remaining problems."
Wolpert & Macready
"No Free Lunch Theorems for Optimization", David H. Wolpert and William G. Macready, IEEE Transactions On Evolutionary Computation, vol. 1, no. 1, 1997
http://ti.arc.nasa.gov/m/profile/dhw/papers/78.pdf
[Figures: a tree model, a nearest neighbour model, and a linear model]
Lesson 2: There are a huge number of different predictive modelling algorithms, and none of them is best for every problem. You need to experiment with lots of different ones.
[Word cloud: random forest, decision tree, isotonic regression, neural network, nearest neighbour, naive Bayes, support vector machine, logistic regression, Bayesian network, ensemble, gradient boosting, linear model, winnow]
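A minimal sketch of "experiment with lots of different ones": train several candidate models on the same training data and let a held-out validation set decide. The three tiny pure-Python models here (a majority-class baseline, 1-nearest neighbour, and nearest centroid) and all function names are assumptions chosen to keep the example self-contained; in practice you would compare real algorithms such as those listed above.

```python
from collections import Counter
from math import dist
from typing import Callable, Sequence

Point = tuple[float, ...]
Model = Callable[[Point], str]


def fit_majority(X: Sequence[Point], y: Sequence[str]) -> Model:
    """Baseline: always predict the most common training label."""
    most_common = Counter(y).most_common(1)[0][0]
    return lambda x: most_common


def fit_nearest_neighbour(X: Sequence[Point], y: Sequence[str]) -> Model:
    """1-nearest neighbour: copy the label of the closest training point."""
    def predict(x: Point) -> str:
        closest = min(range(len(X)), key=lambda i: dist(X[i], x))
        return y[closest]
    return predict


def fit_nearest_centroid(X: Sequence[Point], y: Sequence[str]) -> Model:
    """Predict the label whose class mean (centroid) is closest."""
    centroids = {}
    for lab in dict.fromkeys(y):  # distinct labels in first-seen order
        members = [x for x, t in zip(X, y) if t == lab]
        centroids[lab] = tuple(sum(col) / len(members) for col in zip(*members))
    return lambda x: min(centroids, key=lambda c: dist(centroids[c], x))


def accuracy(model: Model, X: Sequence[Point], y: Sequence[str]) -> float:
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)


FITTERS: dict[str, Callable[[Sequence[Point], Sequence[str]], Model]] = {
    "majority baseline": fit_majority,
    "nearest neighbour": fit_nearest_neighbour,
    "nearest centroid": fit_nearest_centroid,
}


def compare_models(X_train, y_train, X_val, y_val) -> dict[str, float]:
    """Train every candidate on the same data and score it on held-out data."""
    return {
        name: accuracy(fit(X_train, y_train), X_val, y_val)
        for name, fit in FITTERS.items()
    }


# Toy data, made up for illustration.
X_train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y_train = ["a", "a", "a", "b", "b", "b"]
X_val = [(0.5, 0.5), (5.5, 5.5), (1, 1)]
y_val = ["a", "b", "a"]
print(compare_models(X_train, y_train, X_val, y_val))
```

On a differently shaped dataset the ordering of the candidates can change, which is the practical reading of the No Free Lunch result.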
3. Look For Goldilocks
[Figures: four panels plotting Income (20,000 to 80,000) against Age (0 to 100) for five data points]
[Figures: a sequence of panels plotting misclassification rate (0.1 to 0.5) against training iteration (0 to 200), showing performance on the training set and performance on the validation set]
Lesson 3: Always tune your models, but be very careful to avoid overfitting. A validation dataset, kept separate from the training data, is crucial for spotting it.
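One common way to use a validation set while tuning is early stopping: keep training, but keep the model from the iteration where validation error was lowest. The sketch below assumes you have recorded one validation error per training iteration; the function name and the "patience" rule (give up after that many iterations without improvement) are assumptions, not the method used in the talk. The error curves are synthetic.

```python
from typing import Sequence


def early_stopping_iteration(val_errors: Sequence[float], patience: int = 10) -> int:
    """Return the training iteration whose model should be kept.

    Scans validation errors in order, remembers the lowest seen so far, and
    stops scanning once `patience` consecutive iterations fail to improve on it.
    """
    if not val_errors:
        raise ValueError("val_errors must not be empty")
    if patience < 1:
        raise ValueError("patience must be at least 1")
    best_iter, best_err = 0, val_errors[0]
    waited = 0
    for i in range(1, len(val_errors)):
        if val_errors[i] < best_err:
            best_iter, best_err = i, val_errors[i]
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_iter


# Synthetic curves (made up for illustration): training error keeps falling,
# while validation error bottoms out at iteration 40 and then rises as the
# model starts to overfit.
train_errors = [0.5 * 0.98 ** i for i in range(200)]
val_errors = [0.2 + 0.005 * abs(i - 40) for i in range(200)]

stop = early_stopping_iteration(val_errors, patience=10)
print(f"keep iteration {stop}: train error {train_errors[stop]:.3f}, "
      f"validation error {val_errors[stop]:.3f}")
```

A small patience stops sooner but can be fooled by a temporary bump in validation error; a larger patience is more robust but costs more training.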
4. Better Data Usually Beats Bigger Models
[Figures: image-processing examples, including a denoised image (Digital Image Processing, Gonzalez & Woods, 2002)]
[Figures: raw activity, normalised activity, wake aligned activity, and cumulative wake aligned activity]
Features derived from activity: peak activity (day), variation in activity (day), total activity (day), peak activity (1st hour), variation in activity (1st hour), total activity (1st hour), area under cumulative activity curve, …
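As a sketch of turning a raw activity series into richer features like those listed above, the function below computes day-level and first-hour summaries plus the area under the cumulative curve. It assumes (these are not from the talk) that `activity[0]` is the first reading after waking, that readings are evenly spaced with `samples_per_hour` readings per hour, and that "variation" means the population standard deviation. The function name is hypothetical.

```python
from itertools import accumulate
from statistics import pstdev


def activity_features(activity: list[float], samples_per_hour: int = 60) -> dict[str, float]:
    """Derive summary features from one day of wake-aligned activity readings."""
    if len(activity) < samples_per_hour:
        raise ValueError("need at least one hour of readings")
    first_hour = activity[:samples_per_hour]
    cumulative = list(accumulate(activity))  # cumulative wake-aligned activity
    return {
        "peak_day": max(activity),
        "variation_day": pstdev(activity),
        "total_day": sum(activity),
        "peak_first_hour": max(first_hour),
        "variation_first_hour": pstdev(first_hour),
        "total_first_hour": sum(first_hour),
        # Discrete area: one unit of width per reading.
        "area_under_cumulative_curve": sum(cumulative),
    }


# Tiny made-up series with two readings per "hour", just to show the call.
print(activity_features([0, 2, 5, 3, 1, 0], samples_per_hour=2))
```

Each feature is a single number per person, so any of the algorithms from Lesson 2 can use them directly, whereas the raw series would need a model that handles sequences.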
[Diagram: Choose An Algorithm, Generate Data, Tune Model Parameters]
Lesson 4: Developing new, richer features is often a better way to improve model performance than using more sophisticated modelling techniques.
An Aside On Deep Learning
[Figure: Google Trends search interest in "Deep Learning", 2005 to 2015]
Google Trends: http://www.google.com/trends/
"Deep-learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level."
[LeCun et al., 2015]
"Deep Learning", Yann LeCun, Yoshua Bengio & Geoffrey Hinton, Nature, vol. 521, no. 7553, 2015
http://www.nature.com/nature/journal/v521/n7553/full/nature14539.html
[Figure: example images of the digits 0 to 9]
Convolutional neural networks seem to brilliantly address the selectivity-invariance dilemma that is fundamental to all efforts to learn to classify objects: they produce representations that are selective to the aspects of the image that are important for discrimination, but that are invariant to irrelevant aspects.
Convolutional networks hold records for problems in image recognition, speech recognition, and text classification, amongst other areas.
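To make the selectivity-invariance idea concrete, here is a toy sketch in plain Python. It assumes a single hand-written 3x3 vertical-line filter instead of learned weights, and all function names are hypothetical. Convolution followed by ReLU and global max pooling responds strongly to a vertical line wherever it sits in the interior of the image (invariance to position) but not at all to a horizontal line (selectivity). A real convolutional network learns many such filters from data and stacks several layers of them.

```python
Matrix = list[list[float]]

# Hand-written filter that responds to vertical lines (an assumption; a real
# CNN learns its filters during training).
VERTICAL_LINE = [
    [-1.0, 2.0, -1.0],
    [-1.0, 2.0, -1.0],
    [-1.0, 2.0, -1.0],
]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over every position where it fits fully inside the image
    (strictly cross-correlation, as in most deep-learning libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(w - kw + 1)
        ]
        for i in range(h - kh + 1)
    ]


def relu(feature_map: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in feature_map]


def global_max_pool(feature_map: Matrix) -> float:
    """Keep only the strongest response, discarding where it occurred."""
    return max(max(row) for row in feature_map)


def detect(image: Matrix, kernel: Matrix = VERTICAL_LINE) -> float:
    return global_max_pool(relu(conv2d_valid(image, kernel)))


def vertical_line(size: int, col: int) -> Matrix:
    return [[1.0 if c == col else 0.0 for c in range(size)] for _ in range(size)]


def horizontal_line(size: int, row: int) -> Matrix:
    return [[1.0 if r == row else 0.0 for _ in range(size)] for r in range(size)]


for col in (1, 3):
    print("vertical line at column", col, "->", detect(vertical_line(6, col)))
print("horizontal line at row 2 ->", detect(horizontal_line(6, 2)))
```

The pooling step is what throws away position; the filter weights are what decide which patterns count.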
On Welsh Corgis, Computer Vision, and the Power of Deep Learning, Microsoft Research, 2014
http://research.microsoft.com/en-us/news/features/dnnvision-071414.aspx
Rise of the machines, The Economist, 2015
http://www.economist.com/news/briefing/21650526-artificial-intelligence-scares-peopleexcessively-so-rise-machines
[Diagram: Hardware, Data, Algorithms, Applications]
5. Choose Your Evaluation Carefully
A marketing company working for a charity has developed two different models that predict the likelihood that donors will respond to a mailshot asking them to make a special extra donation.
An evaluation experiment has been performed on both models. Now we must decide which model to use.
Model 1
                 Prediction
                 TRUE    FALSE
Target   TRUE    2355      337
         FALSE    329     1714
Classification accuracy: 85.93%

Model 2
                 Prediction
                 TRUE    FALSE
Target   TRUE    2198      494
         FALSE    471     1572
Classification accuracy: 79.62%
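Classification accuracy is only one number that can be read off a confusion matrix. The sketch below recomputes accuracy for both models from the counts in the tables and adds recall, precision, and specificity. The `ConfusionMatrix` class and its method names are hypothetical helpers written for this example; the counts are the ones shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # target TRUE,  predicted TRUE
    fn: int  # target TRUE,  predicted FALSE
    fp: int  # target FALSE, predicted TRUE
    tn: int  # target FALSE, predicted FALSE

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    def recall(self) -> float:
        """Share of actual responders the model finds (true positive rate)."""
        return self.tp / (self.tp + self.fn)

    def precision(self) -> float:
        """Share of predicted responders who actually respond."""
        return self.tp / (self.tp + self.fp)

    def specificity(self) -> float:
        """Share of actual non-responders correctly predicted as FALSE."""
        return self.tn / (self.tn + self.fp)


model_1 = ConfusionMatrix(tp=2355, fn=337, fp=329, tn=1714)
model_2 = ConfusionMatrix(tp=2198, fn=494, fp=471, tn=1572)

for name, cm in [("Model 1", model_1), ("Model 2", model_2)]:
    print(f"{name}: accuracy={cm.accuracy():.2%} recall={cm.recall():.2%} "
          f"precision={cm.precision():.2%} specificity={cm.specificity():.2%}")
```

The accuracy values reproduce the 85.93% and 79.62% in the tables. All of these measures summarise hard TRUE/FALSE predictions at one threshold, so they say nothing directly about how well each model ranks donors.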
[Figures: additional performance charts for Model 1 and Model 2]
Lesson 5: There are many different performance measures that we can use to evaluate a model. Pick the one that best matches the decisions you are trying to make.
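As an illustration of matching the measure to the decision: if the charity can only afford to mail its top k donors, what matters is how many of those k actually respond (precision at k), not accuracy at a fixed threshold. The sketch below uses a handful of invented scores from two hypothetical models; the function names, the 0.5 threshold, and the data are all assumptions for illustration, not results from the talk.

```python
def accuracy_at_threshold(scores: list[float], responded: list[bool],
                          threshold: float = 0.5) -> float:
    """Fraction of donors whose thresholded prediction matches the outcome."""
    return sum((s >= threshold) == r for s, r in zip(scores, responded)) / len(scores)


def precision_at_k(scores: list[float], responded: list[bool], k: int) -> float:
    """Fraction of the k highest-scoring donors who actually responded."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of donors")
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(responded[i] for i in top) / k


# Made-up scores from two hypothetical models for six donors.
responded = [True, False, True, False, False, False]
model_a = [0.6, 0.45, 0.4, 0.3, 0.2, 0.1]
model_b = [0.9, 0.55, 0.8, 0.52, 0.51, 0.1]

for name, scores in [("A", model_a), ("B", model_b)]:
    print(name,
          f"accuracy={accuracy_at_threshold(scores, responded):.2f}",
          f"precision@2={precision_at_k(scores, responded, k=2):.2f}")
```

In this toy case model A wins on accuracy while model B is the better choice for a two-letter mailshot, so the "best" model depends on which measure reflects the decision.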
6. Remember Occam’s Razor
[Figures: Twitter data sources: Timeline, Followers, Following, Tweets + Metadata, Profile]
http://www.cso.ie/en/releasesandpublications/er/ibn/irishbabiesnames2014/
Lesson 6: Always start with simple solutions. Only add complexity if it is required.
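One way to put "only add complexity if required" into practice is to compare candidates on a validation set and keep the simplest one whose score is within a small tolerance of the best. The sketch below is a hypothetical helper, not a procedure from the talk: the complexity measure, the tolerance value, and the candidate scores are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    complexity: float  # any agreed measure, e.g. number of parameters (assumption)
    val_score: float   # validation performance, higher is better


def simplest_adequate(candidates: list[Candidate], tolerance: float = 0.01) -> Candidate:
    """Return the least complex candidate whose validation score is within
    `tolerance` of the best score."""
    if not candidates:
        raise ValueError("need at least one candidate")
    best = max(c.val_score for c in candidates)
    adequate = [c for c in candidates if c.val_score >= best - tolerance]
    return min(adequate, key=lambda c: c.complexity)


# Hypothetical validation results, invented for illustration.
candidates = [
    Candidate("majority-class baseline", complexity=1, val_score=0.70),
    Candidate("logistic regression", complexity=20, val_score=0.84),
    Candidate("deep neural network", complexity=1_000_000, val_score=0.85),
]
print(simplest_adequate(candidates, tolerance=0.02).name)
```

With a tolerance of zero the helper simply picks the top scorer; a modest tolerance trades a small loss in validation score for a much simpler model.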
Frustra fit per plura quod potest fieri per pauciora
(It is futile to do with more things that which can be done with fewer)
1. Prediction is a lot of things
2. There is no such thing as a free lunch
3. Look for Goldilocks
4. Better data usually beats bigger models
5. Choose your evaluation carefully
6. Remember Occam’s Razor
Fundamentals of Machine Learning for Predictive Data Analytics
John Kelleher, Brian Mac Namee, and Aoife D'Arcy
www.machinelearningbook.com
Thank You
Questions?
Training Course: Fundamentals of Machine Learning for Predictive Data Analytics
Dublin, March 21st - 23rd
www.theanalyticsstore.ie/training/
