Supervised Learning Algorithms
Analysis of Different Approaches
Evgeniy Marinov
ML Consultant
Philip Yankov
x8academy
ML Definition
•  There are plenty of definitions...
•  Informal: The field of study that gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959)
•  Formal: A computer program is said to learn from experience E, with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E (Tom Mitchell, 1998).
From Wikipedia
•  Machine learning is:
– a subfield of computer science that evolved from the study of pattern recognition and AI in the 1980s (ML became a separate field that flourished from the 1990s, benefiting first from statistics and then from the increasing availability of digitized information at that time).
Why ML?
Key factors enabling ML growth today
•  Cloud Computing
•  Internet of Things
•  Big Data (+ Unstructured Data)
Why is data so important?
•  Google Photos
– Unlimited storage
•  Google voice
– OK, Google
Pipeline of Supervised learning algorithms
Nowadays
•  It is easy to get the data you need and to experiment with it through a company's API or service
Methods for collecting data
•  Download
– Spreadsheet
– Text
•  API
•  Crawling / scraping
Supervised Learning
Task Description
Pipeline
Initial example
Notation
The regression function f(x)
How to evaluate our model?
Pipeline
Assessing the Model Accuracy
Bias-variance trade-off
Cross-validation
Generalization Error and Overfitting
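As a minimal sketch of how cross-validation estimates generalization error, the data can be split into k folds, each used once as the held-out test set while the remaining folds are used for training. The function name `k_fold_indices` and the sizes below are illustrative assumptions, not taken from the slides.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # The first (n_samples % k) folds get one extra sample each.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Hypothetical usage: 10 samples, 3 folds.
folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    # A model would be fitted on train_idx and scored on test_idx here;
    # averaging the k scores gives the cross-validation estimate.
    print(len(train_idx), len(test_idx))
```

In practice the indices are usually shuffled before splitting; that step is left out here to keep the sketch deterministic.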
Choosing a model by the data type of the response
Pipeline
Data types and the Generalized Linear model
•  Simple and General linear models
•  Restrictions of the linear model
•  Data type of the response Y
1)  (General) Linear model: Y in R, Y ~ Gaussian(µ, σ^2) -- continuous data
2)  Logistic regression: Y in {0, 1}, Y ~ Bernoulli(p) -- binary data
3)  Poisson regression: Y in {0, 1, ...}, Y ~ Poisson(µ) -- count data
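The three cases above differ in how the linear predictor is mapped to the mean of the response. The following is a minimal sketch using the canonical inverse link for each distribution (identity, logistic, exponential); the function names, the `INVERSE_LINK` table and the example weights are hypothetical and only serve as illustration.

```python
import math


def identity(eta):
    # Gaussian response: the mean can be any real number.
    return eta


def logistic(eta):
    # Bernoulli response: the mean p must lie in (0, 1).
    return 1.0 / (1.0 + math.exp(-eta))


def exponential(eta):
    # Poisson response: the mean mu must be positive.
    return math.exp(eta)


# Hypothetical lookup from the data type of Y to the inverse link.
INVERSE_LINK = {"continuous": identity, "binary": logistic, "count": exponential}


def predict_mean(response_type, weights, bias, x):
    """Compute E[Y | x] for a GLM with linear predictor eta = bias + w . x."""
    eta = bias + sum(w * xi for w, xi in zip(weights, x))
    return INVERSE_LINK[response_type](eta)


# Made-up weights and input, chosen so that eta = 0 for the last two calls.
print(predict_mean("continuous", [1.0], 0.5, [2.0]))
print(predict_mean("binary", [1.0, -1.0], 0.0, [2.0, 2.0]))
print(predict_mean("count", [1.0, -1.0], 0.0, [2.0, 2.0]))
```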
Simple and General linear models
Simple: Y = β0 + β1 x + ε
General: Y = β0 + β1 x1 + ... + βp xp + ε
Error of the General Linear model
Restrictions of Linear models
Although the General linear model is a useful framework, it is not appropriate in the following cases:
•  The range of Y is restricted (e.g. binary, count, positive/negative)
•  Var[Y] depends on the mean E[Y] (for the Gaussian they are independent)

Name            Mean  Variance
Bernoulli(p)    p     p(1 - p)
Binomial(p, n)  np    np(1 - p)
Poisson(p)      p     p
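The table entries can be checked numerically from the probability mass functions. The sketch below computes the mean and variance directly from each pmf; the helper names and the parameter values (p = 0.3, n = 10, Poisson mean 4.0) are arbitrary choices for illustration, and the Poisson sum is truncated at 100 terms.

```python
import math


def moments(pmf, support):
    """Return (mean, variance) of a discrete distribution given its pmf."""
    mean = sum(k * pmf(k) for k in support)
    var = sum((k - mean) ** 2 * pmf(k) for k in support)
    return mean, var


def bernoulli_pmf(p):
    return lambda k: p if k == 1 else 1.0 - p


def binomial_pmf(n, p):
    return lambda k: math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def poisson_pmf(mu):
    return lambda k: math.exp(-mu) * mu ** k / math.factorial(k)


p, n, mu = 0.3, 10, 4.0
bern = moments(bernoulli_pmf(p), range(2))
binom = moments(binomial_pmf(n, p), range(n + 1))
pois = moments(poisson_pmf(mu), range(100))  # infinite support, truncated

print("Bernoulli:", bern, "expected", (p, p * (1 - p)))
print("Binomial: ", binom, "expected", (n * p, n * p * (1 - p)))
print("Poisson:  ", pois, "expected", (mu, mu))
```

In all three cases the variance is a function of the mean, which is the dependence that the plain Gaussian linear model cannot express.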
Binary response Y - {0, 1}
•  Bernoulli(p) is a discrete random variable with two possible outcomes, with probabilities p and q = 1 - p
•  The parameter p does not change over time
•  Bernoulli is a building block for other, more complicated distributions
•  Examples:
•  Coin flips {Heads, Tails}: if the coin is unbiased, then p = 0.5
•  Click on an ad, fail/success on an exam
Generalized Linear model - Intuition
Exponential Family
General	linear	model
Binary Data
Modeling Counting / Poisson Data
Maximizing the Log-Likelihood and Parameter Estimation
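As a small illustration of maximum-likelihood estimation for count data, the sketch below evaluates the Poisson log-likelihood on a grid of candidate means and picks the maximizer, which should agree with the sample mean. The counts are made up, and a grid search is used only for clarity; it is not presented as the estimation procedure from the talk, and for regression models an iterative optimizer would normally be used instead.

```python
import math


def poisson_log_likelihood(mu, counts):
    """Log-likelihood of i.i.d. Poisson(mu) observations."""
    # log P(y) = y * log(mu) - mu - log(y!), with log(y!) = lgamma(y + 1)
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in counts)


counts = [2, 0, 3, 1, 4, 2]                   # made-up count data
grid = [0.01 * i for i in range(1, 1001)]     # candidate means 0.01 .. 10.00
mu_hat = max(grid, key=lambda mu: poisson_log_likelihood(mu, counts))
sample_mean = sum(counts) / len(counts)

print("grid maximizer:", mu_hat)
print("sample mean:   ", sample_mean)
```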
Preprocessing
Pipeline
Problems with feature types
•  Large number of features -> dimensionality reduction -> SVD, PCA
– Dimensionality reduction: "compress" the data from a high-dimensional representation into a lower-dimensional one (useful for visualization or as an internal transformation for other ML algorithms)
•  Sparse features -> hashing
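For the sparse-feature case, a minimal sketch of the hashing trick follows: each feature name is hashed to one of a fixed number of buckets, so the vector length stays constant no matter how many distinct features appear. The function name `hash_features`, the bucket count and the feature names are assumptions for illustration; SHA-256 is used here only because it is deterministic across runs.

```python
import hashlib


def hash_features(features, n_buckets=8):
    """Map a dict {feature_name: value} to a fixed-length list of floats."""
    vec = [0.0] * n_buckets
    for name, value in features.items():
        digest = hashlib.sha256(name.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % n_buckets
        # A hash-derived sign is a common way to reduce the bias from collisions.
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[index] += sign * value
    return vec


# Hypothetical sparse input with three active features.
example = {"user=42": 1.0, "movie=7": 1.0, "genre=drama": 1.0}
hashed = hash_features(example)
print(hashed)
```

Different feature names may collide in the same bucket; a larger `n_buckets` makes collisions rarer at the cost of a longer vector.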
SVD - Dimensionality Reduction
•  Instead of using two coordinates (x, y) to describe point locations, let's use only one coordinate (z)
•  A point's position is its location along the vector v1
•  How to choose v1? Minimize the reconstruction error
[Figure: Movie 1 rating vs. Movie 2 rating, with v1 marked as the first right singular vector]
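The idea of replacing two coordinates with one can be sketched with NumPy: take the first right singular vector v1, use the projection onto it as the single coordinate z, and measure how much is lost when reconstructing. The ratings below are made up for illustration, and the data is not mean-centered, to match the plain SVD described here.

```python
import numpy as np

# Made-up data: rows are users, columns are (Movie 1 rating, Movie 2 rating).
X = np.array([[1.0, 1.2],
              [2.0, 1.9],
              [3.0, 3.1],
              [4.0, 3.8],
              [5.0, 5.1]])

U, s, Vt = np.linalg.svd(X, full_matrices=False)
v1 = Vt[0]                 # first right singular vector (unit length)
z = X @ v1                 # one coordinate per point: position along v1
X_hat = np.outer(z, v1)    # reconstruction from the single coordinate

# Frobenius norm of the reconstruction error; for this choice of v1 it
# equals the second singular value.
error = np.linalg.norm(X - X_hat)
print("z =", z)
print("reconstruction error =", error, "second singular value =", s[1])
```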
SVD - Dimensionality Reduction
More details
•  Q: How exactly is dimensionality reduction done?
•  A: Set the smallest singular values to zero

A ≈ U x Σ x V^T

A =
1 1 1 0 0
3 3 3 0 0
4 4 4 0 0
5 5 5 0 0
0 2 0 4 4
0 0 0 5 5
0 1 0 2 2

U =
0.13  0.02 -0.01
0.41  0.07 -0.03
0.55  0.09 -0.04
0.68  0.11 -0.05
0.15 -0.59  0.65
0.07 -0.73 -0.67
0.07 -0.29  0.32

Σ =
12.4  0    0
0     9.5  0
0     0    1.3

V^T =
0.56  0.59  0.56  0.09  0.09
0.12 -0.02  0.12 -0.69 -0.69
0.40 -0.80  0.40  0.09  0.09
SVD - Dimensionality Reduction
More details
•  Q: How exactly is dimensionality reduction done?
•  A: Set the smallest singular values to zero

A ≈ U x Σ x V^T, keeping only the two largest singular values

A =
1 1 1 0 0
3 3 3 0 0
4 4 4 0 0
5 5 5 0 0
0 2 0 4 4
0 0 0 5 5
0 1 0 2 2

U =
0.13  0.02
0.41  0.07
0.55  0.09
0.68  0.11
0.15 -0.59
0.07 -0.73
0.07 -0.29

Σ =
12.4  0
0     9.5

V^T =
0.56  0.59  0.56  0.09  0.09
0.12 -0.02  0.12 -0.69 -0.69

With B denoting the product on the right, $\|A - B\|_F = \sqrt{\sum_{ij} (A_{ij} - B_{ij})^2}$ is "small".
SVD - Dimensionality Reduction (PCA generalization)
More details
•  Q: How exactly is dimensionality reduction done?
•  A: Set the smallest singular values to zero

A ≈ B

A =
1 1 1 0 0
3 3 3 0 0
4 4 4 0 0
5 5 5 0 0
0 2 0 4 4
0 0 0 5 5
0 1 0 2 2

B =
 0.92  0.95  0.92  0.01  0.01
 2.91  3.01  2.91 -0.01 -0.01
 3.90  4.04  3.90  0.01  0.01
 4.82  5.00  4.82  0.03  0.03
 0.70  0.53  0.70  4.11  4.11
-0.69  1.34 -0.69  4.78  4.78
 0.32  0.23  0.32  2.01  2.01

Frobenius norm: $\|M\|_F = \sqrt{\sum_{ij} M_{ij}^2}$
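The truncation step can be sketched with NumPy on the 7 x 5 ratings matrix from the slides: compute the SVD, keep the two largest singular values, and measure the Frobenius norm of the difference. The variable names are illustrative. Individual entries of U, V^T and B from a numerical library may differ from the rounded slide values in sign and in the last digits, so only the singular values and the error are printed.

```python
import numpy as np

# Ratings matrix A from the slides (7 users x 5 movies).
A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # number of singular values to keep; the rest are set to zero
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius norm of the difference, written out as in the definition.
frobenius_error = np.sqrt(((A - B) ** 2).sum())

print("singular values:", np.round(s, 2))
print("rank of B:", np.linalg.matrix_rank(B))
print("||A - B||_F =", round(float(frobenius_error), 3))
```

The error equals the square root of the sum of the squared discarded singular values, which is why dropping only the smallest ones keeps it small.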
Feature selection - example
Dummy Encoding
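A minimal sketch of dummy encoding for one categorical feature follows: each distinct category becomes its own 0/1 column. The function name and the colour values are made up for illustration. This version keeps one column per category; a common variant drops one column to avoid redundancy in linear models.

```python
def dummy_encode(values):
    """Return (categories, rows) where each row is a 0/1 indicator list."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


# Hypothetical categorical column.
categories, rows = dummy_encode(["red", "green", "red", "blue"])
print(categories)
for row in rows:
    print(row)
```

With many distinct categories the encoded matrix becomes wide and sparse, since each row has exactly one non-zero entry.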
(De)Motivation
Solution to those problems with features
Pipeline
Factorization Machine (degree 2)
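A degree-2 factorization machine, as defined in the Rendle (2010) paper listed in the references, predicts with a bias w0, linear weights w_i, and pairwise interactions weighted by the dot product of per-feature factor vectors v_i and v_j. The sketch below gives a direct implementation and the linear-time reformulation from that paper; the function names and the parameter values are made up for illustration, not learned from data.

```python
def fm_predict_naive(x, w0, w, V):
    """y = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, computed directly."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            y += dot * x[i] * x[j]
    return y


def fm_predict_fast(x, w0, w, V):
    """Same prediction in O(k * n) using
    sum_{i<j} <v_i, v_j> x_i x_j
      = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - s_sq)
    return y


# Made-up parameters: 3 features, factor dimension k = 2.
x = [1.0, 0.0, 2.0]
w0 = 0.5
w = [0.1, 0.2, 0.3]
V = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]

y_naive = fm_predict_naive(x, w0, w, V)
y_fast = fm_predict_fast(x, w0, w, V)
print(y_naive, y_fast)
```

Because the interaction weight of a feature pair is a dot product of two factor vectors, pairs that never co-occur in the training data still get an estimate, which is what makes the model usable on sparse, dummy-encoded features.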
General Applications of FMs
Summary	Pipeline
Pipeline
From prototype to production
•  Prototype vs. production time? The model (pipeline) should stay the same
Libraries
Questions?
Thank	you!!!
References
•  https://www.coursera.org/learn/machine-learning
•  http://www.cs.cmu.edu/~tom/
•  http://scikit-learn.org/stable/
•  http://www.scalanlp.org/
•  http://www.algo.uni-konstanz.de/members/rendle/pdf/Rendle2010FM.pdf
•  https://securityintelligence.com/factorization-machines-a-new-way-of-looking-at-machine-learning/
•  An Introduction to Generalized Linear Models – Annette Dobson, Adrian Barnett
•  Applying Generalized Linear Models – James Lindsey
•  https://www.codementor.io/jadianes/building-a-recommender-with-apache-spark-python-example-app-part1-du1083qbw
•  https://www.chrisstucchio.com/blog/index.html
