Machine Learning
Basic Concepts
[Figure: examples plotted over Feature 1 and Feature 2, with a decision boundary.]
Terminology
Machine Learning, Data Science, Data Mining, Data Analysis, Statistical Learning, Knowledge Discovery in Databases, Pattern Discovery.
Data everywhere!
1. Google: processes 24 petabytes of data per day.
2. Facebook: 10 million photos uploaded every hour.
3. YouTube: 1 hour of video uploaded every second.
4. Twitter: 400 million tweets per day.
5. Astronomy: Satellite data is in hundreds of PB.
6. . . .
7. “By 2020 the digital universe will reach 44
zettabytes...”
The Digital Universe of Opportunities: Rich Data and the
Increasing Value of the Internet of Things, April 2014.
That’s 44 trillion gigabytes!
Data types
Data comes in different sizes and also flavors (types):
• Texts
• Numbers
• Clickstreams
• Graphs
• Tables
• Images
• Transactions
• Videos
• Some or all of the above!
Smile, we are ’DATAFIED’ !
• Wherever we go, we are “datafied”.
• Smartphones are tracking our locations.
• We leave a data trail in our web browsing.
• Our interactions in social networks are recorded.
• Privacy is an important issue in Data Science.
The Data Science process
[Figure: the data science process along a time axis, in five numbered stages: data collection (static data, databases, domain expertise), data preparation (data cleaning, feature/variable engineering), EDA (descriptive statistics, clustering, research questions), machine learning (classification, scoring, predictive models, clustering, density estimation, etc.), and visualization (dashboard). The resulting model (f) outputs a predicted class/risk and supports data-driven decisions and application deployment.]
Applications of ML
• We all use it on a daily basis. Examples:
• Spam filtering
• Credit card fraud detection
• Digit recognition on checks, zip codes
• Detecting faces in images
• MRI image analysis
• Recommendation systems
• Search engines
• Handwriting recognition
• Scene classification
• etc...
Interdisciplinary field
[Figure: ML at the center of related fields: statistics, visualization, economics, databases, signal processing, engineering, and biology.]
ML versus Statistics
Statistics:
• Hypothesis testing
• Experimental design
• ANOVA
• Linear regression
• Logistic regression
• GLM
• PCA
Machine Learning:
• Decision trees
• Rule induction
• Neural Networks
• SVMs
• Clustering methods
• Association rules
• Feature selection
• Visualization
• Graphical models
• Genetic algorithms
http://statweb.stanford.edu/~jhf/ftp/dm-stat.pdf
Machine Learning definition
“How do we create computer programs that improve with experience?”
Tom Mitchell
http://videolectures.net/mlas06_mitchell_itm/
“A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure P, if
its performance at tasks in T, as measured by P, improves with
experience E. ”
Tom Mitchell. Machine Learning 1997.
Supervised vs. Unsupervised
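Given: training data $(x_1, y_1), \ldots, (x_n, y_n)$, where $x_i \in \mathbb{R}^d$ and $y_i$ is the label.
$$
\begin{array}{lccccll}
\text{example } x_1 \rightarrow & x_{11} & x_{12} & \ldots & x_{1d} & y_1 & \leftarrow \text{label} \\
\ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \\
\text{example } x_i \rightarrow & x_{i1} & x_{i2} & \ldots & x_{id} & y_i & \leftarrow \text{label} \\
\ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \\
\text{example } x_n \rightarrow & x_{n1} & x_{n2} & \ldots & x_{nd} & y_n & \leftarrow \text{label}
\end{array}
$$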
Unsupervised learning:
Learning a model from unlabeled data.
Supervised learning:
Learning a model from labeled data.
Unsupervised Learning
Training data: “examples” x.
$x_1, \ldots, x_n$, with $x_i \in X \subset \mathbb{R}^d$
• Clustering/segmentation:
$f : \mathbb{R}^d \longrightarrow \{C_1, \ldots, C_k\}$ (set of clusters).
Example: Find clusters in the population, fruits, species.
Unsupervised learning
[Figure: unlabeled examples plotted over Feature 1 and Feature 2.]
Methods: K-means, Gaussian mixtures, hierarchical clustering, spectral clustering, etc.
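To make the clustering idea concrete, here is a minimal pure-Python sketch of K-means. The function name `kmeans`, the toy points, and the choice of the first k points as initial centers are assumptions made for illustration; they are not taken from the slides.

```python
import math

def kmeans(points, k, n_iter=20):
    """Minimal K-means sketch: returns (centers, assignments)."""
    # Assumption: the first k points serve as initial centers (deterministic).
    centers = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        for idx, p in enumerate(points):
            dists = [math.dist(p, c) for c in centers]
            assignments[idx] = dists.index(min(dists))
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignments

# Toy data: two well-separated groups in R^2.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (0.9, 1.1), (5.1, 4.9), (4.8, 5.2)]
centers, assignments = kmeans(points, k=2)
```

Each entry of `assignments` is the cluster index of the corresponding point, which plays the role of f mapping an example to one of the clusters.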
Supervised learning
Training data: “examples” x with “labels” y.
$(x_1, y_1), \ldots, (x_n, y_n)$, where $x_i \in \mathbb{R}^d$
• Classification: y is discrete. To simplify, $y \in \{-1, +1\}$.
$f : \mathbb{R}^d \longrightarrow \{-1, +1\}$; f is called a binary classifier.
Example: Approve credit yes/no, spam/ham, banana/orange.
Supervised learning
[Figure: labeled examples plotted over Feature 1 and Feature 2, with a decision boundary separating the classes.]
Methods: Support Vector Machines, neural networks, decision
trees, K-nearest neighbors, naive Bayes, etc.
Supervised learning
Classification:
[Figure: five example panels plotted over Feature 1 and Feature 2.]
Supervised learning
Non-linear classification
Supervised learning
Training data: “examples” x with “labels” y.
$(x_1, y_1), \ldots, (x_n, y_n)$, where $x_i \in \mathbb{R}^d$
• Regression: y is a real value, $y \in \mathbb{R}$.
$f : \mathbb{R}^d \longrightarrow \mathbb{R}$; f is called a regressor.
Example: amount of credit, weight of fruit.
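As a small illustration of a regressor, the sketch below fits a straight line f(x) = a + b·x by ordinary least squares for one feature (d = 1). The slides do not prescribe a regression method; least squares, the helper `fit_line`, and the fruit length/weight numbers are assumptions chosen for the example.

```python
def fit_line(xs, ys):
    """Least-squares fit of f(x) = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical toy data: fruit length (cm) and weight (g), exactly linear here.
lengths = [10.0, 12.0, 14.0, 16.0]
weights = [100.0, 120.0, 140.0, 160.0]
a, b = fit_line(lengths, weights)

def regressor(x):
    """The learned f: R -> R."""
    return a + b * x
```

Calling `regressor(15.0)` then predicts the weight of a fruit of length 15 under this fitted line.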
Supervised learning
Regression:
[Figure: regression examples.]
Example: income as a function of age, weight of the fruit as a function of its length.
Training and Testing
[Figure: Training set → ML Algorithm → Model (f). Inputs: income, gender, age, family status, zipcode. Outputs: credit amount, credit yes/no.]
K-nearest neighbors
• Not every ML method builds a model!
• Our first ML method: KNN.
• Main idea: use the similarity between examples.
• Assumption: two similar examples should have the same label.
• Assumes all examples (instances) are points in the d-dimensional space $\mathbb{R}^d$.
K-nearest neighbors
• KNN uses the standard Euclidean distance to define nearest neighbors.
Given two examples $x_i$ and $x_j$:
$$d(x_i, x_j) = \sqrt{\sum_{k=1}^{d} (x_{ik} - x_{jk})^2}$$
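The distance formula translates almost directly into code. This is a minimal sketch; the function name `euclidean_distance` is an assumption, and examples are represented as plain tuples of numbers.

```python
import math

def euclidean_distance(xi, xj):
    """Euclidean distance between two examples with d features each."""
    if len(xi) != len(xj):
        raise ValueError("examples must have the same number of features")
    # Sum the squared per-feature differences, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

distance = euclidean_distance((0.0, 0.0), (3.0, 4.0))
```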
K-nearest neighbors
Training algorithm:
Add each training example $(x, y)$ to the dataset $D$, with $x \in \mathbb{R}^d$ and $y \in \{+1, -1\}$.
Classification algorithm:
Given an example $x_q$ to be classified, let $N_K(x_q)$ be the set of the K nearest neighbors of $x_q$.
$$\hat{y}_q = \mathrm{sign}\Big(\sum_{x_i \in N_K(x_q)} y_i\Big)$$
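The two algorithms above can be sketched in a few lines of Python. This is an illustrative brute-force version, not an optimized one; the name `knn_classify`, the toy dataset, and the requirement that K be odd (so the vote is never tied) are assumptions.

```python
import math

def knn_classify(dataset, xq, k=3):
    """Predict the label of xq from a list of (x, y) pairs with y in {+1, -1}."""
    # "Training" is only storing the dataset; all the work happens here.
    neighbors = sorted(dataset, key=lambda ex: math.dist(ex[0], xq))[:k]
    vote = sum(y for _, y in neighbors)
    # Assumption: k is odd, so the sum of +1/-1 labels is never zero.
    return 1 if vote > 0 else -1

# Toy dataset D: three positive and three negative examples in R^2.
D = [((1.0, 1.0), +1), ((1.5, 2.0), +1), ((2.0, 1.0), +1),
     ((6.0, 6.0), -1), ((7.0, 5.5), -1), ((6.5, 7.0), -1)]

label_a = knn_classify(D, (1.2, 1.3))
label_b = knn_classify(D, (6.4, 6.1))
```

Sorting the whole dataset per query is the simplest choice; it keeps the sketch short at the cost of speed.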
K-nearest neighbors
3-NN. Credit: Introduction to Statistical Learning.
Exercise: Draw an approximate decision boundary for K = 3.
K-nearest neighbors
Credit: Introduction to Statistical Learning.
K-nearest neighbors
Question: What are the pros and cons of K-NN?
Pros:
+ Simple to implement.
+ Works well in practice.
+ Does not require building a model, making assumptions, or tuning parameters.
+ Can be extended easily with new examples.
Cons:
- Requires large space to store the entire training dataset.
- Slow! Given n examples and d features, classifying one query takes O(n × d) time.
- Suffers from the curse of dimensionality.
Applications of K-NN
1. Information retrieval.
2. Handwritten character classification using nearest neighbor in
large databases.
3. Recommender systems (users like you may like similar movies).
4. Breast cancer diagnosis.
5. Medical data mining (similar patient symptoms).
6. Pattern recognition in general.
Training and Testing
[Figure: Training set → ML Algorithm → Model (f), mapping income, gender, age, family status, and zipcode to credit amount and credit yes/no.]
Question: How can we be confident about f?
Training and Testing
• We calculate $E_{\text{train}}$, the in-sample error (training error or empirical error/risk).
$$E_{\text{train}}(f) = \sum_{i=1}^{n} \mathrm{loss}(y_i, f(x_i))$$
• Examples of loss functions:
– Classification error:
$$\mathrm{loss}(y_i, f(x_i)) = \begin{cases} 1 & \text{if } \mathrm{sign}(y_i) \neq \mathrm{sign}(f(x_i)) \\ 0 & \text{otherwise} \end{cases}$$
– Least square loss:
$$\mathrm{loss}(y_i, f(x_i)) = (y_i - f(x_i))^2$$
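The training error and the two loss functions can be written down directly. The sketch below is illustrative: the helper names, the convention sign(0) = +1, and the threshold classifier with its four toy examples are all assumptions.

```python
def sign(v):
    # Assumption: treat 0 as positive so that sign is always +1 or -1.
    return 1 if v >= 0 else -1

def zero_one_loss(y, fx):
    """Classification error: 1 if the signs disagree, 0 otherwise."""
    return 1 if sign(y) != sign(fx) else 0

def squared_loss(y, fx):
    """Least square loss."""
    return (y - fx) ** 2

def training_error(loss, f, data):
    """Sum of the loss over all training examples (x, y)."""
    return sum(loss(y, f(x)) for x, y in data)

def f(x):
    """Hypothetical classifier on 1-D inputs: threshold at 0."""
    return 1 if x > 0 else -1

data = [(-2.0, -1), (-1.0, -1), (0.5, 1), (1.5, -1)]
err_01 = training_error(zero_one_loss, f, data)
err_sq = training_error(squared_loss, f, data)
```

Only the last example is misclassified, so the classification error counts one mistake, while the squared loss penalizes the same mistake by (−1 − 1)² = 4.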
• We aim to have $E_{\text{train}}(f)$ small, i.e., minimize $E_{\text{train}}(f)$.
• We hope that $E_{\text{test}}(f)$, the out-of-sample error (test/true error), will be small too.
Overfitting/underfitting
An intuitive example
Structural Risk Minimization
[Figure: prediction error versus model complexity (low to high), with a test error curve and a training error curve. Low complexity: high bias, low variance (underfitting). High complexity: low bias, high variance (overfitting). Good models lie in between.]
Training and Testing
[Figure: three fits of the same data: high bias (underfitting), just right, and high variance (overfitting).]
Avoid overfitting
In general, use simple models!
• Reduce the number of features manually or do feature selection.
• Do model selection (ML course).
• Use regularization (keep the features but reduce their importance by setting small parameter values) (ML course).
• Use cross-validation to estimate the test error.
Regularization: Intuition
We want to minimize:
Classification term + C × Regularization term
$$\sum_{i=1}^{n} \mathrm{loss}(y_i, f(x_i)) + C \times R(f)$$
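Regularization: Intuition
[Figure: three fits of the same data, one per model.]
$f(x) = \lambda_0 + \lambda_1 x$ ... (1)
$f(x) = \lambda_0 + \lambda_1 x + \lambda_2 x^2$ ... (2)
$f(x) = \lambda_0 + \lambda_1 x + \lambda_2 x^2 + \lambda_3 x^3 + \lambda_4 x^4$ ... (3)
Hint: Avoid high-degree polynomials.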
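To see how the regularized objective behaves on polynomial models like the ones above, here is a small sketch. The slides leave the loss and R(f) unspecified; this example assumes squared loss and takes R(f) to be the sum of squared non-constant coefficients (an L2 penalty). The names `poly` and `regularized_objective` are hypothetical.

```python
def poly(coeffs, x):
    """f(x) = coeffs[0] + coeffs[1]*x + coeffs[2]*x**2 + ..."""
    return sum(c * x ** p for p, c in enumerate(coeffs))

def regularized_objective(coeffs, data, C):
    """Data-fit term plus C times the regularization term."""
    # Data-fit term: squared loss summed over the training examples.
    fit = sum((y - poly(coeffs, x)) ** 2 for x, y in data)
    # Assumption: R(f) is the sum of squared coefficients, excluding the constant.
    penalty = sum(c ** 2 for c in coeffs[1:])
    return fit + C * penalty

# Toy data lying exactly on y = 1 + 2x.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
unregularized = regularized_objective([1.0, 2.0], data, C=0.0)
regularized = regularized_objective([1.0, 2.0], data, C=0.5)
```

With C = 0 only the fit matters; increasing C makes large coefficients more expensive, which is what discourages unnecessarily wiggly high-degree fits.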
Train, Validation and Test
TRAIN VALIDATION TEST
Example: Split the data randomly into 60% for training, 20% for
validation and 20% for testing.
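A random 60/20/20 split like the one in the example can be sketched as follows. The function name `train_val_test_split` and the fixed seed (used only to make the shuffle reproducible) are assumptions.

```python
import random

def train_val_test_split(examples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the examples and cut them into train, validation, and test parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    n_val = round(val_frac * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # everything that is left
    return train, val, test

train, val, test = train_val_test_split(range(100))
```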
Train, Validation and Test
TRAIN VALIDATION TEST
1. Training set is a set of examples used for learning a model
(e.g., a classification model).
2. Validation set is a set of examples that cannot be used for
learning the model but can help tune model parameters (e.g.,
selecting K in K-NN). Validation helps control overfitting.
3. Test set is used to assess the performance of the final model
and provide an estimation of the test error.
Note: Never use the test set in any way to further tune
the parameters or revise the model.
K-fold Cross Validation
A method for estimating test error using training data.
Algorithm:
Given a learning algorithm A and a dataset D
Step 1: Randomly partition D into k equal-size subsets $D_1, \ldots, D_k$.
Step 2:
For j = 1 to k:
Train A on all $D_i$, $i \in \{1, \ldots, k\}$ and $i \neq j$, and get $f_j$.
Apply $f_j$ to $D_j$ and compute $E_{D_j}$.
Step 3: Average the error over all folds:
$$\frac{1}{k} \sum_{j=1}^{k} E_{D_j}$$
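The three steps can be sketched generically, with the learning algorithm A passed in as a function. Everything named here is an assumption for illustration: `k_fold_cv`, the majority-label learner `train_majority`, the `error_rate` helper, and the toy dataset.

```python
import random

def k_fold_cv(train_fn, error_fn, D, k, seed=0):
    """train_fn(examples) -> model f; error_fn(f, examples) -> error on them."""
    data = list(D)
    random.Random(seed).shuffle(data)
    # Step 1: partition into k folds (sizes differ by at most one).
    folds = [data[j::k] for j in range(k)]
    errors = []
    for j in range(k):
        # Step 2: train on every fold except fold j, then evaluate on fold j.
        train = [ex for i, fold in enumerate(folds) if i != j for ex in fold]
        f_j = train_fn(train)
        errors.append(error_fn(f_j, folds[j]))
    # Step 3: average the error over all folds.
    return sum(errors) / k

def train_majority(examples):
    """Hypothetical learner: always predict the majority training label."""
    label = 1 if sum(y for _, y in examples) >= 0 else -1
    return lambda x: label

def error_rate(f, examples):
    return sum(1 for x, y in examples if f(x) != y) / len(examples)

# Toy dataset: 8 positive and 2 negative examples.
D = [(i, 1) for i in range(8)] + [(i, -1) for i in range(8, 10)]
cv_error = k_fold_cv(train_majority, error_rate, D, k=5)
```

Here every training split has a positive majority, so the learner misclassifies exactly the two negative examples and the estimate comes out at 0.2.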
Confusion matrix
                              Actual label
                              Positive               Negative
Predicted label   Positive    True Positive (TP)     False Positive (FP)
                  Negative    False Negative (FN)    True Negative (TN)
Evaluation metrics
TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives.
Accuracy = (TP + TN) / (TP + TN + FP + FN): the percentage of predictions that are correct.
Precision = TP / (TP + FP): the percentage of positive predictions that are correct.
Sensitivity (Recall) = TP / (TP + FN): the percentage of positive cases that were predicted as positive.
Specificity = TN / (TN + FP): the percentage of negative cases that were predicted as negative.
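The standard metrics derived from a confusion matrix (accuracy, precision, sensitivity/recall, specificity) can be computed from two label lists. This is a minimal sketch; the function names and the ten toy labels are assumptions, and labels are taken to be +1 (positive) and -1 (negative).

```python
def confusion_counts(actual, predicted):
    """Return (TP, TN, FP, FN) for labels in {+1, -1}."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == -1 and p == -1)
    fp = sum(1 for a, p in pairs if a == -1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == -1)
    return tp, tn, fp, fn

def evaluation_metrics(tp, tn, fp, fn):
    # No guard against empty denominators; the toy data below avoids them.
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

actual =    [1, 1, 1,  1, -1, -1, -1, -1, -1, -1]
predicted = [1, 1, 1, -1,  1, -1, -1, -1, -1, -1]
counts = confusion_counts(actual, predicted)
scores = evaluation_metrics(*counts)
```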
Terminology review
Review the concepts and terminology:
Instance, example, feature, label, supervised learning, unsupervised learning, classification, regression, clustering, prediction, training set, validation set, test set, K-fold cross validation, classification error, loss function, overfitting, underfitting, regularization.
Machine Learning Books
1. Tom Mitchell. Machine Learning.
2. Yaser S. Abu-Mostafa, Malik Magdon-Ismail, Hsuan-Tien Lin. Learning From Data. AMLBook.
3. T. Hastie, R. Tibshirani, J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
4. Christopher Bishop. Pattern Recognition and Machine Learning.
5. Richard O. Duda, Peter E. Hart, David G. Stork. Pattern Classification. Wiley.
Machine Learning Resources
• Major journals/conferences: ICML, NIPS, UAI, ECML/PKDD, JMLR, MLJ, etc.
• Machine learning video lectures:
http://videolectures.net/Top/Computer_Science/Machine_Learning/
• Machine Learning (Theory):
http://hunch.net/
• LinkedIn ML groups: “Big Data” Scientist, etc.
• Women in Machine Learning:
https://groups.google.com/forum/#!forum/women-in-machine-learning
• KDnuggets: http://www.kdnuggets.com/
Credit
• The elements of statistical learning. Data mining, inference,
and prediction. 10th Edition 2009. T. Hastie, R. Tibshirani,
J. Friedman.
• Machine Learning 1997. Tom Mitchell.
