The First NIDA Business Analytics and Data Sciences Contest/Conference
1-2 September 2016, Navamindradhiraj Building, National Institute of Development Administration (NIDA)
https://businessanalyticsnida.wordpress.com
https://www.facebook.com/BusinessAnalyticsNIDA/
By Assoc. Prof. Dr. Surapong Auwatanamongkol
Data Science Program
School of Applied Statistics, National Institute of Development Administration
Machine Learning: An introduction
How do machines learn?
What can machines learn?
What applications can machine learning be used for?
Is advanced mathematics required to study machine learning?
What software is available for machine learning?
How many types of machine learning are there, and what is each type used for?
Navamindradhiraj Room 3003, 1 September 2016, 10:15-12:30
Machine Learning
An Introduction
Types of Machine Learning
• Supervised Learning (Classification, Prediction)
• Unsupervised Learning (Cluster Analysis)
• Association Analysis
• Reinforcement Learning
• Evolutionary Learning
Classification
• Based on Supervised Learning
• Given a collection of records (the training set)
  – Each record contains a set of attributes; one of the attributes is the class.
• Find a model that expresses the class attribute as a function of the values of the
  other attributes.
• Goal: previously unseen records should be assigned a class as accurately as possible.
  – A test set is used to determine the accuracy of the model. Usually, the given data
    set is divided into training and test sets, with the training set used to build
    the model and the test set used to validate it (see the sketch below).
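As a concrete illustration of that split, here is a minimal sketch in Python; scikit-learn and the toy records are our own assumptions, since the slides do not name any library:

```python
# A minimal sketch (not from the slides) of splitting labeled records into a
# training set and a test set; scikit-learn and the toy values are assumptions.
from sklearn.model_selection import train_test_split

# ten toy records (two numeric attributes each) and their class labels
X = [[1, 125], [0, 100], [0, 70], [1, 120], [0, 95],
     [0, 60], [1, 220], [0, 85], [0, 75], [0, 90]]
y = ['No', 'No', 'No', 'No', 'Yes', 'No', 'No', 'Yes', 'No', 'Yes']

# 70% of the records build the model, the held-out 30% validate it
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(len(X_train), "training records,", len(X_test), "test records")
```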
Classification Task

(Figure: the Training Set is fed to a Learning algorithm, which learns a Model by
induction; the Model is then applied, by deduction, to the records of the Test Set.)

Training Set
Tid   Attrib1   Attrib2   Attrib3   Class
1     Yes       Large     125K      No
2     No        Medium    100K      No
3     No        Small     70K       No
4     Yes       Medium    120K      No
5     No        Large     95K       Yes
6     No        Medium    60K       No
7     Yes       Large     220K      No
8     No        Small     85K       Yes
9     No        Medium    75K       No
10    No        Small     90K       Yes

Test Set
Tid   Attrib1   Attrib2   Attrib3   Class
11    No        Small     55K       ?
12    Yes       Medium    80K       ?
13    Yes       Large     110K      ?
14    No        Small     95K       ?
15    No        Large     67K       ?
Examples of Classification Tasks
• Predicting potential customers of a new product
• Identifying spam emails or network intrusion
connections
• Classifying credit risks of customers
• Categorizing news stories as finance, weather, entertainment, sports, etc.
Classification Techniques
• Decision Trees
• K-nearest Neighbors
• Neural Networks
• Naïve Bayes and Bayesian Belief Networks
• Support Vector Machines
• Ensemble Methods
Example of a Decision Tree

Training Data
Tid   Refund   Marital Status   Taxable Income   Cheat
1     Yes      Single           125K             No
2     No       Married          100K             No
3     No       Single           70K              No
4     Yes      Married          120K             No
5     No       Divorced         95K              Yes
6     No       Married          60K              No
7     Yes      Divorced         220K             No
8     No       Single           85K              Yes
9     No       Married          75K              No
10    No       Single           90K              Yes

Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc)
  Refund = Yes                                            -> NO
  Refund = No,  MarSt = Married                           -> NO
  Refund = No,  MarSt = Single or Divorced, TaxInc < 80K  -> NO
  Refund = No,  MarSt = Single or Divorced, TaxInc > 80K  -> YES
Decision Tree Classification Task

(Figure: the same workflow as before, with the same Training Set and Test Set: the
Training Set is fed to a Tree Induction algorithm, which learns a Decision Tree model;
the tree is then applied, by deduction, to classify the records of the Test Set.)
Apply Model to Test Data

Test Data
Refund   Marital Status   Taxable Income   Cheat
No       Married          80K              ?

Start from the root of the tree and follow the branch that matches the record at each
test condition: Refund = No leads to the MarSt node, and MarSt = Married leads to the
NO leaf. Assign Cheat to "No".
Decision Boundary

(Figure: a two-dimensional data set with attributes x and y in [0, 1], partitioned by a
decision tree with the test conditions x < 0.43, y < 0.47, and y < 0.33; each
rectangular region is labeled with the class of the training records it contains.)

• The border line between two neighboring regions of different classes is known as the
  decision boundary
• The decision boundary is parallel to the axes because each test condition involves a
  single attribute at a time
Tree Induction
• Greedy strategy
  – Split the training records assigned to each node, from the root node down to the
    leaf nodes, based on an attribute test that optimizes a certain criterion, e.g. the
    gain in homogeneity of the training records at each node in the tree (as sketched
    below)
  – Measures of homogeneity of the training records at a tree node: Entropy, GINI
  – Stop splitting when some predefined criteria are met, e.g. the measures reach
    predefined thresholds
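A hedged sketch of this induction step, using scikit-learn's DecisionTreeClassifier on the training table above; the library, the numeric encoding of Marital Status, and the feature names are our assumptions, and the fitted tree may differ from the hand-drawn one on the slide:

```python
# Greedy tree induction on the slide's training table; scikit-learn and the
# ordinal encoding of Marital Status (0=Single, 1=Married, 2=Divorced) are assumptions.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1, 0, 125], [0, 1, 100], [0, 0, 70], [1, 1, 120], [0, 2, 95],
     [0, 1, 60], [1, 2, 220], [0, 0, 85], [0, 1, 75], [0, 0, 90]]
y = ['No', 'No', 'No', 'No', 'Yes', 'No', 'No', 'Yes', 'No', 'Yes']

# Each split is chosen greedily to maximize the gain in node homogeneity,
# measured by the Gini index here (criterion='entropy' would use entropy instead).
tree = DecisionTreeClassifier(criterion='gini', random_state=0).fit(X, y)
print(export_text(tree, feature_names=['Refund', 'MarSt', 'TaxInc']))

# Classify the test record from the slides: Refund = No, Married, income 80K
print(tree.predict([[0, 1, 80]]))   # expected to print ['No']
```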
Measure of Impurity: GINI
• Gini Index for a given node t:

    GINI(t) = 1 - Σj [ p( j | t) ]^2

• p( j | t) is the relative frequency of class j at node t.
  – Maximum (1 - 1/nc) when records are equally distributed among all classes,
    implying least interesting information
  – Minimum (0.0) when all records belong to one class, implying most interesting
    information
Measure of Impurity: Entropy
• Entropy at a given node t:

    Entropy(t) = - Σj p( j | t) log p( j | t)

• p( j | t) is the relative frequency of class j at node t.
  – Measures impurity of a node.
    • Maximum (log nc) when records are equally distributed among all classes,
      implying least information
    • Minimum (0.0) when all records belong to one class, implying most information
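A small sketch of both impurity measures coded directly from these formulas; the helper names, the base-2 logarithm, and the test values are ours, not from the slides:

```python
# GINI(t) and Entropy(t) computed from the class labels at a node.
from collections import Counter
from math import log2

def gini(labels):
    """GINI(t) = 1 - sum_j p(j|t)^2 over the class labels at node t."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy(t) = -sum_j p(j|t) log2 p(j|t)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

print(gini(['Yes'] * 5 + ['No'] * 5))     # evenly split node: maximum 1 - 1/2 = 0.5
print(gini(['No'] * 10))                  # pure node: minimum 0.0
print(entropy(['Yes'] * 5 + ['No'] * 5))  # maximum log2(2) = 1.0 for two classes
```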
Nearest Neighbor Classifiers

(Figure: to classify a test record, compute its distance to the training records and
choose the k "nearest" records.)
Nearest-Neighbor Classifiers
• Requires three things
  – The set of stored records
  – Distance Metric to compute distance between records
  – The value of k, the number of nearest neighbors to retrieve
• To classify an unknown record:
  – Compute distance to other training records
  – Identify k nearest neighbors
  – Use class labels of nearest neighbors to determine the class label of the unknown
    record (e.g., by taking a majority vote)
Nearest Neighbor Classification
• Choosing the value of k:
– If k is too small, sensitive to noise points
– If k is too large, neighborhood may include points from
other classes
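A hedged sketch of k-nearest-neighbor classification with scikit-learn; the library, the toy points, and k = 3 are assumptions:

```python
# k-NN: assign the unknown record the majority class among its k nearest neighbors.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # class 'A' region
           [3.0, 3.0], [3.2, 2.9], [2.8, 3.1]]   # class 'B' region
y_train = ['A', 'A', 'A', 'B', 'B', 'B']

knn = KNeighborsClassifier(n_neighbors=3, metric='euclidean').fit(X_train, y_train)
print(knn.predict([[1.1, 0.9]]))   # close to the 'A' points -> ['A']
```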
Bayesian Classifiers
• Consider each attribute and class label as random
variables
• Given a record with attributes (A1, A2,…,An)
– Goal is to predict class C
– Specifically, we want to find the value of C that
maximizes P(C| A1, A2,…,An )
• Can we estimate P(C| A1, A2,…,An ) directly from
data?
Bayesian Classifiers
• Approach:
– compute the posterior probability P(C | A1, A2, …, An) for
all values of C using the Bayes theorem
– Choose value of C that maximizes
P(C | A1, A2, …, An)
– Equivalent to choosing value of C that maximizes
P(A1, A2, …, An|C) P(C)
• How to estimate P(A1, A2, …, An | C )?
  Bayes theorem:
  P(C | A1, A2, …, An) = P(A1, A2, …, An | C) P(C) / P(A1, A2, …, An)
Naïve Bayes Classifier
• Assume independence among attributes Ai when
class is given:
– P(A1, A2, …, An |Cj) = P(A1| Cj) P(A2| Cj)… P(An| Cj)
– Can estimate P(Ai| Cj) for all Ai and Cj.
– New unknown record is classified to Cj if P(Cj) Πi P(Ai | Cj) is maximal.
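A hedged sketch of this rule, estimating P(Cj) and P(Ai | Cj) by simple counting from the Refund / Marital Status columns of the training table shown earlier (no smoothing; the code structure is our own, not from the slides):

```python
# Naive Bayes by counting: pick the class that maximizes P(Cj) * prod_i P(Ai | Cj).
from collections import Counter, defaultdict

records = [  # (Refund, Marital Status, Cheat) from the training data
    ('Yes', 'Single', 'No'), ('No', 'Married', 'No'), ('No', 'Single', 'No'),
    ('Yes', 'Married', 'No'), ('No', 'Divorced', 'Yes'), ('No', 'Married', 'No'),
    ('Yes', 'Divorced', 'No'), ('No', 'Single', 'Yes'), ('No', 'Married', 'No'),
    ('No', 'Single', 'Yes')]

prior = Counter(c for *_, c in records)        # class counts
cond = defaultdict(Counter)                    # (attribute index, class) -> value counts
for refund, marst, c in records:
    cond[(0, c)][refund] += 1
    cond[(1, c)][marst] += 1

def score(refund, marst, c):
    # P(Cj) * P(Refund | Cj) * P(MarSt | Cj), estimated as relative frequencies
    p = prior[c] / len(records)
    p *= cond[(0, c)][refund] / prior[c]
    p *= cond[(1, c)][marst] / prior[c]
    return p

# classify Refund = No, Married: choose the class with the maximal score
print(max(prior, key=lambda c: score('No', 'Married', c)))   # -> 'No'
```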
Artificial Neural Networks (ANN)

(Figure: a "black box" with input nodes X1, X2, X3 connected by weighted links
w1, w2, w3 to an output node Y with threshold t.)

Perceptron Model:
  Y = I( Σi wi Xi - t )   or   Y = sign( Σi wi Xi - t )

• Model is an assembly of inter-connected nodes and weighted links
• The output node sums up each of its input values according to the weights of its
  links
• Compare the output node's value against some threshold t
General Structure of ANN

(Figure: a feed-forward network with an Input Layer (x1 ... x5), a Hidden Layer, and an
Output Layer producing y. Each neuron i receives inputs I1, I2, I3 over links with
weights wi1, wi2, wi3, forms the weighted sum Si, and emits the output Oi = g(Si),
where g is the activation function and t the threshold.)

Training an ANN means learning the weights of the neurons as well as t.

A common choice of activation function is the sigmoid:
  sigmoid(x) = 1 / (1 + e^-x),   giving   Y = sigmoid( Σi wi Xi - t )
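A minimal sketch of the forward computation just described; the weights, inputs, and threshold are illustrative values, not from the slides:

```python
# One node: sum the weighted inputs, subtract the threshold t, apply an activation.
import math

def perceptron(x, w, t):
    s = sum(wi * xi for wi, xi in zip(w, x)) - t      # weighted sum minus threshold
    return 1 if s >= 0 else -1                        # sign activation

def sigmoid_unit(x, w, t):
    s = sum(wi * xi for wi, xi in zip(w, x)) - t
    return 1.0 / (1.0 + math.exp(-s))                 # sigmoid(x) = 1 / (1 + e^-x)

x = [1.0, 0.0, 1.0]          # inputs X1, X2, X3
w = [0.3, 0.3, 0.3]          # link weights w1, w2, w3
print(perceptron(x, w, t=0.4), sigmoid_unit(x, w, t=0.4))
```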
Backpropagation algorithm
– Gradient Descent is illustrated using a single weight w1 of w
– Preferred values for w1 minimize  SSE = Σi ( Yi - f(w, Xi) )^2
– The optimal value for w1 is w1*

(Figure: SSE plotted as a function of w1, with the minimum at w1* between w1L and w1R.)
Backpropagation algorithm
– The direction for adjusting wCURRENT is the negative of the sign of the derivative
  of SSE at wCURRENT:  -sign( ∂SSE/∂w at wCURRENT )
– To adjust, use the magnitude of the derivative of SSE at wCURRENT
– When the curve is steep, the adjustment is large
– When the curve is nearly flat, the adjustment is small
– Learning Rate η has values in [0, 1]
– Putting these together, the weight update is (see the sketch below)
    Δ wCURRENT = - η ( ∂SSE/∂w at wCURRENT )
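A hedged sketch of this update rule for the simplest possible case, a single weight w of a linear unit y = w·x; the data, learning rate, and iteration count are our assumptions:

```python
# Gradient descent on SSE(w) = sum_i (y_i - w * x_i)^2 for a single weight w.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]          # generated by the "true" weight w = 2

w, eta = 0.0, 0.05            # initial weight and learning rate in [0, 1]
for epoch in range(100):
    # derivative of SSE with respect to w: sum_i -2 * x_i * (y_i - w * x_i)
    dSSE_dw = sum(-2 * x * (y - w * x) for x, y in zip(xs, ys))
    # move against the sign of the derivative, by an amount proportional to its magnitude
    w = w - eta * dSSE_dw
print(round(w, 4))            # approaches the optimal value w* = 2
```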
Support Vector Machines
• Find the hyperplane that maximizes the margin => B1 is better than B2

(Figure: two candidate decision boundaries B1 and B2 with their margin hyperplanes
b11, b12 and b21, b22; B1 has the wider margin.)
Support Vector Machines

(Figure: decision boundary B1 with its margin hyperplanes b11 and b12.)

Decision boundary:   w · x + b = 0
Margin hyperplanes:  w · x + b = 1   and   w · x + b = -1

f(x) =  1 (positive class)   if w · x + b >= 1
       -1 (negative class)   if w · x + b <= -1

Margin = 2 / ||w||^2
Support Vector Machines
• We want to maximize:  Margin = 2 / ||w||^2
  – Which is equivalent to minimizing:  L(w) = ||w||^2 / 2
  – But subject to the following constraints:
      f(xi) =  1   if w · xi + b >= 1
              -1   if w · xi + b <= -1
  – This is a constrained optimization problem. Numerical approaches, e.g. quadratic
    programming, can be used to solve it.
Support Vector Machines
• Decision Function for classifying a given data vector z:

    f(z) = sign( Σi∈SV λi yi xi · z + b )

  where the sum runs over the support vectors (SV), as sketched below.
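A hedged sketch with scikit-learn's SVC; the library and the toy points are assumptions, and a large C is used to approximate the hard-margin problem above:

```python
# A linear maximum-margin classifier on two well-separated toy classes.
from sklearn.svm import SVC

X = [[1, 1], [2, 1], [1, 2],        # negative class (-1)
     [4, 4], [5, 4], [4, 5]]        # positive class (+1)
y = [-1, -1, -1, 1, 1, 1]

svm = SVC(kernel='linear', C=1e3).fit(X, y)   # large C approximates the hard margin
print(svm.support_vectors_)                   # the x_i that define the decision function
print(svm.predict([[1.5, 1.5], [4.5, 4.5]]))  # sign of the decision function at each z
```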
Nonlinear Support Vector Machines
• What if the decision boundary is not linear?
Nonlinear Support Vector Machines
• Transform the data vectors X into a new (higher-dimensional) space
• Some Kernel Functions can be used to compute the dot product between any two given
  original data vectors in the new data space, without the need for an actual data
  transformation (see the sketch below).
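A brief sketch of the kernel idea on XOR-like toy points, where no linear boundary exists in the original space; the library, kernel choice, and parameter values are assumptions:

```python
# An RBF kernel lets the same SVC fit data whose decision boundary is not linear.
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 0, 1, 1]                      # XOR pattern: not linearly separable

rbf_svm = SVC(kernel='rbf', gamma=2.0, C=1e3).fit(X, y)
print(rbf_svm.predict(X))             # reproduces the training labels [0 0 1 1]
```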
Ensemble Methods
• Construct a set of classifiers from the training
data
• Predict class label of previously unseen records
by aggregating predictions made by multiple
classifiers
General Idea

(Figure: Step 1 creates multiple data sets D1, D2, ..., Dt-1, Dt from the Original
Training data D; Step 2 builds multiple classifiers C1, C2, ..., Ct-1, Ct; Step 3
combines the classifiers into a single ensemble classifier C*.)
Why does it work?
• Suppose there are 25 base classifiers
  – Each classifier has error rate ε = 0.35
  – Assume the classifiers are independent
  – Probability that the ensemble classifier makes a wrong prediction (i.e. that at
    least 13 of the 25 base classifiers are wrong):

      Σ(i=13..25)  C(25, i) ε^i (1 - ε)^(25-i)  =  0.06
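The same calculation as a short sketch (math.comb requires Python 3.8+; the variable names are ours):

```python
# Probability that a majority (13 or more) of 25 independent base classifiers err.
from math import comb

eps, n = 0.35, 25
p_wrong = sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(13, n + 1))
print(round(p_wrong, 3))   # ~0.06, far below the 0.35 error rate of a single classifier
```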
Examples of Ensemble Methods
• How to generate an ensemble of classifiers?
– Bagging
– Boosting
Bagging
• Sampling with replacement
• Build a classifier on each bootstrap sample
• Each record has probability 1 - (1 - 1/n)^n of being selected in each round
  (see the sketch below the example rounds)
Original Data 1 2 3 4 5 6 7 8 9 10
Bagging (Round 1) 7 8 10 8 2 5 10 10 5 9
Bagging (Round 2) 1 4 9 1 2 3 2 7 3 2
Bagging (Round 3) 1 8 5 10 5 5 9 6 3 7
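A small sketch of the bootstrap sampling step; the random seed and the round count are arbitrary choices:

```python
# Each bagging round draws n record ids with replacement from the original data.
import random

random.seed(1)
original = list(range(1, 11))          # record ids 1..10
for r in range(3):
    sample = [random.choice(original) for _ in original]   # sampling with replacement
    print(f"Bagging (Round {r + 1}):", sample)

# A given record appears in a round with probability 1 - (1 - 1/10)**10, about 0.65
print(1 - (1 - 1 / 10) ** 10)
```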
Boosting
• An iterative procedure to adaptively change
distribution of training data by focusing more on
previously misclassified records
– Initially, all N records are assigned equal
weights
– Unlike bagging, weights may change at the end of each boosting round
Boosting
• Records that are wrongly classified will have their
weights increased
• Records that are classified correctly will have
their weights decreased
Original Data 1 2 3 4 5 6 7 8 9 10
Boosting (Round 1) 7 3 2 8 7 9 4 10 6 3
Boosting (Round 2) 5 4 9 4 2 5 1 7 4 2
Boosting (Round 3) 4 4 8 10 4 5 4 6 3 4
• Example 4 is hard to classify
• Its weight is increased, therefore it is more
likely to be chosen again in subsequent rounds
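A hedged sketch of boosting in practice using scikit-learn's AdaBoostClassifier, one concrete boosting algorithm; the slides describe boosting only in general terms, and the synthetic data set is an assumption:

```python
# AdaBoost as one concrete example of boosting.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Each round fits a weak learner on re-weighted records: records misclassified in
# earlier rounds receive larger weights and so more influence in later rounds.
boost = AdaBoostClassifier(n_estimators=25, random_state=0).fit(X, y)
print(boost.score(X, y))   # training accuracy of the combined ensemble
```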
What is Cluster Analysis?
• Finding groups of objects such that the objects in a group will be similar (or
  related) to one another and different from (or unrelated to) the objects in other
  groups

(Figure: a good clustering is one in which intra-cluster distances are minimized and
inter-cluster distances are maximized.)
Applications of Cluster Analysis
• Understanding
– Group related documents for browsing, group
customers into segments or group stocks with similar
price fluctuations
• Summarization
– Reduce the size of large data sets by sampling data
from each cluster
K-means Clustering
• Each cluster is associated with a centroid
(center point)
• Each data point is assigned to the cluster with
the closest centroid
• Number of clusters, K, must be specified
• The basic algorithm is very simple
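A hedged sketch of the algorithm with scikit-learn's KMeans (the library and the toy points are assumptions): centroids are initialized, each point is assigned to its closest centroid, centroids are recomputed from the assignments, and the two steps repeat until nothing changes.

```python
# K-means on six toy two-dimensional points.
from sklearn.cluster import KMeans

points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
          [4.0, 4.2], [4.1, 3.9], [3.8, 4.0]]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)  # K must be specified
print(km.labels_)           # cluster assignment of each point
print(km.cluster_centers_)  # final centroids
```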
K-Means Algorithm

(Figure: six scatter plots of the same two-dimensional data set, Iteration 1 through
Iteration 6, showing how the cluster assignments and centroids change from one
iteration to the next until the algorithm converges.)
Limitations of K-means
• K-means has problems when clusters are of
differing
– Sizes
– Densities
– Non-globular shapes
• K-means has problems when the data contains
outliers.
Hierarchical Clustering
• Produces a set of nested clusters organized as a
hierarchical tree
• Can be visualized as a dendrogram
– A tree like diagram that records the sequences
of merges or splits
(Figure: a dendrogram over six data points; the vertical axis gives the heights,
roughly 0 to 0.2, at which the nested clusters are merged.)
Agglomerative Clustering Algorithm
• A popular hierarchical clustering technique
• Basic algorithm is straightforward
1. Compute the proximity matrix (similarities between
pairs of clusters)
2. Let each data point be a cluster
3. Repeat
4. Merge the two closest clusters
5. Update the proximity matrix
6. Until only a single cluster remains
May not be suitable for large datasets due to the cost
of computing and updating the proximity matrix
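A hedged sketch of the algorithm with SciPy (the library, linkage choice, and toy points are assumptions): linkage() starts from singleton clusters, repeatedly merges the two closest clusters while updating the proximities, and records the merge sequence that a dendrogram plot would show.

```python
# Agglomerative clustering of six toy points.
from scipy.cluster.hierarchy import linkage, fcluster

points = [[0.10, 0.20], [0.15, 0.22], [0.40, 0.50],
          [0.42, 0.48], [0.80, 0.90], [0.82, 0.88]]

# method='average' is Group Average; 'single' is MIN, 'complete' is MAX, 'ward' is Ward's
Z = linkage(points, method='average')
print(Z)                                        # the recorded sequence of merges
print(fcluster(Z, t=3, criterion='maxclust'))   # cut the hierarchy into 3 clusters
```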
How to Define Inter-Cluster Similarity

(Figure: a proximity matrix over points p1, ..., p5; the question is how to define the
similarity between two clusters of points.)

• MIN
• MAX
• Group Average
• Distance Between Centroids
• Ward's Method uses squared error
Hierarchical Clustering: Comparison

(Figure: the same six points clustered with MIN, MAX, Group Average, and Ward's Method;
the choice of inter-cluster similarity changes the resulting hierarchy.)
Other Issues
• Data Cleaning
• Data Sampling
• Dimension Reduction
• Data Visualization
• Overfitting and Underfitting Problems
• Imbalance Issues