Large-Scale
Machine Learning
Armin Shoughi
Dec 2019
Machine-learning framework
In this section we introduce the framework for machine-learning algorithms and give the basic
definitions.
Machine learning
Tom Mitchell: “A computer program is said to learn from experience
E with respect to some class of tasks T and performance measure P,
if its performance at tasks in T, as measured by P, improves with
experience E.”
▪ Supervised learning >> 1) Regression 2) Classification
▪ Unsupervised learning >> 1) Clustering 2) Association Rules
▪ Reinforcement learning
Training set
The data to which a machine-learning (often abbreviated ML)
algorithm is applied is called a training set. A training set consists of a
set of pairs (x,y), called training examples, where
• x is a vector of values, often called a feature vector,
• y is the class label, or simply output, the classification value for x.
       A     B     C    D
  1   122    78    23   1
  2    12    64    65   1
  3    65   257    82   2
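A minimal sketch of this representation in Python (reading columns A, B, C as the feature vector x and column D as the class label y is an assumption based on the table's layout):

    # Training set as a list of (feature vector, class label) pairs.
    training_set = [
        ([122, 78, 23], 1),
        ([12, 64, 65], 1),
        ([65, 257, 82], 2),
    ]
    for x, y in training_set:
        print("features:", x, "label:", y)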
Validation and Test set
One general issue regarding the handling of data is that there is a
good reason to withhold some of the available data from the training
set. The remaining data is called the test set. In some cases, we
withhold two such sets: a validation set as well as the test set.
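A hedged sketch of withholding a validation set and a test set (the 60/20/20 split and the function name are illustrative choices, not taken from the material):

    import random

    def split_data(examples, train_frac=0.6, valid_frac=0.2, seed=0):
        # Shuffle once, then carve the data into training, validation, and test portions.
        rng = random.Random(seed)
        shuffled = list(examples)
        rng.shuffle(shuffled)
        n_train = int(len(shuffled) * train_frac)
        n_valid = int(len(shuffled) * valid_frac)
        train = shuffled[:n_train]
        valid = shuffled[n_train:n_train + n_valid]
        test = shuffled[n_train + n_valid:]
        return train, valid, test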
Batch vs. On-line Learning
In batch learning, the entire training set is available at the beginning of the process, and it is
all used in whatever way the algorithm requires to produce a model once and for all. The
alternative is on-line learning, where the training set arrives in a stream and, like any stream,
cannot be revisited after it is processed. An on-line algorithm can:
1. Deal with very large training sets, because it does not access more than one training
example at a time.
2. Adapt to changes in the population of training examples as time goes on.
For instance, Google trains its spam-email classifier this way, adapting the classifier as new
kinds of spam email are sent by spammers and indicated to be spam by the recipients.
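A minimal sketch of the on-line pattern (the update function is a generic placeholder, not a specific algorithm): each example is seen exactly once and then discarded.

    def online_train(stream, update, model):
        # 'stream' yields (x, y) pairs one at a time, as in a data stream.
        for x, y in stream:
            model = update(model, x, y)   # adjust the model using only this example
        return model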
Feature Selection
Sometimes, the hardest part of designing a good model or classifier
is figuring out what features to use as input to the learning algorithm.
For example, spam is often generated by particular hosts, either
those belonging to the spammers, or hosts that have been co-opted
into a “botnet” for the purpose of generating spam. Thus, including
the originating host or originating email address into the feature
vector describing an email might enable us to design a better
classifier and lower the error rate.
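As an illustrative sketch only (the email fields and feature names below are hypothetical), the originating host could be added alongside word features:

    def extract_features(email):
        # Hypothetical email dict with "body" and "from_host" fields.
        features = {}
        for word in email["body"].lower().split():
            features["word:" + word] = 1              # presence of each word
        features["host:" + email["from_host"]] = 1    # originating host as an extra feature
        return features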
Perceptron
A perceptron is a linear binary classifier. Its input is a vector
x = [x1,x2,...,xd]
with real-valued components. Associated with the perceptron is a vector of weights
w = [w1,w2,...,wd]
also with real-valued components.
Each perceptron has a threshold θ. The output of the perceptron is +1 if w.x > θ,
and the output is −1 if w.x < θ. The special case where w.x = θ will always be
regarded as “wrong,” i.e., treated as a misclassification.
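A minimal sketch of this decision rule in Python (the case w.x = θ returns 0 here to mark it as “wrong”):

    def perceptron_output(w, x, theta):
        dot = sum(wi * xi for wi, xi in zip(w, x))   # w.x
        if dot > theta:
            return +1
        if dot < theta:
            return -1
        return 0   # w.x = theta is always regarded as "wrong"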
Training a Perceptron
The following method will converge to some hyperplane that separates the positive and
negative examples, provided one exists.
1. Initialize the weight vector w to all 0’s.
2. Pick a learning-rate parameter η, which is a small, positive real number. The choice of η
affects the convergence of the perceptron. If η is too small, then convergence is slow; if it
is too big, then the decision boundary will “dance around” and again will converge slowly,
if at all.
3. Consider each training example t = (x,y) in turn.
(a) Let y′ = w.x.
(b) If y′ and y have the same sign, then do nothing; t is properly classified.
(c) However, if y′ and y have different signs, or y′ = 0, replace w by w + ηyx. That is,
adjust w slightly in the direction of x.
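A sketch of steps 1–3 in Python, assuming θ = 0 (a common simplification; a nonzero threshold can be folded into w as a bias component). The bound on the number of passes is an added safeguard, since on non-separable data the loop would otherwise never end:

    def train_perceptron(training_set, eta, max_passes=100):
        d = len(training_set[0][0])
        w = [0.0] * d                        # step 1: all-zero weight vector
        for _ in range(max_passes):
            changed = False
            for x, y in training_set:        # step 3: consider each example in turn
                y_prime = sum(wi * xi for wi, xi in zip(w, x))       # (a) y' = w.x
                if y_prime == 0 or (y_prime > 0) != (y > 0):         # (c) wrong sign or zero
                    w = [wi + eta * y * xi for wi, xi in zip(w, x)]  # w <- w + ηyx
                    changed = True
            if not changed:
                break                        # every example classified correctly
        return w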
Example
Consider the training set below, where each row is an email, each column other than y
indicates whether a given word appears in it, and the learning rate is η = 1/2:

      and  Viagra  the   of  Nigeria   y
  1    1     1      1     1     1     +1
  2    0     1      0     0     1     -1
  3    0     0      1     0     0     +1
  4    1     0      0     1     0     -1
  5    1     0      1     1     1     +1
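Tracing the first step by hand: with w = [0,0,0,0,0], example 1 gives y′ = 0, which counts as wrong, so w becomes w + ηyx = [1/2, 1/2, 1/2, 1/2, 1/2]. Feeding the table into the training-loop sketch above (a usage illustration only):

    examples = [
        ([1, 1, 1, 1, 1], +1),
        ([0, 1, 0, 0, 1], -1),
        ([0, 0, 1, 0, 0], +1),
        ([1, 0, 0, 1, 0], -1),
        ([1, 0, 1, 1, 1], +1),
    ]
    w = train_perceptron(examples, eta=0.5)
    print(w)   # converges: every +1 row contains "the" and no -1 row does, so a separator exists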
Convergence of Perceptrons
As we mentioned at the beginning of this section, if the data points
are linearly separable, then the perceptron algorithm will converge to
a separator. However, if the data is not linearly separable, then the
algorithm will eventually repeat a weight vector and loop infinitely.
Parallel Implementation of Perceptrons
The training of a perceptron is an inherently sequential process. If the number of dimensions
of the vectors involved is huge, we might obtain some parallelism by computing dot products
in parallel. With a very large training set, we can also process chunks of examples in parallel
with MapReduce, accumulating a batched update to w:
The Map Function: Each Map task is given a chunk of training examples, and each Map task
knows the current weight vector w. The Map task computes w.x for each feature vector
x = [x1,x2,...,xd]
in its chunk and compares that dot product with the label y, which is +1 or −1, associated with
x. If the example is misclassified, the Map task emits a key-value pair (i, ηyxi) for each
component i of x.
The Reduce Function: For each key i, the Reduce task that handles key i adds all the
associated increments and then adds that sum to the ith component of w.
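A hedged sketch of one such batched round in plain Python (ordinary functions stand in for the Map and Reduce tasks; this is not a specific MapReduce API):

    from collections import defaultdict

    def map_task(chunk, w, eta):
        # Emit (i, eta * y * x[i]) pairs for every misclassified example in this chunk.
        pairs = []
        for x, y in chunk:
            dot = sum(wi * xi for wi, xi in zip(w, x))
            if dot == 0 or (dot > 0) != (y > 0):          # misclassified
                pairs.extend((i, eta * y * xi) for i, xi in enumerate(x))
        return pairs

    def reduce_and_update(all_pairs, w):
        # Sum the increments for each component i, then add the sums to w.
        sums = defaultdict(float)
        for i, inc in all_pairs:
            sums[i] += inc
        return [wi + sums[i] for i, wi in enumerate(w)]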
SVM Support Vector Machine
When the data points are linearly separable, the perceptron algorithm converges to some
separating hyperplane, but it makes no attempt to choose among the many possible
separators. A support-vector machine instead selects the separating hyperplane that
maximizes the margin: the distance from the hyperplane to the nearest training examples,
which are called the support vectors.
How to Calculate this Distance?
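The standard geometric answer (not derived here on the slide itself): writing the hyperplane as w·x + b = 0 (equivalently w.x = θ with b = −θ), the distance from a point x to it is, in LaTeX notation,

    d(x) = \frac{|w \cdot x + b|}{\|w\|}

and the margin the SVM maximizes is this distance evaluated at the training points closest to the hyperplane (the support vectors).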
Test
1. What are the advantages of on-line learning?
2. Explain common tests for perceptron termination.
3. Let us consider training a perceptron to recognize spam email. The training set consists
of pairs (x,y) where x is a vector of 0’s and 1’s, with each component xi corresponding
to the presence (xi = 1) or absence (xi = 0) of a particular word in the email. The value
of y is +1 if the email is known to be spam and −1 if it is known not to be spam. While
the number of words found in the training set of emails is very large, we shall use a
simplified example where there are only five words: “and,” “Viagra,” “the,” “of,” and
“Nigeria.” The table below gives the training vectors and their corresponding classes.

      and  Viagra  the   of  Nigeria   y
  1    1     0      0     1     0     +1
  2    1     1      1     0     1     -1
  3    0     0      1     1     0     +1
  4    1     1      0     0     0     -1