AMRINDER ARORA
ONLINE ALGORITHMS IN MACHINE LEARNING
BRIEF INTRODUCTION/CONTACT INFO
CTO at BizMerlin
aarora@bizmerlin.com
www.bizmerlin.com
Adjunct Faculty at GWU/CS
Algorithms
amrinder@gwu.edu
www.gwu.edu
+1 571 276 8807
Arora - Online Algorithms Machine Learning 2
Second Edition
ISBN: 978-1-63487-073-3
ONLINE ALGORITHMS IN
MACHINE LEARNING
• First, let us understand a basic machine learning
problem.
• For example, let us consider classification.
CLASSIFICATION
• Given: A collection of records (training set), where
each record contains a set of attributes, and a
class.
• Find: A model for class attribute as a function of the
values of other attributes.
• Goal: previously unseen records should be assigned
a class as accurately as possible.
• A test set is used to determine the accuracy of the
model. Usually, the given data set is divided into
training and test sets, with training set used to build
the model and test set used to validate it.
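The train/test protocol described above can be sketched in a few lines of Python. The toy records, the attribute names, and the deliberately trivial majority-class "model" are illustrative assumptions, not from the slides; real work would fit an actual learner on the training set.

```python
import random
from collections import Counter

# Toy records, purely illustrative: (attributes, class label).
records = [({"refund": "Yes", "income": 125}, "No"),
           ({"refund": "No",  "income": 100}, "No"),
           ({"refund": "No",  "income": 70},  "No"),
           ({"refund": "No",  "income": 95},  "Yes"),
           ({"refund": "No",  "income": 85},  "Yes"),
           ({"refund": "Yes", "income": 120}, "No")]

random.seed(0)
random.shuffle(records)
split = int(2 / 3 * len(records))          # build on training, validate on test
train, test = records[:split], records[split:]

# Trivial stand-in for a model: always predict the training majority class.
majority = Counter(c for _, c in train).most_common(1)[0][0]
accuracy = sum(c == majority for _, c in test) / len(test)
print(majority, accuracy)
```

The accuracy measured on the held-out test set, not on the training set, is what estimates performance on previously unseen records.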
ILLUSTRATING CLASSIFICATION TASK
[Diagram: the Training Set feeds a Learning algorithm, which by Induction learns a Model; the Model is then applied by Deduction to the Test Set.]

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
EXAMPLES OF CLASSIFICATION TASK
• Predict tax returns as “clean” or “need an
audit”
• Predicting tumor cells as benign or
malignant
• Classifying credit card transactions
as legitimate or fraudulent
• Classifying secondary structures of protein
as alpha-helix, beta-sheet, or random
coil
• Categorizing news stories as finance, weather, entertainment, sports, etc.
CLASSIFICATION TECHNIQUES
• Decision Tree-based Methods
• Rule-based Methods
• Memory-based Reasoning
• Neural Networks
• Naïve Bayes and Bayesian Belief Networks
• Support Vector Machines
EXAMPLE OF A DECISION TREE
• Decision trees are an intuitive example of a classification technique.
• income < $40K
  • job ≥ 5 yrs → good risk
  • job < 5 yrs → bad risk
• income ≥ $40K
  • high debt → bad risk
  • low debt → good risk
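As a sketch, the tree above maps directly onto nested conditionals. The function name is illustrative, and the slide leaves the boundary cases (job exactly 5 yrs, income exactly $40K) unspecified, so the choice below is an assumption:

```python
def credit_risk(income, years_on_job, debt):
    """Hand-coded version of the slide's credit-risk decision tree."""
    if income < 40_000:
        # Low income: risk depends on job tenure.
        return "good risk" if years_on_job >= 5 else "bad risk"
    # Higher income: risk depends on debt level.
    return "bad risk" if debt == "high" else "good risk"

print(credit_risk(30_000, 7, "low"))   # good risk (long tenure, low income)
print(credit_risk(80_000, 2, "high"))  # bad risk (high income but high debt)
```

Each root-to-leaf path in the tree becomes one branch of the conditional, which is exactly how a learned tree classifies a new record.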
SO, WE HAVE DIFFERENT KINDS OF
CLASSIFIERS..
• Different decision trees based on Hunt’s algorithm
• C4.5
• Naïve Bayes
• Support Vector Machines
• Each of these models can be considered an “expert”.
• We do not know how well each “expert” will perform in an actual setting.
• This is where online algorithms in machine learning can help us.
ONLINE ALGORITHMS IN MACHINE
LEARNING
• Given n experts, each producing an output in {0, 1}
• We want to predict the output
• After each try, we are told the correct result.
• Goal: over time, we want to do “not much worse” than the best expert (without knowing beforehand which expert is good)
“WEIGHTED MAJORITY” –
ALGORITHM 1
• Initialize the weights w1..wn of all n experts to 1
• At each step, take the weighted majority decision.
• That is, output 1 if the fraction of total weight on experts saying 1 is at least 0.5
• After each step, halve the weight of each expert who was wrong (leave the weights of correct experts unchanged)
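A minimal sketch of Algorithm 1 in Python. The function name and the toy expert streams below are illustrative assumptions:

```python
def weighted_majority(expert_predictions, outcomes):
    """WM-A1: predict with the weighted majority; halve each wrong expert's weight."""
    w = [1.0] * len(expert_predictions[0])
    mistakes = 0
    for preds, truth in zip(expert_predictions, outcomes):
        weight_on_1 = sum(wi for wi, p in zip(w, preds) if p == 1)
        guess = 1 if weight_on_1 >= 0.5 * sum(w) else 0   # majority decision
        mistakes += (guess != truth)
        # Halve the weight of every expert that was wrong this round.
        w = [wi / 2 if p != truth else wi for wi, p in zip(w, preds)]
    return mistakes

# Three experts: one perfect, one always-1, one always-0.
outcomes = [1, 0, 1, 1, 0]
preds = [[y, 1, 0] for y in outcomes]
print(weighted_majority(preds, outcomes))  # 0 -- tracks the perfect expert
```

Because the perfect expert never loses weight while the others shrink, the algorithm's decisions quickly follow the best expert.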
PERFORMANCE OF WM-A1
Proof
• Suppose WM-A1 makes M mistakes
• On each mistake, the wrong experts held at least half of the total weight and that weight gets halved, so the total weight drops by a factor of at least 3/4. So, it is no more than n(3/4)^M
• [All initial weights are 1, so initial total weight = n]
• Each mistake of the best expert halves its weight, so after m mistakes its weight is (1/2)^m
• So, (1/2)^m ≤ n(3/4)^M
• [Best expert’s weight is no more than the total weight.]
The number of mistakes made by Weighted Majority - Algorithm 1 is never more than 2.41(m + lg n), where m is the number of mistakes made by the best expert and n is the number of experts.
PERFORMANCE OF WM-A1
Proof (cont.)
(1/2)^m ≤ n(3/4)^M
⇒ (4/3)^M ≤ n · 2^m
⇒ M lg(4/3) ≤ lg n + m
⇒ M ≤ [1 / lg(4/3)] (m + lg n)
⇒ M ≤ 2.41 (m + lg n)
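The constant 2.41 is just 1/lg(4/3), which can be confirmed numerically:

```python
import math

# The constant in the WM-A1 bound: M <= [1 / lg(4/3)] (m + lg n)
c = 1 / math.log2(4 / 3)
print(round(c, 2))  # 2.41
```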
The number of mistakes made by Weighted Majority - Algorithm 1 is never more than 2.41(m + lg n), where m is the number of mistakes made by the best expert and n is the number of experts.
“WEIGHTED MAJORITY” –
ALGORITHM 2
• Initialize the weights w1..wn of all n experts to 1
• At each step, make a probabilistic decision. That is, output 1 with probability equal to the sum of the weights of experts that say 1, divided by the total weight.
• After each step, multiply the weight of each expert who was wrong by β, where 0 < β < 1 (leave the weights of correct experts unchanged)
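A minimal sketch of the randomized version. The function name, the fixed seed, and the toy expert streams are illustrative assumptions:

```python
import random

def randomized_weighted_majority(expert_predictions, outcomes, beta=0.5, seed=0):
    """WM-A2: output 1 with probability equal to the weight fraction on 1;
    multiply each wrong expert's weight by beta."""
    rng = random.Random(seed)
    w = [1.0] * len(expert_predictions[0])
    mistakes = 0
    for preds, truth in zip(expert_predictions, outcomes):
        p_one = sum(wi for wi, p in zip(w, preds) if p == 1) / sum(w)
        guess = 1 if rng.random() < p_one else 0      # probabilistic decision
        mistakes += (guess != truth)
        # Multiply the weight of every wrong expert by beta.
        w = [wi * beta if p != truth else wi for wi, p in zip(w, preds)]
    return mistakes

outcomes = [1, 0, 1, 1, 0] * 10
preds = [[y, 1, 0] for y in outcomes]   # one perfect expert, two bad ones
print(randomized_weighted_majority(preds, outcomes))
```

Randomizing the prediction (rather than taking a hard majority) is what improves the constant in the mistake bound from 2.41 to 1.39 at β = 1/2.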
PERFORMANCE OF WM-A2
For β = 1/2, this bound is:
1.39m + 2 ln n
For β = 3/4, it is:
1.15m + 4 ln n
The number of mistakes made by Weighted Majority - Algorithm 2 is never more than (m ln(1/β) + ln n)/(1 − β), where m is the number of mistakes made by the best expert and n is the number of experts.
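Those coefficients come straight from the bound: ln(1/β)/(1 − β) multiplies m and 1/(1 − β) multiplies ln n. A quick numeric check:

```python
import math

for beta in (1 / 2, 3 / 4):
    m_coeff = math.log(1 / beta) / (1 - beta)   # coefficient on m
    n_coeff = 1 / (1 - beta)                    # coefficient on ln n
    print(round(m_coeff, 2), n_coeff)
```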
PERFORMANCE OF WM-A2
Proof
 Suppose we have seen t tries so far.
 Let Fi be the fraction of total weight on the wrong answers at the i-th
trial.
 Suppose WM-A2 makes M mistakes.
 Therefore M = {i=1 to t} { Fi }
 [Why? Because, in each try, probability of mistake = Fi]
 Suppose best expert makes m mistakes.
 After each mistake, best expert’s weight gets multiplied by β. So, it is
no more than βm
 During each round, the total weight changes as:
 W  W (1 – (1-β) Fi )
PERFORMANCE OF WM-A2
Proof (cont.)
• Therefore, at the end of t tries, the total weight is:
  W = n ∏_{i=1..t} (1 − (1 − β) F_i)
• Since the total weight ≥ the best expert’s weight:
  n ∏_{i=1..t} (1 − (1 − β) F_i) ≥ β^m
• Taking natural logs:
  ln n + Σ_{i=1..t} ln(1 − (1 − β) F_i) ≥ m ln β
• Multiplying by −1 reverses the inequality:
  −ln n − Σ_{i=1..t} ln(1 − (1 − β) F_i) ≤ m ln(1/β)
• A bit of math: −ln(1 − x) ≥ x for x < 1, so −Σ ln(1 − (1 − β) F_i) ≥ (1 − β) Σ F_i:
  −ln n + (1 − β) Σ_{i=1..t} F_i ≤ m ln(1/β)
• ⇒ −ln n + (1 − β) M ≤ m ln(1/β)
• ⇒ M ≤ (m ln(1/β) + ln n) / (1 − β)
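The bound can be checked numerically by computing the expected number of mistakes M = Σ F_i directly, exactly as in the proof, on a random instance. The instance size, seed, and function name are arbitrary choices for illustration:

```python
import math
import random

def wm2_expected_mistakes(expert_preds, outcomes, beta):
    """Expected mistakes of WM-A2: each round contributes F_i, the fraction
    of total weight on the wrong answer; wrong weights then shrink by beta."""
    w = [1.0] * len(expert_preds[0])
    M = 0.0
    for preds, truth in zip(expert_preds, outcomes):
        M += sum(wi for wi, p in zip(w, preds) if p != truth) / sum(w)  # F_i
        w = [wi * beta if p != truth else wi for wi, p in zip(w, preds)]
    return M

rng = random.Random(42)
n, t, beta = 5, 200, 0.5
outcomes = [rng.randint(0, 1) for _ in range(t)]
preds = [[rng.randint(0, 1) for _ in range(n)] for _ in range(t)]

# Mistakes of the best expert, and the theorem's bound.
m = min(sum(row[j] != y for row, y in zip(preds, outcomes)) for j in range(n))
bound = (m * math.log(1 / beta) + math.log(n)) / (1 - beta)
M = wm2_expected_mistakes(preds, outcomes, beta)
print(M <= bound)  # the theorem guarantees True
```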
SUMMARY
The number of mistakes made by Weighted Majority - Algorithm 2 is never more than (m ln(1/β) + ln n)/(1 − β), where m is the number of mistakes made by the best expert and n is the number of experts.
WHY DOES THIS ALL MATTER?
• Many practical applications use techniques
such as ensemble models.
• Ensemble models are a generalization of the simple majority algorithms discussed in this presentation.
• There are many relevant practical
applications
• Pandora, Netflix and other Recommendation Engines
• Government and Commercial targeting systems
http://www.fda.gov/predict
Q&A
• Ask anything you want.
PIZZA TIME!
You better cut the pizza in four pieces
because I'm not hungry enough to eat
six.
--Yogi Berra
APPENDIX 1
MORE ON DECISION TREES
DECISION TREE INDUCTION
• Many Algorithms:
• Hunt’s Algorithm (one of the earliest)
• CART
• ID3, C4.5
• SLIQ, SPRINT
GENERAL STRUCTURE OF HUNT’S
ALGORITHM
• Let Dt be the set of training records that reach a
node t
• General Procedure:
• If Dt contains records that all belong to the same class yt, then t is a leaf node labeled as yt
• If Dt is an empty set, then t is a leaf node labeled
by the default class, yd
• If Dt contains records that belong to more than
one class, use an attribute test to split the data
into smaller subsets. Recursively apply the
procedure to each subset.
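The three cases above can be sketched recursively. This is a simplification of Hunt's procedure: it splits on attributes in a fixed order rather than choosing the best test, and the record format, function name, and example data are illustrative assumptions:

```python
from collections import Counter

def hunt(records, attributes, default="No"):
    """Sketch of Hunt's recursion. Each record is (attribute_dict, class_label);
    a node is either a class label (leaf) or (attribute, {value: subtree})."""
    if not records:
        return default                        # empty Dt -> leaf with default class
    classes = {c for _, c in records}
    if len(classes) == 1:                     # pure Dt -> leaf with that class
        return classes.pop()
    if not attributes:                        # no tests left -> majority class
        return Counter(c for _, c in records).most_common(1)[0][0]
    attr, rest = attributes[0], attributes[1:]
    children = {}
    for value in {r[attr] for r, _ in records}:
        subset = [(r, c) for r, c in records if r[attr] == value]
        children[value] = hunt(subset, rest, default)   # recurse on each subset
    return (attr, children)

records = [({"income": "low",  "debt": "high"}, "bad"),
           ({"income": "low",  "debt": "low"},  "bad"),
           ({"income": "high", "debt": "high"}, "bad"),
           ({"income": "high", "debt": "low"},  "good")]
tree = hunt(records, ["income", "debt"])
print(tree)
```

On this data the root splits on income; the low-income branch is pure, and the high-income branch splits again on debt.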
MEASURES OF NODE IMPURITY
• Gini Index
• Entropy
• Misclassification error
GINI(t) = 1 − Σ_j [p(j|t)]²
Entropy(t) = − Σ_j p(j|t) log p(j|t)
Error(t) = 1 − max_i P(i|t)
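The three measures written directly as code, taking a node's class-probability vector p(j|t); using log base 2 for entropy is an assumption (the slide does not fix the base):

```python
import math

def gini(p):
    """GINI(t) = 1 - sum_j p(j|t)^2"""
    return 1 - sum(pj ** 2 for pj in p)

def entropy(p):
    """Entropy(t) = -sum_j p(j|t) * log2 p(j|t), with 0 log 0 taken as 0."""
    return sum(-pj * math.log2(pj) for pj in p if pj > 0)

def error(p):
    """Error(t) = 1 - max_i P(i|t)"""
    return 1 - max(p)

print(gini([0.5, 0.5]), entropy([0.5, 0.5]), error([0.5, 0.5]))  # 0.5 1.0 0.5
print(gini([1.0, 0.0]), entropy([1.0, 0.0]), error([1.0, 0.0]))  # 0.0 0.0 0.0
```

All three are maximized at a uniform class distribution (maximal impurity) and are zero for a pure node.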