Active learning Haidong Shi, Nanyi Zeng, Nov 12, 2008
outline 1. introduction 2. active learning with different methods 3. Employing EM and Pool-based Active Learning for Text Classification
introduction 1. what is active learning? 2. why is active learning important? 3. real-life applications
introduction The primary goal of machine learning is to derive general patterns from a limited amount of data. For most supervised and unsupervised learning tasks, we usually gather a significant quantity of data, randomly sampled from the underlying population distribution, and then induce a classifier or model.
introduction But this process is passive! Often the most time-consuming and costly task in the process is gathering the data. Example: document classification. It is easy to get a large pool of unlabeled documents, but it takes a long time for people to hand-label thousands of training documents.
introduction Now, instead of randomly picking documents to be manually labeled for our training set, we want to choose and query documents from the pool very carefully. With this carefully chosen training data, we can improve the model's performance very quickly.
what is active learning? Guiding the sampling process by querying for certain types of instances, based upon the data we have seen so far, is called active learning.
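For concreteness, here is a minimal sketch of a pool-based active learning loop, using uncertainty sampling as the query strategy; the classifier, data, and query budget are illustrative choices, not from the talk.

```python
# A minimal sketch of pool-based active learning, assuming a scikit-learn
# classifier and a synthetic pool whose hidden labels play the oracle.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(1000, 5))                     # unlabeled pool
y_pool = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)  # oracle labels (hidden)

# Seed set: a few labeled examples from each class.
pos, neg = np.where(y_pool == 1)[0], np.where(y_pool == 0)[0]
labeled = list(rng.choice(pos, 5, replace=False)) + list(rng.choice(neg, 5, replace=False))
unlabeled = [i for i in range(len(X_pool)) if i not in labeled]

model = LogisticRegression()
for _ in range(20):                                     # 20 query rounds
    model.fit(X_pool[labeled], y_pool[labeled])
    # Uncertainty sampling: query the pool point closest to P(+) = 0.5.
    probs = model.predict_proba(X_pool[unlabeled])[:, 1]
    query = unlabeled[int(np.argmin(np.abs(probs - 0.5)))]
    labeled.append(query)                               # ask the oracle
    unlabeled.remove(query)
print("pool accuracy:", model.score(X_pool, y_pool))
```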
Why important Labeling training data is often not only time-consuming but also very expensive. The less training data we need, the more we save.
applications Text classification Web page classification Junk mail recognition
active learning with different methods 1. Neural Networks 2. Bayesian rule 3. SVM No matter which method is used, the core problem is the same.
active learning with different methods The core problem: how do we select training points actively? In other words, which training points will be informative to the model?
Apply active learning to Neural Networks Combined with query by committee. Algorithm: 1. Sample two neural networks from the distribution over models. 2. When an unlabeled example arrives, use the committee to predict its label. 3. If the committee members disagree with each other, select the example for labeling.
Apply active learning to Neural Networks Usually: The committee may contain more than two members. For classification, count the #(+) and #(-) votes and see whether they are close. For regression, use the variance of the outputs as the criterion of disagreement. Stopping criterion: stop once the maximum model variance drops below a set threshold.
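A sketch of query by committee along these lines; true sampling of networks from the model distribution is approximated here by training small nets from different random initializations, which is a simplification of our own.

```python
# Query-by-committee sketch: committee members are small neural nets
# trained from different random seeds (an approximation to sampling
# models from a distribution); toy data throughout.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
X_lab = rng.normal(size=(30, 4))
y_lab = (X_lab.sum(axis=1) > 0).astype(int)
X_unlab = rng.normal(size=(500, 4))

committee = [
    MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=s)
    .fit(X_lab, y_lab)
    for s in range(4)
]
votes = np.stack([m.predict(X_unlab) for m in committee])  # (members, points)
n_pos = votes.sum(axis=0)
# Disagreement: #(+) close to #(-), i.e. the vote is nearly split.
disagreement = -np.abs(n_pos - (len(committee) - n_pos))
query = int(np.argmax(disagreement))   # most contested unlabeled point
print("query index:", query, "votes:", votes[:, query])
```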
Apply active learning to Bayesian theory Characteristic: build a probabilistic classifier that not only makes classification decisions but also estimates their uncertainty. Try to estimate P(Ci | w), the posterior probability that an example with pattern w belongs to class Ci. P(Ci | w) directly guides the selection of training data.
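A sketch of this posterior-guided selection, using a Gaussian naive Bayes model as the probabilistic classifier and the entropy of P(Ci | w) as the uncertainty score; both choices are ours, for illustration.

```python
# Uncertainty sampling from posterior class probabilities: query the
# example whose posterior P(C_i | w) has the highest entropy.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(2)
X_lab = rng.normal(size=(40, 3))
y_lab = (X_lab[:, 0] > 0).astype(int)
X_unlab = rng.normal(size=(300, 3))

nb = GaussianNB().fit(X_lab, y_lab)
post = nb.predict_proba(X_unlab)                       # P(C_i | w) per example
entropy = -(post * np.log(post + 1e-12)).sum(axis=1)   # posterior uncertainty
query = int(np.argmax(entropy))                        # most uncertain example
print("most uncertain example:", query, "posterior:", post[query])
```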
Apply active learning to SVM The problem is again: what is the criterion for uncertainty sampling? We can improve the model by attempting to maximally narrow the existing margin. If points that lie on or close to the dividing hyperplane are added to the training set, they will on average narrow the margin most.
Apply active learning to SVM About the stopping criterion: when all unlabeled data in the margin have been exhausted, we stop. Why? Only unlabeled data within the margin have a great effect on our learner. Labeling an example in the margin may shift the margin so that examples that were previously outside are now inside.
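A sketch of the margin-based query rule and its stopping criterion with a linear SVM; the data and the margin threshold are illustrative.

```python
# Margin-based selection: query the pool point closest to the separating
# hyperplane (smallest |decision value|); stop once no unlabeled point
# remains inside the margin (|f(x)| < 1 for a soft-margin SVM).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X_lab = rng.normal(size=(30, 2))
y_lab = (X_lab[:, 0] - X_lab[:, 1] > 0).astype(int)
X_unlab = rng.normal(size=(400, 2))

svm = SVC(kernel="linear").fit(X_lab, y_lab)
dist = np.abs(svm.decision_function(X_unlab))  # distance proxy; margin edge at 1
in_margin = np.where(dist < 1.0)[0]            # candidates inside the margin
if in_margin.size:
    query = int(in_margin[np.argmin(dist[in_margin])])
    print("query point closest to hyperplane:", query)
else:
    print("margin exhausted, stop querying")   # the stopping criterion above
```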
Employing EM and Pool-based Active Learning for Text Classification Motivation: Obtaining labeled training examples for text classification is often expensive, while gathering large quantities of unlabeled examples is very cheap. Here we present techniques for using a large pool of unlabeled documents to improve text classification when labeled training data is sparse.
How data are produced We approach the task of text classification from a Bayesian learning perspective: we assume that the documents are generated by a particular parametric model, a mixture of naïve Bayes models, with a one-to-one correspondence between class labels and mixture components.
How data are produced c_j indicates the jth mixture component, which corresponds to the jth class. Each component c_j is parameterized by a disjoint subset of θ. The likelihood of a document is a sum of total probability over all generative components: P(d_i | θ) = Σ_j P(c_j | θ) P(d_i | c_j; θ).
How data are produced Document d_i is considered to be an ordered list of word events. w_{d_i,k} represents the word in position k of document d_i. The subscript of w indicates an index into the vocabulary V = <w_1, w_2, …, w_|V|>. Combined with the standard naïve Bayes assumption that words are generated independently of the other words in the same document, given the class, this gives P(d_i | c_j; θ) = Π_k P(w_{d_i,k} | c_j; θ).
goal Given these underlying assumptions about how the data are produced, the task of learning a text classifier consists of forming an estimate of θ, written θ̂, based on a training set.
Formula If the task is to classify a test document d_i into a single class, simply select the class with the highest posterior probability: argmax_j P(c_j | d_i; θ̂).
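A toy worked example of this decision rule, computed in log space under the naïve Bayes factorization above; the priors and word probabilities are made up, not from any real corpus.

```python
# argmax_j P(c_j | d_i): log P(c_j) + sum_k log P(w_{d_i,k} | c_j),
# with toy parameters for a two-class, four-word vocabulary.
import numpy as np

vocab = ["ball", "game", "stock", "market"]
log_prior = np.log(np.array([0.5, 0.5]))             # P(c_j): sports, finance
# P(w_t | c_j): rows = classes, columns = vocabulary words
log_word = np.log(np.array([[0.4, 0.4, 0.1, 0.1],    # sports
                            [0.1, 0.1, 0.4, 0.4]]))  # finance

doc = ["stock", "market", "game"]                    # ordered list of word events
idx = [vocab.index(w) for w in doc]
log_post = log_prior + log_word[:, idx].sum(axis=1)  # log P(c_j | d_i) + const
print("predicted class:", ["sports", "finance"][int(np.argmax(log_post))])
```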
EM and Unlabeled data Problem: When naïve Bayes is given just a small set of labeled training data, classification accuracy will suffer because variance in the parameter estimates of the generative model will be high.
EM and Unlabeled data Motivation: By augmenting this small labeled set with a large set of unlabeled data and combining the two pools with EM, we can improve the parameter estimates.
implementation of EM Initialize θ̂ using the labeled data only. E-step: calculate probabilistically-weighted class labels, P(c_j | d_i; θ̂), for every unlabeled document. M-step: calculate a new maximum likelihood estimate for θ using all the documents, the labeled ones together with the probabilistically labeled unlabeled ones. The process iterates until θ̂ reaches a fixed point.
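A compact sketch of this EM loop on bag-of-words count matrices; the toy data, Laplace smoothing constant, and fixed iteration count are our illustrative choices.

```python
# Semi-supervised naive Bayes via EM: soft labels from the E-step
# re-weight the M-step counts for the unlabeled pool.
import numpy as np

rng = np.random.default_rng(4)
V, C = 6, 2
X_lab = rng.poisson(2, size=(10, V)).astype(float)   # labeled doc-word counts
y_lab = rng.integers(0, C, size=10)
X_unlab = rng.poisson(2, size=(200, V)).astype(float)
R_lab = np.eye(C)[y_lab]          # hard responsibilities for labeled docs

def m_step(X, R):
    """MLE of class priors and word probabilities from (soft) class counts."""
    prior = R.sum(axis=0) / R.sum()
    word = (R.T @ X) + 1.0        # Laplace smoothing
    return np.log(prior), np.log(word / word.sum(axis=1, keepdims=True))

def e_step(X, log_prior, log_word):
    """Posterior P(c_j | d_i; theta) for every document."""
    log_post = X @ log_word.T + log_prior
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

log_prior, log_word = m_step(X_lab, R_lab)           # init from labeled data only
for _ in range(10):                                  # iterate toward a fixed point
    R_unlab = e_step(X_unlab, log_prior, log_word)   # E-step on unlabeled docs
    log_prior, log_word = m_step(                    # M-step on ALL documents
        np.vstack([X_lab, X_unlab]), np.vstack([R_lab, R_unlab]))
print("class priors:", np.exp(log_prior))
```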
Active learning with EM
Disagreement criteria Committee disagreement for each document is measured using Kullback-Leibler divergence to the mean. KL divergence to the mean is the average of the KL divergences between each member's distribution and the mean of all the distributions: D = (1/k) Σ_{m=1..k} KL(P_m(C|d) || P_mean(C|d)), where P_mean(C|d) = (1/k) Σ_m P_m(C|d).
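A sketch of computing this disagreement score for a single document, with made-up committee posteriors.

```python
# KL divergence to the mean: average KL between each committee member's
# class posterior and the mean posterior, for one document.
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

# Each row: one committee member's P(C | d) for a 3-class problem.
posteriors = np.array([[0.7, 0.2, 0.1],
                       [0.2, 0.6, 0.2],
                       [0.4, 0.4, 0.2]])
mean = posteriors.mean(axis=0)
disagreement = np.mean([kl(p, mean) for p in posteriors])
print("KL-to-the-mean disagreement:", round(disagreement, 4))
```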
END Thank you
