Learning on the Border: Active Learning in Imbalanced Data Classification Seyda Ertekin, Jian Huang, Leon Bottou, C. Lee Giles, CIKM'07 Presenter: Ping-Hua Yang
Abstract This paper is concerned with the class imbalance problem, which is known to hinder the learning performance of classification algorithms. The paper demonstrates that active learning is capable of addressing the class imbalance problem by providing the learner with more balanced classes.
Outline Introduction Related work Methodology Performance metrics Datasets Experiments and empirical evaluation Conclusions
Introduction A training dataset is called imbalanced if at least one of its classes is represented by significantly fewer instances than the others. Examples of applications that may exhibit the class imbalance problem: predicting pre-term births, identifying fraudulent credit card transactions, text categorization, classification of protein databases, detecting certain objects in satellite images.
Introduction In classification tasks it is generally more important to correctly classify the minority class instances, since mispredicting a rare event can result in more serious consequences. However, in classification problems with imbalanced data, the minority class examples are more likely to be misclassified than the majority class examples, due to the design principles of machine learning algorithms. This paper proposes a framework with high prediction performance to overcome this serious data mining problem. The paper proposes several methods: using an active learning strategy to deal with the class imbalance problem, and an SVM-based active learning selection strategy.
Introduction A common recent research direction for overcoming the class imbalance problem is to resample the original training dataset to create more balanced classes.
Related work Assign distinct misclassification costs ([P. Domingos, 1999], [M. Pazzani, C. Merz, P. Murphy, K. Ali, T. Hume, C. Brunk, 1994]): the misclassification penalty for the positive class is assigned a higher value than that for the negative class. This method requires tuning to come up with good penalty parameters for the misclassified examples. Resample the original training dataset ([N. V. Chawla, 2002], [N. Japkowicz, 1995], [M. Kubat, 1997], [C. X. Ling, 1998]), either by over-sampling the minority class or under-sampling the majority class. Under-sampling may discard potentially useful data; over-sampling may suffer from over-fitting, and the increased number of samples lengthens the training time of the learning process.
Related work Use recognition-based, instead of discrimination-based, inductive learning ([N. Japkowicz, 1995], [B. Raskutti, 2004]): these methods attempt to measure the amount of similarity between a query object and the target class. Their major drawback is the need to tune the similarity threshold. SMOTE, the synthetic minority over-sampling technique ([N. V. Chawla, 2002]): the minority class is over-sampled by creating synthetic examples rather than by over-sampling with replacement. Preprocessing the data with SMOTE may lead to improved prediction performance, but it brings more computational cost and an increased number of training examples.
Methodology Active learning has access to a vast pool of unlabeled examples, and it tries to make a clever choice in selecting the most informative example to obtain its label. The strategy of selecting instances within the margin addresses imbalanced dataset classification very well.
Methodology
Support Vector Machines SVMs are well known for their strong theoretical foundations, generalization performance, and ability to handle high-dimensional data. Using the training set, an SVM builds an optimum hyper-plane, obtained by minimizing the objective function below (equation 1). w: the normal vector of the hyper-plane, y_i: labels, Φ(·): mapping from input space to feature space, b: offset, ξ_i: slack variables.
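The equation image on this slide did not survive extraction; the objective it refers to is the standard soft-margin SVM primal (a reconstruction, following the slide's notation):

\[
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \frac{1}{2}\lVert \mathbf{w}\rVert^{2} + C\sum_{i=1}^{N}\xi_{i}
\quad \text{s.t.} \quad y_{i}\big(\mathbf{w}\cdot\Phi(x_{i})+b\big) \ge 1-\xi_{i},\quad \xi_{i}\ge 0 \tag{1}
\]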
Support Vector Machines The dual representation of equation 1, where K(x_i, x_j) = Φ(x_i)·Φ(x_j) and the α_i are Lagrange multipliers. After solving the QP problem, the normal vector w of the hyper-plane can be represented in terms of the α_i, as reconstructed below (equations 3 and 5).
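Again reconstructing the missing equation images, the standard dual of the soft-margin problem and the resulting expansion of w are:

\[
\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{N}\alpha_{i} - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_{i}\alpha_{j}y_{i}y_{j}K(x_{i},x_{j})
\quad \text{s.t.} \quad 0\le\alpha_{i}\le C,\quad \sum_{i=1}^{N}\alpha_{i}y_{i}=0 \tag{3}
\]

\[
\mathbf{w} = \sum_{i=1}^{N}\alpha_{i}y_{i}\Phi(x_{i}) \tag{5}
\]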
Support Vector Machines
Active Learning In equation 5, only the support vectors have an effect on the SVM solution: if the SVM is retrained on a new set of data consisting only of those support vectors, the learner will find the same hyper-plane. This paper focuses on a form of selection strategy called SVM-based active learning, in which the most informative instance is the one closest to the hyper-plane. More complex selection methods exist that account for a possibly non-symmetric version space.
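In kernel form this selection rule can be written as follows (a standard formulation consistent with the dual expansion above; since ||w|| is constant across candidates, minimizing |f(x)| suffices):

\[
f(x) = \sum_{i}\alpha_{i}y_{i}K(x_{i},x) + b,
\qquad
x^{*} = \arg\min_{x \in \mathcal{U}} \frac{\lvert f(x)\rvert}{\lVert \mathbf{w}\rVert}
\]

where \(\mathcal{U}\) denotes the pool of unlabeled candidate instances.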
Active Learning with Small Pools The basic working principle of SVM active learning: learn an SVM on the existing training data, select the instance closest to the hyper-plane, add the newly selected instance to the training set, and train again. In classical active learning, the search for the most informative instance is performed over the entire dataset; for large datasets, searching the entire training set is a very time-consuming and computationally expensive task. The "59 trick" avoids a full search through the entire dataset and instead locates an approximately most informative sample.
Active Learning with Small Pools The selection method picks L (L << number of training instances) random training samples in each iteration and selects the best among them: pick a random subset X_L, L << N, then select the closest sample x_i from X_L, based on the condition that x_i is among the top p% closest instances in X_N with probability (1 − η). For p = 0.05 and η = 0.05, L = 59 samples suffice, since 0.95^59 ≈ 0.05; hence the "59 trick" (see the sketch below).
Active Learning with Small Pools
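A minimal sketch of this small-pool selection (illustrative names only; svm_decision stands for any trained decision function, e.g. scikit-learn's decision_function, and X_unlabeled is assumed to be a NumPy array):

```python
import numpy as np

def select_from_small_pool(svm_decision, X_unlabeled, pool_size=59, rng=None):
    """Pick the candidate closest to the hyper-plane from a small random pool.

    With pool_size = 59 the winner is among the top 5% closest instances of
    the full unlabeled set with ~95% probability, since 0.95**59 ~= 0.048.
    """
    rng = rng if rng is not None else np.random.default_rng()
    idx = rng.choice(len(X_unlabeled), size=min(pool_size, len(X_unlabeled)),
                     replace=False)
    # |f(x)| is proportional to the distance to the hyper-plane, so the
    # smallest value marks the most informative candidate in the pool.
    distances = np.abs(svm_decision(X_unlabeled[idx]))
    return idx[np.argmin(distances)]
```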
Online SVM for Active Learning LASVM is an online kernel classifier which relies on the traditional soft-margin SVM formulation but requires fewer computational resources. LASVM's model is continually modified as it processes training instances one by one: each LASVM iteration receives a fresh training example and tries to optimize the dual cost function in equation (3) using feasible-direction searches. A new informative instance selected by active learning can therefore be integrated into the existing model without retraining on all the samples repeatedly.
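For concreteness, below is a minimal end-to-end loop built around this selection step. It is a sketch, not the paper's implementation: it retrains a scikit-learn SVC at every iteration, which is exactly the cost LASVM's online updates avoid; all names and parameters are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

def active_learning_loop(X, y, n_seed=20, n_queries=200, pool_size=59, seed=0):
    """SVM active learning with small-pool selection (sketch).

    Assumes X is a NumPy array and y holds binary labels; the random seed
    set must contain both classes for the first fit to succeed.
    """
    rng = np.random.default_rng(seed)
    labeled = list(rng.choice(len(X), size=n_seed, replace=False))
    unlabeled = [i for i in range(len(X)) if i not in labeled]
    model = SVC(kernel="rbf", C=1.0)
    for _ in range(n_queries):
        model.fit(X[labeled], y[labeled])
        # Small pool: search 59 random candidates instead of the full set.
        pool = rng.choice(unlabeled, size=min(pool_size, len(unlabeled)),
                          replace=False)
        pick = int(pool[np.argmin(np.abs(model.decision_function(X[pool])))])
        labeled.append(pick)        # query the label of the chosen instance
        unlabeled.remove(pick)
    return model.fit(X[labeled], y[labeled])
```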
Active Learning with Early Stopping A theoretically sound point to stop training is when the examples in the margin are exhausted. To check whether there are still unseen training instances in the margin, the distance of the newly selected instance is compared to that of the support vectors of the current model; if the new instance lies farther from the hyper-plane than the margin support vectors, the margin is exhausted. A practical implementation of this idea is to track the number of support vectors during the active learning training process (see the sketch below).
Active Learning with Early Stopping
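A sketch of this margin-exhaustion check, assuming the usual SVM convention that margin support vectors satisfy |f(x)| = 1 (names are illustrative):

```python
import numpy as np

def margin_exhausted(model, X_candidates):
    """Early-stopping check (sketch): margin support vectors satisfy
    |f(x)| = 1, so if even the closest remaining candidate has |f(x)| >= 1,
    no unseen instance lies inside the margin and training can stop."""
    return np.abs(model.decision_function(X_candidates)).min() >= 1.0
```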
Performance Metrics Classification accuracy is not a good metric for evaluating classifiers in applications with the class imbalance problem: in the non-separable case, if the misclassification penalty C is very small, the SVM learner simply tends to classify every example as negative. G-means: the geometric mean of sensitivity and specificity, g = sqrt(sensitivity × specificity), where sensitivity = TruePos./(TruePos. + FalseNeg.) and specificity = TrueNeg./(TrueNeg. + FalsePos.). Receiver Operating Characteristic (ROC) curve: a plot of the true positive rate against the false positive rate as the decision threshold is changed.
Performance Metrics Area under the ROC Curve (AUC): a numerical measure of a model's discrimination performance; it shows how successfully the model separates the positive and negative classes. Precision-Recall Break-Even Point (PRBEP): the accuracy of the positive class at the threshold where precision equals recall.
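These metrics can be computed from SVM decision values, for example with scikit-learn; the sketch below assumes binary labels in {0, 1} and y_score taken from decision_function.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, precision_recall_curve

def imbalance_metrics(y_true, y_score, threshold=0.0):
    """G-means, AUC and PRBEP from decision values (sketch)."""
    y_pred = (y_score > threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)                  # true positive rate
    specificity = tn / (tn + fp)                  # true negative rate
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    prbep = precision[np.argmin(np.abs(precision - recall))]
    return {"g_means": np.sqrt(sensitivity * specificity),
            "auc": roc_auc_score(y_true, y_score),
            "prbep": float(prbep)}
```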
Datasets
Experiments and Empirical evaluation (figure slides)
Conclusions The results of this paper offer a better understanding of the effect of active learning on imbalanced datasets. By focusing the learning on the instances around the classification boundary, more balanced class distributions can be provided to the learner in the earlier steps of learning.
