Feature Selection
Feature Selection
• Selection of a subset of features from a larger pool of available features.
• Goal: to select features that are rich in discriminatory information with respect to
the classification problem at hand.
• A poor choice of features leads the classifier to perform poorly.
• Selecting highly informative features is an attempt
• to place classes in the feature space far apart from each other (large between-class
distance)
• to position the data points within each class close to each other (small within-class
variance).
Feature Selection
• Another major issue in feature selection is choosing the number of features, l, to be
used out of an originally larger number of features n > l.
• Reducing this number helps in avoiding overfitting to the specific training
data set and in designing classifiers that result in good generalization
performance—that is, classifiers that perform well when faced with data
outside the training set.
• Before feature selection techniques can be used, a preprocessing stage is
necessary for “housekeeping” purposes, such as removal of outlier points
and data normalization
Feature Selection
OUTLIER REMOVAL
• An outlier is a point that lies far away from the mean value of the corresponding random variable.
• Points with values far from the rest of the data may cause large errors during the classifier
training phase.
• This is not desirable, especially when the outliers are the result of noisy measurements.
• For normally distributed data, a threshold of 1, 2, or 3 times the standard deviation is used
to define outliers.
• Points that lie away from the mean by a value larger than this threshold are removed.
• However, for non-normal distributions, more rigorous measures should be considered
(e.g., cost functions).
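A minimal sketch of this thresholding rule, assuming NumPy is available; the threshold k and the sample data are purely illustrative:

```python
import numpy as np

def remove_outliers(x, k=3.0):
    """Keep only the values of a 1-D feature that lie within k standard
    deviations of the mean (appropriate for roughly normal data)."""
    mu, sigma = x.mean(), x.std()
    mask = np.abs(x - mu) <= k * sigma
    return x[mask]

# Illustrative data: the value 8.5 lies far from the rest and is dropped at k = 2.
x = np.array([1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 8.5])
x_clean = remove_outliers(x, k=2.0)   # -> [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3]
```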
Feature Selection
DATA NORMALIZATION
• Features are scaled so that they all have comparable ranges, e.g., by normalizing each
feature to zero mean and unit variance.
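A minimal sketch of the usual zero-mean, unit-variance (z-score) normalization, assuming NumPy and a samples-by-features data matrix:

```python
import numpy as np

def normalize(X):
    """Shift each feature (column) to zero mean and scale it to unit
    standard deviation so that all features have comparable ranges."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma
```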
Feature Selection
• Three types of feature selection
• Individual feature selection
• Combination of features
• Feature subset selection
Individual Feature Selection
• The first step in FS is to look at each feature individually and check whether or not
it is an informative one.
• If not, the feature is discarded.
• To this end, statistical tests are commonly used.
• The idea is to test whether the mean values of a feature differ significantly between the
two classes.
• In the case of more than two classes, the test may be applied for each class pair.
• Assuming that the data in the classes are normally distributed, the t-test is a
popular choice.
Individual Feature Selection
HYPOTHESIS TESTING: THE t-TEST
• The goal of the statistical t-test is to determine which of the following two hypotheses is
true:
H0: The mean values of the feature in the two classes are equal. (null hypothesis)
H1: The mean values of the feature in the two classes are different. (alternative hypothesis)
• If the null hypothesis cannot be rejected, the feature is discarded, i.e., no significant
difference between the means of the two classes exists.
• The hypothesis test is carried out against the so-called significance level, α,
which corresponds to the probability of committing an error in our decision.
• Typical values used in practice are α = 0.05 and α = 0.001.
• A significance level of 0.05 indicates a 5% risk of concluding that a difference exists
when there is no actual difference.
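A hedged sketch of this per-feature selection rule with the two-sample t-test, assuming SciPy is available; x_class1 and x_class2 hold the values of one feature in the two classes:

```python
from scipy import stats

def t_test_keep(x_class1, x_class2, alpha=0.05):
    """Keep the feature only if the class means differ significantly,
    i.e., the null hypothesis of equal means is rejected at level alpha."""
    t_stat, p_value = stats.ttest_ind(x_class1, x_class2)
    return p_value < alpha, p_value
```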
Individual Feature Selection
HYPOTHESIS TESTING: THE t-TEST
• The t-test assumes that the values of the features are drawn from normal
distributions
• If the feature distributions turn out not to be normal, one should choose a
nonparametric statistical significance test, such as the Wilcoxon rank-sum test,
or use a distribution-free measure such as Fisher's discriminant ratio.
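For non-normal feature distributions, the same selection rule can use the Wilcoxon rank-sum test instead (SciPy's ranksums; variable names as in the sketch above):

```python
from scipy import stats

# Nonparametric alternative to the t-test: compares the two class
# distributions without assuming normality.
stat, p_value = stats.ranksums(x_class1, x_class2)
keep_feature = p_value < 0.05
```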
Individual Feature Selection
FISHER’S DISCRIMINANT RATIO
• FDR is commonly employed to quantify the discriminatory power of
individual features between two equiprobable classes.
• It is independent of the type of class distribution.
• Let μ₁, μ₂ and σ₁², σ₂² denote the means and variances associated with the values
of a feature in the two classes. The FDR is defined as
FDR = (μ₁ − μ₂)² / (σ₁² + σ₂²)
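A one-line sketch of the FDR computation for a single feature, where the NumPy arrays x1 and x2 hold that feature's values in the two classes:

```python
import numpy as np

def fisher_discriminant_ratio(x1, x2):
    """FDR = (mu1 - mu2)^2 / (sigma1^2 + sigma2^2); larger values mean
    better separation of the two classes along this feature."""
    return (x1.mean() - x2.mean()) ** 2 / (x1.var() + x2.var())
```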
CLASS SEPARABILITY MEASURES
• The previous measures quantify the class-discriminatory power of
individual features.
• In this section, we turn our attention from individual features to combinations of
features (i.e., feature vectors) and describe measures that quantify class
separability in the respective feature space.
• Three class-separability measures are considered:
• Divergence
• Bhattacharyya distance and
• Scatter matrices
Divergence
Bhattacharyya Distance
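As one concrete illustration of these measures, a minimal sketch of the Bhattacharyya distance for two univariate Gaussian class distributions (standard closed form; the general case uses the full mean vectors and covariance matrices):

```python
import numpy as np

def bhattacharyya_gaussian(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two univariate Gaussians
    N(mu1, var1) and N(mu2, var2); larger values mean better separability."""
    term_means = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
    term_vars = 0.5 * np.log((var1 + var2) / (2.0 * np.sqrt(var1 * var2)))
    return term_means + term_vars
```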
FEATURE SUBSET SELECTION
• Reduce the number of features by discarding the less informative ones, using
scalar feature selection.
• Consider the features that survive from the previous step in different combinations
in order to keep the “best” combination.
• Exhaustive search
• Sequential forward and backward selection
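A hedged sketch of sequential forward selection; score_fn is a hypothetical callable that scores a candidate feature subset (e.g., cross-validated accuracy or a class-separability measure):

```python
def sequential_forward_selection(all_features, score_fn, k):
    """Greedy search: start from the empty set and repeatedly add the
    single feature that most improves the subset score, until k features
    are selected."""
    selected, remaining = [], list(all_features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score_fn(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```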
Evaluating ML Models
Confusion Matrix
• In a two-class (positive and negative) problem, a classifier’s ability to
predict a true or false state gives rise to four output possibilities:
• True positive (TP)
• If the actual class is positive and the classifier also predicts it as positive
• False negative (FN)
• If the actual class is positive, however, the classifier predicts it as negative
• True negative (TN)
• If the actual class is negative and the classifier also predicts it as negative
• False positive (FP)
• If the actual class is negative, however, the classifier predicts it as positive
Confusion Matrix
• Worked example with 10 samples: actual positives P = 5, actual negatives N = 5; predicted positives PP = 7, predicted negatives PN = 3.
• TP = 4 (hit), FN = 1 (Type II error, miss), FP = 3 (Type I error, false alarm), TN = 2 (correct rejection).
• True positive rate (TPR), recall, sensitivity, probability of detection = TP/P = 4/5
• False negative rate (FNR) = FN/P = 1/5
• False positive rate (FPR) = FP/N = 3/5
• True negative rate (TNR), specificity = TN/N = 2/5
• Precision, positive predictive value (PPV) = TP/PP = 4/7
• False discovery rate (FDR) = FP/PP = 3/7
• False omission rate (FOR) = FN/PN = 1/3
• Negative predictive value (NPV) = TN/PN = 2/3
• Accuracy = (TP + TN)/(P + N) = 0.60
• F1 score = 2 × PPV × TPR/(PPV + TPR) ≈ 0.67
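A short check of the numbers in the example above, computed directly from the four counts:

```python
# Counts from the worked example above.
TP, FN, FP, TN = 4, 1, 3, 2
P, N = TP + FN, FP + TN          # actual positives / negatives
PP, PN = TP + FP, FN + TN        # predicted positives / negatives

tpr = TP / P                     # recall, sensitivity   = 0.8
fnr = FN / P                     # miss rate             = 0.2
fpr = FP / N                     # false alarm rate      = 0.6
tnr = TN / N                     # specificity           = 0.4
ppv = TP / PP                    # precision             ≈ 0.571
npv = TN / PN                    #                       ≈ 0.667
accuracy = (TP + TN) / (P + N)   #                       = 0.60
f1 = 2 * ppv * tpr / (ppv + tpr) #                       ≈ 0.667
```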
1. Mean-normalized features from a data set are given below:
• x1 = [0.6 0 -0.6] and x2 = [0.5 -0.1 -0.4];
a. Write the data matrix X.
b. Find the covariance matrix.
2. If the eigenvalues and eigenvectors of the covariance matrix of the data
matrix are:
• λ = 0.1, 1.1
• E =
[ 0.5  −0.9
 −0.9  −0.5 ]
a. Write the transformed data matrix in terms of the projection on the
vector that explains maximum variance.
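A hedged NumPy sketch of the steps the exercise asks for, assuming rows are samples and columns are the two features, and dividing by the number of samples for the covariance (use m − 1 if the course convention differs). Note that part 2 supplies its own eigenvalues and eigenvectors, which need not coincide with those of this matrix:

```python
import numpy as np

# Data matrix: each row is a sample, columns are the mean-normalized
# features x1 and x2.
X = np.array([[ 0.6,  0.5],
              [ 0.0, -0.1],
              [-0.6, -0.4]])

# Covariance matrix of the already mean-normalized data.
C = X.T @ X / X.shape[0]

# Eigen-decomposition; project onto the eigenvector with the largest
# eigenvalue (the direction of maximum variance).
eigvals, eigvecs = np.linalg.eigh(C)
w = eigvecs[:, np.argmax(eigvals)]
X_projected = X @ w
```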