Preprocessing data
Feature Scaling
Daniel Shin
What is feature scaling?
Feature scaling is a method used to
standardize the range of independent
variables or features of data. In data
processing, it is also known as data
normalization and is generally performed
during the data preprocessing step.
Why use feature scaling?
● Many estimators require it
○ Regression
○ SVM
○ kNN
○ Neural networks
● Image processing
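As a rough illustration of why distance-based estimators such as kNN are sensitive to scale, the sketch below compares Euclidean distances before and after rescaling. The three samples and the per-feature spreads (10 years, 10,000 dollars) are made-up assumptions, not data from this deck.

```python
import numpy as np

# Hypothetical samples: [age in years, income in dollars].
a = np.array([25.0, 50000.0])
b = np.array([60.0, 50500.0])  # very different age, similar income
c = np.array([26.0, 52000.0])  # similar age, somewhat different income


def euclidean(u, v):
    """Plain Euclidean distance between two 1-D arrays."""
    return float(np.sqrt(np.sum((u - v) ** 2)))


# On the raw features, income dominates: a looks closer to b than to c.
raw_ab = euclidean(a, b)
raw_ac = euclidean(a, c)

# Assumed per-feature spreads, used only to put both columns on a similar scale.
spread = np.array([10.0, 10000.0])
scaled_ab = euclidean(a / spread, b / spread)
scaled_ac = euclidean(a / spread, c / spread)

print(raw_ab < raw_ac)        # True: the income difference swamps the age gap
print(scaled_ab > scaled_ac)  # True: after rescaling, the age gap counts again
```

The nearest neighbour of `a` flips from `b` to `c` once both features carry comparable weight.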
How to use feature scaling?
● Methods
○ Calculate z-scores of your feature variables
○ Min-max method
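The two methods can be sketched in a few lines of NumPy. The toy matrix is a hypothetical example (rows are samples, columns are features); the formulas are the standard z-score and min-max definitions.

```python
import numpy as np

# Hypothetical toy data: columns are age (years) and income (dollars).
X = np.array([[25.0, 40000.0],
              [35.0, 60000.0],
              [45.0, 80000.0]])

# Z-score: subtract each column's mean and divide by its standard deviation.
z = (X - X.mean(axis=0)) / X.std(axis=0)

# Min-max: map each column linearly onto the interval [0, 1].
mm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(z)
print(mm)
```

Both formulas divide by a per-column spread, so a constant column would cause a division by zero and needs separate handling.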
When to use feature scaling?
● Andrew Ng's rule of thumb
○ Okay if each variable's range falls roughly between [-⅓, ⅓] and [-3, 3]
○ Otherwise, scale
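One possible reading of this rule of thumb is to flag a feature when its largest absolute value is below about ⅓ or above about 3. The helper name `needs_scaling`, its thresholds as parameters, and the sample matrix are all assumptions made for illustration.

```python
import numpy as np


def needs_scaling(X, small=1 / 3, large=3.0):
    """Hypothetical helper: one boolean per column, True if the column's
    largest absolute value lies outside the [small, large] band."""
    max_abs = np.abs(X).max(axis=0)
    return (max_abs < small) | (max_abs > large)


# Made-up data: the first column is already in a comfortable range,
# the second is far too large, the third far too small.
X = np.array([[0.5, 1200.0, 0.001],
              [-1.0, 3400.0, 0.002],
              [2.0, 800.0, -0.003]])

flags = needs_scaling(X)
print(flags)
```

The thresholds are a heuristic, so treating them as hard cut-offs is a simplification.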
sklearn.preprocessing
● preprocessing.scale
○ by default scales each feature to mean 0 and variance 1
○ i.e. calculates the z-score
● preprocessing.normalize
○ scales each sample (row) to unit norm, not each feature
○ useful for text classification/clustering
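A small sketch of how the two functions differ, using a made-up 3×2 matrix: `preprocessing.scale` works column by column, while `preprocessing.normalize` works row by row (L2 norm by default).

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical data: rows are samples, columns are features.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

# Column-wise: every feature ends up with mean 0 and variance 1.
X_scaled = preprocessing.scale(X)

# Row-wise: every sample is rescaled to unit L2 norm.
X_unit = preprocessing.normalize(X)

print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
print(np.linalg.norm(X_unit, axis=1))
```

Because `normalize` discards each row's overall magnitude, it suits cases such as term-count vectors where only the direction of the vector matters.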
Results

Pre-scaled
Model         accuracy  precision  recall  f1
LogisticReg   0.838     0.857      0.788   0.817
SVMC          0.539     0.000      0.000   0.000
GaussNB       0.828     0.828      0.802   0.811
DecisionTree  0.703     0.688      0.736   0.690
RandomForest  0.801     0.840      0.685   0.741
kNN9          0.653     0.637      0.576   0.605

Scaled
Model         accuracy  precision  recall  f1
LogisticReg   0.835     0.850      0.787   0.813
SVMC          0.828     0.846      0.773   0.803
GaussNB       0.828     0.828      0.802   0.811
DecisionTree  0.714     0.720      0.729   0.693
RandomForest  0.818     0.845      0.700   0.773
kNN9          0.825     0.841      0.766   0.800
Actually this is wrong
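Fit scaler only on training set
from sklearn import preprocessing
# learn the mean and variance from the training set only
std_scale = preprocessing.StandardScaler().fit(X_train)
# apply that same transformation to both sets
X_train_std = std_scale.transform(X_train)
X_test_std = std_scale.transform(X_test)
MinMaxScaler() can be used the same way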
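One way to make the "fit on the training set only" rule harder to break is to wrap the scaler and the estimator in a scikit-learn pipeline. The sketch below uses synthetic data from `make_classification` and an `SVC` with default settings; both are assumptions for illustration and are not the dataset or models behind the results in this deck.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (not the deck's dataset).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# fit() fits the scaler on X_train only, then trains the classifier
# on the scaled training data.
model = make_pipeline(StandardScaler(), SVC())
model.fit(X_train, y_train)

# score() reuses the training-set statistics to transform X_test.
test_accuracy = model.score(X_test, y_test)
print(test_accuracy)

# The scaler's stored means match the training set, not the full data.
scaler = model.named_steps["standardscaler"]
print(np.allclose(scaler.mean_, X_train.mean(axis=0)))
```

The same pipeline object can be passed to `cross_val_score`, in which case the scaler is refit inside each fold.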
