Regression Methods in
Machine Learning
Categorical Variable Conversion
Portland Data Science Group
Andrew Ferlitsch
Community Outreach Officer
July, 2017
Linear Regression
• All features (independent variables) must be real numbers.
• A feature CANNOT be a categorical value, i.e., a named or
enumerated value.
• Examples of categorical values:
Male vs. Female
Red, Blue, Green
Apple, Banana, Pear, Orange
Categorical Variables
Age   Gender   Income
25    Male     25000
26    Female   22000
30    Male     45000
24    Female   26000
• Independent variables (features): Age, Gender
• Dependent variable (label): Income, the value to predict
• Real values: Age
• Categorical values: Gender
Dummy Variable Conversion
Implemented in Python by scikit-learn's OneHotEncoder
For each categorical feature:
1. Scan the dataset and determine all the unique values of the feature.
2. Create a new feature (i.e., dummy variable) in the dataset, one
per unique value.
3. Remove the categorical feature from the dataset.
4. For each sample (row), set a 1 in the dummy variable that
corresponds to the sample's categorical value.
5. Set a 0 in the remaining dummy variables for that
categorical feature.
6. Remove one of the dummy variables.
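The six steps above can be sketched in plain Python. This is a minimal illustration, not the OneHotEncoder implementation: the function name `dummy_encode` and the choice to drop the last category seen are assumptions made for the example.

```python
def dummy_encode(rows, feature):
    """Replace a categorical feature with dummy variables, dropping one."""
    # Step 1: collect the unique values, in order of first appearance.
    categories = list(dict.fromkeys(row[feature] for row in rows))
    # Step 6 (applied up front): drop one dummy variable.
    # Assumption for this sketch: drop the last category seen.
    kept = categories[:-1]
    encoded = []
    for row in rows:
        # Step 3: copy every field except the categorical one.
        new_row = {k: v for k, v in row.items() if k != feature}
        # Steps 2, 4, 5: one new field per kept category,
        # 1 where the sample matches that category, 0 otherwise.
        for category in kept:
            new_row[category] = 1 if row[feature] == category else 0
        encoded.append(new_row)
    return encoded


rows = [
    {"Age": 25, "Gender": "Male", "Income": 25000},
    {"Age": 26, "Gender": "Female", "Income": 22000},
    {"Age": 30, "Gender": "Male", "Income": 45000},
    {"Age": 24, "Gender": "Female", "Income": 26000},
]
encoded = dummy_encode(rows, "Gender")
for row in encoded:
    print(row)
```

With this data the Gender field disappears and a single Male field (1 or 0) takes its place; a 0 in Male stands for Female.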
Dummy Variable Trap
Gender   Male   Female
Male     1      0
Female   0      1
Male     1      0
Female   0      1
Need to drop one dummy variable!
Multicollinearity occurs when one variable can be predicted exactly from another.
Here, with x2 and x3 denoting the Male and Female dummy variables, x2 = (1 - x3).
As a result, a regression analysis cannot distinguish between the
contribution of x2 and x3.
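A small sketch of why the contributions cannot be separated, assuming the regression model includes an intercept term (the coefficient values and the `predict` helper are made up for illustration). Because Male + Female is always 1, any amount can be moved from the two dummy coefficients into the intercept without changing a single prediction:

```python
def predict(intercept, b_male, b_female, male, female):
    """Linear model with an intercept and both dummy variables."""
    return intercept + b_male * male + b_female * female


# (Male, Female) pairs for the four samples on the slide.
samples = [(1, 0), (0, 1), (1, 0), (0, 1)]

# Hypothetical coefficients: (intercept, b_male, b_female).
coeffs_a = (10.0, 5.0, 2.0)

# Shift an arbitrary amount from both dummy coefficients into the intercept.
shift = 3.0
coeffs_b = (coeffs_a[0] + shift, coeffs_a[1] - shift, coeffs_a[2] - shift)

preds_a = [predict(*coeffs_a, m, f) for m, f in samples]
preds_b = [predict(*coeffs_b, m, f) for m, f in samples]

print(preds_a)
print(preds_b)
print(preds_a == preds_b)
```

Both coefficient sets fit the data equally well, so there is no unique answer for how much Male or Female contributes. Dropping one dummy variable removes this ambiguity.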
Drop One of the Dummy Variables
Before:
Age   Gender   Income
25    Male     25000
26    Female   22000
30    Male     45000
24    Female   26000
After (Gender is replaced with Male):
Age   Male   Income
25    1      25000
26    0      22000
30    1      45000
24    0      26000
Before, with three categories:
Age   Race       Income
20    White      Apple
26    Hispanic   22000
30    Asian      45000
24    Asian      26000
After (Hispanic dropped, i.e., Hispanic = White: 0, Asian: 0):
Age   White   Asian   Income
20    1       0       Apple
26    0       0       22000
30    0       1       45000
24    0       1       26000
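No information is lost by dropping a dummy variable: a row with all zeros identifies the dropped category. A minimal sketch of reading the category back from the White and Asian columns (the helper name `decode_category` is hypothetical):

```python
def decode_category(row, dummies, dropped):
    """Return the category that a dummy-encoded row stands for."""
    for name in dummies:
        if row[name] == 1:
            return name
    # Every kept dummy is 0, so the row belongs to the dropped category.
    return dropped


# The White and Asian columns from the table above.
rows = [
    {"White": 1, "Asian": 0},
    {"White": 0, "Asian": 0},
    {"White": 0, "Asian": 1},
    {"White": 0, "Asian": 1},
]
races = [decode_category(r, ["White", "Asian"], "Hispanic") for r in rows]
print(races)  # ['White', 'Hispanic', 'Asian', 'Asian']
```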
