Unit 1

Concepts of features

Preprocessing of data

Concepts of features

A feature is an individual measurable property within a recorded
dataset. In machine learning and statistics, features are often
called “variables” or “attributes.”

Features can be individual variables, derived variables, or
combined attributes constructed from underlying data elements.

Feature Data

Feature data is simply the data that is passed as input to
machine learning (ML) models.

In tabular data, feature data is the set of columns that
are used as input to the model.
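
As a minimal sketch of the idea above, the snippet below picks the input columns out of a small table. The table, the column names, and the choice of "bought" as the value to predict are all hypothetical and used only for illustration.

```python
# Hypothetical toy table: each dict is one record (row).
rows = [
    {"age": 25, "income": 40000, "city": "Pune", "bought": 1},
    {"age": 41, "income": 72000, "city": "Mumbai", "bought": 0},
]

feature_columns = ["age", "income", "city"]  # assumed inputs to the model
target_column = "bought"                     # assumed value to predict

# Feature data: only the input columns, one list per row.
X = [[row[col] for col in feature_columns] for row in rows]
# Target values are kept separate from the features.
y = [row[target_column] for row in rows]

print(X)
print(y)
```

Libraries such as pandas are commonly used for this in practice; plain lists and dicts are used here only to keep the sketch self-contained.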

Types of data

1. Qualitative data: descriptive data.

Qualitative data deals with characteristics and
descriptors that can’t be easily measured but can be
observed subjectively.

2. Quantitative data: numerical information.

Quantitative data deals with numbers and things you
can measure objectively.
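
One rough way to tell the two types apart in code is to check whether every value in a column is a number. This is only a heuristic sketch with hypothetical column names: numeric codes such as postal codes are stored as numbers but are really qualitative, so the rule below would mislabel them.

```python
from numbers import Number

# Hypothetical records used only for illustration.
rows = [
    {"height_cm": 172.5, "eye_color": "brown", "siblings": 2},
    {"height_cm": 160.0, "eye_color": "green", "siblings": 0},
]

def data_type(values):
    """Label a column quantitative if every value is a number, else qualitative."""
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if is_numeric else "qualitative"

# Apply the heuristic to each column of the table.
column_types = {col: data_type([r[col] for r in rows]) for col in rows[0]}
print(column_types)
```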
Preprocessing of Data

Data preprocessing is a key part of data
preparation. It refers to any processing applied to
raw data to prepare it for further analysis or
processing tasks, such as:

Data analysis

Machine learning

Data science

AI


Steps in Data Preprocessing

Data preprocessing involves several steps,
each addressing specific challenges related to
data quality, structure, and relevance.

Step 1: Data cleaning

Data cleaning is the process of identifying and correcting errors or
inconsistencies in the data to ensure it is accurate and complete.
The objective is to address issues that can distort analysis or
model performance.

For example:

Handling missing values

Removing duplicates

Correcting inconsistent formats
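
The three cleaning tasks above can be sketched on a tiny hypothetical table. The records, the choice to normalize names with title case, and the choice to fill a missing age with the mean of the known ages are assumptions made for illustration; other strategies (dropping rows, median fill) are equally common.

```python
from statistics import mean

# Hypothetical raw records with format issues, a duplicate, and a missing age.
raw = [
    {"name": " alice ", "age": 30},
    {"name": "BOB", "age": None},
    {"name": "Alice", "age": 30},
    {"name": "carol", "age": 40},
]

# 1. Correct inconsistent formats: trim whitespace, use one capitalization.
cleaned = [{"name": r["name"].strip().title(), "age": r["age"]} for r in raw]

# 2. Remove duplicates: keep the first occurrence of each (name, age) pair.
seen = set()
deduped = []
for r in cleaned:
    key = (r["name"], r["age"])
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# 3. Handle missing values: fill missing ages with the mean of known ages.
known_ages = [r["age"] for r in deduped if r["age"] is not None]
fill_value = mean(known_ages)
for r in deduped:
    if r["age"] is None:
        r["age"] = fill_value

print(deduped)
```

Formats are fixed first so that " alice " and "Alice" are recognized as the same record before duplicates are removed.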

Step 2: Data integration

Data integration involves combining data from multiple sources to
create a unified dataset. This is often necessary when data is
collected from different source systems.

Some techniques used in data integration include:

Schema matching

Data deduplication
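
A minimal sketch of both techniques, assuming two hypothetical sources (a CRM export and a shop database) that describe the same customers under different column names. The source names, column names, and the rule of keeping the first record per id are all illustrative assumptions.

```python
# Two hypothetical sources with different column names for the same fields.
crm = [
    {"cust_id": 1, "full_name": "Alice", "email": "alice@example.com"},
    {"cust_id": 2, "full_name": "Bob", "email": "bob@example.com"},
]
shop = [
    {"customer_id": 2, "name": "Bob", "mail": "bob@example.com"},
    {"customer_id": 3, "name": "Carol", "mail": "carol@example.com"},
]

# Schema matching: map each source's column names onto one shared schema.
crm_map = {"cust_id": "id", "full_name": "name", "email": "email"}
shop_map = {"customer_id": "id", "name": "name", "mail": "email"}

def to_unified(rows, mapping):
    """Rename the columns of every row according to the mapping."""
    return [{mapping[k]: v for k, v in row.items()} for row in rows]

combined = to_unified(crm, crm_map) + to_unified(shop, shop_map)

# Data deduplication: keep the first record seen for each id.
unified = {}
for row in combined:
    unified.setdefault(row["id"], row)
result = list(unified.values())

print(result)
```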

Step 3: Data transformation

Data transformation converts data into formats suitable for analysis,
machine learning, or mining.

For example:

Scaling and normalization

Encoding categorical variables

Feature engineering and extraction
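
The three examples above can be illustrated with one small sketch. Min-max scaling, one-hot encoding, and a body-mass-index column are chosen here as representative techniques; the slides do not name specific methods, and the data is hypothetical.

```python
# Hypothetical records used only for illustration.
rows = [
    {"height_m": 1.60, "weight_kg": 50.0, "color": "red"},
    {"height_m": 1.80, "weight_kg": 90.0, "color": "blue"},
    {"height_m": 1.70, "weight_kg": 70.0, "color": "red"},
]

def min_max_scale(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [(v - lo) / span for v in values]

# Scaling: bring weights into the range [0, 1].
scaled_weight = min_max_scale([r["weight_kg"] for r in rows])

# Encoding categorical variables: one 0/1 column per category (one-hot).
categories = sorted({r["color"] for r in rows})
one_hot = [[1 if r["color"] == cat else 0 for cat in categories] for r in rows]

# Feature engineering: derive a new feature (BMI) from two existing ones.
bmi = [r["weight_kg"] / r["height_m"] ** 2 for r in rows]

print(scaled_weight)
print(categories, one_hot)
print(bmi)
```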

Step 4: Data reduction

Data reduction simplifies the dataset by reducing the number of
features or records while preserving the essential information. This
helps speed up analysis and model training, ideally with little or
no loss of accuracy.

Techniques for data reduction include:

Feature selection

Principal component analysis (PCA)

Sampling methods
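
Two of these techniques can be sketched with the standard library alone. Below, feature selection is illustrated by dropping columns with zero variance (one simple criterion among many), and sampling by drawing a random subset of records. The data, the threshold, and the seed are hypothetical. PCA is left out of this sketch because it is normally done with a linear algebra library.

```python
import random
from statistics import pvariance

# Hypothetical records; column "b" is constant and carries no information.
rows = [
    {"a": 1.0, "b": 5.0, "c": 10.0},
    {"a": 2.0, "b": 5.0, "c": 20.0},
    {"a": 3.0, "b": 5.0, "c": 30.0},
    {"a": 4.0, "b": 5.0, "c": 40.0},
]

# Feature selection: keep only columns whose variance exceeds the threshold.
threshold = 0.0  # assumed cutoff for this example
kept = [col for col in rows[0] if pvariance([r[col] for r in rows]) > threshold]
reduced = [{col: r[col] for col in kept} for r in rows]

# Sampling: keep a random subset of the records (seeded for repeatability).
rng = random.Random(0)
sample = rng.sample(reduced, k=2)

print(kept)
print(sample)
```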



Feature types and data preprocessing steps
