2. Concept of a Feature
A feature is an individual measurable property within a recorded
dataset. In machine learning and statistics, features are often
called “variables” or “attributes.”
Features can be individual variables, derived variables, or
combined attributes constructed from underlying data elements.
3. Feature Data
Feature data is simply the data that is passed as input to
machine learning (ML) models.
In tabular data, feature data is the set of columns that
are used as input to the model.
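
As a rough illustration of feature columns in tabular data, the sketch below separates the input columns from the column to be predicted. The table, the column names, and the helper `split_features_target` are all hypothetical, and the idea of a separate target column is an assumption added for the example.

```python
# Hypothetical toy table: each dict is one row, each key is a column.
rows = [
    {"age": 34, "income": 52000, "city": "Pune", "bought": 1},
    {"age": 27, "income": 48000, "city": "Delhi", "bought": 0},
    {"age": 45, "income": 61000, "city": "Pune", "bought": 1},
]

feature_columns = ["age", "income", "city"]  # columns passed as input to the model
target_column = "bought"                     # assumed column the model should predict


def split_features_target(rows, feature_columns, target_column):
    """Return (X, y): X holds the feature values per row, y the target values."""
    X = [[row[col] for col in feature_columns] for row in rows]
    y = [row[target_column] for row in rows]
    return X, y


X, y = split_features_target(rows, feature_columns, target_column)
```

Here `X` is the feature data in the sense used above: one list of feature values per record.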
5. Types of Data
1. Qualitative data: descriptive data.
Qualitative data deals with characteristics and
descriptors that can’t be easily measured, but can be
observed subjectively.
2. Quantitative data: numerical information.
Quantitative data deals with numbers and things you
can measure objectively.
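
The distinction can be approximated in code by checking whether a column holds numbers. This is only a heuristic sketch with a hypothetical helper `column_kind` and made-up columns. Numeric codes such as postal codes are stored as numbers yet are really qualitative, so a check like this is a starting point rather than a rule.

```python
from numbers import Number


def column_kind(values):
    """Label a column 'quantitative' if every value is a number, else 'qualitative'."""
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if is_numeric else "qualitative"


# Hypothetical columns: one measured, one descriptive.
table = {
    "height_cm": [170.2, 165.0, 181.5],
    "eye_color": ["brown", "blue", "green"],
}
kinds = {name: column_kind(values) for name, values in table.items()}
```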
6. Preprocessing of Data
Data preprocessing is a key aspect of data
preparation. It refers to any processing applied to
raw data to prepare it for downstream tasks
such as:
Data analysis
Machine learning
Data science
AI
7. Steps in Data Preprocessing
Data preprocessing involves several steps,
each addressing specific challenges related to
data quality, structure, and relevance.
Step 1: Data cleaning
Data cleaning is the process of identifying and correcting errors or
inconsistencies in the data to ensure it is accurate and complete.
The objective is to address issues that can distort analysis or
model performance.
For example:
Handling missing values
Removing duplicates
Correcting inconsistent formats
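
The three cleaning tasks listed above can be sketched in plain Python as follows. The records, field names, and the `clean` helper are hypothetical, and filling missing ages with the mean is just one of several possible strategies, chosen here as an assumption.

```python
from statistics import mean

# Hypothetical raw records with messy formats, a duplicate, and missing ages.
raw = [
    {"name": " Alice ", "city": "PUNE", "age": 30},
    {"name": "Bob", "city": "pune", "age": None},
    {"name": "Bob", "city": "pune", "age": None},   # duplicate record
    {"name": "Cara", "city": "Delhi ", "age": 40},
]


def clean(rows):
    # 1. Correct inconsistent formats: trim whitespace, unify letter case.
    fixed = [
        {"name": r["name"].strip(), "city": r["city"].strip().title(), "age": r["age"]}
        for r in rows
    ]
    # 2. Remove duplicates, keeping the first occurrence of each record.
    seen, unique = set(), []
    for r in fixed:
        key = (r["name"], r["city"], r["age"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # 3. Handle missing values: fill missing ages with the mean of known ages.
    known = [r["age"] for r in unique if r["age"] is not None]
    fill = mean(known) if known else None
    for r in unique:
        if r["age"] is None:
            r["age"] = fill
    return unique


cleaned = clean(raw)
```

Formats are corrected before duplicates are removed so that records differing only in spacing or letter case are recognized as the same record.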
Step 2: Data integration
Data integration involves combining data from multiple sources to
create a unified dataset. This is often necessary when data is
collected from different source systems.
Some techniques used in data integration include:
Schema matching
Data deduplication
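
A minimal sketch of both techniques, assuming two hypothetical sources (`crm` and `billing`) that describe the same customers under different column names. The mappings and helper functions are invented for illustration. Real schema matching usually has to discover the mappings rather than have them written by hand.

```python
# Two hypothetical sources describing customers with different column names.
crm = [{"cust_id": 1, "full_name": "Alice"}, {"cust_id": 2, "full_name": "Bob"}]
billing = [{"customer": 2, "name": "Bob"}, {"customer": 3, "name": "Cara"}]

# Schema matching: map each source's column names onto one shared schema.
CRM_MAP = {"cust_id": "id", "full_name": "name"}
BILLING_MAP = {"customer": "id", "name": "name"}


def to_common(rows, mapping):
    """Rename the columns of every row according to the mapping."""
    return [{mapping[k]: v for k, v in row.items()} for row in rows]


def integrate(*sources):
    """Combine sources into one dataset, keeping the first record seen per id."""
    merged = {}
    for rows in sources:
        for row in rows:
            merged.setdefault(row["id"], row)  # deduplication on the shared key
    return [merged[k] for k in sorted(merged)]


unified = integrate(to_common(crm, CRM_MAP), to_common(billing, BILLING_MAP))
```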
Step 3: Data transformation
Data transformation converts data into formats suitable for analysis,
machine learning, or mining.
For example:
Scaling and normalization
Encoding categorical variables
Feature engineering and extraction
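
Two of these transformations, scaling and categorical encoding, can be sketched as below. Min-max scaling and one-hot encoding are chosen here as common representatives, not as the only options. The helper names and sample values are hypothetical.

```python
def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode each category as a 0/1 vector, one position per distinct category."""
    categories = sorted(set(values))
    vectors = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, vectors


ages = [20, 30, 40]
cities = ["Pune", "Delhi", "Pune"]

scaled = min_max_scale(ages)
categories, encoded = one_hot(cities)
```

With these inputs, `scaled` is `[0.0, 0.5, 1.0]` and each city becomes a two-element vector ordered as `["Delhi", "Pune"]`.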
Step 4: Data reduction
Data reduction simplifies the dataset by reducing the number of
features or records while preserving the essential information. This
helps speed up analysis and model training, ideally with little loss
of accuracy.
Techniques for data reduction include:
Feature selection
Principal component analysis (PCA)
Sampling methods
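
As a small sketch of two of these techniques, the code below drops features that never vary (a simple variance-based form of feature selection, chosen as an assumption) and draws a random sample of records. The helper names and data are hypothetical. PCA is not shown because it is normally done with a numerical library rather than plain Python.

```python
import random
from statistics import pvariance


def select_by_variance(table, threshold=0.0):
    """Keep only columns whose population variance exceeds the threshold."""
    return {name: col for name, col in table.items() if pvariance(col) > threshold}


def sample_rows(rows, k, seed=0):
    """Draw k distinct rows at random; the fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(rows, k)


# Hypothetical table: "constant" carries no information, "height" does.
table = {"constant": [1, 1, 1, 1], "height": [150, 160, 170, 180]}
selected = select_by_variance(table)

# Reduce 100 hypothetical records to a sample of 10.
subset = sample_rows(list(range(100)), k=10)
```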