DATA MINING - CHARACTERISTICS and APPLICATION

DATA MINING & MACHINE LEARNING
DAFFODIL INTERNATIONAL UNIVERSITY
Md. Anisur Rahman

Contents
1) Data Mining & Machine Learning
2) Data
3) Exploring Data - Visualization
4) Data Mining and ML Techniques
5) Applications
6) Summary

DATA MINING
Data mining is the process of extracting useful information from a vast amount of data.
It is used to discover new, accurate, and useful patterns in the data that carry
meaning and relevant information.
MACHINE LEARNING
Machine learning is the design, study, and development of algorithms that improve
through experience derived from data, permitting machines to learn with little
human intervention.
Both data mining and machine learning fall under the umbrella of data
science, since both work with data. Both are used for solving complex
problems, so many people (erroneously) use the two terms interchangeably.
The emphasis differs: data mining aims at discovering patterns in existing
data for people to interpret, while machine learning aims at building models
that get better at a task as they see more data.

DATA
A data set is a collection of data objects and their attributes.
• Object: a collection of attributes describes an object. Also called a record, point, case,
sample, entity, or instance.
• Attribute: a property or characteristic of an object, e.g. eye color of a person, temperature.
Also called a variable, field, characteristic, or feature.
TYPES OF ATTRIBUTES
• Nominal: zip codes, employee ID numbers, eye color, sex: {male, female}
• Ordinal: hardness of minerals, {good, better, best}, grades, street numbers
• Interval: calendar dates, temperature in Celsius or Fahrenheit
• Ratio: temperature in Kelvin, monetary quantities, counts, age, mass, length, electrical current
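The difference between interval and ratio attributes can be illustrated with temperature. The following is a minimal Python sketch, assuming two made-up readings of 10 and 20 degrees Celsius; the helper name `celsius_to_kelvin` is introduced here for illustration only. Differences agree on both scales, but a ratio is only meaningful on the Kelvin scale, which has a true zero.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert degrees Celsius to kelvin (0 K is a true zero point)."""
    return c + 273.15

# Two hypothetical readings, chosen only for illustration.
t1_c, t2_c = 10.0, 20.0
t1_k, t2_k = celsius_to_kelvin(t1_c), celsius_to_kelvin(t2_c)

# Differences are meaningful on interval and ratio scales alike.
diff_c = t2_c - t1_c
diff_k = t2_k - t1_k

# Ratios are meaningful only on the ratio scale (kelvin).
ratio_c = t2_c / t1_c  # 2.0, yet 20 C is not "twice as hot" as 10 C
ratio_k = t2_k / t1_k  # roughly 1.035

print(f"difference: {diff_c:.1f} C, {diff_k:.1f} K")
print(f"ratio: Celsius {ratio_c:.3f} (not meaningful), Kelvin {ratio_k:.3f}")
```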

IMPORTANT CHARACTERISTICS OF STRUCTURED DATA
1) Dimensionality
Dimensionality is the number of attributes (columns) in a data set. As dimensions are added,
the data become increasingly spread out in the attribute space, so objects look more and more
dissimilar and become difficult to group in a meaningful way (the "curse of dimensionality").
2) Sparsity
Sparsity describes how few of the possible values in a data set are actually present. Data
are considered sparse when most expected values are missing or zero, which is common in
large-scale data analysis.
3) Resolution
Resolution is the level of detail, i.e. the number of units or digits, at which a measured or
calculated value is expressed and used. Patterns depend on the scale: rainfall, for example,
shows different patterns when recorded per hour than per month.
4) Distribution
A distribution describes how often each value or range of values occurs in the data. It is
commonly summarized statistically or displayed graphically, e.g. as a histogram. Two familiar
shapes are the symmetrical and the skewed distribution.
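Three of the characteristics above can be measured directly. The sketch below uses only the Python standard library and entirely made-up toy data (`rows`, `incomes`); comparing mean and median is a rough rule of thumb for skew, not a formal test.

```python
from statistics import mean, median

# Hypothetical toy data set: rows are objects, columns are attributes.
# None marks a value that was expected but not recorded.
rows = [
    [1.0, None, 3.0, None],
    [2.0, None, None, None],
    [None, 5.0, 1.0, None],
]

# Dimensionality: the number of attributes (columns).
dimensionality = len(rows[0])

# Sparsity: the fraction of cells with no recorded value.
cells = [value for row in rows for value in row]
sparsity = sum(value is None for value in cells) / len(cells)

# Distribution shape: in a right-skewed sample the mean exceeds the median.
incomes = [20, 22, 25, 27, 30, 200]
right_skewed = mean(incomes) > median(incomes)

print(dimensionality, round(sparsity, 2), right_skewed)
```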

TYPES OF DATA SETS
Record
• Data Matrix
• Document Data
• Transaction Data
Graph
• World Wide Web
• Molecular Structures
Ordered
• Spatial Data
• Temporal Data
• Genetic Sequence Data, etc.

DATA QUALITY
Noise and Outliers
• Noise refers to the modification of original values, e.g. by measurement or recording errors.
• Outliers are data objects whose characteristics differ considerably from those of most other data
objects in the data set.
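One common way to flag outliers in a single numeric attribute is the interquartile range (IQR) rule. This is a minimal sketch, not the only possible method; the function name `iqr_outliers`, the factor `k=1.5`, and the sample readings are assumptions made for illustration.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sensor readings with one suspicious value.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]
print(iqr_outliers(readings))
```

Whether a flagged value is an error or a genuinely interesting object still has to be decided by looking at the data.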
Missing Values
• Information was not collected
• Attributes may not be applicable to all cases
• Missing values can be handled by eliminating the affected objects or attributes, or by filling them in with a statistical estimate such as the mean or median
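The two ways of handling missing values can be sketched in a few lines of Python. The `ages` list is hypothetical, and mean imputation is only one of several statistical fill-in strategies.

```python
from statistics import mean

# Hypothetical ages; None marks a value that was not collected.
ages = [23, None, 31, 27, None, 35]

# Option 1: eliminate the objects whose value is missing.
dropped = [a for a in ages if a is not None]

# Option 2: fill each gap with the mean of the observed values.
fill = mean(dropped)
imputed = [fill if a is None else a for a in ages]

print(dropped)
print(imputed)
```

Dropping keeps only values that were really observed but shrinks the data set; imputing keeps every object but introduces estimated values.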
Duplicate Data
• A data set may include data objects that are duplicates, or near-duplicates, of one another.
• This is a major issue when merging data from heterogeneous sources.
• Data cleaning can detect and remove duplicated data.
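A simple data cleaning step for duplicates is to normalize each record into a comparison key and keep only the first record per key. The sketch below assumes records are (name, email) pairs; the helpers `normalize` and `deduplicate` and the sample records are hypothetical.

```python
def normalize(record):
    """Build a comparison key: whitespace-collapsed, lower-cased name and email."""
    name, email = record
    return (" ".join(name.lower().split()), email.strip().lower())

def deduplicate(records):
    """Keep the first record seen for each normalized key."""
    seen = set()
    unique = []
    for record in records:
        key = normalize(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Hypothetical customer records merged from two sources.
merged = [
    ("Ada Lovelace", "ada@example.com"),
    ("ada  lovelace ", "ADA@example.com"),  # near-duplicate of the first
    ("Alan Turing", "alan@example.com"),
]
print(deduplicate(merged))
```

Real near-duplicates (typos, abbreviations) usually need fuzzier matching than this exact comparison of normalized keys.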

DATA VISUALIZATION
Data visualization is the graphical representation of information and data. By using visual elements like charts,
graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and
patterns in data. Data visualization tools and technologies are essential to analyze massive amounts of
information and make data-driven decisions.
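Even a very small chart makes an outlier visible at a glance. The following sketch draws a text histogram with the Python standard library only; the function `text_histogram`, the bin width, and the exam scores are assumptions for illustration, and a real project would more likely use a plotting library.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Return one line per bin, with one '#' per integer value in that bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    # Walk every bin from the lowest to the highest so empty bins show as gaps.
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        end = start + bin_width - 1
        lines.append(f"{start:3d}-{end:3d} | {'#' * counts[start]}")
    return lines

# Hypothetical exam scores; the isolated bar at the low end stands out.
scores = [12, 55, 58, 61, 63, 64, 67, 70, 72, 75, 78, 81]
for line in text_histogram(scores, bin_width=10):
    print(line)
```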

APPLICATIONS
• Market Basket Analysis
• Education
• Manufacturing Engineering
• Research Analysis
• Fraud Detection
• Digital Media & Entertainment
• Manufacturing & Automobile
• E-Commerce & CRM
• Healthcare
