DATA MINING
&
MACHINE LEARNING
DAFFODIL INTERNATIONAL UNIVERSITY
Md. Anisur Rahman
Contents
1) Data Mining & Machine Learning
2) Data
3) Exploring Data - Visualization
4) Data Mining and ML Techniques
5) Applications
6) Summary
DATA MINING
Data mining is the process of extracting useful information from a vast amount
of data. It is used to discover new, accurate, and useful patterns in the data,
looking for meaning and relevant information.
MACHINE LEARNING
Machine learning is the design, study, and development of algorithms that
improve through experience derived from data, which lets machines learn
without being explicitly programmed for each task.
Both data mining and machine learning fall under the umbrella of data
science, since both rely on data. Both are used to solve complex problems,
so many people (erroneously) use the two terms interchangeably.
DATA
A data set is a collection of data objects and their attributes.
Object: a collection of attributes describes an object.
- Also known as a record, point, case, sample, entity, or instance
Attribute: a property or characteristic of an object.
- Examples: eye color of a person, temperature
- Also known as a variable, field, characteristic, or feature
TYPES OF ATTRIBUTES
Nominal: zip codes, employee ID numbers, eye color, sex: {male, female}
Ordinal: hardness of minerals, {good, better, best}, grades, street numbers
Interval: calendar dates, temperature in Celsius or Fahrenheit
Ratio: temperature in Kelvin, monetary quantities, counts, age, mass, length, electrical current
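The difference between interval and ratio attributes is easiest to see with temperature. The sketch below is illustrative only: the sample values and the helper name `celsius_to_kelvin` are assumptions, not part of the original slides. It shows that differences survive a change of unit, while ratios are meaningful only on a scale with a true zero.

```python
# Illustrative values only; celsius_to_kelvin is a hypothetical helper.
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15

cold_c, warm_c = 10.0, 20.0
cold_k, warm_k = celsius_to_kelvin(cold_c), celsius_to_kelvin(warm_c)

# Interval scale: differences are meaningful and agree across both units.
diff_c = warm_c - cold_c
diff_k = warm_k - cold_k

# Ratio scale: ratios need a true zero, which Kelvin has and Celsius lacks.
ratio_c = warm_c / cold_c   # 2.0, but 20 C is not "twice as hot" as 10 C
ratio_k = warm_k / cold_k   # about 1.035, the physically meaningful ratio

print(f"difference: {diff_c:.2f} C vs {diff_k:.2f} K")
print(f"ratio: {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

The same reasoning explains why calendar dates are interval attributes (subtracting two dates makes sense, dividing them does not) while counts and lengths are ratio attributes.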
IMPORTANT CHARACTERISTICS OF STRUCTURED DATA
1) Dimensionality
Dimensionality is the number of attributes (columns) in a data set. As the number of dimensions
grows, the data becomes much harder to analyze: objects end up far apart from one another, so
it is difficult to group them in a meaningful way.
2) Sparsity
Sparsity describes how little data is present for a particular dimension or entity of the
model. Data is considered sparse when many expected values in a data set are missing, which
is common in large-scale data analysis.
3) Resolution
Resolution is the number of units or digits to which a measured or calculated value is
expressed and used. Patterns depend on the scale: the pattern seen in rainfall, for example,
depends on the time period over which it is recorded.
4) Distribution
A data distribution describes how the values of an attribute are spread across their range. It
is often displayed graphically to organize and summarize the data. There are several types of
distribution; the most familiar are symmetrical and skewed distributions.
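Three of these characteristics can be measured directly. The following is a minimal sketch on a made-up table: the rows, the column names, and the choice to measure sparsity as the fraction of missing cells are all assumptions for illustration.

```python
from statistics import mean, pstdev

# Hypothetical toy data set: each row is an object, each key an attribute.
# None marks a missing value.
rows = [
    {"age": 23, "income": 30_000, "purchases": None},
    {"age": 35, "income": 42_000, "purchases": 3},
    {"age": 41, "income": None, "purchases": None},
    {"age": 29, "income": 38_000, "purchases": 1},
    {"age": 52, "income": 150_000, "purchases": None},
]
columns = list(rows[0].keys())

# Dimensionality: the number of attributes (columns).
dimensionality = len(columns)

# Sparsity, taken here as the fraction of cells that hold no value.
total_cells = len(rows) * dimensionality
missing_cells = sum(1 for row in rows for col in columns if row[col] is None)
sparsity = missing_cells / total_cells

def skewness(values):
    """Population skewness (mean cubed z-score); positive means a longer right tail."""
    mu, sigma = mean(values), pstdev(values)
    return mean(((v - mu) / sigma) ** 3 for v in values)

# Distribution shape of one attribute, ignoring missing values.
incomes = [row["income"] for row in rows if row["income"] is not None]
income_skew = skewness(incomes)

print(f"dimensionality={dimensionality}, sparsity={sparsity:.2f}, skew={income_skew:.2f}")
```

A skewness near zero suggests a roughly symmetrical distribution; here the single large income pulls the value above zero, which indicates a right-skewed distribution.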
TYPES OF DATA SETS
Record
• Data Matrix
• Document Data
• Transaction Data
Graph
• World Wide Web
• Molecular Structures
Ordered
• Spatial Data
• Temporal Data
• Genetic Sequence etc.
DATA QUALITY
Noise and Outliers
• Noise refers to the modification of original values.
• Outliers are data objects whose characteristics differ considerably from those of most other data
objects in the data set.
Missing Values
• Information was not collected.
• Attributes may not be applicable to all cases.
• Missing values can be handled by eliminating the affected data objects or by filling the gaps with a statistical estimate such as the mean.
Duplicate Data
• A data set may include data objects that are duplicates, or near duplicates, of one another.
• This is a major issue when merging data from heterogeneous sources.
• Data cleaning can resolve duplicated data.
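The cleaning steps above can be sketched in a few lines. This is a minimal illustration on invented records, not a full cleaning pipeline: the field names and values are assumptions, and only exact duplicates are handled (near duplicates need domain-specific matching).

```python
from statistics import mean

# Hypothetical records; None marks a value that was not collected.
records = [
    {"id": 1, "city": "Dhaka", "temp": 31.0},
    {"id": 2, "city": "Khulna", "temp": None},
    {"id": 3, "city": "Sylhet", "temp": 27.0},
    {"id": 1, "city": "Dhaka", "temp": 31.0},  # exact duplicate of the first record
]

# Step 1: remove exact duplicates, keeping the first occurrence and the original order.
seen = set()
deduplicated = []
for record in records:
    key = tuple(sorted(record.items()))
    if key not in seen:
        seen.add(key)
        deduplicated.append(record)

# Step 2, option A: eliminate data objects that have a missing value.
complete_only = [r for r in deduplicated if r["temp"] is not None]

# Step 2, option B: fill the gap with a statistic, here the mean of the known values.
fill_value = mean(r["temp"] for r in complete_only)
filled = [
    {**r, "temp": fill_value if r["temp"] is None else r["temp"]}
    for r in deduplicated
]

print(len(records), len(deduplicated), len(complete_only), fill_value)
```

Duplicates are removed first so that repeated records do not bias the mean used for filling. Which missing-value option is appropriate depends on how much data can be lost and on whether the mean is a sensible stand-in for the attribute.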
DATA PREPROCESSING
DATA VISUALIZATION
Data visualization is the graphical representation of information and data. By using visual elements like charts,
graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and
patterns in data. Data visualization tools and technologies are essential for analyzing massive amounts of
information and making data-driven decisions.
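Even a very simple chart makes an outlier visible at a glance. The sketch below builds a text histogram using only the standard library. The rainfall readings and the function name `text_histogram` are invented for illustration, and a real project would more likely use a plotting library.

```python
from collections import Counter

# Hypothetical daily rainfall readings in millimetres; 95 is a deliberate outlier.
rainfall_mm = [2, 4, 5, 7, 8, 9, 11, 12, 14, 95]

def text_histogram(values, bin_width):
    """Return one line per non-empty bin, with one '#' per value in that bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(counts):
        label = f"{start:>3}-{start + bin_width - 1:<3}"
        lines.append(f"{label} {'#' * counts[start]}")
    return lines

histogram = text_histogram(rainfall_mm, bin_width=10)
print("\n".join(histogram))
```

Most readings pile up in the first two bins, while the single bar far to the right exposes the outlier that a table of raw numbers would hide.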
TECHNIQUES
APPLICATIONS
• Market Basket Analysis
• Education
• Manufacturing Engineering
• Research Analysis
• Fraud Detection
APPLICATIONS
• Market Basket Analysis
• Digital Media & Entertainment
• Manufacturing & Automobile
• E-Commerce & CRM
• Healthcare
DATA MINING - CHARACTERISTICS and APPLICATION
THANK YOU
