Nandini Patil
Assistant Professor
Godutai Engg. College, Kalaburagi
Data Mining
Introduction
• Data mining is the art and science of discovering knowledge, insights, and patterns in data.
• It is the act of extracting useful patterns from an organized collection of data.
• Patterns must be valid, novel, potentially useful, and understandable.
• Data mining is a multidisciplinary field that borrows techniques from a variety of fields.
• It uses knowledge of data quality and data organization from the database area.
Cont…
• It draws:
  • Analytical techniques from statistics and computer science
  • Knowledge of decision making from the field of business management
• Example: customers who buy cheese and milk also buy bread.
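The cheese, milk, and bread example is an association rule. A minimal sketch of how such a rule could be scored with support and confidence is given here. The transactions, the extra item "eggs", and the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy transactions; the baskets are illustrative only.
transactions = [
    {"cheese", "milk", "bread"},
    {"cheese", "milk", "bread", "eggs"},
    {"cheese", "milk"},
    {"milk", "bread"},
    {"cheese", "bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of the full rule divided by support of the antecedent."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


# Rule: {cheese, milk} -> {bread}
rule_support = support({"cheese", "milk", "bread"}, transactions)
rule_confidence = confidence({"cheese", "milk"}, {"bread"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
# prints: support=0.40, confidence=0.67
```

In this toy data, two of the five baskets contain all three items, and two of the three baskets with cheese and milk also contain bread. The sketch assumes the antecedent appears in at least one transaction; otherwise the division would fail.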
Gathering & selecting data
• Data is growing in velocity, volume, and variety.
• To learn from data, quality data needs to be effectively gathered and organized, and then sufficiently mined.
• Gathering and curating data takes time and effort when the data is unstructured or semi-structured.
• Knowledge of the business domain helps to select the right streams of data for pursuing new insights.
Data cleaning & preparation
• Duplicate data needs to be removed.
• Missing values need to be filled in.
• Data elements should be comparable.
• Continuous values may need to be binned.
• Outlier data elements need to be removed.
• Ensure that the data is representative of the phenomena.
• Data may need to be selected to increase information density.
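Several of the cleaning steps above can be sketched in a few lines of standard-library Python. The records, the outlier cutoff of 1000.0, and the age bins are assumptions made up for this example, not values from the slides. Filling with the mean is only one of several possible fill strategies.

```python
from statistics import mean

# Hypothetical customer records: (customer_id, age, monthly_spend).
records = [
    (1, 34, 120.0),
    (2, 45, None),      # missing spend
    (1, 34, 120.0),     # exact duplicate of the first record
    (3, 29, 95.0),
    (4, 52, 5000.0),    # outlier under the assumed cutoff
    (5, 61, 110.0),
]

MAX_SPEND = 1000.0  # assumed outlier cutoff, chosen for illustration


def remove_duplicates(rows):
    """Keep the first occurrence of each identical row, preserving order."""
    return list(dict.fromkeys(rows))


def remove_outliers(rows, max_spend):
    """Drop rows whose known spend exceeds max_spend; keep missing values."""
    return [r for r in rows if r[2] is None or r[2] <= max_spend]


def fill_missing_spend(rows):
    """Replace a missing spend with the mean of the known spends."""
    fallback = mean(spend for _, _, spend in rows if spend is not None)
    return [(cid, age, fallback if spend is None else spend)
            for cid, age, spend in rows]


def bin_age(age):
    """Map a continuous age onto one of three assumed bins."""
    if age < 35:
        return "young"
    if age < 55:
        return "middle"
    return "senior"


# Outliers are dropped before filling so they do not distort the mean.
rows = remove_duplicates(records)
rows = remove_outliers(rows, MAX_SPEND)
rows = fill_missing_spend(rows)
cleaned = [(cid, bin_age(age), spend) for cid, age, spend in rows]
print(cleaned)
```

The order of the steps is a design choice: removing the outlier first keeps the filled-in value close to the typical spend.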
Outputs of data mining
• Data mining outputs serve different types of objectives.
• Data mining outputs include:
  • Decision trees
  • Regression equations or mathematical functions
  • Business rules
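Evaluating data mining results
Predictive accuracy = correct predictions / total predictions
Predictive accuracy = (TP + TN) / (TP + TN + FP + FN)
where TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives.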
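As a worked instance of the accuracy formula above, the following sketch counts the four outcome types for a binary classifier and computes the ratio. The labels are hypothetical, and the function name `predictive_accuracy` is an assumption for this example.

```python
def predictive_accuracy(actual, predicted, positive=1):
    """Return (accuracy, (tp, tn, fp, fn)) for two equal-length label lists."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1          # predicted negative, actually negative
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        else:
            fn += 1          # predicted negative, actually positive
    total = tp + tn + fp + fn
    return (tp + tn) / total, (tp, tn, fp, fn)


# Hypothetical labels: 1 = positive class, 0 = negative class.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

accuracy, counts = predictive_accuracy(actual, predicted)
print(counts, accuracy)
# prints: (3, 3, 1, 1) 0.75
```

Six of the eight predictions match the actual labels, so the accuracy is 6 / 8 = 0.75. The sketch assumes at least one prediction is supplied.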
Data mining techniques
• Decision trees
• Regression
• Artificial neural networks
• Cluster analysis
• Association rules
Tools & platforms for data mining
• Simple or sophisticated
• Standalone or embedded
• Open source or commercial
• User interface
• Data formats
Comparison of popular data mining platforms
Data mining best practices
• Business understanding
• Data understanding
• Data preparation
• Modeling
• Model evaluation
• Dissemination and rollout
CRISP-DM data mining cycle
Myths about Data Mining
• Myth #1: Data mining is about algorithms
• Myth #2: Data mining is about predictive accuracy
• Myth #3: Data mining requires a data warehouse
• Myth #4: Data mining requires large quantities of data
• Myth #5: Data mining requires a technology expert
Data mining mistakes
• Mistake #1: Selecting the wrong problem for data mining
• Mistake #2: Getting buried under mountains of data without clear metadata
• Mistake #3: Disorganized data mining
• Mistake #4: Insufficient business knowledge
• Mistake #5: Incompatibility of data mining tools and datasets
• Mistake #6: Looking only at aggregated results and not at individual records or predictions
• Mistake #7: Not measuring your results differently from the way your sponsor measures them