1 I NAME OF PRESENTER
Data Mining
Ashis Kumar Chanda
Department of Computer Science and Engineering
University of Dhaka
Key concepts
• What is data mining
• Why learn data mining
• Data types
• Warehouse & OLAP
• Data cleaning, integration
• Associations, item sets, support, confidence
Data Mining
• Data mining refers to mining knowledge from large amounts of data
• Also known as “Knowledge Discovery from Data” (KDD)
• The goal is to find hidden patterns
Why learn data mining
• We cannot get every type of information through queries
• Queries do not support statistical analysis
• With data mining, we can apply artificial intelligence and find new patterns or structures
Queries return values, but data mining provides insights that help in making (business) decisions
Ex: Women who live in “Dhanmondi” and are older than 40 years most frequently buy “Jamdani Shari” at “Arong”
Data type
• Tabular (transaction data): most commonly used
• Spatial data (remote sensing data / encoded data)
• Tree data (XML)
• Graphs (WWW, bio-molecular)
• Sequence (DNA, activity log)
• Text, multimedia data
Warehouse & OLAP
Fig: Warehouse and Data Source
A warehouse is an archive of information gathered from multiple sources.
Consider a banking database in which each area has a data source that stores all transactions of that area. Each data source provides a clean/safe copy to the warehouse.
Warehouse & OLAP
There are several issues concerning a warehouse:
• When and how to gather data
• What schema/pattern to use
• Data transformation & cleaning
• How to update
“A warehouse is a collection of data marts”,
where a data mart is a store of data in a specialized pattern
Warehouse & OLAP
OLAP: Online Analytical Processing
OLAP tools support interactive analysis of summary information
OLAP permits an analyst to view different summaries of multidimensional data
Fig: Data Cube (axis label: Item name, e.g. Dress)
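As a rough illustration of viewing different summaries of the same multidimensional data, the sketch below rolls a small fact table up along different dimensions. The records, the dimension names (item, city, quarter) and the roll_up helper are all hypothetical, invented for this example; real OLAP tools work differently in detail.

```python
from collections import defaultdict

# Hypothetical sales facts: (item, city, quarter, units_sold)
SALES = [
    ("Dress", "Dhaka", "Q1", 10),
    ("Dress", "Dhaka", "Q2", 5),
    ("Dress", "Khulna", "Q1", 7),
    ("Shirt", "Dhaka", "Q1", 3),
    ("Shirt", "Khulna", "Q2", 8),
]
DIMENSIONS = ("item", "city", "quarter")


def roll_up(facts, keep):
    """Sum units_sold over every dimension that is not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[3]
    return dict(totals)


# Three different summaries of the same cube.
by_item = roll_up(SALES, ["item"])
by_item_city = roll_up(SALES, ["item", "city"])
grand_total = roll_up(SALES, [])
```

Each call keeps a different subset of dimensions, which corresponds to looking at a different face or slice of the cube.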
Data cleaning
There may be missing data, duplicate data, or dirty data, so we need data cleaning.
Some methods:
• Ignore the tuple (not effective unless the tuple contains many missing attributes)
• Fill in missing values manually (time consuming)
• Fill with a global constant (like: unknown)
• Use the attribute mean
• Use the most probable value
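The methods listed above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the records and function names are hypothetical, None stands for a missing value, and "most probable value" is approximated here by the most frequent known value, which is a simple stand-in rather than a model-based estimate.

```python
from collections import Counter
from statistics import mean

# Hypothetical records; None marks a missing value.
ROWS = [
    {"age": 30, "city": "Dhaka"},
    {"age": None, "city": "Dhaka"},
    {"age": 50, "city": None},
    {"age": 40, "city": "Khulna"},
]


def drop_incomplete(rows):
    """Ignore (drop) every tuple that has at least one missing attribute."""
    return [r for r in rows if all(v is not None for v in r.values())]


def fill_constant(rows, attr, value="unknown"):
    """Replace missing values of `attr` with one global constant."""
    return [{**r, attr: value if r[attr] is None else r[attr]} for r in rows]


def fill_mean(rows, attr):
    """Replace missing numeric values with the mean of the known ones."""
    m = mean(r[attr] for r in rows if r[attr] is not None)
    return [{**r, attr: m if r[attr] is None else r[attr]} for r in rows]


def fill_most_frequent(rows, attr):
    """Replace missing values with the most frequent known value."""
    known = Counter(r[attr] for r in rows if r[attr] is not None)
    top = known.most_common(1)[0][0]
    return [{**r, attr: top if r[attr] is None else r[attr]} for r in rows]
```

Every function returns new records and leaves ROWS unchanged, so the different strategies can be compared on the same input.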
Associations & Item sets
Associations:
An association is a rule of the form “if X then Y”
It is denoted as X → Y
Example: if there is an exam, then I read
Item Sets:
If both X → Y and Y → X hold, then X and Y are called an item-set
Example:
People buying school books in January also buy notebooks
People buying school notebooks in January also buy books
Support & confidence
Support:
The proportion of transactions in the data set that contain the itemset
Confidence:
The conditional probability that an item appears in a transaction given that another item appears in it.
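In symbols, the two definitions can be written as follows, where D is the set of transactions and X, Y are itemsets. The names supp and conf are notation assumed here for illustration; the rightmost form shows why the conditional probability reduces to a ratio of transaction counts.

```latex
\[
\operatorname{supp}(X) = \frac{|\{\, t \in D : X \subseteq t \,\}|}{|D|},
\qquad
\operatorname{conf}(X \rightarrow Y) = P(Y \mid X)
  = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}
  = \frac{\text{support\_count}(X \cup Y)}{\text{support\_count}(X)}
\]
```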
Support & confidence
Support for {I1, I2}
= support_count(I1 ∪ I2) / |D|
= 4/9
Confidence for I1 → I2
= support_count(I1 ∪ I2) / support_count(I1)
= 4/6
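The transaction table behind these numbers is not reproduced in the text. The sketch below therefore uses an assumed set of nine transactions, chosen so that the counts match the slide (|D| = 9, support_count(I1) = 6, support_count(I1 ∪ I2) = 4). The helper names are hypothetical.

```python
from fractions import Fraction

# Assumed data set: nine transactions consistent with the counts on the slide.
D = [
    {"I1", "I2", "I5"},
    {"I2", "I4"},
    {"I2", "I3"},
    {"I1", "I2", "I4"},
    {"I1", "I3"},
    {"I2", "I3"},
    {"I1", "I3"},
    {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return Fraction(support_count(itemset, transactions), len(transactions))


def confidence(antecedent, consequent, transactions):
    """support_count(antecedent U consequent) / support_count(antecedent)."""
    both = set(antecedent) | set(consequent)
    return Fraction(support_count(both, transactions),
                    support_count(antecedent, transactions))


s = support({"I1", "I2"}, D)        # equals 4/9
c = confidence({"I1"}, {"I2"}, D)   # equals 4/6 (Fraction stores it as 2/3)
```

Fraction is used instead of floating point so the results can be compared exactly with the hand-computed values.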
Association rules
confidence(A → B) = support_count(A ∪ B) / support_count(A)
where support_count(A ∪ B) is the number of transactions containing the itemset A ∪ B, and support_count(A) is the number of transactions containing the itemset A.
• Association rules can be generated as follows:
1. For each frequent itemset l, generate all nonempty subsets of l.
2. For every nonempty subset s of l, output the rule “s → (l − s)” if support_count(l) / support_count(s) >= min_conf, where min_conf is the minimum confidence threshold.
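The two steps above can be sketched as follows. This is a minimal sketch, not a full mining algorithm: it takes one frequent itemset as given, the nine transactions are an assumed example data set, and generate_rules is a hypothetical helper name. It enumerates only proper nonempty subsets so that the right-hand side l − s is never empty.

```python
from itertools import combinations

# Assumed example data set of nine transactions.
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def generate_rules(frequent_itemset, transactions, min_conf):
    """Return a list of (s, l - s, confidence) for rules meeting min_conf."""
    l = frozenset(frequent_itemset)
    count_l = support_count(l, transactions)
    rules = []
    # Step 1: every nonempty proper subset s of l.
    for size in range(1, len(l)):
        for subset in combinations(sorted(l), size):
            s = frozenset(subset)
            count_s = support_count(s, transactions)
            if count_s == 0:
                continue  # s never occurs, so no rule can be formed
            # Step 2: keep s -> (l - s) if its confidence is high enough.
            conf = count_l / count_s
            if conf >= min_conf:
                rules.append((s, l - s, conf))
    return rules


rules = generate_rules({"I1", "I2", "I5"}, D, min_conf=0.7)
```

With this data and threshold, only the rules whose left-hand side contains I5 survive, because every transaction containing I5 also contains I1 and I2.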
Summary
Basic topics: Data mining, Data cleaning, Warehouse, OLAP
Terms: Association, Item-set, Support, Confidence
References
- Data Mining: Concepts and Techniques
by J. Han & M. Kamber
- Database System Concepts
by Abraham Silberschatz, Korth, Sudarshan
- Lecture of Dr. S. Srinath
Institute of Technology at Madras, India
