Department of Information Technology, Database Technologies (ITB4201)
Dr. C.V. Suresh Babu
Professor
Department of IT
Hindustan Institute of Science & Technology
Introduction to Data Mining
& Data Warehousing
Action Plan
• Introduction
• Objectives
• What is Data Mining?
• Data Mining Applications
• Data Warehousing
• Advantages and disadvantages
• Trends and Current Issues
• Future Research Possibilities
Introduction
What is Data Mining?
Data Mining is the process of collecting large amounts of raw data and
transforming that data into useful information.
What is Data Warehousing?
A Data Warehouse is a computerized collection of data, gathered from many sources and organized so that it can be analyzed and mined.
Objectives
• Explore the business applications of data mining &
warehousing
• Explain the advantages & disadvantages
• Uncover software used in data mining.
• Find what data mining is used for.
• Discover current trends, regulation, and future uses of the
technology.
What is Data Mining?
Data mining is the practice of
searching through large amounts
of computerized data to find useful
patterns or trends (American
Heritage Dictionary, 2008).
Data Mining Applications
• Banking
– Detect fraudulent activity
• Insurance
– Risk assessment
• Medicine/Healthcare
– Enhance research
• Retail
– Track consumer buying trends
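As a rough illustration of the retail case, the sketch below counts which pairs of items are bought together most often. The baskets and item names are invented for the example, and `pair_counts` is a hypothetical helper, not part of any particular mining tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; the items are made up for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"bread", "eggs", "milk"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives every pair a single canonical ordering
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
# The most frequent pair and the number of baskets it appears in
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

A retailer might use counts like these as a first hint about which products to shelve or promote together. Real buying-trend analysis would work on far larger data and more careful measures.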
Cross-Industry Standard Process for Data Mining (CRISP-DM)
- Understanding the business
- Understanding the data
- Data preparation
- Modeling
- Evaluation
- Deployment
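One way to picture the six phases is as an ordered sequence with a loop back when evaluation fails. The sketch below is only an illustration: the `Phase` enum and `next_phase` helper are hypothetical names, and the single loop-back rule is a simplifying assumption about how the process iterates.

```python
from enum import IntEnum

class Phase(IntEnum):
    """The six CRISP-DM phases, numbered in their usual order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6

def next_phase(current, evaluation_passed=True):
    """Return the phase that follows `current`, or None after deployment.

    Assumption for this sketch: a failed evaluation sends the project
    back to business understanding.
    """
    if current is Phase.EVALUATION and not evaluation_passed:
        return Phase.BUSINESS_UNDERSTANDING
    if current is Phase.DEPLOYMENT:
        return None
    return Phase(current + 1)

# Walk through one pass of the process in order
phase = Phase.BUSINESS_UNDERSTANDING
while phase is not None:
    print(phase.name)
    phase = next_phase(phase)
```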
What is a Data Warehouse?
A Practitioner's Viewpoint
“A data warehouse is simply a single, complete, and consistent
store of data obtained from a variety of sources and made
available to end users in a way they can understand and use it in
a business context.”
-- Barry Devlin, IBM Consultant
Data Mining
Advantages
• Improves customer satisfaction and service
• Saves time and money
• Increases sales effectiveness
• Increases profitability
Data Mining
Disadvantages
–Requires skilled technical users to interpret and analyze data
from the warehouse
–Validity of the patterns is not guaranteed
• Patterns must be checked against real-world circumstances
–Unable to identify causal relationships
–Reserved for the few instead of the many
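The point about causal relationships can be made concrete with a small, invented example. In the sketch below, two made-up series (ice cream sales and sunburn cases) are both driven by temperature, so they correlate almost perfectly even though neither causes the other. All numbers are hypothetical, and `pearson` is a hand-written helper for the standard Pearson correlation coefficient.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical data: temperature is the hidden common cause.
temperature = [20, 25, 30, 35]
ice_cream_sales = [2 * t for t in temperature]   # rises with temperature
sunburn_cases = [t - 15 for t in temperature]    # also rises with temperature

r = pearson(ice_cream_sales, sunburn_cases)
print(round(r, 3))
```

A mining tool would report this strong pattern, but deciding whether it reflects cause and effect still takes domain knowledge.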
A Data Warehouse is...
• Stored collection of diverse data
– A solution to data integration problem
– Single repository of information
• Subject-oriented
– Organized by subject, not by application
– Used for analysis, data mining, etc.
• Optimized differently from transaction-oriented databases
• User interface aimed at executives
A Data Warehouse is... (continued)
• Large volume of data (GB, TB)
• Non-volatile
– Historical
– Time attributes are important
• Updates infrequent
• May be append-only
• Examples
– All transactions ever at Walmart
– Complete client histories at an insurance firm
– Stockbroker financial information and portfolios
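The properties listed above (subject-oriented, historical, time attributes, append-only) can be sketched with a tiny in-memory table. This is a minimal illustration using Python's built-in `sqlite3` module. The `sales_fact` table, its columns, and the sample rows are assumptions made up for the example, and a real warehouse would be far larger and use a dedicated platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table organised around a subject (sales), with a time attribute.
conn.execute(
    """
    CREATE TABLE sales_fact (
        sale_date TEXT NOT NULL,   -- time attribute, ISO date string
        store     TEXT NOT NULL,
        product   TEXT NOT NULL,
        amount    REAL NOT NULL
    )
    """
)

def append_sales(rows):
    """Append new facts; this sketch never updates or deletes old rows."""
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()

# Hypothetical history, loaded in two batches
append_sales([
    ("2023-01-15", "north", "bread", 2.50),
    ("2023-01-15", "south", "milk", 1.25),
    ("2023-02-03", "north", "bread", 2.50),
])
append_sales([("2023-02-10", "south", "bread", 2.75)])

# Analytical query over the history: total sales per month
monthly = conn.execute(
    "SELECT substr(sale_date, 1, 7) AS month, SUM(amount) "
    "FROM sales_fact GROUP BY month ORDER BY month"
).fetchall()
print(monthly)
```

Because rows are only ever appended, the same query can be rerun later and earlier months will still report the same totals.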
Data Warehousing
Advantages
–Improves access to information
–Reduces data inconsistency
–Decreases computing cost
–Increases productivity
–Increases company profits
Data Warehousing
Disadvantages
–Data must be extracted, cleaned, and loaded
• Roughly 80% of the overall process
–User variability
• Requires proper training
–Difficult to maintain
• Incongruence among systems
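The extract, clean, and load work mentioned above might look like the following in miniature. The two source systems, their field names, and the sample records are all hypothetical. The sketch only shows why mismatched source formats make this step laborious.

```python
# Hypothetical operational sources with inconsistent field names and formats.
crm_rows = [{"name": " Alice ", "city": "chennai"}, {"name": "Bob", "city": None}]
billing_rows = [{"customer": "ALICE", "town": "Chennai"}]

def extract():
    """Pull raw records from each source into one common shape."""
    for row in crm_rows:
        yield {"name": row["name"], "city": row["city"]}
    for row in billing_rows:
        yield {"name": row["customer"], "city": row["town"]}

def clean(records):
    """Drop records with missing fields; trim whitespace and normalise case."""
    for rec in records:
        if not rec["name"] or not rec["city"]:
            continue
        yield {
            "name": rec["name"].strip().title(),
            "city": rec["city"].strip().title(),
        }

def load(records, warehouse):
    """Append the cleaned records to the warehouse (a plain list here)."""
    warehouse.extend(records)
    return warehouse

warehouse = load(clean(extract()), [])
print(warehouse)
```

After cleaning, the two Alice records look identical. Spotting and merging such duplicates is a separate data-quality step that this sketch leaves out.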
Current Issues
Data Quality
– Duplicated records
– Lack of Data Standards
– Human Error
Interoperability
– Lack of communications among existing systems
Mission Creep
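The duplicated-records problem under Data Quality is often tackled by comparing records on a normalised key. The sketch below assumes records with `name` and `email` fields and treats two records as duplicates when those fields match after lower-casing and collapsing whitespace. The field names and the matching rule are illustrative assumptions, not a standard.

```python
def normalise(record):
    """Build a comparison key: lower-case text with whitespace collapsed."""
    return tuple(
        " ".join(str(record[field]).lower().split())
        for field in ("name", "email")
    )

def deduplicate(records):
    """Keep the first record seen for each normalised key."""
    seen = set()
    unique = []
    for rec in records:
        key = normalise(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

# Hypothetical records: the first two differ only in case and spacing.
records = [
    {"name": "Alice Rao", "email": "alice@example.com"},
    {"name": "alice  rao", "email": "ALICE@example.com"},
    {"name": "Bob Lee", "email": "bob@example.com"},
]
unique = deduplicate(records)
print(len(unique))
```

Agreeing on one normalisation rule across systems is itself a small data standard, which is why the lack of standards and duplicated records tend to appear together.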
Trends & Current Issues
• 4 Major Trends
– Data: a growing amount is collected to be sifted
– Hardware: growing performance and storage
– Scientific Computing: theory, experiment, simulation
– Business: meet a higher standard in order to foresee risks, opportunities, and benefits for the company
• The field is growing quickly, with new methodologies frequently discovered
• Methods are being applied in medicine, marketing, operations, and other fields
• The government is closely reviewing the uses of data mining, because of its potential for both good and bad
• Counterterrorism data mining has been done, but in some instances has been deemed a violation of privacy
Future Research Possibilities
• Government’s uses for data mining.
–National Security
–Terrorism Detection
• Identity theft through data mining.
Conclusion/Analysis
• Data mining is the extraction of information that can
predict future trends & behaviors
• Requires a large amount of data to be collected and then
stored in a data warehouse
• Possible violation of privacy in some circumstances
• Government is getting involved with regulation, despite
the counterterrorism program being a possible violation of privacy
Test Yourself
1. What is true about data mining?
A. Data Mining is defined as the procedure of extracting information from huge sets of data
B. Data mining also involves other processes such as Data Cleaning, Data Integration, Data Transformation
C. Data mining is the procedure of mining knowledge from data.
D. All of the above
2. A goal of data mining includes which of the following?
A. To explain some observed event or condition
B. To confirm that data exists
C. To analyze data for expected relationships
D. To create a new data warehouse
3. A data warehouse is which of the following?
A. Can be updated by end users.
B. Contains numerous naming conventions and formats.
C. Organized around important subject areas.
D. Contains only current data.
4. Which of the following features usually applies to data in a data warehouse?
A. Data are often deleted
B. Most applications consist of transactions
C. Data are rarely deleted
D. Relatively few records are processed by applications
5. Which of the following statements is true?
A. The data warehouse consists of data marts and operational data
B. The data warehouse is used as a source for the operational data
C. The operational data are used as a source for the data warehouse
D. All of the above
Answers
1. D. All of the above
2. A. To explain some observed event or condition
3. C. Organized around important subject areas.
4. C. Data are rarely deleted
5. C. The operational data are used as a source for the data warehouse
