Data Mining Lecture 2
What is a Data Warehouse?
Defined in many different ways, but not rigorously
- A decision support database that is maintained separately from the organisation’s operational database
- Supports information processing by providing a solid platform of consolidated, historical data for analysis
Definition by Inmon
- “A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process”
Data warehousing
- The process of constructing and using data warehouses
Data Warehouse: Subject-Oriented
- Organised around major subjects, such as customer, product, sales
- Focuses on the modelling and analysis of data for decision makers, not on daily operations or transaction processing
- Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process
Data Warehouse: Integrated
Constructed by integrating multiple, heterogeneous data sources
- Relational databases, flat files, on-line transaction records
Data cleaning and data integration techniques are applied
- Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
o E.g., hotel price: currency, tax, breakfast covered, etc.
- When data is moved to the warehouse, it is converted to these common conventions
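The hotel price case can be sketched in Python. This is a minimal illustration, not a prescribed method: the record fields, the conversion rates in RATE_TO_EUR and the flat BREAKFAST_FEE_EUR are all made-up assumptions, chosen only to show two sources being converted to one convention (EUR, tax included, breakfast included).

```python
from dataclasses import dataclass

# Made-up conversion rates to a common currency; a real load would read these
# from a reference table.
RATE_TO_EUR = {"EUR": 1.0, "USD": 0.9, "GBP": 1.2}
BREAKFAST_FEE_EUR = 10.0  # hypothetical flat fee used to make prices comparable


@dataclass(frozen=True)
class HotelPrice:
    hotel: str
    price_eur: float  # per night, tax included, breakfast included


def convert(record: dict) -> HotelPrice:
    """Convert one source record to the warehouse convention."""
    amount = record["amount"] * RATE_TO_EUR[record["currency"]]
    if not record["tax_included"]:
        amount *= 1 + record["tax_rate"]
    if not record["breakfast_included"]:
        amount += BREAKFAST_FEE_EUR
    return HotelPrice(hotel=record["hotel"], price_eur=round(amount, 2))


# Two sources quoting prices under different conventions.
source_a = {"hotel": "Alpha", "amount": 100.0, "currency": "USD",
            "tax_included": False, "tax_rate": 0.10, "breakfast_included": True}
source_b = {"hotel": "Beta", "amount": 80.0, "currency": "GBP",
            "tax_included": True, "tax_rate": 0.0, "breakfast_included": False}

warehouse_rows = [convert(source_a), convert(source_b)]
```

After conversion the two prices can be compared directly, which is not possible on the raw source records.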
Data Warehouse: Time-Variant
The time horizon for the data warehouse is significantly longer than that of operational systems
- Operational database: current value data
- Data warehouse data: provides information from a historical perspective (e.g., past 5-10 years)
Every key structure in the data warehouse
- Contains an element of time, explicitly or implicitly
- But the key of operational data may or may not contain a “time element”
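A small Python sketch of the difference in key structure, assuming a hypothetical customer-address example: the operational store is keyed by customer only and holds the current value, while the warehouse key carries an explicit date. The function names are illustrative, not from any library.

```python
from datetime import date

# Operational store: keyed by customer only, holds the current value.
operational = {}
# Warehouse store: keyed by (customer, snapshot date), keeps history.
warehouse = {}


def record_address(customer_id, address, as_of):
    operational[customer_id] = address         # overwrite: the old value is lost
    warehouse[(customer_id, as_of)] = address  # new key per date: history is kept


def address_as_of(customer_id, when):
    """Return the latest warehouse snapshot on or before `when`, or None."""
    dated = [(d, a) for (c, d), a in warehouse.items()
             if c == customer_id and d <= when]
    return max(dated, key=lambda pair: pair[0])[1] if dated else None


record_address(1, "Leeds", date(2019, 1, 1))
record_address(1, "York", date(2023, 6, 1))

current = operational[1]                          # only the latest value survives
historical = address_as_of(1, date(2020, 1, 1))   # answered from the dated key
```

Only the warehouse can answer a question about a past date, because only its key contains the time element.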
Data Warehouse: Non-Volatile
- Physically separate store of data transformed from the operational environment
- Operational update of data does not occur in the data warehouse environment
o Does not require transaction processing, recovery, and concurrency control mechanisms
o Requires only two operations in data accessing: initial loading of data and access of data
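The two-operation interface can be illustrated with a toy Python class. NonVolatileStore and its method names are hypothetical; the point is only that the interface offers loading and reading, with no update or delete.

```python
class NonVolatileStore:
    """Hypothetical store exposing only bulk load and read access."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Append a batch of rows; rows already stored are never modified."""
        # Copy each row so callers cannot mutate stored data afterwards.
        self._rows.extend(dict(r) for r in rows)

    def select(self, predicate=lambda row: True):
        """Return copies of the rows matching `predicate`."""
        return [dict(r) for r in self._rows if predicate(r)]


store = NonVolatileStore()
store.load([{"sku": "A", "qty": 3}, {"sku": "B", "qty": 5}])

rows = store.select(lambda r: r["qty"] > 4)
rows[0]["qty"] = 0  # changing a query result does not touch the store
```

Because nothing is updated in place, the sketch needs no locking or rollback logic, which mirrors the point about transaction processing, recovery and concurrency control.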
Data Warehouse vs. Heterogeneous DB
Traditional heterogeneous DB integration
- Build wrappers/mediators on top of heterogeneous databases
- Query-driven approach
o When a query is posed to a client site, a meta-dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved, and the results are integrated into a global answer set
o Involves complex information filtering and competes for resources at the source sites
Data warehouse
- Update-driven, high performance
- Information from heterogeneous sources is integrated in advance and stored in warehouses for direct access and analysis
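The contrast between the two approaches can be sketched in Python. The two sources, their field names and the wrapper functions are invented for illustration; in the query-driven path every query goes back to the sources, while in the update-driven path the integration work is done once in advance.

```python
# Two hypothetical sources with different schemas and units.
SOURCE_A = [{"cust": "ann", "total": 120}, {"cust": "bob", "total": 40}]
SOURCE_B = [{"customer_name": "Ann", "amount_cents": 3000}]


def wrap_a(row):
    """Wrapper translating source A rows into the common schema."""
    return {"customer": row["cust"].lower(), "amount": float(row["total"])}


def wrap_b(row):
    """Wrapper translating source B rows into the common schema."""
    return {"customer": row["customer_name"].lower(),
            "amount": row["amount_cents"] / 100}


def all_rows():
    return [wrap_a(r) for r in SOURCE_A] + [wrap_b(r) for r in SOURCE_B]


def query_driven(customer):
    """Mediator style: translate and scan every source at query time."""
    return sum(r["amount"] for r in all_rows() if r["customer"] == customer)


def build_warehouse():
    """Update-driven style: integrate once, in advance, into a stored summary."""
    totals = {}
    for r in all_rows():
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals


WAREHOUSE = build_warehouse()


def update_driven(customer):
    """Answer directly from the warehouse; the sources are not touched."""
    return WAREHOUSE.get(customer, 0.0)
```

Both functions return the same answer; the difference is when the translation and integration cost is paid and whether the sources are involved at query time.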
Data Warehouse vs. Operational DB
OLTP (On-Line Transaction Processing)
- Major task of traditional relational DB
- Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
OLAP (On-Line Analytical Processing)
- Major task of data warehouse system
- Data analysis and decision making
Distinct features (OLTP vs. OLAP)
- User and system orientation: customer vs. market
- Data contents: current, detailed vs. historical, consolidated
- Database design: ER + application vs. star + subject
- View: current, local vs. evolutionary, integrated
- Access patterns: update vs. read-only but complex queries
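The two access patterns can be shown side by side with Python's built-in sqlite3 module. The table names and data are made up, and a single in-memory database is used only to keep the sketch short; in practice the two workloads would run on separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- OLTP side: current state, one row per account
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
    INSERT INTO account VALUES (1, 100), (2, 250);

    -- OLAP side: a tiny star schema (one fact table, one dimension)
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (product_id INTEGER, year INTEGER, amount INTEGER);
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'music');
    INSERT INTO fact_sales VALUES
        (1, 2022, 10), (1, 2023, 15), (2, 2022, 7), (2, 2023, 3);
""")

# OLTP-style access: a short update touching a single current record.
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
balance = conn.execute(
    "SELECT balance FROM account WHERE id = 1").fetchone()[0]

# OLAP-style access: a read-only aggregate over history, joined to a dimension.
sales_by_category = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales AS f JOIN dim_product AS d USING (product_id)
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
```

The update changes one row in place, while the analytical query reads every fact row and changes nothing.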
[Table: OLTP vs. OLAP]
Why a Separate Data Warehouse?
High performance for both systems
- DBMS: tuned for OLTP
o Access methods, indexing, concurrency control, recovery
- Warehouse: tuned for OLAP
o Complex OLAP queries, multidimensional view, consolidation
Different functions and different data
- Missing data: decision support (DS) requires historical data, which operational DBs do not typically maintain
- Data consolidation: DS requires consolidation (aggregation, summarisation) of data from heterogeneous sources
- Data quality: different sources typically use inconsistent data representations, codes and formats, which have to be reconciled
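The consolidation and data quality points can be sketched together in Python. The two source extracts, their date formats and the REGION_CODES mapping are hypothetical; the sketch first reconciles formats and codes, then aggregates the detailed rows into monthly totals.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical extracts: each source uses its own date format and region code.
STORE_SYSTEM = [("03/01/2024", "N", 100), ("15/01/2024", "S", 50)]       # DD/MM/YYYY
WEB_SYSTEM = [("2024-01-20", "north", 30), ("2024-02-02", "south", 20)]  # ISO dates

# Mapping from every source code to one canonical region name.
REGION_CODES = {"N": "north", "S": "south", "north": "north", "south": "south"}


def reconcile(rows, date_format):
    """Map each source row to (year-month, canonical region, amount)."""
    for raw_date, region, amount in rows:
        month = datetime.strptime(raw_date, date_format).strftime("%Y-%m")
        yield month, REGION_CODES[region], amount


def consolidate(*streams):
    """Aggregate reconciled rows into monthly totals per region."""
    totals = defaultdict(int)
    for stream in streams:
        for month, region, amount in stream:
            totals[(month, region)] += amount
    return dict(totals)


monthly = consolidate(
    reconcile(STORE_SYSTEM, "%d/%m/%Y"),
    reconcile(WEB_SYSTEM, "%Y-%m-%d"),
)
```

Without the reconciliation step, "N" and "north" would be summed as different regions and the monthly totals would be wrong.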

Data mining 2 - Data warehouse (cheat sheet - printable)
