Building the Data Warehouse
            by Inmon
            Chapter 2: The Data Warehouse Environment




IT-Slideshares                         http://it-slideshares.blogspot.com/
2. The Data Warehouse
Environment
1.   The Structure of the Data Warehouse
2.   Subject Orientation
3.   Day 1 to Day n Phenomenon
4.   Granularity
5.   Exploration and Data Mining
6.   Living Sample Database
7.   Partitioning as a Design Approach
8.   Structuring Data in the Data Warehouse
9.   Auditing and the Data Warehouse
2. The Data Warehouse Environment
(cont.)
10. Data Homogeneity and Heterogeneity
11. Purging Warehouse Data
12. Reporting and the Architected
    Environment
13. The Operational Window of
    Opportunity
14. Incorrect Data in the Data Warehouse
15. Summary
2.0 Introduction – data
warehouse characteristics
 Subject-oriented with regard to DSS
 Integrated from multiple data sources
 Non-volatile data archive
 Time-variant collection of data in
  support of DSS reporting
2.1. The Structure of the Data Warehouse
2.2. Subject Orientation
The data warehouse is oriented to the major
 subject areas of the corporation that have
 been defined in the high-level corporate data
 model. Typical subject areas include the
 following:

   Customer
   Product
   Transaction or activity
   Policy
   Claim
   Account
2.3. Day 1 to Day n Phenomenon
     Data warehouses are not built all at once.
     A data warehouse should be built in an orderly,
      iterative, step-at-a-time fashion.
     The "big bang" approach to data warehouse
      development is simply an invitation to
      disaster and is never an appropriate
      alternative.
Lecture 02 - The Data Warehouse Environment
2.4. Granularity
2.4.1. The Benefits of
Granularity
 The granular data found in the data warehouse is the
  key to reusability.
 Looking at the data in different ways is only one
  advantage of having a solid foundation.
    ◦ Each DSS report can focus on its own needs, e.g. daily,
      monthly, quarterly, yearly, or multi-year trend
      reports
 Another related benefit of a low level of granularity is
  flexibility
 Another benefit of granular data is that it contains a
  history of activities and events across the corporation.
 The largest benefit of a data warehouse foundation is that
  future unknown requirements can be accommodated.
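A minimal sketch of the reusability point above, assuming hypothetical granular sales records (the field names and values are illustrative and do not come from the book). The same low-level rows are summarized at daily, monthly, and yearly levels without being reloaded or restructured:

```python
from collections import defaultdict
from datetime import date

# Hypothetical granular records: one row per activity (illustrative values).
sales = [
    {"day": date(2002, 1, 5), "amount": 100.0},
    {"day": date(2002, 1, 5), "amount": 50.0},
    {"day": date(2002, 1, 20), "amount": 25.0},
    {"day": date(2002, 2, 3), "amount": 75.0},
]

def summarize(rows, key_fn):
    """Total the amounts of granular rows under any grouping key."""
    totals = defaultdict(float)
    for row in rows:
        totals[key_fn(row["day"])] += row["amount"]
    return dict(totals)

# Three different reports built from the same granular foundation.
daily = summarize(sales, lambda d: d)
monthly = summarize(sales, lambda d: (d.year, d.month))
yearly = summarize(sales, lambda d: d.year)
```

A new reporting requirement, such as quarterly totals, would only need another grouping function, which is one way to read the claim that future unknown requirements can be accommodated.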
2.4.2. An Example of Granularity
2.4.3. Dual Levels of Granularity
2.4.3.1 Telephone example
2.5. Exploration and Data
Mining
 Granular data in the data warehouse supports data
  marts
 It also supports the processes of data mining and data exploration
 Reference:

    ◦ Exploration Warehousing: Turning
      Business Information into Business
      Opportunity (Hoboken, N.J.: Wiley, 2000)
2.6. Living Sample Database
2.7. Partitioning as a Design Approach
     Proper partitioning can benefit the data
      warehouse in several ways:

      Loading data
      Accessing data
      Archiving data
      Deleting data
      Monitoring data
      Storing data
2.7.1. Partitioning of Data
Following are some of the tasks that cannot
 easily be performed when data resides in
 large physical units:

   Restructuring
   Indexing
   Sequential scanning, if needed
   Reorganization
   Recovery
   Monitoring
2.7.1. Partitioning of Data (cont.)
Data can be divided by many criteria, such
 as:

 By date
 By line of business
 By geography
 By organizational unit
 By all of the above
2.7.1. Partitioning of Data (cont.)
As an example of how a life insurance company may
  choose to partition its data, consider the following
  physical units of data:

   2000 health claims
   2001 health claims
   2002 health claims
   1999 life claims
   2000 life claims
   2001 life claims
   2002 life claims
   2000 casualty claims
   2001 casualty claims
   2002 casualty claims
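The partitioning scheme above can be sketched as follows. This is only an illustration under assumed names: the claim records, their fields, and the `partition_name` helper are hypothetical. Each claim is routed to a physical unit named by year and line of business, so that a whole unit can be archived or deleted on its own:

```python
from collections import defaultdict

# Hypothetical claim records; field names are assumptions for illustration.
claims = [
    {"claim_id": 1, "year": 2000, "line": "health", "amount": 1200.0},
    {"claim_id": 2, "year": 2001, "line": "health", "amount": 300.0},
    {"claim_id": 3, "year": 1999, "line": "life", "amount": 50000.0},
    {"claim_id": 4, "year": 2002, "line": "casualty", "amount": 800.0},
    {"claim_id": 5, "year": 2000, "line": "health", "amount": 450.0},
]

def partition_name(claim):
    """Name the physical unit a claim belongs to, e.g. '2000 health claims'."""
    return f"{claim['year']} {claim['line']} claims"

# Route every claim to its partition (date plus line of business).
partitions = defaultdict(list)
for claim in claims:
    partitions[partition_name(claim)].append(claim)

# One unit is removed for archiving without touching the others.
archived = partitions.pop("1999 life claims")
```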
2.8 Structuring Data in the Data Warehouse
2.8. Structuring Data in the Data
           Warehouse (cont.)
There are many more ways to structure
 data within the data warehouse. The
 most common are these:

 Simple cumulative
 Rolling summary
 Simple direct
 Continuous
2.8. Structuring Data in the Data
            Warehouse (cont.)
At the key level, data warehouse keys
  are inevitably compound
  keys. There are two compelling
  reasons for this:
 Date—year, year/month,
  year/month/day, and so on—is almost
  always a part of the key.
 Because data warehouse data is
  partitioned, the different components
  of the partitioning show up as part of
  the key.
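A small sketch of such a compound key, assuming a hypothetical claims table (the `ClaimKey` type and its fields are illustrative, not taken from the book). The date elements and the partitioning components sit in the key next to the source identifier:

```python
from typing import NamedTuple

class ClaimKey(NamedTuple):
    """Hypothetical compound key: partition components plus date elements."""
    line_of_business: str  # partitioning component
    year: int              # date element, also used for partitioning
    month: int             # finer date element
    claim_id: int          # identifier carried over from the source system

warehouse = {
    ClaimKey("health", 2001, 3, 17): {"amount": 300.0},
    ClaimKey("health", 2001, 4, 17): {"amount": 320.0},
    ClaimKey("life", 2001, 3, 17): {"amount": 9000.0},
}

# The same claim_id occurs three times; only the full compound key is unique.
health_2001 = [
    k for k in warehouse
    if (k.line_of_business, k.year) == ("health", 2001)
]
```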
2.9 Auditing and the Data Warehouse
     Data that otherwise would not find its
      way into the warehouse suddenly has to
      be there.
     The timing of data entry into the
      warehouse changes dramatically when
      an auditing capability is required.
     The backup and recovery restrictions for
      the data warehouse change drastically
      when an auditing capability is required.
     Auditing data at the warehouse forces
      the granularity of data in the warehouse
      to be at the very lowest level.
2.10 Data Homogeneity and
Heterogeneity
2.10 Data Homogeneity and
        Heterogeneity (cont.)
The data in the data warehouse then is
 subdivided by the following criteria:

 Subject area
 Table
 Occurrences of data within table
2.11 Purging Warehouse Data
There are several ways in which data is purged or
 the detail of data is transformed, including the
 following:

 Data is added to a rolling summary file where
  detail is lost.
 Data is transferred to a bulk storage medium from
  a high-performance medium such as DASD.
 Data is actually purged from the system.
 Data is transferred from one level of the
  architecture to another, such as from the
  operational level to the data warehouse level.
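As a rough illustration, the first two options above can be combined in one routine. The function name, the record layout, and the cutoff date are all assumptions made for this sketch. Rows older than the cutoff are rolled into monthly totals, where the detail is lost, and handed back separately so they can be moved to bulk storage:

```python
from collections import defaultdict
from datetime import date

def purge_detail(detail, cutoff):
    """Split detail rows at `cutoff`.

    Returns (kept, summary, archived): recent rows that stay online,
    monthly totals of the older rows, and the older rows themselves.
    """
    kept, archived = [], []
    summary = defaultdict(float)
    for row in detail:
        if row["day"] >= cutoff:
            kept.append(row)
        else:
            summary[(row["day"].year, row["day"].month)] += row["amount"]
            archived.append(row)
    return kept, dict(summary), archived

# Hypothetical detail rows (illustrative values).
detail = [
    {"day": date(2001, 11, 3), "amount": 40.0},
    {"day": date(2001, 11, 9), "amount": 60.0},
    {"day": date(2002, 1, 15), "amount": 10.0},
]
kept, summary, archived = purge_detail(detail, cutoff=date(2002, 1, 1))
```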
2.12 Reporting and the Architected Environment
2.13. The Operational Window of
Opportunity
The following are some suggestions as to how the operational window
  of archival data may look in different industries:

   Insurance—2 to 3 years
   Bank trust processing—2 to 5 years
   Telephone customer usage—30 to 60 days
   Supplier/vendor activity—2 to 3 years
   Retail banking customer account activity—30 days
   Vendor activity—1 year
   Loans—2 to 5 years
   Retailing SKU activity—1 to 14 days
   Vendor activity—1 week to 1 month
   Airlines flight seat activity—30 to 90 days
   Vendor/supplier activity—1 to 2 years
   Public utility customer utilization—60 to 90 days
   Supplier activity—1 to 5 years
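One way to read these windows is as a per-subject retention rule for the operational environment. The sketch below assumes that reading. The dictionary keys are my own labels, and each value is the upper bound of a range listed above:

```python
from datetime import date, timedelta

# Upper bounds from the list above; the key names are hypothetical labels.
OPERATIONAL_WINDOW_DAYS = {
    "telephone_customer_usage": 60,
    "retail_banking_account_activity": 30,
    "airline_flight_seat_activity": 90,
}

def in_operational_window(kind, activity_day, today):
    """True while a record is still inside its operational window."""
    window = timedelta(days=OPERATIONAL_WINDOW_DAYS[kind])
    return today - activity_day <= window

# Retail banking activity from January 1 is checked on two later dates.
still_operational = in_operational_window(
    "retail_banking_account_activity", date(2002, 1, 1), date(2002, 1, 31))
past_window = not in_operational_window(
    "retail_banking_account_activity", date(2002, 1, 1), date(2002, 2, 1))
```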
2.14. Incorrect Data in the Data Warehouse
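      Suppose the warehouse holds a July 2 entry of
       $5,000 that should have been $750, and the
       error is discovered in mid-August.
      Choice 1: Go back into the data
       warehouse for July 2 and find the
       offending entry. Then, using update
       capabilities, replace the value $5,000
       with the value $750.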
      Choice 2: Enter offsetting entries.
      Choice 3: Reset the account to the
       proper value on August 16.
2.14. Incorrect Data in the Data
           Warehouse (cont.)
Choice 1

 The integrity of the data has been
  destroyed. Any report running between
  July 2 and Aug 16 will not be able to be
  reconciled.
 The update must be done in the data
  warehouse environment.
 In many cases, there is not a single entry
  that must be corrected, but many, many
  entries that must be corrected.
2.14. Incorrect Data in the Data
           Warehouse (cont.)
Choice 2

 Many entries may have to be
  corrected, not just one. Making a
  simple adjustment may not be an easy
  thing to do at all.
 Sometimes the formula for correction
  is so complex that making an
  adjustment cannot be done.
2.14. Incorrect Data in the Data
           Warehouse (cont.)
Choice 3

 The ability to simply reset an account
  as of one moment in time requires
  application and procedural
  conventions.
 Such a resetting of values does not
  accurately account for the error that
  has been made.
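A minimal sketch of Choice 2 (offsetting entries) using the July 2 and August 16 dates and the $5,000 and $750 values from the example. The year, the record layout, and the `balance` helper are assumptions for illustration. The original entry is never updated, so a report run for any date before August 16 still shows what the warehouse held at that time:

```python
from datetime import date

# Append-only history for one account; the year 2002 is an assumption.
entries = [
    {"day": date(2002, 7, 2), "amount": 5000.0, "note": "original entry (in error)"},
]

def balance(entries, as_of):
    """Sum all entries dated on or before `as_of`."""
    return sum(e["amount"] for e in entries if e["day"] <= as_of)

# Choice 2: append offsetting entries on August 16 instead of updating July 2.
entries.append({"day": date(2002, 8, 16), "amount": -5000.0, "note": "offset erroneous entry"})
entries.append({"day": date(2002, 8, 16), "amount": 750.0, "note": "correct value"})

balance_before_fix = balance(entries, date(2002, 8, 1))
balance_after_fix = balance(entries, date(2002, 8, 16))
```

Choice 1 would instead overwrite the July 2 row in place, which changes what an earlier report would have shown. Choice 3 would append a single reset entry on August 16, with no record of the size of the error.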
2.15. Summary
   1.   The Structure of the Data Warehouse
   2.   Subject Orientation
   3.   Granularity
   4.   Exploration and Data Mining
   5.   Living Sample Database
   6.   Structuring Data in the Data Warehouse
   7.   Auditing and the Data Warehouse
   8.   Data Homogeneity and Heterogeneity
   9.   Purging Warehouse Data
2.15. Summary

10. Reporting  and the Architected
    Environment
11. The Operational Window of
    Opportunity
12. Incorrect Data in the Data Warehouse




