COURSES WE OFFER:
BSC(IT) FY,SY,TY
BSC(CS) FY,SY,TY
BSC(IT/CS) PROJECTS
MCA (ENTRANCE)
ENGG(IT/ELECTRONICS/EXTC)
ADDRESS: 302 PARANJPE UDYOG BHAVAN, NEAR KHANDELWAL SWEETS, THANE STATION, THANE
WEST.
TEL: 8097071144/55
STAY CONNECTED FOR MORE UPDATES AND STUDY NOTES
FACEBOOK : https://www.facebook.com/weittutorial
EMAIL: weit.tutorials@gmail.com
UNIT 1
Introduction to Data Warehousing: Introduction, Necessity,
Framework of the data warehouse, options, developing
data warehouses, end points.
Data Warehousing Design Consideration and Dimensional
Modeling: Defining Dimensional Model, Granularity of Facts,
Additivity of Facts, Functional dependency of the Data, Helper Tables,
Implementation of many-to-many relationships between fact and
dimensional modelling
What Is A Data Warehouse?
 A data warehouse is a powerful database model that
significantly enhances the user’s ability to quickly
analyze large, multidimensional data sets.
 It cleanses and organizes data to allow users to make
business decisions based on facts.
 For data to be useful for analysis, it must be
subject-oriented, integrated, time-referenced, and
non-volatile.
 Subject-Oriented Data
 Data warehouses group data by subject rather than by
activity.
 Typical subjects: employees, accounts, sales, products.
 This subject-specific design helps reduce query
response time.
 Integrated Data
 Integrated data refers to de-duplicating information and
merging it from many sources into one consistent
location.
 Much of the transformation and loading work that goes
into the data warehouse is centered on integrating data
and standardizing it.
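The de-duplication and merging described above can be sketched as follows. This is only a minimal illustration, assuming two hypothetical source systems (`crm_rows` and `billing_rows`) that describe the same customers with inconsistent formatting; the field names and the choice of e-mail address as the matching key are assumptions, not part of these notes.

```python
def standardize(record):
    """Bring one source record into a common format (hypothetical rules)."""
    return {
        "email": record["email"].strip().lower(),
        "name": record["name"].strip().title(),
    }


def integrate(*sources):
    """Merge records from several sources, keeping one row per customer."""
    merged = {}
    for source in sources:
        for record in source:
            clean = standardize(record)
            # The standardized e-mail acts as the matching key (an assumption).
            merged.setdefault(clean["email"], clean)
    return list(merged.values())


crm_rows = [{"email": "Asha@Example.com ", "name": "asha rao"}]
billing_rows = [
    {"email": "asha@example.com", "name": "ASHA RAO"},
    {"email": "ravi@example.com", "name": "ravi shah"},
]

customers = integrate(crm_rows, billing_rows)
```

Here the two spellings of the same customer collapse into one standardized row, so `customers` holds two records rather than three.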
 Time-Referenced Data
 Time-referenced data carries a time value: every fact is
recorded against the time to which it applies.
 For example, a user may ask “What were the total sales of
product ‘A’ for the past three years on New Year’s Day
across region ‘Y’?”.
 This kind of exploration is often termed “data mining”.
 Non-Volatile Data
 Non-volatility is a characteristic of the data warehouse:
data is not overwritten once it has been loaded. This enables
users to dig deep into history and arrive at specific business
decisions based on facts.
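The New Year's Day question above can be sketched as a query over time-stamped rows. This is a rough illustration using Python's built-in `sqlite3` module; the `sales` table, its columns and the sample figures are all hypothetical. Rows are only inserted and never updated, which mirrors the non-volatile nature of the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_date TEXT, region TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2021-01-01", "Y", "A", 100.0),
        ("2022-01-01", "Y", "A", 150.0),
        ("2023-01-01", "Y", "A", 200.0),
        ("2023-01-02", "Y", "A", 999.0),  # not New Year's Day
        ("2023-01-01", "Z", "A", 50.0),   # different region
    ],
)

# Total sales of product 'A' in region 'Y' on New Year's Day, per year.
rows = conn.execute(
    """
    SELECT substr(sale_date, 1, 4) AS year, SUM(amount)
    FROM sales
    WHERE product = 'A' AND region = 'Y' AND substr(sale_date, 6, 5) = '01-01'
    GROUP BY year
    ORDER BY year
    """
).fetchall()
```

Because every row keeps its own date, the same table can answer the question for any span of years without changing the stored data.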
Why A Data Warehouse?
 The Data Access Crisis
 Every day, organizations large and small create billions
of bytes of data about all aspects of their business:
millions of individual facts about their customers,
products, operations and people.
 For the most part, however, this data is locked up in a maze
of computer systems and is exceedingly difficult to get at.
 This phenomenon has been described as “data in jail”.
Data Warehousing
 Data warehousing is a field that has grown from the
integration of a number of different technologies and
experiences over the past two decades. These
experiences have allowed the IT industry to identify
the key problems that need to be solved.
Operational vs. Informational Systems
 OPERATIONAL
 Operational systems, as their name implies, are the
systems that help the day-to-day operation of the
enterprise.
 These are the backbone systems of any enterprise, and
include order entry, inventory, manufacturing, payroll
and accounting.
 INFORMATIONAL
 Informational systems deal with analyzing data and
making decisions.
 Informational data needs often span a number of
different areas and require large amounts of related
operational data.
Framework Of The Data Warehouse
 One of the reasons that data warehousing has taken
such a long time to develop is that it is actually a very
comprehensive technology.
 In fact, it can be best represented as an enterprise-wide
framework for managing informational data within the
organization.
 In order to understand how all the components involved
in a data warehousing strategy are related, it is essential
to have a Data Warehouse Architecture.
Data Warehouse Architecture
 A Data Warehouse Architecture (DWA) is a way of
representing the overall structure of data,
communication, processing and presentation that
exists for end-user computing within the enterprise.
The architecture
is made up of a number of interconnected parts:
 Source system
 Source data transport layer
 Data quality control and data profiling layer
 Metadata management layer
 Data integration layer
 Data processing layer
 End user reporting layer
Data Warehouse Options
 A number of key factors need to be considered when
developing a data warehouse.
1. Scope of the data warehouse
 The scope of a data warehouse may be as broad as all the
informational data for the entire enterprise from the
beginning of time, or it may be as narrow as a personal data
warehouse for a single manager for a single year.
 The broader the scope, the more valuable the warehouse is to the
enterprise, and the more expensive and time-consuming it is to
create and maintain.
2. Data redundancy
 There are three levels of data redundancy:
 “Virtual” or “point-to-point” data warehouses
 End users are allowed to get at operational databases
directly.
 Central data warehouses
 Central data warehouses are real, physical databases. The
data stored here is accessible from one place and must be
loaded and maintained on a regular basis.
 Distributed data warehouses
 Distributed data warehouses are those in which certain
components are distributed across a number of different
physical databases.
3. Type of End-user
 Executives and managers
 Power users (business and financial analysts, engineers)
 Support users (clerical, administrative)
Developing Data Warehouses
 Developing a good data warehouse is no different from any other IT
project: it requires careful planning, requirements definition, design,
prototyping and implementation.
 Developing Strategy
 There are a number of strategies by which organizations can get into data
warehousing. One of them proceeds in the following steps:
 Installing a set of data access, data directory and process management
facilities
 Training the end-users
 Monitoring how the data warehouse facilities are actually used
 Based on actual usage, creating a physical data warehouse to support the
high-frequency requests
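The monitoring step can be sketched as a simple frequency count over a usage log. This is a minimal sketch, assuming a hypothetical `request_log` with one entry per query run; the request names and the threshold are illustrative only.

```python
from collections import Counter

# Hypothetical log of end-user requests, one entry per query run.
request_log = [
    "sales_by_region", "sales_by_region", "stock_levels",
    "sales_by_region", "payroll_summary", "stock_levels",
]


def high_frequency_requests(log, min_count):
    """Return requests seen at least `min_count` times, most frequent first."""
    return [name for name, count in Counter(log).most_common() if count >= min_count]


candidates = high_frequency_requests(request_log, min_count=2)
```

The requests in `candidates` are the ones that would be considered first when deciding what the physical data warehouse should store.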
Evolving DWA
 The DWA (Data Warehouse Architecture) is simply a framework for
understanding data warehousing and how the components of a data warehouse
fit together.
 One of the keys to data warehousing is flexibility.
Designing Data Warehouses
 Designing data warehouses is very different from
designing traditional operational systems.
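1. Data warehouse users typically do not know their wants and
needs as well as operational users.
2. Designing a data warehouse means thinking in terms of much
broader, and more difficult to define, business concepts.
3. In this respect, data warehouse design is quite close to
Business Process Reengineering (BPR).
4. The design strategy for a data warehouse therefore differs
from that of an operational system.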
Managing Data Warehouses
 how they want their warehouses to perform.
 also recognize that the maintenance of the data
warehouse structure
 IT management must understand that if they embark
on a data warehousing program
End Points
 Data warehousing is growing by leaps and bounds and
it is becoming increasingly difficult to estimate what
new developments are most likely to affect it.
 The development of parallel database servers with improved
query engines is likely to be one of the most
important.
 Parallel servers will make it possible to access huge
databases in much less time.
 It is important that data warehouse planners and developers
have a clear idea of what they are looking for and then choose
strategies and methods that will provide them with
performance today and flexibility for tomorrow.
Goals
 Provide Easy Access to Corporate Data
 The warehouse must be easy to use.
 Access should be graphical.
 Users must be able to get answers to their questions easily, and ask
new questions, all without getting the IT team involved.
 The process of getting and analyzing data must be fast.
 Provide Clean and Reliable Data for Analysis
 For consistent analysis, the data environment must be stable.
 One department doing an analysis must get the same result as
any other.
 Source conflicts must be resolved.
 Historical analysis must be possible, so that data can be
analyzed across a span of time.
 Warehouses support business decisions by collecting,
consolidating, and organizing data for reporting and
analysis with tools such as online analytical processing
(OLAP) and data mining models.
 Although data warehouses are built on relational
database technology, the design of a data warehouse
data model and its subsequent physical implementation
differ substantially from the design of an online
transaction processing (OLTP) system.
How do these two systems differ and what design considerations
should be kept in mind while designing a data warehouse data
model?
OLTP Database | Data Warehouse Database
Designed for real-time business transactions and processes. | Designed for analysis of business measures by subject area, category and attributes.
Optimized for a common and known set of transactions. | Optimized for bulk loads and large, complex, unpredictable queries.
Designed for validation of data during transactions. | Designed to be loaded with consistent, valid data; uses very minimal validation.
Supports large user bases, often distributed across geographies. | Supports few concurrent users relative to the OLTP environment.
Houses very minimal historical data. | Houses a mix of most current information as well as historical data.
Defining Dimensional Model
 The purpose of a dimensional model is to improve
performance by matching data structures to queries.
 Users query the data warehouse looking for data such as:
 Total sales in volume and revenue for the NE region for
product ‘XYZ’ for a certain period this year compared to
the same period last year
 The central theme of a dimensional model is the star schema, which
consists of a central fact table, containing measures, surrounded by
qualifiers or descriptors called ‘dimensions’.
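A star schema for the sample query above might be sketched as follows, using Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_region`, `dim_product`, `dim_period`), their columns and the sample rows are all hypothetical; this is one possible layout, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_region  (region_id INTEGER PRIMARY KEY, region_name TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_period  (period_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
    -- Fact table: one foreign key per dimension plus the measures.
    CREATE TABLE fact_sales (
        region_id  INTEGER REFERENCES dim_region(region_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        period_id  INTEGER REFERENCES dim_period(period_id),
        volume     INTEGER,
        revenue    REAL
    );
    INSERT INTO dim_region  VALUES (1, 'NE'), (2, 'SW');
    INSERT INTO dim_product VALUES (1, 'XYZ');
    INSERT INTO dim_period  VALUES (1, 2023, 1), (2, 2024, 1);
    INSERT INTO fact_sales VALUES
        (1, 1, 1, 10, 1000.0),
        (1, 1, 2, 12, 1300.0),
        (2, 1, 2, 99, 9900.0);
    """
)

# Volume and revenue for product 'XYZ' in the NE region, Q1 of each year.
rows = conn.execute(
    """
    SELECT p.year, SUM(f.volume), SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_region  r ON r.region_id  = f.region_id
    JOIN dim_product d ON d.product_id = f.product_id
    JOIN dim_period  p ON p.period_id  = f.period_id
    WHERE r.region_name = 'NE' AND d.product_name = 'XYZ' AND p.quarter = 1
    GROUP BY p.year
    ORDER BY p.year
    """
).fetchall()
```

Each dimension is reached from the fact table by a single join, which is what gives the schema its star shape; the two result rows allow this year's period to be compared with last year's.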
 In a star schema, if a dimension is complex and
contains relationships such as hierarchies, it is
compressed or flattened to a single dimension.
 A variant of the star schema is the snowflake schema. In a
snowflake schema, complex dimensions are normalized.
Here, dimensions maintain relationships with other levels
of the same dimension.
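The difference between a flattened and a normalized dimension can be sketched with a product dimension that has a category level. The table and column names below are hypothetical, and the example assumes Python's built-in `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Star style: the category hierarchy is flattened into the dimension.
    CREATE TABLE dim_product_star (
        product_id INTEGER PRIMARY KEY,
        product_name TEXT,
        category_name TEXT
    );
    -- Snowflake style: the category level gets its own table.
    CREATE TABLE dim_category (
        category_id INTEGER PRIMARY KEY,
        category_name TEXT
    );
    CREATE TABLE dim_product_snow (
        product_id INTEGER PRIMARY KEY,
        product_name TEXT,
        category_id INTEGER REFERENCES dim_category(category_id)
    );
    INSERT INTO dim_product_star VALUES (1, 'Pen', 'Stationery'), (2, 'Ink', 'Stationery');
    INSERT INTO dim_category VALUES (10, 'Stationery');
    INSERT INTO dim_product_snow VALUES (1, 'Pen', 10), (2, 'Ink', 10);
    """
)

star = conn.execute(
    "SELECT product_name, category_name FROM dim_product_star ORDER BY product_id"
).fetchall()
snow = conn.execute(
    """
    SELECT p.product_name, c.category_name
    FROM dim_product_snow p
    JOIN dim_category c ON c.category_id = p.category_id
    ORDER BY p.product_id
    """
).fetchall()
```

Both queries return the same information. The star version repeats the category name on every product row, while the snowflake version stores it once and pays for that with an extra join.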
Star Schema Model
Snowflake Model
Granularity Of Facts
 The granularity of a fact is the level of detail at which it
is recorded. If data is to be analyzed effectively, it must
all be at the same level of granularity.
 As a general rule, data should be kept at the highest
(most detailed) level of granularity.
Heavily Snowflaked Model
 Granularity is determined by:
 Number of parts to a key
 Granularity of those parts
 Adding elements to an existing key always increases
the granularity of the data.
 Removing any part of an existing key decreases its
granularity.
 Using customer sales to illustrate this, a key of
customer ID and period ID is less granular than
customer ID, product ID and period ID.
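The customer sales example can be sketched in a few lines. The sample keys and amounts below are made up for illustration, and `roll_up` is a hypothetical helper, not a standard function.

```python
from collections import defaultdict

# Sales recorded with key (customer ID, product ID, period ID).
detailed = {
    ("C1", "P1", "2024-01"): 100,
    ("C1", "P2", "2024-01"): 50,
    ("C2", "P1", "2024-01"): 70,
}


def roll_up(facts, keep):
    """Drop key parts not listed in `keep` (by position) and sum the measure."""
    summary = defaultdict(int)
    for key, amount in facts.items():
        summary[tuple(key[i] for i in keep)] += amount
    return dict(summary)


# Removing product ID leaves the coarser key (customer ID, period ID).
by_customer_period = roll_up(detailed, keep=(0, 2))
```

The detailed facts can always be rolled up to the coarser key, but the product breakdown cannot be recovered from `by_customer_period`. That is the practical reason for keeping data at the most detailed level.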
Additivity Of Facts
 A fact is additive over a particular dimension if adding
it through or over the dimension results in a fact with
the same essential meaning as the original, but is now
relative to the new granularity.
 A fact is fully additive if it is additive over
every dimension of its dimensionality.
 A fact is partially additive if it is additive over at least
one, but not all, of the dimensions.
 A fact is non-additive if it is not additive over any dimension.
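The three cases can be illustrated with small made-up figures. The choice of examples (sales amount, account balance, margin percentage) is an assumption for illustration and does not come from these notes.

```python
# Fully additive: sales amount can be summed over product and over period.
sales = {("P1", "Jan"): 100, ("P1", "Feb"): 120, ("P2", "Jan"): 80}
total_sales = sum(sales.values())

# Partially additive: an account balance sums across accounts on one date,
# but summing the same account across dates has no meaning.
balances = {("ACC1", "Jan"): 500, ("ACC2", "Jan"): 300, ("ACC1", "Feb"): 450}
jan_total_balance = sum(v for (acc, month), v in balances.items() if month == "Jan")

# Non-additive: a margin percentage cannot be summed over any dimension;
# it has to be recomputed from its additive components.
rows = [{"revenue": 200, "cost": 150}, {"revenue": 100, "cost": 50}]
margin_pct = [100 * (r["revenue"] - r["cost"]) / r["revenue"] for r in rows]
overall_margin_pct = (
    100 * sum(r["revenue"] - r["cost"] for r in rows) / sum(r["revenue"] for r in rows)
)
```

Adding the two margin percentages gives a number that is not the overall margin, which is why non-additive facts are usually stored as their additive components and calculated at query time.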
Functional Dependency Of The Data
 Functional dependency of data means that the
attributes within a given entity are fully dependent on
the entire primary key of the entity: no more, no less.
 Example entity: (cust_id, cust_name, cust_add, …, …)
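Whether sample data is consistent with a functional dependency can be checked with a short helper. This is a sketch only: `depends_on_key` is a hypothetical function, the customer rows are invented, and `cust_id` is assumed to be the primary key of the entity shown above.

```python
def depends_on_key(rows, key, attribute):
    """Return True if `attribute` takes exactly one value per `key` value."""
    seen = {}
    for row in rows:
        k = tuple(row[col] for col in key)
        if seen.setdefault(k, row[attribute]) != row[attribute]:
            return False
    return True


# Hypothetical customer rows; cust_id is assumed to be the primary key.
customers = [
    {"cust_id": 1, "cust_name": "Asha", "cust_add": "Thane"},
    {"cust_id": 2, "cust_name": "Ravi", "cust_add": "Thane"},
    {"cust_id": 1, "cust_name": "Asha", "cust_add": "Thane"},
]

name_depends_on_id = depends_on_key(customers, ["cust_id"], "cust_name")
id_depends_on_address = depends_on_key(customers, ["cust_add"], "cust_id")
```

A check like this can only show that the data does not contradict a dependency; the dependency itself is a design decision about the entity.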
Helper Tables
 Helper tables usually take one of two forms:
 Helper tables for multi-valued dimensions
 Helper tables for complex hierarchies
Multi-Valued Dimensions
 Take a situation where a household can own many
insurance policies, yet any policy could be owned by
multiple households.
 The simple approach to this is the traditional
resolution of a many-to-many relationship: an
associative entity whose key is formed from the keys
of each participating parent.
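The household and policy situation can be sketched with an associative table, again using Python's built-in `sqlite3` module. The table names, columns and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE household (household_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE policy    (policy_id INTEGER PRIMARY KEY, kind TEXT);
    -- Associative entity: its key combines the keys of both parents.
    CREATE TABLE household_policy (
        household_id INTEGER REFERENCES household(household_id),
        policy_id    INTEGER REFERENCES policy(policy_id),
        PRIMARY KEY (household_id, policy_id)
    );
    INSERT INTO household VALUES (1, 'Rao'), (2, 'Shah');
    INSERT INTO policy VALUES (100, 'home'), (200, 'motor');
    INSERT INTO household_policy VALUES (1, 100), (1, 200), (2, 100);
    """
)

# Policies owned by household 1, and households that own policy 100.
policies_of_rao = [r[0] for r in conn.execute(
    "SELECT policy_id FROM household_policy WHERE household_id = 1 ORDER BY policy_id"
)]
owners_of_home = [r[0] for r in conn.execute(
    "SELECT household_id FROM household_policy WHERE policy_id = 100 ORDER BY household_id"
)]
```

The same table answers the question in both directions: one household with many policies, and one policy with many households.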
Complex Hierarchies
 A hierarchy is a tree structure, such as an organization
chart. Hierarchies can involve some form of recursive
relationship.
 Recursive relationships come in two forms: “self”
relationships (1:M) and “bill of materials” relationships
(M:M). A self relationship involves one table, whereas
a bill of materials involves two.
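A self relationship such as an organization chart can be sketched with a single mapping in which each entry points to its parent. The employee names and the two helper functions are hypothetical and only meant to show how such a hierarchy is traversed.

```python
# Self relationship (1:M): each row points to its parent in the same table.
employees = {
    # employee: manager (None marks the root of the organization chart)
    "CEO": None,
    "VP Sales": "CEO",
    "VP IT": "CEO",
    "Sales Rep": "VP Sales",
}


def chain_of_command(employee, parents):
    """Walk up the hierarchy from `employee` to the root."""
    chain = []
    current = parents[employee]
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


def subordinates(manager, parents):
    """Collect everyone below `manager`, recursing level by level."""
    direct = [e for e, m in parents.items() if m == manager]
    result = []
    for person in direct:
        result.append(person)
        result.extend(subordinates(person, parents))
    return result
```

Calling `chain_of_command("Sales Rep", employees)` walks up through "VP Sales" to "CEO". Queries like `subordinates` need a step for every level of the tree, which is the difficulty that helper tables for complex hierarchies are meant to address.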
END OF UNIT 1
FOR THEORY, REFER TO THE NOTES AS WELL AS THE CLASSROOM
NOTES FOR SOME TOPICS.
