Er. Nawaraj Bhandari
Data Warehouse/Data Mining
Chapter 3:
Data Warehouse Physical Design
Physical Design
Physical design is the phase of database design, following logical design, that
identifies the actual database tables and index structures used to implement the
logical design.
In the physical design, you look at the most effective way of storing and retrieving
the objects as well as handling them from a transportation and backup/recovery
perspective.
Physical design decisions are mainly driven by query performance and
database maintenance aspects.
During the logical design phase, you defined a model for your data warehouse
consisting of entities, attributes, and relationships. The entities are linked
together using relationships. Attributes are used to describe the entities. The
unique identifier (UID) distinguishes between one instance of an entity and
another.
Figure: Logical Design Compared with Physical Design
During the physical design process, you translate the expected schemas
into actual database structures.
At this time, you have to map:
■ Entities to tables
■ Relationships to foreign key constraints
■ Attributes to columns
■ Primary unique identifiers to primary key constraints
■ Unique identifiers to unique key constraints
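The mapping above can be sketched in SQL. This is a minimal illustration using SQLite via Python for portability; the CUSTOMER/ORDERS entities and their attributes are hypothetical, not taken from the slides.

```python
import sqlite3

# Hypothetical logical model: a CUSTOMER entity related to an ORDERS entity.
# Each logical construct maps to a physical structure, as listed above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Entity -> table; attributes -> columns; primary UID -> primary key
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- primary key constraint
        email       TEXT UNIQUE,           -- unique key constraint
        name        TEXT NOT NULL
    );
    -- Relationship -> foreign key constraint
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        amount      REAL
    );
""")
```

The same translation applies whatever the target DBMS; only the DDL dialect changes.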
Physical Data Model
Features of the physical data model include:
Specification of all tables and columns.
Specification of foreign keys.
De-normalization may be performed if necessary.
At this level, the specification of the logical data model is realized in the database.
The steps for physical data model design are:
Conversion of entities into tables,
Conversion of relationships into foreign keys,
Conversion of attributes into columns,
Changes to the physical data model based on physical constraints.
Figure: Logical model and physical model
Physical Design Objectives
Involves tradeoffs among
 Performance
 Flexibility
 Scalability
 Ease of Administration
 Data Integrity
 Data Consistency
 Data Availability
 User Satisfaction
Physical Design Structures
Once you have converted your logical design to a physical one,
you will need to create some or all of the following structures:
■ Tablespaces
■ Tables and Partitioned Tables
■ Views
■ Integrity Constraints
■ Dimensions
Some of these structures require disk space. Others exist only in
the data dictionary. Additionally, the following structures may be
created for performance improvement:
■ Indexes and Partitioned Indexes
■ Materialized Views
Tablespaces
 A tablespace consists of one or more datafiles, which are physical
structures within the operating system you are using.
 A datafile is associated with only one tablespace.
 From a design perspective, tablespaces are containers for
physical design structures.
Tables and Partitioned Tables
 Tables are the basic unit of data storage. They are the
container for the expected amount of raw data in your
data warehouse.
 Using partitioned tables instead of non-partitioned ones
addresses the key problem of supporting very large data
volumes by allowing you to divide them into smaller and
more manageable pieces.
 Partitioning large tables improves performance because
each partitioned piece is more manageable.
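As a rough sketch of the idea, SQLite has no declarative partitioning, so the example below emulates range partitioning by hand, routing each row into a per-month table; the `sales` table and the routing scheme are illustrative assumptions, not how a production DBMS implements partitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def partition_for(sale_date: str) -> str:
    """Map a date 'YYYY-MM-DD' to its partition table name, e.g. sales_2024_01."""
    return "sales_" + sale_date[:7].replace("-", "_")

def insert_sale(sale_date: str, amount: float) -> None:
    # Each month's data lands in its own smaller, more manageable table.
    part = partition_for(sale_date)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {part} (sale_date TEXT, amount REAL)")
    conn.execute(f"INSERT INTO {part} VALUES (?, ?)", (sale_date, amount))

insert_sale("2024-01-15", 100.0)   # lands in sales_2024_01
insert_sale("2024-02-03", 250.0)   # lands in sales_2024_02
```

A real warehouse DBMS (e.g. Oracle or PostgreSQL) does this routing transparently from a `PARTITION BY` clause, and can also drop or refresh one partition without touching the others.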
Views
 A view is a tailored presentation of the data contained in one or
more tables or other views.
 A view takes the output of a query and treats it as a table.
 Views do not require any space in the database.
Improving Performance with the Use of Views
Figure: A query accesses a view of selected rows or columns of Table 1, Table 2, and Table 3.
 A view is a virtual table that behaves
like a real table.
 Views can be used as a way to improve
performance.
 Views can be used to combine tables,
so that instead of joining tables in a
query, the query will just access the
view and thus be quicker.
View
 We can run ordinary SQL queries against a view just as against a table.
 For example, DESC department_worker_view; (Oracle syntax) describes the view's columns.
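A minimal sketch of such a view, using SQLite via Python; the `department` and `worker` tables and their contents are assumed for illustration, with `department_worker_view` encapsulating the join so queries need not repeat it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE worker (worker_id INTEGER PRIMARY KEY, name TEXT,
                         dept_id INTEGER REFERENCES department(dept_id));
    INSERT INTO department VALUES (1, 'Sales');
    INSERT INTO worker VALUES (10, 'Asha', 1);
    -- The view stores only the query text, not data: no extra space needed.
    CREATE VIEW department_worker_view AS
        SELECT w.name, d.dept_name
        FROM worker w JOIN department d ON w.dept_id = d.dept_id;
""")
print(conn.execute("SELECT * FROM department_worker_view").fetchall())
# -> [('Asha', 'Sales')]
```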
Integrity Constraints
 Integrity constraints are used to enforce business rules associated
with your database and to prevent having invalid information in
the tables.
 In data warehousing environments, constraints are often used mainly for
query rewrite rather than being enforced on every load.
 NOT NULL constraints are particularly common in data
warehouses.
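A small sketch of constraints rejecting invalid rows; the `sales_fact` table and the quantity rule are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales_fact (
        sale_id  INTEGER PRIMARY KEY,
        quantity INTEGER NOT NULL CHECK (quantity > 0),  -- business rule
        region   TEXT    NOT NULL
    )
""")
conn.execute("INSERT INTO sales_fact VALUES (1, 5, 'EU')")       # valid row
try:
    conn.execute("INSERT INTO sales_fact VALUES (2, -3, 'EU')")  # violates CHECK
except sqlite3.IntegrityError as e:
    print("rejected:", e)   # invalid information never reaches the table
```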
Indexes and Partitioned Indexes
 Indexes are optional structures associated with tables.
 Indexes are just like tables in that you can partition them (but the
partitioning strategy is not dependent upon the table structure).
 Partitioning indexes makes it easier to manage the data warehouse
during refresh and improves query performance.
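As an illustration of the performance effect, the sketch below (SQLite via Python, with an assumed `sales` table) creates an index and asks the optimizer for its plan, which should use the index rather than a full scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [(i, "EU" if i % 2 else "US", i * 1.5) for i in range(1000)])

# Indexes are optional structures: the query works without one, just slower.
conn.execute("CREATE INDEX idx_sales_region ON sales(region)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(amount) FROM sales WHERE region = 'EU'"
).fetchall()
print(plan)   # plan typically mentions idx_sales_region instead of a scan
```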
Materialized Views
 Materialized views are query results that have been stored in
advance so long-running calculations are not necessary when you
actually execute your SQL statements.
 From a physical design point of view, materialized views resemble
tables or partitioned tables and behave like indexes in that they are
used transparently and improve performance.
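SQLite has no materialized views, so the sketch below fakes one by storing a query result in a summary table and refreshing it on demand; the `sales` data and the `mv_sales_by_region` name are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EU", 10.0), ("EU", 20.0), ("US", 5.0)])

def refresh_mv():
    """Recompute the stored aggregate, as a DBMS would on refresh."""
    conn.executescript("""
        DROP TABLE IF EXISTS mv_sales_by_region;
        CREATE TABLE mv_sales_by_region AS
            SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
    """)

refresh_mv()   # the long-running aggregation is paid once, up front
print(conn.execute("SELECT * FROM mv_sales_by_region ORDER BY region").fetchall())
# -> [('EU', 30.0), ('US', 5.0)]
```

In a DBMS with real materialized views, the optimizer additionally rewrites queries against `sales` to read the stored result transparently.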
Data Warehouse: A Multi-Tiered Architecture
Figure: A multi-tiered data warehouse architecture.
(1) Data Storage — operational DBs and other sources are extracted, transformed, loaded, and refreshed (via a monitor & integrator) into the data warehouse, with its metadata and data marts.
(2) OLAP Engine — an OLAP server (ROLAP server or MOLAP server).
(3) Front-End Tools — analysis, query/reports, and data mining.
ETL (Extract-Transform-Load)
 ETL comes from data warehousing and stands for Extract-Transform-Load.
ETL covers the process of loading data from the source system
into the data warehouse.
 Nowadays, ETL often includes cleaning as a separate step. The
sequence is then Extract-Clean-Transform-Load.
Extract
 The Extract step covers the data extraction from the source system and
makes it accessible for further processing.
 The main objective of the extract step is to retrieve all the required data
from the source system using as few resources as possible.
 The extract step should be designed so that it does not negatively
affect the source system in terms of performance, response time, or any
kind of locking.
Extract
There are several ways to perform the extract:
 Update notification - if the source system is able to provide a notification that a record has
been changed and describe the change, this is the easiest way to get the data.
 Incremental extract - some systems may not be able to provide notification that an update
has occurred, but they are able to identify which records have been modified and provide
an extract of such records. During further ETL steps, the system needs to identify those
changes and propagate them downstream. Note that with a daily extract, we may not be
able to handle deleted records properly.
 Full extract - some systems are not able to identify which data has been changed at all, so
a full extract is the only way one can get the data out of the system. The full extract
requires keeping a copy of the last extract in the same format in order to be able to identify
changes. Full extract handles deletions as well.
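The incremental-extract option can be sketched with a stored watermark: pull only rows modified since the last run. The `source_rows` list and timestamp format below are illustrative stand-ins for a real source system.

```python
# Hypothetical source rows, each carrying a last-modified timestamp.
source_rows = [
    {"id": 1, "modified": "2024-03-01T10:00", "value": "a"},
    {"id": 2, "modified": "2024-03-02T09:30", "value": "b"},
    {"id": 3, "modified": "2024-03-03T08:15", "value": "c"},
]

def incremental_extract(rows, watermark):
    """Return rows changed after the watermark, plus the new watermark.

    ISO-8601 strings compare correctly lexicographically, so plain string
    comparison is enough here. Deleted rows are invisible to this scheme,
    which is the limitation noted above.
    """
    changed = [r for r in rows if r["modified"] > watermark]
    new_watermark = max((r["modified"] for r in changed), default=watermark)
    return changed, new_watermark

changed, wm = incremental_extract(source_rows, "2024-03-01T23:59")
print([r["id"] for r in changed], wm)
# -> [2, 3] 2024-03-03T08:15
```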
Clean
The cleaning step is one of the most important as it ensures the quality of
the data in the data warehouse. Cleaning should perform basic data
unification rules, such as:
 Make identifiers unique (e.g. the sex categories Male/Female/Unknown, M/F/null, and
Man/Woman/Not Available are all translated to a standard Male/Female/Unknown)
 Convert null values into a standardized Not Available/Not Provided value
 Convert phone numbers and ZIP codes to a standardized form
 Validate address fields and convert them to proper naming, e.g. Street/St/St./Str./Str
 Validate address fields against each other (State/Country, City/State, City/ZIP code,
City/Street).
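The first three unification rules above can be sketched as small cleaning functions; the mapping table and the ten-digit phone convention are illustrative assumptions, not a complete cleaning framework.

```python
import re

# Unify sex codes to a standard Male/Female/Unknown vocabulary.
SEX_MAP = {"m": "Male", "male": "Male", "man": "Male",
           "f": "Female", "female": "Female", "woman": "Female"}

def clean_sex(value):
    if value is None:                      # nulls -> standardized value
        return "Not Available"
    return SEX_MAP.get(value.strip().lower(), "Unknown")

def clean_phone(value):
    """Strip punctuation and keep the last 10 digits (assumed convention)."""
    digits = re.sub(r"\D", "", value)
    return digits[-10:] if len(digits) >= 10 else digits

print(clean_sex("M"), clean_sex(None), clean_sex("???"))
# -> Male Not Available Unknown
print(clean_phone("(977) 1-555-0199"))
```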
Transform
The transform step applies a set of rules to transform the data from the
source to the target.
This includes converting any measured data to the same dimension (i.e.
conformed dimension) using the same units so that they can later be
joined.
The transformation step also requires joining data from several sources,
generating aggregates, generating surrogate keys, sorting,
deriving new calculated values, and applying advanced validation rules.
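Two of those tasks, surrogate key assignment and aggregate generation, can be sketched as follows; the customer natural keys and fact rows are made up for illustration.

```python
import itertools

# Surrogate keys: warehouse-internal integers assigned per natural key.
surrogate = itertools.count(1)
dim_customer = {}                     # natural key -> surrogate key

def surrogate_key(natural_key):
    if natural_key not in dim_customer:
        dim_customer[natural_key] = next(surrogate)
    return dim_customer[natural_key]

facts = [("CUST-A", 100.0), ("CUST-B", 50.0), ("CUST-A", 25.0)]
rows = [(surrogate_key(nk), amt) for nk, amt in facts]
print(rows)                           # -> [(1, 100.0), (2, 50.0), (1, 25.0)]

totals = {}
for sk, amt in rows:                  # generate an aggregate per customer
    totals[sk] = totals.get(sk, 0.0) + amt
print(totals)                         # -> {1: 125.0, 2: 50.0}
```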
OLAP Server Architectures
Types of OLAP Servers
 Relational OLAP (ROLAP)
 Multidimensional OLAP (MOLAP)
 Hybrid OLAP (HOLAP)
Relational OLAP (ROLAP)
 Relational OLAP servers sit between a relational back-end server and
client front-end tools. To store and manage the warehouse data, ROLAP
uses a relational or extended-relational DBMS.
 ROLAP servers can be easily used with existing RDBMS.
 ROLAP tools do not use pre-calculated data cubes.
Multidimensional OLAP(MOLAP)
 Multidimensional OLAP (MOLAP) uses array-based multidimensional storage
engines for multidimensional views of data. With multidimensional data stores,
the storage utilization may be low if the data set is sparse. Therefore, many
MOLAP servers use two levels of data storage representation to handle dense
and sparse data sets.
 MOLAP allows fast indexing into the pre-computed summarized data.
 MOLAP is easier to use and is therefore suitable for inexperienced users.
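The dense/sparse distinction can be sketched in a few lines: a dense region of the cube as a flat pre-allocated array, a sparse region as a coordinate map storing only populated cells. The dimensions below are illustrative.

```python
regions, months = ["EU", "US"], ["Jan", "Feb", "Mar"]

# Dense representation: every (region, month) cell is pre-allocated,
# so lookup is pure arithmetic, but empty cells still cost storage.
dense = [0.0] * (len(regions) * len(months))
def cell(r, m):
    return regions.index(r) * len(months) + months.index(m)
dense[cell("EU", "Feb")] = 120.0

# Sparse representation: only populated cells are stored.
sparse = {("US", "Mar"): 40.0}

print(dense[cell("EU", "Feb")], sparse.get(("US", "Mar"), 0.0))
# -> 120.0 40.0
```

This is why MOLAP servers keep two storage levels: array storage wastes space once the cube is mostly empty.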
MOLAP vs. ROLAP
MOLAP                                        ROLAP
Information retrieval is fast.               Information retrieval is comparatively slow.
Uses sparse arrays to store data sets.       Uses relational tables.
Best suited for inexperienced users,         Best suited for experienced users.
since it is very easy to use.
Maintains a separate database for            May not require space beyond that already
data cubes.                                  available in the data warehouse.
Hybrid OLAP (HOLAP)
 Hybrid OLAP is a combination of ROLAP and MOLAP. It offers the
higher scalability of ROLAP and the faster computation of MOLAP.
 HOLAP servers can store large volumes of detailed
information, while the aggregations are stored separately in a MOLAP store.
Distributed Data Warehouse
 A distributed data warehouse (DDW) shares data across multiple data
repositories for the purpose of OLAP. Each data warehouse may belong
to one or many organizations. Sharing implies a common format or
definition of the data elements (e.g. using XML).
 Distributed data warehousing encompasses a complete enterprise DW
but has smaller data stores that are built separately and joined
physically over a network, providing users with access to relevant
reports without impacting performance.
 A distributed DW, the nucleus of all enterprise data, sends relevant
data to individual data marts from which users can access information
for order management, customer billing, sales analysis, and other
reporting and analytic functions.
Data Warehouse Manager
 Collects data inputs from a variety of sources, including legacy
operational systems, third-party data suppliers, and informal sources.
 Assures the quality of these data inputs by correcting spelling,
removing mistakes, eliminating null data, and combining multiple
sources.
 Releases the data from the data staging area to the individual data
marts on a regular schedule.
 Estimates and measures the costs and benefits.
Virtual Warehouse
 The data warehouse is a great idea, but it is complex to build and
requires investment. Why not use a cheap and fast approach that
eliminates the transformation steps and the separate repositories for
metadata and data?
 This approach is termed the 'virtual data warehouse'. To accomplish
it, four kinds of information need to be defined:
 A data dictionary containing the definitions of the various databases.
 A description of the relationships among the data elements.
 A description of the way users will interface with the system.
 The algorithms and business rules that define what to do and how to do it.
ANY QUESTIONS?