Lecture-12
Dimensional Modeling (DM)
By Mamuna Fatima
1
 Problems with early COBOLian data processing
systems.
 Data redundancies
 From flat file to Table, each entity ultimately becomes
a Table in the physical schema.
 Simple O(n²) join to work with tables
2
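The quadratic join cost mentioned above can be illustrated with a minimal sketch of a naive nested-loop join. The function name, table contents, and column names below are hypothetical, loosely following the meter-reading scenario from the notes; real DBMSs usually pick smarter join strategies when indexes are available.

```python
def nested_loop_join(left, right, key):
    """Join two lists of dict rows by comparing every pair of rows.

    The number of comparisons is len(left) * len(right), i.e. O(n^2)
    when both tables hold about n rows.
    """
    comparisons = 0
    joined = []
    for l_row in left:
        for r_row in right:
            comparisons += 1
            if l_row[key] == r_row[key]:
                joined.append({**l_row, **r_row})
    return joined, comparisons


# Hypothetical normalized tables: the address is stored once per customer.
customers = [
    {"cust_id": 1, "address": "12 Canal Road"},
    {"cust_id": 2, "address": "7 Mall Road"},
]
readings = [
    {"cust_id": 1, "reading": 4310},
    {"cust_id": 2, "reading": 982},
    {"cust_id": 1, "reading": 4395},
]

joined, comparisons = nested_loop_join(customers, readings, "cust_id")
print(len(joined), comparisons)
```

With 2 customers and 3 readings the join inspects all 6 pairs to produce 3 joined rows.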
◦ Coupled with normalization, drives all the
redundancy out of the database.
◦ Change (or add or delete) the data at just
one point.
◦ Can be used with indexing for very fast
access.
◦ Resulted in success of OLTP systems.
3
 Let's have a look at a typical ER data model first.
 Some Observations:
◦ All tables look-alike, as a consequence it is difficult to
identify:
 Which table is the most important?
 Which is the largest?
 Which tables contain numerical measurements of the business?
 Which tables contain nearly static descriptive attributes?
4
◦ Many topologies for the same ER diagram,
all appearing different.
 Very hard to visualize and remember.
 A large number of possible connections between any
two (or more) tables
5
[Figure: the same 12-node graph (nodes 1 to 12) drawn in two different layouts]
 The Paradox: Trying to make information
accessible using tables resulted in an inability to
query them!
 ER and normalization result in a large number of tables
which are:
◦ Hard to understand by the users (DB programmers)
◦ Hard to navigate optimally by DBMS software
 Real value of ER is in using tables individually or in
pairs
 Too complex for queries that span multiple tables with
a large number of records
6
ER vs. DM
ER: Constituted to optimize OLTP performance.
DM: Constituted to optimize DSS query performance.
ER: Models the micro relationships among data elements.
DM: Models the macro relationships among data elements with an overall deterministic strategy.
ER: A wild variability of the structure of ER models.
DM: All dimensions serve as equal entry points to the fact table.
ER: Very vulnerable to changes in the user's querying habits, because such schemas are asymmetrical.
DM: Changes in users' querying habits can be accommodated by automatic SQL generators.
7
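The point about automatic SQL generators can be sketched as follows. Because every dimension joins to the fact table in the same way, one small template covers any choice of dimensions. This is only an illustration: the function `star_query` and all table and column names (`sales_fact`, `time_dim`, `date_key`, and so on) are assumptions, not part of the lecture.

```python
def star_query(fact, measure, dims, group_by):
    """Build a star-join aggregate query as a SQL string.

    dims maps each dimension table name to the key column it shares with
    the fact table; group_by is a list of (dimension_table, column) pairs.
    """
    # Join only the dimensions that the query actually groups by.
    used = []
    for table, _ in group_by:
        if table not in used:
            used.append(table)

    select_cols = [f"{table}.{column}" for table, column in group_by]
    parts = [
        f"SELECT {', '.join(select_cols)}, SUM({fact}.{measure})",
        f"FROM {fact}",
    ]
    for table in used:
        key = dims[table]
        parts.append(f"JOIN {table} ON {fact}.{key} = {table}.{key}")
    parts.append(f"GROUP BY {', '.join(select_cols)}")
    return " ".join(parts)


# Hypothetical star: three dimensions, each joined on its own key.
DIMS = {
    "time_dim": "date_key",
    "geography_dim": "store_key",
    "product_dim": "product_key",
}

by_year = star_query("sales_fact", "sale_rs", DIMS, [("time_dim", "year")])
by_city_and_dept = star_query(
    "sales_fact", "sale_rs", DIMS,
    [("geography_dim", "city"), ("product_dim", "dept")],
)
print(by_year)
print(by_city_and_dept)
```

A change in querying habits (say, from yearly totals to city-by-department totals) only changes the `group_by` argument; the generator itself stays the same.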
Two general methods:
◦ De-Normalization
◦ Dimensional Modeling (DM)
8
 A simpler logical model optimized for decision
support.
 Inherently dimensional in nature, with a single
central fact table and a set of smaller
dimensional tables.
 Multi-part key for the fact table
 Dimensional tables with a single-part PK.
 Keys are usually system generated
9
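As a minimal sketch of the structure described above, the following uses Python's built-in `sqlite3` module to declare one fact table with a multi-part key and three dimension tables with single-part, system-generated keys. The table and column names are assumptions borrowed loosely from the sales example in this deck; the `*_key` columns are hypothetical surrogate keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: single-part primary key, generated by the system.
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,   -- surrogate key, not the business ITEM#
    item_no TEXT, category TEXT, dept TEXT, supplier TEXT
);
CREATE TABLE geography_dim (
    store_key INTEGER PRIMARY KEY,
    store_no TEXT, zone TEXT, city TEXT,
    district TEXT, division TEXT, province TEXT
);
CREATE TABLE time_dim (
    date_key INTEGER PRIMARY KEY,
    date TEXT, week INTEGER, month INTEGER, quarter INTEGER, year INTEGER
);
-- Fact table: multi-part key made up of the dimension keys.
CREATE TABLE sales_fact (
    product_key INTEGER REFERENCES product_dim(product_key),
    store_key INTEGER REFERENCES geography_dim(store_key),
    date_key INTEGER REFERENCES time_dim(date_key),
    receipt_no TEXT,
    sale_rs REAL,
    PRIMARY KEY (product_key, store_key, date_key, receipt_no)
);
""")

# SQLite assigns the INTEGER PRIMARY KEY itself when it is not supplied.
cursor = conn.execute(
    "INSERT INTO product_dim (item_no, category, dept, supplier) "
    "VALUES (?, ?, ?, ?)",
    ("I-100", "Fiction", "Books", "Supplier S"),
)
generated_key = cursor.lastrowid

# In PRAGMA table_info, the sixth field is the column's position in the PK
# (0 when the column is not part of the primary key).
fact_pk = [row[1] for row in conn.execute("PRAGMA table_info(sales_fact)") if row[5] > 0]
dim_pk = [row[1] for row in conn.execute("PRAGMA table_info(product_dim)") if row[5] > 0]
print(generated_key, fact_pk, dim_pk)
```

Keeping the business identifier (`item_no`) as an ordinary attribute means a renumbering of items by the business does not touch the keys stored in the fact table.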
Data cubes
[Figure: a fact table connected to dimension tables]
 Results in a star-like structure, called a star schema
or star join.
◦ All relationships mandatory M-1.
◦ Single path between any two levels.
 Supports ROLAP operations.
11
12
[Figure: item hierarchy, top level first]
Items
Books, Clothes
Fiction, Text, Men, Women
Medical, Engg
Analysts tend to look at the data through a dimension at a
particular “level” in the hierarchy
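Looking at data at a chosen level of a hierarchy amounts to rolling leaf values up to their ancestors at that level. The sketch below assumes that Medical and Engg sit under Text (the slide layout suggests this but does not state it), and the sales figures are invented for illustration.

```python
# Child -> parent links of the item hierarchy (assumed structure).
PARENT = {
    "Books": "Items", "Clothes": "Items",
    "Fiction": "Books", "Text": "Books",
    "Men": "Clothes", "Women": "Clothes",
    "Medical": "Text", "Engg": "Text",
}


def roll_up(sales_by_leaf, level):
    """Sum leaf-level sales into the ancestor that belongs to `level`."""
    totals = {}
    for leaf, amount in sales_by_leaf.items():
        node = leaf
        while node not in level:
            node = PARENT[node]  # climb one level; KeyError if no ancestor matches
        totals[node] = totals.get(node, 0) + amount
    return totals


# Hypothetical sales recorded at the most detailed level.
sales = {"Fiction": 120, "Medical": 80, "Engg": 60, "Men": 200, "Women": 150}

by_department = roll_up(sales, {"Books", "Clothes"})
by_category = roll_up(sales, {"Fiction", "Text", "Men", "Women"})
print(by_department)
print(by_category)
```

The same detailed data answers questions at either level; only the set of nodes passed as `level` changes.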
13
Star
Snow-flake
14
[Figure: normalized ER schema for sales, with 1-M relationships between the tables]
Tables: year, quarter, month, week, sale_header, sale_detail, store, zone, district, division, item_x_cat, item_x_splir, cat_x_dept
Columns shown: YEAR, QTR; MONTH, QTR; WEEK, MONTH; DATE, WEEK; RECEIPT #, STORE #, DATE, ...; ITEM #, RECEIPT #, ..., $; STORE #, STREET, ZONE, ...; ZONE, CITY; CITY, DISTRICT; DISTRICT, DIVISION; DIVISION, PROVINCE; ITEM #, CATEGORY; ITEM #, SUPPLIER; DEPT, CATEGORY
15
[Figure: star schema for sales; each dimension table joins to the fact table 1-M]
Fact Table: RECEIPT#, STORE#, DATE, ITEM#, Sale Rs. (facts), ...
Product Dim: ITEM#, CATEGORY, DEPT, SUPPLIER
Geography Dim: STORE#, ZONE, CITY, DISTRICT, DIVISION, PROVINCE
Time Dim: DATE, WEEK, MONTH, QUARTER, YEAR
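A query against such a star needs one join per dimension it uses, however high the level of aggregation. The following is a small sketch using Python's built-in `sqlite3`; the trimmed-down tables, the surrogate key columns, and all data values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE geography_dim (store_key INTEGER PRIMARY KEY, city TEXT, province TEXT);
CREATE TABLE time_dim (date_key INTEGER PRIMARY KEY, quarter INTEGER, year INTEGER);
CREATE TABLE sales_fact (store_key INTEGER, date_key INTEGER, sale_rs REAL);

INSERT INTO geography_dim VALUES (1, 'City A', 'Province P'), (2, 'City B', 'Province P');
INSERT INTO time_dim VALUES (10, 1, 2020), (11, 2, 2020);
INSERT INTO sales_fact VALUES (1, 10, 100.0), (1, 11, 150.0), (2, 10, 200.0), (2, 11, 50.0);
""")

# Province and quarter are plain columns of their dimension tables, so the
# roll-up needs only two joins, one per dimension.
rows = conn.execute("""
SELECT g.province, t.quarter, SUM(f.sale_rs)
FROM sales_fact AS f
JOIN geography_dim AS g ON f.store_key = g.store_key
JOIN time_dim AS t ON f.date_key = t.date_key
GROUP BY g.province, t.quarter
ORDER BY t.quarter
""").fetchall()
print(rows)
```

In the normalized schema the same question would have to walk from store through zone, city, district, and division to reach the province, and from date through week and month to reach the quarter.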
16
Beauty lies in close correspondence
with the business, evident even to
business users.
Dimensional hierarchies are collapsed into a single
table for each dimension. Loss of Information?
A single fact table created with a single header from the
detail records, resulting in:
◦ A vastly simplified physical data model!
◦ Fewer tables (thousands of tables in some ERP systems).
◦ Fewer joins resulting in high performance.
◦ Some requirement of additional space.
17
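Collapsing a hierarchy into a single dimension table, and the extra space it costs, can be sketched in plain Python. The lookup tables and all names below are hypothetical stand-ins for the normalized geography tables.

```python
# One lookup table per hierarchy level, as in the normalized schema.
zone_to_city = {"Zone 1": "City A", "Zone 2": "City A", "Zone 3": "City B"}
city_to_district = {"City A": "District X", "City B": "District Y"}
district_to_division = {"District X": "Division D", "District Y": "Division D"}
division_to_province = {"Division D": "Province P"}
stores = [("S1", "Zone 1"), ("S2", "Zone 2"), ("S3", "Zone 3")]


def flatten(stores):
    """Resolve every level once and store it on the store row itself."""
    rows = []
    for store_no, zone in stores:
        city = zone_to_city[zone]
        district = city_to_district[city]
        division = district_to_division[district]
        rows.append({
            "store_no": store_no,
            "zone": zone,
            "city": city,
            "district": district,
            "division": division,
            "province": division_to_province[division],
        })
    return rows


geography_dim = flatten(stores)

# The province name was stored once before; now it is repeated on every row.
province_copies = sum(1 for row in geography_dim if row["province"] == "Province P")
print(len(geography_dim), province_copies)
```

The flat rows no longer record that a city belongs to a district or a district to a division; that dependency lived in the lookup tables, which is the loss of information the slide asks about.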

Editor's Notes

  • #3: Utility companies used to go from house to house collecting information such as meter readings. The data was written in books and later entered into a computer at a central place. The address stays the same while the reading keeps changing, so the address is stored redundantly, and any change to it has to be reflected in many places. The solution was normalization, which is based on ER modeling: the ER diagram was turned into tables, and those tables were joined with other tables to collect the information. The remaining problem was the slow joins.
  • #5: If things were fine, why do we need DM? Look at a schema in third normal form (see the next slide). The slide lists some questions about the ER diagram. Take an example from real life: if you go somewhere and want to know who the most important person is, it is the one with people around him listening to what he says. But can you tell which table is the most important? Is it the one with the largest header and only a few rows, or the other way round? Numerical measurements are the factual data, e.g. sales data, number of items sold, and revenue. Descriptive attributes are the dimensional information. So what is the benefit of simplicity if it raises more questions at every step?
  • #6: All the previous points lead to the demand for a new representation. This can be explained using graph theory. An ER model can take a different shape depending on the designer, so every model looks different. The two graphs above are the same graph in two representations, and the left one is harder to understand. This is the graph isomorphism problem: deciding whether two graphs are the same, which is computationally very hard. The same problem exists with ER diagrams: the models appear different for every problem. These complexities lead us towards the need for DM.
  • #7: Paradox: a conflict. For example, you go to a hospital and ask how the operation went, and they say the operation was successful but the patient died. What is the benefit of a successful operation that could not save the patient's life? That is a paradox. Here the problem is complex because normalization produces so many tables, and in an ERP system they may number in the thousands. The real value of ER modeling shows when you query a single table or a few tables, where performance is good. In DSS, however, we join many tables by default, so performance drops sharply. This is the paradox.
  • #8: A comparison of ER against DM: ER modeling is for OLTP and DM is for DSS. Suppose you have a bike and, while building a house, you decide to load the cement onto it: the bike will be destroyed. Load it onto a truck and nothing happens. So the problem is using the right thing for the wrong problem. In DSS we are concerned with higher levels of aggregation, so we do not go into minor details. ER diagrams differ even for the same problem, so systems built from them vary a lot, whereas in DSS the schema does not normally change. There are smart environments that generate SQL automatically, but they have difficulty optimizing if the schema keeps changing. An ER schema changes when the business changes, so SQL-generating tools struggle with it. With DM, i.e. a star schema, generating the SQL is much less difficult, because the schema remains constant even when the business changes.
  • #9: An ER model can be simplified in two ways: de-normalization and DM.
  • #10: So what is a DM, and how do we tell that a schema is optimized for the DSS environment? See the slide points. The key point is that it is simple, logical, and intuitive; if it is easy for programmers to understand, it leads to a better solution. It has two kinds of tables, fact and dimension. Fact tables are large and dimension tables are small. Fact tables store numerical data, e.g. how much was sold and the sales revenue. Dimension tables hold information about the dimensions, e.g. time and geography. Keys should be system generated rather than business keys, so that the keys need no maintenance when the business changes.
  • #11: Map the business analyst's representation to the relational model: data cubes with dimensions and measures become a relational design with tables and 1-M relationships (FKs). Dimensions map to dimension tables; measures map to fact tables. Group the fact and dimension tables. Grain: the most detailed measure values stored.
  • #12: How do the fact and dimension tables connect? In a star topology, with the fact table at the center. DM is designed to support ROLAP operations, where we can run queries on the fly.
  • #13: Dimensions have hierarchies, e.g. books are divided into fiction and text, and the two cannot be mixed. The benefit is that a decision maker can enter at any point in the hierarchy and see the details of the other levels.
  • #14: The above task can be done with two schemas. A star is simple: whether you rotate, flip, or reposition it, it does not change. If you do the same to a snowflake, you lose the entire meaning. A star schema represents one complete business process, e.g. sales, purchases, or inventory, and each business process gets its own star.
  • #16: This is the star schema for the previous slide, and things become simpler. We create fact tables holding real (physical) records, so we do not run the joins at run time. This is why, in pivot4j, we analyze a real, physical star by placing the dimensions we need, and the MDX is generated automatically. Once a star is created, it does not matter how you analyze it. Suppose there are a hundred records in each table and four tables are involved in a query that needs a join, and the join returns 40 rows. To retrieve those 40 rows we have computed 100 x 100 x 100 x 100 = 100,000,000 steps. If the same 40 records sit in a fact table with 1,000 rows in total, then in the worst case we reach the correct output in 1,000 steps instead of 100,000,000. So ultimately we achieve an enormous performance gain.
  • #18: To get the star schema, we collapse the hierarchies into a single table per dimension; for example, time is now a single table. This avoids sub-tables linked by PK-FK relationships: a column such as city now holds the name directly in the dimension table instead of an FK. This may lose information: previously every city carried a province FK, but now the dependency between cities and provinces cannot be read off the diagram. The disadvantage is that you cannot tell which element is a subset of which, or at what level of the hierarchy an element sits. The benefit is a simple schema with a few tables instead of the previous hundreds. Another disadvantage is the additional space. A simple example could follow in the next stage.
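
The step count in note #16 can be reproduced as a back-of-envelope calculation. It assumes a naive nested-loop join with no indexes, which is the worst case; the variable names are illustrative, and actual DBMS join costs are usually far lower than this bound.

```python
rows_per_table = 100   # records in each normalized table (from note #16)
tables_joined = 4      # tables taking part in the join
fact_rows = 1000       # rows in the pre-joined fact table

# Naive join: every combination of rows across the four tables is examined.
join_steps = rows_per_table ** tables_joined

# Star: a single scan of the fact table in the worst case.
scan_steps = fact_rows

speedup = join_steps // scan_steps
print(join_steps, scan_steps, speedup)
```

Under these assumptions the pre-joined fact table is examined in 1,000 steps against 100,000,000 for the naive join, a factor of 100,000.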