Information & Knowledge Management - Class 3
Data-warehousing
Marielba Zacarias
Assistant Professor, DEEI
FCT I, Office 2.69, Ext. 7749
mzacaria@ualg.pt
http://w3.ualg.pt/~mzacaria
Summary

Data-warehouses
 The architected environment
 Design Process
 Data-modeling schemas
Data Warehousing
Data collection for analysis and
reporting tasks
Historical data
Stored in an environment distinct from
operational data
Structured differently from operational databases
Why
Operational and analytical data have
different requirements in terms of
 usage (frequency, response time)
 hardware
 software
 structure
Data-warehousing Users
Before Data-Warehouses....
      The “spider web”




The “architected” environment

Operational
 Detailed
 Daily
 Current value
 High access probability
 Application oriented
Atomic dw
 More granular
 Temporal
 Integrated
 Subject oriented
 Summarized
Dept. dw (“data-marts”)
 Derived, some primitive
 Typical of Marketing, Engineering, Production, Accounting
Individual dw
 Temporal
 Ad-hoc
 Heuristic
 Non-repetitive
 Oriented to PC or workstations
Type of questions

Operational
 J. Jones, 123 Main St., Credit - AA
 Question: Jones credit?
Atomic dw
 1986-87: J. Jones, 456 High St., Credit - B
 1987-89: J. Jones, 456 High St., Credit - A
 1989 - present: J. Jones, 123 Main St., Credit - AA
 Question: Jones credit history?
Dept. dw
 Jan - 4101, Feb - 4209, Mar - 4175, Apr - 4215
 Question: Monthly sales?
Individual dw
 Customers since 1982 with balances > 5,000 and credit >= B
 Question: Client types in analysis?
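The difference between the operational question ("Jones credit?") and the atomic data-warehouse question ("Jones credit history?") can be sketched with two small tables. This is only an illustration: the table and column names (`op_customer`, `dw_customer_history`, `valid_from`, `valid_to`) are assumptions, and the rows are the J. Jones values from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Operational system: one row per customer, current value only
CREATE TABLE op_customer (name TEXT PRIMARY KEY, address TEXT, credit TEXT);
INSERT INTO op_customer VALUES ('J. Jones', '123 Main St.', 'AA');

-- Atomic data warehouse: one row per period, so history is preserved
-- (valid_to NULL means "present"; column names are hypothetical)
CREATE TABLE dw_customer_history (
    name TEXT, valid_from INTEGER, valid_to INTEGER,
    address TEXT, credit TEXT
);
INSERT INTO dw_customer_history VALUES
    ('J. Jones', 1986, 1987, '456 High St.', 'B'),
    ('J. Jones', 1987, 1989, '456 High St.', 'A'),
    ('J. Jones', 1989, NULL, '123 Main St.', 'AA');
""")

# "Jones credit?" is answered from the operational table
current_credit = conn.execute(
    "SELECT credit FROM op_customer WHERE name = ?", ("J. Jones",)
).fetchone()[0]

# "Jones credit history?" needs the time-variant warehouse table
credit_history = [
    row[0]
    for row in conn.execute(
        "SELECT credit FROM dw_customer_history WHERE name = ? ORDER BY valid_from",
        ("J. Jones",),
    )
]
```

The operational table can only ever answer the first question, because each update overwrites the previous credit rating.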
Architected Environment
Production Environment
 Operational environment
 Analytical environment
Data-warehouse design
 Requirements Gathering
 Physical Environment Setup
 Data Modeling
 ETL
 OLAP Cube Design
 Front End Development
 Report Development
 Performance Tuning
 Query Optimization
 Quality Assurance
 Rolling out to Production
 Production Maintenance
 Incremental Enhancements
Requirements Gathering
Take users into account
  Executives with little time and little knowledge of
  technical terms
  Interviews, JAD sessions
    User Reporting/Analysis Requirements
    Hardware, training requirements
    Data source identification
    Concrete project plan
Physical Environment Setup
Set up servers, DBMS and databases,
ETL, OLAP cubes and reporting services
Create three environments
 development, testing, production
Data-modeling
 Depends on initial data source identification
 Conceptual, logical and physical data modeling
 Should be related to the information architecture!
Data Modeling: Dimensional Approach
Transactional data is partitioned into facts and dimensions
Facts
  Numeric transaction data
    products ordered, price
Dimensions
  provide context for facts
    order date, customer name, product
    number, location info, salesperson
Dimensional Approaches
 Star
   Fact table (typically a transaction)
   Dimensions (context of the transaction)
 Snowflake
   Some dimensions are linked to the fact table
   indirectly, through other dimension tables
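A minimal sketch of a star schema, using SQLite from Python so it runs without a server. All table and column names (`fact_order`, `dim_date`, `dim_customer`, `dim_product`) and the sample rows are assumptions chosen to match the facts and dimensions listed above; they are not taken from the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: context for each transaction
CREATE TABLE dim_date     (date_id INTEGER PRIMARY KEY, order_date TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, product_number TEXT);

-- Fact table: numeric measures plus one foreign key per dimension
CREATE TABLE fact_order (
    date_id     INTEGER REFERENCES dim_date(date_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    quantity    INTEGER,
    price       REAL
);

INSERT INTO dim_date     VALUES (1, '2011-01-15'), (2, '2011-02-03');
INSERT INTO dim_customer VALUES (1, 'J. Jones');
INSERT INTO dim_product  VALUES (1, 'P-100'), (2, 'P-200');
INSERT INTO fact_order   VALUES (1, 1, 1, 2, 10.0), (2, 1, 2, 1, 25.0), (2, 1, 1, 3, 10.0);
""")

# A typical analytical query: join the fact table to one dimension and aggregate
revenue_by_product = conn.execute("""
    SELECT p.product_number, SUM(f.quantity * f.price)
    FROM fact_order AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.product_number
    ORDER BY p.product_number
""").fetchall()
```

In a snowflake variant of this sketch, `dim_product` would itself reference a further table (for example a hypothetical `dim_category`), so that the category is linked to the fact table only through the product dimension.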
Star Metaphor
Star Schema
Relational model
Star schema
Snow-flake schema
OLAP Cube Design
Specification of detailed reporting needs
in terms of the multi-dimensional
structure previously defined (star or
snowflake), but regarded as an
n-dimensional cube
Star/snowflake schemas and cubes describe
essentially the same data
Cubes are more appropriate for non-IT
users
The Cube Metaphor
Slicing
Dicing
Rotating
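The three cube operations can be sketched in plain Python on a tiny cube stored as a dictionary. The dimensions (`month`, `region`, `product`), the cell values and the function names are all hypothetical; real OLAP servers implement these operations very differently, so this only shows what each operation means.

```python
from collections import defaultdict

# Hypothetical cube: (month, region, product) -> sales amount
DIMS = ("month", "region", "product")
cube = {
    ("Jan", "North", "P-100"): 10,
    ("Jan", "South", "P-100"): 20,
    ("Jan", "North", "P-200"): 5,
    ("Feb", "North", "P-100"): 7,
    ("Feb", "South", "P-200"): 8,
}

def dice(cube, **allowed):
    """Dicing: keep cells whose coordinates are in the allowed values of each named dimension."""
    wanted = {DIMS.index(dim): set(values) for dim, values in allowed.items()}
    return {key: v for key, v in cube.items()
            if all(key[i] in values for i, values in wanted.items())}

def slice_cube(cube, dim, value):
    """Slicing: fix a single dimension to a single value."""
    return dice(cube, **{dim: [value]})

def pivot(cube, rows, cols):
    """Rotating: aggregate onto two dimensions; swapping rows and cols rotates the view."""
    r, c = DIMS.index(rows), DIMS.index(cols)
    table = defaultdict(lambda: defaultdict(int))
    for key, v in cube.items():
        table[key[r]][key[c]] += v
    return {row: dict(cells) for row, cells in table.items()}

january = slice_cube(cube, "month", "Jan")
north_p100 = dice(cube, region=["North"], product=["P-100"])
by_month = pivot(cube, "month", "region")
by_region = pivot(cube, "region", "month")
```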
ETL

Extraction
Transformation
Loading
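The three ETL steps can be sketched with the Python standard library. The source data, the cleaning rules and the `sales` table are assumptions made up for this illustration; a production ETL tool adds scheduling, logging and error handling on top of the same idea.

```python
import csv
import io
import sqlite3

# Hypothetical source extract with inconsistent names and one unusable row
SOURCE_CSV = """customer,order_date,amount
 j. jones ,2011-01-15,41.01
J. JONES,2011-02-03,42.09
,2011-02-04,10.00
"""

def extract(text):
    """Extraction: read raw rows from the source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transformation: normalize names, convert types, reject rows without a customer."""
    clean = []
    for row in rows:
        name = row["customer"].strip().title()
        if not name:
            continue
        clean.append((name, row["order_date"], float(row["amount"])))
    return clean

def load(conn, rows):
    """Loading: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, order_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))

loaded_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
customers = conn.execute("SELECT DISTINCT customer FROM sales").fetchall()
```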
SQL Server Integration Services
SQL Server Integration Examples
SQL Server Integration Examples II
    Qualitative data
                 Description term                 ActionId
                 team meeting                          18
                 hr distribution                       19
                 project list                          19
                 team meeting                          19
                 hr distribution                       26
                 project list                          26
                 claims application                    27
                 claims application                    28
                 cards application maintenance         29
                 claims application integration        30
                 hr distribution                       31
                 project list                          31
                 claims application                    34
                 claims application                    35
                 hr distribution                       36
                 project list                          36
SQL Server Integration Examples III
   Fuzzy Transformations
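The idea behind a fuzzy transformation is to map misspelled or inconsistent terms onto a list of clean reference values. The sketch below uses Python's `difflib` as a stand-in; it is not how SQL Server Integration Services implements its fuzzy components. The reference terms come from the description terms listed earlier, while the misspelled inputs and the 0.8 cutoff are assumptions.

```python
import difflib

# Clean reference values, taken from the description terms in the example data
REFERENCE_TERMS = [
    "team meeting",
    "hr distribution",
    "project list",
    "claims application",
]

def fuzzy_lookup(term, cutoff=0.8):
    """Return the closest reference term, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(
        term.strip().lower(), REFERENCE_TERMS, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None

# Hypothetical dirty inputs
cleaned = {
    raw: fuzzy_lookup(raw)
    for raw in ["team meting", "claim aplication", "HR Distribution", "budget review"]
}
```

Terms that match nothing above the cutoff come back as `None`, so they can be routed to manual review rather than silently mapped to the wrong value.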
Front-end development
 Front-ends range from
   in-house development with scripting
   languages such as PHP, ASP, or Perl
   to off-the-shelf products such as Crystal
   Reports or higher-end products such as
   Actuate
   OLAP vendors also offer front-ends of their
   own
Report Development
Derived from requirements
Main point of contact between the data-
warehouse and users
User customization
Report Delivery (web, e-mail, sms, file
formats)
Access privileges
Performance Tuning
ETL
Query Processing
 Users lose interest after 30 sec!
 Query optimization
Report Delivery
Query Optimization
Understand how your DBMS executes queries
Store intermediate results in temporary tables
Query Optimization tips
  Use indexes
  Partition tables (vertically and horizontally)
  De-normalize (fewer joins)
  Server Tuning
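As a small illustration of "understand how your DBMS executes queries" and "use indexes", the sketch below asks SQLite for its query plan before and after creating an index. The `sales` table, its generated data and the index name are assumptions; other DBMSs expose the same information through their own EXPLAIN facilities, and the exact plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(f"R{i % 50}", f"2011-{i % 12 + 1:02d}", float(i)) for i in range(5000)],
)

QUERY = "SELECT SUM(amount) FROM sales WHERE region = ?"

def query_plan(conn):
    """Return SQLite's plan for QUERY as one string (last column holds the description)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("R7",)).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_before = query_plan(conn)  # full table scan: no index to use yet
total_before = conn.execute(QUERY, ("R7",)).fetchone()[0]

conn.execute("CREATE INDEX idx_sales_region ON sales(region)")

plan_after = query_plan(conn)   # the plan should now mention idx_sales_region
total_after = conn.execute(QUERY, ("R7",)).fetchone()[0]
```

The query result is unchanged; only the access path differs, which is what makes indexing a safe first tuning step.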
Quality Assurance
Test plan with quality criteria for data
Critical success factor
Often overlooked
Performed by people who know the
business data, not data-warehouses
  Expect resistance
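A test plan with quality criteria for data can start as a handful of automated checks that business users help define. The sketch below is one possible shape, not a prescribed method: the three criteria, the field names (`order_id`, `customer`, `amount`) and the sample rows are all assumptions.

```python
def run_quality_checks(rows):
    """Return, for each hypothetical criterion, the order ids that violate it."""
    failures = {"missing_customer": [], "negative_amount": [], "duplicate_order_id": []}
    seen_ids = set()
    for row in rows:
        if not (row.get("customer") or "").strip():
            failures["missing_customer"].append(row["order_id"])
        if row["amount"] < 0:
            failures["negative_amount"].append(row["order_id"])
        if row["order_id"] in seen_ids:
            failures["duplicate_order_id"].append(row["order_id"])
        seen_ids.add(row["order_id"])
    return failures

# Hypothetical sample of loaded rows
sample_rows = [
    {"order_id": 1, "customer": "J. Jones", "amount": 41.01},
    {"order_id": 2, "customer": "", "amount": 42.09},
    {"order_id": 3, "customer": "J. Jones", "amount": -5.0},
    {"order_id": 3, "customer": "J. Jones", "amount": 42.15},
]
report = run_quality_checks(sample_rows)
```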
Rolling to production

Seems easy, but...
Putting everyone online may take a full
week in some cases
Online access can be as simple as
sending a link by e-mail
Production Maintenance
 Backup and recovery processes
 Crisis Management
 Monitoring end-user usage
  To catch runaway queries before the
  whole system slows down
  To measure usage for ROI calculations
  and future enhancements
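One way to stop a runaway query is to give every query a time budget and abort it when the budget is exceeded. The sketch below does this in SQLite with `Connection.set_progress_handler`; the helper name `run_with_budget`, the 0.05 second budget and the deliberately expensive query are assumptions. Server DBMSs usually provide their own statement timeouts or resource governors for the same purpose.

```python
import sqlite3
import time

def run_with_budget(conn, sql, seconds):
    """Run sql, aborting it if it exceeds the time budget. Returns rows, or None if aborted."""
    deadline = time.monotonic() + seconds
    # The handler is called every 1000 SQLite VM instructions;
    # a non-zero return value interrupts the running statement.
    conn.set_progress_handler(lambda: 1 if time.monotonic() > deadline else 0, 1000)
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError:
        # Raised when the query is interrupted (a real monitor would also
        # distinguish other operational errors and log the offending query)
        return None
    finally:
        conn.set_progress_handler(None, 0)

conn = sqlite3.connect(":memory:")

# Deliberately expensive query: counts to one hundred million
RUNAWAY = """
WITH RECURSIVE counter(x) AS (
    SELECT 1 UNION ALL SELECT x + 1 FROM counter WHERE x < 100000000
)
SELECT COUNT(*) FROM counter
"""

cheap_result = run_with_budget(conn, "SELECT 1", seconds=5.0)
runaway_result = run_with_budget(conn, RUNAWAY, seconds=0.05)
```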
Incremental enhancements

  Make small changes, such as
  changing the original geographical
  designations
   A company may add new sales regions
  No matter how simple, never make them
  directly in the production environment
Architected environment
Tools for unstructured
information management
 Content Management Systems
 Record Management Systems
 Digital Image Management Systems
 Digital Asset Management Systems
 Digital Imaging Systems
