ETL Process In Data Warehouse

  By: Komal Choudhary
Outline
• ETL
• Extraction
• Transformation
• Loading
ETL Overview
• ETL stands for Extraction, Transformation, Loading.
• Its purpose is to get data out of the source and load it into the
  data warehouse.
• Data is extracted from an OLTP database, transformed to match the
  data warehouse schema, and loaded into the data warehouse database.
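The three steps above can be sketched end to end as follows. This is only a minimal illustration using in-memory SQLite databases as stand-ins for the OLTP source and the warehouse; the table names (orders, fact_orders) and the cents-to-currency rule are assumptions made for the example, not part of the original slides.

```python
import sqlite3

def extract(source):
    """Read raw rows from the (hypothetical) OLTP orders table."""
    return source.execute("SELECT id, amount_cents FROM orders").fetchall()

def transform(rows):
    """Reshape rows to the assumed warehouse schema: amounts in currency units."""
    return [(order_id, cents / 100.0) for order_id, cents in rows]

def load(warehouse, rows):
    """Write the transformed rows into the assumed warehouse table."""
    warehouse.executemany(
        "INSERT INTO fact_orders (order_id, amount) VALUES (?, ?)", rows
    )
    warehouse.commit()

# In-memory stand-ins for the source system and the data warehouse.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount_cents INTEGER)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1250), (2, 399)])

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_orders (order_id INTEGER, amount REAL)")

# Run the steps in order: extract, then transform, then load.
load(warehouse, transform(extract(source)))
loaded = warehouse.execute(
    "SELECT order_id, amount FROM fact_orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping each step as its own function is one way to make the process easier to modify when the source or the warehouse schema changes.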
Process
Why???
• As data sources change, the data warehouse must be periodically
  updated.
• As the business changes, the DW system needs to change too in order
  to maintain its value as a tool for decision makers. As a result, the
  ETL also changes and evolves, so ETL processes must be designed for
  ease of modification. A solid, well-designed, and documented ETL
  system is necessary for the success of a data warehouse project.
• An ETL system consists of three consecutive functional steps:
  extraction, transformation, and loading.
Extraction
Extract Process
• The Extract step covers the data extraction from the source system
  and makes the data accessible for further processing. The main
  objective of the extract step is to retrieve all the required data
  from the source system using as few resources as possible.
• There are several ways to perform the extract:
1. Update notification
2. Incremental extract
3. Full extract
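A rough sketch of how a full extract and an incremental extract might differ is shown below. It assumes the source table carries an updated_at column that can serve as a watermark; the customers table and the function names are hypothetical. Update notification is not shown because it depends on what the source system is able to publish.

```python
import sqlite3

def full_extract(conn):
    """Full extract: read every row, regardless of when it last changed."""
    return conn.execute(
        "SELECT id, name, updated_at FROM customers ORDER BY id"
    ).fetchall()

def incremental_extract(conn, last_watermark):
    """Incremental extract: read only rows changed after the last run.

    Returns the changed rows and the new watermark to store for the next run.
    """
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark

# Hypothetical source table; ISO dates compare correctly as strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", "2023-12-15"), (2, "Lin", "2024-02-10"), (3, "Sam", "2024-03-01")],
)

all_rows = full_extract(conn)
changed, watermark = incremental_extract(conn, "2024-01-01")
print(len(all_rows), [row[0] for row in changed], watermark)
```

The incremental variant reads fewer rows per run, which fits the stated goal of using as few source resources as possible, but it relies on the source recording change times reliably.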
Clean
• The cleaning step is one of the most important, as it ensures the
  quality of the data in the data warehouse.
• Cleaning should apply basic data unification rules, such as:
1. Make identifiers unique.
2. Convert null values into a standardized form.
3. Convert phone numbers and ZIP codes to a standardized form.
4. Validate address fields and convert them to a consistent naming,
   e.g. Street/St/St./Str./Str.
5. Validate address fields against each other.
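A few of the rules above could look like this in code. The set of null markers, the "Not Available" placeholder, and the digits-only phone format are assumptions chosen for illustration. A real street-name rule would also need to handle cases such as "St" meaning "Saint", which this sketch ignores.

```python
import re

# Assumed spellings that should all count as "no value".
NULL_MARKERS = {"", "n/a", "na", "null", "none", "-"}
# Variants of "Street" listed on the slide, compared case-insensitively.
STREET_VARIANTS = {"street", "st", "st.", "str", "str."}

def clean_null(value, placeholder="Not Available"):
    """Map missing or null-like values to one standardized placeholder."""
    if value is None or value.strip().lower() in NULL_MARKERS:
        return placeholder
    return value.strip()

def clean_phone(raw):
    """Standardize a phone number by keeping only its digits."""
    return re.sub(r"\D", "", raw)

def clean_street(address):
    """Rewrite every known variant of 'Street' to the same spelling."""
    return " ".join(
        "Street" if word.lower() in STREET_VARIANTS else word
        for word in address.split()
    )

print(clean_null(" N/A "))
print(clean_phone("(555) 010-9999"))
print(clean_street("12 Main St."))
```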
Transformation
• The transform step applies a set of rules to transform the data from
  the source to the target.
• This includes converting any measured data to the same dimension,
  using the same units, so that the data can later be joined.
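As a small example of the unit conversion described above, the sketch below normalizes weights reported in different units to kilograms before the records are combined. The record layout and the choice of kilograms as the target unit are assumptions for the example.

```python
# Conversion factors to the assumed target unit (kilograms).
TO_KILOGRAMS = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}

def to_kilograms(value, unit):
    """Convert a weight in a known unit to kilograms."""
    try:
        factor = TO_KILOGRAMS[unit.lower()]
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}") from None
    return value * factor

# Hypothetical source records: (item, weight, unit).
records = [("A", 2.0, "kg"), ("B", 500.0, "g"), ("C", 10.0, "lb")]
normalized = [(item, to_kilograms(weight, unit)) for item, weight, unit in records]
print(normalized)
```

Rejecting unknown units with an error, rather than passing the value through, keeps mixed-unit data from silently reaching the warehouse.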
Problems???
• Conflicts and problems fall into classes that can be distinguished
  at two levels: the schema level and the instance level. Instance-level
  problems are further split into record-level and value-level problems:
1. Schema-level problems.
2. Record-level problems.
3. Value-level problems.
Solution…
• To deal with such issues, the integration and transformation tasks
  involve a wide variety of functions, such as normalizing,
  de-normalizing, reformatting, recalculating, summarizing, merging
  data from multiple sources, modifying key structures, adding an
  element of time, identifying default values, supplying decision
  commands to choose between
Loading
• Loading data to the target multidimensional structure is the final
  ETL step. In this step, extracted and transformed data is written
  into the dimensional structures actually accessed by the end users
  and application systems. The loading step includes both loading
  dimension tables and
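Loading a dimension table might be sketched as below. The dim_customer table, its surrogate key column, and the use of SQLite's INSERT OR IGNORE to skip rows that are already present are all assumptions for illustration; the slide does not prescribe a loading technique.

```python
import sqlite3

def load_dimension(conn, customers):
    """Insert customers into the dimension, skipping ones already loaded."""
    conn.executemany(
        "INSERT OR IGNORE INTO dim_customer (customer_id, name) VALUES (?, ?)",
        customers,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer ("
    " customer_key INTEGER PRIMARY KEY AUTOINCREMENT,"  # warehouse surrogate key
    " customer_id TEXT UNIQUE,"                          # key from the source system
    " name TEXT)"
)

load_dimension(conn, [("C1", "Ada"), ("C2", "Lin")])
# A later run that overlaps with the first one does not create duplicates.
load_dimension(conn, [("C2", "Lin"), ("C3", "Sam")])

dim_rows = conn.execute(
    "SELECT customer_id, name FROM dim_customer ORDER BY customer_id"
).fetchall()
print(dim_rows)
```

Making the load safe to rerun is one possible design choice; it lets a failed batch be retried without cleaning up the target first.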
Thanks!!!!!

