Integrating Hadoop into Data Warehousing Architecture
Where is the Wisdom? Lost in the Knowledge.
Where is the Knowledge? Lost in the Information.
T.S. Eliot
© Humza Naseer, University of Melbourne 2014
Outline
• Introduction
• Link Between Hadoop and Data Warehouse
• Related Work: Trends in Data Warehouse Architecture
• Current Work: Hadoop Integration into Data Warehouse Environment
• Findings, Conclusion & Future Work
Intro: The Data Warehouse Mission
• Identify all possible enterprise data assets
• Select those assets that have actionable content and can be accessed
• Bring the data assets into a logically centralized “enterprise data warehouse”
• Expose those data assets most effectively for decision making
(Kimball & Ross, 2013)
Intro: Overview of Hadoop
Hadoop is an ecosystem of products
• Open source
• Vendor distributions
• Additional tools for development and administration
Hadoop Benefits
• Enables big data analytics
• Supports advanced forms of analytics
• Scales cost-effectively
• Extends a data warehouse environment
Hadoop Limitations
• Low-latency queries
• Ease of access
• Data integration and integrity
• Fine-grained security
[Diagram labels: Unstructured Data; Query Results; HDFS; Data Nodes; Map Reduce]
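The overview slide names HDFS data nodes and Map Reduce as the path from unstructured data to query results. The following is a minimal, single-process sketch of the map, shuffle and reduce phases in plain Python. It is an illustration only, not Hadoop's API: the log lines, the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the choice of counting HTTP status codes are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical unstructured input: raw web-server log lines.
LOG_LINES = [
    "10.0.0.1 GET /home 200",
    "10.0.0.2 GET /cart 500",
    "10.0.0.1 GET /cart 200",
    "10.0.0.3 POST /pay 500",
]


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit one (status_code, 1) pair per log line."""
    status = line.split()[-1]
    yield status, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence in a single process."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(run_job(LOG_LINES))
```

On a real cluster the map and reduce steps would run in parallel across data nodes; here they run sequentially so the data flow is easy to follow.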
Link Between Hadoop and Data Warehouse
A data warehouse system fetches and unifies data from heterogeneous source systems into a centralized dimensional or normalized data repository (Rainardi, 2008)
A data warehouse is not a tool or technology
• It is a business process which unifies an enterprise through data
(Eckerson, 2012)
Is Hadoop a problem or an opportunity?
Where does Hadoop fit into data warehouse architecture?
Why is Integration Happening?
Traditional RDBMSs cannot handle
• The new data types
• Extended analytic processing
• Terabytes/hour loading with immediate query access
We want to use SQL, but we don’t want the RDBMS storage constraints
The disruptive solution: Hadoop (Kimball & Ross, 2013)
[Diagram labels: DB1, DB2, DB3; Transformation and Load; Central DW; BI App-1, BI App-2, BI App-3; Decision Making]
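The diagram on this slide shows three source databases feeding a transformation-and-load step, a central DW, and BI applications. A minimal sketch of that flow, using Python's built-in `sqlite3` as a stand-in for the central warehouse, might look like the following. The source rows, the `sales` table and the helper names are hypothetical and exist only to make the flow concrete.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical extracts standing in for DB1, DB2 and DB3.
# Names are inconsistently formatted and amounts arrive as text.
SOURCES: Dict[str, List[Tuple[str, str]]] = {
    "DB1": [("alice", "12.50"), ("bob", "7.00")],
    "DB2": [("ALICE ", "3.25")],
    "DB3": [("carol", "20.00"), ("Bob", "1.00")],
}


def transform(source: str, row: Tuple[str, str]) -> Tuple[str, str, float]:
    """Transform: normalize the customer name and cast the amount."""
    customer, amount = row
    return source, customer.strip().lower(), float(amount)


def load_central_dw(sources: Dict[str, List[Tuple[str, str]]]) -> sqlite3.Connection:
    """Load: write the transformed rows into one central table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (source TEXT, customer TEXT, amount REAL)")
    for name, rows in sources.items():
        conn.executemany(
            "INSERT INTO sales VALUES (?, ?, ?)",
            [transform(name, row) for row in rows],
        )
    conn.commit()
    return conn


def sales_by_customer(conn: sqlite3.Connection) -> Dict[str, float]:
    """A BI-style query against the unified data."""
    cursor = conn.execute(
        "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
    )
    return dict(cursor.fetchall())


if __name__ == "__main__":
    dw = load_central_dw(SOURCES)
    print(sales_by_customer(dw))
```

The query only works because the transform step unified the customer names first, which is the point of placing transformation ahead of the central DW.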
Trends in Data Warehouse Architectures
Ponniah (2011) notes that the selection of a DW architecture is based on enterprise requirements.
DW architecture has multiple architectural layers and components
• Logical architecture
• Physical architecture
(Moss and Atre, 2013)
DW architecture overlaps with data integration, business intelligence and enterprise data (Russom, 2014)
Inmon vs Kimball dichotomy (Ariyachandra and Watson, 2010)
Hadoop Integration into Data Warehousing Environment
Eckerson (2012) notes that reporting and analytics have different workload requirements
Reporting is based on entities and facts that are already well known
Advanced analytics enables the discovery of new facts that are not yet well known
Multi-platform unified data architecture
• Includes the enterprise data warehouse (EDW) and several other new data platforms which augment the EDW
(Russom, 2013)
Uses of Hadoop that Extend DW Architectures
• Data staging
• Data archiving
• Advanced analytics
• Multi-structured data
[Diagram labels: DB1, DB2, DB3; Transformation and Load; EDW; BI App-1, BI App-2, BI App-3; Decision Making]
Findings
Analytics and reporting have different requirements for DW architectures
Characterize the DW architecture by counting the number and types of workloads it supports
Logical DW architecture must integrate multiple physical platforms
Design of logical DW architecture must be compartmentalized
Proposed logical architecture for new DW ecosystem (an extension of Eckerson's (2012) BI architecture)
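The second finding, characterizing a DW architecture by the number and types of workloads it supports, can be sketched as a small inventory. The workload names, the type labels and the platform assignments below are hypothetical examples, not data from the study, and `characterize` is an illustrative helper rather than a method the slides define.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Workload:
    name: str
    kind: str      # workload type, e.g. "reporting" or "advanced analytics"
    platform: str  # physical platform assumed to serve the workload


# Hypothetical inventory for one enterprise.
WORKLOADS: List[Workload] = [
    Workload("monthly sales report", "reporting", "EDW"),
    Workload("finance dashboard", "reporting", "data mart"),
    Workload("clickstream exploration", "discovery analytics", "Hadoop"),
    Workload("churn model training", "advanced analytics", "Hadoop"),
]


def characterize(workloads: List[Workload]) -> Dict[str, object]:
    """Summarize an architecture by workload count, types and platforms."""
    return {
        "workload_count": len(workloads),
        "workload_types": dict(Counter(w.kind for w in workloads)),
        "platforms": sorted({w.platform for w in workloads}),
    }


if __name__ == "__main__":
    print(characterize(WORKLOADS))
```

An inventory with more than one entry under `platforms` is the situation the third finding addresses: the logical architecture then has to integrate several physical platforms.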
Logical Architecture of New DW Ecosystem
[Diagram labels, relational side: Online Transaction Processing Systems (Relational Data); Operational System; Operational System; Operational Data Store; Enterprise Data Warehouse; Subject Area Data Marts; BI Server]
[Diagram labels, non-relational side: Web Data; Machine Data; Log files; Legacy/External Data; Hadoop Ecosystem Cluster (Non-relational Data); Non-relational]
[Diagram labels, sandboxes: DW-Centric Sandbox; Replicated Sandbox; In-memory BI Sandbox]
[Diagram labels, environments: Event driven alerting environment; Reporting/analysis environment; Exploration/discovery environment]
[Diagram labels, data movement: Extract, transform and Load (Batch, real time or near real time); Query; ETL; Streaming]
[Diagram labels, users and approach: Power User; Casual User; Top down architecture; Bottom up architecture]
BI Assessment Model
[Diagram labels: Data Warehouse Ecosystem; Data Marts; Enterprise Data Warehouse; Work Load Specific Data Platforms]
[Diagram axes: Workload Capacity; Degree of Integration; Degree of Standardization, each marked Low and High]
Conclusion
Hadoop enables new types of applications within the DW environment
Big data analytics, advanced analytics and discovery analytics
Information exploration and augmenting a data warehouse
Hadoop should be implemented in a multi-platform DW environment
Future work:
• Conformed dimensions
• BI maturity roadmap
Questions
