Hadoop & Data Warehouse
Hadoop
- A Set of Technologies
Data Warehouse
- A Concept or Process
And many more..
Comparing Hadoop with the Enterprise Data Warehouse (EDW)?
Any attempt to use Hadoop technology to replace an organization's existing data warehouse may
lead to failure.
• The Hadoop set of technologies should be used to make the EDW more powerful.
• A meaningful and honest assessment needs to be done to decide where and how Hadoop can be
integrated to achieve an optimized architecture.
Let's get into some more detail:
• Explore data warehouse business goals and benefits
• Get a glimpse of the core advantages of Hadoop
• Understand the limitations of Hadoop
• Finally, look at a few high-level use cases that use Hadoop capabilities in the data warehouse (DWH)
Enterprise Data Warehouse Business Goals / Benefits:
• Evaluate, monitor, manage and improve corporate performance.
• Manage and enhance customer relationships.
• Cleanse and improve the quality of the organization's data.
• Support decisions and forecast future growth and needs.
• Support, monitor and modify marketing campaigns.
Hadoop Core Advantages
Scalable
Hadoop is highly scalable: it can store and distribute very large datasets across servers that
operate in parallel.
Cost Effective
Hadoop is cost-effective. Its scale-out architecture can affordably store large volumes of data
for future use.
Fast
Data is managed across a cluster on a distributed file system. Because processing is mapped to
the nodes where the data is stored, data processing is faster.
Flexible
Hadoop lets enterprises access and process many types of data sources in parallel, giving them
the tools to derive valuable insights from that data.
Failure Resistant
One of the great advantages of Hadoop is its fault tolerance, which is provided by replicating
the data to other nodes in the cluster. If a node fails, a replica on another node can be used
instead.
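The fault tolerance described above can be illustrated with a toy model. This is a minimal sketch, not how HDFS is implemented: the node names, block names and random placement are assumptions made for illustration (real HDFS placement is rack-aware), and the replication factor of 3 is the usual HDFS default.

```python
import random

REPLICATION_FACTOR = 3  # assumed for this sketch; the usual HDFS default


def place_replicas(block_ids, nodes, replication=REPLICATION_FACTOR, seed=0):
    """Assign each block to `replication` distinct nodes (simplified random placement)."""
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in block_ids}


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return {block for block, holders in placement.items() if holders - failed}


# Hypothetical cluster and block names.
nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_replicas(["blk_a", "blk_b", "blk_c"], nodes)

# With three replicas on distinct nodes, losing any two nodes leaves
# at least one replica of every block.
survivors = readable_blocks(placement, failed_nodes=["node1", "node2"])
print(sorted(survivors))  # ['blk_a', 'blk_b', 'blk_c']
```

A block only becomes unreadable when every node holding one of its replicas has failed, which is why a higher replication factor buys more resilience at the cost of more storage.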
Hadoop Limitations
Vulnerable
Hadoop is written in Java, a widely used language that has been heavily targeted by cyber
attackers and, as a result, implicated in numerous security breaches.
Latency
HDFS is optimized for reading large batches of a data set quickly (high throughput) rather than
for retrieving particular records in that data set (low latency).
Unsuited to Small Data
Hadoop is not suited to small data. HDFS is designed for high capacity and does not efficiently
support random reads of many small files.
Stability Issues
As an open source platform, Hadoop has a fair chance of stability issues.
Security Concern
Early Hadoop releases lacked encryption at the storage and network levels. Later releases
provide it, but it must be explicitly configured, which remains a concern. Hadoop supports
Kerberos authentication, which is not easy to manage.
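The small-data limitation can be made concrete with some arithmetic. HDFS keeps an entry for every file and every block in the NameNode's memory, so the same volume of data costs far more metadata when it is split into many small files. The sketch below only counts those entries; the 128 MiB block size is the common HDFS default, and the file sizes are assumptions chosen for illustration.

```python
import math

BLOCK_SIZE_MB = 128  # assumed HDFS block size for this sketch


def metadata_objects(total_mb, file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Count file entries plus block entries needed to hold `total_mb` of data."""
    files = math.ceil(total_mb / file_size_mb)
    blocks_per_file = math.ceil(file_size_mb / block_size_mb)
    return files + files * blocks_per_file


total_mb = 1024 * 1024  # 1 TiB expressed in MiB

small = metadata_objects(total_mb, file_size_mb=1)     # many 1 MiB files
large = metadata_objects(total_mb, file_size_mb=1024)  # fewer 1 GiB files

print(f"1 MiB files: {small} metadata objects")
print(f"1 GiB files: {large} metadata objects")
```

Under these assumptions the small-file layout needs a few hundred times more metadata entries for the same terabyte, which is one reason small files are usually merged into larger ones before they are stored.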
Some scenarios where the power of Hadoop is needed to strengthen the data warehouse:
• Storage and processing of semi-structured and unstructured data
• Reducing the cost of data storage for very large data volumes
• Increasing data retention to avoid premature data death
• Pre-processing of large volumes of data
Conventional Data Warehouse Architecture
[Diagram: source systems (CRM, ERP, Legacy, Third Party, External Data) -> ETL layer (Extract, Transform & Load) -> data repository layer (Enterprise Data Warehouse, ODS, Data Marts) -> analytics layer (Analytics)]
This is the traditional data warehouse architecture used by many organizations. It varies with
technical and organizational needs.
[Diagram labels: unstructured, semi-structured and structured data sources; Enterprise Data Warehouse; advanced analytical applications; business intelligence layer]
In this use case, Hadoop loads the unstructured and semi-structured data, makes it available to
the EDW as the organization requires, and also offers it for further analytical processing.
Integrating new data sources into the existing EDW gives organizations more and deeper analytics
and insights.
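As a rough illustration of this use case, semi-structured input such as web logs has to be given a structure before the EDW can use it. The sketch below is plain Python rather than a Hadoop job, and the log layout, field names and sample lines are hypothetical.

```python
import re

# Hypothetical log layout for illustration: "<ip> <timestamp> <method> <path> <status>"
LOG_PATTERN = re.compile(
    r"(?P<ip>\S+) (?P<ts>\S+) (?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})"
)


def to_rows(lines):
    """Turn raw log lines into dict rows, skipping lines that do not match."""
    rows = []
    for line in lines:
        match = LOG_PATTERN.fullmatch(line.strip())
        if match:
            row = match.groupdict()
            row["status"] = int(row["status"])  # typed column for the warehouse
            rows.append(row)
    return rows


raw = [
    "10.0.0.1 2024-01-05T10:00:00 GET /home 200",
    "malformed line",
    "10.0.0.2 2024-01-05T10:00:03 POST /cart 500",
]
rows = to_rows(raw)
print(len(rows), rows[1]["path"], rows[1]["status"])  # 2 /cart 500
```

Lines that do not fit the expected layout are dropped here for brevity; a real pipeline would more likely route them to a reject area for inspection.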
[Diagram labels: CRM, ERP, Legacy, Third Party and External Data sources; Extract, Transform & Load; Enterprise Data Warehouse, ODS, Data Marts, Analytics; unstructured sources (XMLs, doc files, web logs, emails, images, videos); File Copy; Analytic Tools]
In this use case, Hadoop serves as the main data repository, and data from the data warehouse is
archived in Hadoop to take advantage of its low-cost storage. The data warehouse acts as a source
for Hadoop. Note that the existing setup of the organization's EDW does not change.
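One way to picture the archiving step is as a retention rule that decides which warehouse partitions stay in the EDW and which are copied to Hadoop. This is a minimal sketch under assumed conditions: the daily partition dates and the 365-day retention window are hypothetical, and the actual copy to Hadoop is left out.

```python
from datetime import date


def split_by_retention(partitions, today, keep_days):
    """Split partition dates into (keep in EDW, archive to Hadoop)."""
    keep, archive = [], []
    for day in partitions:
        age_days = (today - day).days
        (keep if age_days <= keep_days else archive).append(day)
    return keep, archive


# Hypothetical warehouse partitions, identified by their date.
partitions = [date(2023, 1, 1), date(2024, 6, 1), date(2024, 12, 1)]
keep, archive = split_by_retention(partitions, today=date(2025, 1, 1), keep_days=365)

print("keep:", keep)
print("archive:", archive)
```

Only the partition from January 2023 is older than the window in this example, so it is the only one selected for archiving.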
[Diagram labels: unstructured sources (XMLs, doc files, web logs, emails, images, videos); structured sources (CRM, ERP, Legacy); Enterprise Data Warehouse, ODS, Data Marts; analytics layer; analytic tools]
In this use case, Hadoop sits as a layer in front of the existing EDW. It sources all of the data
and uses its parallel processing capability to offload the majority of transformations from the
EDW, feeding it pre-processed data. The EDW can then focus on aggregations and analytical
reporting.
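The kind of pre-processing that Hadoop can take over from the EDW is often expressed as map, shuffle and reduce steps. The sketch below imitates those three steps in a single Python process to show the shape of the computation; it is not Hadoop code, and the page-view records are hypothetical.

```python
from collections import defaultdict


def map_phase(records):
    """Emit (key, value) pairs: one (page, 1) pair per page-view record."""
    for record in records:
        yield record["page"], 1


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical raw events; in practice these would be spread over many nodes.
records = [{"page": "/home"}, {"page": "/cart"}, {"page": "/home"}]
page_views = reduce_phase(shuffle(map_phase(records)))
print(page_views)  # {'/home': 2, '/cart': 1}
```

In a cluster, the map and reduce functions run in parallel on different nodes, and only the compact result (here, one row per page) needs to be loaded into the EDW.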
[Diagram labels: data sources (XMLs, doc files, web logs, emails, images, videos, CRM, ERP, Legacy); Extract & Load; Data Lake; Analytic Sandbox; Transformation; Enterprise Data Warehouse; business intelligence layer]
In this scenario, a data lake is used and ELT (extract, load, transform) is preferred over ETL.
A data lake is a storage repository that holds a vast amount of raw data in its native form,
which can be transformed later as needed. The EDW applies transformations and uses the data.
This architecture suits an organization's data science needs: data scientists can use the
sandbox to apply their models to the raw data stored in the data lake.
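The difference between ELT and ETL is mainly about when the transformation happens. The following sketch shows the ELT order under simple assumptions: an in-memory list stands in for the data lake, and the JSON events and their field names are hypothetical.

```python
import json

data_lake = []  # stands in for raw storage; records are kept exactly as received


def extract_and_load(raw_records):
    """E and L steps: store the raw text untouched, with no schema applied."""
    data_lake.extend(raw_records)


def transform_for_warehouse(lake):
    """T step, run later: parse raw JSON and keep only the fields the EDW needs."""
    rows = []
    for raw in lake:
        event = json.loads(raw)
        rows.append({"customer": event["customer"], "amount": float(event["amount"])})
    return rows


extract_and_load([
    '{"customer": "c1", "amount": "19.90", "note": "gift"}',
    '{"customer": "c2", "amount": "5", "device": "mobile"}',
])

warehouse_rows = transform_for_warehouse(data_lake)
print(warehouse_rows)
```

Because the lake still holds the untouched records, fields that the warehouse transformation discards (such as `note` and `device` here) remain available to data scientists working in the sandbox.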
To Conclude
Data warehouse architects have more tools to work with, so a detailed analysis of the
organization and its business goals is needed before choosing the right set of technologies to
build a data warehouse.
The core benefits of a data warehouse are still needed and always will be. There is always an
opportunity to strengthen them through smart use of appropriate tools and technologies.
Hadoop can only fail if it is used simply to replace an existing data warehouse, without a proper
feasibility analysis and the intent to arrive at an optimized architecture aligned with
organizational goals.