Experts in
&
Enterprise Data Lake
Build Lake on Cloud
ATTUNITY PARTNERS
Innovating and Engineering High Performance Data Integration, BI & Analytics Platforms: On-Premise, Cloud or Hybrid.
A data lake is a repository for large quantities and varieties of data, both structured and unstructured.
Data generalists and programmers can tap the stream data for real-time analytics. Data scientists use the lake for discovery and ideation.
Data lakes take advantage of commodity cluster computing techniques for massively scalable, low-cost storage of data files in any format.
Offload “Cold” Data From DW to Hadoop
Benefits
  • Dramatically lowers the cost per terabyte to store data: Hadoop-based storage is 30x cheaper
  • More information can be retained and analyzed
  • Improves performance of the data warehouse
  • “Cold” data is still available to be queried online or interactively
  • “Cold” data in Hadoop can be mined for additional insights or combined with other data
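The cost benefit can be made concrete with a small calculation. The sketch below takes the slide's "30x cheaper" figure at face value and computes the blended cost per terabyte after a share of the data moves to Hadoop. The function name, the dollar amount, and the 70% cold share are hypothetical values chosen only for illustration.

```python
def blended_cost_per_tb(dw_cost_per_tb: float,
                        cold_fraction: float,
                        hadoop_discount: float = 30.0) -> float:
    """Average cost per TB after moving `cold_fraction` of the data to Hadoop.

    `hadoop_discount` is the slide's "30x cheaper" claim; it is an input,
    not a measured value.
    """
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0 and 1")
    hadoop_cost_per_tb = dw_cost_per_tb / hadoop_discount
    hot_fraction = 1.0 - cold_fraction
    return hot_fraction * dw_cost_per_tb + cold_fraction * hadoop_cost_per_tb


if __name__ == "__main__":
    warehouse_cost = 30_000.0  # hypothetical cost per TB, any currency
    after = blended_cost_per_tb(warehouse_cost, cold_fraction=0.7)
    print(f"before: {warehouse_cost:.0f} per TB, after offload: {after:.0f} per TB")
```

The saving depends almost entirely on how much of the warehouse is actually cold, so that fraction is worth measuring before quoting a number.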
[Diagram: before and after offloading]
Before: ETL loads the data warehouse, which serves reports, dashboards and queries.
After: the data warehouse keeps “hot” (frequently used) data; Hadoop holds “cold” data.
  • Ongoing data load
  • Initial bulk load of raw or infrequently used data
  • Re-factor queries and reports to work via HiveQL
  • Translate DW data model to Hive / HCatalog
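The "translate DW data model to Hive" step can be sketched as a small generator that turns a warehouse column list into a Hive external-table statement. The type mapping, the ORC storage choice, and the table name and path in the usage example are assumptions made for illustration; a real migration would need a mapping reviewed against the specific source database.

```python
from typing import Dict, List, Tuple

# Illustrative mapping from generic warehouse types to Hive types.
# This is an assumption, not a complete or authoritative conversion table.
TYPE_MAP: Dict[str, str] = {
    "VARCHAR": "STRING",
    "CHAR": "STRING",
    "INTEGER": "INT",
    "BIGINT": "BIGINT",
    "NUMERIC": "DECIMAL(38,10)",
    "FLOAT": "DOUBLE",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
}


def to_hive_ddl(table: str, columns: List[Tuple[str, str]], location: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement for the given columns."""
    lines = []
    for name, dw_type in columns:
        # Drop any length/precision suffix, e.g. VARCHAR(200) -> VARCHAR.
        base = dw_type.split("(")[0].strip().upper()
        # Unknown types fall back to STRING so no column is silently lost.
        hive_type = TYPE_MAP.get(base, "STRING")
        lines.append(f"  `{name}` {hive_type}")
    cols = ",\n".join(lines)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n{cols}\n)\n"
        f"STORED AS ORC\nLOCATION '{location}'"
    )


if __name__ == "__main__":
    # Hypothetical table and path.
    print(to_hive_ddl(
        "sales_cold",
        [("order_id", "INTEGER"), ("note", "VARCHAR(200)"), ("placed_at", "TIMESTAMP")],
        "/data/cold/sales",
    ))
```

An external table is used here so that dropping the Hive table leaves the offloaded files in place.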
What is a Data Lake?
The data lake accepts input from various sources and can preserve both the original data fidelity and the lineage of data transformations. Data models emerge with usage over time rather than being imposed up front.
The lake can serve as a staging area for the data warehouse, the location of more carefully "treated" data for reporting and analysis in batch mode.
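One way to picture "preserving original fidelity and lineage" is a record that always carries its untouched raw payload alongside the list of transformation steps applied so far. The sketch below is a minimal in-memory illustration; the class and function names are hypothetical, and a real lake would persist this metadata in a catalog rather than in Python objects.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LakeRecord:
    raw: bytes                      # original payload, never modified
    current: str                    # latest transformed view of the data
    lineage: List[str] = field(default_factory=list)  # names of applied steps

    @property
    def raw_sha256(self) -> str:
        """Fingerprint of the original payload, useful for fidelity checks."""
        return hashlib.sha256(self.raw).hexdigest()


def ingest(payload: bytes) -> LakeRecord:
    """Store the payload as-is; no model is imposed at ingest time."""
    return LakeRecord(raw=payload, current=payload.decode("utf-8"))


def transform(record: LakeRecord, name: str, fn: Callable[[str], str]) -> LakeRecord:
    """Return a new record with `fn` applied and the step name appended to lineage."""
    return LakeRecord(
        raw=record.raw,
        current=fn(record.current),
        lineage=record.lineage + [name],
    )


if __name__ == "__main__":
    rec = ingest(b"  Hello,World  ")
    cleaned = transform(transform(rec, "strip", str.strip), "lower", str.lower)
    print(cleaned.current, cleaned.lineage, cleaned.raw == rec.raw)
```

Because each transformation returns a new record, earlier views stay available and the raw bytes can always be reprocessed under a different model.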
Data Lake Reference Architecture
[Diagram: Data Lake on Cloud (Amazon AWS Cloud, AWS S3)]
Sources:
  • Social and media platforms: Facebook, Twitter, Google+, iTunes Store, Google Play, YouTube, Amazon MP3, Spotify, VEVO, Amazon Prime, Hulu
  • Data archives: XML, Excel, TXT, CSV, JSON, EDI, other
  • External business partners & third parties
  • Enterprise systems: SAP, CRM, Oracle, MySQL, SQL Server (product, customer & other data)
Ingestion: AWS Data Pipeline, FTP, ETL, replication
Processing: Qubole (Spark, Hive, Presto, Hadoop MapReduce)
Consumption:
  • Reporting and dashboards: MicroStrategy | Business Objects
  • Analytics & data scientists: MicroStrategy | Tableau
Flows: data streams to data lake, on-demand data flow, regular data flow
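With S3 as the storage layer, each incoming file needs a predictable object key. The sketch below shows one possible convention that partitions the raw zone by source, format, and load date. The `raw/` prefix, the `dt=` partition style, and the function name are assumptions for illustration and are not taken from the slides; only the list of formats comes from the diagram.

```python
from datetime import date

# File formats named in the architecture diagram.
ALLOWED_FORMATS = {"xml", "excel", "txt", "csv", "json", "edi", "other"}


def raw_zone_key(source: str, fmt: str, load_date: date, filename: str) -> str:
    """Build an object key such as raw/<source>/<format>/dt=YYYY-MM-DD/<file>."""
    fmt = fmt.lower()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    # Normalise the source name so keys stay consistent across loads.
    source_slug = source.strip().lower().replace(" ", "_")
    return f"raw/{source_slug}/{fmt}/dt={load_date.isoformat()}/{filename}"


if __name__ == "__main__":
    # Hypothetical source name and file.
    print(raw_zone_key("Enterprise Systems", "CSV", date(2016, 1, 14), "customers.csv"))
```

Date-based prefixes of this kind let query engines such as Hive or Presto map each load to a table partition.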
SERVICES
  • Staffing
  • Data Warehousing
  • BI Applications
  • Cloud BI
  • Mobile BI
  • Big Data
  • Master Data Management
WWW.AGILEISS.COM


Hadoop Big data Solution Provider
