Evolution of Data
Architectures:
From Hadoop to Data Lake
in becoming Data Driven
Alexandre Vasseur, Pivotal
@PivotalFrance
© Copyright 2015 Pivotal. All rights reserved.
If you have one thing to do
Store Massive
Data Sets
Achieve Continuous
Innovation at Scale
Becoming Data
Driven with Apps
Data Driven Apps
AGILE DEV & DATA SCIENCE: MODERN, COLLABORATIVE
APP & DEV PLATFORM: MODERN, CLOUD-ORIENTED & OPEN
DATA FABRIC: MODERN, CLOUD-ORIENTED & OPEN
The Big Data Problem
Fragmentation, Constraints, Complexity
Pivotal + Hortonworks Alliance
•  Started July 2014 around Ambari collaboration
•  Announcing Pivotal Big Data Suite
on Hortonworks Data Platform
•  Advanced support from world’s leading Hortonworks
support services
•  Joint engineering efforts and enhanced Pivotal HD
ODP - Standardize Hadoop Ecosystem
•  Deliver ODP Core to build a versioned, packaged,
tested set of Hadoop components.
•  Focus on developing a platform, rather than projects
•  Initial scope on Apache Hadoop
HDFS / MR / Yarn / Ambari
Remove
vendor lock-in
Ecosystem
Effect
Shorter
Innovation Cycles
http://opendataplatform.org
…
Open Sourced but not just Hadoop
•  Open sourcing all Pivotal Big Data Suite components
–  Pivotal GemFire - premium in-memory NoSQL database
–  Pivotal HAWQ - world’s leading SQL compliant enterprise
SQL on Hadoop
–  Pivotal Greenplum Database - advanced enterprise MPP
analytic database with Hadoop interconnect
– SpringXD - Unified, distributed, and extensible system for
data driven application development
HAWQ SQL on Hadoop
PROVEN AT SCALE
PRODUCTIVE
NATIVE on HADOOP / ODP
OPEN & EXTENSIBLE
HAWQ SQL on Hadoop
10+ years R&D in Massively Parallel SQL
SQL engine for petabyte-scale analytics in the world’s largest industries
Mature cost based query optimizer
Full SQL semantics
Rich ecosystem of ELT/dataviz/BI & partners
PL/*, built-in analytics, R native framing
All Hadoop formats (gz, Parquet, HAWQ etc)
Data node short circuit reads (colocated, not M/R based)
Predicate pushdown to Hive, HBase
HAWQ PXF: Query federation to NoSQL, DB, etc
SpringXD
Data from anywhere, to anywhere
Real time & batch
Ingest + analytics
+ jobs orchestration
Developer friendly
Built in connectors
With / without Spark
DSL
Your choice of Hadoop
Your choice of messaging
Standalone, YARN & outside Hadoop
Simplify Data Driven Applications
•  PaaS with NoSQL & Big Data choices built-in
•  Emergence of vertical services: Mobile, IoT, …
Data centric runtimes built in
Java/PHP/Node.js/Ruby
Python
R/Shiny
Scala
SpringXD
Large choice of data services
DB, clustered MySQL etc
Memcache, Redis etc
GemFire, Cassandra etc
Hadoop, Greenplum etc
Can run virtualized inside PaaS
Can run multi-tenant-ified alongside PaaS
DEMO
[Architecture diagram; labels as they appear on the slide]
Platform: PHD (or any ODP Core-based Hadoop Distribution)
Components: HDFS, HBase, Hive, HAWQ (SQL on Hadoop), GreenplumDB (Analytics DW), GemFire (JSON/Object in-memory data grid), Redis (Key Value Store), RabbitMQ, SpringXD (Stream Processing/scoring), Spark
Groupings: Cloud Foundry Data Services, Pivotal Big Data Suite
Data paths: PXF (Filtered Pushdown), Direct Store, Federated, GPHDFS, Write behind Persistence
Consumers: Analytic Apps, Online Apps
The New Data Imperatives
Converged Data & Cloud
Open
Data-Driven Apps
A NEW PLATFORM FOR A NEW ERA
Meet us at the booth!
Come and do a “HAWQ in 2 min” lab
Win Solo2 Beats headphones!

Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data Driven. - NoSQL matters Paris 2015
