10 Amazing Things To Do With a Hadoop-Based Data Lake

Strata Conference New York 2014
Greg Chase
Director, Product Marketing, Pivotal Software
© 2014 Pivotal Software, Inc. All rights reserved.

Pivotal Business Data Lake Architecture
[Architecture diagram. Tiers: Sources, Ingestion Tier, Distillation Tier, Processing Tier, Insights Tier, Action Tier, and a Unified Operations Tier (Command Center; Spring XD, Oozie), on Pivotal HD holding unstructured and structured data.
Sources: clickstream, sensor data, weblogs, network data, CRM data, ERP data.
Components shown: Sqoop, Flume, Spring XD, GemFire XD, HAWQ/Greenplum, HBase, MapReduce, Hive, Pig, query interfaces, GemFire, RabbitMQ, Redis, Pivotal CF.]

1. Store Massive Data Sets
Scale-out: use commodity hardware and storage.
[Diagram: Rack 1, Rack 2, Rack 3, … Rack n]

2. Mix Disparate Data Sources
Schema flexibility: absorb different data types from data sources.
[Diagram: sensor data, CRM data, website click streams]
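
To illustrate the schema flexibility described in this slide, here is a minimal sketch of the schema-on-read idea: raw payloads are kept as they arrive and are only interpreted when read. The source names, sample payloads, field names, and the `read_with_schema` helper are all hypothetical and are not taken from any Pivotal product.

```python
import csv
import io
import json

# Hypothetical raw payloads, kept exactly as they arrived from each source.
RAW_RECORDS = [
    ("sensor", '{"device": "t-17", "temp_c": 21.5}'),
    ("crm", "1042,Ada Lovelace,gold"),
    ("clickstream", "/pricing?ref=home"),
]


def read_with_schema(source, payload):
    """Interpret a raw payload only at read time (schema-on-read)."""
    if source == "sensor":
        return json.loads(payload)
    if source == "crm":
        row = next(csv.reader(io.StringIO(payload)))
        # Assumed column order for this illustrative CRM export.
        return dict(zip(("customer_id", "name", "tier"), row))
    # Sources without a known schema stay raw until one is defined.
    return {"raw": payload}


records = [read_with_schema(source, raw) for source, raw in RAW_RECORDS]
```

Because nothing is rejected at write time, a new source can be stored first and given a schema later.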

3. Ingest Bulk Data
Scalable open source tools for batch loading data.
[Diagram: batch and microbatch loads]
Flume
• Event driven
• Any source
Spring XD
• Bulk load
• With processing
• With analytics
• Any source
Sqoop
• Bulk load
• RDBMS
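
The difference between batch and microbatch loading can be sketched in a few lines. This is an illustration of the general technique only, not how Flume, Spring XD, or Sqoop work internally; `microbatches` and `load_batch` are hypothetical helpers, and the list `sink` stands in for a real bulk-load target.

```python
from itertools import islice


def microbatches(records, batch_size):
    """Yield successive lists of at most batch_size records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load_batch(batch, sink):
    """Stand-in for one bulk write; here it appends the batch to a list."""
    sink.append(list(batch))


sink = []
for batch in microbatches(range(7), batch_size=3):
    load_batch(batch, sink)
```

A single large batch is the same loop with `batch_size` set to the size of the whole input; smaller batches trade throughput per write for lower latency.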

4. Ingest High-Velocity Data
Capture all volatile data. Apply structure.
[Diagram: streaming data]
Spring XD
• Bulk load
• Real-time ingest
• With processing
• With analytics
• Any source
Pivotal GemFire XD
• Advanced DB operations
• Consistency
• Reliable persistence
• Convert to structured

5. Apply Structure to Unstructured / Semi-Structured Data
Flexible processing of different data types.
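
As one example of applying structure to semi-structured data, the sketch below turns a web server log line into a typed record. It assumes the Common Log Format; the sample line, field names, and `parse_log_line` helper are made up for illustration.

```python
import re

# Pattern for the Common Log Format (an assumed input format).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    row = match.groupdict()
    row["status"] = int(row["status"])
    # A "-" size means no body was sent; record it as zero bytes.
    row["size"] = 0 if row["size"] == "-" else int(row["size"])
    return row


sample = (
    '203.0.113.9 - - [16/Oct/2014:09:15:02 -0400] '
    '"GET /index.html HTTP/1.1" 200 5120'
)
parsed = parse_log_line(sample)
```

Lines that do not match return `None` rather than raising, so unparseable input can be kept in raw form and revisited later.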

6. Make Data Available for MPP SQL Analysis
Fast processing for advanced analytics in many supported HDFS formats.
[Diagram: Hadoop Cluster with a Name Node, a Resource Manager, and a HAWQ Master; four Data Nodes, each with a Node Manager and HAWQ Segment(s)]
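
The master-and-segments layout in this slide follows the general MPP pattern: each segment aggregates its own slice of the data, and the master combines the partial results. The sketch below illustrates that pattern only; it is not HAWQ's implementation or API, and the data, function names, and use of threads as "segments" are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the slice of rows one segment holds.
SEGMENTS = [
    [("web", 120), ("store", 80)],
    [("web", 45), ("phone", 30)],
    [("store", 60), ("web", 15)],
]


def segment_partial_sum(rows):
    """Work done on one segment: sum amounts per channel over local rows."""
    partial = {}
    for channel, amount in rows:
        partial[channel] = partial.get(channel, 0) + amount
    return partial


def master_sum_by_channel(segments):
    """Work done on the master: run segments in parallel, merge partials."""
    totals = {}
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        for partial in pool.map(segment_partial_sum, segments):
            for channel, amount in partial.items():
                totals[channel] = totals.get(channel, 0) + amount
    return totals


totals = master_sum_by_channel(SEGMENTS)
```

This is roughly what a `SELECT channel, SUM(amount) ... GROUP BY channel` query does when it is split across segments: only the small partial results travel to the master.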

7. Achieve Data Integration
Create multi-dimensional analytical models.

8. Improve Machine Learning & Predictive Analytics
Richer, deeper data sets for accurate predictive analytics.
[Diagram: HAWQ Master with HAWQ Segment(s)]

9. Deploy Real-Time Automation at Scale
Respond in real-time, at scale. Archive history in Hadoop.
[Diagram: web apps with Pivotal GemFire XD (in-memory)]
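
The pairing of an in-memory tier with a Hadoop archive can be sketched as a small store that keeps recent rows in memory and moves older ones to an archive. This is an illustration of the idea only and does not use or reflect the GemFire XD API; the class name, the `max_live_rows` parameter, and the list standing in for Hadoop are all hypothetical.

```python
class InMemoryStoreWithArchive:
    """Keep the newest rows in memory; move older rows to an archive."""

    def __init__(self, max_live_rows):
        self.max_live_rows = max_live_rows
        self.live = {}      # dicts preserve insertion order, oldest first
        self.archive = []   # stand-in for history kept in Hadoop

    def put(self, key, value):
        # Re-inserting an existing key makes it the newest entry.
        self.live.pop(key, None)
        self.live[key] = value
        while len(self.live) > self.max_live_rows:
            oldest_key = next(iter(self.live))
            self.archive.append((oldest_key, self.live.pop(oldest_key)))

    def get(self, key):
        """Fast path used by applications: look only at in-memory rows."""
        return self.live.get(key)


store = InMemoryStoreWithArchive(max_live_rows=2)
for i, reading in enumerate([20.1, 20.4, 35.0]):
    store.put(f"event-{i}", reading)
```

Applications read and write the small live set, while the archive keeps growing and remains available for later analysis.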

10. Achieve Continuous Innovation at Scale
Capture and store all data: sensor data, CRM data, website click streams.
Analyze to discover insights and algorithms.
Deploy automation at scale.
[Diagram: HAWQ Master with HAWQ Segment(s); in-memory store with web apps]

Increase Value Derived from Data With a Data Lake
[Diagram: the ten capabilities shown against a "Business Value" label]
1. Store massive data sets
2. Mix disparate data
3. Ingest bulk data
4. Ingest high-velocity data
5. Apply structure
6. Enable MPP analysis
7. Achieve data integration
8. Improve predictive analytics
9. Deploy real-time automation at scale
10. Achieve continuous innovation

For more information on Pivotal Big Data Suite, visit Pivotal.io/big-data
