Pivotal Confidential–Internal Use Only
Modern Data Architecture
Alexey Grishchenko
About me
Enterprise Architect @ Pivotal
• 7 years in data processing
• 5 years with MPP
• 4 years with Hadoop
• Spark contributor
• http://0x0fff.com
How it started…
[Diagram, built up over slides 3-11: Front End, then Back End, then DBMS]
What about BI?
Just put it there!
[Diagram: Front End, Back End, DBMS, BI]
Was it fast?
[Latency labels on the diagram: 10ms, 100ms, 200ms, 1-2 min]
yes, single server…
First Issues
[Diagram, built up over slides 12-31: Front End, Back End, DBMS, BI. Latency labels: 10ms, 100ms, 200ms, 1-2 min]
More users got workstations
[Latency labels: 10ms, 400ms, 800ms, 1-2 min]
Split!
[Latency labels: 10ms, 300ms, 600ms, 1-2 min]
Even more users?
Split!
[Diagram: three Front End / Back End pairs. Latency labels: 10ms, 100ms, 400ms, 1-2 min]
What about automated systems?
[Diagram: five Front End / Back End pairs. Latency labels: 10ms, 100ms, 1 sec, 5-10 min]
Database, please, live!
[Latency labels: 10ms, 100ms, 800ms, 15-20 min]
What if “split” didn’t help this time?
Split more! Eventually it will help…
[Diagram: five DBMS instances. Latency labels: 10ms, 100ms, 300ms, 35-40 min]
Sales went 10% up!
Sales went 20% down!
[Latency labels: 10ms, 100ms, 600ms, 2-3 hrs]
Stop loading my system with your stupid reports!
The Era of Data Warehouse
[Diagram, built up over slides 32-43: five FE / BE pairs, five DBMS instances, ETL, DWH, BI. Latency labels: 100ms, 300ms, 1 day, 2 days]
We need more reports!
[Diagram: Data Mining and OLAP… added next to BI. Latency labels: 100ms, 300ms, 1 day, 3-4 days]
We need secondary site!
[Diagram: a second set of FE / BE pairs and DBMS instances added. WAL Replication: 3-5 minutes late]
Where is our DWH? We need this data now!
[Diagram: NAS, NAS and a second DWH with BI, Data Mining, OLAP… added. Backup / Restore: 3 days late. Latency label: 5-7 days]
Why is this data so old?
Advanced Architecture – ELT
[Diagram unchanged from the previous slide. Latency labels: 100ms, 300ms, 1 day, 3-4 days. WAL Replication: 3-5 minutes late. Backup / Restore: 3 days late. Second DWH label: 5-7 days]
[Inset, ETL: DBMS DBMS DBMS… → ETL → DDS → Data Marts, Reports, Aggregates, OLAP]
[Inset, ELT: DBMS DBMS DBMS… → ODS ODS ODS… → ELT → DDS → Data Marts, Reports, Aggregates, OLAP]
[Diagram, slide 45: the ETL labels are replaced by ELT]
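The ELT flow named on these slides (land raw data in an ODS, then transform it into a DDS and data marts inside the warehouse) can be sketched roughly as below. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the DWH, and the table names `ods_orders`, `dds_orders`, `mart_sales_by_region` and the sample rows are hypothetical, not taken from the deck.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a source DBMS (all strings).
RAW_ORDERS = [
    ("1", "north", "10.50"),
    ("2", "south", "4.00"),
    ("3", "north", "2.25"),
]


def run_elt(rows):
    """Load raw rows into an ODS table, then transform inside the database."""
    dwh = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the DWH
    # E + L: land the data as-is in the operational data store (ODS).
    dwh.execute("CREATE TABLE ods_orders (id TEXT, region TEXT, amount TEXT)")
    dwh.executemany("INSERT INTO ods_orders VALUES (?, ?, ?)", rows)
    # T: build the typed detailed data store (DDS) with SQL in the DWH itself.
    dwh.execute(
        "CREATE TABLE dds_orders AS "
        "SELECT CAST(id AS INTEGER) AS id, UPPER(region) AS region, "
        "CAST(amount AS REAL) AS amount FROM ods_orders"
    )
    # T: derive a data mart aggregate from the DDS.
    dwh.execute(
        "CREATE TABLE mart_sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM dds_orders GROUP BY region"
    )
    result = dwh.execute(
        "SELECT region, total FROM mart_sales_by_region ORDER BY region"
    ).fetchall()
    dwh.close()
    return result


if __name__ == "__main__":
    print(run_elt(RAW_ORDERS))
```

In an ETL variant the casting and aggregation would run in a separate tool before anything is loaded; here the warehouse engine does that work on data it already holds.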
Advanced Architecture – CDC
[Diagram unchanged from the previous slide]
[Inset: the ELT flow (DBMS DBMS DBMS…, ODS ODS ODS…, ELT, DDS, Data Marts, Reports, Aggregates, OLAP) shown twice, the second time with CDC. Labels: 1 day, 1 hour]
[Diagram, slides 47-48: CDC added alongside ELT. Latency labels: 100ms, 300ms, 3-24 hrs, 1-4 days. WAL Replication: 3-5 minutes late. Backup / Restore: 3 days late. Second DWH label: 4-7 days]
Why is our secondary site’s DWH so old?
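As a rough sketch of what CDC means in practice: instead of re-extracting whole tables, only the changes recorded in the source log are applied to the target. The `Change` record, its field names, and the use of a plain dictionary as the replica are assumptions made for illustration; real CDC tools read the database's own log format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Change:
    """One hypothetical change-log record captured from the source DBMS."""
    lsn: int                    # log sequence number, strictly increasing
    op: str                     # "insert", "update" or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row image; None for deletes


def apply_changes(replica: Dict[int, dict], log: List[Change], last_lsn: int) -> int:
    """Apply only changes newer than last_lsn; return the new high-water mark."""
    for change in sorted(log, key=lambda c: c.lsn):
        if change.lsn <= last_lsn:
            continue  # already applied in an earlier batch
        if change.op == "delete":
            replica.pop(change.key, None)
        else:  # insert and update both write the new row image
            replica[change.key] = dict(change.row or {})
        last_lsn = change.lsn
    return last_lsn


if __name__ == "__main__":
    ods = {}
    log = [
        Change(1, "insert", 10, {"name": "a"}),
        Change(2, "update", 10, {"name": "b"}),
        Change(3, "insert", 11, {"name": "c"}),
        Change(4, "delete", 11),
    ]
    mark = apply_changes(ods, log[:2], last_lsn=0)  # first small batch
    mark = apply_changes(ods, log, last_lsn=mark)   # later batch, overlap skipped
    print(ods, mark)
```

Because each batch carries only what changed since the last high-water mark, batches can be small and frequent, which is consistent with the shorter load interval shown on the slide.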
Moving Forward
[Diagram unchanged from the previous slide]
Our problems are
• Time to action takes up to 7 days
• Amount of data is growing
• DWH MPP storage is expensive
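The "up to 7 days" figure appears to follow from adding the delays labeled on the diagram. The short calculation below assumes the stages simply add up, which the deck does not state explicitly: reports on the primary site lag by 1-4 days, and the second DWH receives its data through a backup / restore that is 3 days late.

```python
def add_ranges(a, b):
    """Add two (low, high) latency ranges expressed in days."""
    return (a[0] + b[0], a[1] + b[1])


primary_reports = (1, 4)  # "1-4 days" label on the primary site
backup_restore = (3, 3)   # "Backup / Restore 3 days late"

# Assumption: the secondary-site delay is the sum of the two stages.
secondary_reports = add_ranges(primary_reports, backup_restore)
print(secondary_reports)  # (4, 7), matching the "4-7 days" label
```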
Modern Architectures
[Diagram and problem list unchanged from the previous slide]
Data Lake
Lambda
Modern Architectures – Data Lake
[Diagram from the previous slide with Hadoop added]
[Inset: DBMS DBMS DBMS…, CDC, ODS ODS ODS…, ELT, DDS, Data Marts, Aggregates, Reports, OLAP, DWH; ODS, UDS, Analytical Archives; BI, Data Mining, OLAP; SQL-on-Hadoop; Data Mining At Scale]
[Diagram, slide 58, simplified to three FE / BE pairs and three DBMS instances per site: ELT, CDC, DWH, Hadoop; OLAP, Data Mining, BI…. Latency labels: 100ms, 300ms, 3-24 hrs, 1-4 days. WAL Replication: 3-5 minutes late. Backup / Restore: 2 days late. Second site: Hadoop marked with "?"; Data Mining, BI, OLAP… 3-6 days]
Modern Architectures – Lambda
[Diagram unchanged from the previous slide]
[Inset, Lambda architecture: Source Data; Speed Layer; Batch Layer; Serving Layer; Master Dataset; three Batch Views; three Real-time Views; two Queries]
[Diagram, slide 61: two In-Memory Data Store boxes added; RTDM added next to BI and Data Mining. Latency labels: 100ms, 300ms, 0-24 hrs, 0-4 days. WAL Replication: 3-5 minutes late. Backup / Restore: 2 days late. Second site: OLAP… 3-6 days]
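The Lambda inset names a batch layer, a speed layer, and a serving layer that answers queries. A minimal sketch of how those three pieces fit together is shown below, assuming a page-view counting workload; the event shape and the function names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_batch_view(master_dataset):
    """Batch layer: recompute the view from the whole master dataset."""
    return Counter(event["page"] for event in master_dataset)


def update_realtime_view(view, event):
    """Speed layer: fold one new event into the incremental view."""
    view[event["page"]] += 1


def query(batch_view, realtime_view, page):
    """Serving layer: merge both views to answer a query."""
    return batch_view[page] + realtime_view[page]


if __name__ == "__main__":
    # Events already stored in the master dataset and covered by the last batch run.
    master = [{"page": "home"}, {"page": "home"}, {"page": "cart"}]
    batch_view = build_batch_view(master)

    # Events that arrived after the batch run; only the speed layer has seen them.
    realtime_view = Counter()
    for event in [{"page": "home"}, {"page": "pay"}]:
        update_realtime_view(realtime_view, event)

    print(query(batch_view, realtime_view, "home"))
```

When the next batch run absorbs the recent events into the master dataset, the real-time view for that period can be discarded, so errors in the speed layer are eventually corrected by recomputation.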
Modern Architectures
[Diagram unchanged from the previous slide]
Our problems are
• Too many standby systems
• How to replicate Hadoop cluster?
• How to sync data in real-time systems?
• How to better sync DWH?
Pipelining
Modern Data Architecture – Pipelining
[Diagram unchanged from the previous slide; a pipeline inset is built up over slides 69-85, with labels grouped below by the component they appear next to]
FE: App, App, App, … HTTP
BE: Srv, Srv, Srv, … SOAP
OLTP: SP, JDBC, Table, Log
CDC: copy, Parse, Batch
ETL: cp, Batch, ETL
DWH: load, ODS, DDS, DataMart
BI: JDBC
[Slide 79: the SOAP label is dropped]
[Slides 80-82: API, Queue, ETL, Batch, App, load added]
[Slide 83: STG, Batch, App, Hadoop, HDFS, SQL on Hadoop added]
[Slides 84-85: RTI, App, Replicate added]
[Final diagram, slide 87: FE / BE pairs; DBMS instances with WAL Replication 3-5 minutes late; ELT, CDC, DWH, Hadoop and In-Memory Data Store, with Replication Queue 3-5 minutes late; OLAP, Data Mining, BI, RTBI]
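The queue-based step in the pipeline (API, Queue, ETL, Batch, load) can be illustrated with a small producer and consumer. This is a sketch under assumptions: Python's in-process `queue.Queue` stands in for a real message queue, a list stands in for the load target, and the event fields `user` and `amount` are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel telling the consumer that no more events will come


def producer(events, q):
    """Application side: publish each event to the queue."""
    for event in events:
        q.put(event)
    q.put(STOP)


def etl_consumer(q, batch_size, sink):
    """ETL side: read events, transform them, and load them in small batches."""
    batch = []
    while True:
        event = q.get()
        if event is STOP:
            break
        batch.append({"user": event["user"].lower(), "amount": float(event["amount"])})
        if len(batch) == batch_size:
            sink.append(batch)  # stand-in for a bulk load into the target store
            batch = []
    if batch:
        sink.append(batch)      # flush the final partial batch


def run_pipeline(events, batch_size=2):
    """Wire producer and consumer together and return the loaded batches."""
    q = queue.Queue()
    loaded = []
    worker = threading.Thread(target=etl_consumer, args=(q, batch_size, loaded))
    worker.start()
    producer(events, q)
    worker.join()
    return loaded


if __name__ == "__main__":
    sample = [
        {"user": "Ann", "amount": "1.5"},
        {"user": "BOB", "amount": "2"},
        {"user": "Cy", "amount": "3"},
    ]
    print(run_pipeline(sample))
```

Because the same queue can feed more than one consumer, the producer does not need to know which downstream stores exist, which is one way to keep several systems in step without point-to-point copies.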
Pivotal and Modern Data Architecture
[Diagram, repeated on slides 88-98: BI; HTTP; Pivotal Cloud Foundry with FE (App, App, App, …), Queue, BE (App, App, App, …); Pivotal GemFire, App; Spring XD, Streaming; Streaming Data; Pivotal HD; Pivotal HAWQ; Data Mart; Pivotal Greenplum; ES, ODS, DDS, DataMart; PostgreSQL, SP, Table; ETL, ETL]
• Pivotal Labs – agile software development for next-generation applications
• Pivotal Cloud Foundry – PaaS for customer applications
• RabbitMQ – distributed message queue service on top of PCF
• Spring IO – foundation platform for modern applications
• Pivotal GemFire and Apache Geode (incubating) – in-memory data grid enabling real-time data processing and real-time decision making for enterprises
• Spring XD – unified, distributed and extensible framework for data pipelining: ingesting, batching, processing and exporting
• Pivotal HD – leading Hadoop distribution based on ODP
• Pivotal HAWQ and Apache HAWQ (incubating) – bringing the power of MPP to the Hadoop cluster, best-in-class SQL-on-Hadoop solution
• Apache Spark – component of the Pivotal HD distribution, modern framework for distributed data processing
• Pivotal PostgreSQL – open source distribution of PostgreSQL, commercially supported by Pivotal
• Pivotal Greenplum – leading analytical MPP database, foundation for enterprise data warehousing systems and advanced analytics
[Slides 95-97 label the diagram, in turn: Data Lake, Lambda Architecture, Pipelining]
Questions?
BUILT FOR THE SPEED OF BUSINESS

Modern Data Architecture

  • 1. 1Pivotal Confidential–Internal Use Only 1Pivotal Confidential–Internal Use Only Modern Data Architecture Alexey Grishchenko
  • 2. 2Pivotal Confidential–Internal Use Only About me Enterprise Architect @ Pivotal  7 years in data processing  5 years with MPP  4 years with Hadoop  Spark contributor  http://0x0fff.com
  • 3. 3Pivotal Confidential–Internal Use Only How it started… Front End
  • 4. 4Pivotal Confidential–Internal Use Only How it started… Front End Back End
  • 5. 5Pivotal Confidential–Internal Use Only How it started… Front End Back End DBMS
  • 6. 6Pivotal Confidential–Internal Use Only How it started… Front End Back End DBMS What about BI?
  • 7. 7Pivotal Confidential–Internal Use Only How it started… Front End Back End DBMS Just put it there!
  • 8. 8Pivotal Confidential–Internal Use Only How it started… Front End Back End DBMS BI
  • 9. 9Pivotal Confidential–Internal Use Only How it started… Front End Back End DBMS BI Was it fast?
  • 10. 10Pivotal Confidential–Internal Use Only How it started… Front End 10ms Back End DBMS BI 100ms 200ms 1-2 min
  • 11. 11Pivotal Confidential–Internal Use Only How it started… Front End 10ms Back End DBMS BI 100ms 200ms 1-2 min yes, single server…
  • 12. 12Pivotal Confidential–Internal Use Only First Issues Front End 10ms Back End DBMS BI 100ms 200ms 1-2 min More users got workstations
  • 13. 13Pivotal Confidential–Internal Use Only First Issues Front End 10ms Back End DBMS BI 400ms 800ms 1-2 min
  • 14. 14Pivotal Confidential–Internal Use Only First Issues Front End 10ms Back End DBMS BI 400ms 800ms 1-2 min Split!
  • 15. 15Pivotal Confidential–Internal Use Only First Issues Front End 10ms Back End DBMS BI 300ms 600ms 1-2 min
  • 16. 16Pivotal Confidential–Internal Use Only First Issues Front End 10ms Back End DBMS BI 300ms 600ms 1-2 min Even more users?
  • 17. 17Pivotal Confidential–Internal Use Only First Issues Front End 10ms Back End DBMS BI 300ms 600ms 1-2 min Split!
  • 18. 18Pivotal Confidential–Internal Use Only First Issues Front End 10ms Back End DBMS BI 100ms 400ms 1-2 min Front End Back End Front End Back End
  • 19. 19Pivotal Confidential–Internal Use Only First Issues Front End 10ms Back End DBMS BI 100ms 400ms 1-2 min Front End Back End Front End Back End What about automated systems?
  • 20. 20Pivotal Confidential–Internal Use Only First Issues Front End 10ms Back End DBMS BI 100ms 1 sec 5-10 min Front End Back End Front End Back End Front End Back End Front End Back End
  • 21. 21Pivotal Confidential–Internal Use Only First Issues Front End 10ms Back End DBMS BI 100ms 1 sec 5-10 min Front End Back End Front End Back End Front End Back End Front End Back End Database, please, live!
  • 22. 22Pivotal Confidential–Internal Use Only First Issues Front End 10ms Back End DBMS BI 100ms 1 sec 5-10 min Front End Back End Front End Back End Front End Back End Front End Back End
  • 23. 23Pivotal Confidential–Internal Use Only First Issues Front End 10ms Back End DBMS BI 100ms 800ms 15-20 min Front End Back End Front End Back End Front End Back End Front End Back End
  • 24. 24Pivotal Confidential–Internal Use Only First Issues Front End 10ms Back End DBMS BI 100ms 800ms 15-20 min Front End Back End Front End Back End Front End Back End Front End Back End What if “split” didn’t help this time?
  • 25. 25Pivotal Confidential–Internal Use Only First Issues Front End 10ms Back End DBMS BI 100ms 800ms 15-20 min Front End Back End Front End Back End Front End Back End Front End Back End Split more! Eventually it will help…
  • 26. 26Pivotal Confidential–Internal Use Only First Issues Front End 10ms Back End DBMS BI 100ms 300ms 35-40 min Front End Back End Front End Back End Front End Back End Front End Back End DBMS DBMSDBMSDBMS
  • 27. 27Pivotal Confidential–Internal Use Only First Issues Front End 10ms Back End DBMS BI 100ms 300ms 35-40 min Front End Back End Front End Back End Front End Back End Front End Back End DBMS DBMSDBMSDBMS
  • 28. 28Pivotal Confidential–Internal Use Only First Issues Front End 10ms Back End DBMS BI 100ms 300ms 35-40 min Front End Back End Front End Back End Front End Back End Front End Back End DBMS DBMSDBMSDBMS Sales went 10% up!
  • 29. 29Pivotal Confidential–Internal Use Only First Issues Front End 10ms Back End DBMS BI 100ms 300ms 35-40 min Front End Back End Front End Back End Front End Back End Front End Back End DBMS DBMSDBMSDBMS Sales went 10% up! Sales went 20% down!
  • 30. 30Pivotal Confidential–Internal Use Only First Issues Front End 10ms Back End DBMS BI 100ms 600ms 2-3 hrs Front End Back End Front End Back End Front End Back End Front End Back End DBMS DBMSDBMSDBMS Sales went 10% up! Sales went 20% down!
  • 31. 31Pivotal Confidential–Internal Use Only First Issues Front End 10ms Back End DBMS BI 100ms 600ms 2-3 hrs Front End Back End Front End Back End Front End Back End Front End Back End DBMS DBMSDBMSDBMS Sales went 10% up! Sales went 20% down! Stop loading my system with your stupid reports!
  • 32. 32Pivotal Confidential–Internal Use Only BI The Era of Data Warehouse 100ms DBMS 300ms 2 days FE BE DBMS DBMSDBMSDBMS FE BE FE BE FE BE FE BE ETL DWH 1 day
  • 33. 33Pivotal Confidential–Internal Use Only BI The Era of Data Warehouse 100ms DBMS 300ms 2 days FE BE DBMS DBMSDBMSDBMS FE BE FE BE FE BE FE BE ETL DWH 1 day We need more reports!
  • 34. 34Pivotal Confidential–Internal Use Only BI The Era of Data Warehouse 100ms DBMS 300ms 3-4 days FE BE DBMS DBMSDBMSDBMS FE BE FE BE FE BE FE BE ETL DWH 1 day Data Mining OLAP…
  • 35. 35Pivotal Confidential–Internal Use Only BI The Era of Data Warehouse 100ms DBMS 300ms 3-4 days FE BE DBMS DBMSDBMSDBMS FE BE FE BE FE BE FE BE ETL DWH 1 day Data Mining OLAP… We need secondary site!
  • 36. 36Pivotal Confidential–Internal Use Only The Era of Data Warehouse 100ms 300ms 3-4 days FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE ETL DWH 1 day BI Data Mining OLAP…
  • 37. 37Pivotal Confidential–Internal Use Only The Era of Data Warehouse 100ms 300ms 3-4 days FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE ETL DWH 1 day BI Data Mining OLAP… FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE WAL Replication 3-5 minutes late
  • 38. 38Pivotal Confidential–Internal Use Only The Era of Data Warehouse 100ms 300ms 3-4 days FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE ETL DWH 1 day BI Data Mining OLAP… FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE WAL Replication 3-5 minutes late
  • 39. 39Pivotal Confidential–Internal Use Only The Era of Data Warehouse 100ms 300ms 3-4 days FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE ETL DWH 1 day BI Data Mining OLAP… FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE WAL Replication 3-5 minutes late Where is our DWH? We need this data now!
  • 40. 40Pivotal Confidential–Internal Use Only The Era of Data Warehouse 100ms 300ms 3-4 days FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE ETL DWH 1 day BI Data Mining OLAP… FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE WAL Replication 3-5 minutes late
  • 41. 41Pivotal Confidential–Internal Use Only ETL The Era of Data Warehouse 100ms 300ms 3-4 days FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE ETL DWH 1 day BI Data Mining OLAP… FE BE FE BE FE BE FE BE FE BE WAL Replication 3-5 minutes late NAS NAS Backup / Restore 3 days late DWH BI Data Mining OLAP… 5-7 days DBMS DBMS DBMS DBMS DBMS
  • 42. 42Pivotal Confidential–Internal Use Only ETL The Era of Data Warehouse 100ms 300ms 3-4 days FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE ETL DWH 1 day BI Data Mining OLAP… FE BE FE BE FE BE FE BE FE BE WAL Replication 3-5 minutes late NAS NAS Backup / Restore 3 days late DWH BI Data Mining OLAP… 5-7 days DBMS DBMS DBMS DBMS DBMS Why is this data so old?
  • 43. 43Pivotal Confidential–Internal Use Only ETL The Era of Data Warehouse 100ms 300ms 3-4 days FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE ETL DWH 1 day BI Data Mining OLAP… FE BE FE BE FE BE FE BE FE BE WAL Replication 3-5 minutes late NAS NAS Backup / Restore 3 days late DWH BI Data Mining OLAP… 5-7 days DBMS DBMS DBMS DBMS DBMS
  • 44. 44Pivotal Confidential–Internal Use Only ETL Advanced Architecture – ELT 100ms 300ms 3-4 days FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE ETL DWH 1 day BI Data Mining OLAP… FE BE FE BE FE BE FE BE FE BE WAL Replication 3-5 minutes late NAS NAS Backup / Restore 3 days late DWH BI Data Mining OLAP… 5-7 days DBMS DBMS DBMS DBMS DBMS DBMS DBMS DBMS… ETL DDS Data Marts Reports Aggregates OLAP DBMS DBMS DBMS… ELT DDS Data Marts Reports Aggregates OLAP ODS ODS ODS…
  • 45. 45Pivotal Confidential–Internal Use Only ELT Advanced Architecture – ELT 100ms 300ms 3-4 days FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE ELT DWH 1 day BI Data Mining OLAP… FE BE FE BE FE BE FE BE FE BE WAL Replication 3-5 minutes late NAS NAS Backup / Restore 3 days late DWH BI Data Mining OLAP… 5-7 days DBMS DBMS DBMS DBMS DBMS
  • 46. 46Pivotal Confidential–Internal Use Only ELT Advanced Architecture – CDC 100ms 300ms 3-4 days FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE ELT DWH 1 day BI Data Mining OLAP… FE BE FE BE FE BE FE BE FE BE WAL Replication 3-5 minutes late NAS NAS Backup / Restore 3 days late DWH BI Data Mining OLAP… 5-7 days DBMS DBMS DBMS DBMS DBMS DBMS DBMS DBMS… ELT DDS Data Marts Reports Aggregates OLAP ODS ODS ODS… DBMS DBMS DBMS… ELT DDS Data Marts Reports Aggregates OLAP ODS ODS ODS… CDC 1 day 1 hour
  • 47. 47Pivotal Confidential–Internal Use Only ELT CDC Advanced Architecture – CDC 100ms 300ms 1-4 days FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE ELT DWH 3-24 hrs BI Data Mining OLAP… FE BE FE BE FE BE FE BE FE BE WAL Replication 3-5 minutes late NAS NAS Backup / Restore 3 days late BI Data Mining OLAP… 4-7 days DBMS DBMS DBMS DBMS DBMS CDC DWH
  • 48. 48Pivotal Confidential–Internal Use Only ELT CDC Advanced Architecture – CDC 100ms 300ms 1-4 days FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE ELT DWH 3-24 hrs BI Data Mining OLAP… FE BE FE BE FE BE FE BE FE BE WAL Replication 3-5 minutes late NAS NAS Backup / Restore 3 days late BI Data Mining OLAP… 4-7 days DBMS DBMS DBMS DBMS DBMS CDC DWH Why is our secondary site’s DWH so old?
  • 49. 49Pivotal Confidential–Internal Use Only ELT CDC 100ms 300ms 1-4 days FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE ELT DWH 3-24 hrs BI Data Mining OLAP… FE BE FE BE FE BE FE BE FE BE WAL Replication 3-5 minutes late NAS NAS Backup / Restore 3 days late BI Data Mining OLAP… 4-7 days DBMS DBMS DBMS DBMS DBMS CDC DWH Moving Forward
  • 50. 50Pivotal Confidential–Internal Use Only ELT CDC 100ms 300ms 1-4 days FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE ELT DWH 3-24 hrs BI Data Mining OLAP… FE BE FE BE FE BE FE BE FE BE WAL Replication 3-5 minutes late NAS NAS Backup / Restore 3 days late BI Data Mining OLAP… 4-7 days DBMS DBMS DBMS DBMS DBMS CDC DWH Our problems are Moving Forward
  • 51. 51Pivotal Confidential–Internal Use Only ELT CDC 100ms 300ms 1-4 days FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE ELT DWH 3-24 hrs BI Data Mining OLAP… FE BE FE BE FE BE FE BE FE BE WAL Replication 3-5 minutes late NAS NAS Backup / Restore 3 days late BI Data Mining OLAP… 4-7 days DBMS DBMS DBMS DBMS DBMS CDC DWH Our problems are  Time to action takes up to 7 days Moving Forward
  • 52. 52Pivotal Confidential–Internal Use Only ELT CDC 100ms 300ms 1-4 days FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE ELT DWH 3-24 hrs BI Data Mining OLAP… FE BE FE BE FE BE FE BE FE BE WAL Replication 3-5 minutes late NAS NAS Backup / Restore 3 days late BI Data Mining OLAP… 4-7 days DBMS DBMS DBMS DBMS DBMS CDC DWH Our problems are  Time to action takes up to 7 days  Amount of data is growing Moving Forward
  • 53. 53Pivotal Confidential–Internal Use Only ELT CDC 100ms 300ms 1-4 days FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE ELT DWH 3-24 hrs BI Data Mining OLAP… FE BE FE BE FE BE FE BE FE BE WAL Replication 3-5 minutes late NAS NAS Backup / Restore 3 days late BI Data Mining OLAP… 4-7 days DBMS DBMS DBMS DBMS DBMS CDC DWH Our problems are  Time to action takes up to 7 days  Amount of data is growing  DWH MPP storage is expensive Moving Forward
  • 54. 54Pivotal Confidential–Internal Use Only ELT CDC Modern Architectures 100ms 300ms 1-4 days FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE ELT DWH 3-24 hrs BI Data Mining OLAP… FE BE FE BE FE BE FE BE FE BE WAL Replication 3-5 minutes late NAS NAS Backup / Restore 3 days late BI Data Mining OLAP… 4-7 days DBMS DBMS DBMS DBMS DBMS CDC DWH Our problems are  Time to action takes up to 7 days  Amount of data is growing  DWH MPP storage is expensive Data Lake
  • 55. 55Pivotal Confidential–Internal Use Only ELT CDC Modern Architectures 100ms 300ms 1-4 days FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE ELT DWH 3-24 hrs BI Data Mining OLAP… FE BE FE BE FE BE FE BE FE BE WAL Replication 3-5 minutes late NAS NAS Backup / Restore 3 days late BI Data Mining OLAP… 4-7 days DBMS DBMS DBMS DBMS DBMS CDC DWH Our problems are  Time to action takes up to 7 days  Amount of data is growing  DWH MPP storage is expensive Lambda Data Lake
  • 56. 56Pivotal Confidential–Internal Use Only ELT CDC Modern Architectures – Data Lake 100ms 300ms 1-4 days FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE ELT DWH 3-24 hrs BI Data Mining OLAP… FE BE FE BE FE BE FE BE FE BE WAL Replication 3-5 minutes late NAS NAS Backup / Restore 3 days late BI Data Mining OLAP… 4-7 days DBMS DBMS DBMS DBMS DBMS CDC DWH Hadoop DBMS DBMS DBMS… ELT DDS OLAP Data Marts Aggregates Reports ODS ODS ODS… CDC DWH ODS UDS Analytical Archives BI Data Mining OLAP SQL-on-Hadoop Data Mining At Scale
  • 57. 57Pivotal Confidential–Internal Use Only ELT CDC Modern Architectures – Data Lake 100ms 300ms 1-4 days FE BE DBMS DBMS FE BE DBMS FE BE DBMS FE BE DBMS FE BE ELT DWH 3-24 hrs BI Data Mining OLAP… FE BE FE BE FE BE FE BE FE BE WAL Replication 3-5 minutes late NAS NAS Backup / Restore 3 days late BI Data Mining OLAP… 4-7 days DBMS DBMS DBMS DBMS DBMS CDC DWH
  • 58. 58Pivotal Confidential–Internal Use Only ELT CDC Modern Architectures – Data Lake 100ms 300ms 1-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 3-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late Data Mining BI OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ?
  • 59. 59Pivotal Confidential–Internal Use Only ELT CDC Modern Architectures – Lambda 100ms 300ms 1-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 3-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late Data Mining BI OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ? Source Data Speed Layer Batch Layer Serving Layer Query Query Master Dataset Batch View Batch View Batch View Real-time View Real-time View Real-time View
  • 60. 60Pivotal Confidential–Internal Use Only ELT CDC Modern Architectures – Lambda 100ms 300ms 1-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 3-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late Data Mining BI OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ?
  • 61. 61Pivotal Confidential–Internal Use Only In-Memory Data Store ELT CDC Modern Architectures – Lambda 100ms 300ms 0-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 0-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ? In-Memory Data Store RTDM BI Data Mining
  • 62. 62Pivotal Confidential–Internal Use Only In-Memory Data Store ELT CDC Modern Architectures 100ms 300ms 0-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 0-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ? In-Memory Data Store RTDM BI Data Mining Our problems are
  • 63. 63Pivotal Confidential–Internal Use Only In-Memory Data Store ELT CDC Modern Architectures 100ms 300ms 0-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 0-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ? In-Memory Data Store RTDM BI Data Mining Our problems are  Too many standby systems
  • 64. 64Pivotal Confidential–Internal Use Only In-Memory Data Store ELT CDC Modern Architectures 100ms 300ms 0-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 0-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ? In-Memory Data Store RTDM BI Data Mining Our problems are  Too many standby systems  How to replicate Hadoop cluster?
  • 65. 65Pivotal Confidential–Internal Use Only In-Memory Data Store ELT CDC Modern Architectures 100ms 300ms 0-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 0-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ? In-Memory Data Store RTDM BI Data Mining Our problems are  Too many standby systems  How to replicate Hadoop cluster?  How to sync data in real-time systems?
  • 66. 66Pivotal Confidential–Internal Use Only In-Memory Data Store ELT CDC Modern Architectures 100ms 300ms 0-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 0-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ? In-Memory Data Store RTDM BI Data Mining Our problems are  Too many standby systems  How to replicate Hadoop cluster?  How to sync data in real-time systems?  How to better sync DWH?
  • 67. 67Pivotal Confidential–Internal Use Only In-Memory Data Store ELT CDC Modern Architectures 100ms 300ms 0-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 0-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ? In-Memory Data Store RTDM BI Data Mining Our problems are  Too many standby systems  How to replicate Hadoop cluster?  How to sync data in real-time systems?  How to better sync DWH? Pipelining
  • 68. 68Pivotal Confidential–Internal Use Only In-Memory Data Store ELT CDC 100ms 300ms 0-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 0-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ? In-Memory Data Store RTDM BI Data Mining Modern Data Architecture – Pipelining
  • 69. 69Pivotal Confidential–Internal Use Only In-Memory Data Store ELT CDC 100ms 300ms 0-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 0-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ? In-Memory Data Store RTDM BI Data Mining Modern Data Architecture – Pipelining FE App App App …HTTP
  • 70. 70Pivotal Confidential–Internal Use Only In-Memory Data Store ELT CDC 100ms 300ms 0-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 0-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ? In-Memory Data Store RTDM BI Data Mining Modern Data Architecture – Pipelining FE App App App …HTTP BE Srv Srv Srv …SOAP
  • 71. 71Pivotal Confidential–Internal Use Only In-Memory Data Store ELT CDC 100ms 300ms 0-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 0-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ? In-Memory Data Store RTDM BI Data Mining Modern Data Architecture – Pipelining FE App App App …HTTP BE Srv Srv Srv …SOAP OLTP SP JDBC Table
  • 72. 72Pivotal Confidential–Internal Use Only In-Memory Data Store ELT CDC 100ms 300ms 0-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 0-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ? In-Memory Data Store RTDM BI Data Mining Modern Data Architecture – Pipelining FE App App App …HTTP BE Srv Srv Srv …SOAP OLTP SP JDBC Log Table
  • 73. 73Pivotal Confidential–Internal Use Only In-Memory Data Store ELT CDC 100ms 300ms 0-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 0-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ? In-Memory Data Store RTDM BI Data Mining Modern Data Architecture – Pipelining FE App App App …HTTP BE Srv Srv Srv …SOAP OLTP SP JDBC Log Table CDC copy Parse Batch
  • 74. 74Pivotal Confidential–Internal Use Only In-Memory Data Store ELT CDC 100ms 300ms 0-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 0-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ? In-Memory Data Store RTDM BI Data Mining Modern Data Architecture – Pipelining FE App App App …HTTP BE Srv Srv Srv …SOAP OLTP SP JDBC Log Table CDC copy Parse Batch ETL cp Batch ETL
  • 75. 75Pivotal Confidential–Internal Use Only In-Memory Data Store ELT CDC 100ms 300ms 0-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 0-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ? In-Memory Data Store RTDM BI Data Mining Modern Data Architecture – Pipelining FE App App App …HTTP BE Srv Srv Srv …SOAP OLTP SP JDBC Log Table CDC copy Parse Batch ETL cp Batch ETL load ODS DWH
  • 76. 76Pivotal Confidential–Internal Use Only In-Memory Data Store ELT CDC 100ms 300ms 0-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 0-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ? In-Memory Data Store RTDM BI Data Mining Modern Data Architecture – Pipelining FE App App App …HTTP BE Srv Srv Srv …SOAP OLTP SP JDBC Log Table CDC copy Parse Batch ETL cp Batch ETL load ODS DDS DWH
  • 77. 77Pivotal Confidential–Internal Use Only In-Memory Data Store ELT CDC 100ms 300ms 0-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 0-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ? In-Memory Data Store RTDM BI Data Mining Modern Data Architecture – Pipelining FE App App App …HTTP BE Srv Srv Srv …SOAP OLTP SP JDBC Log Table CDC copy Parse Batch ETL cp Batch ETL load ODS DDS DataMart DWH
  • 78. 78Pivotal Confidential–Internal Use Only In-Memory Data Store ELT CDC 100ms 300ms 0-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 0-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ? In-Memory Data Store RTDM BI Data Mining Modern Data Architecture – Pipelining FE BI App App App …HTTP BE Srv Srv Srv …SOAP OLTP SP JDBC Log Table CDC copy Parse Batch ETL cp Batch ETL load ODS DDS DataMart DWH JDBC
  • 79. 79Pivotal Confidential–Internal Use Only In-Memory Data Store ELT CDC 100ms 300ms 0-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 0-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ? In-Memory Data Store RTDM BI Data Mining Modern Data Architecture – Pipelining FE BI App App App …HTTP BE Srv Srv Srv … OLTP SP JDBC Log Table CDC copy Parse Batch ETL cp Batch ETL ODS DDS DataMart DWH JDBC
  • 80. 80Pivotal Confidential–Internal Use Only In-Memory Data Store ELT CDC 100ms 300ms 0-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 0-24 hrs OLAP Data Mining BI… FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late OLAP… 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWHHadoop Hadoop ? In-Memory Data Store RTDM BI Data Mining Modern Data Architecture – Pipelining FE BI App App App …HTTP BE Srv Srv Srv … OLTP SP JDBC Log Table CDC copy Parse Batch load ODS DDS DataMart DWH JDBC API Queue ETL ETLBatch
  • 81. Modern Data Architecture – Pipelining [diagram labels: FE BI App App App … HTTP BE Srv Srv Srv … OLTP SP JDBC Log Table CDC copy Parse Batch load ODS DDS DataMart DWH JDBC API Queue ETL ETL Batch load ETL]
  • 82. Modern Data Architecture – Pipelining [diagram labels: FE BI App App App … HTTP BE Srv Srv Srv … OLTP SP JDBC Log Table CDC copy Parse Batch load ODS DDS DataMart DWH JDBC API Queue ETL ETL Batch App ETL Batch load load ETL]
  • 83. Modern Data Architecture – Pipelining [diagram labels: FE BI App App App … HTTP BE Srv Srv Srv … OLTP SP JDBC Log Table CDC copy Parse Batch load ODS DDS DataMart DWH JDBC API Queue ETL ETL Batch App ETL Batch load load ETL STG Batch App Hadoop HDFS SQL On Hadoop]
  • 84. Modern Data Architecture – Pipelining [diagram labels: FE BI App App App … HTTP BE Srv Srv Srv … OLTP SP JDBC Log Table CDC copy Parse Batch load ODS DDS DataMart DWH JDBC API Queue ETL ETL Batch App ETL Batch load load ETL STG Batch App Hadoop HDFS SQL On Hadoop RTI App]
  • 85. Modern Data Architecture – Pipelining [diagram labels: FE BI App App App … HTTP BE Srv Srv Srv … OLTP SP JDBC Log Table CDC copy Parse Batch load ODS DDS DataMart DWH JDBC API Queue ETL ETL Batch App ETL Batch load load ETL STG Batch App Hadoop HDFS SQL On Hadoop RTI App Replicate]
  • 86. Modern Data Architecture – Pipelining [diagram labels: In-Memory Data Store ELT CDC 100ms 300ms 0-4 days FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH 0-24 hrs OLAP Data Mining BI … FE BE FE BE FE BE NAS NAS Backup / Restore 2 days late OLAP … 3-6 days DBMS DBMS DBMS WAL Replication 3-5 minutes late CDC DWH Hadoop Hadoop ? In-Memory Data Store RTDM BI Data Mining]
  • 87. Modern Data Architecture – Pipelining [diagram labels: ELT CDC FE BE DBMS DBMS FE BE DBMS FE BE ELT DWH OLAP Data Mining RTBI … FE BE FE BE FE BE CDC Hadoop In-Memory Data Store BI Replication Queue 3-5 minutes late In-Memory Data Store OLAP … DWH Hadoop BI Data Mining RTBI DBMS DBMS DBMS WAL Replication 3-5 minutes late]
  • 88. Pivotal and Modern Data Architecture [diagram labels: BI Pivotal Cloud Foundry HTTP FE … App App App Queue BE … App App App Pivotal GemFire App Spring XD Streaming Streaming Data Pivotal HD Pivotal HAWQ ES DDS DataMart Pivotal Greenplum Data Mart PostgreSQL SP Table ODS ETL ETL]
  • 89. Pivotal and Modern Data Architecture [same diagram labels as slide 88]. Pivotal Labs – agile software development for next-generation applications; Pivotal Cloud Foundry – PaaS for customer applications; RabbitMQ – distributed message queue service on top of PCF; Spring IO – foundation platform for modern applications
  • 90. Pivotal and Modern Data Architecture [same diagram labels as slide 88]. Pivotal GemFire and Apache Geode (incubating) – in-memory data grid enabling real-time data processing and real-time decision making for enterprises
  • 91. Pivotal and Modern Data Architecture [same diagram labels as slide 88]. Spring XD – unified, distributed and extensible framework for data pipelining: ingesting, batching, processing and exporting
  • 92. Pivotal and Modern Data Architecture [same diagram labels as slide 88]. Pivotal HD – leading Hadoop distribution based on ODP; Pivotal HAWQ and Apache HAWQ (incubating) – bringing the power of MPP to the Hadoop cluster, best-in-class SQL-on-Hadoop solution; Apache Spark – component of the Pivotal HD distribution, modern framework for distributed data processing
  • 93. Pivotal and Modern Data Architecture [same diagram labels as slide 88]. Pivotal PostgreSQL – open source distribution of PostgreSQL, commercially supported by Pivotal
  • 94. Pivotal and Modern Data Architecture [same diagram labels as slide 88]. Pivotal Greenplum – leading analytical MPP database, foundation for enterprise data warehousing systems and advanced analytics
  • 95. Pivotal and Modern Data Architecture [same diagram labels as slide 88, plus the label "Data Lake"]
  • 96. Pivotal and Modern Data Architecture [same diagram labels as slide 88, plus the label "Lambda Architecture"]
  • 97. Pivotal and Modern Data Architecture [same diagram labels as slide 88, plus the label "Pipelining"]
  • 98. Pivotal and Modern Data Architecture [same diagram labels as slide 88]
  • 99. Questions?
  • 100. BUILT FOR THE SPEED OF BUSINESS