© 2014 VMware Inc. All rights reserved.
Virtualized Big Data Platform
@ VMware Corp IT
Rajit Saha
Hadoop Development Lead
VMware Corp IT Data Solution and Delivery
An Enterprise Data Warehouse meets an Elephant
Business Use Case for Big Data Analytics @ VMware BI Space
• Personalized Marketing & Customer Targeting
• Personalized Campaign Content Strategy
• MyVMware Log Analytics
  - Combine user-level data (logins and other activities) with clickstream data and product data
• VMware Product’s List Price Optimization and Deal Analytics for the VMware Pricing Team (EDW)
  - Complex ETL, bigger joins
  - Flattening star schema tables
  - Propensity modeling
• GSS Service Request Logs Analytics
  - Deeper learning of VMware product issues
  - Build a highly intelligent recommendation system to fix customer issues with faster turnaround time
  - High volume (~400 TB)
  - Wide variety of data
  - Complex parsing
• Clickstream Data Analytics (554 columns, 1.5B rows, 20 TB of data covering 2 years)
  - Path analysis: from a user's first visit to product purchase
  - Propensity modeling
  - Predictive analytics: which product a user will buy
  - Customer lifetime value analysis
BIG DATA: Variety, Volume, Velocity
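The clickstream figures above can be cross-checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes) and 730 days for "2 years"; the rate of 2M records per day comes from the data processing pipeline slide later in the deck. It is a back-of-the-envelope check, not a sizing method used by the team.

```python
# Back-of-the-envelope check of the clickstream dataset figures quoted in the deck.
# Assumptions: decimal units (1 TB = 10**12 bytes) and 2 years = 730 days.

DAILY_RECORDS = 2_000_000      # "Daily 2M Clickstream Records" (pipeline slide)
DAYS = 2 * 365                 # "2yrs" of history, ignoring leap days
TOTAL_ROWS = 1_500_000_000     # "1.5B Rows"
TOTAL_BYTES = 20 * 10**12      # "20TB Data"
COLUMNS = 554                  # "554 columns"


def expected_rows(daily: int, days: int) -> int:
    """Rows accumulated at a constant daily ingest rate."""
    return daily * days


def avg_bytes_per_row(total_bytes: int, rows: int) -> float:
    """Average stored size of one row."""
    return total_bytes / rows


if __name__ == "__main__":
    rows = expected_rows(DAILY_RECORDS, DAYS)
    print(f"expected rows: {rows:,} (slide: {TOTAL_ROWS:,})")
    per_row = avg_bytes_per_row(TOTAL_BYTES, TOTAL_ROWS)
    print(f"avg bytes/row: {per_row:,.0f}; avg bytes/column: {per_row / COLUMNS:,.1f}")
```

Run as a script, it reports 1,460,000,000 expected rows, in line with the quoted 1.5B, and an average of about 13,333 bytes per row (roughly 24 bytes per column).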
Components of Big Data Cluster
• This Big Data cluster is fully virtualized, based on vSphere 6.0 and VMware Big Data Extensions 2.2
• We used EMC Isilon 7.2.0.2 with two patches for HDFS storage
• We used Pivotal Big Data Suite 3.0 for Hadoop 2.6 and HAWQ 1.3
• We used Pivotal Spring XD 1.2 for data ingestion into Hadoop
• We integrated Alpine Data Lab 5.4 for running:
  - Deeper analytic functions
  - Machine learning algorithms
• We integrated HUE 2.6 as a GUI-based Hive/Pig query execution client
VMware Corp IT Big Data Analytic Platform [Production] – Application Architecture Stack
• VMware Big Data Extensions 2.2 provisions the Hadoop VMs
• Application stack: Pivotal HD (PHD 3.0), Spring XD 1.2, RabbitMQ 3.5.3, PostgreSQL 9.4
• Analytics tool: Alpine Data Lab 5.4
• HDFS storage: EMC Isilon 7.2.0.2 + Restricted Patch-14925, over a 10G data link
• HDFS capacity: 30 TB
• Temp storage: VMDKs on VNX NAS shared storage [HadoopTempSpace]
• 4 HAWQ segments and 4 MapReduce local directories are mounted on 4 VMDKs on NFS in each worker VM
Legend: HS = HAWQ Segment, PXF = Pivotal Extension Framework, ZK = ZooKeeper Server, SC = Spring XD Container, NM = YARN NodeManager
VMs:
• Hadoop Workers 1-5: 8 vCPU & 52 GB RAM each; NM, HS 1-4, PXF (Workers 3-5 also run ZK); 4 × 200 GB VMDKs
• Hadoop Master 1: 8 vCPU & 48 GB RAM, 180 GB; Active RM, HAWQ Master Standby, HiveServer2, Hive Metastore
• Hadoop Master 2: 8 vCPU & 48 GB RAM, 180 GB; HAWQ Master, Standby RM, History Server, App Timeline Server
• Hadoop Client: 4 vCPU & 36 GB RAM, 180 GB; clients (HCat, HDFS, Hive, MapReduce2, Pig, Tez, YARN, ZooKeeper), Spring XD Admin, PostgreSQL, RabbitMQ
• Management: 4 vCPU & 12 GB RAM, 200 GB; Nagios, Ambari, Ganglia
• Alpine Data Lab (PROD): 8 vCPU & 48 GB RAM, 500 GB
• Alpine Data Lab (STAGE): 8 vCPU & 48 GB RAM, 500 GB
• Also shown: HUE, Hive MySQL, WebHCat Server, and Spring XD containers (SC)
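Totalling the VM specifications from the architecture slide gives the cluster's aggregate footprint. In this sketch the two Alpine Data Lab VMs (PROD and STAGE) are included, disk sizes other than the worker temp VMDKs are left out to keep it short, and the per-worker counts (4 HAWQ segments, 4 × 200 GB temp VMDKs) are a reading of the diagram rather than totals stated in the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmGroup:
    role: str
    count: int
    vcpu: int     # per VM
    ram_gb: int   # per VM


# VM groups as read from the architecture slide
CLUSTER = [
    VmGroup("hadoop-worker", 5, 8, 52),
    VmGroup("hadoop-master", 2, 8, 48),
    VmGroup("hadoop-client", 1, 4, 36),
    VmGroup("management", 1, 4, 12),
    VmGroup("alpine-data-lab", 2, 8, 48),  # PROD + STAGE
]

HAWQ_SEGMENTS_PER_WORKER = 4   # HS 1-4 on every worker
TEMP_VMDKS_PER_WORKER = 4      # one VMDK per HAWQ segment / MapReduce local dir
TEMP_VMDK_GB = 200


def cluster_totals(groups):
    """Aggregate VM count, vCPUs and RAM over all VM groups."""
    return {
        "vms": sum(g.count for g in groups),
        "vcpu": sum(g.count * g.vcpu for g in groups),
        "ram_gb": sum(g.count * g.ram_gb for g in groups),
    }


def worker_layout(groups):
    """HAWQ segment count and NFS-backed temp space across the worker VMs."""
    workers = sum(g.count for g in groups if g.role == "hadoop-worker")
    return {
        "hawq_segments": workers * HAWQ_SEGMENTS_PER_WORKER,
        "temp_vmdk_gb": workers * TEMP_VMDKS_PER_WORKER * TEMP_VMDK_GB,
    }


if __name__ == "__main__":
    print(cluster_totals(CLUSTER))
    print(worker_layout(CLUSTER))
```

This prints {'vms': 11, 'vcpu': 80, 'ram_gb': 500} and {'hawq_segments': 20, 'temp_vmdk_gb': 4000}: 11 VMs with 80 vCPUs and 500 GB of RAM in total, of which the five workers account for 40 vCPUs and 260 GB.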
On-Prem Big Data Production Datacenter
Apache Ambari – The Hadoop Cluster Management Console
Management & Monitoring:
- HDFS
- YARN/MapReduce
- Hive
- HAWQ
- Spring XD
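Ambari also exposes a REST API, so the service health shown in its console can be polled from a script. The sketch below assumes Ambari's v1 endpoint `/api/v1/clusters/<cluster>/services?fields=ServiceInfo/state` with HTTP basic auth, and a JSON response whose `items[].ServiceInfo` carries `service_name` and `state`. The host, port, cluster name and credentials are placeholders, and this is an illustration, not how VMware Corp IT monitored the cluster.

```python
import base64
import json
import urllib.request
from urllib.parse import quote


def services_url(ambari_base: str, cluster: str) -> str:
    """Ambari REST v1 endpoint listing a cluster's services, restricted to their state."""
    return (f"{ambari_base.rstrip('/')}/api/v1/clusters/{quote(cluster)}"
            "/services?fields=ServiceInfo/state")


def fetch_json(url: str, user: str, password: str, timeout: float = 30.0) -> str:
    """HTTP GET with basic auth; returns the raw response body as text."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def service_states(body: str) -> dict:
    """Map service name -> state, assuming items[].ServiceInfo.{service_name,state}."""
    payload = json.loads(body)
    return {
        item["ServiceInfo"]["service_name"]: item["ServiceInfo"]["state"]
        for item in payload.get("items", [])
    }


def not_started(states: dict) -> list:
    """Services not reported as STARTED (in Ambari, INSTALLED means stopped)."""
    return sorted(name for name, state in states.items() if state != "STARTED")
```

For example, `not_started(service_states(fetch_json(services_url("http://ambari-host:8080", "bigdata"), "admin", "...")))` would list any service that Ambari does not currently report as STARTED; `ambari-host` and `bigdata` are hypothetical names.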
Data Processing Pipeline – Click Stream Data
Flow: clickstream raw data files → daily push of clickstream logs to ftps.vmware.com (through the firewall) → data ingestion to Isilon HDFS via Spring XD (lookup logs and clickstream logs) → processing with Python → advanced analytics users
Processing goals:
• Data cleaning
• More consumable, structured data
• Data partitioning
• Schema building
• Faster analytics
- Daily, 2M clickstream records (~10 GB) are ingested from Adobe Omniture into Isilon HDFS
- 1.5 billion records, 554 columns, and ~20 TB of data
- Data cleanup and pre-processing using Pig, Hadoop Streaming, and Python scripts
- Data is fit into Hive/HAWQ schemas
- End users (data scientists) consume the data via HUE, pgAdmin, or Alpine Data Lab
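Part of the cleanup runs as Hadoop Streaming with Python scripts. A streaming mapper reads raw records on stdin and writes results to stdout; the sketch below shows that shape for a cleanup pass. It assumes tab-separated records with the 554 columns quoted above and a timestamp in a hypothetical first column (`TIMESTAMP_COLUMN`); the validation rules are illustrative, not the team's actual scripts.

```python
import sys

EXPECTED_COLUMNS = 554   # column count quoted for the clickstream data
TIMESTAMP_COLUMN = 0     # hypothetical position of the hit timestamp


def clean_line(line, expected=EXPECTED_COLUMNS, ts_col=TIMESTAMP_COLUMN):
    """Return (day, cleaned_row) for a well-formed record, or None to drop it."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != expected:
        return None
    # Collapse runs of whitespace (stray \r, double spaces) and trim each field
    fields = [" ".join(f.split()) for f in fields]
    day = fields[ts_col][:10]  # expects "YYYY-MM-DD HH:MM:SS"
    if len(day) != 10 or day[4] != "-" or day[7] != "-":
        return None
    return day, "\t".join(fields)


def main(stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr):
    dropped = 0
    for line in stdin:
        result = clean_line(line)
        if result is None:
            dropped += 1
            continue
        day, row = result
        # Day first, so a downstream load can route rows into date partitions
        stdout.write(f"{day}\t{row}\n")
    # Hadoop Streaming parses stderr lines of this form as counter increments
    stderr.write(f"reporter:counter:Cleaning,DroppedRows,{dropped}\n")


if __name__ == "__main__":
    main()
```

Submitted as the mapper of a map-only Hadoop Streaming job (`-numReduceTasks 0`), each output line starts with the day, which a later Hive/HAWQ load could use as a partition key; the dropped-row count appears as a job counter.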
Data Consumption – pgAdmin3 (via HAWQ Database)
And visualize the results...
[Pie charts: "Top 20 Countries with unique vmware.com Visits on 2015 Q1" and "Top 20 Countries with unique vmware.com Visitors on 2015 Q1". Countries: usa, jpn, deu, gbr, chn, ind, can, fra, aus, kor, esp, bra, ita, nld, rus, che, twn, pol, mex, swe]
Disclaimer: This is based on a synthesized dataset for demo purposes, not real data.
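The two pie charts differ only in what is counted per country (unique visits versus unique visitors), and each slice is shown as a whole-number percentage. A sketch of that normalization, assuming per-country counts have already been aggregated (for example by a Hive query) and that each chart's percentages are taken over its 20 slices rather than over all countries:

```python
def pie_shares(counts, n=20):
    """Top-n categories by count, each as a whole-number percentage of the top-n total.

    counts: mapping of category (e.g. a country code such as "usa") -> count.
    Ties are broken alphabetically so the output is deterministic.
    """
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    total = sum(count for _, count in top)
    if total == 0:
        return [(name, 0) for name, _ in top]
    return [(name, round(100 * count / total)) for name, count in top]
```

Because each share is rounded to a whole percent, the slices need not sum to exactly 100%.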
Data Consumption – HUE
Hive query to find unique visits on the VMware site in 2015 Q1
[Chart: "Unique Visits in 2014 and 2015 month wise"; x-axis: Month; y-axis: VisitCount (0 to 14,000,000); series: visits]
Disclaimer: This is based on a synthesized dataset for demo purposes, not real data.
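The Hive query on this slide counts unique visits per month. The same logic, written in Python for illustration, treats a visit as a distinct (visitor ID, visit number) pair, roughly what a `COUNT(DISTINCT ...)` grouped by month computes in Hive. The field layout and timestamp format are assumptions, not the actual Omniture schema used.

```python
from collections import defaultdict
from datetime import datetime


def unique_visits_by_month(hits):
    """Count distinct visits per calendar month.

    hits: iterable of (timestamp, visitor_id, visit_num), timestamp as "YYYY-MM-DD HH:MM:SS".
    A visit is identified by its (visitor_id, visit_num) pair, so repeated hits
    within the same visit are counted once.
    """
    visits = defaultdict(set)
    for ts, visitor_id, visit_num in hits:
        month = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m")
        visits[month].add((visitor_id, visit_num))
    return {month: len(ids) for month, ids in sorted(visits.items())}
```

A visit that crosses a month boundary is counted in both months with this approach.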
Advanced Data Analytics by Alpine Data Lab
Time Series Analysis on Jan 2015
Clickstream Data
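The slide names a time series analysis without detailing the method. As a generic illustration only, not the Alpine Data Lab workflow, a trailing moving average is a common first step for smoothing daily visit counts across a month such as Jan 2015 so that weekday/weekend swings do not hide the trend:

```python
from collections import deque


def moving_average(values, window=7):
    """Trailing moving average; positions before a full window is available are None."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out, buf, running = [], deque(), 0.0
    for v in values:
        buf.append(v)
        running += v
        if len(buf) > window:
            # Drop the oldest value so the sum covers exactly `window` points
            running -= buf.popleft()
        out.append(running / window if len(buf) == window else None)
    return out
```

With `window=7`, the first six days of the month have no value, since a full week of history is not yet available.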
Take Away
At VMware IT, we have established that an enterprise big data analytics platform can be successfully built and run, with great performance, on top of VMware virtual infrastructure with EMC Isilon and PHD 3.0.
Thank You
Q&A

Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
