Open Source Real Time BI using 
Storm, Hadoop, Titan, Druid & D3 
Anil Madan 
Sr. Director Engineering, PayPal 
© 2014 PayPal Inc. All rights reserved. Confidential and proprietary.
$1 in every $6 spent on e-commerce is spent through PayPal.*
Creating Tomorrow’s 
Mobile Payment 
Experiences 
25 countries with live PayPal 
fingerprint authentication 
on Samsung devices.
Helping Developers 
Innovate & Monetize 
New Mobile Apps 
Braintree launches its new API, including Pay with 
PayPal.
PayPal Now Available in 203 Markets 
10 new markets added in the second quarter, 
making PayPal available to 80 million new internet users. 
Paraguay 
Côte d’Ivoire 
Nigeria 
Monaco 
Belarus 
Montenegro 
Moldova 
Macedonia 
Cameroon 
Zimbabwe
Business Problem
We need to better understand our customers…
Acquisition: Where do prospects sign up for accounts?
Awareness: How do prospective customers learn about PayPal?
Activation: How can we help them to complete their 1st payment?
Adoption: How can we help them use PayPal even more?
How we solved it…
[Diagram; labels recovered from the slide, grouping approximate]
Channels: Mobile, Direct/Home Page, Product Experiences, Search Engine Marketing, Transaction Emails
Tracking Servers
Metadata: Tracking Metadata Tool, Taxonomy, Tracking Event Service, Tag Catalog, Tracking Validation Service
Real Time Systems: Marketing, Segmentation, Experimentation
Big Data: Exploratory Analytics, Attribution, Predictive Analytics
Logical View
[Diagram; labels recovered from the slide, grouping approximate]
Stages: Metadata, Instrumentation, Collection, Processing, Analytics
Events: Server Side Events, Client Side Events, Page Performance Events
Collection Service
Sessionization
Metrics: Behavioral Metrics, Marketing Metrics, Performance Metrics, Operational Metrics (OpenTSDB), Real Time Event Metrics
DRUID: Pathing Store, Metrics Store
Reporting & Visualization
Metadata – Logical Entity Model
Entities: TEMPLATE, PAGE, COMPONENTS, LINK, TAGS
Metadata – Logical Event Model 
[Diagram. Event types shown:]
Tracking Event
Impression Event
Reaction Event
Component Impression Event
Ad Impression Event
Page Impression Event
Client Page Impression Event
Server Page Impression Event
Click Event
Click-Through Event
Mouse-over Event
Entry Event
Exit Event
Outcome Event
Metadata - Self-Service Management Workflow… 
DATA PIPELINE
[Diagram; labels recovered from the slide, grouping approximate]
Stages: Collection, Processing, Data Stores, Analysis & Visualization
Event sources: Customers (Client Side, Server Side), Performance Metrics Tools
Protocols: HTTP, REST
Processing: Spout, Bot flagging Bolt, Geo Enrichment Bolt, Sessionization, Aggregation
Data Stores: Druid, Apache, Titan
REST Proxy, Reporting, Reporting Consumers
Metadata: Developers, Product Owners, Metadata Service
Druid Architecture 
• Open-source 
• Distributed 
• Real-time 
• Highly-Available Data store 
• Column-oriented 
• Approximate or Exact 
Real Time Nodes
• Ingest data and buffer events in memory
• Incremental indexing
• Query data as soon as it is ingested
• Periodically persist collected events to disk
• Combine multiple disk indexes to create immutable ‘segments’
• Log-structured merge-tree
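The real-time node bullets describe a buffer, persist and merge cycle. A minimal sketch of that cycle follows, as a toy model only: the class name, the `max_buffered` threshold and the event fields `ts` and `page` are assumptions made for illustration, and none of this is Druid's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class RealTimeNodeSketch:
    """Toy model of the buffer -> persist -> merge cycle (not Druid's code)."""
    max_buffered: int = 3                          # hypothetical persist threshold
    buffer: list = field(default_factory=list)     # in-memory events
    persisted: list = field(default_factory=list)  # stand-in for on-disk indexes

    def ingest(self, event: dict) -> None:
        self.buffer.append(event)
        if len(self.buffer) >= self.max_buffered:
            self.persist()

    def persist(self) -> None:
        # Freeze the current buffer into a sorted, read-only index.
        if self.buffer:
            index = tuple(sorted(self.buffer, key=lambda e: e["ts"]))
            self.persisted.append(index)
            self.buffer = []

    def query(self, page: str) -> int:
        # Events are queryable right away: scan persisted indexes and the live buffer.
        rows = [e for idx in self.persisted for e in idx] + self.buffer
        return sum(1 for e in rows if e["page"] == page)

    def handoff(self) -> tuple:
        # Merge all persisted indexes into one immutable "segment".
        self.persist()
        segment = tuple(sorted((e for idx in self.persisted for e in idx),
                               key=lambda e: e["ts"]))
        self.persisted = []
        return segment
```

Calling `ingest` a few times, then `query`, then `handoff` walks through the same order as the bullets: data is visible while still in memory, and the merged segment is an immutable tuple sorted by timestamp.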
Druid Architecture 
Historical Nodes
• Load immutable read-optimized data from deep storage
• Memory mapped storage engine
• Caches segments
• Supports tiered storage
Druid Architecture 
Druid Systems Overview 
Metrics & Dimensions 
Standard Metrics
{ "type": "doubleSum", "name": "pageviews", "fieldName": "PV" },
{ "type": "doubleSum", "name": "bounces", "fieldName": "bnc" },
....
Estimated Metrics (HyperLogLog)
{ "type": "hyperUnique", "name": "unique_visits", "fieldName": "user_session_guid" },
{ "type": "hyperUnique", "name": "unique_visitors", "fieldName": "user_guid" }
Dimensions
2014/06/11/10",
"filter": "part-",
"parser": {
  "type": "string",
  "timestampSpec": { "column": "timestamp", "format": "auto" },
  "data": {
    "format": "json",
    "dimensions": [
      "timestamp",
      "USER_GUID",
      "USER_SESSION_GUID",
      "PAGE_GROUP",
      "PAGE_NAME",
      "PAGEGROUP_LINK_NAME",
      "PAGE_LINK_NAME",
      …
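The slide marks unique_visits and unique_visitors as estimated metrics backed by HyperLogLog. The sketch below illustrates the general idea only. It is a textbook-style HyperLogLog, not Druid's hyperUnique implementation, and the class name, the 1024-register default and the SHA-1 hashing are assumptions made for the example.

```python
import hashlib
import math


class HyperLogLogSketch:
    """Minimal HyperLogLog; illustrative only, not Druid's hyperUnique code."""

    def __init__(self, p: int = 10):
        self.p = p                       # use 2**p registers
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        h = int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                   # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # rank = position of the leftmost 1-bit within the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)      # bias constant, valid for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)   # small-range correction
        return raw

    def merge(self, other: "HyperLogLogSketch") -> None:
        # Register-wise max gives the sketch of the union of both inputs.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
```

Because two sketches merge by taking the register-wise maximum, partial results computed over different slices of the data can be combined without revisiting the raw events. With 1024 registers the expected relative error is roughly 3%.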
Sessionization 
Events
Visitor ID  Session ID  Timestamp         Event Payload
V1          S1          2014-10-16 05:12  E1
V2          S2          2014-10-16 05:14  E2
V1          S1          2014-10-16 05:15  E3
V1          S1          2014-10-16 05:20  E4
V2          S2          2014-10-16 05:21  E5
V1          S3          2014-10-16 05:25  E6
…           …           …                 …

VisitContainer
Visitor ID  Session ID  Payload
V1          S1          sf, mac, {flash, quicktime}, {ca, usa}, 480 secs, …; events E1, E3, E4
V2          S2          ff, win, {acrobat, mediaplayer}, {wb, in}, 420 secs, …; events E2, E5
V1          S3          sf, mac, {quicktime, java}, {on, ca}, 60 secs; events E6
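The step from the Events table to the VisitContainer table can be sketched as a group-by on (visitor, session). This is a minimal illustration, not the Storm bolt used in the pipeline. The function name and the output field names are assumptions, and duration is taken here as last minus first timestamp, which gives 0 for the single-event session S3, whereas the slide lists 60 secs, so the original presumably applies a rule not shown in the deck.

```python
from collections import OrderedDict
from datetime import datetime

EVENTS = [  # (visitor, session, timestamp, event) rows from the slide
    ("V1", "S1", "2014-10-16 05:12", "E1"),
    ("V2", "S2", "2014-10-16 05:14", "E2"),
    ("V1", "S1", "2014-10-16 05:15", "E3"),
    ("V1", "S1", "2014-10-16 05:20", "E4"),
    ("V2", "S2", "2014-10-16 05:21", "E5"),
    ("V1", "S3", "2014-10-16 05:25", "E6"),
]


def sessionize(events):
    """Group events into one container per (visitor, session), in arrival order."""
    containers = OrderedDict()
    for visitor, session, ts, event in events:
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        c = containers.setdefault((visitor, session),
                                  {"events": [], "first": when, "last": when})
        c["events"].append(event)
        c["first"] = min(c["first"], when)
        c["last"] = max(c["last"], when)
    return {
        key: {"events": c["events"],
              # assumption: duration = last event time minus first event time
              "duration_secs": int((c["last"] - c["first"]).total_seconds())}
        for key, c in containers.items()
    }
```

For the slide's data, `sessionize(EVENTS)[("V1", "S1")]` holds events E1, E3, E4 with a duration of 480 seconds, matching the first VisitContainer row.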
Druid Storage – Columns & Dictionaries 
Timestamp (Hr)  Session ID  Country  OS   User Agent  Page Name                             Page Name (encoded)
2014-10-16 05   S1          US       MAC  SF          Login, AccountOverview                0, 1
2014-10-16 05   S2          DE       WIN  IE          Login, PaymentReview, AccountHistory  0, 2, 3
2014-10-16 05   S3          US       LNX  FF          Login, PaymentReview, Checkout        0, 2, 4
2014-10-16 05   S4          UK       LNX  FF          Login, Profile, Checkout              0, 5, 4
2014-10-16 05   S5          DE       WIN  CR          Login, Profile                        0, 5
2014-10-16 05   S6          UK       MAC  SF          Login, AccountOverview, Checkout      0, 1, 4

Dictionary (Page Name)
Login            0
AccountOverview  1
PaymentReview    2
AccountHistory   3
Checkout         4
Profile          5

Compression: LZF
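The dictionary on this slide replaces each page name with a small integer. A minimal sketch of that encoding follows. The function name is an assumption, and IDs are assigned in first-seen order, which happens to reproduce the slide's numbering. Druid's own dictionary ordering may differ.

```python
def dictionary_encode(rows):
    """Map each distinct string to a small integer ID, in order of first appearance."""
    dictionary = {}
    encoded = []
    for values in rows:
        # setdefault assigns the next free ID the first time a value is seen
        encoded.append([dictionary.setdefault(v, len(dictionary)) for v in values])
    return dictionary, encoded


PAGE_NAMES = [  # Page Name column from the slide, one list per session
    ["Login", "AccountOverview"],                  # S1
    ["Login", "PaymentReview", "AccountHistory"],  # S2
    ["Login", "PaymentReview", "Checkout"],        # S3
    ["Login", "Profile", "Checkout"],              # S4
    ["Login", "Profile"],                          # S5
    ["Login", "AccountOverview", "Checkout"],      # S6
]
```

`dictionary_encode(PAGE_NAMES)` returns the six-entry dictionary together with the encoded rows, so the repeated strings are stored once and the column itself holds only integers.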
Druid Data Structure - Bitmap Indices 
Herald – Self Service Analytics 
Herald – Self Service Analytics 
Druid Metrics 
Pathing 
Fallout Reports 
Pathing
Example paths: S1: A->B->C->D->X->A->M and S2: A->B->C->D->E
Visitor ID  Current Page  Next Page 1  Next Page 2  Prev Page 1  Prev Page 2
S1          A             B            C            null         null
S1          B             C            D            A            null
S1          C             D            X            B            A
S1          D             X            A            C            B
S1          X             A            M            D            C
S1          A             M            null         X            D
S1          M             null         null         A            X
S2          A             B            C            null         null
S2          B             C            D            A            null
S2          C             D            E            B            A
S2          D             E            null         C            B
S2          E             null         null         D            C
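The pathing table can be derived mechanically from each session's page sequence. A minimal sketch, assuming a hypothetical helper `pathing_rows` and a lookahead of two pages in each direction as on the slide:

```python
def pathing_rows(session_id, path, lookahead=2):
    """Emit one row per page visit with the next and previous `lookahead` pages."""
    rows = []
    for i, page in enumerate(path):
        # None stands for the table's "null" when the path has no such neighbour
        nxt = [path[i + k] if i + k < len(path) else None
               for k in range(1, lookahead + 1)]
        prev = [path[i - k] if i - k >= 0 else None
                for k in range(1, lookahead + 1)]
        rows.append((session_id, page, *nxt, *prev))
    return rows
```

For example, `pathing_rows("S1", list("ABCDXAM"))` yields seven rows, one per page visit, in the column order Current Page, Next Page 1, Next Page 2, Prev Page 1, Prev Page 2.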
Pathing
Next Page
{
  "queryType": "groupBy",
  "dimensions": ["current_page", "dimensions like country, segmentation etc"],
  "aggregations": [
    { "type": "count", "name": "next_page_count", "fieldName": "next_page, next_page2" }
  ],
  "filter": { "type": "selector", "dimension": "current_page", "value": "C" }
}
Previous Page
{
  "queryType": "groupBy",
  "dimensions": ["current_page", "dimensions like country, segmentation etc"],
  "aggregations": [
    { "type": "count", "name": "prev_page_count", "fieldName": "prev_page1, prev_page2" }
  ],
  "filter": { "type": "selector", "dimension": "current_page", "value": "C" }
}
(Abbreviated queries: a complete Druid groupBy also requires dataSource, granularity and intervals, and the built-in count aggregator counts rows without reading fieldName.)
Fallout
A->B->C->D->X->A->M
A->D->X->M
{
  "queryType": "search",
  "dimensions": ["current_page_path_count", "dimensions like country, segmentation etc"],
  "filter": { "type": "regex", "dimension": "next_page_path", "pattern": "^A.*D.*X.*M$" }
}
• Apply the filter patterns to the dictionary
• Figure out the values that match
• Take the bitmap indices of those values
• OR the bitmap indices together
• Use the output bitmap as the filter
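The five steps above can be sketched with plain Python integers standing in for bitmaps. This is an illustration under assumptions: the function names, the `>` separator and the sample path strings are hypothetical, and Druid uses compressed bitmaps where this sketch uses uncompressed integers.

```python
import re


def build_bitmaps(column):
    """One bitmap (a Python int used as a bitset) per distinct value; bit i = row i."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps


def regex_filter(bitmaps, pattern, n_rows):
    """Return the row numbers whose value matches `pattern`."""
    rx = re.compile(pattern)
    combined = 0
    for value, bitmap in bitmaps.items():   # apply the pattern to the dictionary
        if rx.search(value):                # keep only the values that match
            combined |= bitmap              # OR their bitmaps together
    # use the combined bitmap as the row filter
    return [row for row in range(n_rows) if (combined >> row) & 1]


# Hypothetical path strings; the ">" separator is an assumption for this example.
PATHS = ["A>B>C>D>X>A>M", "A>B>C>D>E", "A>D>X>M", "B>A>M", "A>D>X>M"]
```

The regex runs once per distinct value and not once per row, so `regex_filter(build_bitmaps(PATHS), r"^A.*D.*X.*M$", len(PATHS))` tests four strings to select rows out of five. The saving grows with the number of repeated values.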
Herald Architecture
[Diagram. Labels: CLIENT, SERVER, Model, View, Controller, Directives, NVD3]
Herald Deployment
[Diagram. Labels: SSO, Druid]
Adhoc Graph Analytics
[Graph diagram. Labels recovered from the slide:]
Vertex names: Login_2014101611, AccountOverview_2014101611, PaymentReview_2014101611, Checkout_2014101611
Edge properties: Country: US, Count: 15; Country: US, Count: 5; Country: US, Count: 5; Country: US, Count: 10
Numeric labels: 5, 8, 7, 6
[Same graph as on the previous slide]
gremlin> g.V('name', 'Login_2014101611').as('x').outE.inV.loop('x'){it.loops < 4}{it.object.getProperty('name') == 'Checkout_2014101611'}.path
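The Gremlin traversal walks outgoing edges from the Login vertex for a bounded number of loops and emits the paths that reach Checkout. A rough Python analogue is sketched below. The edge list is hypothetical, since the slide's exact topology is not recoverable from the transcript, and `max_hops` is an illustrative parameter and not a precise equivalent of `it.loops < 4`.

```python
def bounded_paths(edges, start, goal, max_hops=3):
    """All paths from start to goal that use at most max_hops edges."""
    found = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == goal and len(path) > 1:
            found.append(path)              # emit a path that reached the goal
        if len(path) - 1 < max_hops:        # keep looping while under the bound
            for nxt in edges.get(path[-1], []):
                stack.append(path + [nxt])
    return sorted(found)


EDGES = {  # hypothetical topology over the four page vertices
    "Login": ["AccountOverview", "PaymentReview"],
    "AccountOverview": ["Checkout"],
    "PaymentReview": ["Checkout"],
}
```

With these edges, `bounded_paths(EDGES, "Login", "Checkout")` returns the two-hop routes through AccountOverview and through PaymentReview.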
Summary 
• Problem 
• Understand our customer behavior 
• Across disparate channels & experiences 
• Solution 
• Democratize data 
• Consistent standardized metadata 
• Disciplined instrumentation 
• Distributed scalable backend for adhoc & interactive analytics 
• Self-service BI through modern visualization tools 
Questions?

