Introducing VenmoPlus.com
-Explore your Venmo network!
Qingpeng “Q.P.” Zhang, Insight Data Engineering Fellow
Pipeline
[Figure: pipeline diagram; labels: historical transactions, real-time transactions]
2013
Biggest Challenge:
● Calculate/Query graph distance in real time
● Cache of 2nd degree friends list
● Partitioned GraphDB
● Good for LinkedIn (hundreds of millions of users, with higher degree)
● 5 million vertices (users)
● 32 million distinct edges (transactions)
● 88 million total edges (transactions)
No cache (precalculation)?
No GraphDB?
Two Databases
Redis vertices (userID -> userName):
420890   Graham Hadley
1630476  Leon Tang
810029   Harminder Toor
1371353  Ephraim Park
562884   Paul Min
Redis edges (userID -> set of userIDs):
420890   set(14935158, 562884)
1630476  set(1371353)
810029   set(190230, 14935158)
1371353  set(810029, 971156)
562884   set(196371, 1371353)
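The sample rows above pair each userID with a name and with a set of other userIDs, which matches the Redis vertex and edge mappings named later in the deck. A minimal sketch of that layout, using plain Python dicts as a stand-in for Redis (the function name `friend_labels` is hypothetical, and the deck does not show which Redis commands were used):

```python
from typing import Dict, List, Set

# Vertex map: userID -> userName (sample rows from the slide).
names: Dict[int, str] = {
    420890: "Graham Hadley",
    1630476: "Leon Tang",
    810029: "Harminder Toor",
    1371353: "Ephraim Park",
    562884: "Paul Min",
}

# Edge map: userID -> set of userIDs (sample rows from the slide).
edges: Dict[int, Set[int]] = {
    420890: {14935158, 562884},
    1630476: {1371353},
    810029: {190230, 14935158},
    1371353: {810029, 971156},
    562884: {196371, 1371353},
}


def friend_labels(user_id: int) -> List[str]:
    """Return sorted display labels for a user's first-degree friends.

    Friends without a known name fall back to their numeric ID as text.
    """
    friends = edges.get(user_id, set())
    return sorted(names.get(fid, str(fid)) for fid in friends)


print(friend_labels(420890))  # ['14935158', 'Paul Min']
```

Keeping names and edges in separate maps means a graph traversal only touches the edge sets, and names are looked up once at display time.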
Optimizations
● Two databases
● Graph algorithms optimization
● S3 ⇔ Redis and S3 ⇔ Elasticsearch transfers, distributed with Spark
● ...
VenmoPlus.com
[Figure: cluster diagram; instance labels: m4.xlarge, m4.large, m4.xlarge, m4.large, t2.micro; total cost $29.11/day]
About Me
● Postdoc at Lawrence Berkeley National Lab
● PhD in Computer Science, Michigan State
● BS in Physics, Nanjing U.
Certified Volunteers:
● Software Carpentry
● Data Carpentry
● American Red Cross
Christmas Eve 2014, ice storm, Michigan
Algorithm Optimization
Shortest distance -> intersection of sets (friend lists)
● 1st degree friends of A ∩ 1st degree friends of B == [] ? (if non-empty, A and B are within distance 2)
● 2nd degree friends of A ∩ 1st degree friends of B == [] ? (if non-empty, A and B are within distance 3)
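The set-intersection checks above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the graph is an undirected adjacency map kept symmetric by the caller, distances beyond 3 are reported as `None`, and the names `second_degree` and `distance` are hypothetical.

```python
from typing import Dict, Hashable, Optional, Set

Graph = Dict[Hashable, Set[Hashable]]


def second_degree(graph: Graph, user: Hashable) -> Set[Hashable]:
    """Friends of friends of `user` (may also contain 1st-degree friends)."""
    result: Set[Hashable] = set()
    for friend in graph.get(user, set()):
        result |= graph.get(friend, set())
    result.discard(user)
    return result


def distance(graph: Graph, a: Hashable, b: Hashable) -> Optional[int]:
    """Return the shortest distance between a and b if it is at most 3."""
    if a == b:
        return 0
    friends_a = graph.get(a, set())
    friends_b = graph.get(b, set())
    if b in friends_a:
        return 1
    if friends_a & friends_b:  # a shared friend gives a path of length 2
        return 2
    if second_degree(graph, a) & friends_b:  # a - x - y - b
        return 3
    return None  # farther than 3, or not connected


# Chain A - B - C - D plus an isolated vertex E.
g: Graph = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": set(),
}
print(distance(g, "A", "D"))  # 3
```

Because the checks run in increasing order of distance, the 2nd-degree set does not need to exclude 1st-degree friends: any closer relationship has already returned.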
Algorithm Design - 2
Query the distance between two vertices at a past moment in a constantly changing graph (needed because we don't pre-calculate the distance...)
● A recent transaction for a user is already history and has changed the graph
● Query the distance between the two users at that moment
○ Do not count that specific transaction
○ Temporarily remove the influence of that specific transaction, then restore it
■ Test whether that transaction is the first between the pair of users
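One way to read the steps above is sketched here. It assumes an undirected adjacency map that already contains the transaction's edge, and a caller-supplied flag `is_first` saying whether this was the first transaction between the pair; both the flag and the function names are hypothetical, and the deck does not show how the project actually tracks this.

```python
from typing import Dict, Hashable, Optional, Set

Graph = Dict[Hashable, Set[Hashable]]


def distance(graph: Graph, a: Hashable, b: Hashable) -> Optional[int]:
    """Distance up to 3 via friend-set intersections, else None."""
    friends_a = graph.get(a, set())
    friends_b = graph.get(b, set())
    if b in friends_a:
        return 1
    if friends_a & friends_b:
        return 2
    second_a = set().union(*(graph.get(f, set()) for f in friends_a))
    if second_a & friends_b:
        return 3
    return None


def distance_before_transaction(
    graph: Graph, a: Hashable, b: Hashable, is_first: bool
) -> Optional[int]:
    """Distance between a and b as it was just before their transaction."""
    if not is_first:
        # An earlier transaction already linked the pair directly.
        return 1
    # Temporarily remove the edge created by this transaction.
    graph.setdefault(a, set()).discard(b)
    graph.setdefault(b, set()).discard(a)
    try:
        return distance(graph, a, b)
    finally:
        # Restore the edge so the live graph is unchanged.
        graph[a].add(b)
        graph[b].add(a)


g: Graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
print(distance_before_transaction(g, "A", "C", is_first=True))  # 2
```

The `try`/`finally` guarantees the edge is restored even if the distance lookup raises, which matters when the same in-memory graph is also serving real-time queries.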
#  Service          Instance   $/hour  $/day
1  Spark            m4.large   0.12    2.88
2  Spark            m4.large   0.12    2.88
3  Redis            m4.xlarge  0.24    5.76
4  Elasticsearch    m4.xlarge  0.24    5.76
5  Elasticsearch    m4.xlarge  0.24    5.76
6  Kafka, producer  m4.large   0.12    2.88
7  Kafka            m4.large   0.12    2.88
8  Webserver        t2.micro   0.013   0.312
Total: $29.11/24 hours
https://github.com/qingpeng/VenmoPlus for more details!
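The daily total can be checked from the per-instance figures above, reading the first price on each row as dollars per hour and the second as dollars per 24 hours (an interpretation; the slide does not label the columns). A small sketch of the arithmetic, with the hypothetical helper `daily_cost`:

```python
from decimal import Decimal

HOURS_PER_DAY = 24

# (service, instance type, assumed hourly price in USD) from the slide.
INSTANCES = [
    ("Spark", "m4.large", Decimal("0.12")),
    ("Spark", "m4.large", Decimal("0.12")),
    ("Redis", "m4.xlarge", Decimal("0.24")),
    ("Elasticsearch", "m4.xlarge", Decimal("0.24")),
    ("Elasticsearch", "m4.xlarge", Decimal("0.24")),
    ("Kafka, producer", "m4.large", Decimal("0.12")),
    ("Kafka", "m4.large", Decimal("0.12")),
    ("Webserver", "t2.micro", Decimal("0.013")),
]


def daily_cost(hourly: Decimal) -> Decimal:
    """Cost of running one instance for 24 hours."""
    return hourly * HOURS_PER_DAY


total = sum((daily_cost(rate) for _, _, rate in INSTANCES), Decimal("0"))
print(total)                            # 29.112
print(total.quantize(Decimal("0.01")))  # 29.11
```

`Decimal` is used instead of floats so the per-row and total figures match the slide exactly.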
Algorithms
Distance detection between vertices in the graph (1st-, 2nd-, or 3rd-degree friends?)
● 1st degree friends of A ∩ 1st degree friends of B == [] ?
● 2nd degree friends of A ∩ 1st degree friends of B == []?
Pipeline
Redis:
● Graph Edges: userID -> userID
● Graph Vertices: userID -> userName
In-memory DB -> fast graph updates and graph traversal, in real time
Elasticsearch:
● Everything about the transactions
Distributed -> data storage and full-text search, in real time
Big Challenge:
● Graph distance + Common connections in real time
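A sketch of how the edge sets could support both real-time updates and common-connection lookups. It uses an in-memory `defaultdict` of sets as a stand-in for Redis; in Redis itself the analogous operations on sets are the `SADD` and `SINTER` commands, though the deck does not say which commands the project uses. The function names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Set

Graph = DefaultDict[Hashable, Set[Hashable]]


def add_transaction(graph: Graph, payer: Hashable, payee: Hashable) -> None:
    """Record a transaction as an undirected edge between two users."""
    graph[payer].add(payee)
    graph[payee].add(payer)


def common_connections(graph: Graph, a: Hashable, b: Hashable) -> Set[Hashable]:
    """Users who have transacted with both a and b."""
    # .get() avoids creating empty entries for unknown users.
    return graph.get(a, set()) & graph.get(b, set())


graph: Graph = defaultdict(set)
for payer, payee in [(1, 2), (2, 3), (1, 4), (4, 3), (1, 2)]:
    add_transaction(graph, payer, payee)

print(sorted(common_connections(graph, 1, 3)))  # [2, 4]
```

Repeated transactions between the same pair leave the set unchanged, so the structure stores distinct edges rather than every transaction; the full transaction records belong in the search store.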
This, or that? - to build graph
This, or that? - for fast searching
Lesson learned