Physical Design for Non-relational Data Systems
Michael Mior • University of Waterloo
Proper design and configuration of data systems is critical for achieving good performance
Many tools exist for relational database design optimization
Microsoft AutoAdmin (1998)
DB2 Design Advisor (2004)
Oracle SQL Tuning (2004)
Source: https://www.databasejournal.com/features/mssql/article.php/10894_3523616_2/Index-Tuning-Wizard.htm
https://dev.mysql.com/doc/mysql-monitor/4.0/en/mem-qanal-using-ui.html
We want applications to be up 24/7
We're frequently dealing with changing data or with unstructured data
We require sub-second responses to queries
Source: Mike Loukides, VP Content Strategy, O’Reilly Media
Relational databases are not always sufficient for these uses
“Over 30 years, we've learned how to write business intelligence applications on top of relational databases -- there are patterns. With NoSQL today, we have no cookie cutters. We don't have any blueprints.”
--Ravi Krishnappa, NetApp solutions architect
Source: TechTarget, 2015
• NoSQL Database Design Optimization
• Understanding Existing NoSQL Designs
• Optimizing Big Data Applications
Schema Design Best Practices
• Model column families around query patterns
  But start your design with entities and relationships, if you can
• De-normalize and duplicate for read performance
  But don’t de-normalize if you don’t need to
• Leverage wide rows for ordering, grouping, and filtering
  But don’t go too wide
Source: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/
NoSQL Application Development
Requirements, Implementation, Data Model
App Logic, DB Access
NoSE [MSAL, ICDE ‘16] [MSAL, TKDE ‘17]
Database Design Example
Comment: com_id, com_date, text
User: user_id, nickname
Post: post_id, post_date, title
Query
Find information on all posts a user has commented on, ordered by post date
SELECT p.post_id, p.title
FROM users u
JOIN comments c ON u.user_id = c.user_id
JOIN posts p ON p.post_id = c.post_id
WHERE u.user_id = ?
ORDER BY p.post_date
Column families (key → values)
user_id → nickname, comment_id
post_id → title, post_date
comment_id → post_id, post_date, nickname
nickname → title, post_date
Execution A
Execution B
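The trade-off between alternative executions can be sketched in Python, with plain dictionaries standing in for column families. This is an illustration only: the dictionary names and sample rows are assumptions, and it does not claim to reproduce Execution A or Execution B from the slide. One plan chains several lookups and joins in the application; the other reads a single denormalized column family built for the query.

```python
# Hypothetical sample data; dicts stand in for column families (key -> values).
comments_by_user = {1: [3], 2: [7]}      # user_id -> comment ids
post_by_comment = {3: 6, 7: 6}           # comment_id -> post_id
posts = {6: {"title": "Stargate", "post_date": "2017-04-05"}}  # post_id -> post

# Denormalized alternative: everything the query needs, keyed by user_id.
posts_commented_by_user = {
    1: [{"post_id": 6, "title": "Stargate", "post_date": "2017-04-05"}],
    2: [{"post_id": 6, "title": "Stargate", "post_date": "2017-04-05"}],
}


def plan_with_joins(user_id):
    """Three lookups, joined and sorted in the application."""
    post_ids = {post_by_comment[c] for c in comments_by_user.get(user_id, [])}
    rows = [{"post_id": p, **posts[p]} for p in post_ids]
    return sorted(rows, key=lambda r: r["post_date"])


def plan_denormalized(user_id):
    """One lookup against a column family built for this query."""
    return sorted(posts_commented_by_user.get(user_id, []),
                  key=lambda r: r["post_date"])
```

Both plans return the same rows; they differ in how many requests they issue and in how much duplicated data must be stored and kept up to date.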
NoSE
Input: Workload, Data Model
1. Candidate Enumeration
2. Query Planning
3. Design Optimization
4. Plan Recommendation
Output: Query Plans, Database Design
Database Design Optimization
NoSE considers all possible query plans and picks the one with minimum expected cost
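A minimal Python sketch of this kind of selection follows. It is not NoSE's optimizer: it uses brute-force enumeration as a stand-in, and the candidate column families, sizes, plan costs, query weights and the storage budget are all hypothetical values chosen for illustration. For every set of column families that fits the budget, each query takes its cheapest usable plan, and the set with the lowest weighted total cost wins.

```python
from itertools import combinations

# Hypothetical candidates: column family name -> storage size (arbitrary units).
SIZES = {"users": 2, "comments_by_user": 3, "posts": 3, "posts_by_user": 6}

# Hypothetical plans: query -> list of (column families needed, estimated cost).
PLANS = {
    "posts_commented_on": [
        ({"comments_by_user", "posts"}, 10.0),  # several lookups
        ({"posts_by_user"}, 2.0),               # one denormalized lookup
    ],
    "user_nickname": [({"users"}, 1.0)],
}
WEIGHTS = {"posts_commented_on": 5.0, "user_nickname": 1.0}  # query frequency


def best_design(budget):
    """Return (cost, design, chosen plan per query), or None if infeasible."""
    best = None
    names = sorted(SIZES)
    for k in range(len(names) + 1):
        for combo in combinations(names, k):
            design = set(combo)
            if sum(SIZES[n] for n in design) > budget:
                continue  # does not fit in the storage budget
            total, chosen = 0.0, {}
            for query, plans in PLANS.items():
                usable = [(cost, sorted(cfs)) for cfs, cost in plans
                          if cfs <= design]
                if not usable:
                    break  # this design cannot answer the query
                cost, cfs = min(usable)
                total += WEIGHTS[query] * cost
                chosen[query] = cfs
            else:
                if best is None or total < best[0]:
                    best = (total, design, chosen)
    return best
```

With a budget of 8 units the sketch selects the denormalized `posts_by_user` plus `users`; with a budget of 5 no design can answer both queries and it returns `None`.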
Evaluation
Overall workload performance improves by 5x
• NoSQL Database Design Optimization
• Understanding Existing NoSQL Designs
• Optimizing Big Data Applications
Existing NoSQL designs are a black box
Physical (JSON):
{user_id: 1, post_date: "2017-04-05", com_id: 3, …}
{user_id: 2, post_date: "2017-04-05", com_id: 7, …}
{post_id: 6, com_date: "2017-04-03", com_id: 3, user_id: 1, …}
{post_id: 6, com_date: "2017-04-01", com_id: 7, user_id: 2, …}
Logical: ?
Recovering Logical Schemas
Extract the structure of existing data
Discover dependencies
Produce a logical model of the database
The process removes redundancy implied by both functional and inclusion dependencies
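The dependency discovery step can be illustrated with a small Python sketch. It is an assumption-laden illustration rather than the authors' algorithm: `holds` and `included` are hypothetical helper names, and the sample rows mirror the example values used in this deck. A dependency that holds on sample data is only a candidate, since more data could still violate it.

```python
def holds(rows, lhs, rhs):
    """True if every pair of rows agreeing on `lhs` also agrees on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False  # same lhs values, different rhs values
    return True


def included(rows_a, attr_a, rows_b, attr_b):
    """True if every value of attr_a in rows_a appears as attr_b in rows_b."""
    return {r[attr_a] for r in rows_a} <= {r[attr_b] for r in rows_b}


user_comments = [
    {"user_id": 1, "post_date": "2017-04-05", "com_id": 3,
     "post_id": 6, "title": "Stargate"},
    {"user_id": 2, "post_date": "2017-04-05", "com_id": 7,
     "post_id": 6, "title": "Stargate"},
]
comments_by_date = [
    {"post_id": 6, "com_date": "2017-04-03", "com_id": 3, "user_id": 1},
    {"post_id": 6, "com_date": "2017-04-01", "com_id": 7, "user_id": 2},
]
```

On these rows, `post_id` determines `title` and `post_date` but not `user_id`, and every `com_id` in `user_comments` also appears in `comments_by_date`.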
We want to go from raw data to a logical model
Raw data: user_comments, comments_by_date (record contents hidden on the slide)
Logical model: Comment, User, Post
[MS, ER ‘18] (to appear)
user_comments
user_id  post_date   com_id  post_id  title
1        2017-04-05  3       6        Stargate
2        2017-04-05  7       6        Stargate
Data on the same logical entity appears multiple times
user_comments
user_id  com_id  post_id
1        3       6
2        7       6
posts
post_date   post_id  title
2017-04-05  6        Stargate
Post data can be (logically) extracted to normalize
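The extraction shown in the tables can be sketched in Python. This is a simplified illustration, not the published algorithm: `extract` is a hypothetical helper, and it assumes the dependency `post_id → post_date, title` has already been discovered.

```python
def extract(rows, key, dependents):
    """Split rows on the dependency key -> dependents.

    Returns (rows without the dependent attributes,
             one extracted row per distinct key value).
    """
    extracted = {}
    remaining = []
    for row in rows:
        extracted[row[key]] = {key: row[key],
                               **{a: row[a] for a in dependents}}
        remaining.append({a: v for a, v in row.items()
                          if a not in dependents})
    return remaining, list(extracted.values())


user_comments = [
    {"user_id": 1, "post_date": "2017-04-05", "com_id": 3,
     "post_id": 6, "title": "Stargate"},
    {"user_id": 2, "post_date": "2017-04-05", "com_id": 7,
     "post_id": 6, "title": "Stargate"},
]
remaining, posts = extract(user_comments, "post_id", ["post_date", "title"])
```

Here `posts` holds a single row for post 6, while `remaining` keeps only `user_id`, `com_id` and `post_id`, so the post's date and title are stored once instead of once per comment.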
user_comments_user(user_id)
user_comments_post(post_id, post_date, title)
comments_by_date_post(post_id)
comments_by_date_com(com_id, com_date, text)
comments_by_date_user(user_id, nickname)
posts(post_id, post_date, title)
comments(com_id, com_date, text)
users(user_id, nickname)
This is the original logical model!
Comment, User, Post
• NoSQL Database Design Optimization
• Understanding Existing NoSQL Designs
• Optimizing Big Data Applications
Apache Spark Model
▸ A series of lazy transformations is followed by actions, which force evaluation of all pending transformations
▸ Each transformation produces a resilient distributed dataset (RDD)
▸ Intermediate results can be cached in memory or on disk, optionally serialized
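The model above can be mimicked with a toy Python class. This is a sketch for intuition only and does not use Spark: `LazyDataset` and its methods are hypothetical names, and the class-level counter exists only to show when work actually happens.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records how to compute data, not the data."""

    computations = 0  # how many times a map function was actually evaluated

    def __init__(self, compute):
        self._compute = compute
        self._cached = None
        self._persist = False

    @classmethod
    def from_list(cls, items):
        return cls(lambda: list(items))

    def map(self, fn):
        """Transformation: returns a new dataset; nothing is evaluated yet."""
        def compute():
            LazyDataset.computations += 1
            return [fn(x) for x in self._materialize()]
        return LazyDataset(compute)

    def persist(self):
        """Mark the dataset so its first evaluation is kept for reuse."""
        self._persist = True
        return self

    def _materialize(self):
        if self._cached is not None:
            return self._cached
        data = self._compute()
        if self._persist:
            self._cached = data
        return data

    def sum(self):
        """Action: forces evaluation of the whole chain."""
        return sum(self._materialize())
```

Building `LazyDataset.from_list([1, 2, 3]).map(lambda x: x * x)` evaluates nothing; each call to `sum()` recomputes the map unless `persist()` was called first, in which case the map runs once and later actions reuse the stored result.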
Spark Caching Best Practices
Caching is very useful for applications that re-use an RDD multiple times. Caching all of the generated RDDs is not a good strategy… deciding which ones to cache may be challenging.
Source: https://unraveldata.com/to-cache-or-not-to-cache/
PageRank Example
var rankGraph = graph.outerJoinVertices(...).map(...)
var prevRankGraph = rankGraph  // declared here so the loop can reassign it
var iteration = 0
while (iteration < numIter) {
  rankGraph.persist()
  val rankUpdates = rankGraph.aggregateMessages(...)
  prevRankGraph = rankGraph
  rankGraph = rankGraph.outerJoinVertices(rankUpdates)
    .persist()
  rankGraph.edges.foreachPartition(...)  // action: forces evaluation
  prevRankGraph.unpersist()
  iteration += 1
}
rankGraph.vertices.values.sum()
Transformations
graph.outerJoinVertices(...).map(...)
rankGraph.aggregateMessages(...)
rankGraph.outerJoinVertices(rankUpdates)
Actions
rankGraph.edges.foreachPartition(...)
rankGraph.vertices.values.sum()
PageRank RDDs
Some RDDs are used more than once
Spark Model Caching
rankGraph.persist()
rankGraph = rankGraph.outerJoinVertices(rankUpdates).persist()
prevRankGraph.unpersist()
ReSpark
var rankGraph = graph.outerJoinVertices(...).map(...)
var iteration = 0
whileLoop (sc, iteration < numIter {
rankGraph.persist()
val rankUpdates = rankGraph.aggregateMessages(...)
prevRankGraph = rankGraph
rankGraph = rankGraph.outerJoinVertices(rankUpdates)
.persist()
rankGraph.edges.foreachPartition(...)
prevRankGraph.unpersist()
})
rankGraph.vertices.values.sum()
ReSpark
var rankGraph = graph.outerJoinVertices(...).map(...)
var iteration = 0
whileLoop (sc, iteration < numIter {
val rankUpdates = rankGraph.aggregateMessages(...)
prevRankGraph = rankGraph
rankGraph = rankGraph.outerJoinVertices(rankUpdates)
})
rankGraph.vertices.values.sum()
rankGraph: 0
ReSpark
var rankGraph = graph.outerJoinVertices(...).map(...)
var iteration = 0
whileLoop (sc, iteration < numIter {
val rankUpdates = rankGraph.aggregateMessages(...)
prevRankGraph = rankGraph
rankGraph = rankGraph.outerJoinVertices(rankUpdates)
})
rankGraph.vertices.values.sum()
rankGraph: 2 Persist!
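One way to read the counter on these slides is as a use count per dataset: anything consumed more than once is worth persisting. The Python sketch below illustrates that idea only; it is not ReSpark's implementation, and `persist_candidates`, the `uses` list and the threshold of 2 are assumptions made for this example.

```python
from collections import Counter


def persist_candidates(uses, threshold=2):
    """Return datasets consumed at least `threshold` times.

    `uses` is a list of (consumer, input dataset) pairs recorded while
    walking one loop iteration.
    """
    counts = Counter(dataset for _consumer, dataset in uses)
    return sorted(d for d, n in counts.items() if n >= threshold)


# One iteration of the PageRank loop above, written out as uses.
uses = [
    ("aggregateMessages", "rankGraph"),   # builds rankUpdates
    ("outerJoinVertices", "rankGraph"),   # builds the next rankGraph
    ("outerJoinVertices", "rankUpdates"),
]
```

With these uses, `persist_candidates(uses)` reports only `rankGraph`, which is read twice per iteration, while `rankUpdates` is read once and is left uncached.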
PageRank on ReSpark
Without any caching, many jobs take hours!
Questions?
xkcd.com
