SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Succinct: Fast Interactive Queries
Anurag Khandelwal

Interactive Queries at Scale
‣ Search: Tweets by @AMPLab about #Succinct
‣ Regular Expressions: Links to Berkeley or Stanford domains, .*(berkeley|stanford).edu
‣ Range Queries: All my Facebook posts between 2013 and 2016
‣ Graph Queries: Friends of my friends who like trekking
‣ Random Access, Aggregate Queries, Updates
Across compute platforms, query engines, and data stores

Today’s focus: two main issues
‣ Performance degradation when data size > memory
[Figure: throughput (ops, 0-2000) vs. input size (1GB to 128GB)]
‣ Handling skewed query workloads
[Figure: throughput (ops, 0-2000) vs. input size, showing the maximum sustainable throughput]

Our Solution
Succinct [NSDI’15] and BlowFish [NSDI’16]
[Figure: Succinct (with Encryption) underlying GraphStore, KVStore, ColumnarStore, RowStore, and Unstructured Data]
‣ Compressed representation → more queries in faster storage
‣ Rich functionality directly on the compressed representation: search, RegEx, range queries
‣ Flexible support for different data models
‣ Handles skewed & time-varying workloads

Existing Techniques for SEARCH( )
‣ Data scans (e.g., Apache Spark): low storage, high latency
‣ Indexes (e.g., Solr): high storage, low latency
[Figure: an index maps each search term to its offsets, e.g. {0, 10, 14, 16, 19, 26, 29}, {1, 4, 5, 8, 20, 22, 24}, {2, 15, 17, 27}, {3, 6, 7, 9, 12, 13, 18, 23, ...}, {11, 21}]
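To make the scan/index tradeoff concrete, here is a toy Scala sketch (not how Apache Spark or Solr are implemented). A data scan answers SEARCH by walking the whole input on every query; an inverted index precomputes term → offsets and answers with one lookup, at the cost of storing every offset. Indexing single characters is an assumption chosen to mirror the per-symbol offset lists above.

```scala
object ScanVsIndex {
  // Data scan: no extra storage, but O(n) work on every query.
  def scanSearch(data: String, pattern: String): List[Int] =
    (0 to data.length - pattern.length).filter(i => data.startsWith(pattern, i)).toList

  // Inverted index over single characters: one stored offset per input position,
  // but answering a query is a single map lookup.
  def buildIndex(data: String): Map[Char, List[Int]] =
    data.zipWithIndex.groupBy(_._1).map { case (c, occ) => c -> occ.map(_._2).toList }

  def main(args: Array[String]): Unit = {
    val data = "abracadabra"
    println(scanSearch(data, "a"))                 // List(0, 3, 5, 7, 10)
    println(buildIndex(data).getOrElse('a', Nil))  // List(0, 3, 5, 7, 10)
  }
}
```

Both calls return the same offsets; the difference is where the work and the storage go.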

Succinct
Queries executed directly on the compressed representation: low storage, low latency.
What makes Succinct unique:
‣ No additional indexes: query responses are embedded in the compressed representation
‣ No data scans: provides the functionality of indexes
‣ No decompression: queries run directly on the compressed representation (except for data access queries)
Succinct provides:
‣ Scale: in-memory queries on data sizes >= memory capacity
‣ Complex queries: search, range, random access, RegEx
‣ Interactivity: avoids data scans and decompression

Succinct Data Representation
Builds on a large body of theory work.

Suffix Arrays
‣ Strong functionality (search)
‣ No structure that compression can exploit
Compression?
‣ Sample the suffix array
‣ Store a set of pointers to compute unsampled values on the fly; these pointers possess structure that enables compression!

Succinct Data Model
‣ Unstructured data
‣ Key-value stores (Voldemort, Dynamo)
‣ Document stores (Elasticsearch, MongoDB)
‣ Tables (Cassandra, BigTable)
‣ And many more...
Unified interface, with all the powerful queries on values, documents, and columns

Data Model & Functionality
For unstructured data (original input vs. Succinct representation):
‣ Search: returns offsets of arbitrary strings in the uncompressed file, e.g., SEARCH( ) = {0, 10, 14, 16, 19, 26, 29}
‣ Extract: returns data at arbitrary offsets in the uncompressed file, e.g., Extract(0, 5) = { , , , , }
‣ Count: returns the number of occurrences of arbitrary strings in the uncompressed file, e.g., COUNT( ) = 7
‣ Append: appends arbitrary strings to the uncompressed file, e.g., Append( , , , , )
‣ Also: range queries, regular expressions
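The four operations have simple reference semantics on the uncompressed file. The sketch below spells them out on a plain in-memory byte buffer. It is only an illustrative specification: the `FlatFile` class and its method names are hypothetical, and Succinct answers the same queries on its compressed representation without decompressing.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical reference model of the unstructured-data API over raw bytes.
class FlatFile(initial: Array[Byte]) {
  private val buf = ArrayBuffer.from(initial)

  // Search: offsets of every occurrence of `query` in the uncompressed file.
  def search(query: Array[Byte]): List[Int] =
    (0 to buf.length - query.length).filter { i =>
      query.indices.forall(j => buf(i + j) == query(j))
    }.toList

  // Count: number of occurrences of `query`.
  def count(query: Array[Byte]): Int = search(query).length

  // Extract: `len` bytes starting at `offset` (clamped at the end of the file).
  def extract(offset: Int, len: Int): Array[Byte] =
    buf.slice(offset, offset + len).toArray

  // Append: add bytes at the end of the file.
  def append(data: Array[Byte]): Unit = buf ++= data
}
```

For example, on "banana", searching "an" returns offsets 1 and 3, and appending "na" raises the count of "na" from 2 to 3.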

Unifying the Data Models
SEARCH(Column1, ) → SEARCH( ): a search restricted to one column maps onto a plain SEARCH over the unified representation.
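One way to realize this mapping is to flatten each record into a single string, wrapping every attribute value in a delimiter unique to its column. A column-restricted search then becomes a plain substring search for delimiter + value + delimiter. The sketch assumes that encoding: the delimiter characters and helper names are hypothetical, and it shows why one SEARCH primitive can serve tables, not Succinct's exact layout.

```scala
object FlattenColumns {
  // Hypothetical per-column delimiters; assumed never to occur inside attribute values.
  val delims: Vector[Char] = Vector('\u0001', '\u0002', '\u0003')

  // Flatten rows into one string: each value is wrapped in its column's delimiter.
  def flatten(rows: Seq[Seq[String]]): String =
    rows.map { row =>
      row.zipWithIndex.map { case (v, c) => s"${delims(c)}$v${delims(c)}" }.mkString
    }.mkString("\n")

  // The single primitive: offsets of every occurrence of `pattern` in the flat text.
  def search(text: String, pattern: String): List[Int] =
    (0 to text.length - pattern.length).filter(i => text.startsWith(pattern, i)).toList

  // SEARCH(column, value) rewritten as SEARCH(delimiter + value + delimiter).
  def searchColumn(text: String, column: Int, value: String): List[Int] =
    search(text, s"${delims(column)}$value${delims(column)}")
}
```

A plain search for "Berkeley" matches the value in any column, while `searchColumn` only matches it in the requested one.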

Succinct Architecture
Multi-store architecture:
‣ SuccinctStore
‣ SuffixStore
‣ LogStore (receives data APPENDS)
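As a rough illustration of why a multi-store design keeps appends cheap, the sketch models three stores holding consecutive chunks of one logical file. The placement (oldest data in SuccinctStore, newer data in SuffixStore, fresh appends in LogStore) is an assumption for illustration. A query fans out to every store and shifts each store's local offsets by the chunk's start position. The `MultiStore` class is hypothetical, and the naive scan inside each store stands in for each store's real query path.

```scala
// A hypothetical multi-store layout for one logical file.
class MultiStore(succinctPart: String, suffixPart: String) {
  private val log = new StringBuilder

  // Chunks in file order: oldest data first, freshest appends last.
  private def chunks: List[String] = List(succinctPart, suffixPart, log.toString)

  // Appends touch only the LogStore; the older stores are never rewritten here.
  def append(s: String): Unit = log.append(s)

  // Naive local search; stands in for each store's own query path.
  private def localSearch(chunk: String, p: String): List[Int] =
    (0 to chunk.length - p.length).filter(i => chunk.startsWith(p, i)).toList

  // Fan out to every store and shift local offsets by the chunk's start position.
  // Matches that straddle a chunk boundary are ignored in this sketch.
  def search(p: String): List[Int] = {
    val cs = chunks
    val starts = cs.scanLeft(0)(_ + _.length)
    cs.zip(starts).flatMap { case (c, base) => localSearch(c, p).map(_ + base) }
  }
}
```

After `append("banana")`, new matches show up in query results immediately, without touching the older stores.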

Queries on Compressed RDDs
New functionalities:
‣ Document store, key-value store: search on documents, values
‣ Faster operations on RDDs: random access, filters avoid scans
‣ More in-memory: compressed RDDs, no decompression overheads

Unstructured data using SuccinctRDD
// Import classes
import edu.berkeley.cs.succinct._

// Load data & compress using Succinct
val rdd = ctx.textFile(…).map(_.getBytes)
val succinctRDD = rdd.succinct

// Find all occurrences of "Berkeley"
val offsets = succinctRDD.search("Berkeley")

// Count #occurrences of "Berkeley"
val count = succinctRDD.count("Berkeley")

// Extract 100 bytes from offset 50
val bytes = succinctRDD.extract(50, 100)

Key-Value Store using SuccinctKVRDD
// Import classes
import edu.berkeley.cs.succinct.kv._

// Load data & compress using Succinct (key = line index, value = line bytes).
// Here rdd holds the input lines as Strings (e.g., ctx.textFile(…)), not the byte arrays above.
val kvRDD = rdd.zipWithIndex.map(t => (t._2, t._1.getBytes))
val succinctKVRDD = kvRDD.succinctKV

// Get value for key 0
val value = succinctKVRDD.get(0)

// Find all keys for values that contain "Berkeley"
val keys = succinctKVRDD.search("Berkeley")

Evaluation
‣ Dataset: Wikipedia dataset, ~40GB data
‣ Cluster: Amazon EC2, 5 machines, 30GB RAM each
‣ Workload: search queries, 1-10,000 occurrences
‣ Systems: Spark, Elasticsearch
‣ Caveats: absolute numbers are dataset dependent

Evaluation: Search
Takeaway: Succinct on Apache Spark is 2.5x faster than Elasticsearch while being 2.5x more space efficient.
(Data fits in memory for all systems)

Support for Regular Expressions
‣ Applications: data cleaning, information extraction, bioinformatics, document stores
‣ Operators: Union, Concat, Wildcard, Repeat
‣ Example: .*(berkeley|stanford).edu

SuccinctRDD:
// Find all matches for the RegEx ".*(berkeley|stanford).edu"
val matches = succinctRDD.regexSearch(".*(berkeley|stanford).edu")

SuccinctKVRDD:
// Find all keys for values that contain the RegEx
val matchKeys = succinctKVRDD.regexSearch(".*(berkeley|stanford).edu")

Evaluation: RegEx
Takeaway: Succinct significantly speeds up RegEx queries even when all the data fits in memory for all systems.

Succinct on Apache Spark
Already in use at Elsevier Labs
‣ Use case: annotation search
Documents are paired with annotations (id, type, (start, end)):
1, sentence, (0, 15)
2, word, (0, 4)
3, word, (5, 10)
4, word, (11, 15)
Query: “Find sentences that talk about open problems in research”
RegEx + annotation: (remains|is|still) (unknown|unclear|uncertain) within <sentence>
https://spark-packages.org/package/amplab/succinct
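The "within <sentence>" constraint can be read as: keep only the regex matches whose span lies inside some annotation of the given type. The sketch shows that filtering step on a plain string with `scala.util.matching.Regex`. The `Annotation` class, its half-open `[start, end)` spans, and the `matchesWithin` helper are hypothetical and do not reflect the package's actual annotation API.

```scala
import scala.util.matching.Regex

object AnnotationSearch {
  // Hypothetical annotation record: id, type, and a half-open span [start, end).
  final case class Annotation(id: Int, kind: String, start: Int, end: Int)

  // (start, end) of every regex match lying entirely inside an annotation of `kind`.
  def matchesWithin(doc: String, re: Regex, kind: String,
                    anns: Seq[Annotation]): List[(Int, Int)] = {
    val spans = anns.filter(_.kind == kind)
    re.findAllMatchIn(doc)
      .map(m => (m.start, m.end))
      .filter { case (s, e) => spans.exists(a => a.start <= s && e <= a.end) }
      .toList
  }
}
```

With the document "The cause remains unknown. It is clear." and two sentence annotations, only "remains unknown" (offsets 10 to 25) qualifies.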

Problem: Skewed Query Workloads
Load distribution across partitions is often non-uniform.
‣ Succinct: larger fraction of queries in main memory
‣ Challenge: skewed load across shards?
‣ Challenge: time-varying loads?
‣ E.g.: Memcached + MySQL deployment @ Facebook

Selective Replication
Traditional approach: #Replicas ∝ Load
‣ Coarse grained: 1-2× throughput → 2× storage (any increase requires a whole extra replica)
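The coarse granularity is easy to see with a little arithmetic. Assume each full copy of a partition sustains a fixed throughput. Selective replication must then round the required number of copies up to a whole number, so storage grows in whole-partition steps even for a small extra load. The sketch contrasts that with an idealized fractional allocation. The per-copy throughput and the linear storage/throughput model are simplifying assumptions, not measurements from the talk.

```scala
object ReplicationCost {
  // Whole replicas needed when one copy sustains `perCopyOps` operations/second.
  def replicas(loadOps: Double, perCopyOps: Double): Int =
    math.ceil(loadOps / perCopyOps).toInt

  // Storage (in units of one partition) under selective replication.
  def selectiveStorage(loadOps: Double, perCopyOps: Double): Double =
    replicas(loadOps, perCopyOps).toDouble

  // Idealized fractional allocation: storage scales exactly with load.
  // Assumes throughput grows linearly with storage, which real systems only approximate.
  def fractionalStorage(loadOps: Double, perCopyOps: Double): Double =
    math.max(1.0, loadOps / perCopyOps)
}
```

A load 20% above one copy's capacity costs a full 2× storage with replication, versus about 1.2× under the fractional model.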

Succinct + BlowFish
[Figure: storage vs. throughput for scans, indexes, and Succinct]
Succinct: a storage-performance tradeoff curve for each partition

BlowFish: Layered Sampled Array
Recap: Succinct stores a sampled suffix array; unsampled values are computed on the fly.
Original sampled array (Rate = 2): 9 15 3 0 12 8 14 5
Layers:
‣ RATE = 8: 9 12
‣ RATE = 4: 3 14
‣ RATE = 2: 15 0 8 5
Different combinations of layers → different points on the tradeoff curve
Layer additions and deletions → move along the tradeoff curve
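The layer layout follows a simple rule: an original index i belongs to the coarsest layer whose rate divides it. The Scala sketch reproduces the example above (original indices 0, 2, ..., 14 sampled at rate 2). It looks values up through whichever layers are present and shows how dropping a layer shrinks storage while turning some lookups into on-the-fly computations. The class and method names are hypothetical, and the sketch does not model how BlowFish lays layers out in memory.

```scala
import scala.collection.mutable

// Hypothetical layered sampled array: `values` maps original index -> sampled value.
class LayeredSampledArray(values: Map[Int, Long], rates: List[Int]) {
  // Rates from coarsest to finest, e.g. List(8, 4, 2).
  private val sorted = rates.sorted.reverse

  // Coarsest layer whose rate divides i (None if i is not sampled at all).
  def layerOf(i: Int): Option[Int] = sorted.find(r => i % r == 0)

  // Each layer holds only the indices not already covered by a coarser layer.
  private val layers: mutable.Map[Int, Map[Int, Long]] = mutable.Map.from(
    sorted.map(r => r -> values.filter { case (i, _) => layerOf(i).contains(r) }))

  def dropLayer(rate: Int): Unit = layers.remove(rate)

  def storedValues: Int = layers.values.map(_.size).sum

  // Some(v) if stored in a present layer; None means the value must be computed
  // on the fly (e.g., by walking the NPA, not modeled here).
  def lookup(i: Int): Option[Long] =
    layerOf(i).flatMap(layers.get).flatMap(_.get(i))
}
```

Dropping the RATE = 2 layer halves the stored values (8 → 4); lookups at indices such as 6 then fall back to on-the-fly computation.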

BlowFish: Technical Details
‣ How should partitions share cache on a server?
[Figure: request queue with low and high thresholds]
‣ How should partitions share cache across servers?
‣ How should requests be scheduled across replicas?
Unified solution: back-pressure style scheduling
‣ Cache proportional to load, without explicit coordination
‣ 1.5x higher throughput than selective replication, within 11% of maximum possible throughput
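Here is a toy model of the back-pressure idea under assumptions that are ours, not BlowFish's exact policy. Each replica has a request queue, and new requests go to the replica with the shortest queue. A replica whose queue exceeds a high threshold adds a sampled-array layer (spending cache to gain throughput), and one below a low threshold drops a layer. Thresholds, service rates, and the layer-to-throughput mapping are made up for illustration. Each replica decides using only its own queue, so no explicit coordination is needed.

```scala
object BackPressure {
  // Hypothetical replica: current queue length and number of cached LSA layers.
  final class Replica(val name: String) {
    var queue: Int = 0
    var layers: Int = 1
    // Assumed: each extra layer lets the replica serve more requests per tick.
    def serviceRate: Int = 10 * layers
  }

  val Low = 20
  val High = 100
  val MaxLayers = 4

  // Route each request to the replica with the shortest queue.
  def route(replicas: Seq[Replica], requests: Int): Unit =
    for (_ <- 1 to requests) {
      val target = replicas.minBy(_.queue)
      target.queue += 1
    }

  // One tick: serve requests, then grow or shrink storage based on queue length.
  def tick(replicas: Seq[Replica]): Unit = replicas.foreach { r =>
    r.queue = math.max(0, r.queue - r.serviceRate)
    if (r.queue > High && r.layers < MaxLayers) r.layers += 1
    else if (r.queue < Low && r.layers > 1) r.layers -= 1
  }
}
```

Under a burst, both replicas grow to the maximum number of layers, drain their queues, and then release the extra cache once the queues stay below the low threshold.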

Succinct + BlowFish
‣ Standalone system (prototyped & tested)
‣ Spark Package: Succinct on Apache Spark
‣ As libraries (C++, Java, Scala) for ease of integration

Thanks!
succinct.cs.berkeley.edu

Array of Suffixes (AoS)
Input: banana$
Suffixes: banana$, anana$, nana$, ana$, na$, a$, $
Array of Suffixes (AoS), in lexicographical order: $, a$, ana$, anana$, banana$, na$, nana$

AoS to Input (AoS2Input) Array
Row  AoS      AoS2Input
0    $        6
1    a$       5
2    ana$     3
3    anana$   1
4    banana$  0
5    na$      4
6    nana$    2
Input: b a n a n a $ at offsets 0 1 2 3 4 5 6
AoS2Input stores the locations of the suffixes in the input (the suffix array).

Example: search(“an”)
The suffixes starting with "an" form a contiguous block of the AoS (rows 2-3: ana$, anana$), found by binary search; their AoS2Input entries give the answer:
search(“an”) = {1, 3}
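The two tables can be reproduced in a few lines of Scala. The sketch sorts suffix start positions lexicographically to obtain AoS2Input (the suffix array). It then answers search by binary-searching for the contiguous block of suffixes that start with the pattern. The naive O(n² log n) construction is chosen for clarity and is not how Succinct builds its structures.

```scala
object SuffixArraySearch {
  // AoS2Input: start offsets of all suffixes, in lexicographic order of the suffixes.
  def build(text: String): Array[Int] =
    text.indices.toArray.sortBy(i => text.substring(i))

  // Lowest position in [0, sa.length) whose suffix satisfies `pred` (pred must be monotone).
  private def lowerBound(sa: Array[Int], pred: Int => Boolean): Int = {
    var lo = 0
    var hi = sa.length
    while (lo < hi) {
      val mid = (lo + hi) >>> 1
      if (pred(sa(mid))) hi = mid else lo = mid + 1
    }
    lo
  }

  // Offsets of all occurrences of `p`, via two binary searches over the sorted suffixes.
  def search(text: String, sa: Array[Int], p: String): List[Int] = {
    val prefix = (i: Int) => text.substring(i, math.min(text.length, i + p.length))
    val start = lowerBound(sa, i => prefix(i).compareTo(p) >= 0)
    val end   = lowerBound(sa, i => prefix(i).compareTo(p) > 0)
    sa.slice(start, end).toList.sorted
  }
}
```

For "banana$", `build` yields 6, 5, 3, 1, 0, 4, 2, and searching "an" selects rows 2-3, i.e. offsets 1 and 3.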

Next Pointer Array (NPA): Reducing AoS Size
Row  AoS      NPA
0    $        4
1    a$       0
2    ana$     5
3    anana$   6
4    banana$  3
5    na$      1
6    nana$    2
NPA[i] is the AoS row of the suffix obtained by dropping the first character of AoS[i] (for $, it wraps around to banana$).
Store only the first character of each suffix ($, a, a, a, b, n, n); the entire suffix can be computed “on the fly” using the NPA. E.g., starting at row 2: a, then NPA[2] = 5 gives n, NPA[5] = 1 gives a, NPA[1] = 0 gives $, building up a, an, ana, ana$.
Since the first characters are sorted, each distinct character ($, a, b, n) need only be stored once, along with the rows it spans.
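The NPA and the "first character only" trick can be checked directly. The sketch derives NPA from the suffix array: NPA(i) is the row of the suffix starting one position later, wrapping from the final `$` back to the whole input. It then rebuilds any suffix from just the first-character column and NPA, mirroring the a, an, ana, ana$ steps shown for row 2. It assumes the input ends with a unique `$` terminator, and the naive suffix-array construction is for illustration only.

```scala
object NextPointerArray {
  // Suffix array (AoS2Input), built naively for illustration.
  def suffixArray(text: String): Array[Int] =
    text.indices.toArray.sortBy(i => text.substring(i))

  // NPA(i) = AoS row holding the suffix that starts one position after row i's suffix
  // (the suffix "$" wraps around to the suffix starting at offset 0).
  def npa(sa: Array[Int]): Array[Int] = {
    val n = sa.length
    val rowOf = new Array[Int](n) // inverse suffix array: offset -> row
    for (row <- 0 until n) rowOf(sa(row)) = row
    sa.map(off => rowOf((off + 1) % n))
  }

  // Rebuild the suffix in `row` using only the first-character column and NPA.
  def suffixAt(firstChars: Array[Char], npa: Array[Int], row: Int): String = {
    val sb = new StringBuilder
    var r = row
    while (sb.isEmpty || sb.last != '$') { // stop after emitting the terminator
      sb += firstChars(r)
      r = npa(r)
    }
    sb.toString
  }
}
```

For "banana$" this yields NPA = 4, 0, 5, 6, 3, 1, 2 and first characters "$aaabnn"; row 4 reconstructs to "banana$".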

Reducing the size of AoS2Input
[Figure: AoS2Input (6, 5, 3, 1, 0, 4, 2) alongside NPA (4, 0, 5, 6, 3, 1, 2)]
Store only a few sampled values (unsampled values computed “on the fly” using NPA): each NPA hop moves to the suffix starting one position later, so if following NPA k times from row i reaches a sampled row j, then AoS2Input[i] = AoS2Input[j] - k.
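The recovery rule can be written down as code. The sketch assumes AoS2Input is sampled by value, keeping only offsets divisible by a rate (one common choice, and an assumption here). It also always keeps the last offset so that NPA hops never wrap past the end of the input. An unsampled entry is recovered by hopping along NPA until a sampled row is reached and subtracting the number of hops.

```scala
object SampledSuffixArray {
  // Keep (row -> value) for values divisible by `rate`, plus the last offset
  // so that hops never wrap around past the end of the input.
  def sample(sa: Array[Int], rate: Int): Map[Int, Int] =
    sa.indices
      .filter(row => sa(row) % rate == 0 || sa(row) == sa.length - 1)
      .map(row => row -> sa(row))
      .toMap

  // Recover AoS2Input(row): hop along NPA until a sampled row, counting hops.
  def lookup(samples: Map[Int, Int], npa: Array[Int], row: Int): Int = {
    var r = row
    var hops = 0
    while (!samples.contains(r)) {
      r = npa(r)
      hops += 1
    }
    samples(r) - hops
  }
}
```

With rate 2 on "banana$", only 4 of 7 values are stored, yet every AoS2Input entry is recoverable (e.g., row 1 hops once to row 0, giving 6 - 1 = 5).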

Compressing NPA
‣ NPA values for suffixes starting with the same character form an increasing sequence of integers ($: 4 | a: 0, 5, 6 | b: 3 | n: 1, 2)
‣ Can be compressed (e.g., using run-length encoding)
Succinct uses a 2-dimensional representation of NPA:
- better compressibility
- avoids binary search on AoS (lower latency)
- enables a wider range of queries (e.g., RegEx)
See upcoming NSDI paper!
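To see why the per-character runs compress well, the sketch stores each increasing run as its first value plus the gaps between consecutive values. It then run-length encodes those gaps, which collapse well when consecutive values are close together. This gap-plus-RLE scheme is a generic illustration of "increasing sequences compress well"; it is not Succinct's 2-dimensional NPA representation.

```scala
object NpaCompression {
  // One compressed run: first value, then (gap, repeat count) pairs.
  final case class Run(first: Int, gaps: List[(Int, Int)])

  // Run-length encode a list of gaps: 1, 1, 1, 4 -> (1, 3), (4, 1).
  private def rle(xs: List[Int]): List[(Int, Int)] = xs match {
    case Nil => Nil
    case h :: _ =>
      val (same, rest) = xs.span(_ == h)
      (h, same.length) :: rle(rest)
  }

  def encodeRun(values: List[Int]): Run = {
    require(values.nonEmpty && values.zip(values.tail).forall { case (a, b) => a < b },
      "NPA runs are strictly increasing")
    Run(values.head, rle(values.zip(values.tail).map { case (a, b) => b - a }))
  }

  def decodeRun(run: Run): List[Int] =
    run.gaps.flatMap { case (g, n) => List.fill(n)(g) }.scanLeft(run.first)(_ + _)
}
```

A run such as 10, 11, 12, 13, 17 shrinks to a start value and two (gap, count) pairs, and decoding restores it exactly.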

Evaluation: Storage Footprint
10 node 150GB cluster
[Figure 12: Data size that fits in memory (GB) for the SmallKV and LargeKV datasets, for MongoDB, Cassandra, HyperDex, and Succinct, with available RAM marked. Caption: "Succinct pushes more than 10× larger amount of data in memory compared to the next best system, while providing similar or stronger functionality."]
Takeaway: Succinct can push >11x more data in memory

Evaluation: Throughput (95% GET + 5% PUT)
10 node 150GB cluster, uniform random access pattern
Takeaway: Succinct achieves performance comparable to existing open source systems for queries on primary attributes

Evaluation: Throughput (95% SEARCH + 5% PUT)
10 node 150GB cluster, search queries with 1-10K occurrences
Takeaway: By pushing more data into faster storage, Succinct provides performance similar to existing systems for 10-11x larger data sizes.

Evaluation: RegEx Latency
40GB Wikipedia dataset, 5 commonly used RegEx queries
Single EC2 node, 32 vCPUs, 244GB RAM
Takeaway: Succinct significantly speeds up RegEx queries even when all the data fits in memory for all systems.

Support for JSON
// Search for JSON documents containing “AMPLab”
val ids1 = succinctJsonRDD.search("AMPLab")

// Filter JSON documents where the “city” attribute has value “Berkeley”
val ids2 = succinctJsonRDD.filter("city", "Berkeley")

// Get JSON document with id 0
val jsonDoc = succinctJsonRDD.get(0)

Layer Additions & Deletions
Layers: RATE = 8: 9 12 | RATE = 4: 3 14 | RATE = 2: 15 0 8 5
‣ Layer deletions: simple (drop the layer, e.g., RATE = 2)
‣ Layer addition: unsampled values are already computed during query execution, so they can be stored into the new layer as queries touch them
Layers in LSA are populated opportunistically!
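Opportunistic population can be sketched as a cache fill. While answering a query, unsampled values are computed anyway (by walking NPA), so if the index belongs to the layer being rebuilt, the computed value is simply stored into it. The sketch models one layer under construction. The class name, the `computeOnTheFly` callback, and the completeness check are hypothetical.

```scala
import scala.collection.mutable

// Hypothetical layer of a layered sampled array that is filled lazily.
final class LazyLayer(val rate: Int, coarserRate: Int, originalLength: Int) {
  // Indices this layer owns: multiples of `rate` not covered by the coarser layer.
  val owned: Set[Int] =
    (0 until originalLength by rate).filter(_ % coarserRate != 0).toSet

  private val filled = mutable.Map.empty[Int, Long]

  def isComplete: Boolean = filled.size == owned.size

  // Answer a lookup; a value computed on the fly is kept if this layer owns i.
  def lookup(i: Int, computeOnTheFly: Int => Long): Long =
    filled.getOrElse(i, {
      val v = computeOnTheFly(i)
      if (owned.contains(i)) filled(i) = v
      v
    })
}
```

Rebuilding the RATE = 2 layer from the example (owned indices 2, 6, 10, 14) needs no dedicated pass: repeated lookups hit the stored value, and the layer completes once queries have touched all four indices.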

Spatial Skew
Load distribution across partitions is heavily skewed.
[Figure: load per object, with storage ranging from 1 (compressed) to 10 (uncompressed)]
‣ Selective replication: #Replicas ∝ Load → wasted cache!
‣ BlowFish: fractionally change storage just enough to meet the load; within 10% of optimal

Changes in Spatial Skew
Study on Facebook Warehouse Cluster [HotStorage’13]
‣ Transient failures → 90% of failures
‣ Replica creation delayed by 15 mins
‣ Leads to variation in load over time
[Figure: Replica#1, Replica#2, and Replica#3 of the data partitions, each with a request queue; panels plot load and throughput (operations/second, 0-3000) and request queue size (0K-50K) over time (0-120 mins)]
Adapts to 3x higher load in < 5 mins
