Nara Institute of Science and Technology (NAIST): Infinite possibilities, this is the cutting edge -Outgrow your limits-
Master Thesis Presentation
Internet Architecture and Systems Laboratory
Reducing Tail Latency In Cassandra Cluster Using Regression
Based Replica Selection Algorithm
Chauque Euclides
Outline
1. Background
2. Tail Latency
3. Replica Selection
4. Proposed Approach
4.1. Linear Regression Based Replica Selection
4.2. Predicting Query Execution Time
4.3. Training Data Generation
4.4. Model Training
4.5. Experimental Results
4.6. Comparison with Heron
5. Summary
6. Future work
1. Background
• For business-oriented applications, fast and predictable response times are critical for a good user experience.
• A study conducted by Amazon and Google [1], in which a controlled delay was added to every query before results were sent back to the user, found that:
  - An extra delay of 500 ms per query resulted in a 1.2% loss of revenue.
  - The probability that a user bounces from a website increases the longer the website takes to load.
[1] https://www.gigaspaces.com/blog/amazon-found-every-100ms-of-latency-cost-them-1-in-sales/
2. Tail Latency
• It is challenging to deliver consistently fast response times because applications are generally multi-tiered: serving a single end-user request may involve contacting multiple servers.
• Latency variability can be attributed to server performance variability, which is caused by queuing, shared resources, and background daemons.
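The effect of contacting multiple servers can be made concrete with a small calculation. This is a minimal sketch that assumes each server is slow independently with the same probability (1% here, i.e. its own p99); the function name and the numbers are illustrative and are not taken from the thesis.

```python
def prob_slow_request(fan_out: int, per_server_slow_prob: float = 0.01) -> float:
    """Probability that at least one of `fan_out` servers is slow.

    Assumes servers are slow independently of each other, which is a
    simplification of real clusters.
    """
    return 1.0 - (1.0 - per_server_slow_prob) ** fan_out


if __name__ == "__main__":
    # A request that waits for all contacted servers is slow as soon as
    # any single one of them is slow.
    for servers in (1, 10, 100):
        print(servers, round(prob_slow_request(servers), 3))
```

Under these assumptions, a request that fans out to 100 servers is slow about 63% of the time even though each server is slow only 1% of the time.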
3. Replica Selection [1/3]
• Given the causes of tail latency, it is infeasible to eliminate all latency variability.
• However, several approaches have been developed to reduce its impact. They rely on standard techniques, including:
  - Giving preferential resource allocations or guarantees;
  - Reissuing requests;
  - Trading off completeness for latency.
3. Replica Selection [2/3]
• A recurring pattern for reducing tail latency is to take advantage of the redundancy built into each tier of the application architecture.
• Replica selection strategies can help reduce tail latency when the performance of the servers differs.
• A request can be directed to the presumably best replica, i.e. the one expected to serve the request with the smallest latency.
• Ideal replica selection properties:
  - Replica selection needs to adapt quickly to changing system dynamics.
  - It must avoid entering oscillating instabilities.
  - It should not be computationally costly, nor require significant coordination overhead.
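Directing a request to the replica with the smallest expected latency can be sketched as follows. This is a generic illustration, not the selection logic of Cassandra, C3, Heron, or the thesis: it assumes the client keeps an exponentially weighted moving average (EWMA) of observed latencies per replica, and the class name and the smoothing factor `alpha` are hypothetical.

```python
class EwmaReplicaSelector:
    """Pick the replica with the lowest smoothed latency estimate."""

    def __init__(self, replicas, alpha=0.2):
        self.alpha = alpha  # weight given to the newest observation
        self.estimate = {replica: None for replica in replicas}

    def record(self, replica, latency_ms):
        """Fold a newly observed latency into the replica's estimate."""
        previous = self.estimate[replica]
        if previous is None:
            self.estimate[replica] = latency_ms
        else:
            self.estimate[replica] = (
                self.alpha * latency_ms + (1 - self.alpha) * previous
            )

    def choose(self):
        """Return an unmeasured replica if any, else the lowest estimate."""
        return min(
            self.estimate,
            key=lambda r: (self.estimate[r] is not None, self.estimate[r] or 0.0),
        )
```

A small `alpha` reacts slowly but damps oscillation; a large `alpha` adapts quickly but can make all clients switch replicas at once, which is the trade-off between the first two properties listed above.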
3. Replica Selection [3/3]
• Jaiman et al., "Heron: Taming Tail Latencies in Key-Value Stores under Heterogeneous Workloads", 37th SRDS, IEEE, 2018.
  - Takes into consideration the size of the values associated with keys.
  - Uses Bloom filters to keep track of keys associated with large values.
  - Whenever a replica is processing a request for a large value, it is marked as busy.
  - As the amount of data in the datastore increases, the Bloom filter cannot be expanded without losing the previous mappings.
• Suresh et al., "C3: Cutting Tail Latency in Cloud Data Stores via Adaptive Replica Selection", NSDI '15, USENIX, 2015.
  - Consists of a replica ranking algorithm and a rate control and backpressure algorithm.
  - Ranks the servers, taking into account the server-side queue and the service time.
  - An incoming request is sent to the server with the minimum expected service time.
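The Bloom filter limitation noted for Heron follows from how the structure stores membership. The sketch below is a generic Bloom filter, not Heron's implementation; the class name, the sizes, and the use of SHA-256 for hashing are assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Set-membership sketch: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive `num_hashes` bit positions from the key. Each position
        # depends on `num_bits` through the modulo.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for position in self._positions(key):
            self.bits[position] = 1

    def __contains__(self, key):
        return all(self.bits[position] for position in self._positions(key))


if __name__ == "__main__":
    large_value_keys = BloomFilter()
    large_value_keys.add("order:42")
    print("order:42" in large_value_keys)  # True: added keys are always found
```

Only bit positions are stored, never the keys themselves, and each position is computed modulo `num_bits`. A filter with a different `num_bits` would map the same keys to different positions, so the old bits cannot be carried over without re-inserting every key.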
4. Proposed Approach
Linear Regression Based Replica Selection
4.1. Linear Regression Based Replica Selection
• Previous approaches do not support aggregation queries.
• They infer query duration from the size of the requested value rather than from actual estimates.
• This research explores a different approach: using a regression model to predict query duration.
• The focus is on reducing tail latency at the 99.9th percentile (p999) and above.
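One way predicted query durations could drive replica selection is sketched below. The slides do not give the exact selection rule, so everything here is an assumption: each query template has a linear model that predicts duration from the number of requests on a replica, and a request goes to the replica with the smallest predicted completion time. The names `LinearModel`, `expected_completion_ms`, and `select_replica`, as well as the coefficients, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearModel:
    """Hypothetical per-template model: duration = intercept + slope * load."""
    intercept_ms: float
    slope_ms_per_request: float

    def predict(self, load: int) -> float:
        return self.intercept_ms + self.slope_ms_per_request * load


def expected_completion_ms(new_template, queue, models):
    """Predicted time until a new query would finish on one replica."""
    load = len(queue) + 1  # queued requests plus the new one
    backlog = sum(models[template].predict(load) for template in queue)
    return backlog + models[new_template].predict(load)


def select_replica(new_template, queues, models):
    """Choose the replica with the smallest predicted completion time."""
    return min(
        queues,
        key=lambda r: expected_completion_ms(new_template, queues[r], models),
    )


if __name__ == "__main__":
    models = {
        "point": LinearModel(2.0, 0.5),        # cheap key lookup
        "aggregate": LinearModel(50.0, 10.0),  # expensive aggregation
    }
    queues = {"r1": ["aggregate"], "r2": ["point", "point", "point"]}
    print(select_replica("point", queues, models))
```

With these made-up coefficients the new query is sent to `r2`, which holds more queued requests but less predicted work (16 ms against 73 ms on `r1`). A rule that only counts queued requests would pick `r1`.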
4.2. Predicting Query Execution Time
4.3. Training Data Generation [1/2]
• For data collection, 3 tables from the TPC-H benchmark were loaded into a Cassandra cluster.
• 8 servers were used, with the replication factor set to 3.
• Subsequently, Locust was used to issue requests, using the chosen subset of TPC-H queries, to simulate user requests.
• The response time values at different percentiles were recorded for each request.
• The same process was repeated with different numbers of simulated users to simulate increased load.
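The percentile values recorded above can be computed from raw response times in several ways. The sketch below uses the nearest-rank definition; this is an assumption, since the slides do not say which definition was used, and the function name is hypothetical.

```python
import math
from fractions import Fraction


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    # Exact rational arithmetic avoids floating-point surprises such as
    # 99.9 / 100 * 1000 landing slightly above 999.
    rank = math.ceil(Fraction(str(p)) / 100 * len(ordered))
    return ordered[rank - 1]


if __name__ == "__main__":
    response_times_ms = list(range(1, 1001))  # synthetic values, not measurements
    for p in (50, 90, 99.9):
        print(p, percentile(response_times_ms, p))
```

Under this definition p999 needs at least 1,000 samples before it differs from the maximum, and p99999 needs at least 100,000, so very high percentiles are only meaningful for long runs.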
4.3. Training Data Generation [2/2]
• The queries show different response time behavior.
• Queries with longer response times show greater variation in response time as the load increases.
4.4. Model Training [1/2]
• To keep prediction overhead low, linear regression was chosen to fit the data, based on [1].
• A regression model was fit to the data of each query template.
• R squared was used to evaluate the regressors:
  - R squared is the percentage of the variation in the dependent variable that a linear model explains.
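Fitting a line and scoring it with R squared can be sketched in plain Python. The slides do not state which features were used, so the single feature here (number of simulated users) is an assumption, the data points are made up for illustration, and the function names are hypothetical. R squared is computed as 1 - SS_res / SS_tot.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys that the fitted line explains."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    users = [10, 20, 30, 40]              # illustrative load levels
    latency_ms = [30.0, 40.0, 70.0, 80.0]  # illustrative, not measured
    slope, intercept = fit_line(users, latency_ms)
    print(slope, intercept, round(r_squared(users, latency_ms, slope, intercept), 3))
```

Prediction with such a model is one multiplication and one addition per feature, which is why a linear model keeps the per-request overhead low.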
[1] https://scikit-learn.org/0.16/modules/computational_performance.html
4.4. Model Training [2/2]
4.5. Results: Homogeneous Servers [1/3]
• Homogeneous servers
  - The figures show the tail latency values at p999 and p99999 for each query.
  - Overall latency is improved.
4.5. Results: Homogeneous Servers [2/3]
• Comparison of p50, p90 and p999.
• The higher-percentile latency (p999) is improved; however, the 90th percentile (p90) is degraded.
4.5. Results: Homogeneous Servers [3/3]
• Throughput comparison
4.5. Results: Heterogeneous Servers [1/3]
• A delay of up to 2 seconds was introduced into the responses of 4 servers to simulate an environment in which servers have different processing capabilities.
• The figures show the tail latency values at p999 and p99999 for each query.
4.5. Results: Heterogeneous Servers [2/3]
• Comparison of p50, p90 and p999.
• The higher-percentile latencies (p999 and p99999) are improved; however, the p50 latency is degraded.
4.5. Results: Heterogeneous Servers [3/3]
• Throughput comparison for a cluster with heterogeneous servers
4.6. Comparison with Heron
• Colloquium B comment by Professor Keiichi Yasumoto: the relation of the proposed approach to previous work.
• p999 response time for all queries, and an aggregate p999 comparison between the proposed method and Heron.
5. Summary
• In the present work, the tail latency problem was reviewed, and server selection was considered as a method to reduce tail latency.
• Previous work was based on simpler queries and is therefore no longer suitable for the complex queries that Cassandra has come to support. This motivated exploring a new approach to server selection that uses a regression model to model the interaction between the queries.
• This new approach proved successful in reducing tail latency while preserving throughput; however, it negatively affected the lower percentiles.
6. Future Work
• A remaining point to explore is the use of more advanced machine learning models, to see whether the assumption of excessive overhead holds.
• Another is to experiment with a larger number of servers.
End
Models Computational Performance
• Prediction latency
  - scikit-learn benchmark of prediction latency for different models
• Prediction throughput
  - scikit-learn benchmark of prediction throughput for different models
https://scikit-learn.org/0.16/modules/computational_performance.html