Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017
Tuning Java Driver for
Apache Cassandra
November 2017
Nenad Bozic
@NenadBozicNs
nenad.bozic@smartcat.
SmartCat
www.smartcat.io
When people start with Apache Cassandra
When people call us for help
Agenda
• intro to Apache Cassandra
• tuning options in driver
• use cases
• takeaways and Q&A
Apache Cassandra
Cassandra Overview
• partitioned data with tunable consistency
• replication factor - how many replicas
• masterless architecture
• native multi-datacenter support
Architecture
• client contact
• client request (consistency level 1, replication factor 3)
• client request and response (consistency level 1, replication factor 3)
• one cluster spanning two data centers (DC1, DC2)
Data Modeling
• query based modeling
• data is denormalized
• data is duplicated
Use Cases
Good fit:
• when high availability is crucial, and eventual consistency is tolerable
• event sourcing
• logging continuous streams of data
• deep visitor analytics
Poor fit:
• early prototyping with significant query changes
• referential integrity required
• dynamic access patterns on data
Tuning options in driver
Drivers for Apache Cassandra
Load balancing
https://www.slideshare.net/planetcassandra/apache-cassandra-and-drivers
Data Center Aware Load Balancing
https://www.slideshare.net/planetcassandra/apache-cassandra-and-drivers
Token Aware Load Balancing
https://www.slideshare.net/planetcassandra/apache-cassandra-and-drivers
Latency Aware Load Balancing
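The effect of combining a data-center-aware policy with a token-aware one can be sketched without the driver. The class below is a hypothetical model (QueryPlanSketch, Node and plan are illustrative names, not driver classes): it puts local-DC nodes that own a replica of the partition first, then the remaining local-DC nodes, and leaves remote-DC nodes out. In the 3.x Java driver the same idea is usually expressed by wrapping a DCAwareRoundRobinPolicy in a TokenAwarePolicy; the real policies also rotate or shuffle hosts, which this sketch skips.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of query-plan ordering; these are not the driver's classes.
class QueryPlanSketch {

    static final class Node {
        final String name;
        final String dc;

        Node(String name, String dc) {
            this.name = name;
            this.dc = dc;
        }
    }

    /**
     * Returns coordinator candidates in the order they would be tried:
     * local-DC replicas of the partition first, then other local-DC nodes.
     * Nodes in remote data centers are not included.
     */
    static List<String> plan(List<Node> cluster, List<String> replicas, String localDc) {
        List<String> localReplicas = new ArrayList<>();
        List<String> otherLocal = new ArrayList<>();
        for (Node node : cluster) {
            if (!node.dc.equals(localDc)) {
                continue; // DC awareness: skip remote nodes
            }
            if (replicas.contains(node.name)) {
                localReplicas.add(node.name); // token awareness: node holds the data
            } else {
                otherLocal.add(node.name);
            }
        }
        localReplicas.addAll(otherLocal);
        return localReplicas;
    }
}
```

With nodes a, b, d in DC1 and c in DC2, and replicas on d and c, a client in DC1 would try d first and fall back to a and b.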
Pooling options
• the driver talks to each node through a pool of connections
• defaults changed between protocol V2 and V3 (core connections lowered to 1, since V3 allows far more concurrent requests per connection)
• allowing more requests per connection can put more load on the cluster
• monitor in-flight queries on the driver side and tune for your use case
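One way to watch and cap in-flight queries on the client side is a small counter around every asynchronous call. This is a minimal sketch using only the JDK; InFlightLimiter and its methods are hypothetical names, and the limit passed to the constructor is something to find by measurement, not a recommended value.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical client-side guard; not part of the Cassandra driver.
class InFlightLimiter {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxObserved = new AtomicInteger();

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Returns false right away, instead of queueing, when the limit is reached. */
    boolean tryStart() {
        if (!permits.tryAcquire()) {
            return false;
        }
        int now = inFlight.incrementAndGet();
        maxObserved.accumulateAndGet(now, Math::max); // high-water mark for monitoring
        return true;
    }

    /** Call exactly once per successful tryStart(), for example in the result callback. */
    void finish() {
        inFlight.decrementAndGet();
        permits.release();
    }

    int inFlight() {
        return inFlight.get();
    }

    int maxObserved() {
        return maxObserved.get();
    }
}
```

Exporting inFlight() and maxObserved() to a metrics system shows how close the application runs to the configured limit before any pooling option is changed.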
Speculative executions
• spawn additional queries to other nodes after configured time
http://docs.datastax.com/en/developer/java-driver/3.1/manual/speculative_execution/
Speculative executions
• constant speculative execution policy
• percentile speculative execution policy
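The constant variant can be illustrated with plain CompletableFuture code: send the query to the first node, and after every fixed delay send it to the next node unless an answer has already arrived. This is a sketch of the idea, not the driver's implementation; SpeculativeSketch, execute and the query function are hypothetical. It assumes the query is idempotent, because several nodes may end up executing it.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical illustration of constant-delay speculative execution.
class SpeculativeSketch {

    static <T> CompletableFuture<T> execute(List<String> nodes,
                                            Function<String, CompletableFuture<T>> query,
                                            long delayMillis,
                                            ScheduledExecutorService timer) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        CompletableFuture<T> result = new CompletableFuture<>();
        AtomicInteger remaining = new AtomicInteger(nodes.size());
        for (int i = 0; i < nodes.size(); i++) {
            String node = nodes.get(i);
            Runnable attempt = () -> {
                if (result.isDone()) {
                    return; // an earlier attempt already answered
                }
                query.apply(node).whenComplete((value, error) -> {
                    if (error == null) {
                        result.complete(value); // first successful response wins
                    } else if (remaining.decrementAndGet() == 0) {
                        result.completeExceptionally(error); // every node failed
                    }
                });
            };
            if (i == 0) {
                attempt.run(); // the normal execution starts immediately
            } else {
                // speculative executions start after 1x, 2x, ... the delay
                timer.schedule(attempt, i * delayMillis, TimeUnit.MILLISECONDS);
            }
        }
        return result;
    }
}
```

A percentile policy follows the same flow but derives the delay from measured latencies (for example, a high percentile of recent response times) instead of a fixed number. Each extra attempt is extra load, so the delay should be long enough that only genuinely slow requests trigger one.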
Timeouts
• driver read timeout vs server read timeout
• driver settings for all queries or per query settings
• setReadTimeoutMillis and setConnectTimeoutMillis (on SocketOptions)
Retry policies
• fail early and retry
• add retry policy or speculative execution
• downgrading retry policy if possibly inconsistent data is better than no data
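The decision a downgrading retry policy makes can be reduced to a small function: when the coordinator reports too few live replicas for the requested consistency level, retry once at the highest level the live replicas can still satisfy. The sketch below loosely follows that idea; DowngradeSketch, Level and onUnavailable are hypothetical names, and a real policy also has to handle read and write timeouts.

```java
// Hypothetical model of a downgrading retry decision; not the driver's policy class.
class DowngradeSketch {

    enum Level {
        ONE(1), TWO(2), THREE(3);

        final int replicas;

        Level(int replicas) {
            this.replicas = replicas;
        }
    }

    /** Highest level the live replicas can satisfy, or null when none is alive. */
    static Level maxLikelyToWork(int aliveReplicas) {
        if (aliveReplicas >= 3) {
            return Level.THREE;
        }
        if (aliveReplicas == 2) {
            return Level.TWO;
        }
        if (aliveReplicas == 1) {
            return Level.ONE;
        }
        return null;
    }

    /**
     * Decision after an "unavailable" error: the level to retry at,
     * or null to give up and surface the error to the caller.
     */
    static Level onUnavailable(int aliveReplicas, int retriesSoFar) {
        if (retriesSoFar != 0) {
            return null; // retry at most once
        }
        return maxLikelyToWork(aliveReplicas);
    }
}
```

The trade-off is the one named in the bullet: a downgraded read may return stale data, so this only fits use cases where some answer is worth more than a guaranteed-consistent one.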
Use cases
Click stream and IoT measurements
• visualize measurements from many devices
• fast access with tolerable inconsistencies
• DC aware and token aware policy to land on local node with data
• lower consistency level (ONE) or use downgrading retry policy
• use speculative executions to query more nodes if cluster can manage load
Mission critical data with tolerable performance
• stock data in warehouse used to compare with ERP system
• high consistency (replicas read + replicas written > replication factor)
• retry and reconnection policies are a must
• choose a lower requests-per-connection limit so the cluster is not overloaded
• set a lower read timeout to fail early and retry
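The high-consistency rule is simple arithmetic: a read is guaranteed to overlap the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. A minimal sketch, with hypothetical class and method names:

```java
// Hypothetical helper for reasoning about consistency levels.
class ConsistencySketch {

    /** Number of replicas that make up a quorum for a given replication factor. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read set must intersect every write set. */
    static boolean readsSeeLatestWrite(int replicasRead, int replicasWritten, int replicationFactor) {
        return replicasRead + replicasWritten > replicationFactor;
    }
}
```

With replication factor 3, quorum reads and quorum writes touch 2 + 2 = 4 > 3 replicas, so they overlap; reading and writing at ONE gives 1 + 1 = 2, which does not.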
Write heavy low latency read use case
• ad serving (store user analytics and serve ads fast)
• separate reads from writes so each can use different tuning options
• latency aware policy on reads to always choose the fastest-performing nodes
• lower the read timeout on both driver and server to fail early
• increase maximum requests per connection
Conclusion
Conclusion and takeaways
• know your use case and know your database
• each tuning option requires good monitoring
TEST, MEASURE, ADJUST (repeat)
Links
• SmartCat Blog post - Tuning Java driver for Apache Cassandra - part 1
• SmartCat Blog post - Tuning Java driver for Apache Cassandra - part 2
• Use case example - Tuning for heavy write and low latency read scenario
Q&A
Thank you
Nenad Bozic
@NenadBozicNs
SmartCat
www.smartcat.io
