Eyal Gutkind - Head of Solutions Architects
WEBINAR
Eyal Gutkind is head of solutions architects at Scylla. Prior to
Scylla, Eyal held product management roles at Mirantis and
DataStax. Prior to DataStax, Eyal spent 12 years with Mellanox
Technologies in various engineering management and product
marketing roles. Eyal holds a BSc degree in Electrical and
Computer Engineering from Ben Gurion University, Israel, and an
MBA from the Fuqua School of Business at Duke University, North
Carolina.
+ Next-generation NoSQL database
+ Drop-in replacement for Cassandra
+ 10X the performance & low tail latency
+ Open source and enterprise editions
+ Founded by the creators of the KVM hypervisor
+ HQs: Palo Alto, CA; Herzelia, Israel
Spark Cassandra Connector: https://github.com/datastax/spark-cassandra-connector
+ Exposes data stored in Scylla/Cassandra through the Spark Context
+ Batches writes
+ Maps Scylla/Cassandra partitions to Spark partitions on read
+ Manages connections between Scylla and the Spark driver and executors
+ Uses the Cassandra Java driver
Write settings (set as Spark properties under the spark.cassandra. prefix):
output.batch.grouping.buffer.size
output.batch.size.bytes
output.concurrent.writes
output.batch.grouping.key
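As a rough illustration of how these write settings interact, the sketch below collects them under the connector's spark.cassandra. prefix and estimates how many bytes of batches a single Spark task might have outstanding at once. The values are placeholders rather than recommendations, the helper names (write_conf, max_in_flight_bytes) are hypothetical, and the estimate assumes batches are bounded by size in bytes, not by row count.

```python
# Sketch only: values are illustrative placeholders, not tuning advice.
PREFIX = "spark.cassandra."

def write_conf(batch_size_bytes=1024, concurrent_writes=5,
               grouping_key="partition", grouping_buffer_size=1000):
    """Return the four write settings as fully qualified Spark properties."""
    return {
        PREFIX + "output.batch.size.bytes": str(batch_size_bytes),
        PREFIX + "output.concurrent.writes": str(concurrent_writes),
        # Assumed value; check the connector reference for accepted keys.
        PREFIX + "output.batch.grouping.key": grouping_key,
        PREFIX + "output.batch.grouping.buffer.size": str(grouping_buffer_size),
    }

def max_in_flight_bytes(conf):
    """Rough upper bound on batch bytes one task has outstanding at once."""
    size = int(conf[PREFIX + "output.batch.size.bytes"])
    writes = int(conf[PREFIX + "output.concurrent.writes"])
    return size * writes

conf = write_conf()
print(max_in_flight_bytes(conf))  # 1024 * 5 = 5120
```

Multiplying the result by the number of tasks running concurrently on an executor gives a first estimate of the write pressure that executor puts on the cluster.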
Read settings (set as Spark properties under the spark.cassandra. prefix):
input.split.size_in_mb
Don’t forget that data is compressed on disk!
input.fetch.size_in_rows
Scylla paging capabilities will have an impact!
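The two warnings on this slide can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes that the number of Spark partitions is roughly the on-disk table size divided by input.split.size_in_mb, and that each Spark partition is read in pages of input.fetch.size_in_rows rows. The function names, the compression ratio, and the sample numbers are hypothetical and only illustrate the reasoning; the connector's own size estimates may differ.

```python
import math

def estimated_spark_partitions(table_size_mb, split_size_mb):
    """Approximate number of Spark partitions for a full table scan."""
    return max(1, math.ceil(table_size_mb / split_size_mb))

def uncompressed_mb_per_partition(split_size_mb, compression_ratio):
    """Data is compressed on disk, so a split expands once it is read.

    compression_ratio = uncompressed size / on-disk size (assumed known).
    """
    return split_size_mb * compression_ratio

def pages_per_partition(rows_in_partition, fetch_size_in_rows):
    """Number of paged round trips needed to read one Spark partition."""
    return math.ceil(rows_in_partition / fetch_size_in_rows)

print(estimated_spark_partitions(10_240, 64))   # 160
print(uncompressed_mb_per_partition(64, 2.5))   # 160.0
print(pages_per_partition(1_000_000, 1000))     # 1000
```

Under these assumptions, a smaller split size produces more, smaller Spark partitions, while a smaller fetch size produces more paging round trips per partition.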
+ Increase the default Spark parallelism (by default, the number
of cores in a Spark local machine deployment)
+ Reduce the Spark split size (64 MB -> 1 MB)
+ connection.connections_per_executor_max
(# of cores or more)
+ output.concurrent.writes (default is 5)
+ concurrent.reads (default is 512)
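One way to apply this checklist is to gather the settings in a single place before launching a job. The sketch below builds spark-submit --conf arguments from a core count. The helper names (tuned_conf, as_submit_args) and the parallelism_factor parameter are assumptions made for illustration, and the property names simply add the spark.cassandra. prefix to the names on the slide; newer connector releases may spell some of them differently.

```python
def tuned_conf(cores, parallelism_factor=2, split_size_mb=1,
               concurrent_writes=5, concurrent_reads=512):
    """Map the checklist to Spark properties (illustrative values only)."""
    return {
        # Raise parallelism above the local default of one task per core.
        "spark.default.parallelism": str(cores * parallelism_factor),
        "spark.cassandra.input.split.size_in_mb": str(split_size_mb),
        # The slide suggests the number of cores or more.
        "spark.cassandra.connection.connections_per_executor_max": str(cores),
        "spark.cassandra.output.concurrent.writes": str(concurrent_writes),
        "spark.cassandra.concurrent.reads": str(concurrent_reads),
    }

def as_submit_args(conf):
    """Render the properties as a flat list of spark-submit arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args

print(" ".join(as_submit_args(tuned_conf(cores=8))))
```

Keeping the values in one function makes it easier to change a single setting at a time and compare runs, which suits the trial-and-error nature of this kind of tuning.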
+ Scylla enables analytics on top of transactional data
+ Performance tuning is required for certain workloads
+ Resource management is key to stability of your deployment
- Docker-based deployment example:
  https://www.scylladb.com/2018/06/04/mms-day-13-spark-hive/
- Scylla and Spark integration example:
  http://docs.scylladb.com/kb/scylla-and-spark-integration/
- Cassandra-Spark connector:
  https://github.com/datastax/spark-cassandra-connector
- Good source of information about the Cassandra-Spark connector:
  http://www.russellspitzer.com/
United States Israel www.scylladb.com
@scylladb


Spark Powered by Scylla
