Expedia Group: Our Migration Journey from Apache Cassandra to ScyllaDB

Editor's Notes

  • #2: Hello everyone. It gives us immense pleasure to share our journey towards ScyllaDB at this Scylla Summit 2021. In this presentation, we will take you through who we are, what we do, why we chose ScyllaDB, and finally the outcome.
  • #3: I'm Singa, and I will be joined by Dilip, co-presenter for this talk. We are both passionate database engineers at Expedia Group, working with multiple NoSQL technologies and striving to align use cases so that they make the best of the underlying datastore.
  • #4: Expedia Group, Inc. is one of the world’s largest travel platforms. At Expedia Group, our mission is to bring the world within reach. We firmly believe that travel has the power to change lives!
  • #5: We do that through the power of our brands.
  • #6: Alright, let's get into the nitty-gritty of why we picked ScyllaDB and how it helped our developer journey. Currently at EG, there are multiple applications built on top of Apache Cassandra, which comes with its own set of challenges. We will go through some of them in this deck.
  • #7: Apache Cassandra is written in Java, which brings the onus of managing garbage collection (GC) and making sure it is appropriately tuned for the workload at hand. Though GC is tunable, tuning GC pauses for each specific use case takes a significant amount of time, effort, and expertise. With burst traffic or a sudden peak in the workload, there is significant disturbance to the P99 response time. So we end up adding buffer nodes to handle this peak capacity, which results in higher infrastructure costs. Another significant worry is that, based on the past 4 years of history, the pace of Apache Cassandra releases has slowed down significantly.
  • #8: We would like to compare the open source commits in Cassandra and ScyllaDB here and highlight the number of releases that Scylla has gone through over the same past 3 year period. As you can see, it gives enough confidence in ScyllaDB that an issue or bug in a specific release will soon be addressed with a patch. In contrast, with Apache Cassandra one might have to wait longer.
  • #9: So why did we end up with ScyllaDB? Coming from an Apache Cassandra codebase, it is frictionless for developers to switch over to ScyllaDB. For the use cases that we tried, no data model changes were necessary, and the ScyllaDB driver was entirely compatible, a swap-in replacement for the Cassandra driver dependency. With a few tweaks to our automation framework that provisions Apache Cassandra clusters, we were able to provision a ScyllaDB open source cluster. Thanks to the C++ backend of ScyllaDB, we no longer have to worry about stop-the-world GC pauses. We were also able to store more data per node and achieve more throughput per node, thereby saving significant $$$ for the company. The clear roadmap and the support from the ScyllaDB Slack community come in very handy.
  • #10: The candidate application chosen for this POC is our geo system, which provides information about geographical entities and the relationships between them. It aggregates data from multiple systems, such as hotel location info, 3rd party data, etc. This rich geography dataset enables different types of data searches using a simple REST API, while guaranteeing a single-digit msec P99 read response time. To speed up API responses, we are using a multi-layered cache with Redis as the first layer and Cassandra as the second layer. With ScyllaDB as a swap-in replacement for Cassandra, I’m handing it over to Dilip to go over the infra setup, benchmark results, and next steps.
  • #11: Thank you, Singa. Our POC cluster in ScyllaDB was to store around 25TB of data, exactly like our existing PROD Cassandra cluster. To begin with, we provisioned the same total number of instances for Cassandra and ScyllaDB, but the instance type chosen was I3.2XL, which is 35% cheaper than I3EN.2XL.
  • #12: The use case demands high read throughput and only tiny write throughput. As shown in the first graph, whether it is ScyllaDB or Cassandra, the writes are almost negligible, a flat line at the bottom. The real winner is on reads, where the Cassandra P99 read response time is flaky, as shown by the spikes, while the ScyllaDB P99 read response times are relatively flat. This is a significant advantage for our read-heavy application. In terms of throughput, as shown in the second graph, we were able to push almost double the TPS with ScyllaDB compared with Cassandra, and with a flat P99 SLA.
  • #13: Here are some of the facts that made the ScyllaDB benchmark stand out. We were able to get triple the throughput with flat single-digit P99 read response times, and at the same time achieve over 35% reduction in total cost of ownership. At this point it was a no-brainer to switch to ScyllaDB for this application's production workload.
  • #14: Huge shoutout to our automation team, which made provisioning the ScyllaDB cluster a breeze via our internal tool called Cerebro. We use this same internal tool for managing over 7 different NoSQL technologies, with the aim of enabling our application teams to focus on bringing great products to the market without having to worry about managing databases.
  • #15: The application at hand currently uses an L1 cache (Redis) before hitting the backend persistent store, ScyllaDB. With ScyllaDB supporting a Redis-compatible API and the P99 proven to stay within single-digit msec, we are thinking about turning off the in-memory cache engine and relying completely on ScyllaDB as the only database backend for the application. This will bring a significant additional cost advantage, both in terms of infrastructure and application code. We also recently learned about Scylla Alternator and are currently evaluating whether it is a viable alternative to DynamoDB, as advertised.
  • #16: Logs are pushed to syslog, and there isn’t a configuration to route them to a custom folder of your choice. The CDC functionality is significantly better compared to Apache Cassandra, so this might entice applications that rely on change streams. A good thing about ScyllaDB is that node replacements, whether during scale up or scale down, are resumable. Please exercise caution when using large partitions; performance might vary depending on how large the partitions are.
  • #17: If you are interested in what you heard and want to build great products with us, please join hands.
  • #18: Thanks for this opportunity to present to ScyllaDB enthusiasts all over the world. We enjoyed every moment of putting this together and hope you did too.
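
The layered read path described in notes #10 and #15 (Redis as the first cache layer, with the Cassandra or ScyllaDB store behind it) can be sketched roughly as follows. This is a minimal illustration and not Expedia's implementation: the class name `LayeredReader`, its counters, and the plain dict and callable standing in for Redis and the database query are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional


class LayeredReader:
    """Read-through lookup: check the L1 cache first, then the persistent store.

    Hypothetical sketch. `l1` stands in for Redis and `store_get` stands in
    for a query against Cassandra or ScyllaDB; neither is a real client.
    """

    def __init__(self, l1: Dict[str, str],
                 store_get: Callable[[str], Optional[str]]) -> None:
        self.l1 = l1
        self.store_get = store_get
        self.l1_hits = 0       # reads answered by the cache layer
        self.store_reads = 0   # reads that reached the persistent store

    def get(self, key: str) -> Optional[str]:
        value = self.l1.get(key)
        if value is not None:
            self.l1_hits += 1
            return value
        # Cache miss: fall through to the persistent store.
        self.store_reads += 1
        value = self.store_get(key)
        if value is not None:
            # Populate L1 so the next read of this key is served from cache.
            self.l1[key] = value
        return value


# Usage with in-memory stand-ins (the key and value are made up).
backing_store = {"region:1": "Seattle"}
reader = LayeredReader(l1={}, store_get=backing_store.get)
first = reader.get("region:1")    # miss in L1, served by the store
second = reader.get("region:1")   # served by L1
```

Under this sketch, dropping the cache layer as contemplated in note #15 would amount to calling `store_get` directly, which is only attractive if the store's own P99 read latency already meets the API's target.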