Jeff Carpenter and Andrew Baker
Choice Hotels International
Building a Distributed Reservation System Using Cassandra
1 The Problem – Replacing a Reservation System
2 Managing Consistency
3 Microservices and Data Integrity
4 Schema Evolution
5 Data Retention and TTL
6 Performance and Cost Tradeoffs
2© DataStax, All Rights Reserved.
Central Reservation System Interfaces
© DataStax, All Rights Reserved. 3
CRS
Property Systems
Web and
Mobile
External
Channels
Reporting
& Billing
Customer
&
Loyalty
Current Reservation System – By The Numbers
© DataStax, All Rights Reserved. 4
25 years
6,000 hotels
50 distribution channels
4,000 transactions / second
1 instance
Architecture Tenets
Microservices
Cloud-native
Rules-based
Open Source Infrastructure
Stable, Scalable, Secure
© DataStax, All Rights Reserved. 5
© DataStax, All Rights Reserved. 6
Non-
relational
Rules-based
Reporting
& Analytics
Cloud
deployment
RESTful
services
High
availability
Our configuration
© DataStax, All Rights Reserved. 7
• 18 nodes
• Cassandra 2.2.X
• I2.2XL (Smaller in Dev/Test)
• 1 TB and growing
• 3 regions
• AWS VPC
• Direct Connect
• Legacy systems in
on-prem data center
C*
Project Timeline
© DataStax, All Rights Reserved. 8
Inception
• Proof of
concept
Beta
• Initial
Capability
• Beta Release
• <1%
production
traffic
Release 1
• Full Capability
• ~10%
production
traffic
Completion
• 100%
production
traffic
• Legacy
System
Retirement
Look, Ma, 100K
writes/sec!
Why are my
repairs failing?
We got this!
Key Data Types
© DataStax, All Rights Reserved. 9
rates inventory hotels reservations
Key Data Types → Microservices
© DataStax, All Rights Reserved. 10
Hotel
Service
Booking
Service
Rates
Service
Shopping
Service
Data Maintenance
Apps
Inventory
Service
Reservation
Service
Inventory
keyspace
Rates
keyspace
Hotels
keyspace
Reservations
keyspace
Varying Consistency Needs
© DataStax, All Rights Reserved. 11
Eventual consistency Immediate consistency
rates reservations hotels inventory
Distributed Transactions, Anyone?
© DataStax, All Rights Reserved. 12
Commit the
contract
Reserve
the inventory
Booking
Service
Data Maintenance
Apps
Inventory
Service
Reservation
Service
inventory
reservations
Data
synchronization
Alternatives to Distributed Transactions
Approach | Example | Scope
Lightweight Transaction | Updating inventory counts | Data Tier
Logged Batch | Writing to multiple denormalized hotel tables | Data Tier
Retrying failed calls | Data synchronization, reservation processing | Service
Compensating processes | Verifying reservation processing | System
© DataStax, All Rights Reserved. 13
Eventual
consistency
Strong
consistency
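The lightweight-transaction row above relies on a conditional write: the update is applied only if the stored value still matches what the writer last read (in CQL, an `UPDATE ... IF` condition). The sketch below simulates that compare-and-set idea in memory, combined with the "retrying failed calls" approach. It is not a Cassandra client; `InventoryStore`, `compare_and_set`, and `reserve` are hypothetical names introduced only for illustration.

```python
import threading


class InventoryStore:
    """In-memory stand-in for an inventory table (hypothetical, not a Cassandra client)."""

    def __init__(self, available):
        self._available = available
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._available

    def compare_and_set(self, expected, new_value):
        """Apply the write only if the stored count still equals `expected`."""
        with self._lock:
            if self._available != expected:
                return False
            self._available = new_value
            return True


def reserve(store, rooms, max_attempts=5):
    """Try to take `rooms` units of inventory; return True on success."""
    for _ in range(max_attempts):
        current = store.read()
        if current < rooms:
            return False  # not enough inventory: refuse rather than oversell
        if store.compare_and_set(current, current - rooms):
            return True
        # Another writer changed the count between the read and the write: retry.
    return False
```

The conditional write is what keeps two simultaneous bookings from both taking the last room; the retry loop only handles the case where the count moved underneath the caller.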
Where is my data?
© DataStax, All Rights Reserved. 14
Create
Block
C*
node
node
node
node
Update
Inventory
Check Block &
Adjust Inventory
Count
LOCAL_ONE
LOCAL_QUORUM
LOCAL_QUORUM
Configurable Consistency Levels
© DataStax, All Rights Reserved. 15
C*
node
node
node
node
A LOCAL_ONE
C*
node
node
node
node
B LOCAL_QUORUM
Test
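A minimal sketch of consistency levels as configuration, assuming the levels are read from environment variables at service start-up. The variable names `READ_CONSISTENCY` and `WRITE_CONSISTENCY` and the defaults are hypothetical; only the set of level names comes from Cassandra itself.

```python
import os

# Consistency level names defined by Cassandra.
VALID_LEVELS = {
    "ANY", "ONE", "TWO", "THREE", "QUORUM", "ALL",
    "LOCAL_QUORUM", "EACH_QUORUM", "SERIAL", "LOCAL_SERIAL", "LOCAL_ONE",
}


def consistency_from_env(env=None, read_default="LOCAL_ONE", write_default="LOCAL_QUORUM"):
    """Return {'read': ..., 'write': ...} from the environment, validating the names."""
    env = os.environ if env is None else env
    levels = {
        # Variable names are assumptions made for this sketch.
        "read": env.get("READ_CONSISTENCY", read_default).strip().upper(),
        "write": env.get("WRITE_CONSISTENCY", write_default).strip().upper(),
    }
    for kind, level in levels.items():
        if level not in VALID_LEVELS:
            raise ValueError(f"unknown {kind} consistency level: {level!r}")
    return levels
```

Launching instance A with `READ_CONSISTENCY=LOCAL_ONE` and instance B with `READ_CONSISTENCY=LOCAL_QUORUM` would then allow a side-by-side comparison without a code change.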
© DataStax, All Rights Reserved. 16
Cross Region Issues
Service
Reliability
© DataStax, All Rights Reserved. 17
ClusterBuilder.
addContactPoint()
ClusterBuilder.
addContactPoints()
VS
Java Driver
C*
node
node
node
node
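The difference between one contact point and several can be illustrated without the driver. The sketch below is not the Java driver's implementation; `connect_to_any` and the `connect` callable are hypothetical. It shows why handing the client every known address lets it bootstrap even when the first node has just been killed.

```python
def connect_to_any(contact_points, connect):
    """Return the first connection that succeeds; raise if every contact point fails."""
    failed = []
    for address in contact_points:
        try:
            return connect(address)
        except ConnectionError:
            failed.append(address)  # remember the failure and try the next address
    raise ConnectionError(f"no contact point reachable: {failed}")
```

With a single address the loop has nothing to fall back on, which is the failure mode seen when the one resolved node is down.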
Complex Time Queries
© DataStax, All Rights Reserved. 18
Shopping
request
Rate data
I can’t do a range query
on departure date
If I do a range query
on arrival date…
Time
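A minimal sketch of the overlap search, assuming each rate has a start and an exclusive end date. The database is asked for a single range predicate (rates starting before the departure date) and the service trims the rest. `Rate`, `query_rates_starting_before`, and `overlapping_rates` are hypothetical names, and the first function is an in-memory stand-in for the actual query.

```python
from datetime import date
from typing import NamedTuple


class Rate(NamedTuple):
    code: str
    start: date  # first date the rate applies
    end: date    # date the rate stops applying (exclusive)


def query_rates_starting_before(rates, departure):
    """Stand-in for the one range predicate the database can serve."""
    return [r for r in rates if r.start < departure]


def overlapping_rates(rates, arrival, departure):
    """Rates whose validity interval overlaps the stay [arrival, departure)."""
    candidates = query_rates_starting_before(rates, departure)
    # Service-side trim: drop rates that ended on or before the arrival date.
    return [r for r in candidates if r.end > arrival]
```

Two intervals overlap exactly when each one starts before the other ends, so the single database predicate plus one in-service filter covers both halves of the test.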
Keyspace edge_hotel
Denormalization Gone Wild
(aka “Hotel Access Patterns”)
© DataStax, All Rights Reserved. 19
Locate hotel
by identifier
Find hotels
within X miles
of point Y
Find hotels by
city, state,
country
Find hotels
by postal
code
Hotels by
amenity
Find hotels
by brand
hotels_by_id
hotels_by_brand
hotels_by_postal_code
…
Hotels by
this
Hotels by
that
Hotels by
something
else
Schema Evolution
© DataStax, All Rights Reserved. 20
-- schema.cql
CREATE TABLE rates_by_hotel (
  id text,
  hotel_id text,
  code text,
  name text,
  product_ids set<text>,
  categories set<text>,
  PRIMARY KEY ((hotel_id), code)
);

-- 001_code_to_integer.cql
ALTER TABLE rates_by_hotel DROP code;
ALTER TABLE rates_by_hotel ADD code int;

-- 001_rollback.cql
ALTER TABLE rates_by_hotel DROP code;
ALTER TABLE rates_by_hotel ADD code text;
Size and Cost Estimation
© DataStax, All Rights Reserved. 21
47%
37%
7%
9%
DATA SIZE
Inventory Rates Hotels Reservations
Schemas
Sizes
Chebotko
Formulas
Estimates
• String length
• Collection size
• Partition and
row counts
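One of the Chebotko formulas estimates the number of values (cells) in a partition as N_v = N_r × (N_c − N_pk − N_s) + N_s, where N_r is the row count, N_c the column count, N_pk the number of primary key columns, and N_s the number of static columns. A small sketch follows; the row count used in the usage note is an invented illustration, not a figure from the talk.

```python
def partition_values(rows, columns, primary_key_columns, static_columns=0):
    """Number of values in a partition: N_v = N_r * (N_c - N_pk - N_s) + N_s."""
    return rows * (columns - primary_key_columns - static_columns) + static_columns
```

For the `rates_by_hotel` table shown on the Schema Evolution slide (6 columns, 2 of them in the primary key, no static columns), a hypothetical partition of 200 rows gives `partition_values(200, 6, 2)`, that is 800 values.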
TTL for Data Cleanup
© DataStax, All Rights Reserved. 22
Now
Time
Yesterday’s data is
ancient history
Rate + Inventory Data
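A minimal sketch of choosing a TTL for a rate or inventory row, assuming each row belongs to a single stay date and should expire a grace period after that date ends. The one-day grace period and the function name are assumptions. CQL's `USING TTL` takes a value in seconds, and a TTL of 0 means "never expire", so the sketch returns `None` for data that is already past its window instead of 0.

```python
from datetime import date, datetime, time, timedelta, timezone


def ttl_seconds(stay_date, now, grace=timedelta(days=1)):
    """Seconds until `grace` after the end of `stay_date` (UTC), or None if already past."""
    end_of_day = datetime.combine(stay_date + timedelta(days=1), time.min, tzinfo=timezone.utc)
    remaining = (end_of_day + grace) - now
    seconds = int(remaining.total_seconds())
    # Never return 0: in CQL a TTL of 0 disables expiry, so the caller should skip the write.
    return seconds if seconds > 0 else None
```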
Service Level Agreement Decomposition
© DataStax, All Rights Reserved. 23
The shopping
service must
complete in 80 ms
The inventory
service must
complete in 20 ms
C* inventory
query must
complete in 8 ms
The rates service
must complete in
30 ms
C* rates query
must complete in
10 ms
A shopping request for a 3-night stay at rack rates for a property that has 15 room
types must have a 95 percentile completion time of 80 ms
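The budgets on this slide can be checked mechanically. The sketch below uses the slide's millisecond figures, a nearest-rank percentile, and a headroom calculation that assumes the child calls run one after another; that sequential assumption and all function names are introduced here for illustration only.

```python
import math

# Budgets in milliseconds, taken from the slide.
BUDGETS_MS = {
    "shopping": 80,
    "inventory": 20,
    "rates": 30,
    "cassandra_inventory": 8,
    "cassandra_rates": 10,
}


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of latency samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_budget(name, samples_ms, pct=95):
    """True if the pct-th percentile of the samples is within the named budget."""
    return percentile(samples_ms, pct) <= BUDGETS_MS[name]


def headroom_ms(parent, children):
    """Budget left in the parent if the children are called sequentially."""
    return BUDGETS_MS[parent] - sum(BUDGETS_MS[c] for c in children)
```

Under the sequential assumption, `headroom_ms("shopping", ["inventory", "rates"])` leaves 30 ms for everything else the shopping service does, and each Cassandra query budget leaves its service 12 ms (inventory) or 20 ms (rates) of its own.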
Roadmap
Cassandra 3.X
Materialized Views & SASI
Row Cache
Spark & Search
© DataStax, All Rights Reserved. 24
Final Thoughts
Let Cassandra be Cassandra
Be Flexible in Consistency
Manage the Joins
Get to Scale
Automate Everything… with care
© DataStax, All Rights Reserved. 25
We’re Hiring!
© DataStax, All Rights Reserved. 26
http://careers.choicehotels.com
• Sr. Cassandra Database Administrator
• DevOps Architect
• Multiple Java development positions
Now Available!
© DataStax, All Rights Reserved. 27
Cassandra: The Definitive Guide, 2nd Edition
Completely reworked for Cassandra 3.X:
• Data modeling in CQL
• SASI indexes
• materialized views
• lightweight transactions
• DataStax drivers
• New chapters on security, deployment, and integration
Contact us
@JavaBakerag
@choicehotels
Choice Hotels
International
@jscarp
© DataStax, All Rights Reserved. 28


Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeffrey Carpenter, Choice Hotels) | C* Summit 2016


Editor's Notes

  • #3: (Jeff) Overview of what we'll cover
  • #4: (Jeff) The reservation system interfaces with many of our other IT systems, so replacing it is a major undertaking. We interface with property systems so our franchisees can tell us about their room types, rates, and inventory. Internal channels like our website and mobile applications allow customers to shop and book rooms, and external channels do the same. Reporting and billing systems pull information about reservations. We interface with customer and loyalty systems to credit stays and support reward reservations.
  • #5: (Jeff) Our current reservation system is over 25 years old, written in C and running on a large UNIX box with a traditional RDBMS. We’re currently making reservations for over 6,000 hotels worldwide and distributing through over 50 different channels, everything from our own website and mobile apps to GDS and OTA partners. This system is very performant and reliable, servicing over 4,000 TPS. However, the system scales vertically; we need horizontal scalability for future growth.
  • #6: (Jeff) Here are some of the tenets of our architecture that led to our use of Cassandra. We wanted a microservices architecture based primarily on RESTful APIs. We designed for the cloud, so the system can run anywhere. We externalize business rules. We use open source infrastructure where possible. The key architectural qualities we focus on are scalability, stability and security.
  • #7: (Jeff) These are a few of the elements in our stack. We are mostly OSS. We use Cassandra as our primary data store. We use Spark and related technologies for reporting and analytics tasks. Drools is our rule engine. Our RESTful microservices are deployed using technologies such as Tomcat and Spring. We’re using Netflix open source technologies, including Hystrix and the Simian Army, to build in high availability.
  • #8: (Jeff) We’ve deployed our system in 3 AWS regions, two of which are currently active, with more on the way. We’ve recently upgraded to Direct Connect between regions and our legacy data center, which has helped resolve some of the latency issues we’ll discuss later. Our application is currently running on a single 12-node cluster (although we’ve been up as high as 18). We’re running Cassandra 2.2 series releases but are planning to upgrade to 3.X releases in order to take advantage of materialized views. (Can we add something on data size?) This is a starting point; we’ll see where it grows…
  • #9: (Jeff) As our project has matured, so has our use of Cassandra. When we started the project in 2014, one of our first tasks was a Cassandra proof of concept where we scaled up to more writes a second than we thought we might reasonably need. We were excited to learn the ways of Cassandra data modeling and configuration. As we worked toward an initial capability toward the end of 2015, we had some growing pains as we learned the ropes of operating Cassandra in production. We put a lot of automation around Cassandra deployment and cluster maintenance and learned a lot about how to manage repairs. We ended 2015 with a successful Beta release where we booked our first reservation in production. In 2016 we’re continuing to add functionality while increasing the traffic and working on improving scalability, stability and security. In 2017 we’ll complete migration of our various partners and internal systems to the new system and retire the old reservation system.
  • #10: (Andrew) Hotels: descriptive data about the hotels, their products, and policies; quite static. Rates: prices that are charged for the products; these can change many times a day and could include an automated pricing system. Inventory: constantly changes as rooms are booked, cancelled, etc. Data quality and currency are extremely important here so we don’t oversell our hotels. Reservations: the contract with the customer; generally only changed when initiated by the customer, so changes are infrequent. (The Marriott may disagree after dealing with me.)
  • #11: (Andrew) After we identified our key data types, they seemed like a good way to divide the work and the system landscape. As work commenced, we actually divided things a little further and kept the keyspace-per-service idea going, approaching a share-nothing architecture style. We stayed with a single cluster for now to ease operations and reduce cost. On top of this we added services to encapsulate the complexities of shopping and booking. We used rules at this level to define business logic likely to change. We also built data maintenance applications to: synchronize data from other systems (our legacy system as well as some other systems that will stay in operation, such as property management systems); verify data accuracy across systems and across service boundaries; and correct data issues caused by defects.
  • #12: (Andrew) One of the things we’ve learned over time is that not all of our data has the same consistency requirements. Our hotel owners make their living off of their rates and inventory; if we sell too few rooms or offer the wrong rate, they are well justified to complain and may be compensated for the trouble. Customers hate being sold a room that wasn’t actually available, and it costs us money. With all of the talk of consistency requirements, there is always an undercurrent of performance requirements. Slow websites don’t sell rooms. We have to deliver the goods quickly, so we have to give ourselves some wiggle room on consistency; we can’t use EACH_QUORUM everywhere. And sometimes we might really need a LWT. We implemented varying consistency by using separate CLs for each query, carefully considering the tradeoffs and documenting the decisions.
  • #13: (Jeff) One of the challenges of a microservices architecture is keeping changes in sync across service boundaries. One example situation is in booking a reservation. Since the reservation represents our contract with the customer to reserve a specific room at a specific price and with certain conditions, we need to mark a reservation as committed at the same time as we reserve the inventory. This is important so that we don’t accidentally overbook our hotel. Making the situation more complicated, there could be simultaneous bookings and data maintenance activities also trying to access the same inventory. Since these types are split across microservice boundaries, there is no transaction mechanism. In fact, since the data is in different rows (and different tables), Cassandra’s lightweight transactions are of no use to us here. We solved this with a layered approach: LWTs to protect inventory counts, retries within the booking service, and compensating processes to detect and clean up failures.
  • #14: (Jeff) Thankfully we have a variety of tools in our toolbox for guaranteeing consistency. Some of these are provided by Cassandra but some of them are architecture approaches.
  • #15: (Andrew) Our hard work of planning consistency levels bit us in short order when we began migrating our data We chose to migrate inventory in the same way we would maintain it, by processing update notifications Our legacy system was able to perform a recap, sending an update for every record When updating inventory counts from other systems, we wanted to make sure that we only accept counts for blocks of inventory that already exist. Creating a block is an infrequent and low priority activity, so we picked CL_ONE. Unfortunately, this is quite frequent when a hotel is not already set up, and is immediate followed by setting the inventory count This breaks the “check block” step, which then breaks the inventory sync for that hotel. This was a problem that only appeared when we were running at scale and doing a large inventory sync. We realized that we needed to take a step back and evaluate flows more carefully in order to assign the right consistency levels.
  • #16: (Andrew) When we deployed multiple regions for the first time, we soon realized more of our CLs were wrong, with our EACH_QUORUM requests timing out on every request for short periods Realizing we got it wrong the first time, our confidence getting it right now was low. Rather than mope, we made it easy to change, exposing environment variables for read and write consistency settings Making consistency level a matter of configuration enables us to launch instances with different consistency levels to measure the differences in speed an accuracy of data.
  • #17: (Andrew) We started with EACH_QUORUM CLs but found our VPN configuration didn’t allow us to achieve QUORUM across regions with the default timeout settings We had considered that and created a downgrading consistency retry policy. If you think you can tolerate downgrading your consistency on retry, you can probably tolerate that low of a level to begin with. If the initial times out regularly, you are essentially adding the timeout setting to your latency Our initial deployment had our nodes communicating to each other through VPN in Phoenix, when we did not have direct connect We have just switched to Direct Connect and hope to soon try higher consistency levels for some of our queries
  • #18: (Jeff) We’ve made use of Netflix’s Simian Army in order to build reliability into the system. Part of this was allowing Chaos Monkey to kill Cassandra nodes to make sure our clusters could survive losing nodes abruptly. This helped us mature our cluster monitoring and test automated cluster management capabilities. It also helped us uncover an unexpected behavior of the Java driver, which has since been fixed in the 3.0 driver. Our configuration was using a DNS name to locate the nodes in the cluster. Calling the “addContactPoint()” operation with the well-known name initially bound to one record. If this happened to be the node that was killed by Chaos Monkey, before the record was cleared from DNS, the driver would fail to connect to that node, and would be unable to bootstrap. We worked around this by calling addContactPoints() instead, which binds to multiple IP addresses so that it can make multiple connection attempts. This has been fixed in the 3.0 driver: addContactPoint() now binds to multiple IPs if you use a DNS name. The moral of the story: to help mitigate common connection issues, we created a common library to manage connections across services and connectivity. It loads connection information from the environment, including the cluster name, security credentials
  • #19: (Andrew) We store the rates per day, and we need all the rates that overlap the requested arrival and departure dates. We could iterate through the days in the search range, but we wanted to keep ourselves as flexible as we could, in order to handle other rate units, such as meeting rooms for 4 hours or cabins for 7 days. Doing a true search for overlapping ranges requires a range search against the start and end time of each rate. Since Cassandra doesn’t support range queries over multiple attributes, we determined that searching for all rates starting before the departure date would usually eliminate the largest volume of data; the service could then trim the rest.
  • #20: (Jeff) After religiously following the mantra of designing tables for each access pattern, we soon ran into cases where adding a table per unique access pattern proved to be too much. Take for example hotels and the number of ways by which various clients could search for hotels. Since the hotel records are quite large, imagine the impact of all of these tables on our cluster size and storage requirements for 6000+ hotels. We reined this in by designing tables to support multiple queries and doing some filtering at the service layer, which helped us control our computing costs. We’re also looking to move to Cassandra 3.X in order to take advantage of materialized views and SASI indexes, which will allow us to shift some of the processing burden back to the database.
  • #21: (Andrew) As the previous slide implies, we are frequently adding to our schemas. Our platform team made schema creation and evolution part of our deployment pipeline. When changing schemas, we produce separate CQL files, in numbered order, so that the pipeline can apply them sequentially. We also provide rollback scripts so that we can return the database to a stable state if a deployment fails. This is a fairly standard schema management approach.
  • #22: (Jeff) In order to keep ourselves on track for database sizing and cost management, we’ve implemented a process to estimate these up front. Our estimation process begins with proposed schemas. Then, in order to project size, we need to make some educated guesses: how long will string attributes be? How many elements will be in a Set, List, or Map? What are the likely partition and row counts for each table? Given this information, we use Artem Chebotko’s formulas to estimate the total data size for each table and then each keyspace. Based on our estimates, the majority of our data consists of rates and inventory.
  • #23: (Andrew) We have separated the shopping and booking concerns from our analysis and history uses, which means that in the shopping and booking systems, data relevant to the past is not much use. As we insert our data, we set the TTL for when it will no longer be needed, which saves us from developing our own cleanup process and reduces our storage footprint. We still need the historic data for analysis and customer service purposes, though, so we store it in a separate data platform, which we feed from the reservation system using asynchronous event processing. Our colleague Narasimhan Sampath is talking at Strata NYC later this month about our data and analytics platform, which is based on Spark and Hadoop. Make sure to check out his talk if you’re attending Strata.
  • #24: (Jeff) We’ve defined SLAs for the overall performance of the system for key flows such as shopping. SLAs are defined in terms of 95th percentile response times under specified operating conditions. We allocate the response times down to individual services and progressively down to individual Cassandra queries. We can then create load tests to measure these targeted response times. We’ve focused on using actual services for load testing rather than cassandra-stress, due to its limitations.
  • #25: (Jeff) Moving forward, we’ll be investigating Materialized Views and row caching to improve performance. We’re also looking at using Spark in some environments in order to support ad-hoc searching and exploration.
  • #26: (Jeff) Cassandra is only part of your system architecture: don’t ask it to do too much or too little. Choose the right consistency level for each data type and query. Microservice architectures are great, but you’ll have to address joins and data consistency across services. Get to scale as quickly as possible to discover the edge cases. “Automate everything” includes monitoring data size and quality.
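The note for slide #16 describes exposing read and write consistency levels as environment variables. A minimal Python sketch of that idea follows. The variable names READ_CONSISTENCY and WRITE_CONSISTENCY and the helper consistency_from_env are hypothetical, not taken from the talk, and the result is only a validated level name that a service would still have to map onto its driver's own constant.

```python
import os

# Consistency level names as used by Cassandra and CQL.
VALID_LEVELS = {
    "ANY", "ONE", "TWO", "THREE", "QUORUM", "ALL",
    "LOCAL_QUORUM", "EACH_QUORUM", "LOCAL_ONE",
    "SERIAL", "LOCAL_SERIAL",
}


def consistency_from_env(var_name, default, environ=None):
    """Return the consistency level named by an environment variable.

    var_name is a hypothetical variable such as READ_CONSISTENCY.
    environ can be injected for testing; it defaults to os.environ.
    """
    environ = os.environ if environ is None else environ
    level = environ.get(var_name, default).strip().upper()
    if level not in VALID_LEVELS:
        raise ValueError(f"{var_name}={level!r} is not a Cassandra consistency level")
    return level


if __name__ == "__main__":
    # Hypothetical defaults; each deployment can override them without a rebuild.
    print("read :", consistency_from_env("READ_CONSISTENCY", "LOCAL_QUORUM"))
    print("write:", consistency_from_env("WRITE_CONSISTENCY", "LOCAL_QUORUM"))
```

Rejecting unknown names at startup means a typo in a deployment setting fails fast rather than silently falling back to a default.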
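The note for slide #19 describes a two-step overlap search: a single-attribute range query at the database, then trimming in the service. The sketch below illustrates that split in plain Python, with an in-memory list standing in for the Cassandra query. The Rate type, its field names, and the exclusive-end convention are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Rate:
    code: str
    start: date  # first day the rate applies (inclusive)
    end: date    # day the rate stops applying (exclusive, an assumption)


def fetch_rates_starting_before(rates, departure):
    """Stand-in for the one range predicate the database can serve."""
    return [r for r in rates if r.start < departure]


def overlapping_rates(rates, arrival, departure):
    """Return rates that overlap the stay [arrival, departure)."""
    candidates = fetch_rates_starting_before(rates, departure)
    # Service-side trim: drop rates that ended on or before arrival.
    return [r for r in candidates if r.end > arrival]
```

Two intervals overlap exactly when each one starts before the other ends, so the database predicate covers one half of the test and the service-side filter covers the other.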
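The note for slide #21 mentions numbered CQL files that the pipeline applies sequentially. As a rough sketch of the ordering step only, the function below sorts files by their numeric prefix and skips those already applied. The naming pattern (for example 2_add_column.cql) and the applied_through counter are hypothetical; the talk does not say how the pipeline tracks state.

```python
import re

# Hypothetical naming convention: <number>_<description>.cql
_NUMBERED = re.compile(r"^(\d+)_.+\.cql$")


def order_migrations(filenames, applied_through=0):
    """Return migration files numbered above applied_through, in numeric order."""
    pending = []
    for name in filenames:
        match = _NUMBERED.match(name)
        if match is None:
            continue  # ignore files that do not follow the convention
        number = int(match.group(1))
        if number > applied_through:
            pending.append((number, name))
    pending.sort()  # numeric sort, so 10 comes after 2
    return [name for _, name in pending]
```

Sorting on the parsed integer rather than the raw filename avoids applying file 10 before file 2 when the prefixes are not zero-padded.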
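The note for slide #23 describes setting a TTL at insert time so that data expires when it is no longer needed. One way to derive that TTL is sketched below: the number of seconds from now until a chosen expiry instant, suitable for a CQL `USING TTL` clause. The function name, the optional grace period, and the choice to skip rows that are already expired are assumptions, not details from the talk.

```python
from datetime import datetime, timedelta, timezone


def ttl_seconds(expires_at, now, grace=timedelta(0)):
    """Seconds until expires_at + grace, or None if that moment has passed.

    None signals "skip the write": in CQL a TTL of 0 means "never expire",
    so an already-expired row must not be written with TTL 0.
    """
    remaining = int((expires_at + grace - now).total_seconds())
    return remaining if remaining > 0 else None


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    stay_ends = now + timedelta(days=30)  # illustrative value only
    print(ttl_seconds(stay_ends, now))
```

Passing `now` in explicitly keeps the calculation deterministic and easy to test.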