Cassandra Evolution at Expedia
Adoption timeline
  2011
  •   Q2 Cassandra 0.6 live for First Pricing index
  •   Q3 Cassandra 0.7 live for Session state cache
  •   Q3 Cassandra 0.7 for high speed transaction logging
  •   Q4 Cassandra Splunk integration complete for put/get
  •   Q4 Cassandra Web Portal and RESTful interface available

  2012
  •   Q2 DataStax upgrade for first pricing index & enable SOLR
  •   Q2 DataStax live for Second Pricing index & enable SOLR
  •   Q2 DataStax live for high speed transaction logging
  •   Q2 Onboarding new markets to indexes
  •   Q2 DataStax PoC for Content cache
  •   Q3 Upgrade Web Portal to use SOLR and Hadoop
      functionality
Pricing Index Use Case
Pricing Index Ring Solution
Current Solution
• 9 nodes split across 3 racks running Cassandra 1.0
• There is a separate 2 node App tier written in Java that accesses data
  in CFS
• On top of that is a 4 node SOLR tier for search

Next Gen 2Q12 solution
• DataStax upgrade and elimination of app and SOLR nodes
• Given this change we can onboard two new indexes and all major
  markets
• New ring is 27 nodes in one physical datacenter and will expand to
  54 nodes in June split across two datacenters
• Expanding from initial small market set to more markets, price
  elements and options
• Using DataStax solution is showing positive results in the lab and
  shaving seconds off of user search times
High Speed Logging Use Case
Logging Ring Solution
Current Solution
• 40 high end MSSQL database servers on 160TB of Tier 1 SAN, holding 14
  days of online information with no redundancy or replication
• 5 Reporting archive MSSQL databases that keep 30 day roll ups of
  the 40 databases for trending
• Need to know SQL to pull any data or reports; multiple db
  developers work on the team

Next Gen 2Q12 solution
• DataStax deployment of 72 commodity nodes across 2 physical
  datacenters with replication
• 1/3 of nodes will have Hadoop enabled for reporting and will be
  available via web service for non-SQL business personnel
• 90 day retention window of raw data
• Cost Savings $1.5M in 2012
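
The sizing figures on this slide can be worked through with a little arithmetic. The sketch below is illustrative only: the node count, datacenter count, Hadoop share and retention windows come from the slide, while the even split of nodes between the two datacenters and the function name `logging_ring_layout` are assumptions made for the example.

```python
from fractions import Fraction

TOTAL_NODES = 72                 # from the slide
DATACENTERS = 2                  # from the slide
HADOOP_SHARE = Fraction(1, 3)    # "1/3 of nodes will have Hadoop enabled"
OLD_RETENTION_DAYS = 14          # current MSSQL solution
NEW_RETENTION_DAYS = 90          # next gen solution


def logging_ring_layout(total_nodes, datacenters, hadoop_share):
    """Break a ring down into Hadoop-enabled and remaining nodes."""
    hadoop_nodes = int(total_nodes * hadoop_share)
    return {
        "hadoop_nodes": hadoop_nodes,
        "other_nodes": total_nodes - hadoop_nodes,
        # Assumption: nodes are split evenly across datacenters.
        # The slide does not state the actual split.
        "nodes_per_datacenter": total_nodes // datacenters,
    }


layout = logging_ring_layout(TOTAL_NODES, DATACENTERS, HADOOP_SHARE)
retention_gain_days = NEW_RETENTION_DAYS - OLD_RETENTION_DAYS

print(layout)
print(f"Extra days of raw data online: {retention_gain_days}")
```

Under these assumptions, 24 of the 72 nodes would run Hadoop for reporting, and the raw data window grows by 76 days over the current 14.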
Logging Ring Architecture
[Diagram: logging ring spanning Data Center 1 and Data Center 2]
Access Architecture
Summary
• Current footprint 132 nodes production, 27 nodes dev, 24
  nodes PoC
• Expedia is investing heavily in Cassandra and specifically
  DataStax solution to optimize user experience
• Heavily leveraging Hadoop and SOLR integrations to get rapid
  value from solutions
• Using DataStax is shaving seconds off of search times which
  equates to significantly better customer experience
• Continuing to look at new use cases and expect to have 240
  nodes of DataStax in production by Q4 2012
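
As a rough check on the footprint numbers, the short sketch below totals the node counts quoted in this summary and works out how many production nodes would need to be added to reach the stated target. The variable names are hypothetical; only the counts come from the slide.

```python
# Node counts quoted in the Summary slide.
footprint = {"production": 132, "dev": 27, "poc": 24}
TARGET_PRODUCTION = 240  # expected production nodes by Q4 2012 (from the slide)

total_nodes = sum(footprint.values())
additional_production = TARGET_PRODUCTION - footprint["production"]
growth_pct = round(100 * additional_production / footprint["production"], 1)

print(f"Total nodes today: {total_nodes}")
print(f"Production nodes still to add: {additional_production}")
print(f"Production growth: {growth_pct}%")
```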
Questions?
