Apache Cassandra at Target: 
Pioneering NoSQL 
in a Big Enterprise 
Dan Cundiff (@pmotch) 
Target
Context 
● Target’s API platform 
● mostly REST APIs 
● e.g. products, locations, inventory, etc. 
● consumers inside and outside of Target 
● wide variety of providing systems (legacy, in-house 
built, SaaS, packages, etc.)
Problems we needed to solve 
● slow providing systems 
● cost prohibitive to call directly 
● unable to scale from increased demand 
● need a place to aggregate data from multiple 
systems 
● some data wasn’t even in a database to 
begin with!
Barriers with existing tools, part 1 
● cost too much 
● process for traditional DBs wasn’t a fit 
● too few tools/vendors
Barriers with existing tools, part 2 
● RDBMS isn’t: 
○ distributed (multi-tenant) 
○ close to Guests (geographic distribution) 
○ distributed across our data centers 
○ distributed to the cloud!
Barriers with existing tools, part 3 
● lack of performance control 
○ process, not owning the full stack, little flexibility 
on changes like indexing, etc. 
● availability 
○ systems before had outages, downtime, 
etc. 
● not automate-able
Discovering the solution
Taking the idea back 
● I just went and talked to Pete and we 
decided to do it! 
● tried other things in the past 
● show results by trying; succeed or fail fast
Reasons trying was attractive, part 1 
● fit 80% of our need 
● years in development 
● rich C* dev ecosystem
Reasons trying was attractive, part 2 
● google-able 
● strong community 
● a company who would support it
Reasons trying was attractive, part 3 
● chef-able 
● aligned well with existing investments 
● simple pricing model
Barriers to adoption 
● enterprise IT; the nature of it 
● selling it 
● NoSQL for the first time 
● automation (was happening at the time; 
scary to do) 
● political
Challenges integrating 
● bulk loading data 
● keeping Cassandra in sync 
● many systems not event driven 
● packaged software 
● limited ways to integrate with providing 
systems
Challenges of standing it up, part 1 
● early distributed system (new to teams) 
● needed local disk (always used SAN before) 
● needed SSDs (always used spinning things) 
● existing config conflicts (backups, 
monitoring, RAID, swap, etc.) 
● use right sized server (don’t settle for what 
your infra friends give you by default)
Challenges of standing it up, part 2 
● full stack ownership 
● it’s new, don’t hand it off 
● support response is quick because we own it 
● you’re closest to the problem; you’re best 
suited to solve it 
● tuned to meet the needs of our APIs 
● data is modeled for API performance gains
Challenges of standing it up, part 3 
● skills supply is low (but getting better) 
● train your people 
● be wary of promises from consultants 
○ grill them on what they claim to know
Challenges of development, part 1 
● skills ramp up (data modeling, DataStax 
driver, etc.) 
● developers need to care 
○ encourage tweaking, research, make 
things better 
○ clients are equally important for getting the 
most out of C*
Challenges of development, part 2 
● mind shift from RDBMS 
● started with Astyanax; switched to DataStax 
driver 
○ DataStax supported 
○ newer features
Ops challenges, part 1 
● lots of machines; don’t config by hand 
● wrote Chef cookbooks 
● support people saw these odd servers and 
turned on things we disabled (like swap) 
● can’t use “legacy” testing; Cassandra works 
differently, so we do chaos testing (turn off gossip, 
Thrift, etc.)
Ops challenges, part 2 
● made logging awesome; we can see 
anything 
● used the C* JMX interface to send data to Splunk 
in real time 
● can correlate these events with the app tier 
(because app logs are in Splunk too!)
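The JMX-to-Splunk idea above can be sketched roughly as follows. This is a minimal illustration, not the pipeline Target ran: the class name `JmxLogLine`, its `read` and `format` helpers, and the `source=` field are hypothetical, and the local JVM's standard `java.lang:type=Memory` MBean stands in for a Cassandra node. The sketch reads one attribute over JMX and prints it as a key=value line that a log forwarder could pick up.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Hypothetical helper: not part of Cassandra, Splunk, or any library.
class JmxLogLine {

    /** Reads one attribute from an MBean; composite values are flattened by key. */
    public static Map<String, Object> read(
            MBeanServerConnection conn, String mbean, String attribute) throws Exception {
        Object value = conn.getAttribute(new ObjectName(mbean), attribute);
        Map<String, Object> fields = new LinkedHashMap<>();
        if (value instanceof CompositeData) {
            CompositeData data = (CompositeData) value;
            for (String key : data.getCompositeType().keySet()) {
                fields.put(attribute + "." + key, data.get(key));
            }
        } else {
            fields.put(attribute, value);
        }
        return fields;
    }

    /** Formats the fields as a single key=value log line. */
    public static String format(String source, Map<String, Object> fields) {
        StringBuilder line = new StringBuilder("source=").append(source);
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            line.append(' ').append(e.getKey()).append('=').append(e.getValue());
        }
        return line.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the local JVM stands in for a Cassandra node. A remote node
        // would be reached through a JMXConnector rather than the platform server.
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        Map<String, Object> heap = read(conn, "java.lang:type=Memory", "HeapMemoryUsage");
        System.out.println(format("jvm-memory", heap));
    }
}
```

Because `read` takes any `MBeanServerConnection` and object name, the same helper could be pointed at a node's own MBeans, and a shared timestamp or host field would be what makes correlation with app-tier logs possible.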
Ops challenges, part 3 
● useful MBeans: 
○ heap usage 
○ specific read/write latencies 
○ dropped reads/writes 
○ bloom filter ratios 
○ column count, size
Ops challenges, part 4 
● more useful MBeans: 
○ SSTables per read 
○ tombstones 
○ cache hits and ratios 
○ misbehaving queries (range slice)
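For the hit and ratio metrics listed above, one common monitoring choice is to compute the ratio over the interval between two samples rather than over the node's lifetime, so that a recent drop is not hidden by history. The sketch below assumes two monotonically increasing counters (hits and requests); the names `IntervalRatio`, `Sample`, and `hitRatio` are hypothetical and do not correspond to Cassandra attribute names.

```java
// Hypothetical helper: illustrates windowed ratios, not a Cassandra API.
final class IntervalRatio {

    /** One reading of two monotonically increasing counters. */
    static final class Sample {
        final long hits;
        final long requests;

        Sample(long hits, long requests) {
            this.hits = hits;
            this.requests = requests;
        }
    }

    /** Hit ratio over the window between two samples; 0.0 when no requests arrived. */
    public static double hitRatio(Sample earlier, Sample later) {
        long requests = later.requests - earlier.requests;
        long hits = later.hits - earlier.hits;
        return requests <= 0 ? 0.0 : (double) hits / requests;
    }

    public static void main(String[] args) {
        Sample earlier = new Sample(100, 200);
        Sample later = new Sample(190, 300);
        System.out.println(hitRatio(earlier, later));
    }
}
```

With these made-up numbers the lifetime ratio at the second sample is 190/300 (about 0.63), while the window between the samples saw 90 hits in 100 requests, which is the figure worth alerting on.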
Open source cookbook! 
● https://github.com/target/dse-cookbook 
● by Danny Parker 
● pull requests encouraged
Blog post on tuning 
● http://target.github.io/infrastructure/tuning-cassandra/ 
● by Danny Parker (@dcparker88)
Results, part 1 
● from n00bs to production ready = 2 months! 
○ infra, operation testing, app dev, and 
deployed! 
○ just in time before peak season 
● today our highest volume APIs depend on it
Results, part 2 
● growth (↑ functions + ↑ volume) = ~2000% 
● increased adoption of our APIs 
● C* unlocking things we couldn't do before 
● quick changes possible 
○ makes Agile possible 
○ gets us close to continuous delivery
Results, part 3 
● other teams are using it; more coming 
● sharing our cookbooks, lessons, etc. 
● opened the door to other distributed systems
Future, part 1 
● Use across more of our APIs 
● Remove remaining spinning disks
Future, part 2 
● move to cloud 
● automate full stack down to infra 
○ scale, quick geo-distribute, flexibility to 
tweak new infra settings, etc.
Future, part 3 
● get better at data modeling designs 
● less bulk loading 
○ remove compaction process overhead 
● weave in Spark, Kafka 
○ more event-based updates
Future, part crazy 
● Docker + Cassandra?
We’re hiring! 
Come talk to us
#CassandraSummit 
Dan Cundiff (@pmotch) 
Danny Parker (@dcparker88) 
Pete Guidarelli (@pguidarelli) 
Heather Mickman (@hmmickman)
Apache Cassandra at Target - Cassandra Summit 2014
