CASSANDRA @ ULTRAVISUAL
Cassandra Day New York 2014
Skye Book
Lead Systems Architect
ULTRAVISUAL
A visual network for
inspiration, expression,
and collaboration
The Feed
• A user’s first taste of UV
• More than just posts
• Constantly being tweaked and re-thought
The Old Way
Started Simple
“Show me recent posts in collections I follow”

SELECT DISTINCT _post.*
FROM _post
JOIN _collection_post cp ON _post.uuid = cp.post_uuid
JOIN _collection_follow cf ON cp.c_uuid = cf.collection_uuid
WHERE cf.user_id = ?
ORDER BY _post.created_at DESC
LIMIT 20 OFFSET 0
The Old Way
Added Complexity
“Show me people recently followed by my connections”

SELECT a.*
FROM _user_follow a, _user_follow b
WHERE b.follower = 12345
  AND a.follower = b.followed
ORDER BY a.followed_at DESC
LIMIT 20 OFFSET 0
The Old Way
Every new feature needs another query

Feed requests generate a disproportionate amount of load compared with normal CRUD ops
Reframing the Problem
From This:
A place for posts, new
collections, social activity, and
anything else interesting
nitro404.com/computers/knex.php
Reframing the Problem
To This:
A list of items interesting to
the user
The New Way
Model First
• With an SQL background, this can be
misleading.
• Essential Question: “How do I need to access
this data?”
The New Way
“Try to model data as a log of user intent”
Rick Branson, Instagram (Cassandra Summit 2013)
The New Way
Model for user feeds

user  status  created_at  story                         json
2     0       61b97280    user_follow:3:5               {"foo":"bar"}
2     1       5daa04c0    post:bfbd0a39                 {"foo":"bar"}
2     1       565752e0    collection_follow:5:d70961c1  {"foo":"bar"}
2     1       4a8189e0    user_follow:3:5               {"foo":"bar"}

Slide labels: Primary Key (key columns), Cached story JSON (json column)

• Fast to fetch user stories
• Cached JSON means almost zero SQL requests
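A minimal sketch of why the cached JSON column avoids SQL lookups: a page of feed items can be assembled from the Cassandra rows alone. The `FeedRow` struct, `parse_story`, and `render_page` are hypothetical names for illustration, and the meaning of the ids inside a story key (for example the `3:5` in `user_follow:3:5`) is not stated on the slide.

```ruby
require "json"

# Hypothetical row shape mirroring the columns in the feed table.
FeedRow = Struct.new(:user, :status, :created_at, :story, :json)

# Split a story key such as "user_follow:3:5" into a type and its ids.
def parse_story(story)
  type, *ids = story.split(":")
  { type: type, ids: ids }
end

# Build one page of feed items from the cached JSON alone, with no SQL lookups.
def render_page(rows, limit: 20)
  rows.first(limit).map do |row|
    parse_story(row.story).merge(payload: JSON.parse(row.json))
  end
end

rows = [
  FeedRow.new(2, 1, "5daa04c0", "post:bfbd0a39", '{"foo":"bar"}'),
  FeedRow.new(2, 1, "565752e0", "collection_follow:5:d70961c1", '{"foo":"bar"}')
]
page = render_page(rows)
```

In a real service the rows would come from a single read of the user's partition rather than an in-memory array.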
Fast.
Response times cut from the hundreds of milliseconds to the 30 ms range
Launch Week
Featured by Apple!
Cluster Disk Usage (pie chart): 26% / 74%
Don’t be too cute
cqlsh:ultravisual> ALTER TABLE latest_feed DROP json;
Handling Deletions
• Data is only appended, never deleted from user feeds
• Adapted Instagram’s ‘Anti-Column’ solution
• Avoids missed deletions for nodes down longer than GCGraceSeconds
• Avoids race condition where deletion arrives before write
Sam follows Sandy

user  created_at  status  story
2     4a8189e0    1       user_follow:3:5

Sam unfollows Sandy

user  created_at  status  story
2     61b97280    0       user_follow:3:5
2     4a8189e0    1       user_follow:3:5
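The follow/unfollow example can be resolved at read time roughly as sketched here. This assumes status 1 marks a live entry and status 0 a negation (as the two tables suggest), that rows arrive newest first, and that the newest entry for a story key decides whether it is shown. `Entry` and `visible_stories` are hypothetical names, not the production code.

```ruby
# Hypothetical entry shape matching the columns in the example tables.
Entry = Struct.new(:user, :created_at, :status, :story)

LIVE = 1     # assumed meaning of status 1
NEGATED = 0  # assumed meaning of status 0

# rows must be ordered newest first; the first entry seen for a story wins.
def visible_stories(rows)
  seen = {}
  rows.each_with_object([]) do |row, out|
    next if seen.key?(row.story)
    seen[row.story] = true
    out << row if row.status == LIVE
  end
end

follow   = Entry.new(2, "4a8189e0", LIVE, "user_follow:3:5")
unfollow = Entry.new(2, "61b97280", NEGATED, "user_follow:3:5")

after_follow   = visible_stories([follow])
after_unfollow = visible_stories([unfollow, follow])
```

Because nothing is deleted, the result does not depend on which of the two writes reached a replica first.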
Negated Entries

Layout 1: created_at before status

user  created_at  status  story
2     61b97280    0       user_follow:3:5
2     4a8189e0    1       user_follow:3:5

Keeps all entries in a single time series
First page can usually be populated by a single read

Layout 2: status before created_at

user  status  created_at  story
2     0       61b97280    user_follow:3:5
2     1       4a8189e0    user_follow:3:5

Splits user’s row into two lists, live and undo
Will always require at least two reads
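To illustrate the "at least two reads" cost of the layout that puts status before created_at, here is a small in-memory stand-in. `TwoListFeed` and its methods are hypothetical, and the sketch simplifies by letting any undo entry cancel every live entry with the same story key (handling a re-follow after an unfollow would also need a time comparison).

```ruby
# In-memory stand-in for a table whose rows are grouped by status first.
class TwoListFeed
  attr_reader :reads

  def initialize(rows)
    @rows = rows
    @reads = 0
  end

  # One contiguous slice per (user, status) pair, so one read each.
  def slice(user, status)
    @reads += 1
    @rows.select { |r| r[:user] == user && r[:status] == status }
  end

  # A page needs the undo list and the live list: two reads minimum.
  def page(user)
    undone = slice(user, 0).map { |r| r[:story] }
    slice(user, 1).reject { |r| undone.include?(r[:story]) }
  end
end

feed = TwoListFeed.new([
  { user: 2, status: 0, created_at: "61b97280", story: "user_follow:3:5" },
  { user: 2, status: 1, created_at: "5daa04c0", story: "post:bfbd0a39" },
  { user: 2, status: 1, created_at: "4a8189e0", story: "user_follow:3:5" }
])
stories = feed.page(2).map { |r| r[:story] }
```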
Further Uses
• User Notifications
• User Onboarding
• Reshare Statistics
• User & Content Reports
• API Statistics
User Onboarding
Sequenced feed entries for users on signup

user  created_at  sequence      step  content
2     61b97280    onboaring_v2  1     rec_collections_1
3     5daa04c0    onboaring_v2  2     rec_collections_2
5     565752e0    onboaring_v3  1     find_friends
6     4a8189e0    onboaring_v3  1     find_friends
Production Experiences
Drivers
• Java: Started with Astyanax, moved to the DataStax driver v2
• Node.js: node-cassandra-cql
DS Driver Issue 229
Cryptic message with large batch updates in pre-release versions of the 2.0 driver:

com.datastax.driver.core.exceptions.DriverInternalError: An unexpected protocol error occured. This is a bug in this library, please report: Unknown code 256 for a consistency level

As of 2.0, batches with more than 64k statements throw a better exception:

java.lang.IllegalStateException: Batch statement cannot contain more than 65536 statements.
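One way to stay clear of the batch limit is to split a large update into smaller batches before sending. The sketch below only shows the chunking; `CHUNK_SIZE` and `in_batches` are hypothetical, and the chunk size of 500 is an arbitrary illustration rather than a recommendation from the talk.

```ruby
# Limit quoted in the exception message above.
MAX_BATCH_STATEMENTS = 65_536
# Hypothetical, deliberately small chunk size chosen for illustration only.
CHUNK_SIZE = 500

# Split a list of statements into arrays no longer than `size`.
def in_batches(statements, size: CHUNK_SIZE)
  unless size.between?(1, MAX_BATCH_STATEMENTS)
    raise ArgumentError, "size must be between 1 and #{MAX_BATCH_STATEMENTS}"
  end
  statements.each_slice(size).to_a
end

statements = (1..1200).map { |i| "statement #{i}" }
batches = in_batches(statements)
```

Each inner array would then be sent as its own batch by whichever driver is in use.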
Compression
Just use LZ4
Cassandra-4851
Unfortunate truth in Cassandra 2.0.5

cqlsh:test> SELECT *
            FROM user_feed
            WHERE user = 2
              AND created_at > :some_uuid
              AND status = 0;

Bad Request: PRIMARY KEY part status cannot be restricted (preceding part created_at is either not restricted or by a non-EQ relation)
Cassandra-4851
Adds CQL3 support for vector comparison syntax

cqlsh:test> SELECT *
            FROM timeline
            WHERE day = '21 Jun 2014'
              AND (hour, min) >= (3, 50)
              AND (hour, min, sec) <= (4, 37, 30);

Available in 2.0.6
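The vector comparison compares the columns left to right, like comparing tuples. A rough Ruby analogue of the query's two bounds, using `Array#<=>` for the element-by-element comparison; the sample rows and the helper names `tuple_gte?` and `tuple_lte?` are made up for illustration.

```ruby
# Element-by-element tuple comparison, as Array#<=> performs it.
def tuple_gte?(a, b)
  (a <=> b) >= 0
end

def tuple_lte?(a, b)
  (a <=> b) <= 0
end

# Made-up rows for a single day partition.
rows = [
  { hour: 3, min: 49, sec: 59 },
  { hour: 3, min: 50, sec: 0 },
  { hour: 4, min: 37, sec: 30 },
  { hour: 4, min: 37, sec: 31 },
  { hour: 4, min: 0,  sec: 59 }
]

# Same bounds as the query: (hour, min) >= (3, 50) AND (hour, min, sec) <= (4, 37, 30)
selected = rows.select do |r|
  tuple_gte?([r[:hour], r[:min]], [3, 50]) &&
    tuple_lte?([r[:hour], r[:min], r[:sec]], [4, 37, 30])
end
times = selected.map { |r| [r[:hour], r[:min], r[:sec]] }
```

Note that 4:00:59 is kept even though its seconds exceed 30, because the comparison is decided by the first column that differs.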
Production Experiences
Upgrades
• Manual package installs (dsc20 from Datastax)
• One node at a time
• Upgrade, wait for healthy status & operations, move on
• OpsCenter provides good overview
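The one-node-at-a-time procedure can be sketched as a loop that refuses to move on until the current node reports healthy. Everything here is hypothetical scaffolding: `rolling_upgrade`, the `upgrade` and `healthy` callables, and the check limit stand in for the package install and whatever health check is actually used.

```ruby
# Upgrade nodes strictly in sequence, waiting for each to become healthy.
# `upgrade` and `healthy` are caller-supplied callables (hypothetical hooks).
def rolling_upgrade(nodes, upgrade:, healthy:, max_checks: 30, pause: 0)
  nodes.each do |node|
    upgrade.call(node)
    checks = 0
    until healthy.call(node)
      checks += 1
      raise "#{node} did not become healthy" if checks >= max_checks
      sleep(pause)
    end
  end
end

# Simulated run: each node reports healthy on its second check.
log = []
attempts = Hash.new(0)
rolling_upgrade(
  %w[node-a node-b],
  upgrade: ->(n) { log << "upgrade #{n}" },
  healthy: lambda do |n|
    attempts[n] += 1
    log << "check #{n}"
    attempts[n] >= 2
  end
)
```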
Production Experiences
Speaking of OpsCenter…
• Don’t be alarmed if nodes appear but agent data does not
• opscenterd often needs a restart after cluster upgrade to see agents again
Production Experiences
Service Discovery
• Running on AWS using Ec2MultiRegionSnitch
• Using OpsWorks (Amazon’s Chef service) for seed config
Chef Cookbook
github.com/skyebook/cassandra-opsworks-chef-cookbook
• Forked from Michael Klishin’s awesome C* cookbook
• Added integration with OpsWorks’ stack.json
# Assumed starting point: an empty list (the initialization is not shown on the slide)
seed_array = []

# Add this node as the first seed
# If using the multi-region snitch, we must use the public IP address
if node["cassandra"]["snitch"] == "Ec2MultiRegionSnitch"
  seed_array << node["opsworks"]["instance"]["ip"]
else
  seed_array << node["opsworks"]["instance"]["private_ip"]
end

# Add every instance in the OpsWorks "cassandra" layer
node["opsworks"]["layers"]["cassandra"]["instances"].each do |instance_name, values|
  if node["cassandra"]["snitch"] == "Ec2MultiRegionSnitch"
    seed_array << values["ip"]
  else
    seed_array << values["private_ip"]
  end
end

set[:cassandra][:seeds] = seed_array
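The same seed selection can be tried outside Chef with a plain hash standing in for the node attributes. This is a sketch, not the cookbook's code: `build_seeds`, the sample instance names, and the IP addresses are made up, and the `uniq` call is an addition in case the current instance also appears in the layer's instance list.

```ruby
# Pick public or private addresses depending on the snitch, as in the recipe.
def build_seeds(node)
  key = node["cassandra"]["snitch"] == "Ec2MultiRegionSnitch" ? "ip" : "private_ip"
  seeds = [node["opsworks"]["instance"][key]]
  node["opsworks"]["layers"]["cassandra"]["instances"].each_value do |values|
    seeds << values[key]
  end
  seeds.uniq # added here to drop a repeated entry for the current instance
end

# Made-up attributes shaped like the keys the recipe reads.
sample_node = {
  "cassandra" => { "snitch" => "Ec2MultiRegionSnitch" },
  "opsworks" => {
    "instance" => { "ip" => "203.0.113.10", "private_ip" => "10.0.0.10" },
    "layers" => {
      "cassandra" => {
        "instances" => {
          "cassandra1" => { "ip" => "203.0.113.10", "private_ip" => "10.0.0.10" },
          "cassandra2" => { "ip" => "203.0.113.11", "private_ip" => "10.0.0.11" }
        }
      }
    }
  }
}

public_seeds  = build_seeds(sample_node)
private_seeds = build_seeds(sample_node.merge("cassandra" => { "snitch" => "Ec2Snitch" }))
```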
Questions

Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual
