What We Learned About Cassandra While Building go90 (Christopher Webster & Thomas Ng, AOL) | C* Summit 2016

What we learned about Cassandra while building go90
Chris Webster
Thomas Ng

1 What is go90 ?
2 What do we use Cassandra for ?
3 Lessons learned
4 Q and A
© DataStax, All Rights Reserved.

What is go90 ?
• Mobile video entertainment platform
• On demand original content
• Live events (NBA / NFL / Soccer / Reality Show / Concerts)
• Interactive and Social

What do we use Cassandra for ?
• User metadata storage and search
• Schema evolution
• DSE Cassandra/Solr integration
• Comments
• Time series data
• Complex pagination
• Counters
• Resume point
• Expiration (TTL)

What do we use Cassandra for ?
• Activity / Feed
• Activity aggregation
• Fan-out to followers
• User accounts/rights
• Service management
• Content discovery

go90 Cassandra setup
• DSE 4.8.4
• Cassandra 2.1.12.1046
• Java driver version 2.10
• Native Protocol v3
• Java 8
• Running on Amazon Web Services EC2
• c3/4 4xlarge instances
• Mission critical service on own cluster
• Shared cluster for others
• Ephemeral SSD and encrypted EBS

Schema evolution
• Use case: Add new column to table schema
• Existing user profile table:
• Primary key: pid (UUID)
• Columns: lastName, firstName, gender, lastModified
• Deployed and running in production
• Look up user info with a prepared statement:
• Query: select * from user_profile where pid = ?;
• Add new column for imageUrl
• Service code change to extract new column from ResultSet in existing query above
• Apply schema change to production server
• alter table user_profile add imageurl varchar;
• Deploy new service
• No down time at all !?

Avoid SELECT * !
• Prepared statements running on the existing service with the old schema might start to fail as soon as the new column is added:
• Java driver could throw InvalidTypeException at runtime when it tries to deserialize the ResultSet
• Cassandra’s cache of prepared statements could go out of sync with the new table schema
• https://support.datastax.com/hc/en-us/articles/209573086-Java-driver-queries-result-in-InvalidTypeException-Not-enough-bytes-to-deserialize-type-
• Always explicitly specify the fields you need in your SELECT query:
• Predictable result
• Avoid down time during schema change
• More data efficient - only get what you need
• Query: select lastName, firstName, imageUrl from user_profile where pid = ?;
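The failure mode above can be pictured with a toy model. The sketch below is Python, chosen only so it runs without a cluster. `encode_row` and `decode_row` are hypothetical helpers, not Java driver or Cassandra APIs, and the wire format (length-prefixed values in positional order) is a simplifying assumption. It also assumes that `SELECT *` returns the regular columns in alphabetical order once the new column exists, while the client still decodes with the column metadata it cached at prepare time.

```python
import struct

def encode_row(values):
    """Serialize values positionally as [4-byte length][bytes]; no column names travel with the row."""
    out = b""
    for v in values:
        raw = struct.pack(">q", v) if isinstance(v, int) else v.encode("utf-8")
        out += struct.pack(">i", len(raw)) + raw
    return out

def decode_row(buf, metadata):
    """Decode a positional row using the (name, type) list cached when the statement was prepared."""
    row, pos = {}, 0
    for name, typ in metadata:
        (n,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        raw = buf[pos:pos + n]
        pos += n
        if typ == "bigint":
            if len(raw) != 8:
                raise ValueError(f"cannot deserialize {name}: expected 8 bytes, got {len(raw)}")
            row[name] = struct.unpack(">q", raw)[0]
        else:
            row[name] = raw.decode("utf-8")
    return row

# Metadata cached for "select *" BEFORE the alter table (hypothetical values).
STALE_STAR = [("pid", "text"), ("firstname", "text"), ("gender", "text"),
              ("lastmodified", "bigint"), ("lastname", "text")]
# Row returned for "select *" AFTER the alter table: imageurl now sits before lastmodified.
row_after_alter = encode_row(["some-uuid", "Ann", "F", "http://img/1.png", 1470090047166, "Lee"])

# An explicit column list keeps the response shape fixed across the schema change.
EXPLICIT = [("lastname", "text"), ("firstname", "text")]
explicit_row = encode_row(["Lee", "Ann"])

if __name__ == "__main__":
    print(decode_row(explicit_row, EXPLICIT))
    try:
        decode_row(row_after_alter, STALE_STAR)
    except ValueError as err:
        print("select * with stale metadata failed:", err)
```

In this model the explicit query keeps decoding correctly, while the `select *` row is read with shifted columns and fails on the first fixed-width type it hits.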

Data modeling with time series data
• Use case:
• Look up latest comments (timestamp descending) on a video id, paginated
• Create schema based on the query you need
• Make use of clustering order to do the sorting for you!
• Make sure your pagination code covers each clustering key
• Different people could comment on a video at the same timestamp!
• Or make use of automatic paging support in Java driver

Time series data example
Video id      timestamp      User id  Comment
va_therunner  1470090047166  user_t   this is a comment string
va_therunner  1470090031702  user_z   Hi there
va_therunner  1470090031702  user_t   Yo
va_therunner  1470090031702  user_a   Love it!
va_tagged     1458951942903  user_b   tagged
va_tagged     1458951902463  user_x   go90
va_guidance   1470090031702  user_v   whodunit
CREATE TABLE IF NOT EXISTS comments (
    videoid varchar,
    timestamp bigint,
    userid varchar,
    comment varchar,
    PRIMARY KEY (videoid, timestamp, userid)
) WITH CLUSTERING ORDER BY (timestamp DESC, userid DESC);

Pagination example
Video id      timestamp      User id  Comment
va_therunner  1470090047166  user_t   this is a comment string
va_therunner  1470090031702  user_z   Hi there
va_therunner  1470090031702  user_t   Yo
va_therunner  1470090031702  user_a   Love it!
va_therunner  1458951942903  user_b   tagged
va_tagged     1458951902463  user_x   go90
va_guidance   1470090031702  user_v   whodunit
// start pagination through the comments table
select timestamp, userid, comment from comments where videoid = 'va_therunner' limit 3;
> Returns the first 3 rows
// incorrect second call: filters on timestamp only
select timestamp, userid, comment from comments where videoid = 'va_therunner' AND timestamp < 1470090031702 limit 3;
> Returns the “tagged” comment; the “Love it!” comment is skipped
// need to paginate on the clustering column userid too
select timestamp, userid, comment from comments where videoid = 'va_therunner' AND timestamp = 1470090031702 AND userid < 'user_t' limit 3;
> Returns “Love it!”
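The difference between the two follow-up queries can be checked with a small in-memory model. This is a Python sketch, not driver code: `naive_page`, `keyset_page` and `read_all` are hypothetical names, and the list stands in for the single `va_therunner` partition, already sorted in clustering order (timestamp DESC, userid DESC).

```python
# One partition, in clustering order: timestamp DESC, userid DESC.
ROWS = [
    (1470090047166, "user_t", "this is a comment string"),
    (1470090031702, "user_z", "Hi there"),
    (1470090031702, "user_t", "Yo"),
    (1470090031702, "user_a", "Love it!"),
    (1458951942903, "user_b", "tagged"),
]

def naive_page(rows, last, limit):
    """Resume on timestamp only: drops rows that share the last seen timestamp."""
    if last is None:
        return rows[:limit]
    return [r for r in rows if r[0] < last[0]][:limit]

def keyset_page(rows, last, limit):
    """Resume on the full clustering key (timestamp, userid)."""
    if last is None:
        return rows[:limit]
    # Tuple comparison: earlier timestamp, or same timestamp and smaller userid.
    return [r for r in rows if (r[0], r[1]) < last][:limit]

def read_all(page_fn, rows, limit):
    """Page until an empty page comes back, remembering the last clustering key seen."""
    seen, last = [], None
    while True:
        page = page_fn(rows, last, limit)
        if not page:
            return seen
        seen.extend(page)
        last = (page[-1][0], page[-1][1])

if __name__ == "__main__":
    print([c for _, _, c in read_all(naive_page, ROWS, 3)])
    print([c for _, _, c in read_all(keyset_page, ROWS, 3)])
```

With a page size of 3, the timestamp-only version never returns "Love it!", while the version that resumes on both clustering columns returns every row exactly once.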

Counters
• Use case:
• Display total number of comments for each video asset
• Avoid select count (*)!
• Built in support for synchronized concurrent access
• Use a separate table for all counters (separate from original metadata)
• Cannot add counter column to non-counter column family
• Sometimes counter values can get out of sync
• http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters
• Background job at night to count the table and adjust counter values if needed
• Counters cannot be safely deleted
• Once deleted, you will not be able to use the same counter for some time (undefined state)
• Workaround: read the value and add its negative (not safe under concurrent updates)
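The nightly reconciliation job can be sketched as follows. This is a Python illustration under stated assumptions: `counter_adjustments` is a hypothetical helper, the inputs stand in for a full scan of the comments table and a read of the counter table, and the returned deltas are what the job would then apply as counter increments or decrements.

```python
from collections import Counter

def counter_adjustments(comment_video_ids, counters):
    """Return {videoid: delta} for every counter that disagrees with the real row count.

    comment_video_ids: one videoid per comment row (stand-in for scanning the comments table).
    counters: {videoid: current counter value} (stand-in for reading the counter table).
    """
    actual = Counter(comment_video_ids)
    deltas = {}
    for vid in set(actual) | set(counters):
        delta = actual.get(vid, 0) - counters.get(vid, 0)
        if delta:
            deltas[vid] = delta  # positive: counter is too low; negative: counter is too high
    return deltas

if __name__ == "__main__":
    scanned = ["va_therunner", "va_therunner", "va_tagged"]
    stored = {"va_therunner": 3, "va_tagged": 1, "va_guidance": 2}
    print(counter_adjustments(scanned, stored))
```

Applying a delta rather than overwriting the value matches how counters are updated (increment or decrement only), but comments written between the scan and the adjustment can still skew the result, so this is best run during quiet hours.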

Make use of TTL and DTCS !
• Use case:
• Storing resume points for every user, and every video they watched
• Lookup what is recently watched by a user
• Problem:
• This can grow fast and might not be scalable! (Why store the resume point for a person who only watches one video and leaves?)
• Solution:
• For resume points and watch history, insert with TTL of 30 days.
• Combine it with DateTieredCompactionStrategy (DTCS)
• Best fit: time series fact data, delete by TTL
• Helps Cassandra drop expired data (sstables on disk) effectively by grouping data into sstables by timestamp.
• Can drop whole sstables at once
• Less disk read means faster read time
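Why grouping by write time makes expiry cheap can be shown with a toy model. The Python sketch below is a deliberate simplification: it uses fixed-size time windows, whereas DTCS itself groups data into windows of growing size, and `bucket_by_window` and `droppable_windows` are hypothetical names. Times are in days to match the 30 day TTL.

```python
def bucket_by_window(writes, window):
    """Group (write_time, key) pairs into 'sstables' keyed by time window index."""
    tables = {}
    for ts, key in writes:
        tables.setdefault(ts // window, []).append((ts, key))
    return tables

def droppable_windows(tables, ttl, now):
    """A whole table can be dropped once even its newest row has outlived the TTL."""
    return sorted(w for w, rows in tables.items()
                  if max(ts for ts, _ in rows) + ttl <= now)

if __name__ == "__main__":
    # Resume points written on days 1, 2, 40 and 41; weekly windows; 30 day TTL.
    writes = [(1, "user_a:video_1"), (2, "user_b:video_2"),
              (40, "user_a:video_3"), (41, "user_c:video_1")]
    tables = bucket_by_window(writes, window=7)
    print(droppable_windows(tables, ttl=30, now=45))
```

Because old and new rows never share a table in this model, expired data disappears by deleting whole files instead of rewriting them row by row.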

Avoid deletes (tombstones)
• Use case:
• Activity feed with aggregation support
• Problem:
• How to group similar activity into one and not show duplicates ?
• User follows DreamWorksTV and Sabrina
• They publish a new episode for the same series (Songs that stick) at the same time
• In user’s feed, we want to show one combined event instead of 2 duplicate events
• Feed read needs to be fast – first screen in 1.0 app!

First solution
• Two separate tables
• Feed table: primary key on (userID, timestamp). Always contains the aggregated final view of a user’s feed. Lookup is a simple read query on the user id => fast.
• Aggregation table: primary key (userID, targetID). For each key, we store the current activity written to the feed with its timestamp.
• Feed update is done async on a background job, which involves:
• Read aggregation table to see if there is a previous entry
• Update aggregation table (either insert or update)
• Update feed table, which can be an insert if there is no previous entry, or a delete to remove the previous entry followed by an insert of the new aggregated entry.
• Feed update is expensive, but is done asynchronously
• Feed read is fast since it is a simple read
• It works - ship it!
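The update steps of this first design can be sketched in a few lines. This is a Python model, not the production code: the two dicts stand in for the feed and aggregation tables, `update_feed` is a hypothetical name, and the `tombstones` list simply records every delete the design issues.

```python
def update_feed(feed, aggregation, tombstones, user, actor, target, ts):
    """Apply one activity: read aggregation, update it, then delete and re-insert the feed row."""
    previous = aggregation.get((user, target))      # read aggregation table
    actors = [actor]
    if previous is not None:
        prev_ts, prev_actors = previous
        actors = prev_actors + [a for a in actors if a not in prev_actors]
        del feed[(user, prev_ts)]                   # delete the previous feed entry ...
        tombstones.append((user, prev_ts))          # ... which leaves a tombstone behind
    aggregation[(user, target)] = (ts, actors)      # insert or update aggregation row
    feed[(user, ts)] = (target, actors)             # insert the new aggregated feed entry

if __name__ == "__main__":
    feed, aggregation, tombstones = {}, {}, []
    update_feed(feed, aggregation, tombstones, "user_1", "DreamWorksTV", "songs_that_stick", 100)
    update_feed(feed, aggregation, tombstones, "user_1", "Sabrina", "songs_that_stick", 105)
    print(feed)
    print(tombstones)
```

The feed ends up with one combined entry, as intended, but every aggregated activity after the first costs one delete.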

Empty feed
• Field reports of getting empty feed screen
• Can occur at random times

Read timeout and tombstones
• Long compaction is happening and causing read timeout
• Too many delete operations
• Each delete will create a new tombstone
• Too many tombstones cause expensive compaction
• They also significantly slow down read operations because too many tombstones need to be scanned

How to avoid tombstones ?
• Lower gc_grace_seconds so compaction can purge tombstones sooner and reduce their number
• Smaller compaction each time
• Node repair should happen more frequently too (each node should be repaired within gc_grace_seconds, otherwise deleted data can reappear):
• http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
• New data model and algorithm could help too!
• Avoid excessive delete ops if possible!
• Make use of TTL and DTCS
• In our case, we switched to a write-only algorithm:
• Aggregate in memory at read time by reading more entries instead
• 45 days TTL with DTCS
• time series fact data, delete by TTL
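A write-only feed with aggregation at read time might look like the sketch below. It is a Python model under assumptions: `read_feed` is a hypothetical name, the input list stands in for one user's feed partition read newest first, and grouping by target is only one possible aggregation rule.

```python
def read_feed(entries, limit):
    """Aggregate raw feed entries in memory; nothing is ever deleted or rewritten.

    entries: (timestamp, actor, target) tuples, newest first, as read from the feed table.
    Returns at most `limit` aggregated events, ordered by their newest activity.
    """
    grouped = {}  # dicts keep insertion order, so the newest target stays first
    for ts, actor, target in entries:
        event = grouped.setdefault(target, {"target": target, "timestamp": ts, "actors": []})
        if actor not in event["actors"]:
            event["actors"].append(actor)
    return list(grouped.values())[:limit]

if __name__ == "__main__":
    raw = [
        (105, "Sabrina", "songs_that_stick"),
        (100, "DreamWorksTV", "songs_that_stick"),
        (90, "DreamWorksTV", "another_series"),
    ]
    for event in read_feed(raw, limit=10):
        print(event)
```

The read has to fetch more rows than it finally shows, which is the trade-off named above, and old entries leave the table through the TTL rather than through deletes.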

Search: DSE Solr integration
• Real time fuzzy user search
• Zero down time to add this feature to existing production cluster
• Separate small Solr data center dedicated for new search queries only
• Existing queries unchanged
• Writes into existing cluster will be replicated into Solr nodes automatically
[Diagram: App sends requests to the WebService; the WebService sends search requests to Solr and DB queries to C*; C* replicates to Solr]

Solr index disappearing
• When we first set this up, new data written to the original cluster became available for search, but entries then started to disappear after a few minutes.
• Turned out to be a combination of two problems:
• Existing bug in DSE 4.6.9 or earlier: Top deletion may cause unwanted deletes from the index. (DSP-6654)
• In the Solr schema XML: if you are going to index the primary key field in the schema, the field cannot be tokenized. (In our case, we do not need to index the primary key anyway: it is a UUID and no one is going to search with that from the app)
• https://docs.datastax.com/en/datastax_enterprise/4.0/datastax_enterprise/srch/srchConfSkema.html
• We fixed the Solr schema and upgraded to DSE 4.8.4, and all is well!

Upgrade DSE and Java
• Upgrade
• DSE 4.6 to 4.8 (Cassandra 2.0 to 2.1)
• Java 7 to 8
• Benchmarks with cassandra-stress
• https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCStress_t.html
• Findings
• In general, Cassandra 2.1 gives better performance in both read and write.
• We discovered minor peak performance degradation when running with Java 8 and Cassandra 2.1
• http://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/install/installTARdse.html

PV or HVM ?
• Linux Amazon Machine Images (AMI)
• Paravirtual (PV)
• Hardware virtual machine (HVM)
• http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/virtualization_types.html
• HVM gives better performance
• Align with Amazon recommendations
• Cassandra-stress results:
• HVM: ~105K write/s
• PV: ~95K write/s

Storage with EC2
• Ephemeral (internal) vs Elastic block storage (EBS)
• In general, ephemeral gives better performance and is recommended
• Internal disks are physically attached to the instance
• http://www.datastax.com/dev/blog/what-is-the-story-with-aws-storage
• Our mixed mode (read/write) test results:
• Ephemeral: 61K ops rate
• EBS with encryption: 45K ops rate
• But what about when encryption is required ?
• EBS has built-in encryption support
• http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSEncryption.html
• Ephemeral - no native support from AWS, you need to deploy your own solution.
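To put the benchmark numbers from the two slides above on the same scale, the relative difference can be computed directly. The Python sketch below only does arithmetic on the figures quoted in this deck; `relative_gain` is a hypothetical helper name.

```python
def relative_gain(faster, slower):
    """Percentage by which `faster` exceeds `slower`."""
    return (faster - slower) / slower * 100.0

# Figures quoted in the slides (operations per second).
hvm_vs_pv = relative_gain(105_000, 95_000)        # HVM vs PV writes
ephemeral_vs_ebs = relative_gain(61_000, 45_000)  # ephemeral vs encrypted EBS, mixed read/write

if __name__ == "__main__":
    print(f"HVM over PV: {hvm_vs_pv:.1f}%")
    print(f"Ephemeral over encrypted EBS: {ephemeral_vs_ebs:.1f}%")
```

On these numbers HVM is roughly 10.5% ahead of PV, while ephemeral storage is roughly 35.6% ahead of encrypted EBS, so the storage choice is the larger of the two effects.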

Maintenance
• Repairs
• Cron job to schedule repair jobs weekly
• Full repair on each node
• Can take a long time for big clusters to complete a full round
• Looking to move to OpsCenter 6.0.2 with better management interface
• Future:
• Parallel node repairs
• Incremental repairs
• Backups
• Daily backup to S3
• Can only restore data since last backup
• Future: commit log backup for point-in-time restore

Summary
• Avoid SELECT *
• Effective data modeling
• Make use of TTL and DTCS to avoid tombstones!
• Search with SOLR
• https://go90.com
