Building Your First Application with Cassandra
Luke Tillman (@LukeTillman)
Language Evangelist at DataStax
Who are you?!
• Evangelist with a focus on the .NET Community
• Long-time .NET Developer
• Recently presented at Cassandra Summit 2014 with Microsoft
KillrVideo, a Video Sharing Site
• Think a YouTube competitor
– Users add videos, rate them, comment on them, etc.
– Can search for videos by tag
See the Live Demo, Get the Code
• Live demo available at http://www.killrvideo.com
– Written in C#
– Live Demo running in Azure
– Open source: https://github.com/luketillman/killrvideo-csharp
• Interesting use case because of different data modeling challenges and the scale of something like YouTube
– More than 1 billion unique users visit YouTube each month
– 100 hours of video are uploaded to YouTube every minute
1 Think Before You Model
2 A Data Model for Cat Videos
3 Phase 2: Build the Application
4 Software Architecture, A Love Story
5 The Future
Think Before You Model
Or how to keep doing what you’re already doing
Getting to Know Your Data
• What things do I have in the system?
• What are the relationships between them?
• This is your conceptual data model
• You already do this in the RDBMS world
Some of the Entities and Relationships in KillrVideo
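• User: id, firstname, lastname, email, password
• Video: id, name, description, location, preview_image, tags
• Comment: id, comment
• User adds Video (1 to n), with a timestamp
• User posts Comment (1 to n), with a timestamp
• Video features Comment (1 to n)
• User rates Video (n to m), with a rating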
Getting to Know Your Queries
• What are your application’s workflows?
• How will I access the data?
• Knowing your queries in advance is NOT optional
• Different from RDBMS because I can’t just JOIN or create new indexes to support new queries
Some Application Workflows in KillrVideo
• User logs into site
• Show basic information about user
• Show videos added by a user
• Show comments posted by a user
• Search for a video by tag
• Show latest videos added to the site
• Show comments for a video
• Show ratings for a video
• Show video and its details
Some Queries in KillrVideo to Support Workflows
Users
• User logs into site: Find user by email address
• Show basic information about user: Find user by id
Comments
• Show comments for a video: Find comments by video (latest first)
• Show comments posted by a user: Find comments by user (latest first)
Ratings
• Show ratings for a video: Find ratings by video
Videos
• Search for a video by tag: Find video by tag
• Show latest videos added to the site: Find videos by date (latest first)
• Show video and its details: Find video by id
• Show videos added by a user: Find videos by user (latest first)
A Data Model for Cat Videos
Because the Internet loves ‘em some cat videos
Just How Popular are Cats on the Internet?
http://mashable.com/2013/07/08/cats-bacon-rule-internet/
Data Modeling Refresher
• Cassandra limits us to queries that can scale across many nodes
– Include value for Partition Key and optionally, Clustering Column(s)
• We know our queries, so we build tables to answer them
• Denormalize at write time to do as few reads as possible
• Many times we end up with a “table per query”
– Similar to materialized views from the RDBMS world
Users – The Relational Way
• Single Users table with all user data and an Id Primary Key
• Add an index on email address to allow queries by email
• User logs into site: Find user by email address
• Show basic information about user: Find user by id
Users – The Cassandra Way
• User logs into site: Find user by email address
• Show basic information about user: Find user by id
CREATE TABLE user_credentials (
    email text,
    password text,
    userid uuid,
    PRIMARY KEY (email)
);
CREATE TABLE users (
    userid uuid,
    firstname text,
    lastname text,
    email text,
    created_date timestamp,
    PRIMARY KEY (userid)
);
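The two tables above each answer exactly one query. A minimal Python sketch of that idea follows, assuming in-memory dicts stand in for user_credentials and users (each keyed by its partition key). The function names and sample values are hypothetical and are not taken from KillrVideo.

```python
import uuid

# Hypothetical in-memory stand-ins for the two tables, keyed by partition key.
user_credentials = {}  # email -> {"password": ..., "userid": ...}
users = {}             # userid -> {"firstname": ..., "lastname": ..., "email": ...}


def add_user(firstname, lastname, email, password_hash):
    """Denormalize at write time: one write per table that answers a query."""
    userid = uuid.uuid4()
    user_credentials[email] = {"password": password_hash, "userid": userid}
    users[userid] = {"firstname": firstname, "lastname": lastname, "email": email}
    return userid


def find_credentials_by_email(email):
    """Answers 'Find user by email address' with a single-partition lookup."""
    return user_credentials.get(email)


def find_user_by_id(userid):
    """Answers 'Find user by id' with a single-partition lookup."""
    return users.get(userid)


new_id = add_user("Luke", "Tillman", "luke.tillman@datastax.com", "not-a-real-hash")
creds = find_credentials_by_email("luke.tillman@datastax.com")
profile = find_user_by_id(creds["userid"])
print(profile["firstname"], profile["lastname"])
```

Neither lookup scans or filters; each goes straight to one key, which is the property the table-per-query design is after.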
Videos Everywhere!
• Show video and its details: Find video by id
• Show videos added by a user: Find videos by user (latest first)
CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name text,
    description text,
    location text,
    location_type int,
    preview_image_location text,
    tags set<text>,
    added_date timestamp,
    PRIMARY KEY (videoid)
);
CREATE TABLE user_videos (
    userid uuid,
    added_date timestamp,
    videoid uuid,
    name text,
    preview_image_location text,
    PRIMARY KEY (userid, added_date, videoid)
) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);
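Because name and preview_image_location live in both tables, adding a video means writing to both. The sketch below is one way to assemble those writes in Python, assuming the statements are later handed to whatever driver session the application uses; build_add_video_statements and the sample video are hypothetical.

```python
import uuid
from datetime import datetime, timezone

INSERT_VIDEO = (
    "INSERT INTO videos (videoid, userid, name, description, location, "
    "location_type, preview_image_location, tags, added_date) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)"
)
INSERT_USER_VIDEO = (
    "INSERT INTO user_videos (userid, added_date, videoid, name, "
    "preview_image_location) VALUES (?, ?, ?, ?, ?)"
)


def build_add_video_statements(video):
    """Return (cql, params) pairs, one per table that stores the video."""
    return [
        (INSERT_VIDEO, (video["videoid"], video["userid"], video["name"],
                        video["description"], video["location"],
                        video["location_type"], video["preview_image_location"],
                        video["tags"], video["added_date"])),
        (INSERT_USER_VIDEO, (video["userid"], video["added_date"],
                             video["videoid"], video["name"],
                             video["preview_image_location"])),
    ]


# Hypothetical sample data, only for illustration.
video = {
    "videoid": uuid.uuid4(),
    "userid": uuid.uuid4(),
    "name": "Keyboard Cat",
    "description": "A cat plays the keyboard",
    "location": "https://example.com/keyboard-cat",
    "location_type": 0,
    "preview_image_location": "https://example.com/keyboard-cat.jpg",
    "tags": {"cats", "music"},
    "added_date": datetime.now(timezone.utc),
}
statements = build_add_video_statements(video)
for cql, params in statements:
    print(cql.split("(")[0].strip(), len(params))
```

Keeping both inserts in one function makes it harder to update one copy of the duplicated columns and forget the other.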
Videos Everywhere!
Considerations When Duplicating Data
• Can the data change?
• How likely is it to change or how frequently will it change?
• Do I have all the information I need to update duplicates and maintain consistency?
• Search for a video by tag: Find video by tag
• Show latest videos added to the site: Find videos by date (latest first)
Modeling Relationships – Collection Types
• Cassandra doesn’t support JOINs, but your data will still have relationships (and you can still model that in Cassandra)
• One tool available is CQL collection types
CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name text,
    description text,
    location text,
    location_type int,
    preview_image_location text,
    tags set<text>,
    added_date timestamp,
    PRIMARY KEY (videoid)
);
Modeling Relationships – Client Side Joins
CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name text,
    description text,
    location text,
    location_type int,
    preview_image_location text,
    tags set<text>,
    added_date timestamp,
    PRIMARY KEY (videoid)
);
CREATE TABLE users (
    userid uuid,
    firstname text,
    lastname text,
    email text,
    created_date timestamp,
    PRIMARY KEY (userid)
);
Currently requires a query for the video, followed by a query for the user by id based on the results of the first query
Modeling Relationships – Client Side Joins
• What is the cost? Might be OK in small situations
• Client-side joins do NOT scale
• Avoid when possible
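To make the cost concrete, here is a small Python sketch that counts the lookups a client-side join performs. It assumes in-memory dicts in place of the videos and users tables, with each lookup standing in for one query; CountingTable, videos_with_authors, and the sample rows are hypothetical.

```python
# Hypothetical in-memory tables keyed by partition key.
videos = {
    "v1": {"name": "Cat on a Roomba", "userid": "u1"},
    "v2": {"name": "Cat vs. Cucumber", "userid": "u2"},
    "v3": {"name": "Keyboard Cat", "userid": "u1"},
}
users = {
    "u1": {"firstname": "Luke", "lastname": "Tillman"},
    "u2": {"firstname": "Ada", "lastname": "Example"},
}


class CountingTable:
    """Wraps a dict and counts lookups, each standing in for one query."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def get(self, key):
        self.queries += 1
        return self.rows[key]


def videos_with_authors(video_ids, video_table, user_table):
    """Client-side join: one query per video, then one more for its author."""
    result = []
    for video_id in video_ids:
        video = video_table.get(video_id)
        author = user_table.get(video["userid"])
        result.append((video["name"], author["firstname"]))
    return result


video_table = CountingTable(videos)
user_table = CountingTable(users)
page = videos_with_authors(["v1", "v2", "v3"], video_table, user_table)
total_queries = video_table.queries + user_table.queries
print(total_queries)  # 6 queries to render 3 videos
```

The query count grows with the number of rows on the page, and the second query for each row cannot start until the first has returned.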
Modeling Relationships – Client Side Joins
CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name text,
    description text,
    ...
    user_firstname text,
    user_lastname text,
    user_email text,
    PRIMARY KEY (videoid)
);
CREATE TABLE users_by_video (
    videoid uuid,
    userid uuid,
    firstname text,
    lastname text,
    email text,
    PRIMARY KEY (videoid)
);
or
ALTER TABLE videos ADD user_firstname text;
ALTER TABLE videos ADD user_lastname text;
ALTER TABLE videos ADD user_email text;
Modeling Relationships – Client Side Joins
• Remember the considerations when you duplicate data
• What happens if a user changes their name or email address?
• Can I update the duplicated data?
Cassandra Rules Can Impact Your Design
• Video Ratings – use counters to track sum of all ratings and count of ratings
• Counters are a good example of something with special rules
-- Not allowed: counter columns cannot share a table with regular (non-key) columns
CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name text,
    description text,
    ...
    rating_counter counter,
    rating_total counter,
    PRIMARY KEY (videoid)
);
-- Works: the counters get their own table, keyed by videoid
CREATE TABLE video_ratings (
    videoid uuid,
    rating_counter counter,
    rating_total counter,
    PRIMARY KEY (videoid)
);
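With the sum and the count stored as two counters, the average has to be derived by the application. The following Python sketch shows one way to do that, assuming a 1 to 5 star scale (the slides do not state the scale); rate_video_statement and average_rating are hypothetical helper names.

```python
# Counter columns can only be incremented or decremented, so one rating
# is recorded as two increments against the video_ratings table.
RATE_VIDEO = (
    "UPDATE video_ratings "
    "SET rating_counter = rating_counter + 1, rating_total = rating_total + ? "
    "WHERE videoid = ?"
)


def rate_video_statement(videoid, rating):
    """Return the (cql, params) pair that records one rating."""
    if not 1 <= rating <= 5:  # assumed 1-5 star scale
        raise ValueError("rating must be between 1 and 5")
    return RATE_VIDEO, (rating, videoid)


def average_rating(rating_counter, rating_total):
    """Derive the average client side; returns None when nobody has rated."""
    if rating_counter == 0:
        return None
    return rating_total / rating_counter


print(average_rating(rating_counter=4, rating_total=18))  # 4.5
```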
Single Nodes Have Limits Too
• Latest videos are bucketed by day
• Means all reads/writes to latest videos are going to same partition (and thus the same nodes)
• Could create a hotspot
• Show latest videos added to the site: Find videos by date (latest first)
CREATE TABLE latest_videos (
    yyyymmdd text,
    added_date timestamp,
    videoid uuid,
    name text,
    preview_image_location text,
    PRIMARY KEY (yyyymmdd, added_date, videoid)
) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);
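The yyyymmdd partition key has to be computed by the application on every read and write. Below is a minimal Python sketch of that computation. Walking back through earlier days when the current day has too few rows is an assumption about how a reader might page, not something the slides specify; day_bucket and buckets_to_scan are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def day_bucket(added_date):
    """Format a timestamp as the yyyymmdd partition key."""
    return added_date.strftime("%Y%m%d")


def buckets_to_scan(now, max_days):
    """Yield partition keys from today backwards, newest first."""
    for offset in range(max_days):
        yield day_bucket(now - timedelta(days=offset))


now = datetime(2014, 11, 2, 9, 30, tzinfo=timezone.utc)
print(day_bucket(now))                # 20141102
print(list(buckets_to_scan(now, 3)))  # ['20141102', '20141101', '20141031']
```

Every write made on the same day produces the same key, which is exactly why a single partition ends up taking all of the traffic.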
Single Nodes Have Limits Too
• Mitigate by adding data to the Partition Key to spread load
• Data that’s already naturally a part of the domain
– Latest videos by category?
• Arbitrary data, like a bucket number
– Round robin at the app level
• Show latest videos added to the site: Find videos by date (latest first)
CREATE TABLE latest_videos (
    yyyymmdd text,
    bucket_number int,
    added_date timestamp,
    videoid uuid,
    name text,
    preview_image_location text,
    PRIMARY KEY ((yyyymmdd, bucket_number), added_date, videoid)
) ...
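Spreading writes across buckets means a reader has to query every bucket and merge the results. The Python sketch below illustrates both halves, assuming three buckets per day and an in-memory dict in place of the table; BUCKET_COUNT, partition_key_for_write, and latest_videos are hypothetical, and rows are simplified to (added_date, name) pairs.

```python
import heapq
import itertools

BUCKET_COUNT = 3  # assumed number of buckets per day
_next_bucket = itertools.cycle(range(BUCKET_COUNT))


def partition_key_for_write(yyyymmdd):
    """Round robin at the app level: rotate the bucket on every write."""
    return (yyyymmdd, next(_next_bucket))


def latest_videos(partitions, yyyymmdd, limit):
    """Read every bucket for the day and merge them, newest first.

    `partitions` maps (yyyymmdd, bucket_number) to rows already sorted by
    added_date descending, which is what the clustering order provides.
    """
    per_bucket = [partitions.get((yyyymmdd, b), []) for b in range(BUCKET_COUNT)]
    merged = heapq.merge(*per_bucket, key=lambda row: row[0], reverse=True)
    return list(itertools.islice(merged, limit))


# Simulate five writes on one day; each row is (added_date, name).
partitions = {}
for minute, name in enumerate(["a", "b", "c", "d", "e"]):
    key = partition_key_for_write("20141102")
    # Insert at the front so each bucket stays sorted newest first.
    partitions.setdefault(key, []).insert(0, (minute, name))

print(sorted(partitions))
print(latest_videos(partitions, "20141102", limit=3))
```

The trade-off is visible in the read path: the write load is spread over three partitions, and in exchange each read touches three partitions instead of one.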
Phase 2: Build the Application
Phase 3: Profit
Phase 1: Data Model
The DataStax Drivers for Cassandra
• Currently Available
– C# (.NET)
– Python
– Java
– NodeJS
– Ruby
– C++
• Will Probably Happen
– PHP
– Scala
– JDBC
• Early Discussions
– Go
– Rust
• Open source, Apache 2 licensed, available on GitHub
– https://github.com/datastax/
The DataStax Drivers for Cassandra
Bootstrapping code, by language
C#
    Cluster cluster = Cluster.Builder().AddContactPoint("127.0.0.1").Build();
    ISession session = cluster.Connect("killrvideo");
Python
    from cassandra.cluster import Cluster
    cluster = Cluster(contact_points=['127.0.0.1'])
    session = cluster.connect('killrvideo')
Java
    Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
    Session session = cluster.connect("killrvideo");
NodeJS
    var cassandra = require('cassandra-driver');
    var client = new cassandra.Client({
        contactPoints: ['127.0.0.1'], keyspace: 'killrvideo'
    });
Use Prepared Statements
• Performance optimization for queries you run repeatedly
• Pay the cost of preparing once (causes roundtrip to Cassandra)
• KillrVideo: looking up a user’s credentials by email address
• Save and reuse the PreparedStatement instance after preparing
PreparedStatement prepared = session.Prepare(
    "SELECT * FROM user_credentials WHERE email = ?");
Use Prepared Statements
• Bind variable values when ready to execute
• Execution only has to send variable values over the wire
• Cassandra doesn’t have to reparse the CQL string each time
• Remember: Prepare once, bind and execute many
BoundStatement bound = prepared.Bind("luke.tillman@datastax.com");
RowSet rows = await session.ExecuteAsync(bound);
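"Prepare once, bind and execute many" usually comes down to keeping the prepared statement somewhere it can be reused. Here is a minimal Python sketch of one way to do that, assuming only that the session object offers a prepare(cql) method (the DataStax Python driver's Session does). PreparedStatementCache is a hypothetical helper, and FakeSession is a stand-in so the example runs without a cluster.

```python
class PreparedStatementCache:
    """Prepares each CQL string once and hands back the saved statement."""

    def __init__(self, session):
        self._session = session
        self._statements = {}

    def get(self, cql):
        if cql not in self._statements:
            # The only place that pays the prepare round trip.
            self._statements[cql] = self._session.prepare(cql)
        return self._statements[cql]


class FakeSession:
    """Stand-in for a driver session; counts prepare() calls."""

    def __init__(self):
        self.prepare_calls = 0

    def prepare(self, cql):
        self.prepare_calls += 1
        return ("prepared", cql)


session = FakeSession()
cache = PreparedStatementCache(session)
cql = "SELECT * FROM user_credentials WHERE email = ?"
first = cache.get(cql)
second = cache.get(cql)
print(session.prepare_calls)  # 1
```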
Batch Statements: Use and Misuse
• You can mix and match Simple/Bound statements in a batch
• Batches are Logged (atomic) by default
• Use when you want a group of mutations (statements) to all succeed or all fail (denormalizing at write time)
• Large batches are an anti-pattern (Cassandra will warn you)
• Not a performance optimization for bulk-loading data
KillrVideo: Update a Video’s Name with a Batch
public class VideoCatalogDataAccess
{
    private readonly ISession _session;
    private readonly PreparedStatement _prepared;

    public VideoCatalogDataAccess(ISession session)
    {
        _session = session;
        // user_videos is keyed by (userid, added_date, videoid), so all three appear in the WHERE clause
        _prepared = _session.Prepare(
            "UPDATE user_videos SET name = ? WHERE userid = ? AND added_date = ? AND videoid = ?");
    }

    public async Task UpdateVideoName(UpdateVideoDto video)
    {
        // Assumes UpdateVideoDto exposes AddedDate alongside Name, UserId, and VideoId
        BoundStatement bound = _prepared.Bind(video.Name, video.UserId, video.AddedDate, video.VideoId);
        var simple = new SimpleStatement("UPDATE videos SET name = ? WHERE videoid = ?",
            video.Name, video.VideoId);

        // Use an atomic batch to send over all the mutations
        var batchStatement = new BatchStatement();
        batchStatement.Add(bound);
        batchStatement.Add(simple);
        RowSet rows = await _session.ExecuteAsync(batchStatement);
    }
}
Lightweight Transactions when you need them
• Use when you don’t want writes to step on each other
– Sometimes called Linearizable Consistency
– Similar to the Serializable isolation level from RDBMS
• Essentially a Compare and Set (CAS) operation using Paxos
• Read the fine print: has a latency cost associated with it
• The canonical example: unique user accounts
KillrVideo: LWT to create user accounts
• Returns a column called [applied] indicating success/failure
• Different from relational world where you might expect an Exception (e.g. PrimaryKeyViolationException or similar)
string cql = "INSERT INTO user_credentials (email, password, userid) " +
    "VALUES (?, ?, ?) IF NOT EXISTS";
var statement = new SimpleStatement(cql, user.Email, hashedPassword, user.UserId);
RowSet rows = await _session.ExecuteAsync(statement);
var userInserted = rows.Single().GetValue<bool>("[applied]");
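The calling pattern (check a flag rather than catch an exception) can be sketched in Python without a cluster. The example below only imitates the shape of the result using a dict; it is single-threaded and does not model Paxos or any of the latency cost. insert_if_not_exists and the sample values are hypothetical.

```python
def insert_if_not_exists(table, email, row):
    """Mimic INSERT ... IF NOT EXISTS against a dict keyed by email.

    Returns a result row with an "[applied]" flag instead of raising,
    mirroring how the caller is expected to check the outcome.
    """
    existing = table.get(email)
    if existing is not None:
        # Not applied: report the row that already holds the key.
        return {"[applied]": False, "email": email, **existing}
    table[email] = row
    return {"[applied]": True}


user_credentials = {}
first = insert_if_not_exists(
    user_credentials, "luke.tillman@datastax.com",
    {"password": "hash-1", "userid": "user-1"})
second = insert_if_not_exists(
    user_credentials, "luke.tillman@datastax.com",
    {"password": "hash-2", "userid": "user-2"})

print(first["[applied]"], second["[applied]"])  # True False
```

The second attempt leaves the original row untouched, so the application can tell the user the email address is already taken.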
Software Architecture, A Love Story
Disclaimer: I am not paid to be a software architect
KillrVideo Logical Architecture
Browser
• Web UI: HTML5 / JavaScript
Server
• KillrVideo MVC App: serves up Web UI HTML and handles JSON requests from Web UI
Services
• Comments: tracks comments on videos by users
• Uploads: handles processing, storing, and encoding uploaded videos
• Video Catalog: tracks the catalog of available videos
• User Management: user accounts, login credentials, profiles
Infrastructure
• Cassandra Cluster (DSE): app data storage for services (e.g. users, comments)
• DataStax OpsCenter: management, provisioning, and monitoring
• Azure Media Services: video encoding, thumbnail generation
• Azure Storage (Blob, Queue): video file and thumbnail image storage
• Azure Service Bus: published events from services for interactions
Inside a Simple Service: Video Catalog
Video Catalog: tracks the catalog of available videos
• Cassandra Cluster (DSE): app data storage for services (e.g. users, comments)
– Stores metadata about videos in Cassandra (e.g. name, description, location, thumbnail location, etc.)
• Azure Service Bus: published events from services for interactions
– Publishes events about interesting things that happen (e.g. YouTubeVideoAdded, UploadedVideoAccepted, etc.)
Inside a More Complicated Service: Uploads
Uploads: handles processing, storing, and encoding uploaded videos
• Cassandra Cluster (DSE): app data storage for services (e.g. users, comments)
– Stores data about uploaded video file locations, encoding jobs, job status, etc.
• Azure Storage (Blob, Queue): video file and thumbnail image storage
– Stores original and re-encoded video file assets, as well as thumbnail preview images generated
• Azure Media Services: video encoding, thumbnail generation
– Re-encodes uploaded videos to a format suitable for the web, generates thumbnail image previews
• Azure Service Bus: published events from services for interactions
– Publishes events about interesting things that happen (e.g. UploadedVideoPublished, etc.)
Event Driven Architecture
• Only the application(s) give commands
• Decoupled: Pub-sub messaging to tell other parts of the system something interesting happened
• Services could be deployed, scaled, and versioned independently (AKA microservices)
Services connected through Azure Service Bus: User Management, Comments, Video Ratings, Sample Data, Search, Statistics, Suggested Videos, Uploads, Video Catalog
Azure Service Bus carries an event from Video Catalog to Search and Suggested Videos:
• Video Catalog: “Hey, I added this new YouTube video to the catalog!”
• Suggested Videos: “Time to figure out what videos to suggest for that new video.”
• Search: “Better index that new video so it shows up in search results.”
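The exchange above can be sketched with a tiny in-memory publish/subscribe bus in Python. This is only an illustration of the decoupling, not how Azure Service Bus is used in KillrVideo; InMemoryBus, the handler bodies, and the sample payload are hypothetical, while the event name YouTubeVideoAdded comes from the Video Catalog slide.

```python
from collections import defaultdict


class InMemoryBus:
    """Tiny pub-sub bus: publishers and subscribers only share event names."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_name, handler):
        self._handlers[event_name].append(handler)

    def publish(self, event_name, payload):
        for handler in self._handlers[event_name]:
            handler(payload)


search_index = []
suggestion_queue = []

bus = InMemoryBus()
# Search and Suggested Videos react to the event without the catalog knowing.
bus.subscribe("YouTubeVideoAdded", lambda video: search_index.append(video["name"]))
bus.subscribe("YouTubeVideoAdded", lambda video: suggestion_queue.append(video["videoid"]))

# Video Catalog announces what happened; it does not call the other services.
bus.publish("YouTubeVideoAdded", {"videoid": "v1", "name": "Keyboard Cat"})

print(search_index, suggestion_queue)
```

A new subscriber can be added with one more subscribe call and no change to the publishing service, which is what lets the services be deployed and versioned independently.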
The Future
In the year 3,000…
The Future, Conan?
Where do we go with KillrVideo from here?
• Spark or AzureML for video suggestions
• Video search via Solr
• Actors that store state in C* (Akka.NET or Orleans)
• Storing file data (thumbnails, profile pics) in C* using pithos
Questions?
Follow me on Twitter for updates or to ask questions later: @LukeTillman

Building your First Application with Cassandra

  • 1. Building Your First Application with Cassandra Luke Tillman (@LukeTillman) Language Evangelist at DataStax
  • 2. Who are you?! • Evangelist with a focus on the .NET Community • Long-time .NET Developer • Recently presented at Cassandra Summit 2014 with Microsoft 2
  • 3. KillrVideo, a Video Sharing Site • Think a YouTube competitor – Users add videos, rate them, comment on them, etc. – Can search for videos by tag
  • 4. See the Live Demo, Get the Code • Live demo available at http://www.killrvideo.com – Written in C# – Live Demo running in Azure – Open source: https://github.com/luketillman/killrvideo-csharp • Interesting use case because of different data modeling challenges and the scale of something like YouTube – More than 1 billion unique users visit YouTube each month – 100 hours of video are uploaded to YouTube every minute 4
  • 5. 1 Think Before You Model 2 A Data Model for Cat Videos 3 Phase 2: Build the Application 4 Software Architecture, A Love Story 5 The Future 5
  • 6. Think Before You Model Or how to keep doing what you’re already doing 6
  • 7. Getting to Know Your Data • What things do I have in the system? • What are the relationships between them? • This is your conceptual data model • You already do this in the RDBMS world
  • 8. Some of the Entities and Relationships in KillrVideo 8 User id firstname lastname email password Video id name description location preview_image tags features Comment comment id adds timestamp posts timestamp 1 n n 1 1 n n m rates rating
  • 9. Getting to Know Your Queries • What are your application’s workflows? • How will I access the data? • Knowing your queries in advance is NOT optional • Different from RDBMS because I can’t just JOIN or create a new indexes to support new queries 9
  • 10. Some Application Workflows in KillrVideo 10 User Logs into site Show basic information about user Show videos added by a user Show comments posted by a user Search for a video by tag Show latest videos added to the site Show comments for a video Show ratings for a video Show video and its details
  • 11. Some Queries in KillrVideo to Support Workflows 11 Users User Logs into site Find user by email address Show basic information about user Find user by id Comments Show comments for a video Find comments by video (latest first) Show comments posted by a user Find comments by user (latest first) Ratings Show ratings for a video Find ratings by video
  • 12. Some Queries in KillrVideo to Support Workflows 12 Videos Search for a video by tag Find video by tag Show latest videos added to the site Find videos by date (latest first) Show video and its details Find video by id Show videos added by a user Find videos by user (latest first)
  • 13. A Data Model for Cat Videos Because the Internet loves ‘em some cat videos 13
  • 14. Just How Popular are Cats on the Internet? 14 http://mashable.com/2013/07/08/cats-bacon-rule-internet/
  • 15. Just How Popular are Cats on the Internet? 15 http://mashable.com/2013/07/08/cats-bacon-rule-internet/
  • 16. Data Modeling Refresher • Cassandra limits us to queries that can scale across many nodes – Include value for Partition Key and optionally, Clustering Column(s) • We know our queries, so we build tables to answer them • Denormalize at write time to do as few reads as possible • Many times we end up with a “table per query” – Similar to materialized views from the RDBMS world 16
  • 17. Users – The Relational Way • Single Users table with all user data and an Id Primary Key • Add an index on email address to allow queries by email User Logs into site Find user by email address Show basic information about user Find user by id
  • 18. Users – The Cassandra Way User Logs into site Find user by email address Show basic information about user Find user by id CREATE TABLE user_credentials ( email text, password text, userid uuid, PRIMARY KEY (email) ); CREATE TABLE users ( userid uuid, firstname text, lastname text, email text, created_date timestamp, PRIMARY KEY (userid) );
  • 19. Videos Everywhere! 19 Show video and its details Find video by id Show videos added by a user Find videos by user (latest first) CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) ); CREATE TABLE user_videos ( userid uuid, added_date timestamp, videoid uuid, name text, preview_image_location text, PRIMARY KEY (userid, added_date, videoid) ) WITH CLUSTERING ORDER BY ( added_date DESC, videoid ASC);
  • 20. Videos Everywhere! Considerations When Duplicating Data • Can the data change? • How likely is it to change or how frequently will it change? • Do I have all the information I need to update duplicates and maintain consistency? 20 Search for a video by tag Find video by tag Show latest videos added to the site Find videos by date (latest first)
  • 21. Modeling Relationships – Collection Types • Cassandra doesn’t support JOINs, but your data will still have relationships (and you can still model that in Cassandra) • One tool available is CQL collection types CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) );
  • 22. Modeling Relationships – Client Side Joins 22 CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) ); CREATE TABLE users ( userid uuid, firstname text, lastname text, email text, created_date timestamp, PRIMARY KEY (userid) ); Currently requires query for video, followed by query for user by id based on results of first query
  • 23. Modeling Relationships – Client Side Joins • What is the cost? Might be OK in small situations • Do NOT scale • Avoid when possible 23
  • 24. Modeling Relationships – Client Side Joins 24 CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, ... user_firstname text, user_lastname text, user_email text, PRIMARY KEY (videoid) ); CREATE TABLE users_by_video ( videoid uuid, userid uuid, firstname text, lastname text, email text, PRIMARY KEY (videoid) ); or ALTER TABLE videos ADD user_firstname text; ALTER TABLE videos ADD user_lastname text; ALTER TABLE videos ADD user_email text;
  • 25. Modeling Relationships – Client Side Joins • Remember the considerations when you duplicate data • What happens if a user changes their name or email address? • Can I update the duplicated data? 25
  • 26. Cassandra Rules Can Impact Your Design • Video Ratings – use counters to track sum of all ratings and count of ratings • Counters are a good example of something with special rules CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, ... rating_counter counter, rating_total counter, PRIMARY KEY (videoid) ); CREATE TABLE video_ratings ( videoid uuid, rating_counter counter, rating_total counter, PRIMARY KEY (videoid) );
  • 27. Single Nodes Have Limits Too • Latest videos are bucketed by day • Means all reads/writes to latest videos are going to same partition (and thus the same nodes) • Could create a hotspot 27 Show latest videos added to the site Find videos by date (latest first) CREATE TABLE latest_videos ( yyyymmdd text, added_date timestamp, videoid uuid, name text, preview_image_location text, PRIMARY KEY (yyyymmdd, added_date, videoid) ) WITH CLUSTERING ORDER BY ( added_date DESC, videoid ASC );
  • 28. Single Nodes Have Limits Too • Mitigate by adding data to the Partition Key to spread load • Data that’s already naturally a part of the domain – Latest videos by category? • Arbitrary data, like a bucket number – Round robin at the app level 28 Show latest videos added to the site Find videos by date (latest first) CREATE TABLE latest_videos ( yyyymmdd text, bucket_number int, added_date timestamp, videoid uuid, name text, preview_image_location text, PRIMARY KEY ( (yyyymmdd, bucket_number) added_date, videoid) ) ...
  • 29. Phase 2: Build the Application • Phase 1: Data Model • Phase 2: Build the Application • Phase 3: Profit
  • 30. The DataStax Drivers for Cassandra • Currently Available – C# (.NET) – Python – Java – NodeJS – Ruby – C++ • Will Probably Happen – PHP – Scala – JDBC • Early Discussions – Go – Rust • Open source, Apache 2 licensed, available on GitHub – https://github.com/datastax/
  • 31. The DataStax Drivers for Cassandra • Bootstrapping code by language – C#: Cluster cluster = Cluster.Builder().AddContactPoint("127.0.0.1").Build(); ISession session = cluster.Connect("killrvideo"); – Python: from cassandra.cluster import Cluster; cluster = Cluster(contact_points=['127.0.0.1']); session = cluster.connect('killrvideo') – Java: Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build(); Session session = cluster.connect("killrvideo"); – NodeJS: var cassandra = require('cassandra-driver'); var client = new cassandra.Client({ contactPoints: ['127.0.0.1'], keyspace: 'killrvideo' });
  • 32. Use Prepared Statements • Performance optimization for queries you run repeatedly • Pay the cost of preparing once (preparing costs a round trip to Cassandra) • KillrVideo: looking up a user’s credentials by email address • Save and reuse the PreparedStatement instance after preparing • PreparedStatement prepared = _session.Prepare( "SELECT * FROM user_credentials WHERE email = ?");
  • 33. Use Prepared Statements • Bind variable values when ready to execute • Execution only has to send variable values over the wire • Cassandra doesn’t have to reparse the CQL string each time • Remember: Prepare once, bind and execute many • BoundStatement bound = prepared.Bind("luke.tillman@datastax.com"); RowSet rows = await _session.ExecuteAsync(bound);
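The "prepare once, bind and execute many" pattern, sketched in Python for comparison with the C# snippets. The class name `UserCredentialsReader` is hypothetical, and the `session` argument is assumed to behave like a DataStax Python driver session (`prepare`, then `execute` with the prepared statement and a tuple of values). Nothing in the sketch imports the driver, so it only shows the shape of the pattern.

```python
class UserCredentialsReader:
    """Prepares the lookup query once and reuses it for every call."""

    CQL = "SELECT * FROM user_credentials WHERE email = ?"

    def __init__(self, session):
        # `session` is assumed to look like cassandra.cluster.Session.
        self._session = session
        # One round trip to Cassandra, paid once at construction time.
        self._prepared = session.prepare(self.CQL)

    def get_by_email(self, email: str):
        # Only the bound value is sent on each execution; the CQL string
        # is not parsed again by the server.
        result = self._session.execute(self._prepared, (email,))
        # Email is the lookup key, so at most one row is expected.
        return result.one()
```

Creating one reader per application (rather than per request) is what keeps the preparation cost to a single round trip.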
  • 34. Batch Statements: Use and Misuse • You can mix and match Simple/Bound statements in a batch • Batches are Logged (atomic) by default • Use when you want a group of mutations (statements) to all succeed or all fail (denormalizing at write time) • Large batches are an anti-pattern (Cassandra will warn you) • Not a performance optimization for bulk-loading data
  • 35. KillrVideo: Update a Video’s Name with a Batch • public class VideoCatalogDataAccess { private readonly ISession _session; private readonly PreparedStatement _prepared; public VideoCatalogDataAccess(ISession session) { _session = session; /* Prepare once, reuse for every update */ _prepared = _session.Prepare( "UPDATE user_videos SET name = ? WHERE userid = ? AND videoid = ?"); } public async Task UpdateVideoName(UpdateVideoDto video) { BoundStatement bound = _prepared.Bind(video.Name, video.UserId, video.VideoId); var simple = new SimpleStatement("UPDATE videos SET name = ? WHERE videoid = ?", video.Name, video.VideoId); /* Use an atomic batch to send over all the mutations */ var batchStatement = new BatchStatement(); batchStatement.Add(bound); batchStatement.Add(simple); RowSet rows = await _session.ExecuteAsync(batchStatement); } }
  • 36. Lightweight Transactions when you need them • Use when you don’t want writes to step on each other – Sometimes called Linearizable Consistency – Similar to the Serializable isolation level in an RDBMS • Essentially a Compare and Set (CAS) operation using Paxos • Read the fine print: has a latency cost associated with it • The canonical example: unique user accounts
  • 37. KillrVideo: LWT to create user accounts • Returns a column called [applied] indicating success/failure • Different from relational world where you might expect an Exception (e.g. PrimaryKeyViolationException or similar) • string cql = "INSERT INTO user_credentials (email, password, userid) " + "VALUES (?, ?, ?) IF NOT EXISTS"; var statement = new SimpleStatement(cql, user.Email, hashedPassword, user.UserId); RowSet rows = await _session.ExecuteAsync(statement); var userInserted = rows.Single().GetValue<bool>("[applied]");
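The same check can be sketched in Python. This assumes a session that behaves like the DataStax Python driver's, where non-prepared statements use `%s` placeholders and the result of a lightweight transaction exposes a `was_applied` property; treat both as assumptions to verify against the driver version in use. The function name `create_user` is hypothetical.

```python
def create_user(session, email: str, hashed_password: str, userid) -> bool:
    """Return True if the account was created, False if the email is taken.

    `session` is assumed to look like cassandra.cluster.Session.
    """
    cql = (
        "INSERT INTO user_credentials (email, password, userid) "
        "VALUES (%s, %s, %s) IF NOT EXISTS"
    )
    result = session.execute(cql, (email, hashed_password, userid))
    # No exception is raised when the row already exists; the outcome is
    # reported through the [applied] column, surfaced here as was_applied.
    return bool(result.was_applied)
```

A caller would branch on the return value, for example showing an "email already registered" message when it is False, rather than wrapping the call in a try/except for a duplicate-key error.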
  • 38. Software Architecture, A Love Story • Disclaimer: I am not paid to be a software architect
  • 39. KillrVideo Logical Architecture • Browser – Web UI: HTML5 / JavaScript • Server – KillrVideo MVC App: Serves up Web UI HTML and handles JSON requests from Web UI • Services – Comments: Tracks comments on videos by users – Uploads: Handles processing, storing, and encoding uploaded videos – Video Catalog: Tracks the catalog of available videos – User Management: User accounts, login credentials, profiles • Infrastructure – Cassandra Cluster (DSE): App data storage for services (e.g. users, comments) – DataStax OpsCenter: Management, provisioning, and monitoring – Azure Media Services: Video encoding, thumbnail generation – Azure Storage (Blob, Queue): Video file and thumbnail image storage – Azure Service Bus: Published events from services for interactions
  • 40. Inside a Simple Service: Video Catalog • Video Catalog: Tracks the catalog of available videos • Cassandra Cluster (DSE): App data storage for services (e.g. users, comments) • Azure Service Bus: Published events from services for interactions
  • 41. Inside a Simple Service: Video Catalog • Video Catalog: Tracks the catalog of available videos • Cassandra Cluster (DSE): App data storage for services (e.g. users, comments) • Stores metadata about videos in Cassandra (e.g. name, description, location, thumbnail location, etc.)
  • 42. Inside a Simple Service: Video Catalog • Video Catalog: Tracks the catalog of available videos • Azure Service Bus: Published events from services for interactions • Publishes events about interesting things that happen (e.g. YouTubeVideoAdded, UploadedVideoAccepted, etc.)
  • 43. Inside a More Complicated Service: Uploads • Uploads: Handles processing, storing, and encoding uploaded videos • Cassandra Cluster (DSE): App data storage for services (e.g. users, comments) • Azure Storage (Blob, Queue): Video file and thumbnail image storage • Azure Media Services: Video encoding, thumbnail generation • Azure Service Bus: Published events from services for interactions
  • 44. Inside a More Complicated Service: Uploads • Uploads: Handles processing, storing, and encoding uploaded videos • Cassandra Cluster (DSE): App data storage for services (e.g. users, comments) • Stores data about uploaded video file locations, encoding jobs, job status, etc.
  • 45. Inside a More Complicated Service: Uploads • Uploads: Handles processing, storing, and encoding uploaded videos • Azure Storage (Blob, Queue): Video file and thumbnail image storage • Stores original and re-encoded video file assets, as well as the generated thumbnail preview images
  • 46. Inside a More Complicated Service: Uploads • Uploads: Handles processing, storing, and encoding uploaded videos • Azure Media Services: Video encoding, thumbnail generation • Re-encodes uploaded videos to a format suitable for the web and generates thumbnail image previews
  • 47. Inside a More Complicated Service: Uploads • Uploads: Handles processing, storing, and encoding uploaded videos • Azure Service Bus: Published events from services for interactions • Publishes events about interesting things that happen (e.g. UploadedVideoPublished, etc.)
  • 48. Event Driven Architecture • Only the application(s) give commands • Decoupled: Pub-sub messaging to tell other parts of the system something interesting happened • Services could be deployed, scaled, and versioned independently (AKA microservices) • Services connected through Azure Service Bus: User Management, Comments, Video Ratings, Sample Data, Search, Statistics, Suggested Videos, Uploads, Video Catalog
  • 49. Event Driven Architecture • Only the application(s) give commands • Decoupled: Pub-sub messaging to tell other parts of the system something interesting happened • Services could be deployed, scaled, and versioned independently (AKA microservices) • Video Catalog, via Azure Service Bus, to Search and Suggested Videos: "Hey, I added this new YouTube video to the catalog!"
  • 50. Event Driven Architecture • Only the application(s) give commands • Decoupled: Pub-sub messaging to tell other parts of the system something interesting happened • Services could be deployed, scaled, and versioned independently (AKA microservices) • Video Catalog, via Azure Service Bus, to Search and Suggested Videos: "Hey, I added this new YouTube video to the catalog!" • Suggested Videos: "Time to figure out what videos to suggest for that new video." • Search: "Better index that new video so it shows up in search results."
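The publish/subscribe flow on these slides can be illustrated with a toy in-process bus in Python. This is not Azure Service Bus and not KillrVideo's code: `InMemoryBus`, the handler lists, and the event payload fields are all hypothetical stand-ins, and only the event name YouTubeVideoAdded comes from the deck. The point it shows is that the publisher knows nothing about who is listening.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict], None]


class InMemoryBus:
    """Toy stand-in for a message bus: event type -> subscriber callbacks."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, event: Dict) -> None:
        # Deliver the event to every subscriber of this event type.
        for handler in self._subscribers[event_type]:
            handler(event)


bus = InMemoryBus()
search_index: List[str] = []       # stands in for the Search service
suggestion_queue: List[str] = []   # stands in for Suggested Videos

# Each service registers its own interest in the event.
bus.subscribe("YouTubeVideoAdded", lambda e: search_index.append(e["videoid"]))
bus.subscribe("YouTubeVideoAdded", lambda e: suggestion_queue.append(e["videoid"]))

# Video Catalog announces what happened; it does not call the other services.
bus.publish("YouTubeVideoAdded", {"videoid": "abc123", "name": "Cat video"})
```

Adding a new consumer, such as a statistics service, would mean one more `subscribe` call and no change to the publisher, which is the decoupling the slides describe.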
  • 51. The Future • In the year 3,000…
  • 53. Where do we go with KillrVideo from here? • Spark or AzureML for video suggestions • Video search via Solr • Actors that store state in C* (Akka.NET or Orleans) • Storing file data (thumbnails, profile pics) in C* using pithos
  • 54. Questions? • Follow me on Twitter for updates or to ask questions later: @LukeTillman