Building Your First Application with
Cassandra
Luke Tillman (@LukeTillman)
Language Evangelist at DataStax
Who are you?!
• Evangelist with a focus on the .NET Community
• Long-time .NET Developer
• Recently presented at Cassandra Summit 2014 with Microsoft
KillrVideo, a Video Sharing Site
• Think of it as a YouTube competitor
– Users add videos, rate them, comment on them, etc.
– Can search for videos by tag
See the Live Demo, Get the Code
• Live demo available at http://www.killrvideo.com
– Written in C#
– Live Demo running in Azure
– Open source: https://github.com/luketillman/killrvideo-csharp
• Interesting use case because of different data modeling
challenges and the scale of something like YouTube
– More than 1 billion unique users visit YouTube each month
– 100 hours of video are uploaded to YouTube every minute
1 Think Before You Model
2 A Data Model for Cat Videos
3 Phase 2: Build the Application
4 Software Architecture, A Love Story
5 The Future
Think Before You Model
Or how to keep doing what you’re already doing
Getting to Know Your Data
• What things do I have in the system?
• What are the relationships between them?
• This is your conceptual data model
• You already do this in the RDBMS world
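Some of the Entities and Relationships in KillrVideo
Entities
• User: id, firstname, lastname, email, password
• Video: id, name, description, location, preview_image, tags
• Comment: id, comment
Relationships
• User adds Video (1:n), with timestamp
• User posts Comment (1:n), with timestamp
• Video features Comment (1:n)
• User rates Video (n:m), with rating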
Getting to Know Your Queries
• What are your application’s workflows?
• How will I access the data?
• Knowing your queries in advance is NOT optional
• Different from RDBMS because I can’t just JOIN or create new
indexes to support new queries
Some Application Workflows in KillrVideo
• User logs into site
• Show basic information about user
• Show videos added by a user
• Show comments posted by a user
• Search for a video by tag
• Show latest videos added to the site
• Show comments for a video
• Show ratings for a video
• Show video and its details
Some Queries in KillrVideo to Support Workflows
Users
• User logs into site: find user by email address
• Show basic information about user: find user by id
Comments
• Show comments for a video: find comments by video (latest first)
• Show comments posted by a user: find comments by user (latest first)
Ratings
• Show ratings for a video: find ratings by video
Some Queries in KillrVideo to Support Workflows
Videos
• Search for a video by tag: find video by tag
• Show latest videos added to the site: find videos by date (latest first)
• Show video and its details: find video by id
• Show videos added by a user: find videos by user (latest first)
A Data Model for Cat Videos
Because the Internet loves ‘em some cat videos
Just How Popular are Cats on the Internet?
http://mashable.com/2013/07/08/cats-bacon-rule-internet/
Data Modeling Refresher
• Cassandra limits us to queries that can scale across many nodes
– Include value for Partition Key and optionally, Clustering Column(s)
• We know our queries, so we build tables to answer them
• Denormalize at write time to do as few reads as possible
• Many times we end up with a “table per query”
– Similar to materialized views from the RDBMS world
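Why the partition key matters can be sketched with a toy model. The snippet below is only an illustration: the node names are made up, and the hashing is a stand-in (Cassandra's real partitioner and token ring, with replication, work differently in detail). The idea it shows is that a partition key value always maps to the same place, so a query that supplies it does not have to ask every node.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def node_for(partition_key: str, nodes=NODES) -> str:
    """Map a partition key to one node by hashing it (toy model only)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return nodes[token % len(nodes)]


# The same key always lands on the same node, so a query that supplies
# the partition key can be routed to one place instead of every node.
owner = node_for("luke.tillman@datastax.com")
```

A query without the partition key has no such shortcut, which is why Cassandra steers you toward tables keyed by what you will actually look up.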
Users – The Relational Way
• Single Users table with all user data and an Id Primary Key
• Add an index on email address to allow queries by email
• User logs into site: find user by email address
• Show basic information about user: find user by id
Users – The Cassandra Way
User logs into site (find user by email address):
CREATE TABLE user_credentials (
    email text,
    password text,
    userid uuid,
    PRIMARY KEY (email)
);
Show basic information about user (find user by id):
CREATE TABLE users (
    userid uuid,
    firstname text,
    lastname text,
    email text,
    created_date timestamp,
    PRIMARY KEY (userid)
);
Videos Everywhere!
Show video and its details (find video by id):
CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name text,
    description text,
    location text,
    location_type int,
    preview_image_location text,
    tags set<text>,
    added_date timestamp,
    PRIMARY KEY (videoid)
);
Show videos added by a user (find videos by user, latest first):
CREATE TABLE user_videos (
    userid uuid,
    added_date timestamp,
    videoid uuid,
    name text,
    preview_image_location text,
    PRIMARY KEY (userid, added_date, videoid)
) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);
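Denormalizing at write time means one logical "add video" turns into one INSERT per query table. A minimal sketch of that idea follows, in plain Python with no driver involved. The function name and the choice of columns are assumptions for illustration; it only builds the CQL strings and parameter tuples that an application would then hand to a driver.

```python
import uuid
from datetime import datetime, timezone


def add_video_statements(userid, name, preview_image_location):
    """Return (videoid, [(cql, params), ...]) for adding one video.

    One INSERT per query table, so each later read hits a single table.
    """
    videoid = uuid.uuid4()
    added_date = datetime.now(timezone.utc)
    statements = [
        (
            "INSERT INTO videos (videoid, userid, name, "
            "preview_image_location, added_date) VALUES (?, ?, ?, ?, ?)",
            (videoid, userid, name, preview_image_location, added_date),
        ),
        (
            "INSERT INTO user_videos (userid, added_date, videoid, name, "
            "preview_image_location) VALUES (?, ?, ?, ?, ?)",
            (userid, added_date, videoid, name, preview_image_location),
        ),
    ]
    return videoid, statements
```

Both statements carry the same name and preview image, which is exactly the duplicated data the next slide asks you to think about.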
Videos Everywhere!
Considerations When Duplicating Data
• Can the data change?
• How likely is it to change or how frequently will it change?
• Do I have all the information I need to update duplicates and
maintain consistency?
Search for a video by tag: find video by tag
Show latest videos added to the site: find videos by date (latest first)
Modeling Relationships – Collection Types
• Cassandra doesn’t support JOINs, but your data will still have
relationships (and you can still model that in Cassandra)
• One tool available is CQL collection types
CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name text,
    description text,
    location text,
    location_type int,
    preview_image_location text,
    tags set<text>,
    added_date timestamp,
    PRIMARY KEY (videoid)
);
Modeling Relationships – Client Side Joins
CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name text,
    description text,
    location text,
    location_type int,
    preview_image_location text,
    tags set<text>,
    added_date timestamp,
    PRIMARY KEY (videoid)
);
CREATE TABLE users (
    userid uuid,
    firstname text,
    lastname text,
    email text,
    created_date timestamp,
    PRIMARY KEY (userid)
);
Currently requires query for video, followed by query for user by id based on results of first query
Modeling Relationships – Client Side Joins
• What is the cost? Might be OK in small situations
• Do NOT scale
• Avoid when possible
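The cost of a client-side join is easiest to see by counting lookups. The sketch below uses in-memory dictionaries as stand-ins for the videos and users tables (the data and helper names are made up); each call to `get` represents one round trip to the database.

```python
# In-memory stand-ins for the videos and users tables (hypothetical data).
VIDEOS = {"v1": {"videoid": "v1", "userid": "u1", "name": "Grumpy cat"}}
USERS = {"u1": {"userid": "u1", "firstname": "Luke", "lastname": "Tillman"}}

calls = []  # records every simulated round trip


def get(table_name, table, key):
    calls.append((table_name, key))
    return table[key]


def video_with_author(videoid):
    video = get("videos", VIDEOS, videoid)         # query 1: video by id
    user = get("users", USERS, video["userid"])    # query 2: needs the result of query 1
    return {**video, "author": f'{user["firstname"]} {user["lastname"]}'}


page = video_with_author("v1")
```

The second lookup cannot start until the first one returns, so the latencies add up, and a page listing many videos repeats the pattern once per row.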
Modeling Relationships – Client Side Joins
CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name text,
    description text,
    ...
    user_firstname text,
    user_lastname text,
    user_email text,
    PRIMARY KEY (videoid)
);
or
CREATE TABLE users_by_video (
    videoid uuid,
    userid uuid,
    firstname text,
    lastname text,
    email text,
    PRIMARY KEY (videoid)
);
Modeling Relationships – Client Side Joins
• Remember the considerations when you duplicate data
• What happens if a user changes their name or email address?
• Can I update the duplicated data?
Cassandra Rules Can Impact Your Design
• Video Ratings – use counters to track sum of all ratings and
count of ratings
• Counters are a good example of something with special rules:
counter columns can’t be mixed with regular columns in the same table
-- Not allowed: counters alongside regular (non-key) columns
CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name text,
    description text,
    ...
    rating_counter counter,
    rating_total counter,
    PRIMARY KEY (videoid)
);
-- Instead, keep the counters in their own table
CREATE TABLE video_ratings (
    videoid uuid,
    rating_counter counter,
    rating_total counter,
    PRIMARY KEY (videoid)
);
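With a count and a sum stored as counters, the average rating is computed by the application at read time. A small sketch, assuming the two values have already been read from video_ratings; the function name and the CQL constant are illustrative, not taken from the KillrVideo code.

```python
def average_rating(rating_counter: int, rating_total: int) -> float:
    """Average star rating from the two counters; 0.0 when nobody has rated yet."""
    if rating_counter == 0:
        return 0.0
    return rating_total / rating_counter


# Recording one rating increments both counters in a single UPDATE.
RATE_VIDEO_CQL = (
    "UPDATE video_ratings "
    "SET rating_counter = rating_counter + 1, "
    "rating_total = rating_total + ? "
    "WHERE videoid = ?"
)
```

For example, 4 ratings that sum to 18 stars give an average of 4.5.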
Single Nodes Have Limits Too
• Latest videos are bucketed by
day
• Means all reads/writes to latest
videos are going to same
partition (and thus the same
nodes)
• Could create a hotspot
Show latest videos added to the site (find videos by date, latest first):
CREATE TABLE latest_videos (
    yyyymmdd text,
    added_date timestamp,
    videoid uuid,
    name text,
    preview_image_location text,
    PRIMARY KEY (yyyymmdd, added_date, videoid)
) WITH CLUSTERING ORDER BY (
    added_date DESC,
    videoid ASC
);
Single Nodes Have Limits Too
• Mitigate by adding data to the
Partition Key to spread load
• Data that’s already naturally a
part of the domain
– Latest videos by category?
• Arbitrary data, like a bucket
number
– Round robin at the app level
Show latest videos added to the site (find videos by date, latest first):
CREATE TABLE latest_videos (
    yyyymmdd text,
    bucket_number int,
    added_date timestamp,
    videoid uuid,
    name text,
    preview_image_location text,
    PRIMARY KEY (
        (yyyymmdd, bucket_number),
        added_date, videoid)
) ...
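The "round robin at the app level" idea can be sketched as follows. The bucket count of 4 and the function names are assumptions for illustration. Writes rotate through the buckets; reads have to query every bucket for the day and merge the results by added_date in the application.

```python
import itertools
from datetime import date

BUCKET_COUNT = 4  # hypothetical; tune to your write volume
_next_bucket = itertools.cycle(range(BUCKET_COUNT))


def write_partition(day: date):
    """Partition key (yyyymmdd, bucket_number) to use for the next write."""
    return day.strftime("%Y%m%d"), next(_next_bucket)


def read_partitions(day: date):
    """All partition keys to query for one day's latest videos."""
    yyyymmdd = day.strftime("%Y%m%d")
    return [(yyyymmdd, bucket) for bucket in range(BUCKET_COUNT)]
```

The trade-off is visible in the code: writes are spread over BUCKET_COUNT partitions, but each read now costs BUCKET_COUNT queries instead of one.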
Phase 2: Build the Application
Phase 3: Profit
Phase 1: Data Model
The DataStax Drivers for Cassandra
• Currently Available
– C# (.NET)
– Python
– Java
– NodeJS
– Ruby
– C++
• Will Probably Happen
– PHP
– Scala
– JDBC
• Early Discussions
– Go
– Rust
• Open source, Apache 2 licensed, available on GitHub
– https://github.com/datastax/
The DataStax Drivers for Cassandra
Bootstrapping code, by language
C#:
    Cluster cluster = Cluster.Builder().AddContactPoint("127.0.0.1").Build();
    ISession session = cluster.Connect("killrvideo");
Python:
    from cassandra.cluster import Cluster
    cluster = Cluster(contact_points=['127.0.0.1'])
    session = cluster.connect('killrvideo')
Java:
    Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
    Session session = cluster.connect("killrvideo");
NodeJS:
    var cassandra = require('cassandra-driver');
    var client = new cassandra.Client({
        contactPoints: ['127.0.0.1'], keyspace: 'killrvideo'
    });
Use Prepared Statements
• Performance optimization for queries you run repeatedly
• Pay the cost of preparing once (causes roundtrip to Cassandra)
• KillrVideo: looking a user’s credentials up by email address
• Save and reuse the PreparedStatement instance after preparing
PreparedStatement prepared = session.Prepare(
"SELECT * FROM user_credentials WHERE email = ?");
Use Prepared Statements
• Bind variable values when ready to execute
• Execution only has to send variable values over the wire
• Cassandra doesn’t have to reparse the CQL string each time
• Remember: Prepare once, bind and execute many
BoundStatement bound = prepared.Bind("luke.tillman@datastax.com");
RowSet rows = await _session.ExecuteAsync(bound);
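"Prepare once, bind and execute many" usually ends up as a small cache keyed by the CQL string. The sketch below is one way to do that in Python. It works with any session object that has a `prepare(cql)` method (the DataStax Python driver's session has one); `StatementCache` and `FakeSession` are made-up names, and the fake session exists only so the example runs without a cluster.

```python
class StatementCache:
    """Prepare each CQL string once and reuse the prepared statement."""

    def __init__(self, session):
        self._session = session  # any object with a prepare(cql) method
        self._prepared = {}

    def get(self, cql):
        if cql not in self._prepared:
            # The round trip to the cluster happens only on the first request.
            self._prepared[cql] = self._session.prepare(cql)
        return self._prepared[cql]


class FakeSession:
    """Stand-in for a driver session that counts prepare() calls."""

    def __init__(self):
        self.prepare_calls = 0

    def prepare(self, cql):
        self.prepare_calls += 1
        return ("prepared", cql)


session = FakeSession()
cache = StatementCache(session)
CQL = "SELECT * FROM user_credentials WHERE email = ?"
first = cache.get(CQL)
second = cache.get(CQL)  # served from the cache, no second prepare
```

The cache should live as long as the session does; creating it per request would throw away the benefit.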
Batch Statements: Use and Misuse
• You can mix and match Simple/Bound statements in a batch
• Batches are Logged (atomic) by default
• Use when you want a group of mutations (statements) to all
succeed or all fail (denormalizing at write time)
• Large batches are an anti-pattern (Cassandra will warn you)
• Not a performance optimization for bulk-loading data
KillrVideo: Update a Video’s Name with a Batch
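// Requires: using System.Threading.Tasks; using Cassandra;
public class VideoCatalogDataAccess
{
    private readonly ISession _session;
    private readonly PreparedStatement _prepared;

    public VideoCatalogDataAccess(ISession session)
    {
        _session = session;
        // Prepare once and reuse. user_videos is keyed by (userid, added_date, videoid),
        // so the UPDATE has to supply all three.
        _prepared = _session.Prepare(
            "UPDATE user_videos SET name = ? WHERE userid = ? AND added_date = ? AND videoid = ?");
    }

    public async Task UpdateVideoName(UpdateVideoDto video)
    {
        // Assumption: the DTO carries the video's added date as video.AddedDate
        BoundStatement bound = _prepared.Bind(video.Name, video.UserId, video.AddedDate, video.VideoId);
        var simple = new SimpleStatement("UPDATE videos SET name = ? WHERE videoid = ?",
            video.Name, video.VideoId);

        // Use an atomic batch to send over all the mutations
        var batchStatement = new BatchStatement();
        batchStatement.Add(bound);
        batchStatement.Add(simple);
        await _session.ExecuteAsync(batchStatement);
    }
}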
Lightweight Transactions when you need them
• Use when you don’t want writes to step on each other
– Sometimes called Linearizable Consistency
– Similar to the Serializable isolation level from RDBMS
• Essentially a Compare and Set (CAS) operation using Paxos
• Read the fine print: has a latency cost associated with it
• The canonical example: unique user accounts
KillrVideo: LWT to create user accounts
• Returns a column called [applied] indicating success/failure
• Different from relational world where you might expect an
Exception (e.g. PrimaryKeyViolationException or similar)
string cql = "INSERT INTO user_credentials (email, password, userid) " +
             "VALUES (?, ?, ?) IF NOT EXISTS";
var statement = new SimpleStatement(cql, user.Email, hashedPassword, user.UserId);
RowSet rows = await _session.ExecuteAsync(statement);
var userInserted = rows.Single().GetValue<bool>("[applied]");
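The "check the [applied] column instead of catching an exception" pattern can be modeled in a few lines. This is an in-memory illustration only: the class and method names are made up, and a single Python dictionary gives none of the cross-replica guarantees that Paxos provides in Cassandra. It just shows the shape of the result the application has to handle.

```python
class UserCredentials:
    """In-memory model of INSERT ... IF NOT EXISTS on user_credentials."""

    def __init__(self):
        self._rows = {}

    def insert_if_not_exists(self, email, password_hash, userid):
        """Return a dict with an "[applied]" flag, mirroring the result row."""
        if email in self._rows:
            # Not applied: the existing row comes back alongside the flag.
            return {"[applied]": False, **self._rows[email]}
        self._rows[email] = {"email": email, "password": password_hash, "userid": userid}
        return {"[applied]": True}


table = UserCredentials()
first_signup = table.insert_if_not_exists("luke.tillman@datastax.com", "hash-1", "u1")
second_signup = table.insert_if_not_exists("luke.tillman@datastax.com", "hash-2", "u2")
```

The second signup does not raise; the caller has to read the flag and decide what to tell the user.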
Software Architecture, A Love Story
Disclaimer: I am not paid to be a software architect
KillrVideo Logical Architecture
Browser
• Web UI: HTML5 / JavaScript
Server
• KillrVideo MVC App: serves up Web UI HTML and handles JSON requests from Web UI
Services
• Comments: tracks comments on videos by users
• Uploads: handles processing, storing, and encoding uploaded videos
• Video Catalog: tracks the catalog of available videos
• User Management: user accounts, login credentials, profiles
Infrastructure
• Cassandra Cluster (DSE): app data storage for services (e.g. users, comments)
• DataStax OpsCenter: management, provisioning, and monitoring
• Azure Media Services: video encoding, thumbnail generation
• Azure Storage (Blob, Queue): video file and thumbnail image storage
• Azure Service Bus: published events from services for interactions
Inside a Simple Service: Video Catalog
Video Catalog: tracks the catalog of available videos
• Cassandra Cluster (DSE): stores metadata about videos in Cassandra (e.g. name, description, location, thumbnail location, etc.)
• Azure Service Bus: publishes events about interesting things that happen (e.g. YouTubeVideoAdded, UploadedVideoAccepted, etc.)
Inside a More Complicated Service: Uploads
Uploads: handles processing, storing, and encoding uploaded videos
• Cassandra Cluster (DSE): stores data about uploaded video file locations, encoding jobs, job status, etc.
• Azure Storage (Blob, Queue): stores original and re-encoded video file assets, as well as thumbnail preview images generated
• Azure Media Services: re-encodes uploaded videos to format suitable for the web, generates thumbnail image previews
• Azure Service Bus: publishes events about interesting things that happen (e.g. UploadedVideoPublished, etc.)
Event Driven Architecture
• Only the application(s) give commands
• Decoupled: Pub-sub messaging to tell other parts of the system something interesting happened
• Services could be deployed, scaled, and versioned independently (AKA microservices)
Services connected through Azure Service Bus: User Management, Comments, Video Ratings, Sample Data, Search, Statistics, Suggested Videos, Uploads, Video Catalog
Video Catalog: “Hey, I added this new YouTube video to the catalog!”
Suggested Videos: “Time to figure out what videos to suggest for that new video.”
Search: “Better index that new video so it shows up in search results.”
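The decoupling described here can be sketched with a tiny in-process publish/subscribe bus. This is not Azure Service Bus and not the KillrVideo code: `EventBus` is a made-up class, and the event name is borrowed from the Video Catalog slide. It shows only that the publisher does not know who is listening.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process publish/subscribe bus."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self._handlers[event_type]:
            handler(payload)


bus = EventBus()
indexed, suggested = [], []

# Search and Suggested Videos react to the event; Video Catalog never calls them directly.
bus.subscribe("YouTubeVideoAdded", lambda event: indexed.append(event["videoid"]))
bus.subscribe("YouTubeVideoAdded", lambda event: suggested.append(event["videoid"]))

# Video Catalog announces what happened and moves on.
bus.publish("YouTubeVideoAdded", {"videoid": "v1", "name": "Grumpy cat"})
```

Adding another interested service later is one more `subscribe` call, with no change to the publisher.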
The Future
In the year 3,000…
The Future, Conan?
Where do we go with KillrVideo from here?
• Spark or AzureML for video suggestions
• Video search via Solr
• Actors that store state in C* (Akka.NET or Orleans)
• Storing file data (thumbnails, profile pics) in C* using pithos
Questions?
Follow me on Twitter for updates or to ask questions later: @LukeTillman

Cassandra Day London 2015: Building Your First Application in Apache Cassandra

  • 1. Building Your First Application with Cassandra Luke Tillman (@LukeTillman) Language Evangelist at DataStax
  • 2. Who are you?! • Evangelist with a focus on the .NET Community • Long-time .NET Developer • Recently presented at Cassandra Summit 2014 with Microsoft 2
  • 3. KillrVideo, a Video Sharing Site • Think a YouTube competitor – Users add videos, rate them, comment on them, etc. – Can search for videos by tag
  • 4. See the Live Demo, Get the Code • Live demo available at http://www.killrvideo.com – Written in C# – Live Demo running in Azure – Open source: https://github.com/luketillman/killrvideo-csharp • Interesting use case because of different data modeling challenges and the scale of something like YouTube – More than 1 billion unique users visit YouTube each month – 100 hours of video are uploaded to YouTube every minute 4
  • 5. 1 Think Before You Model 2 A Data Model for Cat Videos 3 Phase 2: Build the Application 4 Software Architecture, A Love Story 5 The Future 5
  • 6. Think Before You Model Or how to keep doing what you’re already doing 6
  • 7. Getting to Know Your Data • What things do I have in the system? • What are the relationships between them? • This is your conceptual data model • You already do this in the RDBMS world
  • 8. Some of the Entities and Relationships in KillrVideo 8 User id firstname lastname email password Video id name description location preview_image tags features Comment comment id adds timestamp posts timestamp 1 n n 1 1 n n m rates rating
  • 9. Getting to Know Your Queries • What are your application’s workflows? • How will I access the data? • Knowing your queries in advance is NOT optional • Different from RDBMS because I can’t just JOIN or create a new indexes to support new queries 9
  • 10. Some Application Workflows in KillrVideo 10 User Logs into site Show basic information about user Show videos added by a user Show comments posted by a user Search for a video by tag Show latest videos added to the site Show comments for a video Show ratings for a video Show video and its details
  • 11. Some Queries in KillrVideo to Support Workflows 11 Users User Logs into site Find user by email address Show basic information about user Find user by id Comments Show comments for a video Find comments by video (latest first) Show comments posted by a user Find comments by user (latest first) Ratings Show ratings for a video Find ratings by video
  • 12. Some Queries in KillrVideo to Support Workflows 12 Videos Search for a video by tag Find video by tag Show latest videos added to the site Find videos by date (latest first) Show video and its details Find video by id Show videos added by a user Find videos by user (latest first)
  • 13. A Data Model for Cat Videos Because the Internet loves ‘em some cat videos 13
  • 14. Just How Popular are Cats on the Internet? 14 http://mashable.com/2013/07/08/cats-bacon-rule-internet/
  • 15. Just How Popular are Cats on the Internet? 15 http://mashable.com/2013/07/08/cats-bacon-rule-internet/
  • 16. Data Modeling Refresher • Cassandra limits us to queries that can scale across many nodes – Include value for Partition Key and optionally, Clustering Column(s) • We know our queries, so we build tables to answer them • Denormalize at write time to do as few reads as possible • Many times we end up with a “table per query” – Similar to materialized views from the RDBMS world 16
  • 17. Users – The Relational Way • Single Users table with all user data and an Id Primary Key • Add an index on email address to allow queries by email User Logs into site Find user by email address Show basic information about user Find user by id
  • 18. Users – The Cassandra Way User Logs into site Find user by email address Show basic information about user Find user by id CREATE TABLE user_credentials ( email text, password text, userid uuid, PRIMARY KEY (email) ); CREATE TABLE users ( userid uuid, firstname text, lastname text, email text, created_date timestamp, PRIMARY KEY (userid) );
  • 19. Videos Everywhere! 19 Show video and its details Find video by id Show videos added by a user Find videos by user (latest first) CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) ); CREATE TABLE user_videos ( userid uuid, added_date timestamp, videoid uuid, name text, preview_image_location text, PRIMARY KEY (userid, added_date, videoid) ) WITH CLUSTERING ORDER BY ( added_date DESC, videoid ASC);
  • 20. Videos Everywhere! Considerations When Duplicating Data • Can the data change? • How likely is it to change or how frequently will it change? • Do I have all the information I need to update duplicates and maintain consistency? 20 Search for a video by tag Find video by tag Show latest videos added to the site Find videos by date (latest first)
  • 21. Modeling Relationships – Collection Types • Cassandra doesn’t support JOINs, but your data will still have relationships (and you can still model that in Cassandra) • One tool available is CQL collection types CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) );
  • 22. Modeling Relationships – Client Side Joins 22 CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) ); CREATE TABLE users ( userid uuid, firstname text, lastname text, email text, created_date timestamp, PRIMARY KEY (userid) ); Currently requires query for video, followed by query for user by id based on results of first query
  • 23. Modeling Relationships – Client Side Joins • What is the cost? Might be OK in small situations • Do NOT scale • Avoid when possible 23
  • 24. Modeling Relationships – Client Side Joins 24 CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, ... user_firstname text, user_lastname text, user_email text, PRIMARY KEY (videoid) ); CREATE TABLE users_by_video ( videoid uuid, userid uuid, firstname text, lastname text, email text, PRIMARY KEY (videoid) ); or
  • 25. Modeling Relationships – Client Side Joins • Remember the considerations when you duplicate data • What happens if a user changes their name or email address? • Can I update the duplicated data? 25
  • 26. Cassandra Rules Can Impact Your Design • Video Ratings – use counters to track sum of all ratings and count of ratings • Counters are a good example of something with special rules: counter columns cannot share a table with regular (non-key) columns, so the counters move to a table of their own. Not allowed: CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, ... rating_counter counter, rating_total counter, PRIMARY KEY (videoid) ); Use instead: CREATE TABLE video_ratings ( videoid uuid, rating_counter counter, rating_total counter, PRIMARY KEY (videoid) );
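As a rough illustration of how the two counters are used, the sketch below pairs the counter update with the read-side arithmetic. The `RATE_VIDEO_CQL` constant and the `average_rating` helper are hypothetical names; the `%s` placeholders assume the DataStax Python driver's style for non-prepared statements.

```python
# Counter columns are only ever incremented or decremented, never set directly.
RATE_VIDEO_CQL = (
    "UPDATE video_ratings "
    "SET rating_counter = rating_counter + 1, rating_total = rating_total + %s "
    "WHERE videoid = %s"
)


def average_rating(rating_counter, rating_total):
    """Average rating from the two counters, or None if nobody has rated yet."""
    if rating_counter == 0:
        return None
    return rating_total / rating_counter
```

For example, four ratings that sum to 18 give `average_rating(4, 18) == 4.5`.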
  • 27. Single Nodes Have Limits Too • Latest videos are bucketed by day • Means all reads/writes to latest videos for a given day go to the same partition (and thus the same nodes) • Could create a hotspot Show latest videos added to the site: find videos by date (latest first). CREATE TABLE latest_videos ( yyyymmdd text, added_date timestamp, videoid uuid, name text, preview_image_location text, PRIMARY KEY (yyyymmdd, added_date, videoid) ) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);
  • 28. Single Nodes Have Limits Too • Mitigate by adding data to the Partition Key to spread load • Data that’s already naturally a part of the domain – Latest videos by category? • Arbitrary data, like a bucket number – Round robin at the app level Show latest videos added to the site: find videos by date (latest first). CREATE TABLE latest_videos ( yyyymmdd text, bucket_number int, added_date timestamp, videoid uuid, name text, preview_image_location text, PRIMARY KEY ((yyyymmdd, bucket_number), added_date, videoid) ) ...
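One way the "round robin at the app level" idea could look, as a sketch only. `NUM_BUCKETS`, the helper names, and the dict-shaped rows are assumptions for illustration; the trade-off it shows is that writes are spread across partitions, while a read has to query every bucket for the day and merge the results.

```python
import heapq
import itertools
from datetime import datetime

NUM_BUCKETS = 4  # assumption: chosen by the application, not by Cassandra

# Process-wide round-robin counter over the bucket numbers.
_bucket_cycle = itertools.cycle(range(NUM_BUCKETS))


def partition_key_for_write(added_date):
    """Pick the (yyyymmdd, bucket_number) partition for a new video."""
    return (added_date.strftime("%Y%m%d"), next(_bucket_cycle))


def partition_keys_for_read(day):
    """A read for one day has to visit every bucket of that day."""
    yyyymmdd = day.strftime("%Y%m%d")
    return [(yyyymmdd, bucket) for bucket in range(NUM_BUCKETS)]


def merge_latest(per_bucket_rows, limit):
    """Merge per-bucket results, each already sorted by added_date DESC."""
    merged = heapq.merge(
        *per_bucket_rows, key=lambda row: row["added_date"], reverse=True
    )
    return list(itertools.islice(merged, limit))
```

Each bucket's rows arrive newest first because of the table's clustering order, so a streaming merge is enough and no full re-sort is needed.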
  • 29. Phase 2: Build the Application (Phase 1: Data Model; Phase 3: Profit)
  • 30. The DataStax Drivers for Cassandra • Currently Available – C# (.NET) – Python – Java – NodeJS – Ruby – C++ • Will Probably Happen – PHP – Scala – JDBC • Early Discussions – Go – Rust • Open source, Apache 2 licensed, available on GitHub – https://github.com/datastax/
  • 31. The DataStax Drivers for Cassandra (Language: Bootstrapping Code)
    C#:
      Cluster cluster = Cluster.Builder().AddContactPoint("127.0.0.1").Build();
      ISession session = cluster.Connect("killrvideo");
    Python:
      from cassandra.cluster import Cluster
      cluster = Cluster(contact_points=['127.0.0.1'])
      session = cluster.connect('killrvideo')
    Java:
      Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
      Session session = cluster.connect("killrvideo");
    NodeJS:
      var cassandra = require('cassandra-driver');
      var client = new cassandra.Client({ contactPoints: ['127.0.0.1'], keyspace: 'killrvideo' });
  • 32. Use Prepared Statements • Performance optimization for queries you run repeatedly • Pay the cost of preparing once (it costs a round trip to Cassandra) • KillrVideo: looking up a user’s credentials by email address • Save and reuse the PreparedStatement instance after preparing PreparedStatement prepared = session.Prepare( "SELECT * FROM user_credentials WHERE email = ?");
  • 33. Use Prepared Statements • Bind variable values when ready to execute • Execution only has to send variable values over the wire • Cassandra doesn’t have to reparse the CQL string each time • Remember: Prepare once, bind and execute many BoundStatement bound = prepared.Bind("luke.tillman@datastax.com"); RowSet rows = await session.ExecuteAsync(bound);
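"Prepare once, bind and execute many" can be sketched as a small cache keyed by the CQL string. This is an illustration, not KillrVideo's code: `PreparedStatementCache` and `FakeSession` are hypothetical, and the only thing assumed of the session is a `prepare(cql)` method, which is what the DataStax Python driver's session exposes.

```python
class PreparedStatementCache:
    """Prepare each distinct CQL string once and hand back the saved statement."""

    def __init__(self, session):
        self._session = session
        self._prepared = {}

    def get(self, cql):
        statement = self._prepared.get(cql)
        if statement is None:
            # First use: pay the round trip to prepare, then remember the result.
            statement = self._session.prepare(cql)
            self._prepared[cql] = statement
        return statement


class FakeSession:
    """Stand-in session so the sketch runs without a cluster."""

    def __init__(self):
        self.prepare_calls = 0

    def prepare(self, cql):
        self.prepare_calls += 1
        return ("prepared", cql)
```

With a real session, the returned statement would then be bound to values and executed on each request, while the cache keeps the prepare call to one per query string.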
  • 34. Batch Statements: Use and Misuse • You can mix and match Simple/Bound statements in a batch • Batches are Logged (atomic) by default • Use when you want a group of mutations (statements) to all succeed or all fail (denormalizing at write time) • Large batches are an anti-pattern (Cassandra will warn you) • Not a performance optimization for bulk-loading data
  • 35. KillrVideo: Update a Video’s Name with a Batch
    public class VideoCatalogDataAccess
    {
        private readonly ISession _session;
        private readonly PreparedStatement _prepared;

        public VideoCatalogDataAccess(ISession session)
        {
            _session = session;
            // Prepare once, reuse on every call.
            // Note: user_videos is keyed by (userid, added_date, videoid), so a real
            // UPDATE must also restrict added_date in its WHERE clause.
            _prepared = _session.Prepare(
                "UPDATE user_videos SET name = ? WHERE userid = ? AND videoid = ?");
        }

        public async Task UpdateVideoName(UpdateVideoDto video)
        {
            BoundStatement bound = _prepared.Bind(video.Name, video.UserId, video.VideoId);
            var simple = new SimpleStatement("UPDATE videos SET name = ? WHERE videoid = ?",
                                             video.Name, video.VideoId);

            // Use an atomic batch to send over all the mutations
            var batchStatement = new BatchStatement();
            batchStatement.Add(bound);
            batchStatement.Add(simple);

            RowSet rows = await _session.ExecuteAsync(batchStatement);
        }
    }
  • 36. Lightweight Transactions when you need them • Use when you don’t want writes to step on each other – Sometimes called Linearizable Consistency – Similar to the Serializable isolation level in an RDBMS • Essentially a Check and Set (CAS) operation using Paxos • Read the fine print: has a latency cost associated with it • The canonical example: unique user accounts
  • 37. KillrVideo: LWT to create user accounts • Returns a column called [applied] indicating success/failure • Different from relational world where you might expect an Exception (e.g. PrimaryKeyViolationException or similar)
    string cql = "INSERT INTO user_credentials (email, password, userid) " +
                 "VALUES (?, ?, ?) IF NOT EXISTS";
    var statement = new SimpleStatement(cql, user.Email, hashedPassword, user.UserId);
    RowSet rows = await _session.ExecuteAsync(statement);
    var userInserted = rows.Single().GetValue<bool>("[applied]");
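The calling pattern is easier to see in isolation. The sketch below imitates the semantics of `INSERT ... IF NOT EXISTS` with an in-memory dictionary; it does not talk to Cassandra or use Paxos, and `UserCredentialsTable` is a hypothetical stand-in. The point it illustrates is that a conflicting insert comes back as a result with `[applied]` set to false instead of raising an exception.

```python
class UserCredentialsTable:
    """In-memory imitation of user_credentials keyed by email."""

    def __init__(self):
        self._rows = {}

    def insert_if_not_exists(self, email, password_hash, userid):
        existing = self._rows.get(email)
        if existing is not None:
            # Not applied: report failure and return the row that won.
            return {"[applied]": False, **existing}
        self._rows[email] = {
            "email": email,
            "password": password_hash,
            "userid": userid,
        }
        return {"[applied]": True}


def register_user(table, email, password_hash, userid):
    """Return True if the account was created, False if the email is taken."""
    result = table.insert_if_not_exists(email, password_hash, userid)
    return result["[applied]"]
```

The caller branches on the returned flag, so "email already registered" is an ordinary outcome rather than an error path.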
  • 38. Software Architecture, A Love Story Disclaimer: I am not paid to be a software architect
  • 39. KillrVideo Logical Architecture
    Browser: Web UI (HTML5 / JavaScript)
    Server: KillrVideo MVC App (serves up Web UI HTML and handles JSON requests from Web UI)
    Services: Comments (tracks comments on videos by users); Uploads (handles processing, storing, and encoding uploaded videos); Video Catalog (tracks the catalog of available videos); User Management (user accounts, login credentials, profiles)
    Infrastructure: Cassandra Cluster (DSE) (app data storage for services, e.g. users, comments); DataStax OpsCenter (management, provisioning, and monitoring); Azure Media Services (video encoding, thumbnail generation); Azure Storage (Blob, Queue) (video file and thumbnail image storage); Azure Service Bus (published events from services for interactions)
  • 40. Inside a Simple Service: Video Catalog. Video Catalog: tracks the catalog of available videos. Cassandra Cluster (DSE): app data storage for services (e.g. users, comments). Azure Service Bus: published events from services for interactions.
  • 41. Inside a Simple Service: Video Catalog. Video Catalog: tracks the catalog of available videos. Cassandra Cluster (DSE): app data storage for services (e.g. users, comments). • Stores metadata about videos in Cassandra (e.g. name, description, location, thumbnail location, etc.)
  • 42. Inside a Simple Service: Video Catalog. Video Catalog: tracks the catalog of available videos. Azure Service Bus: published events from services for interactions. • Publishes events about interesting things that happen (e.g. YouTubeVideoAdded, UploadedVideoAccepted, etc.)
  • 43. Inside a More Complicated Service: Uploads. Uploads: handles processing, storing, and encoding uploaded videos. Cassandra Cluster (DSE): app data storage for services (e.g. users, comments). Azure Storage (Blob, Queue): video file and thumbnail image storage. Azure Media Services: video encoding, thumbnail generation. Azure Service Bus: published events from services for interactions.
  • 44. Inside a More Complicated Service: Uploads. Uploads: handles processing, storing, and encoding uploaded videos. Cassandra Cluster (DSE): app data storage for services (e.g. users, comments). • Stores data about uploaded video file locations, encoding jobs, job status, etc.
  • 45. Inside a More Complicated Service: Uploads. Uploads: handles processing, storing, and encoding uploaded videos. Azure Storage (Blob, Queue): video file and thumbnail image storage. • Stores original and re-encoded video file assets, as well as the thumbnail preview images generated
  • 46. Inside a More Complicated Service: Uploads. Uploads: handles processing, storing, and encoding uploaded videos. Azure Media Services: video encoding, thumbnail generation. • Re-encodes uploaded videos to a format suitable for the web, generates thumbnail image previews
  • 47. Inside a More Complicated Service: Uploads. Uploads: handles processing, storing, and encoding uploaded videos. Azure Service Bus: published events from services for interactions. • Publishes events about interesting things that happen (e.g. UploadedVideoPublished, etc.)
  • 48. Event Driven Architecture • Only the application(s) give commands • Decoupled: Pub-sub messaging to tell other parts of the system something interesting happened • Services could be deployed, scaled, and versioned independently (AKA microservices). Services connected through Azure Service Bus: User Management, Comments, Video Ratings, Sample Data, Search, Statistics, Suggested Videos, Uploads, Video Catalog.
  • 49. Event Driven Architecture. Video Catalog, via Azure Service Bus, to Search and Suggested Videos: "Hey, I added this new YouTube video to the catalog!"
  • 50. Event Driven Architecture. Video Catalog: "Hey, I added this new YouTube video to the catalog!" Suggested Videos: "Time to figure out what videos to suggest for that new video." Search: "Better index that new video so it shows up in search results."
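The pub-sub flow in these slides can be sketched with a tiny in-process event bus. This is only an illustration of the decoupling: the real system uses Azure Service Bus, and `EventBus`, `wire_up_demo`, and the payload shape are hypothetical. The event name `YouTubeVideoAdded` comes from the Video Catalog slide.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process publish/subscribe hub."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The publisher does not know who is listening, or whether anyone is.
        for handler in self._subscribers[event_type]:
            handler(payload)


def wire_up_demo():
    """Video Catalog publishes; Search and Suggested Videos react independently."""
    bus = EventBus()
    log = []
    bus.subscribe("YouTubeVideoAdded", lambda e: log.append(("search", e["videoid"])))
    bus.subscribe(
        "YouTubeVideoAdded", lambda e: log.append(("suggested_videos", e["videoid"]))
    )
    bus.publish("YouTubeVideoAdded", {"videoid": "v1"})
    return log
```

Adding a new consumer, such as a statistics service, means one more `subscribe` call and no change to the publisher.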
  • 51. The Future In the year 3,000…
  • 53. Where do we go with KillrVideo from here? • Spark or AzureML for video suggestions • Video search via Solr • Actors that store state in C* (Akka.NET or Orleans) • Storing file data (thumbnails, profile pics) in C* using pithos
  • 54. Questions? Follow me on Twitter for updates or to ask questions later: @LukeTillman