Cassandra 3.0: JSON at Scale
Jonathan Ellis
CTO, DataStax
Project Chair, Apache Cassandra
©2014 DataStax Confidential. Do not distribute without consent.
How we got here
Cassandra 3.0 - JSON at scale - StampedeCon 2015
objectContainer.store(batch);
SELECT offices.name, MAX(orders.created_at)
FROM offices NATURAL JOIN orders
GROUP BY offices.name;
Why today is different
{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}
Documents in Cassandra
CQL
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
state text,
birth_date int
);
CREATE INDEX ON users(state);
SELECT * FROM users
WHERE state = 'Texas'
AND birth_date > 1950
ALLOW FILTERING;
Collections
CREATE TABLE example (
    id int PRIMARY KEY,
    tupleval tuple<int, text>,
    numbers set<int>,
    words list<text>
);
INSERT INTO example (id, tupleval, numbers, words)
VALUES (0, (1, 'foo'), {1, 2, 3, 6}, ['the', 'quick', 'brown', 'fox']);
User-defined types (UDT)
CREATE TYPE address (number int, street text);
CREATE TABLE users (
id int PRIMARY KEY,
street_address frozen<address>
);
INSERT INTO users (id, street_address)
VALUES (1, {number: 123, street: 'Cassandra Ave'});
JSON
INSERT INTO example JSON
'{"id": 0,
"tupleval": [1, "foo"],
"numbers": [1, 2, 3, 6],
"words": ["the", "quick", "brown", "fox"]}';
INSERT INTO users JSON
'{"id": 1,
"street_address": {"number": 1,
"street": "Cassandra Ave"}}';
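The JSON document in `INSERT ... JSON` is passed as an ordinary CQL string literal, so any single quote inside the JSON must be doubled. A minimal Java sketch, assuming the JSON text is already serialized; `CqlJson`, `quoteLiteral` and `insertJson` are hypothetical helpers, not part of any Cassandra driver.

```java
/** Hypothetical helper: wraps a serialized JSON document in a CQL INSERT ... JSON statement. */
final class CqlJson {
    private CqlJson() {}

    /** CQL string literals escape an embedded single quote by doubling it. */
    static String quoteLiteral(String text) {
        return "'" + text.replace("'", "''") + "'";
    }

    /** Builds INSERT INTO <table> JSON '<json>'; the table name is not validated here. */
    static String insertJson(String table, String json) {
        return "INSERT INTO " + table + " JSON " + quoteLiteral(json) + ";";
    }

    public static void main(String[] args) {
        String doc = "{\"id\": 2, \"street_address\": {\"number\": 7, \"street\": \"O'Hare Way\"}}";
        // The apostrophe in O'Hare is emitted as O''Hare inside the literal.
        System.out.println(insertJson("users", doc));
    }
}
```

In application code, a prepared statement with a bind marker in place of the literal (Cassandra 2.2+ accepts `INSERT INTO users JSON ?`) avoids manual escaping altogether; the sketch only shows what the literal form requires.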
Nested
CREATE TYPE address (
street text,
city text,
zip_code int,
phones set<text>
);
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
addresses map<text, frozen<address>>
);
Nested
INSERT INTO users JSON
'{"id": "0514e410-2a9f-11e5-a2cb-0800200c9a66",
"name": "jbellis",
"addresses": {"home": {"street": "9920 Cassandra Ave",
"city": "Austin",
"zip_code": 78700,
"phones": ["1238614789"]}}}';
What about schemaless documents?
{"userid": "2452347",
"name": "jbellis",
... }
{"userid": 2452348,
"name": "jhaddad",
... }
{"user_id": 2452349,
"name": "jlacefield",
... }
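The three documents above disagree on both the key name (`userid` vs `user_id`) and the value type (string vs number), so the application still carries an implicit schema. A small Java sketch, assuming documents are modeled as `Map<String, Object>` rather than parsed JSON; `ImplicitSchema.infer` is a hypothetical helper that reports every type observed for each field.

```java
import java.util.*;

/** Sketch: infer the implicit schema of "schemaless" documents modeled as Java maps. */
final class ImplicitSchema {
    private ImplicitSchema() {}

    /** Maps each field name to the set of value types observed for it, in first-seen order. */
    static Map<String, Set<String>> infer(List<Map<String, Object>> docs) {
        Map<String, Set<String>> schema = new TreeMap<>();
        for (Map<String, Object> doc : docs) {
            for (Map.Entry<String, Object> field : doc.entrySet()) {
                Object v = field.getValue();
                String type = v == null ? "null" : v.getClass().getSimpleName();
                schema.computeIfAbsent(field.getKey(), k -> new LinkedHashSet<>()).add(type);
            }
        }
        return schema;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
            Map.of("userid", "2452347", "name", "jbellis"),
            Map.of("userid", 2452348, "name", "jhaddad"),
            Map.of("user_id", 2452349, "name", "jlacefield"));
        // Prints {name=[String], user_id=[Integer], userid=[String, Integer]}
        System.out.println(infer(docs));
    }
}
```

Two spellings of the same key and two types for `userid` are exactly the cases every reader of this data must handle, whether or not the database enforces a schema.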
Performance and scale
(Chart: latency across read-mostly, balanced, write-mostly, and op/analytic workloads.)
See also
•The myth of schema-less:
http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/
•Schema-less is (usually) a lie:
https://www.compose.io/articles/schema-less-is-usually-a-lie/
•Schemaless databases don’t exist:
https://vividcortex.com/blog/2015/02/24/schemaless-databases-dont-exist/
ACID in Cassandra
Lightweight transactions
INSERT INTO users
(username, name, email,
password, created_date)
VALUES ('pmcfadin',
'Patrick McFadin',
['patrick@datastax.com'],
'ba27e03fd9...',
'2011-06-20 13:50:00')
IF NOT EXISTS;
[applied]
-----------
True
INSERT INTO users
(username, name, email,
password, created_date)
VALUES ('pmcfadin',
'Patrick McFadin',
['patrick@datastax.com'],
'ea24e13ad9...',
'2011-06-20 13:50:01')
IF NOT EXISTS;
[applied] | username | created_date | name
-----------+----------+----------------+----------------
False | pmcfadin | 2011-06-20 ... | Patrick McFadin
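The `[applied]` results above follow compare-and-set semantics: the first `IF NOT EXISTS` insert wins, and the second is rejected and shown the row that already exists. Cassandra achieves this across replicas with Paxos; the Java sketch below only models the single-key outcome inside one JVM, using `ConcurrentHashMap.putIfAbsent` as the atomic step. The `IfNotExists` class is hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Single-JVM model of INSERT ... IF NOT EXISTS; Cassandra itself runs Paxos across replicas. */
final class IfNotExists {
    /** Shaped like the cqlsh output: [applied], plus the existing row when not applied. */
    record Result(boolean applied, Map<String, String> existing) {}

    private final ConcurrentHashMap<String, Map<String, String>> rows = new ConcurrentHashMap<>();

    Result insert(String username, Map<String, String> row) {
        // putIfAbsent is atomic: exactly one concurrent caller wins for a given key.
        Map<String, String> prior = rows.putIfAbsent(username, Map.copyOf(row));
        return prior == null ? new Result(true, null) : new Result(false, prior);
    }

    public static void main(String[] args) {
        IfNotExists users = new IfNotExists();
        Result first = users.insert("pmcfadin",
            Map.of("password", "ba27e03fd9...", "created_date", "2011-06-20 13:50:00"));
        Result second = users.insert("pmcfadin",
            Map.of("password", "ea24e13ad9...", "created_date", "2011-06-20 13:50:01"));
        System.out.println(first.applied() + " " + second.applied()); // true false
        System.out.println(second.existing().get("created_date"));   // 2011-06-20 13:50:00
    }
}
```

Returning the existing row on failure, as cqlsh does, lets the losing client decide what to do next without a separate read.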
Static columns
CREATE TABLE bills (
user text,
balance int static,
expense_id int,
amount int,
description text,
paid boolean,
PRIMARY KEY (user, expense_id)
);
Static columns + LWT
BEGIN BATCH
UPDATE bills SET balance = -116 WHERE user='user1' IF balance = 84;
INSERT INTO bills (user, expense_id, amount, description, paid)
VALUES ('user1', 2, 200, 'hotel room', false);
APPLY BATCH;
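A static column such as `balance` is stored once per partition and shared by every expense row, and the `IF balance = 84` condition makes the whole batch conditional: either the new balance (84 − 200 = −116) and the hotel-room expense are both written, or neither is. Cassandra requires a conditional batch to stay within one partition, which is why both statements use `user='user1'`. The Java sketch below models one partition with a per-partition lock standing in for the LWT; `BillsPartition` and its methods are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch of one `bills` partition: a static balance shared by every clustering row. */
final class BillsPartition {
    record Expense(int amount, String description, boolean paid) {}

    private int balance;                                            // static column: one value per partition
    private final Map<Integer, Expense> expenses = new TreeMap<>(); // clustering rows, ordered by expense_id

    BillsPartition(int openingBalance) { this.balance = openingBalance; }

    /**
     * Models the conditional batch: UPDATE ... SET balance = newBalance IF balance = expected,
     * plus an INSERT of the expense row. Either both take effect or neither does.
     */
    synchronized boolean applyBatch(int expected, int newBalance, int expenseId, Expense e) {
        if (balance != expected) {
            return false; // condition failed, like [applied] = False: nothing is written
        }
        balance = newBalance;
        expenses.put(expenseId, e);
        return true;
    }

    synchronized int balance() { return balance; }
    synchronized int rowCount() { return expenses.size(); }

    public static void main(String[] args) {
        BillsPartition user1 = new BillsPartition(84);
        boolean ok = user1.applyBatch(84, -116, 2, new Expense(200, "hotel room", false));
        boolean retry = user1.applyBatch(84, -116, 2, new Expense(200, "hotel room", false));
        System.out.println(ok + " " + retry + " " + user1.balance()); // true false -116
    }
}
```

The retry fails because the balance no longer matches, so a client that lost a race must re-read the balance before trying again instead of double-charging.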
2.2 and 3.0 Preview
Role-based authorization
CREATE ROLE manager
WITH PASSWORD = 'foo' AND LOGIN = true;
GRANT authorize TO manager;
GRANT manager TO jbellis;
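With roles, `GRANT manager TO jbellis` makes `jbellis` inherit whatever `manager` holds, and roles can be granted to roles transitively. A Java sketch of that resolution as a graph walk, assuming permissions are plain strings; `Roles` and its methods are hypothetical and ignore resources, which the real `GRANT ... ON ...` syntax attaches to each permission.

```java
import java.util.*;

/** Sketch of role inheritance: a role holds its own permissions plus those of every role granted to it. */
final class Roles {
    private final Map<String, Set<String>> grantedRoles = new HashMap<>(); // grantee -> roles granted to it
    private final Map<String, Set<String>> permissions = new HashMap<>();  // role -> directly held permissions

    void grantPermission(String permission, String role) {
        permissions.computeIfAbsent(role, r -> new HashSet<>()).add(permission);
    }

    void grantRole(String granted, String grantee) {
        grantedRoles.computeIfAbsent(grantee, r -> new HashSet<>()).add(granted);
    }

    /** Depth-first walk over granted roles; the visited set also guards against cycles. */
    Set<String> effectivePermissions(String role) {
        Set<String> result = new TreeSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(role);
        while (!stack.isEmpty()) {
            String current = stack.pop();
            if (!visited.add(current)) continue;
            result.addAll(permissions.getOrDefault(current, Set.of()));
            for (String granted : grantedRoles.getOrDefault(current, Set.of())) stack.push(granted);
        }
        return result;
    }

    public static void main(String[] args) {
        Roles roles = new Roles();
        roles.grantPermission("AUTHORIZE", "manager"); // models the slide's GRANT authorize TO manager
        roles.grantRole("manager", "jbellis");         // GRANT manager TO jbellis
        System.out.println(roles.effectivePermissions("jbellis")); // [AUTHORIZE]
    }
}
```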
Hinted handoff improvements
CREATE TABLE system.hints (
target_id uuid,
hint_id timeuuid,
message_version int,
mutation blob,
PRIMARY KEY (target_id, hint_id, message_version)
) WITH COMPACT STORAGE
AND CLUSTERING ORDER BY (hint_id ASC, message_version ASC);
SSTable-based hints
(Diagram: a Hint is written like any other data, Hint → Commitlog → Memtable → SSTable. Once it has been delivered, deleting it writes a Tombstone that takes the same path, Tombstone → Commitlog → Memtable → SSTable, and both are only removed when the SSTables are Compacted.)
File-based hints
(Diagram: each Hint is appended to a file belonging to its target node, .168.101, .168.104 and .168.112, and the files grow as hints accumulate. Once a node's hints have been delivered, its whole file is deleted; in the final frame the .168.101 file is gone and only the .168.104 and .168.112 files remain.)
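Cassandra 3.0 moves hints out of the `system.hints` table into flat files, one per target node, so delivering a node's hints ends with deleting a file instead of writing a tombstone per hint and waiting for compaction. The Java sketch below models that lifecycle in memory, with each list standing in for one hints file; `HintFiles` and its methods are hypothetical and are not Cassandra's internal hints service.

```java
import java.util.*;
import java.util.function.Consumer;

/**
 * Sketch of file-based hint storage: one append-only "file" per target node.
 * Each list stands in for a hints file on disk; this is not Cassandra's implementation.
 */
final class HintFiles {
    private final Map<String, List<String>> filesByTarget = new LinkedHashMap<>();

    /** Writing a hint is a plain append: no commitlog, memtable or SSTable involved. */
    void append(String targetNode, String mutation) {
        filesByTarget.computeIfAbsent(targetNode, t -> new ArrayList<>()).add(mutation);
    }

    /**
     * Replays every hint for a node that is back up, in write order, then drops its whole file.
     * Removing the file replaces the per-hint tombstones and compaction of the table-based design.
     */
    int deliver(String targetNode, Consumer<String> sendToNode) {
        List<String> file = filesByTarget.remove(targetNode);
        if (file == null) return 0;
        file.forEach(sendToNode);
        return file.size();
    }

    Set<String> pendingTargets() { return filesByTarget.keySet(); }

    public static void main(String[] args) {
        HintFiles hints = new HintFiles();
        hints.append(".168.101", "mutation-1");
        hints.append(".168.104", "mutation-2");
        hints.append(".168.101", "mutation-3");
        int replayed = hints.deliver(".168.101", m -> System.out.println("replay " + m));
        System.out.println(replayed + " " + hints.pendingTargets()); // 2 [.168.104]
    }
}
```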
User-defined functions
CREATE FUNCTION my_sin (input double)
CALLED ON NULL INPUT
RETURNS double LANGUAGE java
AS '
return input == null
? null
: Double.valueOf(Math.sin(input.doubleValue()));
';
SELECT key, my_sin(value) FROM my_table WHERE key IN (1, 2, 3);
User-defined aggregates are supported as well.
See also Robert Stupp on user-defined functions:
http://www.slideshare.net/RobertStupp/user-definedfunctionscassandrasummiteu2014
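A user-defined aggregate is built from ordinary UDFs: `CREATE AGGREGATE` names a state function (`SFUNC`) folded over each row, the state type (`STYPE`), an initial state (`INITCOND`) and an optional final function (`FINALFUNC`). The plain-Java sketch below mimics that evaluation order for an average, next to the `my_sin` body from the slide; the class and method names are hypothetical and nothing here calls Cassandra.

```java
import java.util.List;

/** Sketch of the UDF/UDA evaluation model in plain Java; names are hypothetical, not Cassandra APIs. */
final class UdfSketch {
    /** Same body as my_sin: returns null for a null input, otherwise the sine. */
    static Double mySin(Double input) {
        return input == null ? null : Double.valueOf(Math.sin(input.doubleValue()));
    }

    /** Aggregate state, playing the role of an STYPE such as tuple<int, double>: (count, sum). */
    record AvgState(int count, double sum) {}

    /** State function (SFUNC): folds one row's value into the state; nulls are skipped. */
    static AvgState avgState(AvgState state, Double value) {
        return value == null ? state : new AvgState(state.count() + 1, state.sum() + value);
    }

    /** Final function (FINALFUNC): turns the accumulated state into the result. */
    static Double avgFinal(AvgState state) {
        return state.count() == 0 ? null : state.sum() / state.count();
    }

    /** Runs the aggregate as a query would: initial state (INITCOND), fold every row, finalize. */
    static Double average(List<Double> column) {
        AvgState state = new AvgState(0, 0.0);
        for (Double v : column) state = avgState(state, v);
        return avgFinal(state);
    }

    public static void main(String[] args) {
        System.out.println(mySin(0.0));                                        // 0.0
        System.out.println(average(java.util.Arrays.asList(1.0, null, 3.0))); // 2.0
    }
}
```

Keeping the state a small immutable value mirrors how the state function is called once per row with the previous state as its first argument.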
3.x development process
Materialized views
CREATE MATERIALIZED VIEW songs_by_album AS
SELECT * FROM songs
WHERE album IS NOT NULL AND id IS NOT NULL
PRIMARY KEY (album, id);
SELECT * FROM songs_by_album
WHERE album = 'Tres Hombres';
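Cassandra keeps a materialized view in sync on every write to the base table; when a song's `album` changes, the old view row has to be removed, which means reading the existing base row first. A single-process Java sketch of that bookkeeping, assuming no replication or failures; `SongsWithView` and its methods are hypothetical.

```java
import java.util.*;

/** Sketch: keeping a songs_by_album view in sync with the songs table (one process, no replication). */
final class SongsWithView {
    record Song(String id, String title, String album) {}

    private final Map<String, Song> songs = new HashMap<>();                     // base table: id -> row
    private final Map<String, SortedSet<String>> songsByAlbum = new HashMap<>(); // view: album -> ids

    void upsert(Song song) {
        // Read-before-write: the previous row tells us which stale view entry to remove.
        Song previous = songs.put(song.id(), song);
        if (previous != null && previous.album() != null) {
            SortedSet<String> ids = songsByAlbum.get(previous.album());
            ids.remove(song.id());
            if (ids.isEmpty()) songsByAlbum.remove(previous.album());
        }
        // Mirrors WHERE album IS NOT NULL: rows without an album stay out of the view.
        if (song.album() != null) {
            songsByAlbum.computeIfAbsent(song.album(), a -> new TreeSet<>()).add(song.id());
        }
    }

    /** Like SELECT * FROM songs_by_album WHERE album = ?: a single lookup by view key. */
    List<Song> byAlbum(String album) {
        List<Song> result = new ArrayList<>();
        for (String id : songsByAlbum.getOrDefault(album, Collections.emptySortedSet())) {
            result.add(songs.get(id));
        }
        return result;
    }

    public static void main(String[] args) {
        SongsWithView db = new SongsWithView();
        db.upsert(new Song("a3e64f8f", "La Grange", "Tres Hombres"));
        db.upsert(new Song("8a172618", "Waitin for the Bus", "Tres Hombres"));
        db.upsert(new Song("2b09185b", "Outside Woman Blues", "Roll Away"));
        System.out.println(db.byAlbum("Tres Hombres").size()); // 2
        db.upsert(new Song("a3e64f8f", "La Grange", "Roll Away")); // moving a song updates the view
        System.out.println(db.byAlbum("Tres Hombres").size()); // 1
    }
}
```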
Indexes
CREATE TABLE songs (
  id uuid PRIMARY KEY,
  title text,
  album text,
  artist text
);
CREATE INDEX songs_by_album on songs(album);
insert into songs (id, title, artist, album)
values ('a3e64f8f...', 'La Grange', 'ZZ Top', 'Tres Hombres');
insert into songs (id, title, artist, album)
values ('8a172618...', 'Waitin for the Bus', 'ZZ Top', 'Tres Hombres');
insert into songs (id, title, artist, album)
values ('2b09185b...', 'Outside Woman Blues', 'Back Door Slam', 'Roll Away');
SELECT * FROM songs
WHERE album = 'Tres Hombres';
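Local indexes
(Diagram: a client and three nodes. Each node indexes only the rows it stores, so the query has to reach every node.)
Rows per node (title | artist | album):
•La Grange | ZZ Top | Tres Hombres
•Outside... | Back Door Slam | Roll Away
•Waitin... | ZZ Top | Tres Hombres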
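Materialized Views
(Diagram: a client and the view partitioned by album, so the query reads a single partition.)
View rows per node (album | id):
•Tres Hombres | a3e64f8f; Tres Hombres | 8a172618
•Roll Away | 2b09185b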
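The two diagrams differ in how many nodes a query by album must contact: base rows are placed by `id`, so a local index query fans out to every node, while view rows are placed by `album`, so one node answers. A toy Java model of that difference; placement uses `hashCode` modulo the node count rather than Cassandra's Murmur3 token ring, a scan stands in for each node's local index lookup, and `IndexVsView` is hypothetical.

```java
import java.util.*;
import java.util.function.Function;

/** Toy cluster comparing how many nodes a query by album touches. */
final class IndexVsView {
    record Song(String id, String album) {}
    record QueryResult(List<String> ids, int nodesContacted) {}

    /** Toy placement: hashCode modulo node count (Cassandra uses a Murmur3 token ring). */
    static int owner(String partitionKey, int nodes) {
        return Math.floorMod(partitionKey.hashCode(), nodes);
    }

    /** Places each row on a node by the chosen partition key; null keys are skipped (IS NOT NULL). */
    static List<List<Song>> distribute(List<Song> songs, Function<Song, String> key, int nodes) {
        List<List<Song>> perNode = new ArrayList<>();
        for (int n = 0; n < nodes; n++) perNode.add(new ArrayList<>());
        for (Song s : songs) {
            String k = key.apply(s);
            if (k != null) perNode.get(owner(k, nodes)).add(s);
        }
        return perNode;
    }

    /** Local index: base rows are placed by id, so any node may hold a match; ask them all. */
    static QueryResult viaLocalIndex(List<Song> songs, String album, int nodes) {
        List<String> ids = new ArrayList<>();
        for (List<Song> node : distribute(songs, Song::id, nodes)) {
            for (Song s : node) if (album.equals(s.album())) ids.add(s.id());
        }
        Collections.sort(ids);
        return new QueryResult(ids, nodes);
    }

    /** Materialized view: view rows are placed by album, so only the owning node is asked. */
    static QueryResult viaView(List<Song> songs, String album, int nodes) {
        List<String> ids = new ArrayList<>();
        for (Song s : distribute(songs, Song::album, nodes).get(owner(album, nodes))) {
            if (album.equals(s.album())) ids.add(s.id());
        }
        Collections.sort(ids);
        return new QueryResult(ids, 1);
    }

    public static void main(String[] args) {
        List<Song> songs = List.of(new Song("a3e64f8f", "Tres Hombres"),
            new Song("8a172618", "Tres Hombres"), new Song("2b09185b", "Roll Away"));
        System.out.println(viaLocalIndex(songs, "Tres Hombres", 3));
        System.out.println(viaView(songs, "Tres Hombres", 3));
    }
}
```

Both paths return the same ids; the difference is that the index query's cost grows with cluster size, while the view query stays at one partition at the price of extra work on every write.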
Upcoming releases
•2.2: July 20th
•3.0: Late September
•3.1: November
•3.2: December
Questions