Cassandra Day Chicago 2015
CQL: This is not the SQL you are looking for
Aaron Ploetz
Wait... CQL is not SQL?
• CQL3 was introduced in Cassandra 1.1.
• CQL is helpful to new users who have a relational background (which is most of us).
• However similar it looks, CQL is NOT a direct implementation of SQL.
• New users who approach CQL with SQL-based expectations leave themselves open to issues and frustration.
$ whoami
• Aaron Ploetz
• @APloetz
• Lead Database Engineer
• Using Cassandra since version 0.8.
• Contributor to the Cassandra tag on
• 2014/15 DataStax MVP for Apache Cassandra
1 SQL features/keywords not present in CQL
2 Differences between CQL and SQL keywords
3 Secondary Indexes
4 Anti-Patterns
5 Questions
SQL features/keywords not present in CQL (as of Cassandra 2.1)
• JOINs
• LIKE
• Subqueries
• Aggregation
• Arithmetic (except for counters and collections)
Differences between CQL and SQL keywords
• WHERE
• PRIMARY KEY
• ORDER BY
• IN
• DISTINCT
• COUNT
• LIMIT
• INSERT vs. UPDATE (“upsert”)
WHERE
• Only supports AND, IN, =, >, >=, <, and <=.
• Some of these only work under certain conditions.
• Also: CONTAINS and CONTAINS KEY for indexed collections.
• Does not exist: OR, !=
• Conditions can only operate on PRIMARY KEY components (unless a column has a secondary index), and only in the order in which the keys are defined.
WHERE (cont)
• SELECT * FROM shipcrewregistry WHERE shipname='Serenity';
• Start with the partition key(s); you cannot skip PRIMARY KEY components.
• CREATE TABLE shipcrewregistry (shipname text, lastname text, firstname text, citizenid uuid, aliases set<text>, PRIMARY KEY (shipname, lastname, firstname, citizenid));
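The restriction rules above can be modeled as a small checker. This is a simplified, hypothetical sketch (the names `TableSchema` and `check_where` are made up for illustration); it ignores secondary indexes, token restrictions, and version-specific edge cases, so it only approximates what Cassandra itself enforces.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TableSchema:
    partition_key: tuple[str, ...]
    clustering: tuple[str, ...]


def check_where(schema: TableSchema, restricted: dict[str, str]) -> str | None:
    """Return None if the restrictions are allowed without ALLOW FILTERING,
    otherwise a reason. `restricted` maps column name -> operator ('=', 'IN', '>', ...).
    Hypothetical, simplified model of CQL's primary-key rules."""
    # Every partition key column must be restricted with = (IN only on the last one).
    for i, col in enumerate(schema.partition_key):
        op = restricted.get(col)
        is_last = i == len(schema.partition_key) - 1
        if op is None:
            return f"partition key column {col} is not restricted"
        if op != "=" and not (op == "IN" and is_last):
            return f"partition key column {col} must use = (IN only on the last one)"
    # Clustering columns: restricted in declared order, with no gaps, and
    # nothing may follow a non-equality relation.
    stop = False
    for col in schema.clustering:
        op = restricted.get(col)
        if op is None:
            stop = True
        elif stop:
            return f"clustering column {col} skips a column or follows a non-EQ relation"
        elif op != "=":
            stop = True
    # Anything else is a regular column and would need a secondary index.
    others = set(restricted) - set(schema.partition_key) - set(schema.clustering)
    if others:
        return f"non-key columns {sorted(others)} need a secondary index"
    return None


crew = TableSchema(("shipname",), ("lastname", "firstname", "citizenid"))
```

With this model, `check_where(crew, {"shipname": "="})` is accepted, while restricting only `lastname` is rejected because `shipname` is not restricted, which mirrors the slide.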
ALLOW FILTERING
• Actually, I lied: you can skip primary key components if you add the ALLOW FILTERING clause.
• SELECT * FROM shipcrewregistry WHERE lastname='Washburne'; (Cassandra rejects this query as written.)
ALLOW FILTERING (cont)
• SELECT * FROM shipcrewregistry WHERE lastname='Washburne' ALLOW FILTERING;
• But I don't recommend that.
• ALLOW FILTERING reads every row in the table (across the whole cluster) and only then applies your WHERE conditions.
• The folks at DataStax have proposed some alternate names...
• Bottom line: if you are using ALLOW FILTERING, you are doing it wrong.
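To make the cost difference concrete, here is a toy in-memory model (hypothetical data and function names, not how Cassandra stores rows) that counts how many rows each style of query has to read: a partition-key lookup touches one partition, while an ALLOW FILTERING-style query walks every row.

```python
# Toy table: partition key (shipname) -> rows in that partition.
crew_registry = {
    "Serenity": [
        {"lastname": "Reynolds", "firstname": "Malcolm"},
        {"lastname": "Washburne", "firstname": "Hoban"},
        {"lastname": "Washburne", "firstname": "Zoe"},
    ],
    "OtherShip": [  # hypothetical second partition
        {"lastname": "Smith", "firstname": "Ann"},
        {"lastname": "Jones", "firstname": "Bo"},
    ],
}


def select_partition(table, shipname):
    """WHERE shipname=...: only that one partition is read."""
    rows = table.get(shipname, [])
    return rows, len(rows)


def select_allow_filtering(table, column, value):
    """WHERE lastname=... ALLOW FILTERING: every row is read, then filtered."""
    matches, rows_read = [], 0
    for rows in table.values():
        for row in rows:
            rows_read += 1
            if row.get(column) == value:
                matches.append(row)
    return matches, rows_read
```

The filtered query's `rows_read` grows with the size of the whole table, not with the number of matches, which is why it degrades as data is added.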
PRIMARY KEY
• PRIMARY KEYs function differently in Cassandra than in relational databases.
• Cassandra uses primary keys to determine data distribution and on-disk sort order.
• Partition keys are the equivalent of “old school” row keys; they determine which nodes store the data.
• Clustering keys determine the on-disk sort order within a partition.
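The two roles of the key can be sketched with a toy model: the partition key is hashed to pick a node, and the clustering key sorts rows inside that partition. This is an assumption-heavy illustration: MD5 stands in for Cassandra's default Murmur3 partitioner, `hash % len(NODES)` stands in for token ranges on a ring, there is no replication, and `ToyTable` is a hypothetical name.

```python
import hashlib
from collections import defaultdict

NODES = ["node1", "node2", "node3"]  # hypothetical 3-node cluster, no replication


def token(partition_key: tuple) -> int:
    # Stand-in hash; real Cassandra uses the Murmur3Partitioner by default.
    digest = hashlib.md5(repr(partition_key).encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def owner(partition_key: tuple) -> str:
    # Simplification: real clusters map token *ranges* on a ring to nodes.
    return NODES[token(partition_key) % len(NODES)]


class ToyTable:
    def __init__(self, partition_cols, clustering_cols):
        self.partition_cols = partition_cols
        self.clustering_cols = clustering_cols
        # node -> partition key -> clustering key -> row
        self.nodes = defaultdict(lambda: defaultdict(dict))

    def write(self, row):
        pk = tuple(row[c] for c in self.partition_cols)
        ck = tuple(row[c] for c in self.clustering_cols)
        self.nodes[owner(pk)][pk][ck] = row  # same full key overwrites the row

    def read_partition(self, *pk):
        partition = self.nodes[owner(pk)].get(pk, {})
        # Rows come back sorted by clustering key (ascending in this toy).
        return [partition[ck] for ck in sorted(partition)]


registry = ToyTable(["shipname"], ["lastname", "firstname", "citizenid"])
for last, first, cid in [("Washburne", "Zoe", "3"), ("Reynolds", "Malcolm", "1"),
                         ("Washburne", "Hoban", "2")]:
    registry.write({"shipname": "Serenity", "lastname": last,
                    "firstname": first, "citizenid": cid})
```

All rows for one partition land on the same node regardless of insert order, and reading them back yields clustering order.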
ORDER BY
• One of the most misunderstood aspects of CQL.
• Can only order by clustering columns, in the order they are listed in the table definition (CLUSTERING ORDER), and only when the partition key is restricted by = or IN.
• Which means that you rarely need ORDER BY: rows already come back in clustering order.
• So what does it do? It can reverse the sort direction (ASCending vs. DESCending), starting with the first clustering column; the clustering order flips as a whole.
PRIMARY KEY / ORDER BY Example: Table Definition
• CREATE TABLE postsByUserYear (userid text, year bigint, tag text, posttime timestamp, content text, postid UUID, PRIMARY KEY ((userid, year), posttime, tag)) WITH CLUSTERING ORDER BY (posttime desc, tag asc);
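PRIMARY KEY / ORDER BY Example: Queries
• SELECT * FROM postsByUserYear WHERE userid='2'; (fails: the partition key is (userid, year), so year must be restricted too)
• SELECT * FROM postsByUserYear ORDER BY posttime; (fails: ORDER BY requires the partition key to be restricted)
• SELECT * FROM postsByUserYear WHERE userid='2' AND year=2015 ORDER BY posttime DESC; (works: this is simply the clustering order)
• SELECT * FROM postsByUserYear WHERE userid='2' AND year=2015 ORDER BY tag; (fails: ORDER BY must start with the first clustering column, posttime)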
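The four outcomes above follow from a handful of rules, sketched here as a simplified checker. `check_order_by` is a hypothetical helper, not part of any driver, and it leaves out version-specific cases (for example, ORDER BY combined with IN on the partition key).

```python
def check_order_by(partition_key, clustering_order, restricted_eq, order_by):
    """Simplified model of CQL's ORDER BY rules (hypothetical helper).

    partition_key:    partition key column names
    clustering_order: [(column, "ASC"|"DESC"), ...] from CLUSTERING ORDER BY
    restricted_eq:    columns restricted with = or IN in the WHERE clause
    order_by:         [(column, "ASC"|"DESC"), ...] requested by the query
    Returns None if the query is accepted, otherwise a reason.
    """
    if not set(partition_key) <= set(restricted_eq):
        return "the whole partition key must be restricted with = or IN"
    if not order_by:
        return None
    declared = [col for col, _ in clustering_order]
    requested = [col for col, _ in order_by]
    if requested != declared[: len(requested)]:
        return "ORDER BY must follow the clustering columns in declared order"
    directions = dict(clustering_order)
    same = [direction.upper() == directions[col] for col, direction in order_by]
    if all(same) or not any(same):
        return None  # natural clustering order, or its exact reverse
    return "ORDER BY can only use the clustering order or its full reverse"


POSTS_PK = ["userid", "year"]
POSTS_CLUSTERING = [("posttime", "DESC"), ("tag", "ASC")]
```

Requesting `posttime ASC, tag DESC` is accepted (the full reverse), while `posttime ASC, tag ASC` mixes directions and is rejected.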
IN
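• Can only operate on the last partition key column and/or the last clustering column.
• And only when the preceding partition/clustering key columns are restricted by an equals relation.
• Does not perform well, especially on large clusters: a single coordinator node has to gather every requested partition, and those partitions may live on many different nodes.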
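A common alternative to a multi-key IN is to send one single-partition query per key from the client and run them concurrently. The sketch below uses a thread pool and a hypothetical in-memory `fetch_one` stand-in; with the DataStax Python driver the same pattern is usually written with `session.execute_async()` per key, but that wiring is left out here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the bladerunners table.
BLADERUNNERS = {
    "B26354": {"id": "B26354", "name": "Rick Deckard"},
    "B16442": {"id": "B16442", "name": "Harry Bryant"},
}


def fetch_one(key):
    """Stand-in for a single-partition read such as
    SELECT * FROM bladerunners WHERE id=?; swap in a real driver call."""
    return BLADERUNNERS.get(key)


def fetch_many(keys, max_workers=8):
    # One single-partition query per key, run concurrently, instead of one
    # IN query that makes a single coordinator collect every partition.
    unique_keys = list(dict.fromkeys(keys))  # de-duplicate, keep order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fetch_one, unique_keys))
    return {k: row for k, row in zip(unique_keys, results) if row is not None}
```

Each per-key query can be routed straight to a node that owns that partition, so no single coordinator becomes the bottleneck.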
Testing IN
• CREATE TABLE bladerunners (id text, type text, ts timestamp, name text, data text, PRIMARY KEY (id));
Testing IN (cont)
• SELECT * FROM bladerunners WHERE id IN ('B26354','B26354');
DISTINCT
• Returns a list of the queried partition keys.
• Can only operate on partition key column(s).
• In Cassandra, DISTINCT returns only the partition (row) keys, so it is a fairly light operation (relative to the size of the cluster and/or data set).
• In the relational world, by contrast, DISTINCT is a very resource-intensive operation.
COUNT
• Counts the number of rows returned, as determined by the WHERE clause.
• Does not aggregate.
• Similar to its RDBMS counterpart.
• Can be (inadvertently) restricted by LIMIT.
• Resource-intensive: without a partition key restriction it has to scan every row in the table (rows that may live on different nodes) and apply the WHERE conditions.
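A toy `count_rows` (hypothetical helper over an in-memory dict of partitions) shows both caveats at once: an unrestricted count walks every partition, and any LIMIT in effect, whether explicit or a default applied by your tool or version, silently caps the number returned.

```python
def count_rows(table, partition=None, limit=None):
    """Toy COUNT(*). Returns (count, partitions_read).

    Without a partition restriction every partition is walked; a LIMIT
    stops the count early, so the result can be smaller than the real
    number of rows."""
    if partition is not None:
        partitions = [table.get(partition, [])]
    else:
        partitions = list(table.values())
    count = partitions_read = 0
    for rows in partitions:
        partitions_read += 1
        for _row in rows:
            count += 1
            if limit is not None and count >= limit:
                return count, partitions_read
    return count, partitions_read


sample = {"a": [1, 2], "b": [3, 4, 5], "c": [6]}  # hypothetical partitions
```

Here `count_rows(sample, limit=4)` reports 4 rows even though the table holds 6, with no indication that the count was truncated.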
LIMIT
• Limits your query to N rows (where N is a positive integer).
• SELECT * FROM bladerunners LIMIT 2;
• Does not allow you to specify a start point (there is no OFFSET).
• You cannot use LIMIT to “page” through your result set.
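What you can do is resume from the token of the last partition key you saw, e.g. `SELECT * FROM bladerunners WHERE token(id) > token('B16442') LIMIT 2;`, starting with a plain `LIMIT 2` for the first page. (Drivers on the native protocol can also page automatically via a fetch size.) The sketch below simulates that token walk in memory; MD5 is only a stand-in for the real partitioner, and the helper names are hypothetical.

```python
import hashlib


def token(key: str) -> int:
    # Stand-in for the partitioner's token (Murmur3 by default in real Cassandra).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big", signed=True)


def pages_by_token(keys, page_size):
    """Yield pages of partition keys in token order, each page resuming after
    the last key of the previous one: the toy equivalent of
    SELECT id FROM bladerunners WHERE token(id) > token(:last) LIMIT :page_size."""
    ordered = sorted(keys, key=token)
    last_token = None
    while True:
        page = [k for k in ordered
                if last_token is None or token(k) > last_token][:page_size]
        if not page:
            return
        yield page
        last_token = token(page[-1])
```

Because rows are returned in token order rather than key order, the pages look "unsorted" by id, but together they cover every partition exactly once.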
Cassandra “Upserts”
• Under the hood, Cassandra treats INSERT and UPDATE the same way.
• Colloquially known as an “upsert.”
• Both INSERT and UPDATE operations require the complete PRIMARY KEY.
So why the different syntax?
• Flexibility. Some situations call for one or the other.
• Counter columns/tables can only be incremented (or decremented) with an UPDATE.
• INSERTs can save you some dev time in the application layer if your PRIMARY KEY changes.
“Upsert” example
• UPDATE bladerunners SET data='This guy is a one-man slaughterhouse.',name='Harry Bryant',ts='2015-03-30 14:47:00-0600',type='Captain' WHERE id='B16442';
• UPDATE bladerunners SET data = 'Drink some for me, huh pal?' WHERE id='B16442';
“Upsert” example (cont)
• INSERT INTO bladerunners (id, type, ts, data, name) VALUES ('B29591','Blade Runner','2015-03-30 14:34:00-0600','Captain Bryant would like a word.','Eduardo Gaff');
• INSERT INTO bladerunners (id,data) VALUES ('B29591','It''s too bad she won''t live. But then again, who does?');
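The upsert behavior in these examples can be modeled as "write these cells at this primary key": the row is created if it is missing, unmentioned columns survive, and each cell keeps the value with the newest write timestamp. This is a toy sketch with hypothetical helpers; in particular, real Cassandra resolves equal timestamps differently from the "keep existing" rule used here.

```python
def upsert(table, key, write_ts, **columns):
    """Toy INSERT/UPDATE: write the given columns at the given primary key.
    Missing rows are created; columns not mentioned are kept. Each cell keeps
    the value with the highest write timestamp (ties keep the existing value
    in this toy only)."""
    row = table.setdefault(key, {})
    for column, value in columns.items():
        current = row.get(column)
        if current is None or write_ts > current[1]:
            row[column] = (value, write_ts)


def read(table, key):
    return {column: value for column, (value, _ts) in table.get(key, {}).items()}
```

Replaying the two UPDATEs above against an empty table leaves `name` and `type` from the first statement and `data` from the second, just as the second INSERT only replaces `data` for `B29591`.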
Secondary Indexes
• Cassandra provides secondary indexes to allow queries on non-partition-key columns.
• In 2.1.x you can even create indexes on collections and user-defined types.
• Designed for convenience, not for performance: each node indexes only its own data, so an index query may have to ask every node in the cluster.
• Does not perform well on high-cardinality columns.
• Extremely low cardinality is also not a good idea.
• Performs poorly on frequently updated columns.
• In my opinion, try to avoid using them altogether.
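The "local index" point can be illustrated with a toy cluster in which each node indexes only the rows it stores (hypothetical `ToyNode` class and data). Because no node knows about another node's rows, a lookup by indexed value has to contact every node, even when only one row matches, which is why high-cardinality columns are a poor fit.

```python
from collections import defaultdict


class ToyNode:
    """One node holding some rows plus a *local* index over one column."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                 # primary key -> row
        self.index = defaultdict(set)  # indexed value -> primary keys on this node

    def write(self, key, row):
        # Toy: updates that change the indexed value are not handled.
        self.rows[key] = row
        self.index[row[self.indexed_column]].add(key)


def query_by_index(nodes, value):
    """The index only knows about local rows, so every node has to be asked."""
    results, nodes_contacted = [], 0
    for node in nodes:
        nodes_contacted += 1
        results.extend(node.rows[k] for k in sorted(node.index.get(value, ())))
    return results, nodes_contacted


cluster = [ToyNode("type") for _ in range(3)]
for i, (key, kind) in enumerate([("B26354", "Blade Runner"), ("B16442", "Captain"),
                                 ("B29591", "Blade Runner"), ("X1", "Replicant")]):
    cluster[i % 3].write(key, {"id": key, "type": kind})
```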
Anti-Patterns
• Multi-key queries: IN
• Secondary index queries
• DELETEs or INSERTing null values (both create tombstones)
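The null-value anti-pattern is easy to hit from application code that binds every column on every write. In this toy write path (hypothetical helpers, simplified storage), a column bound to `None` is stored as a tombstone, the same kind of marker a DELETE of that column leaves, whereas simply leaving the column out of the INSERT writes nothing for it.

```python
TOMBSTONE = object()  # marker for a deleted cell


def write_row(table, key, **columns):
    """Toy write path: a column bound to None is stored as a tombstone,
    like the marker a DELETE of that column would leave behind."""
    row = table.setdefault(key, {})
    for column, value in columns.items():
        row[column] = TOMBSTONE if value is None else value


def tombstone_count(table):
    return sum(value is TOMBSTONE for row in table.values() for value in row.values())
```

Writing `data=None` for one row produces a tombstone, while omitting `data` entirely for another row does not; tombstones accumulate until compaction removes them and must be read past in the meantime.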
Summary
• While CQL is designed to take advantage of our previous experience with SQL, it is important to remember that the two do not behave the same.
• Even if you are an SQL expert, read the CQL documentation before making any assumptions.
Additional Reading
• Getting Started with Time Series Data Modeling – Patrick McFadin
• SELECT – DataStax CQL 3.1 documentation
• Counting Keys in Cassandra – Richard Low
• Cassandra High Availability – Robbie Strickland
Questions?
