Rimas Silkaitis
From Postgres to Cassandra
NoSQL vs SQL
||
&&
Rimas Silkaitis
Product
@neovintage
app cloud
DEPLOY MANAGE SCALE
$ git push heroku master
Counting objects: 11, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (10/10), done.
Writing objects: 100% (11/11), 22.29 KiB | 0 bytes/s, done.
Total 11 (delta 1), reused 0 (delta 0)
remote: Compressing source files... done.
remote: Building source:
remote:
remote: -----> Ruby app detected
remote: -----> Compiling Ruby
remote: -----> Using Ruby version: ruby-2.3.1
Heroku Postgres
Over 1 Million Active DBs
Heroku Redis
Over 100K Active Instances
Apache Kafka on Heroku
Runtime
Runtime
Workers
$ psql
psql => \d
List of relations
schema | name | type | owner
--------+----------+-------+-----------
public | users | table | neovintage
public | accounts | table | neovintage
public | events | table | neovintage
public | tasks | table | neovintage
public | lists | table | neovintage
Ugh… Database Problems
$ psql
psql => \d
List of relations
schema | name | type | owner
--------+----------+-------+-----------
public | users | table | neovintage
public | accounts | table | neovintage
public | events | table | neovintage
public | tasks | table | neovintage
public | lists | table | neovintage
Site Traffic
Events
* Totally Not to Scale
One
Big Table
Problem
CREATE TABLE users (
id bigserial,
account_id bigint,
name text,
email text,
encrypted_password text,
created_at timestamptz,
updated_at timestamptz
);
CREATE TABLE accounts (
id bigserial,
name text,
owner_id bigint,
created_at timestamptz,
updated_at timestamptz
);
CREATE TABLE events (
user_id bigint,
account_id bigint,
session_id text,
occurred_at timestamptz,
category text,
action text,
label text,
attributes jsonb
);
Table
events
events
events_20160901
events_20160902
events_20160903
events_20160904
Add Some Triggers
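(A rough sketch of what those triggers could look like, assuming plain inheritance-style child tables keyed on occurred_at; the function and trigger names here are made up for illustration.)
$ psql
neovintage::DB=> CREATE OR REPLACE FUNCTION events_insert_trigger()
RETURNS trigger AS $$
BEGIN
  -- route each row to the child table for its day
  IF NEW.occurred_at >= '2016-09-01' AND NEW.occurred_at < '2016-09-02' THEN
    INSERT INTO events_20160901 VALUES (NEW.*);
  ELSIF NEW.occurred_at >= '2016-09-02' AND NEW.occurred_at < '2016-09-03' THEN
    INSERT INTO events_20160902 VALUES (NEW.*);
  -- ...one branch per daily partition
  END IF;
  RETURN NULL; -- keep the row out of the parent table
END;
$$ LANGUAGE plpgsql;
neovintage::DB=> CREATE TRIGGER events_partition_insert
BEFORE INSERT ON events
FOR EACH ROW EXECUTE PROCEDURE events_insert_trigger();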
$ psql
neovintage::DB=> \e
INSERT INTO events (
user_id,
account_id,
category,
action,
occurred_at)
VALUES (1,
2,
'in_app',
'purchase_upgrade',
'2016-09-07 11:00:00 -07:00');
events_20160901
events_20160902
events_20160903
events_20160904
events
INSERT
query
Constraints
• Data has little value after a period of time
• Small range of data has to be queried
• Old data can be archived or aggregated
There’s A Better Way
&&
One
Big Table
Problem
$ psql
psql => \d
List of relations
schema | name | type | owner
--------+----------+-------+-----------
public | users | table | neovintage
public | accounts | table | neovintage
public | events | table | neovintage
public | tasks | table | neovintage
public | lists | table | neovintage
Why Introduce
Cassandra?
• Linear Scalability
• No Single Point of Failure
• Flexible Data Model
• Tunable Consistency
Runtime
Workers
New Architecture
I only know relational databases.
How do I do this?
Understanding Cassandra
Two Dimensional
Table Spaces
RELATIONAL
Associative Arrays
or Hash
KEY-VALUE
Postgres is Typically Run as a Single Instance*
• Partitioned Key-Value Store
• Has a Grouping of Nodes (data
center)
• Data is distributed amongst the
nodes
Cassandra Cluster with 2 Data Centers
Cassandra Query Language
SQL-like
[sēkwel lahyk]
adjective
Resembling SQL in appearance,
behavior or character
adverb
In the manner of SQL
Let's Talk About Primary Keys
Partition
Table
Partition Key
• 5 Node Cluster
• Simplest terms: data is partitioned
amongst all the nodes using a
hashing function (token query below).
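(Once a table exists you can see that hashing at work by asking for the token computed from the partition key — an illustrative query against the events table defined later in this talk.)
$ cqlsh
cqlsh> SELECT token(user_id, occurred_at), user_id
FROM neovintage_prod.events
LIMIT 3;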
Replication Factor
Replication Factor
Setting this parameter
tells Cassandra how
many nodes to copy
incoming data to.
This is a replication factor of 3
But I thought
Cassandra had
tables?
Prior to 3.0, tables were called column families
Let’s Model Our Events
Table in Cassandra
We’re not going to go
through any setup
Plenty of tutorials exist
for that sort of thing
Let’s assume we’re
working with a 5-node
cluster
$ psql
neovintage::DB=> \d events
Table "public.events"
Column | Type | Modifiers
---------------+--------------------------+-----------
user_id | bigint |
account_id | bigint |
session_id | text |
occurred_at | timestamp with time zone |
category | text |
action | text |
label | text |
attributes | jsonb |
$ cqlsh
cqlsh> CREATE KEYSPACE
IF NOT EXISTS neovintage_prod
WITH REPLICATION = {
'class': 'NetworkTopologyStrategy',
'us-east': 3
};
$ cqlsh
cqlsh> CREATE SCHEMA
IF NOT EXISTS neovintage_prod
WITH REPLICATION = {
'class': 'NetworkTopologyStrategy',
'us-east': 3
};
KEYSPACE ==
SCHEMA
• CQL can use KEYSPACE and SCHEMA
interchangeably
• SCHEMA in Cassandra is somewhere between
`CREATE DATABASE` and `CREATE SCHEMA` in
Postgres
$ cqlsh
cqlsh> CREATE SCHEMA
IF NOT EXISTS neovintage_prod
WITH REPLICATION = {
'class': 'NetworkTopologyStrategy',
'us-east': 3
};
Replication Strategy
$ cqlsh
cqlsh> CREATE SCHEMA
IF NOT EXISTS neovintage_prod
WITH REPLICATION = {
'class': 'NetworkTopologyStrategy',
'us-east': 3
};
Replication Factor
Replication Strategies
• NetworkTopologyStrategy - You have to define the
network topology by defining the data centers. No
magic here.
• SimpleStrategy - Has no idea of the topology and
doesn’t care to. Data is replicated to adjacent nodes (sketch below).
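(For contrast with the NetworkTopologyStrategy examples, a SimpleStrategy keyspace looks like this — fine for a single-data-center dev setup; the keyspace name is made up.)
$ cqlsh
cqlsh> CREATE KEYSPACE
IF NOT EXISTS neovintage_dev
WITH REPLICATION = {
'class': 'SimpleStrategy',
'replication_factor': 3
};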
$ cqlsh
cqlsh> CREATE TABLE neovintage_prod.events (
user_id bigint primary key,
account_id bigint,
session_id text,
occurred_at timestamp,
category text,
action text,
label text,
attributes map<text, text>
);
Remember the Primary
Key?
• Postgres defines a PRIMARY KEY as a constraint
that a column or group of columns can be used as a
unique identifier for rows in the table.
• CQL shares that same constraint but extends the
definition even further; its main purpose is to
describe how data is distributed and ordered across the cluster.
• CQL includes partitioning and sort order of the data
on disk (clustering).
$ cqlsh
cqlsh> CREATE TABLE neovintage_prod.events (
user_id bigint primary key,
account_id bigint,
session_id text,
occurred_at timestamp,
category text,
action text,
label text,
attributes map<text, text>
);
Single Column Primary
Key
• Used for both partitioning and clustering.
• Syntactically, it can be defined inline or on a separate
line within the DDL statement (both spellings sketched below).
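(A minimal sketch of both spellings — the same single-column key, written inline or on its own line; the table names are made up for illustration.)
$ cqlsh
cqlsh> CREATE TABLE neovintage_prod.events_inline (
user_id bigint PRIMARY KEY,
label text
);
cqlsh> CREATE TABLE neovintage_prod.events_separate (
user_id bigint,
label text,
PRIMARY KEY (user_id)
);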
$ cqlsh
cqlsh> CREATE TABLE neovintage_prod.events (
user_id bigint,
account_id bigint,
session_id text,
occurred_at timestamp,
category text,
action text,
label text,
attributes map<text, text>,
PRIMARY KEY (
(user_id, occurred_at),
account_id,
session_id
)
);
$ cqlsh
cqlsh> CREATE TABLE neovintage_prod.events (
user_id bigint,
account_id bigint,
session_id text,
occurred_at timestamp,
category text,
action text,
label text,
attributes map<text, text>,
PRIMARY KEY (
(user_id, occurred_at),
account_id,
session_id
)
);
Composite
Partition Key
$ cqlsh
cqlsh> CREATE TABLE neovintage_prod.events (
user_id bigint,
account_id bigint,
session_id text,
occurred_at timestamp,
category text,
action text,
label text,
attributes map<text, text>,
PRIMARY KEY (
(user_id, occurred_at),
account_id,
session_id
)
);
Clustering Keys
PRIMARY KEY (
(user_id, occurred_at),
account_id,
session_id
)
Composite Partition Key
• This means that both the user_id and the occurred_at
columns are going to be used to partition data.
• If you were not to include the inner parentheses, the
first column listed in this PRIMARY KEY definition
would be the sole partition key (fragments below).
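(The same column list with and without the inner parentheses — illustrative fragments, not full DDL.)
PRIMARY KEY ((user_id, occurred_at), account_id, session_id)
-- partition key: user_id + occurred_at; clustering: account_id, session_id
PRIMARY KEY (user_id, occurred_at, account_id, session_id)
-- partition key: user_id only; clustering: occurred_at, account_id, session_id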
PRIMARY KEY (
(user_id, occurred_at),
account_id,
session_id
)
Clustering Columns
• Defines how the data is sorted on disk. In this case, it’s
sorted by account_id and then session_id.
• It is possible to change the direction of the sort order.
$ cqlsh
cqlsh> CREATE TABLE neovintage_prod.events (
user_id bigint,
account_id bigint,
session_id text,
occurred_at timestamp,
category text,
action text,
label text,
attributes map<text, text>,
PRIMARY KEY (
(user_id, occurred_at),
account_id,
session_id
)
) WITH CLUSTERING ORDER BY (
account_id desc, session_id asc
);
Ahhhhh… Just
like SQL
Data Types
Postgres Type | Cassandra Type
--------------+----------------
bigint        | bigint
int           | int
decimal       | decimal
float         | float
text          | text
varchar(n)    | varchar
blob          | blob
json          | N/A
jsonb         | N/A
hstore        | map<type, type>
Challenges
• JSON / JSONB columns don't have 1:1 mappings in
Cassandra
• You’ll need to nest your JSON into Cassandra’s MAP type
or flatten it out
• Be careful about timestamps!! Time zones are already
challenging in Postgres.
• If you don’t specify a time zone in Cassandra, the time
zone of the coordinator node is used. Always specify
one (example below).
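(A hedged example covering both points: the jsonb payload flattened into the map<text, text> column and the timestamp written with an explicit UTC offset; the values are made up.)
$ cqlsh
cqlsh> INSERT INTO neovintage_prod.events
(user_id, occurred_at, account_id, session_id,
category, action, label, attributes)
VALUES (1, '2016-09-07 11:00:00-0700', 2, 'abc123',
'in_app', 'purchase_upgrade', 'upgrade',
{'plan': 'pro', 'source': 'web'});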
Ready for
Webscale
General Tips
• Just like Table Partitioning in Postgres, you need to
think about how you’re going to query the data in
Cassandra. This dictates how you set up your keys (query example below).
• We just walked through the semantics on the
database side. Tackling this change on the
application-side is a whole extra topic.
• This is just enough information to get you started.
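(A quick illustration of “queries dictate keys”: with the partition key (user_id, occurred_at) used earlier, a query has to supply both columns; filtering on user_id alone would be rejected or forced through ALLOW FILTERING.)
$ cqlsh
cqlsh> SELECT category, action, label
FROM neovintage_prod.events
WHERE user_id = 1
AND occurred_at = '2016-09-07 11:00:00-0700';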
Runtime
Workers
Runtime
Workers
Foreign Data Wrapper
fdw
=>
fdw
We’re not going to go through
any setup, again……..
https://bitbucket.org/openscg/cassandra_fdw
$ psql
neovintage::DB=> CREATE EXTENSION cassandra_fdw;
CREATE EXTENSION
$ psql
neovintage::DB=> CREATE EXTENSION cassandra_fdw;
CREATE EXTENSION
neovintage::DB=> CREATE SERVER cass_serv
FOREIGN DATA WRAPPER cassandra_fdw
OPTIONS (host '127.0.0.1');
CREATE SERVER
$ psql
neovintage::DB=> CREATE EXTENSION cassandra_fdw;
CREATE EXTENSION
neovintage::DB=> CREATE SERVER cass_serv
FOREIGN DATA WRAPPER cassandra_fdw
OPTIONS (host '127.0.0.1');
CREATE SERVER
neovintage::DB=> CREATE USER MAPPING FOR public
SERVER cass_serv
OPTIONS (username 'test', password 'test');
CREATE USER
$ psql
neovintage::DB=> CREATE EXTENSION cassandra_fdw;
CREATE EXTENSION
neovintage::DB=> CREATE SERVER cass_serv
FOREIGN DATA WRAPPER cassandra_fdw
OPTIONS (host '127.0.0.1');
CREATE SERVER
neovintage::DB=> CREATE USER MAPPING FOR public SERVER cass_serv
OPTIONS (username 'test', password 'test');
CREATE USER
neovintage::DB=> CREATE FOREIGN TABLE cass.events (id int)
SERVER cass_serv
OPTIONS (schema_name 'neovintage_prod',
table_name 'events', primary_key 'id');
CREATE FOREIGN TABLE
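(The (id int) definition above is the slide’s minimal placeholder; to accept the INSERT on the next slide the foreign table needs matching columns. A hedged sketch — the events_full name is made up, and note the later gotcha that cassandra_fdw only takes a single primary_key column.)
neovintage::DB=> CREATE FOREIGN TABLE cass.events_full (
user_id bigint,
occurred_at timestamptz,
label text
)
SERVER cass_serv
OPTIONS (schema_name 'neovintage_prod',
table_name 'events', primary_key 'user_id');
CREATE FOREIGN TABLE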
neovintage::DB=> INSERT INTO cass.events (
user_id,
occurred_at,
label
)
VALUES (
1234,
'2016-09-08 11:00:00 -0700',
'awesome'
);
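(Reads work the same way — once the mapping is in place you can query the Cassandra data with plain SQL from Postgres; a sketch assuming the foreign table exposes these columns.)
neovintage::DB=> SELECT user_id, occurred_at, label
FROM cass.events
WHERE user_id = 1234;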
Some Gotchas
• No Composite Primary Key Support in
cassandra_fdw
• No support for UPSERT
• Postgres 9.5+ and Cassandra 3.0+ Supported