August 6, 2015 www.ExigenServices.com
Apache Cassandra – Future without Boundaries
Part 1
I. RDBMS Pros and Cons
Pros
1. Good balance between functionality and usability; powerful tool support.
2. SQL has a feature-rich syntax.
3. A set of widely accepted standards.
4. ACID guarantees.
Scalability
RDBMSs were mainstream for decades, until:
• requirements for scalability increased dramatically;
• the complexity of processed data structures increased dramatically.
Scaling
Two ways of scaling:
– Vertical scaling
– Horizontal scaling
CAP Theorem
Cons
Cost of distributed transactions
a) Lower availability. A transaction spanning two DBs with 99.9% availability each has a combined availability of 99.9% × 99.9% ≈ 99.8%, which is about 86 min. of downtime per 30-day month, versus about 43 min. for a single DB.
b) Additional synchronization overhead.
c) As slow as the slowest DB node plus network latency.
d) 2PC (two-phase commit) is a blocking protocol.
e) Resources can remain locked indefinitely.
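The availability arithmetic for two databases can be checked with a short calculation. This is a minimal sketch that assumes the databases fail independently and that a month has 30 days; the function names are illustrative and not part of the original slides.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: a 30-day month


def combined_availability(*availabilities):
    """Availability of a transaction that needs every database to be up,
    assuming the databases fail independently."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def downtime_minutes(availability, period_minutes=MINUTES_PER_MONTH):
    """Expected downtime over the period for a given availability."""
    return (1.0 - availability) * period_minutes


single = 0.999
both = combined_availability(single, single)
print(f"single DB: {single:.4%}, {downtime_minutes(single):.1f} min/month")
print(f"two DBs:   {both:.4%}, {downtime_minutes(both):.1f} min/month")
```

Each extra database that a transaction must touch multiplies in another factor below 1, so the combined availability can only go down.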
Cons
Use of master-slave replication.
• Makes the write side (master) a performance bottleneck and requires additional CPU/IO resources.
• There is no partition tolerance.
Sharding
a) Vertical sharding
b) Horizontal sharding
Vertical sharding
DB instances are divided by function.
Horizontal sharding
One table is divided across several resources.
Hash-code sharding
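Hash-code sharding can be sketched as follows. This is an illustration only: the shard names are hypothetical, and MD5 is used simply because it gives a hash that is stable across processes (Python's built-in `hash()` is randomized for strings).

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a row key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % shard_count


SHARDS = ["db0", "db1", "db2"]  # hypothetical database instances

for user_id in ["alice", "bob", "carol", "dave"]:
    print(user_id, "->", SHARDS[shard_for(user_id, len(SHARDS))])
```

With plain modulo placement, changing the number of shards remaps most keys, which is one reason to prefer the ring-based partitioning described later in the deck.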
Cassandra sharding
• Cassandra uses hash-code load balancing.
• Cassandra is a better fit for reporting than for business-logic processing.
• Cassandra + Hadoop = an OLAP server with high performance and availability.
II. Apache Cassandra. Overview
Cassandra
Amazon Dynamo (architecture)
• DHT
• Eventual consistency
• Tunable trade-offs, tunable consistency
+
Google BigTable (data model)
• Values are structured and indexed
• Column families and columns
Distributed and decentralized
• No master/slave nodes (server symmetry)
• No single point of failure
DHT
Distributed hash table
• A lookup service similar to a hash table: (key, value) pairs
• Any participating node can efficiently retrieve the value associated with a given key
Keyspace
An abstract keyspace, such as the set of 128-bit or 160-bit strings.
Partitioning
• A keyspace partitioning scheme splits ownership of this keyspace among the participating nodes.
Keyspace partitioning
• Keyspace distance function δ(k1, k2)
• A node with ID i_x owns all the keys k_m for which i_x is the closest ID, measured according to δ(k_m, i_x).
Keyspace partitioning
• Imagine mapping the range from 0 to 2^128 onto a circle, so that the values wrap around.
Keyspace partitioning
• Consider what happens if node C is removed
Keyspace partitioning
• Consider what happens if node D is added
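The effect of removing or adding a node can be sketched with a small ring. This is a minimal illustration under stated assumptions: positions come from MD5 (128 bits), the node names A to D are placeholders, and a key is owned by the first node clockwise from its position, which is one possible choice of the distance function δ. The `Ring` class is hypothetical and not Cassandra's implementation.

```python
import bisect
import hashlib


def token(name: str) -> int:
    """Place a node name or key on the 0..2^128 ring via MD5 (128 bits)."""
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest(), "big")


class Ring:
    def __init__(self, nodes):
        self._entries = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._entries]

    def nodes(self):
        return [n for _, n in self._entries]

    def owner(self, key: str) -> str:
        """Return the first node clockwise from the key's position."""
        i = bisect.bisect_left(self._tokens, token(key))
        return self._entries[i % len(self._entries)][1]  # wrap past the top


keys = [f"key{i}" for i in range(1000)]
ring = Ring(["A", "B", "C"])

# Remove node C: only the keys that C owned get a new owner.
smaller = Ring([n for n in ring.nodes() if n != "C"])
moved = [k for k in keys if ring.owner(k) != smaller.owner(k)]
print(f"{len(moved)} of {len(keys)} keys moved after removing C")

# Add node D: every key that changes owner moves to D.
larger = Ring(ring.nodes() + ["D"])
gained = [k for k in keys if ring.owner(k) != larger.owner(k)]
print(f"{len(gained)} of {len(keys)} keys moved after adding D")
```

In both cases the keys owned by the untouched nodes stay where they are, which is the property that makes this scheme attractive compared with modulo-based placement.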
Overlay network
• For any key k, each node either has a node ID that owns k or has a link to a node whose node ID is closer to k
• Greedy algorithm: at each step, forward the message to the neighbor whose ID is closest to k
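The greedy forwarding rule can be sketched on a toy ring. The assumptions here are all illustrative: a 64-position ring, circular distance as δ, and an overlay in which every node links only to its two ring neighbours. Real overlays usually keep additional long-range links so that fewer hops are needed.

```python
M = 64  # toy ring size (assumption; real DHTs use 128- or 160-bit IDs)


def delta(a: int, b: int) -> int:
    """Circular distance between two positions on the ring."""
    d = abs(a - b) % M
    return min(d, M - d)


def build_links(node_ids):
    """Link every node to its two ring neighbours (a deliberately sparse overlay)."""
    ordered = sorted(node_ids)
    n = len(ordered)
    return {ordered[i]: [ordered[(i - 1) % n], ordered[(i + 1) % n]] for i in range(n)}


def route(start: int, key: int, links):
    """Greedy routing: hop to the neighbour closest to the key until none is closer."""
    path = [start]
    current = start
    while True:
        best = min(links[current], key=lambda n: delta(n, key))
        if delta(best, key) >= delta(current, key):
            return path  # no neighbour is closer, so the current node owns the key
        current = best
        path.append(current)


links = build_links([3, 12, 25, 40, 51, 60])
print(route(3, 38, links))  # [3, 60, 51, 40]
```

The message starts at node 3 and ends at node 40, the node whose ID is closest to key 38.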
Elastic scalability
• Adding or removing a node doesn’t require reconfiguring Cassandra, changing application queries, or restarting the system
High availability and fault tolerance
• Cassandra picks A and P from CAP
• Eventual consistency
Tunable consistency
• Replication factor (number of copies of each piece of data)
• Consistency level (number of replicas to access on every read/write operation)

Consistency level   Replicas accessed (read / write)
ONE                 1
QUORUM              N/2 + 1
ALL                 N

N is the replication factor; N/2 is rounded down.
Quorum consistency level
R = N/2 + 1
W = N/2 + 1
R + W > N
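The condition R + W > N means that every set of replicas read must share at least one replica with every set of replicas written. A minimal sketch that checks this by brute force for small N follows; the function names are illustrative, and N/2 is taken as integer division.

```python
from itertools import combinations


def quorum(n: int) -> int:
    """Replicas needed for QUORUM with replication factor n."""
    return n // 2 + 1


def always_intersect(r: int, w: int, n: int) -> bool:
    """True if every choice of r read replicas overlaps every choice of w write replicas."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


for n in (3, 5):
    q = quorum(n)
    print(f"N={n}: QUORUM={q}, R+W>N: {q + q > n}, overlap: {always_intersect(q, q, n)}")
    print(f"N={n}: ONE/ONE,  R+W>N: {1 + 1 > n}, overlap: {always_intersect(1, 1, n)}")
```

Reading and writing at QUORUM guarantees the overlap, while ONE/ONE does not once N is greater than 1, which is the trade-off that the consistency level lets you tune.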
Hybrid orientation
• Column orientation
– columns aren’t fixed
– columns can be sorted
– columns can be queried for a certain range
• Row orientation
– each row is uniquely identifiable by key
– columns are grouped into rows
Schema-free
• You don’t have to define columns when you create a data model
• You think of the queries you will run and then model the data around them
III. Data Model
Relational data model
Database
Table1
        Column1  Column2
Row1    value    value
Row2    null     value
…
Table2
        Column1  Column2  Column3
Row1    value    value    value
Row2    null     value    null
…
Cassandra data model
Keyspace
  Column Family
    RowKey1: Column1 = Value1, Column2 = Value2, Column3 = Value3
    RowKey2: Column1 = Value1, Column4 = Value4
Keyspace
• A keyspace is close to a relational database
• Basic attributes:
– replication factor
– replica placement strategy
– column families (tables in the relational model)
• It is possible to create several keyspaces per application (for example, if you need a different replica placement strategy or replication factor)
Column family
• Container for a collection of rows
• A column family is close to a table from the relational data model
Column Family
  Row (RowKey): Column1 = Value1, Column2 = Value2, Column3 = Value3
Key-value store
Four-dimensional hash map
[Keyspace][ColumnFamily][RowKey][Column]
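The four-dimensional hash map can be pictured as nested dictionaries. This is only a mental model, not how Cassandra stores data; the keyspace, column family, and row key names below are made up for illustration.

```python
from collections import defaultdict


def nested():
    """A dictionary whose missing keys are created as further nested dictionaries."""
    return defaultdict(nested)


store = nested()  # [Keyspace][ColumnFamily][RowKey][Column] -> value

store["shop"]["users"]["user:42"]["name"] = "Alice"
store["shop"]["users"]["user:42"]["email"] = "alice@example.com"
store["shop"]["users"]["user:43"]["name"] = "Bob"  # rows need not share columns

print(store["shop"]["users"]["user:42"]["name"])   # Alice
print(sorted(store["shop"]["users"]["user:43"]))   # ['name']
```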
Column family vs. Table
• The columns are not strictly defined
• A column family can hold columns or super columns (collections of subcolumns)
Column family vs. Table
• A column family has a comparator attribute
• Each column family is stored in a separate file on disk
Column
• Basic unit of data structure
• Fields: name: byte[], value: byte[], clock: long
Skinny and wide rows
• Wide rows – a huge number of columns and only a few rows (used to store lists of things)
• Skinny rows – a small number of columns and many different rows (close to the relational model)
Disadvantages of wide rows
• Work poorly with the row cache
• If you have many rows and many columns, you end up with larger indexes (~40 GB of data and a 10 GB index)
Column sorting
• Column sorting is typically important only with the wide-row model
• Comparator – an attribute of a column family that specifies how column names are compared for sort order
Comparator types
• Cassandra has the following predefined types:
– AsciiType
– BytesType
– LexicalUUIDType
– IntegerType
– LongType
– TimeUUIDType
– UTF8Type
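The effect of the comparator on column order can be sketched as follows. The three sort keys below are hypothetical stand-ins for BytesType, UTF8Type, and LongType, written only to show why the choice matters; they are not Cassandra's own code.

```python
import struct

# Hypothetical stand-ins for three of the comparator types listed above.
COMPARATORS = {
    "BytesType": lambda name: name,                         # raw byte order
    "UTF8Type": lambda name: name.decode("utf-8"),          # text order
    "LongType": lambda name: struct.unpack(">q", name)[0],  # 8-byte signed integer
}


def sorted_columns(names, comparator):
    """Sort column names (byte strings) the way the given comparator would."""
    return sorted(names, key=COMPARATORS[comparator])


as_text = [str(n).encode("utf-8") for n in (10, 9, 100)]
as_long = [struct.pack(">q", n) for n in (10, 9, 100)]

print(sorted_columns(as_text, "UTF8Type"))
# [b'10', b'100', b'9']
print([struct.unpack(">q", b)[0] for b in sorted_columns(as_long, "LongType")])
# [9, 10, 100]
```

The same three numbers sort differently depending on whether the column names are compared as text or as long integers, which is why the comparator has to match how the names are encoded.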
Super column
• name: byte[], cols: Map<byte[], Column>
• Stores a map of subcolumns
• Cannot store a map of super columns (only one level deep)
• Five-dimensional hash:
[Keyspace][ColumnFamily][Key][SuperColumn][SubColumn]
Super column family
Column families:
– Standard (default)
   • Can combine columns and super columns
– Super
   • Stricter schema constraints
   • Can store only super columns
   • A subcomparator can be specified for subcolumns
Note that
There are no joins in Cassandra, so you can
– join data on the client side
– create a denormalized second column family
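Both options can be sketched with plain dictionaries standing in for column families. The `users` and `orders` data and the function names are invented for illustration; a real application would read and write these rows through a Cassandra client.

```python
# Two "column families" modelled as plain dicts: row key -> {column: value}.
users = {
    "u1": {"name": "Alice"},
    "u2": {"name": "Bob"},
}
orders = {
    "o1": {"user_id": "u1", "total": "30"},
    "o2": {"user_id": "u2", "total": "12"},
    "o3": {"user_id": "u1", "total": "7"},
}


def join_on_client(orders, users):
    """Option 1: read both column families and combine rows in application code."""
    return [
        {"order": oid, "total": o["total"], "name": users[o["user_id"]]["name"]}
        for oid, o in sorted(orders.items())
    ]


def build_orders_by_user(orders, users):
    """Option 2: maintain a denormalized column family keyed by user."""
    by_user = {}
    for oid, o in orders.items():
        row = by_user.setdefault(o["user_id"], {"name": users[o["user_id"]]["name"]})
        row[f"order:{oid}"] = o["total"]  # one column per order: a wide row
    return by_user


print(join_on_client(orders, users))
print(build_orders_by_user(orders, users)["u1"])
```

The denormalized variant answers "all orders for a user" with a single row read, at the cost of writing each order in two places.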
