My First 100 days with a
Cassandra Cluster
Presented by :
Gustavo René Antúnez
February, 2017
© 2016 Pythian. Confidential
ABOUT PYTHIAN
Pythian’s 400+ IT professionals help companies adopt and manage disruptive technologies to better compete
TECHNICAL EXPERTISE
Infrastructure: Transforming and managing the IT infrastructure that supports the business
DevOps: Providing critical velocity in software deployment by adopting DevOps practices
Cloud: Using the disruptive nature of cloud for accelerated, cost-effective growth
Databases: Ensuring databases are reliable, secure, available and continuously optimized
Big Data: Harnessing the transformative power of data on a massive scale
Advanced Analytics: Mining data for insights & business transformation using data science
Welcome to RMOUG 2017
Where do I come From
– Oracle DBA
  • Started with version 9.2 in 2004
– Speaker at Oracle Open World, Oracle Developers Day and Collaborate
– Co-President of ORAMEX (Mexico Oracle User Group)
– Web Events Chair for IOUG Cloud Computing Special Interest Group (SIG)
– International Chair, RAC Special Interest Group (SIG)
– Movie fanatic & music lover
– Bringing the best from México (Mexihtli) to the rest of the world and in the process photographing it :)
– rene-ace.com
– @rene_ace
  • #TD16
Where do I come From
rene-ace.com	
@rene_ace	
• #TD17
How did you get to be a DBA
6th Happiest Job of 2015!
http://www.forbes.com/sites/susanadams/2014/03/20/the-happiest-and-unhappiest-jobs-in-2014/
Work-life balance
Relationship with boss and co-workers
Daily tasks
Job resources
Field will grow by 15% between 2012 and 2022
DBA can be the key driver of success
Happiest Job of 2034?
• 47 percent of American jobs are at high risk of being taken by computers within the next two decades.
  – 1st wave
    • Computers will start replacing people in especially vulnerable fields like transportation/logistics, production labor, and administrative support.
  – 2nd wave
    • Depends on the development of good artificial intelligence. This could next put jobs in management, science and engineering, and the arts at risk.
The most important question
Normalize	or	Denormalize
• The goal of normalization is to store a fact in one place to minimize update, delete and insert anomalies
• Normalized data, depending on how complex the schema becomes, often slows down queries
http://blog.rdx.com/cassandra-and-relational-database-schema-comparison-query-vs-relationship-modeling
The most important question
Normalize	or	Denormalize
• Normalize to reduce data anomalies; denormalize to improve query performance.
• In relational systems, administrators model the data. In Cassandra, administrators design schemas based on query patterns.
• Denormalization is the merging of attributes that are often accessed together into a single schema object.
http://blog.rdx.com/cassandra-and-relational-database-schema-comparison-query-vs-relationship-modeling
What is Cassandra ?

• NoSQL database, developed in Java
• Fully distributed DB
  • Meaning that there is no master DB, unlike Oracle or MySQL.
• Linearly scalable
• Based on 2 core technologies: Google’s Bigtable and Amazon’s Dynamo
• 2 versions of Cassandra
  • Community Edition: distributed under the Apache License
  • Enterprise Edition: distributed by DataStax
CAP	Theorem
• In a distributed system you can only have two out of the following three guarantees across a write/read pair:
  • Consistency: a read is guaranteed to return the most recent write for a given client.
  • Availability: a non-failing node will return a reasonable response within a reasonable amount of time (no error or timeout).
  • Partition tolerance: the system will continue to function when network partitions occur.
CAP	Theorem
• One fallacy of distributed computing is that networks are reliable
• AP (Availability/Partition Tolerance): return the most recent version of the data you have, which could be stale. Also accept writes, which are processed later when the partition is resolved.
• CP (Consistency/Partition Tolerance): wait for a response from the partitioned node, which could result in a timeout error.
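The AP/CP trade-off above can be sketched with a toy two-replica system. This is only an illustration under stated assumptions: the `Replica` class, the `read` function and the `Timeout` exception are hypothetical names, not part of Cassandra or any driver.

```python
class Timeout(Exception):
    """Raised when a CP-style read cannot reach the partitioned replica."""


class Replica:
    """Hypothetical replica holding one value and the version that wrote it."""

    def __init__(self, value, version):
        self.value = value
        self.version = version
        self.reachable = True


def read(replicas, mode):
    """Read from the toy system in "AP" or "CP" mode."""
    reachable = [r for r in replicas if r.reachable]
    if mode == "CP" and len(reachable) < len(replicas):
        # CP: refuse to answer rather than risk returning stale data.
        raise Timeout("partitioned replica did not respond")
    # AP: answer with the newest version among the replicas we can reach.
    return max(reachable, key=lambda r: r.version).value


n1 = Replica("old", version=1)
n2 = Replica("new", version=2)
n2.reachable = False            # a network partition isolates N2

print(read([n1, n2], "AP"))     # prints "old": available, but stale
```

In "CP" mode the same call raises `Timeout` instead of returning the stale value.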
What is Cassandra ?

Cassandra is a BASE (Basically Available, Soft state, Eventually consistent) type system
• Not an ACID (Atomicity, Consistency, Isolation, Durability) type system
Cassandra is classified as an AP system
It Can be as easy as …
• Start your machine and install the following:
  • ntp (packages are normally ntp, ntpdate and ntp-doc)
  • wget (unless you have your packages copied over via other means)
  • vim (or your favorite text editor)
• Yum package management
• Root or sudo access to the install machine
• Latest version of Oracle Java SE Runtime Environment (JRE) 8 (recommended) or OpenJDK 7
• Python 2.6+ (needed if installing OpsCenter)
It Can be as easy as …
• Install Cassandra.
~$ sudo yum install dsc21-2.1.5-1 cassandra21-2.1.5-1
• Install optional utilities.
~$ sudo yum install cassandra21-tools-2.1.5-1
• Stop the Cassandra service and clear the default system data before configuring the node.
~$ sudo service cassandra stop
~$ sudo rm -rf /var/lib/cassandra/data/system/*
• In the cassandra-rackdc.properties file, set the datacenter and rack for this node.
# indicate the rack and dc for this node
dc=Pythian
rack=RAC1
• Start the Cassandra service.
~$ sudo service cassandra start
Where is everything in Cassandra?
Directories                          Description
/var/lib/cassandra                   Data directories
/var/log/cassandra                   Log directory
/var/run/cassandra                   Runtime files
/usr/share/cassandra                 Environment settings
/usr/share/cassandra/lib             JAR files
/usr/bin                             Optional utilities, such as sstablelevelreset, sstablerepairedset, and sstablesplit
/usr/bin                             Binary files
/usr/sbin
/etc/cassandra                       Configuration files
/etc/init.d                          Service startup script
/etc/security/limits.d               Cassandra user limits
/etc/default
/usr/share/doc/cassandra/examples    Sample cassandra.yaml files for stress testing
I come from this world…
12c	Version	
Architecture…
I come from this world…
Oracle…
[Diagram: Oracle logical and physical storage structures: database, schema, tablespace, segment, extent, Oracle data block, data files, control files, online redo log, OS block]
I come from this world…
RAC - For Node Point of Failure
[Diagram: RAC cluster of three nodes (Node1, Node2, Node3), each labeled ASM and DB, over shared ASM disks, with public, storage, ASM and CSS networks]
Global Data Services – Service Failover / Load Balancing
I come from this world…
Data Guard - For Failover
[Diagram: Primary, Far Sync Instance and Standby, linked by SYNC and ASYNC transport. Zero data loss failover]
Cassandra Architecture
[Diagram: one Cassandra cluster spanning two datacenters. Datacenter México: Rack 1 with nodes N1 and N2. Datacenter Portugal: Rack 2 with nodes N3 and N4]
One Ring to Rule them All
• The total amount of data managed by the cluster is represented as a ring
• Each node is assigned a part of the database to hold, based on the partition key (the first component of each table’s primary key).
• To guarantee both availability and durability, multiple nodes are assigned the same data.
• There is no master node; all nodes can perform all operations
[Diagram: ring of four nodes (1 to 4), each holding three of the four key ranges: A-F,T-Z,M-S; G-L,A-F,T-Z; M-S,G-L,A-F; T-Z,M-S,G-L]
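The range lists in the ring diagram can be reproduced with a few lines of Python. This is a sketch under assumptions: node 1 is taken to own A-F, node 2 G-L, and so on (the extracted slide does not say which label belongs to which node), and each range is copied to the next nodes clockwise, which is how Cassandra's SimpleStrategy places replicas.

```python
# Primary range of nodes 1..4, in ring order (assumed numbering).
RANGES = ["A-F", "G-L", "M-S", "T-Z"]


def ranges_held(node_index, rf=3):
    """Ranges stored on a node: its own plus those of the rf-1 preceding nodes."""
    n = len(RANGES)
    return [RANGES[(node_index - k) % n] for k in range(rf)]


for i in range(len(RANGES)):
    print(f"node {i + 1}: {','.join(ranges_held(i))}")
```

With a replication factor of 3 on four nodes, every range ends up on three nodes, so any single node can fail without a range becoming unavailable.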
Gossip
• Peer-to-peer communication protocol in which nodes periodically exchange state information
• Runs every second and exchanges state messages with up to three other nodes in the cluster
• Failure detection
  • Each node determines locally, from gossip state and history, whether another node in the system is down or has come back up.
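A toy simulation can show how quickly state spreads when every node talks to at most three peers per round. It is a simplified sketch, not Cassandra's gossip implementation: peers are picked uniformly at random, the "state" is just a heartbeat counter per node, and the names `merge`, `gossip_round` and `converged` are made up for the example.

```python
import random


def merge(mine, theirs):
    """Keep the highest heartbeat version seen for every node."""
    return {n: max(mine.get(n, 0), theirs.get(n, 0))
            for n in mine.keys() | theirs.keys()}


def gossip_round(views, rng, fanout=3):
    """Every node exchanges state with up to `fanout` randomly chosen peers."""
    nodes = sorted(views)
    for node in nodes:
        views[node][node] += 1                      # bump own heartbeat
        others = [p for p in nodes if p != node]
        for peer in rng.sample(others, k=min(fanout, len(others))):
            merged = merge(views[node], views[peer])
            views[node], views[peer] = dict(merged), dict(merged)


def converged(views):
    """True once every node has heard about every other node."""
    return all(set(v) == set(views) for v in views.values())


rng = random.Random(42)                               # fixed seed: repeatable run
views = {f"N{i}": {f"N{i}": 0} for i in range(1, 7)}  # each node knows only itself
rounds = 0
while not converged(views) and rounds < 100:
    gossip_round(views, rng)
    rounds += 1
print(f"all {len(views)} nodes know each other after {rounds} round(s)")
```

Failure detection builds on the same data: a node whose heartbeat stops increasing in everyone's view is a candidate for being marked down.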
Consistent Hashing
• A hash consists of one or more arithmetic operations on a piece of data
• Common way of load balancing across several nodes
• The hash function must have an upper and a lower bound so objects can be mapped onto a circle
• Common hash algorithms
  – Simple checksums
  – Message Digest (MD5)
  – Secure Hash Algorithm (SHA-1/2)
  – MurmurHash
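As a minimal sketch of the idea, the ring below hashes both node names and keys with MD5 into a bounded range and gives each key to the first node clockwise from it. The `HashRing` class and the 32-bit ring size are assumptions made for the example; Cassandra uses its own partitioners and token ranges.

```python
import bisect
import hashlib


def ring_position(key, ring_size=2**32):
    """Map a string to a bounded position on the ring using MD5."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % ring_size


class HashRing:
    """Hypothetical ring with one point per node."""

    def __init__(self, nodes):
        self._points = sorted((ring_position(n), n) for n in nodes)

    def owner(self, key):
        """First node clockwise from the key's position, wrapping around."""
        positions = [p for p, _ in self._points]
        i = bisect.bisect_left(positions, ring_position(key)) % len(self._points)
        return self._points[i][1]


keys = [f"user-{i}" for i in range(1000)]
before = HashRing(["N1", "N2", "N3"])
after = HashRing(["N1", "N2", "N3", "N4"])
moved = [k for k in keys if before.owner(k) != after.owner(k)]
print(f"{len(moved)} of {len(keys)} keys moved after adding N4")
```

The point of the technique is visible in `moved`: adding a node only takes keys from its clockwise neighbor, so every key that changes owner moves to N4 and the rest stay where they were.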
Partitioners
• Determines how data is distributed across the nodes in the cluster
• Function for deriving a token representing a row from its partition key by hashing.
Cassandra offers:
  – Murmur3Partitioner
  – RandomPartitioner
  – ByteOrderedPartitioner (not recommended)
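To make tokens concrete, here is a sketch of how evenly spaced tokens can be computed for the Murmur3Partitioner range (-2^63 to 2^63 - 1) and how a token is matched to its owner. It assumes one token per node (no vnodes) and takes the token as given, since the Murmur3 hash itself is not in the Python standard library; `initial_tokens` and `owner_index` are illustrative names.

```python
import bisect

MIN_TOKEN = -2**63          # lower bound of the Murmur3Partitioner token range


def initial_tokens(node_count):
    """Evenly spaced tokens, one per node (single-token layout, no vnodes)."""
    step = 2**64 // node_count
    return [MIN_TOKEN + i * step for i in range(node_count)]


def owner_index(token, tokens):
    """A node owns the range (previous token, own token]; wrap past the last."""
    return bisect.bisect_left(tokens, token) % len(tokens)


tokens = initial_tokens(4)
for i, t in enumerate(tokens):
    print(f"node {i + 1}: initial_token = {t}")
```

For four nodes this yields -2^63, -2^62, 0 and 2^62, so each node is responsible for a quarter of the token space.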
Coordinators
• Acts as a proxy between the client application and the nodes that own the data being requested.
• Any client request can be sent to any node.
Snitch
• Tells Cassandra which datacenter and rack each node belongs to, so requests are routed efficiently and replicas are spread across racks and datacenters. (Which nodes are down or bootstrapping is tracked through gossip.)
• It interprets the topology
The most popular are:
  – GossipingPropertyFileSnitch
  – Ec2Snitch
  – Ec2MultiRegionSnitch
  – Dynamic snitch
Logical database container
Data	is	Stored	in	Keyspaces
Model Around Your Queries
• Determine what queries to support
  • Grouping by an attribute
  • Ordering by an attribute
  • Filtering based on some set of conditions
• Create a table where you can satisfy your query
  • This generally means you will use roughly one table per query pattern
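The "one table per query pattern" rule can be sketched with plain dictionaries standing in for tables. Everything here is hypothetical (the table names, the fields, the `record_event` helper); the point is only that the same fact is written once per query it has to serve.

```python
from collections import defaultdict

# Each "table" is keyed by the partition key of the query it serves.
events_by_player = defaultdict(list)   # query: all events for one player
events_by_game = defaultdict(list)     # query: all events for one game


def record_event(player, game, ts, action):
    """Write the same fact to every table that must answer a query about it."""
    row = {"player": player, "game": game, "ts": ts, "action": action}
    events_by_player[player].append(row)
    events_by_game[game].append(row)


record_event("ana", "chess", 1, "start")
record_event("ana", "go", 2, "start")
record_event("luis", "chess", 3, "win")

# Each query is a single-partition lookup; no join is needed.
print([e["game"] for e in events_by_player["ana"]])      # ['chess', 'go']
print([e["player"] for e in events_by_game["chess"]])    # ['ana', 'luis']
```

The cost of this design is duplicated writes and storage, which is the denormalization trade-off discussed earlier in the deck.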
A Cassandra Table or Column Family
[Diagram: components of a Cassandra node: coordinator, snitch, commit log writer, memtable writer, memtable flush (SSTable writer), reader, memtables, bloom filters, commit log, SSTables]
A Cassandra Table or Column Family
• Consists of one or more SSTables and zero or more memtables
• SSTable stands for Sorted String Table.
  • I.e., all of the columns in the SSTable are sorted in order by key.
• Each SSTable consists of the data table, bloom filter, index and some other minor files.
• SSTables are immutable. Once written they are never altered, only read and eventually deleted
videogames-events-data-jb-1.db
videogames-events-filters-jb-1.db
videogames-events-index-jb-1.db
videogames-events-data-jb-2.db
videogames-events-filters-jb-2.db
videogames-events-index-jb-2.db
videogames-events-data-jb-3.db
videogames-events-filters-jb-3.db
videogames-events-index-jb-3.db
videogames-events-data-jb-4.db
videogames-events-filters-jb-4.db
videogames-events-index-jb-4.db
SSTables on disk: /var/lib/cassandra
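How commit log, memtable and SSTables fit together can be sketched in memory. This is a deliberately small model under assumptions: the `Table` class, its flush threshold of three rows and the list standing in for disk are all invented for the example, and bloom filters and indexes are left out.

```python
class Table:
    """Toy model of one table on one node."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []      # append-only record of every write (durability)
        self.memtable = {}        # in memory, mutable
        self.sstables = []        # stand-in for disk: immutable, sorted by key
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. log first
        self.memtable[key] = value             # 2. then update memory
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Freeze the memtable into a sorted, read-only SSTable."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()                # flushed data needs no replay

    def read(self, key):
        """Check the memtable, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            for k, v in sstable:
                if k == key:
                    return v
        return None


t = Table()
for k, v in [("c", 1), ("a", 2), ("b", 3), ("a", 9)]:
    t.write(k, v)
print(t.sstables)     # [(('a', 2), ('b', 3), ('c', 1))]
print(t.read("a"))    # 9: the newer memtable value hides the SSTable value
```

The old value of "a" is still inside the SSTable; nothing rewrites it in place, which is why compaction is needed later.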
Replication Factor (RF) and Consistency
• Replication factor is the number of copies of each row stored in the ring
• Replication factor should not exceed the number of nodes in the cluster
  – RF=1 is one copy: each row is stored only once in the ring.
  – RF=3 (a common choice) means every row stored in the database is stored three times.
  – Quorum: the read or write must be acknowledged by a quorum of nodes.
Replication Factor (RF) and Consistency
• Consistency
  – When a write or read is performed, the application can choose to wait for n copies of the data to be written or read; this is referred to as a consistency of n.
  – There is a special consistency value called quorum, which means a response from floor(RF/2) + 1 nodes is required.
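The quorum arithmetic is easy to check with a short sketch. The function names are illustrative; the second function encodes the usual rule of thumb that a read is guaranteed to overlap the latest write when the number of replicas written plus the number read exceeds RF.

```python
def quorum(rf):
    """Smallest majority of rf replicas: floor(rf / 2) + 1."""
    return rf // 2 + 1


def read_sees_latest_write(rf, write_acks, read_acks):
    """True when the replicas written and the replicas read must intersect."""
    return write_acks + read_acks > rf


for rf in (1, 3, 5):
    q = quorum(rf)
    print(f"RF={rf}: quorum={q}, "
          f"quorum write + quorum read overlap: {read_sees_latest_write(rf, q, q)}")
```

With RF=3, writing and reading at quorum (2 + 2 > 3) always overlaps, while writing and reading at a consistency of one (1 + 1 <= 3) can return stale data.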
How to make sure we don’t lose data (a.k.a. anti-entropy)
• Three anti-entropy mechanisms in Cassandra
  1) Hinted handoff
  2) Read repair
  3) Repair
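Of the three mechanisms, read repair is the simplest to sketch: compare what the replicas return, answer with the newest value, and push it back to the replicas that were behind. The dictionaries below stand in for replicas and `read_with_repair` is a made-up name; the real mechanism works on digests and runs inside the coordinator.

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and overwrite any stale replica copy."""
    # Each replica maps key -> (timestamp, value); the highest timestamp wins.
    newest = max((r[key] for r in replicas if key in r), key=lambda cell: cell[0])
    for r in replicas:
        if r.get(key) != newest:
            r[key] = newest          # repair the stale or missing copy
    return newest[1]


r1 = {"k": (10, "v2")}
r2 = {"k": (5, "v1")}                # missed the latest write
r3 = {}                              # was down for both writes
print(read_with_repair([r1, r2, r3], "k"))   # v2
```

After the call all three replicas hold `(10, "v2")`, so the next read at any consistency level sees the same value.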
Write Path
Compactions
• SSTables are immutable.
• Deletes and updates are just new writes
• SSTables are merged together by partition key. Obsolete data is discarded.
• Lots of SSTables become a few.
• Compaction can require a lot of disk space. DO NOT LET your disks get more than 50% full.
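The merge step can be sketched as a last-write-wins fold over several SSTables. This is a simplification under assumptions: each SSTable is a dictionary of key to (timestamp, value), a delete is represented by a `None` value, and deleted keys are dropped immediately, whereas Cassandra keeps tombstones for a grace period before purging them.

```python
TOMBSTONE = None   # a delete is written as a marker, not removed in place


def compact(sstables):
    """Merge SSTables: keep the newest cell per key, drop deleted keys."""
    newest = {}
    for sstable in sstables:
        for key, (ts, value) in sstable.items():
            if key not in newest or ts > newest[key][0]:
                newest[key] = (ts, value)
    # Emit one sorted SSTable without the tombstoned keys.
    return {k: newest[k] for k in sorted(newest) if newest[k][1] is not TOMBSTONE}


old = {"a": (1, "x"), "b": (1, "y"), "c": (1, "z")}
new = {"a": (2, "x2"), "b": (3, TOMBSTONE)}
print(compact([old, new]))   # {'a': (2, 'x2'), 'c': (1, 'z')}
```

Both inputs must stay on disk until the merged output is complete, which is why compaction needs free space roughly equal to the data being compacted.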
CQL - Cassandra Query Language
CQL	is	not	SQL
• Default and primary interface into the Cassandra database (since 2.0)
• Cassandra does not support joins or subqueries
• Only way to create users and user-based permissions
• The syntax is very similar to SQL:
cqlsh> CREATE KEYSPACE sandbox WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 1 };
cqlsh> USE sandbox;
cqlsh:sandbox> CREATE TABLE data (id uuid, data text, PRIMARY KEY (id));
cqlsh:sandbox> INSERT INTO data (id, data) VALUES (c37d661d-7e61-49ea-96a5-68c34e83db3a, 'testing');
cqlsh:sandbox> SELECT * FROM data;
Feature/Function | DSE/Cassandra | Oracle RDBMS
Core architecture | “Masterless”; peer-to-peer with all nodes being the same | Traditional standalone
High availability | Continuous availability with built-in redundancy and hardware rack awareness in both single and multiple data centers | Oracle Data Guard (for failover), Oracle RAC (node SPOF), GoldenGate
Data model | Google Bigtable | Relational/tabular
Data consistency model | Tunable consistency (CAP theorem consistency) per operation | Traditional ACID
Storage model | Targeted directories with separation | Tablespaces
Logical database container | Keyspace | Database
Backup/recovery | Online, point-in-time restore | Online, point-in-time restore
Enterprise management/monitoring | DataStax OpsCenter | Oracle Enterprise Manager
Lessons Learned
• Understand the data model differences
• Hardware setup does matter
• Grep the logs for errors and warnings
• Make sure each node is created properly
• Know your tools
  • nodetool utility
  • Cassandra bulk loader (sstableloader)
  • JConsole/Java VisualVM
  • cassandra-stress
  • OpsCenter
rene-ace.com
Thank you – Q&A
Consulting & Strategy, Implementation, Managed Services
To contact us
sales@pythian.com
1-877-PYTHIAN
To follow us
http://www.pythian.com/blog
http://goo.gl/bImXcJ
@pythian
http://goo.gl/DMXExf

My First 100 days with a Cassandra Cluster

  • 1. My First 100 days with a Cassandra Cluster Presented by : Gustavo René Antúnez February, 2017
  • 2. © 2016 Pythian. Confidential ABOUT PYTHIAN Pythian’s 400+ IT professionals help companies adopt 
 and manage disruptive technologies to better compete 2
  • 3. TECHNICAL EXPERTISE Infrastructure: Transforming and managing the IT infrastructure that supports the business DevOps: Providing critical velocity
 in software deployment by adopting
 DevOps practices Cloud: Using the disruptive
 nature of cloud for accelerated, cost-effective growth Databases: Ensuring databases
 are reliable, secure, available and continuously optimized Big Data: Harnessing the transformative power of data on a massive scale Advanced Analytics: Mining data for insights & business transformation
 using data science 3
  • 4. © 2016 Pythian. Confidential 4 Welcome to RMOUG 2017
  • 5. Where do I come From –Oracle DBA • Started with Version 9.2 in 2004 –Speaker at Oracle Open World, Oracle Developers Day and Collaborate –Co-President of ORAMEX (Mexico Oracle User Group) –Web Events Chair for IOUG Cloud Computing Special Interest Group (SIG) –International Chair RAC Special Interest Group (SIG); –Movie Fanatic & Music Lover –Bringing the best from México (Mexihtli) to the rest of the world and in the process photographing it :) –rene-ace.com –@rene_ace • #TD16 5
  • 6. © 2016 Pythian. Confidential 6 Where do I come From rene-ace.com @rene_ace • #TD17
  • 7. © 2016 Pythian. Confidential 7 How did you get to be a DBA
  • 8. © 2016 Pythian. Confidential 8 6th Happiest Job of 2015! http://www.forbes.com/sites/susanadams/2014/03/20/the-happiest-and-unhappiest-jobs-in-2014/ Work-life balance Relationship with boss and co-workers Daily tasks Job resources Field will grow by 15% between 2012 and 2022 DBA can be the key driver of success
  • 9. © 2016 Pythian. Confidential 9 Happiest Job of 2034? • 47 percent of American jobs are at high risk of being taken by computers within the next two decades. – 1st Wave • Computers will start replacing people in especially vulnerable fields like transportation/logistics, production labor, and administrative support. – 2nd Wave • Dependent upon the development of good artificial intelligence. This could next put jobs in management, science and engineering, and the arts at risk.
  • 10. © 2016 Pythian. Confidential 10 The most important question Normalize or Denormalize • Goal of normalization is to store a fact in one place to minimize update, delete and insert anomalies* • Normalized data, depending on how complex the schema becomes, often affects query performance http://blog.rdx.com/cassandra-and-relational-database-schema-comparison-query-vs-relationship-modeling
  • 11. © 2016 Pythian. Confidential 11 The most important question Normalize or Denormalize • Normalize to reduce data anomalies and denormalize to improve query performance. • In relational systems, administrators model the data. In Cassandra, administrators design schemas that are based on query patterns. • Denormalization process is the merging of attributes that are often accessed together into a single schema object http://blog.rdx.com/cassandra-and-relational-database-schema-comparison-query-vs-relationship-modeling
  • 12. © 2016 Pythian. Confidential 12 What is Cassandra ?
 • NoSQL database, developed in JavaOne • Fully distributed DB • Meaning that there is no master DB, unlike Oracle or MySQL. • Linearly scalable • Based on 2 core technologies, Google’s Big Table and Amazon’s Dynamo • 2 versions of Cassandra • Community Edition.- This is distributed under the Apache™ License • Enterprise Edition .- This is distributed by Datastax ≠
  • 13. © 2016 Pythian. Confidential 13 CAP Theorem • In a distributed system you can only have two out of the following three guarantees across a write/read pair: • Consistency.- A read is guaranteed to return the most recent write for a given client. • Availability.-A non-failing node will return a reasonable response within a reasonable amount of time (no error or timeout). • Partition Tolerance.-The system will continue to function when network partitions occur. N1 N2 X X N1 N2 N1 N2 What is Cassandra ?

  • 14. What is Cassandra ?
 © 2016 Pythian. Confidential 14 CAP Theorem • One fallacy of distributed computing is that networks are reliable • AP - Availability/Partition Tolerance - Return the most recent version of the data you have,which could be stale. Will also accept writes that can be processed later when the partition is resolved • CP - Consistency/Partition Tolerance - Wait for a response from the partitioned node which could result in a timeout error.
  • 15. © 2016 Pythian. Confidential 15 What is Cassandra ?
 Cassandra is a BASE (Basically Available, Soft state, Eventually consistent) type system • Not an ACID (Atomicity, Consistency, Isolation, Durability) type system Cassandra is classified as an AP system
  • 16. © 2016 Pythian. Confidential 16 It Can be as easy as … • Start your machine and install the following: • ntp (Packages are normally ntp, ntpdata and ntp- doc) • wget (Unless you have your packages copied over via other means) • vim (Or your favorite text editor) • Yum Package Management • Root or sudo access to the install machine • Latest version of Oracle Java SE Runtime Environment (JRE) 8 (recommended) or OpenJDK 7. • Python 2.6+ (needed if installing OpsCenter)
  • 17. © 2016 Pythian. Confidential 17 It Can be as easy as … • Install Cassandra. ~$ sudo yum install dsc21-2.1.5-1 cassandra2.1.5-1 • Install optional utilities. ~$ sudo yum install cassandra21-tools-2.1.5-1 • Start Cassandra service ~$ sudo service cassandra stop ~$ sudo rm -rf /var/lib/cassandra/data/system/* • In the cassandra-rackdc.properties file # indicate the rack and dc for this node dc=Pythian rack=RAC1 ~$ sudo service cassandra start
  • 18. © 2016 Pythian. Confidential 18 Where is everything in Cassandra? Directories Description /var/lib/cassandra Data directories /var/log/ cassandra Log directory /var/run/ cassandra Runtime files /usr/share/ cassandra Environment settings /usr/share/ cassandra/ lib JAR files /usr/bin Optional utilities, such as sstablelevelreset, sstablerepairedset, and sstablesplit /usr/bin Binary files /usr/sbin /etc/cassandra Configuration files /etc/init.d Service startup script /etc/security/ limits.d Cassandra user limits /etc/default /usr/share/ doc/ cassandra/examples Sample cassandra.yaml files for stress testing
  • 19. © 2016 Pythian. Confidential 19 I come from this world… 12c Version Architecture…
  • 20. © 2016 Pythian. Confidential 20 I come from this world… Oracle… 101010 Online Redo Log10100 Data Files Control Files Segment Database Tablespace Extent Oracle data block Schema Data file OS block Logical Datafile Physical Datafile
  • 21. © 2016 Pythian. Confidential 21 I come from this world… RAC - For Node Point of Failure RAC Cluster Node3Node2 ASM Disks Node1 Public Network Storage Network ASM Network CSS Network ASM ASM ASM DBB DBBDBB Global Data Services – Service Failover / Load Balancing
  • 22. © 2016 Pythian. Confidential 22 I come from this world… Dataguard - For Failover Primary Standby Far Sync Instance SYNC ASYNC Zero data loss failover
  • 23. © 2016 Pythian. Confidential 23 Cassandra Architecture Cassandra Cluster N1 Node N2 Node Rack 1 Datacenter México N3 Node N4 Node Rack 2 Datacenter Portugal
  • 24. © 2016 Pythian. Confidential 24 One Ring to Rule them All • The total amount of data managed by the cluster is represented as a ring • Each node is assigned a part of the database to hold based on each table’s primary key. • To guarantee both availability and durability multiple nodes will be assigned to the same data. • There is no master node all nodes can perform all operations 1 4 3 2 A-F,T-Z,M-S G-L,A-F,T-Z M-S,G-L,A-F T-Z,M-S,G-L
  • 25. © 2016 Pythian. Confidential 25 Gossip • Peer-to-peer communication protocol in which nodes periodically exchange state information • Runs every second and exchanges state messages with up to three other nodes in the cluster • Failure detection • It determines locally from gossip state and history if another node in the system is down or has come back up.
  • 26. © 2016 Pythian. Confidential 26 Consistent Hashing • A hash consists of one or more arithmetic operations on a piece of data • Common way of load balancing across several nodes • Hash function must have a upper and lower bound so objects can be mapped in a circle • Common Hash algorithms – Simple checksums – Message Digest (MD5) – Secure Hash Algorithm (SHA-1/2) – MurmurHash
  • 27. © 2016 Pythian. Confidential 27 Partitioners • Determines how data is distributed across the nodes in the cluster • Function for deriving a token representing a row from its partition key by hashing. Cassandra Offers: – Murmur3Partition – RandomPartitioner – ByteOrderedPartitioner (Not Recommended)
  • 28. © 2016 Pythian. Confidential 28 Coordinators • Acts as a proxy between the client application and the nodes that own the data being requested. • Any client request can be sent to any node.
  • 29. © 2016 Pythian. Confidential 29 Snitch • Is responsible for keeping all of the nodes up to date on what node has what data, what nodes are currently down, what nodes are bootstrapping, etc. • It Interprets the topology The most popular are: – Gossiping property file snitch – EC2 Snitch – EC2 Multi-region snitch – Dynamic Snitch
  • 31. Data is Stored in Keyspaces
      • A keyspace is the logical database container.
  • 32. Model Around Your Queries
      • Determine what queries to support:
          – Grouping by an attribute
          – Ordering by an attribute
          – Filtering based on some set of conditions
      • Create a table that can satisfy each query.
      • This generally means you will use roughly one table per query pattern.
  • 33. A Cassandra Table or Column Family
      • [Figure: a Cassandra node. Components shown: Coordinator, Snitch, CommitLog writer, Memtable writer, Memtable flush (SSTable writer), Reader, Memtables, Bloom filters, CommitLog, SSTables]
  • 34. A Cassandra Table or Column Family
      • Consists of one or more SSTables and zero or more memtables.
      • SSTable stands for Sorted String Table, i.e. all of the data in an SSTable is sorted by key.
      • Each SSTable consists of the data file, a bloom filter, an index and some other minor files.
      • SSTables are immutable. Once written they are never altered, only read and eventually deleted.
      • SSTables on disk under /var/lib/cassandra:
          videogames-events-data-jb-1.db
          videogames-events-filters-jb-1.db
          videogames-events-index-jb-1.db
          videogames-events-data-jb-2.db
          videogames-events-filters-jb-2.db
          videogames-events-index-jb-2.db
          videogames-events-data-jb-3.db
          videogames-events-filters-jb-3.db
          videogames-events-index-jb-3.db
          videogames-events-data-jb-4.db
          videogames-events-filters-jb-4.db
          videogames-events-index-jb-4.db
  • 35. Replication Factor (RF) and Consistency
      • Replication factor is the number of copies of each piece of data stored in the ring. It is set per keyspace.
      • Replication factor should not exceed the number of nodes in the cluster.
          – RF=1 means one copy: each piece of data is stored only once in the ring.
          – RF=3 (a common choice) means everything stored in the keyspace is stored three times.
          – Quorum: the read or write must be acknowledged by a quorum of the replicas.
  • 36. Replication Factor (RF) and Consistency
      • Consistency
          – When a write or read is performed, the application can choose to wait for n copies of the data to be written or read. This is referred to as a consistency of n.
          – There is a special consistency value called quorum, which means a response is required from RF/2 + 1 nodes, with the division rounded down (2 of 3, 3 of 5).
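The quorum arithmetic is easy to check in code. This is a minimal sketch; `quorum` and `overlaps` are illustrative helper names, and `overlaps` encodes the usual rule that reads see the latest write when the read and write replica counts add up to more than RF.

```python
def quorum(rf):
    """Number of replicas that make up a quorum for a given replication factor."""
    return rf // 2 + 1


def overlaps(read_replicas, write_replicas, rf):
    """True when every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > rf


for rf in (1, 3, 5):
    q = quorum(rf)
    print(f"RF={rf}: quorum={q}, quorum reads + quorum writes overlap: {overlaps(q, q, rf)}")
print("RF=3, write ONE + read ONE overlap:", overlaps(1, 1, 3))
```

Quorum reads combined with quorum writes always overlap, whereas reading and writing a single replica at RF=3 does not, which is the trade-off between latency and consistency that the application tunes per operation.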
  • 37. How to make sure we don't lose data
      • Three anti-entropy mechanisms in Cassandra:
          1) Hinted handoff
          2) Read repair
          3) Repair (a.k.a. anti-entropy repair)
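Of the three mechanisms, read repair is the simplest to sketch. The example below is an assumption-laden toy: replicas are plain dictionaries, every replica is contacted on each read (in Cassandra the number contacted depends on the consistency level), and `read_with_repair` is a made-up name.

```python
def read_with_repair(replicas, key):
    """Read `key` from every replica, return the newest value, and fix stale replicas.

    Each replica is a dict mapping key -> (timestamp, value).
    """
    responses = {name: data.get(key) for name, data in replicas.items()}
    found = [r for r in responses.values() if r is not None]
    if not found:
        return None
    newest = max(found, key=lambda r: r[0])  # the write with the highest timestamp wins
    for name, response in responses.items():
        if response != newest:
            replicas[name][key] = newest  # write the newest version back
    return newest[1]


replicas = {
    "n1": {"k": (2, "new")},
    "n2": {"k": (1, "old")},  # missed the second write
    "n3": {},                 # was down for both writes
}
print(read_with_repair(replicas, "k"))
print(replicas)
```

After the read, all three replicas hold the newest version, so the inconsistency is repaired as a side effect of serving the request.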
  • 38. Write Path
  • 39. Compactions
      • SSTables are immutable.
      • Deletes and updates are just new writes.
      • SSTables are merged together by partition key. Old, obsolete data is discarded.
      • Lots of SSTables become a few.
      • Compaction can require a lot of disk space. DO NOT LET your disks get more than 50% full.
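The merge described on this slide can be sketched as follows. It is a simplification with stated assumptions: an SSTable is a list of `(key, timestamp, value)` tuples, a delete is a write whose value is a `TOMBSTONE` marker, and tombstones are dropped immediately, whereas Cassandra keeps them for a grace period (`gc_grace_seconds`) before discarding them.

```python
TOMBSTONE = None  # marker written by a delete


def compact(sstables):
    """Merge SSTables into one, keeping only the newest cell per key.

    Each SSTable is a list of (key, timestamp, value) sorted by key.
    """
    newest = {}
    for table in sstables:
        for key, timestamp, value in table:
            if key not in newest or timestamp > newest[key][0]:
                newest[key] = (timestamp, value)
    return [
        (key, ts, value)
        for key, (ts, value) in sorted(newest.items())
        if value is not TOMBSTONE  # deleted data is dropped
    ]


old = [("a", 1, "apple"), ("b", 1, "banana"), ("c", 1, "cherry")]
new = [("a", 2, "avocado"), ("b", 3, TOMBSTONE)]
print(compact([old, new]))
```

Because the inputs are never modified, the old files and the merged output exist side by side until the merge finishes, which is why compaction needs spare disk space.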
  • 40. CQL - Cassandra Query Language
      • CQL is not SQL.
      • Default and primary interface into the Cassandra database (since 2.0).
      • Cassandra does not support joins or subqueries.
      • Only way to create users and user-based permissions.
      • The syntax is very similar to SQL:
          cqlsh> CREATE KEYSPACE sandbox WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 1 };
          cqlsh> USE sandbox;
          cqlsh:sandbox> CREATE TABLE data (id uuid, data text, PRIMARY KEY (id));
          cqlsh:sandbox> INSERT INTO data (id, data) VALUES (c37d661d-7e61-49ea-96a5-68c34e83db3a, 'testing');
          cqlsh:sandbox> SELECT * FROM data;
  • 42. DSE/Cassandra compared with Oracle RDBMS

      | Feature/Function | DSE/Cassandra | Oracle RDBMS |
      |---|---|---|
      | Core architecture | "Masterless"; peer-to-peer with all nodes being the same | Traditional standalone |
      | High availability | Continuous availability with built-in redundancy and hardware rack awareness in both single and multiple data centers | Oracle Data Guard (for failover), Oracle RAC (node SPOF), GoldenGate |
      | Data model | Google Bigtable | Relational/tabular |
      | Data consistency model | Tunable consistency (CAP theorem; consistency per operation) | Traditional ACID |
      | Storage model | Targeted directories with separation | Tablespaces |
      | Logical database container | Keyspace | Database |
      | Backup/recovery | Online, point-in-time restore | Online, point-in-time restore |
      | Enterprise management/monitoring | DataStax OpsCenter | Oracle Enterprise Manager |
  • 43. Lessons Learned
      • Understand the data model differences
      • Hardware setup does matter
      • Grep the logs for errors and warnings
      • Make sure each node is created properly
      • Know your tools:
          – nodetool utility
          – Cassandra bulk loader (sstableloader)
          – jconsole / Java VisualVM
          – cassandra-stress
          – OpsCenter
  • 45. rene-ace.com
  • 46. Thank you – Q&A
      • Consulting & Strategy, Implementation, Managed Services
      • To contact us: sales@pythian.com, 1-877-PYTHIAN
      • To follow us: http://www.pythian.com/blog, http://goo.gl/bImXcJ, @pythian, http://goo.gl/DMXExf