NoSQL Session II
Agenda
• Session I recap
- Why NoSQL / Drawbacks of relational DBs
- Common characteristics
- Storage mechanism
- CAP theorem & advantages
• DataStax Apache Cassandra installation
• Cassandra Concepts
Features of Cassandra
• Column-based storage mechanism
• High availability
• High scalability / horizontal scaling
• Predictable performance
• No SPOF (single point of failure)
• Multi-DC (data center) / multi-region availability
• Runs on commodity hardware
• Easy to manage operationally
ARCHITECTURE
• Node – One Cassandra instance
• Rack – A logical set of nodes
• Data Center – A logical set of racks
• Cluster – The full set of nodes that map to a single complete token ring
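As a rough illustration of how a token ring maps keys to nodes, the Python sketch below hashes a partition key to a token and picks the first node whose token is at or after it, wrapping around at the end of the ring. The node names, token values, ring size, and the use of MD5 are assumptions made for this example; Cassandra's default partitioner is Murmur3-based and a node usually owns many tokens.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # illustrative token space, not Cassandra's real token range


def token_for(key: str) -> int:
    """Hash a partition key to a token (MD5 is a stand-in for the real partitioner)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


def owner(ring: list[tuple[int, str]], token: int) -> str:
    """Return the node owning `token`: first node token >= token, wrapping around."""
    tokens = [node_token for node_token, _ in ring]
    index = bisect.bisect_left(tokens, token) % len(ring)
    return ring[index][1]


# Hypothetical three-node ring, sorted by token.
ring = [(1_000_000_000, "node-a"), (2_500_000_000, "node-b"), (4_000_000_000, "node-c")]
print(owner(ring, token_for("user:42")))
```

Because every node can compute the same mapping, any node can work out where a key belongs without asking a central master.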
CQL
• CREATE KEYSPACE keyspace_name WITH
replication = {'class': 'strategy_name',
'replication_factor': number_of_replicas};
• CREATE TABLE table_name (column1 datatype,
column2 datatype, column3 datatype,
PRIMARY KEY (column1));
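To make the two templates concrete, here is a small Python sketch that fills them in and prints the resulting CQL. The helper names `create_keyspace_cql` and `create_table_cql`, and the keyspace, table, and data center names, are hypothetical; the helpers do no identifier escaping and are meant for illustration only, not as a way to build statements in production.

```python
def format_value(value) -> str:
    """Quote strings for a CQL map literal; leave numbers bare."""
    return f"'{value}'" if isinstance(value, str) else str(value)


def create_keyspace_cql(name: str, strategy: str, options: dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement for the given replication strategy."""
    replication = {"class": strategy, **options}
    pairs = ", ".join(f"'{key}': {format_value(val)}" for key, val in replication.items())
    return f"CREATE KEYSPACE {name} WITH replication = {{{pairs}}};"


def create_table_cql(table: str, columns: dict[str, str], primary_key: list[str]) -> str:
    """Build a CREATE TABLE statement with a single PRIMARY KEY clause."""
    column_defs = ", ".join(f"{col} {cql_type}" for col, cql_type in columns.items())
    key_clause = ", ".join(primary_key)
    return f"CREATE TABLE {table} ({column_defs}, PRIMARY KEY ({key_clause}));"


print(create_keyspace_cql("demo", "SimpleStrategy", {"replication_factor": 3}))
print(create_keyspace_cql("demo_multi", "NetworkTopologyStrategy", {"dc1": 3, "dc2": 2}))
print(create_table_cql("users", {"id": "uuid", "name": "text"}, ["id"]))
```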
The replication option specifies the replica placement strategy and the
number of replicas wanted. The following table lists the replica
placement strategies.

Strategy name            Description
SimpleStrategy           Specifies a single replication factor for the whole cluster.
NetworkTopologyStrategy  Sets the replication factor for each data center independently.
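The difference between the two strategies can be sketched in Python as follows. This is a simplified model under stated assumptions: the ring is a list already sorted in token order, node and data center names are made up, and the data-center-aware version ignores racks, which the real NetworkTopologyStrategy also takes into account.

```python
def simple_strategy_replicas(ring_nodes: list[str], owner_index: int, rf: int) -> list[str]:
    """Pick `rf` replicas by walking the ring clockwise from the owning node."""
    count = min(rf, len(ring_nodes))
    return [ring_nodes[(owner_index + step) % len(ring_nodes)] for step in range(count)]


def network_topology_replicas(
    ring: list[tuple[str, str]], owner_index: int, rf_per_dc: dict[str, int]
) -> list[str]:
    """Walk the ring clockwise and fill each data center's quota independently.

    `ring` holds (node, datacenter) pairs in token order. Rack awareness is omitted.
    """
    chosen: list[str] = []
    counts = {dc: 0 for dc in rf_per_dc}
    for step in range(len(ring)):
        node, dc = ring[(owner_index + step) % len(ring)]
        if counts.get(dc, 0) < rf_per_dc.get(dc, 0):
            chosen.append(node)
            counts[dc] += 1
    return chosen


# Hypothetical five-node ring spanning two data centers.
multi_dc_ring = [("n1", "dc1"), ("n2", "dc2"), ("n3", "dc1"), ("n4", "dc2"), ("n5", "dc1")]
print(simple_strategy_replicas(["n1", "n2", "n3", "n4"], owner_index=3, rf=3))
print(network_topology_replicas(multi_dc_ring, owner_index=1, rf_per_dc={"dc1": 2, "dc2": 1}))
```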
CONSISTENCY
• Consistency levels can be set separately for read and
write operations.
• Levels include ANY (writes only), ONE, QUORUM (floor(RF/2) + 1
replicas), EACH_QUORUM, ALL, etc.
• Higher consistency – lower availability
• Lower consistency – higher availability
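The quorum formula and the consistency/availability trade-off can be checked with a few lines of Python. The function names are made up for this sketch; the overlap rule used here (reads plus writes must touch more replicas than the replication factor) is the usual condition for a read to see the latest acknowledged write.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas in a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when the read set and the write set must overlap in at least one replica."""
    return read_replicas + write_replicas > replication_factor


rf = 3
print(quorum(rf))                                           # replicas needed for QUORUM
print(reads_see_latest_write(quorum(rf), quorum(rf), rf))   # QUORUM reads + QUORUM writes
print(reads_see_latest_write(1, 1, rf))                     # ONE reads + ONE writes
```

With RF = 3, QUORUM tolerates one replica being down, while ALL tolerates none; that is the availability cost of the higher level.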
SEED & CO-ORDINATOR NODE
• Seeds and coordinators serve different purposes.
• Seed nodes: contact points that a node uses to discover the cluster
when it starts. In general it is recommended to have 2 seeds for
the whole cluster. In a multi-datacenter cluster, you
may want to distribute the seeds across the datacenters. The seed
list is set in the cassandra.yaml file.
• Coordinator nodes: every node can act as a coordinator (by
design). The coordinator is chosen per request, and the only thing
you can change is how it is chosen, for example round-robin,
DC-aware, or latency-aware. This is set through the client driver's
load-balancing policy rather than in cassandra.yaml.
• The maximum number of columns (cells) per row is 2 billion, but in
practice about 10 to 20 thousand is the most that is
used
• The maximum data size per cell (column value) is
2 GB, but in practice values are kept to about 10 MB.
CLUSTER TOPOLOGY
• Cluster communication – snitch and gossip
• Hinted handoff
• Write path
• Read path
• Read repair
• Configuration – cassandra.yaml file
SNITCHES & GOSSIP
• Snitch – Determines the location of each node (rack and
data center) from its IP address
• Using this information, Cassandra does its best not to place
more than one replica on the same rack
• Gossip – Once per second, each node exchanges state information
with other nodes so that all nodes stay up to date
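One gossip exchange can be pictured as merging two views of the cluster and keeping the newer entry for each node. The Python sketch below assumes a single integer version per endpoint; real gossip state carries a generation plus a heartbeat version, so this is a simplification, and the endpoint names and statuses are made up.

```python
EndpointState = tuple[int, str]  # (version, status)


def merge_gossip(
    local: dict[str, EndpointState], remote: dict[str, EndpointState]
) -> dict[str, EndpointState]:
    """Return the local view updated with any newer entries from the remote view."""
    merged = dict(local)
    for endpoint, (version, status) in remote.items():
        if endpoint not in merged or version > merged[endpoint][0]:
            merged[endpoint] = (version, status)
    return merged


local_view = {"node-a": (5, "UP"), "node-b": (3, "UP")}
remote_view = {"node-b": (7, "DOWN"), "node-c": (1, "UP"), "node-a": (4, "UP")}
print(merge_gossip(local_view, remote_view))
```

Repeating this exchange every second with a few peers is enough for news about a node to spread through the whole cluster quickly.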
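• Hinted Handoff – A recovery mechanism for
writes targeting offline nodes
• The hint window (how long hints are collected for a down node) is set in the cassandra.yaml file
• Property – max_hint_window_in_ms: 1000 (example value; the default is 10800000 ms, i.e. 3 hours)
• hinted_handoff_enabled: true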
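The effect of the hint window can be sketched in Python as below. The `HintStore` class and its method names are hypothetical, and elapsed time is passed in explicitly to keep the example deterministic; the only behaviour modelled is that hints are stored while a node has been down for no longer than the window and are replayed when it returns.

```python
from collections import defaultdict


class HintStore:
    """Hold writes for offline nodes, but only within the hint window."""

    def __init__(self, max_hint_window_ms: int):
        self.max_hint_window_ms = max_hint_window_ms
        self.hints: dict[str, list[str]] = defaultdict(list)

    def record_if_allowed(self, node: str, mutation: str, down_for_ms: int) -> bool:
        """Store a hint unless the node has been down longer than the window."""
        if down_for_ms > self.max_hint_window_ms:
            return False
        self.hints[node].append(mutation)
        return True

    def replay(self, node: str) -> list[str]:
        """Return and clear the hints for a node that has come back online."""
        return self.hints.pop(node, [])


store = HintStore(max_hint_window_ms=10_800_000)  # 3 hours
store.record_if_allowed("node-b", "INSERT row 1", down_for_ms=60_000)
store.record_if_allowed("node-b", "INSERT row 2", down_for_ms=20_000_000)  # too late, no hint
print(store.replay("node-b"))
```

Writes that arrive after the window has passed are not hinted, so a node that was down for longer needs a repair to catch up.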
Write Path
SSTable – Sorted String Table
• Immutable data file for row storage
• A partition can be spread across multiple SSTables;
on read, the value with the newest timestamp wins
• Easy backup – files never change once written; deletes are
recorded as "tombstones"
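How a read combines several immutable SSTables can be sketched in Python as follows. Each SSTable is modelled as a dict from key to (timestamp, value), and `TOMBSTONE` is a marker object standing in for a delete; both are assumptions of this example, not Cassandra's on-disk format.

```python
TOMBSTONE = object()  # marker written in place of a value when a row is deleted


def merge_sstables(sstables: list[dict]) -> dict:
    """Merge SSTables: the newest timestamp wins, and tombstoned keys are hidden."""
    latest: dict = {}
    for table in sstables:
        for key, (timestamp, value) in table.items():
            if key not in latest or timestamp > latest[key][0]:
                latest[key] = (timestamp, value)
    return {key: value for key, (_, value) in sorted(latest.items()) if value is not TOMBSTONE}


older = {"a": (1, "x"), "b": (1, "y")}
newer = {"a": (2, TOMBSTONE), "c": (3, "z")}  # "a" was deleted after it was written
print(merge_sstables([older, newer]))
```

The older file is never modified: the delete of "a" lives only in the newer file, which is why deletes have to be written as tombstones rather than applied in place.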
Read Path
• Read Repair – When a read finds that a replica holds stale
data, that replica is updated with the latest version
• Property – read_repair_chance (table option; removed in Cassandra 4.0)
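Read repair can be sketched in Python as: collect the replicas' answers, return the newest one, and write it back to any replica that was behind. The data layout (node name mapped to a dict of key to (timestamp, value)) and the function name are assumptions for this example; a real coordinator compares digests and only involves the replicas contacted for the chosen consistency level.

```python
def read_with_repair(replicas: dict[str, dict], key: str):
    """Return (value, repaired_nodes); stale or missing replicas get the newest cell."""
    responses = [data[key] for data in replicas.values() if key in data]
    if not responses:
        return None, []
    newest = max(responses, key=lambda cell: cell[0])  # cell is (timestamp, value)
    repaired = []
    for node, data in replicas.items():
        if data.get(key) != newest:
            data[key] = newest  # write the latest version back to the stale replica
            repaired.append(node)
    return newest[1], sorted(repaired)


replicas = {
    "n1": {"k": (2, "new")},
    "n2": {"k": (1, "old")},
    "n3": {},
}
print(read_with_repair(replicas, "k"))
```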
Thank you!
To be continued in the next session
