Unit -3
Cassandra
Cassandra –
Apache Cassandra - An Introduction, Features of Cassandra, CQL Data types, CQLSH,
Keyspaces, CRUD (Create, Read, Update and Delete) Operations, Collections, Using a
Counter, Time to Live (TTL), Alter Commands, Import and Export, Querying System
Tables, Practice Examples
Unit -3 -Features of Cassandra, CQL Data types,  CQLSH, Keyspaces
What is Apache Cassandra?
• Apache Cassandra is an open-source, distributed, and decentralized
storage system (database) for managing very large amounts of structured data
spread out across the world.
• It provides highly available service with no single point of failure.
• Listed below are some of the notable points of Apache Cassandra −
• It is scalable, fault-tolerant, and tunably consistent (the consistency level can be chosen per operation).
• It is a column-oriented database.
• Its distribution design is based on Amazon’s Dynamo and its data model on
Google’s Bigtable.
• Created at Facebook, it differs sharply from relational database management
systems.
• Cassandra implements a Dynamo-style replication model with no single point
of failure, but adds a more powerful “column family” data model.
• Cassandra is used by some of the biggest companies, such as Facebook,
Twitter, Cisco, Rackspace, eBay, Netflix, and more.
NoSQL Database
• A NoSQL database (sometimes called Not Only SQL) is a database
that provides a mechanism to store and retrieve data other than the tabular
relations used in relational databases.
• These databases are typically schema-free, support easy replication, have simple APIs,
are often eventually consistent, and can handle huge amounts of data.
• The primary objective of a NoSQL database is to have
• simplicity of design,
• horizontal scaling, and
• finer control over availability.
• NoSQL databases use different data structures compared to relational databases.
• This makes some operations faster in NoSQL databases.
• The suitability of a given NoSQL database depends on the problem it must solve.
• Besides Cassandra, we have the following NoSQL databases that
are quite popular −
• Apache HBase −
• HBase is an open source, non-relational, distributed database modeled after
Google’s BigTable and is written in Java.
• It is developed as a part of Apache Hadoop project and runs on top of HDFS,
providing BigTable-like capabilities for Hadoop.
• MongoDB −
• MongoDB is a cross-platform document-oriented database system that
avoids using the traditional table-based relational database structure in favor
of JSON-like documents with dynamic schemas making the integration of
data in certain types of applications easier and faster.
Features of Cassandra
• Cassandra has become popular because of its technical features:
• Elastic scalability − Cassandra is highly scalable; you can add more hardware to
accommodate more customers and more data as required.
• Always-on architecture − Cassandra has no single point of failure and is
continuously available for business-critical applications that cannot afford a failure.
• Fast linear-scale performance − Cassandra is linearly scalable, i.e., throughput
increases as you increase the number of nodes in the cluster, which helps it
maintain a quick response time.
• Flexible data storage − Cassandra accommodates structured, semi-structured,
and unstructured data. It can dynamically accommodate changes to your data
structures as your needs change.
• Easy data distribution − Cassandra provides the flexibility to distribute data where
you need by replicating data across multiple data centers.
• Transaction support − Cassandra does not provide full ACID (Atomicity, Consistency,
Isolation, and Durability) transactions in the relational sense. Writes are atomic and
isolated at the level of a single partition and are durable, while consistency is
tunable per operation.
• Fast writes − Cassandra was designed to run on cheap commodity hardware. It
performs very fast writes and can store hundreds of terabytes of data without
sacrificing read efficiency.
APPLICATIONS
a. Cassandra Storage
• One of the major applications of Cassandra is storage.
• Cassandra's flexible data model lets users store many kinds of data.
• This data is distributed across the nodes of the Cassandra cluster. Cisco WebEx, InWorldz, Formspring, and OpenX are some of the companies using
Cassandra for storage.
b. Back-end development applications
• Cassandra can also serve as the back end of an application.
• Many software applications have a front end and a back end.
• Cassandra provides a platform for back-end development and can hold large volumes of application data.
• Talentica Software uses it as a back end for analytics.
c. Cassandra Monitoring
• Many applications are built around large-scale user activity.
• Developers can use Cassandra to record and monitor this activity.
• The activity can be tracked across different categories, such as media, art, and music. CERN, Cloudkick, and other companies use Cassandra
for monitoring.
d. Time-series-based applications
• Time-series-based applications record data continuously as events occur in real time.
• Examples include hits on a web browser, traffic light data, and GPS location tracking data.
• These applications are write-heavy.
• Cassandra is well suited to these kinds of applications.
e. Cassandra Analytics
• Cassandra provides a platform to analyse data collected from various sources.
• These sources may include social media, product feedback catalogues, retail inputs, and lookups.
• Developers can use Cassandra to retrieve and analyse this data.
• Ooyala uses Cassandra for analytics applications.
f. Cassandra Messaging
• Nowadays, people use messaging services all the time.
• This creates a need for a platform to manage message data.
• Cassandra can serve as the database platform for messaging providers.
Cassandra Architecture
What is Cassandra Architecture?
• Cassandra is designed with hardware failure in mind.
• Thus, it has contingency mechanisms that keep it working when such
failures occur.
• It has a ring-type structure, i.e., its nodes are logically
arranged in a ring.
• It has no master or slave nodes; all nodes are peers.
• It keeps replicas of data on several homogeneous
nodes of the cluster.
• The nodes of the cluster exchange state information with each other
every second (the gossip protocol).
• A sequentially written commit log on each node
captures write activity to ensure data durability.
• The data is then indexed and written to an in-memory structure called a memtable.
• Once the memtable is full, its data is written to disk in an SSTable
data file.
• All data is partitioned and replicated to other nodes
automatically.
• Through a process known as compaction, Cassandra
periodically merges SSTables and removes outdated data.
• A client can send a read/write request to any node in the
cluster.
Storage Components
Key Terms Of Cassandra Architecture
a. Cassandra Nodes
• A node is the basic unit of Cassandra.
• Data is stored on these units (a computer or server).
b. Cassandra Data Center
• A Cassandra data center is a collection of related Cassandra nodes.
• In general, a data center is a centralized place that houses the computer and networking systems that meet
an organization’s information technology needs.
c. Cassandra Rack
• A rack is a unit that holds multiple servers stacked on top of one another.
• A node is a single server in a rack.
d. Cassandra Cluster
• One or more data centers form a Cassandra cluster.
• A cluster can span multiple physical locations.
e. Cassandra Commit log
• Every write operation is first recorded in the commit log to ensure the durability of the data.
• After its data has been flushed to an SSTable, the corresponding commit log data can be archived, deleted, or recycled.
• It acts as a crash-recovery mechanism.
f. MemTables
• A memtable is a temporary in-memory location where data is written during inserts, updates, or
deletions.
• Data is written to the memtable after it has been written to the commit log.
• When a memtable is full, it is flushed to disk as an SSTable.
g. SSTables
• SSTables are the immutable data files to which Cassandra periodically writes
memtables.
• They are written sequentially and never modified in place; new data goes into
new SSTables, which keeps disk writes sequential.
h. Data Replication
• If one node in a data center goes down and it held the only copy of some data,
that data would be lost.
• To overcome this limitation, Cassandra keeps replicas of data on several
nodes. This is called replication.
• This ensures fault tolerance and reliability.
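The replication idea above can be sketched in a few lines of Python. This is a simplified illustration, not Cassandra's implementation: the function names (ring_token, replicas_for) and node names are hypothetical, SHA-256 stands in for Cassandra's partitioner, and virtual nodes, racks, and data centers are ignored. Each key is hashed to a position on the ring, the first node at or after that position holds the first replica, and the following nodes around the ring hold the remaining copies.

```python
import hashlib
from bisect import bisect_left
from typing import List


def ring_token(name: str) -> int:
    """Hash a string to a ring position (SHA-256 is used purely for illustration)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas_for(key: str, nodes: List[str], replication_factor: int) -> List[str]:
    """Return the nodes that would hold copies of `key` in this toy ring.

    Assumes node names are unique. The first replica is the first node whose
    token is >= the key's token (wrapping around the ring); the remaining
    replicas are the next nodes clockwise.
    """
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication_factor must be between 1 and the node count")
    ring = sorted((ring_token(node), node) for node in nodes)
    tokens = [tok for tok, _ in ring]
    start = bisect_left(tokens, ring_token(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(replication_factor)]


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c", "node-d"]
    # With a replication factor of 3, losing any single node still leaves two copies.
    print(replicas_for("user:42", cluster, 3))
```

With a replication factor of 3 on four nodes, every key lives on three distinct nodes, so the failure of one node never removes the last copy.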
Cassandra Query Language
Users can access Cassandra through its nodes using Cassandra Query Language (CQL). CQL
treats the database (keyspace) as a container of tables. Programmers work with CQL either
through cqlsh, an interactive prompt, or through application language drivers.
Clients can send read and write requests to any node. That node (the coordinator) acts as
a proxy between the client and the nodes holding the data.
Write Operations
Every write is first captured in the commit log of the node that receives it. The data
is then stored in the memtable. Whenever the memtable is full, its data is
written to an SSTable data file. All writes are automatically partitioned and replicated
throughout the cluster. Cassandra periodically consolidates the SSTables, discarding
unnecessary data.
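The write path described above can be modelled with a small Python class. This is a toy sketch under stated assumptions: the class name Node, the memtable_limit parameter, and the in-memory lists are all hypothetical, and real Cassandra details such as timestamps, tombstones, and commit log segments are left out.

```python
class Node:
    """Toy model of one node's write path: commit log -> memtable -> SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of writes not yet flushed
        self.memtable = {}     # in-memory table holding the latest value per key
        self.sstables = []     # flushed tables, oldest first; never modified
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. record the write for durability
        self.memtable[key] = value            # 2. apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. memtable is full

    def flush(self):
        """Write the memtable out as a new immutable, key-sorted SSTable."""
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}
            self.commit_log.clear()  # flushed data no longer needs the log

    def compact(self):
        """Merge all SSTables into one, letting newer values replace older ones."""
        merged = {}
        for table in self.sstables:  # oldest to newest
            merged.update(dict(table))
        self.sstables = [tuple(sorted(merged.items()))] if merged else []
```

Writing "a", "b", "a" again, and "c" with a limit of 2 produces two SSTables; compaction then leaves a single table in which the newer value of "a" has replaced the older one.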
Read Operations
During read operations, Cassandra gets values from the memtable and checks the
Bloom filter of each SSTable to find the SSTables that may hold the
required data.
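A Bloom filter answers "might this SSTable contain the key?" without reading the table: it can return a false positive but never a false negative, so a "no" lets the read skip that SSTable. The sketch below is illustrative only. The names BloomFilter, make_sstable, and read are hypothetical, the sizes are arbitrary, and the memtable-first lookup is a simplification of how Cassandra actually merges results.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive several bit positions per key by salting the hash input.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def make_sstable(data):
    """Pair a dict of rows with a Bloom filter over its keys."""
    bloom = BloomFilter()
    for key in data:
        bloom.add(key)
    return bloom, dict(data)


def read(key, memtable, sstables):
    """Look in the memtable first, then in SSTables (newest first)."""
    if key in memtable:
        return memtable[key]
    for bloom, data in sstables:
        # Only touch the SSTable if its filter says the key may be present.
        if bloom.might_contain(key) and key in data:
            return data[key]
    return None
```

For example, with `newer = make_sstable({"a": 3})` and `older = make_sstable({"a": 1, "b": 2})`, calling `read("a", {}, [newer, older])` returns the value from the newer table.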
What is Cassandra Keyspace?
• In the Cassandra data model, a keyspace is the outermost container for
data.
• It has several attributes. The basic ones are:
• a. Replication Factor
• The number of copies of each piece of data, i.e., the number of nodes in the
cluster that hold a replica of it.
• b. Replica Placement Strategy
• The available strategies are
• simple strategy (rack-unaware strategy),
• old network topology strategy (rack-aware strategy),
• network topology strategy (datacenter-aware strategy).
• c. Cassandra Column Families
• A column family is a collection of rows, each of which contains ordered columns.
Column families represent the structure of the stored data and are
contained in a keyspace. In CQL, a column family is called a table.
• A keyspace typically contains at least one column family.
• Each row is in turn a collection of columns.
• The column is the basic unit of the data structure in Cassandra.
• Each column stores three values:
• the column name (key), the value, and a timestamp.
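The timestamp stored with each column is what lets Cassandra reconcile different versions of the same column: the version with the most recent timestamp wins. A minimal Python sketch of that rule follows. The names Column and reconcile are hypothetical, and how ties between equal timestamps are broken is not modelled.

```python
from collections import namedtuple

# A column as described above: name, value, and write timestamp.
Column = namedtuple("Column", ["name", "value", "timestamp"])


def reconcile(versions):
    """Return the version with the highest timestamp (last write wins)."""
    return max(versions, key=lambda column: column.timestamp)


if __name__ == "__main__":
    old = Column("email", "old@example.com", 100)
    new = Column("email", "new@example.com", 200)
    print(reconcile([old, new]).value)
```

The order in which versions arrive does not matter; only the timestamps decide which value is kept.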
CQL Data Type
CQLSH
• cqlsh: the CQL shell
• cqlsh is a command line shell for interacting with Cassandra through CQL (the
Cassandra Query Language).
• It is shipped with every Cassandra package, and can be found in the bin/
directory alongside the cassandra executable.
• cqlsh utilizes the Python native protocol driver, and connects to the single node
specified on the command line.
Cqlsh Commands
Cqlsh has a few commands that allow users to interact with it.
• HELP − Displays help topics for all cqlsh commands.
• CAPTURE − Captures the output of a command and adds it to a file.
• CONSISTENCY − Shows the current consistency level, or sets a new consistency level.
• COPY − Copies data to and from Cassandra.
• DESCRIBE − Describes the current cluster of Cassandra and its objects.
• EXPAND − Expands the output of a query vertically.
• EXIT − Using this command, you can terminate cqlsh.
• PAGING − Enables or disables query paging.
• SHOW − Displays the details of current cqlsh session such as Cassandra version, host, or
data type assumptions.
• SOURCE − Executes a file that contains CQL statements.
• TRACING − Enables or disables request tracing.
CQL Data Definition Commands
• CREATE KEYSPACE − Creates a KeySpace in Cassandra.
• USE − Connects to a created KeySpace.
• ALTER KEYSPACE − Changes the properties of a KeySpace.
• DROP KEYSPACE − Removes a KeySpace
• CREATE TABLE − Creates a table in a KeySpace.
• ALTER TABLE − Modifies the column properties of a table.
• DROP TABLE − Removes a table.
• TRUNCATE − Removes all the data from a table.
• CREATE INDEX − Defines a new index on a single column of a
table.
• DROP INDEX − Deletes a named index.
CQL Data Manipulation Commands
• INSERT − Adds columns for a row in a table.
• UPDATE − Updates a column of a row.
• DELETE − Deletes data from a table.
• BATCH − Executes multiple DML statements at once.
CQL Clauses
• SELECT − Reads data from a table.
• WHERE − Used along with SELECT to read
specific data.
• ORDER BY − Used along with SELECT to read
specific data in a specific order.
KEY SPACES
Tables are defined within a keyspace.
[Figure: a keyspace containing several tables]
•CREATE KEYSPACE <keyspace_name> WITH replication =
{'class': '<strategy_name>', 'replication_factor': <number_of_replicas>};
•CREATE KEYSPACE <keyspace_name> WITH replication =
{'class': '<strategy_name>', 'replication_factor': <number_of_replicas>}
AND durable_writes = <true|false>;
•The CREATE KEYSPACE statement has two properties:
replication and durable_writes.
Creating a Keyspace using Cqlsh
• A keyspace in Cassandra is a namespace that defines data replication
on nodes.
• A cluster typically contains one keyspace per application.
• Given below is the syntax for creating a keyspace using the statement
CREATE KEYSPACE.
• CREATE KEYSPACE <identifier> WITH <properties>
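To make the shape of the statement concrete, the following Python sketch assembles a CREATE KEYSPACE string from its two properties. The helper create_keyspace_cql is hypothetical and exists only for illustration: it performs no escaping or validation, and a real application would send statements through a Cassandra driver rather than build strings this way.

```python
def _literal(value):
    """Render a replication option value: strings are quoted, numbers are not."""
    return f"'{value}'" if isinstance(value, str) else str(value)


def create_keyspace_cql(name, strategy, options, durable_writes=True):
    """Build a CREATE KEYSPACE statement (illustrative helper, no escaping)."""
    replication = {"class": strategy}
    replication.update(options)
    body = ", ".join(f"'{key}': {_literal(val)}" for key, val in replication.items())
    return (
        f"CREATE KEYSPACE {name} WITH replication = {{{body}}} "
        f"AND durable_writes = {str(durable_writes).lower()};"
    )


if __name__ == "__main__":
    print(create_keyspace_cql("database1", "SimpleStrategy", {"replication_factor": 1}))
    print(create_keyspace_cql("test", "NetworkTopologyStrategy", {"datacenter1": 3},
                              durable_writes=False))
```

The replication map always carries the strategy under 'class'; the remaining entries depend on the strategy (a single replication_factor, or one count per data center name).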
Replication
• The replication option specifies the replica placement strategy and the number of
replicas wanted. The replica placement strategies are listed below.
Strategy name − Description
• SimpleStrategy − Specifies a simple replication factor for the cluster.
• NetworkTopologyStrategy − Using this option, you can set the replication factor for each
data center independently.
• OldNetworkTopologyStrategy − This is a legacy replication strategy.
The durable_writes option instructs Cassandra whether to
use the commit log for updates on the
current keyspace. This option is not mandatory and by default, it
is set to true.
•Given below is an example of creating a keyspace.
•Here we create a keyspace named DATABASE1, using
the first replica placement strategy, i.e., SimpleStrategy, with
a replication factor of 1.
cqlsh> CREATE KEYSPACE DATABASE1 WITH replication
= {'class':'SimpleStrategy', 'replication_factor' : 1};
Verification
•You can verify whether the keyspace was created using the command
DESCRIBE.
•Used with keyspaces, it displays all the keyspaces in the cluster,
as shown below. Unquoted names are stored in lowercase, so DATABASE1 is listed as database1.
•cqlsh> DESCRIBE keyspaces;
database1  system  system_traces
Durable_writes
•By default, the durable_writes property of a keyspace is set to true;
however, it can be set to false. Disabling durable writes is not
recommended for keyspaces that use SimpleStrategy.
Example
•Given below is an example demonstrating the usage of
the durable_writes property.
cqlsh> CREATE KEYSPACE test
   ... WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 }
   ... AND DURABLE_WRITES = false;
Verification
•You can verify whether the durable_writes property of test
KeySpace was set to false by querying the System Keyspace.
This query gives you all the KeySpaces along with their
properties.
•cqlsh> SELECT * FROM system_schema.keyspaces;
Using a Keyspace
•You can switch to an existing keyspace using the keyword USE. Its
syntax is as follows −
•Syntax: USE <identifier>
Example
•In the following example, we are using the keyspace
DATABASE1.
•cqlsh> USE DATABASE1;
•cqlsh:database1>
Altering a KeySpace
• ALTER KEYSPACE can be used to alter properties such as the number of
replicas and the durable_writes of a KeySpace. Given below is the syntax of
this command.
Syntax
ALTER KEYSPACE <identifier> WITH <properties>
i.e.
ALTER KEYSPACE <keyspace_name> WITH replication = {'class': '<strategy_name>',
'replication_factor': <number_of_replicas>};
The properties of ALTER KEYSPACE are the same as those of CREATE KEYSPACE:
replication and durable_writes.
Example
•Here we are altering a keyspace named DATABASE1.
•We are changing the replication factor from 1 to 3.
•cqlsh> ALTER KEYSPACE DATABASE1 WITH replication =
{'class':'SimpleStrategy', 'replication_factor' : 3};
•The durable_writes property can be altered in the same way, here for the keyspace test:
•cqlsh> ALTER KEYSPACE test WITH REPLICATION = {'class' :
'NetworkTopologyStrategy', 'datacenter1' : 3} AND
DURABLE_WRITES = true;
Dropping a Keyspace
• You can drop a KeySpace using the command DROP KEYSPACE.
Given below is the syntax for dropping a KeySpace.
Syntax
DROP KEYSPACE <identifier>
i.e.
DROP KEYSPACE <keyspace_name>
Example
cqlsh> DROP KEYSPACE DATABASE1;
CRUD Operation

More Related Content

PPTX
Cassandra tutorial
PDF
04-Introduction-to-CassandraDB-.pdf
PPTX
Cassandra - A Basic Introduction Guide
PPTX
CASSANDRA apache cassandra apacheee.pptx
PPTX
cybersecurity notes for mca students for learning
PDF
cassandra
PPT
Cassandra - A Distributed Database System
PPTX
Apache cassandra
Cassandra tutorial
04-Introduction-to-CassandraDB-.pdf
Cassandra - A Basic Introduction Guide
CASSANDRA apache cassandra apacheee.pptx
cybersecurity notes for mca students for learning
cassandra
Cassandra - A Distributed Database System
Apache cassandra

Similar to Unit -3 -Features of Cassandra, CQL Data types, CQLSH, Keyspaces (20)

PPTX
Apache Cassandra introduction
PPTX
Appache Cassandra
PPTX
Cassandra Learning
PPTX
Big Data_Architecture.pptx
PPTX
Presentation of Apache Cassandra
PPTX
cassandra.pptx
PPTX
Why Cassandra?
PPTX
Column db dol
PDF
CASSANDRA A DISTRIBUTED NOSQL DATABASE FOR HOTEL MANAGEMENT SYSTEM
PPTX
Cassandra implementation for collecting data and presenting data
PPTX
Cassandra an overview
PPTX
Cassndra (4).pptx
PPTX
9. AWS_Databases_Databases_Aws_Cloud.pptx
PDF
Apache Cassandra overview
PDF
Dsm project-h base-cassandra
PPTX
Cassandra - A decentralized storage system
PPT
5266732.ppt
PPTX
Big Data Storage Concepts from the "Big Data concepts Technology and Architec...
PDF
No sq lv1_0
PPTX
cassandra_presentation_final
Apache Cassandra introduction
Appache Cassandra
Cassandra Learning
Big Data_Architecture.pptx
Presentation of Apache Cassandra
cassandra.pptx
Why Cassandra?
Column db dol
CASSANDRA A DISTRIBUTED NOSQL DATABASE FOR HOTEL MANAGEMENT SYSTEM
Cassandra implementation for collecting data and presenting data
Cassandra an overview
Cassndra (4).pptx
9. AWS_Databases_Databases_Aws_Cloud.pptx
Apache Cassandra overview
Dsm project-h base-cassandra
Cassandra - A decentralized storage system
5266732.ppt
Big Data Storage Concepts from the "Big Data concepts Technology and Architec...
No sq lv1_0
cassandra_presentation_final
Ad

Recently uploaded (20)

PDF
Anesthesia in Laparoscopic Surgery in India
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PDF
Insiders guide to clinical Medicine.pdf
PDF
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PDF
TR - Agricultural Crops Production NC III.pdf
PDF
Pre independence Education in Inndia.pdf
PDF
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
PPTX
BOWEL ELIMINATION FACTORS AFFECTING AND TYPES
PDF
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PPTX
Cell Structure & Organelles in detailed.
PPTX
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PDF
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
PDF
Basic Mud Logging Guide for educational purpose
PDF
2.FourierTransform-ShortQuestionswithAnswers.pdf
PDF
STATICS OF THE RIGID BODIES Hibbelers.pdf
PDF
Complications of Minimal Access Surgery at WLH
PDF
VCE English Exam - Section C Student Revision Booklet
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
Anesthesia in Laparoscopic Surgery in India
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
Insiders guide to clinical Medicine.pdf
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
Final Presentation General Medicine 03-08-2024.pptx
TR - Agricultural Crops Production NC III.pdf
Pre independence Education in Inndia.pdf
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
BOWEL ELIMINATION FACTORS AFFECTING AND TYPES
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
Cell Structure & Organelles in detailed.
Pharmacology of Heart Failure /Pharmacotherapy of CHF
O5-L3 Freight Transport Ops (International) V1.pdf
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
Basic Mud Logging Guide for educational purpose
2.FourierTransform-ShortQuestionswithAnswers.pdf
STATICS OF THE RIGID BODIES Hibbelers.pdf
Complications of Minimal Access Surgery at WLH
VCE English Exam - Section C Student Revision Booklet
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
Ad

Unit -3 -Features of Cassandra, CQL Data types, CQLSH, Keyspaces

  • 1. Unit -3 Cassandra Cassandra – Apache Cassandra - An Introduction, Features of Cassandra, CQL Data types, CQLSH, Keyspaces, CRUD (Create, Read, Update and Delete) Operations, Collections, Using a Counter, Time to Live (TTL), Alter Commands, Import and Export, Querying System Tables, Practice Examples
  • 3. What is Apache Cassandra? • Apache Cassandra is an opensource,distributed and decentralized/distributed storage system (database),for managing very large amounts of structured data spread out across the world. • It provides highly available service with no single point of failure. • Listed below are some of the notable points of Apache Cassandra − • It is scalable, fault-tolerant, and consistent. • It is a column-oriented database. • Its distribution design is basedon Amazon’s Dynamo and its data model on Google’s Bigtable. • Created at Facebook, it differs sharply from relational database management systems. • Cassandra implements a Dynamo-style replication model with no single point of failure, but adds a more powerful “column family” data model. • Cassandra is being used by some of the biggest companies such as Facebook, Twitter, Cisco, Rackspace, ebay, Twitter, Netflix, and more.
  • 5. NoSQLDatabase • A NoSQL database (sometimes called as Not Only SQL) is a database that provides a mechanism to store and retrieve data other than the tabular relations used in relational databases. • These databases are schema-free, support easy replication, have simple API, eventually consistent, and can handle huge amounts of data. • The primary objective of a NoSQL database is to have • simplicity of design, • horizontal scaling, and • finer control over availability. • NoSql databases use different data structures compared to relational databases. • It makes some operations faster in NoSQL. • The suitability of a given NoSQL database depends on the problem it must solve.
  • 7. • Besides Cassandra, we have the following NoSQL databases that are quite popular − • Apache HBase − • HBase is an open source, non-relational, distributed database modeled after Google’s BigTable and is written in Java. • It is developed as a part of Apache Hadoop project and runs on top of HDFS, providing BigTable-like capabilities for Hadoop. • MongoDB − • MongoDB is a cross-platform document-oriented database system that avoids using the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas making the integration of data in certain types of applications easier and faster.
  • 9. Features of Cassandra •Cassandra has become so popular because of its outstanding technical features. •Elastic scalability − Cassandra is highly scalable; it allows to add more hardware to accommodate more customers and more data as per requirement. •Always on architecture − Cassandra has no single point of failure and it is continuously available for business-critical applications that cannot afford a failure. •Fast linear-scale performance − Cassandra is linearly scalable, i.e., it increases your throughput as you increase the number of nodes in the cluster. Therefore it maintains a quick response time. •Flexible data storage − Cassandra accommodates all possible data formats including: structured,semi-structured, and unstructured. It can dynamically accommodate changes to your data structures according to your need. •Easy data distribution − Cassandra provides the flexibility to distribute data where you need by replicating data across multiple data centers. •Transaction support − Cassandra supports properties like Atomicity, Consistency, Isolation, and Durability (ACID). •Fast writes − Cassandra was designed to run on cheap commodity hardware. It performs blazingly fast writes and can store hundreds of terabytes of data, without sacrificing the read efficiency.
  • 12. a. Cassandra Storage • One of the major applications of Cassandra is storage. • The broad coverage of Cassandra enables the user to store any kind of data. • This data is stored in various nodes that Cassandra provides. Cisco WebEx, InWorldz, Formspring, OpenX are some companies using Cassandra for storage. b. Back-end development applications • Users can also use Cassandra for back-end development of their applications. • Many software and applications have front-end and back-end. • Cassandra provides a wide platform for the development of the back-end. It also provides a huge database of the data. • Talentica software uses back-end for analytics. c. Cassandra Monitoring • Many applications are based on a wide scale of user activity. • Developers can also use Cassandra to monitor the user activity. • This user activity can be based on the different parameter, media, art, music etc. CERN, Cloudkick and many such companies use Cassandra monitoring. d. Time-series-based applications • Time-series-based applications are basically the applications in real time. • These applications include hits on the internet browser, traffic light data, GPS location tracking data etc. • These applications require heavy write systems. • Cassandra is best for these kinds of applications. e. Cassandra Analytics • Cassandra provides a platform to analyse data collected from various sources. • These sources may include social media, product feedback catalogues, retail inputs and lookups. • Developers can use Cassandra to retrieve and analyse this data. • Ooyala is using Cassandra Analytics applications. f. Cassandra Messaging • Nowadays, people use messaging services all the time. • This eventually, demands a need for a platform to manage these message data. • Therefore, Cassandra acts as a platform for the message providers for their database management.
  • 14. • Cassandra takes hardware failure into consideration. • Thus, it possesses plans of contingency to avoid such failures. • It consists of a ring type structure i.e. its nodes are logically distributed like a ring. • Thus it has no master or slave nodes. • It makes replicas of data on several homogenous nodes of the cluster. • Each information exchanges among the nodes of the cluster every second. • A sequentially written commit log on each node captures write activity to make sure data durability. • This data is then indexed and written to memtable. • Once the memtable is full, we write data on disk on SSTable data file. • All the data is partitioned and replicated to other nodes automatically. • By using a process known as compaction Cassandra periodically updates SSTables and remove outdated data. • A client can make read/write request to any node in the cluster. What is Cassandra Architecture?
  • 16. Key Terms Of Cassandra Architecture a. Cassandra Nodes • It is the basic fundamental unit of Cassandra. • Data stores in these units(computer/server). b. Cassandra Data Center • Cassandra Datacenter, basically a collection of related Cassandra nodes. • A centralized place to accommodate computer and networking system to meet the needs of an organization’s information technology. c. Cassandra Rack • A rack is a unit that contains all the multiple servers all stacked on top of another. • A node is a single server in a rack. d. Cassandra Cluster • A collection of many data centers form a Cassandra cluster. • It can be spanned to physical locations. e. Cassandra Commit log • Every writes operation performs in a commit log to ensure the durability of the data. • After it has been flushed to an SSTable data archives or delete or change here. • It is like a crash recovery mechanism.
  • 17. f. MemTables • A temporary memory location where we write data during updates or deletion. • Data is written in memtables after it has been written in the commit log. • When the data in memtables is full, we flush them to the disk to SSTables g. SSTables • SSTables, the fixed set of data files in which Cassandra writes memtables periodically. • These are appended only, which means that we can add data at the end of the file thus helping in the sequential storage in the disk. h. Data Replication • Imagine a situation if one of the nodes goes down in a data center then a part of information will lost. • Thus to overcome this limitation, Cassandra made replicas of data on various nodes. This is called replication. • This ensures fault tolerance and reliability.
  • 18. Cassandra Query Language Users can access Cassandra through its nodes using Cassandra Query Language (CQL). CQL treats the database (Keyspace) as a container of tables. Programmers use cqlsh: a prompt to work with CQL or separate application language drivers. Clients approach any of the nodes for their read-write operations. That node (coordinator) plays a proxy between the client and the nodes holding the data. Write Operations Every write activity of nodes is captured by the commit logs written in the nodes. Later the data will be captured and stored in the mem-table. Whenever the mem-table is full, data will be written into the SStable data file. All writes are automatically partitioned and replicated throughout the cluster. Cassandra periodically consolidates the SSTables, discarding unnecessary data. Read Operations During read operations, Cassandra gets values from the mem-table and checks the bloom filter to find the appropriate SSTable that holds the required data.
  • 19. What is Cassandra Keyspace? • In the Cassandra Data Model, Cassandra Keyspace is a container for data. • It contains many attributes. The basic attributes are:- • a. Replication Factor • It basically signifies the number of copies of a data. In other words, the number of nodes in a cluster that are copies of a data. • b. Replica Placement Strategy • We have strategies such as • simple strategy (rack-aware strategy), • old network topology strategy (rack-aware strategy), • network topology strategy (datacenter-shared strategy). • c. Cassandra Column Families • Column Family in Cassandra is a collection of rows, which contains ordered columns. They represent a structure of the stored data. These Cassandra Column families are contained in Keyspace. • There is at least one Column family in each Keyspace.
  • 20. • The rows in each column are once again the collection of many columns. • The columns are the basic unit of the data structure in Cassandra. • Columns have three values stored in them. • They are key or columns name, timestamp and value.
  • 22. CQLSH • cqlsh: the CQL shell • cqlsh is a command line shell for interacting with Cassandra through CQL (the Cassandra Query Language). • It is shipped with every Cassandra package, and can be found in the bin/ directory alongside the cassandra executable. • cqlsh utilizes the Python native protocol driver, and connects to the single node specified on the command line.
  • 24. Cqlsh Commands Cqlsh has a few commands that allow users to interact with it. • HELP − Displays help topics for all cqlsh commands. • CAPTURE − Captures the output of a command and adds it to a file. • CONSISTENCY − Shows the current consistency level, or sets a new consistency level. • COPY − Copies data to and from Cassandra. • DESCRIBE − Describes the current cluster of Cassandra and its objects. • EXPAND − Expands the output of a query vertically. • EXIT − Using this command, you can terminate cqlsh. • PAGING − Enables or disables query paging. • SHOW − Displays the details of current cqlsh session such as Cassandra version, host, or data type assumptions. • SOURCE − Executes a file that contains CQL statements. • TRACING − Enables or disables request tracing.
  • 25. CQL Data Definition Commands
    • CREATE KEYSPACE − Creates a keyspace in Cassandra.
    • USE − Selects an existing keyspace as the current keyspace for the session.
    • ALTER KEYSPACE − Changes the properties of a keyspace.
    • DROP KEYSPACE − Removes a keyspace.
    • CREATE TABLE − Creates a table in a keyspace.
    • ALTER TABLE − Modifies the column properties of a table.
    • DROP TABLE − Removes a table.
    • TRUNCATE − Removes all the data from a table.
    • CREATE INDEX − Defines a new index on a single column of a table.
    • DROP INDEX − Deletes a named index.
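The data definition commands can be tried in sequence on a test cluster. The sketch below assumes a single-node cluster; the keyspace demo_ks, the table student, and the index student_city_idx are hypothetical names chosen for illustration.

```cql
-- All object names below are hypothetical.
CREATE KEYSPACE IF NOT EXISTS demo_ks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE demo_ks;

CREATE TABLE student (
  student_id int PRIMARY KEY,
  name       text,
  city       text
);

-- Add a column to the existing table.
ALTER TABLE student ADD email text;

-- Create, then drop, a secondary index on a single column.
CREATE INDEX student_city_idx ON student (city);
DROP INDEX student_city_idx;

-- Remove all rows but keep the table definition, then remove the table.
TRUNCATE student;
DROP TABLE student;
```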
  • 26. CQL Data Manipulation Commands
    • INSERT − Adds columns for a row in a table.
    • UPDATE − Updates a column of a row.
    • DELETE − Deletes data from a table.
    • BATCH − Executes multiple DML statements at once.
    CQL Clauses
    • SELECT − Reads data from a table.
    • WHERE − Used along with SELECT to read specific rows.
    • ORDER BY − Used along with SELECT to return the rows in a specific order.
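The manipulation commands and clauses above can be sketched against one small table. The keyspace demo_ks and the table marks are hypothetical. The table uses a clustering column (exam_date) because ORDER BY in CQL applies to clustering columns within a single partition.

```cql
-- demo_ks and marks are hypothetical names.
CREATE KEYSPACE IF NOT EXISTS demo_ks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS demo_ks.marks (
  student_id int,
  exam_date  date,
  score      int,
  PRIMARY KEY (student_id, exam_date)
);

INSERT INTO demo_ks.marks (student_id, exam_date, score) VALUES (1, '2024-01-15', 72);
INSERT INTO demo_ks.marks (student_id, exam_date, score) VALUES (1, '2024-03-20', 85);

UPDATE demo_ks.marks SET score = 90
  WHERE student_id = 1 AND exam_date = '2024-03-20';

-- WHERE restricts the partition key; ORDER BY sorts on the clustering column.
SELECT exam_date, score FROM demo_ks.marks
  WHERE student_id = 1 ORDER BY exam_date DESC;

-- BATCH groups several DML statements into one request.
BEGIN BATCH
  INSERT INTO demo_ks.marks (student_id, exam_date, score) VALUES (2, '2024-01-15', 64);
  DELETE FROM demo_ks.marks WHERE student_id = 1 AND exam_date = '2024-01-15';
APPLY BATCH;
```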
  • 27. Keyspaces
    • Tables are defined within a keyspace.
    [Figure: a keyspace containing several tables]
  • 29. Creating a Keyspace using Cqlsh
    • A keyspace in Cassandra is a namespace that defines data replication on nodes.
    • A cluster typically contains one keyspace per application.
    • Given below is the syntax for creating a keyspace using the statement CREATE KEYSPACE.
    • CREATE KEYSPACE <identifier> WITH <properties>
    • CREATE KEYSPACE <keyspace_name> WITH replication = {'class': '<strategy_name>', 'replication_factor': <number_of_replicas>};
    • CREATE KEYSPACE <keyspace_name> WITH replication = {'class': '<strategy_name>', 'replication_factor': <number_of_replicas>} AND durable_writes = <boolean_value>;
    • The CREATE KEYSPACE statement has two properties: replication and durable_writes.
  • 30. Replication
    • The replication option specifies the replica placement strategy and the number of replicas wanted. The following table lists all the replica placement strategies.
    Strategy name − Description
    • SimpleStrategy − Specifies a simple replication factor for the cluster.
    • NetworkTopologyStrategy − Using this option, you can set the replication factor for each datacenter independently.
    • OldNetworkTopologyStrategy − This is a legacy replication strategy.
    Durable_writes
    • Using this option, you can instruct Cassandra whether to use the commit log for updates on the current keyspace. This option is not mandatory and, by default, it is set to true.
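As a sketch of per-datacenter replication, the statement below keeps three replicas in one datacenter and two in another. The keyspace name multi_dc_ks and the datacenter names dc1 and dc2 are assumptions; in a real cluster the names must match the datacenter names the cluster reports (for example in nodetool status).

```cql
-- multi_dc_ks, dc1 and dc2 are hypothetical names.
-- Each datacenter gets its own replication factor.
CREATE KEYSPACE IF NOT EXISTS multi_dc_ks
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 3,
    'dc2': 2
  }
  AND durable_writes = true;  -- the default, written out for clarity
```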
  • 31. Example
    • Given below is an example of creating a keyspace.
    • Here we are creating a keyspace named DATABASE1. We are using the first replica placement strategy, i.e., SimpleStrategy, and we are setting the replication factor to 1 replica.
    • cqlsh> CREATE KEYSPACE DATABASE1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
  • 32. Verification
    • You can verify whether the keyspace was created using the DESCRIBE command.
    • Applied to keyspaces, it lists all the keyspaces in the cluster, as shown below. Because the name was not quoted, Cassandra stores it in lowercase.
    • cqlsh> DESCRIBE keyspaces;
      database1  system  system_traces
  • 33. Durable_writes
    • By default, the durable_writes property of a keyspace is set to true; however, it can be set to false. It should not be disabled on a keyspace that uses SimpleStrategy.
    Example
    • Given below is an example demonstrating the usage of the durable_writes property.
    • cqlsh> CREATE KEYSPACE test
         ... WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 }
         ... AND DURABLE_WRITES = false;
  • 34. Verification
    • You can verify whether the durable_writes property of the test keyspace was set to false by querying the system_schema keyspace. This query lists all the keyspaces along with their properties.
    • cqlsh> SELECT * FROM system_schema.keyspaces;
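To look at a single keyspace instead of the full list, the query can be restricted by keyspace name. This sketch assumes a Cassandra version that exposes the system_schema.keyspaces table (3.0 or later) and the test keyspace from the previous slide.

```cql
-- Shows only the row for the test keyspace.
-- durable_writes should read False if the earlier CREATE KEYSPACE succeeded.
SELECT keyspace_name, durable_writes, replication
  FROM system_schema.keyspaces
  WHERE keyspace_name = 'test';
```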
  • 35. Using a Keyspace
    • You can select an existing keyspace as the current one with the keyword USE. Its syntax is as follows −
    • Syntax: USE <identifier>
  • 36. Example
    • In the following example, we are using the keyspace DATABASE1. The prompt shows the name in lowercase because unquoted identifiers are case-insensitive.
    • cqlsh> USE DATABASE1;
    • cqlsh:database1>
  • 37. Altering a KeySpace
    • ALTER KEYSPACE can be used to alter properties such as the number of replicas and the durable_writes setting of a keyspace. Given below is the syntax of this command.
    Syntax
    • ALTER KEYSPACE <identifier> WITH <properties>
    • i.e. ALTER KEYSPACE <keyspace_name> WITH replication = {'class': '<strategy_name>', 'replication_factor': <number_of_replicas>};
    • The properties of ALTER KEYSPACE are the same as those of CREATE KEYSPACE: replication and durable_writes.
  • 38. Example
    • Here we are altering a keyspace named DATABASE1.
    • We are changing the replication factor from 1 to 3 while keeping the same strategy.
    • cqlsh> ALTER KEYSPACE DATABASE1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
    • The durable_writes property of the test keyspace can be altered in the same way:
    • cqlsh> ALTER KEYSPACE test WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3} AND DURABLE_WRITES = true;
  • 39. Dropping a Keyspace
    • You can drop a keyspace using the command DROP KEYSPACE. Given below is the syntax for dropping a keyspace.
    Syntax
    • DROP KEYSPACE <identifier>
    • i.e. DROP KEYSPACE <keyspace_name>
    Example
    • cqlsh> DROP KEYSPACE DATABASE1;
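Dropping a keyspace that does not exist raises an error, which matters when the statement is part of a script. A minimal sketch of the guarded form, using the database1 keyspace from the example above:

```cql
-- IF EXISTS makes the statement a no-op when the keyspace is absent.
-- Dropping a keyspace also removes every table and all data inside it.
DROP KEYSPACE IF EXISTS database1;

-- cqlsh command: confirm the keyspace is no longer listed.
DESCRIBE KEYSPACES;
```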