Unit -3 -Features of Cassandra, CQL Data types, CQLSH, Keyspaces

Unit 3: Cassandra
Apache Cassandra: An Introduction, Features of Cassandra, CQL Data Types, CQLSH,
Keyspaces, CRUD (Create, Read, Update and Delete) Operations, Collections, Using a
Counter, Time to Live (TTL), Alter Commands, Import and Export, Querying System
Tables, Practice Examples

What is Apache Cassandra?
• Apache Cassandra is an open-source, distributed, and decentralized
storage system (database) for managing very large amounts of structured data
spread out across the world.
• It provides highly available service with no single point of failure.
• Listed below are some of the notable points of Apache Cassandra −
• It is scalable, fault-tolerant, and tunably consistent.
• It is a column-oriented (wide-column) database.
• Its distribution design is based on Amazon’s Dynamo and its data model on
Google’s Bigtable.
• Created at Facebook, it differs sharply from relational database management
systems.
• Cassandra implements a Dynamo-style replication model with no single point
of failure, but adds a more powerful “column family” data model.
• Cassandra is used by some of the biggest companies, such as Facebook,
Twitter, Cisco, Rackspace, eBay, Netflix, and more.

NoSQL Database
• A NoSQL database (sometimes called Not Only SQL) is a database
that provides a mechanism to store and retrieve data other than the tabular
relations used in relational databases.
• These databases are typically schema-free, support easy replication, have simple APIs,
are eventually consistent, and can handle huge amounts of data.
• The primary objective of a NoSQL database is to have
• simplicity of design,
• horizontal scaling, and
• finer control over availability.
• NoSQL databases use different data structures than relational databases.
• This makes some operations faster in NoSQL.
• The suitability of a given NoSQL database depends on the problem it must solve.

• Besides Cassandra, we have the following NoSQL databases that
are quite popular −
• Apache HBase −
• HBase is an open source, non-relational, distributed database modeled after
Google’s BigTable and is written in Java.
• It is developed as a part of Apache Hadoop project and runs on top of HDFS,
providing BigTable-like capabilities for Hadoop.
• MongoDB −
• MongoDB is a cross-platform document-oriented database system that
avoids using the traditional table-based relational database structure in favor
of JSON-like documents with dynamic schemas making the integration of
data in certain types of applications easier and faster.

Features of Cassandra
• Cassandra has become popular because of its technical features.
• Elastic scalability − Cassandra is highly scalable; it allows you to add more hardware to
accommodate more customers and more data as required.
• Always-on architecture − Cassandra has no single point of failure and it is
continuously available for business-critical applications that cannot afford a failure.
• Fast linear-scale performance − Cassandra is linearly scalable, i.e., throughput
increases as you increase the number of nodes in the cluster. It therefore
maintains a quick response time.
• Flexible data storage − Cassandra accommodates structured, semi-structured,
and unstructured data. It can dynamically accommodate changes to your data
structures as your needs change.
• Easy data distribution − Cassandra provides the flexibility to distribute data where
you need it by replicating data across multiple data centers.
• Transaction support − Cassandra does not offer full ACID (Atomicity, Consistency,
Isolation, Durability) transactions in the relational sense. Writes to a single partition
are atomic and isolated, durability comes from the commit log, consistency is tunable
per operation, and lightweight transactions provide compare-and-set semantics.
• Fast writes − Cassandra was designed to run on cheap commodity hardware. It
performs very fast writes and can store hundreds of terabytes of data without
sacrificing read efficiency.

Applications of Cassandra
a. Cassandra Storage
• One of the major applications of Cassandra is storage.
• Its flexible data model lets users store many kinds of data.
• This data is stored across the nodes of a Cassandra cluster.
• Cisco WebEx, InWorldz, Formspring, and OpenX are some of the companies using
Cassandra for storage.
b. Back-end development applications
• Users can also use Cassandra for the back end of their applications.
• Many applications have a front end and a back end.
• Cassandra can serve as the database layer of the back end and can hold very large data sets.
• Talentica Software uses Cassandra as a back end for analytics.
c. Cassandra Monitoring
• Many applications depend on tracking user activity at a large scale.
• Developers can use Cassandra to monitor this user activity.
• The activity can be grouped by different parameters such as media, art, or music.
• CERN, Cloudkick, and other companies use Cassandra for monitoring.
d. Time-series-based applications
• Time-series-based applications record data continuously, in real time.
• Examples include hits from internet browsers, traffic light data, and GPS location tracking data.
• These applications are write-heavy.
• Cassandra is well suited to these kinds of applications.
e. Cassandra Analytics
• Cassandra provides a platform to analyse data collected from various sources.
• These sources may include social media, product feedback catalogues, retail inputs, and lookups.
• Developers can use Cassandra to retrieve and analyse this data.
• Ooyala uses Cassandra for analytics applications.
f. Cassandra Messaging
• Nowadays, people use messaging services all the time.
• This creates a need for a platform to manage message data.
• Cassandra can serve as the database platform for messaging providers.

What is Cassandra Architecture?
• Cassandra is designed with hardware failure in mind.
• It therefore has contingency mechanisms to keep working when
nodes fail.
• It has a ring-type structure, i.e., its nodes are logically
arranged like a ring.
• It has no master or slave nodes; all nodes are peers.
• It keeps replicas of data on several homogeneous
nodes of the cluster.
• Nodes exchange state information with each other
every second.
• A sequentially written commit log on each node
captures write activity to ensure data durability.
• The data is then indexed and written to an in-memory structure called the memtable.
• Once the memtable is full, its data is written to disk in an SSTable
data file.
• All data is partitioned and replicated to other nodes
automatically.
• Through a process known as compaction, Cassandra
periodically consolidates SSTables and removes outdated data.
• A client can send a read or write request to any node in the
cluster.

Key Terms Of Cassandra Architecture
a. Cassandra Nodes
• A node is the basic unit of Cassandra.
• Data is stored on these units (computers/servers).
b. Cassandra Data Center
• A Cassandra data center is a collection of related Cassandra nodes.
• More generally, a data center is a centralized place that houses the computer and networking
systems serving an organization’s information technology needs.
c. Cassandra Rack
• A rack is a unit that holds multiple servers stacked on top of one another.
• A node is a single server in a rack.
d. Cassandra Cluster
• A collection of one or more data centers forms a Cassandra cluster.
• A cluster can span multiple physical locations.
e. Cassandra Commit log
• Every write operation is first recorded in the commit log to ensure the durability of the data.
• After the data has been flushed to an SSTable, the corresponding commit log data can be archived, deleted, or recycled.
• It acts as a crash recovery mechanism.

f. MemTables
• A memtable is a temporary in-memory location where data is written during
inserts, updates, or deletions.
• Data is written to the memtable after it has been written to the commit log.
• When a memtable is full, it is flushed to disk as an SSTable.
g. SSTables
• SSTables are the immutable data files to which Cassandra periodically writes
memtables.
• They are written sequentially and are not modified in place once written,
which keeps disk writes sequential.
h. Data Replication
• If a node in a data center goes down and it held the only copy of some data,
that data would be lost.
• To overcome this limitation, Cassandra keeps replicas of data on several
nodes. This is called replication.
• This ensures fault tolerance and reliability.
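
The ring layout and replication described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Cassandra's actual partitioner: the node names, the use of MD5 as the hash, and the rule "take the next nodes clockwise" are illustrative choices that roughly mirror how a simple placement strategy behaves.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: each node owns one token; replicas are the next nodes clockwise."""

    def __init__(self, nodes, replication_factor):
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed the number of nodes")
        self.replication_factor = replication_factor
        # Sort nodes by token so a clockwise walk is a walk through the list.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key):
        """Return the nodes that should hold a copy of this partition key."""
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.entries)
        return [
            self.entries[(start + i) % len(self.entries)][1]
            for i in range(self.replication_factor)
        ]


# Hypothetical four-node cluster with three copies of every partition.
ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
print(owners)

# If one replica goes down, the remaining replicas still hold the data.
survivors = [n for n in owners if n != owners[0]]
print(survivors)
```

With a replication factor of 3, losing any single node still leaves two copies of every partition, which is the fault tolerance the last bullet refers to.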

Cassandra Query Language
Users can access Cassandra through its nodes using Cassandra Query Language (CQL). CQL
treats the database (keyspace) as a container of tables. Programmers work with CQL either
through cqlsh, an interactive prompt, or through drivers for their application language.
Clients can approach any node for their read and write operations. That node (the coordinator)
acts as a proxy between the client and the nodes holding the data.
Write Operations
Every write is first captured by the commit log on the node that receives it. The data is then
stored in the memtable. Whenever the memtable is full, its data is
written to an SSTable data file. All writes are automatically partitioned and replicated
throughout the cluster. Cassandra periodically consolidates the SSTables, discarding
unnecessary data.
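
The write path above (commit log, then memtable, then SSTable, then compaction) can be illustrated with a small Python model. It is only a sketch: the class name `Node`, the `memtable_limit` threshold, and the in-memory lists standing in for files are assumptions made for illustration, and real Cassandra flushing and compaction are far more involved.

```python
class Node:
    """Toy model of one node's write path: commit log -> memtable -> SSTables."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (entries)
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory: key -> (timestamp, value)
        self.sstables = []     # immutable, sorted tuples flushed from the memtable

    def write(self, key, value, timestamp):
        self.commit_log.append((key, value, timestamp))  # 1. durability first
        self.memtable[key] = (timestamp, value)          # 2. then the memtable
        if len(self.memtable) >= self.memtable_limit:    # 3. flush when full
            self.flush()

    def flush(self):
        """Write the memtable out as a new immutable SSTable."""
        sstable = tuple(sorted((k, ts, v) for k, (ts, v) in self.memtable.items()))
        self.sstables.append(sstable)
        self.memtable = {}
        self.commit_log = []  # flushed data no longer needs the log

    def compact(self):
        """Merge all SSTables, keeping only the newest value for each key."""
        latest = {}
        for sstable in self.sstables:
            for key, ts, value in sstable:
                if key not in latest or ts > latest[key][0]:
                    latest[key] = (ts, value)
        self.sstables = [tuple(sorted((k, ts, v) for k, (ts, v) in latest.items()))]


node = Node(memtable_limit=2)
node.write("k1", "a", 1)
node.write("k2", "b", 2)  # memtable full: first SSTable is flushed
node.write("k1", "c", 3)
node.write("k3", "d", 4)  # memtable full: second SSTable is flushed
node.compact()            # the outdated value of k1 is discarded
print(node.sstables)      # [(('k1', 3, 'c'), ('k2', 2, 'b'), ('k3', 4, 'd'))]
```

Note that an SSTable is never edited after it is flushed; the outdated value of `k1` disappears only when compaction writes a new, merged SSTable.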
Read Operations
During read operations, Cassandra gets values from the mem-table and checks the bloom filter
to find the appropriate SSTable that holds the required data.
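
A minimal Python sketch of this read path follows. The Bloom filter size, the number of hash functions, the use of SHA-256, and the "newest SSTable first" rule are all simplifying assumptions for illustration; Cassandra itself merges results from the memtable and SSTables by timestamp.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size=64, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        # Derive several bit positions for the key by salting the hash input.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class SSTable:
    """Immutable key/value data plus a Bloom filter over its keys."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


def read(key, memtable, sstables):
    """Check the memtable first, then only SSTables whose filter may hold the key."""
    if key in memtable:
        return memtable[key]
    for sstable in reversed(sstables):  # newest SSTable first
        if sstable.bloom.might_contain(key) and key in sstable.rows:
            return sstable.rows[key]
    return None


memtable = {"k3": "fresh"}
sstables = [SSTable({"k1": "old", "k2": "b"}), SSTable({"k1": "new"})]
print(read("k3", memtable, sstables))  # fresh
print(read("k1", memtable, sstables))  # new
print(read("k2", memtable, sstables))  # b
```

The filter lets a read skip SSTables that definitely do not contain the key, so most lookups touch only a few files on disk.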

What is Cassandra Keyspace?
• In the Cassandra Data Model, Cassandra Keyspace is a container for
data.
• It has several attributes. The basic ones are:
• a. Replication Factor
• It is the number of copies of each piece of data, that is, the number of nodes in the
cluster that hold a copy of the same data.
• b. Replica Placement Strategy
• The available strategies are
• simple strategy (rack-unaware strategy),
• old network topology strategy (rack-aware strategy),
• network topology strategy (datacenter-shard strategy).
• c. Cassandra Column Families
• Column Family in Cassandra is a collection of rows, which contains ordered columns.
They represent a structure of the stored data. These Cassandra Column families are
contained in Keyspace.
• There is at least one Column family in each Keyspace.

• Each row in a column family is in turn a collection of columns.
• The column is the basic unit of the data structure in Cassandra.
• Each column stores three things:
• a key (the column name), a value, and a timestamp.
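
The three parts of a column can be modelled as a small Python record. The sketch below also shows the usual reason the timestamp is stored: when several versions of the same column exist, the newest one wins. The names `Column` and `reconcile` and the sample values are hypothetical.

```python
from typing import NamedTuple


class Column(NamedTuple):
    name: str
    value: str
    timestamp: int  # write time; plain integers are used here for simplicity


def reconcile(versions):
    """Return the winning version of a column: the one with the newest timestamp."""
    return max(versions, key=lambda column: column.timestamp)


versions = [
    Column("email", "old@example.com", timestamp=100),
    Column("email", "new@example.com", timestamp=250),
    Column("email", "mid@example.com", timestamp=180),
]
print(reconcile(versions).value)  # new@example.com
```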

CQLSH
• cqlsh: the CQL shell
• cqlsh is a command line shell for interacting with Cassandra through CQL (the
Cassandra Query Language).
• It is shipped with every Cassandra package, and can be found in the bin/
directory alongside the cassandra executable.
• cqlsh utilizes the Python native protocol driver, and connects to the single node
specified on the command line.

Cqlsh Commands
Cqlsh has a few commands that allow users to interact with it.
• HELP − Displays help topics for all cqlsh commands.
• CAPTURE − Captures the output of a command and adds it to a file.
• CONSISTENCY − Shows the current consistency level, or sets a new consistency level.
• COPY − Copies data to and from Cassandra.
• DESCRIBE − Describes the current cluster of Cassandra and its objects.
• EXPAND − Expands the output of a query vertically.
• EXIT − Using this command, you can terminate cqlsh.
• PAGING − Enables or disables query paging.
• SHOW − Displays the details of current cqlsh session such as Cassandra version, host, or
data type assumptions.
• SOURCE − Executes a file that contains CQL statements.
• TRACING − Enables or disables request tracing.

CQL Data Definition Commands
• CREATE KEYSPACE − Creates a KeySpace in Cassandra.
• USE − Connects to a created KeySpace.
• ALTER KEYSPACE − Changes the properties of a KeySpace.
• DROP KEYSPACE − Removes a KeySpace
• CREATE TABLE − Creates a table in a KeySpace.
• ALTER TABLE − Modifies the column properties of a table.
• DROP TABLE − Removes a table.
• TRUNCATE − Removes all the data from a table.
• CREATE INDEX − Defines a new index on a single column of a
table.
• DROP INDEX − Deletes a named index.

CQL Data Manipulation Commands
• INSERT − Adds columns for a row in a table.
• UPDATE − Updates a column of a row.
• DELETE − Deletes data from a table.
• BATCH − Executes multiple DML statements at once.
CQL Clauses
• SELECT − This clause reads data from a table.
• WHERE − The WHERE clause is used along with SELECT to read
specific data.
• ORDER BY − The ORDER BY clause is used along with SELECT to read
specific data in a specific order.

KEYSPACES
Within a keyspace, tables can be defined.
[Figure: a keyspace containing several tables]
• CREATE KEYSPACE <keyspace name> WITH replication =
{'class': '<strategy name>', 'replication_factor': <number of replicas>};
• CREATE KEYSPACE <keyspace name> WITH replication =
{'class': '<strategy name>', 'replication_factor': <number of replicas>}
AND durable_writes = <boolean value>;
• The CREATE KEYSPACE statement has two properties:
replication and durable_writes.
Creating a Keyspace using Cqlsh
• A keyspace in Cassandra is a namespace that defines data replication
on nodes.
• A cluster typically contains one keyspace per application.
• Given below is the syntax for creating a keyspace using the statement
CREATE KEYSPACE.
• CREATE KEYSPACE <identifier> WITH <properties>
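
To make the shape of the statement concrete, the following Python sketch assembles a CREATE KEYSPACE string from a strategy class, its options, and the durable_writes flag. The helper names `render_option` and `create_keyspace_cql` are hypothetical and not part of any Cassandra driver; the code only builds text and does no validation, so it should not be used with untrusted input.

```python
def render_option(value):
    """Quote string option values; leave numbers as they are."""
    return f"'{value}'" if isinstance(value, str) else str(value)


def create_keyspace_cql(name, strategy, options, durable_writes=True):
    """Render a CREATE KEYSPACE statement from a strategy class and its options."""
    replication = {"class": strategy, **options}
    body = ", ".join(f"'{k}': {render_option(v)}" for k, v in replication.items())
    return (
        f"CREATE KEYSPACE {name} WITH replication = {{{body}}} "
        f"AND durable_writes = {str(durable_writes).lower()};"
    )


# One replication factor for the whole cluster.
print(create_keyspace_cql("database1", "SimpleStrategy", {"replication_factor": 1}))

# A replica count per data center; 'datacenter1' is an example data center name.
print(create_keyspace_cql("test", "NetworkTopologyStrategy", {"datacenter1": 3},
                          durable_writes=False))
```

The printed statements can be pasted into cqlsh; the replication map and durable_writes correspond to the two properties of CREATE KEYSPACE.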

Replication
• The replication option specifies the replica placement strategy and the number of
replicas wanted. The following table lists the replica placement strategies.
Strategy name − Description
• SimpleStrategy − Specifies a simple replication factor for the cluster.
• NetworkTopologyStrategy − Using this option, you can set the replication factor for
each data center independently.
• OldNetworkTopologyStrategy − This is a legacy replication strategy.
• durable_writes − Using this option, you can instruct Cassandra whether to
use the commit log for updates on the
current keyspace. This option is not mandatory and, by default, it
is set to true.

• Given below is an example of creating a keyspace.
• Here we are creating a keyspace named DATABASE1. We are using
the first replica placement strategy, i.e., SimpleStrategy, and
setting the replication factor to 1 replica.
cqlsh> CREATE KEYSPACE DATABASE1 WITH replication
= {'class':'SimpleStrategy', 'replication_factor' : 1};

Verification
• You can verify whether the keyspace was created using the
DESCRIBE command.
• If you use this command over keyspaces, it will display all the
keyspaces created, as shown below. Because unquoted identifiers are
case-insensitive, the name is listed in lowercase.
• cqlsh> DESCRIBE keyspaces;
database1 system system_traces

Durable_writes
• By default, the durable_writes property of a keyspace is set to true;
however, it can be set to false. It should not be set to false for a
keyspace that uses SimpleStrategy.
Example
• Given below is an example demonstrating the usage of the
durable_writes property.
cqlsh> CREATE KEYSPACE test
   ... WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 }
   ... AND DURABLE_WRITES = false;

Verification
•You can verify whether the durable_writes property of test
KeySpace was set to false by querying the System Keyspace.
This query gives you all the KeySpaces along with their
properties.
•cqlsh> SELECT * FROM system_schema.keyspaces;

Using a Keyspace
• You can switch to a created keyspace using the keyword USE. Its
syntax is as follows −
• Syntax: USE <identifier>

Example
•In the following example, we are using the KeySpace
DATABASE1.
•cqlsh> USE DATABASE1;
•cqlsh:DATABASE1>

Altering a KeySpace
• ALTER KEYSPACE can be used to alter properties such as the number of
replicas and the durable_writes of a KeySpace. Given below is the syntax of
this command.
Syntax
ALTER KEYSPACE <identifier> WITH <properties>
i.e.
ALTER KEYSPACE <keyspace name> WITH replication = {'class': '<strategy name>',
'replication_factor': <number of replicas>};
The properties of ALTER KEYSPACE are the same as those of CREATE KEYSPACE. It has
two properties: replication and durable_writes.

Example
• Here we are altering a keyspace named DATABASE1.
• We are changing the replication factor from 1 to 3; the example also sets the
strategy class to NetworkTopologyStrategy.
• cqlsh> ALTER KEYSPACE DATABASE1 WITH replication =
{'class':'NetworkTopologyStrategy', 'replication_factor' : 3};
• The durable_writes property can be altered in the same statement:
• cqlsh> ALTER KEYSPACE test WITH REPLICATION = {'class' :
'NetworkTopologyStrategy', 'datacenter1' : 3} AND
DURABLE_WRITES = true;

Dropping a Keyspace
• You can drop a KeySpace using the command DROP KEYSPACE.
Given below is the syntax for dropping a KeySpace.
Syntax
DROP KEYSPACE <identifier>
i.e.
DROP KEYSPACE <keyspace name>
Example
cqlsh> DROP KEYSPACE DATABASE1;
