Indexing Cassandra data in SQL-storage

Kurpilyansky Eugene
SKB Kontur
December 9th, 2013
What do we want?

Suppose we want to store objects of different types in Cassandra.
Every object has a primary string key.
Cassandra is well suited to serve as a key-value storage.
But we usually also want to search among all objects of the same type by some criterion.
Search results must be consistent and reflect the current state of the database.
How can we implement a storage that satisfies these requirements?
Using native Cassandra indexes

We can use native Cassandra indexes.
Advantages
There is no need to support an additional storage.
Disadvantages
Every custom query may require a new column family (CF) structure for effective searching.
SQL indexes are more efficient than Cassandra's indexes.
There are many complex kinds of indexes (e.g. full-text search indexes) that native Cassandra indexes do not cover.
Using synchronization with SQL-storage
Main idea

Run an IndexService application that keeps the data in SQL-storage synchronized with the data in Cassandra (constantly, in a background thread).
To perform a search, we send a query to IndexService, which returns the search result after finishing the SQL-storage synchronization process.
Implementation of EventLog

Create an event log
One event per write request or delete request.
The event log is sorted by event time.
Synchronizing SQL-storage with Cassandra
Implementation of EventLog

Event
string EventId;
long Timestamp;
string ObjectId;
interface IEventLog
void AddEvent(Event event);
IEnumerable<Event> GetEvents(long fromTicks);

New implementation of IObjectStorage
Before writing or deleting an object, call the method IEventLog.AddEvent.
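The "event first, object second" rule can be sketched in Python roughly as follows. This is a minimal in-memory illustration, not the speaker's implementation: InMemoryEventLog and LoggingObjectStorage are hypothetical names, a dict stands in for Cassandra, and storing the event timestamp next to the object value is an assumption made so that the two timestamps can be compared later.

```python
import time
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    timestamp: int   # ticks; the unit is an assumption of this sketch
    object_id: str


class InMemoryEventLog:
    """Hypothetical stand-in for IEventLog, backed by a plain list."""

    def __init__(self):
        self._events = []

    def add_event(self, event):
        self._events.append(event)

    def get_events(self, from_ticks):
        # Events strictly newer than from_ticks, ordered by time.
        newer = [e for e in self._events if e.timestamp > from_ticks]
        return sorted(newer, key=lambda e: (e.timestamp, e.event_id))


class LoggingObjectStorage:
    """Hypothetical IObjectStorage wrapper: log the event, then touch the data."""

    def __init__(self, event_log, now_ticks=time.time_ns):
        self._log = event_log
        self._now = now_ticks
        self._data = {}   # stands in for the Cassandra column family

    def _log_event(self, object_id):
        event = Event(uuid.uuid4().hex, self._now(), object_id)
        self._log.add_event(event)
        return event

    def write(self, object_id, value):
        event = self._log_event(object_id)                 # 1. event first
        self._data[object_id] = (event.timestamp, value)   # 2. then the object

    def delete(self, object_id):
        self._log_event(object_id)                         # 1. event first
        self._data.pop(object_id, None)                    # 2. then the delete

    def read(self, object_id):
        return self._data.get(object_id)
```

Because the event is logged before the object changes, a reader of the log may briefly see an event whose object has not been written yet.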
We should split the whole event log into rows using PartitionInterval to limit the size of rows.
PartitionInterval is some constant period of time (e.g. one hour, or six minutes).
EventLog.AddEvent(Event event)

Create column:
RowKey = event.Timestamp / PartitionInterval.Ticks
ColumnName = event.Timestamp + ':' + event.EventId
ColumnValue = event
EventLog.GetEvents(long fromTicks)
Execute get_slice over one or more rows, starting from the column for fromTicks (exclusive).
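The row and column layout can be modelled in memory as in the Python sketch below. It is not a Cassandra client: PartitionedEventLog is a hypothetical name, the tick length (100 ns) is an assumption, and the column name is kept as a (timestamp, event id) tuple rather than a string so that ordering does not depend on fixed-width formatting.

```python
import bisect
from collections import defaultdict, namedtuple

Event = namedtuple("Event", "event_id timestamp object_id")

TICKS_PER_SECOND = 10_000_000                          # assumption: 100 ns ticks
PARTITION_INTERVAL_TICKS = 6 * 60 * TICKS_PER_SECOND   # "six minutes"


class PartitionedEventLog:
    """In-memory model of the partitioned event log layout."""

    def __init__(self, interval_ticks=PARTITION_INTERVAL_TICKS):
        self._interval = interval_ticks
        # row key -> list of (column name, event), kept sorted by column name
        self._rows = defaultdict(list)

    def _row_key(self, timestamp):
        return timestamp // self._interval

    def add_event(self, event):
        column_name = (event.timestamp, event.event_id)
        row = self._rows[self._row_key(event.timestamp)]
        bisect.insort(row, (column_name, event))

    def get_events(self, from_ticks, to_ticks):
        """Events with timestamp > from_ticks, from every row up to to_ticks' row."""
        first = self._row_key(from_ticks)
        last = self._row_key(to_ticks)
        events = []
        for row_key in range(first, last + 1):
            row = self._rows.get(row_key, [])
            # Exclusive start: a real get_slice would do a range read instead.
            events.extend(e for name, e in row if name[0] > from_ticks)
        return events
```

A shorter interval gives smaller rows, but a read that covers a long time span has to touch more of them.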
Implementation of IndexService

IndexService
It has a local SQL-storage (one storage per service replica).
There is one SQL table per object type.
There is one specific SQL table that stores the time of the last synchronization for each object type.
There is one background thread per object type, which reads the event log and updates the SQL-storage.
To execute an incoming SQL query, we can use the data from SQL-storage plus a small range of events.
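One possible shape for the local storage is sketched below with Python's built-in sqlite3 module. The slides do not name the SQL engine or the schema, so the table name sync_state, the column names, and the single payload column are all assumptions chosen for illustration.

```python
import sqlite3


def create_index_storage(object_types, path=":memory:"):
    """Create one table per object type plus one table of last sync times."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sync_state ("
        " object_type TEXT PRIMARY KEY,"
        " last_sync_ticks INTEGER NOT NULL)"
    )
    for object_type in object_types:
        # Table names cannot be bound as parameters, so validate them first.
        if not object_type.isidentifier():
            raise ValueError(f"unsafe table name: {object_type!r}")
        conn.execute(
            f'CREATE TABLE IF NOT EXISTS "{object_type}" ('
            " object_id TEXT PRIMARY KEY,"
            " timestamp INTEGER NOT NULL,"
            " payload TEXT NOT NULL)"
        )
        conn.execute(
            "INSERT OR IGNORE INTO sync_state (object_type, last_sync_ticks)"
            " VALUES (?, 0)",
            (object_type,),
        )
    conn.commit()
    return conn


def get_last_sync(conn, object_type):
    row = conn.execute(
        "SELECT last_sync_ticks FROM sync_state WHERE object_type = ?",
        (object_type,),
    ).fetchone()
    return row[0]


def set_last_sync(conn, object_type, ticks):
    conn.execute(
        "UPDATE sync_state SET last_sync_ticks = ? WHERE object_type = ?",
        (ticks, object_type),
    )
    conn.commit()
```

Keeping the last synchronization time in the same database as the indexed data lets a replica resume from where it stopped after a restart.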
Periodic synchronization action
Set startSynchronizationTime = NowTicks.
Find all events that should be processed.
Process these events: update SQL-storage and keep the unprocessed events (they should be processed on the next iteration).
Update the time of the last synchronization in SQL-storage to startSynchronizationTime.
Event[] ProcessEvents(Event[] events)

This function brings the related objects in SQL-storage up to date and returns the events that have not been processed.
Remember that we update an object after creating its event. So some events cannot be processed at the moment, because the corresponding object has not been updated yet.
How will this function be implemented?
For every event, we should analyze the corresponding objects from both Cassandra and SQL-storage.
Example 1
event = {Timestamp: 2008}
cassObj = {Timestamp: 2008, School: 'USU'}
sqlObj = {Timestamp: 2005, School: 'AESC USU'}

Write cassObj to SQL-storage and mark the event as processed.
Example 2
event = {Timestamp: 2012}
cassObj = {Timestamp: 2008, School: 'USU'}
sqlObj = {Timestamp: 2005, School: 'AESC USU'}

The timestamp of the event is greater than the timestamp of cassObj.
Probably the object has not been updated yet, so we need to wait for the update.
Write cassObj to SQL-storage and mark the event as unprocessed.
Example 3
event = {Timestamp: 1997}
cassObj is missing
sqlObj is missing

Probably this event corresponds to the creation of the object.
Mark the event as unprocessed.
Example 4
event = {Timestamp: 2017}
cassObj is missing
sqlObj = {Timestamp: 2012, School: 'UFU'}

Two cases are possible:
1. The event corresponds to the deletion of the object.
2. The event corresponds to the creation of the object. sqlObj is not missing because there were two operations in a row: delete and create.
Delete sqlObj from SQL-storage and mark the event as unprocessed.
Event[] ProcessEvents(Event[] events)

Read the objects that occur in these events from Cassandra and SQL-storage (some of them can be missing).
For each (event, cassObj, sqlObj) do
  If cassObj is not missing
    Save cassObj in SQL-storage.
    If event.Timestamp = cassObj.Timestamp
      then mark event as processed;
      else mark event as unprocessed.
  else (i.e. cassObj is missing)
    Delete sqlObj from SQL-storage if it's not missing.
    Mark event as unprocessed.
Return the events which have been marked as unprocessed.
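The procedure above can be sketched in Python as follows. Plain dicts stand in for Cassandra and SQL-storage, and the Event and Obj tuples carry only the fields used in Examples 1 to 4; these are simplifications for illustration, not the actual storage API.

```python
from collections import namedtuple

Event = namedtuple("Event", "timestamp object_id")
Obj = namedtuple("Obj", "timestamp school")


def process_events(events, cassandra, sql):
    """Bring `sql` up to date for the given events; return the unprocessed ones.

    `cassandra` and `sql` are dicts (object_id -> Obj) standing in for the two
    storages; a missing key means the object is missing.
    """
    unprocessed = []
    for event in events:
        cass_obj = cassandra.get(event.object_id)
        if cass_obj is not None:
            sql[event.object_id] = cass_obj       # save cassObj in SQL-storage
            if event.timestamp != cass_obj.timestamp:
                unprocessed.append(event)         # the write may still be in flight
        else:
            sql.pop(event.object_id, None)        # delete sqlObj if it is not missing
            unprocessed.append(event)
    return unprocessed


# The four examples from the slides, one object per example.
cassandra = {"ex1": Obj(2008, "USU"), "ex2": Obj(2008, "USU")}
sql = {
    "ex1": Obj(2005, "AESC USU"),
    "ex2": Obj(2005, "AESC USU"),
    "ex4": Obj(2012, "UFU"),
}
events = [Event(2008, "ex1"), Event(2012, "ex2"), Event(1997, "ex3"), Event(2017, "ex4")]
still_pending = process_events(events, cassandra, sql)
```

Only the event from Example 1 is marked as processed; the other three are returned so that they can be retried on the next iteration.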
What events should we pass as arguments to the ProcessEvents function?
Of course, all unprocessed events from the previous iteration.
Also all new events, i.e. IEventLog.GetEvents(fromTicks).
What is fromTicks?
fromTicks = lastSynchronizationTime?
No. Unfortunately, any operation with Cassandra can take a long time to execute.
This time is limited by writeTimeout = attemptsCount · connectionTimeout.
We should step back in time by this amount, otherwise we can lose some events.
fromTicks = lastSynchronizationTime - writeTimeout
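One iteration of the periodic action, including the step back by writeTimeout, might look like the Python sketch below. The state dict, the callback parameters, and the removal of exact repeats between pending and re-read events are assumptions of this sketch; the slides only fix the order of the steps and the formula for fromTicks.

```python
from collections import namedtuple

Event = namedtuple("Event", "timestamp object_id")


def write_timeout(attempts_count, connection_timeout_ticks):
    # writeTimeout = attemptsCount * connectionTimeout
    return attempts_count * connection_timeout_ticks


def synchronize_once(state, get_events, process_events, now_ticks, timeout_ticks):
    """Run one iteration of the periodic synchronization action.

    `state` is a dict with 'last_sync' (ticks) and 'pending' (unprocessed
    events from the previous iteration). `get_events(from_ticks)` and
    `process_events(events)` are supplied by the caller.
    """
    start = now_ticks()                                # startSynchronizationTime
    from_ticks = state["last_sync"] - timeout_ticks    # step back by writeTimeout
    candidates = list(state["pending"])
    seen = set(candidates)
    for event in get_events(from_ticks):
        if event not in seen:     # the overlap re-reads events; skip exact repeats
            seen.add(event)
            candidates.append(event)
    state["pending"] = process_events(candidates)
    state["last_sync"] = start
    return state
```

Stepping back means events inside the overlap window are read again on each iteration, so processing an event twice has to be harmless; saving the current Cassandra object to SQL-storage has that property.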
Executing search request

Advantages.
Scalability.
Availability.
Fault tolerance.
Sharding.
Questions

Thank you for your attention. Any questions?

Talk: Eugene Kurpilyansky, "Indexing on top of Cassandra", presented at Cassandra conf 2013.

KEY
Data perisistence in iOS
KEY
Data perisistance i_os
PPTX
Elasticsearch
PDF
Intro to Core Data
PDF
Introduction to MongoDB
PPTX
5 Ways to Use Spark to Enrich your Cassandra Environment
PPTX
Getting Started With Elasticsearch In .NET
PPTX
Getting started with Elasticsearch in .net
PPT
Mongo Bb - NoSQL tutorial
PDF
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
PDF
Samedi SQL Québec - La plateforme data de Azure
PPT
Connecting to a REST API in iOS
PDF
Core data in Swfit
PPTX
Spark sql
PPTX
171_74_216_Module_5-Non_relational_database_-mongodb.pptx
PPTX
PPTX
Entity Framework Database and Code First
PPTX
Cassandra Java APIs Old and New – A Comparison
PDF
Using elasticsearch with rails
PPTX
Rails meets no sql
Data perisistence in iOS
Data perisistance i_os
Elasticsearch
Intro to Core Data
Introduction to MongoDB
5 Ways to Use Spark to Enrich your Cassandra Environment
Getting Started With Elasticsearch In .NET
Getting started with Elasticsearch in .net
Mongo Bb - NoSQL tutorial
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
Samedi SQL Québec - La plateforme data de Azure
Connecting to a REST API in iOS
Core data in Swfit
Spark sql
171_74_216_Module_5-Non_relational_database_-mongodb.pptx
Entity Framework Database and Code First
Cassandra Java APIs Old and New – A Comparison
Using elasticsearch with rails
Rails meets no sql

Eugene Kurpilyansky, "Indexing on top of Cassandra". Talk at Cassandra conf 2013

  • 1. Indexing Cassandra data in SQL-storage. Kurpilyansky Eugene, SKB Kontur, December 9th, 2013.
  • 2. What do we want? Suppose we want to store objects of different types in Cassandra. Every object has a primary string key. Cassandra is well suited to serve as a key-value storage, but we usually also want to search among all objects of the same type by some criterion. Search results must be consistent and reflect the current state of the database. How can we implement a storage which satisfies these requirements?
  • 7. Using native Cassandra indexes. We can use native Cassandra indexes.
      Advantages: there is no need to support an additional storage.
      Disadvantages: every custom query may require a new CF structure for effective searching; SQL indexes are more efficient than Cassandra's indexes; there exist many complex kinds of indexes (e.g. full-text search indexing).
  • 12. Using synchronization with SQL-storage: main idea. Run an IndexService application that constantly synchronizes the data in SQL-storage with the data in Cassandra in a background thread. To perform a search, we send a query to IndexService, which returns the search result after finishing the SQL-storage synchronization process.
  • 14. Using synchronization with SQL-storage: implementation of EventLog. Create an event log with one event per write request or delete request. The event log is sorted by event time.
  • 16. Synchronizing SQL-storage with Cassandra: implementation of EventLog.
      Event: string EventId; long Timestamp; string ObjectId;
      interface IEventLog: void AddEvent(Event event); IEnumerable<Event> GetEvents(long fromTicks);
      New implementation of IObjectStorage: before writing or deleting objects, call the method IEventLog.AddEvent.
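The Event record, the IEventLog interface and the "log first, then write" rule from the slide can be sketched in Python as follows. This is only an illustration: the in-memory classes stand in for Cassandra, the names `InMemoryEventLog` and `LoggingObjectStorage` are hypothetical, and treating `from_ticks` as an exclusive bound is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Protocol


@dataclass(frozen=True)
class Event:
    event_id: str
    timestamp: int  # ticks
    object_id: str


class EventLog(Protocol):
    """Python counterpart of the IEventLog interface on the slide."""

    def add_event(self, event: Event) -> None: ...

    def get_events(self, from_ticks: int) -> Iterable[Event]: ...


class InMemoryEventLog:
    """Hypothetical stand-in for the Cassandra-backed event log."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def add_event(self, event: Event) -> None:
        self._events.append(event)

    def get_events(self, from_ticks: int) -> List[Event]:
        # Assumption: from_ticks is exclusive; result is sorted by event time.
        return sorted(
            (e for e in self._events if e.timestamp > from_ticks),
            key=lambda e: (e.timestamp, e.event_id),
        )


class LoggingObjectStorage:
    """Object storage that records an event before every write or delete."""

    def __init__(self, event_log: EventLog) -> None:
        self._event_log = event_log
        self._objects: Dict[str, dict] = {}
        self._next_id = 0

    def _log(self, object_id: str, timestamp: int) -> None:
        self._next_id += 1
        self._event_log.add_event(Event(f"e{self._next_id}", timestamp, object_id))

    def write(self, object_id: str, obj: dict, timestamp: int) -> None:
        self._log(object_id, timestamp)  # the event is added first
        self._objects[object_id] = dict(obj, Timestamp=timestamp)

    def delete(self, object_id: str, timestamp: int) -> None:
        self._log(object_id, timestamp)  # the event is added first
        self._objects.pop(object_id, None)
```

Because the event is always written before the object, a reader of the log can see an event whose object has not been updated yet, which is why later slides talk about unprocessed events.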
  • 21. Synchronizing SQL-storage with Cassandra: implementation of EventLog. We should split the whole event log into rows using PartitionInterval to limit the size of rows. PartitionInterval is some constant period of time (e.g. one hour, or six minutes).
      EventLog.AddEvent(Event event): create a column with RowKey = event.Timestamp / PartitionInterval.Ticks; ColumnName = event.Timestamp + ':' + event.EventId; ColumnValue = event.
      EventLog.GetEvents(long fromTicks): execute get_slice from an exclusive start column for one or more rows.
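A minimal in-memory sketch of the partitioning scheme, assuming nested dictionaries in place of Cassandra rows and columns. The class name `PartitionedEventLog`, the partition size and the extra `now_ticks` argument (used to know which rows to scan) are illustrative assumptions. The column name is modelled as a `(timestamp, event_id)` tuple so that it sorts by time without string padding.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

PARTITION_INTERVAL_TICKS = 1000  # hypothetical value


@dataclass(frozen=True)
class Event:
    event_id: str
    timestamp: int
    object_id: str


class PartitionedEventLog:
    def __init__(self, partition_ticks: int = PARTITION_INTERVAL_TICKS) -> None:
        self._partition_ticks = partition_ticks
        # row key -> {column name -> event}
        self._rows: Dict[int, Dict[Tuple[int, str], Event]] = defaultdict(dict)

    def add_event(self, event: Event) -> None:
        # RowKey = event.Timestamp / PartitionInterval.Ticks (integer division)
        row_key = event.timestamp // self._partition_ticks
        column_name = (event.timestamp, event.event_id)
        self._rows[row_key][column_name] = event

    def get_events(self, from_ticks: int, now_ticks: int) -> List[Event]:
        # Walk every row that can hold events between from_ticks and now_ticks
        # and slice each one, excluding columns at or before from_ticks.
        result: List[Event] = []
        first_row = from_ticks // self._partition_ticks
        last_row = now_ticks // self._partition_ticks
        for row_key in range(first_row, last_row + 1):
            row = self._rows.get(row_key, {})
            for column_name in sorted(row):
                if column_name[0] > from_ticks:
                    result.append(row[column_name])
        return result
```

A smaller partition interval keeps rows short but makes a read touch more rows, so the constant is a trade-off between row size and the number of slices per query.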
  • 22. Synchronizing SQL-storage with Cassandra: implementation of IndexService.
      • It has a local SQL-storage (one storage per service replica).
      • There is one SQL table per object type.
      • There is one specific SQL table that stores the time of the last synchronization for each object type.
      • There is one background thread per object type, which reads the event log and updates SQL-storage.
      • To execute an incoming SQL query, we can use the data from SQL-storage plus a small range of events.
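The table layout described above could look like this with SQLite. This is a sketch under assumptions: the slides do not name the SQL engine, and the object types, table names and column names here (including the `School` column borrowed from the later examples) are hypothetical.

```python
import sqlite3

OBJECT_TYPES = ["Student", "Teacher"]  # hypothetical object types


def create_index_storage(connection: sqlite3.Connection) -> None:
    # One table per object type. Table names cannot be bound as SQL
    # parameters, so they are taken only from the fixed list above.
    for type_name in OBJECT_TYPES:
        connection.execute(
            f'CREATE TABLE IF NOT EXISTS "{type_name}" ('
            " ObjectId TEXT PRIMARY KEY,"
            " Timestamp INTEGER NOT NULL,"
            " School TEXT)"
        )
    # One table holding the time of the last synchronization per object type.
    connection.execute(
        "CREATE TABLE IF NOT EXISTS LastSynchronization ("
        " ObjectType TEXT PRIMARY KEY,"
        " LastSynchronizationTime INTEGER NOT NULL)"
    )
    connection.executemany(
        "INSERT OR IGNORE INTO LastSynchronization VALUES (?, 0)",
        [(type_name,) for type_name in OBJECT_TYPES],
    )
    connection.commit()
```

Calling `create_index_storage(sqlite3.connect(":memory:"))` builds the schema in a throwaway database; the function is idempotent, so a service replica can run it on every start.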
  • 27. Implementation of IndexService: periodic synchronization action.
      1. Set startSynchronizationTime = NowTicks.
      2. Find all events which should be processed.
      3. Process these events: update SQL-storage and keep the unprocessed events (they should be processed on the next iteration).
      4. Update the time of last synchronization to startSynchronizationTime in SQL-storage.
  • 31. Implementation of IndexService: ProcessEvents(Event[] events). This function brings the values of the related objects in SQL-storage up to date. Remember that we update an object after creating its event. So we cannot process some events right away, because the corresponding object has not been updated yet.
  • 33. Implementation of IndexService: Event[] ProcessEvents(Event[] events). This function brings the values of the related objects in SQL-storage up to date and returns the events which have not been processed. How will this function be implemented? For every event we should analyze the corresponding objects from both Cassandra and SQL-storage.
  • 40. Implementation of IndexService: examples.
      Example 1: event = {Timestamp: 2008}; cassObj = {Timestamp: 2008, School: 'USU'}; sqlObj = {Timestamp: 2005, School: 'AESC USU'}. Action: write cassObj to SQL-storage and mark the event as processed.
      Example 2: event = {Timestamp: 2012}; cassObj = {Timestamp: 2008, School: 'USU'}; sqlObj = {Timestamp: 2005, School: 'AESC USU'}. The timestamp of the event is greater than the timestamp of cassObj, so we probably need to wait for the object to be updated. Action: write cassObj to SQL-storage and mark the event as unprocessed.
  • 43. Implementation of IndexService. Example 3: event = {Timestamp: 1997}; cassObj is missing; sqlObj is missing. The event probably corresponds to the creation of the object. Action: mark the event as unprocessed.
  • 46. Implementation of IndexService. Example 4: event = {Timestamp: 2017}; cassObj is missing; sqlObj = {Timestamp: 2012, School: 'UFU'}. Two cases are possible: (1) the event corresponds to the deletion of the object; (2) the event corresponds to the creation of the object, and sqlObj is not missing because there were two operations in a row: delete and create. Action: delete sqlObj from SQL-storage and mark the event as unprocessed.
  • 47. Implementation of IndexService: Event[] ProcessEvents(Event[] events).
      Read the objects which occurred in these events from Cassandra and SQL-storage (some of them can be missing).
      For each (event, cassObj, sqlObj) do:
          If cassObj is not missing:
              Save cassObj in SQL-storage.
              If event.Timestamp = cassObj.Timestamp, then mark the event as processed; else mark the event as unprocessed.
          Else (i.e. cassObj is missing):
              Delete sqlObj from SQL-storage if it is not missing.
              Mark the event as unprocessed.
      Return the events which have been marked as unprocessed.
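The ProcessEvents steps above translate almost line by line into code. In this sketch both storages are modelled as plain dictionaries keyed by object id, which is an assumption made only to keep the example self-contained; the real service would read from Cassandra and write to an SQL table.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    event_id: str
    timestamp: int
    object_id: str


def process_events(
    events: Iterable[Event],
    cassandra: Dict[str, dict],  # stand-in for Cassandra: object id -> object
    sql: Dict[str, dict],        # stand-in for one SQL table: object id -> row
) -> List[Event]:
    """Bring `sql` up to date and return the events left unprocessed."""
    unprocessed: List[Event] = []
    for event in events:
        cass_obj = cassandra.get(event.object_id)
        if cass_obj is not None:
            # Save cassObj in SQL-storage in either case.
            sql[event.object_id] = dict(cass_obj)
            # Only an exact timestamp match proves this event's write landed.
            if event.timestamp != cass_obj["Timestamp"]:
                unprocessed.append(event)
        else:
            # cassObj is missing: drop the SQL copy if present, retry later.
            sql.pop(event.object_id, None)
            unprocessed.append(event)
    return unprocessed
```

Run against the four examples from the slides, only the Example 1 event is reported as processed; the other three are returned for the next iteration.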
  • 49. Implementation of IndexService. Which events should we pass to the ProcessEvents function? Of course, all unprocessed events from the previous iteration, and also all new events, i.e. IEventLog.GetEvents(fromTicks). What is fromTicks? Is it fromTicks = lastSynchronizationTime? No. Unfortunately, any operation with Cassandra can take a long time, so an event stamped shortly before the last synchronization may only become visible in the log after it. This time is limited by writeTimeout = attemptsCount · connectionTimeout. We should step back by this margin, otherwise we can lose some events: fromTicks = lastSynchronizationTime - writeTimeout.
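One synchronization iteration with the fromTicks margin might be sketched as below. The timeout values are made up, and merging by event id is an assumption: the slides do not say how events that are re-read from the overlapping window are deduplicated against the unprocessed events carried over from the previous iteration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    event_id: str
    timestamp: int
    object_id: str


ATTEMPTS_COUNT = 3             # hypothetical value
CONNECTION_TIMEOUT_TICKS = 50  # hypothetical value
WRITE_TIMEOUT_TICKS = ATTEMPTS_COUNT * CONNECTION_TIMEOUT_TICKS


def select_events(
    get_events: Callable[[int], Iterable[Event]],
    pending: List[Event],
    last_sync_ticks: int,
) -> List[Event]:
    """Unprocessed events from the previous iteration plus all new events."""
    # Step back by writeTimeout so that slow writes are not missed.
    from_ticks = last_sync_ticks - WRITE_TIMEOUT_TICKS
    merged: Dict[str, Event] = {e.event_id: e for e in pending}
    for event in get_events(from_ticks):
        merged.setdefault(event.event_id, event)  # assumption: dedupe by id
    return sorted(merged.values(), key=lambda e: (e.timestamp, e.event_id))


def synchronize_once(
    get_events: Callable[[int], Iterable[Event]],
    process_events: Callable[[List[Event]], List[Event]],
    pending: List[Event],
    last_sync_ticks: int,
    now_ticks: int,
) -> Tuple[List[Event], int]:
    """Run one iteration; return (still unprocessed events, new last sync time)."""
    start_synchronization_time = now_ticks
    events = select_events(get_events, pending, last_sync_ticks)
    still_pending = process_events(events)
    return still_pending, start_synchronization_time
```

The overlap means that some events are handed to `process_events` more than once. That appears harmless for the procedure on the slides, since processing an event only copies the current Cassandra object into SQL-storage.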
  • 56. Implementation of IndexService: executing a search request.
  • 57. Advantages: scalability, availability, fault tolerance, sharding.
  • 59. Questions. Thank you for your attention. Any questions?