Advanced Non-Relational Schemas
For Big Data
by Victor Smirnov
“Data dominates. If you've chosen the right data
structures and organized things well, the
algorithms will almost always be self-evident. Data
structures, not algorithms, are central to
programming. (See Brooks p.102).” - Rob Pike,
Notes on Programming in C, 1989
What If You Need...
Dynamic Vector
Searchable Bitmap
File System
Compact Inverted Index
Suffix Tree/Suffix Array
Version History
Name Your Specific Problem...
Non-Relational Schema
Is just a data structure
That uses some Memory Model
Typically, Key->Value mapping
Where Key is an Integer ID
And Value is an arbitrary array of a limited size or
memory block
It's assumed that operations on memory blocks
are atomic.
Partial (Prefix) Sums Tree
Given a sequence S[0, N) = s_0 ... s_{N-1} of non-negative integers
Sum(i) returns X = s_0 + s_1 + ... + s_i
FindLT(X) returns the max position i such that Sum(i) < X
FindLE(X) is the same, but with Sum(i) <= X
We can also define range versions: Sum(i, j) and FindLT(j, X)
All operations run in O(log N) time.
Packing Perfect Balanced Tree into an Array
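A minimal sketch of a partial sums tree packed into a flat array, assuming the usual heap layout (node k has children 2k and 2k+1) and zero padding up to a power of two. The class and method names are illustrative and are not taken from the slides; a return value of -1 for "no such position" is also an assumption.

```python
class PartialSumsTree:
    """Perfect binary tree stored in one list; leaves live at tree[cap:2*cap].

    Every internal node holds the sum of the leaves below it.
    """

    def __init__(self, values):
        self.n = len(values)
        self.cap = 1
        while self.cap < max(self.n, 1):
            self.cap *= 2
        self.tree = [0] * (2 * self.cap)
        self.tree[self.cap:self.cap + self.n] = values
        for k in range(self.cap - 1, 0, -1):
            self.tree[k] = self.tree[2 * k] + self.tree[2 * k + 1]

    def add(self, i, delta):
        """Add delta to s_i and to every ancestor of that leaf."""
        k = self.cap + i
        while k >= 1:
            self.tree[k] += delta
            k //= 2

    def sum(self, i):
        """Sum(i) = s_0 + ... + s_i (inclusive)."""
        k = self.cap + i
        total = self.tree[k]
        while k > 1:
            if k % 2 == 1:  # right child: the left sibling lies inside the prefix
                total += self.tree[k - 1]
            k //= 2
        return total

    def _find(self, x, strict):
        k, acc = 1, 0  # acc = sum of all leaves to the left of node k
        while k < self.cap:
            reach = acc + self.tree[2 * k]
            if (reach < x) if strict else (reach <= x):
                acc = reach        # whole left subtree qualifies, go right
                k = 2 * k + 1
            else:
                k = 2 * k
        reach = acc + self.tree[k]
        fits = (reach < x) if strict else (reach <= x)
        i = k - self.cap if fits else k - self.cap - 1
        return min(i, self.n - 1)  # ignore zero padding

    def find_lt(self, x):
        """Max position i such that Sum(i) < x, or -1 if there is none."""
        return self._find(x, strict=True)

    def find_le(self, x):
        """Max position i such that Sum(i) <= x, or -1 if there is none."""
        return self._find(x, strict=False)


tree = PartialSumsTree([3, 0, 2, 5, 1])   # prefix sums: 3, 3, 5, 10, 11
print(tree.sum(2), tree.find_lt(5), tree.find_le(5))
```

Each operation touches one node per tree level, which is where the O(log N) bound comes from. The search relies on the values being non-negative, so that the prefix sums never decrease.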
Some Performance Bits
Dynamic Vector
An ordered sequence of elements (bytes,
integers, strings) of size N
Access(i) is O(log N)
Insert(i, value) is O(log N)
Delete(i) is O(log N)
We can also define batch operations:
Insert(i, value[])
Delete(i, j)
Split(i); Merge(AnotherVector);...
Dynamic Vector
Dynamic Vector Operations
FindLT(i) returns the block B that contains position i
and the offset j of i within B
Access(i) is O(log N)
Insert(i, value) and Delete(i) are also O(log N)
because the tree is balanced.
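The block lookup described above can be sketched as follows. This is a simplified illustration, not the author's implementation: the block index is a plain Python list and the prefix sums of block sizes are recomputed on every call, so lookups here are linear in the number of blocks. A real structure would keep the block sizes in a balanced partial sums tree to reach O(log N). `BlockedVector`, `MAX_BLOCK` and `_locate` are hypothetical names, and indices are assumed to be valid.

```python
from bisect import bisect_right
from itertools import accumulate


class BlockedVector:
    MAX_BLOCK = 4  # deliberately tiny so that splits show up in small examples

    def __init__(self):
        self.blocks = [[]]

    def _locate(self, i):
        """Return (block index, offset inside the block) for position i."""
        # Stand-in for the tree search over block sizes.
        ends = list(accumulate(len(b) for b in self.blocks))
        b = bisect_right(ends, i)
        if b == len(self.blocks):   # i is one past the end: append to last block
            b -= 1
        return b, i - (ends[b] - len(self.blocks[b]))

    def access(self, i):
        b, j = self._locate(i)
        return self.blocks[b][j]

    def insert(self, i, value):
        b, j = self._locate(i)
        block = self.blocks[b]
        block.insert(j, value)
        if len(block) > self.MAX_BLOCK:      # overflow: split into two halves
            mid = len(block) // 2
            self.blocks[b:b + 1] = [block[:mid], block[mid:]]

    def delete(self, i):
        b, j = self._locate(i)
        del self.blocks[b][j]
        if not self.blocks[b] and len(self.blocks) > 1:
            del self.blocks[b]               # drop blocks that became empty

    def to_list(self):
        return [x for block in self.blocks for x in block]


v = BlockedVector()
for k in range(10):
    v.insert(k, k)
v.insert(3, 99)
print(v.to_list())
```

Because each block has a bounded size, an insert or delete rewrites at most one or two small blocks instead of shifting the whole sequence.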
File System: Map<ID, Vector<T>>
Maps ID to Vector<T>
Merge all values into one large Dynamic Vector,
in ID order
Create separate “index” sequence from pairs <ID,
Offset> in ID order
We can represent this “index” sequence as two
partial sums trees, one for ID and one for Offset
We can merge these two trees into one because
they have exactly the same structure: a multi-index
balanced partial sums tree.
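A small static sketch of this layout, under the assumption that IDs are distinct non-negative integers. The index is kept as two parallel sequences, ID deltas and vector sizes, whose prefix sums give the absolute IDs and the offsets into the merged data vector. Prefix sums are recomputed with `itertools.accumulate` here instead of being read from a tree; the function names are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def build(mapping):
    """Flatten {id: [values]} into (id_deltas, sizes, data), all in ID order."""
    ids = sorted(mapping)
    id_deltas = [b - a for a, b in zip([0] + ids, ids)]   # sums give IDs back
    sizes = [len(mapping[k]) for k in ids]                # sums give offsets
    data = [x for k in ids for x in mapping[k]]           # one merged vector
    return id_deltas, sizes, data


def lookup(id_deltas, sizes, data, key):
    """Return the vector stored under key, or None if the key is absent."""
    id_sums = list(accumulate(id_deltas))   # stand-in for the ID sums tree
    pos = bisect_left(id_sums, key)
    if pos == len(id_sums) or id_sums[pos] != key:
        return None
    end = sum(sizes[:pos + 1])              # stand-in for Sum(pos) on sizes
    return data[end - sizes[pos]:end]


index = build({7: ["a", "b"], 2: ["c"], 10: [], 4: ["d", "e", "f"]})
print(lookup(*index, 4))
```

Both sequences have one entry per ID, which is why the two trees share the same shape and can be stored as one multi-index tree.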
Map<ID, Vector<T>>
Sharing Tree Structures
Tree structure sharing saves both space and
time: SPMD principle (single program, multiple
data)
We can align partial sum trees with different
structures using interpolation (padding with
zeroes)
We can merge the index and data streams of
Map<ID, Vector<T>> into one multi-stream tree.
When merging the trees, we try to fit index pairs and
the corresponding data into the same leaf node of
the multi-stream tree.
Multistream Tree Node Layout
Multistream Balanced Tree
ACID
Atomic block operations are not enough
Even a simple tree update affects several blocks
So, ACID is mandatory for advanced non-relational
schemas
We can get ACID for free with Multi-Version
Concurrency Control (MVCC)
We need a Version History over data blocks
where each transaction is a version.
Transaction History via MVCC
Version History Implementation
Version History maps a pair <ID, Version> to the ID
of the real data block for that version and given ID
We have Map<ID, Vector<Version, ID>>
We can turn it into a Version History by sorting each
Vector<Version, ID> (less space, slower)
Or by creating an additional partial sums tree index
on top of it (more space, but much faster)
We can do it in just one multi-stream balanced
tree
MVCC requires some other data structures, but
they can be designed by analogy.
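A sketch of the first option above (a sorted Vector<Version, ID> per logical ID), using a dictionary and binary search in place of the multi-stream tree. It assumes the usual snapshot rule, which the slides do not spell out: a reader at version V sees the newest block written at or before V. `VersionHistory`, `commit` and `resolve` are hypothetical names.

```python
from bisect import bisect_right


class VersionHistory:
    def __init__(self):
        # logical block ID -> (sorted versions, physical block IDs)
        self.history = {}

    def commit(self, logical_id, version, physical_id):
        """Record that `version` stored `logical_id` in block `physical_id`."""
        versions, blocks = self.history.setdefault(logical_id, ([], []))
        if versions and version <= versions[-1]:
            raise ValueError("versions must be committed in increasing order")
        versions.append(version)
        blocks.append(physical_id)

    def resolve(self, logical_id, version):
        """Physical block visible at `version`, or None if none exists yet."""
        versions, blocks = self.history.get(logical_id, ([], []))
        k = bisect_right(versions, version)   # entries with version <= target
        return blocks[k - 1] if k else None


vh = VersionHistory()
vh.commit(1, 1, 100)   # transaction 1 writes logical block 1
vh.commit(1, 3, 101)   # transaction 3 rewrites it as a new physical block
print(vh.resolve(1, 2))
```

Old physical blocks are never overwritten, so a reader holding version 2 keeps seeing block 100 even after transaction 3 commits.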
Concurrency Handling
Version History is a
complicated data structure
Concurrent access to it
must be restricted
Split the whole Version
History into shards
And shard blocks by ID to
reduce lock contention on
Version History
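One way to picture sharding by ID is lock striping: each shard has its own lock, so writers that touch different shards do not wait for each other. The sketch below is an assumption about how such sharding could look, not the author's design; in particular the shard function `block_id % shard_count` and the class name are hypothetical.

```python
import threading


class ShardedHistory:
    def __init__(self, shard_count=8):
        self.shards = [{} for _ in range(shard_count)]
        self.locks = [threading.Lock() for _ in range(shard_count)]

    def _shard(self, block_id):
        # Assumed shard function: spread IDs evenly across shards.
        return block_id % len(self.shards)

    def put(self, block_id, version, physical_id):
        s = self._shard(block_id)
        with self.locks[s]:   # only this shard is locked
            self.shards[s].setdefault(block_id, []).append((version, physical_id))

    def get(self, block_id):
        s = self._shard(block_id)
        with self.locks[s]:
            return list(self.shards[s].get(block_id, []))


history = ShardedHistory(shard_count=4)
history.put(5, 1, 500)
history.put(6, 1, 600)    # different shard, different lock
print(history.get(5))
```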
Distributed Storage and Processing
MVCC is very
Raft/Paxos-friendly
because of its Version
History
So we can join storage
nodes into Raft groups
And join Raft groups into
larger groups with 2PC
Using a split/merge model
to map data to nodes.
Storage Options
Bonus Slides
Searchable Bitmaps
rank1(n) = number of ones in [0, n)
select1(i) = position of i-th 1 in the bitmap
rank0(n) = number of zeroes in [0, n)
select0(i) = position of i-th 0 in the bitmap
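A minimal sketch of these four operations over a static bitmap. It assumes that select counts from 1 (select1(1) is the position of the first 1) and uses per-block counters plus a short scan inside one block. A real searchable bitmap would keep the counters in a partial sums tree so that updates are also cheap. The class name and block size are illustrative.

```python
from bisect import bisect_left
from itertools import accumulate


class Bitmap:
    BLOCK = 8  # bits per block; illustrative value

    def __init__(self, bits):
        self.bits = list(bits)
        self.before = {}   # before[bit][b] = occurrences of bit in blocks < b
        for bit in (0, 1):
            counts = [sum(1 for x in self.bits[s:s + self.BLOCK] if x == bit)
                      for s in range(0, len(self.bits), self.BLOCK)]
            self.before[bit] = [0] + list(accumulate(counts))

    def _rank(self, bit, n):
        b, r = divmod(n, self.BLOCK)
        start = b * self.BLOCK
        inside = sum(1 for x in self.bits[start:start + r] if x == bit)
        return self.before[bit][b] + inside

    def _select(self, bit, i):
        sums = self.before[bit]
        k = bisect_left(sums, i)           # first block boundary with count >= i
        if i < 1 or k == len(sums):
            raise IndexError("no such occurrence")
        b = k - 1
        seen = sums[b]
        for pos in range(b * self.BLOCK, min(len(self.bits), (b + 1) * self.BLOCK)):
            if self.bits[pos] == bit:
                seen += 1
                if seen == i:
                    return pos

    def rank1(self, n):
        return self._rank(1, n)

    def rank0(self, n):
        return self._rank(0, n)

    def select1(self, i):
        return self._select(1, i)

    def select0(self, i):
        return self._select(0, i)


bm = Bitmap([1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0])
print(bm.rank1(4), bm.select1(4), bm.rank0(12), bm.select0(5))
```

Rank and select are inverses in the sense that rank1(select1(i)) == i - 1 under the 1-based convention used here.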
Searchable Bitmap: Structure
Searchable Bitmaps: Persistent
Views
LOUDS Tree
LOUDS Tree: Parent()
Wavelet Tree
Searchable sequence [0...N) for large alphabets
Rank(i, s) returns the number of symbols s in [0, i)
Select(k, s) returns the position i of the k-th symbol s
Insert(i, s), Delete(i), Access(i): insert, remove
and access the symbol at position i, respectively
All these operations have O(log N) time
complexity
By mapping numbers to symbols we can perform
the following lookup operations: >, >=, <, <=, <> in
O(log N) time.
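A compact sketch of a static wavelet tree over integer symbols, covering Access, Rank and one of the comparison lookups (counting symbols smaller than a bound). It is an illustration under simplifying assumptions: the sequence is non-empty, there is no Insert or Delete, and each node stores a plain prefix-count list where a real implementation would use a searchable bitmap. The names `WaveletTree` and `count_less` are hypothetical.

```python
class WaveletTree:
    def __init__(self, seq, lo=None, hi=None):
        seq = list(seq)
        self.lo = min(seq) if lo is None else lo
        self.hi = max(seq) if hi is None else hi
        self.left = self.right = self.prefix = None
        if self.lo == self.hi or not seq:
            return                        # leaf, or an empty subtree
        mid = (self.lo + self.hi) // 2
        self.prefix = [0]                 # prefix[k] = symbols <= mid in seq[:k]
        for s in seq:
            self.prefix.append(self.prefix[-1] + (s <= mid))
        self.left = WaveletTree([s for s in seq if s <= mid], self.lo, mid)
        self.right = WaveletTree([s for s in seq if s > mid], mid + 1, self.hi)

    def access(self, i):
        """Symbol at position i."""
        node = self
        while node.prefix is not None:
            if node.prefix[i + 1] - node.prefix[i]:   # symbol went left
                i, node = node.prefix[i], node.left
            else:
                i, node = i - node.prefix[i], node.right
        return node.lo

    def rank(self, i, s):
        """Number of occurrences of symbol s in positions [0, i)."""
        if s < self.lo or s > self.hi:
            return 0
        node = self
        while node.prefix is not None:
            if s <= (node.lo + node.hi) // 2:
                i, node = node.prefix[i], node.left
            else:
                i, node = i - node.prefix[i], node.right
        return i

    def count_less(self, i, x):
        """Number of positions in [0, i) holding a symbol < x."""
        node, total = self, 0
        while True:
            if x > node.hi:
                return total + i          # everything in this node qualifies
            if x <= node.lo or node.prefix is None:
                return total
            mid = (node.lo + node.hi) // 2
            if x <= mid + 1:
                i, node = node.prefix[i], node.left
            else:
                total += node.prefix[i]   # the whole left part qualifies
                i, node = i - node.prefix[i], node.right


wt = WaveletTree([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
print(wt.access(5), wt.rank(10, 5), wt.count_less(10, 4))
```

Every query walks a single root-to-leaf path, so its cost depends on the tree height (logarithmic in the alphabet size) rather than on the sequence length.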
Wavelet Tree: Structure
Wavelet Tree: Rank
Wavelet Tree: Inverted Index
Inverted Index Lookup
Thanks!
More details are at:
http://bit.ly/1D4cj21
