Sridhar Valaguru
(c) Copyright 2013. Contact: extenddb@gmail.com
 Motivation
 Example use cases
 eXTend DB
   Design
   Extensibility
   Limitations
 Morph DB – key-value pair store
   Design and implementation
   Block management design
   Implementation
   Caches
 A unique approach towards the database
 NoSQL document databases like MongoDB are steadily becoming popular.
 MongoDB offers features that suit a wide variety of applications better than traditional SQL databases:
   JSON-style documents with dynamic schemas offer simplicity and power.
   Rich, document-based queries.
   Indexes on any attribute.
   Fast in-place updates and atomic modifiers.
 Features like replication, sharding, high availability, map-reduce, etc. are not applicable in this context.
 The features mentioned previously also apply to stand-alone applications installed and running on user machines.
 There are a few problems with using MongoDB in such applications:
   External dependency on MongoDB.
   The user needs to install it separately.
   The user has to manage MongoDB for the application to work.
   Possibility of namespace collisions among unrelated applications.
   Unnecessary client-server communication hurts performance.
 So there is a need for a document database embedded into the application with features similar to MongoDB; basically the SQLite equivalent of MongoDB.
 An extensible database is a plus.
 Logging library –
   Each log entry could be an object in the database.
   Indexes could be created at a later point in time to analyze log files with rich querying.
 File tagging application –
   Each file's information could be stored as an object in the DB.
   Tags could be attached and removed dynamically.
   Indexed data could extend the object with new fields.
   Querying/searching based on tags or indexed data.
 Single-node user-space NFS server –
   Stores all metadata in the database.
   Maps file handles to object/file attributes.
   Objects are accessed by file handle and/or by parent file handle and name.
   File data is stored separately, outside the database, using an object-id based namespace.
 Any other stand-alone application.
 NoSQL document database.
 Stores BSON documents.
 Embedded into the application process.
 MongoDB-like querying interface.
 Extensible.
 Each database collection is stored as a set of files in a user-specified directory.
[Architecture diagram: the application calls the database API (query and management calls); the engine consists of a query optimizer and an extensible query module on top of a storage layer, which can be backed by Tokyo Cabinet, Morph DB, or an in-memory key-value DB.]
 Data is stored in three types of files, backed by the storage layer's key-value database.
 Descriptor DB –
   Holds information about the list of indexes in the database.
 Main DB –
   Stores all documents, with generated BSON object ids as keys.
   The BSON object id uniquely identifies an object in the collection.
 Index DB –
   Stores references to objects: a particular field value is the key and the value is the list of matching object ids.
 Simple weight-based query optimizer.
 The index that yields the fewest candidate objects is chosen (sketch below).
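A minimal sketch in C of this selection rule, assuming a hypothetical index_candidate structure that pairs each usable index with the number of object ids it would return (names are illustrative, not the real eXTend DB types):

```c
#include <stddef.h>

struct index_candidate {
    const char *name;     /* index name, e.g. an index on field "a" */
    size_t      nobjects; /* number of object ids this index would return */
};

/* Returns the candidate with the fewest objects, or NULL if there are none. */
static const struct index_candidate *
pick_index(const struct index_candidate *c, size_t n)
{
    const struct index_candidate *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (best == NULL || c[i].nobjects < best->nobjects)
            best = &c[i];
    return best;
}
```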
 Provides two services for the database engine (prototypes sketched below):
   Given a query in BSON object format, returns a list of indexes that can be used for that particular query.
    ▪ This is in turn used by the query optimizer to find the best index to use.
   Given a BSON object and a query BSON object, returns whether the object matches the query.
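A hedged sketch of how these two entry points might look in C; bson_t and index_list are stand-in types, not the actual eXTend DB API:

```c
#include <stdbool.h>

struct bson_t;       /* a BSON document */
struct index_list;   /* a list of candidate indexes */

/* 1. Given a query document, return the indexes usable for it; the query
 *    optimizer then picks the best one among them. */
struct index_list *usable_indexes(const struct bson_t *query);

/* 2. Decide whether a stored document satisfies a query document. */
bool document_matches(const struct bson_t *doc, const struct bson_t *query);
```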
 The query module implements comparison operators between two BSON elements.
 It has no knowledge of the storage layer; it operates only on the given BSON objects.
 It can be overridden by users, who may register their own comparison operators.
   This can be very useful for custom binary data stored in the database.
 Various query operators are implemented in the module to provide complex querying.
 Operators let an object be selected in ways other than simply checking that its value equals the value in the query.
 E.g.
   {'a': 3} matches all documents whose field a has the value 3. This is a simple query.
   But if we want all objects whose values are greater than 3, a simple query cannot express that.
   {'a': {'$gt': 3}} is the query that matches all documents where the value is greater than 3.
   Here the operator '$gt' is given the meaning "greater than".
 Any field name starting with "$" is treated as an operator, and the rest of the name gives the operator's name.
 The querying function looks the operator up in the registered list and invokes its handler to check whether the field matches the criterion in the query (see the sketch below).
 By default, operators such as $lt, $lte, $nin, $all, $in and $exists are implemented.
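A simplified sketch in C of the matching step for numeric fields, assuming a reduced model where a query value is either a plain number (equality) or an operator applied to a number; this is illustrative only, not the real BSON-based implementation:

```c
#include <stdbool.h>
#include <string.h>

enum qkind { Q_EQ, Q_OP };

struct qvalue {
    enum qkind  kind;
    const char *op;   /* operator name without the '$', e.g. "gt"; Q_OP only */
    double      num;  /* comparison value */
};

/* Does the stored field value match the query value? */
static bool field_matches(double stored, const struct qvalue *q)
{
    if (q->kind == Q_EQ)
        return stored == q->num;   /* {'a': 3}          */
    if (strcmp(q->op, "gt") == 0)
        return stored > q->num;    /* {'a': {'$gt': 3}} */
    if (strcmp(q->op, "lt") == 0)
        return stored < q->num;    /* {'a': {'$lt': 3}} */
    return false;                  /* unknown operator  */
}
```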
 Custom operators can be registered with the query module (sketch below).
 When a query uses such an operator, the corresponding user callback is invoked.
 The callback takes the value of the field as one parameter and the value from the query as the other, and returns a boolean.
 This way the query language of eXTend DB can be extended without editing the database code or waiting for the developer to implement the feature.
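A hedged sketch of what such an extension hook could look like in C; the registration function, the handler signature and the "$prefix" operator are all illustrative assumptions, not the real eXTend DB API:

```c
#include <stdbool.h>
#include <string.h>

/* A custom operator is a name plus a callback that receives the stored
 * field value and the value given in the query. */
typedef bool (*op_handler)(const void *field_value, const void *query_value);

struct op_entry { const char *name; op_handler handler; };

#define MAX_OPS 32
static struct op_entry op_registry[MAX_OPS];
static int op_count;

static int register_operator(const char *name, op_handler handler)
{
    if (op_count == MAX_OPS)
        return -1;
    op_registry[op_count].name = name;
    op_registry[op_count].handler = handler;
    return op_count++;
}

/* Illustrative custom operator "$prefix": match string fields that start
 * with the string given in the query. */
static bool prefix_handler(const void *field_value, const void *query_value)
{
    const char *field = field_value, *prefix = query_value;
    return strncmp(field, prefix, strlen(prefix)) == 0;
}

/* register_operator("prefix", prefix_handler);
 * would enable queries like {'name': {'$prefix': 'log-'}}. */
```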
 An abstraction layer that provides key-value storage (interface sketched below).
 Isolates data storage from the rest of the database engine.
 The only place where data is stored.
 The backend can be any key-value database, e.g.:
   Tokyo Cabinet
   Morph DB
   An in-memory key-value store
 Currently Tokyo Cabinet is the default key-value backend; it stores all the data in files.
 A Morph DB backend is also almost complete.
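A minimal sketch of the storage-layer contract, assuming a function-pointer table (the struct and field names are hypothetical): any backend that can store and fetch opaque key/value pairs can sit below the database engine.

```c
#include <stddef.h>

struct kv_backend {
    void *ctx;  /* backend-private state (Tokyo Cabinet handle, tree root, ...) */
    int  (*put)(void *ctx, const void *key, size_t klen,
                const void *val, size_t vlen);
    int  (*get)(void *ctx, const void *key, size_t klen,
                void **val_out, size_t *vlen_out);
    int  (*del)(void *ctx, const void *key, size_t klen);
    int  (*sync)(void *ctx);
};
```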
 Different backends can be chosen depending on the type of data stored.
 E.g.
   Index databases can be kept entirely in memory, which provides fast access.
   The main DB could be stored using the Tokyo Cabinet backend.
   For persistent indexes, Morph DB could be used.
 An easy-to-use, MongoDB-like embedded database.
 Extensible storage backends.
 Extensible query language.
 Completely customizable query behavior.
 Tokyo Cabinet updates are not in-place:
   Every time an object is expanded, its old space in the file is discarded and new space is found.
   This is a serious problem for update-heavy workloads.
 Tokyo Cabinet by default writes to memory; a sync is needed to flush the data to the file.
   If the application crashes without a sync, data is lost.
   Sync calls are costly.
   If sync is called after every insert, performance is very low.
 Morph DB is a key-value database aimed at solving the limitations of Tokyo Cabinet.
 Aims of Morph DB –
   Fast in-place updates / object expansion.
   A fast block management layer that can reuse the storage of deleted objects.
   Reads of already-written data should not be slowed down by the block management layer.
   All data is written directly to the file while maintaining performance.
 A B+ tree implementation on top of the block management layer.
 Provides generation-based cursors.
   Cursors keep working while the DB is being modified.
 Can search for values in a range of keys.
 Provides two basic functions (interface sketched below):
   Data write –
    ▪ Finds and allocates resources in the file.
    ▪ Writes the data to suitable location(s).
    ▪ Returns the address where the data was written.
    ▪ The upper layer must store this reference to read the data back.
    ▪ The data is not interpreted.
   Data read –
    ▪ Given the address returned earlier by the data write, reads the data from that offset or chain of offsets.
    ▪ Verifies the checksum of each piece.
    ▪ Returns the stitched object to the caller.
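A hedged sketch of what these two entry points might look like in C; the names and signatures are assumptions, not the actual Morph DB API:

```c
#include <stddef.h>
#include <stdint.h>

struct block_mgr;                /* opaque handle to the block manager */
typedef uint64_t disk_addr_t;    /* address handed back by a write     */

/* Allocates space, writes len bytes of buf (possibly across linked blocks)
 * and returns the address of the first block; the caller must store this
 * reference to be able to read the data back. */
disk_addr_t bm_write(struct block_mgr *bm, const void *buf, size_t len);

/* Follows the chain starting at addr, verifies the checksum of each piece
 * and returns the stitched object in *buf_out (length in *len_out). */
int bm_read(struct block_mgr *bm, disk_addr_t addr,
            void **buf_out, size_t *len_out);
```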
 File storage is managed in terms of resource clusters.
 Each resource cluster contains some header information followed by the resources themselves.
 A distinctive property of resources is that they come in a range of sizes, rather than one fixed block size as in many other solutions.
   Individual resource (block) sizes range from 128 bytes to 4 MB (size classes sketched below).
 This range of block sizes makes the layer suitable for data of various sizes, from very small values up to 16 MB.
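A small sketch of rounding a requested size up to a resource size class, assuming power-of-two classes from 128 bytes to 4 MB (an assumption consistent with the 4-bit type field described later: 0 for 128 B, 1 for 256 B, and so on):

```c
#include <stdint.h>

#define MIN_RESOURCE  128u
#define MAX_RESOURCE  (4u * 1024 * 1024)

static uint32_t resource_size_for(uint32_t want)
{
    uint32_t size = MIN_RESOURCE;
    while (size < want && size < MAX_RESOURCE)
        size <<= 1;           /* 128, 256, 512, ..., 4 MB */
    return size;              /* larger values are stored as linked blocks */
}
```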
 Clusters are allocated on demand for a particular type of resource.
 Cluster sizes start at 128 KB, and each subsequent cluster is double the size of the previous one, capped at 32 MB (sketch below).
 Growing cluster sizes keep the database file small initially and let it grow along with the data.
 With small clusters the header information could be of significant size compared to the resources.
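The growth rule as a small helper, a sketch assuming a 0-based cluster index per resource type (the function name is hypothetical):

```c
#include <stdint.h>

/* Size of the nth cluster allocated for a resource type:
 * 128 KB, 256 KB, 512 KB, ... capped at 32 MB. */
static uint64_t cluster_size(unsigned nth)
{
    uint64_t size = 128u * 1024;               /* first cluster: 128 KB */
    const uint64_t cap = 32u * 1024 * 1024;    /* cap: 32 MB */
    while (nth-- > 0 && size < cap)
        size *= 2;
    return size;
}
```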
 Data is stored as a list of blocks; each block stores a reference to the next block in the list (layout sketched below).
 Each chunk stores the checksum of the entire data.
   This helps identify corrupt or partially updated chains.
 When data is expanded, a block suited to the additional data size is allocated and linked.
 There is a cap on the link count: there can be at most 4 links.
 Once data spreads across 4 links, it is automatically defragmented: a single bigger block suitable for the entire value is found, which reduces the number of links.
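A hedged sketch of what one piece of such a linked value could look like on disk; the field names and exact layout are assumptions, only the next-link and checksum described above are taken from the slides:

```c
#include <stdint.h>

typedef uint64_t disk_addr_t;

struct chunk_header {
    disk_addr_t next;      /* address of the next block, or 0 for the last  */
    uint32_t    data_len;  /* bytes of payload stored in this block         */
    uint32_t    checksum;  /* checksum of the entire value, for detecting
                            * corrupt or partially updated chains           */
};
/* A value may span at most 4 such blocks; beyond that it is rewritten into
 * a single larger block (defragmentation). */
```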
 Block allocation takes a block size parameter.
 A free block of the specified size is found in the bitmap residing in the cluster header, and its address is returned.
 The DiskAddr structure identifying a resource is a 64-bit bit-field structure (decode sketch below).
   A 56-bit component directly gives the address of the resource.
    ▪ So there is no address translation in the IO path.
   A 4-bit type field indicates the resource size: 0 for 128 bytes, 1 for 256, and so on.
   The type field helps identify the resource when freeing it.
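A sketch of decoding such a 64-bit address; the exact bit positions (offset in the low 56 bits, type above it) are an assumption, but the 56-bit address and the 4-bit type-to-size mapping follow the description above:

```c
#include <stdint.h>

#define ADDR_OFFSET_BITS  56
#define ADDR_OFFSET_MASK  ((UINT64_C(1) << ADDR_OFFSET_BITS) - 1)

static inline uint64_t disk_addr_offset(uint64_t addr)
{
    return addr & ADDR_OFFSET_MASK;     /* used directly in the IO path */
}

static inline unsigned disk_addr_type(uint64_t addr)
{
    return (unsigned)((addr >> ADDR_OFFSET_BITS) & 0xF);
}

static inline uint32_t disk_addr_resource_size(uint64_t addr)
{
    /* type 0 -> 128 B, 1 -> 256 B, ..., 15 -> 4 MB */
    return 128u << disk_addr_type(addr);
}
```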
 Block allocation needs to be extremely fast.
 Caches are used to remember the last cluster from which data was allocated.
   One such cache exists for each resource type.
 The cache state makes allocation O(1) for a series of allocations.
 Freeing a resource resets the cache state to point to the lowest-offset free resource.
 The search always continues into the following clusters.
 System calls (mostly pread/pwrite were used) are very fast on some machines (Core i3 processors); doing a large number of small writes was not a problem there.
 On other machines (Core 2 Duo), system calls were significantly slower and a huge percentage of the time was spent in them.
 Memory-mapped IO was significantly faster.
 Mapping the entire file has a few problems:
   File sizes can grow.
    ▪ On 32-bit machines this would limit the database size.
   Unused regions could be mapped, and the kernel could choose to evict the wrong set of pages.
 To avoid these drawbacks, a list of mmapped regions is used (read path sketched below).
   The number of mappings is limited to 10 to bound virtual address usage.
   The least recently used mmapped region is removed when a new region has to be mapped.
   Whenever a cluster is allocated, the whole cluster is mmapped.
 For each IO the list is checked; on a hit a simple memcpy is done, otherwise it falls back to the old system call.
 This improved performance by almost 50% on the slower machines.
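A minimal sketch of the read path just described, assuming a small table of mapped clusters (struct names and layout are illustrative): on a hit the data is copied from the mapping, otherwise the code falls back to pread().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_MAPPINGS 10

struct mapping {
    uint64_t file_off;   /* start offset of the mapped cluster */
    size_t   len;        /* length of the mapping              */
    void    *addr;       /* address returned by mmap()         */
};

struct map_cache {
    struct mapping maps[MAX_MAPPINGS];
    int            nmaps;
    int            fd;
};

static ssize_t cached_read(struct map_cache *mc, uint64_t off,
                           void *buf, size_t len)
{
    for (int i = 0; i < mc->nmaps; i++) {
        struct mapping *m = &mc->maps[i];
        if (off >= m->file_off && off + len <= m->file_off + m->len) {
            memcpy(buf, (char *)m->addr + (off - m->file_off), len);
            return (ssize_t)len;                 /* hit: plain memcpy */
        }
    }
    return pread(mc->fd, buf, len, (off_t)off);  /* miss: system call */
}
```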
 The B+ tree uses the block management layer to store its internal nodes and data.
 The block manager has no information about how the blocks are going to be used.
   It provides a slot for the upper layer to store a reference to its superblock.
 Internal nodes store all the keys of the node and references to the corresponding child nodes/values.
 Parent pointers are not maintained on disk; this makes splitting nodes fast.
 The parent-child relationship is established during the search.
 All the nodes being modified are in memory.
   Nodes are pinned in the cache.
 After each modification, the node is written back to the file.
 Concurrent modifications can be allowed by taking a write lock on the root of the sub-tree that could be modified by the insert/delete.
 An insert in a B+ tree may modify anywhere from a few nodes to all nodes on the path from the root to the leaf.
 The highest level that will be modified can be determined by checking whether the child could overflow because of the insert.
   If the child overflows, the parent will be modified as well.
 So instead of locking the root, we only need to lock the subtree whose root is the topmost node that could be modified (sketch below).
 Similar speculation can be done for deletes.
 All the nodes from the root down to the first node that could be modified are locked for read.
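A simplified sketch of the rule for finding the subtree to write-lock for an insert. The node layout and the child-selection step (child_for) are assumptions; the idea follows the slide: a split can only propagate upward through full nodes, so the deepest non-full node on the search path bounds what the insert can touch.

```c
#include <stdbool.h>

struct bpt_node {
    int  nkeys;
    int  max_keys;
    bool is_leaf;
};

/* Provided elsewhere: the child the given key descends into. */
struct bpt_node *child_for(struct bpt_node *n, long key);

static struct bpt_node *lock_root_for_insert(struct bpt_node *root, long key)
{
    struct bpt_node *lock_root = root;        /* worst case: lock from the root */
    for (struct bpt_node *n = root; ; n = child_for(n, key)) {
        if (n->nkeys < n->max_keys)
            lock_root = n;                    /* split propagation stops here */
        if (n->is_leaf)
            break;
    }
    return lock_root;
}
```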
 Tokyo Cabinet – http://fallabs.com/tokyocabinet/spex-en.html
 MongoDB – http://www.mongodb.org/