MAPDB
BY DEBMALYA JASH
WHAT IS MAPDB?
• MapDB is an open-source (Apache 2.0 licensed), embedded Java database engine and collection framework. It provides Maps, Sets, Lists, Queues and Bitmaps with range queries, expiration, compression, off-heap storage and streaming. Its authors describe it as probably the fastest Java database, with performance comparable to java.util collections. It also provides advanced features such as ACID transactions, snapshots, incremental backups and more.
MAIN CLASSES
• DBMaker
• DB
• HTreeMap
• BTreeMap
• Volume
• SortedTableMap
DBMAKER
• Handles database configuration, creation and opening. This class is used to select the storage mode and to set the other configuration options MapDB provides.
DB
• Represents an open database (or a single transaction session). It is used to create and open collection storages.
• Handles database's lifecycle methods like commit(), rollback(), and close().
• To open (or create) a store, use one of the DBMaker.xxxDB() static methods.
• memoryDB() - creates a new in-memory database that serializes data into byte[] arrays. Changes are lost when the JVM exits.
• memoryDirectDB() - creates a new in-memory database stored in DirectByteBuffers outside the heap, so the Garbage Collector is not affected. Changes are lost when the JVM exits. Raise the direct-memory limit to suit your data with the JVM option -XX:MaxDirectMemorySize (for example -XX:MaxDirectMemorySize=10G).
• fileDB() - stores serialized records in a physical file.
• tempFileDB() - creates a new database in a temporary folder. The files are deleted after the store is closed.
• appendFileDB() - opens a database that uses append-only log files.
• heapDB() - creates a new in-memory database that keeps all data on the heap without serialization. It is very fast, but the data affects the Garbage Collector the same way as traditional Java collections.
HTREEMAP
• HTreeMap provides HashMap and HashSet collections for MapDB. It optionally supports entry
expiration and can be used as a cache. It is thread-safe and scales under parallel updates.
HTREEMAP ADVANTAGES
• HTreeMap is a segmented Hash Tree. Unlike other HashMaps, it does not use a fixed-size Hash Table and does not rehash all data when the Hash Table grows. It uses an auto-expanding Index Tree (nodes are allocated as needed), so it never needs to be resized. It also occupies less space, since empty hash slots consume no space.
• HTreeMap optionally supports entry expiration based on four criteria: maximal map size, maximal storage size, time-to-live since last modification and time-to-live since last access. Expired entries are removed automatically. This feature uses a FIFO queue, and each segment has its own independent expiration queue.
MAP LAYOUT
• MapDB has a set of parameters that control access time and maximal size. Together they are called the Map Layout.
• HTreeMap layout is controlled by the layout function, which takes three parameters:
• concurrency, the number of segments. The default is 8, and it is always rounded up to a power of two.
• maximal node size of an Index Tree Dir Node. The default is 16, and it is always rounded up to a power of two. The maximal value is 128 entries.
• number of levels in the Index Tree. The default is 4.
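The layout numbers translate into a fixed number of hash slots. The sketch below assumes that the maximal hash-table size is segments × dirSize^levels. Under that assumption the defaults give 8 × 16^4 = 524,288 slots. The power-of-two rounding and the 128-entry cap follow the slide. The bit split in hashPath() is a hypothetical addressing scheme for illustration and is not MapDB's internal one. The class name HTreeLayout is also made up.

```java
/**
 * Back-of-the-envelope model of the HTreeMap layout parameters.
 * Illustrative only: hashPath() uses a hypothetical bit layout, not MapDB's.
 */
final class HTreeLayout {
    final int concurrency; // number of segments, rounded up to a power of two
    final int dirSize;     // max entries per Index Tree dir node, power of two, <= 128
    final int levels;      // depth of the Index Tree inside each segment

    HTreeLayout(int concurrency, int dirSize, int levels) {
        this.concurrency = roundUpToPowerOfTwo(concurrency);
        this.dirSize = roundUpToPowerOfTwo(dirSize);
        if (this.dirSize > 128) throw new IllegalArgumentException("dirSize must be <= 128");
        if (levels < 1) throw new IllegalArgumentException("levels must be >= 1");
        this.levels = levels;
    }

    static int roundUpToPowerOfTwo(int n) {
        if (n < 1 || n > (1 << 30)) throw new IllegalArgumentException("out of range: " + n);
        int p = 1;
        while (p < n) p <<= 1;
        return p;
    }

    /** Maximal number of hash slots: segments * dirSize^levels (assumed formula). */
    long maxSlots() {
        long slots = concurrency;
        for (int i = 0; i < levels; i++) slots *= dirSize;
        return slots;
    }

    /** Hypothetical addressing: low bits pick the segment, each level consumes log2(dirSize) bits. */
    int[] hashPath(int hash) {
        int[] path = new int[levels + 1];
        path[0] = hash & (concurrency - 1);
        int bitsPerLevel = Integer.numberOfTrailingZeros(dirSize);
        int rest = hash >>> Integer.numberOfTrailingZeros(concurrency);
        for (int level = 1; level <= levels; level++) {
            path[level] = rest & (dirSize - 1); // index inside the dir node at this level
            rest >>>= bitsPerLevel;
        }
        return path;
    }
}
```

Because the slot count is fixed once the layout is chosen, the Index Tree never rehashes, but it cannot grow past maxSlots() either.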
CONCURRENCY
• Concurrency is implemented with multiple segments, each with a separate read-write lock. Each segment is independent and has its own Size Counter, iterators and Expiration Queues. The number of segments is configurable. Too few segments cause congestion on concurrent updates, and too many increase memory overhead.
• HTreeMap uses an Index Tree instead of a growing Object[] for its Hash Table. The Index Tree is a sparse-array-like structure built from a tree hierarchy of arrays. Because it is sparse, unused entries do not occupy any space. It never rehashes (copies all entries into a bigger array), but it also cannot grow beyond its initial capacity.
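The following JDK-only model shows per-segment locking. Each segment is an independent HashMap guarded by its own ReentrantReadWriteLock. The segment is chosen from the key's hash, and the total size is the sum of the per-segment sizes. SegmentedMap is a hypothetical name. HTreeMap's real segments keep their data in the Index Tree and the Store, not in HashMaps.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy segmented hash map: one HashMap plus one ReadWriteLock per segment. Keys must be non-null. */
final class SegmentedMap<K, V> {
    private final Map<K, V>[] segments;
    private final ReadWriteLock[] locks;

    @SuppressWarnings("unchecked")
    SegmentedMap(int segmentCount) {
        if (segmentCount < 1 || segmentCount > (1 << 16)) throw new IllegalArgumentException("bad segment count");
        int n = 1;
        while (n < segmentCount) n <<= 1; // round up to a power of two
        segments = (Map<K, V>[]) new Map[n];
        locks = new ReadWriteLock[n];
        for (int i = 0; i < n; i++) {
            segments[i] = new HashMap<>();
            locks[i] = new ReentrantReadWriteLock();
        }
    }

    /** Segment number is derived from the key's hash (spread so high bits matter too). */
    int segmentFor(Object key) {
        int h = key.hashCode();
        h ^= (h >>> 16);
        return h & (segments.length - 1);
    }

    V put(K key, V value) {
        int s = segmentFor(key);
        locks[s].writeLock().lock(); // blocks only writers and readers of this segment
        try { return segments[s].put(key, value); }
        finally { locks[s].writeLock().unlock(); }
    }

    V get(Object key) {
        int s = segmentFor(key);
        locks[s].readLock().lock();
        try { return segments[s].get(key); }
        finally { locks[s].readLock().unlock(); }
    }

    /** Each segment keeps its own size; the total is a sum, not an atomic snapshot. */
    int size() {
        int total = 0;
        for (int s = 0; s < segments.length; s++) {
            locks[s].readLock().lock();
            try { total += segments[s].size(); }
            finally { locks[s].readLock().unlock(); }
        }
        return total;
    }

    int segmentCount() { return segments.length; }
}
```

Writers to different segments run in parallel, while writers to the same segment queue on that segment's write lock. This is why too few segments cause congestion.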
SHARD STORES FOR BETTER CONCURRENCY
• HTreeMap is split into separate segments. Each segment is independent and shares no state with the others. However, the segments still share the underlying Store, and that affects performance under concurrent load. Segments can be made truly independent by giving each segment its own Store (in MapDB 3.x, for example, with DBMaker.memoryShardedHashMap(...)).
EXPIRATION
• HTreeMap offers optional entry expiration. An entry can expire if:
• it has been in the map longer than the expiration period, measured from its creation, its last modification or its last read access;
• the number of entries in the map would exceed the maximal number;
• the map consumes more disk space or memory than the space limit.
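The expiration rules above can be modelled for a single segment. An insertion-ordered LinkedHashMap acts as the FIFO queue. Whenever an entry's timestamp is refreshed, the entry moves to the tail, so the head is always the next entry to expire. The sketch rests on these assumptions:

- Expiration runs lazily on each call. MapDB can also expire entries from background threads.
- The clock is injected so the behaviour is deterministic.
- A write always refreshes the timestamp, except under AFTER_CREATE.
- The Policy names are hypothetical.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.function.LongSupplier;

/** Single-segment model of TTL and size expiration with a FIFO queue (not MapDB's code). */
final class ExpiringMap<K, V> {
    enum Policy { AFTER_CREATE, AFTER_UPDATE, AFTER_GET }

    private static final class Stamped<T> {
        T value;
        long stamp; // time the TTL is measured from
        Stamped(T value, long stamp) { this.value = value; this.stamp = stamp; }
    }

    // Insertion order doubles as the FIFO queue: refreshed entries are re-inserted at the tail.
    private final LinkedHashMap<K, Stamped<V>> queue = new LinkedHashMap<>();
    private final long ttl;
    private final int maxSize;
    private final Policy policy;
    private final LongSupplier clock;

    ExpiringMap(long ttl, int maxSize, Policy policy, LongSupplier clock) {
        this.ttl = ttl; this.maxSize = maxSize; this.policy = policy; this.clock = clock;
    }

    void put(K key, V value) {
        long now = clock.getAsLong();
        expire(now);
        Stamped<V> old = queue.get(key);
        if (old != null && policy == Policy.AFTER_CREATE) {
            old.value = value; // creation time and queue position stay the same
            return;
        }
        queue.remove(key);                        // move (or add) to the tail
        queue.put(key, new Stamped<>(value, now)); // with a fresh timestamp
        Iterator<K> it = queue.keySet().iterator();
        while (queue.size() > maxSize) { it.next(); it.remove(); } // size limit: evict oldest first
    }

    V get(K key) {
        long now = clock.getAsLong();
        expire(now);
        Stamped<V> e = queue.get(key);
        if (e == null) return null;
        if (policy == Policy.AFTER_GET) { // a read restarts the TTL
            queue.remove(key);
            e.stamp = now;
            queue.put(key, e);
        }
        return e.value;
    }

    int size() {
        expire(clock.getAsLong());
        return queue.size();
    }

    // TTL limit: pop from the head until the first entry that is still young enough.
    private void expire(long now) {
        Iterator<Stamped<V>> it = queue.values().iterator();
        while (it.hasNext() && now - it.next().stamp >= ttl) it.remove();
    }
}
```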
EXPIRATION OVERFLOW
• HTreeMap supports Modification Listeners, which notify a listener about inserts, updates and removals. This makes it possible to link two collections: usually a faster in-memory map with a limited size and a slower on-disk map with unlimited size. After an entry expires from the in-memory map, the Modification Listener moves it to the on-disk map automatically. A Value Loader then loads values back into the in-memory map when map.get() does not find them there.
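The overflow pattern can be sketched with two plain maps. A bounded in-memory LinkedHashMap uses its removeEldestEntry hook to play the Modification Listener and copies evicted entries into a slower map. A get() miss then acts as the Value Loader. The sketch makes these assumptions:

- Size-based eviction stands in for every expiration criterion.
- An ordinary Map stands in for the on-disk HTreeMap.
- A loaded value also stays on disk.
- OverflowCache is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Two-tier model: bounded in-memory map overflowing into a slower map (not MapDB's code). */
final class OverflowCache<K, V> {
    private final Map<K, V> onDisk;
    private final LinkedHashMap<K, V> inMemory;

    OverflowCache(int maxInMemory, Map<K, V> onDisk) {
        this.onDisk = onDisk;
        this.inMemory = new LinkedHashMap<K, V>(16, 0.75f, true) { // access order, so LRU eviction
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() <= maxInMemory) return false;
                onExpired(eldest.getKey(), eldest.getValue()); // "listener": move entry to disk
                return true;
            }
        };
    }

    private void onExpired(K key, V value) { onDisk.put(key, value); }

    void put(K key, V value) { inMemory.put(key, value); }

    V get(K key) {
        V value = inMemory.get(key);
        if (value == null) {
            value = onDisk.get(key);                     // "value loader" on a miss
            if (value != null) inMemory.put(key, value); // promote back into memory
        }
        return value;
    }

    int inMemorySize() { return inMemory.size(); }
}
```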
BTREEMAP
• BTreeMap provides TreeMap and TreeSet for MapDB. It is based on a lock-free concurrent B-Linked-Tree. It offers great performance for small keys and has good vertical scalability.
• BTrees store all their keys and values as part of a btree node, so node size strongly affects performance. A large node means that many keys have to be deserialized on each lookup. A smaller node loads faster, but makes large BTrees deeper, so each operation has to visit more nodes. The default maximal node size is 32 entries. In MapDB 3.x it can be changed with the maxNodeSize(...) option when the map is created with db.treeMap(...).
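A rough model puts numbers on the node-size trade-off. It assumes completely full nodes. The depth is the smallest d with nodeSize^d ≥ entries, and each lookup deserializes one whole node per level. Real B-Linked-Tree nodes are only partly filled, so treat the results as orders of magnitude. BTreeCost is a hypothetical helper.

```java
/** Rough BTree cost model assuming full nodes (orders of magnitude only). */
final class BTreeCost {
    /** Smallest number of levels whose full capacity covers `entries`. */
    static int depth(long entries, int nodeSize) {
        if (nodeSize < 2) throw new IllegalArgumentException("nodeSize must be >= 2");
        int levels = 1;
        long capacity = nodeSize;
        while (capacity < entries) { // assumes entries is far below Long.MAX_VALUE
            capacity *= nodeSize;
            levels++;
        }
        return levels;
    }

    /** One node fetched per level, and each fetch deserializes nodeSize keys. */
    static long keysDeserializedPerLookup(long entries, int nodeSize) {
        return (long) depth(entries, nodeSize) * nodeSize;
    }
}
```

For one million entries the model gives:

- Node size 8: 7 levels and 56 deserialized keys.
- Node size 32 (the default): 4 levels and 128 keys.
- Node size 128: 3 levels and 384 keys.

Bigger nodes mean fewer node fetches but more deserialization per fetch.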
FRAGMENTATION
• The trade-off for the lock-free design is fragmentation after deletion: the B-Linked-Tree does not delete btree nodes once they become empty after entry removal. If you fill a BTreeMap and then remove all entries, about 40% of the space will not be released. Value updates (where the keys are kept) are not affected by this fragmentation.
VOLUME
• MapDB has its own storage abstraction, similar to ByteBuffer, called Volume, in the package org.mapdb.volume. It is growable, works beyond 2GB and has a number of tweaks to work better with MapDB.
VOLUME IMPLEMENTATION
• Volume over multiple byte[]. Use DBMaker.memoryDB()
• DirectByteBuffer for direct memory. Use DBMaker.memoryDirectDB()
• RandomAccessFile, the safer way to access files. It is enabled by default; use DBMaker.fileDB(file)
• FileChannel, a bit faster than RAF. Use DBMaker.fileDB(file).fileChannelEnable()
• MappedByteBuffer for memory-mapped files. Use DBMaker.fileDB(file).fileMMapEnable()
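The idea behind a Volume over multiple byte[] fits in a few lines. A long offset is split into a chunk index and an offset inside that chunk. Storage then grows chunk by chunk and is not capped at the 2GB limit of a single array. ChunkedVolume and its methods are hypothetical and do not reproduce the org.mapdb.volume API. A long is written byte by byte so that it can straddle two chunks.

```java
import java.util.Arrays;

/** Growable storage over multiple byte[] chunks, addressed by long offsets (hypothetical API). */
final class ChunkedVolume {
    private final int chunkShift; // chunk size = 2^chunkShift bytes (1MB would be 2^20)
    private final int chunkMask;
    private byte[][] chunks = new byte[0][];

    ChunkedVolume(int chunkShift) {
        if (chunkShift < 1 || chunkShift > 30) throw new IllegalArgumentException("chunkShift out of range");
        this.chunkShift = chunkShift;
        this.chunkMask = (1 << chunkShift) - 1;
    }

    /** Grows in whole chunks until at least `size` bytes are addressable. */
    void ensureAvailable(long size) {
        long chunkSize = 1L << chunkShift;
        int needed = Math.toIntExact((size + chunkSize - 1) >>> chunkShift);
        if (needed <= chunks.length) return;
        int old = chunks.length;
        chunks = Arrays.copyOf(chunks, needed);
        for (int i = old; i < needed; i++) chunks[i] = new byte[(int) chunkSize];
    }

    long capacity() { return (long) chunks.length << chunkShift; }

    void putByte(long offset, byte value) {
        chunks[(int) (offset >>> chunkShift)][(int) (offset & chunkMask)] = value;
    }

    byte getByte(long offset) {
        return chunks[(int) (offset >>> chunkShift)][(int) (offset & chunkMask)];
    }

    /** Big-endian long, written byte by byte so it may cross a chunk boundary. */
    void putLong(long offset, long value) {
        for (int i = 0; i < 8; i++) putByte(offset + i, (byte) (value >>> (56 - 8 * i)));
    }

    long getLong(long offset) {
        long v = 0;
        for (int i = 0; i < 8; i++) v = (v << 8) | (getByte(offset + i) & 0xFFL);
        return v;
    }
}
```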
SORTED TABLE MAP
• SortedTableMap is read-only and does not support updates. Changes should be applied by creating a new map with the Data Pump. The usual approach is to place changes in a secondary map and periodically merge the two maps into a new SortedTableMap.
• SortedTableMap does not use the DB object. Instead, it operates directly on a Volume (MapDB's abstraction over ByteBuffer).
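The update cycle for a read-only sorted table keeps changes in a secondary map and then merges both into a new table. It can be sketched with sorted arrays and a single-pass merge. The sketch makes these assumptions:

- Keys are Strings in natural order.
- A null value in the change map marks a deletion.
- Building from a SortedMap stands in for the Data Pump.
- SortedTable is a hypothetical class, not MapDB's SortedTableMap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

/** Read-only sorted table plus a merge that produces a new table (hypothetical model). */
final class SortedTable {
    private final String[] keys;
    private final String[] values;

    private SortedTable(String[] keys, String[] values) { this.keys = keys; this.values = values; }

    /** Builds a table from already-sorted data (the role the Data Pump plays). */
    static SortedTable from(SortedMap<String, String> data) {
        return new SortedTable(data.keySet().toArray(new String[0]), data.values().toArray(new String[0]));
    }

    String get(String key) {
        int i = Arrays.binarySearch(keys, key); // keys are sorted, so binary search works
        return i >= 0 ? values[i] : null;
    }

    int size() { return keys.length; }

    /** Single-pass merge with a naturally ordered change map; null values are deletions. */
    SortedTable mergeWith(SortedMap<String, String> changes) {
        List<String> k = new ArrayList<>();
        List<String> v = new ArrayList<>();
        Iterator<Map.Entry<String, String>> it = changes.entrySet().iterator();
        Map.Entry<String, String> c = it.hasNext() ? it.next() : null;
        int i = 0;
        while (i < keys.length || c != null) {
            int cmp = (c == null) ? -1 : (i == keys.length) ? 1 : keys[i].compareTo(c.getKey());
            if (cmp < 0) { // only in the old table: copy it
                k.add(keys[i]);
                v.add(values[i]);
                i++;
            } else {       // in the change set: new key or override
                if (c.getValue() != null) { k.add(c.getKey()); v.add(c.getValue()); }
                if (cmp == 0) i++; // skip the superseded old entry
                c = it.hasNext() ? it.next() : null;
            }
        }
        return new SortedTable(k.toArray(new String[0]), v.toArray(new String[0]));
    }
}
```

The old table is never modified. Readers can keep using it until the merged table replaces it.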
COMPOSITE KEYS AND TUPLES
• MapDB allows composite keys in the form of Object[]. Interval submaps can be used to fetch tuple subcomponents or to create a simple form of multimap. An Object array is not comparable, so you need a specialized serializer that provides a comparator.
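A plain TreeMap is enough to show how Object[] keys and interval submaps give a simple multimap. The comparator works component by component and sorts a prefix before its extensions. It treats a sentinel HI object as larger than anything, so the range [prefix, prefix + HI) holds exactly the keys that start with the prefix. TupleKeys, HI and prefixSubMap are hypothetical names. In MapDB the comparator would come from the array/tuple serializer.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.NavigableMap;

/** Object[] composite keys with a comparator and a prefix range query (hypothetical helper). */
final class TupleKeys {
    /** Sentinel that sorts after every real component ("positive infinity"). */
    static final Object HI = new Object();

    @SuppressWarnings({"unchecked", "rawtypes"})
    static final Comparator<Object[]> COMPARATOR = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            Object x = a[i], y = b[i];
            if (x == y) continue;
            if (x == HI) return 1;
            if (y == HI) return -1;
            int c = ((Comparable) x).compareTo(y); // components must be mutually Comparable
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length); // a prefix sorts before its extensions
    };

    /** All entries whose key starts with the given components. */
    static <V> NavigableMap<Object[], V> prefixSubMap(NavigableMap<Object[], V> map, Object... prefix) {
        Object[] upper = Arrays.copyOf(prefix, prefix.length + 1);
        upper[prefix.length] = HI;
        return map.subMap(prefix, true, upper, false);
    }
}
```

For example, keys {user, orderId} stored in one sorted map let prefixSubMap(map, "alice") return all of Alice's orders, which gives the multimap behaviour mentioned on the slide.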
TRANSACTIONS AND CRASH PROTECTION
• Transactions are disabled by default. To enable them, call transactionEnable() explicitly:
• DB db = DBMaker.fileDB(dbName).transactionEnable().make();
• Enabling transactions also enables the WAL (write-ahead log), which protects the file against corruption and makes transactions atomic and durable.
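The write-ahead principle behind crash protection can be modelled in memory. On commit, every change is first appended to a log followed by a COMMIT marker, and only then applied to the main store. On recovery, complete transactions are replayed and a tail without a marker is discarded. This is a toy with three simplifications:

- Lists and maps stand in for files.
- There is no fsync.
- The marker string is hypothetical.

MapDB's actual WAL format and checkpointing differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy write-ahead log: log first, apply second, replay only committed batches on recovery. */
final class WriteAheadLog {
    static final String COMMIT = "\u0000COMMIT"; // hypothetical marker; assumes no user key equals it

    private final List<String[]> log = new ArrayList<>();              // stands in for the WAL file
    private final Map<String, String> store = new HashMap<>();         // stands in for the main file
    private final Map<String, String> pending = new LinkedHashMap<>(); // uncommitted changes

    void put(String key, String value) { pending.put(key, value); }

    String get(String key) { return pending.containsKey(key) ? pending.get(key) : store.get(key); }

    void rollback() { pending.clear(); }

    void commit() {
        // 1. log every change, then the marker (a real WAL would fsync at this point)
        for (Map.Entry<String, String> e : pending.entrySet()) log.add(new String[]{e.getKey(), e.getValue()});
        log.add(new String[]{COMMIT, null});
        // 2. only now touch the main store
        store.putAll(pending);
        pending.clear();
    }

    List<String[]> logSnapshot() { return new ArrayList<>(log); }

    /** Rebuilds the store from a log: complete transactions are replayed, a torn tail is dropped. */
    static Map<String, String> recover(List<String[]> log) {
        Map<String, String> result = new HashMap<>();
        Map<String, String> batch = new LinkedHashMap<>();
        for (String[] record : log) {
            if (COMMIT.equals(record[0])) { result.putAll(batch); batch.clear(); }
            else batch.put(record[0], record[1]);
        }
        return result; // anything left in `batch` never reached a commit marker
    }
}
```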
MEMORY MAPPED FILES
• By default MapDB uses a slower but safer disk access mode called Random-Access-File (RAF).
• Memory-mapped (mmap) files are much faster than RAF. The exact speed-up depends on the operating system and disk cache management, but is typically between 10% and 300%.
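Using only the JDK, the sketch below writes a value through a memory-mapped buffer (FileChannel.map) and reads it back through RandomAccessFile, the two access paths compared above. It illustrates the mechanism only and does not measure the speed difference. MmapDemo is a hypothetical name.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Write through an mmap'ed buffer, read back through RandomAccessFile (JDK only). */
final class MmapDemo {
    static long writeMappedReadViaRaf(long value) {
        try {
            Path file = Files.createTempFile("mmap-demo", ".bin");
            file.toFile().deleteOnExit(); // some OSes refuse to delete a file while it is still mapped
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // mapping past the end of the file grows it to 4096 bytes
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                buf.putLong(0, value); // a plain memory store, big-endian by default
                buf.force();           // flush dirty pages to the file, comparable to a sync on commit
            }
            try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
                return raf.readLong(); // RandomAccessFile reads big-endian too
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```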
FILE CHANNEL
• FileChannel is faster than RandomAccessFile but has a bit more overhead. It also works better under concurrent access, because RAF has a global lock.
• DB db = DBMaker.fileDB(file).fileChannelEnable().make();
• FileChannel has caused problems in combination with Thread.interrupt(): if a thread is interrupted while doing I/O, the underlying channel is closed for all other threads.
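The Thread.interrupt() problem is standard JDK behaviour and can be reproduced directly. FileChannel is an interruptible channel, so interrupting a thread during I/O closes the channel itself, and every other thread that shares it then fails. The sketch interrupts the current thread just before a write to simulate an interrupt that arrives mid-I/O. InterruptPitfall is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Shows that an interrupt during FileChannel I/O closes the channel for everyone. */
final class InterruptPitfall {
    /** Returns true if the interrupted write closed the shared channel. */
    static boolean interruptClosesSharedChannel() {
        try {
            Path file = Files.createTempFile("interrupt-demo", ".bin");
            file.toFile().deleteOnExit();
            try (FileChannel shared = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                Thread.currentThread().interrupt(); // simulate an interrupt arriving during I/O
                try {
                    shared.write(ByteBuffer.wrap(new byte[]{1, 2, 3}), 0); // positional write
                    return false;                   // not expected: the write went through
                } catch (ClosedByInterruptException expected) {
                    return !shared.isOpen();        // closed for every thread, not only this one
                } finally {
                    Thread.interrupted();           // clear the flag so the caller is unaffected
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```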
IN MEMORY MODE
• There are three in-memory modes:
• Heap DB - very fast and useful for small datasets, but affected by GC activity. It stores objects in a Map<recid,Object> and does not use serialization.
• Memory DB - store based on byte[]. Data are serialized and stored in 1MB byte[] arrays. Technically this is still on-heap, but it adds little GC overhead, since the stored data are not visible to the GC as individual objects. This mode is recommended by default, because it needs no additional JVM settings: increasing the maximal heap memory with the -Xmx JVM parameter (for example -Xmx10G) is enough.
• Memory Direct DB - store based on DirectByteBuffer. Data are stored completely off-heap, in 1MB DirectByteBuffers created with ByteBuffer.allocateDirect(size). You should increase the maximal direct memory with the JVM parameter -XX:MaxDirectMemorySize. This mode allows you to shrink the maximal heap to a very small size (-Xmx128M). A small heap usually gives better and more predictable performance.
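The Memory DB idea serializes values into a large byte[] and refers to them by record id. It can be sketched as follows, and the GC then tracks one array rather than one object per value. The sketch makes these assumptions:

- The record id is simply the byte offset.
- Each record is a 4-byte length followed by UTF-8 bytes.
- The array grows by copying, whereas the slide describes MapDB using fixed 1MB byte[] chunks.
- ByteArrayStore is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Values serialized into one byte[] and addressed by record id (hypothetical model). */
final class ByteArrayStore {
    private byte[] data = new byte[1024];
    private int used = 0;

    /** Serializes the value as [int length][UTF-8 bytes] and returns its record id. */
    long put(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int needed = used + 4 + bytes.length;
        if (needed > data.length) data = Arrays.copyOf(data, Math.max(needed, data.length * 2));
        long recid = used;
        ByteBuffer.wrap(data).putInt(used, bytes.length);
        System.arraycopy(bytes, 0, data, used + 4, bytes.length);
        used = needed;
        return recid;
    }

    /** Deserializes a fresh String on every call; the store holds only bytes. */
    String get(long recid) {
        int offset = Math.toIntExact(recid);
        int length = ByteBuffer.wrap(data).getInt(offset);
        return new String(data, offset + 4, length, StandardCharsets.UTF_8);
    }

    int bytesUsed() { return used; }
}
```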
ALLOCATION OPTIONS
• By default MapDB tries to minimize space usage and allocates space in 1MB increments. These repeated allocations might be slower than a single large allocation. There are two options to control the initial storage size and the size increment (allocateStartSize() and allocateIncrement() on DBMaker in MapDB 3.x). The allocation increment also affects performance with mmap files. MapDB maps the file as a series of DirectByteBuffers, each equal in size to the Size Increment (1MB by default), so a larger Size Increment means fewer buffers for the same store size. Operations such as sync, flush and close have to traverse all buffers, so a larger Size Increment can speed up commit and close operations.
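The effect of the Size Increment on mmap files is simple arithmetic. If the file is mapped as one buffer per increment, sync, flush and close walk ceil(storeSize / increment) buffers. AllocationMath is a hypothetical helper that makes no MapDB calls.

```java
/** Number of mapped buffers for a given store size and size increment. */
final class AllocationMath {
    static final long MB = 1024L * 1024;
    static final long GB = 1024 * MB;

    /** ceil(storeSize / increment): buffers that sync, flush and close must traverse. */
    static long mappedBuffers(long storeSize, long increment) {
        if (increment <= 0) throw new IllegalArgumentException("increment must be positive");
        return (storeSize + increment - 1) / increment;
    }
}
```

A 10GB store mapped in the default 1MB increments needs 10,240 buffers. A 512MB increment reduces that to 20.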
PARTITIONING OPTIONS
• To achieve concurrency, the hash map (HTreeMap) is split into segments. Each segment is a separate HashMap with its own ReadWriteLock. The segment number is calculated from the key's hash. When expiration is enabled, each segment has its own Expiration Queue.
QUICK TIPS
• Memory-mapped files are much faster and should be enabled on 64-bit systems for better performance.
• MapDB has a Pump for fast bulk import of collections. It is much faster than calling Map.put() for each entry.
• Transactions have a performance overhead, but without them the store can become corrupted if it is not closed properly.
• Data stored in MapDB (keys and values) should be immutable, because MapDB serializes objects behind the scenes.
• MapDB sometimes needs compaction. Run DB.compact() or see the background compaction options.
• Prefer a specific serializer (e.g. Serializer.STRING). Otherwise a slower generic serializer is used.
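The immutability tip can be demonstrated with two toy stores. One serializes on put(), like the byte[] and file modes. The other keeps the live reference, like Heap DB, which the slides describe as storing objects without serialization. Mutating an object after storing it is silently lost in the first store and leaks into the second. All class names are hypothetical, and this is not MapDB code.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/** Why stored values should be immutable: snapshot stores and reference stores disagree. */
final class MutationPitfall {
    /** A deliberately mutable value type. */
    static final class Counter {
        int value;
        Counter(int value) { this.value = value; }
    }

    /** Keeps a byte snapshot taken at put() time (serialize on put, deserialize on get). */
    static final class SerializingStore {
        private final Map<String, byte[]> bytes = new HashMap<>();
        void put(String key, Counter c) { bytes.put(key, ByteBuffer.allocate(4).putInt(c.value).array()); }
        Counter get(String key) { return new Counter(ByteBuffer.wrap(bytes.get(key)).getInt()); }
    }

    /** Keeps the live object reference, with no serialization. */
    static final class HeapStore {
        private final Map<String, Counter> refs = new HashMap<>();
        void put(String key, Counter c) { refs.put(key, c); }
        Counter get(String key) { return refs.get(key); }
    }
}
```

With immutable values both stores behave the same, because the only way to change a value is to put() a new one.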
REFERENCE
• MapDB github
• MapDB Manual
THANKS