TileDB webinars
The TileDB Embedded
Storage Engine
Founder & CEO of TileDB, Inc.
Dr. Stavros Papadopoulos
Who is this webinar for?
Those wanting to learn about data storage fundamentals
Layout, compression, IO, etc.
Those looking to efficiently store/access any kind of data to/from anywhere
Dataframes, genomics, LiDAR, SAR, weather, and more, with a single engine
Those tired of managing custom, inefficient data formats
Formats not supporting fast updates, indexing, versioning, cloud performance
Disclaimer
I am the exclusive recipient of complaints
Email me at: stavros@tiledb.com
All the credit for our amazing work goes to our powerful team
Check it out at https://tiledb.com/about
Deep roots at the intersection of HPC, databases and data science
Traction with telecoms, pharmas, hospitals and other scientific organizations
40 members with expertise across all applications and domains
Who we are
TileDB was spun out of MIT and Intel Labs in 2017
WHERE IT ALL STARTED
Raised over $20M, we are very well capitalized
INVESTORS
What is TileDB Embedded?
An embeddable C++ library that stores and accesses multi-dimensional arrays
Dense array Sparse array
It implements very fast array slicing across dimensions
Superior
performance
Built in C++
Fully-parallelized
Columnar format
Multiple compressors
R-trees for sparse arrays
TileDB Embedded at a Glance
https://github.com/TileDB-Inc/TileDB
Open source:
Rapid updates
& data versioning
Immutable writes
Lock-free
Parallel reader / writer model
Time traveling
Schema evolution
TileDB Embedded at a Glance
https://github.com/TileDB-Inc/TileDB
Open source:
Extreme
interoperability
Numerous APIs
Numerous integrations
All backends
Optimized
for the cloud
Immutable writes
Parallel IO
Minimization of requests
TileDB Embedded at a Glance
APIs & tool integrations with zero-copy where possible
TileDB Embedded
Open-source interoperable
storage with a universal
open-spec array format
● Parallel IO, rapid reads & writes
● Columnar, cloud-optimized
● Data versioning & time traveling
Why arrays?
The basics
Advanced internal mechanics
Examples
Work in progress
Agenda
Comparison to other formats and engines
Docs at docs.tiledb.com
Byte 0 1 ...
Regardless of what kind of data you have, it is laid out in a 1D storage medium
Why Arrays?
Regardless of what kind of algorithm you run, it involves a set of slices
Algorithm as a task graph, where each task may slice
Why Arrays?
Byte 0 1 ...
Byte 0 1 ...
Performance is absolutely dictated by the slice result locality on the 1D medium
Why Arrays?
Arrays provide a flexible way to map/slice any-dimensional (ND) data to/from a 1D layout
Giving different “importance” to different dimensions (order and tiling)
Choosing whether dimension coordinates should be materialized or not (dense vs. sparse)
Considering compression, encryption and other filters (tiling)
Abstracting all the engineering magic that it takes to make everything very fast (engine)
Unifying the data model for all application domains! (universality)
Building indices for fast search (e.g., R-trees)
Arrays Subsume Dataframes
Sparse array
Dataframe
Dense vector
Arrays Are Universal
What else can be modeled as an array?
LiDAR 3D sparse)
SAR 2D or 3D dense)
Population genomics (3D sparse)
Single-cell genomics (2D dense or sparse)
Biomedical imaging (2D or 3D dense)
Even flat files! (1D dense)
Time series (ND dense or sparse)
Weather (2D or 3D dense)
Graphs (2D sparse)
Video (3D dense)
Key-values (1D or ND sparse)
The Basics
dense_array1
├── __t2_t2_uuid1_v
│ ├── __fragment_metadata.tdb
│ └── a0.tdb
├── __t2_t2_uuid1.ok
├── __lock.tdb
├── __meta
└── __schema
└── __t1_t1_uuid2
A Simple 2D Dense Array
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
(In the directory tree above: __t2_t2_uuid1_v is a fragment, __schema holds the array schema, and a0.tdb holds the attribute data)
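As a rough sketch of how such an array is created, written and sliced, here is what this could look like with the Python API (tiledb-py is assumed; the array name and values mirror the figure above, and exact slicing semantics may differ slightly across versions):

```python
import numpy as np
import tiledb

# 4x4 dense array with 2x2 space tiles and a single int32 attribute "a0".
dom = tiledb.Domain(
    tiledb.Dim(name="rows", domain=(1, 4), tile=2, dtype=np.int32),
    tiledb.Dim(name="cols", domain=(1, 4), tile=2, dtype=np.int32),
)
schema = tiledb.ArraySchema(
    domain=dom, sparse=False, attrs=[tiledb.Attr(name="a0", dtype=np.int32)]
)
tiledb.Array.create("dense_array1", schema)

# Each write creates a timestamped fragment directory like __t2_t2_uuid1_v above.
with tiledb.open("dense_array1", mode="w") as A:
    A[:] = np.arange(1, 17, dtype=np.int32).reshape(4, 4)

# Slice rows 2-3 and columns 1-2.
with tiledb.open("dense_array1", mode="r") as A:
    print(A[2:4, 1:3]["a0"])
```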
A Simple 2D Sparse Array
sparse_array1
├── __t2_t2_uuid2_v
│ ├── __fragment_metadata.tdb
│ ├── a0.tdb
│ ├── d0.tdb
│ └── d1.tdb
├── __t2_t2_uuid2_v.ok
├── __lock.tdb
├── __meta
└── __schema
└── __t1_t2_uuid1
1 2
3
4 5
6
(In the directory tree above: __t2_t2_uuid2_v is a fragment, __schema holds the array schema, a0.tdb holds the attribute data, and d0.tdb / d1.tdb hold the coordinates)
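A corresponding sketch for a 2D sparse array (again assuming tiledb-py); only the non-empty cells and their explicit coordinates are written:

```python
import numpy as np
import tiledb

# 4x4 sparse array; only non-empty cells are stored, along with their coordinates.
dom = tiledb.Domain(
    tiledb.Dim(name="d0", domain=(1, 4), tile=2, dtype=np.int32),
    tiledb.Dim(name="d1", domain=(1, 4), tile=2, dtype=np.int32),
)
schema = tiledb.ArraySchema(
    domain=dom, sparse=True, attrs=[tiledb.Attr(name="a0", dtype=np.int32)]
)
tiledb.Array.create("sparse_array1", schema)

# Coordinates go to d0.tdb / d1.tdb, attribute values to a0.tdb.
with tiledb.open("sparse_array1", mode="w") as A:
    A[[1, 2, 4], [1, 4, 3]] = np.array([1, 2, 6], dtype=np.int32)

# Reading a slice returns the attribute values plus the coordinates of the non-empty cells.
with tiledb.open("sparse_array1", mode="r") as A:
    print(A[1:3, 1:5])
```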
Groups
dense_group
├── __tiledb_group.tdb
└── nested_group
├── __tiledb_group.tdb
└── dense_array1
├── __lock.tdb
├── __meta
└── __schema
Groups provide an easy way to hierarchically organize arrays
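A minimal sketch of creating such a hierarchy with the Python API (the classic tiledb.group_create call is assumed; newer versions also offer a tiledb.Group class):

```python
import tiledb

# Create a group and a nested group; arrays can then be created inside them.
tiledb.group_create("dense_group")
tiledb.group_create("dense_group/nested_group")
# e.g. tiledb.Array.create("dense_group/nested_group/dense_array1", schema)
```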
Array Metadata
dense_array1
├── __t2_t2_uuid2_v
│ ├── __fragment_metadata.tdb
│ └── a0.tdb
├── __t2_t2_uuid2_v.ok
├── __lock.tdb
├── __meta
│ └── __t3_t3_uuid3
└── __schema
└── __t1_t2_uuid1
You can attach any number of (key, value) pairs to an array
The key must be a string, and the value can be anything
(Array metadata is stored under the __meta directory)
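A small sketch of attaching and reading back array metadata (assuming the dense_array1 created earlier):

```python
import tiledb

# Attach (key, value) pairs; keys are strings, values can be numbers, strings, etc.
with tiledb.open("dense_array1", mode="w") as A:
    A.meta["description"] = "example dense array"
    A.meta["version"] = 3

with tiledb.open("dense_array1", mode="r") as A:
    print(A.meta["description"], A.meta["version"])
```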
Multiple Attributes
dense_array1
├── __t2_t2_uuid1_v
│ ├── __fragment_metadata.tdb
│ └── a0.tdb
│ └── a1.tdb
├── __t2_t2_uuid1.ok
├── __lock.tdb
├── __meta
└── __schema
└── __t1_t1_uuid2
1,a 2,b 3,c 4,d
5,e 6,f 7,g 8,h
9,i 10,j 11,k 12,l
13,m 14,n 15,o 16,p
You can store more than one value in each cell, even of different types
TileDB has a “columnar” format that allows you to efficiently subselect on attributes
(a0.tdb and a1.tdb hold the data of the two attributes)
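A sketch of a two-attribute array and an attribute subselection (tiledb-py assumed; a float attribute is used here instead of the characters in the figure, to keep the dtypes simple):

```python
import numpy as np
import tiledb

dom = tiledb.Domain(
    tiledb.Dim(name="rows", domain=(1, 4), tile=2, dtype=np.int32),
    tiledb.Dim(name="cols", domain=(1, 4), tile=2, dtype=np.int32),
)
schema = tiledb.ArraySchema(
    domain=dom, sparse=False,
    attrs=[tiledb.Attr(name="a0", dtype=np.int32),     # stored in a0.tdb
           tiledb.Attr(name="a1", dtype=np.float64)],  # stored in a1.tdb
)
tiledb.Array.create("dense_array_multi", schema)

# Write both attributes in a single write.
with tiledb.open("dense_array_multi", mode="w") as A:
    A[:] = {"a0": np.arange(1, 17, dtype=np.int32).reshape(4, 4),
            "a1": np.linspace(0.1, 1.6, 16).reshape(4, 4)}

# Columnar format: read only "a1" without touching a0.tdb.
with tiledb.open("dense_array_multi", mode="r") as A:
    print(A.query(attrs=["a1"])[1:3, 1:3])
```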
Var-length Attributes
dense_array3
├── __t2_t2_uuid1_v
│ ├── __fragment_metadata.tdb
│ └── a0.tdb
│ └── a0_var.tdb
├── __t2_t2_uuid1.ok
├── __lock.tdb
├── __meta
└── __schema
└── __t1_t1_uuid2
TileDB supports storing variable-length values in a cell (of any data type)
a bb ccc dddd
e ff ggg hhhh
i jj kkk lll
m nn ooo pppp
(a0.tdb holds the offsets and a0_var.tdb the var-length data)
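A sketch of a var-length (string) attribute, mirroring the a0.tdb / a0_var.tdb split above (tiledb-py assumed; a flexible bytes dtype is used for the variable-length values):

```python
import numpy as np
import tiledb

dom = tiledb.Domain(tiledb.Dim(name="d", domain=(1, 4), tile=2, dtype=np.int32))
# A flexible bytes dtype yields a var-length attribute (offsets + data files).
schema = tiledb.ArraySchema(
    domain=dom, sparse=False, attrs=[tiledb.Attr(name="a0", dtype=np.bytes_)]
)
tiledb.Array.create("dense_array3", schema)

with tiledb.open("dense_array3", mode="w") as A:
    A[:] = np.array([b"a", b"bb", b"ccc", b"dddd"], dtype=object)

with tiledb.open("dense_array3", mode="r") as A:
    print(A[1:3]["a0"])
```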
Var-length Dimensions
sparse_array4
├── __t2_t2_uuid1_v
│ ├── __fragment_metadata.tdb
│ └── a0.tdb
│ └── d0.tdb
│ └── d0_var.tdb
├── __t2_t2_uuid1.ok
├── __lock.tdb
├── __meta
└── __schema
└── __t1_t1_uuid2
You can also have var-length dimensions and slice naturally using string ranges
Applicable only to sparse arrays
(d0.tdb holds the offsets and d0_var.tdb the var-length string coordinates, e.g., a, bb, ccc, dddd, e, ff; the domain is unbounded and may contain infinite gaps)
Heterogeneous Dimensions
Sparse arrays allow you to have dimensions of different types
For example, a 2D sparse array with an infinite (unbounded) string dimension and an infinite float32 dimension allows efficient slicing on both
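A sketch combining a var-length string dimension with a float32 dimension and slicing on both (tiledb-py assumed; the finite float domain and the multi_index string-range syntax are illustrative and may differ slightly across versions):

```python
import numpy as np
import tiledb

dom = tiledb.Domain(
    # Var-length string dimension (unbounded domain).
    tiledb.Dim(name="name", domain=(None, None), tile=None, dtype="ascii"),
    # float32 dimension (a finite domain is used here for simplicity).
    tiledb.Dim(name="x", domain=(0.0, 1.0), tile=0.5, dtype=np.float32),
)
schema = tiledb.ArraySchema(
    domain=dom, sparse=True, attrs=[tiledb.Attr(name="a0", dtype=np.int32)]
)
tiledb.Array.create("sparse_hetero", schema)

with tiledb.open("sparse_hetero", mode="w") as A:
    A[["a", "bb", "dddd"], [0.0, 0.4, 1.0]] = np.array([1, 2, 4], dtype=np.int32)

# Slice with a string range on one dimension and a float range on the other.
with tiledb.open("sparse_hetero", mode="r") as A:
    print(A.multi_index["a":"cc", 0.0:0.5])
```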
Arrays as Dataframes
An array is essentially a dataframe
where dimensions are special (they are “indexed”)
What About Cloud Object Stores?
array_name → {s3,azure,gcs,tiledb}://path/array_name
Everything demonstrated works as-is on the cloud
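For example, with an S3 bucket (bucket name and region are illustrative; credentials are picked up from the usual AWS configuration):

```python
import tiledb

# Point the same code at an object store simply by changing the URI.
cfg = tiledb.Config({"vfs.s3.region": "us-east-1"})
ctx = tiledb.Ctx(cfg)

with tiledb.open("s3://my-bucket/dense_array1", mode="r", ctx=ctx) as A:
    print(A[2:4, 1:3]["a0"])
```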
Tiling & Layout
Tiling | Dense Arrays
Without space tiling (a single tile covering the array), even a small slice fetches the whole array from storage
With space tiles, defined by the space tile extents, the same slice fetches only a portion of the array (a tile)
A space tile is the atomic unit of IO
(Figure: the same 4×4 array with values 1-16, without and with 2×2 space tiles)
Cell Layout | Dense Arrays
Three parameters define the layout of the values on storage, called the global order:
Space tile extents
Tile order/layout (row-major or column-major)
Cell order/layout (row-major or column-major)
(Figure: three example global orders: row-major tile order with row-major cell order on 2×2 space tiles; col-major tile order with row-major cell order on 2×2 space tiles; row-major tile order with col-major cell order on 4×2 space tiles)
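A sketch of setting all three parameters in the array schema (tiledb-py assumed; this reproduces the second example above, col-major tile order with row-major cell order on 2×2 space tiles):

```python
import numpy as np
import tiledb

dom = tiledb.Domain(
    tiledb.Dim(name="rows", domain=(1, 4), tile=2, dtype=np.int32),  # space tile extent 2
    tiledb.Dim(name="cols", domain=(1, 4), tile=2, dtype=np.int32),  # space tile extent 2
)
schema = tiledb.ArraySchema(
    domain=dom, sparse=False,
    tile_order="col-major", cell_order="row-major",
    attrs=[tiledb.Attr(name="a0", dtype=np.int32)],
)
tiledb.Array.create("dense_layout_example", schema)
```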
Tiling & Cell Layout | Sparse Arrays
Sparse arrays store only non-empty cells
Grouping non-empty cells with space tiles would be inefficient (due to potential skew)
The atomic unit of IO in sparse arrays is the data tile, of fixed (user-defined) capacity
First impose a global order similar to dense arrays, then group based on capacity
(Figure: with col-major tile order, row-major cell order and 2×2 space tiles, the non-empty cells are sorted in the global order and then grouped into data tiles of capacity 2 or capacity 4)
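A sketch of the sparse counterpart, where the space tiles define the global order and the capacity defines the data tile size (tiledb-py assumed):

```python
import numpy as np
import tiledb

dom = tiledb.Domain(
    tiledb.Dim(name="d0", domain=(1, 4), tile=2, dtype=np.int32),
    tiledb.Dim(name="d1", domain=(1, 4), tile=2, dtype=np.int32),
)
schema = tiledb.ArraySchema(
    domain=dom, sparse=True,
    tile_order="col-major", cell_order="row-major",
    capacity=2,  # each data tile (the sparse IO unit) holds 2 non-empty cells
    attrs=[tiledb.Attr(name="a0", dtype=np.int32)],
)
tiledb.Array.create("sparse_capacity_example", schema)
```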
Hilbert Order | Sparse Arrays
Space tiles greatly affect the cell layout in sparse arrays
Sometimes it is very difficult to define a good space tiling (especially with floats and strings)
For such cases, the Hilbert order is ideal (no tile extents or tile order needed)
For floats we discretize the domain into buckets
based on the number of dimensions
For strings we assign a number of bits per dimension
and then use the string prefixes as numbers
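A sketch of a sparse schema using the Hilbert cell order, where no tile extents or tile order need to be specified (tiledb-py assumed):

```python
import numpy as np
import tiledb

dom = tiledb.Domain(
    tiledb.Dim(name="x", domain=(0.0, 1.0), dtype=np.float32),
    tiledb.Dim(name="y", domain=(0.0, 1.0), dtype=np.float32),
)
schema = tiledb.ArraySchema(
    domain=dom, sparse=True,
    cell_order="hilbert",  # cells are sorted along a Hilbert curve
    capacity=1000,         # only the data tile capacity is needed
    attrs=[tiledb.Attr(name="a0", dtype=np.float64)],
)
tiledb.Array.create("sparse_hilbert_example", schema)
```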
Tile Filters
TileDB allows a wide range of filters to be applied to each tile prior to its storage
Compressors (gzip, zstd, bzip2, …)
Checksums
Encryption
The atomic unit of filtering is the chunk (typically equal to the L1 cache size)
TileDB applies the filters across chunks in parallel in a pipeline
(Figure: a tile is divided into chunks, and each chunk passes through the filter pipeline, e.g., zstd compression followed by AES-256 encryption)
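A sketch of attaching a per-attribute filter pipeline (tiledb-py assumed; the exact filter class names may vary slightly across versions):

```python
import numpy as np
import tiledb

# Each tile of "a0" is chunked; the chunks pass through zstd, then a checksum filter.
filters = tiledb.FilterList(
    [tiledb.ZstdFilter(level=5), tiledb.ChecksumSHA256Filter()]
)

dom = tiledb.Domain(tiledb.Dim(name="d", domain=(0, 999), tile=100, dtype=np.int64))
schema = tiledb.ArraySchema(
    domain=dom, sparse=False,
    attrs=[tiledb.Attr(name="a0", dtype=np.float64, filters=filters)],
)
tiledb.Array.create("filtered_array_example", schema)
```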
Advanced
Internal Mechanics
Versioning and Time Traveling
In TileDB, every write is immutable
Each (batch) write creates a timestamped fragment
With fragments, TileDB implements
versioning and time traveling
Versioning and Time Traveling | Dense Arrays
(Figure: the full 4×4 array with values 1-16 is written at t1, and a second write at t2 updates a few cells with the values 100, 200, 500 and 600. Reading at [0, t1] returns the original array, reading at [0, t2] returns the array with the t2 updates applied, and reading at (t1, t2] returns only the cells written at t2, with all other cells empty)
Versioning and Time Traveling | Sparse Arrays
(Figure: six non-empty cells with values 1-6 are written at t1, and a second write at t2 updates two of them with the values 100 and 40. When duplicates are not allowed, reading at [0, t1] returns the original cells, reading at [0, t2] returns the cells with the t2 values replacing the older ones, and reading at (t1, t2] returns only the cells written at t2. When duplicates are allowed, reading at [0, t2] returns both the old and the new values for the updated cells)
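A sketch of time traveling on the dense_array1 created earlier (tiledb-py assumed; timestamps are taken on the client side here for illustration, in milliseconds):

```python
import time
import numpy as np
import tiledb

# Two immutable writes, each producing a timestamped fragment.
with tiledb.open("dense_array1", mode="w") as A:
    A[1:2, 1:5] = np.array([[100, 200, 300, 400]], dtype=np.int32)
t1 = int(time.time() * 1000)

with tiledb.open("dense_array1", mode="w") as A:
    A[2:3, 1:5] = np.array([[500, 600, 700, 800]], dtype=np.int32)
t2 = int(time.time() * 1000)

# Read the array as of t1 (interval [0, t1]); the second write is invisible.
with tiledb.open("dense_array1", mode="r", timestamp=t1) as A:
    print(A[:]["a0"])

# Read only the fragments written in (t1, t2].
with tiledb.open("dense_array1", mode="r", timestamp=(t1 + 1, t2)) as A:
    print(A[:]["a0"])
```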
Indexing
TileDB has a three-level indexing approach
Fragment timestamps (in the fragment names) for time traveling
Non-empty domain in each fragment’s metadata
Either simple offset arithmetic (dense) or R-trees (sparse)
Algorithm
1. Get the list of fragment names (those with a .ok file), e.g.:
t1_t1_uuid1_v
t2_t2_uuid2_v
...
2. Ignore fragments with a timestamp not in the time traveling interval
3. Ignore fragments whose non-empty domain (stored in __fragment_metadata.tdb) does not overlap the slice
4a. Ignore dense tiles via implicit positional indexing, or
4b. Ignore sparse tiles from the R-tree that do not overlap the slice
Indexing
Given the non-empty domain, the space tile extents and the tile order, we can easily find which space tiles a slice overlaps in a dense fragment (e.g., the second and fourth tile in the example)
For a sparse fragment, a slicing query traverses the R-tree (stored in the fragment metadata) top-down, visiting only the nodes/MBRs that intersect the slice
(Figure: a dense 4×4 array with row-major tile order and 2×2 space tiles, and a sparse array with col-major tile order, row-major cell order, 2×2 space tiles and capacity 2, whose data tile MBRs MBR1-MBR4 form the leaves of the R-tree)
Consolidation & Vacuuming
Numerous fragments can lead to performance degradation (loss of locality, expensive listing)
TileDB supports two levels of consolidation
Fragment metadata (group the non-empty domains in a single place)
Fragments (better preserve data locality)
Old fragments are preserved after consolidation (for time traveling)
TileDB can vacuum old fragments to save space and boost listing
Time traveling will not work on vacuumed fragments
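A sketch of the two consolidation levels and vacuuming (tiledb-py assumed; config key names follow the TileDB "sm.consolidation.*" convention):

```python
import tiledb

# Consolidate only the fragment metadata (cheap; very useful on object stores).
tiledb.consolidate("dense_array1",
                   config=tiledb.Config({"sm.consolidation.mode": "fragment_meta"}))

# Consolidate the fragments themselves to restore data locality.
tiledb.consolidate("dense_array1",
                   config=tiledb.Config({"sm.consolidation.mode": "fragments"}))

# Old fragments remain available for time traveling until vacuumed.
tiledb.vacuum("dense_array1")
```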
Attribute Filter Push-Down
TileDB supports pushing attribute filter conditions down to the engine
That typically boosts performance
Much less data gets copied around
More L1-cache conscious
More opportunities for parallelism and vectorization
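A sketch of pushing an attribute condition down to the engine (tiledb-py assumed; the cond parameter accepts a condition string in recent versions, while older versions use tiledb.QueryCondition):

```python
import tiledb

# The condition is evaluated inside the engine, so only matching cells are returned.
with tiledb.open("sparse_array1", mode="r") as A:
    result = A.query(cond="a0 > 3", attrs=["a0"])[:]
    print(result)
```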
Schema Evolution
TileDB supports schema evolution (since v2.4)
Adding an attribute
Dropping an attribute
More schema evolution features are coming up
Full versioning and time traveling are supported
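A sketch of evolving an existing schema in place (tiledb-py assumed; method names may differ slightly across versions):

```python
import numpy as np
import tiledb

se = tiledb.ArraySchemaEvolution()
se.add_attribute(tiledb.Attr(name="a2", dtype=np.float64))  # add a new attribute
se.drop_attribute("a1")                                     # drop an existing one
se.array_evolve("dense_array_multi")
```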
Notes on Writing
Lots of flexibility: writing in different orders, to different domain subarrays, etc.
Support for lock-free, parallel writing
Tips for performance:
Each tile should be 100KB - 1MB
Each fragment should be 1 - 2GB
Fragments should not “interleave”
Run fragment metadata consolidation (especially on cloud object stores)
No support for deletions and updates yet (coming up soon)
Notes on Reading
TileDB is eventually consistent
Support for parallel writers, parallel readers (all lock-free)
Support for reads in different layouts
Support for “streaming reads” (incomplete queries)
Tips for performance:
Allocate sufficient space for the result buffers (minimize incomplete queries)
Tune written layout based on the read layout (application dependent)
Push down coordinate and attribute filter conditions
Work In Progress
Coming Up
More schema evolution features
Support for deletes and updates
Git-like versioning
ACID via modularizing locking
More tile filters (e.g., sum, min, max)
RLE and dictionary compression on strings
Computations on compressed data
Linear Algebra operations
More SQL push down (e.g., group by)
Graph algorithms
TileDB vs. Others
High-level Comparisons
vs. HDF5
TileDB is cloud-native
TileDB has support for sparse arrays
TileDB has support for versioning and time traveling
vs. Zarr
TileDB is built in C++ and is more interoperable
TileDB has support for sparse arrays
TileDB has support for versioning and time traveling
High-level Comparisons
vs. Parquet
TileDB is multi-dimensional and supports more flexible layouts
TileDB has support for dense arrays
TileDB has support for versioning and time traveling
vs. Delta Lake
TileDB does not rely on Spark, Presto or any other subsystem
TileDB is natively multi-dimensional and supports more flexible layouts
TileDB has support for dense arrays
TileDB does not support deletes, updates and full ACID (yet)
The Universal Database
Thank you
WE ARE HIRING
Apply at tiledb.workable.com