TileDB webinars
February 3, 2022
AIS data management & time-series analytics on TileDB Cloud
Dr. Stavros Papadopoulos
Founder & CEO of TileDB, Inc.
Who we are
Deep roots at the intersection of HPC, databases and data science
Traction with telecoms, pharmas, hospitals and other scientific organizations
45+ members with expertise across all applications and domains
WHERE IT ALL STARTED
TileDB was spun out from MIT and Intel Labs in 2017
INVESTORS
Raised over $20M, we are very well capitalized
Data Economics
Consumption
How tools can compute on the data, where does the computation happen
Distribution
Who has access to the data, what is the means of access, and monetization
Production
What format does the data get produced in and where does it get stored
The Problem | Data Economics is Flawed
Distribution (secure sharing) is an afterthought
Data produced in inefficient formats
All data management solutions focus here
Consumption
How tools can compute on the data, where does the computation happen
Data in some custom format
.las
.cog
.csv
The Problem
very high TCO
Storage in some cloud bucket or marketplace
Org #1:
Download + Wrangle + Build analytics infra
Org #N:
Download + Wrangle + Build analytics infra
burden at data vendor for extra services
Enter TileDB
Secure governance & collaboration
Scalable, serverless compute
Data & code sharing & monetization
Pay-as-you-go, consumer pays
Extreme interoperability
No infra hassles
Universal data management platform
Data in a universal, analysis-ready format
User / group #1:
any tool, any scale
User / group #N:
any tool, any scale
no wrangling
The Secret Sauce | The Data Model
Dense array
Store everything as dense or sparse multi-dimensional arrays
Sparse array
Arrays Subsume Dataframes
Sparse array
Dataframe
Dense vector
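One way to read the slide above: if some columns of a dataframe are chosen as dimensions, each row becomes a non-empty cell of a sparse array and the remaining columns become attributes; if no column is chosen, the row position acts as a single dense dimension and each column is a dense vector. The sketch below illustrates that reading in plain Python. It is not the TileDB API, and the column names and helper functions are hypothetical.

```python
# Illustrative sketch only: plain Python, not the TileDB API.
# The column names (mmsi, ts, speed) are hypothetical example data.

rows = [
    {"mmsi": 111, "ts": 10, "speed": 12.5},
    {"mmsi": 222, "ts": 10, "speed": 3.0},
    {"mmsi": 111, "ts": 20, "speed": 13.1},
]


def as_dense_vectors(rows):
    """Dataframe as dense 1D vectors: the row index is the only
    dimension and every column is stored as one vector."""
    columns = rows[0].keys()
    return {col: [r[col] for r in rows] for col in columns}


def as_sparse_array(rows, dims):
    """Dataframe as a sparse ND array: the chosen columns form the
    coordinates, the other columns are attributes of non-empty cells.
    Simplification: a dict is used, so duplicate coordinates overwrite."""
    cells = {}
    for r in rows:
        coords = tuple(r[d] for d in dims)
        attrs = {k: v for k, v in r.items() if k not in dims}
        cells[coords] = attrs
    return cells


dense = as_dense_vectors(rows)
sparse = as_sparse_array(rows, dims=("mmsi", "ts"))

print(dense["speed"])     # [12.5, 3.0, 13.1]
print(sparse[(111, 20)])  # {'speed': 13.1}
```

Only the three populated (mmsi, ts) cells are stored in the sparse view; empty combinations such as (222, 20) take no space.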
The Secret Sauce | The Data Model
What can be modeled as an array
LiDAR (3D sparse)
SAR (2D or 3D dense)
Population genomics (3D sparse)
Single-cell genomics (2D dense or sparse)
Biomedical imaging (2D or 3D dense)
Even flat files!!! (1D dense)
Time series (ND dense or sparse)
Weather (2D or 3D dense)
Graphs (2D sparse)
Video (3D dense)
Key-values (1D or ND sparse)
Tables (1D dense or ND sparse)
TileDB Cloud
❏ Access control and logging
❏ Serverless SQL, UDFs, task graphs
❏ Jupyter notebooks and dashboards
Unified data management and easy serverless compute at global scale
How we built a Universal Database
Efficient APIs & Tool Integrations via Zero-Copy Techniques
TileDB Embedded
Open-source interoperable storage with a universal open-spec array format
❏ Parallel IO, rapid reads & writes
❏ Columnar, cloud-optimized
❏ Data versioning & time traveling
Superior performance
Built in C++
Fully-parallelized
Columnar format
Multiple compressors
R-trees for sparse arrays
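The slide names R-trees as the index for sparse arrays. The sketch below shows only the underlying idea, bounding-box pruning: points are grouped into tiles, each tile keeps its minimum bounding rectangle (MBR), and a range query skips every tile whose MBR does not overlap the query. It scans a flat list of boxes rather than a tree, the tiling by sorted order is an assumption made for brevity, and none of it is TileDB's implementation.

```python
# Illustrative sketch only: bounding-box pruning, the idea an R-tree builds on.
# A flat list of boxes is scanned here; a real R-tree nests boxes in a hierarchy.
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bounding_box(points: List[Point]) -> Box:
    """Smallest axis-aligned rectangle containing all points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def overlaps(a: Box, b: Box) -> bool:
    """True if the two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build_tiles(points: List[Point], capacity: int):
    """Group sorted points into tiles of at most `capacity` points,
    keeping each tile's MBR next to its points."""
    ordered = sorted(points)
    tiles = []
    for i in range(0, len(ordered), capacity):
        chunk = ordered[i:i + capacity]
        tiles.append((bounding_box(chunk), chunk))
    return tiles


def range_query(tiles, query: Box):
    """Return the points inside `query` and the number of tiles scanned."""
    hits, scanned = [], 0
    for mbr, chunk in tiles:
        if not overlaps(mbr, query):
            continue  # pruned: no point in this tile can match
        scanned += 1
        for x, y in chunk:
            if query[0] <= x <= query[2] and query[1] <= y <= query[3]:
                hits.append((x, y))
    return hits, scanned


points = [(1, 1), (2, 3), (2, 8), (7, 7), (8, 2), (9, 9)]
tiles = build_tiles(points, capacity=2)
hits, scanned = range_query(tiles, (6, 6, 10, 10))
print(hits, scanned)  # [(7, 7), (9, 9)] 2
```

Of the three tiles, one is skipped without reading its points, which is the saving the index is meant to provide.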
TileDB Embedded
Open source: https://github.com/TileDB-Inc/TileDB
Rapid updates & data versioning
Immutable writes
Lock-free
Parallel reader / writer model
Time traveling
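The combination of immutable writes and time traveling listed above can be pictured as follows: every write adds a new timestamped fragment and never modifies an earlier one, and a read merges the fragments up to a chosen timestamp. The sketch below is a toy model of that idea in plain Python. The class and method names are hypothetical and it does not reflect TileDB's storage engine or API.

```python
# Illustrative sketch only: plain Python, not TileDB's storage engine or API.
from typing import Dict, List, Optional, Tuple


class VersionedArray:
    """Toy model: each write is an immutable, timestamped fragment and a
    read merges the fragments written up to a given time."""

    def __init__(self) -> None:
        self._fragments: List[Tuple[int, Dict[tuple, float]]] = []

    def write(self, timestamp: int, cells: Dict[tuple, float]) -> None:
        # Append only: existing fragments are never modified. A copy is
        # stored so later changes by the caller do not leak in.
        self._fragments.append((timestamp, dict(cells)))

    def read(self, as_of: Optional[int] = None) -> Dict[tuple, float]:
        # Apply fragments oldest first so newer values win; stop at
        # `as_of` to see the array as it was at that time.
        result: Dict[tuple, float] = {}
        for ts, cells in sorted(self._fragments, key=lambda f: f[0]):
            if as_of is not None and ts > as_of:
                break
            result.update(cells)
        return result


arr = VersionedArray()
arr.write(1, {(0, 0): 1.0, (0, 1): 2.0})
arr.write(2, {(0, 1): 20.0})  # update of an existing cell
arr.write(3, {(1, 1): 5.0})   # a new cell

print(arr.read(as_of=1))  # {(0, 0): 1.0, (0, 1): 2.0}
print(arr.read())         # {(0, 0): 1.0, (0, 1): 20.0, (1, 1): 5.0}
```

Because fragments are never changed in place in this model, a reader does not have to wait for a writer, which is one way to understand the lock-free claim on the slide.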
TileDB Embedded
Open source: https://github.com/TileDB-Inc/TileDB
Extreme interoperability
Numerous APIs
Numerous integrations
All backends
Optimized for the cloud
Immutable writes
Parallel IO
Minimization of requests
TileDB Cloud
Universal storage
Universal tooling
Universal data
.las .cog .vcf .csv
Universal scale
Management. Collaboration. Scalability
TileDB Cloud
Works as SaaS: https://cloud.tiledb.com
Works on premises
Currently on AWS, soon on any cloud
Built to work anywhere
Slicing, SQL, UDFs, task graphs
It is completely serverless
On-demand JupyterHub instances
Can launch Jupyter notebooks
Compute sent to the data
It is geo-aware
Authentication, compliance, etc.
It is secure
TileDB Cloud
Full marketplace (via Stripe)
Everything is monetizable
Access control inside and outside your organization
Make any data and code public
Discover any public data and code (central catalog)
Everything is shareable at global scale
Jupyter notebooks
UDFs and task graphs
ML models
Everything is an array!
Dashboards (e.g., R shiny apps)
All types of data (even flat files)
Full auditability (data, code, any action)
Everything is logged
AIS capabilities on TileDB Cloud
Data is analysis-ready, no more CSV downloads
A built-in marketplace, no infrastructure costs
Time-series analysis, at extreme scale
Fusion of AIS data with other sources (e.g., SAR)
Numerous APIs and tool integrations
Visualization with popular tools and dashboards
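As a rough picture of what analysis-ready AIS data with time-series slicing could look like, the sketch below models position reports as cells of a sparse 3D array indexed by (time, longitude, latitude), slices a time window and bounding box, and computes the mean speed per vessel. The schema, field names (mmsi, sog) and sample values are hypothetical, and the code is plain Python rather than the TileDB Cloud API.

```python
# Illustrative sketch only: hypothetical AIS records and schema, plain Python.
from collections import defaultdict

# Sparse 3D cells: coordinates (epoch_seconds, lon, lat) -> attributes.
# mmsi = vessel identifier, sog = speed over ground in knots (sample values).
cells = {
    (1000, -5.0, 36.0): {"mmsi": 111, "sog": 10.0},
    (1060, -4.9, 36.1): {"mmsi": 111, "sog": 12.0},
    (1030, 10.0, 54.0): {"mmsi": 222, "sog": 8.0},
    (5000, -5.1, 36.0): {"mmsi": 333, "sog": 4.0},
}


def slice_cells(cells, t_range, lon_range, lat_range):
    """Return the cells whose coordinates fall inside all three
    inclusive ranges (time window plus bounding box)."""
    selected = {}
    for (t, lon, lat), attrs in cells.items():
        if (t_range[0] <= t <= t_range[1]
                and lon_range[0] <= lon <= lon_range[1]
                and lat_range[0] <= lat <= lat_range[1]):
            selected[(t, lon, lat)] = attrs
    return selected


def mean_speed_by_vessel(selected):
    """Average speed over ground per vessel in the selected cells."""
    speeds = defaultdict(list)
    for attrs in selected.values():
        speeds[attrs["mmsi"]].append(attrs["sog"])
    return {mmsi: sum(v) / len(v) for mmsi, v in speeds.items()}


window = slice_cells(cells, (900, 2000), (-6.0, -4.0), (35.0, 37.0))
result = mean_speed_by_vessel(window)
print(result)  # {111: 11.0}
```

The consumer asks only for the window of interest and never downloads or parses the full archive, which is the contrast with CSV delivery drawn on the slide.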
The Universal Database
Thank you
Spire Maritime
Enabling the Data Advantage: Hosted Data Platform
Covering the Earth 24/7: Global data and analytics
The Evolution of Spire Maritime’s Data Services
The Early Years (<2013)
• AIS messages delivered via proxy/SFTP in raw NMEA or CSV formats
• Customer 100% responsible for data storage, position and static message synthesization, indexing, manipulation, etc.
2013
• Geospatial Web Services (GWS) Introduced
• Easy to query vessel-based information
• Removes complications associated with real-time synthesization of position and static messages
• Key fields indexed to provide rapid query responses
• Data delivered in industry standard schema for easier storage and manipulation
2021
• Hosted Data Platform Introduced (TileDB)
• Maintains all the benefits of historical GWS content but removes the complexity and lowers the cost that customers incur to store and compute against the data
• Enables immediate access to interrogate Spire Maritime’s historical data using complex queries that would typically require a fully configured database to run
• Spire Maritime’s AIS data is loaded into the TileDB platform daily
Hosted Data Platform Use Cases
Customers who believe they are spending too much money on storage and compute time based on their Spire Maritime data subscription
Customers who only want to ask questions of the data
• Don’t need or want to store archive data locally
• Focus on answering real world questions starting from the moment access to the platform is granted
Customers who lack the skill set to create the databases needed to interrogate the data in a fast and efficient way
