MONGODB FOR MULTI-DIMENSION SPATIAL INDEXING
DECEMBER 2012
@nknize
+Nicholas Knize
Thermopylae Sciences & Technology – Who are we?
• Mixed Government (70%) and Commercial (30%) contracting
company w/ ~150 employees
• Core customers:
– SOUTHCOM, Intel & Security Command, Army Intel Sector, DOI
– LVMS, Select Energy Oil & Gas, OSU, Cleveland Cavaliers, and STL Rams
• #1 Google Enterprise partner for Federal and partner w/
imagery providers (GeoEye / Digital Globe)
• FOSS4G contributor and 10gen Enterprise partner
WHO ARE THESE GUYS?
ACCOMPLISHING THE IMPOSSIBLE
“The 3D UDOP allows near real time visibility of all SOUTHCOM Directorates information in one
location…this capability allows for unprecedented situational awareness and information sharing”
-Gen. Doug Fraser
TST PRODUCTS
COMMERCIAL CUSTOMERS
Commercial Examples
• Cleveland Cavaliers
• USGIF Las Vegas Motor Speedway
• Baltimore Grand Prix
iSpatial framework serves millions of mobile devices
1. iSpatial provides a web-based interface for Multi-INT visualization and collaboration
2. Map/Reduce provides spatial statistics processing (spatial regression) and heuristics
3. Modified MongoDB stores and indexes multi-dimensional spatial data at scale
TST ARCHITECTURE
1. iSpatial – UI/Visualization
2. Hadoop M/R – Processing / Analysis
3. MongoDB – Spatial Data Management @ Scale
What the…..HOW MUCH DATA?!?
• “Swimming in sensors drowning in data”
– What size data tsunami are we talking about?
• “Fix and Finish are meaningless until FIND is accomplished”
– A “Big Data” Spatial Search Problem
THAT’S A LOT OF DATA….
Sensor Type | Resolution | Data Bandwidth | TB/Hr
FMV | 640 x 480 (Std Def); 1920 x 1080 (HD) | HD: 16bit x 3 bands @ 30fps ~1 Gbps | ~0.45 TB
WAMI | Constant Hawk = 96 Mpx; Gorgon Stare = 460 Mpx; Argus = 1.8 Gpx | GS @ 16bit x 3 bands @ 2fps ~15.3 Gbps; Argus @ 16bit x 3 bands @ 12fps ~345.6 Gbps | GS ~6.89 TB; Argus ~155 TB
Satellite | NITF / JP2 resolutions: 32K x 32K; 432K x 216K | 32K x 32K @ 8bit x 3 bands @ 1 frame/5 mins ~27 Gbps | ~12.15 TB
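The TB/Hr figures follow from the bandwidth figures by unit conversion. A minimal sketch, assuming decimal units (1 TB = 1000 GB) and 8 bits per byte; the function name and the dictionary of rates are illustrative, with the rates copied from the table:

```python
def tb_per_hour(gbps: float) -> float:
    """Convert a sustained data rate in gigabits/second to terabytes/hour."""
    gigabytes_per_second = gbps / 8            # 8 bits per byte
    gigabytes_per_hour = gigabytes_per_second * 3600
    return gigabytes_per_hour / 1000           # decimal terabytes (assumption)

# Bandwidth figures taken from the sensor table.
rates_gbps = {"FMV (HD)": 1.0, "Gorgon Stare": 15.3, "Argus": 345.6, "Satellite": 27.0}
for sensor, gbps in rates_gbps.items():
    print(f"{sensor}: about {tb_per_hour(gbps):.2f} TB/hr")
```

Under these assumptions the results line up with the TB/Hr column to within rounding.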
• Horizontally scalable – Large volume / elastic
• Vertically scalable – Heterogeneous data types (“Data Stack”)
• Smartly Distributed – Reduce the distance bits must travel
• Fault Tolerant – Replication Strategy and Consistency model
• High Availability – Node recovery
• Fast – Reads or writes (can’t always have both)
BIG DATA STORAGE CHARACTERISTICS
Desired Data Store Characteristics for ‘Big Data’
• Cassandra
– Nice Bring Your Own Index (BYOI) design
– … but Java, Java, Java… Memory management can be a maintenance issue
– Adding new nodes can be a pain (Token Changes, nodetool)
– Key-Value store…good for simple data models
• HBase
– Nice BigTable model
– Key-Value store…good for simple data models
– Lots of Java JNI (primarily based on std:hashmap of std:hashmap)
• CouchDB
– Provides some GeoSpatial functionality (Currently being rewritten)
– HEAVILY dependent on Map-Reduce model (complicated design)
– Erlang based – poor multi-threaded heap management
NOSQL OPTIONS
Subset of Evaluated NoSQL Options
Why MongoDB for Thermopylae?
• Documents based on JSON – A GeoJSON match made in heaven! (OGC)
• C++ – No garbage collection overhead! Efficient memory management design reduces disk swapping and paging
• Disk storage is memory mapped, enabling fast swapping when necessary
• Built-in auto-failover with replica sets and fast recovery with journaling
• Tunable Consistency – Consistency defined at application layer
• Schema Flexible – SQL-like properties enable easy porting
• Provided initial spatial indexing support – but limited to points!
WHY TST <3’S MONGODB
MONGODB SPATIAL INDEXER
... The Spatial Indexer wasn’t quite right
• MongoDB (like nearly all relational DBs) uses a B-tree
– Data structure that keeps data sorted and supports search, insert, and delete in logarithmic time
– Great for indexing numerical and text documents (1D attribute data)
– Cannot store multi-dimension (>2D) data – NOT COMPLEX GEOMETRY FRIENDLY
DIMENSIONALITY REDUCTION
How does MongoDB solve the dimensionality problem?
• Space Filling (Z) Curve
– A continuous line that intersects every point in a two-dimensional plane
• Use Geohash to represent lat/lon values
– Interleave the bits of a lat/long pair
– Base32 encode the result
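The two Geohash steps (interleave, then Base32 encode) can be sketched as follows. This is the publicly documented Geohash scheme written as a standalone illustration, not MongoDB's internal code; the function name and default precision are assumptions.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)

def geohash(lat: float, lon: float, precision: int = 8) -> str:
    """Encode lat/lon by interleaving range-halving bits, longitude first."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    bits = []
    use_lon = True  # even bit positions carry longitude
    while len(bits) < precision * 5:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid   # keep the upper half of the range
        else:
            bits.append(0)
            rng[1] = mid   # keep the lower half of the range
        use_lon = not use_lon
    # Pack each group of 5 bits into one Base32 character.
    chars = []
    for i in range(0, len(bits), 5):
        index = 0
        for bit in bits[i:i + 5]:
            index = (index << 1) | bit
        chars.append(BASE32[index])
    return "".join(chars)

print(geohash(42.605, -5.603, precision=5))
```

Because each extra character only refines the same cell, nearby points usually share a prefix, which is what lets a B-tree range scan stand in for a 2D search.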
GEOHASH BTREE ISSUES
Issues with the Geohash b-Tree approach
• Neighbors aren’t so close!
– Neighboring points on the Geoid may end up on opposite ends of the plane
– Impacts search efficiency
• What about Geometry?
– Doesn’t support > 2D
– Mongo uses Multi-Location documents, which really just index multiple points that link back to a single document
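The "neighbors aren't so close" problem shows up directly in the interleaved key. A small sketch on an integer grid, assuming x occupies the even bit positions; the function name and grid coordinates are illustrative:

```python
def morton(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y (x in even positions) into a Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

# Cells (126, 0) and (127, 0) are adjacent and get nearby keys.
# Cells (127, 0) and (128, 0) are also adjacent, but straddle a quadrant boundary.
print(morton(126, 0), morton(127, 0), morton(128, 0))
```

A range scan over keys therefore has to cover a large span, or be broken into several sub-ranges, to pick up both sides of such a boundary.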
Sort Order and Multi-Dimension…a nightmare
(3D / 4D Hilbert Scanning Order)
GEO-SHARDING ALTERNATIVE
MULTI-LOCATION CLIPPING
Mongo Multi-location Document Clipping Issues
($within search doesn’t always work w/ multi-location)
[Figure: four cases (Case 1 to Case 4) of a Multi-Location Document (aka. Polygon) against a Search Polygon; two are marked Success! and two are marked Fail!]
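One way a point-based search can miss a polygon: if only the stored vertices are tested against the search region, a polygon that overlaps the region without having a vertex inside it never matches. The sketch below is illustrative Python, not MongoDB code; `vertices_within`, `mbr_of`, and `mbr_intersects` are hypothetical helpers, and the search region is simplified to an axis-aligned box.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

def vertices_within(box: Box, vertices: List[Point]) -> bool:
    """Point-based test: does any stored vertex fall inside the search box?"""
    min_x, min_y, max_x, max_y = box
    return any(min_x <= x <= max_x and min_y <= y <= max_y for x, y in vertices)

def mbr_of(vertices: List[Point]) -> Box:
    """Minimum bounding rectangle of a vertex list."""
    xs, ys = zip(*vertices)
    return (min(xs), min(ys), max(xs), max(ys))

def mbr_intersects(a: Box, b: Box) -> bool:
    """Extent-based test: do the two rectangles overlap on both axes?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

polygon = [(0, 0), (10, 0), (10, 10), (0, 10)]
search_box = (4, 4, 6, 6)   # lies entirely inside the polygon

print(vertices_within(search_box, polygon))          # False
print(mbr_intersects(search_box, mbr_of(polygon)))   # True
```

Indexing the polygon's extent rather than its individual points is what lets the second test succeed.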
• Constrain the system to single point searches
– Multi-dimension support will be exponentially complex (won’t scale)
• Interpolate points along the edge of the shape
– Multi-dimension support will be exponentially complex (won’t scale)
• Customize the spatial indexer
– Selected approach
SOLUTIONS TO GEOHASH PROBLEM
Potential Solutions
CUSTOM TUNED SPATIAL INDEXER
Thermopylae Custom Tuned MongoDB for Geo
TST Leverages Kriegel’s 1996 Research in R* Trees
• R-Trees organize data of any dimension by representing each object as a minimum bounding box.
• Each node bounds its children. A node can hold many objects (max: m, min: ceil(m/2)).
• Splits and merges are optimized by minimizing overlaps.
• The leaves point to the actual objects (typically stored on disk).
• Height balanced – tree height is O(log n), so search is O(log n) when overlap is low
Spatial Indexing at Scale with R-Trees
RTREE THEORY
Spatial data represented as minimum bounding rectangles (2-dimension), cubes (3-dimension), hyperrectangles (4-dimension)
Index represented as: <I, DiskLoc> where:
I = (I0, I1, …, In-1), where n = number of dimensions
Each Ii is a closed interval [min, max] describing the MBR range along dimension i
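The <I, DiskLoc> entry can be pictured as a tuple of per-dimension intervals plus a record pointer. A minimal sketch, assuming an integer stands in for MongoDB's DiskLoc; the class, field, and helper names are illustrative, not the actual implementation:

```python
from dataclasses import dataclass
from typing import Tuple

Interval = Tuple[float, float]  # (min, max) along one dimension

@dataclass(frozen=True)
class IndexEntry:
    intervals: Tuple[Interval, ...]  # I = (I0, ..., In-1)
    disk_loc: int                    # stand-in for DiskLoc (assumption)

    def intersects(self, other: "IndexEntry") -> bool:
        """Two MBRs overlap only if their intervals overlap in every dimension."""
        return all(a_min <= b_max and b_min <= a_max
                   for (a_min, a_max), (b_min, b_max)
                   in zip(self.intervals, other.intervals))

def mbr_of_points(points, disk_loc: int) -> IndexEntry:
    """Build the tightest [min, max] interval per dimension for a set of points."""
    dims = zip(*points)
    return IndexEntry(tuple((min(d), max(d)) for d in dims), disk_loc)

# A 3D example: x, y, and altitude.
track = mbr_of_points([(0, 0, 100), (4, 2, 250), (1, 5, 180)], disk_loc=1)
print(track.intervals)
```

The same structure works unchanged for any number of dimensions, since each added dimension is just one more interval.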
R*-TREE INDEX OBJECTIVES
R*-Tree Spatial Index Example
• Sample insertion result for 4th order tree
• Objectives:
1. Minimize area
2. Minimize overlaps
3. Minimize margins
4. Maximize inner node utilization
[Figure: example tree with leaf entries a through l grouped under nodes m, n, o, p]
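The first three objectives are simple geometric measures on bounding boxes. A sketch of how they might be computed, assuming a box is a tuple of (min, max) pairs; the function names are illustrative:

```python
from math import prod

def area(mbr):
    """Volume of an n-dimensional box given as ((min, max), ...)."""
    return prod(hi - lo for lo, hi in mbr)

def margin(mbr):
    """Sum of the box's extents, one per dimension (half the perimeter in 2D)."""
    return sum(hi - lo for lo, hi in mbr)

def overlap(a, b):
    """Volume shared by two boxes; 0 if they are disjoint in any dimension."""
    sides = [min(a_hi, b_hi) - max(a_lo, b_lo)
             for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b)]
    return prod(sides) if all(s > 0 for s in sides) else 0

m = ((0, 4), (0, 2))
n = ((3, 6), (1, 5))
print(area(m), margin(m), overlap(m, n))   # 8 6 1
```

A split or insertion strategy can then compare candidate groupings by these numbers, preferring the one with less overlap, margin, and area.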
Insert
• Similar to insertion into B+-tree but may insert into any leaf; a leaf splits when its capacity is exceeded.
– Which leaf to insert into?
– How to split a node?
R*-TREE INSERT EXAMPLE
Insert—Leaf Selection
• Follow a path from root to leaf.
• At each node move into subtree whose MBR area
increases least with addition of new rectangle.
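The least-enlargement rule can be sketched as follows. The node names match the slide's figure, but the coordinates are made up for illustration, and the tie-break by smaller area is an assumption borrowed from the classic R-tree rule rather than something the slides state.

```python
from math import prod

def area(mbr):
    return prod(hi - lo for lo, hi in mbr)

def enlarge(mbr, rect):
    """Smallest box covering both mbr and rect."""
    return tuple((min(a_lo, b_lo), max(a_hi, b_hi))
                 for (a_lo, a_hi), (b_lo, b_hi) in zip(mbr, rect))

def choose_subtree(children, rect):
    """Pick the child whose MBR area grows least; break ties by smaller area."""
    def cost(item):
        _, mbr = item
        return (area(enlarge(mbr, rect)) - area(mbr), area(mbr))
    return min(children.items(), key=cost)[0]

# Hypothetical coordinates for the four subtrees.
children = {
    "m": ((0, 4), (0, 4)),
    "n": ((5, 9), (0, 4)),
    "o": ((0, 4), (5, 9)),
    "p": ((5, 9), (5, 9)),
}
print(choose_subtree(children, ((6, 7), (1, 2))))   # n
```

Here the new rectangle already lies inside n, so n needs no enlargement and wins.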
Insert – Leaf Selection
• Insert into m.
• Insert into n.
• Insert into o.
• Insert into p.
[Figure: tree with leaf entries a through l under nodes m, n, o, p]
Query
• Start at root
• Find all overlapping MBRs
• Search subtrees recursively
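The three query steps map onto a short recursive function. A minimal sketch with hypothetical coordinates and a simplified node type; it is not the MongoDB implementation:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MBR = Tuple[Tuple[float, float], ...]  # ((min, max), ...) per dimension

def intersects(a: MBR, b: MBR) -> bool:
    return all(a_lo <= b_hi and b_lo <= a_hi
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))

@dataclass
class Node:
    mbr: MBR
    children: List["Node"] = field(default_factory=list)  # empty for leaf entries
    label: str = ""

def search(node: Node, query: MBR) -> List[str]:
    """Return labels of leaf entries whose MBR overlaps the query box."""
    if not intersects(node.mbr, query):
        return []                      # prune this whole subtree
    if not node.children:
        return [node.label]            # leaf entry
    hits: List[str] = []
    for child in node.children:        # may descend into several subtrees
        hits.extend(search(child, query))
    return hits

a = Node(((0, 2), (0, 2)), label="a")
b = Node(((3, 4), (1, 3)), label="b")
e = Node(((6, 8), (6, 9)), label="e")
m = Node(((0, 4), (0, 3)), [a, b], "m")
n = Node(((6, 8), (6, 9)), [e], "n")
root = Node(((0, 8), (0, 9)), [m, n], "root")

print(search(root, ((1, 3.5), (1, 2))))   # ['a', 'b']
```

Subtree n is skipped without visiting its entries because its MBR does not touch the query box; when sibling MBRs overlap heavily, far fewer subtrees can be skipped.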
Query
• Search m.
[Figure: search step through node m over the tree of nodes m, n, o, p and leaf entries a through l]
R*-Tree Leverages B-Tree Base Data Structures (buckets)
R*-TREE MONGODB IMPLEMENTATION
Spatial Index
Architecture, Organization, & Performance
[Figure: bucket layout showing BucketHeader, MBRHeader, and MBRKeyNode(s)]
1B Polygon Read Performance (worst case O(n))
Dimensions | Num Buckets | Tree Height | Read Time
3 | 3,448,276 | 3 | 190 ms
5 | 5,076,143 | 3 | 275 ms
100 | 90,909,091 | 8 | ~4.9 sec
SPATIAL INDEX ARCH & ORG
Geo-Sharding – (in work)
Scalable Distributed R* Tree (SD-r*Tree)
“Balanced” binary tree, with nodes distributed on a set of servers:
• Each internal node has exactly two children
• Each leaf node stores a subset of the indexed dataset
• At each node, the heights of the subtrees differ by at most one
• mongos “routing” node maintains binary tree
GEO-SHARDING
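The structural rules above can be written as a small invariant check. A sketch under stated assumptions: the node class and the example tree shape are hypothetical, reusing the d (data node) and r (coverage node) naming from the deck's illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SDNode:
    name: str
    left: Optional["SDNode"] = None
    right: Optional["SDNode"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

def check(node: SDNode) -> int:
    """Return the subtree height, raising ValueError if an invariant is broken."""
    if node.is_leaf:
        return 0
    if node.left is None or node.right is None:
        raise ValueError(f"{node.name}: internal node needs exactly two children")
    lh, rh = check(node.left), check(node.right)
    if abs(lh - rh) > 1:
        raise ValueError(f"{node.name}: subtree heights differ by more than one")
    return 1 + max(lh, rh)

# Assumed shape: r1 -> (d0, r2), r2 -> (d1, d2)
tree = SDNode("r1", SDNode("d0"), SDNode("r2", SDNode("d1"), SDNode("d2")))
print(check(tree))   # 2
```

In a distributed setting each node would live on a different server, so a check like this would run over routing metadata rather than in-memory pointers.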
SD-r*Tree Data Structure Illustration
[Figure: data nodes d0, d1, d2 and coverage nodes r1, r2 holding objects a through e, shown alongside each data node's spatial coverage]
• di = Data Node (Chunk)
• ri = Coverage Node
Leveraged work from Litwin, Mouza, Rigaux 2007
SD-r*Tree DATA STRUCTURE
SD-r*Tree Structure Distribution
[Figure: data nodes d0, d1, d2 and coverage nodes r1, r2 distributed across GeoShard 1, GeoShard 2, and GeoShard 3, with mongos routing]
SD-r*TREE STRUCTURE DISTRIBUTION
Beyond 4-Dimensions - X-Tree
(Berchtold, Keim, Kriegel – 1996)
[Figure legend: Normal Internal Nodes, Supernodes, Data Nodes]
• Avoid MBR overlaps – the more overlap, the closer reads get to the worst case O(n)
• Avoid node splits (main cause for high overlap)
• Introduce new node structure: Supernodes – Large Directory nodes of variable size
BEYOND 4-DIMENSIONS
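The supernode idea can be sketched as a split that is allowed to refuse. This is a heavily simplified stand-in, not the X-tree algorithm from the paper: it tries only a median split per axis, measures overlap against the combined bounding box, and uses an illustrative 20% threshold. All names are hypothetical, and boxes are assumed to have positive extent in every dimension.

```python
from math import prod

def bounding(boxes):
    """MBR of a group of boxes, each given as ((min, max), ...)."""
    return tuple((min(b[d][0] for b in boxes), max(b[d][1] for b in boxes))
                 for d in range(len(boxes[0])))

def overlap_volume(a, b):
    sides = [min(a_hi, b_hi) - max(a_lo, b_lo)
             for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b)]
    return prod(sides) if all(s > 0 for s in sides) else 0.0

def split_or_supernode(boxes, max_overlap=0.2):
    """Try a median split on each axis; keep one oversized node if all overlap too much."""
    best = None
    for axis in range(len(boxes[0])):
        ordered = sorted(boxes, key=lambda b: b[axis][0] + b[axis][1])
        half = len(ordered) // 2
        left, right = ordered[:half], ordered[half:]
        l_mbr, r_mbr = bounding(left), bounding(right)
        union = bounding([l_mbr, r_mbr])
        ratio = overlap_volume(l_mbr, r_mbr) / prod(hi - lo for lo, hi in union)
        if best is None or ratio < best[0]:
            best = (ratio, left, right)
    if best[0] > max_overlap:
        return [boxes]                 # supernode: skip the split, let the node grow
    return [best[1], best[2]]          # ordinary split into two nodes

separated = [((0, 1), (0, 1)), ((0.5, 1.5), (0, 1)), ((10, 11), (0, 1)), ((10.5, 11.5), (0, 1))]
tangled = [((0, 10), (0, 10)), ((1, 11), (1, 11)), ((0, 10), (1, 11)), ((1, 11), (0, 10))]
print(len(split_or_supernode(separated)), len(split_or_supernode(tangled)))   # 2 1
```

Keeping the tangled entries together costs a longer scan inside one node but avoids creating two directory entries that almost every query would have to follow.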
X-TREE PERFORMANCE
X-Tree Performance Results
(Berchtold, Keim, Kriegel – 1996)
T-Sciences Custom Tuned Spatial Indexer
• Optimized Spatial Search – Finds intersecting MBRs and recurses into those nodes
• Optimized Spatial Inserts – Uses the Hilbert value of the MBR centroid to guide search
– 28% reduction in number of nodes touched
• Optimized Deletes – Leverages R* split/merge approach for rebalancing the tree when nodes become over/under-full
• Low maintenance – Leverages MongoDB’s automatic data compaction and partitioning
CONCLUSION
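The "Hilbert value of the MBR centroid" can be illustrated in 2D with the standard iterative grid-to-curve conversion. This sketch is an assumption about what such a key could look like; the grid order, the lon/lat extent, and the function names are illustrative, and the deck does not show the indexer's actual code.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Distance of cell (x, y) along a Hilbert curve over a 2**order square grid."""
    n = 1 << order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def centroid_key(mbr, order=16, extent=((-180.0, 180.0), (-90.0, 90.0))):
    """Hilbert value of a 2D MBR's centroid, quantized onto the grid."""
    n = 1 << order
    cell = []
    for (lo, hi), (e_lo, e_hi) in zip(mbr, extent):
        centre = (lo + hi) / 2
        cell.append(min(n - 1, int((centre - e_lo) / (e_hi - e_lo) * n)))
    return hilbert_index(order, cell[0], cell[1])

print([hilbert_index(1, x, y) for x, y in [(0, 0), (0, 1), (1, 1), (1, 0)]])   # [0, 1, 2, 3]
```

Unlike the Z-order key, consecutive Hilbert values always belong to adjacent cells, which is why sorting by this key tends to keep spatially close MBRs in the same or neighboring nodes.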
Example: Mosaicked Video with KLV Footprints
• Rip through KLV Metadata
• Index frame footprints and annotations as MBRs into X(R*)-Tree
• Leverage Geo-Sharding for spatially relevant scale
Example Use Case – OSINT (Foursquare Data)
• Sample Foursquare data set mashed with Government Intel Data (poly reports)
• 100 million Geo Document test (3D points and polys)
• 4 server replica set
• ~350ms query response
• ~300% improvement over PostGIS
EXAMPLE
Community Support
• Thermopylae plans to open source
– http://github.com/thermopylae
• TST working with 10gen to offer as a spatial extension
• Active developer collaboration
– IRC: #mongodb freenode.net
FIND US
THANK YOU
Questions?
Nicholas Knize
nknize@t-sciences.com
Backup
Key Customers - Government
• US Dept of State Bureau of Diplomatic Security
– Build and support 30 TB Google Earth Globe with multi-terabytes of individual globes sent to embassies throughout the world. Integrated Google Earth and iSpatial framework.
• US Army Intelligence Security Command
– Provide expertise in managing technology integration – prime contractor providing operations, intelligence, and IT support worldwide. Partners include IBM, Lockheed Martin, Google, MIT, Carnegie Mellon. Integrated Google Earth and iSpatial framework.
• US Southern Command
– Coordinate Intelligence management systems spatial data collection, indexing, and distribution. Integrated Google Earth, iSpatial, and iHarvest.
– Index large volume imagery and expose it for different services (Air Force, Navy, Army, Marines, Coast Guard)
GOVERNMENT CUSTOMERS
COMMERCIAL CUSTOMERS
Key Customers - Commercial
• Cleveland Cavaliers
• USGIF Las Vegas Motor Speedway
• Baltimore Grand Prix
iSpatial framework serves millions of mobile devices
• Expose and manage Multi-INT enterprise data in a geo-temporal user defined environment
• Provide a flexible and scalable spatial data infrastructure (SDI) for Multi-INT data access and analysis
• Spatially referenced data visualization on 3D globe & 2D maps
• Access real/near real-time data feeds from forward deployed devices
• Enable real-time information sharing and mission collaboration
ISPATIAL OVERVIEW


High Dimensional Indexing using MongoDB (MongoSV 2012)


Editor's Notes

  • #4: Screen shot of UDOP…blow-out of key features (sharing, presentation builder, etc)