Geospatial Indexing & Search
at Scale with Lucene
Nick Knize, PhD
Lucene PMC Member + Committer
@nknize
2
Let's take a trip...
...through Geo advancements
1. Lucene Geo in Elasticsearch, an Introduction
2. Lucene Geo Search, Then & Now: Prefix Trees vs. BKD Trees
3. Improving Shape Search: BKD + Tessellation
Mappings
Geo Field Types
4
PUT crime/incidents/_mapping
{
  "properties" : {
    "location" : {
      "type" : "geo_point",
      "ignore_malformed" : true
    }
  }
}
define
geo_point mapping
5
insert
geo_point mapping
POST crime/incidents
{
  "location" : { "lat" : 41.12, "lon" : -71.34 }
}
POST crime/incidents
{
  "location" : "41.12, -71.34"
}
POST crime/incidents
{
  "location" : [[-71.34, 41.12], [-71.32, 41.21]]
}
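The three request bodies above differ in coordinate order: the object and the string carry latitude first, while the array form uses [lon, lat]. A minimal Python sketch of that normalization follows; `parse_geo_point` and `parse_location` are hypothetical helper names for illustration, not part of any Elasticsearch client.

```python
def parse_geo_point(value):
    """Return (lat, lon) for one geo_point value in any of the three formats."""
    if isinstance(value, dict):            # {"lat": ..., "lon": ...}
        return float(value["lat"]), float(value["lon"])
    if isinstance(value, str):             # "lat, lon"
        lat, lon = (float(part) for part in value.split(","))
        return lat, lon
    lon, lat = value                       # [lon, lat]
    return float(lat), float(lon)


def parse_location(value):
    """Return a list of (lat, lon); an array of arrays holds several points."""
    if isinstance(value, list) and value and isinstance(value[0], list):
        return [parse_geo_point(v) for v in value]
    return [parse_geo_point(value)]
```

All three inputs from the slide normalize to the same first point, (41.12, -71.34).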
6
define
geo_shape mapping
PUT police/precincts/_mapping
{
  "properties" : {
    "coverage" : {
      "type" : "geo_shape",
      "ignore_malformed" : false,
      "tree" : "quadtree",
      "precision" : "5m",
      "distance_error_pct" : 0.025,
      "orientation" : "ccw",
      "points_only" : false
    }
  }
}
7
insert
geo_shape mapping
POST police/precincts/
{
  "coverage" : {
    "type" : "polygon",
    "coordinates" : [[
      [-73.9762134, 40.7538588],
      [-73.9742356, 40.7526327],
      [-73.9656733, 40.7516774],
      [-73.9763236, 40.7521246],
      [-73.9723788, 40.7516733],
      [-73.9732423, 40.7523556],
      [-73.9762134, 40.7538588]
    ]]
  }
}
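A polygon ring must be closed (first vertex equals last), and the "orientation" mapping parameter concerns the winding order of the vertices. The sketch below, in Python, shows one way to check both before indexing. `is_closed` and `signed_area` are hypothetical helpers, and the shoelace formula treats lon/lat as planar x/y, which is an assumption that only holds for small shapes.

```python
def is_closed(ring):
    """A linear ring needs at least 4 positions and must end where it starts."""
    return len(ring) >= 4 and list(ring[0]) == list(ring[-1])


def signed_area(ring):
    """Shoelace formula; a positive result means counter-clockwise (x=lon, y=lat)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0
```

A positive `signed_area` corresponds to "ccw"; a negative value means the ring winds clockwise.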
• Shapes are parsed using OGC and ISO standards definitions
‒ OGC Simple Feature Access
‒ ISO 19107:2003 Geographic information, Spatial schema
• Supports the following geo_shape types
‒ Point, MultiPoint
‒ LineString, MultiLineString
‒ Polygon (with holes), MultiPolygon (with holes)
‒ Envelope (bbox)
geo_shape mapping
8
insert
Indexing
Under the hood
10
A long, long time ago, in a galaxy far far away...
...the Inverted [text] Index
terms dictionary
(terms)
postings list
(doc ids)
Fast 2
The 1
brown 1
dog 1
fox 1
jumping 2
jumps 1
lazy 1
over 1
quick 1
spiders 2
the 1
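The terms dictionary and postings list above can be reproduced with a few lines of Python. This is a sketch only: the two document texts are an assumption reconstructed from the terms on the slide, and the tokenizer is a plain whitespace split with no lowercasing (which is why both "The" and "the" appear).

```python
from collections import defaultdict

# Assumed source documents, reconstructed from the slide's terms.
docs = {
    1: "The quick brown fox jumps over the lazy dog",
    2: "Fast jumping spiders",
}


def build_inverted_index(docs):
    """Map each term to the sorted list of doc ids that contain it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in set(docs[doc_id].split()):
            index[term].append(doc_id)
    return dict(sorted(index.items()))   # sorted terms dictionary


index = build_inverted_index(docs)
```

A term query is then a single dictionary lookup, for example `index["jumping"]` for the postings of "jumping".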
11
And the open source community went bananas!
we ♥ Lucene text search!
12
But what about 1D numbers...?
Prefix Trees; precision
terms dictionary
(terms)
postings list
(doc ids)
1 1, 2, 3, 4, 5
10 2, 3, 4
11 1, 5
100 2
101 3, 4
111 1, 5
1000 2
1010 4
1011 3
1110 1
1111 5
1 myINT: 3881553173
1110 0111 0101 1011 1100 1101 0001 0101
2 myINT: 2405166357
1000 1111 0101 1011 1110 1101 0001 0101
3 myINT: 3205335297
1011 1111 0000 1101 1000 1001 0000 0001
4 myINT: 2835679237
1010 1001 0000 0101 0000 1000 0000 0101
5 myINT: 4177856517
1111 1001 0000 0101 0000 1000 0000 0101
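The prefix-term table above follows directly from the leading bits of each value. A minimal Python sketch, assuming 32-bit unsigned values and one term per prefix length from 1 to 4 bits (Lucene's legacy numeric encoding used a configurable precision step, which is not modeled here):

```python
from collections import defaultdict

# doc id -> myINT, as listed on the slide
values = {1: 3881553173, 2: 2405166357, 3: 3205335297, 4: 2835679237, 5: 4177856517}


def prefix_terms(value, max_bits=4, width=32):
    """Return the 1..max_bits leading-bit prefixes of a fixed-width integer."""
    bits = format(value, f"0{width}b")
    return [bits[:n] for n in range(1, max_bits + 1)]


def build_prefix_index(values, max_bits=4):
    """Index every prefix of every value as a term pointing at its doc id."""
    index = defaultdict(list)
    for doc_id in sorted(values):
        for term in prefix_terms(values[doc_id], max_bits):
            index[term].append(doc_id)
    return dict(index)


index = build_prefix_index(values)
```

A range such as "all values whose top two bits are 10" then resolves to the single term `10` rather than one lookup per value.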
14
And the Lucene community went...bananas...again...
we ♥ Lucene NUMERIC search!
15
But, but, what about Geo?
…we’re cool too!
16
How about Geo...terms?
Prefix Trees; round peg, square hole
terms dictionary
(terms)
postings list
(doc ids)
1 1, 2, 3, 4, 5
10 2, 3, 4
11 1, 5
100 2
101 3, 4
111 1, 5
1000 2
1010 4
1011 3
1110 1
1111 5
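Geo terms reuse the same prefix idea: each tree level splits the current cell in half along longitude and latitude and appends two bits. The Python sketch below derives the prefix terms for one point. The bit order (longitude bit first) and the function name `quad_cell_terms` are assumptions for illustration, not Lucene's exact encoding.

```python
def quad_cell_terms(lat, lon, levels):
    """Return one prefix term per level for the quad cell path containing a point."""
    min_lat, max_lat, min_lon, max_lon = -90.0, 90.0, -180.0, 180.0
    bits = ""
    terms = []
    for _ in range(levels):
        mid_lat = (min_lat + max_lat) / 2
        mid_lon = (min_lon + max_lon) / 2
        if lon >= mid_lon:          # east half
            bits += "1"
            min_lon = mid_lon
        else:                       # west half
            bits += "0"
            max_lon = mid_lon
        if lat >= mid_lat:          # north half
            bits += "1"
            min_lat = mid_lat
        else:                       # south half
            bits += "0"
            max_lat = mid_lat
        terms.append(bits)
    return terms
```

Each extra level halves the cell in both dimensions, so precision is bought with longer terms and more of them.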
17
Searching for round pegs in square holes...
Traversing the Prefix Tree...
terms dictionary
(terms)
postings list
(doc ids)
1 1, 2, 3, 4, 5
10 2, 3, 4
11 1, 5
100 2
101 3, 4
111 1, 5
1000 2
1010 4
1011 3
1110 1
1111 5
2
4
3
1
5
22
Searching for round pegs in square holes...
Traversing the Prefix Tree...of death?
terms dictionary
(terms)
postings list
(doc ids)
1 1, 2, 3, 4, 5
10 2, 3, 4
11 1, 5
100 2
101 3, 4
111 1, 5
1000 2
1010 4
1011 3
1110 1
1111 5
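Why the traversal gets expensive can be seen by counting the cells a round query has to visit. The Python sketch below works in a hypothetical unit square rather than on lat/lon: it descends only into cells that partially overlap the circle and emits a term for every cell that is fully inside or sits at the maximum level.

```python
def cells_for_circle(cx, cy, r, max_level):
    """Collect quad cells (as bit-string terms) that a circle query must visit."""
    terms = []

    def visit(x0, y0, size, term, level):
        # Closest point of the cell to the circle center.
        nx = min(max(cx, x0), x0 + size)
        ny = min(max(cy, y0), y0 + size)
        if (nx - cx) ** 2 + (ny - cy) ** 2 > r * r:
            return                              # cell is disjoint from the circle
        corners = [(x0, y0), (x0 + size, y0), (x0, y0 + size), (x0 + size, y0 + size)]
        inside = all((x - cx) ** 2 + (y - cy) ** 2 <= r * r for x, y in corners)
        if inside or level == max_level:
            terms.append(term)                  # fully covered, or precision limit reached
            return
        half = size / 2
        for dx, dy, bits in ((0, 0, "00"), (1, 0, "10"), (0, 1, "01"), (1, 1, "11")):
            visit(x0 + dx * half, y0 + dy * half, half, term + bits, level + 1)

    visit(0.0, 0.0, 1.0, "", 0)
    return terms
```

Calling it with increasing `max_level` shows the term count climbing along the circle's boundary, which is the square-cell cost of approximating a round shape.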
23
Okay, cool! But what about GeoShapes?!
Quad Trees!
24
Okay, cool! But what about GeoShapes?!
Use the Inverted [geo] Index...
terms dictionary
(terms)
postings list
(doc ids)
1 1, 2, 3, 4, 5
10 1, 2, 4
11 3, 5
100 1
101 2, 4
111 3, 5
1000 2
1010 4
1011 3
1110 3
1111 5
DocID: 6
25
Okay, cool! But what about GeoShapes?!
Use the Inverted [geo] Index...
terms dictionary
(terms)
postings list
(doc ids)
1 1, 2, 3, 4, 5, 6
10 1, 2, 4
11 3, 5, 6
100 1
101 2, 4
111 3, 5, 6
1000 2
1010 4
1011 3
1110 3, 6
1111 5, 6
DocID: 6
26
Okay, cool! But what about GeoShapes?!
Melt with the Inverted [geo] Index...
27
Okay, cool! But what about GeoShapes?!
Melt with the Inverted [geo] Index...
• Max tree_levels == 32 (2 bits / cell)
• distance_error_pct
‒ “slop” factor to manage transient memory usage
‒ % of the diagonal distance (degrees) of the shape
‒ Default == 0 if precision set (2.0)
• points_only
‒ optimization for points only shape index
‒ short-circuits recursion
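The relationship between a precision such as "5m" and the number of tree levels can be approximated by halving the Earth's circumference until a cell is no wider than the requested precision. This Python sketch rests on assumptions: the circumference constant, the cap of 32 levels taken from the slide, and the function name are illustrative, and Elasticsearch's own calculation may differ in detail.

```python
EARTH_CIRCUMFERENCE_M = 40_075_017.0   # approximate equatorial circumference (assumption)


def quadtree_levels_for_precision(precision_m, max_levels=32):
    """Smallest level whose cell width is <= precision_m, capped at max_levels."""
    level = 0
    while level < max_levels and EARTH_CIRCUMFERENCE_M / (2 ** level) > precision_m:
        level += 1
    return level
```

Under these assumptions a 5 m precision needs 23 levels, that is, 46 bits per cell term at 2 bits per level.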
28
geo search?
pssh… it’s simple!
¯_(ツ)_/¯
29
30
For duty and humanity!
31
We’re doing it wrong
32
33
34
Block K Dimensional Trees (Bkd) to the rescue!
...the right tool!
35
Block K Dimensional Trees (Bkd) to the rescue!
...the right tool!
Miny Minx Maxy Maxx
m: internal node
36
Block K Dimensional Trees (Bkd) to the rescue!
...the right tool!
Yval Xval
leaf node
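The two node kinds on these slides (internal nodes carrying min/max bounds, leaf nodes carrying a block of raw x/y values) can be sketched in Python as follows. This is a simplified, in-memory binary variant for illustration, not Lucene's on-disk BKD implementation; `build_bkd`, `bbox_query` and the `leaf_size` of 4 are assumptions.

```python
def build_bkd(points, leaf_size=4, depth=0):
    """Split at the median, alternating dimensions; leaves hold blocks of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    node = {"min_x": min(xs), "min_y": min(ys), "max_x": max(xs), "max_y": max(ys)}
    if len(points) <= leaf_size:
        node["points"] = list(points)           # leaf node: a block of raw values
        return node
    dim = depth % 2                             # 0 splits on x, 1 splits on y
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2                     # median split keeps the tree balanced
    node["left"] = build_bkd(ordered[:mid], leaf_size, depth + 1)
    node["right"] = build_bkd(ordered[mid:], leaf_size, depth + 1)
    return node


def bbox_query(node, min_x, min_y, max_x, max_y):
    """Return points inside the box, skipping subtrees whose bounds are disjoint."""
    if (node["max_x"] < min_x or node["min_x"] > max_x or
            node["max_y"] < min_y or node["min_y"] > max_y):
        return []
    if "points" in node:
        return [p for p in node["points"]
                if min_x <= p[0] <= max_x and min_y <= p[1] <= max_y]
    return (bbox_query(node["left"], min_x, min_y, max_x, max_y) +
            bbox_query(node["right"], min_x, min_y, max_x, max_y))
```

The query touches only the subtrees whose bounding boxes overlap the search box, so there is no per-term traversal at all.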
37
Block K Dimensional Trees (Bkd) to the rescue!
multi-way...
38
Block K Dimensional Trees (Bkd) to the rescue!
multi-way… perfectly balanced...
39
Block K Dimensional Trees (Bkd) to the rescue!
multi-way… perfectly balanced...FAIR...
40
Block K Dimensional Trees (Bkd) to the rescue!
...the right tool!
41
Okay, cool! But what about GeoShapes?!
Tessellation!
42
Quad Tree
Decomposition
Simple polygon example
• 8 vertex polygon
• 1º x 1º coverage area
• 3m quad cell resolution
• 1,105,889 terms
43
Tessellation
Decomposition
Simple polygon example
• 8 vertex polygon
• 1º x 1º coverage area
Quad tree:
• 3m quad cell resolution
• 1,105,889 terms
Tessellation:
• 1.11 cm resolution
• 8 terms
• 138,236 : 1 term ratio
• (•◡•) / smaller, faster index!
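Tessellation replaces grid cells with triangles, so the number of indexed terms depends on the vertex count rather than on the covered area. Below is a minimal ear-clipping sketch in Python, assuming a simple counter-clockwise polygon without holes. It illustrates the general technique and is not Lucene's tessellator; the triangle count it produces (n - 2) need not match the term count on the slide.

```python
def cross(o, a, b):
    """Twice the signed area of triangle o, a, b (positive if counter-clockwise)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def point_in_triangle(p, a, b, c):
    # a, b, c are counter-clockwise; p is inside if it is left of (or on) every edge.
    return cross(a, b, p) >= 0 and cross(b, c, p) >= 0 and cross(c, a, p) >= 0


def ear_clip(polygon):
    """Triangulate a simple, counter-clockwise polygon given as a list of (x, y)."""
    verts = list(polygon)
    triangles = []
    while len(verts) > 3:
        n = len(verts)
        for i in range(n):
            prev, cur, nxt = verts[i - 1], verts[i], verts[(i + 1) % n]
            if cross(prev, cur, nxt) <= 0:
                continue                        # reflex or collinear corner, not an ear
            others = (v for v in verts if v not in (prev, cur, nxt))
            if any(point_in_triangle(v, prev, cur, nxt) for v in others):
                continue                        # another vertex lies inside this corner
            triangles.append((prev, cur, nxt))
            del verts[i]                        # clip the ear
            break
        else:
            raise ValueError("no ear found; polygon may not be simple or counter-clockwise")
    triangles.append(tuple(verts))
    return triangles
```

An 8 vertex polygon yields 6 triangles here regardless of whether it spans one degree or one meter.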
44
Tessellation + BKD
a made in… Lucene 7.4
Miny Minx Maxy Maxx
m: internal node
???
leaf node
45
Tessellation + BKD
a seven dimension made in… Lucene 7.4
Index Dimensions Data Dimensions
46
Tessellation + BKD
a made in… Lucene 7.4
Miny Minx Maxy Maxx
m: internal node
???
leaf node
47
Smaller, faster, stronger...
Searching triangle intersections
48
XYShape for Spatial...in Lucene 8.2+
Tessellating & Indexing Virtual Worlds
49
Smaller, faster, stronger… at scale
Smaller index...
50
Smaller, faster, stronger… at scale
Faster indexing - Nearly as fast as points!!
51
Smaller, faster, stronger… at scale
Positive feedback...
@imotov: "...test that was crashing a shard on a node with 16GB heap after running for 4.5 hours now finishes in less than 1sec"
@esri: "results show that the insert operation is 25 to 30 times faster in Elasticsearch 6.6 compare to Elasticsearch 6.5.0/6.4.2. ...along with better performance we also get accurate results for spatial queries"
52
geo search?
pssh… it’s simple!
¯_(ツ)_/¯
53
So, what’s next??
• Non-geo use cases (`XYShape` type)
‒ CAD Drawings (Design, County / City Planning)
‒ Venue Mapping (Conferences, Hotels, Theme Parks, Schools)
‒ Sports analysis (Chicago Cubs, Baseball Advanced Media)
‒ Virtual Gaming & Mapping (Blizzard, Cyber Security / SIEM)
• Other coordinate systems (datum / projection support)
‒ Non terrestrial planet models
‒ Localized projections
‒ Custom projections
• Beyond 2D
‒ Space - Time (OGC Moving Features)
‒ Elevation Modelling (DEM, DTM)
‒ LiDAR (3D Modelling)
Use case application goodies...
Available 8.2
In Work
Aggregations
Geo & Analytics
55
GeoDistance
Agg
{
  "aggs" : {
    "sf_rings" : {
      "geo_distance" : {
        "field" : "location",
        "origin" : [-96.82, 32.95],
        "ranges" : [
          { "to" : 50 },
          { "from" : 50, "to" : 100 },
          { "from" : 100, "to" : 300 }
        ]
      }
    }
  }
}
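What the geo_distance aggregation computes can be approximated client-side: measure each point's distance from the origin and count it into the matching ring. In this Python sketch the haversine formula, the Earth radius constant and the helper names are assumptions; ranges are treated as [from, to) in meters, which is the aggregation's default unit.

```python
import math

EARTH_RADIUS_M = 6_371_008.8   # mean Earth radius (assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bucket_by_distance(origin, points, ranges):
    """Count (lat, lon) points per [from, to) ring around origin (lat, lon)."""
    counts = [0] * len(ranges)
    for lat, lon in points:
        d = haversine_m(origin[0], origin[1], lat, lon)
        for i, rng in enumerate(ranges):
            if rng.get("from", 0) <= d < rng.get("to", math.inf):
                counts[i] += 1
    return counts
```

Note that the sketch takes the origin as (lat, lon), whereas the array form in the request body is [lon, lat].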
56
GeoDistance
Agg
57
GeoGrid
Agg
{
"aggs" : {
"crime_cells" : {
"geohash_grid" : {
"field" : "location",
"precision" : 8
}
}
}
}
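The "precision" of 8 above is the geohash length in characters. A geohash interleaves longitude and latitude bisection bits and emits one base-32 character per 5 bits, as this short Python sketch shows (`geohash_encode` is an illustrative helper, not an Elasticsearch API):

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=8):
    """Encode a point as a geohash string of the given length."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, nbits, even = [], 0, 0, True
    while len(chars) < precision:
        # Even bits refine longitude, odd bits refine latitude.
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits <<= 1
            rng[1] = mid
        even = not even
        nbits += 1
        if nbits == 5:                      # 5 bits make one base-32 character
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)
```

Because every longer hash starts with the shorter one, grid buckets at precision 8 nest inside the buckets at lower precisions.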
58
GeoGrid
Agg
59
geotile_grid Aggregation
Finally, a grid designed for maps!
7.0 introduces the geotile_grid aggregation
- matches the tiling scheme of well-known tile maps in the Web Mercator projection (EPSG:3857)
- on Web Mercator maps, grid cells
  - are actually square
  - preserve an identical aspect ratio at all scales and latitudes
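The tiling scheme referred to here is the common "zoom/x/y" web map scheme. A minimal Python sketch of the standard Web Mercator tile formula follows; `geotile_key` is a hypothetical helper, and the clamping at the edges is an assumption about how out-of-range values should be handled.

```python
import math


def geotile_key(lat, lon, zoom):
    """Return the "zoom/x/y" map tile containing a lat/lon point."""
    n = 2 ** zoom                                   # tiles per axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    x = min(max(x, 0), n - 1)                       # clamp to the valid tile range
    y = min(max(y, 0), n - 1)
    return f"{zoom}/{x}/{y}"
```

Each zoom level splits every tile into four, so the buckets line up with the tiles a map client already requests.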
60
GeoCentroid
Agg
"query" : {
  "match" : {
    "crime" : "burglary"
  }
},
"aggs" : {
  "towns" : {
    "terms" : { "field" : "town" },
    "aggs" : {
      "centroid" : {
        "geo_centroid" : {
          "field" : "location"
        }
      }
    }
  }
}
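The nesting above (a terms bucket per town, then a centroid per bucket) can be mimicked in Python as a group-by followed by an average. This sketch assumes the centroid of a set of points is the arithmetic mean of their latitudes and longitudes; the document layout and the helper name are hypothetical.

```python
from collections import defaultdict


def centroids_by_term(docs, term_field="town", geo_field="location"):
    """Group docs by a term field and average their (lat, lon) locations."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])       # lat sum, lon sum, count
    for doc in docs:
        lat, lon = doc[geo_field]
        acc = sums[doc[term_field]]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {term: (s[0] / s[2], s[1] / s[2]) for term, s in sums.items()}
```

The query clause in the request simply narrows which documents reach this grouping step.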
61
GeoCentroid
Agg
62
GeoCentroid
Agg
63
matrix_stats
Agg
{
"aggs": {
"statistics": {
"matrix_stats": {
"fields": ["poverty", "income"]
}
}
}
}
64
matrix_stats
Agg
"aggregations": {
"statistics": {
"doc_count": 50,
"fields": [{
"name": "income",
"count": 50,
"mean": 51985.1,
"variance": 7.383377037755103E7,
"skewness": 0.5595114003506483,
"kurtosis": 2.5692365287787124,
"covariance": {
"income": 7.383377037755103E7,
"poverty": -21093.65836734694
},
"correlation": {
"income": 1.0,
"poverty": -0.8352655256272504
}
}, {
"name": "poverty",
"count": 50,
"mean": 12.732000000000001,
"variance": 8.637730612244896,
"skewness": 0.4516049811903419,
"kurtosis": 2.8615929677997767,
"covariance": {
"income": -21093.65836734694,
"poverty": 8.637730612244896
},
"correlation": {
"income": -0.8352655256272504,
"poverty": 1.0
}
}]
}
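The fields in this response are related: the correlation is the covariance divided by the square root of the product of the two variances. The Python sketch below recomputes these statistics from raw columns. The sample (n - 1) normalization is an assumption, and the function names are illustrative.

```python
import math


def pearson_from_moments(covariance, variance_x, variance_y):
    """Pearson correlation from a covariance and the two variances."""
    return covariance / math.sqrt(variance_x * variance_y)


def matrix_stats(xs, ys):
    """Sample mean, variance, covariance and correlation for two equal-length columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return {
        "mean": (mean_x, mean_y),
        "variance": (var_x, var_y),
        "covariance": cov,
        "correlation": pearson_from_moments(cov, var_x, var_y),
    }
```

Plugging the slide's covariance (-21093.658...) and variances into `pearson_from_moments` gives about -0.835, the reported income/poverty correlation.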
65
Geo Aggregations
more available, and coming soon...
• pca - ML foundation plugin
‒ dimensionality reduction
‒ image analysis
‒ classification / recognition
• geo_stats - In work...
‒ Moran’s I - measuring spatial auto-correlation
‒ Getis-Ord - spatial hot spot analysis
66
Now GA in Elastic 7.3….
THANK YOU
Follow for updates:
@nknize
apache/lucene-solr
