October 13-16, 2016 • Austin, TX
Lucene/Solr Spatial in 2015
David Smiley
Search Engineer/Consultant (Freelance)
3
About David Smiley
Freelance Search Developer/Consultant
Expert Lucene/Solr development skills, advice (consulting), training
Java, spatial, and full-stack experience
Apache Lucene/Solr committer & PMC member
Primary author of “Apache Solr Enterprise Search Server”
4
More Spatial Contributors!
Spatial4j Lucene Solr
David Smiley ✔️ ✔️ ✔️
Ryan McKinley ✔️
Justin Deoliveira ✔️
Mike McCandless ✔️
Nick Knize ✔️
Karl Wright ✔️
Ishan Chattopadhyaya ✔️
5
Agenda
New Features / Capabilities
New Approaches
Improvements
Pending
6
Topic: New Features
Heatmaps / grid faceting — Lucene, Solr
Surface-of-sphere shapes (Geo3d) — Lucene
Accurate indexed geometries — Lucene, Solr
GeoJSON read/write — Spatial4j
7
Heatmaps: Spatial Grid Faceting
Spatial density summary grid faceting,
also useful for point-plotting search results
Usually rendered with a gradient radius
Lucene & Solr APIs
Scalable & fast usually…
v5.2
8
Heatmaps Under the Hood
Requires a PrefixTreeStrategy Lucene field — grid based
Algorithm enumerates the underlying cell/terms and
accumulates the counter in a corresponding grid
Conceptually facet.method=enum for spatial
Works on non-point indexed shapes too
Complexity: O(cells * cellDepthFactor) not O(docs)
No/low memory; mainly the grid of integers
Solr will distribute to shards and merge
Could be faster still; a BFS (vs DFS) layout would be perfect
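The accumulation step can be sketched in plain Java. This is only an illustration, not Lucene's implementation: the class name `HeatmapSketch`, the quad-cell token letters (A = top-left, B = top-right, C = bottom-left, D = bottom-right) and the input map are assumptions made for the example. The map is assumed to hold only cells at the grid level plus coarser leaf cells (as a large non-point shape would produce), so nothing is counted twice. The work is proportional to the number of cells visited, not to the number of documents.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class HeatmapSketch {

    /**
     * Accumulates per-cell document counts into a 2^level x 2^level grid.
     * Row 0 is the top row, column 0 the left column.
     */
    static int[][] accumulate(Map<String, Integer> cellCounts, int level) {
        int size = 1 << level;
        int[][] grid = new int[size][size];
        for (Map.Entry<String, Integer> entry : cellCounts.entrySet()) {
            String token = entry.getKey();
            if (token.length() > level) {
                continue; // finer than the requested grid; ignored in this sketch
            }
            int row = 0;
            int col = 0;
            for (int i = 0; i < token.length(); i++) {
                int quadrant = token.charAt(i) - 'A'; // A=0, B=1, C=2, D=3 (hypothetical layout)
                row = (row << 1) | (quadrant >> 1);
                col = (col << 1) | (quadrant & 1);
            }
            // A coarser cell covers a square block of grid cells.
            int span = 1 << (level - token.length());
            for (int r = row * span; r < (row + 1) * span; r++) {
                for (int c = col * span; c < (col + 1) * span; c++) {
                    grid[r][c] += entry.getValue();
                }
            }
        }
        return grid;
    }

    public static void main(String[] args) {
        Map<String, Integer> cells = new HashMap<>();
        cells.put("AD", 3); // three documents in one level-2 cell
        cells.put("B", 2);  // two documents indexed as a coarse level-1 leaf
        for (int[] row : accumulate(cells, 2)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The only memory of note is the integer grid itself, which matches the "No/low memory" point above.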
9
Solr Heatmap Faceting
On an RPT field
(SpatialRecursivePrefixTreeFieldType)
prefixTree="packedQuad"
Query:
/select?facet=true&facet.heatmap=geo_rpt&facet.heatmap.geom=["-180 -90" TO "180 90"]
facet.heatmap.format=ints2D or png
// Normal Solr response...
"facet_counts":{
... // facet response fields
"facet_heatmaps":{
"loc_srpt":[
"gridLevel",2,
"columns",32,
"rows",32,
"minX",-180.0,
"maxX",180.0,
"minY",-90.0,
"maxY",90.0,
"counts_ints2D", [null, null, [0, 0, ... ]]
...
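A client usually has to turn a position in `counts_ints2D` back into a map rectangle. The sketch below shows one way to do that from the response fields above. The class and method names are made up for the example, and it assumes that the outer array runs top-down (row 0 at `maxY`), that inner arrays run left to right, and that a `null` row stands for a row of zeros; check the Solr reference guide for your version before relying on that.

```java
final class HeatmapCell {

    /** Returns {minX, minY, maxX, maxY} for the grid cell at (row, col); row 0 is the top row. */
    static double[] cellBounds(double minX, double maxX, double minY, double maxY,
                               int columns, int rows, int row, int col) {
        double cellWidth = (maxX - minX) / columns;
        double cellHeight = (maxY - minY) / rows;
        double left = minX + col * cellWidth;
        double top = maxY - row * cellHeight;
        return new double[] {left, top - cellHeight, left + cellWidth, top};
    }

    /** Sums every count in the grid, treating a null row as all zeros. */
    static long total(int[][] counts) {
        long sum = 0;
        for (int[] row : counts) {
            if (row == null) {
                continue;
            }
            for (int count : row) {
                sum += count;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Top-left cell of the 32 x 32 world grid from the response above.
        double[] b = cellBounds(-180.0, 180.0, -90.0, 90.0, 32, 32, 0, 0);
        System.out.println(b[0] + " " + b[1] + " " + b[2] + " " + b[3]);
    }
}
```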
10
Solr Heatmap Resources
Solr Ref guide: https://cwiki.apache.org/confluence/display/solr/Spatial+Search
Jack Reed’s Tutorial: http://www.jack-reed.com/2015/06/29/visualizing-10-million-geonames-with-leaflet-solr-heatmap-facets.html
Live Demo: http://worldwidegeoweb.com
Open-source JavaScript Solr Heatmap Libraries
https://github.com/spacemansteve/SolrHeatmapLayer
https://github.com/mejackreed/leaflet-solr-heatmap
https://github.com/voyagersearch/leaflet-solr-heatmap
11
Geo3D: Shapes on the Surface of a Sphere
… or Ellipsoid of configurable axis
Not a general 3D space geometry lib
Internally uses geocentric X, Y, Z coordinates (hence 3D) with 3D planar geometry mathematics
Shapes: Point, Lat-Lon Rect, Circle, Polygons, Path (LineString) with optional buffer
Distance computations: Arc (angular or surface), Linear (straight-line), Normal
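To make the geocentric idea concrete, here is a minimal sketch on a unit sphere. It does not use the Geo3D API; `UnitSphere` and its methods are hypothetical names, and the ellipsoid case is ignored. It shows why many tests need no per-point trigonometry: once points are stored as X, Y, Z, circle containment is a dot product compared against a cosine computed once per query.

```java
final class UnitSphere {

    /** Converts latitude/longitude in degrees to a unit vector {x, y, z}. */
    static double[] toXyz(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double cosLat = Math.cos(lat);
        return new double[] {cosLat * Math.cos(lon), cosLat * Math.sin(lon), Math.sin(lat)};
    }

    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    /** Arc (angular) distance in radians between two unit vectors. */
    static double arcDistance(double[] a, double[] b) {
        return Math.acos(Math.max(-1.0, Math.min(1.0, dot(a, b))));
    }

    /** Linear (straight-line chord) distance between two unit vectors, in sphere radii. */
    static double linearDistance(double[] a, double[] b) {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        double dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    /** Circle containment without trigonometry: cosRadius is computed once per query. */
    static boolean withinCircle(double[] center, double cosRadius, double[] point) {
        return dot(center, point) >= cosRadius;
    }

    public static void main(String[] args) {
        double[] center = toXyz(0, 0);
        double cosRadius = Math.cos(Math.toRadians(10)); // a 10-degree circle
        System.out.println(withinCircle(center, cosRadius, toXyz(5, 5)));
        System.out.println(withinCircle(center, cosRadius, toXyz(0, 20)));
    }
}
```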
12
All 2D Maps of the Earth Distort Straight Lines
A straight bird-flies path from Anchorage to Miami doesn’t actually cross the ocean!
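The Anchorage to Miami claim can be checked with a few lines of vector arithmetic. The sketch below computes the midpoint of the great-circle path by adding the two unit vectors and normalizing. The city coordinates (about 61.2°N, 149.9°W and 25.8°N, 80.2°W) are approximations supplied for this example, `GreatCircle` is a hypothetical name, and a perfect sphere is assumed.

```java
final class GreatCircle {

    private static double[] toXyz(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        return new double[] {
            Math.cos(lat) * Math.cos(lon), Math.cos(lat) * Math.sin(lon), Math.sin(lat)};
    }

    /** Midpoint of the shorter great-circle arc; returns {latDeg, lonDeg}. */
    static double[] midpoint(double lat1, double lon1, double lat2, double lon2) {
        double[] a = toXyz(lat1, lon1);
        double[] b = toXyz(lat2, lon2);
        double x = a[0] + b[0];
        double y = a[1] + b[1];
        double z = a[2] + b[2];
        double norm = Math.sqrt(x * x + y * y + z * z); // zero only for antipodal points
        return new double[] {
            Math.toDegrees(Math.asin(z / norm)), Math.toDegrees(Math.atan2(y, x))};
    }

    public static void main(String[] args) {
        double[] mid = midpoint(61.2, -149.9, 25.8, -80.2);
        System.out.printf("midpoint: %.1f, %.1f%n", mid[0], mid[1]);
    }
}
```

With these inputs the midpoint comes out several degrees north of the plain average of the two latitudes and well inland, which is the distortion the slide points at: a straight line drawn on a flat lat/lon map is not the shortest path on the sphere.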
13
Geo3D, continued…
Benefits
Inherently more accurate than 2D projected spatial
especially for big shapes or near poles
Many computations are fast; no expensive trigonometry
An alternative to JTS without the LGPL license (still)
Has own Lucene module (spatial3d), thus jar file
Maven groupId: org.apache.lucene, artifact: lucene-spatial3d
No Solr integration yet; pending more Spatial4j integration
14
Index & Search Geo3D Geometries
Spatial4j Geo3dShape wrapper with RPT (v5.2)
In Lucene-spatial for now
Index Geo3d shapes
Limited to grid accuracy
Query by Geo3d shape
Limited distance sort
Heatmaps
Geo3DPointField & PointInGeo3DShapeQuery (v5.4)
Based on a 3D BKD index
In spatial3d module
Index points-only
No multi-valued
Query by Geo3d shape
No distance sort
Leaner & faster than RPT
15
RPT/SpatialPrefixTrees and Accuracy
RecursivePrefixTree (RPT) uses Lucene’s index as a PrefixTree
Thus represents shapes as grid cells of varying precision by prefix
Example, a point shape:
D, DR, DRT, DRT2, DRT2Y
More accuracy scales
Example, a polygon shape:
Too many to list… 508 cells
More accuracy does NOT scale
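The point example reads like a geohash and its prefixes. As a sketch of how one point turns into one term per level, the code below implements the standard geohash encoding in plain Java and lists every prefix. It is not Lucene's GeohashPrefixTree; `GeohashPrefixes` is a hypothetical name, and the output is lowercase as in the usual geohash alphabet.

```java
import java.util.ArrayList;
import java.util.List;

final class GeohashPrefixes {

    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Standard geohash: bits alternate longitude, latitude; five bits per character. */
    static String encode(double lat, double lon, int length) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder hash = new StringBuilder();
        boolean lonBit = true;
        int bits = 0;
        int ch = 0;
        while (hash.length() < length) {
            if (lonBit) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; minLon = mid; } else { ch <<= 1; maxLon = mid; }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; minLat = mid; } else { ch <<= 1; maxLat = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) {
                hash.append(BASE32.charAt(ch));
                bits = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }

    /** One cell (term) per level: every prefix of the full-precision hash. */
    static List<String> cells(double lat, double lon, int maxLevel) {
        String full = encode(lat, lon, maxLevel);
        List<String> result = new ArrayList<>();
        for (int level = 1; level <= maxLevel; level++) {
            result.add(full.substring(0, level));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(cells(57.64911, 10.40744, 5));
    }
}
```

For a point, each extra level of precision adds exactly one term, which is why more accuracy scales. A polygon needs many cells along its edge at every finer level, so its term count grows quickly instead.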
16
Combining RPT with Serialized Geometry
RPT (RecursivePrefixTreeStrategy) is the grid index (inaccurate)
SDV (SerializedDVStrategy) stores serialized geometry (accurate)
RPT + SDV → CompositeSpatialStrategy
Accuracy & speed & smaller indexes
Optimized intersects predicate avoids some geometry checks
> 80% faster intersects queries, 75% smaller index
Solr adapter: RptWithGeometrySpatialField
Compatible with the Heatmaps feature
Includes a shape cache (per-segment); configurable
v5.2
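The way a grid index lets a query skip some exact geometry checks can be sketched without Lucene. In the example below the exact geometry is a circle, the grid approximation is its bounding box snapped outward to cell edges, and the query is a rectangle. All names are hypothetical, and the real CompositeSpatialStrategy works on RPT cells and serialized shapes rather than on these toy types. The exact test only runs when the grid cannot decide.

```java
final class TwoPhaseSketch {

    /**
     * Tests whether a circle intersects a query rectangle.
     * checks[0] is incremented each time the exact geometry test has to run.
     */
    static boolean intersects(double qMinX, double qMinY, double qMaxX, double qMaxY,
                              double cx, double cy, double r, double cell, int[] checks) {
        // Grid approximation: the circle's bounding box snapped outward to cell edges.
        double gMinX = Math.floor((cx - r) / cell) * cell;
        double gMinY = Math.floor((cy - r) / cell) * cell;
        double gMaxX = Math.ceil((cx + r) / cell) * cell;
        double gMaxY = Math.ceil((cy + r) / cell) * cell;

        // Phase 1a: the cells miss the query entirely, so the shape does too.
        if (gMaxX < qMinX || gMinX > qMaxX || gMaxY < qMinY || gMinY > qMaxY) {
            return false;
        }
        // Phase 1b: the cells lie fully inside the query, so the shape matches.
        if (gMinX >= qMinX && gMaxX <= qMaxX && gMinY >= qMinY && gMaxY <= qMaxY) {
            return true;
        }
        // Phase 2: edge case, consult the exact geometry.
        checks[0]++;
        double dx = cx - Math.max(qMinX, Math.min(cx, qMaxX));
        double dy = cy - Math.max(qMinY, Math.min(cy, qMaxY));
        return dx * dx + dy * dy <= r * r;
    }

    public static void main(String[] args) {
        int[] checks = {0};
        System.out.println(intersects(0, 0, 10, 10, 5, 5, 1, 1.0, checks));     // decided by the grid
        System.out.println(intersects(0, 0, 10, 10, 11, 11, 1.2, 1.0, checks)); // needs the exact test
        System.out.println("exact checks: " + checks[0]);
    }
}
```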
17
Topic: New Approaches
Lucene
BKD Tree Indexes
GeoPointField
18
BKD Tree Indexes
New numeric/spatial index approach with own file format
Not based on Lucene Terms index
https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf
Much faster and more compact than Trie/PrefixTree based indexes
Whither term auto-prefixing? LUCENE-5879
Indexed point-data only; multi-valued mostly
Intersects predicate only
Filtering only (no distance or other scoring)
Multiple implementations… (next slide)
Neat visualization https://youtu.be/
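The core idea behind these indexes, a k-d tree whose leaves are small blocks of points, can be sketched in memory. The class below is an illustration only: `BlockKdSketch` is a hypothetical name, the leaf size is tiny on purpose, and the real BKD structure adds an on-disk block format and a bulk build process that are not modelled here. It supports the one thing the slide lists, an intersects-style box filter.

```java
import java.util.Arrays;
import java.util.Comparator;

final class BlockKdSketch {

    private static final int LEAF_SIZE = 4; // hypothetical; real blocks are far larger

    private static final class Node {
        int from;
        int to;        // range of points covered by this node
        double split;  // inner nodes only: split value on the node's dimension
        Node left;
        Node right;
    }

    private final double[][] points; // {x, y} pairs, reordered into tree order
    private final Node root;

    BlockKdSketch(double[][] input) {
        points = input.clone();
        root = build(0, points.length, 0);
    }

    private Node build(int from, int to, int dim) {
        Node node = new Node();
        node.from = from;
        node.to = to;
        if (to - from <= LEAF_SIZE) {
            return node; // leaf block
        }
        Arrays.sort(points, from, to, Comparator.comparingDouble((double[] p) -> p[dim]));
        int mid = (from + to) >>> 1;
        node.split = points[mid][dim];
        node.left = build(from, mid, 1 - dim);  // values <= split
        node.right = build(mid, to, 1 - dim);   // values >= split
        return node;
    }

    /** Counts points inside the inclusive box [minX, maxX] x [minY, maxY]. */
    int count(double minX, double minY, double maxX, double maxY) {
        return count(root, 0, new double[] {minX, minY}, new double[] {maxX, maxY});
    }

    private int count(Node node, int dim, double[] min, double[] max) {
        int hits = 0;
        if (node.left == null) {
            for (int i = node.from; i < node.to; i++) { // scan the whole block
                double[] p = points[i];
                if (p[0] >= min[0] && p[0] <= max[0] && p[1] >= min[1] && p[1] <= max[1]) {
                    hits++;
                }
            }
            return hits;
        }
        if (min[dim] <= node.split) {
            hits += count(node.left, 1 - dim, min, max);
        }
        if (max[dim] >= node.split) {
            hits += count(node.right, 1 - dim, min, max);
        }
        return hits;
    }
}
```

A caller would build the tree once from all points and then run `count(minX, minY, maxX, maxY)` per query; subtrees whose split value rules them out are never visited.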
19
Multiple BKD Implementations
Multiple implementations of the same BKD concept:
(1D) RangeTreeDocValuesFormat
(2D) BKDPointField & BKD…Query
(3D) Geo3DPointField & PointInGeo3DShapeQuery
(ND) LUCENE-6825 (to Lucene-core) in-progress
1D,2D,3D Implementations are either in lucene-sandbox or
lucene-spatial3d for now
No Lucene-spatial module SpatialStrategy wrappers yet
thus no Spatial4j Shape integration nor Solr integration yet
20
BKD 1D: RangeTree
Efficient range search on single/multi-valued numbers or terms
Could be used for numbers, dates, IPv6 bytes, …
Alternatives: Normal number fields (trie), DateRangeField (RPT)
Would love to see a benchmark!
How-To:
RangeTreeDocValuesFormat
Numbers: SortedNumericDocValuesField with
NumericRangeTreeQuery
Bytes: SortedSetDocValuesField with SortedSetRangeTreeQuery
v5.3
21
BKD 2D: BKDPointField
Efficient 2D geospatial point index
Alternative to RPT or GeoPointField
5.7x faster than RPT w/ GeoHash. Smaller indexes.
How-To:
Use BKDPointField (requires BKDTreeDocValuesFormat)
Query:
BKDPointInBBoxQuery
BKDPointInPolygonQuery
point-radius (circle) — in-progress LUCENE-6698
v5.3
22
GeoPointField
2D geospatial point field
Indexed point-only data, single/multi-valued
Spatial 2D Trie/PrefixTree terms index
But not affiliated with Lucene-spatial SpatialPrefixTree/RPT
Configurable 2^x grid size (defaults to 512)
Compact bit interleaved Z-order encoding
Re-uses much of Lucene’s numeric precisionStep &
MultiTermQuery logic
2-phase grid/postings then doc-values algorithm
v5.3
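Bit-interleaved Z-order (Morton) encoding is easy to show in isolation. The sketch below quantizes longitude and latitude to 32 bits each and interleaves them into one 64-bit value. It is not the GeoPointField source: the class name, the quantization and the choice of longitude for the even bit positions are assumptions for the example.

```java
final class ZOrder {

    /** Maps a value in [min, max] to an unsigned 32-bit integer held in a long. */
    static long quantize(double value, double min, double max) {
        return (long) ((value - min) / (max - min) * 0xFFFFFFFFL);
    }

    /** Spreads the low 32 bits of v so that they occupy the even bit positions. */
    static long spread(long v) {
        v &= 0xFFFFFFFFL;
        v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
        v = (v | (v << 8)) & 0x00FF00FF00FF00FFL;
        v = (v | (v << 4)) & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v << 2)) & 0x3333333333333333L;
        v = (v | (v << 1)) & 0x5555555555555555L;
        return v;
    }

    /** Interleaves two 32-bit values: lonBits on even positions, latBits on odd ones. */
    static long interleave(long lonBits, long latBits) {
        return spread(lonBits) | (spread(latBits) << 1);
    }

    static long encode(double lon, double lat) {
        return interleave(quantize(lon, -180.0, 180.0), quantize(lat, -90.0, 90.0));
    }

    public static void main(String[] args) {
        System.out.println(Long.toBinaryString(encode(-71.06, 42.36)));
    }
}
```

Because the high-order bits of both coordinates end up at the front of the encoded value, a shared prefix of the encoded value corresponds to a shared grid cell, which is what lets a terms index with precision steps serve as a spatial grid.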
23
…continued
Has no affiliation with Spatial4j, RPT, JTS, or SpatialStrategy
No Heatmaps, No custom Shape implementations
No Solr support yet
No dependencies
Easy to use compared to RPT; simpler internally too
How-To:
doc.add(new GeoPointField(name, lon, lat, Store.YES))
GeoPointDistanceQuery (sphere only) or GeoPointInBBoxQuery
or GeoPointInPolygonQuery. …DistanceRangeQuery pending
24
Topic: Improvements
Spatial4j
Minimal longitude bounding-box algorithm
Lucene (PrefixTree / RPT indexing)
Leaner & faster non-point indexes
New PackedQuadPrefixTree
Solr
Distance units: Kilometers/Miles/Degrees
Nicer ST_* spatial query parsers (almost done)
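For the distance units item, the relationship between the three units is simple arithmetic on a spherical Earth model. The sketch below is not Solr or Spatial4j code; the class name is hypothetical and the mean Earth radius of 6371.0087714 km is an assumed constant for the example.

```java
final class DistanceUnits {

    static final double EARTH_MEAN_RADIUS_KM = 6371.0087714; // assumed spherical model
    static final double KM_PER_MILE = 1.609344;

    /** Degrees of arc along a great circle to kilometers on the surface. */
    static double degreesToKm(double degrees) {
        return Math.toRadians(degrees) * EARTH_MEAN_RADIUS_KM;
    }

    /** Kilometers on the surface to degrees of arc. */
    static double kmToDegrees(double km) {
        return Math.toDegrees(km / EARTH_MEAN_RADIUS_KM);
    }

    static double kmToMiles(double km) {
        return km / KM_PER_MILE;
    }

    public static void main(String[] args) {
        System.out.println(degreesToKm(1.0));
        System.out.println(kmToMiles(degreesToKm(1.0)));
    }
}
```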
25
Topic: Some Pending Spatial TODOs
Spatial4j
Geo3D integration (a JTS alternative)
Lucene
FlexPrefixTree (LUCENE-4922)
Multi-dimensional BKD (LUCENE-6825)
SpatialStrategy adapters for GeoPointField, etc.
Solr
Better spatial Solr QParsers (SOLR-4242)
GeoJSON parsing
More FieldType adapters for latest Lucene spatial
DateRangeField faceting
Nearest-neighbor search
Well, 2015 isn’t over yet. :-)
26
That’s all for now; thanks for coming!
Need Lucene/Solr guidance or custom development?
Contact me!
Email: dsmiley@apache.org
LinkedIn: http://www.linkedin.com/in/davidwsmiley
G+: +DavidSmiley
Twitter: @DavidWSmiley
