Big Data and Geospatial with HPCC Systems®
Powered by LexisNexis Risk Solutions
Ignacio Calvo
Greg McRandal
10/05/2016
Concepts in Geospatial
How to use them with HPCC
Use cases
@HPCCSystems
Definition
An approach to applying statistical analysis and other analytic techniques to data that has a geographical or spatial aspect.
Origin of Geospatial
John Snow’s original map (1854): an early use of spatial analysis to save lives.
This map was used to determine that cholera was water-borne.
Understanding the data
Need to know:
• Format
• Projection / coordinate system
Formats: Vector vs Raster
What is a projection?
Projections are used to represent the world in ways we can process
• The Earth is round and maps are flat
• Physical maps
• Computer maps
Have I seen projections before?
• Gall-Peters vs Mercator vs Winkel tripel
• GPS (latitude/longitude)
• Google Maps
Projections
Two different projections representing the same place.
Projections to know about
WGS84
• Latitude and longitude
• Our best approximation of the world
• Not always the best for a specific region
• Not technically a projection
Mercator
• Many different ones, choose one based on your location
• Reduces the area it covers to a simple Cartesian plane
• Good near the central axis, bad far away from it:
  • Web Mercator covers the whole world – good near the equator, gets worse as you travel north or south
  • Irish National Grid – very good for Ireland, awful anywhere else
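To put a number on "gets worse as you travel north or south", here is a minimal Python sketch. It assumes the spherical form of Web Mercator, where distances are stretched by 1/cos(latitude); the function names are illustrative and not taken from the slides.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used as the sphere radius


def web_mercator(lon_deg, lat_deg):
    """Project WGS84 longitude/latitude (degrees) to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def scale_factor(lat_deg):
    """How much spherical Mercator stretches distances at this latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))


for lat in (0, 30, 53, 60, 80):
    print(f"latitude {lat:>2}: distances stretched x{scale_factor(lat):.2f}")
```

At the equator the factor is 1; at 60° north it is 2, so a length measured directly in projected metres is twice the ground distance there.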
Lies, damned lies, statistics… and maps!
*https://twitter.com/flashboy/status/641221733509373952
Projection Woes:
A straight line in Mercator is
not a straight line in WGS84
Four points converted
to WGS84
Where the lines
should be
Don’t re-project polygons!
This “solution” is only good
enough for visuals, not for
maths.
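The straight-line problem can be checked numerically. This is a small sketch, assuming spherical Mercator and two made-up end points: it compares the midpoint of a segment drawn straight in lon/lat space with the midpoint of the same segment drawn straight in Mercator space.

```python
import math


def lat_to_mercator_y(lat_deg):
    """Mercator northing on a unit sphere."""
    return math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))


def mercator_y_to_lat(y):
    """Inverse of lat_to_mercator_y, back to degrees."""
    return math.degrees(2 * math.atan(math.exp(y)) - math.pi / 2)


def midpoint_wgs84(a, b):
    """Midpoint of the straight segment drawn in lon/lat space."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)


def midpoint_mercator(a, b):
    """Midpoint of the straight segment drawn in Mercator space, as lon/lat."""
    lon = (a[0] + b[0]) / 2  # Mercator x is linear in longitude
    y = (lat_to_mercator_y(a[1]) + lat_to_mercator_y(b[1])) / 2
    return (lon, mercator_y_to_lat(y))


# Hypothetical end points, written as (lon, lat)
start, end = (0.0, 0.0), (90.0, 60.0)
print("midpoint in WGS84 space:   ", midpoint_wgs84(start, end))
print("midpoint in Mercator space:", midpoint_mercator(start, end))
```

The two midpoints share a longitude but sit about five degrees of latitude apart, so converting only the corner points of a polygon does not give the same shape in the other system.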
Visuals don’t agree with maths: Wind and Hail.
Web Mercator WGS84
Number one bug in Geospatial
*http://twcc.fr
Mixing up the axis order:
• Latitude is Y
• Longitude is X
• Remember: LatY, LonX
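One way to make the axis order hard to get wrong is to name the fields and range-check them on the way in. The sketch below is illustrative only; `LonLat` and `from_lat_lon` are hypothetical names, and a range check catches only the swaps where the supposed latitude falls outside ±90.

```python
from typing import NamedTuple


class LonLat(NamedTuple):
    """A WGS84 position stored in X, Y order: longitude first, latitude second."""
    lon: float  # X, valid range -180..180
    lat: float  # Y, valid range -90..90


def from_lat_lon(lat, lon):
    """Build a LonLat from values given in latitude, longitude order."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude {lat} out of range; are the axes swapped?")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude {lon} out of range")
    return LonLat(lon=lon, lat=lat)


dublin = from_lat_lon(53.35, -6.26)
print(dublin)          # fields are named, so the order is explicit
print(dublin.lon, dublin.lat)
```

A pair such as (102.78925462, -6.08354321) passed as latitude, longitude is rejected, because 102.78 cannot be a latitude.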
Now I understand my data, what’s next?
Data -> Ingest -> Index -> Query
Bringing Geospatial into HPCC
GOAL
Bring our geospatial processes into the realm of Big Data
STEPS
Extend HPCC and ECL to support the following main capabilities:
• Spatial filtering of vector geometries
• Spatial operations using vector geometries
• Spatial reference projection and transformation
• Reading of compressed geo-raster files
Integration of open source libraries
Ingesting Vector Data
It’s a CSV file.
Id  Name            Geometry                            Projection  Value
1   Alice’s place   POINT (53.78925462 -6.08354321)     4326*       €5,973,000
2   Bob’s place     POINT (-34.78925462 7.08354321)     4326        €872,000
3   Celine’s place  POINT (102.78925462 -6.08354321)    4326        €9,324,000
* WGS84 (Lat/Lon)
1. Policy data
2. Geocode address
3. Peril tag
Data ready to ingest
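As a rough illustration of reading such a file outside HPCC, the Python sketch below parses a CSV whose geometry column holds WKT points. The column names and the plain-integer value column are assumptions for the example, not the layout used in the talk, and the two coordinates are returned in file order without deciding which one is latitude.

```python
import csv
import io
import re

# Matches "POINT (a b)" with two signed decimal numbers.
POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_point(wkt):
    """Return the two coordinates of a WKT POINT in the order they appear."""
    match = POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))


def read_policies(text):
    """Parse CSV text into a list of dicts, one per policy row."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "id": int(row["id"]),
            "name": row["name"],
            "coords": parse_point(row["geometry"]),
            "srid": int(row["projection"]),  # 4326 = WGS84
            "value_eur": int(row["value"]),
        })
    return rows


# Hypothetical file content modelled on the table above
SAMPLE = """id,name,geometry,projection,value
1,Alice's place,POINT (53.78925462 -6.08354321),4326,5973000
2,Bob's place,POINT (-34.78925462 7.08354321),4326,872000
"""

for policy in read_policies(SAMPLE):
    print(policy)
```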
Ingesting Vector Data
It’s a GML / XML file.
1. Shape data
2. Parse XPATH
3. Process and index
Data ready to query
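The "Parse XPATH" step can be pictured with Python's standard library, which supports a subset of XPath. This is only a sketch: the wrapper elements (`zones`, `zone`) and the sample polygon are invented for the example, and the slides do not show how the parsing is actually done in ECL.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {"gml": "http://www.opengis.net/gml"}

# Hypothetical GML fragment: one named zone with a single polygon ring
SAMPLE_GML = """<zones xmlns:gml="http://www.opengis.net/gml">
  <zone name="flood-zone-a">
    <gml:Polygon srsName="EPSG:4326">
      <gml:exterior>
        <gml:LinearRing>
          <gml:posList>0 0 0 10 10 10 10 0 0 0</gml:posList>
        </gml:LinearRing>
      </gml:exterior>
    </gml:Polygon>
  </zone>
</zones>"""


def extract_rings(gml_text):
    """Return (zone name, list of coordinate pairs) for each zone element."""
    root = ET.fromstring(gml_text)
    rings = []
    for zone in root.findall("zone"):
        pos_list = zone.find(".//gml:posList", NAMESPACES)
        if pos_list is None or not pos_list.text:
            continue  # zone without a polygon ring
        numbers = [float(n) for n in pos_list.text.split()]
        # posList is a flat run of numbers; pair them up in file order
        ring = list(zip(numbers[0::2], numbers[1::2]))
        rings.append((zone.get("name"), ring))
    return rings


for name, ring in extract_rings(SAMPLE_GML):
    print(name, ring)
```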
Indexing vector data
• Outline Box: Biggest rectangle
• Boxes contain boxes
• Bottom box in the tree contains actual
geometries
• Here, 3 levels pictured
• Boxes can overlap (entries are only in one)
Querying vector data
Searching an R-Tree, e.g. finding all buildings (points) inside a flood zone (polygon):
1. Does the query polygon overlap our box?
   N: Return empty list
   Y: Go to step 2
2. Is it a leaf node?
   Y: Return all nodes for verification
   N: Search our boxes’ children
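The search can be sketched in a few lines of Python. This is an illustrative toy, not the HPCC implementation: the tree is built by hand, and the query polygon is stood in for by its bounding box, so the returned entries are only candidates that still need an exact point-in-polygon check.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    box: tuple  # (min_x, min_y, max_x, max_y)
    children: list = field(default_factory=list)  # child Nodes; empty for a leaf
    entries: list = field(default_factory=list)   # (name, x, y) points held by a leaf

    @property
    def is_leaf(self):
        return not self.children


def boxes_overlap(a, b):
    """True when two (min_x, min_y, max_x, max_y) rectangles intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(node, query_box):
    """Return candidate entries whose leaf box overlaps the query box."""
    if not boxes_overlap(node.box, query_box):
        return []                  # no overlap: return empty list
    if node.is_leaf:
        return list(node.entries)  # leaf: return everything for verification
    found = []
    for child in node.children:    # otherwise search the children
        found.extend(search(child, query_box))
    return found


# Hand-built two-level tree with made-up buildings
leaf_west = Node(box=(0, 0, 4, 4), entries=[("a", 1, 1), ("b", 3, 3)])
leaf_east = Node(box=(6, 0, 10, 4), entries=[("c", 7, 2)])
root = Node(box=(0, 0, 10, 4), children=[leaf_west, leaf_east])

print(search(root, (2, 2, 5, 5)))
```

Building "a" comes back as a candidate even though it lies outside the query box, which is why the leaf results go on to a verification step.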
Ingesting Raster Data
It’s a raster / TIFF file. Bitmap image
1. Raster data
2. Tile and spray
3. Process and index
Data ready to query
Tiling divides raster images into
small manageable areas of known
dimensions.
These tiles have their own
metadata:
• Bounding box
• Grid position
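A minimal sketch of tiling, assuming a north-up raster held as a 2D list, with the origin at the top-left corner and square pixels. The function and field names are made up for the example.

```python
def tile_raster(pixels, tile_size, origin_x, origin_y, pixel_size):
    """Split a 2D list of pixel values into tiles carrying their own metadata."""
    height = len(pixels)
    width = len(pixels[0])
    tiles = []
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            rows = [row[left:left + tile_size] for row in pixels[top:top + tile_size]]
            tile_h, tile_w = len(rows), len(rows[0])
            min_x = origin_x + left * pixel_size
            max_y = origin_y - top * pixel_size  # y decreases as rows go down
            tiles.append({
                "grid": (left // tile_size, top // tile_size),  # (column, row)
                "bbox": (min_x, max_y - tile_h * pixel_size,
                         min_x + tile_w * pixel_size, max_y),   # (min_x, min_y, max_x, max_y)
                "pixels": rows,
            })
    return tiles


# A made-up 4x4 raster whose top-left corner sits at x=0, y=4
raster = [[r * 4 + c for c in range(4)] for r in range(4)]
for tile in tile_raster(raster, tile_size=2, origin_x=0.0, origin_y=4.0, pixel_size=1.0):
    print(tile["grid"], tile["bbox"], tile["pixels"])
```

Each tile is small enough to be handled as one record, and its bounding box and grid position are enough to find it again at query time.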
1. Figure out which grid position the
geometry needs
2. Extract the required pixel
3. Interrogate the pixel for its value
4. Interpret its value
5. Return to user
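These steps can be followed in a short Python sketch. It assumes tiles are stored in a dict keyed by (column, row) grid position, the raster is north-up with a top-left origin, and the legend mapping pixel values to risk labels is invented for the example.

```python
import math


def query_raster(tiles, tile_size, origin_x, origin_y, pixel_size, x, y, legend):
    """Look up the interpreted pixel value under the point (x, y)."""
    # 1. Figure out which pixel, and so which grid position, the point needs
    col = math.floor((x - origin_x) / pixel_size)
    row = math.floor((origin_y - y) / pixel_size)
    tile = tiles.get((col // tile_size, row // tile_size))
    if tile is None:
        return None  # point falls outside the tiled area
    # 2. Extract the required pixel from that tile
    row_in_tile, col_in_tile = row % tile_size, col % tile_size
    if row_in_tile >= len(tile) or col_in_tile >= len(tile[0]):
        return None  # edge tile smaller than tile_size
    # 3. Interrogate the pixel for its value
    raw = tile[row_in_tile][col_in_tile]
    # 4. Interpret the value, 5. return it to the caller
    return legend.get(raw, "unknown")


# Made-up data: two 2x2 tiles side by side, top-left corner at x=0, y=2
tiles = {(0, 0): [[0, 1], [1, 2]], (1, 0): [[2, 2], [0, 0]]}
legend = {0: "no risk", 1: "low", 2: "high"}

print(query_raster(tiles, 2, 0.0, 2.0, 1.0, 0.5, 1.5, legend))
print(query_raster(tiles, 2, 0.0, 2.0, 1.0, 3.5, 1.5, legend))
```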
Bringing it all together
*Andrew Farrell, "In pursuit of perils: Geo-spatial risk analysis through HPCC Systems"
https://hpccsystems.com/resources/blog/afarrell/pursuit-perils-geo-spatial-risk-analysis-through-hpcc-systems
Add even more value
Why Geospatial with HPCC?
• Efficient parallel processing
• Ability to import libraries from different languages
• Good coverage of functions and spatial predicates
• Fast ingestion
• Support for different formats
• Sub-second queries
hpccsystems.com

