SpaceCurve - Integrating with Hadoop
© 2015 SpaceCurve, Inc. Confidential.
Spatial Data
Hadoop Ecosystem
SpaceCurve’s Spatial Data Platform
Integrating with Hadoop
•  The largest datasets are geospatial in nature
– Petabytes of data are generated daily
– Most of it goes unused or is simply discarded
•  Proliferation of mobile platforms, sensors and IoT
– More geospatial data will be generated in real time
•  Typical big data solutions can scale to ingest and store vast quantities of data
– But they are not designed for real-time geospatial data
Devices > People
•  In 2008, the number of Internet-connected devices exceeded the number of people on Earth
•  20 to 50 billion: estimated number of connected devices by 2020
•  80% of all data has spatial attributes*
•  90% of all mobile data is location-aware*
*According to Gartner
ü Mobile Platforms
ü Operational Intelligence
ü Sensored World/Digital Business
ü Context Rich Autonomous Systems 
ü Smart Machines/M2M
Source: Gartner Technology Trends 2015
•  THE WORLD IS A STATIC MAP
– Example: Map coordinates of points of interest are cataloged and described on the Internet.
•  CAPTURING THE MOTION OF THINGS
– Example: Packages carry passive sensors, so we can track them on the web and know when they pass checkpoints.
•  REMOTE CONTROL OF THINGS
– Example: UAVs are used as remote sensing platforms for emergency response.
•  THINGS TALK TO EACH OTHER
– Example: Aircraft optimize fuel consumption in real time using data from internal and external sensor networks.
•  THINGS BEHAVE INTELLIGENTLY
– Example: Large fleets of autonomous vehicles adapt to weather conditions and traffic congestion.
•  Hadoop’s open source platform has become synonymous with big data processing
•  Core ecosystem:
–  Distributed file system for data storage (HDFS)
–  Distributed processing of data at scale (MapReduce)
–  Batch-oriented job execution
•  Hadoop-based solutions excel at:
–  Ingesting and warehousing data from multiple sources
–  Creating and updating analytical dashboards on a weekly, daily or hourly basis
–  Providing insights from historical data that apply to future scenarios
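As an illustration of the batch MapReduce model described above, here is a minimal, single-process Python sketch (not Hadoop's API and not SpaceCurve code) that counts point records per 1-degree grid cell. The (longitude, latitude, magnitude) record layout, the cell size and the sample records are assumptions chosen for the example. Note that no result exists until every record has been mapped, shuffled and reduced, which is what makes the model batch-oriented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (longitude, latitude, magnitude)
Record = Tuple[float, float, float]
Cell = Tuple[int, int]


def map_phase(record: Record, cell_deg: float = 1.0) -> Iterator[Tuple[Cell, int]]:
    """Emit (grid cell, 1) for each input record."""
    lon, lat, _mag = record
    yield (int(lon // cell_deg), int(lat // cell_deg)), 1


def shuffle(pairs: Iterable[Tuple[Cell, int]]) -> Dict[Cell, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[Cell, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: Cell, values: List[int]) -> Tuple[Cell, int]:
    """Sum the counts for one grid cell."""
    return key, sum(values)


def run_job(records: Iterable[Record]) -> Dict[Cell, int]:
    """Run map -> shuffle -> reduce over the whole input before returning anything."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


quakes = [(-118.2, 34.0, 3.1), (-118.7, 34.4, 2.5), (-122.4, 37.8, 4.0)]
print(run_job(quakes))  # {(-119, 34): 2, (-123, 37): 1}
```

On a real cluster the map and reduce calls run in parallel on different nodes and the shuffle moves data over the network, but the contract is the same: the job reports once, at the end.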
•  The Hadoop ecosystem can scale to meet geospatial storage requirements
•  However, HDFS is not efficient for organizing and analyzing geospatial data, because:
–  Geospatial data does not have a predictable, uniform distribution
–  Hash functions can turn unpredictable, non-uniform distributions into uniform ones, but they neither preserve nor expose geospatial biases and relationships efficiently
•  Results:
–  Reduced parallelism and efficiency in geospatial analysis
–  Inability to implement the computational geometry needed for geospatial analytics
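The locality problem can be illustrated with a short Python sketch. It is not how HDFS or SpaceCurve partitions data; it simply contrasts hash partitioning with range partitioning on a Z-order (Morton) key, one well-known space-filling-curve technique for keeping nearby points together. The grid resolution, the 16-partition count and the sample points are assumptions chosen for the example.

```python
import zlib
from typing import Tuple

BITS = 16  # grid resolution per axis (assumption): 2**16 steps of longitude and of latitude


def to_cell(lon: float, lat: float, bits: int = BITS) -> Tuple[int, int]:
    """Quantize a longitude/latitude pair onto a 2**bits x 2**bits grid."""
    n = (1 << bits) - 1
    x = int((lon + 180.0) / 360.0 * n)
    y = int((lat + 90.0) / 180.0 * n)
    return x, y


def morton_code(x: int, y: int, bits: int = BITS) -> int:
    """Interleave the bits of x and y into a Z-order (Morton) key."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def hash_partition(lon: float, lat: float, partitions: int) -> int:
    """Hash partitioning: spreads keys evenly but ignores spatial proximity."""
    key = f"{lon:.5f},{lat:.5f}".encode()
    return zlib.crc32(key) % partitions


def zorder_partition(lon: float, lat: float, partitions: int, bits: int = BITS) -> int:
    """Range partitioning on the Morton key: nearby points tend to share a partition."""
    code = morton_code(*to_cell(lon, lat, bits), bits)
    return code * partitions >> (2 * bits)


# 100 points inside a roughly 1 km square near Los Angeles (illustrative data)
cluster = [(-118.25 + i * 0.001, 34.05 + j * 0.001) for i in range(10) for j in range(10)]

hashed = {hash_partition(lon, lat, 16) for lon, lat in cluster}
zordered = {zorder_partition(lon, lat, 16) for lon, lat in cluster}
print(f"hash partitions touched: {len(hashed)}, Z-order partitions touched: {len(zordered)}")
```

With hash partitioning, a query over that small neighborhood is likely to touch most of the partitions; with the Z-order key it touches one. The trade-off is that range partitions inherit the skew of the data, since dense areas fill their key ranges faster, which is the non-uniform distribution problem described above.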
CONTINUOUS HIGH-VELOCITY data ingestion rates are far beyond the limits of traditional spatial analysis platforms.
SPATIAL ANALYTICS required for high-value Internet of Everything applications cannot be supported on popular big data platforms.
REAL-TIME operational analysis requirements preclude the use of batch-oriented platforms.
DATA VOLUME greatly exceeds the capacity of platforms designed for real-time analysis of human-generated sources.
•  SpaceCurve has built the first platform purpose-built from the ground up to address these challenges:
–  Designed to organize multiple streams of very large-scale geospatial data
–  Optimized for real-time analysis
–  Eliminates the limitations on geospatial data inherent in other platforms
•  The SpaceCurve platform makes it possible to:
–  Collect and fuse multiple data sources in real time and stream the results immediately to an application
–  Run continuous queries and analytics with second and sub-second response times
–  Provide insights from real-time data that apply to current, immediate scenarios
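A continuous query differs from a batch job in that results are emitted as each event arrives, rather than after the whole dataset has been processed. The Python sketch below illustrates the idea with a simple geofence-entry query over a stream of device positions. It is not SpaceCurve's query interface; the Event and BBox types, the rectangular fence and the sample positions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in degrees (hypothetical fence shape)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        return self.min_lon <= lon <= self.max_lon and self.min_lat <= lat <= self.max_lat


@dataclass(frozen=True)
class Event:
    """One position report from a device (hypothetical schema)."""
    device_id: str
    lon: float
    lat: float


def geofence_entries(stream: Iterable[Event], fence: BBox) -> Iterator[Event]:
    """Yield an event the moment a device moves from outside the fence to inside it."""
    inside: Dict[str, bool] = {}
    for event in stream:
        now_inside = fence.contains(event.lon, event.lat)
        if now_inside and not inside.get(event.device_id, False):
            yield event  # emitted immediately; no need to wait for the stream to end
        inside[event.device_id] = now_inside


fence = BBox(-118.5, 33.9, -118.0, 34.2)
positions = [
    Event("truck-1", -119.00, 34.00),  # outside
    Event("truck-1", -118.25, 34.05),  # enters -> reported
    Event("truck-1", -118.24, 34.06),  # still inside -> not reported again
    Event("truck-2", -118.30, 34.10),  # first report is inside -> reported
    Event("truck-1", -117.50, 34.00),  # leaves
    Event("truck-1", -118.20, 34.00),  # re-enters -> reported
]
for hit in geofence_entries(positions, fence):
    print(hit.device_id, hit.lon, hit.lat)
```

Because geofence_entries is a generator, it can consume an unbounded stream and keeps only one boolean per device in memory.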
•  CONTINUOUS HIGH-VELOCITY INGESTION
•  COMPLEX SPATIAL DATA TYPES AND OPERATIONS
•  EXTREME DATA VOLUMES
•  REAL-TIME QUERY EXECUTION AND ANALYSIS
•  Integration at the HDFS layer
•  Lets all current systems and tools be used in their normal workflows
•  Leverages existing investments while enabling real-time geospatial use cases
•  Supports combined workflows that run in parallel, or in which Hadoop components call out to SpaceCurve with queries
•  Additional resources:
– GitHub: https://github.com/SpaceCurve/hadoop
•  This resource outlines the mechanics of export/import between SpaceCurve and Hadoop, and includes a step-by-step tutorial using California earthquake data
– SpaceCurveVM: available upon request
•  This resource lets a user install a SpaceCurve system preloaded with sample data and query that data with SpaceCurve SQL
Diagram: the Hadoop Ecosystem (ESRI Tools, HDFS, MapReduce, Hive, Hive SQL, GeoJSON, Mapper, Reducer) and SpaceCurve (HTTP/JSON)
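The integration diagram lists GeoJSON, a Mapper and a Reducer among the Hadoop components that exchange data with SpaceCurve over HTTP/JSON. As a hedged sketch of what a map-side conversion step might look like (this is not the code in the SpaceCurve/hadoop repository), the Python function below turns CSV earthquake records into newline-delimited GeoJSON Point features. The id,longitude,latitude,magnitude column layout is an assumption; GeoJSON coordinates are ordered longitude first, per RFC 7946.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional


def csv_to_feature(line: str) -> Optional[Dict[str, Any]]:
    """Convert one 'id,longitude,latitude,magnitude' line (assumed layout) to a GeoJSON Feature."""
    parts = line.rstrip("\n").split(",")
    if len(parts) != 4:
        return None  # malformed record
    event_id, lon, lat, mag = parts
    try:
        lon_f, lat_f, mag_f = float(lon), float(lat), float(mag)
    except ValueError:
        return None  # header row or non-numeric field
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon_f, lat_f]},  # [lon, lat] order
        "properties": {"id": event_id, "mag": mag_f},
    }


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Streaming-style mapper: one CSV line in, at most one GeoJSON line out."""
    for line in lines:
        feature = csv_to_feature(line)
        if feature is not None:
            yield json.dumps(feature) + "\n"
```

Wrapped in a script that calls sys.stdout.writelines(mapper(sys.stdin)), a function like this could run as a map-only Hadoop Streaming job (the -mapper option with -numReduceTasks 0). How the resulting features are then loaded into SpaceCurve is covered by the GitHub tutorial, not assumed here.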
