1
Intro to Spatial Data Science with R
Alí Santacruz
amsantac.co
JULY 2016
2
About me
• Expert in geomatics with a background in environmental sciences
• R geek
• PhD candidate in Geography
• Interested in Spatial Data Science
• Author of several R packages (available on CRAN)
3
Purpose of this talk
• Discuss what Spatial Data Science is
• Give an introductory overview of how to conduct Spatial Data Science with R
4
What is Spatial Data Science?
Spatial Data Scientist (n.): a person who is better at spatial data analysis than a GIS developer and better at software engineering than a GIS/RS expert
[Figure: diagram relating Statistician, GIS/RS expert, GIS developer and Software engineer to the Spatial Data Scientist, and Data Science and Spatial to Spatial Data Science]
5
Spatial Data Science
All of these are combined in data analysis in order to support better decision making
"The key word in data science is not data; it is science"
Jeff Leek. Data Science Specialization. Coursera.
6
[Figure: Spatial Data Scientist. Modified from gettingsmart.com]
7
Hacking skills
• Programming languages: Python and R (and others)
http://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html
8
Why should we use R?
• Free and open-source
• A large and comprehensive set of packages (> 8600)
• Data access
• Data cleaning
• Analysis
• Visualization and report generation
• Excellent development environments – RStudio IDE
• An active and friendly developer community
• A huge user community: > 2 million
9
Why R for spatial analysis?
• 160+ packages in CRAN Task View: Analysis of Spatial Data
• Classes for spatial (and spatio-temporal) data
• Spatial data import/export
• Exploratory spatial data analysis
• Support for vector and raster operations
• Spatial statistics
• Data visualization through static and dynamic (web) graphics
• Integration with GIS software
• Easy integration with techniques from non-spatial packages
10
R classes for spatial data
• Before 2003:
  • Several packages with different assumptions about how spatial data were structured
• From 2003:
  • ‘sp’ package: extends R classes and methods for spatial data (vector and raster)
• From 2010:
  • ‘raster’ package: handles raster files stored on disk that are too large to be loaded into memory (RAM)
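Handling rasters too large for RAM comes down to processing a file in blocks of rows instead of loading it whole; raster's functions follow this idea internally. The sketch below illustrates it in Python, assuming a hypothetical headerless file of row-major 64-bit floats (this is not how raster or GDAL actually store data), and the function names are hypothetical too.

```python
import array
import os
import tempfile


def write_raster(path, nrows, ncols, value_fn):
    """Write a single-band raster as headerless row-major float64 values (a hypothetical format)."""
    with open(path, "wb") as f:
        for r in range(nrows):
            array.array("d", (value_fn(r, c) for c in range(ncols))).tofile(f)


def block_mean(path, nrows, ncols, block_rows):
    """Mean of all cells, holding at most block_rows rows in memory at a time."""
    total, count = 0.0, 0
    with open(path, "rb") as f:
        for start in range(0, nrows, block_rows):
            n = min(block_rows, nrows - start)
            block = array.array("d")
            block.fromfile(f, n * ncols)  # raises EOFError if the file is shorter than expected
            total += sum(block)
            count += len(block)
    return total / count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "layer.bin")
        write_raster(path, nrows=1000, ncols=200, value_fn=lambda r, c: float(r))
        print(block_mean(path, 1000, 200, block_rows=64))  # 499.5
```

Memory use depends on block_rows rather than on the file size, which is the trade-off that lets a package work through rasters that do not fit in RAM.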
11
R classes for spatial data
• sp package: SpatialPointsDataFrame, SpatialLinesDataFrame, SpatialPolygonsDataFrame, SpatialPixelsDataFrame, SpatialGridDataFrame
• raster package (recommended): RasterLayer, RasterStack, RasterBrick
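The two families differ in how they represent space: sp's *DataFrame classes pair geometries with an attribute table, while raster's classes describe a regular grid through its extent, resolution and cell values. The hypothetical Python classes below sketch that distinction; they are illustrations, not ports of sp or raster, and the cell lookup only mimics the idea behind raster's cellFromXY() (which returns 1-based cell numbers rather than the 0-based row/column used here).

```python
from dataclasses import dataclass, field


@dataclass
class PointsDataFrame:
    """Vector model: one (x, y) pair per feature plus an attribute table, cf. sp's SpatialPointsDataFrame."""
    coords: list  # [(x, y), ...]
    data: dict    # column name -> list with one value per point

    def __post_init__(self):
        for name, col in self.data.items():
            if len(col) != len(self.coords):
                raise ValueError(f"column {name!r} has {len(col)} values for {len(self.coords)} points")


@dataclass
class RasterGrid:
    """Raster model: top-left corner, square cell size and row-major cell values, cf. raster's RasterLayer."""
    xmin: float
    ymax: float
    res: float
    nrows: int
    ncols: int
    values: list = field(default_factory=list)

    def cell_from_xy(self, x, y):
        """0-based (row, col) of the cell containing (x, y), or None outside the extent."""
        col = int((x - self.xmin) // self.res)
        row = int((self.ymax - y) // self.res)
        if 0 <= row < self.nrows and 0 <= col < self.ncols:
            return row, col
        return None

    def value_at(self, x, y):
        cell = self.cell_from_xy(x, y)
        return None if cell is None else self.values[cell[0] * self.ncols + cell[1]]


grid = RasterGrid(xmin=0, ymax=30, res=10, nrows=3, ncols=3, values=list(range(9)))
sites = PointsDataFrame(coords=[(15, 25), (5, 5)], data={"id": ["a", "b"]})
print([grid.value_at(x, y) for x, y in sites.coords])  # [1, 6]
```

Storing only the corner and cell size is what makes the raster model compact: cell positions are computed, not stored, whereas every vector feature carries its own coordinates.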
12
The Data Science Process
[Figure: diagram of the data science process, including Reproducibility. Modified from science2knowledge]
13
[Figure: the data science process: ASK the right question → GET the data → PREPARE the data → EXPLORE the data → MODEL the data → COMMUNICATE the results, with Domain expertise]
• Is this A or B or C? :: classification
• Is this weird? :: anomaly detection
• How much/how many? :: regression
• How is it organized? :: clustering
• How will it change? :: prediction
"The key word in data science is not data; it is science"
Jeff Leek. Data Science Specialization. Coursera.
14
GET the data
• Import vector layers: rgdal, raster packages
• Import raster layers: raster package
• Get geocoded data from APIs: twitteR package, see example
• Download satellite images/geographic data: raster, modis, MODISTools packages
For this slide and the following ones, see code and examples in this webpage
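Importing a vector layer means turning a file's geometries and attributes into an object such as a SpatialPointsDataFrame (in R, for example via rgdal::readOGR() or raster::shapefile()). As a language-neutral sketch, the Python function below reads GeoJSON Point features with only the standard library; it assumes points-only input, and the function name and sample features are hypothetical placeholders.

```python
import json


def read_geojson_points(text):
    """Parse a GeoJSON FeatureCollection of Point features into coordinates plus a column-wise attribute table."""
    collection = json.loads(text)
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    coords, rows = [], []
    for feature in collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "Point":
            raise ValueError(f"unsupported geometry type: {geometry['type']}")
        x, y = geometry["coordinates"][:2]  # GeoJSON order: longitude, latitude
        coords.append((x, y))
        rows.append(feature.get("properties") or {})
    columns = sorted({key for row in rows for key in row})
    data = {col: [row.get(col) for row in rows] for col in columns}  # missing attributes become None
    return coords, data


SAMPLE = """
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature", "geometry": {"type": "Point", "coordinates": [-74.0, 4.5]},
    "properties": {"id": "a", "elev": 100}},
   {"type": "Feature", "geometry": {"type": "Point", "coordinates": [-75.5, 6.0]},
    "properties": {"id": "b"}}
 ]}
"""

coords, data = read_geojson_points(SAMPLE)
print(coords)  # [(-74.0, 4.5), (-75.5, 6.0)]
print(data)    # {'elev': [100, None], 'id': ['a', 'b']}
```

Filling missing attributes with None keeps every column the same length as the coordinate list, the invariant an attribute table needs.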
15
PREPARE the data
• Data cleaning, subsetting, etc.
• Manipulate data with "verbs" from dplyr and other Hadley-verse packages
• Spatial subsetting (sp, raster packages)
• Vector operations:
  • Operations on the attribute table (sp package)
  • Overlay: union, intersection, clip; extract values from raster data using points/polygons (raster, rgeos packages)
  • Dissolve (sp, rgeos packages), buffer (rgeos package)
  • Rasterize vector data (raster package)
• Raster operations:
  • Map algebra, spatial filters, resampling, … (raster package)
  • Vectorize raster data (rgdal, raster packages)
For slides 14-18, see code and examples in this webpage
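Two of the preparation steps above, local map algebra and extracting raster values at point locations, can be sketched in plain Python on small in-memory grids. The function names are hypothetical and the band values are toy numbers; in R these jobs would fall to raster's cell-wise arithmetic or overlay() and to extract().

```python
def map_algebra(fn, *layers):
    """Local operation: apply fn cell by cell to grids (lists of rows) that share the same dimensions."""
    nrows, ncols = len(layers[0]), len(layers[0][0])
    for layer in layers:
        if len(layer) != nrows or any(len(row) != ncols for row in layer):
            raise ValueError("all layers must have the same number of rows and columns")
    return [[fn(*(layer[r][c] for layer in layers)) for c in range(ncols)]
            for r in range(nrows)]


def ndvi(red, nir):
    """Normalised difference vegetation index; None where both bands are zero."""
    total = nir + red
    return None if total == 0 else (nir - red) / total


def extract(grid, xmin, ymax, res, points):
    """Value of the cell under each (x, y) point; None for points outside the grid."""
    values = []
    for x, y in points:
        col = int((x - xmin) // res)
        row = int((ymax - y) // res)
        inside = 0 <= row < len(grid) and 0 <= col < len(grid[0])
        values.append(grid[row][col] if inside else None)
    return values


red = [[10, 20], [30, 10]]
nir = [[50, 20], [30, 70]]
veg = map_algebra(ndvi, red, nir)
print(veg)  # [[0.6666666666666666, 0.0], [0.0, 0.75]]
print(extract(veg, xmin=0, ymax=20, res=10, points=[(15, 5), (25, 5)]))  # [0.75, None]
```

Checking that all layers share one grid before combining them mirrors the alignment requirement of map algebra: cell-wise operations only make sense on rasters with the same extent and resolution.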
16
EXPLORE the data
• Descriptive statistics: measures of central tendency and spread
• Exploratory graphics (2D, 3D): scatter plot, box plot, histogram, …
• Spatial autocorrelation:
  • Global spatial autocorrelation statistics: Moran’s I, Geary’s C, Getis and Ord’s G(d) (spdep package)
  • Local spatial autocorrelation statistics: Moran’s Ii, Getis and Ord’s Gi and Gi*(d) (spdep package)
For slides 14-18, see code and examples in this webpage
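Global Moran's I measures whether similar values sit next to each other: I = (n / S0) · Σi Σj wij zi zj / Σi zi², where zi = xi − x̄ and S0 is the sum of all weights. The Python sketch below (hypothetical function names) computes it on a grid with binary rook-contiguity weights. In R the route is spdep's moran.test(); note that nb2listw() defaults to row-standardised weights (style "W"), so spdep's value can differ from this binary version, and no significance test is included here.

```python
def rook_weights(nrows, ncols):
    """Binary rook-contiguity neighbours (up, down, left, right) for grid cells indexed row-major."""
    neighbours = {}
    for r in range(nrows):
        for c in range(ncols):
            neighbours[r * ncols + c] = [
                rr * ncols + cc
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < nrows and 0 <= cc < ncols
            ]
    return neighbours


def morans_i(x, neighbours):
    """Global Moran's I with w_ij = 1 when j is a neighbour of i, else 0."""
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]
    den = sum(zi * zi for zi in z)
    if den == 0:
        raise ValueError("Moran's I is undefined for a constant variable")
    s0 = sum(len(js) for js in neighbours.values())  # sum of all weights
    num = sum(z[i] * z[j] for i, js in neighbours.items() for j in js)
    return (n / s0) * (num / den)


nb = rook_weights(4, 4)
checker = [(r + c) % 2 for r in range(4) for c in range(4)]   # perfectly dispersed
halves = [1 if c < 2 else 0 for r in range(4) for c in range(4)]  # clustered
print(round(morans_i(checker, nb), 3))  # -1.0
print(round(morans_i(halves, nb), 3))   # 0.667
```

A checkerboard, where every neighbour differs, gives the extreme negative value, while splitting the grid into two uniform halves gives positive autocorrelation.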
17
MODEL the data
• Regression:
  • Spatial autoregressive models (spdep package)
  • Geographically weighted regression (spgwr package)
• Classification (machine learning):
  • Supervised: random forests, SVM, boosting, … (caret package)
  • Unsupervised: k-means clustering (stats package)
• Spatial statistics:
  • Geostatistics (gstat, geoR, geospt packages and others)
  • Spatial point patterns (spatstat package)
For slides 14-18, see code and examples in this webpage
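The slide points to stats::kmeans() for unsupervised classification. As a hedged illustration of the idea, here is Lloyd's algorithm in Python applied to point coordinates. It takes explicit starting centres so the result is reproducible; R's kmeans() instead draws random rows as starting centres and uses the Hartigan-Wong algorithm by default, so its results need not match this sketch.

```python
import math


def kmeans(points, init, max_iter=100):
    """Lloyd's k-means: assign each point to its nearest centre, move each centre to its cluster mean, repeat."""
    centres = [tuple(c) for c in init]
    k = len(centres)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centres


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centres = kmeans(points, init=[(0, 0), (1, 0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Even from poor starting centres (both in the lower-left group), the alternating assign-and-update steps move one centre to the upper-right group after two passes.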
18
COMMUNICATE the results
• Static or interactive maps: tmap, leaflet, mapview packages
• Interactive graphics, web apps and dashboards:
  • plotly (example), rCharts, googleVis (example) packages
  • shiny, see example
  • flexdashboard, see example
For slides 14-18, see code and examples in this webpage
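Web maps need results in a format the browser can draw. The R leaflet package wraps the Leaflet JavaScript library, which can render GeoJSON layers; in R you would normally pass spatial objects to leaflet directly. As a language-neutral illustration, this Python sketch (hypothetical function name, placeholder coordinates) serialises labelled points as a GeoJSON FeatureCollection.

```python
import json


def to_geojson(coords, data):
    """Serialise (x, y) points and attribute columns as a GeoJSON FeatureCollection string."""
    for name, col in data.items():
        if len(col) != len(coords):
            raise ValueError(f"column {name!r} does not match the number of points")
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [x, y]},  # GeoJSON order: longitude, latitude
            "properties": {name: col[i] for name, col in data.items()},
        }
        for i, (x, y) in enumerate(coords)
    ]
    return json.dumps({"type": "FeatureCollection", "features": features}, indent=2)


geojson = to_geojson([(-74.0, 4.5), (-75.5, 6.0)], {"cluster": [0, 1]})
print(geojson)
```

Attributes travel as feature properties, so a web map can style or label each point, for example by its cluster.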
19
Don’t forget: Reproducibility!
• R code and output for examples shown in this webinar (slides 17-21) can be reproduced with this .Rmd document using R Markdown
• See this example about reproducible spatial analysis using interactive notebooks
• Learn more about reproducible geoscientific research
20
Integrating R with GIS software
• QGIS: see example in this post
• ArcGIS: arcgisbinding package, see example in this post
• GRASS GIS: version 6, spgrass6 package; version 7, rgrass7 package
• gvSIG: more info in this post
• SAGA: RSAGA package
• GME (Geospatial Modelling Environment): more info in this webpage
21
References / Online resources
• Bivand, R., Pebesma, E., Gómez-Rubio, V. 2013. Applied Spatial Data Analysis with R. 2nd ed. New York: Springer.
• R-SIG-Geo mailing list
• CRAN Task View: Analysis of Spatial Data
• Facebook groups: GIS with R, R project en Español
• Google+ groups: Statistics and R, R Programming for Data Analysis
• My blog: amsantac.co/blog.html
22
If you have any questions, feel free to contact me:
amsantac.co/contact.html
Thanks!
