Understanding the planet using satellites
and deep learning
bcn.AI June 7th 2019
Albert Pujol Torras @AlbertPT71
Lead Machine Learning Platform
Agenda
● Satellogic
● Satellogic Data Science and Solutions
● What we can do with satellites: examples of problems we face
● What type of data do we work with?
● Processing infrastructure: hardware and software
● Some lines of research we are interested in
● Lessons learned
● Questions
● BCN: Data Science & Solutions
● TLV: Delivery platform
● BSAS: Headquarters & Design
● MVD: Manufacturing Plant
● PEK: Comprehensive services
What kinds of problems do we face?
Object detection/counting
Object count/density estimation (regression) from lower-resolution images
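One common way to frame counting as regression, especially when objects are too small to detect individually, is to regress a density map whose sum equals the object count. The sketch below is not necessarily what Satellogic does; it assumes point annotations (row, col) of object centres and uses illustrative function names (`density_map`, `coarsen`). It builds the target map and sum-pools it to a coarser grid, which keeps the count unchanged.

```python
import numpy as np


def density_map(points, shape, sigma=2.0):
    """Build a density map from point annotations (row, col) of object centres.

    Each object contributes a Gaussian normalised to sum to 1 inside the image,
    so the whole map sums exactly to the number of objects."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dmap = np.zeros(shape, dtype=np.float64)
    for py, px in points:
        g = np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2.0 * sigma ** 2))
        dmap += g / g.sum()
    return dmap


def coarsen(dmap, factor):
    """Sum-pool the map by an integer factor, e.g. to match a lower image resolution.

    Summing (rather than averaging) keeps the total count unchanged."""
    h, w = dmap.shape
    if h % factor or w % factor:
        raise ValueError("map shape must be divisible by factor")
    return dmap.reshape(h // factor, factor, w // factor, factor).sum(axis=(1, 3))


pts = [(5, 5), (20, 30), (40, 12)]
fine = density_map(pts, (48, 48))
coarse = coarsen(fine, 8)  # 6 x 6 grid of per-cell object counts
print(round(fine.sum(), 6), coarse.shape, round(coarse.sum(), 6))
```

A model trained on low-resolution pixels can then be asked to predict `coarse`-style cell counts directly, and the count for any region is just the sum over its cells.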
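Estimation of other image modalities
[Figure panels: HR RGB, LR TIR, LR SWIR1, LR SWIR2, HR thermal (HR = high resolution, LR = low resolution, TIR = thermal infrared, SWIR = short-wave infrared)]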
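To feed a model that estimates one modality from others, inputs at different resolutions first have to live on a common grid. A minimal sketch, assuming the LR bands are exactly `factor` times coarser than the HR RGB image and already co-registered; `stack_modalities` and the factor of 4 are hypothetical choices, and nearest-neighbour upsampling is only the simplest option (bilinear or learned upsampling are common alternatives).

```python
import numpy as np


def upsample_nearest(band, factor):
    """Nearest-neighbour upsampling: every LR pixel becomes a factor x factor block."""
    return np.repeat(np.repeat(band, factor, axis=0), factor, axis=1)


def stack_modalities(hr_rgb, lr_bands, factor):
    """Stack an HR RGB image (H, W, 3) with LR bands (H/factor, W/factor) on the HR grid.

    Returns an (H, W, 3 + len(lr_bands)) float32 array, e.g. the input of a model
    trained to predict an HR thermal band."""
    h, w, _ = hr_rgb.shape
    channels = [hr_rgb.astype(np.float32)]
    for band in lr_bands:
        up = upsample_nearest(band, factor)
        if up.shape != (h, w):
            raise ValueError(f"upsampled band has shape {up.shape}, expected {(h, w)}")
        channels.append(up.astype(np.float32)[..., None])
    return np.concatenate(channels, axis=-1)


rgb = np.random.default_rng(0).random((64, 64, 3))
tir, swir1, swir2 = (np.full((16, 16), v) for v in (290.0, 0.2, 0.1))
x = stack_modalities(rgb, [tir, swir1, swir2], factor=4)
print(x.shape)
```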
Regression: time-series image prediction
- Estimating the yield at the end of the season.
- Monitoring changes in the estimate to know when and where to act.
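The monitoring step can be as simple as comparing each new estimate with the previous one. A minimal sketch, assuming yield estimates are stored per field in chronological order; `flag_estimate_changes`, the 10% threshold and the example numbers are illustrative, not taken from the talk.

```python
def flag_estimate_changes(history, rel_threshold=0.10):
    """history maps a field id to its successive yield estimates, oldest first.

    Returns (field_id, relative_change) for every field whose latest estimate moved
    by more than rel_threshold relative to the previous one, largest change first."""
    flagged = []
    for field_id, estimates in history.items():
        if len(estimates) < 2 or estimates[-2] == 0:
            continue  # nothing to compare against
        change = (estimates[-1] - estimates[-2]) / estimates[-2]
        if abs(change) > rel_threshold:
            flagged.append((field_id, change))
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)


history = {
    "field-A": [5.0, 5.1, 4.2],  # illustrative t/ha values, dropping late in the season
    "field-B": [6.0, 6.1, 6.0],
    "field-C": [3.0, 3.6],
}
for field_id, change in flag_estimate_changes(history):
    print(f"{field_id}: {change:+.1%}")
```

Sorting by the size of the change gives a simple "where to act first" ordering.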
Image semantic segmentation: Land use detection
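Land use segmentation is usually scored per class with intersection over union (IoU), which matters here because rare classes would be hidden by overall pixel accuracy. A minimal sketch using a confusion matrix; `per_class_iou` is an illustrative name and the 2 x 2 maps are toy data.

```python
import numpy as np


def per_class_iou(pred, target, num_classes):
    """Intersection over union for each land use class, from two integer label maps.

    Classes absent from both maps get NaN instead of a misleading 0 or 1."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows = true class, columns = predicted class.
    cm = np.bincount(num_classes * target + pred, minlength=num_classes ** 2)
    cm = cm.reshape(num_classes, num_classes)
    intersection = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=0) + cm.sum(axis=1) - np.diag(cm)
    with np.errstate(divide="ignore", invalid="ignore"):
        return intersection / union


target = np.array([[0, 1], [1, 1]])
pred = np.array([[0, 0], [1, 1]])
iou = per_class_iou(pred, target, num_classes=3)
print(iou, np.nanmean(iou))
```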
“Anomaly change” and “semantic change” detection
[Figure panels: Time T0, Time T1, Diff(T1, T0)]
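The two kinds of change can be illustrated with simple baselines, which are assumptions rather than the method behind the slide: "anomaly change" as pixels whose T1 - T0 difference is an outlier for the scene, and "semantic change" as pixels whose land use class differs between two classified maps. The function names and the threshold `k=3` are hypothetical.

```python
import numpy as np


def anomaly_change(img_t0, img_t1, k=3.0):
    """Boolean mask of pixels whose T1 - T0 difference is an outlier (|z| > k).

    Subtracting the scene-wide mean difference absorbs a uniform brightness
    shift between dates, so only locally unusual changes are flagged."""
    diff = img_t1.astype(np.float64) - img_t0.astype(np.float64)
    z = (diff - diff.mean()) / (diff.std() + 1e-12)
    return np.abs(z) > k


def semantic_change(labels_t0, labels_t1):
    """Mask of pixels whose class changed, plus counts of each (from, to) transition."""
    changed = labels_t0 != labels_t1
    if not changed.any():
        return changed, {}
    pairs = np.stack([labels_t0[changed], labels_t1[changed]], axis=1)
    uniq, counts = np.unique(pairs, axis=0, return_counts=True)
    return changed, {(int(a), int(b)): int(c) for (a, b), c in zip(uniq, counts)}


t0 = np.zeros((10, 10))
t1 = t0 + 5.0       # uniform brightness shift (e.g. illumination)
t1[2, 3] = 100.0    # one genuinely anomalous pixel
mask = anomaly_change(t0, t1)

l0 = np.zeros((4, 4), dtype=int)
l1 = l0.copy()
l1[0, 0], l1[1, 1], l1[3, 3] = 2, 2, 1
changed, transitions = semantic_change(l0, l1)
print(int(mask.sum()), transitions)
```

The global z-score is deliberately crude; in practice, cloud, shadow and registration differences (discussed below) dominate raw differences, which is why semantic comparisons are often more robust.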
Data - Data Sources
Primary data sources: Satellogic data, 3rd-party satellite data, world climate maps, geologic data, elevation models, georeferenced man-made structures, political boundaries, census data maps.
Derived layers: temporal evolution, land use maps, advanced indices, distance to water, terrain orientation, super-resolution images, ...
These sources can be global or local, dynamic or static, and high or low resolution.
nKappa: a data science platform focused on geographic data and satellite imagery.
Main goal: to scale solution development by automating and accelerating data science work.
nKappa enables solution development using aligned sets of image tiles (Kappas).
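As an example of an "advanced index" derived layer, the best-known spectral index is NDVI. The slide does not say which indices nKappa computes; this sketch simply assumes a tile provides co-registered red and near-infrared (NIR) reflectance bands, and the 2 x 2 values are made up.

```python
import numpy as np


def ndvi(nir, red, eps=1e-6):
    """Normalised Difference Vegetation Index, (NIR - Red) / (NIR + Red), per pixel.

    eps avoids division by zero on pixels where both bands are 0 (e.g. no-data)."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)


# Hypothetical 2 x 2 reflectance tile: vegetation, bare soil, water, no-data.
nir = np.array([[0.50, 0.30], [0.02, 0.0]])
red = np.array([[0.08, 0.25], [0.05, 0.0]])
print(np.round(ndvi(nir, red), 2))
```

Because the layer is computed pixel by pixel from aligned bands, it fits naturally into a stack of aligned tiles like the Kappas described above.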
Sizes:
- Typical project: 20 GB/day.
- Daily world remap: processing the continental surface every day, equivalent to 5,300 hours of video per day.
Sources of image variation:
- Clouds: about 70% of the world is cloud-covered.
- Perspective changes (off-nadir satellite images, drone images).
- Variations in shadow orientation, intensity, and length depending on the time of day, clouds, and season.
- Chromatic changes due to aerosols and time of day.
- Variations between sensors (different satellites, drone images, ...).
- Variations/errors in image orthorectification and geolocation.
- Seasonal changes in vegetation growth and color, ...
[Figure: examples of clouds, perspective, shadows, and chromatic and vegetation changes]
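With most of the planet cloud-covered at any moment, a standard mitigation is to combine several dates and keep only cloud-free observations. A minimal sketch, assuming a per-pixel boolean cloud mask is already available for each date (how it is produced is out of scope); `clear_median_composite` is an illustrative name, and a median composite is one common choice among several (most-recent-clear, best-pixel, etc.).

```python
import warnings

import numpy as np


def cloud_fraction(cloud_mask):
    """Fraction of cloudy pixels in a boolean mask (True = cloud)."""
    return float(np.mean(cloud_mask))


def clear_median_composite(images, cloud_masks):
    """Per-pixel median over time of the cloud-free observations.

    images: (T, H, W) array of one band; cloud_masks: (T, H, W) bool, True = cloud.
    Pixels that were cloudy on every date come out as NaN."""
    stack = np.where(cloud_masks, np.nan, images.astype(np.float64))
    with warnings.catch_warnings():
        # np.nanmedian warns about all-NaN pixels; NaN is the intended result there.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmedian(stack, axis=0)


images = np.array([[[1.0, 7.0]], [[100.0, 8.0]], [[3.0, 9.0]]])  # T=3, H=1, W=2
clouds = np.array([[[False, True]], [[True, True]], [[False, True]]])
print(cloud_fraction(clouds[1]), clear_median_composite(images, clouds))
```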
Extremely unbalanced datasets
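One standard counter-measure for extreme imbalance is to weight the loss by inverse class frequency, so rare classes are not drowned out by background. A minimal sketch of the "balanced" weighting heuristic; the label counts are invented for illustration, and resampling or focal-style losses are alternatives not shown here.

```python
import numpy as np


def inverse_frequency_weights(labels, num_classes):
    """Per-class weights n_samples / (n_present_classes * class_count).

    With these weights every class present contributes the same total weight to
    a weighted loss; classes absent from the data get weight 0."""
    counts = np.bincount(np.asarray(labels).ravel(), minlength=num_classes).astype(np.float64)
    present = counts > 0
    weights = np.zeros(num_classes)
    weights[present] = counts.sum() / (present.sum() * counts[present])
    return weights


labels = np.array([0] * 990 + [1] * 9 + [2] * 1)  # e.g. background, crop, rare target
w = inverse_frequency_weights(labels, num_classes=4)
print(w)
```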
Data - Ground truth
Ground truth (GT) is rare and expensive, yet indispensable both to train ML and computer vision approaches and to assess their quality.
Sources of ground truth:
- Land ground truth provided by the client.
- GT generated from the highest-resolution imagery.
- Human annotation:
  - Our team always annotates ... to understand the problem.
  - Internal and external annotation (Mechanical Turk, Supahands, ...).
  - Sample what to annotate so as to preserve variability and input-domain coverage.
  - Measure annotator biases and variances (discard annotators or images, rewrite annotation instructions, ...).
- Other GT sources: survey data from developed countries, annotated from visual imagery or from land ground truth (CORINE project, LUCAS, CREAF, SIOSE in Spain, USGS land cover datasets in the USA, ...).
These sources are useful, but we have to deal with:
- Outdated data (most of it is correct, but small parts are erroneous).
- Differing resolution (uncertain labels at class borders).
- Domain/covariate shift: how to transfer the GT to places that differ in land management culture, climate, or relief.
[Figure: GT sources on a scale from expensive to cheaper]
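Measuring annotator bias and variance needs an agreement score. A common choice, assumed here rather than taken from the talk, is Cohen's kappa against a reference labeling (for example the team's own annotations), which corrects raw agreement for chance. Annotator names and labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators who labeled the same items.

    kappa = (p_observed - p_chance) / (1 - p_chance): 1 is perfect agreement,
    0 is what two annotators labeling at random with their own class
    frequencies would reach on average."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both used one identical class for everything; treated as full agreement
    return (p_observed - p_chance) / (1.0 - p_chance)


reference = ["crop", "crop", "urban", "water", "crop", "urban"]
annotators = {
    "ann_1": ["crop", "crop", "urban", "water", "crop", "urban"],
    "ann_2": ["crop", "urban", "urban", "water", "urban", "urban"],
}
for name, labels in annotators.items():
    print(name, round(cohen_kappa(reference, labels), 3))
```

Annotators with consistently low kappa are candidates for retraining or removal; if many annotators disagree on the same images, the instructions (or the images) are the more likely problem.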
GT Data: Covariate shift & domain adaptation
[Figure: existing “good quality” ground truth vs. target areas without ground truth: urban areas in Europe vs. urban areas in Lagos; rice fields in Europe vs. rice fields in China]
Infrastructure - Hardware
● Huge amounts of data --> cloud infrastructure.
● nKappa platform for distributed processing (currently on Microsoft Azure) and in-house GPU servers (equipped with GTX 1080 Ti cards).
● nKappa uses the cloud for experiment management (tracking, sharing across the team, and auditing datasets, algorithms, and models), for deploying pipelines and models into production, and for handling all GIS/ETL-related work.
● GPU servers are mostly used during EDA and the development of data science algorithms and models.
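Auditing datasets across experiments requires some way to tell whether two runs used exactly the same data. A minimal sketch of one generic approach, a content fingerprint over a dataset directory; this is not nKappa's actual mechanism (the slide does not describe it), and `dataset_fingerprint` is a hypothetical helper.

```python
import hashlib
import tempfile
from pathlib import Path


def dataset_fingerprint(root, pattern="*"):
    """SHA-256 over the relative path, size and bytes of every matching file under root.

    Files are visited in sorted order, so the digest does not depend on filesystem
    listing order; any added, removed, renamed or modified file changes it."""
    root = Path(root)
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob(pattern) if p.is_file()):
        rel = path.relative_to(root).as_posix()
        digest.update(f"{rel}\0{path.stat().st_size}\0".encode())
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "tile_001.tif").write_bytes(b"\x00\x01")
        before = dataset_fingerprint(tmp)
        (Path(tmp) / "tile_001.tif").write_bytes(b"\x00\x02")
        print(before != dataset_fingerprint(tmp))  # True: contents changed
```

Storing the digest next to each trained model makes "which data was this trained on?" answerable later.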
Some lines of research: Domain adaptation
“Deep Visual Domain Adaptation: A Survey”, Mei Wang, Weihong Deng
“Domain Adaptation for Visual Applications: A Comprehensive Survey”, Gabriela Csurka
- Sampling and sample weighting based on a classifier that differentiates between domains.
- Adversarial networks to make embeddings invariant to domain change.
[Figure: GT in Barcelona, target in Lhasa]
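The first idea, sample weighting from a domain classifier, can be sketched as follows: train a classifier to tell target samples from source (GT) samples, and weight each source sample by the classifier's odds of it being target, which estimates the density ratio p_target(x) / p_source(x). This assumes feature vectors per sample are available; the hand-written logistic regression, clipping at 10 and the synthetic Gaussian features are illustrative choices. The adversarial variant (e.g. gradient-reversal networks) is not shown.

```python
import numpy as np


def _sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_logistic(X, y, lr=0.5, steps=2000, l2=1e-3):
    """Plain gradient-descent logistic regression; returns weights with the bias last."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        grad = Xb.T @ (_sigmoid(Xb @ w) - y) / len(y) + l2 * w
        w -= lr * grad
    return w


def domain_importance_weights(X_source, X_target, max_weight=10.0):
    """Weight source samples by how target-like a domain classifier finds them.

    w(x) = p(target|x) / p(source|x) * n_source / n_target estimates the density
    ratio p_target(x) / p_source(x); weights are clipped and rescaled to mean 1."""
    X = np.vstack([X_source, X_target])
    y = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    w = fit_logistic((X - mu) / sd, y)
    Xs = np.hstack([(X_source - mu) / sd, np.ones((len(X_source), 1))])
    p = np.clip(_sigmoid(Xs @ w), 1e-6, 1 - 1e-6)
    ratio = p / (1 - p) * len(X_source) / len(X_target)
    ratio = np.clip(ratio, 0.0, max_weight)
    return ratio / ratio.mean()


rng = np.random.default_rng(0)
X_src = rng.normal(0.0, 1.0, size=(500, 2))  # e.g. features of GT-rich tiles
X_tgt = rng.normal(1.5, 1.0, size=(300, 2))  # e.g. features of target-area tiles
weights = domain_importance_weights(X_src, X_tgt)
```

The weights can be passed as per-sample weights to the training loss, or used as sampling probabilities when building the training set; clipping keeps a few very target-like samples from dominating.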
Some lines of research: Usage of generative models
“Image-to-Image Translation with Conditional Adversarial Networks”, Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
“Satellite Image Spoofing: Creating Remote Sensing Dataset with Generative Adversarial Networks”, Chunxue Xu, Bo Zhao
“GeoGAN: A Conditional GAN with Reconstruction and Style Loss to Generate Standard Layer of Maps from Satellite Images”
“Invisible Cities”, https://opendot.github.io/ml4a-invisible-cities/implementation/
“Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”, Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros
Evaluation of the effect on semantic segmentation of using samples from conditional generative adversarial networks for:
- Data augmentation: generating satellite images (textures) from random land use labels.
- Super-resolution and image enhancement.
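Generating images "from random land use labels" first requires plausible random label maps to condition the generator on. The talk does not say how these are produced; this sketch assumes a simple Voronoi partition with a random class per cell, one-hot encoded as the conditioning input of a pix2pix-style label-to-image model. Function names and sizes are hypothetical, and the generator itself is not shown.

```python
import numpy as np


def random_label_map(h, w, num_classes, num_regions, rng=None):
    """Random land use label map: a Voronoi partition of the image into
    num_regions cells (each pixel goes to its nearest random seed point),
    with a random class drawn for every cell."""
    rng = np.random.default_rng(rng)
    seeds = rng.uniform(low=[0.0, 0.0], high=[h, w], size=(num_regions, 2))
    classes = rng.integers(0, num_classes, size=num_regions)
    ys, xs = np.mgrid[0:h, 0:w]
    # Squared distance from every pixel to every seed: shape (h, w, num_regions).
    d2 = (ys[..., None] - seeds[:, 0]) ** 2 + (xs[..., None] - seeds[:, 1]) ** 2
    return classes[np.argmin(d2, axis=-1)]


def one_hot(label_map, num_classes):
    """(h, w) integer labels -> (h, w, num_classes) float32 conditioning tensor."""
    return np.eye(num_classes, dtype=np.float32)[label_map]


labels = random_label_map(256, 256, num_classes=6, num_regions=40, rng=0)
condition = one_hot(labels, num_classes=6)  # input to a hypothetical label-to-image generator
print(labels.shape, condition.shape)
```

Real parcels are rarely Voronoi-shaped, so sampling shapes from existing land use maps would likely give more realistic layouts; the point here is only that the label map and its paired synthetic image give a free (image, label) training pair.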
Some lines of research: Uncertainty measurement and GT cleaning
“Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning”, Yarin Gal, Zoubin Ghahramani
“Co-teaching: Robust Training of Deep Neural Networks with Extremely Noisy Labels”, Bo Han, Quanming Yao, Xingrui Yu, Gang Niu, Miao Xu, Weihua Hu, Ivor W. Tsang, Masashi Sugiyama
Measuring errors in GT labeling:
- Error and entropy of the predicted class distribution when using an ensemble of classifiers.
- Entropy of DNN outputs when dropout is applied to the fully connected layers at inference time.
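Both measurements reduce to the same computation: average the class probabilities from several models (an ensemble) or several stochastic passes (dropout kept on at inference), then look at the entropy of the average and at where it disagrees with the GT. A minimal sketch; the rule "confident disagreement means a likely labeling error" and the 0.5 nat threshold are illustrative assumptions.

```python
import numpy as np


def predictive_entropy(prob_samples):
    """prob_samples: (S, N, C) class probabilities from S ensemble members, or from
    S stochastic forward passes with dropout left on at inference time.

    Returns the mean probabilities (N, C) and the entropy of that mean (N,), in nats."""
    mean = prob_samples.mean(axis=0)
    entropy = -np.sum(mean * np.log(np.clip(mean, 1e-12, 1.0)), axis=1)
    return mean, entropy


def suspicious_labels(prob_samples, gt_labels, max_entropy=0.5):
    """Indices where the models confidently (low entropy) disagree with the GT label:
    candidates for labeling errors. High-entropy samples are ambiguous rather than
    clearly mislabeled, so they are not flagged here."""
    mean, entropy = predictive_entropy(prob_samples)
    disagree = mean.argmax(axis=1) != np.asarray(gt_labels)
    return np.flatnonzero(disagree & (entropy < max_entropy)), entropy


# 3 models/passes x 3 samples x 2 classes
probs = np.array([
    [[0.95, 0.05], [0.05, 0.95], [0.5, 0.5]],
    [[0.90, 0.10], [0.10, 0.90], [0.6, 0.4]],
    [[0.97, 0.03], [0.02, 0.98], [0.4, 0.6]],
])
flagged, ent = suspicious_labels(probs, gt_labels=[0, 0, 1])
print(flagged, np.round(ent, 3))
```

High-entropy samples, like the third one, often sit on class borders, which matches the "uncertain labels at class borders" issue mentioned for ground truth.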
Some lines of research: Distance metric -Invariant embeddings
How can we use the huge amount of unlabeled data to train models?
- Learning invariant deep NN embeddings and transferable models for encoding land use content.
“Tile2Vec: Unsupervised representation learning for spatially distributed data”, Neal Jean, Sherrie Wang, Anshul Samar, George Azzari, David Lobell, Stefano Ermon
[Figure: anchor tiles, positive tiles, negative tiles]
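The anchor/positive/negative scheme relies on spatial proximity as free supervision: nearby tiles should have similar embeddings, distant ones should not. A minimal sketch of triplet sampling plus a standard triplet margin loss of the kind Tile2Vec uses; the Tile2Vec paper also regularises embedding norms, which is omitted here, and tile size, neighborhood and margin values are hypothetical.

```python
import numpy as np


def sample_triplet_coords(h, w, tile, neighborhood, rng):
    """Top-left corners of (anchor, positive, negative) tiles in an h x w image.

    Positive: a tile whose corner lies within `neighborhood` pixels of the anchor's.
    Negative: a random tile anywhere in the image (a simplification; distant tiles
    can also be drawn from other images)."""
    ay, ax = int(rng.integers(0, h - tile + 1)), int(rng.integers(0, w - tile + 1))
    py = int(np.clip(ay + rng.integers(-neighborhood, neighborhood + 1), 0, h - tile))
    px = int(np.clip(ax + rng.integers(-neighborhood, neighborhood + 1), 0, w - tile))
    ny, nx = int(rng.integers(0, h - tile + 1)), int(rng.integers(0, w - tile + 1))
    return (ay, ax), (py, px), (ny, nx)


def triplet_margin_loss(z_anchor, z_pos, z_neg, margin=1.0):
    """Mean of max(0, ||a - p|| - ||a - n|| + margin) over a batch of embeddings (B, D).

    Minimising it pulls anchor and positive tiles together and pushes negative
    tiles at least `margin` further away than the positive."""
    d_pos = np.linalg.norm(z_anchor - z_pos, axis=1)
    d_neg = np.linalg.norm(z_anchor - z_neg, axis=1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


rng = np.random.default_rng(0)
anchor, positive, negative = sample_triplet_coords(1024, 1024, tile=50, neighborhood=100, rng=rng)
z_a = np.zeros((2, 3))
z_p = np.zeros((2, 3))
z_n = np.array([[3.0, 0.0, 0.0], [0.5, 0.0, 0.0]])
print(anchor, positive, negative, triplet_margin_loss(z_a, z_p, z_n))
```

In a real pipeline the embeddings would come from a CNN applied to the cropped tiles; only the sampling and the loss are shown here.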
Lessons learned
- Project success:
  - 5%: ML algorithm and algorithm parameter selection.
  - 95%: really understanding what the client needs and how to generate value, anticipating how your output is going to be consumed, and defining good features, good ground truth, a good data sampling policy, and pre- and post-processing.
- Dedicate time first to ensuring success, … and only after that improve:
  - Use fast ML algorithms.
  - Start with small datasets that keep the input and output variability of the original one.
- Predictive models: accuracy is not always the most important thing; explainability and consistency also matter.
- It is worth investing in automatically measuring dataset quality before starting to train on big datasets:
  - Missing values, constant variables, unaligned bands, duplicated variables, class imbalance, …
- Most of our production costs are ETL (extract, transform, load).
- Deep learning is amazing (sometimes overkill for the problem to solve)… and it is expensive:
  - In production: computational cost.
  - In development: fine-tuning and network "cooking" (which does not scale very well).
- Context knowledge + common-sense heuristics + ML vs. end-to-end learning (is all the target-domain variability in your training set?)
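Several of the dataset-quality checks listed above are cheap to automate on a tabular view of the data (one row per sample, one column per band or feature). A minimal sketch covering missing values, constant and duplicated variables, and class imbalance; `quality_report` is a hypothetical helper, and band misalignment is not covered because it needs georeferencing metadata rather than pixel values.

```python
import warnings

import numpy as np


def quality_report(X, y=None):
    """Cheap pre-training checks on a tabular view of a dataset.

    X: (n_samples, n_features), NaN marks missing values; y: optional integer labels."""
    X = np.asarray(X, dtype=np.float64)
    n_features = X.shape[1]
    report = {"missing": np.flatnonzero(np.isnan(X).any(axis=0)).tolist()}
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)  # all-NaN columns
        col_min, col_max = np.nanmin(X, axis=0), np.nanmax(X, axis=0)
    report["constant"] = np.flatnonzero(col_min == col_max).tolist()
    # O(n_features^2) pairwise comparison; fine for a few hundred bands/features.
    report["duplicated"] = [
        (i, j)
        for i in range(n_features)
        for j in range(i + 1, n_features)
        if np.array_equal(X[:, i], X[:, j], equal_nan=True)
    ]
    if y is not None:
        counts = np.bincount(np.asarray(y))
        counts = counts[counts > 0]
        report["imbalance_ratio"] = float(counts.max() / counts.min())
    return report


X = np.array([[1, 5, 1, np.nan], [2, 5, 2, 1], [3, 5, 3, 2]])
print(quality_report(X, y=[0, 0, 1]))
```

Running a report like this before every large training job is cheap compared with discovering a constant or duplicated band after hours of GPU time.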
Questions?
