ExtremeEarth
From Copernicus Big Data to Extreme Earth Analytics
This project has received funding from the European Union’s Horizon 2020
research and innovation programme under grant agreement No 825258.
Frascati - September 12th 2019
Sina Sheikholeslami, KTH
Theofilos Kakantousis, Logical Clocks
Scalable Deep Learning Techniques for Copernicus Data
Phi-Week 2019
Agenda
• What is Hopsworks
• End-to-end Machine Learning Pipelines
• Deep Learning Pipelines in Hopsworks
• Scalable Distributed Deep Learning Techniques
• Deep Learning Techniques for Earth Observation data
• Summary
What is Hopsworks
Hopsworks REST API
• Manage resources via the REST API
o Projects
o Datasets
o Jobs
o FeatureStore
o Experiments
o ModelServing
o Kafka
o ...
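A minimal sketch of what a client for such a REST API might look like. The base URL, the resource path segments, and the `ApiKey` authorization header are assumptions made for illustration; they are not taken from the slides and should be checked against the Hopsworks API documentation. The request is only built, never sent.

```python
from urllib.request import Request

# Hypothetical values for illustration only; not taken from the slides.
BASE_URL = "https://hopsworks.example.com/api"
RESOURCES = ("project", "dataset", "jobs", "featurestores",
             "experiments", "serving", "kafka")


def build_request(resource, api_key, method="GET"):
    """Build (but do not send) an authenticated request for one resource type."""
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource: {resource}")
    return Request(
        url=f"{BASE_URL}/{resource}",
        method=method,
        # Assumed header scheme; the real scheme may differ.
        headers={"Authorization": f"ApiKey {api_key}",
                 "Accept": "application/json"},
    )


req = build_request("jobs", api_key="not-a-real-key")
print(req.get_method(), req.full_url)
```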
End-to-end Machine Learning & Deep Learning Pipelines
End-to-End ML & DL Pipelines can be factored into stages
[Diagram: stages Data Ingest, Data Prep, Train, Serve, Online Monitor; underlying layers Distributed Storage (Raw Data, Data Lake) and Resource Manager]
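Factoring a pipeline into stages can be sketched as a chain of functions, each consuming the previous stage's output. The stage bodies below are toy placeholders (a one-parameter linear fit), not Hopsworks code, and the Online Monitor stage is left out for brevity.

```python
from functools import reduce


def ingest(raw):
    """Data Ingest: pull raw records into the pipeline (here, just copy them)."""
    return list(raw)


def prepare(records):
    """Data Prep: turn records into (feature, label) pairs; toy rule y = 2x."""
    return [(float(x), float(2 * x)) for x in records]


def train(examples):
    """Train: fit y = w * x by least squares through the origin."""
    num = sum(x * y for x, y in examples)
    den = sum(x * x for x, _ in examples)
    return {"w": num / den}


def serve(model):
    """Serve: wrap the trained model in a prediction function."""
    return lambda x: model["w"] * x


def run_pipeline(raw, stages):
    """Feed the output of each stage into the next one."""
    return reduce(lambda data, stage: stage(data), stages, raw)


predict = run_pipeline([1, 2, 3], [ingest, prepare, train, serve])
print(predict(10.0))
```

Because each stage only depends on its input, a stage can be rerun, replaced, or scheduled separately without touching the others.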
End-to-End ML & DL Pipelines in Hopsworks
Typical Feature Store pipeline
Deep Learning Pipelines in Hopsworks
Data Access - Copernicus EO data
Feature Engineering
Experiments Overview - Hopsworks UI
Experiments Details - Hopsworks UI
Experiments - Hopsworks API
Experiments - Tensorboard
TensorFlow Extended with Beam Portability Framework and Flink runner
TensorFlow Model Analysis
https://medium.com/tensorflow/introducing-tensorflow-model-analysis-scaleable-sliced-and-full-pass-metrics-5cde7baf0b7b
Data Scientist Dev View - Jupyter Dashboard
Dev View - Python first
• No need to write Dockerfiles
[Diagram: Hopsworks Cluster]
Dev View - Orchestration
Orchestration with Airflow
Scalable Distributed Deep Learning
Distributed Deep Learning in Hopsworks
[Diagram: Driver with Executor 1 ... Executor N; HopsFS (HDFS), TensorBoard, Model Serving]
Data Parallel Distributed Training
(Synchronous Stochastic Gradient Descent (SGD))
[Plot: Generalization Error vs. Training Time]
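Synchronous data-parallel SGD can be sketched in a few lines: each worker computes a gradient on its own data shard, the gradients are averaged, and every worker applies the same update. The toy model (fitting y = w * x), the learning rate, and the shard layout are assumptions for illustration; workers are simulated sequentially in one process.

```python
def gradient(w, shard):
    """Mean gradient of 0.5 * (w * x - y)^2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def sync_sgd(shards, w=0.0, lr=0.05, steps=200):
    """Synchronous data-parallel SGD: each step waits for every worker."""
    for _ in range(steps):
        # Each worker computes a gradient on its own shard.
        local_grads = [gradient(w, shard) for shard in shards]
        # Averaging stands in for the all-reduce (or parameter-server) step.
        avg_grad = sum(local_grads) / len(local_grads)
        # Every worker applies the identical update, so replicas stay in sync.
        w -= lr * avg_grad
    return w


# Toy data generated from y = 3x, split evenly across two simulated workers.
data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
shards = [data[0:2], data[2:4]]
w = sync_sgd(shards)
print(round(w, 4))
```

With equally sized shards, the averaged gradient equals the gradient over the full batch, so adding workers changes the wall-clock time per step but not the update itself.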
Distributed Deep Learning in Hopsworks
HopsYARN
10 GPUs on 1 host
100 GPUs on 10 hosts with ‘Infiniband’
Hopsworks supports a Heterogeneous Mix of GPUs
4 GPUs on any host
Ring-AllReduce vs Parameter Server
[Diagram: Ring-AllReduce with GPU 0 to GPU 3, each sending to and receiving from a neighbour; Parameter Server with GPU 1 to GPU 4 connected to Param Server(s)]
Network Bandwidth is the Bottleneck for Distributed Training
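The bandwidth argument can be made concrete with a small single-process simulation of ring all-reduce (a scatter-reduce phase followed by an all-gather phase). This is an illustrative sketch, not the implementation used by TensorFlow or Hopsworks; the traffic formulas assume one parameter server and equal-sized gradients.

```python
def ring_allreduce(vectors):
    """Simulate ring all-reduce; returns each worker's copy of the summed vector."""
    n, length = len(vectors), len(vectors[0])
    bounds = [(k * length) // n for k in range(n + 1)]  # n contiguous chunks
    data = [list(v) for v in vectors]

    def chunk(k):
        return range(bounds[k], bounds[k + 1])

    def exchange(first_chunk, combine):
        for step in range(n - 1):
            # Snapshot all messages first: sends within a step are simultaneous.
            msgs = []
            for i in range(n):
                k = (first_chunk(i) - step) % n
                msgs.append((k, [data[i][j] for j in chunk(k)]))
            for i, (k, payload) in enumerate(msgs):
                dst = (i + 1) % n  # each worker sends to its right neighbour
                for j, value in zip(chunk(k), payload):
                    data[dst][j] = combine(data[dst][j], value)

    # Scatter-reduce: afterwards worker i holds the full sum of chunk (i + 1) % n.
    exchange(lambda i: i, lambda old, new: old + new)
    # All-gather: the completed chunks travel once more around the ring.
    exchange(lambda i: i + 1, lambda old, new: new)
    return data


def ring_values_sent_per_worker(n, length):
    """Each worker sends 2 * (n - 1) chunks of length / n values."""
    return 2 * (n - 1) * length / n


def ps_values_through_server(n, length):
    """A single parameter server receives n gradients and sends n parameter copies."""
    return 2 * n * length


grads = [[1, 2, 3, 4, 5, 6, 7, 8],
         [10, 20, 30, 40, 50, 60, 70, 80],
         [100, 200, 300, 400, 500, 600, 700, 800],
         [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000]]
reduced = ring_allreduce(grads)
print(reduced[0])
print(ring_values_sent_per_worker(4, 8), ps_values_through_server(4, 8))
```

In the ring, per-worker traffic stays close to twice the gradient size regardless of the number of workers, whereas the traffic through a single parameter server grows linearly with the number of workers.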
Distributed Deep Learning in Hopsworks
• Uses Apache Spark/YARN to add distribution to TensorFlow’s CollectiveAllReduceStrategy
– Automatically builds the ring (Spark/YARN)
– Allocates GPUs to Spark Executors
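TensorFlow's multi-worker strategies discover their peers through the `TF_CONFIG` environment variable, which lists every worker address and the index of the local task. The sketch below shows how such a configuration could be generated for each executor; the host names and ports are invented, and how Hopsworks actually assembles and distributes this information is not shown in the slides.

```python
import json


def build_tf_configs(worker_addresses):
    """Return one TF_CONFIG JSON string per worker, all sharing the same cluster spec."""
    cluster = {"worker": list(worker_addresses)}
    return [
        json.dumps({"cluster": cluster, "task": {"type": "worker", "index": i}})
        for i in range(len(worker_addresses))
    ]


# Hypothetical executor addresses; in practice these come from the resource manager.
executors = ["host-a:2222", "host-b:2222", "host-c:2222"]
configs = build_tf_configs(executors)
for config in configs:
    print(config)
```

Each executor would export its own string as `TF_CONFIG` before creating the strategy, so the user code stays identical on every worker.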
Scalable Deep Learning Techniques for Satellite Data
Polar Use Case EE Ambitions - I
• Deep Learning Architecture
o Now: Most of the deep architectures developed for classifying remote sensing data focus on VHR optical images. The existing pretrained networks are not optimized for the specific properties of Copernicus data (SAR images).
o KPI: The new deep architecture will improve the classification accuracy of Copernicus data compared with state-of-the-art classifiers.
• Sea Ice Charts
o Now: Sea ice charts are based on trained ice analysts manually segmenting a combination of SAR and other images into smaller polygons to record parameters including, but not limited to, sea ice concentration and stage of development.
o KPI: Automatic production of more accurate and reliable sea ice products on the Hops data platform
• Distributed Deep Learning:
o Now: No work currently exists in the international literature on scale-out deep learning for remote sensing data.
o KPI: Classification algorithms running on Hops and scaling to PBs of Copernicus data.
Polar Use Case EE Ambitions - II
• Ice Charts
o Now: Ice charts are currently manual interpretations with inherent human quality control but also potential for bias.
o KPI: Methods must be robust and reliable throughout the sea ice seasonal cycle
• Computing power
o Now: New ice mapping algorithms currently run only for limited areas and seasons, using limited local processing resources.
o KPI: The Hops data platform is able to cope with input data volume and processing load to enable NRT product availability.
• Operational Delivery
o Now: New automatic ice information products are not available to end users
o KPI: Demonstrate availability of new automatic ice mapping products with demo users
Polar Use Case EE Challenges
• Training data
o Need for sufficient, accurate training data
o Gathering high-quality ground truth is expensive and sometimes not feasible
• Resolution
o Each pixel represents a large area of land (100m x 100m)
o Accurate characterization can be jeopardized by misclassification of even single pixels
• Size of SAR Images
o Each image is usually 1+ GB in size
o Many pixels or patches should be considered in training and inference
• Porting and storing data to Hadoop
o Data must be prepared in efficient data structures
o Appropriate data format for storage of training data is required
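To give a feel for how many patches a single large image produces, the sketch below enumerates patch positions with a sliding window. The 32x32 patch size matches the patch size used in the experiments in this deck; the 10,000 x 10,000 pixel image size is a hypothetical round number, not a figure from the slides.

```python
def patch_origins(height, width, patch=32, stride=32):
    """Yield the (row, col) top-left corner of every full patch inside the image."""
    for row in range(0, height - patch + 1, stride):
        for col in range(0, width - patch + 1, stride):
            yield row, col


def count_patches(height, width, patch=32, stride=32):
    """Closed-form count of the patches that patch_origins would yield."""
    rows = max(0, (height - patch) // stride + 1)
    cols = max(0, (width - patch) // stride + 1)
    return rows * cols


# Hypothetical image size, for illustration only.
HEIGHT = WIDTH = 10_000
print("non-overlapping patches:", count_patches(HEIGHT, WIDTH, stride=32))
print("one patch per pixel position:", count_patches(HEIGHT, WIDTH, stride=1))
```

Moving from non-overlapping patches to one patch per pixel position multiplies the workload by roughly the patch area, which is why pixel-wise labeling of a single image reaches the order of 100 million patches.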
Application Road Map
• Preprocessing: Calibration, Speckle Noise, Terrain Correction
• Feature Extraction
• Classification, Segmentation
• Non-Linear Function: Feature Learning by Deep Learning
• Sea Ice Edges, Sea Ice Types - concentration, Iceberg Detection
Deep Learning
• Transfer Learning
• Ad hoc Architecture
• Distributed Training of Existing Architecture
• Designing Specific Distributed Architecture for Remote Sensing
Sea Ice Edges
• Pixel-wise VGG16 from scratch: 103,754,000 patches analyzed to label each pixel
• Patch-based VGG16 fine-tuning: 32x32 patches
• Patch-based ad hoc network: 32x32 patches
• Semi-supervised distributed training (GANs)
• All tests have been performed on Hopsworks
• High-resolution, pixel-wise processing requires analyzing 100+M patches (approx. 9 hours/image)
• New techniques for distributed training are needed in the remote sensing domain
S. Khaleghian, T. Kræmer, T. Eltoft, A. Marinoni, “Distributed Deep learning for sea ice edges detection”, in prep.
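A back-of-the-envelope calculation shows what the reported figures imply. It assumes that the 103,754,000 patches and the roughly 9 hours refer to the same image and setup, and that throughput scales linearly with the number of workers, which is a best case that real distributed training rarely reaches.

```python
PATCHES_PER_IMAGE = 103_754_000  # patch count reported on the slide
HOURS_PER_IMAGE = 9.0            # "approx. 9 hours/image" reported on the slide


def patches_per_second(patches=PATCHES_PER_IMAGE, hours=HOURS_PER_IMAGE):
    """Implied throughput of the measured setup."""
    return patches / (hours * 3600.0)


def ideal_hours(workers, hours=HOURS_PER_IMAGE):
    """Best-case wall-clock time if throughput scaled linearly with workers."""
    return hours / workers


print(f"{patches_per_second():.0f} patches/s on the measured setup")
for workers in (1, 10, 100):
    minutes = ideal_hours(workers) * 60
    print(f"{workers} workers: {minutes:.1f} minutes per image (ideal scaling)")
```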
Summary
• End-to-end Machine Learning pipelines with Hopsworks
• Scalable Deep Learning techniques
• Deep Learning techniques for Earth Observation data
Hopsworks Contributors
Jim Dowling, Seif Haridi, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias
Gebremeskel, Fabio Buso, Antonios Kouzoupis, Kim Hammar, Steffen Grohsschmiedt, Alex Ormenisan,
Robin Andersson, Moritz Meister, Kajetan Maliszewski, Netsanet Gebretsadkan Kidane, Sina Sheikholeslami,
Joel Stenkvist, August Bonds, Vasileios Giannokostas, Johan Svedlund Nordström, Rizvi Hasan, Paul Mälzer,
Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre
Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Qi Qi, ...
@hopsworks
Thank you!
@logicalclocks
www.logicalclocks.com
sinash@kth.se
theo@logicalclocks.com

Scalable Deep Learning in ExtremeEarth-phiweek19