Semantic Segmentation on Satellite Imagery

Rahul Bhojwani, Nina Domingo,
Benjamin Mayhew, Christy Tsz-En Wang

Kaggle: Can you train an eye in the sky?
Challenge: The Defence Science and Technology Laboratory (DSTL) is seeking novel solutions to reduce the workload of its image analysts, and challenges Kagglers to accurately identify and classify objects in overhead satellite imagery.

What’s in a picture?

How is this useful?
● Medical imaging
● Agriculture
● Surveillance

Data
Input: 25 satellite images, each covering 1 km x 1 km, in both 3-band and 16-band formats
● Format: GeoTIFF
● All images come from the same region, but their coordinates are transformed so the location is obscured
Labels: each object class is provided as a MultiPolygon
● Format: GeoJSON or WKT

Object Class Types
● Buildings
● Misc. Manmade Structures
● Roads
● Track
● Trees
● Crops
● Waterway
● Standing Water
● Vehicle Large
● Vehicle Small

Data Processing of Labels
● Match the [0,1] coordinates to pixel coordinates
● Compute projection factors for each multipolygon
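The projection step can be sketched as follows. This is a minimal sketch, not the project's code: it assumes the convention we understand the DSTL data page to describe, where each image's normalized coordinates lie in x ∈ [0, x_max] and y ∈ [y_min, 0] (per-image bounds we understand to be supplied in grid_sizes.csv), and are scaled by W' = W·W/(W + 1) and H' = H·H/(H + 1). The function names and the example numbers are hypothetical.

```python
def projection_factors(width, height, x_max, y_min):
    """Return (x_scale, y_scale) that map grid coordinates to pixel coordinates.

    Assumes x in [0, x_max] and y in [y_min, 0] with y_min < 0, and the
    W' = W * W / (W + 1) adjustment (an assumption, see lead-in).
    """
    w_adj = width * width / (width + 1)
    h_adj = height * height / (height + 1)
    # y_min is negative, so y_scale is negative and flips y into [0, H'].
    return w_adj / x_max, h_adj / y_min


def to_pixels(points, x_scale, y_scale):
    """Scale (x, y) grid coordinates to (column, row) pixel coordinates."""
    return [(x * x_scale, y * y_scale) for x, y in points]


# Hypothetical image size and grid bounds, for illustration only.
W, H = 3349, 3391
X_MAX, Y_MIN = 0.009188, -0.009039

xs, ys = projection_factors(W, H, X_MAX, Y_MIN)
corner = to_pixels([(X_MAX, Y_MIN)], xs, ys)[0]
print(corner)  # the far grid corner lands at (W', H')
```

The same two factors are applied to every vertex of every polygon, so they are computed once per image rather than once per multipolygon.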

Data Processing of Labels
● Convert multipolygons to shapely objects
● Project the geometry to pixel coordinates
● Write the shapely objects to shapefiles, then to TIFF files
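The pipeline above goes through shapely, shapefiles and TIFF files. As a dependency-light sketch of only the core step, burning projected polygons into a per-class mask, the following uses NumPy and an even-odd (ray casting) point-in-polygon test at pixel centers. It is not the tooling used in the project, and the function names are hypothetical.

```python
import numpy as np


def rasterize_polygon(vertices, height, width):
    """Boolean (height, width) mask of pixels whose centers lie inside a ring.

    `vertices` is a list of (x, y) pixel coordinates (x = column, y = row).
    Uses the even-odd rule: each edge crossed to the right of a pixel center
    toggles that pixel.
    """
    rows, cols = np.mgrid[0:height, 0:width]
    px = cols + 0.5  # pixel-center x
    py = rows + 0.5  # pixel-center y
    inside = np.zeros((height, width), dtype=bool)
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does this edge straddle the horizontal line through the pixel center?
        crosses = (y1 > py) != (y2 > py)
        # x where the edge meets that line (horizontal edges never "cross").
        with np.errstate(divide="ignore", invalid="ignore"):
            x_at = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (px < x_at)
    return inside


def rasterize_multipolygon(polygons, height, width):
    """`polygons` is a list of (exterior, [holes]) pairs in pixel coordinates."""
    mask = np.zeros((height, width), dtype=bool)
    for exterior, holes in polygons:
        part = rasterize_polygon(exterior, height, width)
        for hole in holes:
            part &= ~rasterize_polygon(hole, height, width)  # cut out interior rings
        mask |= part
    return mask.astype(np.uint8)


square = [(2, 2), (6, 2), (6, 6), (2, 6)]
hole = [(3, 3), (5, 3), (5, 5), (3, 5)]
mask = rasterize_multipolygon([(square, [hole])], 8, 8)
print(mask)
```

A library rasterizer would normally replace this loop for full-size images; the sketch only shows what the polygon-to-mask conversion computes.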

Data Processing
[Figure: original image, object mask, and superimposed image]

Object Class Type Distribution

Average Number of Polygons Distribution

More Data Processing
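Two ways to produce 512x512 inputs from the 25 original ~3300x3300 images:
● Direct scaling: 25 ~3300x3300 images → 25 512x512 images
● Partition: 25 ~3300x3300 images → 25 3072x3072 images → 900 512x512 images (each 3072x3072 image yields 6 x 6 = 36 tiles; 25 x 36 = 900)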
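The partition step can be sketched with a single reshape. This assumes the images have already been brought to 3072x3072 (how the ~3300 pixel originals were trimmed or resized is not specified here), and `partition` is a hypothetical helper name.

```python
import numpy as np


def partition(image, tile=512):
    """Split an (H, W, C) array into non-overlapping (tile, tile, C) patches.

    H and W must be multiples of `tile` (e.g. 3072 = 6 * 512). Tiles are
    returned in row-major order: index = tile_row * (W // tile) + tile_col.
    """
    h, w, c = image.shape
    if h % tile or w % tile:
        raise ValueError("image size must be a multiple of the tile size")
    return (
        image.reshape(h // tile, tile, w // tile, tile, c)
        .transpose(0, 2, 1, 3, 4)  # -> (tile_rows, tile_cols, tile, tile, C)
        .reshape(-1, tile, tile, c)
    )


# Hypothetical 3-band image already trimmed to 3072 x 3072.
img = np.random.default_rng(0).integers(0, 256, size=(3072, 3072, 3), dtype=np.uint8)
tiles = partition(img)
print(tiles.shape)  # (36, 512, 512, 3); 25 images -> 25 * 36 = 900 tiles
```

The same function applied to each label mask (with C = number of classes) keeps image tiles and mask tiles aligned.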

Methods - Semantic Segmentation with Deep Learning
Important deep learning models for semantic segmentation:
● Fully Convolutional Network (FCN) [Nov 2014]
● U-Net [May 2015]
● SegNet [Nov 2015]

VGG-16:

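Fully Convolutional Network:
● No fully connected layers (they are replaced by convolutions)
● Skip connections combine coarse, deep predictions with finer, shallower ones
● Built on VGG-16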
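In FCN, class scores from the deepest (stride-32) layer are upsampled and summed with scores predicted from a shallower layer (e.g. pool4, stride 16). The sketch below is simplified: the FCN paper upsamples with learnable transposed convolutions initialized to bilinear interpolation, whereas this substitutes nearest-neighbour upsampling, and the shapes are assumptions for a 512x512 input.

```python
import numpy as np


def upsample2x_nearest(scores):
    """Nearest-neighbour 2x upsampling of a (C, H, W) score map."""
    return scores.repeat(2, axis=1).repeat(2, axis=2)


def fuse_skip(coarse_scores, fine_scores):
    """FCN-style skip connection: upsample the coarse class scores and add
    the scores predicted from a shallower, higher-resolution layer."""
    up = upsample2x_nearest(coarse_scores)
    if up.shape != fine_scores.shape:
        raise ValueError(f"shape mismatch: {up.shape} vs {fine_scores.shape}")
    return up + fine_scores


rng = np.random.default_rng(0)
n_classes = 10
coarse = rng.standard_normal((n_classes, 16, 16))  # assumed stride-32 scores
fine = rng.standard_normal((n_classes, 32, 32))    # assumed stride-16 scores
fused = fuse_skip(coarse, fine)
labels = fused.argmax(axis=0)                      # per-pixel class at stride 16
print(fused.shape, labels.shape)  # (10, 32, 32) (32, 32)
```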

U-Net:

U-Net:
● Encoder-decoder network.
● Each decoding stage convolves the upsampled feature maps with trainable filters.
● Copies the encoder feature maps to the corresponding decoder stage (concatenation).
● Data augmentation (stretching and rotation).
● Weighted cross entropy, which forces the network to learn the border pixels separating touching objects.
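The border weighting comes from the U-Net paper, which, as we read it, adds a distance-based term to a class-balancing weight map. Written as a loss to minimise, with w_c the class-frequency weight, d_1 and d_2 the distances from pixel x to the borders of the nearest and second-nearest objects, p the softmax output, ℓ(x) the true label, and the paper's reported settings w_0 = 10 and σ ≈ 5 pixels:

```latex
w(\mathbf{x}) = w_c(\mathbf{x})
  + w_0 \exp\!\left(-\frac{\big(d_1(\mathbf{x}) + d_2(\mathbf{x})\big)^2}{2\sigma^2}\right),
\qquad
E = -\sum_{\mathbf{x} \in \Omega} w(\mathbf{x}) \,\log p_{\ell(\mathbf{x})}(\mathbf{x})
```

Pixels in a narrow gap between two objects have a small d_1 + d_2, so their weight approaches w_c + w_0 and misclassifying them costs much more than misclassifying open background.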

Methods - Encoder/Contracting path
Goal:
● Capture context while retaining localization information
Operations:
● Convolution
● Non-linearity (ReLU)
● Pooling
● No fully connected layers
Figure: 3x3 convolution with no padding and stride 2
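How much each encoder operation shrinks the feature map follows from the standard output-size formula o = ⌊(i + 2p − k) / s⌋ + 1. The sketch below traces sizes along a U-Net-style contracting path, assuming the configuration reported in the U-Net paper (572x572 input, two unpadded 3x3 convolutions per level, 2x2 max pooling with stride 2); the helper names are hypothetical, and max pooling is treated as a window of size 2 and stride 2 in the same formula.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Output length of a convolution or pooling window: floor((i + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def contracting_path(size, levels=5):
    """Spatial sizes along a U-Net-style encoder.

    Each level applies two unpadded 3x3 convolutions; every level except the
    last is followed by 2x2 max pooling with stride 2.
    """
    sizes = [size]
    for level in range(levels):
        size = conv_out(size)  # 3x3 conv, no padding: loses 2 pixels
        size = conv_out(size)  # 3x3 conv, no padding: loses 2 more
        sizes.append(size)
        if level < levels - 1:
            size = conv_out(size, kernel=2, stride=2)  # 2x2 max pool halves the size
            sizes.append(size)
    return sizes


print(conv_out(5, kernel=3, stride=2))  # 2: unpadded 3x3, stride 2, on a 5x5 input
print(contracting_path(572))
```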

SegNet Architecture:

Methods - Decoder/Expansive path
Goal:
● Recover object details and spatial dimensions
Operations:
● "Up-convolution"/"upsampling"
● Concatenate with the corresponding cropped encoder feature maps
● Convolution layers
● ReLU
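The "crop and concatenate" step can be sketched in NumPy. Because the encoder's unpadded convolutions leave its feature maps larger than the upsampled decoder maps, the encoder map is center-cropped before being stacked along the channel axis. The example shapes follow our reading of U-Net's deepest skip connection (a 512 x 64 x 64 encoder map cropped to 56 x 56); the helper names are hypothetical.

```python
import numpy as np


def center_crop(x, target_h, target_w):
    """Crop a (C, H, W) feature map to (C, target_h, target_w) around its center."""
    _, h, w = x.shape
    top = (h - target_h) // 2
    left = (w - target_w) // 2
    return x[:, top:top + target_h, left:left + target_w]


def skip_concat(decoder_map, encoder_map):
    """Stack the cropped encoder map and the upsampled decoder map along channels."""
    _, h, w = decoder_map.shape
    return np.concatenate([center_crop(encoder_map, h, w), decoder_map], axis=0)


enc = np.zeros((512, 64, 64), dtype=np.float32)  # encoder feature map (assumed shape)
dec = np.ones((512, 56, 56), dtype=np.float32)   # up-convolved decoder map (assumed shape)
out = skip_concat(dec, enc)
print(out.shape)  # (1024, 56, 56)
```

The following convolution layers then see both the high-resolution encoder features and the coarser but more semantic decoder features; the channel order is arbitrary as long as it is consistent.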

SegNet:
● The encoder is identical to the 13 convolutional layers of VGG-16
● Uses pretrained VGG-16 weights (excluding the fully connected layers)
● The decoder upsamples using the pooling indices saved in the max-pooling step of the corresponding encoder layer.
● The upsampled (sparse) maps are convolved with trainable filters.
● Unlike U-Net, it does not copy entire encoder feature maps to the decoder; only the pooling indices are passed.
● Dropping the fully connected layers reduces the encoder's trainable parameters from 134M → 14.7M
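The 14.7M figure matches the parameter count of VGG-16's 13 convolutional layers. A quick check, assuming the standard VGG-16 configuration (3x3 kernels with biases, 3-channel input, ImageNet classifier head); the helper names are hypothetical. The exact 134M figure depends on the classifier head and is not reproduced here.

```python
# VGG-16 convolutional configuration: output channels per 3x3 conv, "M" = max pool.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]


def conv_params(config, in_channels=3, kernel=3):
    """Count weights + biases of the convolutional layers only."""
    total = 0
    for layer in config:
        if layer == "M":
            continue  # pooling has no parameters
        total += kernel * kernel * in_channels * layer + layer
        in_channels = layer
    return total


def fc_params(sizes):
    """Count weights + biases of a stack of fully connected layers."""
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


encoder = conv_params(VGG16_CONV)
classifier = fc_params([7 * 7 * 512, 4096, 4096, 1000])
print(encoder)               # 14714688, i.e. ~14.7M
print(encoder + classifier)  # 138357544 for the full ImageNet VGG-16
```

Almost 90% of VGG-16's parameters sit in the fully connected layers, which is why removing them shrinks the model so sharply.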

SegNet Unpooling:
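SegNet's unpooling writes each pooled value back to the position remembered from the encoder's max pooling and fills every other position with zeros, producing a sparse map that the decoder's convolutions then densify. A single-channel NumPy sketch with hypothetical helper names; frameworks provide batched versions (e.g. PyTorch's `nn.MaxPool2d(..., return_indices=True)` with `nn.MaxUnpool2d`).

```python
import numpy as np


def max_pool_with_indices(x):
    """2x2, stride-2 max pooling of an (H, W) array, plus argmax positions.

    Indices are flat positions into x, which is what the decoder needs.
    """
    h, w = x.shape
    blocks = (x.reshape(h // 2, 2, w // 2, 2)
               .transpose(0, 2, 1, 3)
               .reshape(h // 2, w // 2, 4))           # one row of 4 values per window
    local = blocks.argmax(axis=2)                     # 0..3 inside each 2x2 window
    rows = np.arange(h // 2)[:, None] * 2 + local // 2
    cols = np.arange(w // 2)[None, :] * 2 + local % 2
    return blocks.max(axis=2), rows * w + cols


def max_unpool(pooled, indices, out_shape):
    """Place each pooled value at its recorded position; zeros elsewhere."""
    out = np.zeros(out_shape, dtype=pooled.dtype)
    np.put(out, indices.ravel(), pooled.ravel())
    return out


x = np.array([[1, 5, 2, 0],
              [3, 4, 8, 1],
              [0, 2, 1, 1],
              [7, 6, 3, 9]])
pooled, idx = max_pool_with_indices(x)
print(pooled)                              # [[5 8] [7 9]]
print(max_unpool(pooled, idx, x.shape))    # sparse 4x4 map
```

Storing only indices (a few bits per window) is what lets SegNet avoid copying whole encoder feature maps the way U-Net does.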

FCN vs SegNet:

Training U-Net
Pixel-wise softmax + cross-entropy loss function
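The loss named above can be sketched in NumPy: a softmax over the class axis at every pixel, followed by the negative log-probability of each pixel's true class, optionally scaled by a per-pixel weight map. This is an illustrative sketch with hypothetical names, not the project's training code; it averages over pixels, whereas the U-Net paper writes a sum (the two differ only by a constant factor).

```python
import numpy as np


def pixelwise_softmax(logits):
    """Softmax over the class axis of (K, H, W) logits, computed stably."""
    shifted = logits - logits.max(axis=0, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)


def pixelwise_cross_entropy(logits, labels, weights=None):
    """Mean (optionally weighted) cross entropy over all pixels.

    logits: (K, H, W) raw scores; labels: (H, W) integer class ids;
    weights: optional (H, W) per-pixel weight map, e.g. border weights.
    """
    probs = pixelwise_softmax(logits)
    rows, cols = np.indices(labels.shape)
    p_true = probs[labels, rows, cols]        # probability of the true class per pixel
    loss = -np.log(np.clip(p_true, 1e-12, 1.0))
    if weights is not None:
        loss = weights * loss
    return loss.mean()


rng = np.random.default_rng(0)
logits = rng.standard_normal((3, 4, 4))       # 3 classes, 4x4 image
labels = rng.integers(0, 3, size=(4, 4))
print(pixelwise_cross_entropy(logits, labels))
```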

Methods: How does upsampling work?
Transposed convolution (also called fractionally strided convolution or, loosely, deconvolution)
● Increases the spatial resolution of the feature maps
● The weights are learnable
● It is NOT the inverse of convolution: it restores the input's shape, not its values
Figure: transposed convolution of a 2x2 input with no padding, stride 2 and a 3x3 kernel
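A transposed convolution can be computed by scattering: each input value scales a copy of the kernel, which is added into the output at stride-spaced offsets, with overlaps summed. Without padding the output size is o = s(i − 1) + k, so a 2x2 input with stride 2 and a 3x3 kernel gives 5x5. A NumPy sketch with a hypothetical helper name and a stand-in kernel of ones in place of learned weights:

```python
import numpy as np


def transposed_conv2d(x, kernel, stride=2):
    """Transposed convolution (no padding) of a 2D input with a 2D kernel.

    Each input value scales the kernel and is added into the output at
    stride-spaced offsets; overlapping contributions are summed.
    """
    ih, iw = x.shape
    kh, kw = kernel.shape
    out = np.zeros((stride * (ih - 1) + kh, stride * (iw - 1) + kw))
    for i in range(ih):
        for j in range(iw):
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * kernel
    return out


x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
k = np.ones((3, 3))   # stand-in for learned weights
y = transposed_conv2d(x, k)
print(y.shape)  # (5, 5): s * (i - 1) + k = 2 * (2 - 1) + 3
print(y)
```

The center pixel receives contributions from all four inputs, which illustrates why the result is an upsampled feature map rather than a reconstruction of the original convolution input.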

Convolution as matrix multiplication
Figure: a 4 x 4 input convolved with a 3 x 3 kernel, giving a 2 x 2 output

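Convolution as matrix multiplication
(4 x 16) (16 x 1) = (4 x 1)
● The 3 x 3 kernel is unrolled into a sparse 4 x 16 matrix, the 4 x 4 input is flattened to 16 x 1, and the 4 x 1 result is reshaped to the 2 x 2 output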

Transposed convolution as matrix multiplication
(16 x 4) (4 x 1) = (16 x 1)
● The input and output dimensions swap
● Uses the transpose of the convolution matrix
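The matrix view can be checked numerically: unroll a 3x3 kernel into a 4x16 matrix C for a 4x4 input, confirm that C @ x reproduces a direct (valid, stride-1) convolution, and note that C.T maps a 4-vector back to a 16-vector. As in most deep learning libraries, "convolution" here means cross-correlation (no kernel flip); the helper names are hypothetical.

```python
import numpy as np


def conv_matrix(kernel, in_h, in_w):
    """Unroll a valid, stride-1 2D cross-correlation into a dense matrix.

    Row r holds the kernel weights placed at the input positions that output
    pixel r reads, so conv(x) == C @ x.ravel().
    """
    kh, kw = kernel.shape
    out_h, out_w = in_h - kh + 1, in_w - kw + 1
    C = np.zeros((out_h * out_w, in_h * in_w))
    for r in range(out_h):
        for c in range(out_w):
            patch = np.zeros((in_h, in_w))
            patch[r:r + kh, c:c + kw] = kernel
            C[r * out_w + c] = patch.ravel()
    return C


def conv2d_valid(x, kernel):
    """Direct valid cross-correlation, for comparison."""
    kh, kw = kernel.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return np.array([[np.sum(x[r:r + kh, c:c + kw] * kernel)
                      for c in range(out_w)] for r in range(out_h)])


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4))
k = rng.standard_normal((3, 3))
C = conv_matrix(k, 4, 4)                  # (4, 16)
y = C @ x.ravel()                         # (4,), reshaped to 2 x 2
assert np.allclose(y.reshape(2, 2), conv2d_valid(x, k))
up = C.T @ y                              # (16,), reshaped to 4 x 4
print(C.shape, y.shape, up.shape)         # (4, 16) (4,) (16,)
```

`up` has the input's shape but not its values, which is the "not the inverse of convolution" point made earlier.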

Preliminary results: partitioned images (900 images of 512x512)
Epoch   Loss          Acc
1       0.2356        0.9587
2       0.1763        0.9587
3       ETA: ~1 day
4       NA            NA
5       NA            NA
6       NA            NA
7       NA            NA
8       NA            NA
9       NA            NA
10      NA            NA

Immediate Next Steps:
▫ Include more classes in training.
▫ Tune the model's hyperparameters.
▫ Get SegNet working.
Future Work:
▫ Explore more recently published models, e.g. DeepLab v3 [2018]
▫ Use more computing resources to train the models faster.

References:
▫ Ronneberger, O. (2017). Invited Talk: U-Net Convolutional Networks for Biomedical Image Segmentation. Informatik Aktuell: Bildverarbeitung für die Medizin 2017, 3-3. doi:10.1007/978-3-662-54345-0_3
▫ Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi:10.1109/cvpr.2015.7298965
▫ Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12), 2481-2495. doi:10.1109/tpami.2016.264461
▫ https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d
▫ https://www.cs.toronto.edu/~frossard/post/vgg16/
▫ https://medium.com/@wilburdes/semantic-segmentation-using-fully-convolutional-neural-networks-86e45336f99b
▫ https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection

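Methods: dilated/atrous convolutions
Goal:
● Remove the need for pooling layers by enlarging the receptive field without reducing resolution
Operations:
● Space the kernel taps a fixed number of pixels apart (the dilation rate), skipping the input pixels in between
● Replace the pooling layers of a pretrained classification network with dilated convolutions
Figure: e.g. a 2-dilated convolution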
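A 2-dilated convolution reads every second pixel, so a k x k kernel covers k + (k − 1)(d − 1) pixels per side (5x5 for k = 3, d = 2) with the same 9 weights. A minimal NumPy sketch (valid padding, stride 1, hypothetical helper name):

```python
import numpy as np


def dilated_conv2d(x, kernel, dilation=2):
    """Valid 2D cross-correlation with a dilated kernel (stride 1, no padding).

    Kernel taps are `dilation` pixels apart, so a k x k kernel spans
    k + (k - 1) * (dilation - 1) pixels per side without extra weights.
    """
    kh, kw = kernel.shape
    span_h = kh + (kh - 1) * (dilation - 1)
    span_w = kw + (kw - 1) * (dilation - 1)
    out_h, out_w = x.shape[0] - span_h + 1, x.shape[1] - span_w + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            window = x[r:r + span_h:dilation, c:c + span_w:dilation]  # every d-th pixel
            out[r, c] = np.sum(window * kernel)
    return out


x = np.arange(49, dtype=float).reshape(7, 7)
k = np.ones((3, 3))
y = dilated_conv2d(x, k, dilation=2)
print(y.shape)  # (3, 3): a 3x3 kernel with dilation 2 spans 5x5 pixels
```

With dilation=1 the same function reduces to an ordinary convolution, which makes the receptive-field gain easy to compare.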

Kaggle: Evaluation
The score is the average Jaccard index between the predicted and actual multipolygons. The Jaccard index of two regions is the ratio of the area of their intersection to the area of their union:
Jaccard = TP / (TP + FP + FN) = |A ∩ B| / |A ∪ B| = |A ∩ B| / (|A| + |B| − |A ∩ B|)
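On rasterized masks, the same ratio can be computed by counting pixels. This is a pixel-mask approximation of the polygon-area metric: how the official scorer aggregates across images is not covered, and treating two empty masks as a perfect match (1.0) is our own convention. The helper names are hypothetical.

```python
import numpy as np


def jaccard(pred, truth):
    """Jaccard index |A ∩ B| / |A ∪ B| for two masks (1.0 if both are empty)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.logical_and(pred, truth).sum()  # TP
    union = np.logical_or(pred, truth).sum()          # TP + FP + FN
    return 1.0 if union == 0 else intersection / union


def mean_jaccard(preds, truths):
    """Average the Jaccard index over (prediction, ground truth) mask pairs, e.g. one per class."""
    return float(np.mean([jaccard(p, t) for p, t in zip(preds, truths)]))


a = np.zeros((4, 4), dtype=bool)
a[0:2, 0:2] = True   # 4 predicted pixels
b = np.zeros((4, 4), dtype=bool)
b[0:2, 1:3] = True   # 4 true pixels, 2 shared with a
print(jaccard(a, b))  # TP = 2, FP = 2, FN = 2 -> 2 / 6
```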
