Kaggle review
Planet: Understanding the Amazon from Space
Tyantov Eduard
Challenge overview
#1 Challenge
Numbers
– Images of land surface of Earth
– Goal: help to detect deforestation of Amazon rainforest
– Classification task, 17 classes: atmospheric conditions, common land cover and rare land cover
– 3 months
– 938 teams
– $60k in prizes
#2 Scene examples: atmospheric
Cloudy
Haze
Partly cloudy
#2 Scene examples: common
Primary
Habitation
Water (river)
#2 Scene examples: common
Bare ground
Cultivation
Agriculture + road
#2 Scene examples: rare
Selective logging
Conventional Mining
Slash & burn
#2 Scene examples: rare
Blooming
Blow down
"Artisinal" Mining
(small & illegal)
#3 Data acquiring
Process
– GeoTIFF format (red, green, blue, near infrared)
– TIFF -> JPG using “Planet visual product processor”
– 1,600 panoramas split into 150k chips
– 17 chosen labels
– Labeling: CrowdFlower platform
– Assessors used only JPG data!
#4 Data characteristics
– 256x256 chips
– TIF: 4 channels, 16-bit values
– JPG converted from TIF
– Train: 40.5k, test: 62k
#5 Evaluation
Score
F2 score, averaged across rows (per-image F2, averaged over all images)
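A minimal sketch of this metric using scikit-learn (the official evaluation script is Kaggle's, so treat this as an approximation):

```python
# Competition metric sketch: F2 per image, averaged over images.
# Assumes y_true / y_pred are binary indicator matrices of shape (n_images, 17).
import numpy as np
from sklearn.metrics import fbeta_score

def mean_f2(y_true, y_pred):
    return fbeta_score(y_true, y_pred, beta=2, average='samples')

y_true = np.array([[1, 0, 1, 0], [0, 1, 1, 0]])
y_pred = np.array([[1, 0, 0, 0], [0, 1, 1, 1]])
print(mean_f2(y_true, y_pred))  # mean of the two per-image F2 scores
```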
Baseline
#6 Data problems
Resolved problems
– Test data leakage (geo data in tif) -> new test set
– New test set was heavily muddled (jpg != tif)
Unresolved
– Misalignment of jpg and tif
– Different signals in tif & jpg (for small percentage of the data)
– Very noisy labels
• In some cases the atmospheric label appears random
(Figure: JPG-BGR / TIF-BGR / TIF-NIR alignment examples)
#7 Label distribution
(Figures: label frequencies and co-occurrence matrix)
#8 Baseline
Aspects
– Code: Pytorch
– Model
• resnet18 pretrained from Imagenet
• sigmoid + cross-entropy
• trainable: block3 + block4 + FC
– Training aspects:
• SGD, lr=0.1
• Augmentation: typical ImageNet
• Test-time augmentation (TTA) - same
– Decision: p > 0.5
Result (F2-score): 90.06%
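A minimal PyTorch sketch of this baseline, mapping “block3/block4” to torchvision's layer3/layer4 (momentum and other training details are assumptions, not the author's code):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 17)        # 17-label multi-label head

# Train only layer3, layer4 and the new FC; keep earlier layers frozen.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith(('layer3', 'layer4', 'fc'))

criterion = nn.BCEWithLogitsLoss()                    # sigmoid + cross-entropy
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.1, momentum=0.9)

# Inference decision rule: a label is present if sigmoid(logit) > 0.5
```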
#9 Baseline: overall threshold
Aspects
– Code: Pytorch
– Model
• resnet18 pretrained from Imagenet
• sigmoid + cross-entropy
• trainable: block3 + block4 + FC
– Training aspects:
• SGD, lr=0.1
• Augmentation: standard ImageNet
• Test-time augmentation (TTA) - same
– Decision: p > 0.2
Result (F2-score): 91.79% (+1.73)
#10: Choosing F2-thresholds
We can find quasi-optimal thresholds per class!
It’s OK to use the validation set. Two methods:
1. Per class
– Brute-force thresholds for each class independently
– Metric = F2-score per class
2. Joint optimization
– Gibbs sampling
• Starting from thresholds=[0.2]*17
– Metric = F2-score averaged across valid set
Summary
– Per class is more coherent (sketch below)
– Joint yields better results if averaged over folds
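A sketch of the per-class brute-force search (method 1); the joint Gibbs-style optimization applies the same idea while optimizing the mean F2 and cycling over classes. Array shapes and the search grid are assumptions:

```python
import numpy as np
from sklearn.metrics import fbeta_score

def per_class_thresholds(probs, labels, grid=np.arange(0.05, 0.50, 0.01)):
    """probs, labels: arrays of shape (n_samples, 17); returns one threshold per class."""
    thresholds = np.full(probs.shape[1], 0.2)
    for c in range(probs.shape[1]):
        scores = [fbeta_score(labels[:, c], (probs[:, c] > t).astype(int), beta=2)
                  for t in grid]
        thresholds[c] = grid[int(np.argmax(scores))]
    return thresholds
```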
#11 Baseline: optimal thresholds
Aspects
– Code: Pytorch
– Model
• resnet18 pretrained from Imagenet
• sigmoid + cross-entropy
• trainable: block3 + block4 + FC
– Training aspects:
• SGD, lr=0.1
• Augmentation: standard ImageNet
• Test-time augmentation (TTA) - same
– Decision: p > p_class (0.1 .. 0.3)
– Conclusion: choosing thresholds is crucial for the leaderboard
Result (F2-score): 92% (+0.21)
#12 Enhancing baseline
Aspects
1. Plateau scheduler
• Start from the highest lr
• lr = lr/10 after N=3 epochs without improvement
• After changing lr: load the best model so far
2. How to finetune? Experiments:
• conv layers LR = LR/10
• warm-up: several epochs training only the FC
• Best:
– FC, L3, L4 at the same lr
– FC, {L4, L3} * 0.1, {L2, L1} * 0.05 with warm-up until the loss degrades
3. Model tweak (sketch after this slide):
• + FC (256 units) + BN + ReLU
– adding BN yields much better results
Result (F2-score): 92.53% (+0.53)
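A sketch of these enhancements, assuming torchvision's resnet18 and the learning-rate multipliers listed above (other hyperparameters are guesses):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)
model.fc = nn.Sequential(                         # model tweak: FC-256 + BN + ReLU
    nn.Linear(model.fc.in_features, 256),
    nn.BatchNorm1d(256),
    nn.ReLU(inplace=True),
    nn.Linear(256, 17),
)

base_lr = 0.1
optimizer = torch.optim.SGD([
    {'params': model.fc.parameters(), 'lr': base_lr},
    {'params': list(model.layer3.parameters()) + list(model.layer4.parameters()),
     'lr': base_lr * 0.1},
    {'params': list(model.layer1.parameters()) + list(model.layer2.parameters()),
     'lr': base_lr * 0.05},
], momentum=0.9)

# lr / 10 after 3 epochs without validation improvement; call scheduler.step(val_loss)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=3)
```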
#13: Other zoo models
Models
– Resnet34 + FCBN -> 92.65%
– Densenet121 -> 92.76%
– Densenet169 yields the best result with standard augmentation
Result (F2-score): 92.79% (+0.26)
#14: Heng’s activity
– He organized kagglers to experiment and share results/insights
– Created Slack channel (later it was prohibited)
– Shared code (until some time )
– Posted all ideas during the competition
– A lot of top-finishers used his code as a baseline
– Finished at 19th
#15 Some results of this activity
Best result: 93.015
Diving deep
#16: Best single jpg model
Models
– Migrated from PIL to CV2 (easier to augment)
– Model: Resnet18 + FCBN
– Resolution: 256x256 (instead of 224x224)
– Train augmentation: random shift/zoom(+-10%)/rotate/flip/transpose
• Zooming’s crucial for fighting overfitting
• But cuts roads/cultivations/… off
– 6 TTA: 4 rotations + 2 flips, averaged
– Fixed avgpool in all zoo models: AvgPool(7) -> AvgPool(7, stride=1) + Avg (sketch below)
• otherwise avgpool uses only a 224x224 crop of the original image
Result (F2-score): 92.975% (+0.185)
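With 256x256 inputs the final resnet feature map is 8x8, so the stock AvgPool2d(7) covers just one 7x7 window, i.e. an effective 224x224 crop. A sketch using adaptive global pooling as a simpler stand-in for the AvgPool(7, stride=1) + Avg fix above:

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)
model.avgpool = nn.AdaptiveAvgPool2d(1)    # pools the full 8x8 map for 256x256 inputs
model.fc = nn.Linear(model.fc.in_features, 17)

x = torch.randn(2, 3, 256, 256)
print(model(x).shape)                      # torch.Size([2, 17])
```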
#17 Using tif data
Pros
– NIR channel - additional info
– TIF has 16 bits per channel (JPG: 8 bits)
– Domain specific features (various indexes)
Cons
– No pretrained models
– Assessors used only JPG
– Misalignments
#18 TIF from scratch
Aspects
– 4 channels: RGB + NIR
– Same Resnet18 + FCBN setup
– Training from scratch
Result (F2-score): 91.72% (-1.26)
#19 Various indexes
Indexes
– NDVI - Normalized difference vegetation index
• detects live green vegetation
– NDWI - Normalized Difference Water Index
• water ;)
– SAVI - Soil-Adjusted Vegetation Index
– EVI - Enhanced vegetation index
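A sketch of computing these indexes from the TIF channels; the band order, the SAVI soil factor L=0.5, and reflectance-scaled inputs are assumptions, the formulas are the standard definitions:

```python
import numpy as np

def spectral_indexes(tif):
    """tif: float array of shape (H, W, 4) with bands ordered R, G, B, NIR (assumed)."""
    r, g, b, nir = tif[..., 0], tif[..., 1], tif[..., 2], tif[..., 3]
    eps = 1e-6
    ndvi = (nir - r) / (nir + r + eps)                                 # live green vegetation
    ndwi = (g - nir) / (g + nir + eps)                                 # open water
    savi = (1 + 0.5) * (nir - r) / (nir + r + 0.5 + eps)               # soil-adjusted NDVI
    evi = 2.5 * (nir - r) / (nir + 6 * r - 7.5 * b + 1 + eps)          # enhanced vegetation index
    return ndvi, ndwi, savi, evi
```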
#19 Various indexes: examples
(Figure: RGB chip and the corresponding index maps)
#20 Mix model
Aspects
– Use all available data: RGB + NIR + 2 best Indexes
– Split 6 input channels into 3 + 3
– Model
• JPG-branch: best jpg model
• TIF-branch: Resnet18/WideResnet/ResNext from scratch
– Learning rates
• JPG: lr * 0.05
– This setup’s best: WideResnet
(Architecture diagram: RGB -> Resnet18 (JPG) -> FC-256; NIR + NDWI + SAVI -> some Resnet (TIF) -> FC-256; concatenated -> FC-17 -> prediction)
Result (F2-score): 93.00% (+0.025)
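A minimal sketch of the two-branch mix model in the diagram above (backbones, feature sizes and the fusion layer are illustrative assumptions):

```python
import torch
import torch.nn as nn
from torchvision import models

class MixNet(nn.Module):
    def __init__(self, n_classes=17):
        super().__init__()
        self.jpg = models.resnet18(pretrained=True)    # RGB branch (JPG)
        self.tif = models.resnet18(pretrained=False)   # NIR + NDWI + SAVI branch (TIF)
        feat = self.jpg.fc.in_features
        self.jpg.fc = nn.Linear(feat, 256)             # FC-256 per branch
        self.tif.fc = nn.Linear(feat, 256)
        self.head = nn.Linear(512, n_classes)          # FC-17 on the concatenation

    def forward(self, rgb, tif3):                      # both inputs: (N, 3, H, W)
        return self.head(torch.cat([self.jpg(rgb), self.tif(tif3)], dim=1))
```

In training, the JPG branch would get a reduced learning rate (lr * 0.05 above) via a separate optimizer parameter group.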
#21 Enhancing mix model: NIR only
Insight
We can use pretrained ImageNet weights for the TIF branch, and it’s better than training from scratch!
Only NIR
– Used pretrained resnet18
– Cut the first conv layer: 3 x ... -> 1 x ... (sketch below)
(Architecture diagram: RGB -> Resnet18 (JPG) -> FC-256; NIR -> Resnet18 (NIR) -> FC-256; concatenated -> FC-17 -> prediction)
Result (F2-score): 93.01% (+0.01)
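A sketch of reusing ImageNet weights for the 1-channel NIR branch: the first conv is rebuilt for one input channel and seeded with the mean of the pretrained RGB filters (this particular re-initialisation is an assumption; the slide only says the layer was cut from 3 channels to 1):

```python
import torch
import torch.nn as nn
from torchvision import models

nir_net = models.resnet18(pretrained=True)
old = nir_net.conv1                                    # Conv2d(3, 64, 7x7, stride 2)
new = nn.Conv2d(1, old.out_channels, kernel_size=old.kernel_size,
                stride=old.stride, padding=old.padding, bias=False)
with torch.no_grad():
    new.weight.copy_(old.weight.mean(dim=1, keepdim=True))   # average the RGB filters
nir_net.conv1 = new                                    # network now takes (N, 1, H, W)
```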
#22 Enhancing mix model: best single model
Aspects
– Use all available data: RGB + NIR + Indexes
– Model
• JPG-branch: best jpg model
• TIF-branch: Resnet18 pretrained
– Learning rates
• JPG: lr*0.05
• TIF: FC * 1, {L4,L3} * 1, {L2, L1}*0.1
– Results
• public: 93.071, private: 92.905
• Most competitors had roughly the same: public: 93.143, private: 92.915 (overfit)
• 1st place: local ~93.3
(Architecture diagram: RGB -> Resnet18 (JPG) -> FC-256; NIR + NDWI + SAVI -> Resnet18 (TIF) -> FC-256; concatenated -> FC-17 -> prediction)
Result (F2-score): 93.071% (+0.061)
Ensembling
#23 It’s time to stack !
Guides
– Kaggle ensemble guide
– An Introduction to StackNet
#24 Ensembles from submission files
Correlated case
submissions
1111111100 = 80% accuracy
1111111100 = 80% accuracy
1011111100 = 70% accuracy
result
1111111100 = 80% accuracy
Less correlated
submissions
1111111100 = 80% accuracy
0111011101 = 70% accuracy
1000101111 = 60% accuracy
result
1111111101 = 90% accuracy
Pick less correlated submission files and vote (sketch below). I just averaged the last 20 submissions.
Result (F2-score): 93.095% (+0.024)
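A sketch of the voting idea, assuming each submission file has already been expanded into a binary (n_images x 17) matrix with rows in the same order (file parsing is omitted):

```python
import numpy as np

def majority_vote(binary_submissions):
    """binary_submissions: list of 0/1 arrays, each of shape (n_images, 17)."""
    stacked = np.stack(binary_submissions)             # (n_subs, n_images, 17)
    return (stacked.mean(axis=0) >= 0.5).astype(int)   # keep a label if most subs agree
```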
#25 Stacking: out-of-fold predictions
Idea: train on k folds, predict each fold's validation part, concatenate (sketch below)
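A sketch of producing the out-of-fold matrix; `train_model` and `predict` are hypothetical placeholders for whatever level-1 model is being stacked:

```python
import numpy as np
from sklearn.model_selection import KFold

def oof_predictions(X, y, n_folds=5):
    """Each fold's model predicts only the samples it never saw during training."""
    oof = np.zeros((len(X), 17))
    for train_idx, valid_idx in KFold(n_folds, shuffle=True, random_state=0).split(X):
        model = train_model(X[train_idx], y[train_idx])   # hypothetical helper
        oof[valid_idx] = predict(model, X[valid_idx])     # hypothetical helper
    return oof
```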
#26 Stacking: idea
#27 Blending
Steps
1. Construct holdout set
2. For each «layer1» model:
1. Train the model on the train set
2. Predict the holdout
3. Train the blending model on the «layer1» holdout predictions
Pros (vs stacking)
– Simpler
– No information leak
– Teammates can throw any model into the blender; no shared seed/folds needed
Cons
– Less data
– May overfit to holdout
– Only 2 layers of models
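A sketch of the blending step on the holdout set, with one logistic regression per class on top of the level-1 model probabilities (the per-class arrangement and array shapes are assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_blender(holdout_preds, holdout_labels):
    """holdout_preds: (n_models, n_images, 17) probabilities; holdout_labels: (n_images, 17)."""
    blenders = []
    for c in range(holdout_labels.shape[1]):
        X = holdout_preds[:, :, c].T          # features: one column per level-1 model
        blenders.append(LogisticRegression().fit(X, holdout_labels[:, c]))
    return blenders
```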
#28 Stack & blend submissions
Problems
– F2 thresholds! They differ between models and may overfit (I ran into this)
– Overfitted on holdout ;)
Results
– Blending works better for me (logistic regression on top)
– Best submission based on simple weighting by the F2 holdout score
• weight = ((score - min) / (max - min)) ** 0.5
– Models: 10 ensembles (59 models in total)
• jpg: densenet121, 5 folds
• jpg: densenet169, 5 folds
• jpg: resnet18, 8 folds
• mix: mixnet, 5,6,7 folds
• mix: wideresnet, 7 folds
• mix: wideresnet with selu, 5 folds
– Submission of 130 models scored worse
Final result (F2-score): 93.217% (+0.146), private: 93.015
#29 Last day: vain attempts (1/2)
Problem to solve
– There are a lot of wrong labels in the train data
– Label noise
Solution: purge
– Semi-automatic blacklist of ~1% of images & label fixes
Examples were images with seemingly random labels (clear, primary, …)
#29 Last day: vain attempts (2/2)
Results
– Purging significantly improved validation scores but left the HOLDOUT score untouched
– But it’s a trap: overfitting somehow
Submission fuss
– Was very confident in the improvement
– Waited till 2:30 to assemble all model results (3:00 deadline)
– Had an exact plan for 5 submissions, but results got worse and there was no time to think
Lessons
– Not much sense in purging if the test set is just as noisy
– Spare more than 2 days for stacking
– Plenty of room for stacking to improve results according to leaders’ posts
Final result (F2-score): 93.186% (-0.031)
#30 What didn’t work
List
– Hierarchical final layer (cloudy excludes all other labels)
• out_1 = sigmoid(cloud_activation)
• out_2 = out_1 * FC [only rounding works, floating point does not]
– Loss weighting using class distribution (to balance classes)
• oversampling also didn’t work
– YellowFin optimizer – gradient explosion
– Pretrained from Auto-Encoders, Split-Brain
#31 What didn’t work: AAE concept
Concept
– Adversarial Auto-Encoders: almost the same as a VAE, but with a discriminator instead of the KL distance
– Trained resnet18 for the decoder and a transposed resnet18 (transposed convs instead of strided convs) for the encoder
#32 What didn’t work: Split Brain
Concept
– Splits the input into 2 parts; 2 models predict each other's part
#33 Technical
– Ubuntu 16, CUDA 8, cuDNN 5, Anaconda, PyTorch
– TitanX
– Single model training time:
• mixnet: 2-3 hours
• jpg resnet18: 1-1.5 hours
– Ensembles: 12-24 hours
– Code: https://github.com/EdwardTyantov/pytorch-kaggle-amazon-space
#34 Shake-up: worst case
Messed up the sorting while merging submission files
– dropped from top-15 to the bottom
#34 Shake-up: leaders
#35 Other competitors: panoramas
Used CNN predictions averaged over 4 or 8 neighbouring chips as features for the central chip. Link
#36 Other competitors: dehazing
– Single Image Haze Removal Using Dark Channel Prior (paper)
#37 Other competitors: tricks
List
– Different input sizes: 64x64, 224x224, 256x256 (64x64 gives good performance on label «clear»)
– Hard example mining (1/3 with largest loss)
– Averaging TTA using XGBoost (learn mapping)
– XGBoost/Ridge on top of CNNs’ out-of-fold predictions
– cross-entropy + F2-loss (?!)