FINE-TUNING A CONVOLUTIONAL NETWORK
FOR CULTURAL EVENT RECOGNITION
ADVISORS:
Andrea Calafell
Xavier Giró-i-Nieto Amaia Salvador
20/07/2015
AUTHOR:
Matthias Zeppelzauer
OUTLINE
1. Motivation and State of the art
2. Baseline
3. Study of the dataset bias
4. Denoising
5. Fracking
6. Fine-tuning deeper layers only
7. Ensemble of event detectors
8. Conclusions and future work
MOTIVATION: Cultural Heritage
Chinese New Year
MOTIVATION: Cultural Heritage
Carnival Rio
Classic onsite explorers
Onsite social media is big data...
...and online explorers need our help
CHALEARN: Looking at People
TRAINING SET: 5,875
VALIDATION SET: 2,332
TEST SET: 3,569
50 EVENTS
MOTIVATION: Goals
● Improve the results obtained in the ChaLearn Challenge.
● Exploit the noisy data collected from Flickr.
STATE OF THE ART: CaffeNet
Cues used: Content (Visual) | Context (Time stamp, Geolocation, Text)
Zaharieva’15 X X X
Mattivi’11 X X
Bossard’13 X X
Cao’08 X X X
Sutanto’13 X
Schinas’12 X X
Brenner’13 X X
Nguyen’13 X X
MediaEval Social Event Detection
STATE OF THE ART: CaffeNet
CaffeNet
ARCHITECTURE: [Krizhevsky’12]
SOFTWARE: [Jia’14]
DATA: [Deng’09]
STATE OF THE ART: CNN ARCHITECTURE
Convolutional Neural Network architecture
Babenko et al., Neural codes for image retrieval. In Computer Vision - ECCV, 2014
STATE OF THE ART: Object+Scene CNNs
Object-Scene Convolutional Neural Network for event recognition
Wang et al., Object-scene convolutional neural networks for event recognition in images. In CVPRW, 2015
BASELINE: Fine-tuning a ConvNet
[Figure: fine-tuned ConvNet with 50 outputs]
BASELINE: ChaLearn @ CVPRW 2015
Awarded the 2nd prize of the Cultural Event Recognition Challenge in the ChaLearn Workshop at CVPR 2015.
Salvador, A., Giró-i-Nieto, X., Calafell, A., et al., Cultural Event Recognition with Visual ConvNets and Temporal Models. In CVPRW, 2015
Convnets need to be trained with...
a large amount of labeled images
but clean data is expensive...
and downloading noisy data in an unsupervised fashion is easier and cheaper.
NOISY DATA: Flickr Dataset
FLICKR DATASET: 4,068
50 EVENTS
DATASET BIAS
Dataset bias when fine-tuning with the ChaLearn or the Flickr dataset:
DENOISING THE FLICKR DATASET
Mosaic of Queens Day from ChaLearn
Mosaic of Queens Day from Flickr
DENOISING THE FLICKR DATASET
Example event: Annual Buffalo Roundup
Fine-tuned model with ChaLearn
New subset from Flickr
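The denoising step above can be sketched as a confidence filter. This is a minimal sketch, assuming the model fine-tuned on ChaLearn produces a per-event score for each Flickr image and that an image is kept when the score for its crawled event reaches a threshold. The function name, the data layout, and the 0.5 threshold are illustrative assumptions; the slides do not state the selection rule.

```python
from typing import Dict, List, Tuple

# Hypothetical data layout: each noisy image is (image_id, crawled_event, scores),
# where scores maps event name -> confidence from the ChaLearn fine-tuned model.
NoisyImage = Tuple[str, str, Dict[str, float]]


def denoise(images: List[NoisyImage], threshold: float = 0.5) -> List[str]:
    """Keep images whose score for their crawled event reaches the threshold."""
    kept = []
    for image_id, event, scores in images:
        # A missing score is treated as 0.0, so the image is dropped.
        if scores.get(event, 0.0) >= threshold:
            kept.append(image_id)
    return kept


# Toy scores, invented for illustration only.
flickr = [
    ("img_001", "Annual Buffalo Roundup", {"Annual Buffalo Roundup": 0.91}),
    ("img_002", "Annual Buffalo Roundup", {"Annual Buffalo Roundup": 0.12}),
    ("img_003", "Queens Day", {"Queens Day": 0.64, "Carnival Rio": 0.30}),
]
clean_subset = denoise(flickr)
```

Raising the threshold trades subset size for label purity, so in practice it would be tuned on the validation set.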
BASELINE: Dataset ordering during fine-tuning
[Diagram: CaffeNet, then joint fine-tuning]
DENOISING THE FLICKR DATASET
Joint fine-tuning of the clean and noisy datasets:
baseline: 0.6136
BASELINE: Dataset ordering during fine-tuning
[Diagram: CaffeNet, then two successive fine-tuning stages]
DENOISING THE FLICKR DATASET
Sequential fine-tuning of the clean and noisy datasets:
baseline: 0.6136
BASELINE: Dataset ordering during fine-tuning
[Diagram: CaffeNet, then two successive fine-tuning stages]
DENOISING THE FLICKR DATASET
Sequential fine-tuning of the noisy and clean datasets:
baseline: 0.6136
+1.3%
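The three dataset orderings compared above can be written down as training schedules. This is a minimal sketch: the strategy names and helper functions are hypothetical, and each stage is assumed to start from the weights produced by the previous one.

```python
from typing import List, Tuple

CLEAN = "ChaLearn"  # manually annotated training set
NOISY = "Flickr"    # crawled set with noisy labels


def schedule(strategy: str) -> List[Tuple[str, ...]]:
    """Return the fine-tuning stages, each a tuple of datasets used together."""
    if strategy == "joint":
        return [(CLEAN, NOISY)]
    if strategy == "clean_then_noisy":
        return [(CLEAN,), (NOISY,)]
    if strategy == "noisy_then_clean":
        return [(NOISY,), (CLEAN,)]
    raise ValueError(f"unknown strategy: {strategy}")


def describe(strategy: str, start: str = "CaffeNet") -> str:
    """Render the schedule as a chain starting from the pretrained model."""
    stages = [" + ".join(stage) for stage in schedule(strategy)]
    return " -> ".join([start] + stages)


for name in ("joint", "clean_then_noisy", "noisy_then_clean"):
    print(describe(name))
```

The last ordering, noisy first and clean last, is the one the slides report with the +1.3% gain.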
FRACKING: MINING +/- SAMPLES
FRACKING THE TRAINING DATASET
Example event: Pingxi Lantern Festival
Fine-tuned model with ChaLearn
New subset from ChaLearn
hard negatives
hard positives
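The mining of hard samples shown above can be sketched as follows. This is a minimal sketch under a common definition, which the slides do not spell out: hard positives are true images of the event that the fine-tuned model scores lowest, and hard negatives are images of other events that it scores highest. The function name, the sample layout, and the cut-off `k` are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical sample layout: (image_id, is_positive, score), where score is the
# fine-tuned model's confidence that the image shows the target event.
Sample = Tuple[str, bool, float]


def mine_hard_samples(samples: List[Sample], k: int) -> Tuple[List[str], List[str]]:
    """Return (hard_positives, hard_negatives), at most k ids each."""
    positives = [s for s in samples if s[1]]
    negatives = [s for s in samples if not s[1]]
    # Hard positives: true event images the model scores lowest.
    hard_pos = sorted(positives, key=lambda s: s[2])[:k]
    # Hard negatives: images of other events the model scores highest.
    hard_neg = sorted(negatives, key=lambda s: s[2], reverse=True)[:k]
    return [s[0] for s in hard_pos], [s[0] for s in hard_neg]


# Toy scores, invented for illustration only.
samples = [
    ("a", True, 0.95), ("b", True, 0.20), ("c", True, 0.55),
    ("d", False, 0.80), ("e", False, 0.05), ("f", False, 0.40),
]
hard_pos, hard_neg = mine_hard_samples(samples, k=2)
```

The mined ids form the subset used for the extra fine-tuning stage, so the network sees its own mistakes again.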
BASELINE: Dataset ordering during fine-tuning
[Diagram: CaffeNet, then fine-tuning, then fine-tuning with the fracking subset]
FRACKING THE TRAINING DATASET
Results of fine-tuning using fracking on images from ChaLearn:
baseline: 0.61365
+0.9%
FINE-TUNING DEEPER LAYERS ONLY
Layer 2 responds to corners and other edge/color conjunctions.
FINE-TUNING DEEPER LAYERS ONLY
Layer 3 has more complex invariances, capturing similar textures.
Zeiler et al., Visualizing and Understanding Convolutional Networks. In Computer Vision - ECCV, 2014
FINE-TUNING DEEPER LAYERS ONLY
[Figure: network with layers FC6, FC7, FC8 and 50 outputs]
Andrej Karpathy. Convolutional neural networks for visual recognition. In Stanford CS class CS231n.
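Fine-tuning only the deeper layers amounts to freezing the earlier ones. In Caffe this is typically done by setting a layer's learning-rate multiplier (`lr_mult`) to 0. The sketch below builds such a plan for the learnable CaffeNet layers; the helper name is hypothetical, and starting at `fc6` is an assumption, since the slides do not say from which layer the weights were updated.

```python
from typing import Dict

# Learnable layers of CaffeNet, from input to output.
CAFFENET_LAYERS = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc6", "fc7", "fc8"]


def lr_multipliers(first_trainable: str, base: float = 1.0) -> Dict[str, float]:
    """Freeze every layer before `first_trainable`; train the rest."""
    start = CAFFENET_LAYERS.index(first_trainable)  # raises ValueError if unknown
    return {
        name: (base if i >= start else 0.0)
        for i, name in enumerate(CAFFENET_LAYERS)
    }


# Assumed setting: keep conv1 to conv5 fixed and update only the FC layers.
plan = lr_multipliers("fc6")
for name, mult in plan.items():
    print(f"{name}: lr_mult = {mult}")
```

A multiplier of 0 keeps the weights learned on the large pretraining dataset untouched, while the remaining layers adapt to the 50 events.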
FINE-TUNING DEEPER LAYERS ONLY
Results of fine-tuning only the deeper layers:
baseline: 0.61365
+3%
FINE-TUNING DEEPER LAYERS ONLY
Results of fine-tuning only the deeper layers:
baseline: 0.6136
+4%
ENSEMBLE OF EVENT DETECTORS
SINGLE CONVNET FOR THE 50 EVENTS:
ENSEMBLE OF EVENT DETECTORS
ONE CONVNET FOR EACH EVENT:
ENSEMBLE OF EVENT DETECTORS
Results of the ensemble of binary detectors:
baseline: 0.6136
+6.6%
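The move from a single 50-way network to one detector per event can be sketched as a one-vs-rest setup. This is a minimal sketch: the function names are hypothetical, and collecting the per-detector scores into one vector per image is an assumption, since the slides do not describe how the detector outputs were combined.

```python
from typing import Dict, List


def binary_labels(labels: List[str], event: str) -> List[int]:
    """One-vs-rest targets for the detector of a single event."""
    return [1 if label == event else 0 for label in labels]


def ensemble_scores(detector_outputs: Dict[str, List[float]]) -> List[Dict[str, float]]:
    """Turn per-detector score lists into one score vector per image."""
    events = sorted(detector_outputs)
    n_images = len(detector_outputs[events[0]])
    return [
        {event: detector_outputs[event][i] for event in events}
        for i in range(n_images)
    ]


# Training targets for the "Carnival Rio" detector.
labels = ["Carnival Rio", "Queens Day", "Carnival Rio"]
targets = binary_labels(labels, "Carnival Rio")

# Toy detector outputs for three images, invented for illustration only.
outputs = {"Carnival Rio": [0.9, 0.1, 0.7], "Queens Day": [0.2, 0.8, 0.3]}
per_image = ensemble_scores(outputs)
top_event = [max(scores, key=scores.get) for scores in per_image]
```

With 50 events the same code would hold 50 detectors, each fine-tuned separately on its own binary targets.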
CONCLUSIONS
● The Flickr dataset helped us improve the score by swapping the order in which we used the clean and noisy datasets: +1.3%
● The network succeeds in improving its performance by learning from its own mistakes when applying fracking: +0.9%
● The results are better if we keep the weights learned in the earlier layers from a very large dataset: +4%
● Fine-tuning one convnet for each class increases the score: +6.6%
FUTURE WORK
● Combine our solutions with a PLACES-based fine-tuned network (scene CNN), and with other local solutions.
[Diagram: NOW + SCENE CNN (PLACES) + LOCAL]
● Compete (and try to win) in ChaLearn @ ICCV 2015!