HUCVL@MediaEval 2016:
Predicting Interesting Key Frames with Deep Models
Göksu Erdoğan, Aykut Erdem, Erkut Erdem
{goksuerdogan, aykut, erkut}@cs.hacettepe.edu.tr
Experimental Results
• We submitted a total of three runs on the test set.
• Each run corresponds to a different deep model.
2016 Predicting Media Interestingness Task
• The Predicting Media Interestingness Task was introduced as a
new task at MediaEval 2016 [1].
• We focus on the image interestingness subtask.
• Goal: Automatically identify interesting key frames of a given
movie trailer.
References
[1] C.-H. Demarty, M. Sjöberg, B. Ionescu, T.-T. Do, H. Wang, N. Q. K. Duong,
and F. Lefebvre, "MediaEval 2016 predicting media interestingness task", In
Proc. of the MediaEval 2016 Workshop, 2016.
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with
deep convolutional neural networks", Advances in Neural Information Processing
Systems, pages 1097 - 1105, 2012.
[3] A. Khosla, A. S. Raju, A. Torralba, and A. Oliva, "Understanding and predicting
image memorability at a large scale", In Proc. International Conference on
Computer Vision, pages 2390 - 2398, 2015.
[4] X. Wang and A. Gupta, "Unsupervised learning of visual representations using
videos", In Proc. International Conference on Computer Vision, pages 2794 -
2802, 2015.
Conclusion & Future Work
• Imbalanced data makes the training process challenging.
• Future directions:
– Context of a local temporal neighborhood or the whole video.
– Multi-task learning scheme: jointly perform classification
(interestingness label) and regression (interestingness score).
Our Approach
• We propose three different deep models:
– AlexNet
– MemNet
– Triplet Loss
• The problem is formulated as a regression problem.
• Post-processing is applied to convert the predicted scores into labels.
[Figure: a deep network produces an interestingness score for each key frame; plot of interestingness score versus frame, with frames marked as interesting or uninteresting.]
Acknowledgement
This work is partially supported by the Scientific and Technological
Research Council of Turkey (Award #113E497).
Runs    Network model
Run 1   AlexNet
Run 2   MemNet
Run 3   Triplet Loss

Runs    mAP      Accuracy
Run 1   0.2125   0.8224
Run 2   0.2121   0.8275
Run 3   0.2001   0.8249

Run 1         Run 2         Run 3
1890  211     1896  205     1893  208
 205   36      202   42      202   39
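The accuracy column can be related to the confusion matrices with a small calculation. The sketch below assumes that the diagonal entries of each 2x2 matrix are the correctly classified frames; the poster does not label the rows and columns, so that reading is an assumption, and the function name is hypothetical. Under it, the Run 1 and Run 3 matrices reproduce the reported accuracies. The Run 2 entries as transcribed here do not sum to the same total, so they are left out of the check.

```python
def accuracy_from_confusion(matrix):
    """Accuracy of a 2x2 confusion matrix.

    Assumption: the diagonal holds the correctly classified frames.
    """
    (a, b), (c, d) = matrix
    total = a + b + c + d
    return (a + d) / total


# Entries copied from the confusion matrices above.
run1 = [[1890, 211], [205, 36]]
run3 = [[1893, 208], [202, 39]]

acc1 = accuracy_from_confusion(run1)
acc3 = accuracy_from_confusion(run3)

print(f"Run 1 accuracy: {acc1:.4f}")
print(f"Run 3 accuracy: {acc3:.4f}")
```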
AlexNet [2]
• ImageNet dataset
• ILSVRC 2012 task
• Object classification
Change:
• The softmax loss layer is replaced by a Euclidean loss.
• Training lasted approximately 2000 epochs.
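To make the change concrete, here is a minimal sketch of a Euclidean (sum-of-squares) regression loss over a batch of predicted scores. The poster does not state the exact scaling, so the 1/(2N) factor is an assumption borrowed from a common convention, and the function name and sample values are hypothetical.

```python
def euclidean_loss(predictions, targets):
    """Sum of squared differences divided by 2N (assumed scaling)."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    n = len(predictions)
    squared_error = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return squared_error / (2 * n)


# Hypothetical predicted scores and ground-truth interestingness values.
predicted_scores = [0.7, 0.2, 0.9]
target_scores = [1.0, 0.0, 1.0]

loss = euclidean_loss(predicted_scores, target_scores)
print(f"Euclidean loss: {loss:.4f}")
```

Unlike a softmax loss over class labels, this loss penalizes the distance between a single real-valued output and its target, which fits the regression formulation.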
MemNet [3]
• Memorability and interestingness are both intrinsic image
properties.
• No change: MemNet fits our problem directly.
• Training lasted approximately 3000 epochs.
Triplet Loss [4]
• Triplet loss function:
$L(x, x^{+}, x^{-}) = \max\left(0,\; D(x, x^{+}) - D(x, x^{-}) + M\right)$
• Learning a 1D embedding space for images in which
– interesting images become closer,
– uninteresting frames become distant from interesting
images.
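The triplet loss can be sketched for a 1D embedding as follows. The poster does not define the distance D or give the margin M, so the absolute difference and the margin value of 0.5 are assumptions made for illustration, as are the function name and the sample embeddings.

```python
def triplet_loss(anchor, positive, negative, margin=0.5):
    """max(0, D(x, x+) - D(x, x-) + M) for scalar embeddings.

    Assumption: D is the absolute difference between two 1D embeddings.
    """
    d_pos = abs(anchor - positive)
    d_neg = abs(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)


# Interesting anchor close to the positive and far from the negative:
# the margin is already satisfied, so the loss is zero.
satisfied = triplet_loss(anchor=0.9, positive=0.8, negative=0.1)

# Negative closer to the anchor than the positive: the loss is positive.
violated = triplet_loss(anchor=0.9, positive=0.5, negative=0.6)

print(satisfied, violated)
```

The loss is zero only when the negative is at least M farther from the anchor than the positive, which is what pulls interesting images together and pushes uninteresting frames away from them.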
Change:
• The size of the fc8 layer is reduced to one.
• Training lasted approximately 10000 epochs.
Method overview.
Interestingness Classification
• Our CNN models compute real-valued interestingness scores for
each key frame of a given video sequence.
• We need to convert these scores into interestingness labels
(interesting/uninteresting).
Frames          Mean   Std
interesting     0.11   0.08
uninteresting   0.89   0.08
Distributions of the confidence values for interesting/uninteresting
frames over the training data (left) and a video sample (right).
Statistics for the confidence values for interesting and
uninteresting frames over the training data.
Evaluation results on the test set.
Confusion matrices
• A single global threshold for interestingness does not exist over
all the training data.
• The threshold for interestingness is therefore specified at the video level.
• Key frames are sorted by interestingness score, and the top
10% of the frames are classified as interesting.
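The video-level labeling rule can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: the function name and sample scores are hypothetical, the 10% is taken per video, and the number of selected frames is rounded up with at least one frame kept, since the poster does not say how fractional counts are handled.

```python
def label_top_percent(scores, percent=10):
    """Return one boolean per key frame: True for the top `percent` by score.

    Assumptions: the cut is made per video, the count is rounded up,
    and at least one frame is always labeled interesting.
    """
    n = len(scores)
    if n == 0:
        return []
    # Integer ceiling of n * percent / 100, avoiding floating-point rounding.
    k = max(1, (n * percent + 99) // 100)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:k])
    return [i in top for i in range(n)]


# Hypothetical interestingness scores for the key frames of one video.
video_scores = [0.7, 1.0, 0.2, 0.15, 0.1, 0.9, 0.4, 0.3, 0.05, 0.6]
labels = label_top_percent(video_scores)

for score, is_interesting in zip(video_scores, labels):
    print(score, "interesting" if is_interesting else "uninteresting")
```

Because the cut is relative to each video's own ranking, the rule works even when the absolute score ranges differ from one video to another.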
