Predicting Media Interestingness
Deep Learning for Multimedia Processing
Lluc Cardoner
Outline
● Motivation
● Predicting image interestingness
● Results
● Predicting video interestingness
● Results
Motivation
What is interesting?
[Figure: examples labelled "Not interesting" and "Interesting"]
Problem definition
Image / Video → Interesting or Not interesting
MediaEval conclusions 2016
Features
● Image: CNN features
● Video: Multi-modal (visual + audio)
Models
● SVM mostly used
● Few end-to-end deep learning architectures
● Video: time dependencies
Demarty, Claire-Hélène, et al. "Predicting Interestingness of Visual Content." Visual Content Indexing and Retrieval with Psycho-Visual Models (2017).
End-to-end deep learning approach
Dataset 2016: Data
● 52 movie trailers - development
● 26 movie trailers - testing
Total: 13 GB
Dataset 2016: Frames
● 52 movie trailers - development
● 26 movie trailers - testing
[Figure: a movie trailer split into Segment 1, Segment 2, ..., Segment N, with Frame 1, Frame 2, ..., Frame 6 shown below the segments]
Dataset: Ground truth
● Classification: 2 classes
○ 0 - not interesting
○ 1 - interesting
● Confidence values
○ Between 0 and 1
● Rank of the frame or segment in the video
Interesting: 1.0 → 1
Not Interesting: 0.026 → 0
Predicting image interestingness
● ResNet50
○ Transfer learning
○ Fine tuning
He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
Adding layers
Problem: overfitting
Data augmentation
● Image Data Generator
○ Horizontal flip
○ Shuffling
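The two augmentations listed above can be sketched without any deep learning framework. This is a minimal illustration, not the code used for the experiments: images are plain nested lists (rows of pixel values), and the names `horizontal_flip` and `augment_and_shuffle` are hypothetical.

```python
import random

def horizontal_flip(image):
    """Mirror an image left to right; `image` is a list of pixel rows."""
    return [row[::-1] for row in image]

def augment_and_shuffle(images, labels, seed=0):
    """Add a flipped copy of every image, then shuffle images and labels together."""
    samples = list(zip(images, labels))
    samples += [(horizontal_flip(img), lab) for img, lab in zip(images, labels)]
    # Shuffle the (image, label) pairs so each label stays with its image.
    random.Random(seed).shuffle(samples)
    aug_images = [img for img, _ in samples]
    aug_labels = [lab for _, lab in samples]
    return aug_images, aug_labels

image = [[1, 2, 3],
         [4, 5, 6]]
flipped = horizontal_flip(image)
aug_images, aug_labels = augment_and_shuffle([image], [1])
```

If the "Image Data Generator" on the slide refers to the Keras `ImageDataGenerator` (an assumption), the flip is applied at random while batches are generated instead of being stored as extra copies as in this sketch.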
Dropout
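As a reminder of what dropout does during training, here is a small framework-free sketch of the common "inverted dropout" formulation. It is an assumption that this matches the layer used in the experiments; the function name and the list-based activations are illustrative only.

```python
import random

def dropout(activations, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability `rate` and rescale the rest."""
    if not training or rate == 0.0:
        # At test time the activations pass through unchanged.
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - rate
    # Kept units are divided by `keep` so the expected activation is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(42)
out = dropout([1.0] * 8, rate=0.5, rng=rng)
```

With `rate=0.5`, every surviving unit is doubled and the others are set to zero, which discourages the network from relying on any single unit.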
Unbalanced classes
● Class weights
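The slide does not say how the class weights were chosen. One common heuristic, shown here as an assumption, weights each class inversely to its frequency so that the rare "interesting" class contributes as much to the loss as the frequent one. The label counts below are made up for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical label distribution: 90 "not interesting" (0), 10 "interesting" (1).
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
```

Here the minority class receives a weight nine times larger than the majority class, mirroring the 90/10 imbalance in the hypothetical labels.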
Train last layers
Evaluation metric
● Mean Average Precision (MAP)
For both subtasks
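For reference, the textbook definition of MAP can be sketched as follows: rank the items of each video by predicted score, compute the average precision over the relevant (interesting) items, and average across videos. The official evaluation tool may differ in details, and the scores below are invented for the example.

```python
def average_precision(ranked_labels):
    """AP for one ranked list; `ranked_labels` holds 1 (relevant) or 0, best-ranked first."""
    hits = 0
    precision_sum = 0.0
    for position, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / position
    return precision_sum / hits if hits else 0.0

def mean_average_precision(videos):
    """Mean of per-video AP; each video is a list of (score, true_label) pairs."""
    aps = []
    for items in videos:
        ranked = sorted(items, key=lambda pair: pair[0], reverse=True)
        aps.append(average_precision([label for _, label in ranked]))
    return sum(aps) / len(aps)

videos = [
    [(0.9, 1), (0.8, 0), (0.3, 1)],  # AP = (1/1 + 2/3) / 2
    [(0.7, 0), (0.6, 1)],            # AP = (1/2) / 1
]
score = mean_average_precision(videos)
```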
Results: Image interestingness
Threshold: 0.5
Id    MAP       Architecture
25    0.1392    Train new layers and 2 last layers from ResNet
27    0.1728    Augment just class 1 and balanced
30    0.1478    Dropout of 0.5
31    0.1177    Class weights + dropout + horizontal flip
37    0.1564    Class weights + dropout + flip, shift, zoom
39    0.1402    Class weights + dropout + flip, shift, zoom + 2 ResNet layers
2016          MAP
Baseline      0.1655
Top result    0.2336
Threshold
Results: Image interestingness
      Static threshold    Dynamic threshold
Id    MAP                 Threshold    MAP
25    0.1392              0.1577       0.1932
27    0.1728              0.4875       0.1909
30    0.1478              0.1572       0.2243
31    0.1177              0.5066       0.2396
37    0.1564              0.5295       0.2362
39    0.1402              0.1336       0.1795
2016          MAP
Baseline      0.1655
Top result    0.2336
Dataset 2016: Segments
● 52 movie trailers - development
● 26 movie trailers - testing
[Figure: a movie trailer split into Segment 1, Segment 2, ..., Segment N]
Predicting video interestingness
● Extract features: C3D
● Training LSTM network
3D Convolutional network
Montes, Alberto, Amaia Salvador, and Xavier Giro-i-Nieto. "Temporal activity detection in untrimmed videos with recurrent neural networks."
NIPS Workshop Large Scale Computer Vision Systems 2016
Extract features
● Preprocess
○ Clips
● Feature extraction
○ 3D convolutional network
● Label mapping
○ Feature vector
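The preprocessing step of cutting a video into fixed-length clips can be sketched as below. The 16-frame clip length is the one shown on the label-mapping slide; treating clips as non-overlapping and dropping leftover frames are assumptions, and `split_into_clips` is a hypothetical helper name.

```python
def split_into_clips(num_frames, clip_length=16):
    """Return (start, end) frame index pairs for consecutive non-overlapping clips.

    `end` is exclusive; trailing frames that do not fill a whole clip are dropped.
    """
    return [(start, start + clip_length)
            for start in range(0, num_frames - clip_length + 1, clip_length)]

# A hypothetical 50-frame video yields three full clips; frames 48 and 49 are dropped.
clips = split_into_clips(num_frames=50)
```

Each clip would then be passed through the 3D convolutional network to obtain one feature vector per clip.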
Label mapping
S1 {class 0, confidence 0.50}, S2 {class 1, confidence 0.60}
Clip (16 frames): 80% in S1, 20% in S2
Clip confidence: 0.8 x 0.5 + 0.2 x 0.6 = 0.52
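The weighting on this slide can be written as a short function: a clip that straddles two segments receives the segment confidences averaged by the fraction of the clip lying in each. This is a sketch under stated assumptions; the helper names and the frame ranges in the second example are hypothetical.

```python
def overlap_fraction(clip, segment):
    """Fraction of the clip's frames that fall inside the segment (ends exclusive)."""
    clip_start, clip_end = clip
    seg_start, seg_end = segment
    overlap = max(0, min(clip_end, seg_end) - max(clip_start, seg_start))
    return overlap / (clip_end - clip_start)

def clip_confidence(fractions_and_confidences):
    """Overlap-weighted average of the segment confidence values."""
    return sum(fraction * conf for fraction, conf in fractions_and_confidences)

# The slide's example: 80% of the clip lies in S1 (0.50), 20% in S2 (0.60).
value = clip_confidence([(0.8, 0.50), (0.2, 0.60)])

# Hypothetical frame ranges: a 16-frame clip with 12 frames inside one segment.
fraction = overlap_fraction(clip=(88, 104), segment=(0, 100))
```

The first call reproduces the slide's value of 0.52 up to floating-point rounding; the second shows how the overlap fractions themselves could be derived from frame indices.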
Fine-tuning LSTM
Results: Video interestingness
2016           MAP
Baseline       0.1496
Top result     0.1815
Technicolor    0.1365
Id    MAP
65    0.1541
[Figure: plot with axes "Clips" and "Interestingness value"]
Shen, Yuesong, Claire-Hélène Demarty, and Ngoc QK Duong. "Technicolor@ MediaEval 2016 Predicting Media Interestingness Task." MediaEval (2016).
Conclusions
Predicting image interestingness              MAP
Class weights + dropout + horizontal flip     0.2396
Class weights + dropout + flip, shift, zoom   0.2362
Conclusions
Static threshold MAP    Dynamic threshold MAP
0.1392                  0.1932
0.1728                  0.1909
0.1478                  0.2243
0.1177                  0.2396
0.1564                  0.2362
0.1402                  0.1795
Conclusions
Image
Baseline: 0.1655
Top result 2016: 0.2336
Our result: 0.2396
Video
Baseline: 0.1496
Top result 2016: 0.1815
Our result: 0.1541
Technicolor: 0.1365
https://github.com/lluccardoner/MediaInterestingness