2016 MediaEval - Interestingness Task Overview

Predicting Media Interestingness Task
Overview
Claire-Hélène Demarty – Technicolor
Mats Sjöberg – University of Helsinki
Bogdan Ionescu – University Politehnica of Bucharest
Thanh-Toan Do – Singapore University of Science
Hanli Wang – Tongji University
Ngoc Q.K. Duong – Technicolor
Frédéric Lefebvre – Technicolor
MediaEval 2016 Workshop
October 20-21, 2016

Interestingness?
Are these interesting images?
Definition?
Subjective
Semantic
Perceptual

n Derives from a use case at Technicolor
n Helping professionals to illustrate a Video on Demand (VOD) web site by
selecting some interesting frames and/or video excerpts for the posted
movies.
n The frames and excerpts should be suitable in terms of helping a user to
make his/her decision about whether he/she is interested in watching the
underlying movie.
n Two subtasks -> Image and Video
n Image subtask: given a set of keyframes extracted from a movie, …
n Video subtask: given the video shots of a movie, …
… automatically identify those images/shots that viewers report to be the most
interesting in the given movie.
n Binary classification task on a per movie basis…
… but confidence values are also required.
5
Task definition
12/7/16

n From Hollywood-like movie trailers
n Manual segmentation of shots
n Extraction of middle key-frame of each shot
6
Dataset & additional features
12/7/16
Development Set Test Set
Trailer # 52 26
Total % interesting Total % interesting
Shot # 5054 8.3 2342 9.6
Key-frame # 5054 9.4 2342 10.3
n Precomputed content descriptors:
n Low-level: denseSift, HoG, LBP, GIST, HSV color histograms, MFCC, fc7 and
prob layers from AlexNet
n Mid-level: face detection and tracking-by-detection

Manual annotations
Pair comparison protocol (pairs) -> Aggregation into rankings (rankings) -> Binary decision (manual thresholding)
Annotators:
- >310 persons for video
- >100 persons for image
- From 29 countries
Thank you Mats! Thanks to all of you!
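The annotation steps on this slide (pair comparisons, aggregation into rankings, manual thresholding into binary labels) can be illustrated with a minimal sketch. The slides do not say which aggregation model was used, so the win-rate scoring below is an assumption chosen for simplicity; the function names, the (winner, loser) pair format, the shot identifiers and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict


def rank_from_pairs(pairs):
    """Aggregate pairwise judgments into a ranking by win rate.

    pairs: iterable of (winner, loser) identifiers (hypothetical format).
    Returns a list of (item, win_rate), most interesting first.
    Win rate is an assumed stand-in for the aggregation step.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in pairs:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    scores = {item: wins[item] / appearances[item] for item in appearances}
    # Sort by descending score; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def binarize(ranking, threshold):
    """Mark items scoring at or above a manually chosen threshold as interesting (1)."""
    return {item: int(score >= threshold) for item, score in ranking}


# Toy data, not taken from the task annotations.
pairs = [
    ("shot_a", "shot_b"),
    ("shot_a", "shot_c"),
    ("shot_b", "shot_c"),
    ("shot_a", "shot_b"),
]
ranking = rank_from_pairs(pairs)
labels = binarize(ranking, threshold=0.5)
```

With this toy input, shot_a wins all of its comparisons and is the only item labeled interesting; in practice the threshold would be set by inspecting the ranking, as the slide indicates.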

Required runs
Image subtask: Visual information only, no external data
Video subtask: Audio and visual information, no external data
External data IS:
- Additional datasets and annotations dedicated to the interestingness prediction
- Pre-trained models, features, detectors obtained from such dedicated datasets
- Additional metadata that could be found on the internet on the provided content
External data IS NOT:
- CNN features generated on generic datasets not dedicated to interestingness prediction

n Official measure:
Ø Mean Average Precision (over all trailers)
n Additional metrics are computed:
n False alarm rate, miss detection rate, precision, recall, F-measure, etc.
9
Evaluation metrics
12/7/16
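As a rough illustration of the metrics on this slide, the sketch below computes average precision per trailer from binary ground-truth labels and system confidence values, then averages over trailers. It is not the official evaluation script; the data layout, function names and toy values are assumptions, and the false alarm and miss detection rates use the common definitions FP/(FP+TN) and FN/(FN+TP), which the slides do not spell out.

```python
def average_precision(labels, confidences):
    """AP for one trailer: labels are 0/1 ground truth, confidences are system scores."""
    # Rank items by decreasing confidence (stable for ties).
    order = sorted(range(len(labels)), key=lambda i: -confidences[i])
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant item
    return precision_sum / hits if hits else 0.0


def mean_average_precision(per_trailer):
    """per_trailer maps a trailer id to (labels, confidences); returns the mean AP."""
    aps = [average_precision(l, c) for l, c in per_trailer.values()]
    return sum(aps) / len(aps)


def binary_metrics(labels, decisions):
    """Additional metrics computed from binary decisions (assumed definitions)."""
    tp = sum(1 for y, d in zip(labels, decisions) if y == 1 and d == 1)
    fp = sum(1 for y, d in zip(labels, decisions) if y == 0 and d == 1)
    fn = sum(1 for y, d in zip(labels, decisions) if y == 1 and d == 0)
    tn = sum(1 for y, d in zip(labels, decisions) if y == 0 and d == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "false_alarm_rate": fp / (fp + tn) if fp + tn else 0.0,
        "miss_rate": fn / (fn + tp) if fn + tp else 0.0,
    }


# Toy data, not taken from any submitted run.
runs = {
    "trailer_1": ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),
    "trailer_2": ([0, 1], [0.6, 0.4]),
}
map_score = mean_average_precision(runs)
metrics = binary_metrics([1, 0, 1, 0], [1, 1, 0, 0])
```

Averaging AP per trailer, rather than pooling all shots, gives every trailer the same weight regardless of how many shots it contains, which matches the "over all trailers" wording on the slide.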

Task participation
[Bar chart "Task Participation": number of teams for Registrations, Returned agreements, Submitting teams, Workshop]
- Registrations:
  - 31 teams
  - 16 countries
  - 3 ‘experienced’ teams
- Submissions: 12 teams
  - 9 teams submitted on both subtasks
  - 2 teams on image subtask
  - 1 team on video subtask

Official results – Image subtask – 27 runs

Runs                                                MAP     Official ranking
me16in_tudmmc2_image_histface                       0.2336  TUDMMC
me16in_technicolor_image_run1_SVM_rbf*              0.2336  Technicolor
me16in_technicolor_image_run2_DNNresampling06_100*  0.2315  Technicolor
me16in_MLPBOON_image_run5                           0.2296  MLPBOON
me16in_BigVid_image_run5FusionCNN                   0.2294  BigVid
me16in_tudmmc2_image_hist                           0.2202  TUDMMC
me16in_HUCVL_image_run1                             0.2125  HUCVL
me16in_UITNII_image_FA                              0.2115  UITNII
me16in_RUC_image_run2                               0.2035  RUC
me16in_ethcvl1_image_run2                           0.1952  ETHCVL
me16in_HKBU_image_baseline                          0.1868  HKBU
me16in_HKBU_image_drbaseline                        0.1839  HKBU
me16in_BigVid_image_run4SVM                         0.1789  BigVid
me16in_UITNII_image_V1                              0.1773  UITNII
me16in_lapi_image_runf1*                            0.1714  LAPI
me16in_UNIGECISA_image_ReglineLoF                   0.1704  UNIGECISA
BASELINE (on testset)                               0.1655
me16in_lapi_image_runf2*                            0.1398  LAPI

* organizers

Official results – Video subtask – 28 runs

Runs                                                    MAP     Official ranking
me16in_recod_video_run1                                 0.1815  RECOD
me16in_recod_video_run1_old                             0.1753  RECOD
me16in_HKBU_video_drbaseline                            0.1735  HKBU
me16in_UNIGECISA_video_RegsrrLoF                        0.171   UNIGECISA
me16in_RUC_video_run2                                   0.1704  RUC
me16in_UITNII_video_A1                                  0.169   UITNII
me16in_RUC_video_run1                                   0.1647  RUC
me16in_UITNII_video_F1                                  0.1641  UITNII
me16in_lapi_video_runf5                                 0.1629  LAPI
me16in_technicolor_video_run5_CSP_multimodal_80_epoch7  0.1618  Technicolor
me16in_ethcvl1_video_run2                               0.1574  ETHCVL
me16in_tudmmc2_video_histface                           0.1558  TUDMMC
me16in_tudmmc2_video_hist                               0.1557  TUDMMC
me16in_BigVid_video_run3RankSVM                         0.154   BigVid
me16in_HKBU_video_baseline                              0.1521  HKBU
me16in_BigVid_video_run2FusionCNN                       0.1511  BigVid
me16in_UNIGECISA_video_RegsrrGiFe                       0.1497  UNIGECISA
BASELINE (on testset)                                   0.1496
me16in_BigVid_video_run1SVM                             0.1482  BigVid
me16in_technicolor_video_run3_LSTM_U19_100_epoch5       0.1465  Technicolor
me16in_UNIGECISA_video_SVRloAudio                       0.1367  UNIGECISA
me16in_technicolor_video_run4_CSP_video_80_epoch9       0.1365  Technicolor
me16in_ethcvl1_video_run1                               0.1362  ETHCVL

* organizers

n On the task itself?
n Image interestingness is NOT video interestingness
n Issue with the video dataset (needs more interations? needs more data
samples?)
n Overall low map values: room for improvment!
n On the participants systems?
n This year’s trend? No trend!
n Classic machine learning, deep learning systems… but also rule-based systems
n Some multimodal (audio, video, text), some temporal… and some not.
n (Mostly) No use of external data
n Simple systems did as well (better…) than sophisticated systems
n Dataset unbalance: an issue?
n Dataset size: penalizing deep learning systems?
13
What we have learned
12/7/16
