Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a Video Memorable?

MediaEval2020
Predicting Media Memorability
Task Overview
Alba García Seco de Herrera, Rukiye Savran Kiziltepe, Jon Chamberlain, Mihai Gabriel Constantin,
Claire-Hélène Demarty, Faiyaz Doctor, Bogdan Ionescu, Alan Smeaton
Presentation Video

Task Description
Goal: predicting how memorable a video is to viewers
15/12/2020 MediaEval2020 2
• Automatically predicting short-term and long-term memorability
• TRECVid 2019 Video to Text dataset [1]
• Videos include sound and more action
[1] Awad, G., Butt, A.A., Lee, Y., Fiscus, J., Godil, A., Delgado, A., Smeaton, A.F. and Graham, Y. TRECVID 2019: An evaluation campaign to benchmark video activity detection, video captioning and matching, and video search & retrieval. 2019.

Annotation Tool
• Short-term memorability: measured a few minutes after memorization
• Long-term memorability: measured 24 to 72 hours later
Romain Cohendet, Claire-Hélène Demarty, Ngoc Duong, and Martin Engilberge. VideoMem: Constructing, Analyzing, Predicting Short-term and Long-term Video Memorability. Proceedings of the IEEE
International Conference on Computer Vision. 2019.
Video Memorability Game

Annotation Protocol
Step 1 (180 videos)
• 40 targets, repeated after a few minutes
• 60 fillers, non-target videos
• 20 vigilance fillers, repeated quickly to monitor attention
Step 2 (120 videos)
• 40 targets, randomly chosen from the non-vigilance fillers of Step 1
• 80 fillers – randomly chosen new videos
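
The totals in parentheses count viewings rather than unique videos. A minimal sketch of that arithmetic, assuming each Step 1 target and vigilance filler is shown twice and every other video once (the dictionary layout and the `total_viewings` helper are illustrative names, not part of the annotation tool):

```python
# Composition of each annotation step as listed on the slide.
# "shown" is an assumption: how many times each video of that kind appears.
STEP_1 = {
    "targets": {"unique": 40, "shown": 2},            # repeated after a few minutes
    "fillers": {"unique": 60, "shown": 1},            # non-target videos
    "vigilance_fillers": {"unique": 20, "shown": 2},  # repeated quickly
}
STEP_2 = {
    "targets": {"unique": 40, "shown": 1},  # drawn from Step 1 non-vigilance fillers
    "fillers": {"unique": 80, "shown": 1},  # new videos
}


def total_viewings(step):
    """Number of videos a participant watches in one step, repeats included."""
    return sum(kind["unique"] * kind["shown"] for kind in step.values())


print(total_viewings(STEP_1), total_viewings(STEP_2))
```

Under these assumptions the totals match the 180 and 120 videos stated for Step 1 and Step 2.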

Dataset Description
• TRECVid 2019 (Video to Text)
• 1500 videos
• Training set: 1000 videos
• Test set: 500 videos

Dataset Description
• AlexNetFC7
• HOG
• HSVHist
• RGBHist
• LBP
• VGGFC7
• C3D
• Text descriptions
• Annotations
• Response time
• Key press
• Video position
• Short-term memorability score
• Long-term memorability score

Examples (Low Short-term and Long-term Memorability)
• At football game, the ball is kicked past end zone and woman is knocked down from her knees
• football player are playing at a football field.
• At a college football game, during a kickoff, the kicker kicks the ball over the endzone and hits a spectator in the face while they are trying to catch it.
• a person is injured when the football player kicked a ball across a field during a game
• Football kicks football during a day game and a cheerleader tries to catch it and ball hits her in the head.

Examples (High Short-term and Long-term Memorability)
• Two boys wearing white shirts on playground swings
• Two young men, are on a swing and yell, outdoors.

Results (Mean Spearman's Rank Correlation Scores)
• 14 teams registered
• 9 teams submitted 28 runs
• 8 papers
• Official metric: Spearman’s rank correlation
            Short-term                 Long-term
            Spearman Pearson  MSE      Spearman Pearson  MSE
Mean        0.058    0.066    0.013    0.036    0.043    0.051
Variance    0.002    0.002    0.000    0.002    0.001    0.000
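
The three metrics reported here can be sketched in plain Python. This is an illustrative implementation, not the task's official scoring script; the function names and the example scores are assumptions made for the sketch. Spearman's rank correlation is computed as the Pearson correlation of the ranks, with tied values sharing the mean of their ranks.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mse(x, y):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


# Hypothetical ground-truth and predicted memorability scores for five videos.
truth = [0.90, 0.75, 0.80, 0.60, 0.95]
predicted = [0.85, 0.80, 0.70, 0.65, 0.90]
print(spearman(truth, predicted), pearson(truth, predicted), mse(truth, predicted))
```

Spearman's coefficient depends only on the ordering of the predictions, so a run can rank videos well and still have a poor MSE, or the reverse.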

Results (Official Results on Test-set for Teams’ all runs)
                                Short-term                 Long-term
Team         Run                Spearman Pearson  MSE      Spearman Pearson  MSE
CUC_DMT      run1-required      0.06     0.055    0.01     0.049    0.05     0.05
DCU-Audio    run1-required      0.054    0.044    0.01     0.113    0.121    0.05
             run2-required      0.05     0.072    0.01     0.059    0.071    0.05
             run3-required      -        -        -        0.109    0.119    0.05
             run4-required      0.076    0.092    0.01     0.041    0.058    0.05
             memento10k         0.137    0.13     0.01     -        -        -
DCU@ML-Labs  run1-required      0.034    0.078    0.1      -0.01    0.022    0.09
Essex-NLIP   HSV-Run1           0.042    0.042    0.01     0.032    0.016    0.05
             RGB-Run2           -0.003   -0.026   0.01     0.043    0.042    0.04
             RGB-Run3           -0.015   -0.012   0.01     0.032    0.037    0.04
             RGB-HSV-Run4       -0.022   -0.001   0.01     -0.017   -0.012   0.04
             Score-Run5         0.02     0.054    0.01     -0.054   -0.036   0.05
GTH-UPM      run1-required      0.016    0.011    0.01     -0.041   -0.028   0.05
KT-UPB       run0-required      0.007    0.029    0.01     0.028    0.033    0.05
             run1-required      -0.01    -0.019   0.01     0.012    0.021    0.05
             run2-required      0.053    0.085    0.01     0.037    0.033    0.05
             run3-required      0.05     0.053    0.01     0.014    0.017    0.05
MeMAD        run1-audiovisual   0.099    0.09     0.01     0.077    0.085    0.06
             run2-vilbert       0.098    0.085    0.01     -0.017   0.011    0.06
             run3-text          0.073    0.091    0.01     0.019    0.049    0.06
             run4-all-SLT       0.101    0.09     0.01     0.078    0.085    0.06
             run5-all-required  0.101    0.09     0.01     0.067    0.066    0.05
MG-UCB       run1-required      0.136    0.145    0.01     0.012    0.012    0.05
             run7               0.102    0.127    0.01     0.056    0.059    0.04
             run8               0.091    0.095    0.01     0.077    0.068    0.05
             run9               0.085    0.124    0.01     0.044    0.048    0.05
             run42              0.116    0.144    0.01     0.076    0.069    0.05
MMSys        run                0.007    0.01     0.01     0.048    0.032    0.05
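
The summary statistics on the previous slide (short-term Spearman: mean 0.058, variance 0.002) can be reproduced from the short-term Spearman column of this table. The sketch below assumes the statistics are taken over the 27 runs that report a short-term score and that the variance is the sample variance; the slides do not say which variance was used, and both variants round to the same value here.

```python
import statistics

# Short-term Spearman scores copied from the table, in row order.
# The one run without a short-term score ("-") is left out.
short_term_spearman = [
    0.06,                                   # CUC_DMT
    0.054, 0.05, 0.076, 0.137,              # DCU-Audio
    0.034,                                  # DCU@ML-Labs
    0.042, -0.003, -0.015, -0.022, 0.02,    # Essex-NLIP
    0.016,                                  # GTH-UPM
    0.007, -0.01, 0.053, 0.05,              # KT-UPB
    0.099, 0.098, 0.073, 0.101, 0.101,      # MeMAD
    0.136, 0.102, 0.091, 0.085, 0.116,      # MG-UCB
    0.007,                                  # MMSys
]

mean_score = statistics.mean(short_term_spearman)
sample_variance = statistics.variance(short_term_spearman)
print(round(mean_score, 3), round(sample_variance, 3))
```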

Results (Official Results on Test-set for Teams’ best runs – Short-term)
Team         Run                                 Short-term  Approach
DCU-Audio    memento10k                          0.137       Audio Gestalt => Multimodal Deep Learning-based Late Fusion (Memento10K)
MG-UCB       run1-required                       0.136       Visual, Audio, Textual => SVR, BR, GRU => Weighted Late Fusion
MeMAD        run4-all-SLT and run5-all-required  0.101       Visual, Audio, Textual, Visiolinguistic Features => Weighted Average
CUC_DMT      run1-required                       0.06        Multi-level Encoding and Captions => Gradient Boosting, Random Forest, Neural Network
KT-UPB       run2-required                       0.053       C3D => Random Forest
Essex-NLIP   HSV-Run1                            0.042       HSV => Random Forest
DCU@ML-Labs  run1-required                       0.034       C3D => SemNET (Memento10K)
GTH-UPM      run1-required                       0.016       Multimodal Late Fusion of Self-Attention => SVR => Bidirectional LSTM
MMSys        run                                 0.007       -
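
Several of the approaches listed combine per-modality predictions with a weighted late fusion. As a rough illustration of that idea only, and not any team's actual implementation, the sketch below averages hypothetical visual, audio and text predictions with hand-picked weights; the function name, the scores and the weights are all assumptions.

```python
def weighted_late_fusion(predictions, weights):
    """Fuse per-modality scores for each video with a weighted average."""
    total_weight = sum(weights[m] for m in predictions)
    n_videos = len(next(iter(predictions.values())))
    fused = []
    for i in range(n_videos):
        weighted_sum = sum(weights[m] * predictions[m][i] for m in predictions)
        fused.append(weighted_sum / total_weight)
    return fused


# Hypothetical memorability predictions for three videos from three models.
predictions = {
    "visual": [0.80, 0.60, 0.90],
    "audio": [0.70, 0.65, 0.85],
    "text": [0.90, 0.55, 0.80],
}
# Hypothetical weights; in practice they would be tuned on held-out data.
weights = {"visual": 0.5, "audio": 0.25, "text": 0.25}

fused = weighted_late_fusion(predictions, weights)
print(fused)
```

Late fusion keeps each modality's model independent, so a weak modality can be down-weighted without retraining the others.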

Results (Official Results on Test-set for Teams’ best runs – Long-term)
Team         Run            Long-term  Approach
DCU-Audio    run1-required  0.113      Audio Gestalt => Multimodal Deep Learning-based Late Fusion (Memento10K)
MeMAD        run4-all-SLT   0.078      Visual, Audio, Textual, Visiolinguistic Features => Weighted Average
MG-UCB       run8           0.077      Visual, Audio, Textual => SVR, BR, GRU => Weighted Late Fusion
CUC_DMT      run1-required  0.049      Multi-level Encoding and Captions => Gradient Boosting, Random Forest, Neural Network
MMSys        run            0.048      -
Essex-NLIP   RGB-Run2       0.043      RGB => Random Forest
KT-UPB       run2-required  0.037      C3D => Random Forest
DCU@ML-Labs  run1-required  -0.01      C3D => SemNET (Memento10K)
GTH-UPM      run1-required  -0.041     Multimodal Late Fusion of Self-Attention => SVR => Bidirectional LSTM

Conclusion
• Short-term memorability: better results (mean Spearman 0.058)
• Long-term memorability: results slightly lower (mean Spearman 0.036)
• The best results (short-term; long-term Spearman):
• DCU-Audio (0.137; 0.113)
• MG-UCB (0.136; 0.077)
• MeMAD (0.101; 0.078)
• Audio and captions
• Fusion
• Deep learning techniques
• More annotations
