SED2012 Dataset

The 2012 Social Event Detection Dataset
Symeon Papadopoulos (1), Emmanouil Schinas (1), Vasileios Mezaris (1),
Raphaël Troncy (2), Yiannis Kompatsiaris (1)

(1) CERTH-ITI, Thessaloniki, Greece
(2) EURECOM, Sophia Antipolis, France

Oslo, 28 Feb - 1 Mar 2013

SED2012 Overview
• Large collection (>160K) of CC-licensed Flickr
photos and some of their metadata
• Event annotations for 149 target events (of
specific categories and locations of interest)

• Primary use: Social event detection
– Used in the context of MediaEval 2012 (SED task)
• Secondary uses: image geotagging,
distractors in CBIR, city summarization

Dataset Overview
Flickr photo collection
• 167,332 photos
• 4,422 unique contributors
• Creative Commons licenses

Event Annotations
• Challenge 1: Technical events in Germany
• Challenge 2: Soccer events in Hamburg and Madrid
• Challenge 3: Indignados movement events in Madrid


Data Collection Process
• Flickr API: http://www.flickr.com/services/api/
• Used method flickr.photos.search with five
geographical centres:
Barcelona, Cologne, Hamburg, Hannover, Madrid
• Time period: Jan 2009 – Dec 2011
• All photos CC licensed
• 403 photos from the
EventMedia collection
R. Troncy, B. Malocha, and A. Fialho. Linking Events with Media. In 6th Intern.
Conference on Semantic Systems (I-SEMANTICS), Graz, Austria, 2010
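
The collection step above can be sketched as a set of query parameters for flickr.photos.search. This is a minimal sketch, not the authors' crawler: the city coordinates, the 30 km radius, the licence ids, the use of the date-taken filter, and the YOUR_API_KEY placeholder are all assumptions for illustration. The code only builds request URLs and never contacts the API.

```python
from urllib.parse import urlencode

# Flickr REST endpoint.
FLICKR_REST = "https://api.flickr.com/services/rest/"

# Approximate city-centre coordinates (assumed; the slides do not
# give the exact centres that were used).
CITY_CENTRES = {
    "Barcelona": (41.39, 2.17),
    "Cologne": (50.94, 6.96),
    "Hamburg": (53.55, 9.99),
    "Hannover": (52.37, 9.73),
    "Madrid": (40.42, -3.70),
}


def build_search_params(city, api_key="YOUR_API_KEY", page=1):
    """Return flickr.photos.search parameters for one city centre."""
    lat, lon = CITY_CENTRES[city]
    return {
        "method": "flickr.photos.search",
        "api_key": api_key,            # placeholder, not a real key
        "lat": lat,
        "lon": lon,
        "radius": 30,                  # km; assumed value
        "min_taken_date": "2009-01-01",
        "max_taken_date": "2011-12-31",
        "license": "1,2,3,4,5,6",      # assumed to be the CC licence ids
        "has_geo": 1,                  # geotagged photos only
        "extras": "geo,tags,date_taken,owner_name,license",
        "per_page": 250,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,
    }


def build_search_url(city, **kwargs):
    """Return the full request URL for one city (no request is sent)."""
    return FLICKR_REST + "?" + urlencode(build_search_params(city, **kwargs))


if __name__ == "__main__":
    for name in CITY_CENTRES:
        print(build_search_url(name))
```

A real crawl would additionally page through the results and split the time period into smaller windows, since a single query returns only a bounded number of photos.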


Photo Distribution
[Charts: place distribution, yearly distribution, language distribution]


Dataset Collection Motivation
Selection of five cities (three German, two Spanish):
• Include large number of non-English text metadata (cf.
language distribution table)
• Ensure existence of numerous events for the target types
• Include distractor images:
– Challenge 2: Cologne, Hannover distractor for Hamburg, Barcelona
distractor for Madrid
– Challenge 3: Barcelona distractor for Madrid
Selection of only geotagged photos:
• Ease of annotation
Selection of only CC-licensed photos:
• Reuse of collection for research


Tag Statistics (1/2)
[Chart: number of users using the tag]
• 51,611 unique tags
• Prevalence of location-specific tags
• Event-specific tags


Tag Statistics (2/2)
[Chart labels: barcelona, spain, madrid]
• >20K photos have no tags
• 83.9%: 10 tags or fewer
• >57% of tags appear once or twice
• >40K tags appear fewer than 10 times
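
Figures like the ones on this slide can be derived from per-photo tag lists. The following is a minimal sketch on toy data, not the authors' analysis code: the function name, the input layout (one tag list per photo), and the reading of the "10 tags or fewer" figure as a share of photos are assumptions.

```python
from collections import Counter


def tag_statistics(photo_tags):
    """Summarise tag usage; photo_tags holds one list of tags per photo."""
    # Count each tag at most once per photo.
    tag_counts = Counter(tag for tags in photo_tags for tag in set(tags))
    n_photos = len(photo_tags)
    n_tags = len(tag_counts)
    return {
        "unique_tags": n_tags,
        "photos_without_tags": sum(1 for tags in photo_tags if not tags),
        "share_photos_with_at_most_10_tags": (
            sum(1 for tags in photo_tags if len(set(tags)) <= 10) / n_photos
            if n_photos else 0.0
        ),
        "share_tags_used_once_or_twice": (
            sum(1 for c in tag_counts.values() if c <= 2) / n_tags
            if n_tags else 0.0
        ),
        "tags_used_fewer_than_10_times": sum(
            1 for c in tag_counts.values() if c < 10
        ),
    }


if __name__ == "__main__":
    # Toy input, not taken from the dataset.
    toy = [["madrid", "spain"], [], ["madrid"], ["barcelona", "spain", "gaudi"]]
    print(tag_statistics(toy))
```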


User Statistics

60% of users contribute fewer than 10 photos

30 most active users contribute ~30% of dataset

Ground Truth Creation
• Manual annotation using CrEve
– web-based annotation
– two-round annotation by five annotators (three in the
first, two in the second)
– interactive annotation (search & annotate)
– each round terminated as soon as no new event-related
photos discovered
– approximate effort: 100 person-hours
C. Zigkolis, S. Papadopoulos, G. Filippou, Y. Kompatsiaris, A. Vakali. Collaborative Event
Annotation in Tagged Photo Collections. Multimedia Tools & Applications, 2012

• Annotations for Challenge 1 enriched by EventMedia
(403 photos featuring technical events in Germany)

Ground Truth Statistics (1/3)

• 10 events associated with >100 photos
• ~27% of events associated with 1 or 2 photos


• 106 events are captured by single users
• 9 events captured by more than 10 people
• The majority of events last for less than a day (typical for soccer)
• Erroneous timestamps in photos

Madrid events

• Santiago Bernabeu stadium
• Puerta del Sol
• Stadium of Butarque
• Vicente Calderon stadium

Technical Event Examples
• PHP Unconf. 2010
• Gamescom 2009
• CeBIT 2010
• Convention Camp 2011


Soccer Event Examples
• Real Madrid – Milan (2010)
• World Cup 2010
• St. Pauli – HSV (2010)
• Spain – Colombia (2011)


Indignados Event Examples
• Inaugural march, 15 May
• Large gathering, 20 May
• Gathering, 15 Oct
• Demonstration, 17 Nov


Evaluation
• F-measure (macro), Precision, Recall
– measure the goodness of the retrieved photos, but not how well
they were clustered into events
• Normalized Mutual Information (NMI)
– compares automatically extracted clustering of
photos into events with the ground truth
• Evaluation script is made available together
with the dataset.
• Implementation of event detection available:
http://mklab.iti.gr/project/sed2012_certh
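
The two kinds of measure listed above can be illustrated on toy data. This is a minimal sketch and not the evaluation script distributed with the dataset: the function names, the input layout, the averaging unit for the macro F-measure, and the NMI normalisation (twice the mutual information divided by the sum of the two entropies) are assumptions.

```python
import math
from collections import Counter


def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall and F1 over photo ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)
    p = tp / len(retrieved) if retrieved else 0.0
    r = tp / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def macro_f1(pairs):
    """Unweighted mean of F1 over (retrieved, relevant) pairs."""
    scores = [precision_recall_f1(a, b)[2] for a, b in pairs]
    return sum(scores) / len(scores) if scores else 0.0


def entropy(labels):
    """Shannon entropy (natural log) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(truth, pred):
    """NMI between two photo_id -> label dicts, on their shared photo ids."""
    ids = sorted(set(truth) & set(pred))
    n = len(ids)
    if n == 0:
        return 0.0
    u = [truth[i] for i in ids]
    v = [pred[i] for i in ids]
    cu, cv, joint = Counter(u), Counter(v), Counter(zip(u, v))
    mi = sum(
        (c / n) * math.log(c * n / (cu[a] * cv[b]))
        for (a, b), c in joint.items()
    )
    denom = entropy(u) + entropy(v)
    # Both partitions are a single cluster: treat them as identical.
    return 2 * mi / denom if denom > 0 else 1.0


if __name__ == "__main__":
    truth = {1: "a", 2: "a", 3: "b", 4: "b"}
    pred = {1: "x", 2: "x", 3: "y", 4: "y"}
    print(precision_recall_f1([1, 2, 3], [2, 3, 4]))
    print(nmi(truth, pred))
```

The set-based scores ignore cluster labels entirely, which is why a clustering measure such as NMI is reported alongside them.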

Questions
@sympapadopoulos
www.slideshare.net/sympapadopoulos
