LinkedTV @ MediaEval 2013 Search and Hyperlinking Task

Television Linked To The Web

LinkedTV @ MediaEval
Search and Hyperlinking
M. Sahuguet (1), B. Huet (1), B. Cervenková (2), E. Apostolidis (4), V. Mezaris (4), D. Stein (3),
S. Eickeler (3), J.L. Redondo Garcia (1), R. Troncy (1), and L. Pikora (2)
MediaEval 2013 Workshop
Barcelona, Catalunya, Spain, 18-19 October 2013.
(1)

(2)

www.linkedtv.eu

(3)

(4)

LinkedTV ― Television Linked To the Web

LinkedTV: interweaving Web and TV into a single experience
Second screen scenario for enriching television content and achieving interaction between user and content

Web: http://www.linkedtv.eu

LinkedTV @ MediaEval Search and Hyperlinking 2013

10/18/2013

LinkedTV@MediaEval

• MediaEval Search & Hyperlinking: an overview of LinkedTV’s enrichment process
  - Brainstorming
  - Pre-processing (BBC dataset)
  - Video segmentation
  - Indexing data in Lucene
  - From visual cues to detected concepts
  - Search task
  - Hyperlinking task
  - Conclusion


Brainstorming

• Brainstorming meeting: Tasks and Dataset analysis
  - Shots are too short to return to the user
  - Typos in the queries
  - Duplicate videos in the dataset
  - Visual concepts are not usable as such
  - Visual cues may not be helpful
  - Visual cues can also help as search terms
  - Maybe we can segment the videos differently?
  - Can we use speaker information?
  - Name of show/channel may appear in the query
  - Actor/character names may appear
  - What further analysis can we apply to the videos?


Brainstorming

• Brainstorming meeting: Tasks and Dataset analysis
• Search:
  - Getting the right video is possible
  - Need to extract the segment with good timing
• Segmentation level is of major importance
  - Shots are too short
  - We want to be as close as possible to the viewer
• Visual cues: not always helpful
<visualQueues>2 men sitting opposite each other</visualQueues>
<visualQueues>stands out and grabs your attention</visualQueues>
• Need to design a framework to use visual cues
• How can the LinkedTV media analysis tools be used?


Pre-processing dataset

• Processing ~1697 h of BBC video data

Analysis                          Partner      Processing time
Visual concept detection (151)    CERTH        20 days on 100 cores
Scene segmentation                CERTH        2 days on 6 cores
OCR                               Fraunhofer   1 day on 10 cores
Keywords extraction               Fraunhofer   5 hours
Named entities extraction         Eurecom      4 days
Face detection and tracking       Eurecom      4 days on 160 cores



Video Segmentation

• Shots (provided by Task Organisers)
• Scenes: groups of adjacent shots
  - Visual similarity
  - Temporal consistency
  - P. Sidiropoulos, V. Mezaris, I. Kompatsiaris, H. Meinedo, M. Bugalho, and I. Trancoso. Temporal Video Segmentation to Scenes Using High-Level Audiovisual Features. IEEE Transactions on Circuits and Systems for Video Technology, 2011
• Sliding windows:
  - Inspired by M. Eskevich, G. Jones, C. Wartena, M. Larson, R. Aly, T. Verschoor, and R. Ordelman. Comparing retrieval effectiveness of alternative content segmentation methods for Internet video search. 10th International Workshop on Content-Based Multimedia Indexing (CBMI), 2012
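
As an illustration of the sliding-window segmentation mentioned above, the following is a minimal sketch rather than the method used by the authors. The unit (seconds), the default step of 30 and the function name `sliding_windows` are assumptions; the slides only mention window sizes of 40 and 60.

```python
from typing import List, Tuple


def sliding_windows(duration: float, size: float = 60.0,
                    step: float = 30.0) -> List[Tuple[float, float]]:
    """Return overlapping (start, end) windows covering [0, duration].

    `size` and `step` are hypothetical parameters, assumed to be in seconds.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    windows = []
    start = 0.0
    while start < duration:
        # Clip the last window to the end of the video.
        windows.append((start, min(start + size, duration)))
        if start + size >= duration:
            break
        start += step
    return windows
```

Each window would then be treated as one retrievable segment; a smaller step gives finer start times at the cost of more index entries.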



Indexing data in Lucene

• Lucene engine for indexing the data
• Index at different temporal granularities:
  - Video level (pre-filtering)
  - Scene level
  - Shot level
  - Sliding window segment level
• Index different features at each temporal granularity:
  - Text (transcripts, subtitles)
  - Metadata (title, synopsis, cast, etc.)
  - OCR
  - Visual concept values (floating point fields)
• Design a framework for querying indexes and returning video segments from a query
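
To make the idea of one index entry per segment and per granularity concrete, here is a small in-memory sketch. It does not use Lucene; the `SegmentDoc` class, its field names and the `build_docs` helper are hypothetical, and only the text feature is aggregated.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SegmentDoc:
    """One indexable unit (hypothetical layout, not the authors' schema)."""
    video_id: str
    granularity: str           # e.g. "scene", "shot", "window"
    start: float
    end: float
    text: str = ""
    concepts: Dict[str, float] = field(default_factory=dict)


def build_docs(video_id: str,
               words: List[Tuple[float, str]],
               segments_by_granularity: Dict[str, List[Tuple[float, float]]]
               ) -> List[SegmentDoc]:
    """Attach every timed word to each segment whose time span contains it."""
    docs = []
    for granularity, segments in segments_by_granularity.items():
        for start, end in segments:
            text = " ".join(w for t, w in words if start <= t < end)
            docs.append(SegmentDoc(video_id, granularity, start, end, text))
    return docs
```

The same transcript words end up in several documents, once per granularity, which is what allows the later comparison of scene, shot and sliding-window retrieval.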



• Text search is straightforward (default, TF-IDF values)
• Need to incorporate visual information into the search
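
For readers unfamiliar with TF-IDF ranking, the sketch below scores documents with a textbook weighting (term frequency times log inverse document frequency). It is an assumption-laden illustration, not Lucene's exact scoring formula, and `tfidf_scores` is a hypothetical helper name.

```python
import math
from collections import Counter
from typing import List


def tfidf_scores(query: str, docs: List[str]) -> List[float]:
    """Score each document against the query with textbook TF-IDF."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = sum(tf[t] * math.log(n / df[t])
                    for t in query.lower().split() if df[t])
        scores.append(score)
    return scores
```

A term that occurs in every document contributes nothing here, which is the usual motivation for the inverse document frequency factor.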


• Which concepts are present in the query?
• Semantic word distance based on WordNet synsets
• Mapping between keywords (extracted from the visual cues query) and visual concepts
<visualQueues>animals, kenya wildlife reserve, marathon</visualQueues>
mapped visual concepts: Athlete, Dogs, Horse, Animal
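
The keyword-to-concept mapping can be sketched with a path-based similarity over a hypernym tree. The tiny `PARENT` taxonomy below is a toy stand-in for WordNet, and the similarity measure, the 0.5 threshold and all function names are assumptions made for illustration only.

```python
from typing import Dict, List, Optional

# Toy hypernym tree (child -> parent); a stand-in for WordNet, not real data.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "horse": "animal",
    "person": "entity",
    "athlete": "person",
    "vehicle": "entity",
    "car": "vehicle",
}


def ancestors(node: str) -> List[str]:
    """Return the chain from a node up to the root, node included."""
    chain = []
    current: Optional[str] = node
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def path_similarity(a: str, b: str) -> float:
    """1 / (1 + number of edges between a and b through their common ancestor)."""
    up_a, up_b = ancestors(a), ancestors(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            return 1.0 / (1 + steps_a + up_b.index(node))
    return 0.0


def map_keywords(keywords: List[str], concepts: List[str],
                 threshold: float = 0.5) -> Dict[str, List[str]]:
    """Map each known keyword to the concepts at or above the threshold."""
    return {
        kw: [c for c in concepts if path_similarity(kw, c) >= threshold]
        for kw in keywords if kw in PARENT
    }
```

With this toy tree, the keyword "animal" maps to "dog", "horse" and "animal" but not to "athlete" or "car"; with a real lexical database the neighbours and scores would differ.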


• Integration of detected visual concepts into the Lucene search:
  - Concepts filtering


• Concepts filtering
  - Correct detection rate over the first 100 results: threshold at 0.5
  - Normalized confidence: threshold at 0.7
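
The two filtering thresholds can be sketched as follows. The slides do not say how confidences are normalized, so min-max normalization per concept is an assumption here, as are the function names and the input layout (one list of raw scores per concept, one detection rate per concept).

```python
from typing import Dict, List


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale values to [0, 1]; assumed normalization, not from the slides."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def filter_concepts(raw_scores: Dict[str, List[float]],
                    detection_rate: Dict[str, float],
                    min_rate: float = 0.5,
                    min_confidence: float = 0.7) -> Dict[str, List[int]]:
    """Keep reliable concepts, then the segment indices where they are confident."""
    kept = {}
    for concept, scores in raw_scores.items():
        # Drop detectors whose correct detection rate is below the threshold.
        if detection_rate.get(concept, 0.0) < min_rate:
            continue
        normalized = min_max_normalize(scores)
        kept[concept] = [i for i, s in enumerate(normalized)
                         if s >= min_confidence]
    return kept
```

The first test removes unreliable detectors altogether; the second keeps, for the remaining ones, only the segments where the detector is confident.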


• Concepts selection
• Designing an enriched query: both textual (text query) and visual information (range query).
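
One way to picture an enriched query is as a string that combines text clauses with one range clause per selected concept, written in the `field:[low TO high]` style of Lucene's classic query syntax. The field name `text`, the score bounds and the helper `build_enriched_query` are assumptions, and whether such a clause is evaluated numerically depends on how the index and parser are configured.

```python
from typing import List


def build_enriched_query(text_terms: List[str], concepts: List[str],
                         min_score: float = 0.7,
                         max_score: float = 1.0) -> str:
    """Combine quoted text clauses with one range clause per visual concept."""
    text_clause = " ".join(f'text:"{t}"' for t in text_terms)
    range_clauses = " ".join(
        f"{c}:[{min_score} TO {max_score}]" for c in concepts
    )
    return " ".join(part for part in (text_clause, range_clauses) if part)
```

For example, `build_enriched_query(["kenya wildlife"], ["Animal"])` yields `text:"kenya wildlife" Animal:[0.7 TO 1.0]`.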


Search task

• Search videos at different temporal granularities
• Concatenation of textual and visual query for text search
<queryText>Odd cars, Fake MacLaren, </queryText>
<visualQueues>Jeremy Clarkson, Richard Hammond, James May, Ferrari 430 Scuderia</visualQueues>
• Visual cues can be found in queryText too
• If a TV channel is mentioned, perform filtering:
<visualQueues>Cannabis on BBC ONE</visualQueues>
  - Should also be done on show titles (for next year?)
• For some runs, filter at video level first
  - Making a text query on the video index
  - Use the first 20 videos for segment search
• Focused search
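
The channel filtering step can be illustrated with a small helper that spots a channel name in the query and strips it out, so that the channel can be used as a filter and the rest as the text query. The `KNOWN_CHANNELS` list and the function name are hypothetical; the slides only show "BBC ONE" as an example.

```python
import re
from typing import Optional, Tuple

# Hypothetical channel list; only "BBC ONE" appears in the slides.
KNOWN_CHANNELS = ["BBC ONE", "BBC TWO", "BBC THREE", "BBC FOUR"]


def extract_channel(query: str) -> Tuple[Optional[str], str]:
    """Return (channel, remaining query); channel is None when none is found."""
    for channel in KNOWN_CHANNELS:
        # Also swallow a leading "on" so that "X on BBC ONE" leaves just "X".
        pattern = re.compile(
            r"\b(?:on\s+)?" + re.escape(channel) + r"\b", re.IGNORECASE
        )
        match = pattern.search(query)
        if match:
            remainder = query[:match.start()] + query[match.end():]
            remainder = re.sub(r"\s+", " ", remainder).strip()
            return channel, remainder
    return None, query
```

The returned channel would restrict the candidate videos before the text search is run on the remaining terms.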



Search task

• Different granularities:
  - scenes
  - partial scenes (begin at a shot; end at the corresponding scene ending)
  - temporally clustered shots (inside a video)
  - sliding window
• Different textual data (transcript/ASR)
• With/without visual concepts
• With/without use of synonyms
• 9 runs
• Goal: comparing approaches and features
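
The "temporally clustered shots" granularity can be sketched as merging retrieved shots of one video when they lie close together in time. The `max_gap` threshold of 10 (assumed seconds) and the function name are assumptions; the slides do not specify the clustering rule.

```python
from typing import List, Tuple


def cluster_shots(shots: List[Tuple[float, float]],
                  max_gap: float = 10.0) -> List[Tuple[float, float]]:
    """Merge (start, end) shots of one video separated by at most max_gap."""
    clusters: List[Tuple[float, float]] = []
    for start, end in sorted(shots):
        if clusters and start - clusters[-1][1] <= max_gap:
            # Close enough to the previous cluster: extend it.
            prev_start, prev_end = clusters[-1]
            clusters[-1] = (prev_start, max(prev_end, end))
        else:
            clusters.append((start, end))
    return clusters
```

Each resulting cluster is then returned as a single segment, which addresses the concern that individual shots are too short to show to a user.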


Search task – Results

Run               MRR      mGAP     MASP
scenes-C          0.3095   0.1770   0.1951
scenes-noC        0.3091   0.1767   0.1947
scenes-S          0.3152   0.1635   0.2021
scenes-I          0.2613   0.1444   0.1582
scenes-U          0.2458   0.1344   0.1528
part-scenes-C     0.2284   0.1241   0.1024
part-scenes-noC   0.2281   0.1240   0.1021
clustering-C      0.2929   0.1525   0.1814
clustering-noC    0.2849   0.1479   0.1713
SW-60-S           0.2833   0.1925   0.2027
SW-60-I           0.1965   0.1206   0.1204
SW-40-U           0.2368   0.1342   0.1501

Callouts:
- scenes-C: scenes search using textual and visual concepts
- scenes-S: scene search using only subtitles
- SW-60: search over sliding window segments (size 60)



Hyperlinking Task

• Re-use of the search component
  - Shot clustering approach
  - Scene approach
• Create a query from the anchor!
  - Get subtitles and shots aligned with the anchor
  - Text query: extract keywords using Alchemy API (higher weight to anchor than to context)
  - Visual cues query: for each concept, highest score over all shots
• Use of “MoreLikeThis” (MLT) feature in Lucene, combined with THD
  - Sliding window approach
• Create temporary documents from the anchor!
  - THD = Targeted Hypernym Discovery (UEP): returns semantic annotation, synonyms
  - MLT: finds documents similar to the input


Hyperlinking results

Run             MAP      P-5      P-10     P-20
LA clustering   0.0577   0.4467   0.3200   0.2067
LA SW MLT       0.1201   0.4200   0.4200   0.3217
LA scenes       0.1770   0.6867   0.5867   0.4167
LC clustering   0.0823   0.5733   0.4833   0.2767
LC SW MLT       0.1820   0.5667   0.5667   0.4300
LC scenes       0.2523   0.8133   0.7300   0.5283

Callouts:
- LA scenes: scenes search in LA condition (anchor only)
- LC scenes: scenes search in LC condition (anchor + context)



Conclusions

• Major findings
  - Scene segmentation approach performs best
  - Improvement when using visual concepts, when carefully employed
• Future work
  - Improve scene detection to follow human perception more closely
  - Improve the link between query and visual concepts
  - Use named entities

Thank you
Questions?

