CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

Copyright©2014 NTT corp. All Rights Reserved.
Akisato Kimura <akisato@ieee.org> [@_akisato]

1
1-page summary
• Creating a storyline graph from a set of photo sequences
(and optionally friendship graphs) for a topic of interest.
• A photo sequence
= list[ zip( photos, time stamps ) ], created by a single user.
• A storyline
= a series of events with chronological or causal relations,
represented by a directed graph.

2
Why not storylines? (1)
• Many topics of interest consist of a sequence of
activities or events repeated across photo streams.
➢ Independence day = marathon race (1,2) + parades (3-6) +
barbecue + fireworks (8-9)

3
Why not storylines? (2)
• A storyline can characterize the various branching
narrative structures associated with the topic.
➢ A single photo stream = a linear thread of story by a user.
➢ Aggregating the streams reveals the underlying big picture.

4
Related work by the 1st author
CVPR14 oral
CVPR14
CVPR13 oral
WSDM13
KDD12
ECCV10
+ another line of research: WSDM14, CVPR12 oral, ICCV11 oral, NIPS09, CVPR08 oral

5
ECCV10 paper
Generating a sparse similarity network of web images &
associated time stamps
• The method is simple: connecting temporally close & visually
similar images
• It reveals subtopic outbreaks and evolutions.
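The linking rule on this slide can be sketched in a few lines. This is only an illustration, not the ECCV10 implementation: the cosine similarity, the time window `max_dt`, and the per-image neighbor count `k` are all assumptions introduced here.

```python
import numpy as np

def build_similarity_network(features, timestamps, max_dt, k):
    """Link each image to its k most similar images taken within max_dt of it.

    features: (n, d) image descriptors; timestamps: (n,) capture times.
    Returns an (n, n) weighted adjacency matrix (mostly zeros, hence sparse).
    """
    features = np.asarray(features, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)
    n = len(timestamps)

    # Visual similarity: cosine similarity between descriptors (an assumption).
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    sim = unit @ unit.T

    # Temporal closeness: only pairs within max_dt are candidates.
    close = np.abs(timestamps[:, None] - timestamps[None, :]) <= max_dt
    np.fill_diagonal(close, False)

    adj = np.zeros((n, n))
    for i in range(n):
        cand = np.flatnonzero(close[i])
        if cand.size == 0:
            continue  # no temporally close image, so no edge
        top = cand[np.argsort(-sim[i, cand])[:k]]
        adj[i, top] = sim[i, top]
    return adj
```

Keeping only `k` neighbors per image is what makes the network sparse; an image with no temporally close neighbor stays isolated.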

6
KDD12 paper
Modeling an image stream with a point process
• This enables us to predict what images are likely to appear
at a future time point by extrapolating the image stream

7
WSDM13 paper
Modeling an image stream with point processes &
developing a regularized multi-task regression
• For retrieving relevant and temporally suitable images for a
given word, time point and optionally user information.

8
CVPR13 paper
Aligning and segmenting multiple web photo streams
for inferring storylines

9
CVPR14 paper
• Creating a storyline graph from photo streams
• Segmentation in CVPR13 seems redundant.
• Image clustering might be sufficient for representing
subtopics, as shown in KDD12 & WSDM13 papers.

10
Another CVPR14 paper
A set of videos is useful for creating a storyline graph
• Videos convey temporal smoothness between frames, which is
often missing in photo streams.

11
Problem definition
[ Input ] A set of photo streams
The set of photo streams $\mathbf{P} = \{P_1, \dots, P_L\}$
A photo stream $P_l = \{p_1^l, \dots, p_{L_l}^l\}$,
taken by a single person within a period of time $[0, T]$
A photo $p_j^l = (x_j^l, t_j^l)$,
a pair of an image descriptor and a time stamp.
[ Output ] A storyline graph
The storyline graph $\mathbf{G} = (\mathbf{O}, \mathbf{E})$
Each node in $\mathbf{O}$ = an image cluster.
Edges $\mathbf{E} = \{\mathbf{E}_t\}_t$ change smoothly over time.
Each edge set $\mathbf{E}_t$ is represented by an adjacency matrix $\mathbf{A}_t$.
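One possible way to hold these inputs and outputs in code is sketched below. The class names, field names, and the nearest-time lookup in `edges_at` are hypothetical choices made for illustration; the slides only define the mathematical objects.

```python
from dataclasses import dataclass
from typing import List

import numpy as np

@dataclass
class Photo:
    descriptor: np.ndarray  # image descriptor of one photo
    timestamp: float        # time stamp within [0, T]

# A photo stream is the time-ordered list of photos taken by one user.
PhotoStream = List[Photo]

def sort_stream(photos: PhotoStream) -> PhotoStream:
    """Return the photos in chronological order."""
    return sorted(photos, key=lambda p: p.timestamp)

@dataclass
class StorylineGraph:
    num_clusters: int      # number of nodes (image clusters)
    times: np.ndarray      # time points at which an adjacency matrix is stored
    adjacency: np.ndarray  # shape (len(times), K, K), one matrix per time point

    def edges_at(self, t: float) -> np.ndarray:
        """Adjacency matrix stored for the time point closest to t (an assumption)."""
        idx = int(np.argmin(np.abs(self.times - t)))
        return self.adjacency[idx]
```

Storing one matrix per sampled time point is the simplest reading of "time-varying edges"; a real system would likely store the matrices in a sparse format.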

12
Storyline graphs in detail
• Why image clusters for nodes?
➢ There are too many images, and many of them are redundant.
• Edges should be sparse and time-varying
➢ Time-varying: popular transitions change smoothly over time
[Figure: storyline graphs along a timeline at t = 10AM, 12PM and 2PM, with example transitions at 12PM and at 7PM]

13
Image encoding
4 different image (global) descriptors
• [SIFT] 3-level spatial pyramid histograms for HSV color SIFT
• [HOG2x2] 3-level spatial pyramid histograms for HOG.
• [Tiny] 32x32 TinyImages.
• [Scene] SUN397 detector outputs.
Constructing image clusters by K-means (K=600)
+ assigning $c$-NN clusters with Gaussian weighting
• In the case of [Scene], top-$c$ detector outputs are used.
• Each descriptor $x_j^l$ has at most $4c$ non-zero components.
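The soft assignment to the c nearest clusters can be sketched as follows, assuming the K-means centers are already available. The bandwidth `sigma` and the normalization of the weights to sum to one are assumptions; the slide only says "Gaussian weighting". The [Scene] descriptor, which uses top-c detector outputs instead, is not covered here.

```python
import numpy as np

def soft_assign(feature, centers, c, sigma):
    """Encode one feature as Gaussian weights on its c nearest cluster centers.

    Returns a length-K vector with exactly c non-zero entries.
    """
    feature = np.asarray(feature, dtype=float)
    centers = np.asarray(centers, dtype=float)
    d2 = np.sum((centers - feature) ** 2, axis=1)  # squared distance to each center
    nearest = np.argsort(d2)[:c]                   # indices of the c nearest centers
    # Subtract the smallest distance before exponentiating to avoid underflow.
    w = np.exp(-(d2[nearest] - d2[nearest].min()) / (2.0 * sigma ** 2))
    code = np.zeros(len(centers))
    code[nearest] = w / w.sum()                    # normalization is an assumption
    return code

def encode_image(features_by_type, centers_by_type, c, sigma):
    """Concatenate the soft assignments of several descriptor types."""
    return np.concatenate(
        [soft_assign(f, ctr, c, sigma) for f, ctr in zip(features_by_type, centers_by_type)]
    )
```

With four descriptor types the concatenated vector has at most 4c non-zero components, which matches the count stated on the slide.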

14
Modeling photo streams
Introducing several practical assumptions
All the photo streams are taken independently of one another.
Every photo stream is a first-order Markov process.
$$f(\mathbf{x}_j^l, t_j^l \mid \mathbf{x}_{j-1}^l, t_{j-1}^l) = \prod_{d=1}^{D} f(x_{j,d}^l, t_j^l \mid \mathbf{x}_{j-1}^l, t_{j-1}^l)$$
All the elements in a descriptor are conditionally independent
of one another given the previous descriptor.

15
Modeling a storyline
A simple linear model for
Encoding temporal transitions into $\mathbf{A}_e$
The log likelihood (for stationary A)
To be minimized

16
Optimization
A simple least squares if $\mathbf{A}_t$ is time-independent.
Introducing neighborhood selection [Meinshausen+ 2006]
Plus $\ell_1$-regularization
Gaussian kernel for $t_i$ centered at $t$
Introducing sparsity into $\mathbf{A}_t$
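The ingredients listed on this slide (a Gaussian kernel over time stamps, one regression per descriptor dimension, and an l1 penalty) can be combined roughly as below. This is a sketch under assumptions, not the paper's solver: it assumes a linear transition model in which the current descriptor is approximated by the adjacency matrix times the previous descriptor, and it uses plain ISTA (iterative soft thresholding). The names `bandwidth`, `lam`, and `n_iter` are hypothetical.

```python
import numpy as np

def gaussian_kernel_weights(times, t, bandwidth):
    """Normalized Gaussian weights for time stamps t_i, centered at t."""
    w = np.exp(-0.5 * ((np.asarray(times, dtype=float) - t) / bandwidth) ** 2)
    return w / w.sum()

def soft_threshold(v, tau):
    """Proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def weighted_lasso(X, y, w, lam, n_iter=500):
    """Minimize sum_i w_i (y_i - X_i a)^2 + lam * ||a||_1 with ISTA."""
    Xw = X * w[:, None]
    # Lipschitz constant of the gradient of the weighted squared loss.
    lipschitz = 2.0 * np.linalg.norm(X.T @ Xw, 2) + 1e-12
    a = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * Xw.T @ (X @ a - y)
        a = soft_threshold(a - grad / lipschitz, lam / lipschitz)
    return a

def estimate_adjacency(prev, curr, times, t, bandwidth, lam):
    """Estimate a sparse matrix A at time t such that curr is approximately prev @ A.T.

    prev, curr: (n, D) descriptors of consecutive photos; times: (n,) time stamps.
    Each row of A is fitted separately, as in neighborhood selection.
    """
    w = gaussian_kernel_weights(times, t, bandwidth)
    rows = [weighted_lasso(prev, curr[:, d], w, lam) for d in range(curr.shape[1])]
    return np.vstack(rows)
```

With uniform weights and `lam = 0` this reduces to ordinary least squares, which corresponds to the time-independent case mentioned first on the slide.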

17
Incorporating additional information
Strategy : introducing a product kernel
1. Original = neighborhood selection
2. To customize the graph for a particular user $u_q$
3. To introduce seasonal trends
$s_q = s(m_q)$: a function mapping months to seasons
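A product kernel simply multiplies one weight per information source. The sketch below is illustrative only: the exact user and season kernels are not given on the slide, so the affinity lookup, the `mismatch` weight, and the month-to-season mapping (Dec-Feb, Mar-May, Jun-Aug, Sep-Nov) are all assumptions.

```python
import numpy as np

def season(month):
    """Map months 1..12 to seasons 0..3 (Dec-Feb=0, Mar-May=1, Jun-Aug=2, Sep-Nov=3)."""
    return (np.asarray(month) % 12) // 3

def time_kernel(times, t, bandwidth):
    """Gaussian kernel on time stamps, centered at t."""
    return np.exp(-0.5 * ((np.asarray(times, dtype=float) - t) / bandwidth) ** 2)

def user_kernel(users, query_user, affinity):
    """Weight 1 for the query user, else a hypothetical affinity score (default 0)."""
    return np.array(
        [1.0 if u == query_user else affinity.get((u, query_user), 0.0) for u in users]
    )

def season_kernel(months, query_month, mismatch=0.2):
    """Weight 1 for photos from the same season, else a smaller constant."""
    return np.where(season(months) == season(query_month), 1.0, mismatch)

def product_kernel(times, t, bandwidth, users, query_user, affinity, months, query_month):
    """Multiply the three kernels and normalize the weights to sum to one."""
    w = (
        time_kernel(times, t, bandwidth)
        * user_kernel(users, query_user, affinity)
        * season_kernel(months, query_month)
    )
    return w / w.sum()
```

Dropping the user and season factors recovers the original time-only weighting (case 1 on the slide).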

18
Image recommendation with storylines
2 typical tasks for sequential image prediction
1. Given an image sequence, predict K next likely images
2. Given two temporally distant parts of an image sequence,
estimate the most likely path between them
A state space model would be helpful for those tasks
(remember, )
1. Applying the forward algorithm.
2. Exploiting the forward-backward algorithm with EM.
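For task 1, the forward algorithm amounts to filtering the hidden state and then pushing it one step through the transition matrix. The sketch below uses a generic discrete state space model; how the storyline graph is turned into `trans` and `emis_lik` is not specified on the slide, so those inputs are assumptions.

```python
import numpy as np

def forward_filter(init, trans, emis_lik):
    """Forward algorithm: P(state at last step | all observations so far).

    init: (S,) initial state distribution.
    trans: (S, S) row-stochastic transition matrix.
    emis_lik: (T, S) likelihood of each observation under each state.
    """
    emis_lik = np.asarray(emis_lik, dtype=float)
    alpha = np.asarray(init, dtype=float) * emis_lik[0]
    alpha = alpha / alpha.sum()
    for lik in emis_lik[1:]:
        alpha = (alpha @ trans) * lik  # predict one step, then weight by the observation
        alpha = alpha / alpha.sum()    # normalize to keep the recursion numerically stable
    return alpha

def predict_next_topk(init, trans, emis_lik, k):
    """Return the k most likely next states and their probabilities."""
    nxt = forward_filter(init, trans, emis_lik) @ np.asarray(trans, dtype=float)
    top = np.argsort(-nxt)[:k]
    return top, nxt[top]
```

Task 2 would add a backward pass from the later sequence so that both ends constrain the path; that part is not sketched here.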

19
Experiments
1. Evaluating reconstructed storyline graphs
via user studies with AMT.
2. Quantitatively comparing the performance
for the 2 types of image prediction tasks.
a. Predicting next likely images.
b. Filling in missing parts of a photo stream.
[Baseline]
1. PageRank-based image retrieval (details missing)
2. HMM for modeling photo sequences
3. Clustering-based summarization

20
Dataset
3.3M Flickr images of 42K photo streams for 24 classes
The friendship graph was indirectly built from group information
(The edge weight is the number of groups that both users have joined.)
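The edge-weight rule for the friendship graph is easy to state in code. The input format (a mapping from a user to the set of group ids that user has joined) is an assumption made for this sketch.

```python
from itertools import combinations

def friendship_graph(user_groups):
    """Build weighted friendship edges from group membership.

    user_groups: dict mapping a user id to an iterable of group ids.
    Returns a dict {(u, v): weight} with u < v, where weight is the number
    of groups both users have joined; pairs with no shared group are omitted.
    """
    edges = {}
    for u, v in combinations(sorted(user_groups), 2):
        shared = len(set(user_groups[u]) & set(user_groups[v]))
        if shared:
            edges[(u, v)] = shared
    return edges
```

This pairwise loop is quadratic in the number of users; at the scale of 42K photo streams, an inverted index from group to members would be the more practical route.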

21
Scheme for evaluations
[ Basic idea ] Have each turker compare tuples of
images representing the storyline graphs.
1. Each algorithm generates a storyline graph per topic.
2. Sample 100 standard images as test instances.
3. Each algorithm predicts the next most likely image after the test
instance.
4. [ Turker task (>3 turkers per test image) ]
[Figure: a test image shown with two candidate next images, A and B, one from our method and one from a baseline; the turker marks the preferred one.]
The human subjects evaluate only a basic unit (i.e. an important edge of
the storyline).

22
Evaluating storyline graphs
Better than baselines (HMM, PageRank & Clustering).
[Figure: example image pairs ($I_q$, $I_e$)]
[66.5, 67.5, 69.4] over (HMM), (Page), (Clust)

23
Setting: Image prediction tasks (1)
• The “future prediction” task
Procedures
1. Training (80%): build the storyline graph.
2. Task (I): given a short sequence from a test photo stream (PS),
predict the next likely images.
3. Measure the similarity between each method's estimates and the
hidden ground truth.

24
Setting: Image prediction tasks (2)
• The “filling in gaps” task
Procedures
1. Training (80%): build the storyline graph.
2. Task (II): given a pair of distant sequences,
fill in the missing parts.
3. Measure the similarity between each method's estimates and the
hidden ground truth (GT).

25
Performance measured by PSNR
Future prediction - Personalized
Future prediction - Normal [9.60, 8.99, 8.86, 8.75]
[9.53, 9.01, 8.85, 8.75]
Filling in gaps - Personalized
Filling in gaps - Normal [9.70, 8.97, 8.89, 8.96]
[9.57, 9.05, 8.87, 8.93]
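For reference, PSNR (peak signal-to-noise ratio) between a ground-truth image and an estimate can be computed as below. This is the standard definition; which image representation the slides apply it to is not stated, so the 8-bit peak value of 255 is an assumption.

```python
import numpy as np

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB; higher means the estimate is closer."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)  # mean squared error over all pixels
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```

Because PSNR is logarithmic, a difference of a fraction of a dB still corresponds to a consistent reduction in mean squared error.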
