TVSum: Summarizing Web Videos Using Titles
PRESENTED BY:
NEERAJ BAGHEL
M.TECH CSE II YR
2015 IEEE Conference on Computer Vision and Pattern Recognition
At Hynes Convention Center in Boston, Massachusetts.
Yale Song, Jordi Vallmitjana, Amanda Stent, Alejandro Jaimes
Yahoo Labs, New York
OUTLINE
• INTRODUCTION
• OBSTACLES IN VIDEO SUMMARIZATION
• DATASET
• TVSum Framework
• TVSum50 Benchmark Dataset
• EXPERIMENTAL RESULTS
• CONCLUSION
• REFERENCES
1/23
INTRODUCTION
Video
• Video data is a great asset for information extraction and
knowledge discovery.
• Due to its size and variability, it is extremely hard for users to
monitor.
Video summarization
• Intelligent video summarization algorithms allow us to
quickly browse a lengthy video by capturing the essence
and removing redundant information.
OBSTACLES IN VIDEO SUMMARIZATION
• Deciding which parts of a video are important
• Title-based image search results may contain images irrelevant to the video content
• The authors present TVSum, an unsupervised video summarization framework that uses title-based image search results to find visually important shots
• The authors developed a novel co-archetypal analysis technique that learns canonical visual concepts shared between video and images
DATASET
• The authors introduce a new benchmark dataset, TVSum50, that contains 50 videos and their shot-level importance scores annotated via crowdsourcing.
• Experimental results are reported on two datasets, SumMe and TVSum50.
TVSum50 Benchmark Dataset
• Title-based video summarization is a relatively unexplored domain; there is no publicly available dataset suitable for this purpose.
• The authors therefore collected a new dataset, TVSum50, that contains 50 videos and their shot-level importance scores obtained via crowdsourcing.
TVSum50 Benchmark Dataset (contd.)
1. Changing Vehicle Tire (VT)
2. Getting Vehicle Unstuck (VU)
3. Grooming an Animal (GA)
4. Making Sandwich (MS)
5. ParKour (PK)
6. PaRade (PR)
7. Flash Mob gathering (FM)
8. Bee-Keeping (BK)
9. Attempting Bike Tricks (BT)
10. Dog Show (DS)
Figure 2. The TVSum50 dataset contains 50 videos collected from 10 categories.
Video Data Collection
• The authors selected 10 categories from the TRECVid Multimedia Event Detection (MED) task and collected 50 videos (5 per category) from YouTube using the category name as a search query term.
• From the search results, they chose videos using the following criteria:
(i) the video is under the Creative Commons license;
(ii) its duration is 2 to 10 minutes;
(iii) it contains more than a single shot;
(iv) its title is descriptive of the visual topic in the video.
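The four selection criteria can be read as a simple filter over search results. The following is a minimal sketch, assuming a hypothetical `Candidate` record; the field names and the license string are illustrative assumptions and do not come from the paper, and criterion (iv) is a human judgment that is represented here as a precomputed flag.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one search result (all field names are assumptions)."""
    title: str
    license: str                # e.g. "creative-commons"
    duration_s: float           # video length in seconds
    num_shots: int              # number of shots detected in the video
    title_is_descriptive: bool  # manual judgment for criterion (iv)


def meets_criteria(c: Candidate) -> bool:
    """Return True when a candidate satisfies criteria (i) to (iv)."""
    return (
        c.license == "creative-commons"   # (i) Creative Commons license
        and 120 <= c.duration_s <= 600    # (ii) 2 to 10 minutes
        and c.num_shots > 1               # (iii) more than a single shot
        and c.title_is_descriptive        # (iv) descriptive title
    )


# Toy candidates (made-up values, for illustration only).
candidates = [
    Candidate("How to change a tire", "creative-commons", 240.0, 12, True),
    Candidate("Tire video", "standard", 300.0, 8, True),            # fails (i)
    Candidate("Parade highlights", "creative-commons", 45.0, 3, True),  # fails (ii)
    Candidate("Dog show 2014", "creative-commons", 400.0, 1, True),     # fails (iii)
]
selected = [c.title for c in candidates if meets_criteria(c)]
```

Only the first toy candidate passes all four checks.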
Web Image Data Collection
• To learn canonical visual concepts from a video title, the authors need a sufficiently diverse set of images [15].
• Unfortunately, the title itself can sometimes be too specific as a query term [29].
Figure 3. Chronological bias in shot importance labeling.
Shown here is the video “Statue of Liberty” from the SumMe
dataset [20]: (a) labels from the SumMe dataset; (b) labels collected by
the authors using their annotation interface.
TVSum Framework
This framework consists of four modules:
• Shot segmentation
• Canonical visual concept learning
• Shot importance scoring
• Summary generation
Shot Segmentation
• Shot segmentation is a crucial step in video summarization for maintaining visual coherence within each shot, which in turn affects the overall quality of a summary.
Canonical Visual Concept Learning
• The authors define canonical visual concepts as the patterns shared between video X and its title-based image search results Y, and represent them as a set of p latent variables $Z = [z_1, \dots, z_p] \in \mathbb{R}^{d \times p}$, where $p \ll d$ (in the experiments, p = 200 and d = 1,640).
Shot Importance Scoring
• The authors first measure frame-level importance using the learned factorization of X into XBA. Specifically, they measure the importance of the i-th video frame x_i by computing the total contribution of the corresponding elements of BA in reconstructing the original signal X.
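One reading of the frame-level score can be sketched in code. The sketch assumes X holds one frame per column (n frames), B is n-by-p and A is p-by-n, so that BA is n-by-n and column j of BA holds the weights used to reconstruct frame j. Under that assumption, the contribution of frame i to the whole reconstruction is the sum of row i of BA. The helper names and the averaging of frame scores into shot scores are illustrative assumptions, not details confirmed by the slides.

```python
def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def frame_scores(B, A):
    """Score frame i as the sum of row i of the n-by-n product BA."""
    BA = matmul(B, A)
    return [sum(row) for row in BA]


def shot_scores(scores, shots):
    """Average frame scores per shot; `shots` holds (start, end) pairs, end exclusive."""
    return [sum(scores[a:b]) / (b - a) for a, b in shots]


# Toy factorization with n = 3 frames and p = 2 concepts (made-up numbers).
B = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]          # n x p: each concept is built from video frames
A = [[1.0, 0.5, 0.0],
     [0.0, 0.5, 1.0]]     # p x n: each frame is rebuilt from the concepts

per_frame = frame_scores(B, A)                     # [1.5, 1.5, 0.0]
per_shot = shot_scores(per_frame, [(0, 2), (2, 3)])  # [1.5, 0.0]
```

In this toy case the third frame never contributes to any concept, so its score is zero and the shot that contains only that frame ranks last.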
Summary Generation
• To generate a summary of length l, the authors solve an optimization problem over the shots, where s is the number of shots, v_i is the importance score of the i-th shot, and w_i is the length of the i-th shot.
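The quantities named here (a value v_i and a length w_i per shot, with a total length budget l) fit the standard 0/1 knapsack form: choose a subset of shots that maximizes total importance while the total length stays within l. The following is a minimal dynamic-programming sketch under that assumption; the function name is hypothetical, and lengths and the budget are assumed to be integers (for example, frame counts).

```python
def select_shots(values, lengths, budget):
    """0/1 knapsack: pick shots maximizing total value with total length <= budget."""
    s = len(values)
    # best[i][c] = best total value using only the first i shots within capacity c
    best = [[0.0] * (budget + 1) for _ in range(s + 1)]
    for i in range(1, s + 1):
        v, w = values[i - 1], lengths[i - 1]
        for c in range(budget + 1):
            best[i][c] = best[i - 1][c]                 # skip shot i-1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v     # take shot i-1
    # Walk back through the table to recover which shots were taken.
    chosen, c = [], budget
    for i in range(s, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= lengths[i - 1]
    return sorted(chosen)


# Toy example: four shots with made-up importance scores and lengths.
values = [0.9, 0.2, 0.6, 0.4]
lengths = [5, 3, 4, 2]
summary = select_shots(values, lengths, budget=9)   # [0, 2]
```

Here shots 0 and 2 use the full budget of 9 and give the highest total importance among all feasible subsets.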
Baseline Models
The authors compared their summarization approach to 8 baselines covering a variety of different methods:
• Sampling (SU and SR): A summary is generated by selecting shots either uniformly (SU) or randomly (SR) such that the summary length is within the length budget l.
• Clustering (CK and CS): The authors tested two clustering approaches: k-means (CK) and spectral clustering (CS).
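The two sampling baselines can be sketched as follows. The slides do not say how shots are spaced or how the budget is enforced, so both functions below are assumptions: the uniform variant takes the largest number of evenly spaced shots that fits the budget, and the random variant shuffles the shots and greedily keeps those that still fit.

```python
import random


def uniform_sampling(lengths, budget):
    """SU sketch: largest set of evenly spaced shots whose total length fits the budget."""
    s = len(lengths)
    for k in range(s, 0, -1):
        picks = [(j * s) // k for j in range(k)]   # k evenly spaced shot indices
        if sum(lengths[i] for i in picks) <= budget:
            return picks
    return []


def random_sampling(lengths, budget, seed=0):
    """SR sketch: visit shots in random order and keep each one that still fits."""
    rng = random.Random(seed)
    order = list(range(len(lengths)))
    rng.shuffle(order)
    chosen, total = [], 0
    for i in order:
        if total + lengths[i] <= budget:
            chosen.append(i)
            total += lengths[i]
    return sorted(chosen)


# Toy example: six shots of equal length and a budget of three shots.
lengths = [4, 4, 4, 4, 4, 4]
su = uniform_sampling(lengths, budget=12)   # [0, 2, 4]
sr = random_sampling(lengths, budget=12)
```

Neither baseline looks at the video content, which is what makes them useful reference points for the content-aware methods.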
Baseline Models (contd.)
• LiveLight (LL): A summary is generated by removing redundant shots over time, measuring redundancy using a dictionary of shots updated online. The authors selected shots with the highest reconstruction errors that fit in the length budget l.
• Web Image Prior [25] (WP): As in [25], the authors defined 100 positive and 1 negative classes, using images from other videos in the same dataset as negative examples.
• Archetypal Analysis [11] (AA1 and AA2): The authors include two versions of archetypal analysis: one that learns archetypes from video data only (AA1), and another that uses a combination of video and image data (AA2).
EXPERIMENTAL RESULTS (contd.)
Table 2. Experimental results on our TVSum50 dataset.
Numbers show mean pairwise F1 scores.
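A mean pairwise F1 score can be computed as sketched below. The sketch assumes each summary is represented as a set of selected frame (or shot) indices and that "pairwise" means comparing one generated summary against each human reference in turn and averaging the results; the function names are hypothetical.

```python
def f1_score(predicted, reference):
    """F1 between two summaries given as collections of selected indices."""
    pred, ref = set(predicted), set(reference)
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mean_pairwise_f1(predicted, references):
    """Average the F1 of one predicted summary against every reference summary."""
    return sum(f1_score(predicted, r) for r in references) / len(references)


# Toy example with three made-up reference summaries.
predicted = {0, 1, 2, 3}
references = [{0, 1, 2, 3}, {2, 3, 4, 5}, {6, 7}]
score = mean_pairwise_f1(predicted, references)   # (1.0 + 0.5 + 0.0) / 3 = 0.5
```

Averaging over references rather than merging them keeps disagreement between annotators visible in the score.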
CONCLUSIONS
• The authors presented TVSum, an unsupervised video summarization framework that uses the video title to find visually important shots.
REFERENCES
• [1] M. Basseville, I. V. Nikiforov, et al. Detection of abrupt changes: theory and application, volume 104. Prentice Hall Englewood Cliffs, 1993.
• [2] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), 2009.
• [3] V. Berger. Selection bias and covariate imbalances in randomized clinical trials, volume 66. John Wiley & Sons, 2007.
• [4] K. Bleakley and J.-P. Vert. The group fused lasso for multiple change-point detection. arXiv preprint arXiv:1106.4199, 2011.
• [5] A. Borji and L. Itti. State-of-the-art in visual attention modeling. PAMI, 35(1), 2013.
• [6] A. Bosch, A. Zisserman, and X. Muñoz. Image classification using random forests and ferns. In ICCV, 2007.
• [7] A. Bosch, A. Zisserman, and X. Muñoz. Representing shape with a spatial pyramid kernel. In CIVR, 2007.
• [8] C. Carpineto and G. Romano. A survey of automatic query expansion in information retrieval. ACM CSUR, 44(1), 2012.
• [9] F. Chen and C. De Vleeschouwer. Formulating team-sport video summarization as a resource allocation problem. CSVT, 21.
REFERENCES (contd.)
• [10] J. Chen, Y. Cui, G. Ye, D. Liu, and S.-F. Chang. Event-driven semantic concept discovery by exploiting weakly tagged internet images. In ICMR, page 1, 2014.
• [11] Y. Chen, J. Mairal, and Z. Harchaoui. Fast and robust archetypal analysis for representation learning. In CVPR, 2014.
• [12] Y. Cong, J. Yuan, and J. Luo. Towards scalable summarization of consumer videos via sparse dictionary selection. IEEE Multimedia, 14(1), 2012.
• [13] A. Cutler and L. Breiman. Archetypal analysis. Technometrics, 36(4), 1994.
• [14] S. K. Divvala, A. Farhadi, and C. Guestrin. Learning everything about anything: Webly-supervised visual concept learning. In CVPR, 2014.
• [15] L. Duan, D. Xu, and S.-F. Chang. Exploiting web images for event recognition in consumer videos: A multiple source domain adaptation approach. In CVPR, 2012.
• [16] N. Ejaz, I. Mehmood, and S. Wook Baik. Efficient visual attention based framework for extracting key frames from videos. Signal Processing: Image Communication, 28(1), 2013.
• [17] A. Ekin, A. M. Tekalp, and R. Mehrotra. Automatic soccer video analysis and summarization. Image Processing, IEEE Transactions on, 12(7), 2003.
REFERENCES (contd.)
• [18] S. Feng, Z. Lei, D. Yi, and S. Z. Li. Online content-aware video condensation. In CVPR, 2012.
• [19] S. Fidler, A. Sharma, and R. Urtasun. A sentence is worth a thousand pixels. In CVPR, 2013.
• [20] M. Gygli, H. Grabner, H. Riemenschneider, and L. V. Gool. Creating summaries from user videos. In ECCV, 2014.
• [21] M. Gygli, H. Grabner, H. Riemenschneider, F. Nater, and L. V. Gool. The interestingness of images. In ICCV, 2013.
• [22] Z. Harchaoui and C. Lévy-Leduc. Multiple change-point estimation with a total variation penalty. Journal of the American Statistical Association, 105(492), 2010.
• [23] G. Hripcsak and A. S. Rothschild. Agreement, the f-measure, and reliability in information retrieval. JAMIA, 12(3), 2005.
• [24] Y. Jia, J. T. Abbott, J. Austerweil, T. Griffiths, and T. Darrell. Visual concept learning: Combining machine vision and bayesian generalization on concept hierarchies. In NIPS, 2013.
• [25] A. Khosla, R. Hamid, C. Lin, and N. Sundaresan. Large-scale video summarization using web-image priors. In CVPR, 2013.
REFERENCES (contd.)
• [26] G. Kim, L. Sigal, and E. P. Xing. Joint summarization of large-scale collections of web images and videos for storyline reconstruction. In CVPR, 2014.
• [27] Y. J. Lee, J. Ghosh, and K. Grauman. Discovering important people and objects for egocentric video summarization. In CVPR, 2012.
• [28] L. Li, K. Zhou, G.-R. Xue, H. Zha, and Y. Yu. Video summarization via transferrable structured learning. In WWW, 2011.
• [29] D. Lin, S. Fidler, C. Kong, and R. Urtasun. Visual semantic search: Retrieving videos via complex textual queries. In CVPR, 2014.
• [30] D. Liu, G. Hua, and T. Chen. A hierarchical visual model for video object summarization. PAMI, 32(12), 2010.
• [31] X. Liu, Y. Mu, B. Lang, and S.-F. Chang. Mixed image-keyword query adaptive hashing over multilabel images. TOMCCAP, 10(2), 2014.
• [32] Z. Lu and K. Grauman. Story-driven summarization for egocentric video. In CVPR, 2013.
• [33] Y.-F. Ma, L. Lu, H.-J. Zhang, and M. Li. A user attention model for video summarization. In ACM MM, 2002.
