Video Copy Detection Using Inclined Video Tomography and Bag-of-Visual-Words
Hyun-seok Min, Se Min Kim, Wesley De Neve, and Yong Man Ro
Image and Video Systems Lab
Korea Advanced Institute of Science and Technology (KAIST)
Daejeon, South Korea
e-mail: ymro@ee.kaist.ac.kr
website: http://ivylab.kaist.ac.kr

I. INTRODUCTION
- BoVW-based approaches can be effectively used for the detection of both
  image and video copies
   - however, these approaches typically ignore the inherent temporal
     nature of video content
- Conventional video tomography extracts slices from a space-time cube
  that are parallel to the time axis
   - however, slices that are parallel to the time axis do not take advantage
     of spatial information
- This paper proposes to create a content-based video signature by means
  of the following two sequential steps
   1) extraction of inclined tomography images from the video content
      - angle of inclination is dependent on the amount of motion
   2) characterization of the inclined tomography images by means of BoVW

III. VIDEO MATCHING USING HISTOGRAMS
- The dissimilarity between two video clips V^q and V^r:

  $$D(V^q, V^r) = \min_{p} \frac{1}{N} \sum_{i=1}^{N} D_{shot}(S^q_i, S^r_{i+p})$$

  with $V^q = \langle S^q_i \rangle_{i=1}^{N}$ and $V^r = \langle S^r_l \rangle_{l=1}^{L}$

  p: the position of the video shot in the reference video clip at which
     similarity measurement starts
- The dissimilarity between two video shots S^q and S^r is measured by
  making use of the cosine similarity:

  $$D_{shot}(S^q, S^r) = 1 - \frac{\sum_{j=1}^{M} a^q_j \times a^r_j}{\sqrt{\sum_{j=1}^{M} (a^q_j)^2} \, \sqrt{\sum_{j=1}^{M} (a^r_j)^2}}$$

  M: the number of visual words in the vocabulary
  a_j: the weight of the jth visual word
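The shot-level dissimilarity of Section III (one minus the cosine similarity of two visual-word weight vectors) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `shot_dissimilarity` and the handling of an all-zero histogram are assumptions that the poster does not specify.

```python
import math
from typing import Sequence


def shot_dissimilarity(a_q: Sequence[float], a_r: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two visual-word weight vectors."""
    if len(a_q) != len(a_r):
        raise ValueError("both histograms must cover the same M visual words")
    # Numerator: sum over j of a^q_j * a^r_j.
    dot = sum(x * y for x, y in zip(a_q, a_r))
    # Denominator: product of the Euclidean norms of both vectors.
    norm_q = math.sqrt(sum(x * x for x in a_q))
    norm_r = math.sqrt(sum(y * y for y in a_r))
    if norm_q == 0.0 or norm_r == 0.0:
        # Assumption: an all-zero histogram is treated as maximally dissimilar.
        return 1.0
    return 1.0 - dot / (norm_q * norm_r)
```

Because the cosine similarity ignores vector length, two shots whose histograms differ only by a scale factor get a dissimilarity close to 0, while histograms that share no visual words get 1.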

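The clip-level dissimilarity of Section III slides the N query shots over the L reference shots and keeps the offset p with the lowest mean shot dissimilarity. The sketch below assumes the pairwise shot dissimilarities are already available as a matrix `shot_dist` (a hypothetical name), uses 0-based offsets, and only considers offsets at which the query fits entirely inside the reference clip. The poster does not state these boundary choices.

```python
from typing import Sequence, Tuple


def clip_dissimilarity(shot_dist: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Return (min over p of the mean shot dissimilarity, best offset p).

    shot_dist[i][l] holds D_shot between query shot i and reference shot l.
    """
    num_query = len(shot_dist)
    if num_query == 0:
        raise ValueError("the query clip needs at least one shot")
    num_ref = len(shot_dist[0])
    if num_ref < num_query:
        # Assumption: the reference clip is at least as long as the query.
        raise ValueError("the reference clip needs at least N shots")
    best_value, best_p = float("inf"), -1
    for p in range(num_ref - num_query + 1):
        # Mean of D_shot(S^q_i, S^r_{i+p}) over the N aligned shot pairs.
        mean = sum(shot_dist[i][i + p] for i in range(num_query)) / num_query
        if mean < best_value:
            best_value, best_p = mean, p
    return best_value, best_p
```

Returning the offset together with the score is a design choice of this sketch: it also indicates where in the reference clip the copied segment starts.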
II. CREATION OF A VIDEO SIGNATURE BY MEANS OF
  INCLINED VIDEO TOMOGRAPHY AND BOVW
1. Extraction of inclined tomography images
   - To extract inclined tomography images from a video clip V, we first
     segment V into N space-time cubes such that V = <S1, S2, …, SN>
   - We subsequently segment each space-time cube into several space-time
     sub-cubes

   Fig. 1. Segmentation of a space-time cube into space-time sub-cubes.
     Fv, Fb: number of frames in a space-time cube and space-time sub-cube
     Wv, Wb: width of a space-time cube and space-time sub-cube
     Hv, Hb: height of a space-time cube and space-time sub-cube

   - The angle of inclination of the tomography image extracted reflects the
     intensity of motion in the space-time sub-cube under consideration

   Fig. 2. Extraction of an inclined tomography image from a space-time sub-cube.

   $$\theta = \frac{\beta}{W_b \times H_b \times F_b} \sum_{(x, y, f)} \left| L(x, y, f+1) - L(x, y, f) \right|$$

     L(x, y, t): the luminance value of a pixel (x, y) of a particular frame at time t
     β: a weight parameter

2. BoVW applied to inclined tomography images
   - each space-time cube Si can be represented as a vector Ai that
     summarizes how the space-time sub-cubes are distributed over the
     vocabulary of visual words used

   $$A_i = \langle a_{i,1}, a_{i,2}, \ldots, a_{i,M} \rangle$$

     M: the number of visual words vj in the vocabulary used
     ai,j: the weight of the jth visual word

   Fig. 3. Extraction of a histogram of visual words from an inclined tomography image.

IV. EXPERIMENTS
1. Experimental setup
   - Use of TRECVID 2009 for creating near-duplicate video clips (NDVCs) and
     reference video clips
   - Creation of 100 query video clips by applying five transformations to
     each of 20 video clips randomly selected from the reference video database
      - blurring: we blurred frames using a Gaussian kernel with a radius
        of 15;
      - picture-in-picture: we inserted a picture with a size that is 30% of
        the size of the main frame;
      - change in brightness: we increased the brightness by 40%;
      - mirroring: we reversed frames from the left to the right;
      - change in frame rate: we halved the frame rate.
2. Experimental results

   [Bar charts: recall (left) and precision (right) per transformation (blur,
   pattern insertion, change in brightness, mirroring, frame rate change,
   average) for the proposed video signature, BoVW using SIFT, and video
   tomography]
   Fig. 4. Comparison of the effectiveness of several video signatures.

   Fig. 5. Example images: (a) example key frame and (b) 16 inclined tomography
   images extracted from the key frame shown in (a).

V. CONCLUSIONS
- This paper introduced a novel video signature that takes advantage of
  both inclined video tomography and BoVW
- The proposed video signature is able to capture both spatial and
  temporal information
   - the angle of inclination of the extracted tomography images is
     dependent on the amount of motion in the local volumes
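The inclination angle θ of Section II.1 ties the slice orientation to the amount of motion in a space-time sub-cube. The sketch below computes it from a sub-cube given as a list of luminance frames. Several points are assumptions rather than facts from the poster: frame differences are summed as absolute values (a signed sum would largely cancel out), the sum runs over the F_b − 1 consecutive frame pairs, and the value of β and the unit of θ are left open.

```python
from typing import Sequence

# One frame of luminance values, indexed as frame[y][x].
Frame = Sequence[Sequence[float]]


def inclination_angle(sub_cube: Sequence[Frame], beta: float = 1.0) -> float:
    """Compute theta for one space-time sub-cube given as F_b frames."""
    f_b = len(sub_cube)
    if f_b < 2:
        raise ValueError("need at least two frames to measure motion")
    h_b = len(sub_cube[0])
    w_b = len(sub_cube[0][0])
    total = 0.0
    # Sum |L(x, y, f+1) - L(x, y, f)| over all pixels and frame pairs.
    for f in range(f_b - 1):
        for y in range(h_b):
            for x in range(w_b):
                total += abs(sub_cube[f + 1][y][x] - sub_cube[f][y][x])
    # Normalise by the sub-cube volume W_b * H_b * F_b and scale by beta.
    return beta * total / (w_b * h_b * f_b)
```

A static sub-cube yields θ = 0, which corresponds to a slice parallel to the time axis as in conventional video tomography; stronger motion yields a larger angle.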

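The vector A_i of Section II.2 can be illustrated by assigning each local descriptor taken from the inclined tomography images of a space-time cube to its nearest visual word and counting the assignments. This is only a sketch under stated assumptions: the poster does not say which descriptor is used for the proposed signature, how the vocabulary is built, or how the weights a_i,j are computed, so the nearest-word assignment with squared Euclidean distance and the normalised counts below are illustrative choices, and the names `nearest_word` and `bovw_vector` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(descriptor: Vector, vocabulary: Sequence[Vector]) -> int:
    """Index j of the visual word v_j closest to the descriptor."""
    if not vocabulary:
        raise ValueError("the vocabulary must contain at least one visual word")
    best_j, best_d = -1, float("inf")
    for j, word in enumerate(vocabulary):
        # Squared Euclidean distance (assumed metric).
        d = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        if d < best_d:
            best_j, best_d = j, d
    return best_j


def bovw_vector(descriptors: Sequence[Vector], vocabulary: Sequence[Vector]) -> List[float]:
    """Build A_i = <a_i1, ..., a_iM> as a normalised visual-word histogram."""
    counts = [0.0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1.0
    total = sum(counts)
    # Assumption: weights are relative frequencies; other weightings are possible.
    return [c / total for c in counts] if total else counts
```

Since the shot dissimilarity of Section III is based on the cosine similarity, the normalisation step does not change the matching result; it only keeps the weights comparable across cubes with different numbers of descriptors.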
                                     IEEE International Conference on Multimedia and Expo (ICME), July 2012, Melbourne (Australia)

