KIT at MediaEval 2012 - Content-based Genre
Classification with Visual Cues
Tomas Semela
Makarand Tapaswi

MediaEval 2012 Workshop
Institute for Anthropomatics




KIT – University of the State of Baden-Wuerttemberg and
National Research Center of the Helmholtz Association     www.kit.edu
Motivation


      Rapid growth of digital
      multimedia data in the
      broadcast and web video
      domain



      Need for efficient
      automated indexing and
      content search



2           KIT at MediaEval 2012 – Content-based Genre Classification with Visual Cues   Institute for Anthropomatics
            MediaEval 2012 Workshop
Challenges



    Broadcast TV domain
      Channel archives
      Digital distribution
      Web offerings
      Arrangement in genres:
         Highly characteristic
         Low variance
         Clear boundaries

    Web video domain
      Video portals like YouTube (user content)
      Arrangement in categories:
         Resemblance to topics (Autos – Animals – Travel)
         Variation in production values and style
         Not limited to single-genre characteristics


Related work



      System from University of Torino, Italy
              Extract video features from aural, visual, cognitive and structural
              cues
              Neural network for classification



                    M. Montagnuolo, A. Messina, “Parallel Neural
                    Networks for Multimodal Video Genre Classification”,
                    Multimedia Tools and Appl., 41(1):125–159, 2009




4    05.10.2012   KIT at MediaEval 2012 – Content-based Genre Classification with Visual Cues   Institute for Anthropomatics
                  MediaEval 2012 Workshop
KIT System




      Visual feature extraction from keyframes
      SVM classification system
      Fusion of results with majority voting

Low-level visual features



      Color
              Color moments
              HSV histogram
              Color auto-correlogram
      Texture
              Wavelet texture
              Edge histogram
              Co-occurrence texture

      Global features for each video

            H. K. Ekenel, T. Semela, and R. Stiefelhagen, “Content-based video
            genre classification using multiple cues”, AIEMPro'10, pages 21-26,
            2010.
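
As an illustration of one color feature from the list above, the following is a minimal sketch of an HSV histogram computed per keyframe and averaged into one global vector per video. The bin counts (8 x 4 x 4) and the averaging over keyframes are assumptions made for this example; the slide does not state how the global feature is aggregated.

```python
import colorsys

def hsv_histogram(pixels, bins=(8, 4, 4)):
    """Normalized HSV histogram of an iterable of (r, g, b) pixels in 0..255."""
    h_bins, s_bins, v_bins = bins  # hypothetical bin counts
    hist = [0.0] * (h_bins * s_bins * v_bins)
    count = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # min() keeps a channel value of exactly 1.0 inside the last bin
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1.0
        count += 1
    return [x / count for x in hist] if count else hist

def video_feature(keyframes, bins=(8, 4, 4)):
    """Average the per-keyframe histograms into one vector per video (assumed aggregation)."""
    hists = [hsv_histogram(frame, bins) for frame in keyframes]
    n = len(hists)
    return [sum(column) / n for column in zip(*hists)]
```

Here each keyframe is simply a list of RGB tuples; a real pipeline would decode frames with an image library first.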




SIFT – For each keyframe

      Interest point detection
          Dense sampling
      Spatial-pyramid
          1x1 – 2x2 – 1x3
      SIFT descriptors
          SIFT
          rgbSIFT
          opponentSIFT
      Bag-of-visual-words
          Codebook (500-dim.)
          Codeword histograms
    K. E. A. van de Sande, T. Gevers, and C. G. M. Snoek,
    “Empowering Visual Categorization with the GPU”, IEEE
    Transactions on Multimedia, 13(1):60-70, 2011.
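
The bag-of-visual-words step with a spatial pyramid can be sketched as follows. This is a simplified illustration, not the GPU implementation cited above: descriptors are assumed to be already extracted, the codebook is assumed to be already trained, and reading "1x3" as one column by three rows (three horizontal bands) is an assumption.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    best, best_dist = 0, float("inf")
    for idx, word in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        if dist < best_dist:
            best, best_dist = idx, dist
    return best

# Pyramid levels as (columns, rows): 1x1, 2x2, 1x3 (orientation of 1x3 is assumed).
PYRAMID = ((1, 1), (2, 2), (1, 3))

def pyramid_histogram(points, codebook, width, height, pyramid=PYRAMID):
    """points: list of (x, y, descriptor). Returns the concatenated cell histograms."""
    k = len(codebook)
    feature = []
    for cols, rows in pyramid:
        cells = [[0.0] * k for _ in range(cols * rows)]
        for x, y, desc in points:
            # Map the sample position to a pyramid cell, clamping the border.
            cx = min(int(x * cols / width), cols - 1)
            cy = min(int(y * rows / height), rows - 1)
            cells[cy * cols + cx][nearest_codeword(desc, codebook)] += 1.0
        for cell in cells:
            total = sum(cell)
            # Each cell histogram is normalized to unit sum; empty cells stay zero.
            feature.extend(v / total if total else 0.0 for v in cell)
    return feature
```

With these three levels the keyframe feature has 8 cell histograms, so a 500-word codebook would give a 4000-dimensional vector per descriptor type under this sketch.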
Classification

      Training of one support vector machine (SVM) for each
      genre and each feature
              Binary classification (one vs. all)
              RBF kernel
              Cross-validation
              Fusion in decision level
              Majority voting (probability output)
              SIFT: keyframes classified individually, output averaged over video
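
The decision-level fusion can be sketched as below, starting from per-genre probability outputs that the one-vs-all SVMs are assumed to have produced already (training is not shown). Fusing by the mean of the probabilities is an assumption; the slide only says "majority voting (probability output)", which may refer to a different rule. The function names and feature names are hypothetical.

```python
def average_over_keyframes(keyframe_probs):
    """keyframe_probs: list of {genre: probability}, one dict per keyframe.

    Used for SIFT, where each keyframe is classified individually and the
    outputs are averaged over the video.
    """
    genres = keyframe_probs[0].keys()
    n = len(keyframe_probs)
    return {g: sum(p[g] for p in keyframe_probs) / n for g in genres}

def fuse_features(per_feature_probs):
    """per_feature_probs: {feature_name: {genre: probability}} for one video.

    Assumed fusion rule: mean probability over all features.
    """
    outputs = list(per_feature_probs.values())
    genres = outputs[0].keys()
    return {g: sum(o[g] for o in outputs) / len(outputs) for g in genres}

def rank_genres(fused):
    """Genres sorted from most to least likely."""
    return sorted(fused, key=fused.get, reverse=True)
```

A video's SIFT output would first pass through `average_over_keyframes` and then enter `fuse_features` alongside the global color and texture outputs.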




Domain Knowledge


      Video distribution in the development set:
              Autos 8 videos
              Technology ~ 500 videos


      Use this information in the final prediction of the category
      as a likelihood of the distribution on blip.tv:
         1. SVM scores for each video normalized to unit sum
         2. Divide these probabilities by the square root of the number of
            videos in the development set for each category to include the
            a-priori knowledge of the class distribution
         3. Finally, step one is repeated to obtain unit sum
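
The three steps above can be sketched as follows. The code follows the wording of the slide literally (division by the square root of the development-set count, so categories with more development videos are scaled down). The function names are hypothetical, and scores are assumed to be non-negative with a non-zero sum.

```python
import math

def normalize(scores):
    """Scale a {category: score} mapping to unit sum."""
    total = sum(scores.values())
    return {g: s / total for g, s in scores.items()}

def apply_prior(svm_scores, dev_counts):
    """svm_scores: {category: score} for one video.
    dev_counts: {category: number of development-set videos}.
    """
    probs = normalize(svm_scores)                        # step 1
    weighted = {g: p / math.sqrt(dev_counts[g])          # step 2
                for g, p in probs.items()}
    return normalize(weighted)                           # step 3
```

For example, two categories with equal SVM scores and development counts of 4 and 16 end up with probabilities of 2/3 and 1/3 respectively.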


Evaluation


       Blip.tv data with ~ 9550 clips
       Two configurations with/without prior domain knowledge


            Configuration     Run      Features          MAP
            No prior          run1     Visual            0.3008
            No prior          run2     SIFT              0.2329
            No prior          run3     Visual + SIFT     0.3499
            Prior             run4     Visual            0.3461
            Prior             run5     SIFT              0.1448
            Prior             run6     Visual + SIFT     0.3581
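
For readers unfamiliar with the measure, mean average precision (MAP) can be sketched as below: for each category the videos are ranked by score, average precision is computed over that ranking, and the values are averaged over categories. This is the textbook definition; the exact evaluation protocol of the task is not given on the slide, and the function names are hypothetical.

```python
def average_precision(ranked_ids, relevant):
    """ranked_ids: video ids sorted by descending score for one category.
    relevant: set of video ids that truly belong to the category.
    """
    hits, precision_sum = 0, 0.0
    for rank, vid in enumerate(ranked_ids, start=1):
        if vid in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, ground_truth):
    """rankings, ground_truth: dicts keyed by category."""
    aps = [average_precision(rankings[g], ground_truth[g]) for g in ground_truth]
    return sum(aps) / len(aps)
```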


Evaluation – Run 6




Evaluation




       Run6 (MAP):

       Top 4 categories:
               autos and vehicles (0.812)
               health (0.668)
               movies and television (0.602)
               religion (0.578)

       Worst 4 categories:
               citizen journalism (0.158)
               documentary (0.119)
               videoblogging (0.100)
               travel (0.010)




Conclusions & Future Work

      Conclusions
         Visual-based classification shows limitations for category tagging
         Few categories with satisfactory results
         SIFT: only slight improvement of overall results
         Prior domain knowledge improves results overall
      Future Work
         Temporal features
         Mid-level semantics
            Action Detection, Audio segmentation
         ASR & Metadata integration
         Individual classification approach & features for each genre

Thank you


