Temporal Action Localization in Untrimmed Videos via Multi-Stage CNNs
Slides by Alberto Montes
Computer Vision Group Reading Group,
[arXiv] [code]
Zheng Shou, Dongang Wang and Shih-Fu Chang
Introduction
Previous Work
Improved Dense Trajectory (iDT)
Fisher Vector
2D Convolution
Segment-CNN
Problem Definition
Video: frame, # frames
Annotations and candidates: action category, start and ending frame
Multi-Scale Segment Generation
◉ Each frame resized to 171x128 pixels
◉ Temporal sliding windows:
○ 16, 32, 64, 128, 256, 512 frames
○ 75% overlap
◉ Construct segment s by uniformly sampling 16 frames
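The segment generation above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the handling of trailing frames that do not fill a whole window (they are simply skipped here) is an assumption.

```python
def sliding_windows(num_frames, lengths=(16, 32, 64, 128, 256, 512), overlap=0.75):
    """Yield (start, end) frame windows, end exclusive, for every window length."""
    for length in lengths:
        if length > num_frames:
            continue  # assumption: windows longer than the video are skipped
        stride = int(length * (1 - overlap))  # 75% overlap -> stride = length / 4
        for start in range(0, num_frames - length + 1, stride):
            yield start, start + length


def sample_segment(start, end, num_samples=16):
    """Uniformly sample num_samples frame indices from the window [start, end)."""
    step = (end - start) / num_samples
    return [start + int(i * step) for i in range(num_samples)]


if __name__ == "__main__":
    for start, end in sliding_windows(64):
        print(start, end, sample_segment(start, end))
```

Every window, whatever its length, is reduced to 16 frame indices, so all segments have the same temporal size when fed to the network.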
Network Architecture
C3D Network
Training the Proposal and Classification Networks
◉ lr = 0.0001 (except fc8: lr = 0.01), momentum = 0.9, weight decay factor = 0.0005
◉ Divide lr by 2 every 10K iterations
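As a sketch of the step schedule described above: the helper names are hypothetical, and applying the same decay to the fc8 learning rate is an assumption the slide does not state.

```python
def learning_rate(iteration, base_lr=1e-4, drop_every=10_000, factor=2.0):
    """Step decay: divide base_lr by `factor` once every `drop_every` iterations."""
    return base_lr / factor ** (iteration // drop_every)


def layer_learning_rates(iteration):
    """Per-layer rates: fc8 starts at 0.01, every other layer at 0.0001.

    Assumption: fc8 follows the same decay schedule as the other layers.
    """
    return {
        "default": learning_rate(iteration),
        "fc8": learning_rate(iteration, base_lr=1e-2),
    }


if __name__ == "__main__":
    for it in (0, 10_000, 20_000, 30_000):
        print(it, layer_learning_rates(it))
```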
Proposal Network:
● fc8: 2 nodes
Classification Network:
● fc8: K+1 nodes
Localization Network
Add a custom loss function
Terms labelled in the loss: true class label, overlap sensitivity
Try to boost segments with high overlap
Works best with: λ = 1, α = 0.25
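The loss equation itself is not reproduced in the text above. A minimal sketch, assuming the form described in the Segment-CNN paper: the usual softmax loss plus λ times an overlap term, where the overlap term is ½·(P² / v^α − 1) for non-background segments, P is the predicted probability of the true class and v is the segment's overlap (IoU) with the ground truth. The function and argument names are illustrative only.

```python
import math


def localization_loss(probs, labels, overlaps, lam=1.0, alpha=0.25):
    """Softmax loss plus lam times an overlap-sensitive term, averaged over segments.

    probs:    per-segment class probabilities (index 0 = background)
    labels:   true class label of each segment
    overlaps: IoU of each segment with its ground-truth instance
    """
    n = len(probs)
    softmax_term = 0.0
    overlap_term = 0.0
    for p, k, v in zip(probs, labels, overlaps):
        p_true = p[k]  # probability assigned to the true class label
        softmax_term += -math.log(p_true)
        if k > 0:  # background segments do not contribute to the overlap term
            overlap_term += 0.5 * (p_true ** 2 / v ** alpha - 1.0)
    return softmax_term / n + lam * overlap_term / n


if __name__ == "__main__":
    probs = [[0.1, 0.9], [0.1, 0.9]]
    # Same prediction, different overlap: the low-overlap segment is penalised more.
    print(localization_loss(probs[:1], [1], [0.9]))
    print(localization_loss(probs[1:], [1], [0.5]))
```

Under this form, a confident prediction on a low-overlap segment costs more than the same prediction on a high-overlap one, which is how the loss favours segments with high overlap.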
Learning target:
Prediction and Post-processing
◉ Keep segments with P_pro > 0.7
◉ Remove background segments
◉ Multiply P_loc by the class-specific frequency of occurrence of each window length in the training data, to leverage window-length distribution patterns
◉ NMS based on P_loc to remove redundancy (NMS overlap threshold: θ - 0.1, with θ the overlap threshold used in evaluation)
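The post-processing steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the dictionary keys (`p_pro`, `p_loc`, `label`, `length_freq`) are hypothetical names, label 0 is taken to mean background, and NMS is applied per class.

```python
def temporal_iou(a, b):
    """Intersection over union of two temporal segments given as (start, end, ...)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def nms(segments, threshold):
    """Greedy NMS over (start, end, score) tuples, highest score first."""
    kept = []
    for seg in sorted(segments, key=lambda s: s[2], reverse=True):
        if all(temporal_iou(seg, k) <= threshold for k in kept):
            kept.append(seg)
    return kept


def postprocess(candidates, theta=0.5, min_p_pro=0.7):
    """Filter, rescore and de-duplicate candidate segments, grouped by class."""
    by_class = {}
    for c in candidates:
        if c["p_pro"] <= min_p_pro or c["label"] == 0:
            continue  # drop low-confidence proposals and background segments
        # Rescore with the window-length frequency observed for this class.
        score = c["p_loc"] * c["length_freq"]
        by_class.setdefault(c["label"], []).append((c["start"], c["end"], score))
    # NMS threshold is set slightly below the evaluation threshold theta.
    return {label: nms(segs, theta - 0.1) for label, segs in by_class.items()}
```

For example, two overlapping candidates of the same class with temporal IoU 0.6 collapse to the higher-scoring one when θ = 0.5, since 0.6 exceeds the NMS threshold of 0.4.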
Experiments
Datasets
MEXaction2
“Bull Charge Cape” and “Horse Riding” videos
77 hours of videos
Training set: 1336 instances
Validation set: 310 instances
Test set: 329 instances
THUMOS 2014
Temporal Action Detection Task
20 categories
Training set: 2755 videos
Validation set: 1010 videos and 3007 instances
Test set: 1574 videos and 3358 instances
Results MEXaction2
DTF: Dense Trajectory Features + SVM
Evaluation
Impact of individual networks:
Conclusions
Propose a multi-stage framework, Segment-CNN, to address temporal action localization
Thank you!
Questions?
