Temporal Action Localization in Untrimmed Videos via Multi-Stage CNNs

Slides by Alberto Montes
Computer Vision Group Reading Group,
[arXiv] [code]
Zheng Shou, Dongang Wang and Shih-Fu Chang

Previous Work
Improved Dense Trajectory (iDT)
Fisher Vector
2D Convolution

Problem Definition
Video: a sequence of frames (frame index, total # frames)
Annotations: action category, start and ending frame
Candidates: action category, start and ending frame

Multi-Scale Segment Generation
◉ Each frame resized to 171x128 pixels
◉ Temporal sliding windows:
○ 16, 32, 64, 128, 256, 512 frames
○ 75% overlap
◉ Construct segment s by uniformly sampling 16 frames
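The segment generation above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the end-exclusive frame ranges, and the exact rule for picking the 16 uniformly spaced frames are assumptions.

```python
def sliding_windows(num_frames, lengths=(16, 32, 64, 128, 256, 512), overlap=0.75):
    """Yield (start, end) frame ranges (end exclusive) for every window length."""
    for length in lengths:
        # 75% overlap between consecutive windows means a stride of length / 4.
        stride = int(length * (1 - overlap))
        for start in range(0, num_frames - length + 1, stride):
            yield start, start + length


def sample_segment(start, end, num_samples=16):
    """Pick num_samples frame indices spread uniformly over [start, end).

    Assumed sampling rule: take the first frame of each of num_samples
    equal-width bins.
    """
    step = (end - start) / num_samples
    return [start + int(i * step) for i in range(num_samples)]


# Hypothetical 64-frame video: only the 16-, 32- and 64-frame windows fit.
windows = list(sliding_windows(num_frames=64))
segment = sample_segment(0, 64)
```

Every window, whatever its length, is reduced to 16 frames, so all segments have the same temporal size before they reach the network.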

Network Architecture
C3D Network

Training Proposal and Classification Network
◉ lr=0.0001 except fc8 lr=0.01, momentum=0.9, weight decay factor=0.0005
◉ Drop lr by factor of 2 every 10K iterations
Proposal Network:
● fc8: 2 nodes
Classification Network:
● fc8: K+1 nodes
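The learning-rate schedule described above can be written as a small step function. This is a sketch under the stated hyperparameters; the function name and the `BASE_LR` dictionary are illustrative and not taken from the authors' code.

```python
def learning_rate(iteration, base_lr, drop_every=10_000, factor=2.0):
    """Step schedule: divide base_lr by `factor` once every `drop_every` iterations."""
    return base_lr / factor ** (iteration // drop_every)


# Base rates from the slide: 0.0001 for all layers except fc8, which uses 0.01.
BASE_LR = {"default": 1e-4, "fc8": 1e-2}

lr_default_at_25k = learning_rate(25_000, BASE_LR["default"])  # halved twice
lr_fc8_at_25k = learning_rate(25_000, BASE_LR["fc8"])          # halved twice
```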

Localization Network
Add a custom loss function
Terms: true class label, overlap sensitivity
Tries to boost segments with high overlap
Works best with: λ = 1, α = 0.25
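The formula itself did not survive on this slide. The sketch below assumes the loss is a softmax loss plus λ times an overlap term of the form 0.5 · (P² / v^α − 1), applied to non-background segments only, where P is the predicted probability of the true class and v is the segment's overlap with the ground truth. That form is recalled from the paper and should be checked against it; the function and argument names are illustrative.

```python
import math


def localization_loss(probs, labels, overlaps, lam=1.0, alpha=0.25):
    """Assumed form: softmax loss + lam * overlap loss, averaged over segments.

    probs[n][k] : predicted probability of class k for segment n
    labels[n]   : true class label (0 = background)
    overlaps[n] : overlap of segment n with its ground-truth instance
    """
    n = len(labels)
    softmax_loss = -sum(math.log(p[k]) for p, k in zip(probs, labels)) / n
    overlap_loss = sum(
        0.5 * (p[k] ** 2 / v ** alpha - 1.0)
        for p, k, v in zip(probs, labels, overlaps)
        if k > 0  # background segments contribute no overlap term
    ) / n
    return softmax_loss + lam * overlap_loss


# Same prediction, different overlap: the low-overlap segment is penalised more.
loss_high_overlap = localization_loss([[0.2, 0.8]], [1], [1.0])
loss_low_overlap = localization_loss([[0.2, 0.8]], [1], [0.5])
```

Under this assumed form, a confident score on a poorly overlapping segment costs more than the same score on a well-aligned one, which is how the loss favours segments with high overlap.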

Learning target:

Prediction and Post-processing
◉ Keep segments with P_pro > 0.7
◉ Remove background segments
◉ Multiply P_loc by the class-specific frequency of occurrence of each window length in the training data, to leverage window length distribution patterns
◉ NMS based on P_loc to remove redundancy (NMS overlap threshold: θ - 0.1, where θ is the overlap threshold used in evaluation)
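The post-processing steps above can be sketched as a single function. This is a minimal illustration, assuming each segment is a dictionary with hypothetical keys (`start`, `end`, `p_pro`, `label`, `p_loc`, `length_freq`), that label 0 means background, and that greedy NMS is run over all kept segments together. Whether the original applies NMS per class is not stated on the slide.

```python
def temporal_iou(a, b):
    """Intersection over union of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def postprocess(segments, nms_threshold, min_p_pro=0.7):
    """Filter, re-score and de-duplicate segments; returns (score, segment) pairs."""
    # 1. Keep confident proposals; 2. drop segments classified as background.
    kept = [s for s in segments if s["p_pro"] > min_p_pro and s["label"] != 0]
    # 3. Re-weight P_loc by the window-length frequency of the predicted class.
    scored = sorted(
        ((s["p_loc"] * s["length_freq"], s) for s in kept),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # 4. Greedy NMS: accept a segment only if it does not overlap a better one.
    result = []
    for score, s in scored:
        span = (s["start"], s["end"])
        if all(temporal_iou(span, (r["start"], r["end"])) <= nms_threshold
               for _, r in result):
            result.append((score, s))
    return result


# Hypothetical predictions; with evaluation threshold 0.5, NMS uses 0.5 - 0.1 = 0.4.
example = [
    {"start": 0, "end": 16, "p_pro": 0.9, "label": 1, "p_loc": 0.8, "length_freq": 0.5},
    {"start": 4, "end": 20, "p_pro": 0.9, "label": 1, "p_loc": 0.6, "length_freq": 0.5},
    {"start": 100, "end": 132, "p_pro": 0.5, "label": 1, "p_loc": 0.9, "length_freq": 0.5},
    {"start": 200, "end": 232, "p_pro": 0.95, "label": 0, "p_loc": 0.9, "length_freq": 0.5},
    {"start": 300, "end": 364, "p_pro": 0.8, "label": 2, "p_loc": 0.9, "length_freq": 0.2},
]
detections = postprocess(example, nms_threshold=0.4)
```

In this example the second segment is suppressed by the first (temporal IoU 0.6), the third fails the proposal threshold, and the fourth is background, so two detections remain.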

Datasets
MEXaction2
“Bull Charge Cape” and “Horse Riding” videos
77 hours of videos
Training set: 1336 instances
Validation set: 310 instances
Test set: 329 instances
THUMOS 2014
Temporal Action Detection Task
20 categories
Training set: 2755 videos
Validation set: 1010 videos and 3007 instances
Test set: 1574 videos and 3358 instances

Results: MEXaction2
DTF: Dense Trajectory Features + SVM

Evaluation
Impact of individual networks:

Conclusions
Propose a multi-stage framework, Segment-CNN, to address temporal action localization
