Human Action Recognition without Human
He Yun1,2, Soma Shirakabe1,2, Yutaka Satoh1,2, Hirokatsu Kataoka1
1Computer Vision Research Group, AIST, Japan
2Human-Centered Vision Lab., University of Tsukuba, Japan
Motion representation
•  Database: UCF101, HMDB51, ActivityNet
•  Approach: IDT, Two-Stream CNN
–  Databases and approaches are already well established in the field
Action Database
http://www.thumos.info/
The problem setting in action recognition
•  Video-level prediction
–  1 action-label prediction per input video
[Figure: input video -> Motion Descriptor -> "Tennis Swing"]
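One way to read "1 action-label prediction per input video" is sketched below. It assumes a model that produces one class-score vector per frame and averages them over the video. The averaging rule, the function name `predict_video_label`, and the example scores are illustrative assumptions, not taken from the slides.

```python
def predict_video_label(frame_scores, class_names):
    """Average per-frame class scores and return a single label for the video."""
    num_frames = len(frame_scores)
    num_classes = len(class_names)
    # Mean score of each class over all frames (assumed aggregation rule).
    mean_scores = [
        sum(frame[c] for frame in frame_scores) / num_frames
        for c in range(num_classes)
    ]
    best = max(range(num_classes), key=lambda c: mean_scores[c])
    return class_names[best]


# Hypothetical scores for 3 frames over 2 classes.
scores = [[0.6, 0.4], [0.7, 0.3], [0.4, 0.6]]
label = predict_video_label(scores, ["TennisSwing", "Other"])
```

The third frame alone would vote for "Other", but the video-level average still selects "TennisSwing".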
Dense Trajectories (DT) [Wang+, CVPR11]
•  Trajectory-based representation
–  A large number of trajectories
–  Feature description (HOG, HOF, MBH)
–  A codeword vector is generated
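The codeword step can be sketched as a bag-of-features histogram: each trajectory descriptor is assigned to its nearest codeword, and the counts form the video's vector. This is a minimal sketch under stated assumptions. The codebook is taken as given (it would normally be learned beforehand, for example with k-means), and the function name, the 2-D descriptors, and the L1 normalization are illustrative choices.

```python
def codeword_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword; return an L1-normalized histogram."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # Squared Euclidean distance from this descriptor to every codeword.
        dists = [sum((a - b) ** 2 for a, b in zip(d, w)) for w in codebook]
        counts[dists.index(min(dists))] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Hypothetical 2-D descriptors and a 2-word codebook.
codebook = [[0.0, 0.0], [1.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.0], [1.0, 0.8], [0.2, 0.1]]
hist = codeword_histogram(descriptors, codebook)
```

Two descriptors fall nearest to each codeword, so `hist` is `[0.5, 0.5]`.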
Two-Stream CNN [Simonyan+, NIPS14]
•  Spatial and temporal convolution
–  Spatial stream: from an RGB image
–  Temporal stream: from stacked optical flows
–  Score fusion: Average or SVM
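The "Average" option for score fusion can be sketched as a weighted mean of the class scores from the two streams. The function name `fuse_scores`, the `temporal_weight` parameter, and the example scores are assumptions made for illustration. The SVM-based fusion is not shown.

```python
def fuse_scores(spatial, temporal, temporal_weight=0.5):
    """Weighted average of per-class scores from the spatial and temporal streams."""
    w = temporal_weight
    return [(1 - w) * s + w * t for s, t in zip(spatial, temporal)]


# Hypothetical per-class scores from each stream for one video.
spatial_scores = [0.2, 0.8]
temporal_scores = [0.6, 0.4]
fused = fuse_scores(spatial_scores, temporal_scores)
predicted = max(range(len(fused)), key=lambda c: fused[c])
```

With equal weights, the fused scores are approximately 0.4 and 0.6, so class index 1 is predicted.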
Is background enough to classify actions?
•  RGB input is too strong!
–  The two-stream CNN [Simonyan+, NIPS14] reported that the spatial stream recognizes actions better than expected
•  72.4% with spatial-stream (RGB) @UCF101
•  “Human Action Recognition without Human”
Without Human?
•  Can human action recognition be done using only the motion of the background?
[Figure: Motion Descriptor -> "Tennis Swing"; Motion Descriptor -> "Tennis Swing?"]
Detailed setting of w/ and w/o Human
•  With and without human setting
–  Without human setting: center-blind (center-masked) images from UCF101
–  With human setting: inverse of the without human setting
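[Figure: Without Human Setting: I(x, y) * f(x, y) = I'(x, y), where the filter f(x, y) masks the central region (1/2 of the width and 1/2 of the height, with 1/4 margins on each side)]
[Figure: With Human Setting: I(x, y) * f(x, y) = I'(x, y), where f(x, y) is the inverse filter that keeps only the central region]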
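The two settings can be sketched as a binary filter f(x, y) applied to each frame I(x, y). The sketch assumes that `*` is an element-wise product, that frames are grayscale lists of rows, and that the masked box spans the central half of the width and height. The function names are hypothetical.

```python
def center_mask(height, width):
    """f(x, y) for the without human setting: 0 in the central box, 1 elsewhere."""
    top, bottom = height // 4, height - height // 4
    left, right = width // 4, width - width // 4
    return [
        [0 if top <= y < bottom and left <= x < right else 1 for x in range(width)]
        for y in range(height)
    ]


def invert(mask):
    """f(x, y) for the with human setting: keep only the central box."""
    return [[1 - m for m in row] for row in mask]


def apply_mask(image, mask):
    """I'(x, y) = I(x, y) * f(x, y), taken element-wise (an assumption)."""
    return [[p * m for p, m in zip(row, mrow)] for row, mrow in zip(image, mask)]


# Hypothetical 4x4 grayscale frame with constant intensity 9.
frame = [[9] * 4 for _ in range(4)]
mask = center_mask(4, 4)
without_human = apply_mask(frame, mask)
with_human = apply_mask(frame, invert(mask))
```

For the 4x4 frame, `without_human` zeroes the central 2x2 block and keeps the border, while `with_human` does the opposite.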
Framework
–  Baseline: Very deep two-stream CNN [Wang+, arXiv15]
–  Two different scenarios: without human and with human
Exploration experiment
•  @UCF101
–  UCF101 pre-trained model with very deep two-stream CNN
–  With/Without Human Setting
Visual results (Full Image)
Visual results (Without Human Setting)
Without Human
•  The concept of "Human Action Recognition without Human"
–  The accuracies are very close
•  The with human setting is +9.49% better than the without human setting
–  Current motion representations rely heavily on the background
Future work
•  This result is a suggestive reality
–  We must accept it in order to build better motion representations
–  Pure motion representation is an urgent task!
•  More sophisticated approach
•  Human only motion

【ECCV 2016 BNMW】Human Action Recognition without Human
