Frontiers of
Human Activity Analysis

                J. K. Aggarwal
               Michael S. Ryoo
                 Kris M. Kitani
Overview

        Human Activity Recognition



Single-layer:  Space-time, Sequential
Hierarchical:  Statistical, Syntactic, Descriptive


                                                     2
Motivation

How do we interpret a sequence of actions?




                                             3
Hierarchy
Hierarchy implies decomposition into sub-parts




                                                 4
Now we’ll cover…

        Human Activity Recognition



Single-layer:  Space-time, Sequential
Hierarchical:  Statistical, Syntactic, Descriptive


                                                     5
Syntactic
Approaches


             6
Syntactic Models


Activities as strings of symbols.




What is the underlying structure?


                                    7
Early applications to Vision
Attributed Grammar-A Tool for Combining Syntactic and Statistical Approaches to Pattern Recognition.
Tsai and Fu 1980.




                                                                                                       8
Hierarchical syntactic approach
  Useful for activities with:
    Deep hierarchical structure
    Repetitive (cyclic) structure


  Not for
    Systems with a lot of errors and uncertainty
    Activities with shallow structure



                                                    9
Basics
                  Context-Free Grammar



      Generic Language            Natural Languages


       Start Symbol (S)               Sentences


  Set of Terminal Symbols (T)           Words


Set of Non-Terminal Symbols (N)     Parts of Speech


  Set of Production Rules (P)        Syntax Rules



                                                      10
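The correspondence above can be made concrete by treating a grammar as a weighted rewriting system and sampling "sentences" from it. The sketch below is illustrative code (not from the tutorial) using the swat/flies grammar that appears on the next slide:

```python
import random

# PCFG from the "Parsing with a grammar" slide: each non-terminal maps to
# (right-hand side, probability) alternatives; lowercase strings are terminals.
RULES = {
    "S":    [(("NP", "VP"), 0.8), (("VP",), 0.2)],
    "NP":   [(("NOUN",), 0.4), (("NOUN", "PP"), 0.4), (("NOUN", "NP"), 0.2)],
    "VP":   [(("VERB",), 0.3), (("VERB", "NP"), 0.3),
             (("VERB", "PP"), 0.2), (("VERB", "NP", "PP"), 0.2)],
    "PP":   [(("PREP", "NP"), 1.0)],
    "PREP": [(("like",), 1.0)],
    "VERB": [(("swat",), 0.2), (("flies",), 0.4), (("like",), 0.4)],
    "NOUN": [(("swat",), 0.05), (("flies",), 0.45), (("ants",), 0.5)],
}

def sample(symbol, rng):
    """Expand a symbol top-down by sampling productions; terminals pass through."""
    if symbol not in RULES:                      # terminal symbol
        return [symbol]
    rhss, probs = zip(*RULES[symbol])
    rhs = rng.choices(rhss, weights=probs)[0]    # pick one production
    return [w for s in rhs for w in sample(s, rng)]

rng = random.Random(0)
for _ in range(3):
    print(" ".join(sample("S", rng)))
```

Every string the sampler emits is, by construction, grammatical; recognition runs the process in reverse, asking which derivation best explains an observed string.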
Parsing with a grammar
S → NP VP         (0.8)           PP → PREP NP   (1.0)
S → VP            (0.2)           PREP → like    (1.0)
NP → NOUN         (0.4)           VERB → swat    (0.2)
NP → NOUN PP      (0.4)           VERB → flies   (0.4)
NP → NOUN NP      (0.2)           VERB → like    (0.4)
VP → VERB         (0.3)           NOUN → swat    (0.05)
VP → VERB NP      (0.3)           NOUN → flies   (0.45)
VP → VERB PP      (0.2)           NOUN → ants    (0.5)
VP → VERB NP PP   (0.2)




 swat                     flies          like      ants
                                                          11
Parsing with a grammar
S → NP VP            (0.8)               PP → PREP NP      (1.0)
S → VP               (0.2)               PREP → like       (1.0)
NP → NOUN            (0.4)               VERB → swat       (0.2)
NP → NOUN PP         (0.4)               VERB → flies      (0.4)
NP → NOUN NP         (0.2)               VERB → like       (0.4)
VP → VERB            (0.3)               NOUN → swat       (0.05)
VP → VERB NP         (0.3)               NOUN → flies      (0.45)
VP → VERB PP         (0.2)               NOUN → ants       (0.5)
VP → VERB NP PP      (0.2)

S (0.8)
├─ NP (0.2)
│  ├─ NOUN (0.05) → swat
│  └─ NP (0.4)
│     └─ NOUN (0.45) → flies
└─ VP (0.3)
   ├─ VERB (0.4) → like
   └─ NP (0.4)
      └─ NOUN (0.5) → ants

(each node is annotated with the probability of the production applied at it)
                                                                            12
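The probability of this parse is just the product of the probabilities of the productions it uses. A minimal sketch (trees as nested tuples; only the rules this parse needs are listed):

```python
# Score a parse tree under the slide's PCFG: multiply the probability of every
# production used. Trees are nested tuples: (label, child, child, ...); a
# (label, "word") pair is a pre-terminal.
RULE_P = {
    ("S", ("NP", "VP")): 0.8,  ("NP", ("NOUN", "NP")): 0.2,
    ("NP", ("NOUN",)): 0.4,    ("VP", ("VERB", "NP")): 0.3,
    ("NOUN", ("swat",)): 0.05, ("NOUN", ("flies",)): 0.45,
    ("NOUN", ("ants",)): 0.5,  ("VERB", ("like",)): 0.4,
}

def tree_prob(tree):
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):   # pre-terminal
        return RULE_P[(label, (children[0],))]
    rhs = tuple(c[0] for c in children)
    p = RULE_P[(label, rhs)]
    for c in children:
        p *= tree_prob(c)
    return p

# The parse of "swat flies like ants" drawn on the slide:
parse = ("S",
         ("NP", ("NOUN", "swat"), ("NP", ("NOUN", "flies"))),
         ("VP", ("VERB", "like"), ("NP", ("NOUN", "ants"))))
print(tree_prob(parse))   # 0.8*0.2*0.05*0.4*0.45*0.3*0.4*0.4*0.5 ≈ 3.456e-05
```

A probabilistic parser (e.g. CKY or Earley) searches over all such trees for the one with the highest score.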
Video analysis with CFGs

     The “Inverse Hollywood problem”:
     From video to scripts and storyboards via causal analysis.
     Brand 1997




     Action Recognition using Probabilistic Parsing.
     Bobick and Ivanov 1998




     Recognizing Multitasked Activities from Video using
     Stochastic Context-Free Grammar.
     Moore and Essa 2001




                                                                  13
CFG for human activities




enter   detach   leave   enter   detach   attach    touch    touch     detach      attach       leave




                                                   M. Brand. The "Inverse Hollywood Problem":
                                                      From video to scripts and storyboards
                                                         via causal analysis. AAAI 1997.




                                                                                                        14
Parse tree
[Parse-tree figure] SCENE (Open up a PC) decomposes into IN, ACTION (Open PC), ACTION (unscrew), and OUT; intermediate nodes (ADD, MOVE, MOTION, REMOVE) bottom out in the primitive string:

enter   detach   leave   enter   detach   attach   touch   touch   detach   attach   leave

•  Deterministic low-level primitive detection
•  Deterministic parsing



M. Brand. The "Inverse Hollywood Problem": From video to scripts and storyboards via causal analysis. AAAI 1997.
                                                                                                                                         15
Stochastic CFGs
Action Recognition using Probabilistic Parsing.
          Bobick and Ivanov 1998




                                                  16
Gesture analysis with CFGs
                                       Primitive recognition with HMMs




Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
                                                                         17
left-right




Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
                                                                                18
up-down




Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
                                                                            19
right-left




Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
                                                                                20
down-up




Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
                                                                            21
Parse Tree
S
└─ RH
   ├─ TOP
   │  └─ LR → left-right
   ├─ UD → up-down
   ├─ BOT
   │  └─ RL → right-left
   └─ DU → down-up

                                               22
Errors
                   Likelihood value over time (not discrete symbols)


          HMM a

          HMM b



                                                  Errors are inevitable…
                      but the grammar acts as a top-down constraint


Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
                                                                           23
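The point of this slide is that the low-level HMM detectors emit a likelihood trace over time, not hard symbols. A standalone sketch of that trace, using the forward algorithm on two toy discrete HMMs (all numbers hypothetical, not from the paper):

```python
import math

def forward_loglik(obs, pi, A, B):
    """Running log-likelihood log p(o_1..o_t) of a discrete HMM, per frame t.
    pi: initial state probs, A[i][j]: transition, B[i][o]: emission."""
    alpha = [pi[i] * B[i][obs[0]] for i in range(len(pi))]
    out = [math.log(sum(alpha))]
    for o in obs[1:]:
        alpha = [B[j][o] * sum(alpha[i] * A[i][j] for i in range(len(pi)))
                 for j in range(len(pi))]
        out.append(math.log(sum(alpha)))
    return out

# Two hypothetical 2-state detectors: HMM "a" prefers symbol 0, HMM "b" symbol 1.
pi  = [0.5, 0.5]
A   = [[0.9, 0.1], [0.1, 0.9]]
B_a = [[0.8, 0.2], [0.6, 0.4]]
B_b = [[0.2, 0.8], [0.4, 0.6]]

obs = [0, 0, 0, 1, 1]
print("HMM a:", [round(v, 2) for v in forward_loglik(obs, pi, A, B_a)])
print("HMM b:", [round(v, 2) for v in forward_loglik(obs, pi, A, B_b)])
```

Neither trace is ever exactly zero, so competing detectors always overlap; the grammar's job is to arbitrate between them top-down.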
Dealing with uncertainty & errors
        Stolcke-Earley (probabilistic) parser
        SKIP rules to deal with insertion errors


               HMM a


               HMM b


               HMM c



Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
                                                                         24
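The SKIP idea lets the parser ignore spurious inserted symbols at a cost. As a standalone sketch of that idea (not Bobick and Ivanov's parser), here is a dynamic program that matches an observed symbol stream to an expected sequence, charging a penalty for each skipped insertion:

```python
def match_with_skips(observed, expected, skip_cost=1.0):
    """Min-cost alignment that must consume all of `expected` in order, and may
    skip extra observed symbols at `skip_cost` each (insertion errors).
    Returns total skip cost, or None if `expected` cannot be matched."""
    INF = float("inf")
    m = len(expected)
    # dp[j] = min cost to have matched expected[:j] with the observed prefix so far
    dp = [0.0] + [INF] * m
    for sym in observed:
        new = [dp[0] + skip_cost] + [INF] * m         # j=0: must skip sym
        for j in range(1, m + 1):
            new[j] = dp[j] + skip_cost                # option 1: skip sym
            if sym == expected[j - 1] and dp[j - 1] < new[j]:
                new[j] = dp[j - 1]                    # option 2: consume sym
        dp = new
    return dp[m] if dp[m] < INF else None

# "x" is a spurious detection inside the gesture sequence:
print(match_with_skips(["LR", "x", "UD", "RL", "DU"],
                       ["LR", "UD", "RL", "DU"]))     # 1.0 (one skip)
print(match_with_skips(["LR", "UD"], ["LR", "RL"]))   # None (RL never observed)
```

In the SCFG setting the same effect is achieved grammatically, by adding low-probability SKIP productions rather than an explicit alignment cost.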
SCFG for Blackjack
Recognizing Multitasked Activities from Video using
       Stochastic Context-Free Grammar.
             Moore and Essa 2001




  •  Deals with more complex activities
  •  Deals with more error types

                                                      25
extracting primitive actions




                               26
Game grammar




Recognizing Multitasked Activities from Video using Stochastic Context-Free Grammar. Moore and Essa 2001
                                                                                                           27
Dealing with errors

  Ungrammatical strings cause parser to fail
  Account for errors with multiple hypotheses
     Insertion, deletion, substitution

  Issues
     How many errors should we tolerate?
     Potentially exponential hypothesis space
     Ungrammatical strings: vision problem or illegal
      activity?


                                                         28
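The "potentially exponential hypothesis space" is easy to see by enumeration: even one insertion, deletion, or substitution spawns dozens of alternative strings, and the count multiplies with each additional error. A small illustrative sketch:

```python
def one_error_variants(s, alphabet):
    """All strings reachable from s by exactly one insertion, deletion,
    or substitution over the given alphabet."""
    out = set()
    for i in range(len(s) + 1):                      # insertions
        for a in alphabet:
            out.add(s[:i] + a + s[i:])
    for i in range(len(s)):                          # deletions
        out.add(s[:i] + s[i + 1:])
    for i in range(len(s)):                          # substitutions
        for a in alphabet:
            if a != s[i]:
                out.add(s[:i] + a + s[i + 1:])
    return out

s, alphabet = "abcd", "abcd"
h1 = one_error_variants(s, alphabet)
print(len(h1))                       # dozens of hypotheses after one error
h2 = {v for u in h1 for v in one_error_variants(u, alphabet)}
print(len(h2))                       # hundreds after two -- growth compounds
```

Bounding the number of tolerated errors (or pruning by parse probability) is what keeps this search tractable.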
Observations
  CFGs good for structured activities
  Can incorporate uncertainty in observations
  Natural contextual prior for recognizing errors

  Not clear how to deal with errors
  Assumes ‘good’ action classifiers
  Need to define grammar manually

     Can we learn the grammar from data?

                                                     29
Heuristic Grammatical Induction

                                 1.  Lexicon learning
                                     •  Learn HMMs
                                     •  Cluster HMMs

                                 2.  Convert video to string

                                 3.  Learn Grammar




  Unsupervised Analysis of Human Gestures. Wang et al 2001
                                                               30
COMPRESSIVE
  a b c d a b c d b c d a b a b
                For each repeated substring, record its length and number of
                occurrences; deletion of the substring's occurrences is paired
                with insertion of a new rule, and each occurrence is rewritten
                as the new symbol.




On-Line and Off-Line Heuristics for Inferring Hierarchies of Repetitions in Sequences.
                                Nevill-Manning 2000.
                                                                                           31
example

S → a b c d a b c d b c d a b a b          (DL = 16)

A → b c d
S → a A a A A a b a b                      (DL = 14)

     Repeat until the compression gain becomes 0.
                                                    32
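The example above can be reproduced with a small greedy induction loop in the spirit of COMPRESSIVE. This is a sketch, not the paper's algorithm: DL is taken as one unit per rule plus one per right-hand-side symbol (which matches the slide's DL = 16 → 14), and occurrences are counted greedily left to right:

```python
def dl(rules):
    """Description length: one unit per rule plus one per right-hand symbol."""
    return sum(1 + len(rhs) for rhs in rules.values())

def best_substring(seq):
    """Most compressive repeated substring: replacing k non-overlapping
    occurrences of an n-symbol substring saves k*n symbols and costs k new
    symbols plus an (n+1)-unit rule, so the gain is k*n - k - (n+1)."""
    best, best_gain = None, 0
    for n in range(2, len(seq) // 2 + 1):
        for i in range(len(seq) - n + 1):
            sub = tuple(seq[i:i + n])
            k, j = 0, 0                       # greedy non-overlapping count
            while j <= len(seq) - n:
                if tuple(seq[j:j + n]) == sub:
                    k, j = k + 1, j + n
                else:
                    j += 1
            gain = k * n - k - (n + 1)
            if k >= 2 and gain > best_gain:
                best, best_gain = sub, gain
    return best

def compress(string):
    """Greedy COMPRESSIVE-style induction: replace the best substring with a
    new non-terminal until no replacement shortens the grammar."""
    rules, next_sym = {}, iter("ABCDEFGH")
    seq = list(string)
    while (sub := best_substring(seq)) is not None:
        sym = next(next_sym)
        rules[sym] = list(sub)
        out, j = [], 0
        while j < len(seq):
            if tuple(seq[j:j + len(sub)]) == sub:
                out.append(sym)
                j += len(sub)
            else:
                out.append(seq[j])
                j += 1
        seq = out
    rules["S"] = seq
    return rules

g = compress("abcdabcdbcdabab")
print(g)        # {'A': ['b','c','d'], 'S': ['a','A','a','A','A','a','b','a','b']}
print(dl(g))    # 14, down from 16 for the flat string
```

On the slide's string the best first step is A → b c d (gain 2), after which no substitution yields a positive gain, exactly as shown.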
Critical assumption
  No uncertainty
  No errors
    insertions
    deletions
    substitutions


 Can we learn grammars despite errors?


                                         33
Learning with noise
 Can we learn the basic structure of a transaction?




Recovering the basic structure of human activities from
 noisy video-based symbol strings. Kitani et al 2008.
                                                          34
extracting primitives




Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                               35
Underlying structure?

  D → a x b y c a b x c y a b c x




Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                               36
Underlying structure?

  D → a x b y c a b x c y a b c x


          D → a b c   a b c   a b c        (noise symbols x, y removed)



Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                                        37
Underlying structure?

  D → a x b y c a b x c y a b c x


          D → a b c   a b c   a b c        (noise symbols x, y removed)

          A → a b c                        D → A A A
              Simple grammar                   Efficient compression


Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                                         39
Information Theory Problem (MDL)

              Ĝ = argmin_G { DL(G) + DL(D|G) }
                  (model complexity + data compression)




Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                                            40
Information Theory Problem (MDL)

              Ĝ = argmin_G { DL(G) + DL(D|G) }
                  (model complexity + data compression)

              DL(G) = −log p(G)                        (model complexity)
                    = −log p(θ_S, G_S)
                    = −log p(θ_S | G_S) − log p(G_S)
                    = DL(θ_S | G_S) + DL(G_S)
                      (grammar parameters + grammar structure)




Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                                                   41
Information Theory Problem (MDL)

              Ĝ = argmin_G { DL(G) + DL(D|G) }
                  (model complexity + data compression)

              DL(G) = −log p(G)                        (model complexity)
                    = −log p(θ_S, G_S)
                    = −log p(θ_S | G_S) − log p(G_S)
                    = DL(θ_S | G_S) + DL(G_S)
                      (grammar parameters + grammar structure)

              DL(D|G) = −log p(D|G)
                        (data compression = negative log-likelihood,
                         computed with inside probabilities)

Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                                                   42
Minimum Description Length




Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                               43
Minimum Description Length




Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                               44
Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                               45
Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                               46
Conclusions
  Possible to learn basic structure
  Robust to errors
   (insertion, deletion, substitution)

  Need a lot of training data
  High computational complexity




                                         47
Bayesian Approaches




Infinite Hierarchical Hidden Markov Models.   The Infinite PCFG using Hierarchical Dirichlet Processes.
              Heller et al 2009.                               Liang et al 2007.




                                                                                                        48
Take home message
              Hierarchical Syntactic Models


  Useful for activities with:
    Deep hierarchical structure
    Repetitive (cyclic) structure


  Not for
    Systems with a lot of errors and uncertainty
    Activities with weak structure



                                                    49
Statistical
Approaches


               50
Using a hierarchical statistical approach

  Use when
    Low-level action detectors are noisy
    Structure of activity is sequential
    Integrating dynamics


  Not for
    Activities with deep hierarchical structure
    Activities with complex temporal structure


                                                   51
Statistical (State-based) Model
Activities as a stochastic path.




                   What are the underlying dynamics?

                                                       52
Characteristics
  Strong Markov assumption
  Strong dynamics prior
  Robust to uncertainty

  Modifications to account for
    Hierarchical structure
    Concurrent structure



                                  53
Hierarchical activities
      Problem:
  How do we model
hierarchical activities?

                           combinatorial state space!




      Solution:
 “stack” actions for
hierarchical activities

                                                      54
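The "stacking" idea can be sketched generatively: each abstract top-level state owns a sub-chain over primitives, and control pops back to the parent when the sub-chain finishes. The model below is a hypothetical toy (states, primitives, and probabilities invented for illustration), not Nguyen et al.'s HHMM:

```python
import random

# Hypothetical two-level model: each abstract state owns a sub-Markov-chain
# over primitives; when the sub-chain reaches END, control pops back to the
# parent, which moves to the next abstract state.
SUB = {
    "enter_room": {"approach":  [("open_door", 1.0)],
                   "open_door": [("walk_in", 1.0)],
                   "walk_in":   [("END", 1.0)]},
    "use_pc":     {"sit":   [("type", 1.0)],
                   "type":  [("type", 0.6), ("stand", 0.4)],
                   "stand": [("END", 1.0)]},
}
SUB_START = {"enter_room": "approach", "use_pc": "sit"}
TOP = {"enter_room": "use_pc", "use_pc": None}   # linear parent chain

def sample_activity(rng):
    """Run the parent chain, descending into each state's sub-chain (the stack)."""
    trace, state = [], "enter_room"
    while state is not None:
        prim = SUB_START[state]
        while prim != "END":
            trace.append((state, prim))
            nxts, ws = zip(*SUB[state][prim])
            prim = rng.choices(nxts, weights=ws)[0]
        state = TOP[state]                       # pop back to the parent
    return trace

for s, p in sample_activity(random.Random(0)):
    print(f"{s:>10} / {p}")
```

Inference inverts this process: the HHMM avoids the combinatorial flat state space by sharing sub-models across parents.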
Hierarchical hidden Markov model




 Learning and Detecting Activities from Movement Trajectories Using the
        Hierarchical Hidden Markov Models. Nguyen et al 2005
                                                                          55
Context-free activity grammar




Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Models. Nguyen et al 2005
                                                                                                                              56
Context-free activity grammar




Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Models. Nguyen et al 2005
                                                                                                                              57
Observations
  Tree structures useful for hierarchies
  Tight integration of trajectories with
   abstract semantic states

  Activities are not always a single
   sequence
    (i.e., they sometimes happen in parallel)



                                              58
Concurrent activities
     Problem:
 How do we model
concurrent activities?

                         combinatorial state space!




     Solution:
“stand-up” model for
 concurrent activities

                                                    59
Propagation network




Propagation Networks for Recognition of Partially Ordered Sequential Action. Shi et al 2004
                                                                                              60
Propagation Networks for Recognition of Partially Ordered Sequential Action. Shi et al 2004
                                                                                              61
temporal inference




Inference by standing the state transition model on its side
                                                               62
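The core constraint a propagation network enforces is a partial order over actions: some pairs must be ordered, others may interleave freely. A standalone sketch of that constraint check (actions and constraints are hypothetical, not from Shi et al.):

```python
def respects_partial_order(sequence, before):
    """True iff every (a, b) constraint holds: the first occurrence of a
    precedes the first occurrence of b in the observed sequence."""
    first = {}
    for i, act in enumerate(sequence):
        first.setdefault(act, i)
    return all(a in first and b in first and first[a] < first[b]
               for a, b in before)

# Hypothetical partial order: water must be boiled, and the cup fetched,
# before pouring -- but boiling and fetching may happen in either order.
constraints = [("boil_water", "pour"), ("fetch_cup", "pour")]

print(respects_partial_order(
    ["fetch_cup", "boil_water", "pour"], constraints))   # True
print(respects_partial_order(
    ["boil_water", "fetch_cup", "pour"], constraints))   # True (order swapped)
print(respects_partial_order(
    ["pour", "boil_water", "fetch_cup"], constraints))   # False
```

A propagation network goes further by scoring such sequences probabilistically under noisy observations, rather than accepting or rejecting them outright.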
Inferring structure (storylines)
              Understanding Videos, Constructing Plots –
  Learning a Visually Grounded Storyline Model from Annotated Videos
             Gupta, Srinivasan, Shi and Davis CVPR 2009




            Learn AND-OR graphs from weakly labeled data

                                                                       63
Scripts from structure




Understanding Videos, Constructing Plots - Learning a Visually Grounded Storyline Model from Annotated Videos.
Gupta, Srinivasan, Shi and Davis CVPR 2009
                                                                                                                 64
Take home message
              Hierarchical statistical model


  Use when
    Low-level action detectors are noisy
    Structure of activity is sequential
    Integrating dynamics


  Not for
    Activities with deep hierarchical structure
    Activities with complex temporal structure


                                                   65
Contrasting hierarchical approaches


                 Actions as:             Activities as:   Model      Characteristic

 Statistical     probabilistic states    paths            DBN        Robust to uncertainty

 Syntactic       discrete symbols        strings          CFG        Describes deep hierarchy

 Descriptive     logical relationships   sets             CFG, MLN   Encodes complex logic



                                                                         66
References
                          (not included in ACM survey paper)


  W. Tsai and K.S. Fu. Attributed Grammar-A Tool for Combining Syntactic and
   Statistical Approaches to Pattern Recognition. SMC 1980.
  M. Brand. The "Inverse Hollywood Problem": From video to scripts and
   storyboards via causal analysis. AAAI 1997.
  T. Wang, H. Shum, Y. Xu, N. Zheng. Unsupervised Analysis of Human Gestures.
   PCM 2001.
  C.G. Nevill-Manning, I.H. Witten. On-Line and Off-Line Heuristics for
   Inferring Hierarchies of Repetitions in Sequences. Proc. IEEE 2000.
  K. Heller, Y.W. Teh and D. Gorur. Infinite Hierarchical Hidden Markov Models.
   AISTATS 2009.
  P. Liang, S. Petrov, M. Jordan, D. Klein. The Infinite PCFG using
   Hierarchical Dirichlet Processes. EMNLP 2007.
  A. Gupta, N. Srinivasan, J. Shi and L. Davis. Understanding Videos,
   Constructing Plots - Learning a Visually Grounded Storyline Model from
   Annotated Videos. CVPR 2009.


                                                                                67


cvpr2011: human activity recognition - part 4: syntactic

  • 1. Frontiers of Human Activity Analysis J. K. Aggarwal Michael S. Ryoo Kris M. Kitani
  • 2. Overview Human Activity Recognition Single-layer Hierarchical Space-time Statistical Syntactic Sequential Descriptive 2
  • 3. Motivation How do we interpret a sequence of actions? 3
  • 5. Now we’ll cover… Human Activity Recognition Single-layer Hierarchical Space-time Statistical Syntactic Sequential Descriptive 5
  • 7. Syntactic Models Activities as strings of symbols. What is the underlying structure? 7
  • 8. Early applications to Vision Tsai and Fu 1980. Attributed Grammar-A Tool for Combining Syntactic and Statistical Approaches to Pattern Recognition. 8
  • 9. Hierarchical syntactic approach   Useful for activities with:   Deep hierarchical structure   Repetitive (cyclic) structure   Not for   Systems with a lot of errors and uncertainty   Activities with shallow structure 9
  • 10. Basics Context-Free Grammar Generic Language Natural Languages Start Symbol (S) Sentences Set of Terminal Symbols (T) Words Set of Non-Terminal Symbols (N) Parts of Speech Set of Production Rules (P) Syntax Rules 10
  • 11. Parsing with a grammar S → NP VP (0.8) PP → PREP NP (1.0) S → VP (0.2) PREP → like (1.0) NP → NOUN (0.4) VERB → swat (0.2) NP → NOUN PP (0.4) VERB → flies (0.4) NP → NOUN NP (0.2) VERB → like (0.4) VP → VERB (0.3) NOUN → swat (0.05) VP → VERB NP (0.3) NOUN → flies (0.45) VP → VERB PP (0.2) NOUN → ants (0.5) VP → VERB NP PP (0.2) swat flies like ants 11
  • 12. Parsing with a grammar S → NP VP (0.8) PP → PREP NP (1.0) S → VP (0.2) PREP → like (1.0) NP → NOUN (0.4) VERB → swat (0.2) NP → NOUN PP (0.4) VERB → flies (0.4) NP → NOUN NP (0.2) VERB → like (0.4) VP → VERB (0.3) NOUN → swat (0.05) VP → VERB NP (0.3) NOUN → flies (0.45) VP → VERB PP (0.2) NOUN → ants (0.5) VP → VERB NP PP (0.2) S NP (0.8) VP (0.2) (0.3) NOUN NP NP (0.4) (0.4) NOUN VERB NOUN (0.05) (0.45) (0.4) (0.5) swat flies like ants 12
  • 13. Video analysis with CFGs The “Inverse Hollywood problem”: From video to scripts and storyboards via causal analysis. Brand 1997 Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 Recognizing Multitasked Activities from Video using Stochastic Context-Free Grammar. Moore and Essa 2001 13
  • 14. CFG for human activities enter detach leave enter detach attach touch touch detach attach leave M. Brand. The "Inverse Hollywood Problem": From video to scripts and storyboards via causal analysis. AAAI 1997. 14
  • 15. Parse tree SCENE (Open up a PC) IN ACTION (Open PC) ACTION (unscrew) OUT OUT IN MOVE REMOVE ADD ADD MOTION MOTION enter detach leave enter detach attach touch touch detach attach leave •  Deterministic low-level primitive detection •  Deterministic parsing M. Brand. The "Inverse Hollywood Problem": From video to scripts and storyboards via causal analysis. AAAI 1997. 15
  • 16. Stochastic CFGs Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 16
  • 17. Gesture analysis with CFGs Primitive recognition with HMMs Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 17
  • 18. left-right Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 18
  • 19. up-down Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 19
  • 20. right-left Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 20
  • 21. down-up Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 21
• 22. Parse tree
  [parse-tree figure: S → RH, with nonterminals TOP, BOT, UD, DU, LR, RL over the gesture primitives left-right, up-down, right-left, down-up] 22
  • 23. Errors Likelihood value over time (not discrete symbols) HMM a HMM b Errors are inevitable… but the grammar acts as a top-down constraint Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 23
• 24. Dealing with uncertainty & errors   Stolcke-Earley (probabilistic) parser   SKIP rules to deal with insertion errors HMM a HMM b HMM c Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 24
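The SKIP idea can be sketched as a small dynamic program: extra observed symbols (insertion errors) may be skipped at a fixed log-penalty, while the expected primitive sequence must still be matched in order. This is a deliberate simplification of the Stolcke-Earley machinery, with the function name and penalty value chosen for illustration:

```python
def skip_match(expected, observed, skip_logp=-2.0):
    """Best log-score for matching an expected primitive sequence
    against observed symbols, where extra observed symbols may be
    skipped at a fixed log-penalty. Returns None when the expected
    sequence cannot be matched at all (deletion errors are not
    modelled, mirroring what simple SKIP rules handle)."""
    NEG = float("-inf")
    n, m = len(expected), len(observed)
    # dp[i][j]: best score matching expected[:i] against observed[:j]
    dp = [[NEG] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if dp[i][j] == NEG:
                continue
            if j < m:  # skip an inserted observation
                dp[i][j + 1] = max(dp[i][j + 1], dp[i][j] + skip_logp)
            if i < n and j < m and expected[i] == observed[j]:
                dp[i + 1][j + 1] = max(dp[i + 1][j + 1], dp[i][j])
    best = dp[n][m]
    return None if best == NEG else best

print(skip_match(list("abc"), list("axbc")))  # one insertion skipped: -2.0
print(skip_match(list("abc"), list("ab")))    # a deletion: None
```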
  • 25. SCFG for Blackjack Recognizing Multitasked Activities from Video using Stochastic Context-Free Grammar. Moore and Essa 2001 •  Deals with more complex activities •  Deals with more error types 25
  • 27. Game grammar Recognizing Multitasked Activities from Video using Stochastic Context-Free Grammar. Moore and Essa 2001 27
• 28. Dealing with errors   Ungrammatical strings cause the parser to fail   Account for errors with multiple hypotheses   Insertion, deletion, substitution   Issues   How many errors should we tolerate?   Potentially exponential hypothesis space   Ungrammatical strings: vision problem or illegal activity? 28
  • 29. Observations   CFGs good for structured activities   Can incorporate uncertainty in observations   Natural contextual prior for recognizing errors   Not clear how to deal with errors   Assumes ‘good’ action classifiers   Need to define grammar manually Can we learn the grammar from data? 29
• 30. Heuristic Grammatical Induction
  1.  Lexicon learning
      •  Learn HMMs
      •  Cluster HMMs
  2.  Convert video to string
  3.  Learn grammar
  Unsupervised Analysis of Human Gestures. Wang et al 2001 30
• 31. Compressive heuristic
  [figure: a repeated substring of a b c d a b c d b c d a b a b is replaced by a new symbol via a new rule; the saving depends on the substring's length and its number of occurrences]
  On-Line and Off-Line Heuristics for Inferring Hierarchies of Repetitions in Sequences. Nevill-Manning 2000. 31
• 32. example
  S → a b c d a b c d b c d a b a b   (DL = 16)
  A → b c d
  S → a A a A A a b a b               (DL = 14)
  Repeat until compression becomes 0. 32
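A toy version of this greedy loop, assuming a simplified savings measure (symbols removed minus symbols added) in place of the slide's exact DL accounting, so the numbers differ slightly from the 16 → 14 on the slide:

```python
def nonoverlap_count(seq, sub):
    """Count non-overlapping occurrences of sub in seq."""
    count = i = 0
    k = len(sub)
    while i <= len(seq) - k:
        if seq[i:i + k] == sub:
            count += 1
            i += k
        else:
            i += 1
    return count

def compress_once(seq, name):
    """Replace the repeated substring that saves the most symbols with a
    new nonterminal. Returns (new_seq, rule_body), or None when no
    replacement helps (compression has reached 0)."""
    best, best_sub = 0, None
    for k in range(2, len(seq) // 2 + 1):
        for i in range(len(seq) - k + 1):
            sub = seq[i:i + k]
            n = nonoverlap_count(seq, sub)
            saving = n * k - (n + k)  # symbols removed minus symbols added
            if n >= 2 and saving > best:
                best, best_sub = saving, sub
    if best_sub is None:
        return None
    out, i, k = [], 0, len(best_sub)
    while i < len(seq):
        if seq[i:i + k] == best_sub:
            out.append(name)
            i += k
        else:
            out.append(seq[i])
            i += 1
    return out, best_sub

seq = "a b c d a b c d b c d a b a b".split()
rules = {}
name = "A"
while (step := compress_once(seq, name)) is not None:
    seq, rules[name] = step
    name = chr(ord(name) + 1)
print(" ".join(seq), rules)  # a A a A A a b a b, with A -> b c d
```

On the slide's string this picks out b c d first, exactly as shown, and then stops because no further substitution saves anything.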
  • 33. Critical assumption   No uncertainty   No errors   insertions   deletions   substitution Can we learn grammars despite errors? 33
  • 34. Learning with noise Can we learn the basic structure of a transaction? Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 34
  • 35. extracting primitives Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 35
  • 36. Underlying structure? D → a x b y c a b x c y a b c x Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 36
  • 37. Underlying structure? D → a x b y c a b x c y a b c x D→a b c a b c a b c Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 37
  • 38. Underlying structure? D → a x b y c a b x c y a b c x D→a b c a b c a b c Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 38
  • 39. Underlying structure? D → a x b y c a b x c y a b c x D→a b c a b c a b c A→a b c D → A A A Simple grammar Efficient compression Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 39
• 40. Information Theory Problem (MDL)
  Ĝ = argmin_G { DL(G) + DL(D|G) }   (model complexity + data compression)
  Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 40
• 41. Information Theory Problem (MDL)
  Ĝ = argmin_G { DL(G) + DL(D|G) }   (model complexity + data compression)
  Model complexity:
    DL(G) = − log p(G)
          = − log p(θ_S, G_S)
          = − log p(θ_S | G_S) − log p(G_S)
          = DL(θ_S | G_S) + DL(G_S)
    (grammar parameters + grammar structure)
  Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 41
• 42. Information Theory Problem (MDL)
  Ĝ = argmin_G { DL(G) + DL(D|G) }   (model complexity + data compression)
  Model complexity:
    DL(G) = − log p(G)
          = − log p(θ_S, G_S)
          = − log p(θ_S | G_S) − log p(G_S)
          = DL(θ_S | G_S) + DL(G_S)
    (grammar parameters + grammar structure)
  Data compression:
    DL(D|G) = − log p(D|G)
    (likelihood, via inside probabilities)
  Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 42
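The tradeoff can be illustrated numerically on the noisy string from the earlier slides. As a crude stand-in for the paper's description lengths, the sketch below costs a grammar by its symbol count and costs the data by the edit distance between the grammar's expansion and the observed string; both stand-ins are my simplifications, not the paper's actual encoding:

```python
def edit_distance(a, b):
    """Levenshtein distance, used here as a rough proxy for
    -log p(D|G): each insertion/deletion/substitution costs 1."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[m][n]

data = "a x b y c a b x c y a b c x".split()

# Candidate 1: memorize the noisy string exactly (perfect fit, big model).
g1_size, g1_fit = len(data), 0

# Candidate 2: A -> a b c, D -> A A A (expands to a b c a b c a b c).
g2_size = 6  # 3 symbols in each of the two rule bodies
g2_fit = edit_distance("a b c a b c a b c".split(), data)

print(g1_size + g1_fit, g2_size + g2_fit)  # 14 11: MDL prefers the compact grammar
```

Even under this crude costing, the compact grammar wins: paying 5 edit units for the noise symbols is cheaper than memorizing them.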
  • 43. Minimum Description Length Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 43
  • 44. Minimum Description Length Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 44
  • 45. Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 45
  • 46. Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 46
  • 47. Conclusions   Possible to learn basic structure   Robust to errors (insertion, deletion, substitution)   Need a lot of training data   Computational complexity 47
• 48. Bayesian Approaches
  Infinite Hierarchical Hidden Markov Models. Heller et al 2009.
  The Infinite PCFG using Hierarchical Dirichlet Processes. Liang et al 2007. 48
  • 49. Take home message Hierarchical Syntactic Models   Useful for activities with:   Deep hierarchical structure   Repetitive (cyclic) structure   Not for   Systems with a lot of errors and uncertainty   Activities with weak structure 49
  • 51. Using a hierarchical statistical approach   Use when   Low-level action detectors are noisy   Structure of activity is sequential   Integrating dynamics   Not for   Activities with deep hierarchical structure   Activities with complex temporal structure 51
  • 52. Statistical (State-based) Model Activities as a stochastic path. What are the underlying dynamics? 52
• 53. Characteristics   Strong Markov assumption   Strong dynamics prior   Robust to uncertainty   Modifications to account for   Hierarchical structure   Concurrent structure 53
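The robustness to uncertainty comes from marginalizing over all state paths rather than committing to one. A minimal forward-algorithm sketch, using a hypothetical 2-state left-to-right model whose numbers are purely illustrative:

```python
import math

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm (states/symbols are indices)."""
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[sp] * A[sp][s] for sp in range(n)) * B[s][o]
                 for s in range(n)]
    return math.log(sum(alpha))

# Hypothetical left-to-right model: state 0 mostly emits symbol 0,
# state 1 mostly emits symbol 1, transitions only move forward.
pi = [1.0, 0.0]
A = [[0.7, 0.3], [0.0, 1.0]]
B = [[0.9, 0.1], [0.1, 0.9]]

clean = forward_loglik([0, 0, 1, 1], pi, A, B)
noisy = forward_loglik([0, 1, 0, 1], pi, A, B)  # one out-of-place symbol
print(clean, noisy)
```

The noisy sequence still gets a finite (if lower) likelihood: the emission probabilities absorb the mislabeled frame instead of causing a hard parse failure, which is exactly the contrast with purely symbolic grammars.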
• 54. Hierarchical activities Problem: How do we model hierarchical activities? combinatorial state space! Solution: “stack” actions for hierarchical activities 54
  • 55. Hierarchical hidden Markov model Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Models. Nguyen et al 2005 55
  • 56. Context-free activity grammar Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Models. Nguyen et al 2005 56
  • 57. Context-free activity grammar Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Models. Nguyen et al 2005 57
• 58. Observations   Tree structures useful for hierarchies   Tight integration of trajectories with abstract semantic states   Activities are not always a single sequence (i.e., they sometimes happen in parallel) 58
• 59. Concurrent activities Problem: How do we model concurrent activities? combinatorial state space! Solution: “stand-up” model for concurrent activities 59
  • 60. Propagation network Propagation Networks for Recognition of Partially Ordered Sequential Action. Shi et al 2004 60
  • 61. Propagation Networks for Recognition of Partially Ordered Sequential Action. Shi et al 2004 61
  • 62. temporal inference Inference by standing the state transition model on its side 62
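The “partially ordered” part can be sketched independently of the probabilistic machinery: a propagation network constrains which actions must precede which, while unrelated actions may interleave freely. A minimal consistency check along those lines, with a hypothetical lab-protocol example; real propagation networks additionally model durations and observation probabilities, which this omits:

```python
def respects_partial_order(sequence, constraints):
    """Check that an observed action sequence is consistent with a set
    of (before, after) ordering constraints, i.e. a partial order:
    actions not related by any constraint may occur in any order."""
    seen = set()
    pending = {}  # action -> set of actions that must precede it
    for before, after in constraints:
        pending.setdefault(after, set()).add(before)
    for action in sequence:
        if not pending.get(action, set()) <= seen:
            return False  # a required predecessor has not happened yet
        seen.add(action)
    return True

# Hypothetical protocol: "measure" must follow both "pour" and "stir",
# but "pour" and "stir" may happen in either order.
constraints = {("pour", "measure"), ("stir", "measure")}
print(respects_partial_order(["stir", "pour", "measure"], constraints))  # True
print(respects_partial_order(["pour", "measure", "stir"], constraints))  # False
```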
  • 63. Inferring structure (storylines) Understanding Videos, Constructing Plots – Learning a Visually Grounded Storyline Model from Annotated Videos Gupta, Srinivasan, Shi and Davis CVPR 2009 Learn AND-OR graphs from weakly labeled data 63
  • 64. Scripts from structure Understanding Videos, Constructing Plots - Learning a Visually Grounded Storyline Model from Annotated Videos. Gupta, Srinivasan, Shi and Davis CVPR 2009 64
  • 65. Take home message Hierarchical statistical model   Use when   Low-level action detectors are noisy   Structure of activity is sequential   Integrating dynamics   Not for   Activities with deep hierarchical structure   Activities with complex temporal structure 65
• 66. Contrasting hierarchical approaches
              Actions as:             Activities as:   Model      Characteristic
  Statistical probabilistic states    paths            DBN        Robust to uncertainty
  Syntactic   discrete symbols        strings          CFG        Describes deep hierarchy
  Descriptive logical relationships   sets             CFG, MLN   Encodes complex logic
  66
• 67. References (not included in ACM survey paper)
  W. Tsai and K.S. Fu. Attributed Grammar-A Tool for Combining Syntactic and Statistical Approaches to Pattern Recognition. SMC 1980.
  M. Brand. The "Inverse Hollywood Problem": From video to scripts and storyboards via causal analysis. AAAI 1997.
  T. Wang, H. Shum, Y. Xu, N. Zheng. Unsupervised Analysis of Human Gestures. PCM 2001.
  C.G. Nevill-Manning, I.H. Witten. On-Line and Off-Line Heuristics for Inferring Hierarchies of Repetitions in Sequences. Proceedings of the IEEE, 2000.
  K. Heller, Y.W. Teh and D. Gorur. Infinite Hierarchical Hidden Markov Models. AISTATS 2009.
  P. Liang, S. Petrov, M. Jordan, D. Klein. The Infinite PCFG using Hierarchical Dirichlet Processes. EMNLP 2007.
  A. Gupta, N. Srinivasan, J. Shi and L. Davis. Understanding Videos, Constructing Plots - Learning a Visually Grounded Storyline Model from Annotated Videos. CVPR 2009. 67