[NS][Lab_Seminar_240923]Prompt-supervised Dynamic Attention Graph Convolutional Network for Skeleton-based Action Recognition.pptx

Prompt-supervised Dynamic
Attention Graph Convolutional
Network for Skeleton-based
Action Recognition
Tien-Bach-Thanh Do
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: osfa19730@catholic.ac.kr
2024/09/23
Shasha Zhu et al.
Neurocomputing 2024

Introduction
● Overview of Skeleton-based Action Recognition
○ Core task in video understanding, used in human-computer interaction and health monitoring
○ Skeleton sequences: high information density, low redundancy, clear structure
● Problem Statement
○ Existing methods fail to utilize precise high-level semantic action descriptions
● Objective
○ Propose a Prompt-supervised Dynamic Attention Graph Convolutional Network (PDA-GCN)
to improve accuracy in recognizing human actions

Motivation & Challenges
● Complexity in human actions: similar action manifestations can have different semantics
● Traditional methods:
○ CNNs
○ RNNs
○ GCNs fail to capture both global and local relationships effectively

Proposed model
● Prompt Supervision (PS) module:
○ Use pre-trained large language models (LLMs) as knowledge engines
● Dynamic Attention Graph Convolution (DA-GC) module:
○ Self-attention mechanism for capturing relationships between joints
○ Dynamic convolution focuses on local details, improving model accuracy

Model
● Main branch:
○ Encoder: process skeleton sequence data and extract joint relationships
○ Spatial modeling: DA-GC block for context-sensitive topology extraction
○ Temporal modeling: multi-scale temporal convolution (MS-TC) to model how skeleton sequences evolve over time
● Supervised branch:
○ Prompt supervision: use pre-stored text features from LLMs to refine classification

Key Innovation
● Dynamic Attention Graph Convolution (DA-GC):
○ Combine standard and dynamic convolution for local and global feature integration
● Prompt Supervision (PS):
○ Enhance the model’s learning by introducing LLM-based action descriptions, improving discriminative
power at minimal computational cost

Model
Fig. 1. Architecture overview of PDA-GCN. Dedicated symbols in the figure mark the splicing operation and element-wise
multiplication; PE and GAP denote position embedding and global average pooling, respectively. The CTR-GC block and MS-TC
block are shown in the green dotted box at the top of the figure; the DA-GC block and PS block are described in detail later.

Model
● Input data is first pre-processed to convert the input skeleton sequence into an initial joint representation
● Supervision loss
● Overall loss
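
The slide names a supervision loss and an overall loss without reproducing their formulas. As a rough sketch only, one common way to combine a main classification loss with an auxiliary supervision loss is a weighted sum. The cross-entropy form of both terms, the function names, and the balancing weight `lam` are all assumptions made for illustration, not the definitions used in the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[label])


def overall_loss(main_logits, sup_logits, label, lam=0.5):
    """ASSUMPTION: overall = classification loss + lam * supervision loss.

    `lam` is a hypothetical balancing weight; the paper's actual
    supervision loss may take a different form.
    """
    l_cls = cross_entropy(main_logits, label)   # main branch
    l_sup = cross_entropy(sup_logits, label)    # supervised (prompt) branch
    return l_cls + lam * l_sup, l_cls, l_sup


total, l_cls, l_sup = overall_loss([2.0, 0.5, -1.0], [1.0, 1.0, 0.0], label=0)
print(f"cls={l_cls:.4f} sup={l_sup:.4f} total={total:.4f}")
```

Because the supervised branch only contributes a loss term during training, a design of this shape would add no cost at inference time.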

Model
Dynamic attention graph convolution module
Fig. 2. Overview of the DA-GC module, where ⊕ and ⊗ denote the splicing operation and the element-wise product, respectively;
DConv is a dynamic convolution, A is the predefined topology, BN is a group normalization, and ReLU is an activation function.

Model
● Attention graph A’:
● Dynamic topology
● Reset to:
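
The formulas for the attention graph A’ and the dynamic topology are not reproduced on this slide. The sketch below illustrates the general idea under stated assumptions: A’ is taken to be a row-wise softmax of scaled dot products between joint features (learned query/key projections are omitted), and the dynamic topology is taken to be the predefined adjacency A plus `alpha` times A’. The additive form and the name `alpha` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_graph(joints):
    """Self-attention between joints: softmax(X X^T / sqrt(d)), row by row."""
    d = len(joints[0])
    scores = [
        [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in joints]
        for qi in joints
    ]
    return [softmax(row) for row in scores]


def dynamic_topology(adj, attn, alpha=0.5):
    """ASSUMPTION: refine the fixed skeleton graph additively, A + alpha * A'."""
    return [[a + alpha * w for a, w in zip(ra, rw)] for ra, rw in zip(adj, attn)]


def aggregate(topology, joints):
    """One weight-free graph-convolution step: mix features along the topology."""
    channels = len(joints[0])
    return [
        [sum(w * joints[j][c] for j, w in enumerate(row)) for c in range(channels)]
        for row in topology
    ]


# Three joints in a chain (0-1-2) with self-loops; joints 0 and 2 are not adjacent.
joints = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = [[1.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 1.0]]

attn = attention_graph(joints)
topo = dynamic_topology(adj, attn)
out = aggregate(topo, joints)
```

In this toy example the attention term gives joints 0 and 2 a non-zero connection even though the predefined skeleton does not link them, which is the kind of global relationship a fixed topology cannot express.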

Model
● A dynamic convolution is proposed to enhance local context information
● Attention weight
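
The attention-weight formula is not reproduced on this slide. As a hedged illustration, the sketch below follows the general kernel-attention formulation of dynamic convolution: attention weights over several candidate kernels are computed from the globally pooled input, and the kernels are mixed into one input-dependent kernel before convolving. The 1D setting, the linear gate, and all names are assumptions; the paper's module (which also involves a depthwise convolution) may differ.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def kernel_attention(signal, gate_weights):
    """Attention weights over candidate kernels from the globally pooled input.

    gate_weights: one hypothetical (weight, bias) pair per candidate kernel.
    """
    pooled = sum(signal) / len(signal)
    return softmax([w * pooled + b for w, b in gate_weights])


def dynamic_conv1d(signal, kernels, gate_weights):
    """Mix odd-sized kernels by attention, then apply a zero-padded 'same' convolution."""
    pi = kernel_attention(signal, gate_weights)
    size = len(kernels[0])
    mixed = [sum(p * k[i] for p, k in zip(pi, kernels)) for i in range(size)]
    pad = size // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    out = [
        sum(mixed[i] * padded[t + i] for i in range(size))
        for t in range(len(signal))
    ]
    return out, pi


signal = [1.0, 2.0, 3.0, 4.0]
kernels = [[0.0, 1.0, 0.0], [1.0, 0.0, -1.0]]  # identity-like and difference-like
gates = [(0.5, 0.0), (-0.5, 0.0)]              # hypothetical gate parameters
out, pi = dynamic_conv1d(signal, kernels, gates)
```

Since the mixing weights depend on the input, two different sequences are filtered with two different effective kernels, which is what lets the layer adapt to local detail.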

Model
Fig. 3. Overview of the dynamic convolution, where DWConv is a depthwise convolution

Model
Prompt supervision module
Fig. 4. Overview of PS module. N is the number of joint nodes, C is the number of current channels, cls is the number of action
categories and GAP is the global average pooling
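
Based on the symbols named in the caption (N joints, C channels, cls categories, GAP), the data flow of the PS module can be sketched roughly as follows: pool the N x C joint features into one C-dimensional vector, then score it against cls pre-stored text feature vectors. The use of cosine similarity, and the assumption that text features have already been projected to C dimensions, are illustrative choices rather than details confirmed by the slides.

```python
import math


def global_average_pool(features):
    """GAP over joints: N x C joint features -> one C-dimensional vector."""
    n = len(features)
    channels = len(features[0])
    return [sum(row[c] for row in features) / n for c in range(channels)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prompt_scores(features, text_features):
    """ASSUMPTION: one similarity score per action category (length cls)."""
    pooled = global_average_pool(features)
    return [cosine(pooled, t) for t in text_features]


# N = 2 joints, C = 2 channels, cls = 2 categories (toy values).
features = [[1.0, 0.0], [3.0, 0.0]]
text_features = [[1.0, 0.0], [0.0, 1.0]]  # stand-ins for pre-stored LLM text features
scores = prompt_scores(features, text_features)
```

Because the text features are pre-stored, the language model does not need to run during training or inference, which is consistent with the minimal computational cost claimed for this module.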

Experiments
Experimental Settings
● Datasets:
○ NTU RGB+D 60 and 120 (common for skeleton-based action recognition)
● Metrics:
○ Accuracy under the Cross-Subject and Cross-View protocols is used to measure model performance
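
The two protocols differ only in how samples are divided into training and test sets: Cross-Subject holds out performers, while Cross-View holds out cameras. A minimal sketch of that split plus top-1 accuracy is shown below; the sample fields, the ID values, and the function names are hypothetical and do not reproduce the official benchmark split lists.

```python
def split_by(samples, key, train_ids):
    """Partition samples on one attribute (e.g. 'subject' or 'camera')."""
    train = [s for s in samples if s[key] in train_ids]
    test = [s for s in samples if s[key] not in train_ids]
    return train, test


def top1_accuracy(predictions, labels):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy samples with hypothetical subject and camera IDs.
samples = [
    {"subject": 1, "camera": 1, "label": 0},
    {"subject": 1, "camera": 2, "label": 1},
    {"subject": 2, "camera": 1, "label": 0},
    {"subject": 2, "camera": 3, "label": 1},
]

xsub_train, xsub_test = split_by(samples, "subject", train_ids={1})
xview_train, xview_test = split_by(samples, "camera", train_ids={2, 3})

# Pretend predictions for the cross-subject test set.
accuracy = top1_accuracy([0, 0], [s["label"] for s in xsub_test])
```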

Conclusion
● PDA-GCN provides a robust, efficient solution for skeleton-based action recognition
● Combines dynamic attention and prompt supervision for superior accuracy
● Future work: extend the model to larger datasets and explore further integration with pre-trained models for
human-centric tasks
