Prompt-supervised Dynamic
Attention Graph Convolutional
Network for Skeleton-based
Action Recognition
Tien-Bach-Thanh Do
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: osfa19730@catholic.ac.kr
2024/09/23
Shasha Zhu et al.
Neurocomputing 2024
2
Introduction
● Overview of Skeleton-based Action Recognition
○ Core task in video understanding, used in human-computer interaction and health monitoring
○ Skeleton sequences: high information density, low redundancy, clear structure
● Problem Statement
○ Existing methods fail to utilize precise high-level semantic action descriptions
● Objective
○ Propose a Prompt-supervised Dynamic Attention Graph Convolutional Network (PDA-GCN)
to improve accuracy in recognizing human actions
3
Motivation & Challenges
● Complexity in human actions: similar action manifestations can have different semantics
● Traditional methods:
○ CNNs
○ RNNs
○ GCNs fail to capture both global and local relationships effectively
4
Proposed model
● Prompt Supervision (PS) module:
○ Use pre-trained large language models (LLMs) as knowledge engines
● Dynamic Attention Graph Convolution (DA-GC) module:
○ Self-attention mechanism for capturing relationships between joints
○ Dynamic convolution focuses on local details, improving model accuracy
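As a rough illustration of how self-attention can relate every joint to every other joint, the sketch below builds a row-normalized attention matrix from per-joint features with NumPy. The projection matrices `w_q` and `w_k`, the feature sizes, and the function name `attention_graph` are assumptions made for this example; they are not taken from the paper, whose exact formulation may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def attention_graph(joint_feats, w_q, w_k):
    """Return an (N, N) attention matrix over joints.

    joint_feats: (N, C) array, one feature vector per joint.
    w_q, w_k:    (C, D) projection matrices (hypothetical learned weights).
    """
    q = joint_feats @ w_q                      # (N, D) queries
    k = joint_feats @ w_k                      # (N, D) keys
    scores = q @ k.T / np.sqrt(q.shape[-1])    # scaled dot-product scores
    return softmax(scores, axis=-1)            # each row sums to 1

rng = np.random.default_rng(0)
num_joints, channels, dim = 25, 16, 8          # NTU RGB+D skeletons have 25 joints
x = rng.standard_normal((num_joints, channels))
a_prime = attention_graph(
    x,
    rng.standard_normal((channels, dim)),
    rng.standard_normal((channels, dim)),
)
print(a_prime.shape)  # (25, 25)
```

Row `i` of the result can be read as how strongly joint `i` attends to each other joint, which is the kind of data-dependent topology a self-attention branch can supply alongside a predefined skeleton graph.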
5
Model
● Main branch:
○ Encoder: process skeleton sequence data and extract joint relationships
○ Spatial modeling: DA-GC block for context-sensitive topology extraction
○ Temporal modeling: multi-scale temporal convolution for skeleton sequences over time
● Supervised branch:
○ Prompt supervision: use pre-stored text features from LLMs to refine classification
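To make the temporal-modeling step more concrete, here is a minimal NumPy sketch of a multi-scale temporal convolution: the same per-frame features are filtered along the time axis at several dilation rates and the branches are concatenated along the channel axis. The shared three-tap kernel, the dilation rates, and the function names are assumptions for illustration; the MS-TC block in the paper may contain additional branches and learned per-channel weights.

```python
import numpy as np

def temporal_conv(x, kernel, dilation):
    """'Same'-padded 1D convolution along the time axis.

    x:      (T, C) sequence of per-frame features.
    kernel: (K,) temporal filter with odd K, shared by all channels
            (a simplification for this sketch).
    """
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))   # zero-pad the time axis only
    out = np.zeros_like(x, dtype=float)
    for i, w in enumerate(kernel):
        start = i * dilation
        out += w * padded[start:start + x.shape[0]]
    return out

def multi_scale_temporal_conv(x, kernel, dilations=(1, 2, 4)):
    """Run one branch per dilation rate and concatenate along channels."""
    branches = [temporal_conv(x, kernel, d) for d in dilations]
    return np.concatenate(branches, axis=1)    # (T, C * len(dilations))

rng = np.random.default_rng(0)
seq = rng.standard_normal((8, 4))              # 8 frames, 4 channels
out = multi_scale_temporal_conv(seq, np.array([0.25, 0.5, 0.25]))
print(out.shape)  # (8, 12)
```

Larger dilation rates let a branch see frames that are further apart without increasing the kernel size, which is why stacking several rates covers both short and long motion patterns.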
6
Key Innovation
● Dynamic Attention Graph Convolution (DA-GC):
○ Combine standard and dynamic convolution for local and global feature integration
● Prompt Supervision (PS):
○ Enhance model’s learning by introducing LLM-based action descriptions, improving discriminative
power with minimal computation cost
7
Model
Fig. 1. Architecture overview of PDA-GCN, where ⊕ represents the splicing operation, ⊗ represents element-wise multiplication, and PE and
GAP represent position embedding and global average pooling, respectively. The CTR-GC block and MS-TC block are shown in
the green dotted box at the top of the figure; the DA-GC block and PS block are described in detail later.
8
Model
● Input data is first pre-processed to convert the input skeleton sequence into an initial joint representation
● Supervision loss
● Overall loss
9
Model
Dynamic attention graph convolution module
Fig. 2. Overview of the DA-GC module, where ⊕ and ⊗ denote the splicing operation and element product, DConv is a dynamic
convolution, A is the predefined topology, BN is a group normalization, and ReLU is an activation function.
10
Model
Dynamic attention graph convolution module
● Attention graph A’:
● Dynamic topology
● Reset to:
11
Model
Dynamic attention graph convolution module
● Dynamic convolution is proposed to enhance local context information
● Attention weight
12
Model
Fig. 3. Overview of the dynamic convolution, where DWConv is a depthwise convolution
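The general idea of a dynamic depthwise convolution can be sketched as follows: input-dependent attention weights mix a small bank of candidate kernels, and the mixed kernel is then applied channel by channel. This is a generic NumPy sketch under stated assumptions (a bank of `K` candidate kernels, attention computed from globally pooled features through a hypothetical matrix `w_attn`); it is not the exact module shown in the figure.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1D vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def dynamic_depthwise_conv(x, kernels, w_attn):
    """Depthwise convolution whose kernel depends on the input.

    x:       (L, C) input with L positions and C channels.
    kernels: (K, C, S) bank of K candidate depthwise kernels of odd size S.
    w_attn:  (C, K) hypothetical weights mapping pooled features to kernel scores.
    """
    pooled = x.mean(axis=0)                      # global average pooling -> (C,)
    pi = softmax(pooled @ w_attn)                # (K,) attention weights, sum to 1
    kernel = np.tensordot(pi, kernels, axes=1)   # attention-weighted kernel, (C, S)
    size = kernel.shape[1]
    pad = size // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))     # 'same' padding along positions
    out = np.zeros_like(x, dtype=float)
    for s in range(size):
        # Depthwise: each channel is filtered only by its own kernel taps.
        out += padded[s:s + x.shape[0]] * kernel[:, s]
    return out, pi

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 6))                 # 10 positions, 6 channels
kernels = rng.standard_normal((4, 6, 3))         # 4 candidate kernels of size 3
w_attn = rng.standard_normal((6, 4))
out, pi = dynamic_depthwise_conv(x, kernels, w_attn)
print(out.shape, pi.shape)  # (10, 6) (4,)
```

Because the attention weights are recomputed for every input, two different skeleton sequences are filtered with different effective kernels, which is what distinguishes this from a standard convolution with fixed weights.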
13
Model
Prompt supervision module
Fig. 4. Overview of PS module. N is the number of joint nodes, C is the number of current channels, cls is the number of action
categories and GAP is the global average pooling
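One plausible reading of the prompt-supervision idea is sketched below: joint features are globally average-pooled, projected into the text-embedding space, and scored against pre-stored text features (one per action class) with cosine similarity, giving an auxiliary classification loss. The projection `w_proj`, the `temperature` value, and the function name are assumptions for this example and are not confirmed by the slides.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax over a 1D vector."""
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def prompt_supervision_loss(feat_map, w_proj, text_feats, label, temperature=0.1):
    """Auxiliary loss comparing skeleton features with per-class text features.

    feat_map:   (N, C) joint features from the main branch.
    w_proj:     (C, D) hypothetical projection into the text-embedding space.
    text_feats: (cls, D) pre-stored text embeddings, one row per action class.
    label:      integer index of the ground-truth class.
    """
    pooled = feat_map.mean(axis=0)                       # GAP over joints -> (C,)
    emb = pooled @ w_proj                                # (D,)
    emb = emb / np.linalg.norm(emb)
    text = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = text @ emb / temperature                    # scaled cosine similarities
    loss = -log_softmax(logits)[label]                   # cross-entropy on the label
    return loss, logits

rng = np.random.default_rng(0)
num_classes, dim, num_joints = 5, 8, 25
text_feats = rng.standard_normal((num_classes, dim))
# Toy features that are perfectly aligned with the text feature of class 3.
feat_map = np.tile(text_feats[3], (num_joints, 1))
loss, logits = prompt_supervision_loss(feat_map, np.eye(dim), text_feats, label=3)
print(int(np.argmax(logits)))  # 3
```

Since the text features are computed once and stored, a branch like this adds only a pooling step, one projection, and a small matrix product at training time. In training, such a term would typically be added to the main classification loss with a weighting factor; the weighting used in the paper is not reproduced here.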
14
Experiments
Experimental Settings
● Datasets:
○ NTU RGB+D 60 and 120 (common for skeleton-based action recognition)
● Metrics:
○ Cross-Subject and Cross-View evaluation protocols are used to measure model performance
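The two protocols differ only in which field of a sample decides whether it goes to training or testing. The sketch below assumes the usual NTU RGB+D sample naming (`SsssCcccPpppRrrrAaaa`: setup, camera, performer, replication, action); the helper names and the ID sets passed in are toy values for illustration, not the official split lists.

```python
import re

# Assumed NTU RGB+D naming: setup, camera, performer (subject), replication, action.
NAME_PATTERN = re.compile(r"S(\d{3})C(\d{3})P(\d{3})R(\d{3})A(\d{3})")
FIELDS = ("setup", "camera", "subject", "replication", "action")

def parse_name(name):
    """Turn a sample name into a dict of integer fields."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized sample name: {name!r}")
    return dict(zip(FIELDS, map(int, match.groups())))

def split_samples(names, field, train_ids):
    """Put a sample in the training set when its `field` value is in `train_ids`."""
    train, test = [], []
    for name in names:
        target = train if parse_name(name)[field] in train_ids else test
        target.append(name)
    return train, test

samples = [
    "S001C001P001R001A001",
    "S001C002P001R001A001",
    "S001C003P002R001A007",
    "S001C001P002R002A007",
]

# Cross-Subject: split by performer ID (the ID set here is a toy example).
xsub_train, xsub_test = split_samples(samples, "subject", {1})
# Cross-View: split by camera ID (training on cameras 2 and 3 is assumed here).
xview_train, xview_test = split_samples(samples, "camera", {2, 3})
print(len(xsub_train), len(xview_train))  # 2 2
```

Under Cross-Subject the model is tested on people it has never seen; under Cross-View it is tested on a camera angle it has never seen. The same helper could split on `setup` for a setup-based protocol.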
15
Experiments
Results
20
Conclusion
● PDA-GCN provides a robust, efficient solution for skeleton-based action recognition
● Combines dynamic attention and prompt supervision for superior accuracy
● Future work: extend the model to larger datasets and explore further integration with pre-trained models for human-centric tasks

