Quang-Huy Tran
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: huytran1126@gmail.com
2024-09-02
Dynamic Semantic-Based Spatial Graph
Convolution Network for Skeleton-Based
Human Action Recognition
Jianyang Xie et al.
AAAI-2024: The Thirty-Eighth AAAI Conference on Artificial Intelligence
2
OUTLINE
• MOTIVATION
• METHODOLOGY
• EXPERIMENT & RESULT
• CONCLUSION
3
MOTIVATION
• Human action recognition (HAR) is an essential topic:
o in computer vision, with a wide range of applications.
o based on skeleton (sensor) data.
o Traditional methods (CNN/RNN) or STGNNs extract handcrafted features from skeleton sequences.
Overview and Limitation
o SOTA ST-GCNs rely on a fixed graph topology.
▪ Insufficient to capture varying movements.
o Adaptive-adjacency-based methods ignore semantic information.
▪ Insufficient to capture the semantic properties of actions.
o Semantic-guided methods encode semantics explicitly in the input.
▪ Not flexible, and hard to integrate into deeper GCN layers.
• Challenges:
4
INTRODUCTION
• Propose a temporal-causal SFD network (TC-SFDN) architecture to detect forgeries at the frame, clip, and action levels.
o a hierarchical GCN architecture to learn both low-level skeleton representations based on physical
body connections.
o high-level action representations based on the temporal-causal graph for each action instance.
Contribution
• Propose the dynamic semantic-based graph convolution network (DS-GCN):
o Implicitly encodes the dynamic semantic information of joints and edges.
o Each joint/edge type is encoded with a different transformation function, each representing a specific distribution.
• A group of SSL tasks is designed to efficiently train TC-SFDN for multilevel SFD.
5
METHODOLOGY
Problem Definition
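• Skeleton data are constructed as a spatial-temporal graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$:
o N body joints in T frames: $\mathcal{V} = \{v_{ti} \mid t = 1, \dots, T;\ i = 1, \dots, N\}$.
o $\mathcal{E} = \{\mathcal{E}_S, \mathcal{E}_T\}$: spatial and temporal links.
o $X \in \mathbb{R}^{N \times T \times d}$: joint coordinates as the node features, where d is the feature dimension.
o Spatial graph: intra-body links between naturally connected joints, $\mathcal{E}_S = \{(v_{ti}, v_{tj}) \mid (i, j) \in \mathcal{H}\}$, with $\mathcal{H}$ the set of bone connections.
o Temporal graph: the same joint linked across consecutive frames, $\mathcal{E}_T = \{(v_{ti}, v_{(t+1)i})\}$.
o An ST-GCN layer can be divided into a spatial GCN (S-GCN, the focus of this work) and a temporal GCN (T-GCN) that uses 1D temporal convolution.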
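• Topology-Fixed Graph Convolution Network:
o Updates each node representation by aggregating information from its neighborhood.
o The adjacency (with self-loops) is divided into three partitions (root, centripetal, centrifugal): $A + I = \sum_{k=1}^{3} A_k$.
o Output of S-GCN from input $X$ (applied frame by frame): $Y = \sum_{k=1}^{3} \Lambda_k^{-1/2} A_k \Lambda_k^{-1/2} X W_k$, where $\Lambda_k$ is the diagonal degree matrix of $A_k$ and $W_k$ is a learnable weight matrix.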
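The topology-fixed S-GCN can be sketched in NumPy for a single frame. This is a minimal illustration under assumptions, not the authors' code: the toy chain skeleton, the use of joint 0 as the reference point for the root/centripetal/centrifugal split (ST-GCN measures distance to the skeleton's gravity center), and all helper names are hypothetical.

```python
import numpy as np


def chain_skeleton(n_joints):
    """Hypothetical toy skeleton: joints 0-1-...-(n-1) linked in a chain (joint 0 is the reference)."""
    return [(i, i + 1) for i in range(n_joints - 1)]


def hop_distance(n_joints, bones, ref=0):
    """BFS hop distance of every joint to the reference joint over the undirected bone graph."""
    nbrs = {i: [] for i in range(n_joints)}
    for i, j in bones:
        nbrs[i].append(j)
        nbrs[j].append(i)
    dist, frontier = {ref: 0}, [ref]
    while frontier:
        nxt = []
        for u in frontier:
            for v in nbrs[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    nxt.append(v)
        frontier = nxt
    return [dist[i] for i in range(n_joints)]


def spatial_partitions(n_joints, bones, ref=0):
    """Split A + I into K = 3 subsets: root (self), centripetal, centrifugal."""
    d = hop_distance(n_joints, bones, ref)
    A = np.zeros((3, n_joints, n_joints))
    A[0] = np.eye(n_joints)  # root subset: each joint itself
    for i, j in bones:
        for a, b in ((i, j), (j, i)):  # joint a aggregates from neighbour b
            k = 1 if d[b] < d[a] else 2  # 1: neighbour closer to the reference, 2: farther away
            A[k, a, b] = 1.0
    return A


def normalize(A_k, alpha=1e-3):
    """Lambda_k^{-1/2} A_k Lambda_k^{-1/2}; alpha keeps empty rows from dividing by zero."""
    inv_sqrt = 1.0 / np.sqrt(A_k.sum(axis=1) + alpha)
    return inv_sqrt[:, None] * A_k * inv_sqrt[None, :]


def s_gcn_fixed(X, A, W):
    """Y = sum_k norm(A_k) X W_k for one frame. X: (N, C_in), A: (K, N, N), W: (K, C_in, C_out)."""
    return sum(normalize(A[k]) @ X @ W[k] for k in range(A.shape[0]))


rng = np.random.default_rng(0)
N, C_in, C_out = 5, 3, 4
A = spatial_partitions(N, chain_skeleton(N))
X = rng.standard_normal((N, C_in))
W = rng.standard_normal((3, C_in, C_out))
Y = s_gcn_fixed(X, A, W)
print(Y.shape)  # (5, 4)
```

The three partitions always sum back to the self-looped adjacency, so the partitioning only changes which weight matrix $W_k$ each neighbour is multiplied by, not which neighbours are visited.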
6
METHODOLOGY
Problem Definition
• Topology-Adaptive Graph Convolution Network:
o The adaptive matrix is learned dynamically with a self-attention mechanism.
o Given two transformation functions $\theta$ and $\phi$, the correlation between joints i and j: $A_{ij} = \mathrm{softmax}_j\big(\theta(x_i)^\top \phi(x_j)\big)$.
• Semantic-Guided Graph Convolution Network:
o Input features are refined by adding a one-hot vector of joint types.
o Adaptive matrix S-GCN:
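A minimal NumPy sketch of the adaptive correlation and the semantic-guided refinement, for one frame. Assumptions: $\theta$ and $\phi$ are plain linear maps (`W_theta`, `W_phi`), the one-hot joint-type vector passes through a hypothetical learnable embedding `W_type` before being added, and the usage lines treat every joint as its own type.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def adaptive_adjacency(X, W_theta, W_phi):
    """A[i, j] = softmax_j(theta(x_i)^T phi(x_j)) with linear theta/phi (assumed form).
    X: (N, C); W_theta, W_phi: (C, C_e). Rows sum to 1; A is generally asymmetric."""
    theta = X @ W_theta  # (N, C_e) embedding of each joint as a "query"
    phi = X @ W_phi      # (N, C_e) embedding of each joint as a "key"
    return softmax(theta @ phi.T, axis=-1)


def add_joint_type(X, joint_type, n_types, W_type):
    """Semantic-guided refinement: add an embedded one-hot joint-type vector to each joint feature.
    W_type: (n_types, C) is a hypothetical learnable type embedding."""
    one_hot = np.eye(n_types)[joint_type]  # (N, n_types)
    return X + one_hot @ W_type


rng = np.random.default_rng(0)
N, C, C_e = 5, 3, 8
X = rng.standard_normal((N, C))
W_theta, W_phi = rng.standard_normal((C, C_e)), rng.standard_normal((C, C_e))
A_adapt = adaptive_adjacency(X, W_theta, W_phi)

joint_type = np.arange(N)  # hypothetical: each joint is its own type
Z = add_joint_type(X, joint_type, N, rng.standard_normal((N, C)))
A_sem = adaptive_adjacency(Z, W_theta, W_phi)
```

The semantic information here enters only once, at the input; this is the explicit input encoding that the motivation slide contrasts with DS-GCN's implicit, per-layer encoding.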
7
METHODOLOGY
Main Architecture: DS-GCN
8
METHODOLOGY
Dynamic Semantic-Based GCN
• Topology-adaptive GCN:
o Joint and edge types are encoded dynamically.
o A directed graph G = (V, E, A, R, X), where A and R denote the type mapping functions for nodes and edges, respectively:
o Semantic-based adaptive graph for node and edge:
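The typed directed graph G = (V, E, A, R, X) can be held in a small data structure. The edge-type index A(s)·M + A(t) is an assumption, suggested by the later note that edge types depend on the node type indices and the number of types M; the node-type assignment in the usage line is hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class TypedSkeletonGraph:
    """Sketch of G = (V, E, A, R, X) for one frame."""
    node_type: list   # A: node index -> node type in {0, ..., M-1} (hypothetical assignment)
    n_types: int      # M, the number of node types
    edges: list       # directed edges (s, t)
    X: np.ndarray     # (N, C) node features

    def A(self, v):
        """Node type mapping function."""
        return self.node_type[v]

    def R(self, s, t):
        """Edge type mapping function; assumed index A(s) * M + A(t), giving M * M directed edge types."""
        return self.A(s) * self.n_types + self.A(t)


g = TypedSkeletonGraph(node_type=[0, 1, 2], n_types=3,
                       edges=[(0, 1), (1, 0), (1, 2)], X=np.zeros((3, 2)))
print([g.R(s, t) for s, t in g.edges])  # [1, 3, 5]
```

Because the index is ordered, the edge s→t and its reverse t→s get different types, which is what makes the graph directed.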
9
METHODOLOGY
Dynamic Semantic-Based GCN
• Node Type-Aware Adaptive Topology.
o Each node is projected into its own feature space by the node type mapping function.
o The correlation is computed following the non-local mechanism.
▪ For two nodes s and t of different types, the node type-aware feature representations are:
o Directed correlation between nodes s and t along the channel dimension:
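A NumPy sketch of a node type-aware adaptive topology for one frame, under stated assumptions: each node type has its own projection pair (`W_theta[m]`, `W_phi[m]`), and the channel-wise directed correlation is taken as tanh(θ(x_s) − φ(x_t)), a CTR-GCN-style form used here because the slide's exact equation is not recoverable. The two-type grouping is hypothetical.

```python
import numpy as np


def node_type_aware_topology(X, node_type, W_theta, W_phi):
    """Channel-wise directed correlation for every ordered pair (s, t).

    X: (N, C) joint features; node_type: (N,) ints in [0, M);
    W_theta, W_phi: (M, C, C') one projection per node type.
    Returns Q: (N, N, C') with Q[s, t] = tanh(theta_{A(s)}(x_s) - phi_{A(t)}(x_t)).
    """
    # Project each node with the weights of its own type (type-specific feature space).
    theta = np.einsum("nc,ncd->nd", X, W_theta[node_type])
    phi = np.einsum("nc,ncd->nd", X, W_phi[node_type])
    # Pairwise difference keeps one correlation value per channel; theta != phi makes it directed.
    return np.tanh(theta[:, None, :] - phi[None, :, :])


rng = np.random.default_rng(0)
N, C, C_out, M = 5, 3, 4, 2
X = rng.standard_normal((N, C))
node_type = np.array([0, 0, 1, 1, 1])  # hypothetical two-type grouping
W_theta = rng.standard_normal((M, C, C_out))
W_phi = rng.standard_normal((M, C, C_out))
Q = node_type_aware_topology(X, node_type, W_theta, W_phi)
```

Two joints with identical coordinates but different types are projected by different weights, so they can still receive different correlations; that is the role of the type mapping.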
10
METHODOLOGY
Dynamic Semantic-Based GCN
• Edge Type-Aware Adaptive Topology.
o Applies a separate convolution kernel to the adaptive graph for each edge type.
o Given three nodes s, t, and u of different types, the edge type-aware adaptive correlation is:
o The edge type-aware topology can be represented as:
▪ where s and t are the node type indices and M is the number of types.
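A sketch of the edge type-aware idea, assuming one convolution kernel per directed edge type and the edge-type index R(s, t) = A(s)·M + A(t). The adaptive graph `A_adapt`, the node types, and all weights in the usage lines are random placeholders, not values from the paper.

```python
import numpy as np


def edge_type_aware_gcn(X, A_adapt, node_type, n_types, W_edge):
    """y_s = sum_t A_adapt[s, t] * (x_t @ W_edge[R(s, t)]), with a separate kernel per edge type.

    X: (N, C_in); A_adapt: (N, N) adaptive graph; node_type: (N,) ints in [0, M);
    W_edge: (M * M, C_in, C_out); R(s, t) = A(s) * M + A(t) is an assumed indexing.
    """
    edge_type = node_type[:, None] * n_types + node_type[None, :]  # (N, N) edge-type index
    # Message from t to s transformed by the kernel of edge type R(s, t): (N, N, C_out)
    msgs = np.einsum("tc,stco->sto", X, W_edge[edge_type])
    return np.einsum("st,sto->so", A_adapt, msgs)  # weighted aggregation over neighbours t


rng = np.random.default_rng(0)
N, C_in, C_out, M = 4, 3, 2, 2
X = rng.standard_normal((N, C_in))
A_adapt = rng.random((N, N))
node_type = np.array([0, 1, 1, 0])  # hypothetical types
W_edge = rng.standard_normal((M * M, C_in, C_out))
Y = edge_type_aware_gcn(X, A_adapt, node_type, M, W_edge)
```

Materializing one kernel per (s, t) pair is only practical for a toy graph; with M node types there are just M² distinct kernels, so a real implementation would group edges by type instead.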
11
METHODOLOGY
Dynamic Semantic-Based GCN
• DS-GCN is decomposed into three branches:
o The node type-aware branch, the edge type-aware branch, and the general branch.
o A branch-wise weight:
▪ Learnable; used when combining each branch with the shared correlation matrix.
o In each branch, a combination of the shared correlation matrix and a self-adaptive graph is used for the spatial graph convolution.
▪ The three branches are concatenated along the feature channel dimension, followed by a 1 × 1 convolution.
▪ Overall DS-GCN process:
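One possible reading of the three-branch combination, sketched for one frame. The form G_b = α_b·A_shared + A_b is an assumption (the slide's equation is missing), and each branch is reduced to a plain (N, N) graph times a linear weight for brevity, whereas the edge type-aware branch actually uses per-edge-type kernels.

```python
import numpy as np


def ds_gcn_combine(X, A_shared, branch_graphs, alphas, W_branch, W_out):
    """Combine three branches (node type-aware, edge type-aware, general) for one frame.

    X: (N, C_in); A_shared: (N, N) shared correlation matrix;
    branch_graphs: three (N, N) self-adaptive graphs; alphas: three learnable scalars;
    W_branch: three (C_in, C_b) weights; W_out: (3 * C_b, C_out), the 1x1 convolution.
    """
    outs = []
    for A_b, alpha_b, W_b in zip(branch_graphs, alphas, W_branch):
        G_b = alpha_b * A_shared + A_b  # assumed form of the branch-wise combination
        outs.append(G_b @ X @ W_b)      # spatial graph convolution in this branch: (N, C_b)
    H = np.concatenate(outs, axis=-1)   # concatenate along the channel dimension: (N, 3 * C_b)
    return H @ W_out                    # a 1x1 convolution is a per-joint linear map over channels


rng = np.random.default_rng(0)
N, C_in, C_b, C_out = 5, 3, 4, 6
X = rng.standard_normal((N, C_in))
A_shared = rng.random((N, N))
branch_graphs = [rng.random((N, N)) for _ in range(3)]
alphas = [0.5, 0.5, 0.5]  # hypothetical initial values
W_branch = [rng.standard_normal((C_in, C_b)) for _ in range(3)]
W_out = rng.standard_normal((3 * C_b, C_out))
Y = ds_gcn_combine(X, A_shared, branch_graphs, alphas, W_branch, W_out)
```

Concatenating before the 1 × 1 convolution lets the final layer learn how much each semantic branch contributes per output channel, rather than fixing an equal-weight sum.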
12
METHODOLOGY
Model Architecture
• Ten blocks in series:
o Followed by global average pooling and a softmax classifier.
o The number of base feature channels is 64, doubled at the 5th and 8th blocks.
o Each block: one DS-GCN and a multi-scale temporal module (temporal convolution network).
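The channel layout and classification head can be checked with a few lines. Only the ten blocks, the base width of 64, and the doubling at blocks 5 and 8 come from the slide; the input shape (256 channels, 16 frames, 25 joints) and the 60 output classes, matching NTU RGB+D 60, are illustrative assumptions.

```python
import numpy as np


def channel_schedule(n_blocks=10, base=64, double_at=(5, 8)):
    """Output channels of each block: start at `base` and double at the given 1-indexed blocks."""
    channels, c = [], base
    for b in range(1, n_blocks + 1):
        if b in double_at:
            c *= 2
        channels.append(c)
    return channels


def classify(features, W_fc):
    """features: (C, T, N) from the last block -> global average pooling -> linear -> softmax."""
    pooled = features.mean(axis=(1, 2))  # average over frames and joints: (C,)
    logits = pooled @ W_fc               # (n_classes,)
    e = np.exp(logits - logits.max())
    return e / e.sum()


print(channel_schedule())  # [64, 64, 64, 64, 128, 128, 128, 256, 256, 256]
rng = np.random.default_rng(0)
probs = classify(rng.standard_normal((256, 16, 25)), rng.standard_normal((256, 60)))
```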
13
EXPERIMENT AND RESULT
Experiment Settings
• Dataset: human action recognition
o NTU-RGB+D and Kinetics-400.
• Baselines:
o STGNN or GNN: ST-GCN [1], SGN [2], AS-GCN [3], RA-GCN [4], 2s-AGCN [5], DGNN [6], FGCN [7], Shift-GCN [8], DSTA-Net [9], MS-G3D [10], CTR-GCN [11], and ST-GCN++ [12].
o CNN: PoseConv3D [13].
[1] Yan, S., Xiong, Y., & Lin, D. (2018, April). Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the AAAI conference on artificial intelligence (Vol. 32, No. 1).
[2] Zhang, P., Lan, C., Zeng, W., Xing, J., Xue, J., & Zheng, N. (2020). Semantics-guided neural networks for efficient skeleton-based human action recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1112-1121).
[3] Li, M., Chen, S., Chen, X., Zhang, Y., Wang, Y., & Tian, Q. (2019). Actional-structural graph convolutional networks for skeleton-based action recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 3595-3603).
[4] Song, Y. F., Zhang, Z., Shan, C., & Wang, L. (2020). Richly activated graph convolutional network for robust skeleton-based action recognition. IEEE Transactions on Circuits and Systems for Video Technology, 31(5), 1915-1925.
[5] Shi, L., Zhang, Y., Cheng, J., & Lu, H. (2019). Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 12026-12035).
[6] Shi, L., Zhang, Y., Cheng, J., & Lu, H. (2019). Skeleton-based action recognition with directed graph neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 7912-7921).
[7] Yang, H., Yan, D., Zhang, L., Sun, Y., Li, D., & Maybank, S. J. (2021). Feedback graph convolutional network for skeleton-based action recognition. IEEE Transactions on Image Processing, 31, 164-175.
[8] Cheng, K., Zhang, Y., He, X., Chen, W., Cheng, J., & Lu, H. (2020). Skeleton-based action recognition with shift graph convolutional network. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 183-192).
[9] Shi, L., Zhang, Y., Cheng, J., & Lu, H. (2020). Decoupled spatial-temporal attention network for skeleton-based action-gesture recognition. In Proceedings of the Asian conference on computer vision.
[10] Liu, Z., Zhang, H., Chen, Z., Wang, Z., & Ouyang, W. (2020). Disentangling and unifying graph convolutions for skeleton-based action recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 143-152).
[11] Chen, Y., Zhang, Z., Yuan, C., Li, B., Deng, Y., & Hu, W. (2021). Channel-wise topology refinement graph convolution for skeleton-based action recognition. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 13359-13368).
[12] Duan, H., Wang, J., Chen, K., & Lin, D. (2022, October). PYSKL: Towards good practices for skeleton action recognition. In Proceedings of the 30th ACM International Conference on Multimedia (pp. 7351-7354).
[13] Duan, H., Zhao, Y., Chen, K., Lin, D., & Dai, B. (2022). Revisiting skeleton-based action recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2969-2978).
• Measurement:
o Accuracy (ACC).
14
EXPERIMENT AND RESULT
Result – Overall Performance
Tab. Classification accuracy comparison against state-of-the-art methods.
15
EXPERIMENT AND RESULT
Result – Ablation study.
Tab. Generalization of the proposed semantic module.
Tab. Ablation on the edge/node type encoding.
Tab. Comparison of DS-GCN with different learnable weight schemes.
Tab. Exploration of the semantic encoding stage.
16
CONCLUSION
• Propose two dynamic semantic-based adaptive graphs:
o Node type-aware and edge type-aware adaptive graphs.
o Can be applied to any ST-GCN model for skeleton-based action recognition.
Summarization
• Proposed a dynamic semantic-based graph convolution network for skeleton-based human action recognition:
o Notably outperforms SOTA methods on both NTU-RGB+D and Kinetics-400.