Action Transformer: Model Improvement and Effective Investigation with MPOSE2021 and MSR Action 3D Datasets

Van-Dung Hoang; Khac-Anh Phu; Van-Tuong Lan Le

doi:10.37934/araset.62.1.7689

Authors

Van-Dung Hoang: Faculty of Information Technology, HCMC University of Technology and Education, Ho Chi Minh City, 700000, Vietnam
Khac-Anh Phu: Faculty of Information Technology, University of Sciences, Hue University, Hue city, 530000, Vietnam
Van-Tuong Lan Le: University of Sciences, Hue University, Hue city, 530000, Vietnam

Keywords:

Action Transformer, Human Action Recognition, Skeleton Data, Deep Learning

Abstract

The Action Transformer (AcT) model has shown promising results in action recognition. However, achieving high accuracy on complex and dynamic action sequences remains a challenge. In this paper, we present an approach that improves the accuracy of AcT by increasing the complexity of the model, and we validate it on the MPOSE2021 and MSR Action 3D datasets. Our method extends AcT with a multi-level feature fusion technique: additional convolutional and pooling layers capture more detailed spatial and temporal information from the input data. This strengthens the model's ability to discriminate between subtle variations of an action and improves its accuracy on complex actions. Extensive experiments on both datasets show that the enhanced AcT model achieves significantly higher accuracy than the baseline AcT model and outperforms existing state-of-the-art methods, capturing the intricacies of complex actions and producing more accurate predictions.
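The abstract does not specify where the convolutional and pooling layers sit relative to AcT's Transformer encoder, so the NumPy sketch below shows only one plausible reading of multi-level feature fusion, under our own assumptions. It treats a skeleton sequence as T frames of flattened joint features, applies a temporal convolution, then two further pool-and-convolve stages to obtain coarser temporal levels. It upsamples the coarse levels back to T frames by repetition, concatenates all three levels, and projects them to per-frame tokens. The function names (`temporal_conv1d`, `max_pool_time`, `multilevel_fusion`, `init_params`), the kernel size, channel widths, upsampling by repetition, and concatenation as the fusion rule are all hypothetical choices, not the authors' published design. The Transformer encoder itself is omitted.

```python
import numpy as np


def temporal_conv1d(x, w, b):
    """'Same'-padded 1D convolution along time, followed by ReLU.

    x: (T, C_in) frame features, w: (K, C_in, C_out) kernel, b: (C_out,) bias.
    Returns an array of shape (T, C_out).
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, k - 1 - pad), (0, 0)))  # pad the time axis only
    out = np.empty((x.shape[0], w.shape[2]))
    for t in range(x.shape[0]):
        window = xp[t:t + k]  # (K, C_in) frames centred on frame t
        out[t] = np.einsum("kc,kco->o", window, w) + b
    return np.maximum(out, 0.0)


def max_pool_time(x, size):
    """Non-overlapping max pooling over time; drops leftover frames."""
    t = x.shape[0] // size * size
    return x[:t].reshape(-1, size, x.shape[1]).max(axis=1)


def init_params(rng, c_in, c, d_model, k=3):
    """Hypothetical random weights for three conv levels and a projection."""
    def conv(ci, co):
        return rng.normal(0.0, 0.1, (k, ci, co)), np.zeros(co)

    return {
        "conv1": conv(c_in, c),
        "conv2": conv(c, c),
        "conv3": conv(c, c),
        "proj": (rng.normal(0.0, 0.1, (3 * c, d_model)), np.zeros(d_model)),
    }


def multilevel_fusion(x, params):
    """Fuse features from three temporal resolutions into per-frame tokens."""
    assert x.shape[0] % 4 == 0, "this sketch assumes T is divisible by 4"
    f1 = temporal_conv1d(x, *params["conv1"])                     # (T, C)
    f2 = temporal_conv1d(max_pool_time(f1, 2), *params["conv2"])  # (T/2, C)
    f3 = temporal_conv1d(max_pool_time(f2, 2), *params["conv3"])  # (T/4, C)
    # Bring the coarse levels back to T frames by repetition, then concatenate.
    fused = np.concatenate(
        [f1, np.repeat(f2, 2, axis=0), np.repeat(f3, 4, axis=0)], axis=1
    )                                                             # (T, 3C)
    w_proj, b_proj = params["proj"]
    return fused @ w_proj + b_proj  # (T, d_model) tokens for an encoder


rng = np.random.default_rng(0)
T, J, D = 32, 13, 4  # hypothetical: frames, joints, features per joint
x = rng.normal(size=(T, J * D))
params = init_params(rng, J * D, c=16, d_model=64)
tokens = multilevel_fusion(x, params)
print(tokens.shape)  # (32, 64)
```

In this reading, each token mixes a fine, frame-level view with two coarser views of the surrounding motion. That is one way extra convolution and pooling could help separate actions that differ only in subtle temporal detail.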