A convolutional network video action recognition algorithm based on a three-dimensional attention mechanism
3 October 2024
Yunfeng Tan, Yingxiao Wang
Proceedings Volume 13272, Fifth International Conference on Computer Vision and Data Mining (ICCVDM 2024); 132720P (2024) https://doi.org/10.1117/12.3048345
Event: 5th International Conference on Computer Vision and Data Mining (ICCVDM 2024), 2024, Changchun, China
Abstract
Recognizing the actions of moving objects in video sequences with sufficient accuracy remains difficult. A three-dimensional convolutional neural network (3D-CNN) can effectively capture dynamic information in video sequences, but because of its fixed receptive field it cannot capture long-range dependencies. To address these issues, this paper proposes an algorithm that combines a 3D-CNN with a 3D self-attention mechanism. The method improves action recognition accuracy primarily by capturing spatiotemporal information in video sequences. First, the 3D-CNN performs convolution along the width, height, and time dimensions to extract spatiotemporal features from the video frames. Second, the self-attention mechanism is extended to three dimensions so that it can process spatiotemporal data; it computes the relationship between each frame in the video sequence and all other frames, thereby capturing long-range dependencies. Experimental results show that, compared with other algorithms, the proposed algorithm improves recognition accuracy by at least 7.2% on the UCF-101 dataset and 5.4% on the HMDB51 dataset. The proposed algorithm also offers good interpretability: visualizing the attention weights helps explain the model's decision-making process.
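The two stages described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give layer sizes, projection shapes, or how positions are grouped into tokens, so the names `conv3d_valid` and `self_attention_3d`, the random weights, and the choice to treat every (time, height, width) position as one attention token are all assumptions made for illustration.

```python
import numpy as np

def conv3d_valid(video, kernel):
    """Naive single-channel 3D cross-correlation with no padding.

    video:  (T, H, W) array of frames
    kernel: (kt, kh, kw) filter spanning time, height, and width
    """
    # windows has shape (T-kt+1, H-kh+1, W-kw+1, kt, kh, kw)
    windows = np.lib.stride_tricks.sliding_window_view(video, kernel.shape)
    return np.einsum("thwijk,ijk->thw", windows, kernel)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_3d(features, w_q, w_k, w_v):
    """Scaled dot-product self-attention over all spatiotemporal positions.

    features: (T, H, W, C) feature volume, e.g. stacked 3D-conv outputs
    w_q, w_k, w_v: (C, D) projection matrices (hypothetical, random here)
    Returns the attended volume (T, H, W, D) and the attention weights.
    """
    t, h, w, c = features.shape
    tokens = features.reshape(t * h * w, c)      # assumption: one token per position
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])      # every position scored against all others
    weights = softmax(scores, axis=-1)           # each row sums to 1
    out = weights @ v
    return out.reshape(t, h, w, -1), weights

# Toy input: 8 frames of 6x6 pixels, 4 random 3x3x3 filters (illustrative values only)
rng = np.random.default_rng(0)
video = rng.standard_normal((8, 6, 6))
kernels = rng.standard_normal((4, 3, 3, 3))
features = np.stack([conv3d_valid(video, k) for k in kernels], axis=-1)

w_q, w_k, w_v = (rng.standard_normal((4, 4)) for _ in range(3))
attended, weights = self_attention_3d(features, w_q, w_k, w_v)
```

Because attention is computed over the flattened volume, a position in the first frame can draw directly on a position in the last frame, which a fixed-size convolution kernel cannot do in a single layer. The returned `weights` matrix is the kind of quantity that could be visualized to inspect which positions the model attends to.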
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Yunfeng Tan and Yingxiao Wang "A convolutional network video action recognition algorithm based on a three-dimensional attention mechanism", Proc. SPIE 13272, Fifth International Conference on Computer Vision and Data Mining (ICCVDM 2024), 132720P (3 October 2024); https://doi.org/10.1117/12.3048345
KEYWORDS
Video
Data modeling
3D modeling
Action recognition
Convolutional neural networks
RGB color model
Education and training
