Human pose activity recognition (HPAR) has a wide range of applications. Data-collection devices such as smartphones and video cameras are in widespread use and can readily capture human activity data. As electronic devices and applications continue to evolve, breakthroughs in artificial intelligence (AI) have greatly improved the ability to extract deeply buried information for accurate recognition and interpretation. We propose a systematic design that integrates conventional networks and constraints into an attention framework to learn long-range dependencies, achieving flexible and scalable end-to-end pose estimation. The proposed method adjusts the temporal receptive field with a multi-scale structure of dilated convolutions and can be adapted into a causal model for real-time performance. Our approach achieves state-of-the-art performance on three-dimensional HPAR and outperforms previous methods at a lower computational cost.
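The abstract mentions two mechanisms: a multi-scale structure of dilated temporal convolutions, and a causal variant for real-time use. The sketch below illustrates both on a single-channel sequence of per-frame values. It is not the authors' implementation, because the abstract gives no kernel sizes, dilation rates, or fusion rule. The names `receptive_field`, `dilated_conv1d`, and `multi_scale` are hypothetical, and averaging the parallel branches is an assumption. The attention component is omitted.

```python
from typing import List, Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Temporal receptive field (in frames) of stacked 1D convolutions
    that share one kernel size and use the given dilation factors."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(x: Sequence[float], w: Sequence[float], dilation: int,
                   causal: bool) -> List[float]:
    """Single-channel 1D dilated convolution with zero padding, so the
    output has the same length as the input.

    causal=True:  output[t] uses only frames <= t (suitable for real time).
    causal=False: taps are centred on t, so future frames are also used
                  (assumes an odd kernel length).
    """
    k = len(w)
    span = (k - 1) * dilation            # distance from first to last tap
    start = -span if causal else -span // 2  # offset of the first tap from t
    out: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for i, wi in enumerate(w):
            j = t + start + i * dilation
            if 0 <= j < len(x):          # zero padding outside the sequence
                acc += wi * x[j]
        out.append(acc)
    return out


def multi_scale(x: Sequence[float], w: Sequence[float],
                dilations: Sequence[int], causal: bool) -> List[float]:
    """Hypothetical multi-scale block: apply the same kernel at several
    dilation rates in parallel and average the branch outputs."""
    branches = [dilated_conv1d(x, w, d, causal) for d in dilations]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0, 0]   # a single event at frame 3
    print(dilated_conv1d(impulse, [1, 1, 1], 2, causal=True))
    print(dilated_conv1d(impulse, [1, 1, 1], 2, causal=False))
    print(receptive_field(3, [1, 3, 9, 27, 81]))
```

With an impulse at frame 3, the causal output is nonzero only at frames 3 and later. The non-causal output also responds at frame 1, because it looks ahead. Stacking kernels of size 3 with dilations 1, 3, 9, 27, 81 (an illustrative choice) already covers 243 frames. This is why dilation is a cheap way to widen the temporal context.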
3D modeling
Neural networks
Convolution
Video
Education and training
Motion models
3D image processing