Joint misalignment-aware bilateral detection network for human pose estimation in videos

Qianyun Song; Hao Zhang; Yanan Liu; Shouzheng Sun; Dan Xu

doi:10.1117/12.3029389

13 May 2024

Proceedings Volume 13158, Seventh International Conference on Computer Graphics and Virtuality (ICCGV 2024); 131580C (2024) https://doi.org/10.1117/12.3029389
Event: Seventh International Conference on Computer Graphics and Virtuality (ICCGV24), 2024, Hangzhou, China

Abstract

Existing methods for human pose estimation in videos often rely on a sampling strategy to select the frames on which poses are estimated. Common strategies include uniform sparse sampling and keyframe selection. However, the former considers only fixed positions in the video and therefore misses dynamic information, while the latter incurs high computational cost because it must process every frame. To address these issues, we propose an efficient and effective pose estimation framework, the Joint Misalignment-aware Bilateral Detection Network (J-BDNet). The framework incorporates a Bilateral Dynamic Attention Module (BDA), which uses knowledge distillation for efficiency. BDA detects dynamic information in both the left and right halves of a video segment and uses it to guide the sampling process. In addition, a bilateral recursive sampling strategy driven by BDA extracts more spatiotemporal dependencies from the pose data and reduces computational cost without increasing how often the pose estimator is invoked. Moreover, we improve the robustness of existing denoising networks by randomly exchanging body joint positions in the pose data. Experiments demonstrate the effectiveness of our framework under heavy occlusion, spatial blur, and illumination variations, and it achieves state-of-the-art performance on the Sub-JHMDB dataset.
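The joint-exchange idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the function names `swap_random_joints` and `make_training_pair`, the parameters `num_swaps` and `swap_prob`, and the representation of a pose as a list of (x, y) joint coordinates are all hypothetical and are not taken from the paper.

```python
import random


def swap_random_joints(pose, num_swaps=1, rng=None):
    """Return a copy of `pose` with `num_swaps` random pairs of joints exchanged.

    `pose` is assumed to be a list of (x, y) joint coordinates for one frame.
    The input list is not modified.
    """
    if rng is None:
        rng = random.Random()
    noisy = list(pose)
    for _ in range(num_swaps):
        # Pick two distinct joint indices and exchange their positions.
        i, j = rng.sample(range(len(noisy)), 2)
        noisy[i], noisy[j] = noisy[j], noisy[i]
    return noisy


def make_training_pair(pose, swap_prob=0.5, rng=None):
    """Build a hypothetical (input, target) pair for a denoising network.

    With probability `swap_prob` the input has misaligned joints;
    the target is always the clean pose.
    """
    if rng is None:
        rng = random.Random()
    if rng.random() < swap_prob:
        return swap_random_joints(pose, rng=rng), pose
    return list(pose), pose
```

Under these assumptions, a denoising network trained on such pairs would see inputs whose joints are occasionally assigned to the wrong body part, which is one plausible reading of "joint misalignment-aware". How many joints the authors swap, and how often, is not stated in the abstract.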

(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.

Citation

Qianyun Song, Hao Zhang, Yanan Liu, Shouzheng Sun, and Dan Xu "Joint misalignment-aware bilateral detection network for human pose estimation in videos", Proc. SPIE 13158, Seventh International Conference on Computer Graphics and Virtuality (ICCGV 2024), 131580C (13 May 2024); https://doi.org/10.1117/12.3029389

PROCEEDINGS
10 PAGES

KEYWORDS

Pose estimation

Video

Image segmentation

Education and training

Sampling rates

Transformers

Video processing
