Paper
27 March 2022
Multi-task feature fusion network for monocular depth estimation without joint annotations
Jialing Zou, Rui Wang, James Zhiqing Wen
Proceedings Volume 12169, Eighth Symposium on Novel Photoelectronic Detection Technology and Applications; 12169CF (2022) https://doi.org/10.1117/12.2627286
Event: Eighth Symposium on Novel Photoelectronic Detection Technology and Applications, 2021, Kunming, China
Abstract
Monocular depth estimation refers to recovering the depth information of a 3D scene from a single 2D image taken by a camera. A multi-task training framework combining of semantic segmentation and depth estimation is developed to improve the monocular depth estimation performance in this paper. Nevertheless, joint annotations, namely semantic labels and depth annotations, are necessary for training dataset in the traditional joint training framework of semantics and depth. Unluckily, scarcely any large public datasets that provide the joint annotations can be accessed. To address the problem, a training framework having the feature correlation screening and linkage mechanism based on the linear independence of Gram matrix called GSFA-MDEN (Gram Semantic-Feature-Aided Monocular Depth Estimation Network), which is trained through the TSTB (Two-Stages-Two-Branches) training strategy, is studied and developed. GSFA-MDEN is composed with two brunches namely DepthNet and SemanticsNet, which are firstly trained through two different large datasets having its own respective annotation. Subsequently, the overall network is constructed through the feature fusion of the two brunches based on the Gram nonlinear correlation, which can establish the quantitative representation of the correlation between semantic features and depth features. Compared to the original DepthNet, on the KITTI dataset, GSFAMDEN decreases Root Mean Square Error (RMSE) from 5.808m to 5.370m by adding SemanticsNet assisted depth estimation, and the RMSE is further decreased to 5.167m by creatively employing Gram nonlinear correlation to excavate correlation of different task features. The series experimental results illustrate the superiority of GSFA-MDEN.
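The abstract does not give the exact form of the Gram-based correlation, but the general idea of comparing two feature maps via their Gram matrices can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`gram_matrix`, `gram_correlation`) and the cosine-style normalization are assumptions. A Gram matrix collects channel-by-channel inner products of a feature map; it is singular exactly when the channel vectors are linearly dependent, which is the property the "linear independence of Gram matrix" screening presumably exploits.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a feature map of shape (C, H, W).

    Entry (i, j) is the inner product of channels i and j,
    normalized by the number of spatial positions.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (h * w)

def gram_correlation(semantic_feat, depth_feat):
    """Scalar correlation between two feature maps via their Gram
    matrices (a hypothetical stand-in for the paper's Gram nonlinear
    correlation). Both maps must share the same (C, H, W) shape.
    """
    g_s = gram_matrix(semantic_feat)
    g_d = gram_matrix(depth_feat)
    # Cosine similarity between the two Gram matrices (Frobenius norm).
    num = np.sum(g_s * g_d)
    den = np.linalg.norm(g_s) * np.linalg.norm(g_d)
    return num / den

# Usage: a feature map correlates perfectly with itself.
f = np.random.rand(8, 16, 16)
print(gram_correlation(f, f))  # → 1.0 (up to floating-point error)
```

In a fusion network such a score could weight how strongly semantic features are injected into the depth branch at each layer, though the paper's actual screening and linkage mechanism may differ.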
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jialing Zou, Rui Wang, and James Zhiqing Wen "Multi-task feature fusion network for monocular depth estimation without joint annotations", Proc. SPIE 12169, Eighth Symposium on Novel Photoelectronic Detection Technology and Applications, 12169CF (27 March 2022); https://doi.org/10.1117/12.2627286
KEYWORDS
Image segmentation
3D image processing
Computer vision technology
Machine vision
Pattern recognition