Single-shot structured light depth prediction based on deep learning has wide application demand in intelligent manufacturing and defect detection. In depth prediction, existing CNNs have limited long-range global modeling capability, while Transformer architectures require large computational resources. To address this challenge, we propose DPM-UNet (Depth Prediction Mamba UNet), an end-to-end structured light depth prediction model for single-shot gratings that integrates Mamba and U-Net architectures, fully leveraging the long-range modeling capability and low computational cost of the State Space Model as well as the pixel-by-pixel reconstruction capability of the U-Net. Compared with pure CNN and Transformer architectures, DPM-UNet achieves more accurate prediction, with a 26.4% error reduction on public datasets. Experiments show that DPM-UNet effectively improves the accuracy and robustness of depth maps and demonstrates remarkable potential in structured light depth prediction tasks.
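As a hedged illustration of the core building block described above, the sketch below implements a simplified, non-selective state space scan over flattened U-Net feature maps in PyTorch. All layer names and sizes are hypothetical assumptions; the actual DPM-UNet block design is not reproduced here.

```python
# Toy state-space (Mamba-style) block for U-Net bottleneck features.
# Simplified, non-selective SSM: h_t = a*h_{t-1} + B*x_t, y_t = C*h_t.
import torch
import torch.nn as nn

class SimpleSSMBlock(nn.Module):
    """Linear state-space scan over feature maps flattened in raster order."""
    def __init__(self, channels: int, state_dim: int = 16):
        super().__init__()
        self.decay = nn.Parameter(torch.zeros(state_dim))  # a = sigmoid(decay) < 1 for stability
        self.b = nn.Linear(channels, state_dim)            # input -> state
        self.c = nn.Linear(state_dim, channels)            # state -> output
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> sequence (B, L, C) with L = H*W
        bsz, ch, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)
        a = torch.sigmoid(self.decay)                      # per-state decay
        u = self.b(seq)                                    # (B, L, S)
        state = torch.zeros(bsz, a.shape[0], device=x.device)
        outs = []
        for t in range(seq.shape[1]):                      # recurrent scan captures long range
            state = a * state + u[:, t]
            outs.append(self.c(state))
        y = self.norm(torch.stack(outs, dim=1) + seq)      # residual connection
        return y.transpose(1, 2).reshape(bsz, ch, h, w)

# Example: apply the block to a small bottleneck feature map.
feats = torch.randn(1, 32, 8, 8)
print(SimpleSSMBlock(32)(feats).shape)                     # torch.Size([1, 32, 8, 8])
```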
KEYWORDS: High dynamic range imaging, Fringe analysis, Deep learning, Cameras, 3D metrology, Projection systems, Optical spheres, Reflectivity, Neural networks, Metals
Fringe projection profilometry (FPP), renowned for its stable and high-precision characteristics, is widely employed in three-dimensional surface measurement of objects. Whether using deep learning-based methods or traditional multi-frequency, multi-step fringe analysis techniques, both require acquiring high-quality fringe patterns modulated by the three-dimensional surface of the object. However, the limited dynamic range of cameras makes it difficult to capture effective fringe information in a single exposure, and while multi-exposure methods can address this issue, they are inefficient. To address this problem, this study proposes an end-to-end neural network approach for generating high dynamic range (HDR) fringe patterns from projected gratings. Additionally, an end-to-end network is employed to solve the fringe phase. Experimental results demonstrate that this method significantly improves fringe pattern recovery on metallic surfaces with overexposed or underexposed regions. On a high dynamic range reflectivity dataset, the method achieved a phase error of 0.02072, successfully reconstructing 3D objects in only 8.3% of the time required by the 12-step phase-shifting profilometry (PSP) method. Furthermore, on standard spherical and planar objects, the method achieved a radius accuracy of 53.1 μm and a flatness accuracy of 61.7 μm, demonstrating effective measurement precision without the need for additional steps. The method is effective for both high dynamic range reflective and non-high dynamic range reflective objects.
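For context on the 12-step PSP baseline mentioned above, the following minimal numpy sketch shows the standard N-step phase-shifting computation of the wrapped phase; the synthetic fringe parameters are illustrative only, not the study's setup.

```python
# Standard N-step phase shifting: recover the wrapped phase from N fringe
# images I_n = A + B*cos(phi - 2*pi*n/N).
import numpy as np

def wrapped_phase(images: np.ndarray) -> np.ndarray:
    """images: (N, H, W) stack of phase-shifted fringe images."""
    n = images.shape[0]
    deltas = 2 * np.pi * np.arange(n) / n
    num = np.tensordot(np.sin(deltas), images, axes=1)   # sum_n I_n sin(delta_n)
    den = np.tensordot(np.cos(deltas), images, axes=1)   # sum_n I_n cos(delta_n)
    return np.arctan2(num, den)                          # wrapped to (-pi, pi]

# Example with synthetic 12-step fringes (as in the 12-step PSP reference):
H, W, N = 64, 64, 12
phi_true = np.tile(np.linspace(0, 4 * np.pi, W), (H, 1))
stack = np.stack([128 + 100 * np.cos(phi_true - 2 * np.pi * k / N) for k in range(N)])
phi = wrapped_phase(stack)   # equals phi_true wrapped into (-pi, pi]
```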
Fringe Projection Profilometry (FPP) faces challenges with objects of varying surface reflectivity, as projected light can exceed the camera's dynamic range, hindering effective fringe capture. Current solutions using repeated projections with varying exposures increase measurement time, limiting real-time applicability. This study validates deep neural networks that transform traditional multi-frequency, multi-step, multi-exposure methods into a single-step, multi-exposure format, significantly reducing measurement time while maintaining accuracy. Experimental results demonstrate that deep learning methods can effectively extract phase information from modulated fringe images, unwrap it, and reconstruct 3D point clouds. On high-reflectivity metal datasets, the accuracy of the deep learning approach closely matches that of the traditional six-step method while using only 16.7% of the time. For standard objects, an accuracy of 60 μm is achieved. These findings confirm that various deep learning methods can efficiently resolve phase information in modulated fringe patterns, significantly enhancing measurement speed.
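As a hedged sketch of the final reconstruction step the abstract alludes to, the snippet below applies the classical reference-plane phase-to-height model. The geometry constants are illustrative placeholders, not calibration values from the study.

```python
# Classical crossed-optical-axes phase-to-height model, used once the
# absolute phase has been recovered: h = L*dphi / (dphi + 2*pi*f0*d).
# L: camera-to-reference-plane distance, d: projector-camera baseline,
# f0: fringe frequency on the reference plane. Values below are made up.
import numpy as np

def phase_to_height(phi_obj, phi_ref, L=500.0, d=150.0, f0=0.1):
    dphi = phi_obj - phi_ref                       # phase change caused by the surface
    return L * dphi / (dphi + 2 * np.pi * f0 * d)  # height above the reference plane

height = phase_to_height(np.full((4, 4), 1.2), np.zeros((4, 4)))
```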
KEYWORDS: Depth maps, Structured light, Education and training, 3D image processing, Phase reconstruction, Lithium, Fringe analysis, Signal processing, 3D modeling, 3D metrology, Deep learning
In recent years, end-to-end depth map prediction from single-shot fringe modulation images in structured light 3D measurement (F2D) has drawn widespread attention, as it significantly reduces measurement time and eliminates the complex intermediate steps of traditional methods. However, F2D is a long-distance ill-conditioned prediction problem, and it is difficult for existing regression networks to achieve high-precision pixel-by-pixel prediction over long distances in space and time. To address this challenge, we propose APS-UNet (Absolute Phase aided Supervision UNet), an end-to-end depth map prediction network supervised by an absolute phase branch. The absolute phase branch, which encodes the core physical process, serves as auxiliary supervision and decomposes one challenging long-distance prediction into two easier short-distance prediction tasks. Moreover, during training, the two branches provide feedback to each other, enhancing the accuracy and robustness of depth prediction. Compared to Res-UNet, APS-UNet demonstrates a 32% decrease in mean absolute error (MAE) on a real dataset, highlighting the effectiveness of the network.
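The following PyTorch sketch illustrates the dual-branch auxiliary supervision idea in its simplest form, assuming a toy backbone and a hypothetical loss weight alpha; it is not the published APS-UNet architecture.

```python
# Two-headed network: a shared backbone, one head predicting the absolute
# phase (the physically meaningful intermediate) and one predicting depth.
import torch
import torch.nn as nn

class TwoBranchNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.phase_head = nn.Conv2d(16, 1, 1)   # fringe -> phase (short hop 1)
        self.depth_head = nn.Conv2d(16, 1, 1)   # features -> depth (short hop 2)

    def forward(self, fringe):
        feats = self.backbone(fringe)
        return self.phase_head(feats), self.depth_head(feats)

net, l1, alpha = TwoBranchNet(), nn.L1Loss(), 0.5   # alpha is an assumed weight
fringe = torch.randn(2, 1, 64, 64)
phase_gt, depth_gt = torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64)
phase_pred, depth_pred = net(fringe)
# Joint objective: gradients from both branches shape the shared features,
# so the phase branch acts as auxiliary supervision for depth.
loss = l1(depth_pred, depth_gt) + alpha * l1(phase_pred, phase_gt)
loss.backward()
```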
KEYWORDS: Semantics, 3D mask effects, 3D metrology, Structured light, Shadows, 3D modeling, Education and training, Deep learning, Image processing, Feature extraction
Deep learning-driven structured light 3D measurement has garnered significant attention due to its speed, high precision, and non-contact nature. However, accurate prediction in edge discontinuity areas remains a challenge. For the single-frame end-to-end absolute phase prediction task, we propose a mask semantic attention network (MSAN) to enhance both edge and overall accuracy. First, a mask partitions the scene into background (shadow) and foreground (object) regions, providing semantic attention for the network. Second, we design a mask fusion (MF) module that effectively integrates feature maps with mask semantics. Based on the MF module and mask semantic information, we develop a U-shaped network architecture in which each decoder feature map is fused with the input mask through the MF module. MSAN improves edge prediction accuracy by explicitly identifying edge regions and drawing the network's attention to the edges and objects rather than shadow areas, thereby enhancing overall prediction accuracy. Validation on real datasets showed that MSAN decreased the mean absolute error by 33% and the root mean square error by 76%, demonstrating the network's capability to improve both overall and edge precision in structured light deep learning tasks. This advancement significantly benefits the development of highly precise and rapid structured light 3D measurement technologies.
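A minimal sketch of the mask-fusion idea follows, assuming a hypothetical MaskFusion layer that resizes the binary mask to each decoder stage and uses it as spatial attention; the published MF module may differ in detail.

```python
# Mask fusion sketch: the foreground/background mask gates decoder features
# so shadow regions are suppressed and object/edge regions are emphasized.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv lets the network re-weight masked features per channel.
        self.proj = nn.Conv2d(channels + 1, channels, kernel_size=1)

    def forward(self, feats: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, h, w); mask: (B, 1, H, W) with 1 = object, 0 = shadow.
        m = F.interpolate(mask, size=feats.shape[-2:], mode="nearest")
        attended = feats * m                                   # suppress shadow regions
        return self.proj(torch.cat([attended, m], dim=1)) + feats  # residual fusion

mf = MaskFusion(32)
out = mf(torch.randn(1, 32, 16, 16), torch.ones(1, 1, 128, 128))
```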
Depth estimation and semantic segmentation are crucial for visual perception and scene understanding. Multi-task learning, which captures shared features across multiple tasks within a scene, is often applied to depth estimation and semantic segmentation to jointly improve accuracy. In this paper, a deformable attention-guided network for multi-task learning is proposed to enhance the accuracy of both depth estimation and semantic segmentation. The primary network architecture consists of a shared encoder, initial prediction modules, deformable attention modules, and decoders. RGB images are first input into the shared encoder to extract generic representations for the different tasks. These shared feature maps are then decoupled into depth, semantic, edge, and surface normal features in the initial prediction modules. At each stage, attention is applied to the depth and semantic features under the guidance of fusion features in the deformable attention module. The decoder upsamples each deformable attention-enhanced feature map and outputs the final predictions. The proposed model achieves an mIoU of 44.25% and an RMSE of 0.5183, outperforming the single-task baseline, the multi-task baseline, and a state-of-the-art multi-task learning model.
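The sketch below mirrors the described pipeline shape (shared encoder, initial prediction modules decoupling four task features, guided refinement, task heads), with a plain convolutional gate standing in for the deformable attention module; all module sizes are illustrative assumptions.

```python
# Multi-task pipeline sketch: shared encoder -> per-task feature decoupling
# -> fused-feature guidance gating depth/semantic features -> task heads.
import torch
import torch.nn as nn

class MultiTaskSketch(nn.Module):
    def __init__(self, num_classes: int = 40):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        # "Initial prediction modules": decouple shared features per task.
        self.heads = nn.ModuleDict({
            t: nn.Conv2d(32, 32, 1) for t in ["depth", "semantic", "edge", "normal"]
        })
        self.guide = nn.Conv2d(32 * 4, 32, 1)   # fuse all four task features
        self.depth_out = nn.Conv2d(32, 1, 1)
        self.sem_out = nn.Conv2d(32, num_classes, 1)

    def forward(self, rgb):
        shared = self.encoder(rgb)
        task = {t: h(shared) for t, h in self.heads.items()}
        g = torch.sigmoid(self.guide(torch.cat(list(task.values()), dim=1)))
        # Fusion-feature guidance gates depth and semantic features
        # (a simple stand-in for deformable attention).
        return self.depth_out(task["depth"] * g), self.sem_out(task["semantic"] * g)

depth, sem = MultiTaskSketch()(torch.randn(1, 3, 64, 64))
```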
KEYWORDS: Phase unwrapping, Deep learning, 3D metrology, Shadows, Semantics, Network architectures, Education and training, Visualization, Time metrology, Phase reconstruction
Single-frame high-precision 3D measurement using deep learning has been widely studied for its minimal measurement time. However, the long physical and semantic distances make end-to-end absolute phase reconstruction from a single-frame grating challenging. To tackle this difficulty, we propose the DSAS-S2AP-X (Dual-Stage Auxiliary Supervision Network for Single-Frame to Absolute Phase Prediction with X) strategy, which includes supervision branches for the second-highest-frequency unwrapped phase and the highest-frequency wrapped phase. It combines a multi-frequency temporal phase unwrapping model with an arbitrary existing regression network X. Experimental results show that the DSAS-S2AP-ResUNet34 strategy reduces the mean absolute error (MAE) and root mean square error (RMSE) of the absolute phase by 34.3% and 25.9%, respectively, relative to the baseline ResUNet34.
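The physical model behind the two supervision branches is standard multi-frequency temporal phase unwrapping, sketched below in numpy with illustrative frequencies.

```python
# Dual-frequency temporal phase unwrapping: an already-unwrapped
# lower-frequency phase, scaled by the frequency ratio, predicts the fringe
# order k of the high-frequency wrapped phase.
import numpy as np

def temporal_unwrap(phi_high_wrapped, phi_low_unwrapped, f_high, f_low):
    """Unwrap the high-frequency phase using a lower-frequency reference."""
    scaled = phi_low_unwrapped * (f_high / f_low)             # coarse prediction
    k = np.round((scaled - phi_high_wrapped) / (2 * np.pi))   # fringe order
    return phi_high_wrapped + 2 * np.pi * k

# Example: a 64-period wrapped phase unwrapped with an 8-period reference.
x = np.linspace(0, 1, 512)
phi_hi_true, phi_lo = 2 * np.pi * 64 * x, 2 * np.pi * 8 * x
phi_hi_wrapped = np.angle(np.exp(1j * phi_hi_true))           # wrap to (-pi, pi]
assert np.allclose(temporal_unwrap(phi_hi_wrapped, phi_lo, 64, 8), phi_hi_true)
```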
Three-frequency heterodyne phase-shifting profilometry is widely used in high-precision 3D reconstruction. However, its high accuracy comes at the cost of many projected frames, which increases measurement time and decreases measurement efficiency. To address this challenge, we propose a rapid, high-precision absolute phase acquisition method called X+1+1, which combines the accuracy advantages of the multi-frequency n-step heterodyne phase-shifting method with the speed advantages of modified Fourier transform profilometry (MFTP). The highest-frequency gratings use the standard X-step phase-shifting method to determine the wrapped phase, ensuring high unwrapping accuracy and providing the background light intensity. For the intermediate and low frequencies, a single-frame grating and background-generated modified Fourier transform profilometry (BGMFTP) are used to solve each wrapped phase, reducing measurement time. Finally, the heterodyne method processes the three-frequency wrapped phases to obtain the absolute phase. Experimental results demonstrate the high accuracy and speed of this method in 3D measurement. Compared to traditional Fourier transform profilometry, the X+1+1 method improves accuracy by 53%; it matches the performance of the three-frequency four-step heterodyne method in continuous non-marginal flat areas while reducing projection time by approximately 50%. The proposed X+1+1 method offers a new solution for balancing speed and accuracy in the application and promotion of structured-light 3D measurement.
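The heterodyne step at the heart of this family of methods is sketched below: subtracting two wrapped phases of nearby frequencies yields the wrapped phase of a much lower beat frequency (equivalent wavelength λ_eq = λ₁λ₂/|λ₂ − λ₁|), which is far easier to unwrap. The frequencies here are illustrative, not the paper's settings.

```python
# Heterodyne of two wrapped phases: the wrapped difference equals the
# wrapped phase of the beat (equivalent) frequency f_a - f_b.
import numpy as np

def heterodyne(phi_a, phi_b):
    """Wrapped phase difference = wrapped phase of the beat frequency."""
    return np.mod(phi_a - phi_b, 2 * np.pi)

# Example: beating 64-period against 56-period fringes gives 8 beat periods.
x = np.linspace(0, 1, 512)
wrap = lambda p: np.mod(p, 2 * np.pi)
phi_64, phi_56 = wrap(2 * np.pi * 64 * x), wrap(2 * np.pi * 56 * x)
phi_beat = heterodyne(phi_64, phi_56)        # ~ wrap(2*pi*8*x)
# Compare on the unit circle to avoid 2*pi boundary artifacts:
assert np.allclose(np.exp(1j * phi_beat), np.exp(1j * 2 * np.pi * 8 * x))
```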