Multiview stereo reconstruction based on context-aware transformer

Zhaoxu Tian

doi:10.1117/12.3032052

6 June 2024

Proceedings Volume 13175, International Conference on Computer Network Security and Software Engineering (CNSSE 2024); 131750U (2024) https://doi.org/10.1117/12.3032052
Event: 4th International Conference on Computer Network Security and Software Engineering (CNSSE 2024), 2024, Sanya, China

Abstract

This paper addresses a weakness of existing Multi-View Stereo (MVS) methods: in scenes with repetitive textures or complex structure, they often produce reconstructions that lack completeness and accuracy. To address this, we introduce a novel deep learning network, Clo-PatchmatchNet, which uses context-aware Transformers. The network first extracts image features with a feature extraction module, then feeds those features into a learnable Patchmatch algorithm that produces an initial depth map, and finally refines that map into the final, detailed depth map. The key innovation is the integration of a context-aware Transformer block, Cloblock, into the feature extraction stage. This lets the network capture both global contextual information and high-frequency local details, which improves feature matching across views. In experiments on the Technical University of Denmark (DTU) dataset, Clo-PatchmatchNet improves on the baseline PatchmatchNet by 2.5% in reconstruction completeness and 1.2% in accuracy, for an overall improvement of 1.7%. Compared with other contemporary methods, the proposed network also achieves better completeness and overall quality.
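The abstract does not describe the internals of Cloblock, so the following is only a minimal sketch of the general idea of pairing a global-context branch with a local high-frequency branch. Everything in it is an assumption made for illustration: the function names (`global_branch`, `local_branch`, `two_branch_block`), the use of a 1D token sequence instead of a 2D feature map, the fixed high-pass kernel, the absence of learned weights, and the concatenation used to fuse the branches. It is not the paper's implementation.

```python
import math

def global_branch(tokens):
    """Single-head self-attention with identity projections (no learned weights).

    Every output token is a softmax-weighted average of all input tokens,
    so each position sees the whole sequence (global context).
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * v[c] for w, v in zip(weights, tokens))
            for c in range(d)
        ])
    return out

def local_branch(tokens, kernel=(-1.0, 2.0, -1.0)):
    """Per-channel 1D convolution with zero padding.

    The default kernel is a simple high-pass filter (an assumption), so the
    output responds to local changes and is zero on locally constant input.
    """
    n, d = len(tokens), len(tokens[0])
    half = len(kernel) // 2
    out = []
    for i in range(n):
        row = []
        for c in range(d):
            acc = 0.0
            for k, w in enumerate(kernel):
                j = i + k - half
                if 0 <= j < n:  # zero padding outside the sequence
                    acc += w * tokens[j][c]
            row.append(acc)
        out.append(row)
    return out

def two_branch_block(tokens):
    """Concatenate global and local features per token (channel count doubles)."""
    g = global_branch(tokens)
    l = local_branch(tokens)
    return [gi + li for gi, li in zip(g, l)]

# Four tokens with two channels each (hypothetical toy features).
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
fused = two_branch_block(features)
print(len(fused), len(fused[0]))  # 4 4
```

In this sketch the first two output channels of each token carry the attention-averaged context and the last two carry the high-pass response. A real feature extractor would operate on 2D feature maps with learned projections and convolution weights.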

(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.

Citation

Zhaoxu Tian "Multiview stereo reconstruction based on context-aware transformer", Proc. SPIE 13175, International Conference on Computer Network Security and Software Engineering (CNSSE 2024), 131750U (6 June 2024); https://doi.org/10.1117/12.3032052

PROCEEDINGS
5 PAGES

KEYWORDS

Transformers

Depth maps

Feature extraction

Education and training

Image processing

Point clouds

Visualization
