Although transformer-based self-supervised depth estimation models have achieved success, lightweight depth prediction networks suffer from particularly pronounced blurring at object boundaries compared with standard-sized networks. We found that this problem arises from token dimension constraints, which limit the precise representation of semantic and spatial information. To address this challenge, we introduce RENA-Depth, a lightweight monocular self-supervised depth estimation network that leverages convolutional neural network neighborhood-attention-guided recursive transformers to enhance depth estimation precision. Specifically, the design begins with neighborhood adaptive attention (NA), which focuses on local and regional scales and adaptively mines latent semantic and spatial information from the neighborhoods of the self-attention input features. Subsequently, a global feature recursive interaction module recursively refines the interaction between local and global information, enriching the representation of semantic and spatial information without a significant increase in parameters. Finally, an attention equilibrium loss is proposed that encourages richer semantic representation and sharpens boundary depth by penalizing the orthogonality similarity of the attention mechanisms. Extensive evaluations on the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) and Make3D datasets demonstrate that the proposed lightweight self-supervised model, RENA-Depth, outperforms state-of-the-art lightweight depth estimation algorithms, confirming its efficacy in improving depth prediction accuracy.
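The abstract describes the attention equilibrium loss only as "penalizing the orthogonality similarity of attention mechanisms." A minimal sketch of one plausible reading, assuming the loss penalizes the squared cosine similarity between two branches' attention maps so that they encode complementary information (the function name and exact form are hypothetical, not from the paper):

```python
import numpy as np

def attention_equilibrium_loss(local_attn, global_attn, eps=1e-8):
    """Hypothetical sketch: penalize cosine similarity between two
    attention maps so the branches stay near-orthogonal (complementary)."""
    a = local_attn.reshape(-1)
    b = global_attn.reshape(-1)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return cos ** 2  # zero when the maps are orthogonal, one when identical
```

Minimizing this term alongside the photometric reconstruction loss would push the local and global attention branches toward distinct, non-redundant responses.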
Autonomous vehicles operating on mountain highways suffer high miss rates for distant small objects, and haze further degrades detection performance. To address these issues, we propose SHTDet, a framework for small-object vehicle detection in hazy traffic environments, aimed at enhancing small-object detection for autonomous driving under hazy conditions on mountainous motorways. Specifically, to restore the clarity of hazy images, we design an image enhancement (IE) module whose parameters are predicted by a convolutional neural network, the filter parameter estimation (FPE) module. In addition, to improve the detection accuracy of small objects, we introduce a cascaded sparse query (CSQ) mechanism, which effectively exploits high-resolution features while maintaining fast detection speed. We jointly optimize the IE module and the detection network (CSQ-FCOS) in an end-to-end manner, ensuring that the FPE module learns a suitable enhancement. The proposed SHTDet adaptively handles both sunny and hazy conditions. Extensive experiments demonstrate the efficacy of SHTDet in detecting small objects on hazy sections of mountain highways.
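The cascaded sparse query idea, as described, queries high-resolution features only at promising locations rather than processing the full map. A rough sketch under that assumption (array shapes, `top_k`, and `stride` are illustrative choices, not the paper's implementation):

```python
import numpy as np

def cascaded_sparse_query(coarse_scores, hires_feats, top_k=4, stride=2):
    """Keep only the top-k coarse objectness locations, then gather the
    corresponding high-resolution features instead of the full map."""
    flat = coarse_scores.ravel()
    idx = np.argsort(flat)[-top_k:]                  # top-k coarse positions
    ys, xs = np.unravel_index(idx, coarse_scores.shape)
    # map coarse-grid coordinates to high-resolution coordinates
    return hires_feats[ys * stride, xs * stride]      # (top_k, channels)
```

Because only `top_k` positions touch the high-resolution map, the cost of using fine features stays nearly constant as image resolution grows, which matches the abstract's claim of preserving detection speed.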
Autonomous driving systems currently achieve strong object detection performance in good weather. However, the environment-sensing capability of autonomous vehicles is severely degraded in rainy traffic environments. Although deep-learning-based image deraining algorithms have made significant progress, integrating them with high-level vision tasks such as object detection remains challenging because deraining and detection algorithms differ substantially. Additionally, object detection accuracy in real rainy traffic environments drops significantly because of the domain shift between the training dataset and the actual rain environment. To address this domain-shift problem, we propose ARODNet, an adaptive rain image enhancement object detection network for autonomous driving in adverse weather conditions. The architecture consists of an image adaptive enhancement module, an image deraining module, and an object detection module. The baseline detector (CBAM-YOLOv7) augments the YOLOv7 object detection network with a feed-forward convolutional block attention module (CBAM). For low-quality images acquired under heavy rainfall, we propose a domain-adaptive rain image enhancement module, DRIP, which enhances rainy-day images by adaptively learning weights for multiple preprocessing filters. To remove the effects of rain streaks and fog on detection, DRIP-enhanced images are fed to a depth-estimation-based deraining module (DeRain), preventing rain and fog from obscuring the objects to be detected. Finally, a multistage joint training strategy improves training efficiency, performing object detection while the image is derained.
The efficacy of ARODNet for object detection in rainy traffic environments has been demonstrated through extensive quantitative and qualitative experiments.
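Both SHTDet's IE/FPE pairing and ARODNet's DRIP module follow the same pattern: a small network predicts per-image weights for differentiable preprocessing filters. A minimal sketch of the filter-application side, assuming gamma, contrast, and brightness filters (the parameter set and function name are illustrative, not taken from either paper):

```python
import numpy as np

def apply_predicted_filters(image, params):
    """Apply differentiable enhancement filters to an image in [0, 1].
    `params` = (gamma, contrast, brightness), as would be predicted
    per image by a small parameter-estimation CNN (hypothetical form)."""
    gamma, contrast, brightness = params
    out = np.clip(image, 0.0, 1.0) ** gamma          # gamma correction
    out = (out - 0.5) * contrast + 0.5 + brightness  # contrast + brightness
    return np.clip(out, 0.0, 1.0)
```

Because each filter is differentiable in its parameters, the estimation network can be trained end-to-end against the downstream detection loss, so the enhancement is optimized for detection rather than for visual quality alone.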
With the development of surveillance technology and growing public security awareness, demand for intelligent recognition of abnormal human actions is increasingly high. In most cases, an abnormal human action differs little in appearance from normal behavior, so visual tempo information becomes an important factor in action recognition; yet existing approaches often focus on the appearance of an action and ignore its rhythm. In this paper, we introduce a temporal pyramid module to process visual tempo information. Meanwhile, the traditional LSTM, which propagates only local historical information, easily loses context, hindering the grasp of global information and thereby degrading the temporal pyramid's effectiveness. We therefore introduce a non-local neural network module to strengthen the network's grasp of global information and its long-range modeling capability, complementing the temporal pyramid module. Finally, we evaluate the network on the mainstream anomaly dataset UCF-Crime; the improved model reaches an AUC of 0.82, outperforming other state-of-the-art methods.
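The temporal pyramid idea pools frame features at several temporal rates so that both fast and slow visual tempos are represented. A minimal sketch under that assumption (the sampling rates and function name are illustrative, not the paper's exact design):

```python
import numpy as np

def temporal_pyramid(features, rates=(1, 2, 4)):
    """Pool a (T, C) frame-feature sequence at several temporal rates
    and concatenate, capturing both fast and slow visual tempos."""
    levels = []
    for r in rates:
        # subsample every r-th frame, then average over the time axis
        levels.append(features[::r].mean(axis=0))
    return np.concatenate(levels)  # shape: (len(rates) * C,)
```

The non-local module described in the abstract would then operate on the frame features before pooling, letting every time step attend to every other and compensating for the LSTM's loss of long-range context.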