With the remarkable advances of unmanned aerial vehicles (UAVs) and machine vision, aerial tracking has attracted wide attention from scholars. Previous tracking methods were mostly developed for clean, well-lit environments, so tracking camouflaged people rapidly and accurately in woodlands remains challenging. We develop a transformer-based framework for camouflaged people aerial tracking (CPAT). Specifically, a camouflaged people discovery strategy is proposed to rapidly generate training samples from unlabeled videos captured by the UAV, and dynamic programming is employed to filter noise and produce smooth candidate frames. To exploit multilevel feature information, a transformer fusion framework is designed to integrate shallow spatial information with deep semantic features. To reduce computational cost, a spatial attention reduction mechanism is embedded in the multihead attention for fast tracking. Further, we build Cam235, a dataset for evaluating camouflaged people tracking, which consists of 85 manually labeled test sequences and an unlabeled training set of more than 100k frames. Extensive experiments on Cam235-test and popular tracking datasets show that CPAT is superior to other trackers for practical applications. Under the most challenging condition of camouflaged people tracking, CPAT achieves a precision of 67.9%, surpassing state-of-the-art trackers by large margins.
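As a rough illustration of the spatial attention reduction idea described above, the following PyTorch sketch shrinks the key/value token grid with a strided convolution before multihead attention so the attention cost drops with the square of the reduction ratio. The module name, feature dimensions, and reduction ratio are assumptions chosen for illustration, not the CPAT authors' exact design.

```python
import torch
import torch.nn as nn

class SpatialReductionAttention(nn.Module):
    """Multihead attention with a spatial-reduction step on keys/values.

    Illustrative sketch only: the layer layout and reduction ratio are
    assumptions, not the configuration used in the paper.
    """
    def __init__(self, dim=256, num_heads=8, sr_ratio=4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        # Strided convolution shrinks the key/value grid by sr_ratio per side,
        # reducing attention cost from O(N^2) to O(N * N / sr_ratio^2).
        self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
        self.norm = nn.LayerNorm(dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, H, W):
        B, N, C = x.shape  # N tokens from an H x W feature map
        q = self.q(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        # Reduce the spatial resolution of keys/values before attention.
        x_ = x.transpose(1, 2).reshape(B, C, H, W)
        x_ = self.sr(x_).reshape(B, C, -1).transpose(1, 2)
        x_ = self.norm(x_)
        kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, self.head_dim)
        kv = kv.permute(2, 0, 3, 1, 4)
        k, v = kv[0], kv[1]
        attn = ((q @ k.transpose(-2, -1)) * self.scale).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Example: a 16x16 token grid with 256-dim features.
x = torch.randn(1, 256, 256)
print(SpatialReductionAttention()(x, 16, 16).shape)  # torch.Size([1, 256, 256])
```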
With the development of unmanned aerial vehicles (UAVs) and computer vision, UAV-based target detection methods have been increasingly applied in military and civilian fields. To meet the adaptability requirements of low-illumination environments such as rain, fog, and night, visible and infrared (IR) sensors are often installed on UAVs for all-weather, all-day operation. To improve the near-surface detection performance of UAVs in low-illumination environments, a pedestrian detection method using image fusion and deep learning is proposed. Visible and IR pedestrian images are collected by the UAV, and the corresponding aerial images are registered and annotated. The two types of aerial images are aligned in the time sequence and matched using the scale-invariant feature transform (SIFT). A U-type generative adversarial network (GAN) is first developed to fuse the visible and IR images, and a convolutional block attention module (CBAM) is introduced to strengthen the pedestrian target information in the GAN. The spatial-domain and channel-domain attention mechanisms generate color fusion images with rich detail and avoid the manually designed feature extraction and fusion rules of existing image fusion methods. Then, You Only Look Once Version 3 with spatial pyramid pooling (YOLOv3-SPP), combined with transfer learning, is trained on the fused images of our aerial dataset to verify the pedestrian detection performance. In addition, comparison experiments are carried out. The experimental results demonstrate that the YOLOv3 model is successfully transferred to the target dataset, and the proposed detection model trained on the fused images via transfer learning performs best among the compared methods. Finally, the precision P, recall R, mean average precision, and F1 score reach 0.804, 0.923, 0.928, and 0.859, respectively.
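The SIFT-based registration step can be sketched with standard OpenCV calls: detect and match keypoints between the IR and visible frames, estimate a homography with RANSAC, and warp the IR image into the visible frame's coordinates. The file names, ratio-test threshold, and RANSAC parameters below are placeholders, not the values used in the paper.

```python
import cv2
import numpy as np

# Hypothetical file names for illustration.
visible = cv2.imread("visible_frame.png", cv2.IMREAD_GRAYSCALE)
infrared = cv2.imread("ir_frame.png", cv2.IMREAD_GRAYSCALE)

# Detect SIFT keypoints and descriptors in both modalities.
sift = cv2.SIFT_create()
kp_vis, des_vis = sift.detectAndCompute(visible, None)
kp_ir, des_ir = sift.detectAndCompute(infrared, None)

# Lowe's ratio test keeps only reliable correspondences.
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des_ir, des_vis, k=2)
good = [m[0] for m in matches
        if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]

# Estimate an IR-to-visible homography with RANSAC, then warp the IR
# frame so the two modalities are pixel-aligned before fusion.
src = np.float32([kp_ir[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp_vis[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
ir_aligned = cv2.warpPerspective(infrared, H, (visible.shape[1], visible.shape[0]))
```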
A visual localization approach for unmanned aerial vehicles (UAVs) based on hybrid real-time stereo visual odometry (VO) is presented. The hybrid VO initializes semidirect visual odometry (SVO) with depth obtained from the stereo camera, and SVO then runs in a monocular scheme at 30 Hz. Meanwhile, a feature-based stereo VO runs in a parallel thread to improve reliability. Owing to the robustness of the feature-based VO, we are able not only to estimate the pose of the UAV at the rate of the incoming images but also to recover it when SVO fails. We demonstrate the accuracy and reliability of the hybrid VO on a public benchmark dataset and a dataset recorded on our experimental platform. In addition, an autonomous hovering experiment verifies that the estimated pose is fast and accurate enough to control the position of the UAV.
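The fallback behavior of such a hybrid VO might be organized along the following lines: the feature-based stereo VO runs in a parallel thread, and its pose is used to recover and reinitialize the direct front end when it fails. The `direct_vo` and `feature_vo` objects and their `track`/`reset` interfaces are hypothetical stand-ins, not the authors' implementation.

```python
from concurrent.futures import ThreadPoolExecutor

class HybridVO:
    """Illustrative fallback logic only: direct_vo and feature_vo stand in
    for the semidirect (SVO) front end and the feature-based stereo VO."""

    def __init__(self, direct_vo, feature_vo):
        self.direct_vo = direct_vo
        self.feature_vo = feature_vo
        self.pool = ThreadPoolExecutor(max_workers=1)

    def track(self, left_img, right_img):
        # Feature-based stereo VO runs in a parallel thread for reliability.
        backup = self.pool.submit(self.feature_vo.track, left_img, right_img)
        # Semidirect front end runs at frame rate on the left image.
        pose, ok = self.direct_vo.track(left_img)
        if ok:
            return pose
        # If the semidirect tracker fails, recover the pose from the
        # feature-based estimate and reinitialize the direct tracker.
        pose = backup.result()
        self.direct_vo.reset(pose)
        return pose
```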