Object detection in unmanned aerial vehicle (UAV) imagery is attracting increasing attention. However, existing lightweight object detection models still struggle with UAV aerial image object detection tasks because of variable object scales and dense small objects. We propose a lightweight object detection model named YoloV8-RFCA. For small-object features, a channel attention mechanism is integrated with receptive-field convolution to construct the Receptive-field conv with Channel Attention (RFCA) module, which resolves the parameter-sharing issue and enhances the feature extraction capability of the backbone network. To address the feature information loss and degradation caused by multi-level transmission during feature fusion, an asymptotic feature fusion strategy is proposed. Experimental results show that the model achieves 82.5 mAP on the PASCAL VOC dataset and 41.9 mAP on the VisDrone2019 dataset, confirming that the proposed model has high practical value for UAV aerial image object detection.
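The channel-attention idea behind the RFCA module can be sketched as follows: each channel is globally average-pooled, and the pooled value is passed through a gate that rescales the whole channel. This is a minimal pure-Python illustration of squeeze-and-excitation-style channel attention; the sigmoid gate and the absence of a learned bottleneck are simplifying assumptions, not the paper's exact design.

```python
import math

def channel_attention(feature_map):
    """feature_map: list of channels, each a 2-D list (H x W).

    Global-average-pool each channel, gate the pooled value with a
    sigmoid (illustrative; real modules use learned FC layers), and
    rescale the channel by its gate weight.
    """
    weights = []
    for channel in feature_map:
        pooled = sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        weights.append(1.0 / (1.0 + math.exp(-pooled)))  # sigmoid gate
    return [[[v * w for v in row] for row in ch]
            for ch, w in zip(feature_map, weights)]
```

In a real network the gate is a small learned bottleneck (two FC layers) rather than a bare sigmoid, but the rescaling step is the same.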
Deep-learning-based semantic segmentation is a research focus for unmanned aerial vehicle (UAV) aerial image analysis. However, segmenting small and narrow objects and boundary regions remains difficult because of the large size differences between objects and the class imbalance in aerial images. A network named SEC-BRNet is proposed for the boundary refinement problem. First, semantic embedding connections and a progressive upsampling decoder are used to obtain spatial details for generating fused feature maps, which are then concatenated level by level in the decoding process to recover boundary details. Second, a multiloss training strategy combining cross-entropy loss, Dice loss, and active boundary loss is developed for the data imbalance and boundary roughness problems. In extensive experiments, the network achieves 84.8% mIoU and 89.04% Boundary IoU on the AeroScapes dataset and 62.81% mIoU and 90.78% Boundary IoU on the Semantic Drone Dataset, indicating that SEC-BRNet performs well in semantic segmentation of UAV aerial images.
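A multiloss strategy of this kind is typically a weighted sum of its components. The sketch below, assuming a binary per-pixel setting with flattened predictions, combines cross-entropy and Dice loss; the active boundary loss term is omitted for brevity, and the weights are illustrative, not the paper's values.

```python
import math

def cross_entropy(pred, target, eps=1e-7):
    """Binary cross-entropy over flattened per-pixel probabilities."""
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(pred, target)) / len(pred)

def dice_loss(pred, target, eps=1e-7):
    """1 - Dice coefficient; insensitive to class imbalance."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2 * inter + eps) / (sum(pred) + sum(target) + eps)

def multiloss(pred, target, w_ce=1.0, w_dice=1.0):
    # Active boundary loss omitted in this sketch.
    return w_ce * cross_entropy(pred, target) + w_dice * dice_loss(pred, target)
```

Cross-entropy handles per-pixel classification while Dice counteracts the imbalance between large background regions and small objects, which is why the two are commonly combined.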
In the tracking-by-detection scheme of multiple object tracking (MOT), the data association process, in which existing tracks and new detections are matched over time, is critical. A framework is proposed to solve the data association problem in MOT in crowded environments with potential target interactions and occlusions. The framework consists of an input layer and an association layer. The input layer is an end-to-end feature-map extraction model incorporating a simplified Siamese convolutional neural network, which effectively distinguishes similar objects based on their appearance and motion. The association layer is a bidirectional gated recurrent unit network with three fully connected network (FCN) layers; its outputs are fed into the FCNs and transformed into an association matrix that reflects the matching scores between detections and existing tracks. The matrix is then used to minimize the loss of the framework. Experimental results show that the proposed framework demonstrates outstanding MOT performance, with accuracy and precision reaching 26.1% and 71.2%, respectively.
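Once an association matrix of matching scores is available, detections can be assigned to tracks by decoding it. A minimal sketch follows, assuming a simple greedy one-to-one decoding with a score threshold; the paper learns the matrix with the BiGRU and FCNs, and this decoding rule is only illustrative (Hungarian-style optimal assignment is the common alternative).

```python
def associate(score_matrix, threshold=0.5):
    """Greedy one-to-one assignment from a track-by-detection score matrix.

    score_matrix[i][j] is the matching score between track i and
    detection j; pairs below `threshold` are left unmatched.
    """
    pairs = []
    used_tracks, used_dets = set(), set()
    # Visit candidate pairs from highest to lowest score.
    candidates = sorted(
        ((s, i, j) for i, row in enumerate(score_matrix) for j, s in enumerate(row)),
        reverse=True)
    for s, i, j in candidates:
        if s >= threshold and i not in used_tracks and j not in used_dets:
            pairs.append((i, j))
            used_tracks.add(i)
            used_dets.add(j)
    return pairs
```

Unmatched detections would then seed new tracks, and unmatched tracks are candidates for termination after a few missed frames.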
Low-dose CT is a popular research topic that focuses on reducing radiation damage. Inspired by the aperture coding method in optical imaging, an azimuth coding projection method, which belongs to the category of incomplete projection, is proposed to shorten the exposure time and reduce the number of projection paths. With this coding method, the region of interest (ROI) is sampled intensively, and the ROI projection data are modulated by the coding; the azimuth coding projection for the ROI also reflects the spatial continuity of the ROI. In human CT image sequences, the spatial correlation between a slice and its adjacent slices is strong, and deep learning (DL) excels at medical image feature extraction. A convolutional neural network (CNN) is therefore used to extract the modulated ROI projection information; the CNN incorporates spatial information from adjacent slices based on this strong correlation and nonlinearly maps the obtained feature map to a feature map containing fewer artifacts. After training and testing the CNN, at least one azimuth coding method is adapted to each ROI, and the reconstructed CT images are restored well.
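The idea of azimuth-coded incomplete projection can be sketched as selecting a dense set of view angles facing the ROI and a sparse set elsewhere, so that total exposure drops while the ROI remains intensively sampled. The arc and step values below are purely illustrative assumptions, not the paper's actual coding scheme.

```python
def azimuth_coded_angles(n_views=360, roi_arc=(60, 120), sparse_step=4):
    """Return the subset of view angles (in degrees) kept by a toy
    azimuth coding: every view inside the arc facing the ROI, and only
    every `sparse_step`-th view elsewhere.
    """
    angles = []
    for a in range(n_views):
        if roi_arc[0] <= a < roi_arc[1] or a % sparse_step == 0:
            angles.append(a)
    return angles
```

With these illustrative parameters, fewer than half of the full 360 views are acquired, which is the exposure saving that the CNN-based restoration then has to compensate for outside the ROI.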
This paper presents a segmented X-ray spectrum detection method based on a layered X-ray detector on a cadmium telluride (CdTe) substrate. We describe the three-dimensional structure of the proposed detector pixel and investigate the matched spectrum-resolving method. A polychromatic X-ray beam enters the CdTe substrate edge-on and is absorbed completely at depths that vary with photon energy. Discrete potential wells are formed under an external controlling voltage to collect the photo-electrons generated in different layers, and the segmented X-ray spectrum can be deduced from the quantity of photo-electrons. In this work, we verify the feasibility of the segmented-spectrum detection mechanism by simulating the absorption of monochromatic X-rays in a CdTe substrate. The simulation experiments show that the number of photo-electrons grows exponentially with incident thickness, and photons with different energies are absorbed at different depths. The charges generated in different layers are collected into adjacent potential wells, and the collection efficiency is estimated to be about 87% for different incident intensities under a 40,000 V/cm electric field. Errors caused by charge sharing between neighboring layers are also analyzed and can be made negligible by choosing an appropriate electrode size.
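The depth-dependent absorption underlying this scheme follows the Beer-Lambert law, so the photon count deposited in each layer can be estimated from the linear attenuation coefficient: higher-energy photons (smaller mu) penetrate deeper before being absorbed. A minimal sketch, with illustrative parameter values rather than measured CdTe attenuation data:

```python
import math

def absorbed_fraction(mu, thickness):
    """Beer-Lambert: fraction of photons absorbed within `thickness` (cm)
    for linear attenuation coefficient `mu` (1/cm)."""
    return 1.0 - math.exp(-mu * thickness)

def layer_counts(n_photons, mu, boundaries):
    """Photons absorbed in each layer delimited by depth `boundaries` (cm).

    The count in layer [z0, z1) is N0 * (exp(-mu*z0) - exp(-mu*z1)),
    i.e. the difference of surviving fractions at its two faces.
    """
    counts = []
    for z0, z1 in zip(boundaries, boundaries[1:]):
        counts.append(n_photons * (math.exp(-mu * z0) - math.exp(-mu * z1)))
    return counts
```

Because mu decreases with photon energy, comparing per-layer counts across depth is what lets the layered detector deduce a segmented spectrum from a single edge-on exposure.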
Image blending plays an important role in video mosaicking, which demands both real-time performance and visual quality. This paper proposes a fast blending method based on the Bresenham algorithm, which realizes blending by controlling the storing addresses of source pixels. The starting storing location is computed accurately from the coordinates of the middle pixel of the seam instead of the first pixel's, significantly reducing the accumulated error along the seam. The other storing addresses are acquired using a variable-step Bresenham method, which takes advantage of the burst mode of dynamic memory and achieves a good trade-off between operational convenience and memory requirements. With the proposed method, complicated storing-address calculations are simplified into integer additions and subtractions, which is well suited to hardware implementation. A hardware architecture based on a field-programmable gate array is presented to evaluate the proposed method with clock frequency analysis and resource assessment. The experimental results show that the proposed method achieves high image quality, low computational complexity, and a low memory requirement.
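For reference, the classic integer Bresenham recurrence that the variable-step address generator builds on uses only additions, subtractions, and comparisons, which is what makes it hardware-friendly. The sketch below is the standard line-rasterization form, not the paper's variable-step extension:

```python
def bresenham(x0, y0, x1, y1):
    """Classic integer Bresenham line: walk from (x0, y0) to (x1, y1)
    using only integer add/subtract/compare on an error accumulator."""
    points = []
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx - dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 > -dy:
            err -= dy
            x0 += sx
        if e2 < dx:
            err += dx
            y0 += sy
    return points
```

In the blending method, the generated coordinates play the role of pixel storing addresses, and making the step size variable lets consecutive addresses fall into DRAM bursts.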
In modern image processing, thanks to the development of digital image processing, the focus of a sensor can be set automatically by the digital processing system through computation. On the other hand, synchronous and consistent auto-focusing is one of the most important factors for image mosaicking and fusion, especially for multi-sensor systems whose sensors are arranged in a line to capture wide-angle video. Images sampled by sensors with different focal length values increase the complexity of the affine matrix used in the subsequent mosaicking and fusion, which potentially reduces the efficiency of the system and consumes more power. Here, a new fast evaluation method based on the gray-value variance of image pixels is proposed to find a common focal length value for all sensors that achieves good image sharpness. For multi-frame images sampled from different sensors that have been adjusted and can be regarded as time-synchronized, the gray-value variances of adjacent pixels are computed to generate a curve. This curve is the focus measure function, which describes the relationship between image sharpness and the focal length value of the sensor. On the basis of the focus measure functions of all sensors in the system, this paper uses the least-squares method to fit the discrete curves, builds one objective function for the multi-sensor system, and then finds the optimal solution corresponding to the extremum of image sharpness by evaluating the objective function. This optimal focal length value is the common parameter for all sensors in the system.
By setting this common focal length value, and while image sharpness is ensured, the computation of the affine matrix, the core step of the image mosaicking and fusion that stitches all the pictures into one wide-angle image, is greatly simplified, and the efficiency of the image processing system is significantly improved.
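The two steps above, computing a variance-style focus measure per image and least-squares fitting the discrete (focal length, sharpness) samples to locate the extremum, can be sketched as follows. The specific gradient-variance metric and the parabola model for the sharpness curve are illustrative assumptions; per-sensor samples can be summed into one objective before fitting.

```python
def focus_measure(image):
    """Sum of squared gray-value differences between horizontally adjacent
    pixels; larger means sharper (a common variance-style sharpness metric)."""
    return sum((row[i + 1] - row[i]) ** 2
               for row in image for i in range(len(row) - 1))

def fit_parabola(points):
    """Least-squares fit y = a*x^2 + b*x + c to (focal_length, sharpness)
    samples; the vertex -b/(2a) approximates the optimal focal length."""
    n = len(points)
    sx = sum(x for x, _ in points); sx2 = sum(x * x for x, _ in points)
    sx3 = sum(x ** 3 for x, _ in points); sx4 = sum(x ** 4 for x, _ in points)
    sy = sum(y for _, y in points); sxy = sum(x * y for x, y in points)
    sx2y = sum(x * x * y for x, y in points)

    # Solve the 3x3 normal equations by Cramer's rule.
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
              - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
              + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    A = [[sx4, sx3, sx2], [sx3, sx2, sx], [sx2, sx, n]]
    rhs = [sx2y, sxy, sy]
    d = det(A)
    a = det([[rhs[0], A[0][1], A[0][2]],
             [rhs[1], A[1][1], A[1][2]],
             [rhs[2], A[2][1], A[2][2]]]) / d
    b = det([[A[0][0], rhs[0], A[0][2]],
             [A[1][0], rhs[1], A[1][2]],
             [A[2][0], rhs[2], A[2][2]]]) / d
    return -b / (2 * a)  # focal length at the sharpness extremum
```

The returned vertex is the candidate common focal length; with several sensors, adding their sharpness samples at each focal length gives the single multi-sensor objective described above.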
A real-time enhancement technique using a dynamic area threshold is presented, which accelerates processing and produces a more distinct image. It compensates for flaws in images captured by a CMOS sensor. Based on the statistical results, a dynamic increasing algorithm changes the contrast and brightness via local-area pixel enhancement when a pixel's value is larger than the upper threshold. Conversely, if a pixel's value is below the lower threshold, a dynamic decreasing algorithm darkens those pixels more than the pixels above the upper threshold. The edges of the dynamic area are analyzed and processed appropriately so that they become the usual gradual edges of the dynamic area. The algorithm is implemented on a Xilinx XUPV4 FPGA with an MCU that controls the region of interest, increasing quality efficiently.
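The dual-threshold behavior can be sketched in software terms as follows; the fixed gain and attenuation factors are illustrative assumptions, as the paper derives the adjustments from local-area statistics and implements them in FPGA logic rather than software.

```python
def enhance(pixels, lower, upper, gain=1.2, atten=0.8):
    """Dynamic-threshold enhancement sketch over 8-bit gray values:
    brighten pixels above the upper threshold, darken pixels below the
    lower threshold, and leave the in-between band unchanged."""
    out = []
    for v in pixels:
        if v > upper:
            out.append(min(255, int(v * gain)))   # dynamic increasing
        elif v < lower:
            out.append(max(0, int(v * atten)))    # dynamic decreasing
        else:
            out.append(v)
    return out
```

In hardware, the same branch structure maps to per-pixel comparators and shift-add multipliers, with the MCU updating the thresholds for the region of interest.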