This work focuses on a bimodal vision system, previously demonstrated as a relevant sensing candidate for detecting and tracking fast objects, which combines the unique features of event-based sensors (high temporal resolution, reduced bandwidth needs, low energy consumption, and passive detection capabilities) with the high spatial resolution of an RGB camera. In this study, we propose a model based on the principle of attentional vision for real-time detection and tracking of UAVs, taking into account computing and on-board resource constraints. A laboratory demonstrator has been built to evaluate the operational limits in terms of computation time and system performance (including target detection) versus target speed. Our first indoor and outdoor tests revealed the interest and potential of our system for quickly detecting objects flying at hundreds of kilometers per hour.
This feasibility study explores training a DNN-based military vehicle detector for airborne guidance systems, addressing the challenges of scarce data, numerous similar vehicle classes, and varied real warfare conditions. To this end, a sanitized military vehicle image database is created from multiple 2D and 3D sources, with miniature vehicles acquired under various view angles. Complemented by data augmentation tools, including AI-generated backgrounds, we are able to export controlled, trustworthy, and class-balanced semi-synthetic datasets. As successful training on a limited number of classes has already been demonstrated, this study further explores the relevance of this approach by training multi-class detectors and testing them on real warfare footage. By combining data sources, data augmentation techniques, and generative AI for creating contextual backgrounds, the precision, selectivity, and adaptability of the detectors are evaluated and improved across diverse operational and current situational contexts.
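A minimal sketch of the kind of patch-on-background compositing described above, assuming sanitized vehicle patches are stored as RGBA cut-outs and backgrounds as RGB images; the file names, scale range, and bounding-box format are illustrative, not the study's actual pipeline.

```python
# Hypothetical compositing step: paste a sanitized vehicle patch onto a
# (possibly AI-generated) background and record its bounding box label.
import random
from PIL import Image

def composite(patch_path: str, background_path: str, out_size=(640, 640)):
    bg = Image.open(background_path).convert("RGB").resize(out_size)
    patch = Image.open(patch_path).convert("RGBA")   # alpha = sanitized cut-out mask

    # Random scale and position (illustrative ranges)
    scale = random.uniform(0.1, 0.3)
    w, h = int(patch.width * scale), int(patch.height * scale)
    patch = patch.resize((w, h))
    x = random.randint(0, out_size[0] - w)
    y = random.randint(0, out_size[1] - h)

    bg.paste(patch, (x, y), patch)                   # alpha channel used as paste mask
    bbox = (x, y, x + w, y + h)                      # ground-truth box for the detector
    return bg, bbox

image, bbox = composite("patches/vehicle_az045_el30.png", "backgrounds/generated_0001.png")
image.save("dataset/train_0001.jpg")
```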
This feasibility study investigates the training of DNN-based object detectors for military vehicle detection in intelligent guidance systems for airborne devices. The research addresses the challenges of scarce training images, infrared signatures, and varying flight phases and target distances. To tackle these issues, a database of sanitized military vehicle patches from multiple sources, data augmentation tools, and generative AI (Stable Diffusion XL) are employed to create synthetic training datasets. The objectives include obtaining a robust and performant system based on trustworthy AI, covering vehicle detection, recognition, and identification in both infrared and color images within different contexts. In this study, various object detection models are trained and evaluated for recall, precision, and inference speed according to flight phase and spectral domain, while considering future embedding into airborne devices. The research is still ongoing, with initial results demonstrating the applicability of our approaches to military vehicle detection in aerial imagery.
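The recall/precision/inference-speed evaluation mentioned above could be organized along these lines; the IoU threshold, the `detector` callable, and the greedy matching scheme are assumptions made for illustration, not the study's exact protocol.

```python
# Illustrative detector evaluation: greedy IoU matching plus per-image timing.
import time
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def evaluate(detector, samples, iou_thr=0.5):
    tp = fp = fn = 0
    times = []
    for image, gt_boxes in samples:          # samples: iterable of (image, ground-truth boxes)
        t0 = time.perf_counter()
        pred_boxes = detector(image)         # assumed to return a list of [x1, y1, x2, y2]
        times.append(time.perf_counter() - t0)

        matched = set()
        for p in pred_boxes:
            hits = [i for i, g in enumerate(gt_boxes)
                    if i not in matched and iou(p, g) >= iou_thr]
            if hits:
                matched.add(hits[0]); tp += 1
            else:
                fp += 1
        fn += len(gt_boxes) - len(matched)

    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    return precision, recall, np.mean(times)   # mean inference time per image
```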
KEYWORDS: Education and training, 3D modeling, Visual process modeling, Solid modeling, Object detection, Data modeling, 3D scanning, 3D image processing, 3D acquisition, Optical sensors
Vision-based object detection remains an active research area in both civilian and military domains. While the state of the art relies on deep learning techniques, these demand large multi-context datasets. Given the rarity of open-access datasets for military applications, alternative methods for data collection and training dataset creation are essential. This paper presents a novel vehicle signature acquisition method based on indoor 3D scanning of miniature military vehicles. By using 3D projections of the scanned vehicles, as well as off-the-shelf computer-aided design models, relevant image signatures are generated showing the vehicle from different perspectives. The resulting context-independent signatures are enhanced with data augmentation techniques and used for object detection model training. The trained models are evaluated on aerial test sequences showing real vehicles and situations. Results are compared to state-of-the-art methodologies. Our method is shown to be a suitable indoor solution for training a vehicle detector for real situations.
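One way to turn a scanned 3D model into view-dependent 2D signatures is to rotate its vertices and project them orthographically; the sketch below uses the trimesh library and a plain silhouette rendering, which is an assumption, since the paper's actual projection tooling is not specified.

```python
# Illustrative silhouette projection of a scanned mesh at a given azimuth/elevation.
import numpy as np
import trimesh

def silhouette(mesh_path: str, azimuth_deg: float, elevation_deg: float, res=256):
    mesh = trimesh.load(mesh_path, force="mesh")
    v = np.asarray(mesh.vertices) - np.asarray(mesh.vertices).mean(axis=0)  # center the model

    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    rot_z = np.array([[np.cos(az), -np.sin(az), 0],
                      [np.sin(az),  np.cos(az), 0],
                      [0,           0,          1]])
    rot_x = np.array([[1, 0,           0],
                      [0, np.cos(el), -np.sin(el)],
                      [0, np.sin(el),  np.cos(el)]])
    v = v @ rot_z.T @ rot_x.T

    # Orthographic projection onto the x-y plane, binned into a binary image.
    xy = v[:, :2]
    xy = (xy - xy.min(axis=0)) / (np.ptp(xy, axis=0) + 1e-9) * (res - 1)
    img = np.zeros((res, res), dtype=np.uint8)
    img[xy[:, 1].astype(int), xy[:, 0].astype(int)] = 255
    return img

# Example: signatures every 10 degrees of azimuth at a 30-degree elevation.
views = [silhouette("scans/vehicle.ply", az, 30.0) for az in range(0, 360, 10)]
```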
Deep neural network based military vehicle detectors pose particular challenges due to the scarcity of relevant images and limited access to vehicles in this domain, particularly in the infrared spectrum. To address these issues, a novel drone-based bi-modal vehicle acquisition method is proposed, capturing 72 key images of a vehicle from different view angles in a fast and automated way. By overlaying vehicle patches on relevant background images and utilizing data augmentation techniques, synthetic training images are obtained. This study introduces the use of AI-generated synthetic background images as an alternative to backgrounds extracted from real video footage. Several models were trained and their performance compared in real-world situations. Results demonstrate that the combination of data augmentation, context-specific background samples, and synthetic background images significantly improves model precision while maintaining mean average precision, highlighting the potential of utilizing generative AI (Stable Diffusion) and drones to generate training datasets for object detectors in challenging domains.
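The AI-generated backgrounds could be produced with an off-the-shelf text-to-image pipeline; the sketch below uses the Hugging Face diffusers library with an assumed checkpoint and illustrative prompts, since the paper only states that Stable Diffusion was used.

```python
# Illustrative background generation with Stable Diffusion via diffusers.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",    # assumed checkpoint, not necessarily the paper's choice
    torch_dtype=torch.float16,
).to("cuda")

prompts = [
    "aerial view of a dirt road crossing dry grassland, overcast sky",
    "aerial view of a forest clearing with muddy tracks, winter",
]
for i, prompt in enumerate(prompts):
    image = pipe(prompt, num_inference_steps=30).images[0]
    image.save(f"backgrounds/generated_{i:04d}.png")
```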
Performing object detection and recognition at the imaging sensor level raises many technical and scientific challenges. Today's state-of-the-art detection performance is obtained with deep Convolutional Neural Network (CNN) models. However, reaching the expected CNN behavior in terms of sensitivity and specificity requires mastering the training dataset. We explore in this paper a fast and automated method for acquiring images of vehicles in the infrared and visible ranges using a commercial inspection drone equipped with thermal and visible-range cameras, combined with a dedicated data augmentation method for automated generation of context-specific machine learning datasets. The purpose is to successfully train a CNN to recognize the vehicles in realistic outdoor situations in infrared or visible-range images, while reducing the mandatory access to the vehicles of interest and the need for complex and lengthy outdoor image acquisition. First results demonstrate the feasibility of our approach for training a deep neural network based object detector for vehicle detection and recognition applications in aerial images.
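A data augmentation stage of the kind described could be expressed with the albumentations library; the specific transforms and probabilities below are illustrative assumptions, not the authors' exact recipe.

```python
# Illustrative geometric/photometric augmentation that keeps bounding boxes consistent.
import numpy as np
import albumentations as A

augment = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.5),
        A.GaussNoise(p=0.3),                 # rough stand-in for sensor noise
        A.MotionBlur(blur_limit=5, p=0.3),   # rough stand-in for platform motion
        A.RandomScale(scale_limit=0.2, p=0.5),
    ],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)

# Placeholder inputs: one synthetic training image with a single labeled box.
image = np.zeros((640, 640, 3), dtype=np.uint8)
bboxes = [[100, 120, 220, 200]]              # pascal_voc format: x1, y1, x2, y2
labels = ["vehicle"]

out = augment(image=image, bboxes=bboxes, labels=labels)
augmented_image, augmented_boxes = out["image"], out["bboxes"]
```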
KEYWORDS: Visual process modeling, Image acquisition, 3D modeling, Object detection, Education and training, Data modeling, Detection and tracking algorithms, Defense and security, Visualization, Performance modeling
Mastering the content quality, diversity, and context representativeness of a database is a key step to efficiently train deep learning models. This project aims at controlling the relevant hyper-parameters of training datasets in order to guarantee a given mission performance and to contribute to the explainability of the model behavior. In this presentation, we show an approach to design DRI (Detection, Recognition and Identification) algorithms for military vehicles with different acquisition sources. Starting from the definition of a mission-agnostic image database, this study focuses on controlled image acquisition sources which automate the collection of few but relevant object signatures and their metadata, e.g. bounding box, segmentation mask, view angles, object orientations, lighting conditions, etc. By focusing on the acquisition of a reduced amount of images coupled with data augmentation techniques, we aim to demonstrate a dataset creation method that is fast, efficient, controlled, and easily adaptable to new mission scenarios and contexts. This study compares three different sources: an optical acquisition bench for scaled vehicle models, the 3D scanning of scaled models, and 3D graphic models of vehicles. The challenge is to make predictions on real situations with a neural network model trained only on the generated images. First results obtained with the datasets extracted from the 3D environment, using both graphic models and scanned scaled models, do not yet reach the performance levels previously obtained with the 2D acquisition bench. Further investigations are needed to understand the influence of the numerous hyper-parameters introduced by the 3D environment simulation.
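The per-signature metadata listed above could be stored as a simple structured record; the field names, values, and JSON layout below are illustrative assumptions.

```python
# Illustrative metadata record attached to each acquired object signature.
import json
from dataclasses import dataclass, asdict, field

@dataclass
class SignatureMetadata:
    image_file: str
    source: str                      # "2D bench" | "3D scan" | "3D graphic model"
    class_name: str
    bbox_xyxy: list                  # [x1, y1, x2, y2] in pixels
    mask_file: str                   # path to the segmentation mask
    azimuth_deg: float
    elevation_deg: float
    object_heading_deg: float
    lighting: dict = field(default_factory=dict)   # e.g. {"type": "diffuse"}

record = SignatureMetadata(
    image_file="bench/vehicle_a_az045_el30.png",
    source="2D bench",
    class_name="vehicle_a",
    bbox_xyxy=[12, 20, 230, 150],
    mask_file="bench/vehicle_a_az045_el30_mask.png",
    azimuth_deg=45.0,
    elevation_deg=30.0,
    object_heading_deg=0.0,
    lighting={"type": "diffuse"},
)
with open("metadata/vehicle_a_az045_el30.json", "w") as f:
    json.dump(asdict(record), f, indent=2)
```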
Compared to frame-based visual streams, event-driven visual streams offer very low bandwidth needs and high temporal resolution, making them an interesting choice for embedded object recognition. Such visual systems are expected to outperform standard cameras but have not yet been studied in the context of homing guidance for projectiles, with its drastic navigation constraints. This work starts from a first interaction model between a standard camera and an event camera, validated in the context of unattended ground sensors and situational awareness applications from a static position. In this paper we propose to extend this first interaction model by bringing higher-level activity analysis and object recognition from a moving position. The proposed event-based terminal guidance system is studied first through a target laser designation scenario and through optical flow computation to validate guidance parameters. Real-time embedded processing techniques are evaluated, preparing the design of a future demonstrator of a very fast navigation system. The first results have been obtained using embedded Linux architectures with multi-threaded feature extraction. This paper presents and discusses these first results.
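The optical flow computation used to validate guidance parameters could, in its simplest frame-based form, look like the dense Farnebäck sketch below with OpenCV; this is only an assumed frame-based analogue, not the event-based processing studied in the paper.

```python
# Illustrative dense optical flow between two consecutive grayscale frames.
import cv2
import numpy as np

prev = cv2.imread("frames/f0000.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frames/f0001.png", cv2.IMREAD_GRAYSCALE)

# Arguments: prev, next, flow, pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags
flow = cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)

# Median flow over the frame gives a rough ego-motion estimate; the flow around the
# designated target, minus that ego-motion, would feed the guidance loop.
ego_motion = np.median(flow.reshape(-1, 2), axis=0)
print("approximate ego-motion (px/frame):", ego_motion)
```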
Performing specific object detection and recognition at the imaging sensor level raises many technical and scientific challenges. Today's state-of-the-art detection performance is obtained with deep Convolutional Neural Network (CNN) models. However, reaching the expected CNN behavior in terms of sensitivity and specificity requires mastering the training dataset. We explore in this paper a new way of acquiring images of military vehicles in the sanitized and controlled conditions of the laboratory in order to train a CNN to recognize the same visual signature on real vehicles in realistic outdoor situations. By combining sanitized images, counter-examples, and different data augmentation techniques, our investigations aim at reducing the need for complex outdoor image acquisition. First results demonstrate the feasibility of detecting and classifying, in real situations, military vehicles by exploiting only signatures from miniature models.
A new and challenging vision system has recently gained prominence and proven its capabilities compared to traditional imagers: the paradigm of event-based vision. Instead of capturing the whole sensor area at a fixed frame rate as in a frame-based camera, spike sensors or event cameras report the location and the sign of brightness changes in the image. Although the currently available spatial resolutions are quite low (640x480 pixels) for these event cameras, the real interest lies in their very high temporal resolution (in the range of microseconds) and very high dynamic range (up to 140 dB). Thanks to the event-driven approach, their power consumption and processing power requirements are quite low compared to conventional cameras. This latter characteristic is of particular interest for embedded applications, especially for situational awareness. The main goal of this project is to detect and track activity zones from the spike event stream, and to notify the standard imager where the activity takes place. By doing so, automated situational awareness is enabled by analyzing the sparse information of event-based vision and waking up the standard camera at the right moments and at the right positions, i.e. the detected regions of interest. We demonstrate the capacity of this bimodal vision approach to take advantage of both cameras: spatial resolution for the standard camera and temporal resolution for the event-based camera. An opto-mechanical demonstrator has been designed to integrate both cameras in a compact visual system with embedded software processing, enabling the perspective of autonomous remote sensing. Several field experiments demonstrate the performance and the interest of such an autonomous vision system. The emphasis is placed on the ability to detect and track fast moving objects, such as fast drones. Results and performances are evaluated and discussed on these realistic scenarios.
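The detect-and-notify mechanism can be pictured as accumulating events over a short time window into an activity map and thresholding it into a region of interest handed to the frame camera; the array-based event representation, window length, and thresholds below are illustrative assumptions, not the demonstrator's actual processing chain.

```python
# Illustrative activity-zone detection from an event stream.
import numpy as np

def activity_roi(events, sensor_shape=(480, 640), window_us=10_000, min_events=50):
    """events: structured array with integer fields x, y, t (microseconds), p (polarity)."""
    t_end = events["t"].max()
    recent = events[events["t"] >= t_end - window_us]    # last 10 ms of events

    # Accumulate event counts per pixel (polarity ignored in this sketch).
    activity = np.zeros(sensor_shape, dtype=np.int32)
    np.add.at(activity, (recent["y"], recent["x"]), 1)

    ys, xs = np.nonzero(activity > 1)                    # pixels with repeated activity
    if len(xs) < min_events:
        return None                                      # nothing to wake the RGB camera for
    # Bounding box of the active zone, in event-camera coordinates; a calibrated
    # mapping would convert it to the high-resolution RGB frame.
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```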
In the context of decreasing human engagement in conflicts and ever wider areas of responsibility, vision-based Unattended Ground Sensor (UGS) deployment is mandatory for wide and long-range situational awareness while relieving and protecting human resources in safe areas. High-resolution single-sensor concepts imply processing a huge amount of pixels either off-board, via a high-bandwidth communication link to a central processing infrastructure, or on-board with a powerful embedded processor. Nevertheless, these approaches decrease EM stealth and require substantial electrical power, affecting the sensor autonomy. To allow on-board real-time processing while optimizing Size, Weight and Power (SWaP) requirements, this paper presents a bi-focal vision UGS concept. It couples 4 low-resolution CMOS sensors with small, light, high-aperture optics, providing the equivalent of a 17-megapixel single sensor with a 130° FOV.
KEYWORDS: Visual process modeling, Sensors, Systems modeling, RGB color model, Computing systems, Retina, Signal processing, Embedded systems, Cameras, Biomimetics, Detection and tracking algorithms, Image fusion, Image processing, Video processing, Video surveillance, Feature extraction
Unmanned systems used for threat detection and identification are still not efficient enough to monitor the battlefield autonomously. Limitations on size and energy make those systems unable to use most state-of-the-art computer vision algorithms for recognition. The bio-inspired approach based on human peripheral and foveal vision has been reported as a way to combine recognition performance and computational efficiency. While a low-resolution camera observes a large zone and detects significant changes, a second camera focuses on each event and provides a high-resolution image of it. While existing biomimetic approaches usually separate the two vision modes according to their functionality (e.g. detection, recognition) and to their basic primitives (i.e. features, algorithms), our approach uses common structures and features for both peripheral and foveal cameras, thereby decreasing the computational load with respect to previous approaches.
The proposed approach is demonstrated using simulated data. The outcome proves particularly attractive for real-time embedded systems, as the primitives (features and classifier) have already demonstrated good performance in low-power embedded systems. This first result reveals the high potential of the dual-view fusion technique in the context of long-duration unmanned video surveillance systems. It also encourages us to go further in mimicking the mechanisms of the human eye. In particular, it is expected that adding a retro-action of the fovea towards the peripheral vision will further enhance the quality and efficiency of the detection process.
Improving the surveillance capacity over wide zones requires a set of smart battery-powered Unattended Ground Sensors capable of issuing an alarm to a decision-making center. Only high-level information has to be sent when a relevant suspicious situation occurs. In this paper we propose an innovative bio-inspired approach that mimics the human bi-modal vision mechanism and the parallel processing ability of the human brain. The designed prototype exploits two levels of analysis: a low-level panoramic motion analysis, the peripheral vision, and a high-level event-focused analysis, the foveal vision. By tracking moving objects and fusing multiple criteria (size, speed, trajectory, etc.), the peripheral vision module acts as a fast detector of relevant events. The foveal vision module focuses on the detected events to extract more detailed features (texture, color, shape, etc.) in order to improve the recognition efficiency. The implemented recognition core is able to acquire human knowledge and to classify a huge amount of heterogeneous data in real time thanks to its natively parallel hardware structure. This UGS prototype validates our system approach under laboratory tests. The peripheral analysis module demonstrates a low false alarm rate, whereas the foveal vision correctly focuses on the detected events. A parallel FPGA implementation of the recognition core succeeds in fulfilling the embedded application requirements. These results pave the way for future reconfigurable virtual field agents. By locally processing the data and sending only high-level information, their energy requirements and electromagnetic signature are optimized. Moreover, the embedded artificial intelligence core enables these bio-inspired systems to recognize and learn new significant events. By duplicating human expertise in potentially hazardous places, our miniature visual event detector will allow early warning and contribute to better human decision making.
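A simple way to picture the multi-criteria fusion performed by the peripheral module is a weighted relevance score per tracked object; the weights, normalizations, and threshold below are purely illustrative, as the prototype's actual decision logic is not described at this level of detail.

```python
# Illustrative fusion of track criteria into a single relevance score.
def relevance(track, weights=(0.4, 0.4, 0.2), threshold=0.6):
    # track: dict with pixel size, speed (px/s), and trajectory straightness in [0, 1]
    size_score = min(track["size_px"] / 500.0, 1.0)       # larger objects score higher
    speed_score = min(track["speed_px_s"] / 200.0, 1.0)   # faster objects score higher
    straightness = track["straightness"]                   # purposeful motion scores higher

    score = (weights[0] * size_score +
             weights[1] * speed_score +
             weights[2] * straightness)
    return score, score >= threshold                       # True -> hand over to foveal module

score, alert = relevance({"size_px": 320, "speed_px_s": 90, "straightness": 0.8})
```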
Pedestrian movement along critical infrastructures such as pipelines, railways, or highways, as well as pedestrian behavior in urban environments, is of major interest in surveillance applications. The goal is to anticipate illicit or dangerous human activities. For this purpose, we propose an all-in-one small autonomous system which delivers high-level statistics and reports alerts in specific cases. This situational awareness project leads us to analyze the scene efficiently by performing movement analysis. A dynamic background extraction algorithm is developed to reach the required degree of robustness against natural and urban environment perturbations and to match the embedded implementation constraints. When changes are detected in the scene, specific patterns are applied to detect and highlight relevant movements. Depending on the application, specific descriptors can be extracted and fused in order to reach a high level of interpretation. In this paper, our approach is applied to two operational use cases: pedestrian urban statistics and railway surveillance. In the first case, a grid of prototypes is deployed over a city centre to collect pedestrian movement statistics up to a macroscopic level of analysis. The results demonstrate the relevance of the delivered information; in particular, the flow density map highlights pedestrian preferential paths along the streets. In the second case, one prototype is set next to high-speed train tracks to secure the area. The results exhibit a low false alarm rate and support our approach of a large sensor network for delivering a precise operational picture without overwhelming a supervisor.
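The paper develops its own dynamic background extraction algorithm; as a stand-in illustration of the general change-detection step it implements, here is a minimal sketch using OpenCV's built-in MOG2 background subtractor on a video stream, with assumed file names and thresholds.

```python
# Stand-in change detection with an adaptive background model (not the paper's algorithm).
import cv2

cap = cv2.VideoCapture("street.mp4")
bg = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = bg.apply(frame)                                  # foreground mask, updated online
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,
                            cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    movers = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 100]
    # 'movers' would then be described (size, speed, trajectory) for higher-level analysis.
cap.release()
```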
The purpose of this document is to present a comparative study of five heart sound localization algorithms, one of which is a method based on radial basis function networks applied in a novel approach. The advantages and disadvantages of each method are evaluated on a database of 50 subjects comprising 25 healthy subjects, selected from the University Hospital of Strasbourg (HUS) and from the MARS500 project (Moscow), and 25 subjects with cardiac pathologies selected from the HUS. This study is carried out under the supervision of an experienced cardiologist. The performance of each method is evaluated by computing the area under the receiver operating characteristic curve (AUC), and its robustness is assessed against different levels of additive white Gaussian noise.
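The AUC-based evaluation under additive white Gaussian noise could be organized as below; the detection-score interface and SNR levels are assumptions for illustration, not the study's exact protocol.

```python
# Illustrative robustness sweep: AUC of a localization score versus SNR.
import numpy as np
from sklearn.metrics import roc_auc_score

def add_awgn(signal, snr_db):
    power = np.mean(signal ** 2)
    noise_power = power / (10 ** (snr_db / 10))
    return signal + np.random.normal(0.0, np.sqrt(noise_power), size=signal.shape)

def auc_vs_snr(method, pcg_signal, labels, snr_levels=(20, 15, 10, 5, 0)):
    """method(signal) -> per-sample score of 'heart sound present'; labels: 0/1 per sample."""
    results = {}
    for snr in snr_levels:
        noisy = add_awgn(pcg_signal, snr)
        scores = method(noisy)
        results[snr] = roc_auc_score(labels, scores)
    return results
```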
Today's Optronic Countermeasure (OCM) concerns involve an IR Focal-Plane Array (FPA) facing an in-band laser irradiation. In order to evaluate the efficiency of new countermeasure concepts or the robustness of FPAs, it is necessary to quantify the whole interaction effects. Even though some studies in the open literature show the vulnerability of imaging systems to laser dazzling, the diversity of analysis criteria employed does not allow the results of these studies to be correlated. Therefore, we focus our effort on the definition of common sensor figures of merit adapted to laser OCM studies. In this paper, two investigation levels are presented: the first for analyzing the local nonlinear photocell response, and the second for quantifying the whole dazzling impact on the image. The first study gives interesting results on InSb photocell behavior when irradiated by a picosecond MWIR laser. With increasing irradiance, four successive response regimes appear: linear, logarithmic, decreasing, and finally a permanent linear offset. In the second study, our quantification tools are described and their successful application is assessed through the picosecond laser-dazzling characterization of an InSb FPA.
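The image-level dazzle quantification referred to above can be illustrated with two simple figures of merit, the fraction of affected pixels and a global contrast loss; the reference-frame comparison and thresholds below are illustrative only, as the paper defines its own metrics.

```python
# Illustrative dazzle figures of merit computed against a non-irradiated reference frame.
import numpy as np

def dazzle_metrics(reference, dazzled, deviation_thr=0.1):
    ref = reference.astype(np.float64)
    daz = dazzled.astype(np.float64)
    full_scale = ref.max() if ref.max() > 0 else 1.0

    # Fraction of pixels whose value deviates by more than 10% of full scale.
    affected = np.abs(daz - ref) > deviation_thr * full_scale
    affected_fraction = affected.mean()

    # Michelson contrast before/during irradiation as a global contrast figure.
    contrast = lambda img: (img.max() - img.min()) / (img.max() + img.min() + 1e-9)
    contrast_loss = 1.0 - contrast(daz) / (contrast(ref) + 1e-9)
    return affected_fraction, contrast_loss
```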
A measurement of the photoelectric parameter (contrast, number of affected pixels) degradation of visible Focal-Plane Arrays (FPAs) irradiated by a laser has been performed. The applied irradiation fluence levels typically range from 300 μJ/cm² to 700 mJ/cm². A silicon FPA has been used for the visible domain. The effects of a laser irradiation in the Field Of View (FOV) and out of the FOV of the camera have been studied. It has been shown that the camera contrast decrease can reach 50% during laser irradiation performed out of the FOV. Moreover, the effects of the Automatic Gain Control (AGC) and of the integration time on the blooming processes have been investigated. No AGC influence on the number of affected pixels was measured, and the integration time was revealed to be the most sensitive parameter in the blooming process. Finally, only little laser energy is necessary to dazzle the system (1 μJ for 152 ns). A simulation of the irradiated images has been developed using a finite-difference solution. Good agreement has been shown between the experimental and simulated images. This procedure can be extended to test the blooming effects of IR cameras.
3-D optical fluorescence microscopy has become an efficient tool for volumetric investigation of living biological samples. The 3-D data can be acquired by optical sectioning microscopy, which is performed by axial stepping of the object relative to the objective. For any instrument, each recorded image can be described by a convolution equation between the original object and the Point Spread Function (PSF) of the acquisition system. To assess performance and ensure data reproducibility, as for any 3-D quantitative analysis, system identification is mandatory. The PSF describes the properties of the image acquisition system; it can be computed or acquired experimentally. Statistical tools and Zernike moments are shown to be appropriate and complementary for describing a 3-D system PSF and for quantifying the variation of the PSF as a function of the optical parameters. Some critical experimental parameters can be identified with these tools. This helps biologists define an acquisition protocol that optimizes the use of the system. Reduction of out-of-focus light is the main task of 3-D microscopy; it is carried out computationally by a deconvolution process. Pre-filtering the images improves the stability of the deconvolution results, making them less dependent on the regularization parameter; this helps biologists use the restoration process.
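The convolution image-formation model and the deconvolution step can be sketched as follows; the synthetic Gaussian PSF and the Richardson-Lucy routine from scikit-image are stand-ins chosen for illustration, not the regularized restoration method used in the study.

```python
# Illustrative image-formation model (object * PSF) and its deconvolution.
import numpy as np
from scipy.signal import fftconvolve
from skimage.restoration import richardson_lucy

rng = np.random.default_rng(0)

# Synthetic 3-D object and a Gaussian stand-in PSF (elongated along the optical axis).
obj = np.zeros((32, 64, 64))
obj[16, 28:36, 28:36] = 1.0
zz, yy, xx = np.mgrid[-4:5, -4:5, -4:5]
psf = np.exp(-(xx**2 + yy**2) / (2 * 1.5**2) - zz**2 / (2 * 3.0**2))
psf /= psf.sum()

# Acquisition: convolution with the PSF plus noise, then iterative deconvolution.
blurred = fftconvolve(obj, psf, mode="same") + 0.01 * rng.standard_normal(obj.shape)
restored = richardson_lucy(np.clip(blurred, 0, None), psf, 30)
```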