Computer vision (CV) algorithms have improved tremendously with the application of neural network-based approaches. For instance, Convolutional Neural Networks (CNNs) achieve state-of-the-art performance on infrared (IR) detection and identification (e.g., classification) problems. Training such algorithms, however, requires a tremendous quantity of labeled data, which is less available in the IR domain than for “natural imagery,” and scarcer still for CV-related tasks. Recent work has demonstrated that synthetic data generation techniques provide a cheap and attractive alternative to collecting real data, despite the “realism gap” that exists between synthetic and real IR data.
In this work, we train deep models on a combination of real and synthetic IR data and evaluate model performance on real IR data. We focus on the tasks of vehicle and person detection, object identification, and vehicle parts segmentation. We find that for both detection and object identification, training on a combination of real and synthetic data outperforms training on real data alone. This improvement demonstrates a concrete advantage to using synthetic data for computer vision, and we expect the utility of synthetic data, when combined with real data, to grow as the realism gap closes.
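To make the training recipe concrete, the following is a minimal sketch, assuming a PyTorch workflow with IR chips organized into class-named folders; the directory layout, the `IRFolder` class, and the 1:1 pooling are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' code): pool real and synthetic IR data
# for training; validate on real IR only. Paths and layout are assumptions.
from pathlib import Path

from torch.utils.data import ConcatDataset, DataLoader, Dataset
from torchvision.io import read_image


class IRFolder(Dataset):
    """Loads IR images whose class label is the parent folder name."""

    def __init__(self, root):
        self.paths = sorted(Path(root).rglob("*.png"))
        classes = sorted({p.parent.name for p in self.paths})
        self.class_to_idx = {c: i for i, c in enumerate(classes)}

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        p = self.paths[i]
        img = read_image(str(p)).float() / 255.0  # CxHxW tensor in [0, 1]
        return img, self.class_to_idx[p.parent.name]


# Training pool mixes real and synthetic; validation stays real-only.
train_set = ConcatDataset([IRFolder("data/real/train"),
                           IRFolder("data/synth/train")])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(IRFolder("data/real/val"), batch_size=32)
```

Keeping the validation loader real-only mirrors the evaluation protocol described above, so any benefit from the synthetic pool is measured against real IR data.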
Achieving state-of-the-art classification and detection performance with modern deep learning approaches requires large amounts of labeled data. In the infrared (IR) domain, the required quantity of data can be prohibitively expensive and time-consuming to acquire, which makes synthetic data generation an attractive alternative. The well-known Unreal Engine (UE), extended with multispectral simulation add-on packages, offers a degree of physical realism and thus a possible avenue for generating such data. However, significant technical challenges remain in designing a synthetic IR dataset: varying class, position, object size, and many other factors is critical to producing a training dataset useful for object detection and classification. In this work we explore these critical axes of variation using standard CNN architectures, evaluate a large UE training set against a real IR validation set, and provide guidelines for variation along many of these dimensions for multiple machine learning problems.
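As a hedged illustration of the axes of variation discussed above, the sketch below samples per-frame scene parameters (class, range, aspect angle, frame position, time of day) that a UE-based generator could consume; all parameter names and ranges here are assumptions, and the actual rendering happens inside Unreal Engine.

```python
# Illustrative sketch of a variation schedule for synthetic IR frames.
# Parameter names and ranges are assumptions, not the paper's settings.
import random

CLASSES = ["person", "pickup", "truck", "SUV"]

def sample_scene_params(rng: random.Random) -> dict:
    """Draw one set of scene parameters for a synthetic IR frame."""
    return {
        "target_class": rng.choice(CLASSES),
        "range_m": rng.uniform(500, 3000),     # controls apparent size in pixels
        "azimuth_deg": rng.uniform(0, 360),    # target aspect angle
        "offset_px": (rng.uniform(-0.4, 0.4),  # normalized position in frame
                      rng.uniform(-0.3, 0.3)),
        "time_of_day_h": rng.uniform(0, 24),   # drives thermal contrast
    }

rng = random.Random(0)
scene_list = [sample_scene_params(rng) for _ in range(10_000)]
```

Sampling each factor independently, as above, is one simple way to ensure the training set covers the joint space of class, size, and position rather than clustering around a few canonical views.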
The Common Data Format (CDF) was established to reconcile issues arising from the many data formats and storage schemes in use throughout the US Army C5ISR Research and Technology Integration (RTI) Directorate. This paper describes the CDF and its use in streamlining data sharing for Aided Target Recognition (AiTR) consumption. The CDF is based on the well-established Hierarchical Data Format 5 (HDF5): a single CDF file can contain collection imagery, the corresponding frame-synchronous and asynchronous metadata, and the related labeling information, a structure specifically designed to simplify data sharing.
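For illustration only, a CDF-like HDF5 file might be laid out as below using h5py; the group and dataset names are assumptions made for this sketch, not the published CDF schema.

```python
# Hedged sketch of a CDF-like HDF5 layout: imagery, frame-synchronous
# metadata, asynchronous metadata, and labels in one file. All names
# below are illustrative assumptions, not the actual CDF specification.
import h5py
import numpy as np

frames = np.zeros((100, 512, 640), dtype=np.uint16)  # placeholder IR video

with h5py.File("collection.cdf.h5", "w") as f:
    f.create_dataset("imagery/frames", data=frames, compression="gzip")
    # Frame-synchronous metadata: one entry per frame.
    f.create_dataset("metadata/sync/timestamp_s", data=np.linspace(0.0, 3.3, 100))
    # Asynchronous metadata: recorded on its own clock.
    f.create_dataset("metadata/async/gps_lat", data=np.array([38.79, 38.80]))
    # Labels: per-frame bounding boxes (frame, x, y, w, h, class_id).
    f.create_dataset("labels/boxes", data=np.zeros((0, 6), dtype=np.int32))
    f.attrs["sensor"] = "MWIR"  # example of a file-level attribute
```

Packing imagery, metadata, and labels into one self-describing file is what lets a single CDF artifact be handed off for AiTR consumption without side files.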
Large amounts of labeled imagery are needed to sufficiently train Deep Neural Network (DNN) based classification algorithms, and in many cases collecting an adequate training dataset requires excessive amounts of time and money. The limited-data problem is exacerbated when military-relevant imagery requirements are imposed: such imagery must often be collected in the infrared (IR) band and must depict military-relevant targets, adding difficulty due to the scarcity of sensors, targets, and personnel able to capture the data. To mitigate these problems, this study evaluates the effectiveness of synthetic data, supplemented with small amounts of real data, for training DNN-based classifier algorithms. It analyzes the efficacy of the YOLOv3 algorithm at detecting common household objects after training on synthetic data created through an image chipping and insertion method: image chips are created by extracting objects from a green-screen background and are then used to generate synthetic training examples by pasting them onto a variety of new backgrounds. The impact of background variety and of adding small amounts of real data on trained algorithm performance is analyzed.
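A minimal sketch of the chip-and-insert pipeline, assuming OpenCV and an HSV chroma-key threshold (both illustrative choices, not the study's exact processing), is shown below: an object is keyed out of a green-screen image and composited onto a new background at a random location, which also yields a bounding-box label for free.

```python
# Sketch of green-screen chip extraction and insertion. Thresholds and
# the HSV green range are assumptions; the chip is assumed smaller than
# the background.
import random

import cv2
import numpy as np

def extract_chip(green_img_bgr: np.ndarray):
    """Return (chip, mask) by thresholding out the green screen in HSV."""
    hsv = cv2.cvtColor(green_img_bgr, cv2.COLOR_BGR2HSV)
    green = cv2.inRange(hsv, np.array([40, 60, 60]), np.array([85, 255, 255]))
    mask = cv2.bitwise_not(green)              # object pixels are nonzero
    x, y, w, h = cv2.boundingRect(mask)        # crop to the object extent
    return green_img_bgr[y:y+h, x:x+w], mask[y:y+h, x:x+w]

def insert_chip(background, chip, mask, rng: random.Random):
    """Paste the chip at a random position; returns image and its box label."""
    bh, bw = background.shape[:2]
    ch, cw = chip.shape[:2]
    x = rng.randrange(0, bw - cw)
    y = rng.randrange(0, bh - ch)
    roi = background[y:y+ch, x:x+cw]
    roi[mask > 0] = chip[mask > 0]             # composite via the binary mask
    return background, (x, y, cw, ch)          # box doubles as the label
```

Because the paste location is known, each synthetic example comes with an exact bounding-box annotation, which is the main labeling-cost advantage of this method.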
In this paper we analyze the effect of vibration on the object-classification performance of the YOLO (You Only Look Once) algorithm. The algorithm is first trained on static imagery of common objects. Image processing techniques are then used to create video clips with varying levels of vibration-induced blur, and these videos are used to evaluate the algorithm's classification performance. Classification results are presented and summarized in relation to the amount of vibration added to the imagery.
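One common way to approximate vibration-induced blur, offered here as a sketch rather than the authors' exact processing, is to convolve each frame with a linear motion-blur kernel whose length sets the severity:

```python
# Sketch: simulate vibration blur by convolving frames with a linear
# motion-blur kernel. Kernel lengths and the placeholder frame are
# illustrative assumptions.
import cv2
import numpy as np

def motion_blur(img: np.ndarray, length: int, angle_deg: float) -> np.ndarray:
    """Apply a linear motion-blur kernel of the given length and angle."""
    kernel = np.zeros((length, length), dtype=np.float32)
    kernel[length // 2, :] = 1.0                       # horizontal line of ones
    center = ((length - 1) / 2.0, (length - 1) / 2.0)
    rot = cv2.getRotationMatrix2D(center, angle_deg, 1.0)
    kernel = cv2.warpAffine(kernel, rot, (length, length))
    kernel /= kernel.sum()                             # preserve brightness
    return cv2.filter2D(img, -1, kernel)

frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)  # placeholder
levels = [3, 7, 15, 25]                                # increasing severity
blurred = [motion_blur(frame, L, angle_deg=90.0) for L in levels]
```

Sweeping the kernel length, as in the `levels` list above, produces the graded blur severities needed to chart classifier degradation against vibration amplitude.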