Advances in Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance on image classification and object detection tasks over the past decade. While significant progress has been made in the development of more efficient networks, their computational and memory requirements still exceed practical limits in many applications. Additionally, the pose variability in such applications requires even larger training datasets for a network to generalize to all possible scenarios. The goal of this work is to develop an architecture that fuses multiple views of a single target to provide robust classification with a lightweight backbone network shared across all agents. Motivated by ensemble learning, we demonstrate that multiple weak learners with computationally efficient networks can be combined to enhance classification accuracy. Three methods of fusion are considered: decision fusion, feature fusion, and multi-scale feature fusion. A novel network architecture is developed and implemented for each approach, then trained and evaluated on synthetic data. For the feature fusion models, a custom training scheme is developed to minimize classification error while maintaining a common feature-extraction backbone across agents. This conforms to a distributed classification use case in which each agent has no prior knowledge of its position relative to the target. Finally, we discuss the shared-data requirements of each approach in the context of applications with limited communication bandwidth.
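To make the distinction between the first two fusion strategies concrete, the following is a minimal sketch assuming a PyTorch implementation. All names (`SharedBackbone`, `FeatureFusionHead`, `decision_fusion`), layer sizes, and the choice of mean-pooled logits are illustrative assumptions, not the architecture developed in this work.

```python
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    """Lightweight feature extractor; every agent runs an identical copy,
    matching the common-backbone constraint described above. (Hypothetical
    layer sizes for illustration only.)"""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, feat_dim)

    def forward(self, x):
        # x: (batch, 3, H, W) -> (batch, feat_dim)
        return self.proj(self.conv(x).flatten(1))

def decision_fusion(logits_per_view):
    """Each agent classifies independently and shares only its class
    scores; fusion here is a simple average over views."""
    # logits_per_view: list of (batch, num_classes) tensors, one per agent
    return torch.stack(logits_per_view).mean(dim=0)

class FeatureFusionHead(nn.Module):
    """Feature fusion: per-view feature vectors are concatenated before a
    joint classifier, so raw features (not decisions) must be shared."""
    def __init__(self, feat_dim=128, num_views=3, num_classes=10):
        super().__init__()
        self.classifier = nn.Linear(feat_dim * num_views, num_classes)

    def forward(self, feats_per_view):
        # feats_per_view: list of (batch, feat_dim) tensors, one per agent
        return self.classifier(torch.cat(feats_per_view, dim=1))
```

Even at this level of abstraction, the bandwidth trade-off discussed in the final sentence is visible: decision fusion transmits only one logit vector per agent, while feature fusion transmits a full feature vector per agent in exchange for a richer joint representation.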