Recent approaches have achieved excellent results on few-shot object detection. However, most detectors are easily confused by visually similar classes, leading to misclassification of objects of interest. In this work, we introduce an anti-confusion grouping mechanism for this problem. Our model refines the results of the main multi-class classifier of the few-shot object detector with an anti-confusion module. Instead of maximizing the distance between the feature distributions of similar classes in feature space, our approach uses an additional auxiliary grouping module to distinguish similar classes in the same feature space as in the base-training phase. Concretely, class groups are obtained according to visual similarity between classes and are then used to train the auxiliary module. The main classifier, the regressor, and the auxiliary anti-confusion module are trained end-to-end with a multi-task loss. At test time, the auxiliary module is combined with the main classifier to produce the final classification result. Through extensive experiments, we demonstrate that our model outperforms well-established baselines for few-shot object detection. We also present an analysis of various aspects of our model, aiming to provide some inspiration for future few-shot detection work.
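The test-time combination described above can be sketched as follows. This is only an illustrative assumption of how a main classifier and an auxiliary grouping head might be fused; the group assignments, score shapes, and the multiplicative fusion rule are hypothetical, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_scores(main_logits, group_logits, class_to_group):
    """Refine per-class scores with per-group scores (illustrative sketch).

    main_logits:    (num_classes,) logits from the main multi-class head
    group_logits:   (num_groups,)  logits from the auxiliary grouping head
    class_to_group: list mapping each class index to its similarity group
    """
    p_class = softmax(main_logits)
    p_group = softmax(group_logits)
    # Re-weight each class score by the score of its visual-similarity group,
    # so confusable classes from a low-scoring group are suppressed.
    refined = p_class * p_group[np.asarray(class_to_group)]
    return refined / refined.sum()

# Toy example: 4 classes in 2 visual-similarity groups; classes 0 and 1
# are nearly tied for the main head, and the group head breaks the tie.
scores = fuse_scores(
    main_logits=np.array([2.0, 1.9, 0.1, 0.0]),
    group_logits=np.array([0.5, 2.5]),
    class_to_group=[0, 1, 1, 0],
)
```

Here the main head barely prefers class 0, but the auxiliary head's strong score for group 1 flips the final prediction to class 1.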
CAVE is a large virtual reality display system, and this paper focuses on a CAVE-based panoramic video playing method that plays panoramic video in any stream format in the CAVE system with a free viewpoint. Our framework includes processing the video data captured by a VR video capture device, establishing the CAVE model, switching viewpoints, and supporting viewpoint interaction with the video, finally realizing a smooth and efficient panoramic video playing architecture.
Most existing local-feature-based facial expression recognition systems concentrate on salient regions of the face, while the effectiveness of the selected regions and the computational complexity of the system still need improvement. To overcome the limits of previous work, we propose a novel algorithm, kernel ReliefF, to select discriminative patches on the face. The novel approach not only considers the feature as a whole but also enhances the locality of the variation of the expressive face. Furthermore, it incurs lower computational complexity. Experimental results on CK+ and RML demonstrate that the method significantly outperforms the state of the art.
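For readers unfamiliar with the ReliefF family, a minimal sketch of plain ReliefF weighting is shown below; patch feature vectors are scored by how well they separate same-class nearest neighbors (hits) from other-class nearest neighbors (misses). This is an assumption-laden simplification: the paper's kernel variant is not specified here, and a kernel version would replace the Euclidean distance with a kernel-induced one.

```python
import numpy as np

def relieff_weights(X, y, k=2):
    """Minimal plain-ReliefF sketch: score each feature (e.g. facial patch)
    by its ability to separate near-hits from near-misses."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    w = np.zeros(d)
    for i in range(n):
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf                       # exclude the sample itself
        same = np.flatnonzero(y == y[i])
        other = np.flatnonzero(y != y[i])
        hits = same[np.argsort(dist[same])[:k]]
        misses = other[np.argsort(dist[other])[:k]]
        # Differences to near-hits lower a feature's weight;
        # differences to near-misses raise it.
        w -= np.abs(X[hits] - X[i]).mean(axis=0) / n
        w += np.abs(X[misses] - X[i]).mean(axis=0) / n
    return w

# Toy data: feature 0 separates the two classes, feature 1 is constant.
X = [[0.0, 5.0], [0.1, 5.0], [0.2, 5.0],
     [1.0, 5.0], [1.1, 5.0], [1.2, 5.0]]
y = [0, 0, 0, 1, 1, 1]
w = relieff_weights(X, y, k=2)
```

The discriminative feature receives a clearly positive weight, while the constant one scores zero, which is the basis for keeping only the top-weighted patches.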
Omnidirectional videos are widely used in virtual reality applications. Omnidirectional videos are spherical in origin and need to be projected onto a 2D plane before coding and transmission. Common projection methods suffer from overstretching in the polar areas, which leads to an enormous decrease in omnidirectional video quality. In this paper, we propose a novel representation based on pseudo-cylindrical projection. The representation is then reshaped and rearranged in consideration of several constraints, including saving pixel area and increasing viewing quality. The generation of our representation is formulated as a multi-dimensional optimization problem. Our results across the test video sequences show significant coding gains over standard representations.
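To illustrate the projection family the abstract builds on, here is one classic pseudo-cylindrical mapping, the sinusoidal projection. This is background only: the paper's actual representation is a reshaped and rearranged variant solved by optimization, not this plain mapping.

```python
import math

def sinusoidal_project(lon, lat):
    """Map spherical coordinates (radians) to the 2D plane.

    Row width shrinks with cos(lat), so polar rows hold far fewer
    samples than in an equirectangular projection, which avoids the
    polar overstretching described above.
    """
    return lon * math.cos(lat), lat

# At the equator the row is full width; near the pole it collapses.
x_eq, y_eq = sinusoidal_project(math.pi / 2, 0.0)
x_pole, _ = sinusoidal_project(math.pi / 2, math.pi / 2)
```

Because each latitude row keeps an area roughly proportional to its true area on the sphere, fewer redundant pixels are spent near the poles, which is exactly the waste that equirectangular-style projections incur.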
This paper proposes a no-reference objective stereoscopic video quality assessment method, with the motivation of making the results of objective experiments close to those of subjective evaluation. We believe that image regions with different degrees of visual saliency should not have the same weights when designing an assessment metric. Therefore, we first apply the GBVS algorithm to each frame pair and separate both the left and right view images into regions with strong, general, and weak saliency. Besides, local feature information such as blockiness, zero-crossing, and depth is extracted and combined within a mathematical model to calculate a quality assessment score. Regions with different degrees of saliency are assigned different weights in the mathematical model. Experimental results demonstrate the superiority of our method compared with existing state-of-the-art no-reference objective stereoscopic video quality assessment methods.
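The saliency-dependent weighting described above can be sketched as a pooling step. The thresholds, weights, and per-region scores below are illustrative assumptions, not the paper's fitted mathematical model; the saliency values would come from an algorithm such as GBVS.

```python
import numpy as np

def weighted_quality(region_scores, saliency,
                     thresholds=(0.66, 0.33),
                     weights=(0.6, 0.3, 0.1)):
    """Pool per-region quality scores with saliency-dependent weights.

    region_scores: local quality score per region (e.g. from blockiness,
                   zero-crossing, and depth features)
    saliency:      saliency value per region in [0, 1]
    thresholds:    cut-offs splitting strong / general / weak saliency
    weights:       pooling weight for each saliency tier (assumed values)
    """
    region_scores = np.asarray(region_scores, dtype=float)
    saliency = np.asarray(saliency, dtype=float)
    strong = saliency >= thresholds[0]
    general = (saliency < thresholds[0]) & (saliency >= thresholds[1])
    weak = saliency < thresholds[1]
    total, norm = 0.0, 0.0
    for mask, w in zip((strong, general, weak), weights):
        if mask.any():                 # skip empty tiers
            total += w * region_scores[mask].mean()
            norm += w
    return total / norm

# A highly salient sharp region dominates a weakly salient degraded one.
q = weighted_quality(region_scores=[1.0, 0.0], saliency=[0.9, 0.1])
```

The design intent is that degradations in strongly salient regions, which viewers actually attend to, pull the final score down more than the same degradations in weakly salient regions.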