Visual object tracking, which aims to automatically estimate the position of an arbitrary target in a video sequence, has drawn great attention in recent years, and many efforts have been made on this topic. The Siamese network, which balances accuracy and speed, has achieved great success. A Siamese network consists of two branches: one for the target image and the other for the search image. The position with the maximum score in the similarity map between the target and search images indicates the location of the target in the search image. Current Siamese trackers treat the features of different channels and spatial locations equally; however, these features may represent different semantic information. We propose a channel and spatial (CS) attention-based Siamese network for visual object tracking. A CS attention mechanism is inserted into the feature extractor to enhance semantic feature learning. The experimental results show that the proposed network significantly improves the performance of the baseline tracker and ranks among the top trackers of all tested state-of-the-art trackers on the most widely used visual object tracking datasets.
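The similarity map described above can be sketched as a plain cross-correlation of the template (target) feature with the search feature, in the style of SiamFC-like trackers; the argmax of the map gives the predicted target location. This is an illustrative sketch with synthetic feature arrays, not the authors' implementation.

```python
import numpy as np

def similarity_map(template_feat, search_feat):
    """Slide the template feature over the search feature and compute
    a dot-product similarity at each offset (plain cross-correlation)."""
    th, tw, c = template_feat.shape
    sh, sw, _ = search_feat.shape
    out_h, out_w = sh - th + 1, sw - tw + 1
    score = np.empty((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            patch = search_feat[y:y + th, x:x + tw, :]
            score[y, x] = np.sum(patch * template_feat)
    return score

# Synthetic example: plant the template at offset (5, 4) in the search map.
rng = np.random.default_rng(0)
search = rng.standard_normal((17, 17, 8))
template = search[5:11, 4:10, :].copy()
score = similarity_map(template, search)
y, x = np.unravel_index(np.argmax(score), score.shape)
```

In practice this correlation is computed on deep features from a shared backbone; the CS attention module proposed here reweights those features by channel and spatial position before the correlation.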
With the development of video services available on various devices (mobile, PC, tablet, television, etc.), end-user experiences and requirements have changed. Objective video quality assessment tools and metrics are increasingly complex and adapted to different service contexts. Compared with other reviews of quality metrics, the full-reference (FR) video quality metrics reviewed here (particularly ViS3, SSIMplus, video multimethod assessment fusion, and the open perceptual video quality metric) are more recent and have not yet been compared with each other. We compare and evaluate the performance of 10 FR metrics with respect to their accuracy in predicting subjective perceived quality for the videoconferencing application. Emphasis is given to metrics evaluating visual quality (from human perception). A detailed statistical study is carried out on four subjective databases: the École polytechnique fédérale de Lausanne database, the LIVE mobile database, and two Orange Labs databases, covering a large sample of distortion types (transmission errors, encoding bit rates, etc.).
The popularity of 3D applications has brought new challenges in the creation, compression, and transmission of 3D content, due to the large size of 3D data and the limits of transmission. Several compression standards, such as Multiview-HEVC and 3D-HEVC, have been proposed to compress 3D content, aided by view synthesis technologies, among which the most commonly used algorithm is Depth-Image-Based Rendering (DIBR). However, quality assessment of DIBR-synthesized views is very challenging owing to new types of distortions induced by inaccurate depth maps, which conventional 2D quality metrics may fail to assess. In this paper, we test the performance of existing objective metrics on free-viewpoint video with different depth coding algorithms. Results show that none of the existing objective metrics, whether full-reference or no-reference, performs well on this database. There is certainly room for further improvement of these algorithms.
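To make concrete why conventional 2D metrics can struggle here: a classic FR metric such as PSNR penalizes per-pixel differences, so a small geometric shift of the kind DIBR produces around disoccluded regions yields a large error even when the view looks acceptable. The sketch below is illustrative only, with a synthetic image and a hand-made one-pixel shift standing in for a DIBR warping artifact.

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio between a reference view and a
    (synthesized) test view, in dB."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(1)
ref = rng.integers(0, 256, (64, 64), dtype=np.uint8)

# Mimic a DIBR-style geometric distortion: shift one column band
# of the reference by a single pixel, as inaccurate depth might.
synth = ref.copy()
synth[:, 20:30] = ref[:, 19:29]

score = psnr(ref, synth)
```

On textured content this one-pixel shift produces a low PSNR despite being perceptually minor, which is one intuition behind the paper's finding that existing metrics perform poorly on DIBR-synthesized views.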