KEYWORDS: Wavelets, Video, Video coding, Scalable video coding, Wavelet transforms, Motion estimation, Discrete wavelet transforms, Signal to noise ratio, Video surveillance, Computer programming
In 3D wavelet video coding schemes, in which a temporal wavelet decomposition of the video data is combined with a spatial wavelet transform, temporal scalability and the reduction of temporal redundancy are often achieved at the expense of delay. The delay increases with the number of video frames that are jointly coded or, equivalently, with the depth of the temporal wavelet transform. Depending on the system delay allowed by a specific application, the maximum temporal transform depth might be limited. On the other hand, consecutive temporal lowpass frames at the highest permitted temporal decomposition level might still be strongly correlated, especially for video material with a static background or low motion that can be well compensated. In this case, the temporal correlation should be exploited to improve the coding efficiency without introducing additional delay into the overall system. In this paper, we consider a 3D wavelet video coding scheme in which the temporal wavelet decomposition precedes the spatial wavelet decomposition, and investigate the application of a spatially scalable wavelet video coder with in-band prediction to the temporal lowpass frames at the maximum temporal transform depth.
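To make the delay argument concrete, the sketch below estimates the structural buffering delay implied by a dyadic temporal decomposition: a transform of depth T jointly processes on the order of 2^T frames. This is an illustrative assumption only; it ignores filter support and boundary handling and is not the delay model used in the paper.

```python
def temporal_gop_size(levels: int) -> int:
    """Frames jointly processed by a dyadic temporal wavelet transform
    of the given depth (illustrative; filter length is ignored)."""
    return 2 ** levels


def structural_delay_ms(levels: int, frame_rate_hz: float = 30.0) -> float:
    """Lower bound on the buffering delay introduced by the temporal
    transform alone, in milliseconds."""
    return 1000.0 * (temporal_gop_size(levels) - 1) / frame_rate_hz


if __name__ == "__main__":
    for t in range(1, 6):
        print(f"depth={t}: GOP={temporal_gop_size(t)} frames, "
              f"delay >= {structural_delay_ms(t):.1f} ms at 30 fps")
```

The exponential growth of the group size with the transform depth is what forces the depth limit discussed above.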
In this paper, a wavelet-based spatially scalable video coding scheme with in-band prediction is investigated. The discrete wavelet transform is employed for the spatial decomposition of each video frame, and a blockwise motion compensated prediction is computed for the individual subbands. For motion compensation, the shift-invariant overcomplete discrete wavelet transform is exploited to achieve half-pixel motion vector accuracy. For each block, a decision is made between coding the motion compensated prediction error together with the corresponding motion vector data and coding the wavelet coefficients directly. JPEG 2000 quantization and arithmetic coding are applied to the resulting prediction error. The performance of the motion compensated predictive in-band coder is evaluated for different block sizes and sub-pixel accuracies.
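The following sketch illustrates the blockwise mode decision described above. The helper names and the simple sum-of-absolute-values cost (plus a fixed motion-vector overhead) are assumptions for illustration and do not reproduce the paper's actual rate/distortion criterion.

```python
import numpy as np


def block_cost(block: np.ndarray) -> float:
    """Crude cost proxy: sum of absolute coefficient values."""
    return float(np.abs(block).sum())


def choose_modes(subband: np.ndarray, prediction: np.ndarray,
                 block_size: int = 8, mv_overhead: float = 16.0) -> dict:
    """For each block of a subband, decide between coding the motion
    compensated prediction error (plus motion vector data) and coding
    the wavelet coefficients directly."""
    modes = {}
    h, w = subband.shape
    for y in range(0, h - block_size + 1, block_size):
        for x in range(0, w - block_size + 1, block_size):
            orig = subband[y:y + block_size, x:x + block_size]
            pred = prediction[y:y + block_size, x:x + block_size]
            inter_cost = block_cost(orig - pred) + mv_overhead
            intra_cost = block_cost(orig)
            modes[(y, x)] = "inter" if inter_cost < intra_cost else "intra"
    return modes
```

In the actual coder, the prediction is formed in the overcomplete wavelet domain to obtain shift invariance and half-pixel accuracy before this per-block decision is taken.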
A novel concept for SNR scalability with motion compensation in the enhancement layer is introduced. The quantization of the prediction error at different quantization step sizes is performed in the same loop, which allows bit plane coding to be applied if the quantizers are configured appropriately. Since a layered prediction is employed at the encoder, drift can occur at a base layer decoder. The concept is therefore extended by a drift limitation operation, and two approaches are investigated in this context. The first is based on a modification of the prediction error; in the second, the drift is controlled by dynamic clipping of the enhancement prediction. The proposed SNR scalability concept is applied to the lowpass band of a wavelet-based video coding scheme, and its performance is compared with a conventional approach to SNR scalability with two and three quantization layers, respectively.
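A minimal sketch of the two-layer idea is given below: the same prediction error is quantized at a coarse and a fine step size in one pass, and the enhancement prediction is clipped to bound the drift seen by a base-layer decoder. The step sizes, the clipping bound, and the loop structure are illustrative assumptions; the paper's actual prediction loop and the quantizer configuration enabling bit plane coding may differ.

```python
import numpy as np


def quantize(x: np.ndarray, step: float) -> np.ndarray:
    return np.round(x / step)


def dequantize(q: np.ndarray, step: float) -> np.ndarray:
    return q * step


def encode_two_layers(residual: np.ndarray,
                      base_step: float = 8.0,
                      enh_step: float = 2.0,
                      drift_bound: float = 4.0):
    """Quantize the prediction error at two step sizes in one loop.
    The enhancement prediction (refinement of the base reconstruction)
    is clipped so that base-layer drift stays bounded."""
    q_base = quantize(residual, base_step)
    rec_base = dequantize(q_base, base_step)

    enh_pred = residual - rec_base
    enh_pred = np.clip(enh_pred, -drift_bound, drift_bound)  # drift limit
    q_enh = quantize(enh_pred, enh_step)
    return q_base, q_enh
```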
Due to the rapidly growing amount of multimedia content available on the internet, it is highly desirable to index multimedia data automatically and to provide content-based search and retrieval functionalities. The first step in describing and annotating video data is to split the sequences into sub-shots that correspond to semantic units. This paper addresses unsupervised scene change detection and keyframe selection in video sequences. Unlike other methods, this is performed using a standardized multimedia content description of the video data. We apply the MPEG-7 scalable color descriptor and the edge histogram descriptor for shot boundary detection and show that this method performs well. Furthermore, we propose to store the output data of our system in a video segment description scheme to provide simple but efficient search and retrieval functionalities for video scenes based on color features.
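The sketch below shows the basic descriptor-based shot boundary test. A plain normalized luminance histogram stands in for the MPEG-7 scalable color descriptor, and the L1 distance with a fixed threshold is an assumption; the descriptors, distance measure, and decision rule in the paper are more elaborate.

```python
import numpy as np


def frame_descriptor(frame: np.ndarray, bins: int = 32) -> np.ndarray:
    """Normalized histogram as a stand-in for the MPEG-7 descriptor."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 255))
    return hist / max(hist.sum(), 1)


def detect_shot_boundaries(frames, threshold: float = 0.4):
    """Flag a shot boundary where consecutive frame descriptors
    differ strongly (L1 distance above a fixed threshold)."""
    boundaries = []
    prev = frame_descriptor(frames[0])
    for i in range(1, len(frames)):
        cur = frame_descriptor(frames[i])
        if np.abs(cur - prev).sum() > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries
```

Keyframes can then be selected per detected segment, and the resulting boundaries and descriptors stored in the video segment description scheme mentioned above.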
Today's standard video coders employ the hybrid coding scheme on a macroblock basis. In these coders, blocks of 16 × 16 and 8 × 8 pixels are used for motion compensation of non-interlaced video, and the Discrete Cosine Transform (DCT) is then applied to the prediction error on blocks of size 8 × 8. The emerging coding standard H.26L employs a set of seven different block sizes for motion compensation, ranging from 4 × 4 to 16 × 16. Block sizes smaller than 8 × 8 imply that the 8 × 8 DCT cannot be used for transform coding of the prediction error; in the current test model, an integer approximation of the 4 × 4 DCT matrix is employed instead. In this paper, the concept of Adaptive Block Transforms is proposed. In this scheme, the transform block size is adapted to the block sizes used for motion compensation, so that the transform exploits the maximum possible signal length without exceeding the compensated block boundaries. The proposed scheme is integrated into the H.26L test model, and new integer approximations of the 8 × 8 and 16 × 16 DCT matrices are introduced. Like the TML 4 × 4 transform, the coefficient values of these matrices are restricted to a limited range. The results presented here are based on an entropy estimation and reveal a rate/distortion gain of approximately 1.1 dB at high rates on the employed test sequences.
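The sketch below illustrates the adaptive block transform principle: the separable integer transform is chosen to match the motion compensation block size so that it never crosses compensated block borders. The 4 × 4 integer matrix shown is the widely known H.264-style approximation of the DCT and is used only for illustration; the TML matrix and the new 8 × 8 and 16 × 16 matrices introduced in the paper may differ.

```python
import numpy as np

# Illustrative 4x4 integer approximation of the DCT (H.264-style),
# not necessarily the TML matrix referred to in the paper.
T4 = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]], dtype=np.int64)


def forward_transform(block: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Separable integer transform: T * X * T^T (scaling omitted)."""
    return T @ block @ T.T


def transform_prediction_error(residual: np.ndarray) -> np.ndarray:
    """Apply the transform whose size equals the motion compensated
    block size, so the transform never crosses block borders. Only the
    4x4 case is sketched; 8x8 and 16x16 would use the corresponding
    integer DCT approximations."""
    assert residual.shape == (4, 4), "sketch covers the 4x4 case only"
    return forward_transform(residual, T4)
```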