To achieve consumer-level quality, media systems must process continuous streams of audio and video data while maintaining exacting tolerances on sampling and frame rate, jitter, and synchronization. While it is relatively straightforward to design fixed-function hardware implementations to satisfy worst-case conditions, there is a growing trend to utilize programmable multi-tasking solutions for media applications. The flexibility of these systems enables support for multiple current and future media formats, which can reduce design costs and time-to-market. This paper seeks to provide practical engineering solutions to achieve robust media processing on such systems, with specific attention given to power-constrained environments. The techniques covered in this article utilize the fundamental concepts of software optimization, software/hardware partitioning, stream buffering, hierarchical prioritization, and system resource and power management. A novel enhancement to dynamically adjust processor voltage and frequency based on buffer fullness to reduce system power consumption is examined in detail. The application of these techniques is provided in a case study of a portable video player implementation based on a general-purpose processor running a non real-time operating system that achieves robust playback from local storage and streaming over 802.11.
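The buffer-driven voltage/frequency scaling idea above can be sketched as a simple policy that maps decode-buffer fullness to a processor operating point. The thresholds and frequency levels below are illustrative assumptions, not the parameters used in the paper:

```python
# Hypothetical sketch of buffer-driven dynamic voltage/frequency scaling.
# Thresholds and frequency levels are illustrative assumptions only.

def select_frequency(buffer_fullness, levels=(200, 400, 600, 800)):
    """Pick a processor frequency (MHz) from the decode buffer's fullness.

    A fuller buffer means the decoder is ahead of the playback deadline,
    so the processor can slow down (and lower its voltage) to save power;
    a nearly empty buffer demands the highest frequency to avoid underrun.
    """
    if buffer_fullness < 0.25:      # risk of underrun: run flat out
        return levels[-1]
    if buffer_fullness < 0.50:
        return levels[2]
    if buffer_fullness < 0.75:
        return levels[1]
    return levels[0]                # comfortably ahead: minimum power
```

In a real system the returned level would be applied through the platform's frequency-scaling interface, with hysteresis added to avoid oscillating between levels.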
Compression and interpolation each require the ability to predict, from part of an image or of a collection or stream of images, other parts. Compression is achieved by transmitting part of the imagery along with instructions for predicting the rest of it; of course, the instructions are usually much shorter than the unsent data. Interpolation is just a matter of predicting part of the way between two extreme images; however, whereas in compression the original image is known at the encoder, so the residual can be calculated, compressed, and transmitted, in interpolation the actual intermediate image is not known, so it is not possible to improve the final image quality by adding back the residual image. Practical 3D-video compression methods typically use a system with four modules: (1) coding one of the streams (the main stream) using a conventional method (e.g., MPEG), (2) calculating the disparity map(s) between corresponding points in the main stream and the auxiliary stream(s), (3) coding the disparity maps, and (4) coding the residuals. It is natural and usually advantageous to integrate motion compensation with the disparity calculation and coding. Efficient coding and transmission of the residuals is usually the only practical way to handle occlusions, and the ultimate performance of end-to-end systems is usually dominated by the cost of this coding. In this paper we summarize the background principles, explain the innovative features of our implementation steps, and provide quantitative measures of component and system performance.
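Modules (2) and (4) can be sketched as a disparity-compensated prediction of the auxiliary view from the main view, followed by forming the residual to code. The functions below are a minimal illustration under simplifying assumptions of our own (integer, purely horizontal disparity, single-channel frames); they are not the paper's implementation:

```python
import numpy as np

def predict_aux_view(main, disparity):
    """Disparity-compensated prediction: shift each pixel of the main
    view horizontally by its (integer) disparity to predict the
    auxiliary view; indexing is clamped where the shift leaves the frame."""
    h, w = main.shape
    cols = np.clip(np.arange(w)[None, :] - disparity, 0, w - 1)
    return main[np.arange(h)[:, None], cols]

def residual(main, aux, disparity):
    """The residual that module (4) codes: the actual auxiliary view
    minus its disparity-compensated prediction."""
    return aux - predict_aux_view(main, disparity)
```

When the prediction is good, the residual is small and cheap to code; where occlusions make prediction impossible, the residual carries the full cost, which is why its coding tends to dominate overall performance.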
Eye strain is often experienced when viewing a stereoscopic image pair on a flat display device (e.g., a computer monitor). Two cue conflicts contribute to this eye strain: (1) the breakdown of the accommodation/convergence relationship and (2) the conflict between interposition and disparity depth cues. We describe a simple algorithm that reduces eye strain through horizontal image translation and corresponding image cropping, based on a statistical description of the estimated disparity within a stereoscopic image pair. The desired amount of translation is computed from the given stereoscopic image pair and therefore requires no user intervention. In this paper, we first develop a statistical model of the estimated disparity that incorporates the possibility of erroneous estimates. An estimate of the actual disparity range is obtained by thresholding the disparity histogram to avoid the contribution of false disparity values. Based on the estimated disparity range, the image pair is translated to force all points to lie on, or behind, the screen surface. This algorithm has been applied to diverse real stereoscopic images and sequences. Stereoscopic image pairs that were often characterized as producing eye strain and confusion produced comfortable stereoscopy after the automated translation.
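The histogram-thresholding step can be sketched as follows. The bin count, threshold fraction, and sign convention (negative disparity meaning a point in front of the screen) are our illustrative assumptions, not the paper's parameters:

```python
import numpy as np

def comfort_translation(disparities, bins=64, thresh_frac=0.05):
    """Estimate the horizontal translation that places the nearest
    reliable scene point on the screen plane.

    Sparse histogram bins are treated as false matches and ignored,
    so outlier disparity estimates do not drive the translation.
    Sign convention (an assumption here): negative disparity means
    the point appears in front of the screen.
    """
    hist, edges = np.histogram(disparities, bins=bins)
    reliable = hist >= thresh_frac * hist.max()   # drop sparse, likely-false bins
    nearest = edges[:-1][reliable].min()          # most negative reliable disparity
    # Translating by -nearest moves that point onto (or behind) the screen;
    # if everything already lies behind the screen, no translation is needed.
    return max(0.0, -nearest)
```

A few gross outlier estimates thus have no effect on the computed shift, which is the point of working from the thresholded histogram rather than the raw disparity extrema.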
KEYWORDS: Video, Cameras, Video compression, Composites, Video coding, Signal processing, Computer programming, Motion estimation, Image compression, Video processing
In this paper, we present a new algorithm that adaptively selects the best possible reference frame for the predictive coding of generalized, or multi-view, video signals, based on estimated prediction similarity with the desired frame. We define similarity between two frames as the absence of occlusion, and we estimate this quantity from the variance of composite displacement vector maps. The composite maps are obtained without requiring the computationally intensive process of motion estimation for each candidate reference frame. We provide prediction and compression performance results for generalized video signals using both this scheme and schemes where the reference frames were heuristically pre-selected. When the predicted frames were used in a modified MPEG encoder simulation, the signal compressed using the adaptively selected reference frames required, on average, more than 10% fewer bits to encode than the non-adaptive techniques; for individual frames, the reduction in bits was sometimes more than 80%. These gains were obtained with an acceptable computational increase and an inconsequential bit-count overhead.
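The selection rule can be sketched as: compose already-computed displacement maps to approximate the map to each candidate reference (avoiding a fresh motion search per candidate), then pick the candidate whose composite map has the lowest variance. The function names and the simple additive composition are our simplifying assumptions:

```python
import numpy as np

def composite_map(map_ab, map_bc):
    """Approximate the A->C displacement map by summing the A->B and
    B->C maps; this skips motion estimation for the A->C pair at the
    cost of ignoring re-sampling along the intermediate frame B."""
    return map_ab + map_bc

def select_reference(composite_maps):
    """Return the index of the candidate reference whose composite
    displacement map has the lowest variance -- used here as a proxy
    for high similarity (little occlusion) with the desired frame."""
    return int(np.argmin([np.var(m) for m in composite_maps]))
```

A uniform (low-variance) displacement field suggests coherent motion and little occlusion, so that candidate should predict the desired frame well; a scattered field suggests the opposite.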
Binocular digital imaging is a rapidly developing branch of digital imaging. Any such system must have some means that allows each eye to see only the image intended for it. We describe a time-division multiplexing technique that we have developed for Silicon Graphics Inc. (SGI™) workstations. We utilize the 'double buffering' hardware feature of the SGI™ graphics system for binocular image rendering. Our technique allows for multiple, re-sizable, full-resolution stereoscopic and monoscopic windows to be displayed simultaneously. We describe corresponding software developed to exploit this hardware. This software contains user-controllable options for specifying the most comfortable zero-disparity plane and effective interocular separation. Several perceptual experiments indicate that most viewers perceive 3D comfortably with this system. We also discuss the speed and architecture requirements of the graphics and processor hardware needed to provide flickerless stereoscopic animation and video with our technique.
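The timing constraint behind flickerless operation can be illustrated with a toy field schedule: time-division multiplexing alternates left and right images, so the display must refresh at twice the per-eye rate. The function and parameter names below are ours, for illustration only:

```python
def field_schedule(n_fields, per_eye_hz=60):
    """List (eye, display_time) pairs for time-division multiplexed
    stereo: left and right fields alternate, so the display runs at
    2 * per_eye_hz (e.g. 120 Hz total for flicker-free 60 Hz per eye)."""
    total_hz = 2 * per_eye_hz
    return [("L" if i % 2 == 0 else "R", i / total_hz)
            for i in range(n_fields)]
```

This is why the graphics hardware must sustain double the monoscopic refresh rate: halving the rate per eye reintroduces visible flicker.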