Contrast energy was proposed by Watson, Barlow, and Robson (Nature, 1983) as a useful metric for luminance contrast target stimuli because it represents the detectability of the stimulus in photon noise for an ideal observer. We propose here the use of visible contrast energy metrics for detection and discrimination among static luminance patterns. Visibility is approximated with spatial frequency sensitivity weighting and eccentricity sensitivity weighting. The suggested weighting functions revise the Spatial Standard Observer (Watson and Ahumada, J. Vision, 2005) for luminance contrast detection, extend it into the near periphery, and provide compensation for duration. Under the assumption that detection is limited only by internal noise, both detection and discrimination performance can be predicted by metrics based on the visible energy of the difference images.
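A minimal numerical sketch of such a visible-energy metric is given below, assuming an illustrative log-parabola contrast sensitivity function peaking near 3 c/deg and an exponential eccentricity falloff; the function name and all parameter values are placeholders, not the paper's fitted weights.

```python
import numpy as np

def visible_contrast_energy(diff, ppd=60.0, ecc_deg=0.0):
    """Approximate visible contrast energy of a difference image.

    diff    : 2-D array of luminance-contrast differences (target minus background)
    ppd     : display resolution in pixels per degree
    ecc_deg : eccentricity of the patch center in degrees

    Filters the difference image by an assumed contrast sensitivity
    function, attenuates for eccentricity, and sums squared contrast
    over area (units: contrast^2 * deg^2).
    """
    h, w = diff.shape
    fx = np.fft.fftfreq(h, d=1.0 / ppd)            # cycles/degree
    fy = np.fft.fftfreq(w, d=1.0 / ppd)
    f = np.hypot(*np.meshgrid(fx, fy, indexing="ij"))

    # Illustrative log-parabola CSF peaking near 3 c/deg (parameters assumed).
    csf = np.exp(-0.5 * ((np.log2(f + 1e-6) - np.log2(3.0)) / 1.2) ** 2)
    ecc_w = np.exp(-ecc_deg / 10.0)                # assumed eccentricity falloff

    filtered = np.real(np.fft.ifft2(np.fft.fft2(diff) * csf)) * ecc_w
    return np.sum(filtered ** 2) / ppd ** 2        # pixel area = (1/ppd)^2 deg^2
```

Under the internal-noise-limited assumption above, detection and discrimination predictions then reduce to comparing this quantity for the difference image against a criterion energy.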
The Spatial Standard Observer (SSO) was developed in response to a need for a simple, practical tool for measurement of visibility and discriminability of spatial patterns. The SSO is a highly simplified model of human spatial vision, based on data collected in a large cooperative multi-lab project known as ModelFest. It incorporates only a few essential components, such as a local contrast transformation, contrast sensitivity function, local masking, and local pooling. The SSO may be useful in a wide variety of applications, such as evaluating vision from unmanned aerial vehicles, measuring visibility of damage to aircraft and to the shuttle orbiter, predicting outcomes of corrective laser eye surgery, inspection of displays during the manufacturing process, estimation of the quality of compressed digital video, evaluation of legibility of text, and predicting discriminability of icons or symbols in a graphical user interface. In this talk I will describe the development of the SSO, and will discuss in detail a number of these potential applications.
KEYWORDS: Colorimetry, Spatial frequencies, Modulation, Data modeling, Contrast sensitivity, Visual process modeling, Distance measurement, Visualization, Human vision and color perception, Databases
The aim of the ColorFest is to extend the original ModelFest (http://vision.arc.nasa.gov/modelfest/) experiments to build a spatio-chromatic standard observer for the detection of static coloured images. The two major issues that need to be addressed are (1) the contrast sensitivity functions for the three chromatic mechanisms and (2) how the output of these channels is combined. We measured detection thresholds for stimuli modulated along different colour directions and for a wide range of spatial frequencies. The three main directions (an achromatic direction, a nominally isoluminant red-green direction, and the tritanopic confusion line) and four intermediate colour directions were used. These intermediate directions were the vector sums of the thresholds along the main directions. We evaluate two models. Detection performance is described by a linear transformation C defining the chromatic tuning and a diagonal matrix S reflecting the sensitivity of the chromatic mechanisms for a particular spatial frequency. The output of the three chromatic mechanisms is combined according to a Minkowski metric (General Separable Model), or according to a Euclidean Distance measure (Ellipsoidal Separable Model). For all three observers the ellipsoidal model fits as well as the general separable model. Estimating the chromatic tuning improves the model fit for one observer.
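In symbols, the two models differ only in their pooling exponent. The sketch below shows the shared prediction rule, with C and S as defined above and a Minkowski exponent beta; beta = 2 recovers the Ellipsoidal Separable Model, while a fitted beta gives the General Separable Model. The function name and threshold criterion are illustrative.

```python
import numpy as np

def mechanism_response(delta, C, S, beta=2.0):
    """Pooled response of the three chromatic mechanisms to a color
    modulation delta (3-vector).

    C    : 3x3 linear transform defining the chromatic tuning (the C above)
    S    : length-3 mechanism sensitivities at one spatial frequency (the S above)
    beta : pooling exponent; beta = 2 gives the Ellipsoidal Separable Model
           (Euclidean distance), a fitted beta the General Separable Model
           (Minkowski metric).
    """
    r = np.asarray(S) * (np.asarray(C) @ np.asarray(delta))
    return np.sum(np.abs(r) ** beta) ** (1.0 / beta)

# The predicted threshold along a unit color direction d is the contrast c
# at which the pooled response reaches 1:  c = 1 / mechanism_response(d, C, S).
```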
Models that predict human performance on narrow classes of visual stimuli abound in the vision science literature. However, the vision and the applied imaging communities need robust general-purpose, rather than narrow, computational human visual system (HVS) models to evaluate image fidelity and quality and ultimately improve imaging algorithms. Of the general-purpose early HVS models that currently exist, direct model comparisons on the same data sets are rarely made. The Modelfest group was formed several years ago to solve these and other vision modeling issues. The group has developed a database of static spatial test images with threshold data that is posted on the Web for modellers to use in HVS model design and testing. The first phase of data collection was limited to detection thresholds for static gray scale 2D images. The current effort will extend the database to include thresholds for selected grayscale 2D spatio-temporal image sequences. In future years, the database will be extended to include discrimination (masking) for dynamic, color and gray scale image sequences. The purpose of this presentation is to invite the Electronic Imaging community to participate in this effort and to inform them of the developing data set, which is available to all interested researchers. This paper presents the display specifications, psychophysical methods and stimulus definitions for the second phase of the project, spatio-temporal detection. The threshold data will be collected by each of the authors over the next year and presented on the Web along with the stimuli.
KEYWORDS: Video, Visualization, Calibration, Sensors, Video compression, Video processing, Data modeling, Statistical analysis, Quality measurement, Digital video discs
The study of subjective visual quality, and the development of computed quality metrics, require accurate and meaningful measurement of visual impairment. A natural unit for impairment is the JND. In many cases, what is required is a measure of an impairment scale, that is, the growth of subjective impairment, in JNDs, as some physical parameter is increased.
KEYWORDS: Video, Data modeling, Visualization, Video compression, Quantization, Human vision and color perception, Spatial frequencies, Visibility, Error analysis, Mathematical modeling
Development of video quality metrics has taken support from experimental vision data mainly at two levels of abstraction. On the one hand are the carefully controlled tests of human visual response to well-defined, controlled visual stimuli, such as the ModelFest study. On the other hand are experiments in which viewers rate the global quality of 'natural' video sequences exhibiting impairments of loosely-controlled composition and amplitude, as in the Video Quality Experts Group study. The IEEE Broadcast Technology Society Subcommittee on Video Compression Measurements has initiated an intermediate-level approach to video quality assessment aimed at developing a scale of video impairment and a unit of measure by which to describe video distortion in both perceptual and engineering terms. The proposed IEEE study will attempt to define a scale of video impairment in terms of multiple measurements of the just-noticeable difference (JND) of compression-induced video impairments. A paired comparison psychophysical method will be used to define a psychometric function of the visual sensitivity to compression-induced video impairments of various amplitudes. In this effort, quality assessment is related directly to visual perception of video impairments rather than to the more 'atomic' visual stimuli used in many human vision experiments. Yet the experimenter's control over the stimuli is greater than that used in much of contemporary video quality testing.
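A sketch of how such paired-comparison data might be reduced to a JND scale follows, assuming a logistic psychometric function and a 75%-correct JND criterion; both are illustrative choices, not the IEEE subcommittee's specification.

```python
import numpy as np

def fit_jnd(amplitudes, n_chose_impaired, n_trials):
    """Maximum-likelihood fit of a logistic psychometric function to
    paired-comparison data, returning the amplitude of one JND.

    amplitudes       : physical impairment amplitudes tested
    n_chose_impaired : trials on which the impaired sequence was identified
    n_trials         : trials per amplitude

    Assumes p(a) = 1 / (1 + exp(-a / s)), which equals chance (0.5) at
    zero impairment, and takes the JND as the 75% point (a = s * ln 3).
    """
    a = np.asarray(amplitudes, dtype=float)
    k = np.asarray(n_chose_impaired, dtype=float)
    n = np.asarray(n_trials, dtype=float)
    scales = np.linspace(0.01 * a.max(), 2.0 * a.max(), 2000)
    best_ll, best_s = -np.inf, scales[0]
    for s in scales:                      # 1-D grid search on the slope
        p = np.clip(1.0 / (1.0 + np.exp(-a / s)), 1e-6, 1 - 1e-6)
        ll = np.sum(k * np.log(p) + (n - k) * np.log(1.0 - p))
        if ll > best_ll:
            best_ll, best_s = ll, s
    return best_s * np.log(3.0)
```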
KEYWORDS: Data modeling, Visual process modeling, Spatial frequencies, Human vision and color perception, Databases, Video compression, Composites, Visualization, Performance modeling, Image compression
A robust model of the human visual system (HVS) would have a major practical impact on the difficult technological problems of transmitting and storing digital images. Although most HVS models exhibit similarities, they may have significant differences in predicting performance. Different HVS models are rarely compared using the same set of psychophysical measurements, so their relative efficacy is unclear. The Modelfest organization was formed to solve this problem and accelerate the development of robust new models of human vision. Members of Modelfest have gathered psychophysical threshold data on the year one stimuli described at last year's SPIE meeting. Modelfest is an exciting new approach to modeling involving the sharing of resources, learning from each other's modeling successes and providing a method to cross-validate proposed HVS models. The purpose of this presentation is to invite the Electronic Imaging community to participate in this effort and inform them of the developing database, which is available to all researchers interested in modeling human vision. In future years, the database will be extended to other domains such as visual masking, and temporal processing. This Modelfest progress report summarizes the stimulus definitions and data collection methods used, but focuses on the results of the phase one data collection effort. Each of the authors has provided at least one dataset from their respective laboratories. These data and data collected subsequent to the submission of this paper are posted on the WWW for further analysis and future modeling efforts.
Ann Rohaly, Philip Corriveau, John Libert, Arthur Webster, Vittorio Baroncini, John Beerends, Jean-Louis Blin, Laura Contin, Takahiro Hamada, David Harrison, Andries Hekstra, Jeffrey Lubin, Yukihiro Nishida, Ricardo Nishihara, John Pearson, Antonio Pessoa, Neil Pickford, Alexander Schertz, Massimo Visca, Andrew Watson, Stefan Winkler
The Video Quality Experts Group (VQEG) was formed in October 1997 to address video quality issues. The group is composed of experts from various backgrounds and affiliations, including participants from several internationally recognized organizations working in the field of video quality assessment. The first task undertaken by VQEG was to provide a validation of objective video quality measurement methods leading to recommendations in both the telecommunications and radiocommunication sectors of the International Telecommunication Union. To this end, VQEG designed and executed a test program to compare subjective video quality evaluations to the predictions of a number of proposed objective measurement methods for video quality in the bit rate range of 768 kb/s to 50 Mb/s. The results of this test show that there is no objective measurement system that is currently able to replace subjective testing. Depending on the metric used for evaluation, the performance of eight or nine models was found to be statistically equivalent, leading to the conclusion that no single model outperforms the others in all cases. The greatest achievement of this first validation effort is the unique data set assembled to help future development of objective models.
KEYWORDS: Visual process modeling, Data modeling, Spatial frequencies, Databases, Visualization, Image quality, Image compression, Human vision and color perception, Performance modeling, Linear filtering
Models that predict human performance on narrow classes of visual stimuli abound in the vision science literature. However, the vision and the applied imaging communities need robust general-purpose, rather than narrow, computational human visual system models to evaluate image fidelity and quality and ultimately improve imaging algorithms. Psychophysical measures of image quality are too costly and time-consuming to gather to evaluate the impact each algorithm modification might have on image quality.
KEYWORDS: Video, Data modeling, Visualization, Video compression, Spatial filters, Video processing, Spatial frequencies, Visual process modeling, Human vision and color perception, Error analysis
The growth of digital video has given rise to a need for computational methods for evaluating the visual quality of digital video. We have developed a new digital video quality metric, which we call DVQ. Here we provide a brief description of the metric, and give a preliminary report on its performance. DVQ accepts a pair of digital video sequences, and computes a measure of the magnitude of the visible difference between them. The metric is based on the Discrete Cosine Transform. It incorporates aspects of early visual processing, including light adaptation, luminance and chromatic channels, spatial and temporal filtering, spatial frequency channels, contrast masking, and probability summation. It also includes primitive dynamics of light adaptation and contrast masking. We have applied the metric to digital video sequences corrupted by various typical compression artifacts, and compared the results to quality ratings made by human observers.
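The processing chain can be sketched as follows. This is a skeleton under stated assumptions, not the published DVQ implementation: the temporal filter, masking rule, pooling exponent, and all constants are placeholders, and csf_weights stands in for the metric's calibrated per-coefficient sensitivities.

```python
import numpy as np
from scipy.fftpack import dct

def block_dct(frame, b=8):
    """Orthonormal 8x8 block DCT-II of a 2-D luminance frame."""
    h, w = frame.shape
    blocks = frame[:h - h % b, :w - w % b].reshape(h // b, b, w // b, b)
    blocks = blocks.swapaxes(1, 2)                  # (by, bx, 8, 8)
    return dct(dct(blocks, axis=-1, norm='ortho'), axis=-2, norm='ortho')

def dvq_sketch(ref, test, csf_weights, fps=60.0):
    """Illustrative DVQ-style visible-difference measure.

    ref, test   : (T, H, W) luminance sequences (assumed positive)
    csf_weights : (8, 8) per-coefficient sensitivity weights (assumed)
    """
    d_ref = np.stack([block_dct(fr) for fr in ref])
    d_tst = np.stack([block_dct(fr) for fr in test])
    dc = np.maximum(d_ref[..., :1, :1], 1e-3)       # block DC term
    c_ref, c_tst = d_ref / dc, d_tst / dc           # local contrast
    alpha = np.exp(-1.0 / (0.04 * fps))             # assumed 40 ms time constant
    for t in range(1, c_ref.shape[0]):              # first-order temporal low-pass
        c_ref[t] = alpha * c_ref[t - 1] + (1 - alpha) * c_ref[t]
        c_tst[t] = alpha * c_tst[t - 1] + (1 - alpha) * c_tst[t]
    err = (c_tst - c_ref) * csf_weights             # error in threshold units
    err /= 1.0 + np.abs(c_ref * csf_weights)        # divisive contrast masking
    return np.sum(np.abs(err) ** 4) ** 0.25         # probability-summation pooling
```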
KEYWORDS: Video, Visualization, Visual process modeling, Image compression, Quantization, Video compression, Data modeling, RGB color model, Visual compression, Colorimetry
The advent of widespread distribution of digital video creates a need for automated methods for evaluating the visual quality of digital video. This is particularly so since most digital video is compressed using lossy methods, which involve the controlled introduction of potentially visible artifacts. Compounding the problem is the bursty nature of digital video, which requires adaptive bit allocation based on visual quality metrics, and the economic need to reduce bit-rate to the lowest level that yields acceptable quality. In previous work, we have developed visual quality metrics for evaluating, controlling, and optimizing the quality of compressed still images. These metrics incorporate simplified models of human visual sensitivity to spatial and chromatic visual signals. Here I describe a new video quality metric that is an extension of these still image metrics into the time domain. Like the still image metrics, it is based on the Discrete Cosine Transform. An effort has been made to minimize the amount of memory and computation required by the metric, in order that it might be applied in the widest range of applications. To calibrate the basic sensitivity of this metric to spatial and temporal signals we have made measurements of visual thresholds for temporally varying samples of DCT quantization noise.
The ability of a human observer to locate a lesion in natural medical image backgrounds (extracted from patients x-ray coronary angiograms) is degraded by two major factors: (1) the noisy variations in the background, (2) the presence of a high contrast complex background (through pattern masking effects). The purpose of this paper is to isolate and model the effect of a deterministic complex background on visual signal detection in natural medical image backgrounds. We perform image discrimination experiments where the observers have to discriminate an image containing the background plus signal from an image containing the background only. Five different samples of medical image backgrounds were extracted from patients' digital x-ray coronary angiograms. On each trial, two images were shown sequentially, one image with the simulated contrast target and the other without. The observer's task was to select the image with the target. An adaptive staircase method was used to determine the sequence of signal contrasts presented and the signal's energy thresholds were determined by maximum likelihood estimation. We tested the ability of single channel and multiple channel image discrimination models with a variety of contrast gain control mechanisms to predict the variation of the signal energy threshold in the different background samples. Human signal energy thresholds were best predicted by a multiple channel model with wide band masking.
Image quality models usually include a mechanism whereby artifacts are masked by the image acting as a background. Scientific study of visual masking has followed two traditions: contrast masking and noise masking, depending primarily on whether the mask is deterministic or random. In the former tradition, masking is explained by a decrease in the effective gain of the early visual system. In the latter tradition, masking is explained by an increased variance in some internal decision variable. The masking process in image quality models is usually of the gain-control variety, derived from the contrast masking tradition. In this paper we describe a third type of masking, which we call entropy masking, that arises when the mask is deterministic but unfamiliar. Some properties and implications of entropy masking are discussed. We argue that image quality models should incorporate entropy masking, as well as contrast masking.
This is a brief report on research on DCTune optimization of JPEG compression of dental x-rays. DCTune is a technology for optimizing DCT quantization matrices to yield maximum perceptual quality for a given bit-rate, or minimum bit-rate for a given perceptual quality. In addition, the technology provides a means of setting the perceptual quality of compressed imagery in a systematic way. We optimized matrices for a total of 20 images at two resolutions (150 and 300 dpi) and four bit-rates (0.25, 0.5, 0.75, 1.0 bits/pixel), and examined structural regularities in the resulting matrices. We also conducted some brief psychophysical studies to validate the DCTune quality metric and to demonstrate the visual advantage of DCTune compression over standard JPEG.
KEYWORDS: JPEG, compression, quantization, adaptive, image quality
Experiments on visual detection in computer simulated noise (e.g. white noise) show that random variations from location to location in the image (due to noise) degrade human performance. Psychophysical experiments of visual detection of signals superimposed on a known deterministic background ('mask') show that human performance can be degraded by the presence of a high contrast deterministic background through divisive inhibition. The purpose of this paper is to perform a psychophysical experiment to determine the relative importance of these two sources of performance degradation (random background variations and contrast masking effects) in human visual detection in natural medical image backgrounds. The results show that both contrast masking and random background variations degrade human performance for detecting signals in natural medical image backgrounds. These results suggest that current observer models which do not include a source of degradation due to the deterministic presence of the background might need to model such effects in order to reliably predict human visual detection in natural medical image backgrounds.
The discrete wavelet transform (DWT) decomposes an image into bands that vary in spatial frequency and orientation. It is widely used for image compression. Measures of the visibility of DWT quantization errors are required to achieve optimal compression. Uniform quantization of a single band of coefficients results in an artifact that is the sum of a lattice of random amplitude basis functions of the corresponding DWT synthesis filter, which we call DWT uniform quantization noise. We measured visual detection thresholds for samples of DWT uniform quantization noise in Y, Cb, and Cr color channels. The spatial frequency of a wavelet is r·2^-L, where r is the display visual resolution in pixels/degree and L is the wavelet level. Amplitude thresholds increase rapidly with spatial frequency. Thresholds also increase from Y to Cr to Cb, and with orientation from low-pass to horizontal/vertical to diagonal. We propose a mathematical model for DWT noise detection thresholds that is a function of level, orientation, and display visual resolution. This allows calculation of a 'perceptually lossless' quantization matrix for which all errors are in theory below the visual threshold. The model may also be used as the basis for adaptive quantization schemes.
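A sketch of how such a threshold model yields a perceptually lossless quantization step follows; the log-parabola frequency dependence and all parameter values are illustrative stand-ins for the paper's fitted model, which is estimated separately per color channel.

```python
import numpy as np

ORIENT_GAIN = {'ll': 1.0, 'hv': 1.5, 'd': 2.0}    # assumed orientation factors

def dwt_quantization_step(level, orientation, r, a=0.5, f0=4.0, k=2.0):
    """Illustrative threshold model for DWT uniform quantization noise.

    level       : wavelet level L (1 = finest)
    orientation : 'll' (low-pass), 'hv' (horizontal/vertical), 'd' (diagonal)
    r           : display visual resolution in pixels/degree

    The wavelet's spatial frequency is f = r * 2**-L, as in the abstract.
    Thresholds are modeled as a parabola in log frequency, scaled by an
    orientation factor; a, f0, k are placeholder parameters.
    """
    f = r * 2.0 ** (-level)                        # cycles/degree
    log_thresh = np.log10(a) + k * (np.log10(f) - np.log10(f0)) ** 2
    T = ORIENT_GAIN[orientation] * 10.0 ** log_thresh
    return 2.0 * T   # a uniform quantizer of step 2T keeps |error| <= T
```

Evaluating this for every level and orientation fills out the quantization matrix; steps at or below 2T keep each coefficient's quantization error below its modeled visibility threshold.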
Object detection involves looking for one of a large set of object subimages in a large set of background images. Image discrimination models predict the probability that an observer will detect a difference between two images. We find that discrimination models can predict the relative detectability of objects in different images, suggesting that these simpler models may be useful in some object detection applications. Six images of a vehicle in an otherwise natural setting were altered to remove the vehicle and mixed with the original image in various proportions. Nineteen observers rated the 24 images for the presence of a vehicle. The pattern of observer detectabilities for the different images was predicted by three discrimination models. A Cortex transform discrimination model, a contrast sensitivity function filter model, and a root-mean-square difference predictor based on the digital image values gave prediction errors of 15%, 49%, and 46%, respectively. Two observers given the same images repeatedly to make the task a discrimination task rated the images similarly, but had detectabilities a factor of two higher.
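For reference, the simplest of the three predictors is just the RMS difference of the digital image values; a minimal sketch (function name assumed):

```python
import numpy as np

def rms_detectability(original, vehicle_removed):
    """Root-mean-square difference predictor: detectability is assumed
    proportional to the RMS difference between the two digital images
    (the weakest predictor in the study, at 46% prediction error)."""
    d = original.astype(float) - vehicle_removed.astype(float)
    return np.sqrt(np.mean(d ** 2))
```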
The large variety of algorithms for data compression has created a growing need for methods to judge (new) compression algorithms. The results of several subjective experiments illustrate that numerical category scaling techniques provide an efficient and valid way not only to obtain compression ratio versus quality curves that characterize coder performance over a broad range of compression ratios, but also to assess perceived image quality in a much smaller range (e.g. close to threshold level). Our first objective is to discuss a number of simple techniques that can be used to assess perceived image quality. We show how to analyze data obtained from numerical category scaling experiments and how to set up such experiments. Second, we demonstrate that the results from a numerical scaling experiment depend on the specific nature of the subject's task in combination with the nature of the images to be judged. As results from subjective scaling experiments depend on many factors, we conclude that one should be very careful in selecting an appropriate assessment technique.
Several recent image compression standards rely upon the discrete cosine transform (DCT). Models of DCT basis function visibility can be used to design quantization matrices for arbitrary viewing conditions and images. Here we report new results on the effects of viewing distance and contrast masking on basis function visibility. We measured contrast detection thresholds for DCT basis functions at viewing distances yielding 16, 32, and 64 pixels/degree. Our detection model has been elaborated to incorporate the observed effects. We have also measured detection thresholds for individual basis functions when superimposed upon another basis function of the same or a different frequency. We find considerable masking between nearby DCT frequencies. A model for these masking effects also is presented.
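The masking component can be sketched with the standard gain-control threshold-elevation rule. The exponent and the single-frequency simplification are assumptions; the paper's fitted model additionally spreads masking across nearby DCT frequencies.

```python
import numpy as np

def masked_threshold(T0, mask_contrast, w=0.7):
    """Illustrative contrast-masking rule for a DCT basis function.

    T0            : unmasked detection threshold of the test basis function
    mask_contrast : contrast of the superimposed masking basis function
    w             : masking exponent (assumed)

    Below its own threshold the mask has no effect; above it, the test
    threshold rises as a power of mask contrast.
    """
    return T0 * np.maximum(1.0, mask_contrast / T0) ** w
```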
A detection model is developed to predict visibility thresholds for discrete cosine transform coefficient quantization error, based on the luminance and chrominance of the error. The model is an extension of a previously proposed luminance-based model, and is based on new experimental data. In addition to the luminance-only predictions of the previous model, the new model predicts the detectability of quantization error in color space directions in which chrominance error plays a major role. This more complete model allows DCT coefficient quantization matrices to be designed for display conditions other than those of the experimental measurements: other display luminances, other veiling luminances, other spatial frequencies (different pixel sizes, viewing distances, and aspect ratios), and other color directions.
Several image compression standards (JPEG, MPEG, H.261) are based on the Discrete Cosine Transform (DCT). These standards do not specify the actual DCT quantization matrix. Ahumada & Peterson and Peterson, Ahumada & Watson provide mathematical formulae to compute a perceptually lossless quantization matrix. Here I show how to compute a matrix that is optimized for a particular image. The method treats each DCT coefficient as an approximation to the local response of a visual `channel.' For a given quantization matrix, the DCT quantization errors are adjusted by contrast sensitivity, light adaptation, and contrast masking, and are pooled non-linearly over the blocks of the image. This yields an 8 X 8 `perceptual error matrix.' A second non-linear pooling over the perceptual error matrix yields total perceptual error. With this model we may estimate the quantization matrix for a particular image that yields minimum bit rate for a given total perceptual error, or minimum perceptual error for a given bit rate. Custom matrices for a number of images show clear improvement over image-independent matrices. Custom matrices are compatible with the JPEG standard, which requires transmission of the quantization matrix.
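The optimization can be sketched as follows, assuming per-coefficient thresholds T that already fold in contrast sensitivity, light adaptation, and masking; pooling exponents and names are illustrative.

```python
import numpy as np

def perceptual_error(dct_blocks, Q, T, beta_space=4.0, beta_freq=4.0):
    """Pooled perceptual error of quantizing an image with matrix Q.

    dct_blocks : (N, 8, 8) DCT coefficients of the image's blocks
    Q          : (8, 8) candidate quantization matrix (positive entries)
    T          : (8, 8) per-coefficient visibility thresholds
    beta_*     : Minkowski pooling exponents (values assumed)

    Skeleton of the abstract's recipe: quantization error per coefficient,
    divided by its threshold to give JND units, pooled over blocks into an
    8x8 perceptual error matrix, then pooled over frequencies to a scalar.
    """
    quantized = np.round(dct_blocks / Q) * Q
    jnd_err = np.abs(dct_blocks - quantized) / T                      # (N, 8, 8)
    pem = np.sum(jnd_err ** beta_space, axis=0) ** (1 / beta_space)   # 8x8 matrix
    return np.sum(pem ** beta_freq) ** (1 / beta_freq)                # scalar
```

Searching over Q (for example, coordinate descent on the 64 entries) to minimize bit rate at fixed pooled error, or vice versa, yields the custom matrix; since JPEG transmits the quantization matrix with the image, the result remains standard-compliant.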