Estimating building height from satellite imagery is important for digital surface modeling and also provides rich information for change detection and building footprint detection. Acquiring building height usually requires a LiDAR system, which many satellite systems do not have. In this paper, we describe a building height estimation method that does not require building height annotations. Given a single RGB satellite image, our method estimates building height from building shadows and the satellite image metadata. To reduce the amount of annotation needed, we design a multi-stage instance detection method for building and shadow detection that uses both supervised and semi-supervised training. Given the detected building and shadow instances, we then estimate building height using the satellite image metadata. Height is estimated by maximizing the overlap between the shadow region projected for a query height and the detected shadow region. We evaluate our method on the xView2 and Urban Semantic 3D datasets and show that the proposed method achieves accurate building detection, shadow detection, and height estimation.
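The height search can be sketched as a one-dimensional grid search over query heights. This minimal version assumes a nadir view over flat terrain, a known ground sample distance `gsd_m`, and a flat-roofed building whose shadow on the ground has length h / tan(sun elevation). The paper's full use of image metadata (for example, off-nadir viewing geometry) is not modelled, and `project_shadow` and `estimate_height` are hypothetical names.

```python
import numpy as np


def project_shadow(footprint, height_m, sun_elev_deg, sun_az_deg, gsd_m=0.5):
    """Rasterize the shadow a flat-roofed building of height_m casts on flat ground.

    Assumes a nadir view, flat terrain, rows pointing south, columns pointing
    east, and a sun azimuth measured clockwise from north.
    """
    fp = np.asarray(footprint, dtype=bool)
    length_px = height_m / np.tan(np.radians(sun_elev_deg)) / gsd_m
    # The shadow points away from the sun, i.e. along azimuth + 180 degrees.
    d_row = np.cos(np.radians(sun_az_deg))
    d_col = -np.sin(np.radians(sun_az_deg))
    rows, cols = np.nonzero(fp)
    shadow = np.zeros_like(fp)
    # Sweep the footprint along the shadow direction in steps of at most 0.5 px.
    for t in np.linspace(0.0, length_px, int(np.ceil(2 * length_px)) + 1):
        r = np.floor(rows + t * d_row + 0.5).astype(int)
        c = np.floor(cols + t * d_col + 0.5).astype(int)
        ok = (r >= 0) & (r < fp.shape[0]) & (c >= 0) & (c < fp.shape[1])
        shadow[r[ok], c[ok]] = True
    # Only ground outside the footprint can show the cast shadow.
    return shadow & ~fp


def iou(a, b):
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0


def estimate_height(footprint, detected_shadow, sun_elev_deg, sun_az_deg,
                    gsd_m=0.5, heights=np.arange(1.0, 60.0, 0.5)):
    """Return the query height whose projected shadow best overlaps the detection."""
    scores = [iou(project_shadow(footprint, h, sun_elev_deg, sun_az_deg, gsd_m),
                  detected_shadow) for h in heights]
    return float(heights[int(np.argmax(scores))])


# Synthetic check: a 10 x 10 px building lit from the south casts its shadow north.
fp = np.zeros((100, 100), dtype=bool)
fp[40:50, 40:50] = True
detected = project_shadow(fp, 10.0, sun_elev_deg=45.0, sun_az_deg=180.0)
print(estimate_height(fp, detected, 45.0, 180.0))
```

The resolution of the estimate is set by the `heights` grid; a finer grid trades runtime for precision.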
Video sharing platforms and social networks have grown very rapidly over the past few years. The rapid increase in the amount of video content introduces many challenges for copyright violation detection and for video search and retrieval. Generating and matching content-based video signatures, or fingerprints, is an effective method to detect copies or “near-duplicate” videos. Video signatures should be robust to the changes that common signal processing operations cause in the features used to characterize them. Recent work has focused on generating video signatures in the uncompressed domain. However, decompression is a computationally intensive operation, so in large video databases it is advantageous to create robust signatures directly in the compressed domain. The High Efficiency Video Coding (HEVC) standard has recently been ratified as the latest video coding standard, and widespread adoption is anticipated. We propose a method in which a content-based video signature is generated directly from the HEVC-coded bitstream. Motion vectors from the HEVC-coded bitstream are used as the features. A robust hashing function based on projection onto random matrices is used to generate the hash bits, and a sequence of these bits serves as the signature for the video. Our experimental results show that the proposed method generates a signature that is robust to common signal processing operations such as resolution scaling, brightness scaling, and compression.
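The hashing step can be illustrated with random projections: a feature vector is projected onto the rows of a fixed random Gaussian matrix, and each projection is quantized to one bit by its sign. In this sketch the per-frame feature, a magnitude-weighted histogram of motion-vector directions, is an assumption for illustration (the abstract does not say how motion vectors are pooled), parsing motion vectors out of an HEVC bitstream is not shown, and all function names are hypothetical.

```python
import numpy as np


def mv_feature(mvs, n_bins=8):
    """Hypothetical per-frame feature: magnitude-weighted histogram of MV directions."""
    mvs = np.asarray(mvs, dtype=float)
    angles = np.arctan2(mvs[:, 1], mvs[:, 0])
    mags = np.hypot(mvs[:, 0], mvs[:, 1])
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi), weights=mags)
    total = hist.sum()
    # Normalizing makes the feature invariant to a uniform scaling of all vectors.
    return hist / total if total > 0 else hist


def hash_bits(feature, n_bits=16, seed=0):
    """Signs of projections onto a fixed random Gaussian matrix (shared via `seed`)."""
    proj = np.random.default_rng(seed).standard_normal((n_bits, feature.size))
    return (proj @ (feature - feature.mean()) > 0).astype(np.uint8)


def video_signature(frames_mvs, n_bits=16, seed=0):
    """One row of hash bits per frame; the rows in order form the signature."""
    return np.stack([hash_bits(mv_feature(m), n_bits, seed) for m in frames_mvs])


def hamming(sig_a, sig_b):
    return int(np.count_nonzero(sig_a != sig_b))


rng = np.random.default_rng(1)
video = [rng.normal(size=(50, 2)) for _ in range(10)]  # 10 frames x 50 motion vectors
sig = video_signature(video)
scaled = video_signature([2.0 * mv for mv in video])   # every vector doubled
print(sig.shape, hamming(sig, scaled))
```

Because the histogram is normalized, doubling every motion vector (a crude stand-in for resolution scaling) leaves the signature unchanged; signatures are compared by Hamming distance.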
KEYWORDS: Image segmentation, 3D modeling, Scanning electron microscopy, Image processing algorithms and systems, Expectation maximization algorithms, 3D image processing, Deconvolution, Image processing, Microscopes, 3D acquisition
In this paper, we propose a scanning electron microscope (SEM) image blurring model and apply it
to a joint deconvolution and segmentation method that performs deconvolution and segmentation simultaneously.
In materials science and engineering, automated image segmentation techniques are critical, and
obtaining exact boundary shapes is especially important. However, it remains difficult to obtain good segmentation
results when the images suffer from blurring degradation. SEM images are blurred due in part to complex
electron interactions during acquisition. To improve segmentation results at object boundaries, we incorporate
prior knowledge of this blurring degradation into the existing EM/MPM segmentation algorithm. Experimental
results demonstrate that the proposed method can improve the segmentation of
microscope images of materials.
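The abstract does not give the blurring model itself. As a hedged illustration of the kind of forward model such a method builds on, this sketch assumes the observed image is a piecewise-constant class image convolved with a point spread function (PSF) plus white Gaussian noise. The Gaussian PSF, its width, the circular boundary handling, and the helper names are all assumptions, not the paper's SEM model.

```python
import numpy as np


def gaussian_psf(size=9, sigma=1.5):
    """Normalized isotropic Gaussian PSF (an assumed stand-in for the SEM blur)."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()


def blur(image, psf):
    """Circular 2-D convolution via the FFT, with the PSF centred on the origin."""
    kernel = np.zeros(image.shape)
    k = psf.shape[0]
    kernel[:k, :k] = psf
    kernel = np.roll(kernel, (-(k // 2), -(k // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kernel)))


def observe(labels, class_means, psf, noise_std=0.05, seed=0):
    """y = h * x + n, where x maps each class label to its mean intensity."""
    x = np.asarray(class_means, dtype=float)[labels]
    noise = np.random.default_rng(seed).normal(0.0, noise_std, size=x.shape)
    return blur(x, psf) + noise


# Two phases meeting at a vertical boundary, observed through the assumed blur.
labels = np.zeros((64, 64), dtype=int)
labels[:, 32:] = 1
y = observe(labels, class_means=[0.2, 0.8], psf=gaussian_psf())
```

A joint method would estimate `labels` from `y` while accounting for `psf`, instead of segmenting `y` as if it were sharp.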
This paper presents the application of the expectation-maximization/maximization of the posterior marginals
(EM/MPM) algorithm to signal detection for functional MRI (fMRI). Based on assumptions about fMRI 3-D image
data, a novel analysis method is proposed and applied to synthetic data and human brain data. Synthetic data analysis is
conducted using two statistical noise models (white and autoregressive of order 1) and, for low contrast-to-noise ratio
(CNR) data, reveals better sensitivity and specificity for the new method than for the traditional General Linear Model
(GLM) approach. When applied to human brain data, functional activation regions are found to be consistent with those
obtained using the GLM approach.
This paper describes the application of the expectation-maximization/maximization of the posterior marginals
(EM/MPM) algorithm to serial section images, which inherently represent three-dimensional (3D) data. The images of
interest are electron micrographs of cross sections of a titanium alloy. To improve the accuracy of the resulting
segmentation images, the images are pre-filtered before being used as input to the EM/MPM algorithm. The output of
the pre-filter at a particular pixel represents an estimate of the entropy at that pixel, based on the grayscale values of
neighboring pixels. This filter tends to be biased towards higher entropy values if an edge is present within the window
being used. This causes edges in the final segmentation to move out from higher entropy regions and into lower entropy
regions. In order to preserve the locations of these edges, a multiscale technique involving the use of an adaptive filter
window has been developed. We present experimental results demonstrating the application of this technique.
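The pre-filter can be sketched as a sliding-window entropy of the quantized gray levels. This minimal version uses a fixed square window (the paper's multiscale adaptive window is not reproduced), assumes the input is already quantized to `n_levels` integer levels, and uses the hypothetical name `local_entropy`. The step-edge example shows the bias the abstract describes: pixels within `radius` of an edge receive elevated entropy even though both regions are flat.

```python
import numpy as np


def local_entropy(levels, radius=2, n_levels=8):
    """Shannon entropy (bits) of the gray-level histogram in a (2r+1) x (2r+1) window.

    `levels` must hold integers in [0, n_levels); borders use edge padding.
    """
    h, w = levels.shape
    size = 2 * radius + 1
    padded = np.pad(levels, radius, mode="edge")
    counts = np.zeros((h, w, n_levels))
    ii, jj = np.indices((h, w))
    # Accumulate a per-pixel histogram from every offset inside the window.
    for dr in range(size):
        for dc in range(size):
            counts[ii, jj, padded[dr:dr + h, dc:dc + w]] += 1
    p = counts / size ** 2
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, -p * np.log2(p), 0.0)
    return terms.sum(axis=-1)


step = np.zeros((10, 10), dtype=int)
step[:, 5:] = 7  # two flat regions meeting at a vertical edge
ent = local_entropy(step)
print(ent[5, 0], ent[5, 4])
```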
KEYWORDS: Scalable video coding, Computer programming, Video, Video coding, Quantization, Motion analysis, Video processing, Motion models, Spatial resolution, Data modeling
In this paper, a new enhancement layer motion compensation technique referred to as subband motion compensation
is proposed for spatially scalable video coding. This approach is proposed as an alternative to a technique
which we call pyramid motion compensation and which is referred to as inter-layer residual prediction
in the Scalable Video Coding (SVC) extension of H.264/MPEG-4 AVC. The main difference
between these two techniques lies in the way they use the base layer information to encode the enhancement
layer. Experimental results comparing the two approaches show that, for enhancement layer encoding, the pyramid
method is better when the corresponding base layer is encoded at a lower bitrate, while the subband method
outperforms the pyramid method when the base layer is encoded at a higher bitrate. This motivates future work to
adaptively choose between these two methods at the macroblock level or even at the transform coefficient level
for spatially scalable video coding.
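The abstract leaves adaptive selection as future work. Purely as a hypothetical sketch, a per-macroblock choice could compare the two candidate predictions by sum of absolute differences (SAD) in each 16 x 16 macroblock and keep the better one. The function name, the SAD criterion, and the assumption that both predictions are already formed are illustrative only.

```python
import numpy as np


def choose_per_macroblock(original, pred_pyramid, pred_subband, mb=16):
    """Pick, per macroblock, whichever candidate prediction has the lower SAD.

    Returns a mode map (0 = pyramid, 1 = subband) and the combined prediction.
    Frame dimensions are assumed to be multiples of `mb`.
    """
    original = np.asarray(original, dtype=float)
    h, w = original.shape
    modes = np.zeros((h // mb, w // mb), dtype=np.uint8)
    pred = np.empty_like(original)
    for i in range(h // mb):
        for j in range(w // mb):
            blk = (slice(i * mb, (i + 1) * mb), slice(j * mb, (j + 1) * mb))
            sad_pyr = np.abs(original[blk] - pred_pyramid[blk]).sum()
            sad_sub = np.abs(original[blk] - pred_subband[blk]).sum()
            use_subband = sad_sub < sad_pyr
            modes[i, j] = use_subband
            pred[blk] = pred_subband[blk] if use_subband else pred_pyramid[blk]
    return modes, pred


orig = np.zeros((32, 32))
pyr = np.zeros((32, 32))
pyr[16:, :] = 5.0  # pyramid prediction is poor in the bottom row of macroblocks
sub = np.zeros((32, 32))
sub[:16, :] = 5.0  # subband prediction is poor in the top row of macroblocks
modes, pred = choose_per_macroblock(orig, pyr, sub)
print(modes)
```

A real encoder would use a rate-distortion cost that also counts the bits needed to signal the chosen mode, rather than SAD alone.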
This paper describes a new approach for improving the coding efficiency of spatially scalable video coding. For spatial scalability implemented within the traditional hybrid block-transform motion-compensated video coding framework, it has been surprisingly difficult to achieve coding efficiency significantly better than simulcast. One of the most difficult tasks to perform efficiently is motion compensation in the enhancement layer. The new approach to motion compensation presented here uses techniques borrowed from frequency scalability. Three frequency scalability approaches to motion compensation can be adapted to spatial scalability: the pyramid scheme, the subband scheme, and the conditional replacement scheme. This paper describes the pyramid scheme and presents experimental results comparing the pyramid approach to non-scalable H.264 and to simulcast.
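In a pyramid scheme, the enhancement layer carries the difference between the full-resolution frame and an upsampled version of the base layer. This sketch shows that two-layer structure on a single frame, assuming 2 x 2 averaging for decimation, pixel replication for interpolation (real codecs use longer filters), and even frame dimensions. Motion compensation is omitted, and the function names are hypothetical.

```python
import numpy as np


def downsample2(x):
    """2:1 decimation in each dimension by averaging 2 x 2 blocks."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample2(x):
    """1:2 interpolation by pixel replication (a deliberately crude filter)."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)


def pyramid_layers(frame):
    base = downsample2(frame)           # coded as the base layer
    residual = frame - upsample2(base)  # what the enhancement layer must carry
    return base, residual


def reconstruct(base, residual):
    return upsample2(base) + residual


frame = np.random.default_rng(0).random((8, 8))
base, residual = pyramid_layers(frame)
```

In a codec, `base` would be the decoded (quantized) base layer rather than the exact decimated frame, so that the encoder and decoder form the same prediction.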
This paper addresses the problem of efficiently decoding high-definition (HD) video for display at a reduced resolution. The decoder presented in this paper is intended for applications that are constrained not only in memory size but also in peak memory bandwidth. This is the case, for example, when a high-definition television (HDTV) channel is decoded for picture-in-picture (PIP) display and the reduced-resolution PIP-channel decoder shares memory with the full-resolution main-channel decoder. The most significant source of video quality degradation in a reduced-resolution decoder is prediction drift, which is caused by the mismatch between the full-resolution reference frames used by the encoder and the subsampled reference frames used by the decoder. To mitigate the visually annoying effects of prediction drift, the decoder described in this paper operates at two different resolutions: a lower resolution for B pictures, which are not used as references and therefore do not contribute to prediction drift, and a higher resolution for I and P pictures. This means that the motion-compensation unit (MCU) essentially operates at the higher resolution, while the peak memory bandwidth remains the same as that required to decode at the lower resolution. Storing the additional data that represents the higher resolution for I and P pictures requires relatively little memory beyond what decoding at the lower resolution already needs. Experimental results demonstrate the improvement in video quality achieved by using the higher-resolution data to form predictions for P pictures.
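To give a sense of the memory at stake, the frame-store arithmetic can be worked out directly. This sketch assumes 8-bit 4:2:0 video, a 1920 x 1080 HD frame (ignoring padding to whole macroblocks), and a 2:1 reduction in each dimension for the PIP decoder; the paper's actual resolutions are not stated in the abstract, and `frame_bytes` is a hypothetical helper.

```python
def frame_bytes(width, height, bits_per_sample=8, chroma="4:2:0"):
    """Bytes needed to store one frame; chroma subsampling sets samples per pixel."""
    samples_per_pixel = {"4:2:0": 1.5, "4:2:2": 2.0, "4:4:4": 3.0}[chroma]
    return int(width * height * samples_per_pixel * bits_per_sample) // 8


full = frame_bytes(1920, 1080)    # full-resolution HD frame
reduced = frame_bytes(960, 540)   # 2:1 reduction in each dimension (an assumption)
print(full, reduced, reduced / full)
```

Under these assumptions a full-resolution frame needs 3,110,400 bytes and a reduced-resolution frame 777,600 bytes, so a reduced-resolution frame store is a quarter of the size of a full-resolution one.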
The use of mathematical morphology in low- and mid-level image processing and computer vision applications has allowed the development of a class of techniques for analyzing shape information in color images. These techniques have been shown to be useful in image enhancement, segmentation, and analysis. In this paper, we develop and test scalable parallel algorithms needed to implement a class of morphological filters on a parallel computer, specifically the MasPar MP-1. We examine the issues related to the parallel implementation of the algorithms and show that real-time enhancement of high-resolution color images is possible.
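A SIMD machine such as the MP-1 applies the same operation to many pixels at once, so morphological filters map naturally onto whole-image shift-and-compare steps. This NumPy sketch mimics that data-parallel style for a flat square structuring element. Taking the max or min independently in each colour channel (marginal ordering) is only one of several ways to define colour morphology and is an assumption here, as are the helper names; it is not the MasPar implementation.

```python
import numpy as np


def shift(img, dr, dc, fill):
    """Translate an H x W (x C) array by (dr, dc); uncovered pixels get `fill`."""
    h, w = img.shape[:2]
    out = np.full_like(img, fill)
    out[max(dr, 0):h + min(dr, 0), max(dc, 0):w + min(dc, 0)] = \
        img[max(-dr, 0):h + min(-dr, 0), max(-dc, 0):w + min(-dc, 0)]
    return out


def dilate(img, radius=1):
    """Flat square dilation of every pixel at once, max taken per colour channel."""
    out = img.copy()
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            out = np.maximum(out, shift(img, dr, dc, img.min()))
    return out


def erode(img, radius=1):
    """Flat square erosion of every pixel at once, min taken per colour channel."""
    out = img.copy()
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            out = np.minimum(out, shift(img, dr, dc, img.max()))
    return out


def open_close(img, radius=1):
    """Opening then closing: removes small bright, then small dark, features."""
    opened = dilate(erode(img, radius), radius)
    return erode(dilate(opened, radius), radius)


img = np.zeros((9, 9, 3), dtype=np.uint8)
img[4, 4] = (255, 0, 0)  # an isolated red impulse
print(dilate(img)[3:6, 3:6, 0].min(), open_close(img).max())
```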
We present a new algorithm that uses mathematical morphology for the pyramidal decomposition of color images. Several previous approaches have used linear or morphological smoothing to obtain pyramidal representations of monochrome images. In this paper, we extend several of these previously developed monochrome pyramid algorithms to color images. Our decomposition algorithm allows lossy color image compression by applying Block Truncation Coding at the pyramid levels to attain reduced bit rates.
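Block Truncation Coding represents each small block by its mean, its standard deviation, and a one-bit map marking the pixels at or above the mean; the decoder then picks two levels that preserve the block's first two moments. This sketch is the classic moment-preserving form applied to a single gray-level block. How the paper applies it to the colour channels at each pyramid level is not specified in the abstract, and the function names are hypothetical.

```python
import numpy as np


def btc_encode_block(block):
    """Keep the block mean, standard deviation, and a 1-bit-per-pixel map."""
    x = np.asarray(block, dtype=float)
    mean, std = x.mean(), x.std()
    return mean, std, x >= mean


def btc_decode_block(mean, std, bitmap):
    """Two output levels chosen so the block mean and variance are preserved."""
    n = bitmap.size
    q = int(bitmap.sum())  # number of pixels at or above the mean
    if q in (0, n):
        return np.full(bitmap.shape, mean)
    low = mean - std * np.sqrt(q / (n - q))
    high = mean + std * np.sqrt((n - q) / q)
    return np.where(bitmap, high, low)


block = np.random.default_rng(0).integers(0, 256, size=(4, 4))
rec = btc_decode_block(*btc_encode_block(block))
print(block.mean(), rec.mean(), block.std(), rec.std())
```

With 4 x 4 blocks and 8 bits each for the mean and standard deviation, this costs (16 + 8 + 8) / 16 = 2 bits per pixel before any further coding.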
The use of mathematical morphology in low- and mid-level image processing and computer vision applications has allowed the development of a class of techniques for analyzing shape information in monochrome images. In this paper, we extend some of these techniques to color images. We have investigated the application of various methods for 'color morphology'. We present results of our empirical study for three applications: noise suppression, multiscale smoothing, and edge detection.
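For edge detection, a common morphological operator is the gradient, dilation minus erosion. This sketch applies it to each colour channel with a flat square window and combines the channels by taking the maximum; both the per-channel treatment and the max combination are assumptions chosen for illustration, not necessarily the methods studied in the paper. It relies on `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or later).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def color_gradient(img, radius=1):
    """Per-channel morphological gradient (dilation - erosion), max over channels."""
    size = 2 * radius + 1
    padded = np.pad(img.astype(float), ((radius, radius), (radius, radius), (0, 0)),
                    mode="edge")
    # Windows have shape H x W x C x size x size.
    win = sliding_window_view(padded, (size, size), axis=(0, 1))
    grad = win.max(axis=(-2, -1)) - win.min(axis=(-2, -1))
    return grad.max(axis=-1)


img = np.zeros((8, 8, 3))
img[:, 4:, 1] = 100.0  # a green region on the right half
edges = color_gradient(img)
print(edges[2])
```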