The shape and values of a typical static convolution kernel remain fixed once the network is trained. Recently, dynamic convolutions were proposed to change the kernel’s values depending on the input during the test phase. We aim to extend the concept of dynamic convolutions by introducing an element-wise dynamic convolution approach. This method enables adaptive changes in kernel values for each output data element. Furthermore, a deformable element-wise dynamic convolution is proposed to enable simultaneous changes in kernel shape and value. The proposed deformable dynamic convolution is compatible with the static convolution in terms of input–output relationships. The capability of existing network architectures can be enhanced by replacing the static convolution with the suggested deformable dynamic convolution. Extensive experiments demonstrate that the proposed deformable dynamic convolution can improve the network performance in various computer vision tasks, including image classification and semantic segmentation.
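To make the element-wise idea concrete, below is a minimal sketch, assuming a PyTorch-style layer in which a small weight-generating branch (here a 1x1 convolution, my own choice) predicts a separate k x k kernel for every output spatial position; the class and variable names are illustrative and not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ElementwiseDynamicConv(nn.Module):
    """Sketch: a k x k kernel is generated per output pixel from the input."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.k, self.out_ch = k, out_ch
        # Predicts out_ch * in_ch * k * k kernel weights for every pixel.
        self.weight_gen = nn.Conv2d(in_ch, out_ch * in_ch * k * k, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        patches = F.unfold(x, self.k, padding=self.k // 2)           # (b, c*k*k, h*w)
        weights = self.weight_gen(x).view(b, self.out_ch, c * self.k * self.k, h * w)
        out = torch.einsum('bokl,bkl->bol', weights, patches)        # per-pixel kernels
        return out.view(b, self.out_ch, h, w)

y = ElementwiseDynamicConv(16, 32)(torch.randn(2, 16, 64, 64))       # -> (2, 32, 64, 64)
```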
Multi-modal pedestrian detection, which integrates visible and thermal sensors, has been developed to overcome many limitations of visible-modal pedestrian detection, such as poor illumination, cluttered backgrounds, and occlusion. By combining multiple modalities, we can detect pedestrians efficiently even under poor visibility. Nevertheless, a critical assumption of multi-modal pedestrian detection is that the multi-modal images are perfectly aligned. In real-world situations, however, this assumption often becomes invalid: the viewpoints of the different modal sensors usually differ, so the positions of pedestrians in the two modal images exhibit disparities. We propose a multi-modal Faster R-CNN specifically designed to handle the misalignment between the two modalities. Faster R-CNN consists of a region proposal network (RPN) and a detector, and we introduce position regressors for both modalities in both the RPN and the detector. Intersection over union (IoU) is a useful metric for object detection but is defined only for a single-modal image; we extend it to a multi-modal IoU that evaluates the localization preciseness in both modalities. Experimental results with the proposed evaluation metric demonstrate that the proposed method performs comparably to state-of-the-art methods and outperforms them on data with significant misalignment.
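As a rough illustration of how the single-modal IoU could be extended, the sketch below assumes that each detection carries one box per modality (visible and thermal) and simply averages the two per-modality IoUs; the function names and the averaging rule are my own assumptions, not necessarily the paper's exact definition.

```python
def iou(box_a, box_b):
    """Standard single-modal IoU for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(box_a) + area(box_b) - inter + 1e-9)

def multimodal_iou(pred_vis, pred_thr, gt_vis, gt_thr):
    """Evaluate localization preciseness in both modalities at once."""
    return 0.5 * (iou(pred_vis, gt_vis) + iou(pred_thr, gt_thr))
```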
We present a gradient-domain image reconstruction framework with the chroma-preserving pixel-wise intensity-range constraint and the base-structure constraint. The existing methods for manipulating base structures and detailed textures are classifiable into two major streams: gradient-domain and layer-decomposition. To generate detail-preserving and artifact-free output images, we combine the two approaches’ benefits into the proposed framework by introducing the chroma-preserving intensity-range constraint and the base-structure constraint. To preserve details of the input image, the proposed method takes advantage of reconstructing the output image in the gradient domain, whereas the output intensity is guaranteed to lie within the specified intensity range by the intensity-range constraint. The reconstructed image lies close to the base structure by the base-structure constraint, which is effective for restraining artifacts. Using this chroma-preserving pixel-wise luminance constraint, the proposed algorithm does not require post-processing such as intensity clipping or rescaling. The proposed framework directly generates the output luminance, which guarantees that the output RGB intensities are within the target intensity range, preserving the chromatic component. Experiments demonstrated that (1) the proposed framework is effective for various applications such as tone mapping, seamless image cloning, detail enhancement, and image restoration, and (2) the proposed framework can preserve chroma components compared with the existing methods.
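The kind of optimization the framework describes can be sketched as follows, under assumed notation (the exact weights, norms, and constraint form are not taken from the paper): reconstruct the output luminance L from a target gradient field g, keep it close to the base structure B, and constrain it pixel-wise to the chroma-preserving intensity range [l(x), u(x)].

```latex
\min_{L}\ \|\nabla L - g\|_2^2 \;+\; \lambda\,\|L - B\|_2^2
\quad \text{s.t.} \quad l(x) \le L(x) \le u(x) \ \ \forall x .
```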
Color correction is one of the most essential camera imaging operations that transforms a camera-specific RGB color space to a standard color space, typically the XYZ or the sRGB color space. Linear color correction (LCC) and polynomial color correction (PCC) are two widely used methods; they perform the color space transformation using a color correction matrix. Owing to the use of high-order terms, PCC generally achieves lower colorimetric errors than LCC. However, PCC amplifies noise more severely than LCC. Consequently, for noisy images, there exists a trade-off between LCC and PCC regarding color fidelity and noise amplification. We propose a color correction framework called tunable color correction (TCC) that enables us to tune the color correction matrix between the LCC and the PCC models. We also derive a mean squared error calculation model of PCC that enables us to select the best trade-off balance in the TCC framework. We experimentally demonstrate that TCC effectively balances the trade-off for noisy images and outperforms LCC and PCC. We also generalize TCC to multispectral cases and demonstrate its effectiveness by taking the color correction for an RGB-near-infrared sensor as an example.
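One plausible (but assumed) way to realize such tuning is to scale the contribution of the higher-order polynomial terms with a parameter α in [0, 1], recovering LCC at α = 0 and PCC at α = 1; the parameterization below is purely illustrative and may differ from the paper's actual TCC model.

```latex
c \;=\; M\,\phi_\alpha(\rho), \qquad
\phi_\alpha(\rho) \;=\;
\bigl[\, R,\ G,\ B,\ \alpha R^2,\ \alpha G^2,\ \alpha B^2,\ \alpha RG,\ \alpha GB,\ \alpha RB \,\bigr]^{\top}.
```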
We present a method of constructing a full thermal panorama from a collection of images taken by a long-wavelength infrared and visible camera system. Owing to recent developments in joint stereo calibration, our method can project a thermal image onto the corresponding visible image. Once both images are in the same domain, the panoramic projection computed from the visible images can also be applied to the thermal images. Our method also extends the scope of the projection function by allowing a nonzero distance between the center of rotation and the optical center of the rotating camera, as well as arbitrary primary camera axes that need not be parallel to the world axes. We show that our method works on a dataset with various camera movements and compare our results with standard software made specifically for visible panoramas and thermal panoramas. Furthermore, we can combine thermal panoramas of the same scene with a simple calculation. With the addition of guided upsampling, we can simultaneously improve the thermal image resolution and field of view in a single run of the method.
Infrared (IR) thermography cameras have become an essential tool for monitoring applications such as pedestrian detection and equipment monitoring. The most commonly used IR cameras are long-wavelength infrared (LWIR) cameras because their wavelength range suits environmental temperatures. Even though the cost of LWIR cameras has been declining, affordable models provide only low-resolution images, and enhancement techniques that work on visible images often fail on low-resolution LWIR images. Most previous attempts at thermal image enhancement have targeted high-resolution images. Stereo calibration between visible and LWIR cameras has recently been improved in terms of accuracy and ease of use, and recent visible and LWIR cameras are bundled into one device, which makes it possible to capture visible and LWIR images simultaneously. However, few works take advantage of such camera systems. In this work, an image enhancement framework for visible and LWIR camera systems is proposed. The proposed framework consists of two inter-connected modules: a visible image enhancement module and an LWIR image enhancement module. The enhancement technique examined in our experiments is image stitching, which serves two purposes: view expansion and super-resolution. The visible image enhancement module follows a regular workflow for image stitching, and its intermediate results, such as the homography and seam-carving labels, are passed to the LWIR image enhancement module. The LWIR image enhancement module aligns LWIR images to visible images using the stereo calibration results and reuses the homography already computed from the visible images, avoiding feature extraction and matching on the LWIR images. The framework handles the difference in image resolution between visible and LWIR images by performing sparse pixel-to-pixel versions of image alignment and image projection. Experiments show that the proposed framework leads to richer image stitching results compared with those of existing commercial software.
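A minimal sketch of how the LWIR branch can reuse the visible-branch result is given below, assuming OpenCV and two precomputed 3x3 homographies: H_align maps the LWIR frame into the visible frame (obtained from the stereo calibration), and H_stitch is the stitching homography estimated on the visible pair. Both names and the simple composition are illustrative assumptions, not the paper's exact pipeline.

```python
import cv2
import numpy as np

def warp_lwir_with_visible_homography(lwir, H_align, H_stitch, out_size):
    """Warp an LWIR image into the visible panorama without extracting
    features from the LWIR image itself. out_size is (width, height)."""
    H_total = H_stitch @ H_align           # compose: LWIR -> visible -> panorama
    return cv2.warpPerspective(lwir, H_total.astype(np.float64), out_size)
```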
A far-infrared (FIR) image contains important invisible information for various applications such as night vision and fire detection, while a visible image includes the colors and textures of a scene. We present a coaxial visible and FIR camera system that obtains the complementary information of both images simultaneously. The proposed camera system is composed of three parts: a visible camera, a FIR camera, and a beam-splitter made from silicon. The FIR radiation from the scene is reflected at the beam-splitter, while the visible radiation is transmitted through it. Even with this coaxial visible and FIR camera system, the alignment between the visible and FIR images is not perfect. Therefore, we also present a joint calibration method that simultaneously estimates accurate geometric parameters of both cameras, i.e., the intrinsic parameters of each camera and the extrinsic parameters between them. In the proposed calibration method, we use a novel calibration target that has a two-layer structure in which the thermal emission properties of the layers differ. With this calibration target, we can stably and precisely obtain the corresponding points of the checker pattern from both the visible and the FIR images, so that widely used calibration tools can accurately estimate the parameters of both cameras. Using the coaxial camera system with the precise calibration obtained from the two-layer target, we can obtain aligned visible and FIR images. Experimental results demonstrate that the proposed camera system is useful for various applications such as image fusion, image denoising, and image up-sampling.
Recent developments of long-wavelength infrared (LWIR) devices and LWIR sensor technologies enable us to obtain LWIR images with high bit depth and low noise. To exploit these developments, we propose a novel temperature visualization method that simultaneously represents the global distribution and the local details of the input temperature. The global temperature distribution is represented by pseudo color, while the output luminance that conveys the local temperature details is generated by gradient-domain image reconstruction. Experimental results on real LWIR images show the effectiveness of the proposed method.
In this paper, we propose a unified optimization framework for L2, L1, and/or L0 constrained image reconstruction. First, we generalize cost functions for image reconstruction, which consist of a fidelity term with L2 norm and constraint terms with L2, L1, and/or L0 norms. This generalized cost function covers many types of existing cost functions for image reconstruction. Then, we show that this generalized cost function can be optimized by the alternating direction method of multipliers (ADMM). The ADMM is a well-known iterative optimization approach for convex problems. Experimental results demonstrate that the proposed unified optimization framework is applicable to a wide range of applications.
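Under assumed notation, the generalized cost has the following form: an L2 data-fidelity term plus weighted constraint terms whose norms are chosen from {0, 1, 2}; ADMM handles it by introducing auxiliary variables z_i = D_i x and alternating between the resulting subproblems.

```latex
\min_{\mathbf{x}}\ \tfrac{1}{2}\,\|\mathbf{A}\mathbf{x}-\mathbf{b}\|_2^2
\;+\;\sum_{i}\lambda_i\,\|\mathbf{D}_i\mathbf{x}\|_{p_i},
\qquad p_i\in\{0,1,2\}.
```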
We present an image fusion algorithm for a visible image and a near-infrared image. The proposed algorithm synthesizes a fused image that includes high-visibility information of both images while reducing artifacts caused by geometric and illumination inconsistencies. In the proposed fusion, the high-visibility area is labeled at each pixel by global optimization based on the local visibility and inconsistency. The local visibility is evaluated using a local contrast. The inconsistency is also locally estimated based on a learning-based approach. The fused luminance is constructed using Poisson image reconstruction that preserves the gradient of the selected high-visibility areas. The proposed fusion framework has various applications, which include denoising, haze removal, and image enhancement. Experimental results show that the proposed method has comparable or even superior performance to existing methods designed for specific applications.
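A sketch of the two steps in assumed notation (not the paper's exact energies): a per-pixel label α selects the higher-visibility source using a data term E_vis built from the local visibility and inconsistency and a smoothness term over neighboring pixel pairs N, and the fused luminance F is then reconstructed in the gradient domain from the gradients g_α of the selected sources.

```latex
\alpha^{*} = \arg\min_{\alpha}\ \sum_{x} E_{\mathrm{vis}}\bigl(\alpha(x)\bigr)
           + \mu \sum_{(x,y)\in\mathcal{N}} \bigl[\alpha(x)\neq\alpha(y)\bigr],
\qquad
F^{*} = \arg\min_{F}\ \bigl\|\nabla F - g_{\alpha^{*}}\bigr\|_2^2 .
```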
This paper presents a novel image fusion algorithm for a visible image and a near-infrared (NIR) image. In the proposed fusion, the image is selected pixel-by-pixel based on local saliency, which is measured by a local contrast. Then, the gradient information is fused, and the output image is constructed by Poisson image editing, which preserves the gradient information of both images. The effectiveness of the proposed fusion algorithm is demonstrated in various applications including denoising, dehazing, and image enhancement.
KEYWORDS: Color difference, Image processing, Algorithm development, Image acquisition, Visualization, Image interpolation, Color imaging, Digital cameras, Cameras, RGB color model
A color difference interpolation technique is widely used for color image demosaicking. In this paper, we propose minimized-Laplacian residual interpolation (MLRI) as an alternative to color difference interpolation, where the residuals are the differences between observed and tentatively estimated pixel values. In MLRI, we estimate the tentative pixel values by minimizing the Laplacian energies of the residuals. This residual image transformation allows us to interpolate more easily than the standard color difference transformation. We incorporate the proposed MLRI into the gradient-based threshold-free (GBTF) algorithm, which is one of the current state-of-the-art demosaicking algorithms. Experimental results demonstrate that our proposed demosaicking algorithm outperforms the state-of-the-art algorithms on the 30 images of the IMAX and Kodak datasets.
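The residual interpolation idea can be summarized as follows, in assumed notation: at the sampled positions Ω, residuals between the observed values and a tentative estimate are formed, interpolated over the full grid by an interpolator U, and added back; in MLRI the tentative estimate is generated so that the Laplacian energy of the residuals is minimized, which makes the residuals easier to interpolate.

```latex
r(x) = R_{\mathrm{obs}}(x) - \tilde{R}(x),\ x\in\Omega;
\qquad
\hat{R} = \tilde{R} + \mathcal{U}(r),
\qquad
\tilde{R} = \arg\min \sum_{x\in\Omega}\bigl|\,\Delta\bigl(R_{\mathrm{obs}}-\tilde{R}\bigr)(x)\bigr|^{2}.
```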
KEYWORDS: RGB color model, Near infrared, Optical filters, Color difference, Image filtering, Lutetium, Image quality, Cameras, Algorithm development, Linear filtering
Extra band information in addition to the RGB, such as the near-infrared (NIR) and the ultra-violet, is valuable for many applications. In this paper, we propose a novel color filter array (CFA), which we call a “hybrid CFA,” and a demosaicking algorithm for the simultaneous capturing of the RGB and the additional band images. Our proposed hybrid CFA and demosaicking algorithm do not rely on any specific correlation between the RGB and the additional band; therefore, the additional band can be arbitrarily decided by users. Experimental results demonstrate that our proposed demosaicking algorithm with the proposed hybrid CFA can provide the additional band image while keeping the RGB image quality almost the same as that of an image acquired using the standard Bayer CFA.
Sparse representation is known as a very powerful tool for solving image reconstruction problems such as denoising and single-image super-resolution. In sparse representation, it is assumed that an image patch or data vector can be approximated by a linear combination of a few bases selected from a given dictionary. A single over-complete dictionary is usually learned from training patches: most dictionary learning methods are concerned with building a general over-complete dictionary on the assumption that its bases can represent any patch. However, a more appropriate dictionary yields a better sparse representation of each patch. In this paper, we propose a classification-and-reconstruction approach with multiple dictionaries. Before learning the dictionaries for reconstruction, representative bases are used to classify all training patches in the database, and a reconstruction dictionary is then learned from each class of patches. In the reconstruction phase, each patch of the input image is classified, and the corresponding dictionary is selected for its reconstruction. We demonstrate that the proposed classification-and-reconstruction approach outperforms the existing sparse representation with a single dictionary.
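A minimal sketch of the classification-and-reconstruction idea under assumed details: each patch is assigned to the class whose representative basis correlates best with it, and the class-specific dictionary is then used for sparse coding (here with orthogonal matching pursuit). Dictionary learning itself (e.g., K-SVD) is omitted, and all names are illustrative rather than the paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def classify_patch(patch, representatives):
    """representatives: (n_classes, patch_dim) array of unit-norm rows."""
    scores = np.abs(representatives @ patch)
    return int(np.argmax(scores))

def reconstruct_patch(patch, dictionaries, representatives, n_nonzero=5):
    """dictionaries: list of (patch_dim, n_atoms) arrays, one per class."""
    k = classify_patch(patch, representatives)
    D = dictionaries[k]
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero, fit_intercept=False)
    omp.fit(D, patch)                    # solve patch ~= D @ coef with sparse coef
    return D @ omp.coef_
```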
This paper proposes a novel adaptive dictionary learning approach for single-image super-resolution based on sparse representation. Adaptive dictionary learning for sparse representation is very powerful for image restoration tasks such as image denoising. However, existing adaptive dictionary learning requires training image patches that have the same resolution as the output image. Because of this requirement, adaptive dictionary learning for single-image super-resolution is not trivial, since the resolution of the input low-resolution image, which is all that is available for the adaptive dictionary learning, is essentially different from that of the output high-resolution image. It is known that natural images have high across-resolution patch redundancy, which means that similar patches can be found across images of different resolutions; the proposed approach exploits this redundancy to learn an adaptive dictionary for the high-resolution output. Our experimental comparisons demonstrate that the proposed across-resolution adaptive dictionary learning approach outperforms state-of-the-art single-image super-resolution methods.
Spectral reflectance is an inherent property of objects that is useful for many computer vision tasks. The spectral reflectance of a scene can be described as a spatio-spectral (SS) datacube, in which each value represents the reflectance at a spatial location and a wavelength. In this paper, we propose a novel method that reconstructs the SS datacube from raw data obtained by an image sensor equipped with a multispectral filter array. In our proposed method, we describe the SS datacube as a linear combination of spatially adaptive SS basis vectors. In a previous method, spatially invariant SS basis vectors are used for describing the SS datacube. In contrast, we adaptively generate the SS basis vectors for each spatial location. Then, we reconstruct the SS datacube by estimating the linear coefficients of the spatially adaptive SS basis vectors from the raw data. Experimental results demonstrate that our proposed method can accurately reconstruct the SS datacube compared with the method using spatially invariant SS basis vectors.
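In assumed notation, the reconstruction model reads as follows: at each pixel x the spectrum s(x) is a linear combination of the spatially adaptive basis B(x), and the coefficients c(x) are estimated from the raw multispectral-filter-array observation y(x) through the per-pixel sampling operator Φ(x). The least-squares form of the coefficient estimation is my own simplification.

```latex
s(x) = B(x)\,c(x),
\qquad
\hat{c}(x) = \arg\min_{c}\ \bigl\|\,y(x) - \Phi(x)\,B(x)\,c\,\bigr\|_2^{2},
\qquad
\hat{s}(x) = B(x)\,\hat{c}(x).
```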
Multispectral imaging is in high demand for precise color reproduction and for various computer vision applications. Multispectral imaging with a multispectral color filter array (MCFA), which can be considered a multispectral extension of commonly used consumer RGB cameras, could be a simple, low-cost, and practical system. A challenge of multispectral imaging with the MCFA is multispectral demosaicking, because each spectral component of the MCFA is severely undersampled. In this paper, we propose a novel multispectral demosaicking algorithm using a guided filter. The guided filter was recently proposed as an excellent structure-preserving filter and requires a so-called guide image; a main issue is how to obtain an effective guide image. In our proposed algorithm, we generate the guide image from the most densely sampled spectral component in the MCFA, and the other spectral components are then interpolated by the guided filter. Experimental results demonstrate that our proposed algorithm outperforms other existing demosaicking algorithms both visually and quantitatively.
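A minimal sketch of the guided-filter step under assumed details: the guide is an image interpolated from the most densely sampled spectral band of the MCFA, and each coarsely interpolated band is then refined with the guided filter so that it inherits the guide's structure. This uses the opencv-contrib implementation (cv2.ximgproc); the radius and epsilon values are illustrative, not the paper's.

```python
import cv2
import numpy as np

def refine_band(coarse_band, guide, radius=4, eps=1e-3):
    """coarse_band, guide: float32 images in [0, 1] of the same size."""
    return cv2.ximgproc.guidedFilter(
        guide.astype(np.float32), coarse_band.astype(np.float32), radius, eps)
```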
The bilateral filter and the non-local means (NL-means) filter are known as very powerful nonlinear filters. The first contribution of this paper is to give a general framework that encompasses the bilateral filter and the NL-means filter. The general framework is derived based on Bayesian inference, and our analysis reveals that the range weight in the bilateral filter and the similarity measure in the NL-means filter are associated with a noise model, or likelihood distribution. The second contribution is to extend the bilateral filter and the NL-means filter to a general noise model. We also provide a filter classification framework that clarifies the differences among existing filters and helps us develop new filters. Both extended filters are theoretically and experimentally justified.
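For reference, the standard Gaussian forms of the two filters' weights are shown below (textbook forms, with the usual parameters σ_s, σ_r, and h, and P(p) denoting the patch around pixel p); the paper's analysis interprets the range/similarity factor as a likelihood, of which this Gaussian form is one example.

```latex
w_{\mathrm{BF}}(p,q) = \exp\!\Bigl(-\tfrac{\|p-q\|^2}{2\sigma_s^2}\Bigr)
                       \exp\!\Bigl(-\tfrac{(I_p-I_q)^2}{2\sigma_r^2}\Bigr),
\qquad
w_{\mathrm{NLM}}(p,q) = \exp\!\Bigl(-\tfrac{\|P(p)-P(q)\|_2^2}{h^2}\Bigr).
```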
Many consumer digital color cameras have a single image sensor with a color filter array; the data captured by this sensor are called raw data, and effective compression of the raw data is highly demanded. This paper proposes a raw data compression method that uses an existing image coding framework and minimizes the error between the observed raw data and the decoded raw data. Experimental comparisons demonstrate that the proposed method outperforms existing methods.
We propose a fast MAP-based super-resolution algorithm for reconstructing a high-resolution image (HRI) by combining multiple low-resolution images (LRIs). The proposed algorithm optimizes a cost function with respect to the HRI in the frequency domain, whereas existing MAP algorithms optimize with respect to the HRI in the spatial domain. A comparison of the computational cost verifies that the proposed algorithm is far cheaper than a classical algorithm. Experiments using real images show that the proposed algorithm greatly accelerates the super-resolution process while reconstructing an HRI identical to that of the classical algorithm.
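In assumed notation, the MAP cost being minimized has the following standard form, where the HRI x is related to the K observed LRIs y_k through warping/blurring/down-sampling operators W_k and regularized by a smoothness prior C; the paper's contribution is to carry out this minimization over the frequency-domain representation of x rather than over its spatial-domain pixels.

```latex
\hat{\mathbf{x}} = \arg\min_{\mathbf{x}}\
\sum_{k=1}^{K}\|\mathbf{y}_k-\mathbf{W}_k\mathbf{x}\|_2^{2}
+\lambda\,\|\mathbf{C}\mathbf{x}\|_2^{2}.
```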
Several methods are proposed in this study to retrieve cloud physical parameters from ground-based simultaneous observations with a microwave radiometer and other instruments. When the cloud is thin, the liquid water path and the columnar water vapor amount can be retrieved so as to be consistent with the measured intensities at two frequencies, around 20 GHz and around 30 GHz, under the assumption of a homogeneous cloud temperature. In this retrieval procedure, the cloud temperature, which seriously affects the calculated liquid water path, is given from measurement with an IR radiative thermometer. The effective radius of the cloud particles is also derived by comparing the measured downward solar flux at the ground surface with the calculated one. For thick clouds, for which the temperature difference between the cloud top and bottom is expected to be large, the cloud-top temperature is also retrieved from narrow-view-angle measurements of the transmitted solar radiance at two wavelengths, in addition to the microwave measurements. The liquid water path is deduced from the comparison of the measured radiances at the two wavelengths with the calculated values, and the appropriate cloud-top temperature can then be retrieved so as to be consistent with the measured microwave intensities at the two frequencies. In practice, the combined observation with a microwave radiometer and a pyranometer enables the details of the cloud structure to be derived.