In the past few years, deep learning-based image inpainting has made significant progress. However, many existing methods do not account for structural plausibility or texture fineness, which leads to incoherent structure or excessive smoothness in the repaired image. To solve this problem, we propose a two-stage image inpainting model composed of a structure generation network and a texture generation network. The structure generation network focuses on the structure and color domain and uses the damaged structure map extracted from the masked image to plausibly fill the masked area and generate a complete structure map. The texture generation network then uses the repaired structure map to guide the refinement process. We train the two-stage network on the public datasets Places2, CelebA, and Paris StreetView, and the experimental results show that our method outperforms previous methods.
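The sketch below illustrates one way such a two-stage pipeline could be wired together in PyTorch; the module layout, channel counts, and the use of a structure/color map plus binary mask as inputs are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal PyTorch sketch of a two-stage inpainting pipeline (assumed layout, not the
# paper's exact architecture): a structure generator completes a structure/color map
# inside the mask, and a texture generator refines the masked image guided by it.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class StructureGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        # input: damaged structure map (3 ch) + binary mask (1 ch)
        self.net = nn.Sequential(conv_block(4, 64), conv_block(64, 64),
                                 nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, damaged_structure, mask):
        return self.net(torch.cat([damaged_structure, mask], dim=1))

class TextureGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        # input: masked image (3 ch) + completed structure map (3 ch) + mask (1 ch)
        self.net = nn.Sequential(conv_block(7, 64), conv_block(64, 64),
                                 nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, masked_image, structure, mask):
        return self.net(torch.cat([masked_image, structure, mask], dim=1))

# Stage 1 completes the structure map; stage 2 uses it to synthesize fine texture.
structure_net, texture_net = StructureGenerator(), TextureGenerator()
masked_image = torch.rand(1, 3, 256, 256)
damaged_structure = torch.rand(1, 3, 256, 256)
mask = torch.zeros(1, 1, 256, 256)
mask[..., 96:160, 96:160] = 1.0
completed_structure = structure_net(damaged_structure, mask)
inpainted = texture_net(masked_image, completed_structure, mask)
```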
Video inpainting is a very challenging task. Directly applying image inpainting methods frame by frame to a damaged video causes inter-frame flicker due to temporal discontinuities. In this paper, we introduce a video inpainting model guided by spatial structure and temporal edge information to repair missing regions in high-resolution video. The model uses a convolutional neural network with residual blocks to fill the missing content within each frame according to its spatial structure. At the same time, the temporal edges of reference frames are introduced in the temporal domain, which strongly guides texture refinement and reduces inter-frame flicker. We train the model with regular and irregular masks on high-resolution YouTube video datasets and evaluate the trained model qualitatively and quantitatively on the test set; the results show that our method is superior to previous methods.
Based on the original exemplar-based Criminisi algorithm, we propose two improvements to the image inpainting result. First, to address the problem that the single matching block found in the optimal-block search may not actually be optimal, this paper proposes a fusion repair strategy: the top n candidates from the block search are selected as matching blocks, and their weighted average is used to fill the target block. Second, considering the size of the block to be repaired, a layered repair strategy is adopted: the image to be repaired is first downsampled to obtain images at different scales, and repair then proceeds from the coarsest image. The experimental results show that the proposed algorithm improves repair quality both subjectively and objectively.
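A minimal NumPy sketch of the fusion repair step follows; the SSD similarity measure and the inverse-distance weighting are illustrative assumptions about how the weighted average could be formed.

```python
# NumPy sketch of the fusion repair step: instead of copying the single best exemplar
# patch, take the n best-matching patches and fill the target block with their weighted
# average (inverse-distance weights are an assumption).
import numpy as np

def fusion_fill(target_patch, known_mask, candidates, n=5, eps=1e-8):
    """target_patch: (h, w) patch with missing pixels; known_mask: True where valid;
    candidates: list of fully-known (h, w) patches from the source region."""
    # SSD computed only over the known pixels of the target patch
    dists = np.array([np.sum((c[known_mask] - target_patch[known_mask]) ** 2)
                      for c in candidates])
    best = np.argsort(dists)[:n]                  # top-n matching blocks
    weights = 1.0 / (dists[best] + eps)           # closer matches weigh more
    weights /= weights.sum()
    fused = np.tensordot(weights, np.stack([candidates[i] for i in best]), axes=1)
    out = target_patch.copy()
    out[~known_mask] = fused[~known_mask]         # only fill the missing pixels
    return out
```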
This paper presents a temporal video error concealment method specially designed for H.265/HEVC. We propose quad-tree partitioning prediction and coherency sensitive hashing to achieve better error concealment in frames corrupted in the HEVC codec. First, we deduce the most probable partitioning of the missing coding tree unit (CTU) using the proposed quad-tree partitioning prediction, which yields the coding units (CUs) that constitute the CTU. Then, a CU priority selection method is applied to choose the best of these CUs for prior concealment. Last, coherency sensitive hashing is adopted to conceal the chosen CU with better search quality. The experiments show that the recovery performance of the proposed method surpasses the compared state-of-the-art methods, as the quad-tree partitioning prediction, the priority selection process, and the coherency sensitive hashing all contribute to the overall performance.
Three-dimensional (3-D) holoscopic imaging is a promising candidate 3-D technology that can overcome some drawbacks of current 3-D technologies. Due to its particular optical structure, a holoscopic image consists of an array of two-dimensional microimages (MIs) that represent different perspectives of the scene. To address the data-intensive characteristics and specific structure of holoscopic images, efficient coding schemes are of utmost importance for storage and transmission. We propose a 3-D holoscopic image coding scheme using a sparse viewpoint image (VI) array and disparities. In the proposed scheme, a holoscopic image is fully decomposed into a VI array, which is then subsampled into a sparse VI array. To reconstruct the full holoscopic image, disparities between adjoining MIs are calculated. Based on the retained set of VIs and the disparities, a full holoscopic image is reconstructed and encoded as a reference frame for coding the original full holoscopic image. Building on this representation, we propose a multiview-plus-depth compression scheme for 3-D holoscopic image coding. Experimental results show that the proposed coding scheme achieves an average bit-rate reduction of 51% compared with High Efficiency Video Coding (HEVC) intra coding.
Layered depth video (LDV) is a sparse representation of multiview video plus depth (MVD) and is considered a promising 3D video format for supporting 3D video services. This format consists of one full view and additional residual data that represent the side views. However, the amount of residual data grows as the distance between the central view and the side views increases. To address this problem, a new inpainting-based residual data generation method is proposed in this paper. The inpainting-induced artifacts are treated as new residual data, and the residual data of the two side views are merged into one buffer to further reduce the amount of data. In addition, block-wise alignment is used for higher coding efficiency, and a new compression algorithm tailored to the shape and distribution of the residual data is proposed. The experiments show the high compression efficiency of the proposed method: it reduces the required bitrate by at least 30% compared with the classical LDV method while providing similar quality for the intermediate virtual view on the terminal's display.
Three-dimensional (3-D) holoscopic imaging, also known as integral imaging, light field imaging, or plenoptic imaging, can provide natural and fatigue-free 3-D visualization. However, a large amount of data is required to represent 3-D holoscopic content, so efficient coding schemes for this particular type of image are needed. A 3-D holoscopic image coding scheme with kernel-based minimum mean square error (MMSE) estimation is proposed. In the proposed scheme, the coding block is predicted by an MMSE estimator under a statistical model. To capture the statistical behavior of the signal, kernel density estimation (KDE) is used to estimate the probability density function of this model. As bandwidth estimation (BE) is a key issue in KDE, we also propose a BE method based on the kernel trick. The experimental results demonstrate that the proposed scheme achieves better rate-distortion performance and better visual rendering quality.
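The following NumPy sketch shows one form a kernel (KDE)-based MMSE predictor can take; the context features, Gaussian kernel, and rule-of-thumb bandwidth are illustrative assumptions standing in for the paper's kernel-trick bandwidth estimation.

```python
# Schematic sketch of a KDE-based MMSE predictor: given training pairs (context c_i,
# sample x_i) gathered from already-coded pixels, the MMSE estimate E[x | c] under the
# KDE model reduces to a Nadaraya-Watson weighted average of the samples.
import numpy as np

def kde_mmse_predict(contexts, samples, query, bandwidth=None):
    """contexts: (N, d) context vectors, samples: (N,) values, query: (d,) context."""
    n, d = contexts.shape
    if bandwidth is None:
        # rule-of-thumb bandwidth (assumption standing in for the paper's BE method)
        bandwidth = np.std(contexts) * n ** (-1.0 / (d + 4)) + 1e-8
    sq_dist = np.sum((contexts - query) ** 2, axis=1)
    weights = np.exp(-0.5 * sq_dist / bandwidth ** 2)   # Gaussian kernel values
    weights /= weights.sum() + 1e-12
    return np.dot(weights, samples)                     # E[x | c] under the KDE model

# toy usage: predict a pixel value from two causal-neighbor intensities
rng = np.random.default_rng(0)
contexts = rng.integers(0, 256, size=(500, 2)).astype(float)
samples = contexts.mean(axis=1) + rng.normal(0, 2, 500)  # synthetic correlated data
print(kde_mmse_predict(contexts, samples, np.array([120.0, 130.0])))
```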
A scalable extension design is proposed for High Efficiency Video Coding (HEVC), which can provide temporal, spatial, and quality scalability. This technique achieves high coding efficiency and error resilience, but it increases computational complexity. To reduce the complexity of quality scalable video coding, this paper proposes a fast mode selection method based on the mode distribution of coding units (CUs). Experiments show that the proposed algorithm achieves up to a 63.70% reduction in encoding time with a negligible loss of video quality.
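One way such a distribution-driven fast mode selection could look is sketched below; the mode names and the keep-ratio threshold are illustrative assumptions, not the paper's actual decision rule.

```python
# Sketch of a distribution-driven fast mode selection: collect the modes chosen by
# already-coded neighboring/co-located CUs and evaluate only the modes that occur
# often enough, instead of exhaustively testing every mode.
from collections import Counter

ALL_MODES = ["SKIP", "MERGE", "INTER_2Nx2N", "INTER_2NxN", "INTER_Nx2N", "INTRA"]

def candidate_modes(neighbor_modes, keep_ratio=0.1):
    """neighbor_modes: modes of spatially neighboring and co-located CUs."""
    if not neighbor_modes:
        return ALL_MODES                  # no statistics yet: fall back to full search
    counts = Counter(neighbor_modes)
    total = sum(counts.values())
    kept = [m for m in ALL_MODES if counts[m] / total >= keep_ratio]
    return kept or ALL_MODES

# example: only modes observed among the neighbors survive, pruning the full search
print(candidate_modes(["SKIP", "SKIP", "MERGE", "SKIP", "INTER_2Nx2N"]))
# -> ['SKIP', 'MERGE', 'INTER_2Nx2N']
```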
In the emerging international standard for scalable video coding (SVC), an extension of H.264/AVC, a computationally expensive exhaustive mode decision is employed to select the best prediction mode for each macroblock (MB). Although this technique achieves the highest possible coding efficiency, it results in extremely high computational complexity, which hinders the practical application of SVC. We propose a fast mode decision algorithm for SVC comprising two techniques: early SKIP mode decision and adaptive early termination of the mode decision. Both exploit the coding information of spatially neighboring MBs in the same frame and of the corresponding MBs in the base layer to terminate the mode decision procedure early. Experimental results show that the proposed fast mode decision algorithm achieves average computational savings of about 70% with almost no loss of rate-distortion performance in the enhancement layer.
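A schematic sketch of the two early-termination ideas is given below; the cost model, threshold scaling, and candidate mode list are assumptions for illustration only.

```python
# Sketch of the two early-termination ideas:
# 1) early SKIP decision: if the neighboring MBs and the base-layer MB all chose SKIP
#    and the SKIP RD cost is small, stop immediately;
# 2) adaptive early termination: stop the remaining mode search once a mode's RD cost
#    drops below a threshold derived from the neighbors' costs.
def decide_mode(rd_cost, spatial_neighbors, base_layer_mb, threshold_scale=1.1):
    """rd_cost: callable mode -> RD cost; neighbors/base_layer_mb: dicts with 'mode' and 'cost'."""
    neighbors = spatial_neighbors + [base_layer_mb]
    avg_neighbor_cost = sum(n["cost"] for n in neighbors) / len(neighbors)

    # early SKIP mode decision
    if all(n["mode"] == "SKIP" for n in neighbors):
        skip_cost = rd_cost("SKIP")
        if skip_cost <= threshold_scale * avg_neighbor_cost:
            return "SKIP", skip_cost

    # adaptive early termination over the remaining candidate modes
    best_mode, best_cost = None, float("inf")
    for mode in ["SKIP", "INTER_16x16", "INTER_8x8", "INTRA_4x4"]:
        cost = rd_cost(mode)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
        if best_cost <= threshold_scale * avg_neighbor_cost:
            break                          # good enough: terminate the mode search early
    return best_mode, best_cost

# toy usage with a made-up cost table
costs = {"SKIP": 90.0, "INTER_16x16": 120.0, "INTER_8x8": 150.0, "INTRA_4x4": 300.0}
nbrs = [{"mode": "SKIP", "cost": 100.0}, {"mode": "SKIP", "cost": 95.0}]
base = {"mode": "SKIP", "cost": 105.0}
print(decide_mode(costs.__getitem__, nbrs, base))   # -> ('SKIP', 90.0)
```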
Video streaming over the Internet usually encounters bandwidth variations and packet losses, which degrade the reconstructed video quality. Fine Granularity Scalability (FGS) provides bit-rate adaptability to varying Internet bandwidth conditions thanks to its fine granularity and error resilience. Multiple Description Coding (MDC) is an effective solution to packet losses, but it introduces a great deal of redundant information. For an FGS video bit-stream, the base layer is usually very small and highly important, and its error-free transmission can be achieved with classical error resilience techniques. As a result, the overall streaming quality depends mostly on the enhancement layer. Moreover, the different bit-planes are of different importance, which makes them well suited to an unequal protection (UEP) strategy. Therefore, this paper proposes a new joint MDC and UEP method to protect the enhancement layer. In the proposed method, the MDC encoder/decoder is embedded into the normal enhancement-layer encoder/decoder. Considering the unequal importance of the bit-planes and the redundancy introduced by MDC, the two most significant bit-planes adopt the MDC-based strategy, while the remaining bit-planes are encoded only by the normal enhancement-layer coding system. Experimental results demonstrate the efficiency of the proposed method.
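The protection policy can be illustrated with the small sketch below; the even/odd interleaving used to form the two descriptions is an assumption, since the abstract does not specify how the MDC descriptions are built.

```python
# Sketch of the bit-plane protection policy: the two most significant bit-planes of the
# FGS enhancement layer are coded with MDC (two descriptions), while the remaining
# bit-planes go through the normal single-description enhancement-layer coder.
def protect_bitplanes(bitplanes):
    """bitplanes: list of enhancement-layer bit-planes, most significant first.
    Returns (description_0, description_1, single_description_planes)."""
    mdc_planes, normal_planes = bitplanes[:2], bitplanes[2:]
    desc0, desc1 = [], []
    for plane in mdc_planes:
        # toy MDC: split each protected bit-plane into even/odd coefficient subsets,
        # one subset per description (real MDC would add redundancy so that either
        # description can be decoded on its own)
        desc0.append(plane[0::2])
        desc1.append(plane[1::2])
    return desc0, desc1, normal_planes

# example with 4 bit-planes of 8 coefficients each
planes = [[p * 10 + i for i in range(8)] for p in range(4)]
d0, d1, rest = protect_bitplanes(planes)
print(len(d0), len(d1), len(rest))   # -> 2 2 2
```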
An algorithm for object segmentation from stereo sequences based on the fusion of multiple cues (edge, disparity, motion, and color) is presented in this paper. First, an accurate disparity field is obtained using a two-level disparity matching method based on image edge information. Morphological operators are then applied to the disparity field to obtain coarse object segments. A split-and-merge process is applied to extract object regions, and erosion and dilation are used to fill small inner holes in the target regions and to smooth discontinuous regions. In parallel, spatio-temporal segments are obtained from the image edge structure and motion change detection. Object boundaries can then be delineated according to the disparity and spatio-temporal segments. Finally, multiple objects are extracted by further fusing the color information. Experiments indicate that this algorithm effectively segments mutually overlapping objects from stereoscopic video, a task that is usually difficult with monocular video.
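The morphological stage can be sketched with standard SciPy operations as below; the disparity threshold and structuring-element size are illustrative assumptions.

```python
# Sketch of the morphological stage: threshold the disparity field, clean it with
# erosion/dilation (opening/closing) to fill small holes and smooth discontinuities,
# then label connected components as coarse object segments.
import numpy as np
from scipy import ndimage

def coarse_segments(disparity, fg_threshold=16.0, struct_size=3):
    """disparity: (H, W) dense disparity field; returns (labels, num_objects)."""
    fg = disparity > fg_threshold                        # near objects have larger disparity
    struct = np.ones((struct_size, struct_size), dtype=bool)
    fg = ndimage.binary_closing(fg, structure=struct)    # dilation then erosion: fill holes
    fg = ndimage.binary_opening(fg, structure=struct)    # erosion then dilation: remove specks
    labels, num_objects = ndimage.label(fg)              # split into connected regions
    return labels, num_objects

# toy usage on a synthetic disparity map with two foreground blobs
disp = np.zeros((64, 64))
disp[10:30, 10:30] = 32.0
disp[40:60, 35:55] = 28.0
labels, n = coarse_segments(disp)
print(n)   # -> 2
```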
MPEG-4 is applied in a variety of current and future multimedia applications. The standard not only supports existing frame-based coding standards such as MPEG-1, MPEG-2, and H.263, but also provides content-based representation and a flexible toolbox, which makes MPEG-4 more complicated. This paper first briefly presents the implementation of video decoding for the MPEG-4 Core Visual Profile, a subset of the MPEG-4 standard. The Core Visual Profile is well suited to streaming video, which is likely to become a focal point of MPEG-4 development. The paper then proposes a design scheme for the basic hardware structure of a decoding system based on the TMS320C6x DSP and briefly analyzes the decoding process of the system.