To advance road safety through technological innovation, we present the Comprehensive Urban Navigation and Yielding (CUNY) Video Dataset (CVD), a collection aimed at enriching the analysis of roadway incidents using stationary camera footage. Derived from 1,013 YouTube videos, CVD is annotated to distinguish between collision and non-collision scenarios, enabling detailed study of a wide range of roadway incidents. The dataset has been curated to overcome prevalent limitations of existing collision databases, offering broad coverage of environmental conditions, camera qualities, geographical regions, and times of day. It is particularly well suited for integration with existing road monitoring infrastructure, supporting faster emergency response, improved traffic management, and better overall road safety. By openly disseminating this dataset, we seek to address the scarcity of accessible, diverse, and authentic video data for collision analysis, contributing to advances in intelligent transportation systems and fostering safer road environments.
Recent progress in deraining and dehazing methods has dramatically enhanced image quality in bad weather. However, these methods are vulnerable to adversarial attacks, severely compromising their effectiveness. Traditional defenses like adversarial training and model distillation necessitate significant retraining, hindering their real-world application due to high computational costs. To address these limitations, we propose the Quaternion-Hadamard Transformer Network (QHTN), a novel defense strategy against white-box adversarial attacks, including the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD). The QHTN leverages a transformer architecture with three key modules: preprocessing, local-global feature extraction, and reconstruction. The local-global feature extraction module utilizes innovative Hadamard and quaternion convolution blocks to analyze spatial and inter-channel relationships. This unique approach enables the QHTN to incorporate a denoising mechanism during preprocessing, effectively mitigating adversarial noise before it influences the model's input. Extensive evaluations demonstrate the QHTN's efficacy in safeguarding haze and rain removal models from adversarial attacks. These results validate the QHTN's efficiency and potential for broader adoption in image-processing defense mechanisms.
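As a hedged illustration of the white-box threat model described above (not the authors' code), a minimal FGSM perturbation in PyTorch might look like the following; `model`, `loss_fn`, and `epsilon` are placeholder assumptions.

```python
import torch

def fgsm_attack(model, loss_fn, x, y, epsilon=8 / 255):
    """Minimal FGSM sketch: perturb input x in the direction of the
    sign of the loss gradient. `model`, `loss_fn`, and `epsilon` are
    illustrative placeholders, not the paper's actual setup."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # One signed-gradient step, then clamp back to the valid image range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```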
The performance of real-world computer vision systems used in outdoor surveillance and autonomous vehicles suffers severely under adverse weather conditions. Removing mist, rain streaks, adherent raindrops, and snow is an important processing step in real-world applications. Several deep learning solutions have been proposed for multiple-type weather removal, but existing methods are prohibitively expensive in their computational requirements and are not suitable for real-time operation. To address this issue, we propose ChebTF, a lightweight encoder-decoder architecture based on quaternion neural network principles and a novel polynomial transform block. Quantitative and qualitative assessment on synthetic benchmark datasets and real-world images demonstrates that the proposed ChebTF handles various weather artifacts comparably to other leading weather removal methods.
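The abstract does not detail the polynomial transform block; as an illustrative guess based on the name ChebTF, one could expand features with the Chebyshev recurrence T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x) and mix the terms with a 1x1 convolution. Everything below, including `ChebyshevBlock` and its sizes, is our assumption, not the paper's design.

```python
import torch
import torch.nn as nn

class ChebyshevBlock(nn.Module):
    """Hypothetical polynomial transform block: expands features with
    Chebyshev polynomials T_0..T_K and mixes them with a 1x1 conv.
    An illustrative guess, not the paper's ChebTF block."""

    def __init__(self, channels, order=3):
        super().__init__()
        self.order = order
        self.mix = nn.Conv2d(channels * (order + 1), channels, kernel_size=1)

    def forward(self, x):
        x = torch.tanh(x)                  # keep values in [-1, 1], the domain of T_k
        terms = [torch.ones_like(x), x]    # T_0 = 1, T_1 = x
        for _ in range(2, self.order + 1):
            terms.append(2 * x * terms[-1] - terms[-2])  # Chebyshev recurrence
        return self.mix(torch.cat(terms, dim=1))
```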
3-D reconstruction refers to constructing a mathematical representation of scene geometry. Most existing approaches explicitly or implicitly assume that objects in the scene obey Lambert's law of reflectance. In practice, the reflectance of many scene objects deviates from Lambert's law, for example, objects that are semi-transparent, transparent, or specular, or that exhibit subsurface scattering. This paper proposes an algorithm that estimates the 3-D shape and the parameters of a surface reflectance model from multiple views. The proposed algorithm includes the following steps: 1) determination of the optical properties of the scanned scene by separating the direct and global lighting components using high-frequency patterns; 2) generation of a set of structured light patterns whose structure depends on the optical properties of the scanned scene; 3) scanning the scene using the generated structured light patterns from views chosen for the non-Lambertian surface; 4) construction of a 3-D model of the scene by triangulation. To choose the best views of the non-Lambertian surface, a 3-D reconstruction algorithm based on a convolutional neural network is proposed. The neural network is trained in two stages. In the first stage, an encoder is trained to produce a descriptor representation of the input image. In the second stage, a fully connected network is added to the encoder to regress view quality and select the best views. The encoder is trained with a generative adversarial methodology so that the descriptor representation stores spatial information and information about the optical properties of surfaces located in different areas of the image. An encoder-decoder network is trained to recover the defect map (which depends directly on the sensor and scene properties) from a color image. The architecture of this network (the generator) is based on U-Net. As a result, the method accounts for non-Lambertian properties and can compensate for triangulation reconstruction errors caused by view-dependent reflections. Experimental results on both synthetic and real objects are given.
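As a sketch of step 1, direct/global separation under shifted high-frequency illumination can be computed per pixel from the max and min over the captured stack; this follows the generic published technique and may differ from the paper's exact procedure.

```python
import numpy as np

def separate_direct_global(frames):
    """Hedged sketch of direct/global separation with shifted
    high-frequency illumination (roughly half the pixels lit per
    pattern). `frames`: (N, H, W) grayscale captures. Per pixel,
    max ~= L_direct + L_global/2 and min ~= L_global/2, so
    L_direct = max - min and L_global = 2 * min."""
    stack = np.asarray(frames, dtype=np.float64)
    l_max = stack.max(axis=0)
    l_min = stack.min(axis=0)
    return l_max - l_min, 2.0 * l_min  # (direct, global) components
```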
Skin lesion segmentation (SLS) plays a vital role in the early and precise diagnosis of skin cancer by computer-aided diagnosis (CAD) systems. However, automatic SLS in dermoscopic images is a challenging task due to substantial differences in color and texture, artifacts (hairs, gel bubbles, ruler markers), indistinct boundaries, low contrast, and the varying sizes, positions, and shapes of lesions. In this paper, we propose an extended GrabCut algorithm for foreground/background segmentation of dermoscopic images. The method integrates octree color quantization with a modified GrabCut method using a new energy function. Extensive computer simulations on ISIC 2017 show that the method compares favorably, in both qualitative and quantitative evaluations, with commonly used segmentation tools.
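For orientation, a baseline GrabCut segmentation with OpenCV is sketched below; the paper's octree color quantization and modified energy function are not reproduced here, and the bounding box `rect` is an assumed input.

```python
import cv2
import numpy as np

def segment_lesion(image_bgr, rect):
    """Baseline GrabCut sketch; the paper extends this with octree
    color quantization and a new energy function (not shown).
    `rect` is an (x, y, w, h) box assumed to enclose the lesion."""
    mask = np.zeros(image_bgr.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, rect, bgd_model, fgd_model,
                iterCount=5, mode=cv2.GC_INIT_WITH_RECT)
    # Definite or probable foreground pixels form the lesion mask.
    lesion = (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)
    return lesion.astype(np.uint8)
```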
Video-related technology has grown rapidly with the progress of digital devices such as virtual reality headsets, 3D cameras, 3D films, and 3D displays, and with the growth of the Internet. During acquisition and processing (compression, transmission, and reproduction), videos may suffer distortions that degrade quality and directly affect the subjective impression of human viewers. Subjective evaluation, however, is tedious and time-consuming, and expert observers are not always available, so it is necessary to evaluate video quality automatically: video quality assessment has become vital, and we need a metric that measures how distortions degrade the user's experience. The goal of video quality assessment is to predict perceptual quality in order to improve the performance of practical video application systems. Commonly used video quality assessment methods: a) treat the video as a sequence of two-dimensional images and compute the quality score as a weighted average of per-frame (2-D image) scores, which conflicts with the fact that a video signal is a 3-D volume and ignores motion features; or b) are designed for specific distortions (for example, blockiness and blurriness). In this paper, we present a novel deep learning architecture for no-reference video quality assessment, based on a 3D convolutional neural network and a generative adversarial network (GAN). We evaluate the proposed approach on the LIVE, ECVQ, TID2013, and EVVQ databases. Computer simulations show that the proposed video quality assessment: a) converges on a small amount of data; b) is more "universal", in that it can be applied to different kinds of video quality degradation, including those targeted by denoising, deblocking, and deconvolution; and c) outperforms existing no-reference video quality assessment methods. In addition, we demonstrate how our predicted no-reference quality metric correlates with qualitative opinion in a human observer study.
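A minimal sketch of the 3-D convolutional part, assuming placeholder layer sizes and omitting the GAN component, might look like this in PyTorch:

```python
import torch
import torch.nn as nn

class VQANet3D(nn.Module):
    """Illustrative 3-D CNN regressor for no-reference video quality:
    spatio-temporal convolutions over a clip, followed by a scalar
    quality score. Layer sizes are our assumptions; the paper's GAN
    component is omitted for brevity."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, clip):            # clip: (B, 3, T, H, W)
        f = self.features(clip).flatten(1)
        return self.head(f).squeeze(1)  # predicted quality per clip
```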
The paper presents a novel visual quality metric for assessing the quality of lossy compressed video. A high degree of correlation with subjective quality estimates is achieved by a convolutional neural network trained on a large number of video sequence/subjective quality score pairs. We demonstrate how our predicted no-reference quality metric correlates with qualitative opinion in a human observer study. Results are shown on the EVVQ dataset in comparison with existing approaches.
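A hedged sketch of the training setup implied by the abstract: regress subjective scores from clips with an MSE loss. The optimizer and hyperparameters below are our assumptions.

```python
import torch
import torch.nn as nn

def train_quality_regressor(model, loader, epochs=10, lr=1e-4):
    """Sketch of training on (clip, subjective score) pairs; `loader`
    yields batches of clips and their mean opinion scores. Optimizer
    and hyperparameters are illustrative assumptions."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for clips, scores in loader:
            opt.zero_grad()
            loss = loss_fn(model(clips), scores)
            loss.backward()
            opt.step()
    return model
```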
In this paper we propose an approach to lossless image compression. The proposed method is based on separate processing of two image components: structure and texture. In a subsequent step, the separated components are compressed by standard RLE/LZW coding. We have performed a comparative analysis against existing techniques on standard test images; our approach has shown promising results.
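A minimal sketch of the idea, assuming a median filter as the structure component (the paper's actual decomposition and RLE/LZW back end may differ); the residual is kept exactly, so that structure plus texture reconstructs the image losslessly:

```python
import numpy as np
from scipy.ndimage import median_filter

def split_structure_texture(image, size=5):
    """Hedged structure/texture split: median-filtered structure and
    the exact integer residual as texture, so the sum reconstructs
    the original image without loss."""
    structure = median_filter(image, size=size)
    texture = image.astype(np.int16) - structure.astype(np.int16)
    return structure, texture

def rle_encode(arr):
    """Simple run-length encoder: (value, run-length) pairs over the
    flattened component, standing in for the RLE/LZW back end."""
    flat = arr.ravel()
    out, run = [], 1
    for prev, cur in zip(flat[:-1], flat[1:]):
        if cur == prev:
            run += 1
        else:
            out.append((int(prev), run))
            run = 1
    out.append((int(flat[-1]), run))
    return out
```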
Content-based image retrieval systems have many applications in the modern world. The most important one is image search by query image or by semantic description. Approaches to this problem are employed in personal photo collection management systems, web-scale image search engines, medical systems, etc. Automatic analysis of large unlabeled image datasets is virtually impossible without a satisfactory image retrieval technique, which is the main reason this kind of automatic image processing has attracted so much attention in recent years. Despite substantial progress in the field, semantically meaningful image retrieval remains a challenging task; the main issue is the demand to provide reliable results in a short amount of time. This paper addresses the problem with a novel technique for simultaneous learning of global image features and binary hash codes. Our approach maps a pixel-based image representation to a hash-value space while trying to preserve as much semantic image content as possible. We use deep learning methodology to generate image descriptions with the properties of similarity preservation and statistical independence. The main advantage of our approach over existing ones is the ability to fine-tune the retrieval procedure for a very specific application, which allows us to provide better results than general techniques. The framework for data-dependent image hashing presented in the paper is based on two kinds of neural networks: convolutional neural networks for image description and an autoencoder for feature-to-hash-space mapping. Experimental results confirm that our approach shows promising results compared to other state-of-the-art methods.
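A minimal sketch of the feature-to-hash mapping, assuming a sigmoid-bottleneck autoencoder binarized at 0.5 (dimensions and training details are placeholders, not the paper's design):

```python
import torch
import torch.nn as nn

class HashAutoencoder(nn.Module):
    """Illustrative autoencoder mapping CNN image descriptors to
    compact binary codes: a sigmoid bottleneck is trained with a
    reconstruction loss and thresholded at retrieval time."""

    def __init__(self, dim=2048, bits=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, 512), nn.ReLU(),
                                     nn.Linear(512, bits), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(bits, 512), nn.ReLU(),
                                     nn.Linear(512, dim))

    def forward(self, features):
        code = self.encoder(features)
        return self.decoder(code), code

    @torch.no_grad()
    def hash(self, features):
        # Threshold the relaxed code to obtain the binary hash.
        return (self.encoder(features) > 0.5).to(torch.uint8)
```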
This paper proposes a video stabilization method that uses space-time video completion for effective reconstruction of static and dynamic textures instead of frame cropping. The proposed method can produce full-frame videos by naturally filling in missing image parts through local alignment of image data from neighboring frames. We propose a set of descriptors that encapsulate the information about the periodic motion of objects needed to reconstruct missing or corrupted frames. The background is filled in by extending spatial texture synthesis techniques to a set of 3D patches. Experimental results demonstrate the effectiveness of the proposed method for full-frame video stabilization.
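As a hedged sketch of the underlying 3-D patch search (the paper's search and blending are more elaborate), a brute-force SSD scan over a spatio-temporal volume could look like this:

```python
import numpy as np

def best_3d_patch(volume, target, stride=4):
    """Brute-force SSD search for the spatio-temporal patch in a
    video volume (T, H, W) most similar to `target` (t, h, w).
    Illustrative only; real systems use faster approximate search."""
    vol = volume.astype(np.float64)
    tgt = target.astype(np.float64)
    t, h, w = tgt.shape
    T, H, W = vol.shape
    best_ssd, best_pos = np.inf, None
    for z in range(0, T - t + 1, stride):
        for y in range(0, H - h + 1, stride):
            for x in range(0, W - w + 1, stride):
                cand = vol[z:z + t, y:y + h, x:x + w]
                ssd = float(np.sum((cand - tgt) ** 2))
                if ssd < best_ssd:
                    best_ssd, best_pos = ssd, (z, y, x)
    return best_pos, best_ssd
```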
Inpainting has received a lot of attention in recent years, and quality assessment is an important task in evaluating different image reconstruction approaches. In many cases, inpainting methods blur sharp transitions and image contours when recovering large areas of missing pixels, and they often fail to recover curved boundary edges. Quantitative metrics for inpainting results currently do not exist, and researchers use human comparisons to evaluate their methodologies and techniques. Most objective quality assessment methods rely on a reference image, which is often not available in inpainting applications, so researchers usually resort to subjective quality assessment by human observers, a difficult and time-consuming procedure. This paper focuses on a machine learning approach to no-reference visual quality assessment for image inpainting based on properties of the human visual system. Our method rests on the observation that Local Binary Patterns describe the local structural information of an image well. We use a support vector regressor trained on human-assessed images to predict the perceived quality of inpainted images. We demonstrate how our predicted quality value correlates with qualitative opinion in a human observer study. Results are shown on a human-scored dataset for different inpainting methods.
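A minimal sketch of the described pipeline, assuming uniform-LBP histograms as features and illustrative descriptor settings:

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVR

def lbp_histogram(gray, p=8, r=1.0):
    """Uniform LBP histogram as the structural feature vector; the
    descriptor settings here are illustrative, not the paper's."""
    lbp = local_binary_pattern(gray, P=p, R=r, method="uniform")
    hist, _ = np.histogram(lbp, bins=p + 2, range=(0, p + 2), density=True)
    return hist

def train_quality_svr(X, y):
    """Fit a support vector regressor on human-scored images:
    X holds LBP histograms, y the subjective quality scores."""
    return SVR(kernel="rbf", C=1.0).fit(X, y)
```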
This article discusses features of parallel hashing for the design of frame filtering tables in distributed computing systems. The proposed method of filtering table design can reduce the frame processing time of network bridges and switches and provide a low probability of filtering table overflow. The optimal number of parallel tables is determined for a given amount of memory.
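A hedged sketch of a d-way parallel hash scheme of the kind discussed, with illustrative table sizes and a CRC32-based hash (the article's exact construction may differ):

```python
import zlib

class ParallelFilterTable:
    """Illustrative d-way parallel filtering table: each sub-table
    uses its own hash, and a MAC address goes to the first sub-table
    with a free or matching slot. Sizes and hash are assumptions."""

    def __init__(self, d=4, buckets=1024):
        self.tables = [dict() for _ in range(d)]
        self.buckets = buckets

    def _slot(self, i, mac):
        # Salt CRC32 with the table index to decorrelate the d hashes.
        return zlib.crc32(bytes([i]) + mac) % self.buckets

    def insert(self, mac, port):
        for i, table in enumerate(self.tables):
            slot = self._slot(i, mac)
            if slot not in table or table[slot][0] == mac:
                table[slot] = (mac, port)
                return True
        return False  # all d candidate slots taken: table overflow

    def lookup(self, mac):
        for i, table in enumerate(self.tables):
            entry = table.get(self._slot(i, mac))
            if entry and entry[0] == mac:
                return entry[1]
        return None  # unknown address: flood the frame
```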
This paper focuses on a machine learning approach for objective inpainting quality assessment. Inpainting has received a lot of attention in recent years, and quality assessment is an important task in evaluating different image reconstruction approaches. Quantitative metrics for successful image inpainting currently do not exist; researchers instead rely on qualitative human comparisons to evaluate their methodologies and techniques. We present an approach for objective inpainting quality assessment based on natural image statistics and machine learning techniques. Our method is based on the observation that when images are properly normalized or transferred to a transform domain, local descriptors can be modeled by parametric distributions whose shapes differ between non-inpainted and inpainted images. The approach yields a feature vector strongly correlated with subjective image perception by the human visual system. We then use a support vector regressor trained on human-assessed images to predict the perceived quality of inpainted images. We demonstrate how our predicted quality value reliably correlates with qualitative opinion in a human observer study.
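A minimal sketch of such natural-image-statistics features, assuming MSCN-style normalization and a generalized Gaussian fit (the paper's exact descriptors and transform domain may differ):

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.stats import gennorm

def nss_features(gray):
    """Hedged sketch: compute mean-subtracted, contrast-normalized
    (MSCN-style) coefficients, fit a generalized Gaussian, and use
    its shape and scale parameters as features."""
    g = gray.astype(np.float64)
    mu = gaussian_filter(g, sigma=7 / 6)
    var = gaussian_filter(g * g, sigma=7 / 6) - mu * mu
    sigma = np.sqrt(np.abs(var)) + 1.0
    coeffs = ((g - mu) / sigma).ravel()
    beta, loc, scale = gennorm.fit(coeffs, floc=0.0)  # shape, loc, scale
    return np.array([beta, scale])
```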
The problem of automatic video restoration and object removal attracts the attention of many researchers. In this paper we present a new framework for video inpainting. We consider the case when the camera motion is approximately parallel to the plane of image projection. The scene may consist of a stationary background with a moving foreground, both of which may require inpainting. Moving objects can move differently, but should not change their size. The framework presented in this paper contains the following steps: moving object identification, moving object tracking and background/foreground segmentation, inpainting, and, finally, video rendering. Some results on test video sequence processing are presented.
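A hedged skeleton of this pipeline, with OpenCV background subtraction standing in for moving object identification and Telea inpainting as a placeholder for the paper's inpainting stage:

```python
import cv2

def inpaint_video(frames):
    """Illustrative pipeline skeleton; tracking and the actual
    inpainting method are placeholders for the paper's components."""
    subtractor = cv2.createBackgroundSubtractorMOG2()
    restored = []
    for frame in frames:
        # 1) Moving object identification / foreground segmentation.
        fg_mask = subtractor.apply(frame)
        fg_mask = cv2.threshold(fg_mask, 127, 255, cv2.THRESH_BINARY)[1]
        # 2) Inpaint the masked region (Telea's method as a stand-in).
        restored.append(cv2.inpaint(frame, fg_mask, 3, cv2.INPAINT_TELEA))
    return restored
```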
This paper focuses on fast texture and structure reconstruction of images. The proposed method consists of several steps. The first extracts textural features of the input image based on Laws' texture energy; the pixels around damaged image regions are clustered using these features, which allows defining the correspondence between pixels from different patches. Second, a cubic spline curve is applied to reconstruct structure and to connect edges and contours in the damaged area. The choice of the current pixel to be recovered is made using the fast marching approach. The Telea method or a modification of the exemplar-based method is then used, depending on the classification of the region in which the to-be-restored pixel is located. To quickly find patches, we use a perceptual hash; this strategy yields a data structure containing the hashes of similar patches, reducing the patch search procedure to a hash computation. The proposed method is tested on various image samples with different geometrical features and compared with state-of-the-art image inpainting methods; the proposed technique is shown to produce better results in reconstructing missing small and large objects in test images.
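A minimal sketch of the perceptual-hash patch index, assuming a simple average hash over grayscale patches (the paper's exact hash may differ):

```python
import numpy as np
import cv2

def patch_ahash(patch, hash_size=8):
    """Average hash for patch indexing: downsample the patch,
    threshold against its mean, and pack the bits into an integer."""
    small = cv2.resize(patch, (hash_size, hash_size),
                       interpolation=cv2.INTER_AREA)
    bits = (small > small.mean()).ravel()
    return int("".join("1" if b else "0" for b in bits), 2)

def build_patch_index(patches):
    """Similar patches map to equal (or Hamming-close) keys, so a
    dict keyed by the hash replaces exhaustive search with a lookup."""
    index = {}
    for i, patch in enumerate(patches):
        index.setdefault(patch_ahash(patch), []).append(i)
    return index
```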