Traditional analysis of stimulation-evoked cortical activity relies mainly on manual detection of bright spots of a characteristic size, which are regarded as active cells. However, there is limited research on automatic detection of active cortical cells in optical imaging from in vivo experiments, where the data are very noisy. To avoid laborious and difficult annotation work, we propose a novel weakly supervised approach for detecting active cells across temporal frames. In contrast to prevalent detection methods on common datasets, we formulate cell activation detection as a classification problem. We combine clustering and a deep neural network, requiring only minimal user indication on the Maximum Intensity Projection (MIP) of the time-lapse optical image sequence, to realize an unsupervised classification model. The proposed approach achieves comparable performance on our optical image sequences, in which cells marked by fluorescent indicators change activation state from frame to frame. Although in vivo imaging introduces considerable noise, our algorithm is designed to generate cell activation statistics accurately and efficiently without any prior training data preparation, which makes it particularly valuable for subsequent analyses of cell responses to psychopharmacological stimulation.
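As a minimal sketch of the MIP-plus-clustering idea described above (not the authors' exact pipeline), the code below computes the maximum intensity projection of a time-lapse stack and clusters its pixels into putative cell and background groups with k-means; the stack shape, two-cluster setting, and intensity-only features are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def mip_cell_mask(stack, n_clusters=2):
    """Cluster MIP pixels into putative cell vs. background groups.

    stack: float array of shape (T, H, W), a time-lapse image sequence.
    Returns a boolean (H, W) mask marking the brightest cluster.
    """
    mip = stack.max(axis=0)                      # maximum intensity projection over time
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(mip.reshape(-1, 1))  # cluster on intensity alone
    bright = np.argmax(km.cluster_centers_.ravel())
    return (labels == bright).reshape(mip.shape)

# Toy usage: 10 frames of 64x64 noise with a bright blob.
stack = np.random.rand(10, 64, 64) * 0.2
stack[:, 20:28, 30:38] += 0.8
mask = mip_cell_mask(stack)
```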
Direct visual tracking (DVT) for planar objects is a fundamental problem in computer vision. DVT methods often formulate tracking as an image registration problem, where image intensities are used directly to match two images. However, these methods are usually sensitive to illumination changes because they assume constant intensities. The gradient orientation (GO) was shown to be insensitive to illumination variations in previous reports, and we further confirm that the GO's robustness can be significantly improved when the pyramid technique is employed. We present a robust DVT method, named gradient orientation pyramid efficient second-order minimization (GOP-ESM), based on the proposed gradient orientation pyramid descriptor. GOP-ESM combines the robustness of the feature descriptor with the efficiency of the second-order minimization method to improve tracking robustness and accuracy. We also publish a tracking dataset for planar objects with illumination changes (POIC). Evaluations on the proposed POIC dataset and two other public benchmark datasets demonstrate that GOP-ESM outperforms state-of-the-art tracking methods under various environmental variations, especially illumination changes.
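To make the descriptor concrete, here is a minimal sketch of a gradient orientation pyramid, assuming OpenCV; the level count, Sobel kernel size, and pyrDown smoothing are illustrative choices, not the paper's exact GOP-ESM formulation.

```python
import cv2
import numpy as np

def gradient_orientation_pyramid(gray, levels=3):
    """Per-level gradient orientation maps; orientation is largely
    invariant to multiplicative/additive illumination changes."""
    pyramid = []
    img = gray.astype(np.float32)
    for _ in range(levels):
        gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
        gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)
        pyramid.append(np.arctan2(gy, gx))  # orientation in [-pi, pi]
        img = cv2.pyrDown(img)              # next coarser pyramid level
    return pyramid
```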
Vehicle detection in wide area motion imagery (WAMI) has drawn increasing attention from the computer vision research community in recent decades. In this paper, we present a new multi-task network architecture for on-road vehicle detection, which detects and segments vehicles, estimates their pose, and simultaneously yields a road segmentation for a given region. The multi-task network consists of three components: 1) vehicle detection, 2) vehicle and road segmentation, and 3) detection screening. The segmentation and detection components share the same backbone network and are trained jointly in an end-to-end manner. Unlike methods based on background subtraction or frame differencing, the proposed Multitask Assessment of Roads and Vehicles Network (MARVN) can detect vehicles that are slowing down, stopped, and/or partially occluded in a single image. In addition, the method uses the predicted road segmentation to eliminate detections located outside the road, thereby decreasing the false positive rate. As few WAMI datasets provide both road masks and vehicle bounding-box annotations, we extract 512 frames from the WPAFB 2009 dataset and carefully refine the original annotations; the resulting dataset is named WAMI512. We extensively compare the proposed method with state-of-the-art methods on the WAMI512 dataset and demonstrate superior performance in terms of efficiency and accuracy.
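A minimal sketch of the detection screening step described above: discard boxes whose centers fall off the predicted road mask. The box format and center-point test are assumptions about how the screening might be realized.

```python
import numpy as np

def screen_detections(boxes, road_mask):
    """Keep only detections whose box center lies on the road.

    boxes: (N, 4) array of [x1, y1, x2, y2] in pixel coordinates.
    road_mask: boolean (H, W) road segmentation.
    """
    kept = []
    h, w = road_mask.shape
    for x1, y1, x2, y2 in boxes:
        cx = int(np.clip((x1 + x2) / 2, 0, w - 1))
        cy = int(np.clip((y1 + y2) / 2, 0, h - 1))
        if road_mask[cy, cx]:
            kept.append([x1, y1, x2, y2])
    return np.asarray(kept)
```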
Humans have always had a keen interest in understanding activities and the surrounding environment for mobility, communication, and survival. Thanks to recent progress in photography and breakthroughs in aviation, we are now able to capture tens of megapixels of ground imagery, namely Wide Area Motion Imagery (WAMI), at multiple frames per second from unmanned aerial vehicles (UAVs). WAMI serves as a great source for many applications, including security, urban planning, and route planning. These applications require fast and accurate image understanding, which is time consuming for humans due to the large data volume and city-scale area coverage. Therefore, automatic processing and understanding of WAMI imagery has been gaining attention in both industry and the research community. This paper focuses on an essential step in WAMI analysis, namely vehicle classification: deciding whether a given image patch contains a vehicle or not. We collect a set of positive and negative sample image patches for training and testing the detector. Positive samples are 64 × 64 image patches centered on annotated vehicles. We generate two sets of negative images: the first is generated from positive images with some location shift, and the second from randomly sampled patches, discarding any patch in which a vehicle happens to lie at the center. Both positive and negative samples are randomly divided into 9000 training images and 3000 testing images. We propose to train a deep convolutional network to classify these patches. The classifier is based on a pre-trained AlexNet model in the Caffe library, with an adapted loss function for vehicle classification. The performance of our classifier is compared to several traditional image classification methods using Support Vector Machine (SVM) and Histogram of Oriented Gradients (HOG) features. While the SVM+HOG method achieves an accuracy of 91.2%, the accuracy of our deep network-based classifier reaches 97.9%.
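The two negative-sampling strategies above can be sketched directly; the 64-pixel patch size comes from the text, while the shift magnitude and the minimum center distance used to reject center-vehicle patches are assumptions.

```python
import numpy as np

PATCH = 64  # patch size from the paper

def crop(img, cx, cy):
    """64x64 patch centered at (cx, cy); assumes it fits in the image."""
    h = PATCH // 2
    return img[cy - h:cy + h, cx - h:cx + h]

def shifted_negative(img, cx, cy, shift=24, rng=np.random):
    """Negative patch: a positive center displaced by a fixed offset."""
    dx, dy = rng.choice([-shift, shift], size=2)
    return crop(img, cx + dx, cy + dy)

def random_negative(img, vehicle_centers, min_dist=8, rng=np.random):
    """Random patch, re-drawn if a vehicle sits near its center."""
    h, w = img.shape[:2]
    while True:
        cx = rng.randint(PATCH // 2, w - PATCH // 2)
        cy = rng.randint(PATCH // 2, h - PATCH // 2)
        d = [np.hypot(cx - vx, cy - vy) for vx, vy in vehicle_centers]
        if not d or min(d) > min_dist:
            return crop(img, cx, cy)
```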
Real-time information fusion based on WAMI (Wide-Area Motion Imagery), FMV (Full Motion Video), and text data is highly desired for many mission-critical emergency and security applications. Cloud Computing has been considered promising for achieving big data integration from multi-modal sources. In many mission-critical tasks, however, powerful Cloud technology cannot satisfy the tight latency tolerances because the servers are allocated far from the sensing platform; indeed, there may be no guaranteed connection in emergency situations. Therefore, data processing, information fusion, and decision making must be executed on-site (i.e., near the data collection). Fog Computing, a recently proposed extension and complement to Cloud Computing, enables computing on-site without outsourcing jobs to a remote Cloud. In this work, we investigate the feasibility of processing streaming WAMI in the Fog for real-time, online, uninterrupted target tracking. Using a single-target tracking algorithm, we study the performance of a Fog Computing prototype. The experimental results are very encouraging and validate the effectiveness of our Fog approach in achieving real-time frame rates.
In this paper we propose to use the Wavelet Leader (WL) transformation for studying trabecular bone patterns. Given an input image, its WL transformation is defined as the cross-channel-layer maximum pooling of an underlying wavelet transformation. WL inherits the advantage of the original wavelet transformation in capturing the spatial-frequency statistics of texture images, while being more robust against scale and orientation changes thanks to the maximum pooling strategy. These properties make WL an attractive replacement for the wavelet transformations used for trabecular analysis in previous studies. In particular, after extracting wavelet leader descriptors from a trabecular texture patch, we feed them into two existing statistical texture characterization methods, namely the Gray Level Co-occurrence Matrix (GLCM) and the Gray Level Run Length Matrix (GLRLM). The most discriminative features, Energy of GLCM and Gray Level Non-Uniformity of GLRLM, are retained to distinguish two populations: osteoporotic patients and control subjects. Receiver Operating Characteristic (ROC) curves are used to measure classification performance. Experimental results on a recently released benchmark dataset show that WL significantly boosts the performance of baseline wavelet transformations by 5% on average.
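One plausible reading of the cross-channel-layer maximum pooling, sketched below with PyWavelets and SciPy: take the elementwise maximum of absolute detail coefficients across the three orientation channels and across scales, after nearest-neighbor resampling to a common grid. The wavelet, level count, and resampling choice are assumptions, not the paper's exact definition.

```python
import numpy as np
import pywt
from scipy.ndimage import zoom

def wavelet_leaders(img, wavelet="db2", levels=3):
    """Max-pool |detail coefficients| across orientation channels
    and scales, upsampled to the finest-scale grid."""
    coeffs = pywt.wavedec2(img.astype(float), wavelet, level=levels)
    # coeffs[1:] are (cH, cV, cD) tuples from coarsest to finest.
    target = coeffs[-1][0].shape           # finest detail grid
    leader = np.zeros(target)
    for (cH, cV, cD) in coeffs[1:]:
        chan_max = np.maximum(np.abs(cH), np.maximum(np.abs(cV), np.abs(cD)))
        factors = (target[0] / chan_max.shape[0], target[1] / chan_max.shape[1])
        leader = np.maximum(leader, zoom(chan_max, factors, order=0))
    return leader
```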
Osteoporosis is a common cause of broken bones among senior citizens. Early diagnosis of osteoporosis requires routine examinations that may be costly for patients. A potential low-cost alternative is to identify senior citizens at high risk of osteoporosis by pre-screening during routine dental examinations; osteoporosis analysis using dental radiographs therefore serves as a key step in such examinations. The aim of this study is to localize landmarks in dental radiographs that are helpful for assessing evidence of osteoporosis. We collect eight landmarks that are critical in osteoporosis analysis, and our goal is to localize these landmarks automatically in a given dental radiographic image. To address challenges such as large variations of appearance across subjects, we formulate the task as a multi-class classification problem. A hybrid feature pool is used to represent these landmarks, and a random forest is used to fuse the hybrid feature representation for the discriminative classification problem. In the experiments, we also evaluate the performance of each individual feature component as well as the hybrid fused feature. Our proposed method achieves an average detection error of 2.9 mm.
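A minimal sketch of the fusion step, assuming scikit-learn: a random forest trained on a hybrid feature pool for the eight-landmark multi-class problem. The feature dimensionality and the random stand-in features are hypothetical, since the paper's exact feature pool is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical hybrid features: each row would concatenate, e.g.,
# intensity statistics and gradient histograms around a candidate point.
X_train = np.random.rand(800, 128)           # stand-in feature pool
y_train = np.random.randint(0, 8, 800)       # 8 landmark classes

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
probs = clf.predict_proba(np.random.rand(5, 128))  # per-class confidence
```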
Wide area motion imagery (WAMI) has been attracting an increasing amount of research attention due to its large spatial and temporal coverage. An important application is moving target analysis, where vehicle detection is often one of the first steps before advanced activity analysis. While many vehicle detection algorithms exist, a thorough evaluation of them on WAMI data remains a challenge, mainly due to the lack of an appropriate benchmark data set. In this paper, we address this research need by presenting a new benchmark for WAMI vehicle detection. The benchmark is based on the recently available Wright-Patterson Air Force Base (WPAFB09) dataset and the associated Temple Resolved Uncertainty Target History (TRUTH) target annotation. Trajectory annotations were provided in the original release of the WPAFB09 dataset, but detailed vehicle annotations were not, and static vehicles, e.g., in parking lots, were not identified. Addressing these issues, we re-annotated the whole dataset with detailed information for each vehicle, including not only a target's location but also its pose and size. The annotated WAMI data set should be useful to the community as a common benchmark for comparing WAMI detection, tracking, and identification methods.
Future surveillance systems will work in complex and cluttered environments that require systems engineering solutions for applications such as airport ground surface management. In this paper, we highlight the use of an L1 video tracker for monitoring activities at an airport. We present methods of information fusion, entity detection, and activity analysis using airport videos for runway detection and airport terminal events. For coordinated airport security, automated ground surveillance enhances efficient and safe maneuvers for aircraft, unmanned air vehicles (UAVs), and unmanned ground vehicles (UGVs) operating within airport environments.
In the last decade, there have been numerous developments in wide-area motion imagery (WAMI), from sensor design to data exploitation. In this paper, we summarize the published literature on WAMI in an effort to organize the techniques, discuss the developments, and determine the state of the art. This organization of developments reveals the variations among approaches and their relations to the available data sets. The literature summary provides an anthology of many of the developers of the last decade and their associated techniques. In our use case, we showcase current methods and products that enable future WAMI exploitation developments.
Wide-Area Motion Imagery (WAMI) feature extraction is important for applications such as target tracking, traffic management, and accident discovery. With the increasing volume of WAMI collections and of feature extraction from the data, a scalable framework is needed to handle the large amount of information. Cloud computing is one of the approaches recently applied to large-scale, big data problems. In this paper, MapReduce in Hadoop is investigated for large-scale feature extraction tasks on WAMI. Specifically, a large dataset of WAMI images is divided into several splits, each containing a small subset of the images. The feature extraction for the WAMI images in each split is distributed to slave nodes in the Hadoop system, and feature extraction for each image is performed individually on its assigned slave node. Finally, the feature extraction results are sent to the Hadoop Distributed File System (HDFS) to aggregate the feature information over the collected imagery. Experiments of feature extraction with and without MapReduce are conducted to illustrate the effectiveness of our proposed Cloud-Enabled WAMI Exploitation (CAWE) approach.
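As an illustration of how per-image feature extraction maps onto MapReduce, here is a minimal Hadoop Streaming-style mapper sketch. The ORB keypoint count is a hypothetical stand-in for the paper's actual feature extractor, and the input is assumed to be one image path per line; a reducer (not shown) would aggregate the per-image results into HDFS.

```python
#!/usr/bin/env python
"""Hadoop Streaming mapper sketch: reads one WAMI image path per input
line and emits "path\tnum_keypoints" for downstream aggregation."""
import sys

import cv2

orb = cv2.ORB_create()  # illustrative stand-in for the real extractor

for line in sys.stdin:
    path = line.strip()
    if not path:
        continue
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        continue  # skip unreadable files
    keypoints = orb.detect(img, None)
    print("%s\t%d" % (path, len(keypoints)))
```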
Multi-flash (MF) photography offers a number of advantages over regular photography, including removing the effects of illumination, color, and texture, as well as highlighting occlusion contours. Implementing MF photography on mobile devices, however, is challenging due to their restricted form factors, limited synchronization capabilities, low computational power, and limited interface connectivity. In this paper, we present a novel mobile MF technique that overcomes these limitations and achieves performance comparable to conventional MF. We first construct a mobile flash ring using four LED lights and design a special mobile flash-camera synchronization unit. The mobile device's own flash first triggers the flash ring via an auxiliary photocell; the mobile flashes are then triggered consecutively in sync with the mobile camera's frame rate to guarantee that each image is captured with only one LED flash on. To process the acquired MF images, we further develop a class of fast mobile image processing techniques for image registration, depth edge extraction, and edge-preserving smoothing. We demonstrate our mobile MF on a number of mobile imaging applications, including occlusion detection, image thumbnailing, image abstraction, and object category classification.
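The classic route to MF depth edge extraction uses ratio images against a shadow-free max-composite; the sketch below shows that core idea under the simplifying assumption of a fixed ratio threshold (the full method instead traverses epipolar lines from each flash looking for negative ratio transitions).

```python
import numpy as np

def depth_edges(images, thresh=0.7):
    """Multi-flash depth edge sketch via ratio images.

    images: list of aligned grayscale frames, one per flash direction.
    A pixel shadowed by flash k is dark in frame k but bright in the
    max-composite, so its ratio image value drops sharply there.
    """
    stack = np.stack([im.astype(float) for im in images])
    imax = stack.max(axis=0) + 1e-6           # shadow-free composite
    ratios = stack / imax                     # one ratio image per flash
    # Depth edge: ratio falls below threshold in at least one direction.
    return (ratios < thresh).any(axis=0)
```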
In this paper, we formulate the problem of infrared target tracking as a binary classification task and extend the online multiple instance learning tracker (MILTracker) for this task. Compared with many color- or texture-based tracking algorithms, MILTracker highlights the difference between the target and the background or similar objects, and is thus well suited to infrared target tracking, which suffers serious loss of textural information. To address the specific challenges of infrared sequences, we extend the original MILTracker in two ways. First, an adaptive motion prediction procedure is integrated to enhance the efficiency of the tracker; this step helps discriminate disturbing objects that are visually very similar to the target under tracking. Second, a spatial weight mask is introduced into the target representation to improve its robustness against similar background clutter, especially distracters. We apply the proposed approach to several challenging IR sequences, and the experimental results clearly validate the effectiveness of our method with encouraging performance.
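A minimal sketch of one plausible spatial weight mask: a Gaussian that emphasizes the center of the target box so border pixels, where clutter and distracters intrude, contribute less. The Gaussian form and width are assumptions, not the paper's exact mask.

```python
import numpy as np

def spatial_weight_mask(h, w, sigma_frac=0.35):
    """Gaussian mask that down-weights target-box borders, so nearby
    clutter and distracters contribute less to the representation."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    sy, sx = sigma_frac * h, sigma_frac * w
    return np.exp(-(((ys - cy) / sy) ** 2 + ((xs - cx) / sx) ** 2) / 2.0)

# Usage: weight an intensity patch before feature extraction.
weighted_patch = spatial_weight_mask(32, 32) * np.ones((32, 32))
```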
Moving vehicle detection in wide area motion imagery is a challenging task due to the large motion of the camera and the small number of pixels on each target. At the same time, this task is very important for surveillance applications, and the results can be used for urban traffic management and accident and emergency responder routing. Moreover, the effectiveness of context in the object detection task can be further exploited to increase target tracking accuracy. In this paper, we propose to use Spatial Context (SC) to improve the performance of the vehicle detection task. We first model the background of 8 consecutive frames with a median filter and obtain candidates by background subtraction. The SC is built from the candidates that have been classified as positive by Histograms of Oriented Gradients (HOG) with Multiple Kernel Learning (MKL): the region around each positive candidate is divided into m subregions of fixed length l, and the SC, a histogram, is built from the number of positive candidates in each subregion. We use the publicly available CLIF 2006 dataset to evaluate the effect of SC. The experiments demonstrate that SC is useful for removing false positives, around which there are few positive candidates, and that the combination of SC and HOG with multiple kernel learning outperforms the use of SC or HOG alone.
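The median-filter background model over 8 frames can be sketched directly; the difference threshold below is an assumption.

```python
import numpy as np

def motion_candidates(frames, thresh=25):
    """Median-filter background model over a window of frames.

    frames: (8, H, W) array of registered grayscale frames.
    Returns a boolean foreground mask for the last frame.
    """
    background = np.median(frames.astype(float), axis=0)
    diff = np.abs(frames[-1].astype(float) - background)
    return diff > thresh   # candidate moving-vehicle pixels
```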
With the emergence of long-duration surveillance systems, e.g., full motion video (FMV) networks and wide area motion imagery (WAMI) sensors, extracting targets' long-term patterns of life over a day becomes possible. In this paper, we present a framework for extracting the pattern of life (POL) of targets from WAMI video. We first apply a context-aware multi-target tracker (CAMT) to track multiple targets in the WAMI video and obtain the targets' tracklets, traces, and locations of interest from the surveillance information extracted from their long-term trajectories. Then, entity networks propagated over time are constructed from the targets' tracklets, traces, and locations of interest. Finally, the entity network is analyzed using network retrieval techniques to extract the POL of the targets of interest.
Traditional tracking frameworks are challenged by low video frame rate scenarios, because the appearance and location of the target may change considerably between consecutive frames. This paper presents a saliency-based temporal association dependency (STAD) framework to deal with such low frame rate scenarios and demonstrates good results on our robot testbed. We first use a median filter to create a background model of the scene, then apply background subtraction to every new frame to determine the rough position of the target. With the help of markers on the robots, we use a gradient voting algorithm to detect the high responses of the robots' directions. Finally, template matching with branch pruning is used to obtain a finer estimate of each robot's pose. To make the tracking-by-detection framework stable, we further introduce temporal constraints using previously detected results together with an association technique. Our experiments show that our method achieves very stable tracking and outperforms state-of-the-art trackers such as Meanshift, Online-AdaBoosting, Multiple-Instance-Learning, and Tracking-Learning-Detection. We also demonstrate that our algorithm provides a near real-time solution given the low frame rate requirement.
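One way the gradient voting step might look, as a minimal sketch: accumulate a magnitude-weighted histogram of gradient directions over a detected patch and take the peak bin as the robot's marker direction. The bin count and 180-degree folding are assumptions.

```python
import numpy as np

def gradient_vote_orientation(patch, n_bins=36):
    """Histogram-of-gradient-directions vote: the peak bin gives the
    dominant marker direction of the robot in the patch."""
    p = patch.astype(float)
    gy, gx = np.gradient(p)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)       # direction modulo 180 deg
    hist, edges = np.histogram(ang, bins=n_bins, range=(0, np.pi), weights=mag)
    peak = np.argmax(hist)
    return 0.5 * (edges[peak] + edges[peak + 1])  # radians
```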
Atmospheric clouds are commonly encountered phenomena affecting visual tracking from air-borne or space-borne sensors. Generally, clouds are difficult to detect and extract because they are complex in shape and interact with sunlight in a complex fashion. In this paper, we propose a clustering game theoretic image segmentation approach to identify, extract, and patch clouds. In our framework, the first step is to decompose a given image containing clouds. The problem of image segmentation is treated as a “clustering game”: within this context, the notion of a cluster is equivalent to a classical equilibrium concept from game theory, as the game equilibrium reflects both the internal and external (e.g., two-player) cluster conditions. To obtain the evolutionary stable strategies, we explore three evolutionary dynamics: fictitious play, replicator dynamics, and infection and immunization dynamics (InImDyn). Second, we use boundary and shape features to refine the cloud segments, which lowers the false alarm rate. In the third step, we remove the detected clouds and patch the empty spots by performing background recovery. We demonstrate that our cloud detection framework, applied to a video clip, provides supportive results.
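Of the three dynamics named above, replicator dynamics is the simplest to sketch: iterate x_i <- x_i (Ax)_i / (x^T A x) on a nonnegative pairwise-similarity (payoff) matrix A, and read one cluster off the support of the converged strategy vector. The toy payoff matrix and iteration budget below are assumptions.

```python
import numpy as np

def replicator_cluster(A, iters=200, tol=1e-8):
    """Replicator dynamics: x_i <- x_i * (A x)_i / (x^T A x).
    The support of the fixed point identifies one cluster
    (a game equilibrium in the clustering game)."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)
    for _ in range(iters):
        Ax = A @ x
        new = x * Ax / (x @ Ax)
        if np.linalg.norm(new - x, 1) < tol:
            break
        x = new
    return x

# Toy payoff: two pixel groups, the first more tightly self-similar.
B = np.ones((3, 3))
A = np.block([[1.0 * B, 0.05 * B], [0.05 * B, 0.8 * B]])
x = replicator_cluster(A)   # mass concentrates on the tighter block
```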
With the advent of new technology in wide-area motion imagery (WAMI) and full-motion video (FMV), there is a capability to exploit the imagery in conjunction with other information sources to improve confidence in the detection, tracking, and identification (DTI) of dismounts. Image exploitation, along with other radar and intelligence information, can aid decision support and situation awareness. Many advantages and limitations exist in dismount tracking analysis using WAMI/FMV; however, through layered management of sensing resources, there are future capabilities to explore that would increase dismount DTI accuracy, confidence, and timeliness. A layered sensing approach enables command-level strategic, operational, and tactical analysis of dismounts to combine multiple sensors and databases, to validate DTI information, and to enhance reporting results. In this paper, we discuss WAMI/FMV, compile a list of issues and challenges in exploiting the data for WAMI, and provide examples from recently reported results. Our aim is to provide a discussion to ensure that nominated combatants are detected, that the sensed information is validated across multiple perspectives, that the reported confidence values achieve positive combatant versus non-combatant detection, and that the related situational awareness attributes, including behavior analysis, spatial-temporal relations, and cueing, are provided to stakeholders in a timely and reliable manner.
Image registration in wide area motion imagery (WAMI) is a critical problem that is required for target tracking, image fusion, and situation awareness. The high resolution, extremely low frame rate, and large camera motion in such videos, however, introduce challenging constraints that distinguish the task from traditional image registration with sensors such as full motion video (FMV). In this study, we propose to use a feature-based approach for the registration of wide area surveillance imagery. Specifically, we extract Speeded Up Robust Features (SURF) feature points for each frame. A kd-tree algorithm is then adopted to match the feature points of each frame to those of the reference frame, and the RANdom SAmple Consensus (RANSAC) algorithm is used to refine the matching results. Finally, the refined matching point pairs are used to estimate the transformation between frames. The experiments are conducted on the Columbus Large Image Format (CLIF) dataset, and the results show that the proposed approach is very efficient for wide area motion imagery registration.
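The SURF + kd-tree + RANSAC pipeline maps directly onto OpenCV, as in the sketch below. Note that SURF lives in the opencv-contrib xfeatures2d module (it is patented and absent from some builds, in which case ORB is a reasonable substitute); the Hessian threshold, FLANN parameters, ratio-test value, and RANSAC reprojection tolerance are assumptions.

```python
import cv2
import numpy as np

def register(frame, reference):
    """SURF features + kd-tree (FLANN) matching + RANSAC homography."""
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    k1, d1 = surf.detectAndCompute(frame, None)
    k2, d2 = surf.detectAndCompute(reference, None)

    # FLANN algorithm=1 selects the kd-tree index.
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
    matches = flann.knnMatch(d1, d2, k=2)
    good = [m for m, n in matches if m.distance < 0.7 * n.distance]  # ratio test

    src = np.float32([k1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H   # frame-to-reference transformation
```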
The apical root regions play an important role in the analysis and diagnosis of many oral diseases, and automatic detection of such regions is consequently the first step toward computer-aided diagnosis of these diseases. In this paper we propose an automatic method for periapical root region detection using state-of-the-art machine learning approaches. Specifically, we adapt the AdaBoost classifier for apical root detection. One challenge in the task is the lack of training cases, especially diseased ones. To handle this problem, we augment the training set by including root regions close to the annotated ones and decompose the original images to randomly generate negative samples. Based on these training samples, the AdaBoost algorithm, in combination with Haar wavelets, is used to train an apical root detector. The learned detector usually generates a large number of true and false positives. To reduce the number of false positives, a confidence score is calculated for each candidate detection for further purification: we first merge tightly overlapping detected candidate regions, and then use the confidence scores from the AdaBoost detector to eliminate false positives. The proposed method is evaluated on a dataset containing 39 annotated digitized oral X-ray images from 21 patients. The experimental results show that our approach achieves promising detection accuracy.
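A minimal sketch of the Haar-plus-AdaBoost combination, assuming scikit-image and scikit-learn stand in for the paper's detector; the 16-pixel window, the two Haar feature types, and the synthetic toy patches are all assumptions.

```python
import numpy as np
from skimage.feature import haar_like_feature
from skimage.transform import integral_image
from sklearn.ensemble import AdaBoostClassifier

def haar_features(patch):
    """Haar-like features over a fixed-size window (16x16 assumed)."""
    ii = integral_image(patch)
    return haar_like_feature(ii, 0, 0, patch.shape[1], patch.shape[0],
                             feature_type=['type-2-x', 'type-2-y'])

# Toy training set: "root-like" bright horizontal band vs. noise.
rng = np.random.RandomState(0)
pos = [rng.rand(16, 16) + np.pad(np.ones((6, 16)), ((5, 5), (0, 0)))
       for _ in range(20)]
neg = [rng.rand(16, 16) for _ in range(20)]
X = np.array([haar_features(p) for p in pos + neg])
y = np.array([1] * 20 + [0] * 20)

clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
scores = clf.decision_function(X)   # confidence scores for purification
```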
Periapical lesions are a common oral disease. While many studies have been devoted to image-based diagnosis of periapical lesions, these studies usually require clinicians to perform the task. In this paper we investigate automatic solutions for periapical lesion classification using quantized texture analysis. Specifically, we adapt the bag-of-visual-words model for periapical root image representation, which captures texture information by collecting local patch statistics. We then investigate several similarity measures with the K-nearest neighbor (KNN) classifier for the diagnosis task. To evaluate these classifiers, we collected a digitized oral X-ray image dataset from 21 patients, resulting in 139 root images in total. Extensive experimental results demonstrate that the KNN classifier based on the bag-of-visual-words model can achieve very promising performance for periapical lesion classification.
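The bag-of-visual-words pipeline can be sketched end to end: cluster local patch descriptors into a visual vocabulary, represent each image as a normalized word histogram, and classify with KNN. The patch size, vocabulary size, raw-pixel descriptors, and toy data below are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def local_patches(img, size=8, step=4):
    """Densely sampled local patches, flattened as descriptors."""
    h, w = img.shape
    return np.array([img[y:y + size, x:x + size].ravel()
                     for y in range(0, h - size + 1, step)
                     for x in range(0, w - size + 1, step)])

def bovw_histogram(img, vocab):
    """Normalized visual-word histogram of an image."""
    words = vocab.predict(local_patches(img))
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / hist.sum()

# Toy data: build the visual vocabulary, then classify with KNN.
rng = np.random.RandomState(0)
imgs = [rng.rand(64, 64) for _ in range(12)]
labels = [0, 1] * 6
vocab = KMeans(n_clusters=32, n_init=5, random_state=0)
vocab.fit(np.vstack([local_patches(im) for im in imgs]))
X = np.array([bovw_histogram(im) for im in imgs])
knn = KNeighborsClassifier(n_neighbors=3).fit(X, labels)
pred = knn.predict(X[:2])
```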
With the rapid development and wide application of medical imaging technology, explosive volumes of medical image data are produced every day all over the world, and it becomes increasingly challenging to manage and utilize such data effectively and efficiently. In particular, content-based medical image retrieval has been intensively researched in the past decade or so. In this work, we propose a novel approach to content-based medical image retrieval that utilizes the co-occurrence of both texture and shape features, in contrast to most previous algorithms that use purely the texture or the shape feature. Specifically, we propose a novel representation of the co-occurrence of texture and shape features in an image: the gray level and edge direction co-occurrence matrix (GLEDCOM). Based on GLEDCOM, we define eleven features forming a feature vector that is used to measure the similarity between images. As a result, it consistently yields outstanding performance both on images rich in texture (e.g., images of the brain) and on images with dominant smooth regions and sharp edges (e.g., images of the bladder). As demonstrated by experiments, the mean retrieval precision of the GLEDCOM algorithm outperforms a set of representative algorithms, including those based on the gray level co-occurrence matrix (GLCM), Hu's seven moment invariants (HSMI), the uniformity estimation method (UEM), and the modified Zernike moments (MZM), by 10%-20%.
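A simplified sketch of a joint gray-level/edge-direction co-occurrence matrix in the spirit of GLEDCOM; the quantization levels, the single (0, 1) pixel offset, and the gradient-based edge direction are assumptions, since the paper's exact definition is not reproduced here. Features (e.g., energy, contrast) would then be computed from the normalized matrix.

```python
import numpy as np

def gledcom_like(img, gray_levels=8, dir_bins=4):
    """Count joint occurrences of a pixel's quantized gray level and
    the quantized gradient direction at its right-hand neighbor."""
    f = img.astype(float)
    g = np.floor(gray_levels * (f - f.min()) / (np.ptp(f) + 1e-9)).astype(int)
    g = np.clip(g, 0, gray_levels - 1)
    gy, gx = np.gradient(f)
    d = np.floor(dir_bins * np.mod(np.arctan2(gy, gx), np.pi) / np.pi).astype(int)
    d = np.clip(d, 0, dir_bins - 1)
    M = np.zeros((gray_levels, dir_bins))
    np.add.at(M, (g[:, :-1].ravel(), d[:, 1:].ravel()), 1)
    return M / M.sum()   # normalized co-occurrence matrix
```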
This work is part of our ongoing study aimed at understanding the relation between the topology of anatomical branching structures and the underlying image texture. Morphological variability of the breast ductal network is associated with subsequent development of abnormalities in patients with nipple discharge, such as papilloma, breast cancer, and atypia. In this work, we investigate the complex dependence among ductal components to perform segmentation, the first step toward analyzing the topology of ductal lobes. Our automated framework is based on incorporating a conditional random field with texture descriptors of skewness, coarseness, contrast, energy, and fractal dimension. These features are selected to capture the architectural variability of the enhanced ducts by encoding spatial variations between pixel patches in the galactographic image. The segmentation algorithm was applied to a dataset of 20 x-ray galactograms obtained at the Hospital of the University of Pennsylvania. We compared the performance of the proposed approach with fully and semi-automated segmentation algorithms based on neural network classification, fuzzy-connectedness, the vesselness filter, and graph cuts. Global consistency error and confusion matrix analysis were used as accuracy measurements. For the proposed approach, the true positive rate was higher and the false negative rate was significantly lower compared to the other fully automated methods. This indicates that segmentation based on a CRF incorporating texture descriptors has the potential to efficiently support the analysis of the complex topology of the ducts and aid in the development of realistic breast anatomy phantoms.
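Some of the named patch descriptors can be sketched as follows; the particular definitions of contrast and energy, the mean-intensity binarization, and the box-counting estimate of fractal dimension (square patches assumed) are stand-ins for the paper's exact formulations.

```python
import numpy as np
from scipy.stats import skew

def box_counting_dimension(mask):
    """Fractal dimension of a binary patch via box counting."""
    n = mask.shape[0]          # square patch assumed
    sizes, counts = [], []
    s = n // 2
    while s >= 1:
        m = mask[:n - n % s, :n - n % s].reshape(n // s, s, n // s, s)
        counts.append(m.any(axis=(1, 3)).sum())
        sizes.append(s)
        s //= 2
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes)),
                          np.log(np.asarray(counts) + 1), 1)
    return slope

def patch_features(patch):
    """Skewness, contrast, energy, fractal dimension of a pixel patch."""
    f = patch.astype(float).ravel()
    return {
        "skewness": skew(f),
        "contrast": f.var(),
        "energy": np.mean(f ** 2),
        "fractal_dim": box_counting_dimension(patch > patch.mean()),
    }
```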
This work is part of our ongoing study aimed at comparing the topology of anatomical branching structures with the underlying image texture. Detection of regions of interest (ROIs) in clinical breast images serves as the first step in the development of an automated system for image analysis and breast cancer diagnosis. In this paper, we investigate machine learning approaches for identifying ROIs with visible breast ductal trees in a given galactographic image. Specifically, we have developed a boosting-based framework using the AdaBoost algorithm in combination with Haar wavelet features for ROI detection. Twenty-eight clinical galactograms with expert-annotated ROIs were used for training. Positive samples were generated by resampling near the annotated ROIs, and negative samples were generated randomly by image decomposition. Each detected ROI candidate was given a confidence score; candidate ROIs with spatial overlap were merged and their confidence scores combined. We compared three strategies for the elimination of false positives, which differed in how they combine confidence scores: by summation, averaging, or selecting the maximum score. The strategies were compared based upon their spatial overlap with annotated ROIs. Using 4-fold cross-validation on the annotated clinical galactograms, the summation strategy showed the best performance, with a 75% detection rate. When combining the top two candidates, the maximum-score strategy showed the best performance, with a 96% detection rate.
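A minimal sketch of merging overlapping candidates and comparing the three score-combination strategies; the greedy grouping by IoU and the 0.5 overlap threshold are assumptions about how the merging might be realized.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter + 1e-9)

def merge_candidates(boxes, scores, overlap=0.5, combine="sum"):
    """Greedy merge of spatially overlapping ROI candidates; the merged
    confidence is the sum, average, or maximum of member scores."""
    order = np.argsort(scores)[::-1]
    groups = []
    for i in order:
        for g in groups:
            if iou(boxes[i], boxes[g[0]]) > overlap:
                g.append(i)
                break
        else:
            groups.append([i])
    reduce_fn = {"sum": np.sum, "avg": np.mean, "max": np.max}[combine]
    return [(boxes[g[0]], reduce_fn([scores[j] for j in g])) for g in groups]
```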
Since several lung diseases can potentially be diagnosed based on the patterns of lung tissue observed in medical images, automated texture classification can be useful in assisting diagnosis. In this paper, we propose a methodology for discriminating between various types of normal and diseased lung tissue in computed tomography (CT) images that utilizes Vector Quantization (VQ), an image compression technique, to extract discriminative texture features. Rather than focusing on images of the entire lung, we direct our attention to the extraction of local descriptors from individual regions of interest (ROIs) determined by domain experts. After determining the ROIs, we generate "locally optimal" codebooks representing the texture features of each region using the Generalized Lloyd Algorithm, and then use the codeword usage frequency of each codebook as a discriminative feature vector for the region it represents. We compare k-nearest neighbor, support vector machine, and neural network classification approaches using the normalized histogram intersection as a similarity measure. The classification accuracy reached up to 98% for certain experimental settings, indicating that our approach may assist clinicians in the interpretation of lung images and facilitate the investigation of relationships among structure, texture, and function or pathology related to several lung diseases.
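The normalized histogram intersection similarity paired with a k-NN vote can be sketched compactly; the value of k and the normalization against the training histogram are assumptions.

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Normalized histogram intersection in [0, 1] (1 = identical)."""
    return np.minimum(h1, h2).sum() / (h2.sum() + 1e-9)

def knn_predict(query, train_hists, train_labels, k=3):
    """k-NN over codeword-usage histograms with intersection similarity."""
    sims = np.array([histogram_intersection(query, h) for h in train_hists])
    top = np.argsort(sims)[::-1][:k]              # k most similar regions
    votes = np.bincount(np.asarray(train_labels)[top])
    return int(np.argmax(votes))
```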