A problem of interest for nuclear nonproliferation is monitoring activities at nuclear facilities, where proliferation events may occur only a few times and often under variable conditions. Machine learning has revolutionized data analytics by enabling the use of measurable signatures to generate predictive models of facility operations. However, traditional methods for training these models require large, reliable data sets with labeled observations, which are difficult to obtain in nonproliferation applications. Highly variable conditions further complicate this, as events in the training data may have occurred under conditions quite different from those of the event of interest. Our hypothesis is that when events occur in a highly variable environment, careful selection of training data for each test event could outperform the standard approach of using all available training data. We developed a method to optimize training data selection for a given test event and applied it to predicting the power level of the High Flux Isotope Reactor (HFIR) at Oak Ridge National Laboratory. In this study, the reactor startup varies between occurrences owing to natural variability in environmental conditions and operational procedures. Using a combination of analysis techniques, a similitude assessment was performed on data collected from HFIR to isolate the clusters best suited for training a predictive model. Concepts such as dynamic time warping and Jaccard similarity were used in conjunction with clustering analysis. To validate this approach, the model was trained on every combination of unique training events, and its predictive performance was compared with the performance obtained using a subset of the training data selected from the isolated clusters found through the similitude assessment.
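The abstract does not give implementation details, so the following is only a minimal sketch of the kind of similitude tools it names, written under our own assumptions: each startup power curve is a plain list of floats, dynamic time warping (DTW) uses an absolute-difference local cost, clustering is a simple single-linkage threshold rule (connected components), and Jaccard similarity compares sets of event labels, for example a DTW-selected training subset versus a cluster. The function names (`dtw_distance`, `jaccard`, `cluster_by_threshold`, `select_training_events`) and the `threshold` and `k` parameters are hypothetical and are not the authors' method.

```python
import math
from typing import Dict, List, Sequence, Set


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping distance
    with an absolute-difference local cost."""
    n, m = len(a), len(b)
    # cost[i][j]: minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b's sample repeats
                cost[i][j - 1],      # b advances, a's sample repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def jaccard(s: Set[str], t: Set[str]) -> float:
    """|s & t| / |s | t|, taken as 1.0 when both sets are empty."""
    union = s | t
    return 1.0 if not union else len(s & t) / len(union)


def cluster_by_threshold(curves: Dict[str, Sequence[float]],
                         threshold: float) -> List[Set[str]]:
    """Hypothetical single-linkage rule: events whose pairwise DTW distance
    is <= threshold are joined; clusters are the connected components."""
    names = list(curves)
    parent = {name: name for name in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, p in enumerate(names):
        for q in names[i + 1:]:
            if dtw_distance(curves[p], curves[q]) <= threshold:
                parent[find(p)] = find(q)

    groups: Dict[str, Set[str]] = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


def select_training_events(test_curve: Sequence[float],
                           train_curves: Dict[str, Sequence[float]],
                           k: int) -> List[str]:
    """Hypothetical per-test-event selection: keep the k training events
    whose curves are closest to the test curve under DTW."""
    ranked = sorted(train_curves,
                    key=lambda name: dtw_distance(test_curve, train_curves[name]))
    return ranked[:k]


if __name__ == "__main__":
    # Toy startup curves (arbitrary units); "B" is "A" with a delayed start.
    curves = {"A": [0, 1, 2, 3], "B": [0, 0, 1, 2, 3], "C": [0, 2, 5, 5]}
    print(cluster_by_threshold(curves, threshold=0.5))
    chosen = select_training_events([0, 1, 2, 3], curves, k=2)
    print(chosen, jaccard(set(chosen), {"A", "B"}))
```

DTW is what lets a delayed or stretched startup (curve "B" above) still count as similar to an otherwise identical one, which a pointwise distance would penalize; a real study would likely normalize the curves and tune the threshold rather than fix it by hand.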
Using data for the states of Brazil, we construct a polynomial distributed lag model under different truncation lag criteria to predict reported dengue cases. Accurately predicting dengue cases provides the framework for developing forecasting models, which would give public health professionals time to design targeted interventions for areas at high risk of dengue outbreaks. Others have shown that variables such as temperature and vegetation can be used to predict dengue cases. However, when these studies used polynomial distributed lag models, they did not detail how the truncation lag criteria were chosen. We explore the truncation lag selection methods widely used in the literature (marginal and minimized AIC) and determine which works best for our data set. While minimized-AIC truncation lag selection produced the best fit to our data, it used substantially more data to inform its prediction than the marginal truncation lag selection method. Finally, the following variables were found to be significant predictors of dengue in this region: normalized difference vegetation index (NDVI), green-based normalized difference water index (NDWI), normalized burn ratio (NBR), and temperature. These predictors were derived from multispectral remote sensing imagery (NDVI, NDWI, and NBR) and from temperature data.
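To illustrate what minimized-AIC truncation lag selection involves, here is a minimal sketch of an Almon-style polynomial distributed lag fit in plain Python. It assumes a single predictor series `x` (for example, temperature or NDVI) and a response `y` (reported cases) on a common time index. All function names, the fixed polynomial degree, and the choice to fit every candidate lag on the same sample (starting at `max_lag`) so that AIC values are comparable are our assumptions, not the authors' specification; the marginal selection method is not shown.

```python
import math
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def pdl_design(x: Sequence[float], lag: int, degree: int,
               start: int) -> List[List[float]]:
    """Almon transform: row t holds [1, z_0, ..., z_degree] with
    z_j = sum_{i=0..lag} i**j * x[t - i], for t = start .. len(x) - 1."""
    rows = []
    for t in range(start, len(x)):
        z = [sum((i ** j) * x[t - i] for i in range(lag + 1))
             for j in range(degree + 1)]
        rows.append([1.0] + z)
    return rows


def ols_rss(X: List[List[float]], y: Sequence[float]) -> Tuple[List[float], float]:
    """Least squares via the normal equations; returns (coefficients, RSS)."""
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    coef = solve(xtx, xty)
    rss = sum((yi - sum(c * v for c, v in zip(coef, row))) ** 2
              for row, yi in zip(X, y))
    return coef, rss


def lag_weights(gamma: Sequence[float], lag: int) -> List[float]:
    """Recover the distributed-lag weights beta_i = sum_j gamma_j * i**j."""
    return [sum(g * i ** j for j, g in enumerate(gamma)) for i in range(lag + 1)]


def select_lag_by_aic(x: Sequence[float], y: Sequence[float],
                      max_lag: int, degree: int = 2) -> int:
    """Return the truncation lag in [degree, max_lag] with the smallest
    AIC = n * ln(RSS / n) + 2 * k, every candidate fitted on the same
    sample t = max_lag .. len(y) - 1 so the AIC values are comparable."""
    y_common = list(y[max_lag:])
    n = len(y_common)
    k = degree + 2  # intercept + (degree + 1) polynomial coefficients
    best_lag, best_aic = degree, math.inf
    for lag in range(degree, max_lag + 1):
        _, rss = ols_rss(pdl_design(x, lag, degree, start=max_lag), y_common)
        aic = n * math.log(max(rss, 1e-300) / n) + 2 * k  # floor avoids log(0)
        if aic < best_aic:
            best_lag, best_aic = lag, aic
    return best_lag
```

Because the polynomial degree is held fixed here, the number of estimated coefficients does not change with the truncation lag, so in this setup AIC effectively ranks candidate lags by residual sum of squares on the common sample. Candidate lags start at `degree` because fewer lag terms than polynomial coefficients would make the Almon columns collinear.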