Advances In Measurement Error Modeling, 2019 Southern Methodist University

#### Advances In Measurement Error Modeling, Linh Nghiem

*Statistical Science Theses and Dissertations*

Measurement error in observations is widely known to cause bias and a loss of power when fitting statistical models, particularly when studying distribution shape or the relationship between an outcome and a variable of interest. Most existing correction methods in the literature require strong assumptions about the distribution of the measurement error, or rely on ancillary data which is not always available. This limits the applicability of these methods in many situations. Furthermore, new correction approaches are also needed for high-dimensional settings, where the presence of measurement error in the covariates adds another level of complexity to the desirable structure ...

Selecting Maximally-Predictive Deep Features To Explain What Drives Fixations In Free-Viewing, 2019 University of Tübingen

#### Selecting Maximally-Predictive Deep Features To Explain What Drives Fixations In Free-Viewing, Matthias Kümmerer, Thomas S.A. Wallis, Matthias Bethge

*MODVIS Workshop*

No abstract provided.

Quantifying Sleep Architecture For Pediatric Hypersomnia Conditions, 2019 Colorado School of Mines

#### Quantifying Sleep Architecture For Pediatric Hypersomnia Conditions, Alicia K. Colclasure

*Biology and Medicine Through Mathematics Conference*

No abstract provided.

#### Self-Driving Cars: Evaluation Of Deep Learning Techniques For Object Detection In Different Driving Conditions, Ramesh Simhambhatla, Kevin Okiah, Shravan Kuchkula, Robert Slater

*SMU Data Science Review*

Deep Learning has revolutionized Computer Vision, and it is the core technology behind capabilities of a self-driving car. Convolutional Neural Networks (CNNs) are at the heart of this deep learning revolution for improving the task of object detection. A number of successful object detection systems have been proposed in recent years that are based on CNNs. In this paper, an empirical evaluation of three recent meta-architectures: SSD (Single Shot multi-box Detector), R-CNN (Region-based CNN) and R-FCN (Region-based Fully Convolutional Networks) was conducted to measure how fast and accurate they are in identifying objects on the road, such as vehicles, pedestrians ...

Fast Spatial Inference In The Homogeneous Ising Model, 2019 Universite de Montreal

#### Fast Spatial Inference In The Homogeneous Ising Model, Alejandro Murua, Ranjan Maitra

*Ranjan Maitra*

The Ising model is important in statistical modeling and inference in many applications; however, its normalizing constant, mean number of active vertices, and mean spin interaction are intractable. We provide accurate approximations that make it possible to calculate these quantities numerically. Simulation studies indicate good performance when compared to Markov Chain Monte Carlo methods, at a tiny fraction of the computing time. The methodology is also used to perform Bayesian inference in a functional Magnetic Resonance Imaging activation detection experiment.
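To make the intractability concrete, the following minimal sketch computes the normalizing constant by brute force on a tiny grid. It assumes a zero-field model with spins in {-1, +1}, a single interaction parameter `beta`, and free boundaries; the paper's own parametrization may differ, and the function names are hypothetical.

```python
import itertools
import math


def grid_edges(rows, cols):
    """List nearest-neighbour pairs on a rows x cols grid with free boundaries."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                edges.append((i, i + 1))      # neighbour to the right
            if r + 1 < rows:
                edges.append((i, i + cols))   # neighbour below
    return edges


def ising_partition_function(rows, cols, beta):
    """Sum exp(beta * sum_{i~j} x_i x_j) over all 2^(rows*cols) spin configurations.

    Assumed form (not taken from the paper): spins in {-1, +1}, no external field.
    """
    edges = grid_edges(rows, cols)
    n = rows * cols
    z = 0.0
    for spins in itertools.product((-1, 1), repeat=n):
        interaction = sum(spins[i] * spins[j] for i, j in edges)
        z += math.exp(beta * interaction)
    return z


if __name__ == "__main__":
    # A 4 x 4 grid already needs 2**16 = 65,536 terms; each added site doubles the work.
    print(ising_partition_function(4, 4, 0.4))
```

Because the number of terms doubles with every vertex, exact enumeration is hopeless for image-sized lattices, which is why approximations or MCMC are needed.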

Assessing Significance In Finite Mixture Models, 2019 Iowa State University

#### Assessing Significance In Finite Mixture Models, Ranjan Maitra, Volodymyr Melnykov

*Ranjan Maitra*

A new method is proposed to quantify significance in finite mixture models. The basis for this new methodology is an approach that calculates the p-value for testing a simpler model against a more complicated one in a way that is able to obviate the failure of regularity conditions for likelihood ratio tests. The developed testing procedure allows for pairwise comparison of any two mixture models with failure to reject the null hypothesis implying insignificant likelihood improvement under the more complex model. This leads to a comprehensive tool called a quantitation map which displays significance and quantitatively summarizes all model comparisons ...

Characterizing The Tails Of Degree Distributions In Real-World Networks, 2019 University of Colorado, Boulder

#### Characterizing The Tails Of Degree Distributions In Real-World Networks, Anna Broido

*Applied Mathematics Graduate Theses & Dissertations*

This is a thesis about how to characterize the statistical structure of the tails of degree distributions of real-world networks. The primary contribution is a statistical test of the prevalence of scale-free structure in real-world networks. A central claim in modern network science is that real-world networks are typically "scale free," meaning that the fraction of nodes with degree k follows a power law, decaying like k^{-a}, often with 2 < a < 3. However, empirical evidence for this belief derives from a relatively small number of real-world networks. In the first section, we test the universality of scale-free structure by applying state-of-the-art statistical tools to a large corpus of nearly 1000 network data sets drawn from social, biological, technological, and informational sources. We fit the power-law model to each degree distribution, test its statistical plausibility, and compare it via a likelihood ratio test to alternative, non-scale-free models, e.g., the log-normal. Across domains, we find that scale-free networks are rare, with only 4% exhibiting the strongest-possible evidence of scale-free structure and 52% exhibiting the weakest-possible evidence. Furthermore, evidence of scale-free structure is not uniformly distributed across sources: social networks are at best weakly scale free, while a handful of technological and biological networks can be called strongly scale free. These results undermine the universality of scale-free networks and reveal that real-world networks exhibit a rich structural diversity that will likely require new ideas and mechanisms to explain.
A core methodological component of addressing the ubiquity of scale-free structure in real-world networks is an ability to fit a power law to the degree distribution. In the second section, we numerically evaluate and compare, using both synthetic data with known structure and real-world data with unknown structure, two statistically principled methods for estimating the tail parameters of power-law distributions. We show that, in practice, a method based on extreme value theory and a sophisticated bootstrap and the more commonly used method based on an empirical minimization approach exhibit similar accuracy.
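As a rough illustration of what fitting a power law to a distribution tail involves, the sketch below applies the standard maximum-likelihood estimator for a continuous power law with a known lower cutoff. It is not the thesis's procedure: the function names, the fixed `x_min`, and the continuous approximation to integer-valued degrees are all assumptions made for the example.

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Draw n values with density proportional to x**(-alpha) for x >= x_min.

    Uses inverse-transform sampling; 1 - rng.random() lies in (0, 1].
    """
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_power_law_alpha(values, x_min):
    """Maximum-likelihood exponent for the tail x >= x_min (x_min assumed known).

    Returns (alpha_hat, standard_error), where
    alpha_hat = 1 + n / sum(log(x_i / x_min)).
    """
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0.0:
        raise ValueError("need tail observations strictly above x_min")
    alpha_hat = 1.0 + len(tail) / log_sum
    std_err = (alpha_hat - 1.0) / math.sqrt(len(tail))
    return alpha_hat, std_err


if __name__ == "__main__":
    rng = random.Random(0)
    data = sample_power_law(20000, alpha=2.5, x_min=1.0, rng=rng)
    print(fit_power_law_alpha(data, x_min=1.0))
```

In practice the cutoff is unknown and must itself be estimated, which is the step where the two methods compared in the thesis differ.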

Leveraging Reviews To Improve User Experience, 2019 Southern Methodist University

#### Leveraging Reviews To Improve User Experience, Anthony Schams, Iram Bakhtiar, Cristina Stanley

*SMU Data Science Review*

In this paper, we will explore and present a method of finding characteristics of a restaurant using its reviews through machine learning algorithms. We begin by building models to predict the ratings of individual reviews using text and categorical features. This is to examine the efficacy of the algorithms to the task. Both XGBoost and logistic regression will be examined. With these models, our goal is then to identify key phrases in reviews that are correlated with positive and negative experience. Our analysis makes use of review data publicly made available by Yelp. Key bigrams extracted were non-specific to the ...

Repairing Landsat Satellite Imagery Using Deep Machine Learning Techniques, 2019 SMU

#### Repairing Landsat Satellite Imagery Using Deep Machine Learning Techniques, Griffin J. Lane, Patricia Goresen, Robert Slater

*SMU Data Science Review*

Satellite Imagery is one of the most widely used sources to analyze geographic features and environments in the world. The data gathered from satellites are used to quantify many vital problems facing our society, such as the impact of natural disasters, shore erosion, rising water levels, and urban growth rates. In this paper, we construct machine learning and deep learning algorithms for repairing anomalies in the Landsat satellite imagery data which arise for various reasons ranging from cloud obstruction to satellite malfunctions. The accuracy of GIS data is crucial to ensuring the models produced from such data are as close ...

Visualization And Machine Learning Techniques For NASA’s EM-1 Big Data Problem, 2019 Southern Methodist University

#### Visualization And Machine Learning Techniques For NASA’s EM-1 Big Data Problem, Antonio P. Garza III, Jose Quinonez, Misael Santana, Nibhrat Lohia

*SMU Data Science Review*

In this paper, we help NASA solve three Exploration Mission-1 (EM-1) challenges: data storage, computation time, and visualization of complex data. NASA is studying one year of trajectory data (about 90 TB) to determine available launch opportunities. We improve data storage by introducing a cloud-based solution that provides elasticity and server upgrades. This migration will save $120k in infrastructure costs every four years, and potentially avoid schedule slips. Additionally, it increases computational efficiency by 125%. We further enhance computation via machine learning techniques that use the classic orbital elements to predict valid trajectories. Our machine learning model decreases trajectory ...

Leveraging Natural Language Processing Applications And Microblogging Platform For Increased Transparency In Crisis Areas, 2019 Southern Methodist University

#### Leveraging Natural Language Processing Applications And Microblogging Platform For Increased Transparency In Crisis Areas, Ernesto Carrera-Ruvalcaba, Johnson Ekedum, Austin Hancock, Ben Brock

*SMU Data Science Review*

Through microblogging applications, such as Twitter, people actively document their lives even in times of natural disasters such as hurricanes and earthquakes. While first responders and crisis-teams are able to help people who call 911, or arrive at a designated shelter, there are vast amounts of information being exchanged online via Twitter that provide real-time, location-based alerts that are going unnoticed. To effectively use this information, the Tweets must be verified for authenticity and categorized to ensure that the proper authorities can be alerted. In this paper, we create a Crisis Message Corpus from geotagged Tweets occurring during 7 hurricanes ...

An Extended Laplace Approximation Method For Bayesian Inference Of Self-Exciting Spatial-Temporal Models Of Count Data, 2019 Iowa State University

#### An Extended Laplace Approximation Method For Bayesian Inference Of Self-Exciting Spatial-Temporal Models Of Count Data, Nicholas J. Clark, Philip M. Dixon

*Philip Dixon*

Self-exciting models are statistical models of count data where the probability of an event occurring is influenced by the history of the process. In particular, self-exciting spatio-temporal models allow for spatial dependence as well as temporal self-excitation. For large spatial or temporal regions, however, the model leads to an intractable likelihood. An increasingly common method for dealing with large spatio-temporal models is by using Laplace approximations (LA). This method is convenient as it can easily be applied and is quickly implemented. However, as we will demonstrate in this manuscript, when applied to self-exciting Poisson spatial-temporal models, Laplace Approximations ...

Assessing The Impacts Of Time-To-Detection Distribution Assumptions On Detection Probability Estimation, 2019 Iowa State University

#### Assessing The Impacts Of Time-To-Detection Distribution Assumptions On Detection Probability Estimation, Adam Martin-Schwarze, Jarad Niemi, Philip Dixon

*Philip Dixon*

Abundance estimates from animal point-count surveys require accurate estimates of detection probabilities. The standard model for estimating detection from removal-sampled point-count surveys assumes that organisms at a survey site are detected at a constant rate; however, this assumption can often lead to biased estimates. We consider a class of N-mixture models that allows for detection heterogeneity over time through a flexibly defined time-to-detection distribution (TTDD) and allows for fixed and random effects for both abundance and detection. Our model is thus a combination of survival time-to-event analysis with unknown-N, unknown-p abundance estimation. We specifically explore two-parameter families of TTDDs, e ...

Modeling And Estimation For Self-Exciting Spatio-Temporal Models Of Terrorist Activity, 2019 Iowa State University

#### Modeling And Estimation For Self-Exciting Spatio-Temporal Models Of Terrorist Activity, Nicholas J. Clark, Philip M. Dixon

*Philip Dixon*

Spatio-temporal hierarchical modeling is an extremely attractive way to model the spread of crime or terrorism data over a given region, especially when the observations are counts and must be modeled discretely. The spatio-temporal diffusion is placed, as a matter of convenience, in the process model allowing for straightforward estimation of the diffusion parameters through Bayesian techniques. However, this method of modeling does not allow for the existence of self-excitation, or a temporal data model dependency, that has been shown to exist in criminal and terrorism data. In this manuscript we will use existing theories on how violence spreads to ...

Comparisons Of Prediction Equations For Estimating Energy Expenditure In Youth, 2019 Iowa State University

#### Comparisons Of Prediction Equations For Estimating Energy Expenditure In Youth, Youngwon Kim, Scott E. Crouter, Jung-Min Lee, Philip M. Dixon, Glenn A. Gaesser, Gregory J. Welk

*Philip Dixon*

### Objectives

The purpose of this study was to compare the validity of Actigraph 2-regression models (2RM) and 1-regression models (1RM) for estimating energy expenditure (EE) in children.

### Design

The study used a cross-sectional design with criterion estimates from a metabolic cart.

### Methods

A total of 59 children (7–13 yrs) performed 12 activities (randomly selected from a set of 24 activities) for 5 min each, while being concurrently measured with an Actigraph GT3X and indirect calorimetry. METRMR (MET considering one's resting metabolic rate) for the GT3X was estimated applying 2RM with vector magnitude (VM2RM) and vertical axis (VA2RM), and four ...

Time Series Analysis: Forecasting Treasury Bill Interest Rates, 2019 Murray State University

#### Time Series Analysis: Forecasting Treasury Bill Interest Rates, Nadine P. Innes

*Honors College Theses*

A Treasury Bill is a short-term investment, typically with a maturity of 12 months or less, that is backed by the Treasury Department of the United States government. Rates of return for Treasury Bills change constantly as demand from borrowers and supply from lenders shift. This study seeks to forecast rates for Treasury Bills that mature in 3 months. Since actuaries employ their knowledge of mathematics and statistical methods to analyze the likelihood of future events and their possible financial repercussions, having a projection of future treasury bill rates can provide guidance to ...

Predictive Distributions Via Filtered Historical Simulation For Financial Risk Management, 2019 Utah State University

#### Predictive Distributions Via Filtered Historical Simulation For Financial Risk Management, Tyson Clark

*All Graduate Plan B and other Reports*

Filtered historical simulation with an underlying GARCH process can be used as a valuable tool in VaR analysis, as it derives risk estimates that are sensitive to the distributional properties of the historical data of the produced predictive density. I examine the applications to risk analysis that filtered historical simulation can provide, as well as an interpretation of the predictive density as a poor man’s Bayesian posterior distribution. The predictive density allows us to make associated probabilistic statements regarding the results for VaR analysis, giving greater measurement of risk and the ability to maintain the optimal level of risk ...
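The general idea of filtered historical simulation can be sketched as follows: filter the returns through a GARCH(1,1) variance recursion, resample the standardized residuals, and rescale them by the next-period volatility forecast to obtain a predictive distribution and a VaR quantile. This is only an illustrative sketch, not the report's implementation: the GARCH parameters are passed in as fixed, assumed values rather than estimated, and all function names are hypothetical.

```python
import math
import random


def garch_filter(returns, omega, alpha, beta):
    """Run sigma2[t+1] = omega + alpha * r[t]**2 + beta * sigma2[t].

    Starts from the sample second moment; the returned list has one extra
    element, the one-step-ahead variance forecast.
    """
    sigma2 = [sum(r * r for r in returns) / len(returns)]
    for r in returns:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


def fhs_value_at_risk(returns, omega, alpha, beta, level=0.99, n_sims=10000, rng=None):
    """One-day VaR from bootstrapped standardized residuals (parameters assumed known)."""
    rng = rng or random.Random()
    sigma2 = garch_filter(returns, omega, alpha, beta)
    # Standardize each return by its own conditional volatility.
    residuals = [r / math.sqrt(s2) for r, s2 in zip(returns, sigma2)]
    sigma_next = math.sqrt(sigma2[-1])
    # Predictive draws: resampled residuals rescaled by the volatility forecast.
    simulated = sorted(sigma_next * rng.choice(residuals) for _ in range(n_sims))
    return -simulated[int((1.0 - level) * n_sims)]


if __name__ == "__main__":
    rng = random.Random(1)
    fake_returns = [rng.gauss(0.0, 0.01) for _ in range(500)]  # synthetic data
    print(fhs_value_at_risk(fake_returns, 1e-6, 0.05, 0.90, rng=rng))
```

The sorted `simulated` list plays the role of the predictive density discussed in the abstract; any quantile or tail probability can be read from it.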

Deep Neural Network Architectures For Music Genre Classification, 2019 University of San Francisco

#### Deep Neural Network Architectures For Music Genre Classification, Kai Middlebrook, Shyam Sudhakaran, Kunal Sonar, David Guy Brizan

*Creative Activity and Research Day - CARD*

With the recent advancements in technology, many tasks in fields such as computer vision, natural language processing, and signal processing have been solved using deep learning architectures. In the audio domain, these architectures have been used to learn musical features of songs to predict moods, genres, and instruments. In the case of genre classification, deep learning models were applied to popular datasets, which are explicitly chosen to represent their genres, and achieved state-of-the-art results. However, these results have not been reproduced on less refined datasets. To this end, we introduce an un-curated dataset which contains genre labels and 30-second audio previews for ...

Combining Survey And Non-Survey Data For Improved Sub-Area Prediction Using A Multi-Level Model, 2019 Iowa State University

#### Combining Survey And Non-Survey Data For Improved Sub-Area Prediction Using A Multi-Level Model, Jae Kwang Kim, Zhonglei Wang, Zhengyuan Zhu, Nathan B. Cruze

*Zhengyuan Zhu*

Combining information from different sources is an important practical problem in survey sampling. Using a hierarchical area-level model, we establish a framework to integrate auxiliary information to improve state-level area estimates. The best predictors are obtained by the conditional expectations of latent variables given observations, and an estimate of the mean squared prediction error is discussed. Sponsored by the National Agricultural Statistics Service of the US Department of Agriculture, the proposed model is applied to the planted crop acreage estimation problem by combining information from three sources, including the June Area Survey obtained by a probability-based sampling of lands, administrative ...

The Importance Of Geographic And Biological Variables In Predicting The Naturalization Of Non-Native Woody Plants In The Upper Midwest, 2019 Iowa State University

#### The Importance Of Geographic And Biological Variables In Predicting The Naturalization Of Non-Native Woody Plants In The Upper Midwest, Mark P. Widrlechner, Emily J. Kapler, Philip M. Dixon, Janette R. Thompson

*Janette R. Thompson*

The selection, introduction, and cultivation of non-native woody plants beyond their native ranges can have great benefits, but also unintended consequences. Among these consequences is the tendency for some species to naturalize and become invasive pests in new environments to which they were introduced. In lieu of lengthy and costly field trials, risk-assessment models can be used to predict the likelihood of naturalization. We compared the relative performance of five established risk-assessment models on species datasets from two previously untested areas: southern Minnesota and northern Missouri. Model classification rates ranged from 64.2 to 90.5%, biologically significant errors ranged ...