Open Access. Powered by Scholars. Published by Universities.®

Applied Statistics Commons

Doctoral Dissertations

Articles 1 - 30 of 47

Full-Text Articles in Applied Statistics

Nonparametric Derivative Estimation Using Penalized Splines: Theory And Application, Bright Antwi Boasiako Nov 2023

This dissertation is in the field of nonparametric derivative estimation using penalized splines. It is conducted in two parts. In the first part, we study the L2 convergence rates of estimating derivatives of mean regression functions using penalized splines. In 1982, Stone provided the optimal rates of convergence for estimating derivatives of mean regression functions using nonparametric methods. Using these rates, Zhou et al. in their 2000 paper showed that the MSE of derivative estimators based on regression splines approaches zero at the optimal rate of convergence. Also, in 2019, Xiao showed that, under some general conditions, penalized spline estimators …


Better Understanding Genomic Architecture With The Use Of Applied Statistics And Explainable Artificial Intelligence, Jonathon C. Romero Aug 2022

With the continuous improvements in biological data collection, new techniques are needed to better understand the complex relationships in genomic and other biological data sets. Explainable Artificial Intelligence (X-AI) techniques like Iterative Random Forest (iRF) excel at finding interactions within data, such as genomic epistasis. Here, the introduction of new methods to mine for these complex interactions is shown in a variety of scenarios. The application of iRF as a method for Genomic Wide Epistasis Studies shows that the method is robust in finding interacting sets of features in synthetic data, without requiring the exponentially increasing computation time of many …


Sparse Model Selection Using Information Complexity, Yaojin Sun May 2022

This dissertation studies and uses the application of information complexity to statistical model selection through three different projects. Specifically, we design statistical models that incorporate sparsity features to make the models more explanatory and computationally efficient.

In the first project, we propose a Sparse Bridge Regression model for variable selection when the number of variables is much greater than the number of observations and the model may be misspecified. The model is demonstrated to have excellent explanatory power in high-dimensional data analysis through numerical simulations and real-world data analysis.

The second project proposes a novel hybrid modeling method that utilizes a mixture …


Monitoring Mammals At Multiple Scales: Case Studies From Carnivore Communities, Kadambari Devarajan Oct 2021

Carnivores are distributed widely and threatened by habitat loss, poaching, climate change, and disease. They are considered integral to ecosystem function through their direct and indirect interactions with species at different trophic levels. Given the importance of carnivores, it is of high conservation priority to understand the processes driving carnivore assemblages in different systems. It is thus essential to determine the abiotic and biotic drivers of carnivore community composition at different spatial scales and address the following questions: (i) What factors influence carnivore community composition and diversity? (ii) How do the factors influencing carnivore communities vary across spatial and temporal …


Using Generalizability And Rasch Measurement Theory To Ensure Rigorous Measurement In An International Development Education Evaluation, Louise Bahry Oct 2021

Between the United States and Great Britain, over 30 billion USD was spent in 2018 on international aid, over a billion of which was dedicated to education programs alone. Recently, there has been increased attention on the rigorous evaluation of aid-funded programs, moving beyond counting outputs to the measurement of educational impact. The current study uses two methodological approaches (Generalizability (Brennan, 1992, 2001) and Rasch Measurement Theory (Andrich, 1978; Rasch, 1980; Wright & Masters, 1982)) to analyze data from math and literacy assessments, and self-report surveys used in an international evaluation of an educational initiative in the Democratic Republic of …


Model-Free Descriptive Modeling For Multivariate Categorical Data With An Ordinal Dependent Variable, Li Wang Jul 2021

In the process of statistical modeling, descriptive modeling plays an essential role in accelerating the formulation of plausible hypotheses in the subsequent explanatory modeling and facilitating the selection of potential variables in the subsequent predictive modeling. In particular, for multivariate categorical data analysis, it is desirable to use descriptive modeling methods to uncover and summarize the potential association structure among multiple categorical variables in a compact manner. However, many classical methods in this case either rely on strong assumptions for parametric models or become infeasible when the data dimension is high. To this end, we propose a model-free method …


Motor Control-Based Assessment Of Therapy Effects In Individuals Post-Stroke: Implications For Prediction Of Response And Subject-Specific Modifications, Ashley Rice May 2021

Producing a coordinated motion such as walking is, at its root, the result of healthy communication pathways between the central nervous system and the musculoskeletal system. The central nervous system produces an electrical signal responsible for the excitation of a muscle, and the musculoskeletal system contains the necessary equipment for producing a movement-driving force to achieve a desired motion. Motor control refers to the ability an individual has to produce a desired motion, and the complexity of motor control is a mathematical concept stemming from how the electrical signals from the central nervous system translate to muscle activations. Exercising a …


Geometric Representation Learning, Luke Vilnis Apr 2021

Vector embedding models are a cornerstone of modern machine learning methods for knowledge representation and reasoning. These methods aim to turn semantic questions into geometric questions by learning representations of concepts and other domain objects in a lower-dimensional vector space. In that spirit, this work advocates for density- and region-based representation learning. Embedding domain elements as geometric objects beyond a single point enables us to naturally represent breadth and polysemy, make asymmetric comparisons, answer complex queries, and provides a strong inductive bias when labeled data is scarce. We present a model for word representation using Gaussian densities, enabling asymmetric entailment …


Root Stage Distributions And Their Importance In Plant-Soil Feedback Models, Tyler Poppenwimer Dec 2020

Roots are fundamental to plant-soil feedbacks (PSFs), being a key mediator of these feedbacks by interacting with and affecting the soil environment and soil microbial communities. However, most PSF models aggregate roots into a homogeneous component or only implicitly simulate roots via functions. Roots are not homogeneous and root traits (nutrient and water uptake, turnover rate, respiration rate, mycorrhizal colonization, etc.) vary with age, branch order, and diameter. Trait differences among a plant’s roots lead to variation in root function and roots can be disaggregated according to their function. The impact on plant growth and resource cycling of changes in the distribution …


Bayesian Topological Machine Learning, Christopher A. Oballe Aug 2020

Topological data analysis encompasses a broad set of ideas and techniques that address 1) how to rigorously define and summarize the shape of data, and 2) how to use these constructs for inference. This dissertation addresses the second problem by developing new inferential tools for topological data analysis and applying them to solve real-world data problems. First, a Bayesian framework to approximate probability distributions of persistence diagrams is established. The key insight underpinning this framework is that persistence diagrams may be viewed as Poisson point processes with prior intensities. With this assumption in hand, one may compute posterior intensities by adopting techniques …


Interacting Effects Of Climate And Biotic Factors On Mesocarnivore Distribution And Snowshoe Hare Demography Along The Boreal-Temperate Ecotone, Alexej P. Siren Jul 2020

The motivation of my dissertation research was to understand the influence of climate and biotic factors on range limits with a focus on winter-adapted species, including the Canada lynx (Lynx canadensis), American marten (Martes americana), and snowshoe hare (Lepus americanus). I investigated range dynamics along the boreal-temperate ecotone of the northeastern US. Through an integrative literature review, I developed a theoretical framework building from existing thinking on range limits and ecological theory. I used this theory for my second chapter to evaluate direct and indirect causes of carnivore range limits in the northeastern US, …


Latent Class Models For At-Risk Populations, Shuaimin Kang Jul 2020

Clustering Network Tree Data From Respondent-Driven Sampling With Application to Opioid Users in New York City

There is great interest in finding meaningful subgroups of attributed network data. There are many available methods for clustering complete networks. Unfortunately, much network data is collected through sampling, and is therefore incomplete. Respondent-driven sampling (RDS) is a widely used method for sampling hard-to-reach human populations based on tracing links in the underlying unobserved social network. The resulting data therefore have tree structure representing a sub-sample of the network, along with many nodal attributes. In this paper, we introduce an approach to adjust mixture models …


Allocative Poisson Factorization For Computational Social Science, Aaron Schein Jul 2019

Social science data often comes in the form of high-dimensional discrete data such as categorical survey responses, social interaction records, or text. These data sets exhibit high degrees of sparsity, missingness, overdispersion, and burstiness, all of which present challenges to traditional statistical modeling techniques. The framework of Poisson factorization (PF) has emerged in recent years as a natural way to model high-dimensional discrete data sets. This framework assumes that each observed count in a data set is a Poisson random variable $y \sim \mathrm{Pois}(\mu)$ whose rate parameter $\mu$ is a function of shared model parameters. This thesis examines a specific …
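
The Poisson factorization assumption described in this abstract can be illustrated with a minimal sketch. It assumes the simplest form of the idea, in which the rate of each count is an inner product of non-negative row and column factors; the function name, the gamma-distributed factors, and the matrix sizes are hypothetical choices made for illustration and are not taken from the thesis.

```python
import numpy as np

def sample_poisson_factorization(n_rows, n_cols, n_factors, seed=0):
    """Draw a count matrix with counts[i, j] ~ Pois(mu[i, j]), where mu = theta @ phi."""
    rng = np.random.default_rng(seed)
    # Non-negative shared parameters; the gamma draws are an illustrative assumption.
    theta = rng.gamma(shape=0.5, scale=1.0, size=(n_rows, n_factors))
    phi = rng.gamma(shape=0.5, scale=1.0, size=(n_factors, n_cols))
    # Each rate is a function of the shared parameters (here, an inner product).
    mu = theta @ phi
    counts = rng.poisson(mu)
    return counts, mu

counts, mu = sample_poisson_factorization(n_rows=20, n_cols=30, n_factors=3)
# Small rates tend to produce many zero counts, i.e. a sparse count matrix.
print(counts.shape, float((counts == 0).mean()))
```

Because the factors are shared across rows and columns, a small number of parameters determines every rate in the matrix.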


Population Viability And Connectivity Of The Federally Threatened Eastern Indigo Snake In Central Peninsular Florida, Javan Bauder Mar 2019

Understanding the factors influencing the likelihood of persistence of real-world populations requires not only an accurate understanding of the traits and behaviors of individuals within those populations (e.g., movement, habitat selection, survival, fecundity, dispersal) but also an understanding of how those traits and behaviors are influenced by landscape features. The federally threatened eastern indigo snake (EIS, Drymarchon couperi) has declined throughout its range primarily due to anthropogenically-induced habitat loss and fragmentation, making spatially-explicit assessments of population viability and connectivity essential for understanding its current status and directing future conservation efforts. The primary goal of my dissertation was to understand how …


Essays In Financial Economics: Announcement Effects In Fixed Income Markets, James J. Forest Oct 2018

Ph.D. in Finance, May 2018. James J. Forest: B.A., Framingham State University; M.S., Northeastern University; Ph.D., University of Massachusetts Amherst. Directed by: Professor Hossein B. Kazemi.

This dissertation demonstrates the use of empirical techniques for dealing with modeling issues that arise when analyzing announcement effects in fixed income markets. It describes empirical challenges in achieving unbiased and efficient parameter estimates and shows the importance of modelling a wide range of macroeconomic announcement effects to avoid omitted variable bias. Employing techniques common in Macroeconomics, financial market researchers are better …


Deep Energy-Based Models For Structured Prediction, David Belanger Nov 2017

We introduce structured prediction energy networks (SPENs), a flexible framework for structured prediction. A deep architecture is used to define an energy function over candidate outputs and predictions are produced by gradient-based energy minimization. This deep energy captures dependencies between labels that would lead to intractable graphical models, and allows us to automatically discover discriminative features of the structured output. Furthermore, practitioners can explore a wide variety of energy function architectures without having to hand-design prediction and learning methods for each model. This is because all of our prediction and learning methods interact with the energy …
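
Prediction by gradient-based energy minimization can be sketched on a toy problem. The example below is only an illustration under stated assumptions: it replaces the deep energy network with a hand-written quadratic energy, relaxes binary labels to the interval [0, 1], and uses projected gradient descent. The names `energy` and `predict` and all numeric values are hypothetical.

```python
import numpy as np

def energy(y, unary, pairwise):
    """Toy energy: per-label scores plus a term coupling pairs of labels."""
    return unary @ y + 0.5 * y @ pairwise @ y

def predict(unary, pairwise, steps=200, lr=0.05):
    """Minimize the toy energy over relaxed labels y in [0, 1]^L."""
    y = np.full(unary.shape[0], 0.5)
    for _ in range(steps):
        # Gradient of the toy energy; valid because `pairwise` is symmetric.
        grad = unary + pairwise @ y
        # Gradient step followed by projection back onto the box [0, 1]^L.
        y = np.clip(y - lr * grad, 0.0, 1.0)
    return y

# Label 0 is favored on its own, label 2 is disfavored, and the negative
# coupling between labels 0 and 1 rewards switching them on together.
unary = np.array([-1.0, 0.2, 1.0])
pairwise = np.array([[0.0, -1.0, 0.0],
                     [-1.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0]])
y_hat = predict(unary, pairwise)
print(y_hat, energy(y_hat, unary, pairwise))
```

Label 1 is switched on despite its positive unary score, because the coupling with label 0 lowers the total energy; this is the kind of label dependency an energy function can express.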


Spatiotemporal Subspace Feature Tracking By Mining Discriminatory Characteristics, Richard D. Appiah Oct 2017

Recent advancements in data collection technologies have made it possible to collect heterogeneous data at complex levels of abstraction, and at an alarming pace and volume. Data mining and, most recently, data science seek to discover hidden patterns and insights from these data by employing a variety of knowledge discovery techniques. At the core of these techniques is the selection and use of features, variables or properties upon which the data were acquired to facilitate effective data modeling. Selecting relevant features in data modeling is critical to ensure overall model accuracy and optimal predictive performance of future effects. The …


Statistical Methods On Risk Management Of Extreme Events, Zijing Zhang Jul 2017

The goal of the dissertation is the investigation of financial risk analysis methodologies, using the schemes for extreme value modeling as well as techniques from copula modeling. Extreme value theory is concerned with probabilistic and statistical questions related to unusual behavior or rare events. The subject has a rich mathematical theory and also a long tradition of applications in a variety of areas. We are interested in its application in risk management, with a focus on estimating and forecasting the Value-at-Risk of financial time series data. Extremal data are inherently scarce, thus making inference challenging. In order to obtain …


Inference In Networking Systems With Designed Measurements, Chang Liu Mar 2017

Networking systems, consisting of network infrastructures and end-hosts, have been essential in supporting our daily communication, delivering huge amounts of content and large numbers of services, and providing large-scale distributed computing. To monitor and optimize the performance of such networking systems, or to provide flexible functionalities for the applications running on top of them, it is important to know the internal metrics of the networking systems such as link loss rates or path delays. The internal metrics are often not directly available due to the scale and complexity of the networking systems. This motivates the techniques of inference …


A Study Of Mathematics Achievement, Placement, And Graduation Of Engineering Students, Sara Hahler Blazek Jan 2017

The purpose of this study was to determine how background knowledge impacts freshmen engineering students' success at Louisiana Tech University in terms of grades in two different freshman classes and graduation. To determine what factors impact students, three different studies were implemented. The first study used linear regression to analyze which demographic and academic variables significantly impacted freshman math and engineering courses. Using regression discontinuity, the second study determined if the university's placement requirement for Pre-Calculus was appropriate. The final study analyzed factors that impact graduation for engineering students as well as other disciplines to determine which significant variables were …


Development And Validation Of The Statistics Assessment Of Graduate Students, Dammika Lakmal Walpitage Dec 2016

This study developed the Statistics Assessment of Graduate Students (SAGS) instrument, and established its preliminary item characteristics, reliability, and validity evidence. Even though there are a limited number of assessments available for measuring different aspects of statistical cognition, these previously available assessments have numerous limitations. The SAGS instrument was developed using a Rasch modeling approach to create a new measure of statistical research methodology knowledge of graduate students in education and other behavioral and social sciences. Thirty-five multiple-choice questions were written with stems representing applied research situations and response options distinguishing between appropriate use of various statistical tests or procedures. A focus …


Variable Selection Via Penalized Regression And The Genetic Algorithm Using Information Complexity, With Applications For High-Dimensional -Omics Data, Tyler J. Massaro Aug 2016

This dissertation is a collection of examples, algorithms, and techniques for researchers interested in selecting influential variables from statistical regression models. Chapters 1, 2, and 3 provide background information that will be used throughout the remaining chapters, on topics including but not limited to information complexity, model selection, covariance estimation, stepwise variable selection, penalized regression, and especially the genetic algorithm (GA) approach to variable subsetting.

In chapter 4, we fully develop the framework for performing GA subset selection in logistic regression models. We present advantages of this approach against stepwise and elastic net regularized regression in selecting variables from a …


Advanced Sequential Monte Carlo Methods And Their Applications To Sparse Sensor Network For Detection And Estimation, Kai Kang Aug 2016

General state space models present a flexible framework for modeling dynamic systems and therefore have vast applications in many disciplines such as engineering, economics, biology, etc. However, optimal estimation problems of non-linear non-Gaussian state space models are analytically intractable in general. Sequential Monte Carlo (SMC) methods have become a very popular class of simulation-based methods for the solution of optimal estimation problems. The advantage of SMC methods in comparison with classical filtering methods such as the Kalman filter and the extended Kalman filter is that they are able to handle non-linear non-Gaussian scenarios without relying on any local linearization techniques. In this …
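
To make the SMC idea concrete, here is a minimal sketch of a bootstrap particle filter, one of the simplest SMC methods. It is not the method developed in the dissertation: the toy non-linear state space model, the noise variances, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def transition(x, t):
    """Hypothetical non-linear state dynamics (noise is added by the caller)."""
    return 0.5 * x + 25.0 * x / (1.0 + x**2) + 8.0 * np.cos(1.2 * t)

def bootstrap_particle_filter(observations, n_particles=500, seed=0):
    """Return the filtered posterior mean of the state at each time step."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 1.0, size=n_particles)
    means = []
    for t, y in enumerate(observations, start=1):
        # Propagate every particle through the dynamics; no linearization needed.
        noise = rng.normal(0.0, np.sqrt(10.0), size=n_particles)
        particles = transition(particles, t) + noise
        # Weight by the assumed observation model y ~ N(x^2 / 20, 1).
        log_w = -0.5 * (y - particles**2 / 20.0) ** 2
        w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
        w /= w.sum()
        means.append(float(np.sum(w * particles)))
        # Multinomial resampling concentrates particles on likely states.
        particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
    return np.array(means)

# Simulate 50 observations from the same toy model, then filter them.
rng = np.random.default_rng(1)
x, observations = 0.1, []
for t in range(1, 51):
    x = transition(x, t) + rng.normal(0.0, np.sqrt(10.0))
    observations.append(x**2 / 20.0 + rng.normal(0.0, 1.0))
estimates = bootstrap_particle_filter(observations)
print(estimates.shape)
```

The filter only needs to simulate the dynamics and evaluate the observation likelihood, which is why it applies to non-linear, non-Gaussian models where the Kalman filter does not.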


Wind Power Capacity Value Metrics And Variability: A Study In New England, Frederick W. Letson Nov 2015

Capacity value is the contribution of a power plant to the ability of the power system to meet high demand. As wind power penetration in New England, and worldwide, increases so does the importance of identifying the capacity contribution made by wind power plants. It is critical to accurately characterize the capacity value of these wind power plants and the variability of the capacity value over the long term. This is important in order to avoid the cost of keeping extra power plants operational while still being able to cover the demand for power reliably. This capacity value calculation is …


Variable Selection In Single Index Varying Coefficient Models With Lasso, Peng Wang Nov 2015

The single index varying coefficient model is a very attractive statistical model due to its ability to reduce dimensions and its ease of interpretation. There are many theoretical studies and practical applications of it, but typically without variable selection features, and no public software is available for fitting it. Here we propose a new algorithm to fit the single index varying coefficient model, and to carry out variable selection in the index part with LASSO. The core idea is a two-step scheme which alternates between estimating coefficient functions and selecting-and-estimating the single index. Both in simulation and in application to a Geoscience dataset, we …


Evaluating The Effects Of Standardized Patient Care Pathways On Clinical Outcomes, Anna V. Romanova Aug 2015

The main focus of this study is to create a standardized approach to evaluating the impact of the patient care pathways across all major disease categories and key outcome measures in a hospital setting when randomized clinical trials are not feasible. Toward this goal I identify statistical methods, control factors, and adjustments that can correct for potential confounding in observational studies. I investigate the efficiency of existing bias correction methods under varying conditions of imbalanced samples through a Monte Carlo simulation. The simulation results are then utilized in a case study for one of the largest primary diagnosis areas, chronic …


Sensitivity Of Mixed Models To Computational Algorithms Of Time Series Data, Gunaime Nevine Apr 2015

Statistical analysis is influenced by the implementation of the algorithms used to execute the computations associated with various statistical techniques. Over many years, important criteria for model comparison have been studied and examined, and comparisons of two algorithms on a single dataset have been performed numerous times. The goal of this research is not to compare two or more models on one dataset, but to compare models together with the numerical algorithms that have been used to solve them on the same dataset.

In this research, different models are applied and contrasted, and these comparisons are affected by the numerical algorithms in different …


Incorporating Boltzmann Machine Priors For Semantic Labeling In Images And Videos, Andrew Kae Aug 2014

Semantic labeling is the task of assigning category labels to regions in an image. For example, a scene may consist of regions corresponding to categories such as sky, water, and ground, or parts of a face such as eyes, nose, and mouth. Semantic labeling is an important mid-level vision task for grouping and organizing image regions into coherent parts. Labeling these regions allows us to better understand the scene itself as well as properties of the objects in the scene, such as their parts, location, and interaction within the scene. Typical approaches for this task include the conditional random field …


Predicting High-Stakes Tests Of Math Achievement Using A Group-Administered Rti Instrument: Validating Skills Measured By The Monitoring Instructional Responsiveness: Math, Jeremy Thomas Coles Aug 2014

Three universal screeners and nine progress monitoring probes from the Monitoring Instructional Responsiveness: Math (MIR:M), a silent, group-administered math assessment designed for implementation with an RTI Model, were administered to 223 fifth-grade students. The growth parameters of the overall MIR:M composite and two global composites (math calculation and math reasoning) identified significant variation in student growth, within significant linear and quadratic trajectories. However, there were significant differences in the nature of the growth trajectories that have applied educational implications. In addition, growth parameters across the three composites provided significant predictive potential when using the Tennessee Comprehensive Assessment Program (TCAP) Achievement …


Impacts Of Climate Change On The Evolution Of The Electrical Grid, Melissa Ree Allen Aug 2014

Maintaining interdependent infrastructures exposed to a changing climate requires understanding 1) the local impact on power assets; 2) how the infrastructure will evolve as the demand for infrastructure changes location and volume; and 3) what vulnerabilities are introduced by these changing infrastructure topologies. This dissertation attempts to develop a methodology that will a) downscale the direct effect of climate on the infrastructure; b) allow population to redistribute in response to the extreme events that will increase under climate impacts; and c) project new distributions of electricity demand in the mid-21st century.

The research was structured in three parts. The first …