Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

University of Massachusetts Amherst

Articles 1 - 30 of 33

Full-Text Articles in Statistical Models

Nonparametric Derivative Estimation Using Penalized Splines: Theory And Application, Bright Antwi Boasiako Nov 2023

Doctoral Dissertations

This dissertation is in the field of nonparametric derivative estimation using penalized splines. It is conducted in two parts. In the first part, we study the L2 convergence rates of estimating derivatives of mean regression functions using penalized splines. In 1982, Stone provided the optimal rates of convergence for estimating derivatives of mean regression functions using nonparametric methods. Using these rates, Zhou et al. showed in their 2000 paper that the MSE of derivative estimators based on regression splines approaches zero at the optimal rate of convergence. Also, in 2019, Xiao showed that, under some general conditions, penalized spline estimators …
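For orientation, the optimal rate attributed to Stone above is commonly stated as follows. This is a sketch of the standard textbook form, not a quotation from the dissertation; the symbols $n$ (sample size), $d$ (covariate dimension), $p$ (smoothness of the regression function $f$) and $m$ (order of the derivative being estimated) are notation assumed here.

```latex
% Assumed notation: f is p-times smooth on a d-dimensional domain,
% \hat f^{(m)} estimates the m-th derivative from n observations.
\[
  r = \frac{p - m}{2p + d},
  \qquad
  \mathbb{E}\,\bigl\lVert \hat f^{(m)} - f^{(m)} \bigr\rVert_{L_2}^{2}
  = O\!\bigl(n^{-2r}\bigr)
  \quad \text{for a rate-optimal estimator.}
\]
```

Setting $m = 0$ recovers the familiar rate $n^{-2p/(2p+d)}$ for estimating the regression function itself; each additional derivative slows the rate.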


Forecasting Covid-19 With Temporal Hierarchies And Ensemble Methods, Li Shandross Aug 2023

Masters Theses

Infectious disease forecasting efforts underwent rapid growth during the COVID-19 pandemic, providing guidance for pandemic response and insight into potential future trends. Yet despite their importance, short-term forecasting models often struggled to produce accurate real-time predictions of this complex and rapidly changing system. This gap in accuracy persisted into the pandemic and warrants the exploration and testing of new methods to glean fresh insights.

In this work, we examined the application of the temporal hierarchical forecasting (THieF) methodology to probabilistic forecasts of COVID-19 incident hospital admissions in the United States. THieF is an innovative forecasting technique that aggregates time-series data into …


Applications Of Statistical Physics To Ecology: Ising Models And Two-Cycle Coupled Oscillators, Vahini Reddy Nareddy Oct 2022

Doctoral Dissertations

Many ecological systems exhibit noisy period-2 oscillations and, when they are spatially extended, they undergo a phase transition from synchrony to incoherence in the Ising universality class. Period-2 cycles have two possible phases of oscillation and can be represented as the two states of a bistable system. The focus of this thesis is understanding the dynamics of ecological systems by representing their oscillations as bistable states, and developing dynamical models that use tools from statistical physics to predict their future states. As the ecological oscillators with two-cycle behavior undergo phase transitions in the Ising universality class, many features of synchrony and equilibrium …


Methods To Improve Inference From Dependent Network Data, Dongah Kim Feb 2022

Doctoral Dissertations

Over the past decade, network research has increased dramatically. Network data are used in many fields because they contain not only covariates of each observation, but also 'relationships' between observations. Statistical analysis of network data has therefore developed rapidly. However, network data present many challenges, such as collecting the data, inferring the prevalence of an outcome of interest, and performing valid statistical tests on data that are typically highly dependent. The methods discussed in this thesis are developed to improve statistical inference from dependent network data.


Statistical Improvements For Ecological Learning About Spatial Processes, Gaetan L. Dupont Oct 2021

Masters Theses

Ecological inquiry is rooted fundamentally in understanding population abundance, both to develop theory and improve conservation outcomes. Despite this importance, estimating abundance is difficult due to the imperfect detection of individuals in a sample population. Further, accounting for space can provide more biologically realistic inference, shifting the focus from abundance to density and encouraging the exploration of spatial processes. To address these challenges, Spatial Capture-Recapture (“SCR”) has emerged as the most prominent method for estimating density reliably. The SCR model is conceptually straightforward: it combines a spatial model of detection with a point process model of the spatial distribution of …


Monitoring Mammals At Multiple Scales: Case Studies From Carnivore Communities, Kadambari Devarajan Oct 2021

Doctoral Dissertations

Carnivores are distributed widely and threatened by habitat loss, poaching, climate change, and disease. They are considered integral to ecosystem function through their direct and indirect interactions with species at different trophic levels. Given the importance of carnivores, it is of high conservation priority to understand the processes driving carnivore assemblages in different systems. It is thus essential to determine the abiotic and biotic drivers of carnivore community composition at different spatial scales and address the following questions: (i) What factors influence carnivore community composition and diversity? (ii) How do the factors influencing carnivore communities vary across spatial and temporal …


Measurement Invariance Across Immigrant And Non-Immigrant Populations On Pisa Cognitive And Non-Cognitive Scales, Maritza Casas Oct 2021

Doctoral Dissertations

International large-scale educational assessments (ILSAs) have played a relevant role in educational policies targeting immigrant students across countries, as their results are used by governments as input for decision-making purposes. Given the potential impact that ILSAs can have, the psychometric features of these assessments must be carefully assessed, and empirical evidence must be collected about the extent to which the inferences made based on test results are valid. To do so, the first step is to determine if the test results have the same meaning across countries and groups of examinees, that is, if the measures are invariant so that …


Model-Free Descriptive Modeling For Multivariate Categorical Data With An Ordinal Dependent Variable, Li Wang Jul 2021

Doctoral Dissertations

In the process of statistical modeling, descriptive modeling plays an essential role in accelerating the formulation of plausible hypotheses in subsequent explanatory modeling and in facilitating the selection of potential variables in subsequent predictive modeling. In particular, for multivariate categorical data analysis, it is desirable to use descriptive modeling methods to uncover and summarize the potential association structure among multiple categorical variables in a compact manner. However, many classical methods in this case either rely on strong assumptions for parametric models or become infeasible when the data dimension is high. To this end, we propose a model-free method …


Latent Class Models For At-Risk Populations, Shuaimin Kang Jul 2020

Doctoral Dissertations

Clustering Network Tree Data From Respondent-Driven Sampling With Application to Opioid Users in New York City

There is great interest in finding meaningful subgroups of attributed network data. There are many available methods for clustering complete networks. Unfortunately, much network data is collected through sampling, and is therefore incomplete. Respondent-driven sampling (RDS) is a widely used method for sampling hard-to-reach human populations based on tracing links in the underlying unobserved social network. The resulting data therefore have a tree structure representing a sub-sample of the network, along with many nodal attributes. In this paper, we introduce an approach to adjust mixture models …


Bayesian Methods For The Assessment Of Reporting Errors For Data-Sparse Population-Periods With Applications To Estimating Mortality, Emily Peterson Mar 2020

Doctoral Dissertations

Population-level mortality data are often subject to substantial reporting errors due to misclassification of cause of death, misclassification of death status, or age reporting errors. The accuracy of error-prone data sources can be assessed by comparing such data to gold-standard data for the same population-period. We present Bayesian methods for assessing the extent of reporting errors across different population-periods and generalizing those to settings where gold-standard data are lacking. Firstly, we investigate misclassification errors of maternal cause of death reporting in civil registration vital statistics data. We use a Bayesian hierarchical bivariate random-walk model to estimate country-year specific sensitivity …


Allocative Poisson Factorization For Computational Social Science, Aaron Schein Jul 2019

Doctoral Dissertations

Social science data often comes in the form of high-dimensional discrete data such as categorical survey responses, social interaction records, or text. These data sets exhibit high degrees of sparsity, missingness, overdispersion, and burstiness, all of which present challenges to traditional statistical modeling techniques. The framework of Poisson factorization (PF) has emerged in recent years as a natural way to model high-dimensional discrete data sets. This framework assumes that each observed count in a data set is a Poisson random variable $y \sim \text{Pois}(\mu)$ whose rate parameter $\mu$ is a function of shared model parameters. This thesis examines a specific …
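As an illustration of the Poisson factorization assumption described above, the following minimal sketch builds each rate as an inner product of shared non-negative factors and evaluates the Poisson log-likelihood of a small count matrix. The factorization $\mu_{ij} = \sum_k \theta_{ik}\phi_{kj}$ is one common choice assumed here, not necessarily the specific model the thesis examines; the names `pf_rates`, `poisson_log_likelihood`, `theta`, and `phi` and all numbers are hypothetical.

```python
import math

def pf_rates(theta, phi):
    """Rate matrix mu[i][j] = sum_k theta[i][k] * phi[k][j] (assumed form)."""
    n_components = len(phi)
    n_cols = len(phi[0])
    return [
        [sum(row[k] * phi[k][j] for k in range(n_components)) for j in range(n_cols)]
        for row in theta
    ]

def poisson_log_likelihood(counts, rates):
    """Sum over cells of log Pois(y | mu) = y*log(mu) - mu - log(y!)."""
    total = 0.0
    for y_row, mu_row in zip(counts, rates):
        for y, mu in zip(y_row, mu_row):
            total += y * math.log(mu) - mu - math.lgamma(y + 1)
    return total

# Hypothetical toy factors: 2 rows, 2 latent components, 3 columns.
theta = [[1.0, 0.5], [0.2, 2.0]]
phi = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.1]]
counts = [[1, 0, 2], [1, 2, 0]]

rates = pf_rates(theta, phi)
loglik = poisson_log_likelihood(counts, rates)
```

Because every count shares the same factor matrices, fitting such a model amounts to choosing `theta` and `phi` to make `loglik` large; the sketch only evaluates the objective and does not perform inference.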


Population Viability And Connectivity Of The Federally Threatened Eastern Indigo Snake In Central Peninsular Florida, Javan Bauder Mar 2019

Doctoral Dissertations

Understanding the factors influencing the likelihood of persistence of real-world populations requires not only an accurate understanding of the traits and behaviors of individuals within those populations (e.g., movement, habitat selection, survival, fecundity, dispersal) but also an understanding of how those traits and behaviors are influenced by landscape features. The federally threatened eastern indigo snake (EIS, Drymarchon couperi) has declined throughout its range primarily due to anthropogenically induced habitat loss and fragmentation, making spatially explicit assessments of population viability and connectivity essential for understanding its current status and directing future conservation efforts. The primary goal of my dissertation was to understand how …


Quantile Regression For Survival Data With Delayed Entry, Boqin Sun Nov 2018

Doctoral Dissertations

Delayed entry arises frequently in follow-up studies for survival outcomes, where additional study subjects enter during the study period. We propose a quantile regression model to analyze survival data subject to delayed entry and right-censoring. Such a model offers flexibility in assessing covariate effects on the survival outcome, and the regression coefficients are interpretable as direct effects on the event time. Under the conditional independent censoring assumption, we proposed a weighted martingale-based estimating equation and formulated finding its solution as an $\ell_1$-type convex optimization problem, which was solved through a linear programming algorithm. We established uniform consistency and weak convergence of …
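To make the $\ell_1$-type objective concrete, the sketch below shows the standard quantile "check" loss in its simplest setting: no covariates, no censoring, no delayed entry, and no weights. It is not the estimator proposed in the dissertation; the function names and data are hypothetical. Minimizing the summed check loss over a constant recovers a sample quantile, and since a minimizer always lies at a data point, a search over the observations suffices here.

```python
def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def best_constant(ys, tau):
    """Return the observation q minimizing sum_i rho_tau(y_i - q)."""
    return min(ys, key=lambda q: sum(check_loss(y - q, tau) for y in ys))

# Hypothetical uncensored event times.
ys = [2.0, 9.0, 4.0, 7.0, 1.0]

median = best_constant(ys, 0.5)       # tau = 0.5 gives the sample median
upper = best_constant(ys, 0.9)        # tau = 0.9 targets the upper tail
```

The loss is piecewise linear in `q`, which is why the regression version of this problem can be written as a linear program.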


Variational Approximations For Density Deconvolution, Yue Chang Nov 2018

Doctoral Dissertations

This thesis considers the problem of density estimation when the variables of interest are subject to measurement error. The measurement error is assumed to be additive and homoscedastic. We specify the density of interest by a Dirichlet Process Mixture Model and establish variational approximation approaches to the density deconvolution problem. Gaussian and Laplacian error distributions are considered, which are representatives of supersmooth and ordinary smooth distributions, respectively. We develop two variational approximation algorithms for Gaussian error deconvolution and one variational approximation algorithm for Laplacian error deconvolution. Their performances are compared to deconvoluting kernels and the Markov chain Monte Carlo method by …
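The additive, homoscedastic error setup described above can be written out as follows. This is a sketch under assumed notation ($X$ for the variable of interest, $U$ for the error, $W$ for the observation), and the normal mixture components are an assumption made here for illustration; the abstract does not state the component family.

```latex
% Assumed notation: W observed, X latent, U independent error with constant variance.
\[
  W = X + U, \qquad X \perp U, \qquad
  f_W(w) = \int f_X(w - u)\, f_U(u)\, du .
\]
% Illustrative special case (assumption): normal mixture for X, Gaussian error.
\[
  f_X(x) = \sum_{k} \pi_k\, \mathcal{N}\!\left(x \mid \mu_k, \sigma_k^2\right),
  \quad U \sim \mathcal{N}(0, \sigma_u^2)
  \;\Longrightarrow\;
  f_W(w) = \sum_{k} \pi_k\, \mathcal{N}\!\left(w \mid \mu_k, \sigma_k^2 + \sigma_u^2\right).
\]
```

In this special case the error simply inflates each component variance by $\sigma_u^2$, which is why the Gaussian case admits closed-form convolution; a Laplacian error does not combine with normal components this simply.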


Model-Based Predictive Analytics For Additive And Smart Manufacturing, Zhuo Yang Oct 2018

Doctoral Dissertations

Qualification and certification for additive and smart manufacturing systems can be uncertain and very costly. Using available historical data can mitigate some costs of producing and testing sample parts. However, use of such data lacks the flexibility to represent specific new problems, which decreases predictive accuracy and efficiency. To address these compelling needs, this dissertation introduces modeling techniques that can proactively estimate results expected from additive and smart manufacturing processes swiftly and with practical levels of accuracy and reliability. More specifically, this research addresses the current challenges and limitations posed by use of available data and the high …


Essays In Financial Economics: Announcement Effects In Fixed Income Markets, James J. Forest Oct 2018

Doctoral Dissertations

Ph.D. in Finance, May 2018. James J. Forest: B.A., Framingham State University; M.S., Northeastern University; Ph.D., University of Massachusetts Amherst. Directed by: Professor Hossein B. Kazemi.

This dissertation demonstrates the use of empirical techniques for dealing with modeling issues that arise when analyzing announcement effects in fixed income markets. It describes empirical challenges in achieving unbiased and efficient parameter estimates and shows the importance of modeling a wide range of macroeconomic announcement effects to avoid omitted variable bias. Employing techniques common in macroeconomics, financial market researchers are better …


Real-Time Dengue Forecasting In Thailand: A Comparison Of Penalized Regression Approaches Using Internet Search Data, Caroline Kusiak Oct 2018

Masters Theses

Dengue fever affects over 390 million people annually worldwide and is of particular concern in Southeast Asia, where it is one of the leading causes of hospitalization. Modeling trends in dengue occurrence can provide valuable information to public health officials; however, many challenges arise depending on the data available. In Thailand, reporting of dengue cases is often delayed by more than 6 weeks, and a small fraction of cases may not be reported until over 11 months after they occurred. This study shows that incorporating data on Google Search trends can improve disease predictions in settings with severely …


Juvenile River Herring In Freshwater Lakes: Sampling Approaches For Evaluating Growth And Survival, Matthew T. Devine Oct 2017

Masters Theses

River herring, collectively alewives (Alosa pseudoharengus) and blueback herring (A. aestivalis), have experienced substantial population declines over the past five decades due in large part to overfishing, combined with other sources of mortality, and disrupted access to critical freshwater spawning habitats. Anadromous river herring populations are currently assessed by counting adults in rivers during upstream spawning migrations, but no field-based assessment methods exist for estimating juvenile densities in freshwater nursery habitats. Counts of 4-year-old migrating adults are variable and prevent understanding about how mortality acts on different life stages prior to returning to spawn (e.g., juveniles …


Modelling Bird Migration With Motus Data And Bayesian State-Space Models, Justin Baldwin Oct 2017

Masters Theses

Bird migration is a poorly-known yet important phenomenon, as understanding movement patterns of birds can inform conservation strategies and public health policy for animal-borne diseases. Recent advances in wildlife tracking technology, in particular the Motus system, have allowed researchers to track even small flying birds and insects with radio transmitters that weigh fractions of a gram. This system relies on a community-based distributed sensor network that detects tagged animals as they move through the detection nodes on journeys that range from small local movements to intercontinental migrations. The quantity of data generated by the Motus system is unprecedented, is on …


Information Metrics For Predictive Modeling And Machine Learning, Kostantinos Gourgoulias Jul 2017

Doctoral Dissertations

The ever-increasing complexity of the models used in predictive modeling and data science and their use for prediction and inference has made the development of tools for uncertainty quantification and model selection especially important. In this work, we seek to understand the various trade-offs associated with the simulation of stochastic systems. Some trade-offs are computational, e.g., execution time of an algorithm versus accuracy of simulation. Others are analytical: whether or not we are able to find tractable substitutes for quantities of interest, e.g., distributions, ergodic averages, etc. The first two chapters of this thesis deal with the study of the …


Statistical Methods For High Dimensional Data Arising From Large Epidemiological Studies, Hui Xu Jul 2017

Doctoral Dissertations

In this thesis, we propose statistical models for addressing commonly encountered data types and study designs in large epidemiologic investigations aimed at understanding the molecular basis of complex disorders. The motivating applications come from diverse disease areas in Women's Health, including the study of type II diabetes in the Women's Health Initiative (WHI), invasive breast cancer in the Nurses' Health Study and the study of the metabolomic underpinnings of cardiovascular disease in the WHI. We have also put significant effort into making the implementation of the proposed methods accessible through freely available, user-friendly software packages in R. The first chapter …


Inference In Networking Systems With Designed Measurements, Chang Liu Mar 2017

Doctoral Dissertations

Networking systems, which consist of network infrastructures and end-hosts, have been essential in supporting our daily communication, delivering huge amounts of content and large numbers of services, and providing large-scale distributed computing. To monitor and optimize the performance of such networking systems, or to provide flexible functionalities for the applications running on top of them, it is important to know the internal metrics of the networking systems, such as link loss rates or path delays. The internal metrics are often not directly available due to the scale and complexity of the networking systems. This motivates the techniques of inference …


Inference From Network Data In Hard-To-Reach Populations, Isabelle Beaudry Mar 2017

Doctoral Dissertations

The objective of this thesis is to develop methods to make inference about the prevalence of an outcome of interest in hard-to-reach populations. The proposed methods address issues specific to the survey strategies employed to access those populations. One common sampling methodology used in this context is respondent-driven sampling (RDS). Under RDS, the network connecting members of the target population is used to uncover the hidden members. Specialized techniques are then used to make inference from the data collected in this fashion. Our first objective is to correct traditional RDS prevalence estimators and their associated uncertainty estimators for …
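As a point of reference for what a traditional RDS prevalence estimator can look like, the sketch below implements the widely used inverse-degree-weighted form (often called RDS-II or the Volz-Heckathorn estimator). Whether this is the particular estimator the thesis corrects is an assumption; the function name and the toy data are hypothetical.

```python
def rds_ii_prevalence(outcomes, degrees):
    """Inverse-degree-weighted prevalence: sum(y_i / d_i) / sum(1 / d_i).

    outcomes: 0/1 indicator of the outcome for each sampled respondent.
    degrees:  self-reported network degree (number of contacts), all > 0.
    """
    weights = [1.0 / d for d in degrees]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)

# Hypothetical sample: the two respondents with the outcome are well connected.
outcomes = [1, 0, 1, 0]
degrees = [10, 2, 5, 2]

estimate = rds_ii_prevalence(outcomes, degrees)
naive = sum(outcomes) / len(outcomes)
```

The weighting reflects the idea that highly connected people are more likely to be recruited through network links, so they are down-weighted; here the weighted estimate falls below the naive sample proportion for that reason.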


Niche-Based Modeling Of Japanese Stiltgrass (Microstegium Vimineum) Using Presence-Only Information, Nathan Bush Nov 2015

Masters Theses

The Connecticut River watershed is experiencing a rapid invasion of aggressive non-native plant species, which threaten watershed function and structure. Volunteer-based monitoring programs such as the University of Massachusetts’ OutSmart Invasives Species Project, Early Detection Distribution Mapping System (EDDMapS) and the Invasive Plant Atlas of New England (IPANE) have gathered valuable invasive plant data. These programs provide a unique opportunity for researchers to model invasive plant species utilizing citizen-sourced data. This study took advantage of these large data sources to model invasive plant distribution and to determine environmental and biophysical predictors that are most influential in dispersion, and to identify …


Estimation Problems In Complex Field Studies With Deep Interactions: Time-To-Event And Local Regression Models For Environmental Effects On Vital Rates, Krzysztof M. Sakrejda Nov 2015

Doctoral Dissertations

Field studies that measure vital rates in context over extended time periods are a cornerstone of our understanding of population processes. These studies inform us about the relationship between biological process and environmental noise in an irreplaceable way. These data sets bring “big data” and “big model” challenges, which limit the application of standard software (e.g., BUGS). The environmental sensitivity of vital rates is also expected to exhibit interactions and non-linearity, which typically result in difficult model selection questions in large data sets. Finally, long-term ecological data sets often contain complex temporal structure. In commonly applied discrete-time models complex temporal …


Variable Selection In Single Index Varying Coefficient Models With Lasso, Peng Wang Nov 2015

Doctoral Dissertations

The single index varying coefficient model is a very attractive statistical model due to its ability to reduce dimensions and its ease of interpretation. There are many theoretical studies and practical applications of it, but typically without variable selection, and no public software is available for fitting it. Here we propose a new algorithm to fit the single index varying coefficient model and to carry out variable selection in the index part with LASSO. The core idea is a two-step scheme which alternates between estimating coefficient functions and selecting-and-estimating the single index. Both in simulation and in application to a Geoscience dataset, we …


Computational Communication Intelligence: Exploring Linguistic Manifestation And Social Dynamics In Online Communication, Xiaoxi Xu Nov 2014

Doctoral Dissertations

We now live in an age of online communication. As social media becomes an integral part of our life, online communication becomes an essential life skill. In this dissertation, we aim to understand how people effectively communicate online. We research components of success in online communication and present scientific methods to study the skill of effective communication. This research advances the state of the art in machine learning and communication studies. For communication studies, we pioneer the study of a communication phenomenon we call Communication Intelligence in online interactions. We create a theory about communication intelligence that measures participants’ ten high-order …


Incorporating Boltzmann Machine Priors For Semantic Labeling In Images And Videos, Andrew Kae Aug 2014

Doctoral Dissertations

Semantic labeling is the task of assigning category labels to regions in an image. For example, a scene may consist of regions corresponding to categories such as sky, water, and ground, or parts of a face such as eyes, nose, and mouth. Semantic labeling is an important mid-level vision task for grouping and organizing image regions into coherent parts. Labeling these regions allows us to better understand the scene itself as well as properties of the objects in the scene, such as their parts, location, and interaction within the scene. Typical approaches for this task include the conditional random field …


Spatial And Temporal Correlations Of Freeway Link Speeds: An Empirical Study, Piotr J. Rachtan Jan 2012

Masters Theses 1911 - February 2014

Congestion on roadways and the high level of uncertainty in traffic conditions are major considerations for trip planning. The purpose of this research is to investigate the characteristics and patterns of spatial and temporal correlations and also to detect other variables that affect correlation in a freeway setting. Five-minute speed aggregates from the Performance Measurement System (PeMS) database are obtained for two directions of an urban freeway – I-10 between Santa Monica and Los Angeles, California. Observations are for all non-holiday weekdays between January 1st and June 30th, 2010. Other variables include traffic flow, ramp locations, number of lanes and the …


Route Choice Behavior In Risky Networks With Real-Time Information, Michael D. Razo Jan 2010

Masters Theses 1911 - February 2014

This research investigates route choice behavior in networks with risky travel times and real-time information. A stated preference survey is conducted in which subjects use PC-based interactive maps to choose routes link-by-link in various scenarios. The scenarios include two types of maps: the first presents a choice between one stochastic route and one deterministic route, and the second offers real-time information and an available detour. The first type measures the basic risk attitude of the subject. The second type allows for strategic planning, and measures the effect of this opportunity on subjects' choice behavior.

Results from each subject are …