Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 85

Full-Text Articles in Physical Sciences and Mathematics

Nonparametric Derivative Estimation Using Penalized Splines: Theory And Application, Bright Antwi Boasiako Nov 2023

Doctoral Dissertations

This dissertation is in the field of nonparametric derivative estimation using penalized splines. It is conducted in two parts. In the first part, we study the $L_2$ convergence rates of estimating derivatives of mean regression functions using penalized splines. In 1982, Stone provided the optimal rates of convergence for estimating derivatives of mean regression functions using nonparametric methods. Using these rates, Zhou et al. in their 2000 paper showed that the MSE of derivative estimators based on regression splines approaches zero at the optimal rate of convergence. Also, in 2019, Xiao showed that, under some general conditions, penalized spline estimators …


Genetic Associations Of Alzheimer’s Disease And Mild Cognitive Impairment, Scott Hebert Aug 2023

Masters Theses

Over 6 million people are estimated to have been living with Alzheimer’s Disease (AD) in 2020, with another 12 million living with Mild Cognitive Impairment (MCI). Research has been conducted to evaluate genetic links to AD, but more research is needed on the subject. The Alzheimer’s Disease Neuroimaging Initiative (ADNI) has been conducting a longitudinal study of AD and MCI since 2004 and offering their data to research teams around the world. Diagnostic and demographic data was collected from participants, as well as data regarding single nucleotide polymorphisms (SNPs). SNP data was transformed to a binary format regarding whether the …


Forecasting COVID-19 With Temporal Hierarchies And Ensemble Methods, Li Shandross Aug 2023

Masters Theses

Infectious disease forecasting efforts underwent rapid growth during the COVID-19 pandemic, providing guidance for pandemic response and about potential future trends. Yet despite their importance, short-term forecasting models often struggled to produce accurate real-time predictions of this complex and rapidly changing system. This gap in accuracy persisted into the pandemic and warrants the exploration and testing of new methods to glean fresh insights.

In this work, we examined the application of the temporal hierarchical forecasting (THieF) methodology to probabilistic forecasts of COVID-19 incident hospital admissions in the United States. THieF is an innovative forecasting technique that aggregates time-series data into …


Inverse Probability Weighting In Survival Analysis And Network Analysis, Yukun Lu Feb 2023

Doctoral Dissertations

Inverse probability weighting is a popular technique to accommodate selection bias due to non-random sampling and missing data. In the first chapter, we develop an inverse probability weighted estimator and an augmented inverse probability weighted estimator of regression coefficients for a linear model with randomly censored covariates, when the censoring mechanism may be dependent on the outcome. We investigate the asymptotic properties of both estimators and evaluate their finite sample performance through extensive simulation studies. We apply the proposed methods to an Alzheimer’s disease study. In the second chapter, we present an application of network analysis in a study of …


Estimation Of Causal Effects In Complex Clustered Data, Joshua R. Nugent Oct 2022

Doctoral Dissertations

Analysis of clustered data from randomized trials or observational data often poses theoretical and practical statistical challenges, including but not limited to small numbers of independent units, many adjustment variables, continuous exposures, and/or differential clustering across trial arms. Further, commonly-used parametric methods rely on assumptions that may be violated in practice. Motivated by three scientific questions in public health, methods are developed and/or demonstrated for non-parametric estimation of causal effects. In Chapter 1, methods are elaborated for a cluster randomized trial (CRT) with missing individual-level data at baseline and follow-up, a complex sampling strategy, and limited number of clusters. Chapter …


Applications Of Statistical Physics To Ecology: Ising Models And Two-Cycle Coupled Oscillators, Vahini Reddy Nareddy Oct 2022

Doctoral Dissertations

Many ecological systems exhibit noisy period-2 oscillations and, when they are spatially extended, undergo a phase transition from synchrony to incoherence in the Ising universality class. Period-2 cycles have two possible phases of oscillation and can be represented as two states in bistable systems. Understanding the dynamics of ecological systems by representing their oscillations as bistable states and developing dynamical models using the tools from statistical physics to predict their future states is the focus of this thesis. As the ecological oscillators with two-cycle behavior undergo phase transitions in the Ising universality class, many features of synchrony and equilibrium …


Bayesian Hierarchical Temporal Modeling And Targeted Learning With Application To Reproductive Health, Herbert P. Susmann Oct 2022

Doctoral Dissertations

The international community via the United Nations Sustainable Development Goals has set the target of universal access to reproductive health-care services, including family planning, by 2030. Progress towards reaching this goal is assessed by tracking appropriate demographic and health indicators at national and subnational levels. This task is challenging, however, in populations where relevant data are limited or of low quality. Statistical models are then needed to estimate and project demographic and health indicators in populations based on the available data. Our first contribution, in Chapter 1, is to unify many existing demographic and health indicator models by proposing an …


Statistical Methods To Study Transposon Sequencing Data: Nonparametric Bayesian Models With Sampling Algorithms, Shai He Oct 2022

Doctoral Dissertations

With the development of Next Generation Sequencing (NGS) technology, researchers can easily obtain data from millions of cells (bulk samples) or collect data from just a single cell. However, while bulk samples can capture broad changes, they risk providing an average measurement that is not representative of the genetic state of any individual cell. While single-cell experiments can capture the genetic state of the individual cell, a single-cell sample can increase uncertainty, and sampling enough cells to gain a representative sample of the population is expensive. Therefore, there is a need to integrate information from both bulk and single-cell data to …


Three Dimensional Spatio-Temporal Cluster Analysis Of SARS-CoV-2 Infections, Keith W. Allison Jun 2022

Masters Theses

The COVID-19 pandemic has heightened the need for fine-scale analysis of the clustering of cases of infectious disease in order to better understand and prevent the localized spread of infection. The students living on the University of Massachusetts, Amherst campus provided a unique opportunity to do so, due to frequent mandatory testing during the 2020-2021 academic year, and dense living conditions. The South-West dormitory area is of particular interest due to its extremely high population density, housing around half of students living on campus during normal conditions. Using data gathered by the Public Health Promotion Center (PHPC), we analyzed the …


Gaussian Graphical Models For Omics Data: New Methodology And Applications, Katherine H. Shutta Mar 2022

Doctoral Dissertations

Gaussian graphical models (GGMs) are useful network estimation tools for modeling direct dependencies that characterize multivariate data. The GGM modeling framework is one way to elucidate complex systems-level properties that can be difficult to detect in univariate analyses. In this dissertation, we begin by presenting a tutorial and review of the current state of the field of GGM theory and application. Next, we present a motivating application of GGMs in a study of metabolomic networks associated with chronic distress in women in the Women's Health Initiative (WHI) and in the Nurses' Health Study cohorts. In the third chapter, we present …


Impact Of Loss To Follow-Up And Time Parameterization In Multiple-Period Cluster Randomized Trials And Assessing The Association Between Institution Affiliation And Journal Publication, Jonathan Moyer Mar 2022

Doctoral Dissertations

Difference-in-difference cluster randomized trials (CRTs) use baseline and post-test measurements. Standard power equations for these trials assume no loss to follow-up. We present a general equation for calculating treatment effect variance in difference-in-difference CRTs, with special cases assuming loss to follow-up with replacement of lost participants and loss to follow-up with no replacement but retaining the baseline measurements of all participants. Multiple-period CRTs can represent time as continuous using random coefficients (RC) or categorical using repeated measures ANOVA (RM-ANOVA) analytic models. Previous work recommends the use of RC over RM-ANOVA for CRTs with more than two periods because RC exhibited …


Methods To Improve Inference From Dependent Network Data, Dongah Kim Feb 2022

Doctoral Dissertations

Over the past decade, network research has increased dramatically. Network data are used in many fields because they contain not only covariates of each observation, but also 'relationships' between observations. Therefore, statistical analysis of network data has been rapidly developed. However, network data presents many challenges, such as collecting network data, inferring the prevalence of an outcome of interest, and valid statistical testing typically with highly dependent data. The methods discussed in this thesis are developed to improve statistical inference from dependent network data.


Statistical Improvements For Ecological Learning About Spatial Processes, Gaetan L. Dupont Oct 2021

Masters Theses

Ecological inquiry is rooted fundamentally in understanding population abundance, both to develop theory and improve conservation outcomes. Despite this importance, estimating abundance is difficult due to the imperfect detection of individuals in a sample population. Further, accounting for space can provide more biologically realistic inference, shifting the focus from abundance to density and encouraging the exploration of spatial processes. To address these challenges, Spatial Capture-Recapture (“SCR”) has emerged as the most prominent method for estimating density reliably. The SCR model is conceptually straightforward: it combines a spatial model of detection with a point process model of the spatial distribution of …


High-Dimensional Feature Selection And Multi-Level Causal Mediation Analysis With Applications To Human Aging And Cluster-Based Intervention Studies, Hachem Saddiki Oct 2021

Doctoral Dissertations

Many questions in public health and medicine are fundamentally causal in that our objective is to learn the effect of some exposure, randomized or not, on an outcome of interest. As a result, causal inference frameworks and methodologies have gained interest as a promising tool to reliably answer scientific questions. However, the tasks of identifying and efficiently estimating causal effects from observed data still pose significant challenges under complex data generating scenarios. We focus on (1) high-dimensional settings where the number of variables is orders of magnitude higher than the number of observations; and (2) multi-level settings, where study participants …


Monitoring Mammals At Multiple Scales: Case Studies From Carnivore Communities, Kadambari Devarajan Oct 2021

Doctoral Dissertations

Carnivores are distributed widely and threatened by habitat loss, poaching, climate change, and disease. They are considered integral to ecosystem function through their direct and indirect interactions with species at different trophic levels. Given the importance of carnivores, it is of high conservation priority to understand the processes driving carnivore assemblages in different systems. It is thus essential to determine the abiotic and biotic drivers of carnivore community composition at different spatial scales and address the following questions: (i) What factors influence carnivore community composition and diversity? (ii) How do the factors influencing carnivore communities vary across spatial and temporal …


Measurement Invariance Across Immigrant And Non-Immigrant Populations On PISA Cognitive And Non-Cognitive Scales, Maritza Casas Oct 2021

Doctoral Dissertations

International large-scale educational assessments (ILSAs) have played a relevant role in educational policies targeting immigrant students across countries as their results are used by governments as input for decision-making purposes. Given the potential impact that ILSAs can have, the psychometric features of these assessments must be carefully assessed and empirical evidence about the extent to which the inferences made based on test results are valid must be collected. To do so, the first step is to determine if the test results have the same meaning across countries and groups of examinees, that is, if the measures are invariant so that …


Using Generalizability And Rasch Measurement Theory To Ensure Rigorous Measurement In An International Development Education Evaluation, Louise Bahry Oct 2021

Doctoral Dissertations

Between the United States and Great Britain, over 30 billion USD was spent in 2018 on international aid, over a billion of which was dedicated to education programs alone. Recently, there has been increased attention on the rigorous evaluation of aid-funded programs, moving beyond counting outputs to the measurement of educational impact. The current study uses two methodological approaches (Generalizability (Brennan, 1992, 2001) and Rasch Measurement Theory (Andrich, 1978; Rasch, 1980; Wright & Masters, 1982)) to analyze data from math and literacy assessments, and self-report surveys used in an international evaluation of an educational initiative in the Democratic Republic of …


Evaluating Public Masking Mandates On COVID-19 Growth Rates In U.S. States, Angus K. Wong Jul 2021

Masters Theses

U.S. state governments have implemented numerous policies to help mitigate the spread of COVID-19. While there is strong biological evidence supporting the wearing of face masks or coverings in public spaces, the impact of public masking policies remains unclear. We aimed to evaluate how early versus delayed implementation of state-level public masking orders impacted subsequent COVID-19 growth rates. We defined “early” implementation as having a state-level mandate in place before September 1, 2020, the approximate start of the school year. We defined COVID-19 growth rates as the relative increase in confirmed cases 7, 14, 21, 30, 45, and 60 days after September 1. …
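
The growth-rate definition above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes one plausible reading: the change in cumulative confirmed cases between the reference date and a later day, divided by the count on the reference date. The function name `relative_increase` and the case counts are hypothetical and not taken from the thesis.

```python
def relative_increase(cumulative_cases, start_index, horizon_days):
    """Relative increase in cumulative cases `horizon_days` after the reference day.

    Assumed formula (not confirmed by the source): (later - baseline) / baseline.
    """
    baseline = cumulative_cases[start_index]
    later = cumulative_cases[start_index + horizon_days]
    return (later - baseline) / baseline


# Hypothetical cumulative counts, one value per day, day 0 = reference date.
cases = [1000 + 50 * day for day in range(61)]

# Growth rates at the horizons listed in the abstract.
growth = {h: relative_increase(cases, 0, h) for h in (7, 14, 21, 30, 45, 60)}
```

With these made-up counts, each horizon gets its own growth rate, which could then be compared between states with early and delayed mandates.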


Model-Free Descriptive Modeling For Multivariate Categorical Data With An Ordinal Dependent Variable, Li Wang Jul 2021

Doctoral Dissertations

In the process of statistical modeling, descriptive modeling plays an essential role in accelerating the formulation of plausible hypotheses in the subsequent explanatory modeling and facilitating the selection of potential variables in the subsequent predictive modeling. In particular, for multivariate categorical data analysis, it is desirable to use descriptive modeling methods to uncover and summarize the potential association structure among multiple categorical variables in a compact manner. However, many classical methods in this case either rely on strong assumptions for parametric models or become infeasible when the data dimension is high. To this end, we propose a model-free method …


Geometric Representation Learning, Luke Vilnis Apr 2021

Doctoral Dissertations

Vector embedding models are a cornerstone of modern machine learning methods for knowledge representation and reasoning. These methods aim to turn semantic questions into geometric questions by learning representations of concepts and other domain objects in a lower-dimensional vector space. In that spirit, this work advocates for density- and region-based representation learning. Embedding domain elements as geometric objects beyond a single point enables us to naturally represent breadth and polysemy, make asymmetric comparisons, answer complex queries, and provides a strong inductive bias when labeled data is scarce. We present a model for word representation using Gaussian densities, enabling asymmetric entailment …


Interacting Effects Of Climate And Biotic Factors On Mesocarnivore Distribution And Snowshoe Hare Demography Along The Boreal-Temperate Ecotone, Alexej P. Siren Jul 2020

Doctoral Dissertations

The motivation of my dissertation research was to understand the influence of climate and biotic factors on range limits with a focus on winter-adapted species, including the Canada lynx (Lynx canadensis), American marten (Martes americana), and snowshoe hare (Lepus americanus). I investigated range dynamics along the boreal-temperate ecotone of the northeastern US. Through an integrative literature review, I developed a theoretical framework building from existing thinking on range limits and ecological theory. I used this theory for my second chapter to evaluate direct and indirect causes of carnivore range limits in the northeastern US, …


Latent Class Models For At-Risk Populations, Shuaimin Kang Jul 2020

Doctoral Dissertations

Clustering Network Tree Data From Respondent-Driven Sampling With Application to Opioid Users in New York City. There is great interest in finding meaningful subgroups of attributed network data. There are many available methods for clustering complete networks. Unfortunately, much network data is collected through sampling, and is therefore incomplete. Respondent-driven sampling (RDS) is a widely used method for sampling hard-to-reach human populations based on tracing links in the underlying unobserved social network. The resulting data therefore have tree structure representing a sub-sample of the network, along with many nodal attributes. In this paper, we introduce an approach to adjust mixture models …


The Limits Of Location Privacy In Mobile Devices, Keen Yuun Sung Jul 2020

Doctoral Dissertations

Mobile phones are widely adopted by users across the world today. However, the privacy implications of persistent connectivity are not well understood. This dissertation focuses on one important concern of mobile phone users: location privacy. I approach this problem from the perspective of three adversaries that users are exposed to via smartphone apps: the mobile advertiser, the app developer, and the cellular service provider. First, I quantify the proportion of mobile users who use location permissive apps and are able to be tracked through their advertising identifier, and demonstrate a mark and recapture attack that allows continued tracking of users …


Bayesian Methods For The Assessment Of Reporting Errors For Data-Sparse Population-Periods With Applications To Estimating Mortality, Emily Peterson Mar 2020

Doctoral Dissertations

Population level mortality data is often subject to substantial reporting errors due to misclassification of cause of death, misclassification of death status, or age reporting errors. Accuracy of error-prone data sources can be assessed by comparing such data to gold standard data for the same population-period. We present Bayesian methods for assessing the extent of reporting errors across different population-periods and generalizing those to settings where gold-standard data are lacking. Firstly, we investigate misclassification errors of maternal cause of death reporting in civil registration vital statistics data. We use a Bayesian hierarchical bivariate random-walk model to estimate country-year specific sensitivity …


Nanoindentation Characterization Of Elastic Properties Of Shales And Swelling Clay Minerals, Shengmin Luo Mar 2020

Doctoral Dissertations

Oil and gas shales are a class of multiscale, multiphase, hybrid inorganic-organic sedimentary rocks that consist of a generally uniform, preferentially oriented clay matrix with randomly embedded silt and sand particles as solid inclusions. A thorough understanding of the mechanical properties of shales is crucial for the exploration and production of oil and gas in the unconventional shale reservoirs, but it can be a challenging task due to their nature of compositional heterogeneity and microstructural anisotropy. In efforts to better characterize the mechanical properties of shales across different length scales and to fundamentally understand the laws of upscaling from individual …


Joint Asymptotics For Smoothing Spline Semiparametric Nonlinear Models, Jiahui Yu Oct 2019

Doctoral Dissertations

We study the joint asymptotics of general smoothing spline semiparametric models in the settings of density estimation and regression. We provide a systematic framework which incorporates many existing models as special cases, and further allows for nonlinear relationships between the finite-dimensional Euclidean parameter and the infinite-dimensional functional parameter. For both density estimation and regression, we establish the local existence and uniqueness of the penalized likelihood estimators for our proposed models. In the density estimation setting, we prove joint consistency and obtain the rates of convergence of the joint estimator in an appropriate norm. The convergence rate of the parametric component …


Characterization Of The Anomalous pH Of Aqueous Nanoemulsions, Kieran P. Ramos Oct 2019

Doctoral Dissertations

Aqueous water-in-oil nanoemulsions have emerged as a versatile tool for use in microfluidics, drug delivery, single-molecule measurements, and other research. Nanoemulsions are often prepared with perfluorocarbons, which are remarkably biocompatible due to their stability, low surface tension, lipophobicity, and hydrophobicity. Therefore, it is often assumed that droplet contents are unperturbed by the perfluorinated surface. However, in microemulsions, which are similar to nanoemulsions, it is known that either the pH of the aqueous phase or the ionization constants of encapsulated molecules are different from bulk solution. There is also recent evidence of low pH in perfluorinated aqueous nanoemulsions. The current underlying …


Function And Dissipation In Finite State Automata - From Computing To Intelligence And Back, Natesh Ganesh Oct 2019

Doctoral Dissertations

Society has benefited from the technological revolution and the tremendous growth in computing powered by Moore's law. However, we are fast approaching the ultimate physical limits in terms of both device sizes and the associated energy dissipation. It is important to characterize these limits in a physically grounded and implementation-agnostic manner, in order to capture the fundamental energy dissipation costs associated with performing computing operations with classical information in nano-scale quantum systems. It is also necessary to identify and understand the effect of quantum indistinguishability, noise, and device variability on these dissipation limits. Identifying these parameters is crucial to designing …


Model-Form Uncertainty Quantification For Predictive Probabilistic Graphical Models, Jinchao Feng Oct 2019

Doctoral Dissertations

In this thesis, we focus on Uncertainty Quantification and Sensitivity Analysis, which can provide performance guarantees for predictive models built with both aleatoric and epistemic uncertainties, as well as data, and identify which components in a model have the most influence on predictions of our quantities of interest. In the first part (Chapter 2), we propose non-parametric methods for both local and global sensitivity analysis of chemical reaction models with correlated parameter dependencies. The developed mathematical and statistical tools are applied to a benchmark Langmuir competitive adsorption model on a close packed platinum surface, whose parameters, estimated from quantum-scale computations, …


Allocative Poisson Factorization For Computational Social Science, Aaron Schein Jul 2019

Doctoral Dissertations

Social science data often comes in the form of high-dimensional discrete data such as categorical survey responses, social interaction records, or text. These data sets exhibit high degrees of sparsity, missingness, overdispersion, and burstiness, all of which present challenges to traditional statistical modeling techniques. The framework of Poisson factorization (PF) has emerged in recent years as a natural way to model high-dimensional discrete data sets. This framework assumes that each observed count in a data set is a Poisson random variable $y \sim \mathrm{Pois}(\mu)$ whose rate parameter $\mu$ is a function of shared model parameters. This thesis examines a specific …
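
The Poisson factorization assumption described above can be sketched in a few lines. The abstract says only that the rate $\mu$ is a function of shared model parameters. This sketch assumes the common bilinear choice, in which the rate of cell $(i, j)$ is the inner product of a row factor and a column factor. The names `theta`, `phi`, `rate_matrix`, and `log_likelihood` and all numbers are hypothetical, not taken from the thesis.

```python
import math


def poisson_log_pmf(y, mu):
    """Log-probability of count y under a Poisson distribution with rate mu > 0."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def rate_matrix(theta, phi):
    """Assumed bilinear rates: mu[i][j] = sum_k theta[i][k] * phi[k][j]."""
    n_factors = len(phi)
    n_cols = len(phi[0])
    return [
        [sum(row[k] * phi[k][j] for k in range(n_factors)) for j in range(n_cols)]
        for row in theta
    ]


def log_likelihood(counts, theta, phi):
    """Sum of Poisson log-probabilities over every observed count."""
    rates = rate_matrix(theta, phi)
    return sum(
        poisson_log_pmf(y, mu)
        for count_row, rate_row in zip(counts, rates)
        for y, mu in zip(count_row, rate_row)
    )


# Hypothetical positive factors (2 rows x 2 factors, 2 factors x 3 columns).
theta = [[1.0, 0.5], [0.2, 2.0]]
phi = [[2.0, 0.1, 1.0], [0.5, 3.0, 1.0]]
# Hypothetical observed count matrix (2 x 3).
counts = [[2, 1, 1], [1, 6, 2]]

total_log_likelihood = log_likelihood(counts, theta, phi)
```

Because the factors are shared across cells, every count contributes information about the same small set of parameters, which is part of what makes the approach practical for sparse, high-dimensional count data.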