Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 53

Full-Text Articles in Physical Sciences and Mathematics

Nonparametric Derivative Estimation Using Penalized Splines: Theory And Application, Bright Antwi Boasiako Nov 2023

Doctoral Dissertations

This dissertation is in the field of nonparametric derivative estimation using penalized splines. It is conducted in two parts. In the first part, we study the L2 convergence rates of estimating derivatives of mean regression functions using penalized splines. In 1982, Stone provided the optimal rates of convergence for estimating derivatives of mean regression functions using nonparametric methods. Using these rates, Zhou et al. in their 2000 paper showed that the MSE of derivative estimators based on regression splines approaches zero at the optimal rate of convergence. Also, in 2019, Xiao showed that, under some general conditions, penalized spline estimators …


Inverse Probability Weighting In Survival Analysis And Network Analysis, Yukun Lu Feb 2023

Doctoral Dissertations

Inverse probability weighting is a popular technique to accommodate selection bias due to non-random sampling and missing data. In the first chapter, we develop an inverse probability weighted estimator and an augmented inverse probability weighted estimator of regression coefficients for a linear model with randomly censored covariates, when the censoring mechanism may be dependent on the outcome. We investigate the asymptotic properties of both estimators and evaluate their finite sample performance through extensive simulation studies. We apply the proposed methods to an Alzheimer’s disease study. In the second chapter, we present an application of network analysis in a study of …


Estimation Of Causal Effects In Complex Clustered Data, Joshua R. Nugent Oct 2022

Doctoral Dissertations

Analysis of clustered data from randomized trials or observational data often poses theoretical and practical statistical challenges, including but not limited to small numbers of independent units, many adjustment variables, continuous exposures, and/or differential clustering across trial arms. Further, commonly used parametric methods rely on assumptions that may be violated in practice. Motivated by three scientific questions in public health, methods are developed and/or demonstrated for non-parametric estimation of causal effects. In Chapter 1, methods are elaborated for a cluster randomized trial (CRT) with missing individual-level data at baseline and follow-up, a complex sampling strategy, and a limited number of clusters. Chapter …


Applications Of Statistical Physics To Ecology: Ising Models And Two-Cycle Coupled Oscillators, Vahini Reddy Nareddy Oct 2022

Doctoral Dissertations

Many ecological systems exhibit noisy period-2 oscillations and, when they are spatially extended, they undergo a phase transition from synchrony to incoherence in the Ising universality class. Period-2 cycles have two possible phases of oscillation and can be represented as the two states of a bistable system. The focus of this thesis is to understand the dynamics of ecological systems by representing their oscillations as bistable states, and to develop dynamical models that use tools from statistical physics to predict their future states. As the ecological oscillators with two-cycle behavior undergo phase transitions in the Ising universality class, many features of synchrony and equilibrium …


Bayesian Hierarchical Temporal Modeling And Targeted Learning With Application To Reproductive Health, Herbert P. Susmann Oct 2022

Doctoral Dissertations

The international community via the United Nations Sustainable Development Goals has set the target of universal access to reproductive health-care services, including family planning, by 2030. Progress towards reaching this goal is assessed by tracking appropriate demographic and health indicators at national and subnational levels. This task is challenging, however, in populations where relevant data are limited or of low quality. Statistical models are then needed to estimate and project demographic and health indicators in populations based on the available data. Our first contribution, in Chapter 1, is to unify many existing demographic and health indicator models by proposing an …


Statistical Methods To Study Transposon Sequencing Data: Nonparametric Bayesian Models With Sampling Algorithms, Shai He Oct 2022

Doctoral Dissertations

With the development of Next Generation Sequencing (NGS) technology, researchers can easily obtain data from millions of cells (bulk samples) or collect data from just a single cell. However, while bulk samples can capture broad changes, they risk providing an average measurement that is not representative of the genetic state of any individual cell. Single-cell experiments can capture the genetic state of an individual cell, but a single-cell sample can increase uncertainty, and sampling enough cells to obtain a representative sample of the population is expensive. Therefore, there is a need to integrate information from both bulk and single-cell data to …


Gaussian Graphical Models For Omics Data: New Methodology And Applications, Katherine H. Shutta Mar 2022

Doctoral Dissertations

Gaussian graphical models (GGMs) are useful network estimation tools for modeling direct dependencies that characterize multivariate data. The GGM modeling framework is one way to elucidate complex systems-level properties that can be difficult to detect in univariate analyses. In this dissertation, we begin by presenting a tutorial and review of the current state of the field of GGM theory and application. Next, we present a motivating application of GGMs in a study of metabolomic networks associated with chronic distress in women in the Women's Health Initiative (WHI) and in the Nurses' Health Study cohorts. In the third chapter, we present …


Impact Of Loss To Follow-Up And Time Parameterization In Multiple-Period Cluster Randomized Trials And Assessing The Association Between Institution Affiliation And Journal Publication, Jonathan Moyer Mar 2022

Doctoral Dissertations

Difference-in-difference cluster randomized trials (CRTs) use baseline and post-test measurements. Standard power equations for these trials assume no loss to follow-up. We present a general equation for calculating treatment effect variance in difference-in-difference CRTs, with special cases assuming loss to follow-up with replacement of lost participants and loss to follow-up with no replacement but retaining the baseline measurements of all participants. Multiple-period CRTs can represent time as continuous using random coefficients (RC) or categorical using repeated measures ANOVA (RM-ANOVA) analytic models. Previous work recommends the use of RC over RM-ANOVA for CRTs with more than two periods because RC exhibited …


Methods To Improve Inference From Dependent Network Data, Dongah Kim Feb 2022

Doctoral Dissertations

Over the past decade, network research has increased dramatically. Network data are used in many fields because they contain not only covariates of each observation, but also 'relationships' between observations. As a result, statistical analysis of network data has developed rapidly. However, network data present many challenges, such as collecting the data, inferring the prevalence of an outcome of interest, and carrying out valid statistical tests when the data are typically highly dependent. The methods discussed in this thesis are developed to improve statistical inference from dependent network data.


High-Dimensional Feature Selection And Multi-Level Causal Mediation Analysis With Applications To Human Aging And Cluster-Based Intervention Studies, Hachem Saddiki Oct 2021

Doctoral Dissertations

Many questions in public health and medicine are fundamentally causal in that our objective is to learn the effect of some exposure, randomized or not, on an outcome of interest. As a result, causal inference frameworks and methodologies have gained interest as a promising tool to reliably answer scientific questions. However, the tasks of identifying and efficiently estimating causal effects from observed data still pose significant challenges under complex data generating scenarios. We focus on (1) high-dimensional settings where the number of variables is orders of magnitude higher than the number of observations; and (2) multi-level settings, where study participants …


Monitoring Mammals At Multiple Scales: Case Studies From Carnivore Communities, Kadambari Devarajan Oct 2021

Doctoral Dissertations

Carnivores are distributed widely and threatened by habitat loss, poaching, climate change, and disease. They are considered integral to ecosystem function through their direct and indirect interactions with species at different trophic levels. Given the importance of carnivores, it is of high conservation priority to understand the processes driving carnivore assemblages in different systems. It is thus essential to determine the abiotic and biotic drivers of carnivore community composition at different spatial scales and address the following questions: (i) What factors influence carnivore community composition and diversity? (ii) How do the factors influencing carnivore communities vary across spatial and temporal …


Measurement Invariance Across Immigrant And Non-Immigrant Populations On PISA Cognitive And Non-Cognitive Scales, Maritza Casas Oct 2021

Doctoral Dissertations

International large-scale educational assessments (ILSAs) have played a relevant role in educational policies targeting immigrant students across countries, as their results are used by governments as input for decision-making purposes. Given the potential impact that ILSAs can have, the psychometric features of these assessments must be carefully assessed, and empirical evidence about the extent to which the inferences made based on test results are valid must be collected. To do so, the first step is to determine if the test results have the same meaning across countries and groups of examinees, that is, if the measures are invariant so that …


Using Generalizability And Rasch Measurement Theory To Ensure Rigorous Measurement In An International Development Education Evaluation, Louise Bahry Oct 2021

Doctoral Dissertations

Between the United States and Great Britain, over 30 billion USD was spent in 2018 on international aid, over a billion of which is dedicated to education programs alone. Recently, there has been increased attention on the rigorous evaluation of aid-funded programs, moving beyond counting outputs to the measurement of educational impact. The current study uses two methodological approaches (Generalizability (Brennan, 1992, 2001) and Rasch Measurement Theory (Andrich, 1978; Rasch, 1980; Wright & Masters, 1982)) to analyze data from math and literacy assessments, and self-report surveys used in an international evaluation of an educational initiative in the Democratic Republic of …


Model-Free Descriptive Modeling For Multivariate Categorical Data With An Ordinal Dependent Variable, Li Wang Jul 2021

Doctoral Dissertations

In the process of statistical modeling, descriptive modeling plays an essential role in accelerating the formulation of plausible hypotheses in the subsequent explanatory modeling and facilitating the selection of potential variables in the subsequent predictive modeling. In particular, for multivariate categorical data analysis, it is desirable to use descriptive modeling methods for uncovering and summarizing the potential association structure among multiple categorical variables in a compact manner. However, many classical methods in this case either rely on strong assumptions for parametric models or become infeasible when the data dimension is higher. To this end, we propose a model-free method …


Geometric Representation Learning, Luke Vilnis Apr 2021

Doctoral Dissertations

Vector embedding models are a cornerstone of modern machine learning methods for knowledge representation and reasoning. These methods aim to turn semantic questions into geometric questions by learning representations of concepts and other domain objects in a lower-dimensional vector space. In that spirit, this work advocates for density- and region-based representation learning. Embedding domain elements as geometric objects beyond a single point enables us to naturally represent breadth and polysemy, make asymmetric comparisons, answer complex queries, and provides a strong inductive bias when labeled data is scarce. We present a model for word representation using Gaussian densities, enabling asymmetric entailment …


Interacting Effects Of Climate And Biotic Factors On Mesocarnivore Distribution And Snowshoe Hare Demography Along The Boreal-Temperate Ecotone, Alexej P. Siren Jul 2020

Doctoral Dissertations

The motivation of my dissertation research was to understand the influence of climate and biotic factors on range limits with a focus on winter-adapted species, including the Canada lynx (Lynx canadensis), American marten (Martes americana), and snowshoe hare (Lepus americanus). I investigated range dynamics along the boreal-temperate ecotone of the northeastern US. Through an integrative literature review, I developed a theoretical framework building from existing thinking on range limits and ecological theory. I used this theory for my second chapter to evaluate direct and indirect causes of carnivore range limits in the northeastern US, …


Latent Class Models For At-Risk Populations, Shuaimin Kang Jul 2020

Doctoral Dissertations

Clustering Network Tree Data From Respondent-Driven Sampling With Application to Opioid Users in New York City: There is great interest in finding meaningful subgroups of attributed network data. There are many available methods for clustering complete networks. Unfortunately, much network data is collected through sampling, and is therefore incomplete. Respondent-driven sampling (RDS) is a widely used method for sampling hard-to-reach human populations based on tracing links in the underlying unobserved social network. The resulting data therefore have tree structure representing a sub-sample of the network, along with many nodal attributes. In this paper, we introduce an approach to adjust mixture models …


The Limits Of Location Privacy In Mobile Devices, Keen Yuun Sung Jul 2020

Doctoral Dissertations

Mobile phones are widely adopted by users across the world today. However, the privacy implications of persistent connectivity are not well understood. This dissertation focuses on one important concern of mobile phone users: location privacy. I approach this problem from the perspective of three adversaries that users are exposed to via smartphone apps: the mobile advertiser, the app developer, and the cellular service provider. First, I quantify the proportion of mobile users who use location permissive apps and are able to be tracked through their advertising identifier, and demonstrate a mark and recapture attack that allows continued tracking of users …


Bayesian Methods For The Assessment Of Reporting Errors For Data-Sparse Population-Periods With Applications To Estimating Mortality, Emily Peterson Mar 2020

Doctoral Dissertations

Population level mortality data is often subject to substantial reporting errors due to misclassification of cause of death, misclassification of death status, or age reporting errors. Accuracy of error-prone data sources can be assessed by comparing such data to gold standard data for the same population-period. We present Bayesian methods for assessing the extent of reporting errors across different population-periods and generalizing those to settings where gold-standard data are lacking. Firstly, we investigate misclassification errors of maternal cause of death reporting in civil registration vital statistics data. We use a Bayesian hierarchical bivariate random-walk model to estimate country-year specific sensitivity …


Nanoindentation Characterization Of Elastic Properties Of Shales And Swelling Clay Minerals, Shengmin Luo Mar 2020

Doctoral Dissertations

Oil and gas shales are a class of multiscale, multiphase, hybrid inorganic-organic sedimentary rocks that consist of a generally uniform, preferentially oriented clay matrix with randomly embedded silt and sand particles as solid inclusions. A thorough understanding of the mechanical properties of shales is crucial for the exploration and production of oil and gas in the unconventional shale reservoirs, but it can be a challenging task due to their nature of compositional heterogeneity and microstructural anisotropy. In efforts to better characterize the mechanical properties of shales across different length scales and to fundamentally understand the laws of upscaling from individual …


Joint Asymptotics For Smoothing Spline Semiparametric Nonlinear Models, Jiahui Yu Oct 2019

Doctoral Dissertations

We study the joint asymptotics of general smoothing spline semiparametric models in the settings of density estimation and regression. We provide a systematic framework which incorporates many existing models as special cases, and further allows for nonlinear relationships between the finite-dimensional Euclidean parameter and the infinite-dimensional functional parameter. For both density estimation and regression, we establish the local existence and uniqueness of the penalized likelihood estimators for our proposed models. In the density estimation setting, we prove joint consistency and obtain the rates of convergence of the joint estimator in an appropriate norm. The convergence rate of the parametric component …


Characterization Of The Anomalous pH Of Aqueous Nanoemulsions, Kieran P. Ramos Oct 2019

Doctoral Dissertations

Aqueous water-in-oil nanoemulsions have emerged as a versatile tool for use in microfluidics, drug delivery, single-molecule measurements, and other research. Nanoemulsions are often prepared with perfluorocarbons, which are remarkably biocompatible due to their stability, low surface tension, lipophobicity, and hydrophobicity. Therefore, it is often assumed that droplet contents are unperturbed by the perfluorinated surface. However, in microemulsions, which are similar to nanoemulsions, it is known that either the pH of the aqueous phase or the ionization constants of encapsulated molecules are different from bulk solution. There is also recent evidence of low pH in perfluorinated aqueous nanoemulsions. The current underlying …


Function And Dissipation In Finite State Automata - From Computing To Intelligence And Back, Natesh Ganesh Oct 2019

Doctoral Dissertations

Society has benefited from the technological revolution and the tremendous growth in computing powered by Moore's law. However, we are fast approaching the ultimate physical limits in terms of both device sizes and the associated energy dissipation. It is important to characterize these limits in a physically grounded and implementation-agnostic manner, in order to capture the fundamental energy dissipation costs associated with performing computing operations with classical information in nano-scale quantum systems. It is also necessary to identify and understand the effect of quantum indistinguishability, noise, and device variability on these dissipation limits. Identifying these parameters is crucial to designing …


Model-Form Uncertainty Quantification For Predictive Probabilistic Graphical Models, Jinchao Feng Oct 2019

Doctoral Dissertations

In this thesis, we focus on Uncertainty Quantification and Sensitivity Analysis, which can provide performance guarantees for predictive models built with both aleatoric and epistemic uncertainties, as well as data, and identify which components in a model have the most influence on predictions of our quantities of interest. In the first part (Chapter 2), we propose non-parametric methods for both local and global sensitivity analysis of chemical reaction models with correlated parameter dependencies. The developed mathematical and statistical tools are applied to a benchmark Langmuir competitive adsorption model on a close packed platinum surface, whose parameters, estimated from quantum-scale computations, …


Allocative Poisson Factorization For Computational Social Science, Aaron Schein Jul 2019

Doctoral Dissertations

Social science data often comes in the form of high-dimensional discrete data such as categorical survey responses, social interaction records, or text. These data sets exhibit high degrees of sparsity, missingness, overdispersion, and burstiness, all of which present challenges to traditional statistical modeling techniques. The framework of Poisson factorization (PF) has emerged in recent years as a natural way to model high-dimensional discrete data sets. This framework assumes that each observed count in a data set is a Poisson random variable $y \sim \mathrm{Pois}(\mu)$ whose rate parameter $\mu$ is a function of shared model parameters. This thesis examines a specific …


Methods For Making Policy-Relevant Forecasts Of Infectious Disease Incidence, Stephen A. Lauer Jul 2019

Doctoral Dissertations

Infectious diseases place an enormous burden on the people of the developing world and their governments. When, where, and how to allocate resources in order to slow the spread of a virus or deal with the aftermath of an outbreak is often the responsibility of local public health officials. In this thesis, we develop statistical methods for forecasting future incidence of infectious diseases and estimating the effects of interventions designed to reduce future incidence, bearing in mind the needs and concerns of those public health officials. While most infectious disease forecasting models focus on short-term horizons (i.e. weeks or …


Population Viability And Connectivity Of The Federally Threatened Eastern Indigo Snake In Central Peninsular Florida, Javan Bauder Mar 2019

Doctoral Dissertations

Understanding the factors influencing the likelihood of persistence of real-world populations requires not only an accurate understanding of the traits and behaviors of individuals within those populations (e.g., movement, habitat selection, survival, fecundity, dispersal) but also an understanding of how those traits and behaviors are influenced by landscape features. The federally threatened eastern indigo snake (EIS, Drymarchon couperi) has declined throughout its range primarily due to anthropogenically induced habitat loss and fragmentation, making spatially explicit assessments of population viability and connectivity essential for understanding its current status and directing future conservation efforts. The primary goal of my dissertation was to understand how …


Quantile Regression For Survival Data With Delayed Entry, Boqin Sun Nov 2018

Doctoral Dissertations

Delayed entry arises frequently in follow-up studies for survival outcomes, where additional study subjects enter during the study period. We propose a quantile regression model to analyze survival data subject to delayed entry and right-censoring. Such a model offers flexibility in assessing covariate effects on survival outcome, and the regression coefficients are interpretable as direct effects on the event time. Under the conditional independent censoring assumption, we proposed a weighted martingale-based estimating equation and formulated finding its solution as an $\ell_1$-type convex optimization problem, which was solved through a linear programming algorithm. We established uniform consistency and weak convergence of …


Variational Approximations For Density Deconvolution, Yue Chang Nov 2018

Doctoral Dissertations

This thesis considers the problem of density estimation when the variables of interest are subject to measurement error. The measurement error is assumed to be additive and homoscedastic. We specify the density of interest by a Dirichlet Process Mixture Model and establish variational approximation approaches to the density deconvolution problem. Gaussian and Laplacian error distributions are considered, which are representatives of supersmooth and ordinary smooth distributions, respectively. We develop two variational approximation algorithms for Gaussian error deconvolution and one variational approximation algorithm for Laplacian error deconvolution. Their performance is compared to deconvoluting kernels and the Markov chain Monte Carlo method by …


Model-Based Predictive Analytics For Additive And Smart Manufacturing, Zhuo Yang Oct 2018

Doctoral Dissertations

Qualification and certification for additive and smart manufacturing systems can be uncertain and very costly. Using available historical data can mitigate some costs of producing and testing sample parts. However, use of such data lacks the flexibility to represent specific new problems which decreases predictive accuracy and efficiency. To address these compelling needs, in this dissertation modeling techniques are introduced that can proactively estimate results expected from additive and smart manufacturing processes swiftly and with practical levels of accuracy and reliability. More specifically, this research addresses the current challenges and limitations posed by use of available data and the high …