Open Access. Powered by Scholars. Published by Universities.®

Statistical Methodology Commons

University of Massachusetts Amherst

Articles 1 - 22 of 22

Full-Text Articles in Statistical Methodology

Nonparametric Derivative Estimation Using Penalized Splines: Theory And Application, Bright Antwi Boasiako Nov 2023

Doctoral Dissertations

This dissertation is in the field of nonparametric derivative estimation using penalized splines. It is conducted in two parts. In the first part, we study the $L_2$ convergence rates of estimating derivatives of mean regression functions using penalized splines. In 1982, Stone provided the optimal rates of convergence for estimating derivatives of mean regression functions using nonparametric methods. Using these rates, Zhou et al. in their 2000 paper showed that the MSE of derivative estimators based on regression splines approaches zero at the optimal rate of convergence. Also, in 2019, Xiao showed that, under some general conditions, penalized spline estimators …


Inverse Probability Weighting In Survival Analysis And Network Analysis, Yukun Lu Feb 2023

Doctoral Dissertations

Inverse probability weighting is a popular technique to accommodate selection bias due to non-random sampling and missing data. In the first chapter, we develop an inverse probability weighted estimator and an augmented inverse probability weighted estimator of regression coefficients for a linear model with randomly censored covariates, when the censoring mechanism may be dependent on the outcome. We investigate the asymptotic properties of both estimators and evaluate their finite sample performance through extensive simulation studies. We apply the proposed methods to an Alzheimer’s disease study. In the second chapter, we present an application of network analysis in a study of …


Methods To Improve Inference From Dependent Network Data, Dongah Kim Feb 2022

Doctoral Dissertations

Over the past decade, network research has increased dramatically. Network data are used in many fields because they contain not only covariates of each observation, but also "relationships" between observations. As a result, statistical analysis of network data has developed rapidly. However, network data present many challenges, such as collecting the data, inferring the prevalence of an outcome of interest, and conducting valid statistical tests with data that are typically highly dependent. The methods discussed in this thesis are developed to improve statistical inference from dependent network data.


Monitoring Mammals At Multiple Scales: Case Studies From Carnivore Communities, Kadambari Devarajan Oct 2021

Doctoral Dissertations

Carnivores are distributed widely and threatened by habitat loss, poaching, climate change, and disease. They are considered integral to ecosystem function through their direct and indirect interactions with species at different trophic levels. Given the importance of carnivores, it is of high conservation priority to understand the processes driving carnivore assemblages in different systems. It is thus essential to determine the abiotic and biotic drivers of carnivore community composition at different spatial scales and address the following questions: (i) What factors influence carnivore community composition and diversity? (ii) How do the factors influencing carnivore communities vary across spatial and temporal …


Using Generalizability And Rasch Measurement Theory To Ensure Rigorous Measurement In An International Development Education Evaluation, Louise Bahry Oct 2021

Doctoral Dissertations

Between the United States and Great Britain, over 30 billion USD was spent in 2018 on international aid, over a billion of which is dedicated to education programs alone. Recently, there has been increased attention on the rigorous evaluation of aid-funded programs, moving beyond counting outputs to the measurement of educational impact. The current study uses two methodological approaches, Generalizability Theory (Brennan, 1992, 2001) and Rasch Measurement Theory (Andrich, 1978; Rasch, 1980; Wright & Masters, 1982), to analyze data from math and literacy assessments, and self-report surveys used in an international evaluation of an educational initiative in the Democratic Republic of …


Model-Free Descriptive Modeling For Multivariate Categorical Data With An Ordinal Dependent Variable, Li Wang Jul 2021

Doctoral Dissertations

In the process of statistical modeling, descriptive modeling plays an essential role in accelerating the formulation of plausible hypotheses in the subsequent explanatory modeling and facilitating the selection of potential variables in the subsequent predictive modeling. In particular, for multivariate categorical data analysis, it is desirable to use descriptive modeling methods to uncover and summarize the potential association structure among multiple categorical variables in a compact manner. However, many classical methods in this case either rely on strong assumptions for parametric models or become infeasible when the data dimension is higher. To this end, we propose a model-free method …


Latent Class Models For At-Risk Populations, Shuaimin Kang Jul 2020

Doctoral Dissertations

Clustering Network Tree Data From Respondent-Driven Sampling With Application to Opioid Users in New York City: There is great interest in finding meaningful subgroups of attributed network data. There are many available methods for clustering complete networks. Unfortunately, much network data is collected through sampling, and is therefore incomplete. Respondent-driven sampling (RDS) is a widely used method for sampling hard-to-reach human populations based on tracing links in the underlying unobserved social network. The resulting data therefore have tree structure representing a sub-sample of the network, along with many nodal attributes. In this paper, we introduce an approach to adjust mixture models …


Joint Asymptotics For Smoothing Spline Semiparametric Nonlinear Models, Jiahui Yu Oct 2019

Doctoral Dissertations

We study the joint asymptotics of general smoothing spline semiparametric models in the settings of density estimation and regression. We provide a systematic framework which incorporates many existing models as special cases, and further allows for nonlinear relationships between the finite-dimensional Euclidean parameter and the infinite-dimensional functional parameter. For both density estimation and regression, we establish the local existence and uniqueness of the penalized likelihood estimators for our proposed models. In the density estimation setting, we prove joint consistency and obtain the rates of convergence of the joint estimator in an appropriate norm. The convergence rate of the parametric component …


Allocative Poisson Factorization For Computational Social Science, Aaron Schein Jul 2019

Doctoral Dissertations

Social science data often comes in the form of high-dimensional discrete data such as categorical survey responses, social interaction records, or text. These data sets exhibit high degrees of sparsity, missingness, overdispersion, and burstiness, all of which present challenges to traditional statistical modeling techniques. The framework of Poisson factorization (PF) has emerged in recent years as a natural way to model high-dimensional discrete data sets. This framework assumes that each observed count in a data set is a Poisson random variable $y \sim \mathrm{Pois}(\mu)$ whose rate parameter $\mu$ is a function of shared model parameters. This thesis examines a specific …


Variational Approximations For Density Deconvolution, Yue Chang Nov 2018

Doctoral Dissertations

This thesis considers the problem of density estimation when the variables of interest are subject to measurement error. The measurement error is assumed to be additive and homoscedastic. We specify the density of interest by a Dirichlet Process Mixture Model and establish variational approximation approaches to the density deconvolution problem. Gaussian and Laplacian error distributions are considered, which are representatives of supersmooth and ordinary smooth distributions, respectively. We develop two variational approximation algorithms for Gaussian error deconvolution and one variational approximation algorithm for Laplacian error deconvolution. Their performance is compared to deconvoluting kernels and the Markov chain Monte Carlo method by …


Real-Time Dengue Forecasting In Thailand: A Comparison Of Penalized Regression Approaches Using Internet Search Data, Caroline Kusiak Oct 2018

Masters Theses

Dengue fever affects over 390 million people annually worldwide and is of particular concern in Southeast Asia, where it is one of the leading causes of hospitalization. Modeling trends in dengue occurrence can provide valuable information to public health officials; however, many challenges arise depending on the data available. In Thailand, reporting of dengue cases is often delayed by more than 6 weeks, and a small fraction of cases may not be reported until over 11 months after they occurred. This study shows that incorporating data on Google Search trends can improve disease predictions in settings with severely …


Statistical Methods On Risk Management Of Extreme Events, Zijing Zhang Jul 2017

Doctoral Dissertations

The goal of the dissertation is the investigation of financial risk analysis methodologies, using the schemes for extreme value modeling as well as techniques from copula modeling. Extreme value theory is concerned with probabilistic and statistical questions related to unusual behavior or rare events. The subject has a rich mathematical theory and also a long tradition of applications in a variety of areas. We are interested in its application in risk management, with a focus on estimating and forecasting the Value-at-Risk of financial time series data. Extremal data are inherently scarce, thus making inference challenging. In order to obtain …


Statistical Methods For High Dimensional Data Arising From Large Epidemiological Studies, Hui Xu Jul 2017

Doctoral Dissertations

In this thesis, we propose statistical models for addressing commonly encountered data types and study designs in large epidemiologic investigations aimed at understanding the molecular basis of complex disorders. The motivating applications come from diverse disease areas in Women's Health, including the study of type II diabetes in the Women's Health Initiative (WHI), invasive breast cancer in the Nurses' Health Study and the study of the metabolomic underpinnings of cardiovascular disease in the WHI. We have also put significant effort into making the implementation of the proposed methods accessible through freely available, user-friendly software packages in R. The first chapter …


Inference From Network Data In Hard-To-Reach Populations, Isabelle Beaudry Mar 2017

Doctoral Dissertations

The objective of this thesis is to develop methods to make inference about the prevalence of an outcome of interest in hard-to-reach populations. The proposed methods address issues specific to the survey strategies employed to access those populations. One of the common sampling methodologies used in this context is respondent-driven sampling (RDS). Under RDS, the network connecting members of the target population is used to uncover the hidden members. Specialized techniques are then used to make inference from the data collected in this fashion. Our first objective is to correct traditional RDS prevalence estimators and their associated uncertainty estimators for …


Intrinsic Functions For Securing CMOS Computation: Variability, Modeling And Noise Sensitivity, Xiaolin Xu Nov 2016

Doctoral Dissertations

A basic premise behind modern secure computation is the demand for lightweight cryptographic primitives, such as identifiers or key generators. From a circuit perspective, the development of cryptographic modules has also been driven by the aggressive scalability of complementary metal-oxide-semiconductor (CMOS) technology. As designs advance into the nanometer regime, one significant characteristic of today's CMOS design is the random nature of process variability, which limits the nominal circuit design. With the continuous scaling of CMOS technology, instead of mitigating the physical variability, leveraging such properties becomes a promising approach. One of the famous products adhering to this double-edged sword philosophy is the Physically …


Identifying Examinees Who Possess Distinct And Reliable Subscores When Added Value Is Lacking For The Total Sample, Joseph A. Rios Nov 2016

Doctoral Dissertations

Research has demonstrated that although subdomain information may provide no added value beyond the total score, in some contexts such information is of utility to particular demographic subgroups (Sinharay & Haberman, 2014). However, it is argued that the utility of reporting subscores for an individual should not be based on one’s manifest characteristics (e.g., gender or ethnicity), but rather on individual needs for diagnostic information, which is driven by multidimensionality in subdomain scores. To improve the validity of diagnostic information, this study proposed the use of Mahalanobis Distance and HT indices to assess whether an individual’s data significantly departs …


Niche-Based Modeling Of Japanese Stiltgrass (Microstegium vimineum) Using Presence-Only Information, Nathan Bush Nov 2015

Masters Theses

The Connecticut River watershed is experiencing a rapid invasion of aggressive non-native plant species, which threaten watershed function and structure. Volunteer-based monitoring programs such as the University of Massachusetts’ OutSmart Invasive Species Project, Early Detection Distribution Mapping System (EDDMapS) and the Invasive Plant Atlas of New England (IPANE) have gathered valuable invasive plant data. These programs provide a unique opportunity for researchers to model invasive plant species utilizing citizen-sourced data. This study took advantage of these large data sources to model invasive plant distribution and to determine environmental and biophysical predictors that are most influential in dispersion, and to identify …


Variable Selection In Single Index Varying Coefficient Models With Lasso, Peng Wang Nov 2015

Doctoral Dissertations

The single index varying coefficient model is a very attractive statistical model due to its ability to reduce dimensions and its ease of interpretation. There are many theoretical studies and practical applications of it, but typically without variable selection, and no public software is available for fitting it. Here we propose a new algorithm to fit the single index varying coefficient model and to carry out variable selection in the index part with LASSO. The core idea is a two-step scheme which alternates between estimating coefficient functions and selecting-and-estimating the single index. Both in simulation and in application to a Geoscience dataset, we …


Threat Analysis, Countermeasures And Design Strategies For Secure Computation In Nanometer CMOS Regime, Raghavan Kumar Nov 2015

Doctoral Dissertations

Advancements in CMOS technologies have led to an era of the Internet of Things (IoT), where devices have the ability to communicate with each other in addition to their computational power. As more and more sensitive data is processed by embedded devices, the trend towards lightweight and efficient cryptographic primitives has gained significant momentum. Achieving perfect security in silicon is extremely difficult, as traditional cryptographic implementations are vulnerable to various active and passive attacks. There is also a threat in the form of "hardware Trojans" inserted into the supply chain by untrusted third-party manufacturers for economic incentives. Apart …


Robust Optimization Of Biological Protocols, Patrick Flaherty, Ronald W. Davis Jan 2015

Mathematics and Statistics Department Faculty Publication Series

When conducting high-throughput biological experiments, it is often necessary to develop a protocol that is both inexpensive and robust. Standard approaches are either not cost-effective or arrive at an optimized protocol that is sensitive to experimental variations. Here, we describe a novel approach that directly minimizes the cost of the protocol while ensuring the protocol is robust to experimental variation. Our approach uses a risk-averse conditional value-at-risk criterion in a robust parameter design framework. We demonstrate this approach on a polymerase chain reaction protocol and show that our improved protocol is less expensive than the standard protocol and more robust …


Determinants Of Health Care Use Among Rural, Low-Income Mothers And Children: A Simultaneous Systems Approach To Negative Binomial Regression Modeling, Swetha Valluri Jan 2011

Masters Theses 1911 - February 2014

The determinants of health care use among rural, low-income mothers and their children were assessed using a multi-state, longitudinal data set, Rural Families Speak. The results indicate that rural mothers’ decisions regarding health care utilization for themselves and for their child can be best modeled using a simultaneous systems approach to negative binomial regression. Mothers’ visits to a health care provider increased with higher self-assessed depression scores, increased number of child’s doctor visits, greater numbers of total children in the household, greater numbers of chronic conditions, need for prenatal or post-partum care, development of a new medical condition, and …


Dynamic Model Pooling Methodology For Improving Aberration Detection Algorithms, Brenton J. Sellati Jan 2010

Masters Theses 1911 - February 2014

Syndromic surveillance is defined generally as the collection and statistical analysis of data which are believed to be leading indicators for the presence of deleterious activities developing within a system. Conceptually, syndromic surveillance can be applied to any discipline in which it is important to know when external influences manifest themselves in a system by forcing it to depart from its baseline. Comparisons of syndromic surveillance systems have led to mixed results, where models that dominate in one performance metric are often sorely deficient in another. This results in a zero-sum trade-off where one performance metric must be afforded greater …