
Statistics and Probability Commons


Full-Text Articles in Statistics and Probability

Observational Studies In Group Testing And Potential Applications, Alexander Christopher Noll May 2021

Electronic Theses and Dissertations

Group testing, used to identify individuals with targeted outcomes in a population, can greatly improve the efficiency, speed, and cost effectiveness of testing for an outcome, or at least of estimating the outcome's prevalence. Causal inference techniques can provide the basis for an observational study that would allow an investigator to estimate treatment effectiveness if group testing were conducted on the population in a certain way. This thesis examines a simulation of these principles in order to demonstrate a potential application for determining treatment …
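
As a rough illustration of the efficiency claim, here is a minimal sketch of two-stage Dorfman pooling (not the thesis's own design): test each pool once, then retest every member of a positive pool individually. Prevalence, pool size, and sample size are made-up values, and the test is assumed perfectly accurate.

    # Hypothetical sketch of Dorfman two-stage group testing: pool samples,
    # retest members of positive pools individually, and count the tests used.
    # Assumes a perfectly accurate test; all settings are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    prevalence, pool_size, n = 0.02, 10, 10_000

    status = rng.random(n) < prevalence      # true outcome status per person
    pools = status.reshape(-1, pool_size)    # n must be divisible by pool_size
    pool_positive = pools.any(axis=1)        # stage 1: one test per pool
    tests = pools.shape[0] + pool_positive.sum() * pool_size  # stage 2 retests

    print(f"tests used: {tests} versus {n} individual tests")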


Dot: Gene-Set Analysis By Combining Decorrelated Association Statistics, Olga A. Vsevolozhskaya, Min Shi, Fengjiao Hu, Dmitri V. Zaykin Apr 2020

Biostatistics Faculty Publications

Historically, the majority of statistical association methods have been designed assuming the availability of SNP-level information. However, modern genetic and sequencing data present new challenges for the access and sharing of genotype-phenotype datasets, including management costs and difficulties in consolidating records across research groups. These issues make methods based on SNP-level summary statistics particularly appealing. The most common form of combined statistic is a sum of SNP-level squared scores, possibly weighted, as in burden tests for rare variants. The overall significance of the resulting statistic is evaluated using its distribution under the null hypothesis. Here, we demonstrate that this basic …
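
The phrase "decorrelated association statistics" suggests the following minimal sketch; it illustrates the general decorrelation idea under stated assumptions, not necessarily DOT's exact algorithm. Correlated SNP Z-scores are transformed by the inverse square root of their LD correlation matrix R, after which the sum of squares is chi-square with L degrees of freedom under the null. The Z-scores and R below are made up.

    # Decorrelate SNP-level Z-scores with R^{-1/2}, then sum the squares.
    import numpy as np
    from scipy.stats import chi2

    z = np.array([1.8, 2.1, 0.4])             # made-up SNP-level Z-scores
    R = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])           # made-up LD correlation matrix

    w, V = np.linalg.eigh(R)                  # R = V diag(w) V'
    x = V @ np.diag(w ** -0.5) @ V.T @ z      # decorrelated scores
    stat = float(x @ x)                       # chi-square, df = len(z), under H0
    print(f"statistic = {stat:.2f}, p = {chi2.sf(stat, df=len(z)):.3f}")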


Spatio-Temporal Cluster Detection And Local Moran Statistics Of Point Processes, Jennifer L. Matthews Apr 2019

Mathematics & Statistics Theses & Dissertations

Moran's index is a statistic that measures spatial dependence, quantifying the degree of dispersion or clustering of point processes and events in some location or area. Recognizing that a single Moran's index may not give a sufficient summary of spatial autocorrelation, local indicators of spatial association (LISAs) have gained popularity. Accordingly, we propose extending LISAs to time after partitioning the area and computing a Moran-type statistic for each subarea. This unveils patterns among local neighbors that would not otherwise be apparent. We consider the measures of Moran statistics while incorporating a time factor under a simulated multilevel Palm distribution, …
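
For reference, in generic notation (not the thesis's own), the global Moran's I for values $x_1,\dots,x_n$ with spatial weights $w_{ij}$, and Anselin's local (LISA) counterpart for site $i$, are

    I = \frac{n}{\sum_i \sum_j w_{ij}} \cdot
        \frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2},
    \qquad
    I_i = \frac{x_i - \bar{x}}{m_2} \sum_j w_{ij}(x_j - \bar{x}),
    \quad m_2 = \frac{1}{n}\sum_k (x_k - \bar{x})^2,

so each $I_i$ attributes a share of the overall autocorrelation to one location; the proposal above computes Moran-type statistics of this local form per subarea and time slice.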


Comparison Of The Performance Of Simple Linear Regression And Quantile Regression With Non-Normal Data: A Simulation Study, Marjorie Howard Jan 2018

Theses and Dissertations

Linear regression is a widely used method for analysis that is well understood across a wide variety of disciplines. To use linear regression, a number of assumptions must be met. These assumptions, specifically normality and homoscedasticity of the error distribution, can at best be met only approximately with real data. Quantile regression requires fewer assumptions, which offers a potential advantage over linear regression. In this simulation study, we compare the performance of linear (least squares) regression to quantile regression when these assumptions are violated, in order to investigate under what conditions quantile regression becomes the more advantageous method …
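
A minimal sketch of such a comparison, with illustrative settings rather than the study's actual design: generate data with heavy-tailed t(2) errors, then compare the OLS slope to the median (0.5-quantile) regression slope, the latter obtained by numerically minimizing the check loss.

    # OLS vs. median regression under heavy-tailed errors (illustrative only).
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    n, beta0, beta1 = 200, 1.0, 2.0
    x = rng.uniform(0, 10, n)
    y = beta0 + beta1 * x + rng.standard_t(df=2, size=n)  # non-normal errors

    X = np.column_stack([np.ones(n), x])
    ols = np.linalg.lstsq(X, y, rcond=None)[0]            # least-squares fit

    def check_loss(b, tau=0.5):                           # pinball loss at tau
        r = y - X @ b
        return np.sum(np.where(r >= 0, tau * r, (tau - 1) * r))

    qr = minimize(check_loss, x0=ols, method="Nelder-Mead").x
    print(f"OLS slope: {ols[1]:.2f}, median-regression slope: {qr[1]:.2f}")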


Simulating Longer Vectors Of Correlated Binary Random Variables Via Multinomial Sampling, Justine Shults Mar 2016

UPenn Biostatistics Working Papers

The ability to simulate correlated binary data is important for sample size calculation and comparison of methods for analysis of clustered and longitudinal data with dichotomous outcomes. One available approach for simulating length n vectors of dichotomous random variables is to sample from the multinomial distribution of all possible length n permutations of zeros and ones. However, the multinomial sampling method has only been implemented in general form (without first making restrictive assumptions) for vectors of length 2 and 3, because specifying the multinomial distribution is very challenging for longer vectors. I overcome this difficulty by presenting an algorithm for …
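
For length-2 vectors, the case already available in general form, the multinomial cell probabilities follow from the standard identity p11 = p1*p2 + rho*sqrt(p1(1 - p1) p2(1 - p2)). The sketch below (illustrative values) samples pairs that way and checks the empirical moments; lengths beyond 3 are exactly what the paper's algorithm addresses.

    # Multinomial sampling of correlated binary pairs (n = 2 case only).
    import numpy as np

    rng = np.random.default_rng(2)
    p1, p2, rho = 0.3, 0.5, 0.2

    p11 = p1 * p2 + rho * np.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    cells = np.array([1 - p1 - p2 + p11,        # cell probabilities for the
                      p2 - p11, p1 - p11, p11]) # patterns (0,0),(0,1),(1,0),(1,1)

    counts = rng.multinomial(100_000, cells)    # draw many pairs at once
    patterns = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    sample = np.repeat(patterns, counts, axis=0)
    print("empirical means:", sample.mean(axis=0))
    print("empirical corr: ", np.corrcoef(sample.T)[0, 1])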


Assessing The Probability That A Finding Is Genuine For Large-Scale Genetic Association Studies, Chia-Ling Kuo, Olga A. Vsevolozhskaya, Dmitri V. Zaykin May 2015

Olga A. Vsevolozhskaya

Genetic association studies routinely involve massive numbers of statistical tests accompanied by P-values. Whole genome sequencing technologies have increased the potential number of tested variants to tens of millions. The more tests are performed, the smaller the P-value required to be deemed significant. However, a small P-value is not equivalent to a small chance of a spurious finding, and significance thresholds may fail to serve as efficient filters against false results. While the Bayesian approach can provide a direct assessment of the probability that a finding is spurious, its adoption in association studies has been slow, due in part to the ubiquity …
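
One common way to formalize this point, a generic calculation rather than necessarily the authors' method, is the posterior probability that the null holds given significance at level $\alpha$:

    P(H_0 \mid P \le \alpha) = \frac{\pi_0 \alpha}{\pi_0 \alpha + (1 - \pi_0)\,\overline{\text{power}}},

where $\pi_0$ is the prior probability of the null. For example, with $\pi_0 = 0.99$, $\alpha = 0.001$, and average power $0.8$, this gives $0.00099 / (0.00099 + 0.008) \approx 0.11$: roughly an 11% chance the finding is spurious despite the stringent threshold.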


Functional Analysis Of Variance For Association Studies, Olga A. Vsevolozhskaya, Dmitri V. Zaykin, Mark C. Greenwood, Changshuai Wei, Qing Lu Sep 2014

Olga A. Vsevolozhskaya

While progress has been made in identifying common genetic variants associated with human diseases, for most common complex diseases the identified genetic variants account for only a small proportion of heritability. Challenges remain in finding additional unknown genetic variants predisposing to complex diseases. With advances in next-generation sequencing technologies, sequencing studies have become commonplace in genetic research. The ongoing exome-sequencing and whole-genome-sequencing studies generate a massive amount of sequencing variants and allow researchers to comprehensively investigate their role in human diseases. The discovery of new disease-associated variants can be enhanced by utilizing powerful and computationally efficient statistical methods. …


In Silico Surveillance: Evaluating Outbreak Detection With Simulation Models, Bryan Lewis, Stephen Eubank, Allyson M. Abrams, Ken Kleinman Jan 2013

Public Health Department Faculty Publication Series

Background

Detecting outbreaks is a crucial task for public health officials, yet gaps remain in the systematic evaluation of outbreak detection protocols. The authors’ objectives were to design, implement, and test a flexible methodology for generating detailed synthetic surveillance data that provides realistic geographical and temporal clustering of cases, and to use it to evaluate outbreak detection protocols.

Methods

A detailed representation of the Boston area was constructed, based on data about individuals, locations, and activity patterns. Influenza-like illness (ILI) transmission was simulated, producing 100 years of in silico ILI data. Six different surveillance systems were designed and developed using gathered cases …


Sample Size Calculations For Roc Studies: Parametric Robustness And Bayesian Nonparametrics, Dunlei Cheng, Adam J. Branscum, Wesley O. Johnson Jan 2012

Dunlei Cheng

Methods for sample size calculations in ROC studies often assume independent normal distributions for test scores among the diseased and non-diseased populations. We consider sample size requirements under the default two-group normal model when the data distribution for the diseased population is either skewed or multimodal. For these two common scenarios we investigate the potential for robustness of calculated sample sizes under the misspecified normal model, and we compare them to sample sizes calculated under a more flexible nonparametric Dirichlet process mixture model. We also highlight the utility of flexible models for ROC data analysis and their importance to study design. …
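
For context, the AUC implied by the default two-group normal ("binormal") model has a closed form, evaluated below for made-up parameter values.

    # Binormal AUC: Phi((mu1 - mu0) / sqrt(s0^2 + s1^2)); values are illustrative.
    from scipy.stats import norm

    mu0, s0 = 0.0, 1.0    # non-diseased scores ~ N(mu0, s0^2)
    mu1, s1 = 1.5, 1.2    # diseased scores ~ N(mu1, s1^2)
    auc = norm.cdf((mu1 - mu0) / (s0**2 + s1**2) ** 0.5)
    print(f"binormal AUC = {auc:.3f}")    # about 0.83 here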


Accounting For Response Misclassification And Covariate Measurement Error Improves Power And Reduces Bias In Epidemiologic Studies, Dunlei Cheng, Adam J. Branscum, James D. Stamey Jan 2010

Dunlei Cheng

Purpose: To quantify the impact of ignoring misclassification of a response variable and measurement error in a covariate on statistical power, and to develop software for sample size and power analysis that accounts for these flaws in epidemiologic data. Methods: A Monte Carlo simulation-based procedure is developed to illustrate the differences in design requirements and inferences between analytic methods that properly account for misclassification and measurement error and those that do not in regression models for cross-sectional and cohort data. Results: We found that failure to account for these flaws in epidemiologic data can lead to a substantial reduction in …
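
A toy Monte Carlo in this spirit (all settings illustrative; here the naive analysis simply ignores the misclassification, to show the resulting power loss):

    # Simulate logistic data, flip responses with fixed sensitivity/specificity,
    # and compare the naive Wald test's power with and without misclassification.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n, beta1, sens, spec, reps = 300, 0.8, 0.9, 0.95, 500

    def rejects(misclassify):
        x = rng.normal(size=n)
        y = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + beta1 * x))))
        if misclassify:                   # imperfect outcome ascertainment
            flip = np.where(y == 1, rng.random(n) > sens, rng.random(n) > spec)
            y = np.where(flip, 1 - y, y)
        fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
        return abs(fit.params[1] / fit.bse[1]) > 1.96

    power_clean = np.mean([rejects(False) for _ in range(reps)])
    power_noisy = np.mean([rejects(True) for _ in range(reps)])
    print(f"power: {power_clean:.2f} (clean) vs. {power_noisy:.2f} (misclassified)")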


A Bayesian Approach To Sample Size Determination For Studies Designed To Evaluate Continuous Medical Tests, Dunlei Cheng, Adam J. Branscum, James D. Stamey Jan 2010

Dunlei Cheng

We develop a Bayesian approach to sample size and power calculations for cross-sectional studies that are designed to evaluate and compare continuous medical tests. For studies that involve one test or two conditionally independent or dependent tests, we present methods that are applicable when the true disease status of sampled individuals will be available and when it will not. Within a hypothesis testing framework, we consider the goal of demonstrating that a medical test has area under the receiver operating characteristic (ROC) curve that exceeds a minimum acceptable level or another relevant threshold, and the goals of establishing the superiority …


Variance-Mean Relationships To Analyze Large Survey Data With Application To Health Expenditure Data, Wenli Luo Jan 2009

Legacy Theses & Dissertations (2009 - 2024)

A great deal of work has been done on cost analysis in the last several decades. However, relatively little has been done on how to efficiently address the relationship between the variance and mean of the response distribution, and on how this relationship affects the choice of an appropriate generalized linear model.
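
The abstract does not specify the families considered, but the standard way a variance-mean relationship drives the choice of GLM is through a power variance function,

    \operatorname{Var}(Y) = \phi\, \mu^{p},

with $p = 0$ (normal), $p = 1$ (Poisson-type), $p = 2$ (gamma), and $1 < p < 2$ (Tweedie compound Poisson-gamma), the last being popular for expenditure data with exact zeros.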


Bayesian Approach To Average Power Calculations For Binary Regression Models With Misclassified Outcomes, Dunlei Cheng, James D. Stamey, Adam J. Branscum Dec 2008

Dunlei Cheng

We develop a simulation-based procedure for determining the required sample size in binomial regression risk assessment studies when response data are subject to misclassification. A Bayesian average power criterion is used to determine a sample size that provides high probability, averaged over the distribution of potential future data sets, of correctly establishing the direction of association between predictor variables and the probability of event occurrence. The method is broadly applicable to any parametric binomial regression model including, but not limited to, the popular logistic, probit, and complementary log-log models. We detail a common medical scenario wherein ascertainment of true disease …
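
A sketch of the average-power idea as described, with the prior, error rates, and decision rule all chosen for illustration: draw the true slope from a sampling prior, simulate misclassified binary responses, and average, over many such data sets, the probability of correctly establishing a positive association.

    # Bayesian average power under misclassification (illustrative settings only;
    # the analysis here is naive, not the paper's model that corrects for errors).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    n, sens, spec, reps = 400, 0.9, 0.95, 300

    hits = 0
    for _ in range(reps):
        beta1 = rng.normal(0.7, 0.1)              # sampling prior on the slope
        x = rng.normal(size=n)
        y = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + beta1 * x))))
        flip = np.where(y == 1, rng.random(n) > sens, rng.random(n) > spec)
        y = np.where(flip, 1 - y, y)              # misclassified responses
        fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
        hits += fit.params[1] - 1.96 * fit.bse[1] > 0   # direction established
    print(f"estimated average power at n = {n}: {hits / reps:.2f}")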


Combining Information From Two Surveys To Estimate County-Level Prevalence Rates Of Cancer Risk Factors And Screening, Trivellore E. Raghunathan, Dawei Xie, Nathaniel Schenker, Van Parsons, William W. Davis, Kevin W. Dodd, Eric J. Feuer May 2006

The University of Michigan Department of Biostatistics Working Paper Series

Cancer surveillance requires estimates of the prevalence of cancer risk factors and screening for small areas such as counties. Two popular data sources are the Behavioral Risk Factor Surveillance System (BRFSS), a telephone survey conducted by state agencies, and the National Health Interview Survey (NHIS), an area probability sample survey conducted through face-to-face interviews. Both data sources have advantages and disadvantages. The BRFSS is a larger survey, and almost every county is included in the survey; but it has lower response rates, as is typical with telephone surveys, and it does not include subjects who live in households with no …


A Comparison For Longitudinal Data Missing Due To Truncation, Rong Liu Jan 2006

Theses and Dissertations

Many longitudinal clinical studies suffer from patient dropout. Often the dropout is nonignorable and the missing-data mechanism needs to be incorporated in the analysis. Methods for handling missing data make various assumptions about the missing-data mechanism, and their utility in practice depends on whether these assumptions apply in a specific application. Ramakrishnan and Wang (2005) proposed a method (MDT) to handle nonignorable missing data, where missingness is due to the observations exceeding an unobserved threshold. Assuming that the observations arise from a truncated normal distribution, they suggested an EM algorithm to simplify the estimation. In this dissertation the EM algorithm is …
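
A hedged sketch of the EM idea for a simplified version of this mechanism: estimate (mu, sigma) for a normal sample whose values above a cutoff c are unobserved except for their count. Unlike the dissertation's setting, c is treated as known here, and this is a stand-in rather than the MDT algorithm itself.

    # EM for a normal sample censored at a known cutoff c: values above c are
    # missing, but how many exceeded c is known.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(4)
    full = rng.normal(2.0, 1.5, 1_000)
    c = 3.0
    obs = full[full <= c]                  # observed values
    m = (full > c).sum()                   # count of values exceeding c
    n = len(obs) + m

    mu, sigma = obs.mean(), obs.std()      # crude starting values
    for _ in range(200):
        a = (c - mu) / sigma
        lam = norm.pdf(a) / norm.sf(a)     # inverse Mills ratio
        e1 = mu + sigma * lam              # E[Y | Y > c]
        e2 = mu**2 + sigma**2 + sigma * lam * (c + mu)  # E[Y^2 | Y > c]
        mu = (obs.sum() + m * e1) / n      # M-step: complete-data MLEs
        sigma = np.sqrt((np.square(obs).sum() + m * e2) / n - mu**2)
    print(f"EM estimates: mu = {mu:.2f}, sigma = {sigma:.2f} (truth: 2.0, 1.5)")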