Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons


Articles 1 - 18 of 18

Full-Text Articles in Statistics and Probability

Assessing The Probability That A Finding Is Genuine For Large-Scale Genetic Association Studies, Chia-Ling Kuo, Olga A. Vsevolozhskaya, Dmitri V. Zaykin May 2015


Olga A. Vsevolozhskaya

Genetic association studies routinely involve massive numbers of statistical tests accompanied by P-values. Whole-genome sequencing technologies have increased the potential number of tested variants to tens of millions. The more tests are performed, the smaller the P-value required for a result to be deemed significant. However, a small P-value is not equivalent to a small chance of a spurious finding, and significance thresholds may fail to serve as efficient filters against false results. While the Bayesian approach can provide a direct assessment of the probability that a finding is spurious, its adoption in association studies has been slow, due in part to the ubiquity …
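The trade-off the abstract describes can be illustrated with a textbook Bayesian calculation (not the authors' method): the probability that a "significant" finding is spurious depends jointly on the significance threshold, the test's power, and the prior probability of a true association. A minimal sketch, with illustrative numbers:

```python
def prob_spurious(alpha, power, prior_true):
    """Posterior probability that a 'significant' result is a false
    positive: P(null | significant) under a simple two-hypothesis model."""
    p_sig_null = alpha * (1.0 - prior_true)   # significant and spurious
    p_sig_true = power * prior_true           # significant and genuine
    return p_sig_null / (p_sig_null + p_sig_true)

# A genome-wide threshold (5e-8) with 80% power and a 1-in-10,000 prior
# still leaves a small but nonzero chance that the finding is spurious:
print(prob_spurious(5e-8, 0.80, 1e-4))
```

Note that the same threshold with a weaker prior yields a much larger spurious-finding probability, which is the point the abstract makes about thresholds alone being poor filters.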


Functional Analysis Of Variance For Association Studies, Olga A. Vsevolozhskaya, Dmitri V. Zaykin, Mark C. Greenwood, Changshuai Wei, Qing Lu Sep 2014


Olga A. Vsevolozhskaya

While progress has been made in identifying common genetic variants associated with human diseases, for most common complex diseases the identified genetic variants account for only a small proportion of heritability. Challenges remain in finding the additional unknown genetic variants predisposing to complex diseases. With advances in next-generation sequencing technologies, sequencing studies have become commonplace in genetic research. Ongoing exome-sequencing and whole-genome-sequencing studies generate a massive number of sequence variants and allow researchers to comprehensively investigate their role in human diseases. The discovery of new disease-associated variants can be enhanced by utilizing powerful and computationally efficient statistical methods. …


Simulating Univariate And Multivariate Tukey G-And-H Distributions Based On The Method Of Percentiles, Tzu-Chun Kou, Todd C. Headrick Jan 2014


Todd Christopher Headrick

This paper derives closed-form solutions for the 𝑔-and-ℎ shape parameters associated with the Tukey family of distributions based on the method of percentiles (MOP). A proposed MOP univariate procedure is described and compared with the method of moments (MOM) in the context of distribution fitting and estimating skew and kurtosis functions. The MOP methodology is also extended from univariate to multivariate data generation. A procedure is described for simulating nonnormal distributions with specified Spearman correlations. The MOP procedure has an advantage over the MOM because it does not require numerical integration to compute intermediate correlations. Simulation results demonstrate that the …
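For readers unfamiliar with the family, a Tukey g-and-h variate is produced by transforming a standard normal deviate; g controls skew and h controls tail weight. The sketch below shows the transformation itself (the parameter values are illustrative, not taken from the paper):

```python
import numpy as np

def g_and_h(z, g, h):
    """Tukey g-and-h transform of a standard normal deviate z.
    As g -> 0 the first factor tends to z, giving the symmetric
    h-distribution; h = 0 gives the pure g (skew) distribution."""
    body = (np.exp(g * z) - 1.0) / g if g != 0 else z
    return body * np.exp(h * z**2 / 2.0)

rng = np.random.default_rng(1)
x = g_and_h(rng.standard_normal(100_000), g=0.5, h=0.1)  # right-skewed, heavy-tailed
```

The method of percentiles then estimates g and h from sample percentiles of data like `x`, rather than from product moments.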


Simulating Non-Normal Distributions With Specified L-Moments And L-Correlations, Todd C. Headrick, Mohan D. Pant May 2012


Mohan Dev Pant

This paper derives a procedure for simulating continuous non-normal distributions with specified L-moments and L-correlations in the context of power method polynomials of order three. It is demonstrated that the proposed procedure has computational advantages over the traditional product-moment procedure in terms of solving for intermediate correlations. Simulation results also demonstrate that the proposed L-moment-based procedure is an attractive alternative to the traditional procedure when distributions with more severe departures from normality are considered. Specifically, estimates of L-skew and L-kurtosis are superior to the conventional estimates of skew and kurtosis in terms of both relative bias and relative standard error. …
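The L-moment side of the comparison can be illustrated with the standard direct sample estimators via probability-weighted moments (textbook formulas, not code from the paper):

```python
import numpy as np

def sample_l_moments(x):
    """First two sample L-moments plus L-skew and L-kurtosis ratios,
    computed from the probability-weighted moments b0..b3 of the
    ordered sample (Hosking's unbiased estimators)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((i - 1) * x) / (n * (n - 1))
    b2 = np.sum((i - 1) * (i - 2) * x) / (n * (n - 1) * (n - 2))
    b3 = np.sum((i - 1) * (i - 2) * (i - 3) * x) / (n * (n - 1) * (n - 2) * (n - 3))
    l1 = b0                                  # L-location (the mean)
    l2 = 2 * b1 - b0                         # L-scale
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2          # mean, L-scale, L-skew, L-kurt
```

Because each L-moment is linear in the order statistics, these estimators are markedly less sensitive to extreme observations than conventional skew and kurtosis, which is the robustness advantage the abstract reports.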


Sample Size Calculations For Roc Studies: Parametric Robustness And Bayesian Nonparametrics, Dunlei Cheng, Adam J. Branscum, Wesley O. Johnson Jan 2012


Dunlei Cheng

Methods for sample size calculations in ROC studies often assume independent normal distributions for test scores among the diseased and non-diseased populations. We consider sample size requirements under the default two-group normal model when the data distribution for the diseased population is either skewed or multimodal. For these two common scenarios we investigate the potential for robustness of calculated sample sizes under the mis-specified normal model and we compare to sample sizes calculated under a more flexible nonparametric Dirichlet process mixture model. We also highlight the utility of flexible models for ROC data analysis and their importance to study design. …
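The default two-group normal model has a closed-form AUC, which makes the robustness question concrete: a skewed diseased-score distribution with the same mean and variance need not produce the AUC the binormal formula predicts. A sketch of the comparison (all parameter values are illustrative):

```python
import numpy as np
from scipy.stats import norm

def binormal_auc(mu0, sd0, mu1, sd1):
    """AUC under the binormal model: P(X1 > X0) for independent normals."""
    return norm.cdf((mu1 - mu0) / np.hypot(sd0, sd1))

def empirical_auc(x0, x1):
    """Mann-Whitney estimate of P(X1 > X0)."""
    return (x1[:, None] > x0[None, :]).mean()

# Diseased scores: lognormal with the same mean (1.5) and SD (1.0) that
# the normal model assumes, i.e. a skewed mis-specification.
m, s = 1.5, 1.0
sigma2 = np.log(1.0 + (s / m) ** 2)
mu = np.log(m) - sigma2 / 2.0

rng = np.random.default_rng(7)
x0 = rng.normal(0.0, 1.0, 4000)                 # non-diseased scores
x1 = rng.lognormal(mu, np.sqrt(sigma2), 4000)   # skewed diseased scores
```

Comparing `empirical_auc(x0, x1)` with `binormal_auc(0, 1, 1.5, 1)` shows how much the operating characteristic shifts under skewness, and hence why sample sizes calculated under the normal model may not be robust.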


Simulating Non-Normal Distributions With Specified L-Moments And L-Correlations, Todd C. Headrick, Mohan D. Pant Jan 2012


Todd Christopher Headrick

This paper derives a procedure for simulating continuous non-normal distributions with specified L-moments and L-correlations in the context of power method polynomials of order three. It is demonstrated that the proposed procedure has computational advantages over the traditional product-moment procedure in terms of solving for intermediate correlations. Simulation results also demonstrate that the proposed L-moment-based procedure is an attractive alternative to the traditional procedure when distributions with more severe departures from normality are considered. Specifically, estimates of L-skew and L-kurtosis are superior to the conventional estimates of skew and kurtosis in terms of both relative bias and relative standard error. …


Data Envelopment Analysis In The Presence Of Measurement Error: Case Study From The National Database Of Nursing Quality Indicators (Ndnqi), Byron J. Gajewski, Robert Lee, Nancy Dunton Jan 2012


Byron J Gajewski

Data envelopment analysis (DEA) is the most commonly used approach for evaluating healthcare efficiency [B. Hollingsworth, The measurement of efficiency and productivity of health care delivery. Health Economics 17(10) (2008), pp. 1107–1128], but a long-standing concern is that DEA assumes data are measured without error. This assumption is quite unlikely to hold, and DEA and other efficiency analysis techniques may yield biased efficiency estimates when measurement error is ignored [B.J. Gajewski, R. Lee, M. Bott, U. Piamjariyakul, and R.L. Taunton, On estimating the distribution of data envelopment analysis efficiency scores: an application to nursing homes’ care planning process. Journal of Applied Statistics …


On Simulating Univariate And Multivariate Burr Type Iii And Type Xii Distributions, Todd C. Headrick, Mohan D. Pant, Yanyan Sheng Mar 2010


Mohan Dev Pant

This paper describes a method for simulating univariate and multivariate Burr Type III and Type XII distributions with specified correlation matrices. The methodology is based on the derivation of the parametric forms of a pdf and cdf for this family of distributions. The paper shows how shape parameters can be computed for specified values of skew and kurtosis. It is also demonstrated how to compute percentage points and other measures of central tendency such as the mode, median, and trimmed mean. Examples are provided to demonstrate how this Burr family can be used in the context of distribution fitting using …
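The Type XII member of the family has simple closed-form cdf and quantile functions, so univariate variates can be drawn by inverse transform. A sketch with illustrative shape parameters (the paper's multivariate extension additionally requires specified correlation matrices):

```python
import numpy as np

def burr12_cdf(x, c, k):
    """Burr Type XII cdf: F(x) = 1 - (1 + x**c)**(-k), for x > 0."""
    return 1.0 - (1.0 + x**c) ** (-k)

def burr12_quantile(u, c, k):
    """Inverse of the Type XII cdf, for inverse-transform sampling."""
    return ((1.0 - u) ** (-1.0 / k) - 1.0) ** (1.0 / c)

rng = np.random.default_rng(0)
x = burr12_quantile(rng.uniform(size=100_000), c=3.0, k=2.0)
```

The shape parameters c and k jointly determine skew and kurtosis, which is what allows the fitting described in the abstract.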


Simulating Multivariate G-And-H Distributions, Rhonda K. Kowalchuk, Todd C. Headrick Jan 2010


Todd Christopher Headrick

The Tukey family of g-and-h distributions is often used to model univariate real-world data. There is a paucity of research demonstrating appropriate multivariate data generation using the g-and-h family of distributions with specified correlations. Therefore, the methodology and algorithms are presented to extend the g-and-h family from univariate to multivariate data generation. An example is provided along with a Monte Carlo simulation demonstrating the methodology. In addition, algorithms written in Mathematica 7.0 are available from the authors for implementing the procedure.
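A minimal version of the univariate-to-multivariate step: draw correlated standard normals (via the Cholesky factor of a target matrix) and apply the g-and-h transform to each margin. Note that the specified correlation here applies to the underlying normals; matching the post-transform correlation exactly is what requires the intermediate-correlation solution the authors present. All parameter values are illustrative:

```python
import numpy as np

def g_and_h(z, g, h):
    """Tukey g-and-h transform of standard normal deviates (g != 0)."""
    return (np.exp(g * z) - 1.0) / g * np.exp(h * z**2 / 2.0)

target = np.array([[1.0, 0.6],
                   [0.6, 1.0]])                # correlation for the normals
L = np.linalg.cholesky(target)

rng = np.random.default_rng(3)
z = rng.standard_normal((100_000, 2)) @ L.T    # correlated N(0,1) pairs
x = g_and_h(z, g=0.3, h=0.05)                  # nonnormal marginals

r_z = np.corrcoef(z, rowvar=False)[0, 1]       # ~0.6 by construction
r_x = np.corrcoef(x, rowvar=False)[0, 1]       # drifts away from 0.6
```

The gap between `r_z` and `r_x` is exactly the discrepancy that an intermediate-correlation adjustment removes.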


Statistical Simulation: Power Method Polynomials And Other Transformations, Todd C. Headrick Jan 2010


Todd Christopher Headrick

Although power method polynomials based on the standard normal distribution have been used in many different contexts for the past 30 years, it was not until recently that the probability density function (pdf) and cumulative distribution function (cdf) were derived and made available. Focusing on both univariate and multivariate nonnormal data generation, Statistical Simulation: Power Method Polynomials and Other Transformations presents techniques for conducting a Monte Carlo simulation study. It shows how to use power method polynomials for simulating univariate and multivariate nonnormal distributions with specified cumulants and correlation matrices. The book first explores the methodology underlying the power method, …


Accounting For Response Misclassification And Covariate Measurement Error Improves Power And Reduces Bias In Epidemiologic Studies, Dunlei Cheng, Adam J. Branscum, James D. Stamey Jan 2010


Dunlei Cheng

Purpose: To quantify the impact of ignoring misclassification of a response variable and measurement error in a covariate on statistical power, and to develop software for sample size and power analysis that accounts for these flaws in epidemiologic data. Methods: A Monte Carlo simulation-based procedure is developed to illustrate the differences in design requirements and inferences between analytic methods that properly account for misclassification and measurement error and those that do not, in regression models for cross-sectional and cohort data. Results: We found that failure to account for these flaws in epidemiologic data can lead to a substantial reduction in …
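The direction of the bias is easy to reproduce in a simulation: with nondifferential outcome misclassification, the naively estimated odds ratio is attenuated toward the null. A sketch (the sensitivity, specificity, and risk values are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200_000
x = rng.binomial(1, 0.5, n)                         # binary exposure
y = rng.binomial(1, np.where(x == 1, 0.20, 0.10))   # true outcomes

sens, spec = 0.85, 0.95                             # imperfect ascertainment
u = rng.uniform(size=n)
y_obs = np.where(y == 1,
                 (u < sens).astype(int),            # some true cases missed
                 (u < 1 - spec).astype(int))        # some false positives

def odds_ratio(yy, xx):
    a = np.sum((yy == 1) & (xx == 1)); b = np.sum((yy == 0) & (xx == 1))
    c = np.sum((yy == 1) & (xx == 0)); d = np.sum((yy == 0) & (xx == 0))
    return (a * d) / (b * c)

true_or = odds_ratio(y, x)        # population value (0.20/0.80)/(0.10/0.90) = 2.25
naive_or = odds_ratio(y_obs, x)   # attenuated toward 1
```

The attenuated `naive_or` is why an analysis that ignores misclassification needs a larger sample size to reach the same power.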


A Bayesian Approach To Sample Size Determination For Studies Designed To Evaluate Continuous Medical Tests, Dunlei Cheng, Adam J. Branscum, James D. Stamey Jan 2010


Dunlei Cheng

We develop a Bayesian approach to sample size and power calculations for cross-sectional studies that are designed to evaluate and compare continuous medical tests. For studies that involve one test or two conditionally independent or dependent tests, we present methods that are applicable when the true disease status of sampled individuals will be available and when it will not. Within a hypothesis testing framework, we consider the goal of demonstrating that a medical test has area under the receiver operating characteristic (ROC) curve that exceeds a minimum acceptable level or another relevant threshold, and the goals of establishing the superiority …
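A frequentist analogue conveys the flavor of the calculation when true disease status is available: simulate studies under an assumed true AUC, test whether the estimated AUC exceeds the minimum acceptable level, and average the rejections. The Hanley-McNeil variance approximation and all design values below are illustrative, not the authors' Bayesian machinery:

```python
import numpy as np
from scipy.stats import norm

def auc_hat(x0, x1):
    """Mann-Whitney AUC estimate, P(X1 > X0)."""
    return (x1[:, None] > x0[None, :]).mean()

def hanley_mcneil_se(a, n0, n1):
    """Hanley-McNeil standard error approximation for an estimated AUC."""
    q1, q2 = a / (2 - a), 2 * a**2 / (1 + a)
    v = (a * (1 - a) + (n1 - 1) * (q1 - a**2) + (n0 - 1) * (q2 - a**2)) / (n0 * n1)
    return np.sqrt(v)

rng = np.random.default_rng(5)
n0 = n1 = 50
delta = np.sqrt(2) * norm.ppf(0.85)     # binormal shift giving true AUC = 0.85
reps, reject = 400, 0
for _ in range(reps):
    x0 = rng.standard_normal(n0)        # non-diseased test scores
    x1 = rng.standard_normal(n1) + delta
    a = auc_hat(x0, x1)
    z = (a - 0.75) / hanley_mcneil_se(a, n0, n1)   # H0: AUC <= 0.75
    reject += z > norm.ppf(0.95)
power = reject / reps
```

Increasing `n0` and `n1` until `power` reaches the desired level is the simulation-based sample size search; the Bayesian version additionally averages over prior uncertainty.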


Creation Of Synthetic Discrete Response Regression Models, Joseph Hilbe Jan 2010


Joseph M Hilbe

The development and use of synthetic regression models have proven to assist statisticians in better understanding bias in data, as well as how best to interpret various statistics associated with a modeling situation. In this article I present code that can be easily amended for the creation of synthetic binomial, count, and categorical response models. Parameters may be assigned to any number of predictors (continuous, binary, or categorical), negative binomial heterogeneity parameters may be assigned, and the number of levels or cut points and values may be specified for ordered and unordered categorical response models. I …
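The same idea can be sketched in a few lines (the article presents its own code; the parameter names and values here are illustrative): assign true parameters, build the linear predictor, and draw synthetic NB2 counts by Poisson-gamma mixing:

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 100_000
x1 = rng.uniform(size=n)                 # continuous predictor
x2 = rng.binomial(1, 0.5, n)             # binary predictor
beta = np.array([1.0, 0.75, -0.5])       # assigned "true" parameters
mu = np.exp(beta[0] + beta[1] * x1 + beta[2] * x2)

alpha = 0.5                              # NB heterogeneity parameter
lam = rng.gamma(shape=1.0 / alpha, scale=alpha * mu)  # mean mu, var alpha*mu^2
y = rng.poisson(lam)                     # synthetic NB2 counts
```

Fitting a negative binomial regression to `(y, x1, x2)` should recover `beta` and `alpha`, which is exactly the check synthetic models make possible.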


Bayesian Approach To Average Power Calculations For Binary Regression Models With Misclassified Outcomes, Dunlei Cheng, James D. Stamey, Adam J. Branscum Dec 2008


Dunlei Cheng

We develop a simulation-based procedure for determining the required sample size in binomial regression risk assessment studies when response data are subject to misclassification. A Bayesian average power criterion is used to determine a sample size that provides high probability, averaged over the distribution of potential future data sets, of correctly establishing the direction of association between predictor variables and the probability of event occurrence. The method is broadly applicable to any parametric binomial regression model including, but not limited to, the popular logistic, probit, and complementary log-log models. We detail a common medical scenario wherein ascertainment of true disease …
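A simplified two-group analogue conveys the average power idea (the paper's setting is binomial regression with misclassified responses; the priors, test, and sizes below are illustrative): draw true risks from priors, simulate a study, and count studies that find a significant effect in the correct direction, averaged over the prior draws:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

def average_power(n_per_group, sims=2000):
    """Bayesian average power: the probability, averaged over the prior,
    of a significant result in the direction of the true effect."""
    p0 = rng.beta(2, 18, sims)               # control-risk prior (mean 0.10)
    p1 = rng.beta(4, 16, sims)               # exposed-risk prior (mean 0.20)
    y0 = rng.binomial(n_per_group, p0)
    y1 = rng.binomial(n_per_group, p1)
    ph0, ph1 = y0 / n_per_group, y1 / n_per_group
    pooled = (y0 + y1) / (2 * n_per_group)
    se = np.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    ok = se > 0
    z = np.zeros(sims)
    z[ok] = (ph1[ok] - ph0[ok]) / se[ok]     # two-proportion z statistic
    correct = (z > norm.ppf(0.975)) & (p1 > p0)
    return correct.mean()

print(average_power(200))
```

A sample size search increases `n_per_group` until `average_power` exceeds the chosen criterion.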


Simulating Controlled Variate And Rank Correlations Based On The Power Method Transformation, Todd C. Headrick, Simon Y. Aman, T. Mark Beasley Dec 2007


Todd Christopher Headrick

The power method transformation is a popular algorithm for simulating correlated non-normal continuous variates because of its simplicity and ease of execution. Statistical models may consist of continuous and/or ranked variates. In view of this, the methodology is derived for simulating controlled correlation structures between non-normal (a) variates, (b) ranks, and (c) variates with ranks in the context of the power method. The correlation structure between variate values and their associated rank order is also derived for the power method. As such, a measure of the potential loss of information is provided when ranks are used in place …
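The contrast between variate and rank correlations can be seen directly: a monotone power method transform leaves rank correlations untouched while changing the variate correlation. A sketch with illustrative constants:

```python
import numpy as np

def ranks(x):
    """Column-wise rank order (1..n); ties occur with probability zero here."""
    return np.argsort(np.argsort(x, axis=0), axis=0) + 1

rng = np.random.default_rng(9)
z = rng.standard_normal((50_000, 2))
z[:, 1] = 0.7 * z[:, 0] + np.sqrt(1 - 0.49) * z[:, 1]  # normals with corr 0.7

y = z + 0.1 * z**3          # monotone cubic power method transform

r_variates = np.corrcoef(y, rowvar=False)[0, 1]
r_ranks = np.corrcoef(ranks(y), rowvar=False)[0, 1]    # Spearman correlation
```

For bivariate normals with correlation 0.7, the population Spearman correlation is (6/pi)*arcsin(0.35), about 0.683, and it is unchanged by the monotone transform; the variate correlation shifts, which is the information gap the abstract quantifies.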


The Power Method Transformation: Its Probability Density Function, Distribution Function, And Its Further Use For Fitting Data, Todd C. Headrick, Rhonda K. Kowalchuk Mar 2007


Todd Christopher Headrick

The power method polynomial transformation is a popular algorithm used for simulating non-normal distributions because of its simplicity and ease of execution. The primary limitations of the power method transformation are that its probability density function (pdf) and cumulative distribution function (cdf) are unknown. In view of this, the power method’s pdf and cdf are derived in general form. More specific properties are also derived for determining if a given transformation will also have an associated pdf in the context of polynomials of order three and five. Numerical examples and parametric plots of power method densities are provided to confirm …
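The derivation rests on a standard change of variables: when the polynomial p is strictly increasing, Y = p(Z) has pdf f_Y(p(z)) = phi(z)/p'(z), with phi the standard normal density. A numerical check with illustrative constants:

```python
import numpy as np
from scipy.stats import norm

# Monotone third-order power method polynomial: c1 > 0, c3 > 0, c2 = 0
# guarantees p'(z) > 0 everywhere (constants are illustrative).
c0, c1, c2, c3 = 0.0, 0.9, 0.0, 0.05
p  = lambda z: c0 + c1*z + c2*z**2 + c3*z**3
dp = lambda z: c1 + 2*c2*z + 3*c3*z**2

z = np.linspace(-8.0, 8.0, 20_001)
y = p(z)                        # support points of Y = p(Z)
f = norm.pdf(z) / dp(z)         # pdf of Y at those points

area = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(y))   # trapezoid rule
```

The area integrates to one, confirming that `f` is a proper density; when p'(z) changes sign, the transformation fails to have a pdf, which is the condition the paper characterizes for third- and fifth-order polynomials.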


Fast Fifth-Order Polynomial Transforms For Generating Univariate And Multivariate Nonnormal Distributions, Todd C. Headrick Oct 2002


Todd Christopher Headrick

A general procedure is derived for simulating univariate and multivariate nonnormal distributions using polynomial transformations of order five. The procedure allows for the additional control of the fifth and sixth moments. The ability to control higher moments increases the precision in the approximations of nonnormal distributions and lowers the skew and kurtosis boundary relative to the competing procedures considered. Tabled values of constants are provided for approximating various probability density functions. A numerical example is worked to demonstrate the multivariate procedure. The results of a Monte Carlo simulation are provided to demonstrate that the procedure generates specified population parameters and …
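Controlling the fifth and sixth moments relies on expressing the moments of Y = p(Z) through the moments of the standard normal (zero for odd orders, double factorials for even orders). Candidate constants can be checked mechanically; the sketch below uses the identity polynomial purely as a sanity check:

```python
import numpy as np
from numpy.polynomial import polynomial as P

def normal_moment(k):
    """E[Z**k] for Z ~ N(0,1): 0 for odd k, (k-1)!! for even k."""
    if k % 2:
        return 0
    m = 1
    for j in range(k - 1, 0, -2):
        m *= j
    return m

def poly_moment(c, r):
    """E[p(Z)**r] where p has coefficients c in ascending powers of z."""
    coeffs = P.polypow(c, r)
    return sum(a * normal_moment(k) for k, a in enumerate(coeffs))

c = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]   # identity fifth-order polynomial: Y = Z
print([poly_moment(c, r) for r in range(1, 7)])
```

Solving the system of six such moment equations for the six constants is what yields the tabled values the abstract mentions.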


Simulating Correlated Multivariate Nonnormal Distributions: Extending The Fleishman Power Method, Todd C. Headrick, Shlomo S. Sawilowsky Mar 1999


Todd Christopher Headrick

A procedure for generating multivariate nonnormal distributions is proposed. Our procedure generates average values of intercorrelations much closer to the population parameters than competing procedures for skewed and/or heavy-tailed distributions and for small sample sizes. It also eliminates the need to conduct a factorization procedure on the population correlation matrix that underlies the random deviates, and it is simpler to code in a programming language (e.g., FORTRAN). Numerical examples demonstrating the procedures are given. Monte Carlo results indicate that our procedure yields excellent agreement between population parameters and average values of intercorrelation, skew, and kurtosis.
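The Fleishman step underlying the extension solves a moment-matching system for the cubic's constants. The system below is the standard third-order one; the target skew and excess kurtosis values are illustrative, and in practice the starting point for the solver matters:

```python
import numpy as np
from scipy.optimize import fsolve

def fleishman_eqs(coef, skew, exkurt):
    """Fleishman moment equations for Y = a + b*Z + c*Z**2 + d*Z**3 with
    a = -c (so Y has mean 0): unit variance, target skew, target excess
    kurtosis."""
    b, c, d = coef
    f1 = b**2 + 6*b*d + 2*c**2 + 15*d**2 - 1
    f2 = 2*c*(b**2 + 24*b*d + 105*d**2 + 2) - skew
    f3 = 24*(b*d + c**2*(1 + b**2 + 28*b*d)
             + d**2*(12 + 48*b*d + 141*c**2 + 225*d**2)) - exkurt
    return f1, f2, f3

# Illustrative target: skew 1, excess kurtosis 1.5 (a chi-square(8)-like shape)
b, c, d = fsolve(fleishman_eqs, (0.9, 0.1, 0.0), args=(1.0, 1.5))
a = -c
```

Given the constants, correlated nonnormal deviates follow by applying the cubic to correlated standard normals, which is the multivariate step this paper streamlines.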