Biostatistics | Open Access Articles | Digital Commons Network™

Addition To Pglr Chap 6, Joseph M. Hilbe

Joseph M Hilbe

Addition to Chapter 6 in Practical Guide to Logistic Regression. Added section on Bayesian logistic regression using Stata.

Testing Homogeneity In Semiparametric Mixture Case-Control Models, C Z. Di, G Kc Chan, C Zheng, Ky Liang

Chongzhi Di

Recently, Qin and Liang (Biometrics, 2011) considered a semiparametric mixture case-control model and proposed a score test for homogeneity. The mixture model is semiparametric in the sense that the density ratio of two distributions is assumed to be of exponential form, while the baseline density is unspecified. In a family of parametric admixture models, Di and Liang (Biometrics, 2011) showed that the likelihood ratio test statistic, which is equivalent to a supremum statistic, could improve power over score tests. We generalize the likelihood ratio or supremum statistic to the semiparametric mixture model and demonstrate the power gain over the score …

Online Variational Bayes Inference For High-Dimensional Correlated Data, Sylvie T. Kabisa, Jeffrey S. Morris, David Dunson

Jeffrey S. Morris

High-dimensional data with hundreds of thousands of observations are becoming commonplace in many disciplines. The analysis of such data poses many computational challenges, especially when the observations are correlated over time and/or across space. In this paper we propose flexible hierarchical regression models for analyzing such data that accommodate serial and/or spatial correlation. We address the computational challenges involved in fitting these models by adopting an approximate inference framework. We develop an online variational Bayes algorithm that works by incrementally reading the data into memory one portion at a time. The performance of the method is assessed through simulation studies. …

Functional Car Models For Spatially Correlated Functional Datasets, Lin Zhang, Veerabhadran Baladandayuthapani, Hongxiao Zhu, Keith A. Baggerly, Tadeusz Majewski, Bogdan Czerniak, Jeffrey S. Morris

Jeffrey S. Morris

We develop a functional conditional autoregressive (CAR) model for spatially correlated data for which functions are collected on areal units of a lattice. Our model performs functional response regression while accounting for spatial correlations with potentially nonseparable and nonstationary covariance structure, in both the space and functional domains. We show theoretically that our construction leads to a CAR model at each functional location, with spatial covariance parameters varying and borrowing strength across the functional domain. Using basis transformation strategies, the nonseparable spatial-functional model is computationally scalable to enormous functional datasets, generalizable to different basis functions, and can be used on …

Bayesian Function-On-Function Regression For Multi-Level Functional Data, Mark J. Meyer, Brent A. Coull, Francesco Versace, Paul Cinciripini, Jeffrey S. Morris

Jeffrey S. Morris

Medical and public health research increasingly involves the collection of complex, high-dimensional data. In particular, functional data -- where the unit of observation is a curve or set of curves that are finely sampled over a grid -- is frequently obtained. Moreover, researchers often sample multiple curves per person, resulting in repeated functional measures. A common question is how to analyze the relationship between two functional variables. We propose a general function-on-function regression model for repeatedly sampled functional data, presenting a simple model as well as a more extensive mixed model framework, along with multiple functional posterior …

Functional Regression, Jeffrey S. Morris

Jeffrey S. Morris

Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain, and the observed data consist of a sample of functions taken from some population, sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the development of this field, which has accelerated in the past 10 years to become one of the fastest growing areas of statistics, fueled by the growing number of applications yielding this type of data. One unique characteristic of FDA is the need to combine information both across and within functions, which Ramsay and …

Ordinal Probit Wavelet-Based Functional Models For Eqtl Analysis, Mark J. Meyer, Jeffrey S. Morris, Craig P. Hersh, Jarret D. Morrow, Christoph Lange, Brent A. Coull

Jeffrey S. Morris

Current methods for conducting expression Quantitative Trait Loci (eQTL) analysis are limited in scope to pairwise association testing between a single nucleotide polymorphism (SNP) and an expression probe set in a region around a gene of interest, thus ignoring the inherent between-SNP correlation. To determine association, p-values are then typically adjusted using Plug-in False Discovery Rate. As many SNPs are interrogated in the region and multiple probe-sets taken, the current approach requires the fitting of a large number of models. We propose to remedy this by introducing a flexible function-on-scalar regression that models the genome as a functional outcome. The …

Estimating Controlled Direct Effects Of Restrictive Feeding Practices In The `Early Dieting In Girls' Study, Yeying Zhu, Debashis Ghosh, Donna L. Coffman, Jennifer S. Williams

Debashis Ghosh

In this article, we examine the causal effect of parental restrictive feeding practices on children’s weight status. An important mediator we are interested in is children’s self-regulation status. Traditional mediation analysis (Baron and Kenny, 1986) applies a structural equation modelling (SEM) approach and decomposes the intent-to-treat (ITT) effect into direct and indirect effects. More recent approaches interpret the mediation effects based on the potential outcomes framework. In practice, there often exist confounders that jointly influence the mediator and the outcome. Inverse probability weighting based on propensity scores is used to adjust for confounding and reduce the dimensionality of confounders simultaneously. …

A General Approach To Goodness Of Fit For U Processes, Debashis Ghosh, Youngjoo Cho

Debashis Ghosh

Goodness of fit procedures are essential tools for assessing model adequacy in statistics. In this work, we present a general theory and approach to goodness of fit techniques based on U-processes for the accelerated failure time (AFT) model. Many of the examples will focus on U-statistics of order 2. While many authors have proposed goodness of fit tests for U-statistics of order one, less has been developed for higher order U-statistics. In this paper, we propose goodness of fit tests for U-statistics of order 2 by using theoretical results from Nolan and Pollard (1987) and Nolan and Pollard (1988). We …

A Boosting Algorithm For Estimating Generalized Propensity Scores With Continuous Treatments, Yeying Zhu, Donna L. Coffman, Debashis Ghosh

Debashis Ghosh

In this article, we study the causal inference problem with a continuous treatment variable using propensity score-based methods. For a continuous treatment, the generalized propensity score is defined as the conditional density of the treatment-level given covariates (confounders). The dose–response function is then estimated by inverse probability weighting, where the weights are calculated from the estimated propensity scores. When the dimension of the covariates is large, the traditional nonparametric density estimation suffers from the curse of dimensionality. Some researchers have suggested a two-step estimation procedure by first modeling the mean function. In this study, we suggest a boosting algorithm to …

Testing Hypotheses About Medical Test Accuracy: Considerations For Design And Inference, Adam J. Branscum, Dunlei Cheng, J Jack Lee

Dunlei Cheng

Developing new medical tests and identifying single biomarkers or panels of biomarkers with superior accuracy over existing classifiers promotes lifelong health of individuals and populations. Before a medical test can be routinely used in clinical practice, its accuracy within diseased and non-diseased populations must be rigorously evaluated. We introduce a method for sample size determination for studies designed to test hypotheses about medical test or biomarker sensitivity and specificity. We show how a sample size can be determined to guard against making type I and/or type II errors by calculating Bayes factors from multiple data sets simulated under null and/or …

Statistical Power In Parallel Group Point Exposure Studies With Time-To-Event Outcomes: An Empirical Comparison Of The Performance Of Randomized Controlled Trials And The Inverse Probability Of Treatment Weighting (Iptw) Approach, Peter Austin, Tibor Schuster, Robert W. Platt

Peter Austin

Background: Estimating statistical power is an important component of the design of both randomized controlled trials (RCTs) and observational studies. Methods for estimating statistical power in RCTs have been well described and can be implemented simply. In observational studies, statistical methods must be used to remove the effects of confounding that can occur due to non-random treatment assignment. Inverse probability of treatment weighting (IPTW) using the propensity score is an attractive method for estimating the effects of treatment using observational data. However, sample size and power calculations have not been adequately described for these methods.

Methods: We used an extensive …

Moving Towards Best Practice When Using Inverse Probability Of Treatment Weighting (Iptw) Using The Propensity Score To Estimate Causal Treatment Effects In Observational Studies, Peter Austin, Elizabeth Stuart

Peter Austin

The propensity score is defined as a subject’s probability of treatment selection, conditional on observed baseline covariates. Weighting subjects by the inverse probability of treatment received creates a synthetic sample in which treatment assignment is independent of measured baseline covariates. Inverse probability of treatment weighting (IPTW) using the propensity score allows one to obtain unbiased estimates of average treatment effects. However, these estimates are only valid if there are no residual systematic differences in observed baseline characteristics between treated and control subjects in the sample weighted by the estimated inverse probability of treatment. We report on a systematic literature review, in …
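
The weighting described in this abstract can be illustrated with a minimal sketch. It is not taken from the article: the function names, the toy data, and the assumption that propensity scores have already been estimated are all hypothetical. Each subject is weighted by the inverse of the probability of the treatment actually received, and the treated and control groups are then compared with weighted means.

```python
def iptw_weights(treated, propensity):
    """Inverse probability of treatment weights (hypothetical helper).

    treated: 0/1 treatment indicators.
    propensity: estimated P(treated = 1 | covariates), assumed given.
    """
    weights = []
    for z, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        # Treated subjects get 1/e, control subjects get 1/(1 - e).
        weights.append(1.0 / e if z == 1 else 1.0 / (1.0 - e))
    return weights


def weighted_mean_difference(outcome, treated, weights):
    """Weighted outcome mean of the treated minus that of the controls."""
    def wmean(group):
        num = sum(w * y for y, z, w in zip(outcome, treated, weights) if z == group)
        den = sum(w for z, w in zip(treated, weights) if z == group)
        return num / den
    return wmean(1) - wmean(0)


# Toy data, invented purely for illustration.
treated = [1, 1, 0, 0]
propensity = [0.8, 0.5, 0.5, 0.2]
outcome = [3.0, 2.0, 1.0, 1.0]

weights = iptw_weights(treated, propensity)
effect = weighted_mean_difference(outcome, treated, weights)
```

As the abstract notes, such an estimate is only trustworthy if the weighted sample shows no residual systematic differences in baseline covariates, so in practice a balance check would accompany this calculation.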

Bayesian Joint Selection Of Genes And Pathways: Applications In Multiple Myeloma Genomics, Lin Zhang, Jeffrey S. Morris, Jiexin Zhang, Robert Orlowski, Veerabhadran Baladandayuthapani

Jeffrey S. Morris

It is well-established that the development of a disease, especially cancer, is a complex process that results from the joint effects of multiple genes involved in various molecular signaling pathways. In this article, we propose methods to discover genes and molecular pathways significantly associated with clinical outcomes in cancer samples. We exploit the natural hierarchical structure of genes related to a given pathway as a group of interacting genes to conduct selection of both pathways and genes. We posit the problem in a hierarchical structured variable selection (HSVS) framework to analyze the corresponding gene expression data. HSVS methods conduct …

Sas Macro: Testing Marginal Homogeneity In Clustered Matched-Pair Data, Zhao Yang

Zhao (Tony) Yang, Ph.D.

The SAS Macro and simulated data example are used to demonstrate the application of tests for marginal homogeneity in clustered matched-pair data.

Sas Macro: Weighted Kappa Statistic For Clustered Matched-Pair Ordinal Data, Zhao Yang

Zhao (Tony) Yang, Ph.D.

This SAS macro calculates the weighted kappa statistic and its corresponding non-parametric variance estimator for clustered matched-pair ordinal data.

Sas Macro: Kappa Statistic For Clustered Physician-Patients Polytomous Data, Zhao Yang

Zhao (Tony) Yang, Ph.D.

This SAS macro calculates the kappa statistic and its semi-parametric variance estimator for clustered physician-patients polytomous data. The proposed method depends on the assumption of conditional independence for the clustered physician-patients data structure.

Combining Biomarkers Linearly And Nonlinearly For Classification Using The Area Under The Roc Curve, Youyi Fong, Shuxin Yin, Ying Huang

Youyi Fong

In biomedical studies, it is often of interest to classify/predict a subject's disease status based on some biomarker measurements. Two approaches have received a lot of attention in the biostatistical literature for finding optimal biomarker combinations using training data. The likelihood approach maximizes the logistic regression model likelihood, while the AUC (area under the receiver operating characteristic curve) approach maximizes the empirical AUC based on the biomarker combination. The two approaches are complementary to each other in practice. Existing methods in the AUC approach either approximate the empirical AUC by a smooth function or replace it with a convex upper bound. …

Causal Models And Learning From Data: Integrating Causal Modeling And Statistical Estimation, Maya Petersen, M J. Van Der Laan

Maya Petersen

No abstract provided.

Targeted Maximum Likelihood Estimation For Dynamic And Static Longitudinal Marginal Structural Working Models, Maya Petersen, J Schwab, S Gruber, N Blaser, M Schomaker, M J. Van Der Laan

Maya Petersen

No abstract provided.

Multiple Comparison Procedures For Neuroimaging Genomewide Association Studies, Wen-Yu Hua, Thomas E. Nichols, Debashis Ghosh

Debashis Ghosh

Recent research in neuroimaging has been focusing on assessing associations between genetic variants measured on a genomewide scale and brain imaging phenotypes. Many publications in the area use massively univariate analyses on a genomewide basis for finding single nucleotide polymorphisms that influence brain structure. In this work, we propose using various dimensionality reduction methods on both brain MRI scans and genomic data, motivated by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study. We also consider a new multiple testing adjustment inspired by the idea of local false discovery rate of Efron and others (2001). Our proposed procedure is able to find associations …

On Likelihood Ratio Tests When Nuisance Parameters Are Present Only Under The Alternative, Cz Di, K-Y Liang

Chongzhi Di

In parametric models, when one or more parameters disappear under the null hypothesis, the likelihood ratio test statistic does not converge to a chi-square distribution. Rather, its limiting distribution is shown to be equivalent to that of the supremum of a squared Gaussian process. However, the limiting distribution is analytically intractable for most examples, and approximation or simulation based methods must be used to calculate the p values. In this article, we investigate conditions under which the asymptotic distributions have analytically tractable forms, based on the principal component decomposition of Gaussian processes. When these conditions are not satisfied, the principal …

Hypothesis Testing For An Extended Cox Model With Time-Varying Coefficients, Takumi Saegusa, Chongzhi Di, Ying Qing Chen

Chongzhi Di

In many randomized clinical trials, the log-rank test has routinely been used to detect a treatment effect under the Cox proportional hazards model for censored time-to-event outcomes. However, it may lose power substantially when the proportional hazards assumption does not hold. There are approaches to testing the proportionality, such as the smoothing spline-based score test by Lin, Zhang and Davidian (2006). In this paper, we consider an extended Cox model assuming time-varying treatment effect. We then use smoothing splines to model the time-varying treatment effect, and we propose spline-based score tests for the overall treatment effect. Our proposed tests take …

Multilevel Sparse Functional Principal Component Analysis, Cz Di, C M. Crainiceanu, W Jank

Chongzhi Di

We consider analysis of sparsely sampled multilevel functional data, where the basic observational unit is a function and data have a natural hierarchy of basic units. An example is when functions are recorded at multiple visits for each subject. Multilevel functional principal component analysis (MFPCA; Di et al. 2009) was proposed for such data when functions are densely recorded. Here we consider the case when functions are sparsely sampled and may contain only a few observations per function. We exploit the multilevel structure of covariance operators and achieve data reduction by principal component decompositions at both between and within subject …

A Comparison Of 12 Algorithms For Matching On The Propensity Score, Peter C. Austin

Peter Austin

Propensity-score matching is increasingly being used to reduce the confounding that can occur in observational studies examining the effects of treatments or interventions on outcomes. We used Monte Carlo simulations to examine the following algorithms for forming matched pairs of treated and untreated subjects: optimal matching, greedy nearest neighbor matching without replacement, and greedy nearest neighbor matching without replacement within specified caliper widths. For each of the latter two algorithms, we examined four different sub-algorithms defined by the order in which treated subjects were selected for matching to an untreated subject: lowest to highest propensity score, highest to lowest propensity …

The Use Of Propensity Score Methods With Survival Or Time-To-Event Outcomes: Reporting Measures Of Effect Similar To Those Used In Randomized Experiments, Peter C. Austin

Peter Austin

Propensity score methods are increasingly being used to estimate causal treatment effects in observational studies. In medical and epidemiological studies, outcomes are frequently time-to-event in nature. Propensity-score methods are often applied incorrectly when estimating the effect of treatment on time-to-event outcomes. This article describes how two different propensity score methods (matching and inverse probability of treatment weighting) can be used to estimate the measures of effect that are frequently reported in randomized controlled trials: (i) marginal survival curves, which describe survival in the population if all subjects were treated or if all subjects were untreated; and (ii) marginal hazard ratios. …

The Performance Of Different Propensity Score Methods For Estimating Absolute Effects Of Treatments On Survival Outcomes: A Simulation Study, Peter C. Austin

Peter Austin

Observational studies are increasingly being used to estimate the effect of treatments, interventions and exposures on outcomes that can occur over time. Historically, the hazard ratio, which is a relative measure of effect, has been reported. However, medical decision making is best informed when both relative and absolute measures of effect are reported. When outcomes are time-to-event in nature, the effect of treatment can also be quantified as the change in mean or median survival time due to treatment and the absolute reduction in the probability of the occurrence of an event within a specified duration of follow-up. We describe …

A Study Of Mexican Free-Tailed Bat Chirp Syllables: Bayesian Functional Mixed Modeling Of Nonstationary Time Series Data With Time-Dependent Spectra, Josue G. Martinez, Kirsten M. Bohn, Raymond J. Carroll, Jeffrey S. Morris

Jeffrey S. Morris

We describe a new approach to analyze chirp syllables of free-tailed bats from two regions of Texas in which they are predominant: Austin and College Station. Our goal is to characterize any systematic regional differences in the mating chirps and assess whether individual bats have signature chirps. The data are analyzed by modeling spectrograms of the chirps as responses in a Bayesian functional mixed model. Given the variable chirp lengths, we compute the spectrograms on a relative time scale interpretable as the relative chirp position, using a variable window overlap based on chirp length. We use 2D wavelet transforms to …

Global Quantitative Assessment Of The Colorectal Polyp Burden In Familial Adenomatous Polyposis Using A Web-Based Tool, Patrick M. Lynch, Jeffrey S. Morris, William A. Ross, Miguel A. Rodriguez-Bigas, Juan Posadas, Rossa Khalaf, Diane M. Weber, Valerie O. Sepeda, Bernard Levin, Imad Shureiqi

Jeffrey S. Morris

Background: Accurate measures of the total polyp burden in familial adenomatous polyposis (FAP) are lacking. Current assessment tools include polyp quantitation in limited-field photographs and qualitative total colorectal polyp burden by video.

Objective: To develop global quantitative tools of the FAP colorectal adenoma burden.

Design: A single-arm, phase II trial.

Patients: Twenty-seven patients with FAP.

Intervention: Treatment with celecoxib for 6 months, with before-treatment and after-treatment videos posted to an intranet with an interactive site for scoring.

Main Outcome Measurements: Global adenoma counts and sizes (grouped into categories: less than 2 mm, 2-4 mm, and greater than 4 mm) were …

Sas Macro: Kappa Statistic For Clustered Matched-Pair Data, Zhao Yang

Zhao (Tony) Yang, Ph.D.

The SAS macro was developed to calculate the kappa statistic for the clustered matched-pair data.
