Statistical Methodology Commons

Open Access. Powered by Scholars. Published by Universities.

19 Institutions 511 Full-Text Articles 623 Authors 76,629 Downloads

Recent Articles in Statistical Methodology

Targeted Maximum Likelihood Estimation For Dynamic And Static Longitudinal Marginal Structural Working Models, Maya L. Petersen, Joshua Schwab, Susan Gruber, Nello Blaser, Michael Schomaker, Mark J. van der Laan COBRA

U.C. Berkeley Division of Biostatistics Working Paper Series

This paper presents a novel targeted maximum likelihood estimator (TMLE) for the parameters of longitudinal static and dynamic marginal structural models. We consider a longitudinal data structure consisting of baseline covariates, time-dependent intervention nodes, intermediate time-dependent covariates, and a possibly time-dependent outcome. The intervention nodes at each time point can include a binary treatment as well as a right-censoring indicator. Given a class of dynamic or static interventions, a marginal structural model is used to model the mean of the intervention-specific counterfactual outcome as a function of the intervention and time point. Because the true shape of ...


Balancing Score Adjusted Targeted Minimum Loss-Based Estimation, Samuel D. Lendle, Bruce Fireman, Mark J. van der Laan COBRA

U.C. Berkeley Division of Biostatistics Working Paper Series

Adjusting for a balancing score is sufficient for bias reduction when estimating causal effects, including the average treatment effect and the effect among the treated. Estimators that adjust for the propensity score in a nonparametric way, such as matching on an estimate of the propensity score, can be consistent when the estimated propensity score is not consistent for the true propensity score but converges to some other balancing score. We call this property the balancing score property, and discuss a class of estimators that have this property. We introduce a targeted minimum loss-based estimator (TMLE) for a treatment-specific mean with ...


Analysis Of Spatial Data, Xiang Zhang University of Kentucky

Theses and Dissertations--Statistics

In many areas of the agricultural, biological, physical and social sciences, spatial lattice data are becoming increasingly common. In addition, a large amount of lattice data shows not only a visible spatial pattern but also a temporal pattern (see Zhu et al. 2005). An interesting problem is to develop a model that systematically captures the relationship between the response variable and possible explanatory variables while accounting for spatial and temporal effects simultaneously.

Spatial-temporal linear models and the corresponding likelihood-based statistical inference are important tools for the analysis of spatial-temporal lattice data. We propose a general asymptotic framework for spatial-temporal linear models and ...


Optimal Tests Of Treatment Effects For The Overall Population And Two Subpopulations In Randomized Trials, Using Sparse Linear Programming, Michael Rosenblum, Han Liu, En-Hsu Yen COBRA

Johns Hopkins University, Dept. of Biostatistics Working Papers

We propose new, optimal methods for analyzing randomized trials when it is suspected that treatment effects may differ in two predefined subpopulations. Such subpopulations could be defined by a biomarker or risk factor measured at baseline. The goal is to simultaneously learn which subpopulations benefit from an experimental treatment, while providing strong control of the familywise Type I error rate. We formalize this as a multiple testing problem and show it is computationally infeasible to solve using existing techniques. Our solution involves a novel approach, in which we first transform the original multiple testing problem into a large, sparse linear ...


Quest For Continuous Improvement: Gathering Feedback And Data Through Multiple Methods To Evaluate And Improve A Library’s Discovery Tool, Jeanne M. Brown University of Nevada, Las Vegas

Presentations (Libraries)

Summon at UNLV

  • Implemented fall 2011: a web-scale discovery tool
  • Expectations for Summon
  • Continuous Summon Improvement (CSI) Group

The environment

  • User changes
  • Library changes
  • Vendor changes
  • Product changes
  • Complex information environment
  • Change + complexity = need to assess using multiple streams of feedback


A Bayesian Regression Tree Approach To Identify The Effect Of Nanoparticles Properties On Toxicity Profiles, Cecile Low-Kam, Haiyuan Zhang, Zhaoxia Ji, Tian Xia, Jeffrey I. Zinc, Andre Nel, Donatello Telesca COBRA

COBRA Preprint Series

We introduce a Bayesian multiple regression tree model to characterize relationships between physico-chemical properties of nanoparticles and their in-vitro toxicity over multiple doses and times of exposure. Unlike conventional models that rely on data summaries, our model solves the low sample size issue and avoids arbitrary loss of information by combining all measurements from a general exposure experiment across doses, times of exposure, and replicates. The proposed technique integrates Bayesian trees for modeling threshold effects and interactions, and penalized B-splines for smoothing dose- and time-response surfaces. The resulting posterior distribution is sampled via a Markov chain Monte Carlo algorithm. This ...


Global Quantitative Assessment Of The Colorectal Polyp Burden In, Patrick M. Lynch, Jeffrey S. Morris, William A. Ross, Miguel A. Rodriguez-Bigas, Juan Posadas, Rossa Khalaf, Diane M. Weber, Valerie O. Sepeda, Bernard Levin, Imad Shureiqi The University of Texas

Jeffrey S. Morris

Background: Accurate measures of the total polyp burden in familial adenomatous polyposis (FAP) are lacking. Current assessment tools include polyp quantitation in limited-field photographs and qualitative total colorectal polyp burden by video.

Objective: To develop global quantitative tools of the FAP colorectal adenoma burden.

Design: A single-arm, phase II trial.

Patients: Twenty-seven patients with FAP.

Intervention: Treatment with celecoxib for 6 months, with before-treatment and after-treatment videos posted to an intranet with an interactive site for scoring.

Main Outcome Measurements: Global adenoma counts and sizes (grouped into categories: less than 2 mm, 2-4 mm, and greater than 4 mm) were ...


Fast Covariance Estimation For High-Dimensional Functional Data, Luo Xiao, David Ruppert, Vadim Zipunnikov, Ciprian Crainiceanu COBRA

Johns Hopkins University, Dept. of Biostatistics Working Papers

We propose a fast covariance smoothing method and associated software that scale up linearly to very large matrices. The main idea is to exploit a very fast new bivariate penalized spline smoothing approach and focus on the practicality and scalability of the method. Currently available methods and software cannot smooth covariance matrices of dimension J > 500, whereas our approach provides fast smoothing for matrices of dimension J > 10,000. An R function, simulations, and data analysis provide ready-to-use, reproducible, and scalable tools for practical data analysis of noisy high-dimensional functional data.


Identification Of Biologically Relevant Subtypes Via Preweighted Sparse Clustering, Sheila Gaynor, Eric Bair COBRA

The University of North Carolina at Chapel Hill Department of Biostatistics Technical Report Series

Cluster analysis methods are used to identify homogeneous subgroups in a data set. Frequently one applies cluster analysis in order to identify biologically interesting subgroups. In particular, one may wish to identify subgroups that are associated with a particular outcome of interest. Conventional clustering methods often fail to identify such subgroups, particularly when there are a large number of high-variance features in the data set. Conventional methods may identify clusters associated with these high-variance features when one wishes to obtain secondary clusters that are more interesting biologically or more strongly associated with a particular outcome of interest. We describe a ...


Likelihood Inference For Left Truncated And Right Censored Lifetime Data, Debanjan Mitra McMaster University

Open Access Dissertations and Theses

Left truncation arises because in many situations, failure of a unit is observed only if it fails after a certain period. In many situations, the units under study may not be followed until all of them fail and the experimenter may have to stop at a certain time when some of the units may still be working. This introduces right censoring into the data. Some commonly used lifetime distributions are lognormal, Weibull and gamma, all of which are special cases of the flexible generalized gamma family. Likelihood inference via the Expectation Maximization (EM) algorithm is used to estimate the model ...
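
As a rough illustration of how left truncation and right censoring enter a likelihood, the sketch below evaluates a Weibull log-likelihood directly. It is not the EM procedure the abstract refers to; the function name `weibull_ltrc_loglik` and its arguments are hypothetical, and the Weibull is used only because it is one of the lifetime distributions named above. An observed failure contributes the log density, a censored unit contributes the log survival function, and every unit is conditioned on having survived past its truncation time.

```python
import math

def weibull_ltrc_loglik(shape, scale, times, events, entry_times):
    """Weibull log-likelihood for left-truncated, right-censored data.

    times        observed failure or censoring time of each unit
    events       1 if the failure was observed, 0 if right-censored
    entry_times  left-truncation time of each unit (0 if not truncated)
    """
    total = 0.0
    for t, d, tau in zip(times, events, entry_times):
        # Weibull log survival function: log S(x) = -(x / scale) ** shape
        log_surv_t = -((t / scale) ** shape)
        log_surv_tau = -((tau / scale) ** shape)
        if d == 1:
            # log density = log hazard + log survival
            log_hazard = math.log(shape / scale) + (shape - 1) * math.log(t / scale)
            total += log_hazard + log_surv_t
        else:
            # a censored unit is only known to have survived past t
            total += log_surv_t
        # condition on survival beyond the truncation time
        total -= log_surv_tau
    return total
```

Maximizing this function over `shape` and `scale` with any numerical optimizer would give direct maximum likelihood estimates under these assumptions.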


Statistical And Methodological Issues On Covariate Adjustment In Clinical Trials, Rong Chu McMaster University

Open Access Dissertations and Theses

Background and objectives

We investigate three issues related to the adjustment for baseline covariates in late phase clinical trials: (1) the analysis of correlated outcomes in multicentre RCTs, (2) the assessment of the probability and implication of prognostic imbalance in RCTs, and (3) the adjustment for baseline confounding in cohort studies.

Methods

Project 1: We investigated the properties of six statistical methods for analyzing continuous outcomes in multicentre randomized controlled trials (RCTs) where within-centre clustering was possible. We simulated studies over various intraclass correlation (ICC) values with several centre combinations.

Project 2: We simulated data from RCTs evaluating a binary ...


Group Testing Regression Models, Boan Zhang University of Nebraska - Lincoln

Dissertations and Theses in Statistics

Group testing, where groups of individual specimens are composited to test for the presence or absence of a disease (or some other binary characteristic), is a procedure commonly used to reduce the costs of screening a large number of individuals. Statistical research in group testing has traditionally focused on a homogeneous population, where individuals are assumed to have the same probability of having a disease. However, individuals often have different risks of positivity, so recent research has examined regression models that allow for heterogeneity among individuals within the population. This dissertation focuses on two problems involving group testing regression models ...
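
The cost saving described above can be made concrete with a small calculation. The sketch below assumes the homogeneous setting (every individual has the same prevalence), a perfectly accurate assay, and a two-stage scheme in which each member of a positive pool is retested individually; none of these details come from the dissertation itself, and the function names are hypothetical.

```python
def expected_tests_per_person(prevalence, group_size):
    """Expected number of tests per individual under two-stage pooling."""
    if group_size == 1:
        return 1.0  # no pooling: one test each
    # a pool tests positive if at least one member is positive
    prob_pool_positive = 1.0 - (1.0 - prevalence) ** group_size
    # one pooled test shared by the group, plus individual retests
    # for every member whenever the pool is positive
    return 1.0 / group_size + prob_pool_positive

def best_group_size(prevalence, max_size=50):
    """Group size (1..max_size) minimizing the expected tests per person."""
    return min(range(1, max_size + 1),
               key=lambda k: expected_tests_per_person(prevalence, k))
```

With a prevalence of 1%, pools of ten need roughly 0.2 tests per person under these assumptions, while at high prevalence pooling stops paying off and individual testing is cheapest.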


Residuals In The Growth Curve Model With Applications To The Analysis Of Longitudinal Data, Weiliang Huang McMaster University

Open Access Dissertations and Theses

Statistical models often rely on several assumptions including distributional assumptions on outcome variables and relational assumptions where we model the relationship between outcomes and independent variables. Further assumptions are also made depending on the complexity of the data and the model being used. Model diagnostics is, therefore, a crucial component of any model fitting problem. Residuals play important roles in model diagnostics. Residuals are not only used to check adequacy of model fit, but they also are excellent tools to validate model assumptions as well as identify outliers and influential observations. Residuals in univariate models are studied extensively and are ...


A Prior-Free Framework Of Coherent Inference And Its Derivation Of Simple Shrinkage Estimators, David R. Bickel COBRA

COBRA Preprint Series

The reasoning behind uses of confidence intervals and p-values in scientific practice may be made coherent by modeling the inferring statistician or scientist as an idealized intelligent agent. With other things equal, such an agent regards a hypothesis coinciding with a confidence interval of a higher confidence level as more certain than a hypothesis coinciding with a confidence interval of a lower confidence level. The agent uses different methods of confidence intervals conditional on what information is available. The coherence requirement means all levels of certainty of hypotheses about the parameter agree with the same distribution of certainty over parameter ...


Package 'MorseGen': Simple Raw Data Generator Based On User-Specified Summary Statistics, Brendan J. Morse Bridgewater State University

Faculty, Administrator & Staff Articles

MorseGen is a program for generating raw data based on user-specified summary (descriptive) statistics. Samples based on the supplied statistics are drawn from a normal distribution (or, in some cases, an exponential distribution) and scaled to match the desired descriptive statistics. Intended uses include creating raw data that fit desired characteristics and replicating the results of a published study.
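
The draw-then-scale idea can be sketched in a few lines. This is an illustration of the general technique only, not MorseGen's own code or interface: the function name `generate_matching_sample` and its parameters are hypothetical, and only the mean and standard deviation are matched.

```python
import random
import statistics

def generate_matching_sample(n, target_mean, target_sd, seed=None):
    """Draw n normal values, then rescale them so the sample mean and
    sample standard deviation equal the requested values."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    raw_mean = statistics.fmean(raw)
    raw_sd = statistics.stdev(raw)  # sample SD (n - 1 denominator)
    # standardize the draw, then shift and stretch to the targets
    return [target_mean + target_sd * (x - raw_mean) / raw_sd for x in raw]
```

For example, `generate_matching_sample(30, 100, 15, seed=1)` returns 30 values whose sample mean is 100 and whose sample standard deviation is 15, up to floating-point rounding.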


Methods For Shape-Constrained Kernel Density Estimation, Mark A. Wolters Western University

Electronic Thesis and Dissertation Repository

Nonparametric density estimators are used to estimate an unknown probability density while making minimal assumptions about its functional form. Although the low reliance of nonparametric estimators on modelling assumptions is a benefit, their performance will be improved if auxiliary information about the density's shape is incorporated into the estimate. Auxiliary information can take the form of shape constraints, such as unimodality or symmetry, that the estimate must satisfy. Finding the constrained estimate is usually a difficult optimization problem, however, and a consistent framework for finding estimates across a variety of problems is lacking.

It is proposed to find shape-constrained ...


Sparse Principal Component Analysis For High-Dimensional Data: A Comparative Study, Ashley J. Bonner McMaster University

Open Access Dissertations and Theses

Background: Through unprecedented advances in technology, high-dimensional datasets have exploded into many fields of observational research. For example, it is now common to expect thousands or millions of genetic variables (p) with only a limited number of study participants (n). Determining the important features proves statistically difficult, as multivariate analysis techniques become flooded and mathematically insufficient when n < p. Principal Component Analysis (PCA) is a commonly used multivariate method for dimension reduction and data visualization but suffers from these issues. A collection of Sparse PCA methods has been proposed to counter these flaws, but these methods have not been tested in comparative detail.

Methods: Performances of three Sparse PCA methods were evaluated through simulations. Data were generated for 56 different data structures, varying p, the number of underlying groups and the variance structure within them. Estimation and interpretability of the principal components (PCs) were rigorously tested ...


On Penalized Likelihood Estimation For A Non-Proportional Hazards Regression Model, Karthik Devarajan, Nader Ebrahimi COBRA

COBRA Preprint Series

The fundamental assumption of proportionality of hazards in the Cox model sometimes does not hold in practice. In this paper, a semi-parametric generalization of the Cox model that permits crossing hazard curves is described. This model allows for interaction between covariates and the baseline hazard, and has been the subject of recent investigation. It includes, for the two-sample problem, the case of two Weibull distributions and two extreme value distributions differing in both scale and shape parameters. The partial likelihood approach cannot be applied here to estimate the model parameters, and flexible methods based on splines and sieves for ...


Tests That Reject At Least One Subpopulation Null Hypothesis After Rejecting For Overall Population, Michael Rosenblum COBRA

Johns Hopkins University, Dept. of Biostatistics Working Papers

It is often of interest to determine treatment effects in the overall study population, as well as in certain subpopulations. These subpopulations could be defined by a risk factor, such as a biomarker, measured at baseline. We consider situations where the overall population is partitioned into two subpopulations of interest. If the null hypothesis of no treatment effect in the overall population is rejected, a natural question is what can be said about these subpopulations. Whenever there is a treatment effect in the overall population, it follows logically that there must be a treatment effect in at least one of ...