Open Access. Powered by Scholars. Published by Universities.®

Statistical Methodology Commons


Articles 931 - 960 of 1116

Full-Text Articles in Statistical Methodology

Direct Effect Models, Mark J. Van Der Laan, Maya L. Petersen Aug 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

The causal effect of a treatment on an outcome is generally mediated by several intermediate variables. Estimation of the component of the causal effect of a treatment that is mediated by a given intermediate variable (the indirect effect of the treatment), and the component that is not mediated by that intermediate variable (the direct effect of the treatment), is often relevant to mechanistic understanding and to the design of clinical and public health interventions. Under the assumption of no unmeasured confounders for treatment and the intermediate variable, Robins & Greenland (1992) define an individual direct effect as the counterfactual effect of …


Application Of A Multiple Testing Procedure Controlling The Proportion Of False Positives To Protein And Bacterial Data, Merrill D. Birkner, Alan E. Hubbard, Mark J. Van Der Laan Aug 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Simultaneously testing multiple hypotheses is important in high-dimensional biological studies. In these situations, one is often interested in controlling a Type I error rate such as the tail probability that the proportion of false positives among total rejections exceeds a specific level, alpha. This article presents an application of the E-Bayes/Bootstrap TPPFP procedure of van der Laan et al. (2005), which controls the tail probability of the proportion of false positives (TPPFP), to two biological datasets. The first application is to a mass-spectrometry dataset of two leukemia subtypes, AML and ALL. The protein data measurements include intensity and …


Cross-Validating And Bagging Partitioning Algorithms With Variable Importance, Annette M. Molinaro, Mark J. Van Der Laan Aug 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

We present a cross-validated bagging scheme in the context of partitioning algorithms. To explore the benefits of the various bagging schemes, we compare via simulations the predictive ability of a single Classification and Regression Tree (CART) with several previously suggested bagging schemes and with our proposed approach. Additionally, a variable importance measure is explained and illustrated.


Test Statistics Null Distributions In Multiple Testing: Simulation Studies And Applications To Genomics, Katherine S. Pollard, Merrill D. Birkner, Mark J. Van Der Laan, Sandrine Dudoit Jul 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Multiple hypothesis testing problems arise frequently in biomedical and genomic research, for instance, when identifying differentially expressed or co-expressed genes in microarray experiments. We have developed generally applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for control of a broad class of Type I error rates, defined as tail probabilities and expected values for arbitrary functions of the numbers of false positives and rejected hypotheses (Dudoit and van der Laan, 2005; Dudoit et al., 2004a,b; Pollard and van der Laan, 2004; van der Laan et al., 2005, 2004a,b). As argued in the early article of Pollard and van der …


Linear Regression Of Censored Length-Biased Lifetimes, Ying Qing Chen, Yan Wang Jul 2005

UW Biostatistics Working Paper Series

Length-biased lifetimes may be collected in observational studies or sample surveys due to a biased sampling scheme. In this article, we use a linear regression model, namely the accelerated failure time model, for the population lifetime distributions in regression analysis of the length-biased lifetimes. It is discovered that the associated regression parameters are invariant under the length-biased sampling scheme. Based on this discovery, we propose quasi partial score estimating equations to estimate the population regression parameters. The proposed methodologies are evaluated and demonstrated by simulation studies and an application to an actual data set.


A Note On The Construction Of Counterfactuals And The G-Computation Formula, Zhuo Yu, Mark J. Van Der Laan Jun 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Robins' causal inference theory assumes existence of treatment specific counterfactual variables so that the observed data augmented by the counterfactual data will satisfy a consistency and a randomization assumption. In this paper we provide an explicit function that maps the observed data into a counterfactual variable which satisfies the consistency and randomization assumptions. This offers a practically useful imputation method for counterfactuals. Gill & Robins [2001]'s construction of counterfactuals can be used as an imputation method in principle, but it is very hard to implement in practice. Robins [1987] shows that the counterfactual distribution can be identified from the observed …


Cross-Validated Bagged Learning, Mark J. Van Der Laan, Sandra E. Sinisi, Maya L. Petersen Jun 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Many applications aim to learn a high dimensional parameter of a data generating distribution based on a sample of independent and identically distributed observations. For example, the goal might be to estimate the conditional mean of an outcome given a list of input variables. In this prediction context, Breiman (1996a) introduced bootstrap aggregating (bagging) as a method to reduce the variance of a given estimator at little cost to bias. Bagging involves applying the estimator to multiple bootstrap samples, and averaging the result across bootstrap samples. In order to deal with the curse of dimensionality, typical practice has been to …
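The bagging recipe summarized in this abstract can be sketched in a few lines of Python. This is an illustrative stand-in only: the stump base learner, the `stump_fit` and `bagged_predict` names, and the toy interface are our assumptions, not the authors' cross-validated scheme.

```python
import random
from statistics import mean

def stump_fit(xs, ys):
    """Toy base learner: split at the median x, predict the mean y on each side."""
    split = sorted(xs)[len(xs) // 2]
    left = [y for x, y in zip(xs, ys) if x < split] or ys
    right = [y for x, y in zip(xs, ys) if x >= split] or ys
    return lambda x: mean(left) if x < split else mean(right)

def bagged_predict(xs, ys, x_new, n_boot=50, seed=0):
    """Bagging (Breiman, 1996a): fit the learner on bootstrap resamples,
    then aggregate by averaging the resulting predictions."""
    rng = random.Random(seed)
    n = len(ys)
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n indices with replacement
        fit = stump_fit([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(fit(x_new))
    return mean(preds)  # averaging reduces the variance of the base learner
```

Averaging across bootstrap fits is what reduces variance at little cost to bias; the cross-validation layer the abstract describes would sit on top of this, selecting among bagging schemes.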


On Additive Regression Of Expectancy, Ying Qing Chen Jun 2005

UW Biostatistics Working Paper Series

Regression models have been important tools to study the association between outcome variables and their covariates. Traditional linear regression models usually specify such an association through the expectations of the outcome variables as functions of the covariates and some parameters. In reality, however, interest often focuses on their expectancies characterized by the conditional means. In this article, a new class of additive regression models is proposed to model the expectancies. The model parameters carry practical implications, which may allow the models to be useful in applications such as treatment assessment, resource planning or short-term forecasting. Moreover, the new model …


An Empirical Process Limit Theorem For Sparsely Correlated Data, Thomas Lumley Jun 2005

UW Biostatistics Working Paper Series

We consider data that are dependent, but where most small sets of observations are independent. By extending Bernstein's inequality we prove a strong law of large numbers and an empirical process central limit theorem under bracketing entropy conditions.
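For context, the classical Bernstein inequality being extended reads, for independent mean-zero variables $X_i$ with $|X_i| \le M$ and $v = \sum_{i=1}^n \mathbb{E}[X_i^2]$:

```latex
P\left(\Bigl|\sum_{i=1}^{n} X_i\Bigr| \ge t\right)
  \le 2\exp\!\left(-\frac{t^{2}}{2\,(v + Mt/3)}\right)
```

The paper's contribution is to relax the independence assumption so that a bound of this type still holds when only most small sets of observations are independent.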


New Statistical Paradigms Leading To Web-Based Tools For Clinical/Translational Science, Knut M. Wittkowski May 2005

COBRA Preprint Series

As the field of functional genetics and genomics is beginning to mature, we become confronted with new challenges. The constant drop in price for sequencing and gene expression profiling as well as the increasing number of genetic and genomic variables that can be measured makes it feasible to address more complex questions. The success with rare diseases caused by single loci or genes has provided us with a proof-of-concept that new therapies can be developed based on functional genomics and genetics.

Common diseases, however, typically involve genetic epistasis, genomic pathways, and proteomic patterns. Moreover, to better understand the underlying biological …


Multiple Imputation For Correcting Verification Bias, Ofer Harel, Xiao-Hua Zhou May 2005

UW Biostatistics Working Paper Series

In the case in which all subjects are screened using a common test, and only a subset of these subjects are tested using a gold-standard test, it is well documented that there is a risk of bias, called verification bias. When the test has only two levels (e.g. positive and negative) and we are trying to estimate the sensitivity and specificity of the test, one is actually constructing a confidence interval for a binomial proportion. Since it is well documented that this estimation is not trivial even with complete data, we adopt a multiple imputation (MI) framework for verification bias …


A Comparison Of Parametric And Coarsened Bayesian Interval Estimation In The Presence Of A Known Mean-Variance Relationship, Kent Koprowicz, Scott S. Emerson, Peter Hoff Apr 2005

UW Biostatistics Working Paper Series

While the use of Bayesian methods of analysis has become increasingly common, classical frequentist hypothesis testing still holds sway in medical research - especially clinical trials. One major difference between a standard frequentist approach and the most common Bayesian approaches is that even when a frequentist hypothesis test is derived from parametric models, the interpretation and operating characteristics of the test may be considered in a distribution-free manner. Bayesian inference, on the other hand, is often conducted in a parametric setting where the interpretation of the results is dependent on the parametric model. Here we consider a Bayesian counterpart to …


Causal Inference In Longitudinal Studies With History-Restricted Marginal Structural Models, Romain Neugebauer, Mark J. Van Der Laan, Ira B. Tager Apr 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Causal inference based on Marginal Structural Models (MSMs) is particularly attractive to subject-matter investigators because MSM parameters provide explicit representations of causal effects. We introduce History-Restricted Marginal Structural Models (HRMSMs) for longitudinal data for the purpose of defining causal parameters that may often be better suited for public health research. This new class of MSMs allows investigators to analyze the causal effect of a treatment on an outcome based on a fixed, shorter, user-specified history of exposure than standard MSMs. By default, the latter represent the treatment causal effect of interest based on a treatment history defined by the …


The Bayesian Two-Sample T-Test, Mithat Gonen, Wesley O. Johnson, Yonggang Lu, Peter H. Westfall Apr 2005

Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

In this article we show how the pooled-variance two-sample t-statistic arises from a Bayesian formulation of the two-sided point null testing problem, with emphasis on teaching. We identify a reasonable and useful prior giving a closed-form Bayes factor that can be written in terms of the distribution of the two-sample t-statistic under the null and alternative hypotheses respectively. This provides a Bayesian motivation for the two-sample t-statistic, which has heretofore been buried as a special case of more complex linear models, or given only roughly via analytic or Monte Carlo approximations. The resulting formulation of the Bayesian test is easy …
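The pooled-variance two-sample t-statistic around which the article builds its Bayes factor can be computed as follows. This is a plain textbook implementation for illustration; the function name is ours, not the authors' code.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y):
    """Pooled-variance two-sample t-statistic with n1 + n2 - 2 degrees of freedom."""
    n1, n2 = len(x), len(y)
    # pooled estimate of the common variance under equal-variance assumption
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
```

Under the null of equal means this statistic has a central t distribution; the article's contribution is a prior under which the Bayes factor is a closed-form function of this same quantity.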


Graphic Violence, Dale K. Hathaway Apr 2005

Faculty Scholarship – Mathematics

Statistical graphs are everywhere, yet they are one of the most common places for misinformation. Numerous graphical displays are presented that misrepresent the data. Included are issues like missing baselines, squaring the effect, and hidden bias in graphs.


Resampling Based Multiple Testing Procedure Controlling Tail Probability Of The Proportion Of False Positives, Mark J. Van Der Laan, Merrill D. Birkner, Alan E. Hubbard Mar 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Simultaneously testing a collection of null hypotheses about a data generating distribution based on a sample of independent and identically distributed observations is a fundamental and important statistical problem involving many applications. In this article we propose a new resampling based multiple testing procedure asymptotically controlling the probability that the proportion of false positives among the set of rejections exceeds q at level alpha, where q and alpha are user supplied numbers. The procedure involves 1) specifying a conditional distribution for a guessed set of true null hypotheses, given the data, which asymptotically is degenerate at the true set of …


Bayesian Evaluation Of Group Sequential Clinical Trial Designs, Scott S. Emerson, John M. Kittelson, Daniel L. Gillen Mar 2005

UW Biostatistics Working Paper Series

Clinical trial designs often incorporate a sequential stopping rule to serve as a guide in the early termination of a study. When choosing a particular stopping rule, it is most common to examine frequentist operating characteristics such as type I error, statistical power, and precision of confidence intervals (Emerson, et al. [1]). Increasingly, however, clinical trials are designed and analyzed in the Bayesian paradigm. In this paper we describe how the Bayesian operating characteristics of a particular stopping rule might be evaluated and communicated to the scientific community. In particular, we consider a choice of probability models and a …


Implementation Of Estimating-Function Based Inference Procedures With Mcmc Sampler, Lu Tian, Jun S. Liu, L. J. Wei Feb 2005

Harvard University Biostatistics Working Paper Series

No abstract provided.


Fixed-Width Output Analysis For Markov Chain Monte Carlo, Galin L. Jones, Murali Haran, Brian S. Caffo, Ronald Neath Feb 2005

Johns Hopkins University, Dept. of Biostatistics Working Papers

Markov chain Monte Carlo is a method of producing a correlated sample in order to estimate features of a complicated target distribution via simple ergodic averages. A fundamental question in MCMC applications is when should the sampling stop? That is, when are the ergodic averages good estimates of the desired quantities? We consider a method that stops the MCMC sampling the first time the width of a confidence interval based on the ergodic averages is less than a user-specified value. Hence calculating Monte Carlo standard errors is a critical step in assessing the output of the simulation. In particular, we …


Multiple Testing Procedures And Applications To Genomics, Merrill D. Birkner, Katherine S. Pollard, Mark J. Van Der Laan, Sandrine Dudoit Jan 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

This chapter proposes widely applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for controlling a broad class of Type I error rates, in testing problems involving general data generating distributions (with arbitrary dependence structures among variables), null hypotheses, and test statistics (Dudoit and van der Laan, 2005; Dudoit et al., 2004a,b; van der Laan et al., 2004a,b; Pollard and van der Laan, 2004; Pollard et al., 2005). Procedures are provided to control Type I error rates defined as tail probabilities for arbitrary functions of the numbers of Type I errors, V_n, and rejected hypotheses, R_n. These error rates include: …


Robust Inferences For Covariate Effects On Survival Time With Censored Linear Regression Models, Larry Leon, Tianxi Cai, L. J. Wei Jan 2005

Harvard University Biostatistics Working Paper Series

Various inference procedures for linear regression models with censored failure times have been studied extensively. Recent developments on efficient algorithms to implement these procedures enhance the practical usage of such models in survival analysis. In this article, we present robust inferences for certain covariate effects on the failure time in the presence of "nuisance" confounders under a semiparametric, partial linear regression setting. Specifically, the estimation procedures for the regression coefficients of interest are derived from a working linear model and are valid even when the function of the confounders in the model is not correctly specified. The new proposals are …


Ramping Up Assessment At The Unlv Libraries, Jeanne M. Brown Jan 2005

Library Faculty Publications

Purpose – Sets out to describe the development of an assessment program at UNLV Libraries and current assessment activities.

Design/methodology/approach – Assessment activities are first placed in organizational context, distinguishing between assessment initiated by departments, and assessment done library-wide. Common expressions of resistance to assessment are noted, followed by the library and campus context relating to assessment. The impact of technology and of the LibQual+ survey is discussed.

Findings – Assessment activities at UNLV Libraries have strengthened and diversified over the last several years, thanks to several factors including the guidance of its dean, the development of technology and human …


Identifying A Source Of Financial Volatility, Douglas G. Steigerwald, Richard Vagnoni Dec 2004

Douglas G. Steigerwald

How should one combine stock and option markets in models of trade and asset price volatility? We address this question, paying particular attention to the identification of parameters of interest.


Inferring Information Frequency And Quality, Douglas G. Steigerwald, John Owens Dec 2004

Douglas G. Steigerwald

We develop a microstructure model that, in contrast to previous models, allows one to estimate the frequency and quality of private information. In addition, the model produces stationary asset price and trading volume series. We find evidence that information arrives frequently within a day and that this information is of high quality. The frequent arrival of information, while in contrast to previous microstructure model estimates, accords with nonmodel-based estimates and the related literature testing the mixture-of-distributions hypothesis. To determine if the estimates are correctly reflecting the arrival of latent information, we estimate the parameters over half-hour intervals within the day. …


Multiple Testing Procedures For Controlling Tail Probability Error Rates, Sandrine Dudoit, Mark J. Van Der Laan, Merrill D. Birkner Dec 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

The present article discusses and compares multiple testing procedures (MTP) for controlling Type I error rates defined as tail probabilities for the number (gFWER) and proportion (TPPFP) of false positives among the rejected hypotheses. Specifically, we consider the gFWER- and TPPFP-controlling MTPs proposed recently by Lehmann & Romano (2004) and in a series of four articles by Dudoit et al. (2004), van der Laan et al. (2004b,a), and Pollard & van der Laan (2004). The former Lehmann & Romano (2004) procedures are marginal, in the sense that they are based solely on the marginal distributions of the test statistics, i.e., …


Multiple Testing Procedures: R Multtest Package And Applications To Genomics, Katherine S. Pollard, Sandrine Dudoit, Mark J. Van Der Laan Dec 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

The Bioconductor R package multtest implements widely applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for controlling a broad class of Type I error rates, in testing problems involving general data generating distributions (with arbitrary dependence structures among variables), null hypotheses, and test statistics. The current version of multtest provides MTPs for tests concerning means, differences in means, and regression parameters in linear and Cox proportional hazards models. Procedures are provided to control Type I error rates defined as tail probabilities for arbitrary functions of the numbers of false positives and rejected hypotheses. These error rates include tail probabilities …
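The single-step resampling idea behind such procedures can be loosely illustrated in Python (the multtest package itself is R code; the toy |sample mean| statistic, the data layout, and the function name below are our assumptions, not the package's API):

```python
import random
from statistics import mean

def maxT_adjusted_pvalues(data, n_boot=500, seed=0):
    """Single-step maxT-style adjustment: the adjusted p-value for hypothesis j
    is the fraction of resamples whose maximum null-centered statistic reaches
    the observed |T_j|. `data` is a list of per-hypothesis observation lists."""
    rng = random.Random(seed)
    stats = [abs(mean(col)) for col in data]  # toy statistic: |sample mean|
    exceed = [0] * len(data)
    for _ in range(n_boot):
        max_stat = 0.0
        for col in data:
            n, m = len(col), mean(col)
            # bootstrap resample, centered at the observed mean to mimic the null
            boot = [col[rng.randrange(n)] - m for _ in range(n)]
            max_stat = max(max_stat, abs(mean(boot)))
        for j, t in enumerate(stats):
            if max_stat >= t:
                exceed[j] += 1
    return [e / n_boot for e in exceed]
```

Comparing each observed statistic against the resampled distribution of the maximum statistic is what gives family-wise control while accounting for the joint dependence among tests.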


Semiparametric Regression In Capture-Recapture Modelling, O. Gimenez, C. Barbraud, Ciprian M. Crainiceanu, S. Jenouvrier, B.T. Morgan Dec 2004

Johns Hopkins University, Dept. of Biostatistics Working Papers

Capture-recapture models were developed to estimate survival using data arising from marking and monitoring wild animals over time. Variation in the survival process may be explained by incorporating relevant covariates. We develop nonparametric and semiparametric regression models for estimating survival in capture-recapture models. A fully Bayesian approach using MCMC simulations was employed to estimate the model parameters. The work is illustrated by a study of Snow petrels, in which survival probabilities are expressed as nonlinear functions of a climate covariate, using data from a 40-year study on marked individuals, nesting at Petrels Island, Terre Adelie.


Semi-Parametric Single-Index Two-Part Regression Models, Xiao-Hua Zhou, Hua Liang Dec 2004

UW Biostatistics Working Paper Series

In this paper, we propose a semi-parametric single-index two-part regression model to weaken the assumptions of parametric regression methods that are frequently used in the analysis of skewed data with additional zero values. The estimation procedure for the parameters of interest in the model is easily implemented. The proposed estimators are shown to be consistent and asymptotically normal. Through a simulation study, we show that the proposed estimators have reasonable finite-sample performance. We illustrate the application of the proposed method with one real study on the analysis of health care costs.


A Bayesian Mixture Model Relating Dose To Critical Organs And Functional Complication In 3d Conformal Radiation Therapy, Tim Johnson, Jeremy Taylor, Randall K. Ten Haken, Avraham Eisbruch Nov 2004

The University of Michigan Department of Biostatistics Working Paper Series

A goal of radiation therapy is to deliver maximum dose to the target tumor while minimizing complications due to irradiation of critical organs. Technological advances in 3D conformal radiation therapy have allowed great strides in realizing this goal; however, complications may still arise. Critical organs may be adjacent to tumors or in the path of the radiation beam. Several mathematical models have been proposed that describe a relationship between dose and observed functional complication; however, only a few published studies have successfully fit these models to data using modern statistical methods which make efficient use of the data. One complication …


Choice Of Monitoring Mechanism For Optimal Nonparametric Functional Estimation For Binary Data, Nicholas P. Jewell, Mark J. Van Der Laan, Stephen Shiboski Nov 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Optimal designs of dose levels in order to estimate parameters from a model for binary response data have a long and rich history. These designs are based on parametric models. Here we consider fully nonparametric models with interest focused on estimation of smooth functionals using plug-in estimators based on the nonparametric maximum likelihood estimator. An important application of the results is the derivation of the optimal choice of the monitoring time distribution function for current status observation of a survival distribution. The optimal choice depends in a simple way on the dose response function and the form of the functional. …