Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Articles 1 - 30 of 46

Full-Text Articles in Statistics and Probability

Robust Likelihood-Based Analysis Of Multivariate Data With Missing Values, Rod Little, An Hyonggin Dec 2003

The University of Michigan Department of Biostatistics Working Paper Series

The model-based approach to inference from multivariate data with missing values is reviewed. Regression prediction is most useful when the covariates are predictive of the missing values and the probability of being missing, and in these circumstances predictions are particularly sensitive to model misspecification. The use of penalized splines of the propensity score is proposed to yield robust model-based inference under the missing at random (MAR) assumption, assuming monotone missing data. Simulation comparisons with other methods suggest that the method works well in a wide range of populations, with little loss of efficiency relative to parametric models when the latter …


Multiple Testing. Part II. Step-Down Procedures For Control Of The Family-Wise Error Rate, Mark J. Van Der Laan, Sandrine Dudoit, Katherine S. Pollard Dec 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

The present article proposes two step-down multiple testing procedures for asymptotic control of the family-wise error rate (FWER): the first procedure is based on maxima of test statistics (step-down maxT), while the second relies on minima of unadjusted p-values (step-down minP). A key feature of our approach is the test statistics null distribution (rather than data generating null distribution) used to derive cut-offs (i.e., rejection regions) for these test statistics and the resulting adjusted p-values. For general null hypotheses, corresponding to submodels for the data generating distribution, we identify an asymptotic domination condition for a null distribution under which the …
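As a rough illustration of the step-down maxT idea named in this abstract, the sketch below computes adjusted p-values from observed test statistics and a set of draws from a test statistics null distribution. It assumes the draws are already available as input (the function name `step_down_maxt` and the argument layout are hypothetical), and it follows the familiar successive-maxima construction rather than reproducing the authors' derivation.

```python
def step_down_maxt(observed, null_draws):
    """Step-down maxT adjusted p-values (illustrative sketch).

    observed   : list of m test statistics (large values favor rejection)
    null_draws : list of B draws, each a list of m statistics drawn from
                 an assumed test statistics null distribution
    """
    m = len(observed)
    # Visit hypotheses from the most to the least significant statistic.
    order = sorted(range(m), key=lambda j: observed[j], reverse=True)
    adjusted = [0.0] * m
    running_max = 0.0
    for k, j in enumerate(order):
        remaining = order[k:]  # hypotheses not yet stepped past
        exceed = sum(
            1
            for draw in null_draws
            if max(draw[i] for i in remaining) >= observed[j]
        )
        p = exceed / len(null_draws)
        # Enforce monotonicity of the adjusted p-values along the ordering.
        running_max = max(running_max, p)
        adjusted[j] = running_max
    return adjusted


observed = [3.0, 1.0, 2.0]
null_draws = [[0.5, 0.2, 0.1], [2.5, 0.3, 0.4], [0.1, 1.5, 0.2], [0.2, 0.1, 0.3]]
print(step_down_maxt(observed, null_draws))
```

Rejecting every hypothesis whose adjusted p-value is at most alpha then gives the step-down decision rule; the quality of the error control depends entirely on how the null draws were generated, which this sketch does not address.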


Multiple Testing. Part I. Single-Step Procedures For Control Of General Type I Error Rates, Sandrine Dudoit, Mark J. Van Der Laan, Katherine S. Pollard Dec 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

The present article proposes general single-step multiple testing procedures for controlling Type I error rates defined as arbitrary parameters of the distribution of the number of Type I errors, such as the generalized family-wise error rate. A key feature of our approach is the test statistics null distribution (rather than data generating null distribution) used to derive cut-offs (i.e., rejection regions) for these test statistics and the resulting adjusted p-values. For general null hypotheses, corresponding to submodels for the data generating distribution, we identify an asymptotic domination condition for a null distribution under which single-step common-quantile and common-cut-off procedures asymptotically …


Loss-Based Estimation With Cross-Validation: Applications To Microarray Data Analysis And Motif Finding, Sandrine Dudoit, Mark J. Van Der Laan, Sunduz Keles, Annette M. Molinaro, Sandra E. Sinisi, Siew Leng Teng Dec 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

Current statistical inference problems in genomic data analysis involve parameter estimation for high-dimensional multivariate distributions, with typically unknown and intricate correlation patterns among variables. Addressing these inference questions satisfactorily requires: (i) an intensive and thorough search of the parameter space to generate good candidate estimators, (ii) an approach for selecting an optimal estimator among these candidates, and (iii) a method for reliably assessing the performance of the resulting estimator. We propose a unified loss-based methodology for estimator construction, selection, and performance assessment with cross-validation. In this approach, the parameter of interest is defined as the risk minimizer for a suitable …


Kernel Estimation Of Rate Function For Recurrent Event Data, Chin-Tsang Chiang, Mei-Cheng Wang, Chiung-Yu Huang Dec 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

Recurrent event data are largely characterized by the rate function, but smoothing techniques for estimating the rate function have never been rigorously developed or studied in the statistical literature. This paper considers the moment and least squares methods for estimating the rate function from recurrent event data. With an independent censoring assumption on the recurrent event process, we study statistical properties of the proposed estimators and propose bootstrap procedures for the bandwidth selection and for the approximation of confidence intervals in the estimation of the occurrence rate function. It is identified that the moment method without resmoothing via a smaller bandwidth …


Unified Cross-Validation Methodology For Selection Among Estimators And A General Cross-Validated Adaptive Epsilon-Net Estimator: Finite Sample Oracle Inequalities And Examples, Mark J. Van Der Laan, Sandrine Dudoit Nov 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

In Part I of this article we propose a general cross-validation criterion for selecting among a collection of estimators of a particular parameter of interest based on n i.i.d. observations. It is assumed that the parameter of interest minimizes the expectation (w.r.t. the distribution of the observed data structure) of a particular loss function of a candidate parameter value and the observed data structure, possibly indexed by a nuisance parameter. The proposed cross-validation criterion is defined as the empirical mean over the validation sample of the loss function at the parameter estimate based on the training sample, averaged over …
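The criterion described in this abstract can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: V-fold splits assigned by index, a loss with no nuisance parameter, and hypothetical names (`cv_risk`, `select_estimator`) that do not come from the paper.

```python
from statistics import mean


def cv_risk(data, estimator, loss, n_folds=5):
    """Cross-validated risk: fit on each training sample, then average the
    loss over the matching validation sample, then average over folds."""
    fold_risks = []
    for v in range(n_folds):
        training = [x for i, x in enumerate(data) if i % n_folds != v]
        validation = [x for i, x in enumerate(data) if i % n_folds == v]
        theta = estimator(training)  # estimate from the training sample only
        fold_risks.append(mean([loss(x, theta) for x in validation]))
    return mean(fold_risks)


def select_estimator(data, candidates, loss, n_folds=5):
    """Return the name of the candidate with the smallest cross-validated risk."""
    risks = {name: cv_risk(data, est, loss, n_folds) for name, est in candidates.items()}
    return min(risks, key=risks.get)


def squared_error(x, theta):
    return (x - theta) ** 2


data = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0]
candidates = {"zero": lambda train: 0.0, "mean": mean}
print(select_estimator(data, candidates, squared_error, n_folds=3))
```

The number of folds must not exceed the sample size, otherwise some validation samples would be empty.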


Weighting Adjustments For Unit Nonresponse With Multiple Outcome Variables, Sonya L. Vartivarian, Rod Little Nov 2003

The University of Michigan Department of Biostatistics Working Paper Series

Weighting is a common form of unit nonresponse adjustment in sample surveys where entire questionnaires are missing due to noncontact or refusal to participate. Weights are inversely proportional to the probability of selection and response. A common approach computes the response weight adjustment cells based on covariate information. When the number of cells thus created is too large, a coarsening method such as response propensity stratification can be applied to reduce the number of adjustment cells. Simulations in Vartivarian and Little (2002) indicate improved efficiency and robustness of weighting adjustments based on the joint classification of the sample by two …
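A minimal sketch of the adjustment-cell weighting described above: each respondent's weight is the inverse of the selection probability times the response rate estimated within that respondent's cell. The record layout and the function name `nonresponse_weights` are assumptions made for illustration, and the response rate is estimated by the simple unweighted proportion of respondents in the cell.

```python
from collections import defaultdict


def nonresponse_weights(units):
    """units: list of dicts with keys 'cell', 'selection_prob', 'responded'.

    Returns one weight per unit; nonrespondents get weight 0.
    """
    sampled = defaultdict(int)
    responded = defaultdict(int)
    for u in units:
        sampled[u["cell"]] += 1
        if u["responded"]:
            responded[u["cell"]] += 1

    weights = []
    for u in units:
        if not u["responded"]:
            weights.append(0.0)
            continue
        # Estimated response probability within the adjustment cell.
        rate = responded[u["cell"]] / sampled[u["cell"]]
        weights.append(1.0 / (u["selection_prob"] * rate))
    return weights


units = (
    [{"cell": "A", "selection_prob": 0.1, "responded": r} for r in (True, True, False, False)]
    + [{"cell": "B", "selection_prob": 0.2, "responded": True} for _ in range(2)]
)
print(nonresponse_weights(units))
```

When selection probabilities are constant within a cell, as in this toy example, the adjusted weights of the respondents sum to the same total as the base weights of the full sample.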


Estimating Predictors For Long- Or Short-Term Survivors, Lu Tian, Wei Wang, L. J. Wei Nov 2003

Harvard University Biostatistics Working Paper Series

No abstract provided.


Smooth Quantile Ratio Estimation With Regression: Estimating Medical Expenditures For Smoking Attributable Diseases, Francesca Dominici, Scott L. Zeger Nov 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

In this paper we introduce a semi-parametric regression model for estimating the difference in the expected value of two positive and highly skewed random variables as a function of covariates. Our method extends Smooth Quantile Ratio Estimation (SQUARE), a novel estimator of the mean difference of two positive random variables, to a regression model.

The methodological development of this paper is motivated by a common problem in econometrics where we are interested in estimating the difference in the average expenditures between two populations, say with and without a disease, taking covariates into account. Let Y1 and Y2 be two positive …


A Nonparametric Comparison Of Conditional Distributions With Nonnegligible Cure Fractions, Yi Li, Jin Feng Nov 2003

Harvard University Biostatistics Working Paper Series

No abstract provided.


Survival Analysis With Heterogeneous Covariate Measurement Error, Yi Li, Louise Ryan Nov 2003

Harvard University Biostatistics Working Paper Series

No abstract provided.


Loss Function Based Ranking In Two-Stage, Hierarchical Models, Rongheng Lin, Thomas A. Louis, Susan M. Paddock, Greg Ridgeway Nov 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

Several authors have studied the performance of optimal, squared error loss (SEL) estimated ranks. Though these are effective, in many applications interest focuses on identifying the relatively good (e.g., in the upper 10%) or relatively poor performers. We construct loss functions that address this goal and evaluate candidate rank estimates, some of which optimize specific loss functions. We study performance for a fully parametric hierarchical model with a Gaussian prior and Gaussian sampling distributions, evaluating performance for several loss functions. Results show that though SEL-optimal ranks and percentiles do not specifically focus on classifying with respect to a percentile cut …


Statistical Inference For Infinite Dimensional Parameters Via Asymptotically Pivotal Estimating Functions, Meredith A. Goldwasser, Lu Tian, L. J. Wei Nov 2003

Harvard University Biostatistics Working Paper Series

No abstract provided.


Joint Modeling And Estimation For Recurrent Event Processes And Failure Time Data, Chiung-Yu Huang, Mei-Cheng Wang Nov 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

Recurrent event data are commonly encountered in longitudinal follow-up studies related to biomedical science, econometrics, reliability, and demography. In many studies, recurrent events serve as important measurements for evaluating disease progression, health deterioration, or insurance risk. When analyzing recurrent event data, an independent censoring condition is typically required for the construction of statistical methods. Nevertheless, in some situations, the terminating time for observing recurrent events could be correlated with the recurrent event process and, as a result, the assumption of independent censoring is violated. In this paper, we consider joint modeling of a recurrent event process and a failure time …


Semi-Parametric Box-Cox Power Transformation Models For Censored Survival Observations, Tianxi Cai, Lu Tian, L. J. Wei Oct 2003

Harvard University Biostatistics Working Paper Series

No abstract provided.


Unification Of Variance Components And Haseman-Elston Regression For Quantitative Trait Linkage Analysis, Wei-Min Chen, Karl W. Broman, Kung-Yee Liang Oct 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

Two of the major approaches for linkage analysis with quantitative traits in humans are variance components and Haseman-Elston regression. Previously, these have been viewed as quite separate methods. We describe a general model, fit by use of generalized estimating equations (GEE), for which the variance components and Haseman-Elston methods (including many of the extensions to the original Haseman-Elston method) are special cases, corresponding to different choices for a working covariance matrix. We also show that the regression-based test of Sham et al. (2002) is equivalent to a robust score statistic derived from our GEE approach. These results have several important implications. …


Smooth Quantile Ratio Estimation, Francesca Dominici, Leslie Cope, Daniel Q. Naiman, Scott L. Zeger Oct 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

In a study of health care expenditures attributable to smoking, we seek to compare the distribution of medical costs for persons with lung cancer or chronic obstructive pulmonary disease (cases) to those without (controls) using a national survey which includes hundreds of cases and thousands of controls. The distribution of costs is highly skewed toward larger values, making estimates of the mean from the smaller sample dependent on a small fraction of the biggest values. One approach to deal with the smaller sample is to rely on a simple parametric model such as the log-normal, but this makes the undesirable …


Statistical Inferences Based On Non-Smooth Estimating Functions, Lu Tian, Jun S. Liu, Mary Zhao, L. J. Wei Oct 2003

Harvard University Biostatistics Working Paper Series

No abstract provided.


On The Cox Model With Time-Varying Regression Coefficients, Lu Tian, David Zucker, L. J. Wei Oct 2003

Harvard University Biostatistics Working Paper Series

No abstract provided.


Maximum Likelihood Estimation Of Ordered Multinomial Parameters, Nicholas P. Jewell, Jack Kalbfleisch Oct 2003

The University of Michigan Department of Biostatistics Working Paper Series

The pool-adjacent-violators algorithm (Ayer et al., 1955) has long been known to give the maximum likelihood estimator of a series of ordered binomial parameters, based on an independent observation from each distribution (see Barlow et al., 1972). This result has immediate application to estimation of a survival distribution based on current survival status at a set of monitoring times. This paper considers an extended problem of maximum likelihood estimation of a series of ‘ordered’ multinomial parameters $p_i = (p_{1i}, p_{2i}, \ldots, p_{mi})$ for $1 \le i \le k$, where ordered means that $p_{j1} \le p_{j2} \le \cdots \le p_{jk}$ for each $j$ with $1 \le j \le m-1$. The data consist of $k$ independent observations $X_1, \ldots, X_k$, where $X_i$ has a multinomial distribution with probability parameter $p_i$ and known index $n_i \ge 1$. By making use of variants of the pool-adjacent-violators algorithm, …
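For orientation, here is a minimal sketch of the classical binomial case mentioned at the start of this abstract: pooling adjacent violators on the observed proportions, weighted by the number of trials, to obtain non-decreasing estimates. It does not implement the multinomial extension the paper develops, and the function names (`pava`, `ordered_binomial_mle`) are illustrative choices.

```python
def pava(values, weights):
    """Weighted non-decreasing fit by pooling adjacent violators."""
    blocks = []  # each block holds [weighted mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the last two blocks violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for block_mean, _, count in blocks:
        fitted.extend([block_mean] * count)
    return fitted


def ordered_binomial_mle(successes, trials):
    """Estimates of p_1 <= ... <= p_k from independent binomial counts."""
    proportions = [x / n for x, n in zip(successes, trials)]
    return pava(proportions, trials)


print(ordered_binomial_mle([2, 1, 4], [4, 4, 4]))
```

In the example, the first two proportions (0.5 and 0.25) violate the ordering and are pooled into their weighted average, while the third is left unchanged.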


Nonparametric Estimation Of The Bivariate Recurrence Time Distribution, Chiung-Yu Huang, Mei-Cheng Wang Oct 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

This paper considers statistical models in which two different types of events, such as the diagnosis of a disease and the remission of the disease, occur alternately over time and are observed subject to right censoring. We propose nonparametric estimators for the joint distribution of bivariate recurrence times and the marginal distribution of the first recurrence time. In general, the marginal distribution of the second recurrence time cannot be estimated due to an identifiability problem, but a conditional distribution of the second recurrence time can be estimated nonparametrically. In the literature, statistical methods have been developed to estimate the joint distribution …


Equivalent Kernels Of Smoothing Splines In Nonparametric Regression For Clustered/Longitudinal Data, Xihong Lin, Naisyin Wang, Alan H. Welsh, Raymond J. Carroll Sep 2003

The University of Michigan Department of Biostatistics Working Paper Series

We compare spline and kernel methods for clustered/longitudinal data. For independent data, it is well known that kernel methods and spline methods are essentially asymptotically equivalent (Silverman, 1984). However, the recent work of Welsh et al. (2002) shows that the same is not true for clustered/longitudinal data. First, conventional kernel methods fail to account for the within-cluster correlation, while spline methods are able to account for this correlation. Second, kernel methods and spline methods were found to have different local behavior, with conventional kernels being local and splines being non-local. To resolve these differences, we show that a smoothing …


Efficient Semiparametric Marginal Estimation For Longitudinal/Clustered Data, Naisyin Wang, Raymond J. Carroll, Xihong Lin Sep 2003

The University of Michigan Department of Biostatistics Working Paper Series

We consider marginal generalized semiparametric partially linear models for clustered data. Lin and Carroll (2001a) derived the semiparametric efficient score function for this problem in the multivariate Gaussian case, but they were unable to construct a semiparametric efficient estimator that actually achieved the semiparametric information bound. We propose such an estimator here and generalize the work to marginal generalized partially linear models. Asymptotic relative efficiencies of the estimation or throughout are investigated. The finite sample performance of these estimators is evaluated through simulations and illustrated using a longitudinal CD4 count data set. Both theoretical and numerical results indicate that properly …


Measuring Treatment Effects Using Semiparametric Models, Zhuo Yu, Mark J. Van Der Laan Sep 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

In order to estimate the causal effect of treatments on an outcome of interest, one has to account for the effect of confounding factors which covary with the treatments and also contribute to the outcome of interest. In this paper, we use the semiparametric regression model to estimate the causal parameters. We assume the causal effect of the treatments can be described by the parametric component of the semiparametric regression model. Following the general methodology which was developed in van der Laan and Robins (2002) we give the orthogonal complement of the nuisance tangent space which identifies all the estimating …


Asymptotically Optimal Model Selection Method With Right Censored Outcomes, Sunduz Keles, Mark J. Van Der Laan, Sandrine Dudoit Sep 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

Over the last two decades, non-parametric and semi-parametric approaches that adapt well-known techniques such as regression methods to the analysis of right censored data, e.g. right censored survival data, have become popular in the statistics literature. However, the problem of choosing the best model (predictor) among a set of proposed models (predictors) in the right censored data setting has not gained much attention. In this paper, we develop a new cross-validation based model selection method to select among predictors of right censored outcomes such as survival times. The proposed method considers the risk of a given predictor based on the …


Tree-Based Multivariate Regression And Density Estimation With Right-Censored Data, Annette M. Molinaro, Sandrine Dudoit, Mark J. Van Der Laan Sep 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

We propose a unified strategy for estimator construction, selection, and performance assessment in the presence of censoring. This approach is entirely driven by the choice of a loss function for the full (uncensored) data structure and can be stated in terms of the following three main steps. (1) Define the parameter of interest as the minimizer of the expected loss, or risk, for a full data loss function chosen to represent the desired measure of performance. Map the full data loss function into an observed (censored) data loss function having the same expected value and leading to an efficient estimator …


Inference For The Population Total From Probability-Proportional-To-Size Samples Based On Predictions From A Penalized Spline Nonparametric Model, Hui Zheng, Rod Little Aug 2003

The University of Michigan Department of Biostatistics Working Paper Series

Inference about the finite population total from probability-proportional-to-size (PPS) samples is considered. In previous work (Zheng and Little, 2003), penalized spline (p-spline) nonparametric model-based estimators were shown to generally outperform the Horvitz-Thompson (HT) and generalized regression (GR) estimators in terms of the root mean squared error. In this article we develop model-based, jackknife and balanced repeated replicate variance estimation methods for the p-spline based estimators. Asymptotic properties of the jackknife method are discussed. Simulations show that p-spline point estimators and their jackknife standard errors lead to inferences that are superior to HT or GR based inferences. This suggests that nonparametric …


Locally Efficient Estimation Of Nonparametric Causal Effects On Mean Outcomes In Longitudinal Studies, Romain Neugebauer, Mark J. Van Der Laan Jul 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

Marginal Structural Models (MSM) have been introduced by Robins (1998a) as a powerful tool for causal inference as they directly model causal curves of interest, i.e. mean treatment-specific outcomes possibly adjusted for baseline covariates. Two estimators of the corresponding MSM parameters of interest have been proposed, see van der Laan and Robins (2002): the Inverse Probability of Treatment Weighted (IPTW) and the Double Robust (DR) estimators. A parametric MSM approach to causal inference has been favored since the introduction of MSM. It relies on correct specification of a parametric MSM to consistently estimate the parameter of interest using the IPTW …
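To make the IPTW idea concrete, the sketch below estimates a mean treatment-specific outcome in the simplest possible setting: a single time point, one discrete baseline covariate, and treatment probabilities estimated by within-stratum frequencies. This is a deliberately simplified, hypothetical illustration (the name `iptw_mean` and the record layout are assumptions), not the longitudinal or locally efficient estimators studied in the paper.

```python
from collections import Counter


def iptw_mean(records, treatment):
    """Inverse-probability-of-treatment-weighted mean outcome under `treatment`.

    records : list of (w, a, y) tuples with discrete covariate w,
              observed treatment a, and outcome y
    """
    n_w = Counter(w for w, _, _ in records)
    n_wa = Counter((w, a) for w, a, _ in records)
    total = 0.0
    for w, a, y in records:
        if a == treatment:
            # Estimated probability of the observed treatment given w.
            g = n_wa[(w, a)] / n_w[w]
            total += y / g
    return total / len(records)


# Toy data in which treatment is more likely when w = 1 and outcomes are higher there.
records = (
    [(0, 1, 1.0)] + [(0, 0, 0.0)] * 3
    + [(1, 1, 3.0)] * 3 + [(1, 0, 2.0)]
)
print(iptw_mean(records, 1), iptw_mean(records, 0))
```

In this toy data set the unadjusted mean among the treated is 2.5, whereas the weighted estimate reweights each covariate stratum to its share of the full sample.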


Resampling-Based Multiple Testing: Asymptotic Control Of Type I Error And Applications To Gene Expression Data, Katherine S. Pollard, Mark J. Van Der Laan Jun 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

We define a general statistical framework for multiple hypothesis testing and show that the correct null distribution for the test statistics is obtained by projecting the true distribution of the test statistics onto the space of mean zero distributions. For common choices of test statistics (based on an asymptotically linear parameter estimator), this distribution is asymptotically multivariate normal with mean zero and the covariance of the vector influence curve for the parameter estimator. This test statistic null distribution can be estimated by applying the non-parametric or parametric bootstrap to correctly centered test statistics. We prove that this bootstrap estimated null …


Maximization By Parts In Likelihood Inference, Peter Xuekun Song, Yanqin Fan, Jack Kalbfleisch Jun 2003

The University of Michigan Department of Biostatistics Working Paper Series

This paper presents and examines a new algorithm for solving a score equation for the maximum likelihood estimate in certain problems of practical interest. The method circumvents the need to compute second order derivatives of the full likelihood function. It exploits the structure of certain models that yield a natural decomposition of a very complicated likelihood function. In this decomposition, the first part is a log likelihood from a simply analyzed model and the second part is used to update estimates from the first. Convergence properties of this fixed point algorithm are examined and asymptotics are derived for estimators obtained …