Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Articles 1 - 24 of 24

Full-Text Articles in Statistics and Probability

Conditional Screening For Ultra-High Dimensional Covariates With Survival Outcomes, Hyokyoung Grace Hong, Jian Kang, Yi Li Mar 2016

The University of Michigan Department of Biostatistics Working Paper Series

Identifying important biomarkers that are predictive for cancer patients' prognosis is key in gaining better insights into the biological influences on the disease and has become a critical component of precision medicine. The emergence of large-scale biomedical survival studies, which typically involve an excessive number of biomarkers, has brought high demand in designing efficient screening tools for selecting predictive biomarkers. The vast number of biomarkers defies any existing variable selection methods via regularization. The recently developed variable screening methods, though powerful in many practical settings, fail to incorporate prior information on the importance of each biomarker and are less powerful in …


C-Learning: A New Classification Framework To Estimate Optimal Dynamic Treatment Regimes, Baqun Zhang, Min Zhang Aug 2015

The University of Michigan Department of Biostatistics Working Paper Series

Personalizing treatment to accommodate patient heterogeneity and the evolving nature of a disease over time has received considerable attention lately. A dynamic treatment regime is a set of decision rules, each corresponding to a decision point, that determine the next treatment based on each individual’s own available characteristics and treatment history up to that point. We show that identifying the optimal dynamic treatment regime can be recast as a sequential classification problem and is equivalent to sequentially minimizing a weighted expected misclassification error. This general classification perspective targets the exact goal of optimally individualizing treatments and is new and fundamentally …


Weighting And Prediction In Sample Surveys, Rod Little Feb 2009

The University of Michigan Department of Biostatistics Working Paper Series

A fundamental technique in survey sampling is to weight included units by the inverse of their probability of inclusion, which may be known (as in the case of sampling weights) or estimated (as in the case of nonresponse weights). The technique is closely associated with the design-based approach to survey inference, with the idea that units in the sample are representing a certain number of units in the population. I discuss weighting from a modeling perspective. Some common misconceptions of weighting will be addressed, including the idea that modelers can ignore the sampling weights, or that weighting necessarily reduces bias …


A Bayesian Mixture Model Relating Dose To Critical Organs And Functional Complication In 3D Conformal Radiation Therapy, Tim Johnson, Jeremy Taylor, Randall K. Ten Haken, Avraham Eisbruch Nov 2004

The University of Michigan Department of Biostatistics Working Paper Series

A goal of radiation therapy is to deliver maximum dose to the target tumor while minimizing complications due to irradiation of critical organs. Technological advances in 3D conformal radiation therapy have allowed great strides in realizing this goal; however, complications may still arise. Critical organs may be adjacent to tumors or in the path of the radiation beam. Several mathematical models have been proposed that describe a relationship between dose and observed functional complication; however, only a few published studies have successfully fit these models to data using modern statistical methods that make efficient use of the data. One complication …


Semiparametric Binary Regression Under Monotonicity Constraints, Moulinath Banerjee, Pinaki Biswas, Debashis Ghosh Nov 2004

The University of Michigan Department of Biostatistics Working Paper Series

Summary: We study a binary regression model where the response variable $\Delta$ is the indicator of an event of interest (for example, the incidence of cancer) and the set of covariates can be partitioned as $(X,Z)$ where $Z$ (real valued) is the covariate of primary interest and $X$ (vector valued) denotes a set of control variables. For any fixed $X$, the conditional probability of the event of interest is assumed to be a monotonic function of $Z$. The effect of the control variables is captured by a regression parameter $\beta$. We show that the baseline conditional probability function (corresponding to …


Censored Linear Regression For Case-Cohort Studies, Bin Nan, Menggang Yu, Jack Kalbfleisch Oct 2004

The University of Michigan Department of Biostatistics Working Paper Series

Right censored data from a classical case-cohort design and a stratified case-cohort design are considered. In the classical case-cohort design, the subcohort is obtained as a simple random sample of the entire cohort, whereas in the stratified design, the subcohort is selected by independent Bernoulli sampling with arbitrary selection probabilities. For each design and under a linear regression model, methods for estimating the regression parameters are proposed and analyzed. These methods are derived by modifying the linear rank tests and estimating equations that arise from full-cohort data using methods that are similar to the "pseudo-likelihood" estimating equation that has been …


New Estimating Methods For Surrogate Outcome Data, Bin Nan Jun 2004

The University of Michigan Department of Biostatistics Working Paper Series

Surrogate outcome data arise frequently in medical research. The true outcomes of interest are expensive or hard to ascertain, but measurements of surrogate outcomes (or more generally speaking, the correlates of the true outcomes) are usually available. In this paper we assume that the conditional expectation of the true outcome given covariates is known up to a finite dimensional parameter. When the true outcome is missing at random, the efficient score function for the parameter in the conditional mean model has a simple form, which is similar to the generalized estimating functions. There is no integral equation involved as in …


Asymptotic Results For Simultaneous Group Sequential Analysis Of Rank-Based And Weighted Kaplan-Meier Tests With Paired Survival Data In The Presence Of Censoring. Technical Report, Adin-Cristian Andrei, Susan Murray Jun 2004

The University of Michigan Department of Biostatistics Working Paper Series

This research sequentially monitors paired survival differences using a new class of non-parametric tests based on functionals of standardized paired weighted log-rank (PWLR) and standardized paired weighted Kaplan-Meier (PWKM) tests. During a trial these tests may alternately assume the role of the more extreme statistic. By monitoring PEMAX, the maximum between the absolute values of the standardized PWLR and PWKM, one combines advantages of rank-based and non rank-based paired testing paradigms. Simulations show that monitoring treatment differences using PEMAX maintains type I error and is nearly as powerful as using the more advantageous of the two tests, in proportional hazards …


Nonparametric Methods For Analyzing Replication Origins In Genomewide Data, Debashis Ghosh Jun 2004

The University of Michigan Department of Biostatistics Working Paper Series

Due to the advent of high-throughput genomic technology, it has become possible to globally monitor cellular activities on a genomewide basis. With these new methods, scientists can begin to address important biological questions. One such question involves the identification of replication origins, which are regions in chromosomes where DNA replication is initiated. In addition, one hypothesis regarding replication origins is that their locations are non-random throughout the genome. In this article, we develop methods for identification of and cluster inference regarding replication origins involving genomewide expression data. We compare several nonparametric regression methods for the identification of replication origin locations. …


The False Discovery Rate: A Variable Selection Perspective, Debashis Ghosh, Wei Chen, Trivellore E. Raghunathan Jun 2004

The University of Michigan Department of Biostatistics Working Paper Series

In many scientific and medical settings, large-scale experiments are generating large quantities of data that lead to inferential problems involving multiple hypotheses. This has led to recent tremendous interest in statistical methods regarding the false discovery rate (FDR). Several authors have studied the properties involving FDR in a univariate mixture model setting. In this article, we turn the problem on its side and show that FDR is a by-product of Bayesian analysis of a variable selection problem for a hierarchical linear regression model. This equivalence gives many Bayesian insights as to why FDR is a natural quantity to …


Semiparametric Models And Estimation Procedures For Binormal ROC Curves With Multiple Biomarkers, Debashis Ghosh May 2004

The University of Michigan Department of Biostatistics Working Paper Series

In diagnostic medicine, there is great interest in developing strategies for combining biomarkers in order to optimize classification accuracy. A popular model that has been used for receiver operating characteristic (ROC) curve modelling when one biomarker is available is the binormal model. Extension of the model to accommodate multiple biomarkers has not been considered in this literature. Here, we consider a multivariate binormal framework for combining biomarkers using copula functions that leads to a natural multivariate extension of the binormal model. Estimation in this model will be done using rank-based procedures. We show that the Van der Waerden rank score …


Nonparametric And Semiparametric Inference For Models Of Tumor Size And Metastasis, Debashis Ghosh May 2004

The University of Michigan Department of Biostatistics Working Paper Series

There has been some recent work in the statistical literature for modelling the relationship between the size of primary cancers and the occurrences of metastases. While nonparametric methods have been proposed for estimation of the tumor size distribution at which metastatic transition occurs, their asymptotic properties have not been studied. In addition, no testing or regression methods are available so that potential confounders and prognostic factors can be adjusted for. We develop a unified approach to nonparametric and semiparametric analysis of modelling tumor size-metastasis data in this article. An equivalence between the models considered by previous authors with survival data …


Resampling Methods For Estimating Functions With U-Statistic Structure, Wenyu Jiang, Jack Kalbfleisch Apr 2004

The University of Michigan Department of Biostatistics Working Paper Series

Suppose that inference about parameters of interest is to be based on an unbiased estimating function that is a U-statistic of degree 1 or 2. We define suitable studentized versions of such estimating functions and consider asymptotic approximations as well as an estimating function bootstrap (EFB) method based on resampling the estimated terms in the estimating functions. These methods are justified asymptotically and lead to confidence intervals produced directly from the studentized estimating functions. Particular examples in this class of estimating functions arise in La estimation as well as Wilcoxon rank regression and other related estimation problems. The proposed methods are …


Multiple Imputation For Interval Censored Data With Auxiliary Variables, Chiu-Hsieh Hsu, Jeremy Taylor, Susan Murray Feb 2004

The University of Michigan Department of Biostatistics Working Paper Series

We propose a nonparametric multiple imputation scheme, NPMLE imputation, for the analysis of interval censored survival data. Features of the method are that it converts interval-censored data problems to complete data or right censored data problems to which many standard approaches can be used, and the measures of uncertainty are easily obtained. In addition to the event time of primary interest, there are frequently other auxiliary variables that are associated with the event time. For the goal of estimating the marginal survival distribution, these auxiliary variables may provide some additional information about the event time for the interval censored observations. …


Piecewise Constant Cross-Ratio Estimation For Association In Bivariate Survival Data With Application To Studying Markers Of Menopausal Transition, Bin Nan, Xihong Lin, Lynda D. Lisabet, Sioban Harlow Feb 2004

The University of Michigan Department of Biostatistics Working Paper Series

A question of significant interest in female reproductive aging is to identify bleeding criteria for the menopausal transition. Although various bleeding criteria, or markers, have been proposed for the menopausal transition, their validity has not been adequately examined. The Tremin Trust data are collected from a long-term cohort study that followed a group of women throughout their whole reproductive life, and provide a unique opportunity for assessing the association between age at onset of a bleeding marker and age at onset of menopause. Formal statistical analysis of this dependence is challenging given the fact that both the marker event and menopause …


Robust Likelihood-Based Analysis Of Multivariate Data With Missing Values, Rod Little, An Hyonggin Dec 2003

The University of Michigan Department of Biostatistics Working Paper Series

The model-based approach to inference from multivariate data with missing values is reviewed. Regression prediction is most useful when the covariates are predictive of the missing values and the probability of being missing, and in these circumstances predictions are particularly sensitive to model misspecification. The use of penalized splines of the propensity score is proposed to yield robust model-based inference under the missing at random (MAR) assumption, assuming monotone missing data. Simulation comparisons with other methods suggest that the method works well in a wide range of populations, with little loss of efficiency relative to parametric models when the latter …


Weighting Adjustments For Unit Nonresponse With Multiple Outcome Variables, Sonya L. Vartivarian, Rod Little Nov 2003

The University of Michigan Department of Biostatistics Working Paper Series

Weighting is a common form of unit nonresponse adjustment in sample surveys where entire questionnaires are missing due to noncontact or refusal to participate. Weights are inversely proportional to the probability of selection and response. A common approach computes the response weight adjustment cells based on covariate information. When the number of cells thus created is too large, a coarsening method such as response propensity stratification can be applied to reduce the number of adjustment cells. Simulations in Vartivarian and Little (2002) indicate improved efficiency and robustness of weighting adjustments based on the joint classification of the sample by two …
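The adjustment-cell weighting described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `nonresponse_adjusted_weights` is hypothetical, and the sketch assumes the adjustment factor in each cell is the inverse of the weighted response rate, which is one common choice.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def nonresponse_adjusted_weights(
    base_weights: Sequence[float],
    cells: Sequence[Hashable],
    responded: Sequence[bool],
) -> List[float]:
    """Inflate respondents' base weights within each adjustment cell.

    Assumption (not from the abstract): within a cell, the factor is
    (sum of base weights of all sampled units) /
    (sum of base weights of respondents).
    Nonrespondents receive weight 0.
    """
    if not (len(base_weights) == len(cells) == len(responded)):
        raise ValueError("inputs must have the same length")

    cell_total = defaultdict(float)  # weighted count of sampled units
    cell_resp = defaultdict(float)   # weighted count of respondents
    for w, c, r in zip(base_weights, cells, responded):
        cell_total[c] += w
        if r:
            cell_resp[c] += w

    adjusted = []
    for w, c, r in zip(base_weights, cells, responded):
        # Respondents always have cell_resp[c] > 0 when base weights are positive.
        adjusted.append(w * cell_total[c] / cell_resp[c] if r else 0.0)
    return adjusted


weights = nonresponse_adjusted_weights(
    base_weights=[2.0, 2.0, 2.0, 2.0],
    cells=["a", "a", "b", "b"],
    responded=[True, False, True, True],
)
print(weights)
```

In cell "a" one of two equally weighted units responds, so the respondent's weight doubles; in cell "b" everyone responds and the weights are unchanged. The adjusted weights sum to the original weighted sample size.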


Maximum Likelihood Estimation Of Ordered Multinomial Parameters, Nicholas P. Jewell, Jack Kalbfleisch Oct 2003

The University of Michigan Department of Biostatistics Working Paper Series

The pool adjacent violator algorithm (Ayer et al., 1955) has long been known to give the maximum likelihood estimator of a series of ordered binomial parameters, based on an independent observation from each distribution (see Barlow et al., 1972). This result has immediate application to estimation of a survival distribution based on current survival status at a set of monitoring times. This paper considers an extended problem of maximum likelihood estimation of a series of ‘ordered’ multinomial parameters $p_i = (p_{1i}, p_{2i}, \ldots, p_{mi})$ for $1 \le i \le k$, where ordered means that $p_{j1} \le p_{j2} \le \cdots \le p_{jk}$ for each $j$ with $1 \le j \le m-1$. The data consist of $k$ independent observations $X_1, \ldots, X_k$ where $X_i$ has a multinomial distribution with probability parameter $p_i$ and known index $n_i \ge 1$. By making use of variants of the pool adjacent violator algorithm, …
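For orientation, the classical binomial case mentioned at the start of this abstract can be sketched in a few lines. This is only the standard pool adjacent violator algorithm for nondecreasing binomial probabilities, not the multinomial extension the paper develops; the function name `pava_binomial` is hypothetical.

```python
from typing import List, Sequence


def pava_binomial(successes: Sequence[int], trials: Sequence[int]) -> List[float]:
    """Nondecreasing MLE of binomial probabilities via pool adjacent violators.

    successes[i] out of trials[i] (trials[i] >= 1) are observed at the
    i-th ordered position.
    """
    if len(successes) != len(trials):
        raise ValueError("successes and trials must have the same length")
    if any(n < 1 for n in trials):
        raise ValueError("each trial count must be at least 1")

    # Each block holds [pooled successes, pooled trials, number of groups pooled].
    blocks: List[List[int]] = []
    for s, n in zip(successes, trials):
        blocks.append([s, n, 1])
        # Pool while the previous block's rate exceeds the last block's rate.
        # Rates are compared by cross-multiplication to stay in integers.
        while (
            len(blocks) > 1
            and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]
        ):
            s_last, n_last, c_last = blocks.pop()
            blocks[-1][0] += s_last
            blocks[-1][1] += n_last
            blocks[-1][2] += c_last

    fitted: List[float] = []
    for s, n, count in blocks:
        fitted.extend([s / n] * count)
    return fitted


estimates = pava_binomial(successes=[1, 3, 2, 4], trials=[5, 5, 5, 5])
print(estimates)
```

Here the raw proportions 0.2, 0.6, 0.4, 0.8 violate the ordering at the second and third positions, so those two groups are pooled into a single estimate of 5/10.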


Equivalent Kernels Of Smoothing Splines In Nonparametric Regression For Clustered/Longitudinal Data, Xihong Lin, Naisyin Wang, Alan H. Welsh, Raymond J. Carroll Sep 2003

The University of Michigan Department of Biostatistics Working Paper Series

We compare spline and kernel methods for clustered/longitudinal data. For independent data, it is well known that kernel methods and spline methods are essentially asymptotically equivalent (Silverman, 1984). However, the recent work of Welsh, et al. (2002) shows that the same is not true for clustered/longitudinal data. First, conventional kernel methods fail to account for the within-cluster correlation, while spline methods are able to account for this correlation. Second, kernel methods and spline methods were found to have different local behavior, with conventional kernels being local and splines being non-local. To resolve these differences, we show that a smoothing …


Efficient Semiparametric Marginal Estimation For Longitudinal/Clustered Data, Naisyin Wang, Raymond J. Carroll, Xihong Lin Sep 2003

The University of Michigan Department of Biostatistics Working Paper Series

We consider marginal generalized semiparametric partially linear models for clustered data. Lin and Carroll (2001a) derived the semiparametric efficient score function for this problem in the multivariate Gaussian case, but they were unable to construct a semiparametric efficient estimator that actually achieved the semiparametric information bound. We propose such an estimator here and generalize the work to marginal generalized partially linear models. Asymptotic relative efficiencies of the estimation or throughout are investigated. The finite sample performance of these estimators is evaluated through simulations and illustrated using a longitudinal CD4 count data set. Both theoretical and numerical results indicate that properly …


Inference For The Population Total From Probability-Proportional-To-Size Samples Based On Predictions From A Penalized Spline Nonparametric Model, Hui Zheng, Rod Little Aug 2003

The University of Michigan Department of Biostatistics Working Paper Series

Inference about the finite population total from probability-proportional-to-size (PPS) samples is considered. In previous work (Zheng and Little, 2003), penalized spline (p-spline) nonparametric model-based estimators were shown to generally outperform the Horvitz-Thompson (HT) and generalized regression (GR) estimators in terms of the root mean squared error. In this article we develop model-based, jackknife and balanced repeated replicate variance estimation methods for the p-spline based estimators. Asymptotic properties of the jackknife method are discussed. Simulations show that p-spline point estimators and their jackknife standard errors lead to inferences that are superior to HT or GR based inferences. This suggests that nonparametric …


Maximization By Parts In Likelihood Inference, Peter Xuekun Song, Yanqin Fan, Jack Kalbfleisch Jun 2003

The University of Michigan Department of Biostatistics Working Paper Series

This paper presents and examines a new algorithm for solving a score equation for the maximum likelihood estimate in certain problems of practical interest. The method circumvents the need to compute second order derivatives of the full likelihood function. It exploits the structure of certain models that yield a natural decomposition of a very complicated likelihood function. In this decomposition, the first part is a log likelihood from a simply analyzed model and the second part is used to update estimates from the first. Convergence properties of this fixed point algorithm are examined and asymptotics are derived for estimators obtained …


Semiparametric Regression Models With Missing Data: The Mathematics In The Work Of Robins Et Al., Menggang Yu, Bin Nan May 2003

The University of Michigan Department of Biostatistics Working Paper Series

This review is an attempt to understand the landmark papers of Robins, Rotnitzky, and Zhao (1994) and Robins and Rotnitzky (1992). We revisit their main results and corresponding proofs using the theory outlined in the monograph by Bickel, Klaassen, Ritov, and Wellner (1993). We also discuss an illustrative example to show the details of applying these theoretical results.


Penalized Spline Nonparametric Mixed Models For Inference About A Finite Population Mean From Two-Stage Samples, Hui Zheng, Rod Little Mar 2003

The University of Michigan Department of Biostatistics Working Paper Series

Samplers often distrust model-based approaches to survey inference due to concerns about model misspecification when applied to large samples from complex populations. We suggest that the model-based paradigm can work very successfully in survey settings, provided models are chosen that take into account the sample design and avoid strong parametric assumptions. The Horvitz-Thompson (HT) estimator is a simple design-unbiased estimator of the finite population total in probability sampling designs. From a modeling perspective, the HT estimator performs well when the ratios of the outcome values and the inclusion probabilities are exchangeable. When this assumption is not met, the HT estimator …
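As a concrete reference point for the Horvitz-Thompson estimator named in this abstract, the sketch below computes the standard HT estimate of a population total, the sum of each sampled outcome divided by its inclusion probability. The function name `horvitz_thompson_total` and the example numbers are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def horvitz_thompson_total(y: Sequence[float], pi: Sequence[float]) -> float:
    """Horvitz-Thompson estimate of a finite population total.

    y[i] is the outcome for the i-th sampled unit and pi[i] is its
    inclusion probability under the sampling design.
    """
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(not (0.0 < p <= 1.0) for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    # Each sampled unit stands in for 1 / pi[i] population units.
    return sum(yi / p for yi, p in zip(y, pi))


total = horvitz_thompson_total(y=[10.0, 20.0, 30.0], pi=[0.5, 0.25, 0.5])
print(total)
```

The ratios `y[i] / pi[i]` in this sum are the quantities whose exchangeability the abstract links to good performance of the HT estimator.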