Articles 331 - 336 of 336
Full-Text Articles in Statistics and Probability
A Method To Identify Significant Clusters In Gene Expression Data, Katherine S. Pollard, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Clustering algorithms have been widely applied to gene expression data. For both hierarchical and partitioning clustering algorithms, selecting the number of significant clusters is an important problem, and many methods have been proposed. Existing methods for selecting the number of clusters tend to find only the global patterns in the data (e.g., the over- and under-expressed genes). We have noted the need for a better method in the gene expression context, where small, biologically meaningful clusters can be difficult to identify. In this paper, we define a new criterion, Mean Split Silhouette (MSS), which is a measure of cluster …
A New Partitioning Around Medoids Algorithm, Mark J. Van Der Laan, Katherine S. Pollard, Jennifer Bryan
U.C. Berkeley Division of Biostatistics Working Paper Series
Kaufman & Rousseeuw (1990) proposed the clustering algorithm Partitioning Around Medoids (PAM), which maps a distance matrix into a specified number of clusters. A particularly nice property is that PAM allows clustering with respect to any specified distance metric. In addition, the medoids are robust representations of the cluster centers, which is particularly important in the common context that many elements do not belong well to any cluster. Based on our experience in clustering gene expression data, we have noticed that PAM has problems recognizing relatively small clusters in situations where good partitions around medoids clearly exist. In this …
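As a rough illustration of the classical idea the abstract refers to (not the new algorithm proposed in the paper, whose description is cut off above), the following minimal Python sketch partitions around medoids using only a distance matrix. The function names `total_cost` and `pam_sketch`, the arbitrary starting medoids (in place of the BUILD phase of Kaufman & Rousseeuw), and the toy data are assumptions made for this example.

```python
import itertools


def total_cost(dist, medoids):
    """Sum over all elements of the distance to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam_sketch(dist, k):
    """Greedy swap search for k medoids given a full distance matrix.

    Assumption: starts from the first k elements instead of a BUILD phase.
    """
    n = len(dist)
    medoids = list(range(k))
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        # Try replacing each medoid position with each non-medoid element.
        for pos, cand in itertools.product(range(k), range(n)):
            if cand in medoids:
                continue
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            cost = total_cost(dist, trial)
            if cost < best:  # accept any swap that lowers the total cost
                medoids, best, improved = trial, cost, True
    # Assign every element to its nearest medoid.
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return sorted(medoids), labels, best


# Toy data (hypothetical): six points on a line, absolute difference as distance.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels, cost = pam_sketch(dist, k=2)
```

Because only `dist` is consulted, any distance metric can be substituted without changing the search, which is the property the abstract highlights.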
Marginal Regression Of Gaps Between Recurrent Events, Yijian Huang, Ying Qing Chen
U.C. Berkeley Division of Biostatistics Working Paper Series
Recurrent event data typically exhibit the phenomenon of intra-individual correlation, owing not only to observed covariates but also to random effects. In many applications, the population can be reasonably postulated as a heterogeneous mixture of individual renewal processes, and the inference of interest is the effect of individual-level covariates. In this article, we suggest and investigate a marginal proportional hazards model for gaps between recurrent events. A connection is established between observed gap times and clustered survival data, albeit with informative cluster size. We then derive a novel and general inference procedure for the latter, based on a functional formulation of …
Maximum Likelihood Estimation Of Ordered Multinomial Parameters, Nicholas P. Jewell, John D. Kalbfleisch
U.C. Berkeley Division of Biostatistics Working Paper Series
The pool adjacent violator algorithm (Ayer et al., 1955) has long been known to give the maximum likelihood estimator of a series of ordered binomial parameters, based on an independent observation from each distribution (see Barlow et al., 1972). This result has immediate application to estimation of a survival distribution based on current survival status at a set of monitoring times. This paper considers an extended problem of maximum likelihood estimation of a series of ‘ordered’ multinomial parameters. By making use of variants of the pool adjacent violator algorithm, we obtain a simple algorithm to compute the maximum likelihood estimator and demonstrate …
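For readers unfamiliar with the binomial case mentioned in the abstract, the following minimal Python sketch shows pooling of adjacent violators for nondecreasing binomial proportions. It does not implement the multinomial extension developed in the paper. The function name `pava` and the toy counts are assumptions made for this example.

```python
def pava(values, weights):
    """Weighted nondecreasing fit obtained by pooling adjacent violators."""
    # Each block stores [weighted mean, total weight, number of pooled points].
    blocks = []
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the last two blocks break the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand each block back to one fitted value per original point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Toy data (hypothetical): events observed among 4 subjects at each of
# four ordered monitoring times.
successes = [1, 3, 2, 4]
trials = [4, 4, 4, 4]
props = [s / n for s, n in zip(successes, trials)]
fit = pava(props, trials)  # the violating pair 0.75, 0.5 is pooled to 0.625
```

Weighting each raw proportion by its number of trials is what makes the pooled value equal to the combined proportion of the merged groups.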
Statistical Inference For Simultaneous Clustering Of Gene Expression Data, Katherine S. Pollard, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Current methods for analysis of gene expression data are mostly based on clustering and classification of either genes or samples. We offer support for the idea that more complex patterns can be identified in the data if genes and samples are considered simultaneously. We formalize the approach and propose a statistical framework for two-way clustering. A simultaneous clustering parameter is defined as a function of the true data generating distribution, and an estimate is obtained by applying this function to the empirical distribution. We illustrate that a wide range of clustering procedures, including generalized hierarchical methods, can be defined as …
Assessing The Accuracy Of A New Diagnostic Test When A Gold Standard Does Not Exist, Todd A. Alonzo, Margaret S. Pepe
UW Biostatistics Working Paper Series
Often the accuracy of a new diagnostic test must be assessed when a perfect gold standard does not exist. Use of an imperfect test biases the accuracy estimates of the new test. This paper reviews existing approaches to this problem including discrepant resolution and latent class analysis. Deficiencies with these approaches are identified. A new approach is proposed that combines the results of several imperfect reference tests to define a better reference standard. We call this the composite reference standard (CRS). Using the CRS, accuracy can be assessed using multistage sampling designs. Maximum likelihood estimates of accuracy and expressions for …