Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Open Access. Powered by Scholars. Published by Universities.®

Multivariate Analysis

PDF

Institution
Keyword
Publication Year
Publication
Publication Type

Articles 151 - 166 of 166

Full-Text Articles in Statistical Models

Test Statistics Null Distributions In Multiple Testing: Simulation Studies And Applications To Genomics, Katherine S. Pollard, Merrill D. Birkner, Mark J. Van Der Laan, Sandrine Dudoit Jul 2005

Test Statistics Null Distributions In Multiple Testing: Simulation Studies And Applications To Genomics, Katherine S. Pollard, Merrill D. Birkner, Mark J. Van Der Laan, Sandrine Dudoit

U.C. Berkeley Division of Biostatistics Working Paper Series

Multiple hypothesis testing problems arise frequently in biomedical and genomic research, for instance, when identifying differentially expressed or co-expressed genes in microarray experiments. We have developed generally applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for control of a broad class of Type I error rates, defined as tail probabilities and expected values for arbitrary functions of the numbers of false positives and rejected hypotheses (Dudoit and van der Laan, 2005; Dudoit et al., 2004a,b; Pollard and van der Laan, 2004; van der Laan et al., 2005, 2004a,b). As argued in the early article of Pollard and van der …


New Statistical Paradigms Leading To Web-Based Tools For Clinical/Translational Science, Knut M. Wittkowski May 2005

New Statistical Paradigms Leading To Web-Based Tools For Clinical/Translational Science, Knut M. Wittkowski

COBRA Preprint Series

As the field of functional genetics and genomics is beginning to mature, we become confronted with new challenges. The constant drop in price for sequencing and gene expression profiling as well as the increasing number of genetic and genomic variables that can be measured makes it feasible to address more complex questions. The success with rare diseases caused by single loci or genes has provided us with a proof-of-concept that new therapies can be developed based on functional genomics and genetics.

Common diseases, however, typically involve genetic epistasis, genomic pathways, and proteomic pattern. Moreover, to better understand the underlying biologi-cal …


Causal Inference In Longitudinal Studies With History-Restricted Marginal Structural Models, Romain Neugebauer, Mark J. Van Der Laan, Ira B. Tager Apr 2005

Causal Inference In Longitudinal Studies With History-Restricted Marginal Structural Models, Romain Neugebauer, Mark J. Van Der Laan, Ira B. Tager

U.C. Berkeley Division of Biostatistics Working Paper Series

Causal Inference based on Marginal Structural Models (MSMs) is particularly attractive to subject-matter investigators because MSM parameters provide explicit representations of causal effects. We introduce History-Restricted Marginal Structural Models (HRMSMs) for longitudinal data for the purpose of defining causal parameters which may often be better suited for Public Health research. This new class of MSMs allows investigators to analyze the causal effect of a treatment on an outcome based on a fixed, shorter and user-specified history of exposure compared to MSMs. By default, the latter represents the treatment causal effect of interest based on a treatment history defined by the …


Survival Ensembles, Torsten Hothorn, Peter Buhlmann, Sandrine Dudoit, Annette M. Molinaro, Mark J. Van Der Laan Apr 2005

Survival Ensembles, Torsten Hothorn, Peter Buhlmann, Sandrine Dudoit, Annette M. Molinaro, Mark J. Van Der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

We propose a unified and flexible framework for ensemble learning in the presence of censoring. For right-censored data, we introduce a random forest algorithm and a generic gradient boosting algorithm for the construction of prognostic models. The methodology is utilized for predicting the survival time of patients suffering from acute myeloid leukemia based on clinical and genetic covariates. Furthermore, we compare the diagnostic capabilities of the proposed censored data random forest and boosting methods applied to the recurrence free survival time of node positive breast cancer patients with previously published findings.


Combining Predictors For Classification Using The Area Under The Roc Curve, Margaret S. Pepe, Tianxi Cai, Zheng Zhang, Gary M. Longton Jan 2005

Combining Predictors For Classification Using The Area Under The Roc Curve, Margaret S. Pepe, Tianxi Cai, Zheng Zhang, Gary M. Longton

UW Biostatistics Working Paper Series

No single biomarker for cancer is considered adequately sensitive and specific for cancer screening. It is expected that the results of multiple markers will need to be combined in order to yield adequately accurate classification. Typically the objective function that is optimized for combining markers is the likelihood function. In this paper we consider an alternative objective function -- the area under the empirical receiver operating characteristic curve (AUC). We note that it yields consistent estimates of parameters in a generalized linear model for the risk score but does not require specifying the link function. Like logistic regression it yields …


Spatially Adaptive Bayesian P-Splines With Heteroscedastic Errors, Ciprian M. Crainiceanu, David Ruppert, Raymond J. Carroll Nov 2004

Spatially Adaptive Bayesian P-Splines With Heteroscedastic Errors, Ciprian M. Crainiceanu, David Ruppert, Raymond J. Carroll

Johns Hopkins University, Dept. of Biostatistics Working Papers

An increasingly popular tool for nonparametric smoothing are penalized splines (P-splines) which use low-rank spline bases to make computations tractable while maintaining accuracy as good as smoothing splines. This paper extends penalized spline methodology by both modeling the variance function nonparametrically and using a spatially adaptive smoothing parameter. These extensions have been studied before, but never together and never in the multivariate case. This combination is needed for satisfactory inference and can be implemented effectively by Bayesian \mbox{MCMC}. The variance process controlling the spatially-adaptive shrinkage of the mean and the variance of the heteroscedastic error process are modeled as log-penalized …


Finding Cancer Subtypes In Microarray Data Using Random Projections, Debashis Ghosh Oct 2004

Finding Cancer Subtypes In Microarray Data Using Random Projections, Debashis Ghosh

The University of Michigan Department of Biostatistics Working Paper Series

One of the benefits of profiling of cancer samples using microarrays is the generation of molecular fingerprints that will define subtypes of disease. Such subgroups have typically been found in microarray data using hierarchical clustering. A major problem in interpretation of the output is determining the number of clusters. We approach the problem of determining disease subtypes using mixture models. A novel estimation procedure of the parameters in the mixture model is developed based on a combination of random projections and the expectation-maximization algorithm. Because the approach is probabilistic, our approach provides a measure for the number of true clusters …


Combining Predictors For Classification Using The Area Under The Roc Curve, Margaret S. Pepe, Tianxi Cai, Zheng Zhang Jun 2004

Combining Predictors For Classification Using The Area Under The Roc Curve, Margaret S. Pepe, Tianxi Cai, Zheng Zhang

UW Biostatistics Working Paper Series

We compare simple logistic regression with an alternative robust procedure for constructing linear predictors to be used for the two state classification task. Theoritical advantages of the robust procedure over logistic regression are: (i) although it assumes a generalized linear model for the dichotomous outcome variable, it does not require specification of the link function; (ii) it accommodates case-control designs even when the model is not logistic; and (iii) it yields sensible results even when the generalized linear model assumption fails to hold. Surprisingly, we find that the linear predictor derived from the logistic regression likelihood is very robust in …


Classification Using Generalized Partial Least Squares, Beiying Ding, Robert Gentleman May 2004

Classification Using Generalized Partial Least Squares, Beiying Ding, Robert Gentleman

Bioconductor Project Working Papers

The advances in computational biology have made simultaneous monitoring of thousands of features possible. The high throughput technologies not only bring about a much richer information context in which to study various aspects of gene functions but they also present challenge of analyzing data with large number of covariates and few samples. As an integral part of machine learning, classification of samples into two or more categories is almost always of interest to scientists. In this paper, we address the question of classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, in …


A Nested Unsupervised Approach To Identifying Novel Molecular Subtypes, Elizabeth Garrett, Giovanni Parmigiani Oct 2003

A Nested Unsupervised Approach To Identifying Novel Molecular Subtypes, Elizabeth Garrett, Giovanni Parmigiani

Johns Hopkins University, Dept. of Biostatistics Working Papers

In classification problems arising in genomics research it is common to study populations for which a broad class assignment is known (say, normal versus diseased) and one seeks to find undiscovered subclasses within one or both of the known classes. Formally, this problem can be thought of as an unsupervised analysis nested within a supervised one. Here we take the view that the nested unsupervised analysis can successfully utilize information from the entire data set for constructing and/or selecting useful predictors. Specifically, we propose a mixture model approach to the nested unsupervised problem, where the supervised information is used to …


Tree-Based Multivariate Regression And Density Estimation With Right-Censored Data , Annette M. Molinaro, Sandrine Dudoit, Mark J. Van Der Laan Sep 2003

Tree-Based Multivariate Regression And Density Estimation With Right-Censored Data , Annette M. Molinaro, Sandrine Dudoit, Mark J. Van Der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

We propose a unified strategy for estimator construction, selection, and performance assessment in the presence of censoring. This approach is entirely driven by the choice of a loss function for the full (uncensored) data structure and can be stated in terms of the following three main steps. (1) Define the parameter of interest as the minimizer of the expected loss, or risk, for a full data loss function chosen to represent the desired measure of performance. Map the full data loss function into an observed (censored) data loss function having the same expected value and leading to an efficient estimator …


Semi-Parametric Regression For The Area Under The Receiver Operating Characteristic Curve, Lori E. Dodd, Margaret S. Pepe Jan 2003

Semi-Parametric Regression For The Area Under The Receiver Operating Characteristic Curve, Lori E. Dodd, Margaret S. Pepe

UW Biostatistics Working Paper Series

Medical advances continue to provide new and potentially better means for detecting disease. Such is true in cancer, for example, where biomarkers are sought for early detection and where improvements in imaging methods may pick up the initial functional and molecular changes associated with cancer development. In other binary classification tasks, computational algorithms such as Neural Networks, Support Vector Machines and Evolutionary Algorithms have been applied to areas as diverse as credit scoring, object recognition, and peptide-binding prediction. Before a classifier becomes an accepted technology, it must undergo rigorous evaluation to determine its ability to discriminate between states. Characterization of …


Additive Nonparametric Regression With Autocorrelated Errors, Michael S. Smith, C Wong, Robert Kohn Dec 1997

Additive Nonparametric Regression With Autocorrelated Errors, Michael S. Smith, C Wong, Robert Kohn

Michael Stanley Smith

A Bayesian approach is presented for nonparametric estimation of an additive regression model with autocorrelated errors. Each of the potentially nonlinear components is modelled as a regression spline using many knots, while the errors are modelled by a high order stationary autoregressive process parameterised in terms of its autocorrelations. The distribution of significant knots and partial autocorrelations is accounted for using subset selection. Our approach also allows the selection of a suitable transformation of the dependent variable. All aspects of the model are estimated simultaneously using Markov chain Monte Carlo. It is shown empirically that the proposed approach works well …


A Bayesian Approach To Additive Nonparametric Regression, Michael S. Smith, Robert Kohn Dec 1993

A Bayesian Approach To Additive Nonparametric Regression, Michael S. Smith, Robert Kohn

Michael Stanley Smith

This proceedings paper was the first to suggest using a Gaussian g-prior combined with a point mass to undertake Bayesian variable selection in a Gaussian linear regression model. It also was the first to suggest integrating out the regression parameters and variance in closed form, resulting in an efficient Gibbs sampling scheme. The idea was applied to estimate regression functions in an additive model by using a linear basis expansion for each component function in an additive model. The conference proceeding was eventually published in a slightly tighter form in Journal of Econometrics (1996).


Regression Models For Bivariate Binary Responses, Juni Palmgren Nov 1989

Regression Models For Bivariate Binary Responses, Juni Palmgren

UW Biostatistics Working Paper Series

We discuss maximum likelihood inference for the bivariate logistic model, specified in terms of the marginal logits and the log odds ratio. Using the exponential family nonlinear model formulation the model fitting can be done in GLIM. The procedure is illustrated by modelling survival of unilateral and bilateral total hip arthroplasties as function of patient specific and hip specific covariates. We compare maximum likelihood inference with inference obtained from solving likelihood equations under the assumption of within block independence and using robust standard errors for the estimates. Simulations indicate that the latter procedure is effcient for block specific covariates but …


The Effect Of A Values Clarification Strategy On The Smoking Behavior Of Pregnant Women, Jean Eiber Nov 1979

The Effect Of A Values Clarification Strategy On The Smoking Behavior Of Pregnant Women, Jean Eiber

Loma Linda University Electronic Theses, Dissertations & Projects

The null hypothesis of this study stated that there would be no significant difference (α=.05) in the change in smoking behavior following a teaching intervention to discourage smoking in pregnant women between those who completed a values clarifying strategy prior to the intervention and those who did not. Forty subjects were randomly assigned 20 to an experimental group and 20 to a control group. Both groups completed a pretest questionnaire concerning demographic data and smoking behavior. Information regarding the adverse effects on the fetus and on the developing child of smoking was presented to each subject by the researcher. The …