Full-Text Articles in Statistical Models
Test Statistics Null Distributions In Multiple Testing: Simulation Studies And Applications To Genomics, Katherine S. Pollard, Merrill D. Birkner, Mark J. Van Der Laan, Sandrine Dudoit
U.C. Berkeley Division of Biostatistics Working Paper Series
Multiple hypothesis testing problems arise frequently in biomedical and genomic research, for instance, when identifying differentially expressed or co-expressed genes in microarray experiments. We have developed generally applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for control of a broad class of Type I error rates, defined as tail probabilities and expected values for arbitrary functions of the numbers of false positives and rejected hypotheses (Dudoit and van der Laan, 2005; Dudoit et al., 2004a,b; Pollard and van der Laan, 2004; van der Laan et al., 2005, 2004a,b). As argued in the early article of Pollard and van der …
New Statistical Paradigms Leading To Web-Based Tools For Clinical/Translational Science, Knut M. Wittkowski
COBRA Preprint Series
As the field of functional genetics and genomics is beginning to mature, we become confronted with new challenges. The constant drop in price for sequencing and gene expression profiling as well as the increasing number of genetic and genomic variables that can be measured makes it feasible to address more complex questions. The success with rare diseases caused by single loci or genes has provided us with a proof-of-concept that new therapies can be developed based on functional genomics and genetics.
Common diseases, however, typically involve genetic epistasis, genomic pathways, and proteomic pattern. Moreover, to better understand the underlying biological …
Causal Inference In Longitudinal Studies With History-Restricted Marginal Structural Models, Romain Neugebauer, Mark J. Van Der Laan, Ira B. Tager
U.C. Berkeley Division of Biostatistics Working Paper Series
Causal Inference based on Marginal Structural Models (MSMs) is particularly attractive to subject-matter investigators because MSM parameters provide explicit representations of causal effects. We introduce History-Restricted Marginal Structural Models (HRMSMs) for longitudinal data for the purpose of defining causal parameters which may often be better suited for Public Health research. This new class of MSMs allows investigators to analyze the causal effect of a treatment on an outcome based on a fixed, shorter and user-specified history of exposure compared to MSMs. By default, the latter represents the treatment causal effect of interest based on a treatment history defined by the …
Survival Ensembles, Torsten Hothorn, Peter Buhlmann, Sandrine Dudoit, Annette M. Molinaro, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
We propose a unified and flexible framework for ensemble learning in the presence of censoring. For right-censored data, we introduce a random forest algorithm and a generic gradient boosting algorithm for the construction of prognostic models. The methodology is utilized for predicting the survival time of patients suffering from acute myeloid leukemia based on clinical and genetic covariates. Furthermore, we compare the diagnostic capabilities of the proposed censored data random forest and boosting methods applied to the recurrence free survival time of node positive breast cancer patients with previously published findings.
Combining Predictors For Classification Using The Area Under The Roc Curve, Margaret S. Pepe, Tianxi Cai, Zheng Zhang, Gary M. Longton
UW Biostatistics Working Paper Series
No single biomarker for cancer is considered adequately sensitive and specific for cancer screening. It is expected that the results of multiple markers will need to be combined in order to yield adequately accurate classification. Typically the objective function that is optimized for combining markers is the likelihood function. In this paper we consider an alternative objective function -- the area under the empirical receiver operating characteristic curve (AUC). We note that it yields consistent estimates of parameters in a generalized linear model for the risk score but does not require specifying the link function. Like logistic regression it yields …
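As a rough illustration of the objective described in this abstract, the sketch below computes the empirical AUC of a linear combination of two markers and searches for the combination direction that maximizes it. It is a minimal sketch, not the authors' procedure: the function names (`empirical_auc`, `auc_of_combination`, `best_two_marker_combination`), the angle-grid search, and the grid size are all assumptions made for illustration. Because the AUC depends only on the ranking of scores, only the direction of the coefficient vector matters, which is why a grid over angles is enough for two markers.

```python
import numpy as np


def empirical_auc(scores_cases, scores_controls):
    """Mann-Whitney form of the empirical AUC.

    Fraction of (case, control) pairs in which the case scores higher,
    counting ties as one half.
    """
    cases = np.asarray(scores_cases, dtype=float)[:, None]
    controls = np.asarray(scores_controls, dtype=float)[None, :]
    return float(np.mean((cases > controls) + 0.5 * (cases == controls)))


def auc_of_combination(beta, X_cases, X_controls):
    """Empirical AUC of the linear score X @ beta."""
    beta = np.asarray(beta, dtype=float)
    return empirical_auc(np.asarray(X_cases) @ beta, np.asarray(X_controls) @ beta)


def best_two_marker_combination(X_cases, X_controls, n_grid=360):
    """Hypothetical grid search over directions (cos t, sin t) for two markers.

    Returns the direction with the largest empirical AUC and that AUC.
    """
    best_beta, best_auc = None, -np.inf
    for t in np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False):
        beta = np.array([np.cos(t), np.sin(t)])
        auc = auc_of_combination(beta, X_cases, X_controls)
        if auc > best_auc:
            best_beta, best_auc = beta, auc
    return best_beta, best_auc


if __name__ == "__main__":
    # Simulated data, for illustration only.
    rng = np.random.default_rng(0)
    X_controls = rng.normal(0.0, 1.0, size=(200, 2))
    X_cases = rng.normal([0.8, 0.5], 1.0, size=(200, 2))
    beta, auc = best_two_marker_combination(X_cases, X_controls)
    print("direction:", beta, "empirical AUC:", auc)
```

The empirical AUC is a step function of the coefficients, so gradient-based optimizers do not apply directly; a grid search is one simple workaround when there are only two markers, and it would not scale to many markers.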
Spatially Adaptive Bayesian P-Splines With Heteroscedastic Errors, Ciprian M. Crainiceanu, David Ruppert, Raymond J. Carroll
Johns Hopkins University, Dept. of Biostatistics Working Papers
Penalized splines (P-splines), which use low-rank spline bases to make computations tractable while maintaining accuracy as good as smoothing splines, are an increasingly popular tool for nonparametric smoothing. This paper extends penalized spline methodology by both modeling the variance function nonparametrically and using a spatially adaptive smoothing parameter. These extensions have been studied before, but never together and never in the multivariate case. This combination is needed for satisfactory inference and can be implemented effectively by Bayesian MCMC. The variance process controlling the spatially-adaptive shrinkage of the mean and the variance of the heteroscedastic error process are modeled as log-penalized …
Finding Cancer Subtypes In Microarray Data Using Random Projections, Debashis Ghosh
The University of Michigan Department of Biostatistics Working Paper Series
One of the benefits of profiling of cancer samples using microarrays is the generation of molecular fingerprints that will define subtypes of disease. Such subgroups have typically been found in microarray data using hierarchical clustering. A major problem in interpretation of the output is determining the number of clusters. We approach the problem of determining disease subtypes using mixture models. A novel estimation procedure of the parameters in the mixture model is developed based on a combination of random projections and the expectation-maximization algorithm. Because the approach is probabilistic, our approach provides a measure for the number of true clusters …
Combining Predictors For Classification Using The Area Under The Roc Curve, Margaret S. Pepe, Tianxi Cai, Zheng Zhang
UW Biostatistics Working Paper Series
We compare simple logistic regression with an alternative robust procedure for constructing linear predictors to be used for the two state classification task. Theoretical advantages of the robust procedure over logistic regression are: (i) although it assumes a generalized linear model for the dichotomous outcome variable, it does not require specification of the link function; (ii) it accommodates case-control designs even when the model is not logistic; and (iii) it yields sensible results even when the generalized linear model assumption fails to hold. Surprisingly, we find that the linear predictor derived from the logistic regression likelihood is very robust in …
Classification Using Generalized Partial Least Squares, Beiying Ding, Robert Gentleman
Bioconductor Project Working Papers
Advances in computational biology have made simultaneous monitoring of thousands of features possible. High-throughput technologies not only bring about a much richer information context in which to study various aspects of gene function, but they also present the challenge of analyzing data with a large number of covariates and few samples. As an integral part of machine learning, classification of samples into two or more categories is almost always of interest to scientists. In this paper, we address the question of classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, in …
A Nested Unsupervised Approach To Identifying Novel Molecular Subtypes, Elizabeth Garrett, Giovanni Parmigiani
Johns Hopkins University, Dept. of Biostatistics Working Papers
In classification problems arising in genomics research it is common to study populations for which a broad class assignment is known (say, normal versus diseased) and one seeks to find undiscovered subclasses within one or both of the known classes. Formally, this problem can be thought of as an unsupervised analysis nested within a supervised one. Here we take the view that the nested unsupervised analysis can successfully utilize information from the entire data set for constructing and/or selecting useful predictors. Specifically, we propose a mixture model approach to the nested unsupervised problem, where the supervised information is used to …
Tree-Based Multivariate Regression And Density Estimation With Right-Censored Data, Annette M. Molinaro, Sandrine Dudoit, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
We propose a unified strategy for estimator construction, selection, and performance assessment in the presence of censoring. This approach is entirely driven by the choice of a loss function for the full (uncensored) data structure and can be stated in terms of the following three main steps. (1) Define the parameter of interest as the minimizer of the expected loss, or risk, for a full data loss function chosen to represent the desired measure of performance. Map the full data loss function into an observed (censored) data loss function having the same expected value and leading to an efficient estimator …
Semi-Parametric Regression For The Area Under The Receiver Operating Characteristic Curve, Lori E. Dodd, Margaret S. Pepe
UW Biostatistics Working Paper Series
Medical advances continue to provide new and potentially better means for detecting disease. Such is true in cancer, for example, where biomarkers are sought for early detection and where improvements in imaging methods may pick up the initial functional and molecular changes associated with cancer development. In other binary classification tasks, computational algorithms such as Neural Networks, Support Vector Machines and Evolutionary Algorithms have been applied to areas as diverse as credit scoring, object recognition, and peptide-binding prediction. Before a classifier becomes an accepted technology, it must undergo rigorous evaluation to determine its ability to discriminate between states. Characterization of …
Additive Nonparametric Regression With Autocorrelated Errors, Michael S. Smith, C Wong, Robert Kohn
Michael Stanley Smith
A Bayesian approach is presented for nonparametric estimation of an additive regression model with autocorrelated errors. Each of the potentially nonlinear components is modelled as a regression spline using many knots, while the errors are modelled by a high order stationary autoregressive process parameterised in terms of its autocorrelations. The distribution of significant knots and partial autocorrelations is accounted for using subset selection. Our approach also allows the selection of a suitable transformation of the dependent variable. All aspects of the model are estimated simultaneously using Markov chain Monte Carlo. It is shown empirically that the proposed approach works well …
A Bayesian Approach To Additive Nonparametric Regression, Michael S. Smith, Robert Kohn
Michael Stanley Smith
This proceedings paper was the first to suggest using a Gaussian g-prior combined with a point mass to undertake Bayesian variable selection in a Gaussian linear regression model. It was also the first to suggest integrating out the regression parameters and variance in closed form, resulting in an efficient Gibbs sampling scheme. The idea was applied to estimate regression functions in an additive model by using a linear basis expansion for each component function. The conference proceeding was eventually published in a slightly tighter form in the Journal of Econometrics (1996).
Regression Models For Bivariate Binary Responses, Juni Palmgren
UW Biostatistics Working Paper Series
We discuss maximum likelihood inference for the bivariate logistic model, specified in terms of the marginal logits and the log odds ratio. Using the exponential family nonlinear model formulation, the model fitting can be done in GLIM. The procedure is illustrated by modelling survival of unilateral and bilateral total hip arthroplasties as a function of patient-specific and hip-specific covariates. We compare maximum likelihood inference with inference obtained from solving likelihood equations under the assumption of within-block independence and using robust standard errors for the estimates. Simulations indicate that the latter procedure is efficient for block-specific covariates but …
The Effect Of A Values Clarification Strategy On The Smoking Behavior Of Pregnant Women, Jean Eiber
Loma Linda University Electronic Theses, Dissertations & Projects
The null hypothesis of this study stated that there would be no significant difference (α=.05) in the change in smoking behavior following a teaching intervention to discourage smoking in pregnant women between those who completed a values clarifying strategy prior to the intervention and those who did not. Forty subjects were randomly assigned 20 to an experimental group and 20 to a control group. Both groups completed a pretest questionnaire concerning demographic data and smoking behavior. Information regarding the adverse effects on the fetus and on the developing child of smoking was presented to each subject by the researcher. The …