Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Full-Text Articles in Statistics and Probability

Integrative Analysis Of Cancer Genomic Data, Shuangge Ma Sep 2009

Shuangge Ma

In the past decade, we have witnessed a period of unparalleled development in the field of cancer genomics. To address the same or similar biomedical questions, multiple cancer genomic studies have been independently designed and conducted. Cancer gene signatures identified from analysis of individual datasets often have low reproducibility. A cost-effective way of improving reproducibility is to conduct integrative analysis of datasets from multiple studies with comparable designs. To properly integrate multiple studies and conduct integrative analysis, we need to access various public data warehouses, retrieve experiment protocols and raw data, evaluate individual studies and select those with comparable designs, …


Identification Of Cancer-Associated Gene Pathways From Analysis Of Expression Data, Shuangge Ma Aug 2009

Shuangge Ma

No abstract provided.


Lecture 5, Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


Final Project, Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


Lecture 4, Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


Lecture 4, Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


Computer Intensive Methods Lecture 13, Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


Final Project (Description), Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


Final Project (Data), Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


Lecture 3, Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


Lecture 2, Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


Reference: Multiple Imputation, Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


Reference: Weighted Bootstrap, Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


Computer Intensive Methods Lecture 9, Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


Computer Intensive Methods Lecture 8, Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


Reference: Counter Examples [Bootstrap], Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


Computer Intensive Methods Lecture 7 (Lab 2), Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


Computer Intensive Methods Lecture 6, Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


Reference: Block Jackknife, Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


Computer Intensive Methods Lecture 5, Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


Reading: Simulate Multivariate Distribution, Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


Computer Intensive Methods Lecture 4, Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


Computer Intensive Methods Lecture 3 (Lab 1), Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


Computer Intensive Methods Lecture 2, Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


A Tale Of Two Streets: Incorporating Grouping Structure In High Dimensional Data Mining, Shuangge Ma Jun 2009

Shuangge Ma

No abstract provided.


Resampling-Based Multiple Hypothesis Testing With Applications To Genomics: New Developments In The R/Bioconductor Package Multtest, Houston N. Gilbert, Katherine S. Pollard, Mark J. Van Der Laan, Sandrine Dudoit Apr 2009

U.C. Berkeley Division of Biostatistics Working Paper Series

The multtest package is a standard Bioconductor package containing a suite of functions useful for executing, summarizing, and displaying the results from a wide variety of multiple testing procedures (MTPs). In addition to many popular MTPs, the central methodological focus of the multtest package is the implementation of powerful joint multiple testing procedures. Joint MTPs are able to account for the dependencies between test statistics by effectively making use of (estimates of) the test statistics' joint null distribution. To this end, two additional bootstrap-based estimates of the test statistics' joint null distribution have been developed for use in the …


Joint Multiple Testing Procedures For Graphical Model Selection With Applications To Biological Networks, Houston N. Gilbert, Mark J. Van Der Laan, Sandrine Dudoit Apr 2009

U.C. Berkeley Division of Biostatistics Working Paper Series

Gaussian graphical models have become popular tools for identifying relationships between genes when analyzing microarray expression data. In the classical undirected Gaussian graphical model setting, conditional independence relationships can be inferred from partial correlations obtained from the concentration matrix (= inverse covariance matrix) when the sample size n exceeds the number of parameters p which need to be estimated. In situations where n < p, another approach to graphical model estimation may rely on calculating unconditional (zero-order) and first-order partial correlations. In these settings, the goal is to identify a lower-order conditional independence graph, sometimes referred to as a ‘0-1 graph’. For either choice of graph, model selection may involve a multiple testing problem, in which edges in a graph are drawn only after rejecting hypotheses involving (saturated or lower-order) partial correlation parameters. Most multiple testing procedures applied in previously proposed graphical model selection algorithms rely on standard, marginal testing methods which do not take into account the joint distribution of the test statistics derived from (partial) correlations. We propose and implement a multiple testing framework useful when testing for edge inclusion during graphical model selection. Two features of our methodology include (i) a computationally efficient and asymptotically valid test statistics' joint null distribution derived from influence curves for correlation-based parameters, and (ii) the application of empirical Bayes joint multiple testing procedures which can effectively control a variety of popular Type I error rates by incorporating joint null distributions such as those described here (Dudoit and van der Laan, 2008).
Using a dataset from Arabidopsis thaliana, we observe that the use of more sophisticated, modular approaches to multiple testing allows one to identify greater numbers of edges when approximating an undirected graphical model using a 0-1 graph. Our framework may also be extended to edge testing algorithms for other types of graphical models (e.g., for classical undirected, bidirected, and directed acyclic graphs).


Sparse Linear Discriminant Analysis For Simultaneous Testing For The Significance Of A Gene Set/Pathway And Gene Selection, Michael C. Wu, Lingson Zhang, Zhaoxi Wang, David C. Christiani, Xihong Lin Jan 2009

Harvard University Biostatistics Working Paper Series

No abstract provided.


Identification Of Yeast Transcriptional Regulation Networks Using Multivariate Random Forests, Yuanyuan Xiao, Mark Segal Dec 2008

Mark R Segal

The recent availability of whole-genome scale data sets that investigate complementary and diverse aspects of transcriptional regulation has spawned an increased need for new and effective computational approaches to analyze and integrate these large scale assays. Here, we propose a novel algorithm, based on random forest methodology, to relate gene expression (as derived from expression microarrays) to sequence features residing in gene promoters (as derived from DNA motif data) and transcription factor binding to gene promoters (as derived from tiling microarrays). We extend the random forest approach to model a multivariate response as represented, for example, by time-course gene expression …


Computer Intensive Methods Lecture 1, Shuangge Ma Dec 2008

Shuangge Ma

No abstract provided.