Full-Text Articles in Genetics and Genomics
Permutation-Based Pathway Testing Using The Super Learner Algorithm, Paul Chaffee, Alan E. Hubbard, Mark L. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Many diseases and other important phenotypic outcomes are the result of a combination of factors. For example, expression levels of genes have been used as input to various statistical methods for predicting phenotypic outcomes. One particularly popular variety is the so-called gene set enrichment analysis (GSEA). This paper discusses an augmentation to an existing strategy to estimate the significance of an association between a disease outcome and a predetermined combination of biological factors, based on a specific data adaptive regression method (the "Super Learner," van der Laan et al., 2007). The procedure uses an aggressive search procedure, potentially resulting in …
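As a rough illustration of permutation-based significance testing in general, the sketch below shuffles the outcome labels to build a null distribution for an association statistic. It is not the authors' procedure: the statistic used here (an absolute difference in group means) is a simple stand-in for a Super Learner fit, and the names `permutation_pvalue`, `abs_mean_difference`, `outcome`, and `pathway_score` are hypothetical.

```python
import random


def abs_mean_difference(outcome, score):
    """Stand-in association statistic (an assumption, not the Super Learner):
    absolute difference in mean score between cases (1) and controls (0)."""
    cases = [s for o, s in zip(outcome, score) if o == 1]
    controls = [s for o, s in zip(outcome, score) if o == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_pvalue(outcome, score, statistic, n_perm=2000, seed=0):
    """Permutation p-value: share of label shufflings whose statistic is at
    least as large as the observed one (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = statistic(outcome, score)
    shuffled = list(outcome)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any outcome/score association
        if statistic(shuffled, score) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)


# Toy data: 10 controls with low scores, 10 cases with high scores.
outcome = [0] * 10 + [1] * 10
pathway_score = list(range(10)) + list(range(100, 110))
p_value = permutation_pvalue(outcome, pathway_score, abs_mean_difference)
print(p_value)
```

Any statistic that takes the (possibly shuffled) outcome and the predictors could be passed in place of `abs_mean_difference`; when the statistic itself involves a data-adaptive search, that search would need to be rerun inside every permutation for the null distribution to be valid.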
Accurate Genome-Scale Percentage DNA Methylation Estimates From Microarray Data, Martin J. Aryee, Zhijin Wu, Christine Ladd-Acosta, Brian Herb, Andrew P. Feinberg, Srinivasan Yegnasurbramanian, Rafael A. Irizarry
Johns Hopkins University, Dept. of Biostatistics Working Papers
DNA methylation is a key regulator of gene function in a multitude of both normal and abnormal biological processes, but tools to elucidate its roles on a genome-wide scale are still in their infancy. Methylation sensitive restriction enzymes and microarrays provide a potential high-throughput, low-cost platform to allow methylation profiling. However, accurate absolute methylation estimates have been elusive due to systematic errors and unwanted variability. Previous microarray pre-processing procedures, mostly developed for expression arrays, fail to adequately normalize methylation-related data since they rely on key assumptions that are violated in the case of DNA methylation. We develop a normalization strategy …
Modeling Dependent Gene Expression, Donatello Telesca, Peter Muller, Giovanni Parmigiani, Ralph S. Freedman
Harvard University Biostatistics Working Paper Series
No abstract provided.
Wavelet Based Functional Models For Transcriptome Analysis With Tiling Arrays, Lieven Clement, Kristof Debeuf, Ciprian Crainiceanu, Olivier Thas, Marnik Vuylsteke, Rafael Irizarry
Johns Hopkins University, Dept. of Biostatistics Working Papers
For a better understanding of the biology of an organism, a complete description is needed of all regions of the genome that are actively transcribed. Tiling arrays can be used for this purpose. Such arrays allow the discovery of novel transcripts and the assessment of differential expression between two or more experimental conditions such as genotype, treatment, tissue, etc. Many of the initial methodological efforts were designed for transcript discovery, while more recent developments also focus on differential expression. To our knowledge no methods for tiling arrays are described in the literature that can both assess transcript discovery and identify …
Bayesian Methods For Network-Structured Genomics Data, Stefano Monni, Hongzhe Li
UPenn Biostatistics Working Papers
Graphs and networks are common ways of depicting information. In biology, many different processes are represented by graphs, such as regulatory networks, metabolic pathways and protein-protein interaction networks. This information provides a useful supplement to the standard numerical genomic data such as microarray gene expression data. Effectively utilizing such information can lead to a better identification of biologically relevant genomic features in the context of our prior biological knowledge. In this paper, we present a Bayesian variable selection procedure for network-structured covariates for both Gaussian linear and probit models. The key to our approach is the introduction of a Markov …
Targeted Genomic Signature Profiling With Quasi-Alignment Statistics, Rao Mallik Kotamarti, Douglas W. Raiford, Michael Hahsler, Yuhang Wang, Monnie Mcgee, Maggie Dunham
COBRA Preprint Series
Genome databases continue to expand with no change in the basic format of sequence data. The prevalent use of classic alignment-based search tools like BLAST has significantly pushed the limits of genome isolate research. The relatively new frontier of metagenomic research deals with thousands of diverse genomes with newer demands beyond the current homologue search and analysis. Compressing sequence data into a complex form could facilitate a broader range of sequence analyses. To this end, this research explores reorganizing sequence data as complex Markov signatures also known as Extensible Markov Models. Markov models have found successful application in …
Integrative Clustering Of Multiple Genomic Data Types Using A Joint Latent Variable Model With Application To Breast And Lung Cancer Subtype Analysis, Ronglai Shen, Adam Olshen, Marc Ladanyi
Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series
The molecular complexity of a tumor manifests itself at the genomic, epigenomic, transcriptomic, and proteomic levels. Genomic profiling at these multiple levels should allow an integrated characterization of tumor etiology. However, there is a shortage of effective statistical and bioinformatic tools for truly integrative data analysis. The standard approach to integrative clustering is separate clustering followed by manual integration. A more statistically powerful approach would incorporate all data types simultaneously and generate a single integrated cluster assignment. We developed a joint latent variable model for integrative clustering. We call the resulting methodology iCluster. iCluster incorporates flexible modeling of the associations …
Model-Based Quality Assessment And Base-Calling For Second-Generation Sequencing Data, Rafael A. Irizarry, Hector Corrada Bravo
Johns Hopkins University, Dept. of Biostatistics Working Papers
Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, and is capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1,000 Genomes Project, plans to fully sequence the genomes of approximately 1,200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads—strings …
A Classification Model For Distinguishing Copy Number Variants From Cancer-Related Alterations, Irina Ostrovnaya, Gouri Nanjangud, Adam Olshen
Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series
Both somatic copy number alterations (CNAs) and germline copy number variants (CNVs) that are prevalent in healthy individuals can appear as recurrent changes in comparative genomic hybridization (CGH) analyses of tumors. In order to identify important cancer genes CNAs and CNVs must be distinguished. Although the Database of Genomic Variants (Iafrate et al., 2004) contains a list of all known CNVs, there is no standard methodology to use the database effectively.
We develop a prediction model that distinguishes CNVs from CNAs based on the information contained in the Database and several other variables, including potential CNV’s length, height, closeness to …
Trio Logic Regression - Detection Of SNP - SNP Interactions In Case-Parent Trios, Qing Li, Thomas A. Louis, M. Daniele Fallin, Ingo Ruczinski
Johns Hopkins University, Dept. of Biostatistics Working Papers
Statistical approaches to evaluate higher order SNP-SNP and SNP-environment interactions are critical in genetic association studies, as susceptibility to complex disease is likely to be related to the interaction of multiple SNPs and environmental factors. Logic regression (Kooperberg et al., 2001; Ruczinski et al., 2003) is one such approach, where interactions between SNPs and environmental variables are assessed in a regression framework, and interactions become part of the model search space. In this manuscript we extend the logic regression methodology, originally developed for cohort and case-control studies, for studies of trios with affected probands. Trio logic regression accounts for the …
Subset Quantile Normalization Using Negative Control Features, Zhijin Wu
Johns Hopkins University, Dept. of Biostatistics Working Papers
No abstract provided.
Frozen Robust Multi-Array Analysis (fRMA), Matthew N. McCall, Benjamin M. Bolstad, Rafael A. Irizarry
Johns Hopkins University, Dept. of Biostatistics Working Papers
Robust Multi-array Analysis (RMA) is the most widely used preprocessing algorithm for Affymetrix and Nimblegen gene-expression microarrays. RMA performs background correction, normalization, and summarization in a modular way. The last two steps require multiple arrays to be analyzed simultaneously. The ability to borrow information across samples provides RMA various advantages. For example, the summarization step fits a parametric model that accounts for probe-effects, assumed to be fixed across arrays, and improves outlier detection. Residuals, obtained from the fitted model, permit the creation of useful quality metrics. However, the dependence on multiple arrays has two drawbacks: (1) RMA cannot be …
Resampling-Based Multiple Hypothesis Testing With Applications To Genomics: New Developments In The R/Bioconductor Package Multtest, Houston N. Gilbert, Katherine S. Pollard, Mark J. Van Der Laan, Sandrine Dudoit
U.C. Berkeley Division of Biostatistics Working Paper Series
The multtest package is a standard Bioconductor package containing a suite of functions useful for executing, summarizing, and displaying the results from a wide variety of multiple testing procedures (MTPs). In addition to many popular MTPs, the central methodological focus of the multtest package is the implementation of powerful joint multiple testing procedures. Joint MTPs are able to account for the dependencies between test statistics by effectively making use of (estimates of) the test statistics joint null distribution. To this end, two additional bootstrap-based estimates of the test statistics joint null distribution have been developed for use in the …
Evaluation Of Statistical Methods For Normalization And Differential Expression In mRNA-Seq Experiments, James H. Bullard, Elizabeth A. Purdom, Kasper D. Hansen, Sandrine Dudoit
U.C. Berkeley Division of Biostatistics Working Paper Series
The focus of this article is on the design and analysis of mRNA-Seq experiments, with the aim of inferring transcript levels and identifying differentially expressed genes. We investigate two mRNA-Seq datasets obtained using Illumina's Genome Analyzer platform to measure transcript levels in reference samples considered in the MicroArray Quality Control (MAQC) Project. We address the following four main issues: (1) exploratory data analysis for mapped reads, relating read counts to variables describing input samples and genomic regions of interest; (2) assessment and quantitation of biological effects (e.g., expression levels in Brain vs. UHR) and nuisance experimental effects (e.g., library preparation, …
Gene Set Enrichment Analysis Made Simple, Rafael A. Irizarry, Chi Wang, Yun Zhou, Terence P. Speed
Johns Hopkins University, Dept. of Biostatistics Working Papers
Among the many applications of microarray technology, one of the most popular is the identification of genes that are differentially expressed in two conditions. A common statistical approach is to quantify the interest of each gene with a p-value, adjust these p-values for multiple comparisons, choose an appropriate cut-off, and create a list of candidate genes. This approach has been criticized for ignoring biological knowledge regarding how genes work together. Recently a series of methods that do incorporate biological knowledge have been proposed. However, many of these methods seem overly complicated. Furthermore, the most popular method, Gene Set Enrichment Analysis …
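The gene-by-gene workflow described above (one p-value per gene, an adjustment for multiple comparisons, a cut-off, a candidate list) can be sketched as follows. The abstract does not say which adjustment is meant; the Benjamini-Hochberg false discovery rate adjustment is assumed here purely for illustration, and the gene names, p-values, and cut-off are made up.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values from a two-condition comparison.
gene_pvalues = {"geneA": 0.01, "geneB": 0.04, "geneC": 0.03, "geneD": 0.005}
names = list(gene_pvalues)
adjusted = benjamini_hochberg([gene_pvalues[g] for g in names])

cutoff = 0.03  # assumed cut-off, chosen only for this example
candidates = [g for g, q in zip(names, adjusted) if q < cutoff]
print(candidates)
```

The resulting list treats each gene in isolation, which is exactly the limitation the abstract goes on to raise: nothing in this pipeline uses knowledge of which genes act together.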
Generalized Liquid Association, Yen-Yi Ho, Leslie Cope, Thomas A. Louis, Giovanni Parmigiani
Johns Hopkins University, Dept. of Biostatistics Working Papers
The analysis of interactions among a group of genes is fundamental to further our understanding of their biological interactions in a cell. Several studies suggested that the co-expression relationship of two genes can be modulated by a third controller gene. These controller genes and the corresponding modulated co-expressed gene pairs are the subjects of interest in this study. This "controller-modulated genes" three-way interaction is referred to as liquid association in the literature. Analysis of gene expression data has suggested that these interactions are present in many biological systems.
To quantify the magnitude of liquid association for a given gene …
Joint Multiple Testing Procedures For Graphical Model Selection With Applications To Biological Networks, Houston N. Gilbert, Mark J. Van Der Laan, Sandrine Dudoit
U.C. Berkeley Division of Biostatistics Working Paper Series
Gaussian graphical models have become popular tools for identifying relationships between genes when analyzing microarray expression data. In the classical undirected Gaussian graphical model setting, conditional independence relationships can be inferred from partial correlations obtained from the concentration matrix (= inverse covariance matrix) when the sample size n exceeds the number of parameters p which need to be estimated. In situations where n < p, another approach to graphical model estimation may rely on calculating unconditional (zero-order) and first-order partial correlations. In these settings, the goal is to identify a lower-order conditional independence graph, sometimes referred to as a ‘0-1 graph’. For either choice of graph, model selection may involve a multiple testing problem, in which edges in a graph are drawn only after rejecting hypotheses involving (saturated or lower-order) partial correlation parameters. Most multiple testing procedures applied in previously proposed graphical model selection algorithms rely on standard, marginal testing methods which do not take into account the joint distribution of the test statistics derived from (partial) correlations. We propose and implement a multiple testing framework useful when testing for edge inclusion during graphical model selection. Two features of our methodology include (i) a computationally efficient and asymptotically valid test statistics joint null distribution derived from influence curves for correlation-based parameters, and (ii) the application of empirical Bayes joint multiple testing procedures which can effectively control a variety of popular Type I error rates by incorporating joint null distributions such as those described here (Dudoit and van der Laan, 2008).
Using a dataset from Arabidopsis thaliana, we observe that the use of more sophisticated, modular approaches to multiple testing allows one to identify greater numbers of edges when approximating an undirected graphical model using a 0-1 graph. Our framework may also be extended to edge testing algorithms for other types of graphical models (e.g., for classical undirected, bidirected, and directed acyclic graphs).
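The zero-order and first-order partial correlations mentioned in this abstract can be computed with the standard textbook formulas, sketched below. This covers only the correlation quantities; the joint multiple testing step that the paper actually proposes is not shown. The function names and the toy expression vectors are hypothetical.

```python
import math


def pearson(x, y):
    """Zero-order (unconditional) Pearson correlation between two vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_order_partial(r_xy, r_xz, r_yz):
    """Correlation of x and y after conditioning on a single variable z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy expression profiles for three genes across eight samples (made up).
gene_x = [1.0, 2.1, 2.9, 4.2, 5.1, 5.8, 7.2, 8.0]
gene_y = [0.8, 2.3, 3.1, 3.9, 4.8, 6.3, 6.9, 8.1]
gene_z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

r_xy = pearson(gene_x, gene_y)
r_xy_given_z = first_order_partial(
    r_xy, pearson(gene_x, gene_z), pearson(gene_y, gene_z)
)
print(r_xy, r_xy_given_z)
```

In a 0-1 graph, an edge between two genes would be considered only if the zero-order correlation and every first-order partial correlation for that pair are judged non-zero, which is where the multiple testing problem arises.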
A Novel Topology For Representing Protein Folds, Mark R. Segal
COBRA Preprint Series
Various topologies for representing three dimensional protein structures have been advanced for purposes ranging from prediction of folding rates to ab initio structure prediction. Examples include relative contact order, Delaunay tessellations, and backbone torsion angle distributions. Here we introduce a new topology based on a novel means for operationalizing three dimensional proximities with respect to the underlying chain. The measure involves first interpreting a rank-based representation of the nearest neighbors of each residue as a permutation, then determining how perturbed this permutation is relative to an unfolded chain. We show that the resultant topology provides improved association with folding and …
Fitting ACE Structural Equation Models To Case-Control Family Data, Kristin N. Javaras, James I. Hudson, Nan M. Laird
COBRA Preprint Series
Investigators interested in whether a disease aggregates in families often collect case-control family data, which consist of disease status and covariate information for families selected via case or control probands. Here, we focus on the use of case-control family data to investigate the relative contributions to the disease of additive genetic effects (A), shared family environment (C), and unique environment (E). To this end, we describe an ACE model for binary family data and then introduce an approach to fitting the model to case-control family data. The structural equation model, which has been described previously, combines a general-family extension of …
Association Tests That Accommodate Genotyping Errors, Ingo Ruczinski, Qing Li, Benilton Carvalho, M. Daniele Fallin, Rafael A. Irizarry, Thomas A. Louis
Johns Hopkins University, Dept. of Biostatistics Working Papers
High-throughput SNP arrays provide estimates of genotypes for up to one million loci, often used in genome-wide association studies. While these estimates are typically very accurate, genotyping errors do occur, which can influence in particular the most extreme test statistics and p-values. Estimates for the genotype uncertainties are also available, although typically ignored. In this manuscript, we develop a framework to incorporate these genotype uncertainties in case-control studies for any genetic model. We verify that using the assumption of a “local alternative” in the score test is very reasonable for effect sizes typically seen in SNP association studies, and show …
Sparse Linear Discriminant Analysis For Simultaneous Testing For The Significance Of A Gene Set/Pathway And Gene Selection, Michael C. Wu, Lingson Zhang, Zhaoxi Wang, David C. Christiani, Xihong Lin
Harvard University Biostatistics Working Paper Series
No abstract provided.
A Hidden Markov Random Field Model For Genome-Wide Association Studies, Hongzhe Li, Zhi Wei, J M. Maris
UPenn Biostatistics Working Papers
Genome-wide association studies (GWAS) are increasingly utilized for identifying novel susceptible genetic variants for complex traits, but there is little consensus on analysis methods for such data. The most commonly used methods include single SNP analysis or haplotype analysis with Bonferroni correction for multiple comparisons. Since the SNPs in typical GWAS are often in linkage disequilibrium (LD), at least locally, Bonferroni correction of multiple comparisons often leads to conservative error control and therefore lower statistical power. In this paper, we propose a hidden Markov random field model (HMRF) for GWAS analysis based on a weighted LD graph built from the prior …
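For reference, the Bonferroni correction that this abstract describes as conservative amounts to multiplying each single-SNP p-value by the number of tests. The sketch below shows only that baseline, not the HMRF model the paper proposes; the p-values and the 0.05 threshold are made up for illustration.

```python
def bonferroni_adjust(pvalues):
    """Bonferroni-adjusted p-values: each p-value times the number of tests,
    capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical single-SNP p-values.
snp_pvalues = [1e-8, 2e-6, 0.03, 0.4]
adjusted_snp = bonferroni_adjust(snp_pvalues)
significant = [i for i, p in enumerate(adjusted_snp) if p < 0.05]
print(adjusted_snp, significant)
```

Because the multiplier is the raw number of tests, SNPs in strong LD are each charged as if they were independent tests, which is the source of the power loss noted in the abstract.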
Detection Of Recurrent Copy Number Alterations In The Genome: A Probabilistic Approach, Oscar M. Rueda, Ramon Diaz-Uriarte
COBRA Preprint Series
Copy number variation (CNV) in genomic DNA is linked to a variety of human diseases (including cancer, HIV acquisition, autoimmune and neurodegenerative diseases), and array-based CGH (aCGH) is currently the main technology to locate CNVs. Several methods can analyze aCGH data at the single sample level, but disease-critical genes are more likely to be found in regions that are common or recurrent among samples. Unfortunately, defining recurrent CNV regions remains a challenge. Moreover, the heterogeneous nature of many diseases requires that we search for CNVs that affect only some subsets of the samples (without prior knowledge of which regions and …
Finding Recurrent Regions Of Copy Number Variation: A Review, Oscar M. Rueda, Ramon Diaz-Uriarte
COBRA Preprint Series
Copy number variation (CNV) in genomic DNA is linked to a variety of human diseases, and array-based CGH (aCGH) is currently the main technology to locate CNVs. Although many methods have been developed to analyze aCGH from a single array/subject, disease-critical genes are more likely to be found in regions that are common or recurrent among subjects. Unfortunately, finding recurrent CNV regions remains a challenge. We review existing methods for the identification of recurrent CNV regions. The working definition of "common" or "recurrent" region differs between methods, leading to approaches that use different types of input (discretized output from a …
The Strength Of Statistical Evidence For Composite Hypotheses With An Application To Multiple Comparisons, David R. Bickel
COBRA Preprint Series
The strength of the statistical evidence in a sample of data that favors one composite hypothesis over another may be quantified by the likelihood ratio using the parameter value consistent with each hypothesis that maximizes the likelihood function. Unlike the p-value and the Bayes factor, this measure of evidence is coherent in the sense that it cannot support a hypothesis over any hypothesis that it entails. Further, when comparing the hypothesis that the parameter lies outside a non-trivial interval to the hypothesis that it lies within the interval, the proposed measure of evidence almost always asymptotically favors the correct hypothesis …
A Network-Constrained Empirical Bayes Method For Analysis Of Genomic Data, Caiyan Li, Zhi Wei, Hongzhe Li
UPenn Biostatistics Working Papers
Empirical Bayes methods are widely used in the analysis of microarray gene expression data in order to identify the differentially expressed genes or genes that are associated with other general phenotypes. Available methods often assume that genes are independent. However, genes are expected to function interactively and to form molecular modules to affect the phenotypes. In order to account for regulatory dependency among genes, we propose in this paper a network-constrained empirical Bayes method for analyzing genomic data in the framework of general linear models, where the dependency of genes is modeled by a discrete Markov random field model defined …
Estimation And Testing For The Effect Of A Genetic Pathway On A Disease Outcome Using Logistic Kernel Machine Regression Via Logistic Mixed Models, Dawei Liu, Debashis Ghosh, Xihong Lin
Harvard University Biostatistics Working Paper Series
No abstract provided.
A Powerful And Flexible Multilocus Association Test For Quantitative Traits, Lydia Coulter Kwee, Dawei Liu, Xihong Lin, Debashis Ghosh, Michael P. Epstein
Harvard University Biostatistics Working Paper Series
No abstract provided.
Model-Based Clustering Of Methylation Array Data: A Recursive-Partitioning Algorithm For High-Dimensional Data Arising As A Mixture Of Beta Distributions, E. Andres Houseman, Brock C. Christensen, Ru-Fang Yeh, Carmen J. Marsit, Margaret R. Karagas, Margaret Wrensch, Heather H. Nelson, Joseph Wiemels, Shichun Zheng, John K. Wiencke, Karl T. Kelsey
Harvard University Biostatistics Working Paper Series
No abstract provided.
U-Statistics-Based Tests For Multiple Genes In Genetic Association Studies, Zhi Wei, Mingyao Li Phd, Timothy Rebbeck, Hongzhe Li
UPenn Biostatistics Working Papers
As our understanding of biological pathways and the genes that regulate these pathways increases, consideration of these biological pathways has become an increasingly important part of genetic and molecular epidemiology. Pathway-based genetic association studies often involve genotyping of variants in genes acting in certain biological pathways. Such pathway-based genetic association studies can potentially capture the highly heterogeneous nature of many complex traits, with multiple causative loci and multiple alleles at some of the causative loci. In this paper, we develop two nonparametric test statistics that consider simultaneously the effects of multiple markers. Our approach, which is based on data-adaptive …