Full-Text Articles in Entire DC Network
Use Of Hidden Markov Models For QTL Mapping, Karl W. Broman
Johns Hopkins University, Dept. of Biostatistics Working Papers
An important aspect of the QTL mapping problem is the treatment of missing genotype data. If complete genotype data were available, QTL mapping would reduce to the problem of model selection in linear regression. However, in the consideration of loci in the intervals between the available genetic markers, genotype data is inherently missing. Even at the typed genetic markers, genotype data is seldom complete, as a result of failures in the genotyping assays or for the sake of economy (for example, in the case of selective genotyping, where only individuals with extreme phenotypes are genotyped). We discuss the use of …
Semiparametric Regression Of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines And Linear Mixed Models, Dawei Liu, Xihong Lin, Debashis Ghosh
Harvard University Biostatistics Working Paper Series
No abstract provided.
Penalized Likelihood And Bayesian Methods For Sparse Contingency Tables: An Analysis Of Alternative Splicing In Full-Length cDNA Libraries, Corinne Dahinden, Giovanni Parmigiani, Mark C. Emerick, Peter Buhlmann
Johns Hopkins University, Dept. of Biostatistics Working Papers
We develop methods to perform model selection and parameter estimation in log-linear models for the analysis of sparse contingency tables to study the interaction of two or more factors. Typically, datasets arising from so-called full-length cDNA libraries, in the context of alternatively spliced genes, lead to such sparse contingency tables. Maximum likelihood estimation of log-linear model coefficients fails to work because of zero cell entries. Therefore, new methods are required to estimate the coefficients and to perform model selection. Our suggestions include computationally efficient penalization (Lasso-type) approaches as well as Bayesian methods using MCMC. We compare these procedures in a …
Multiple Testing With An Empirical Alternative Hypothesis, James E. Signorovitch
Harvard University Biostatistics Working Paper Series
An optimal multiple testing procedure is identified for linear hypotheses under the general linear model, maximizing the expected number of false null hypotheses rejected at any significance level. The optimal procedure depends on the unknown data-generating distribution, but can be consistently estimated. Drawing information together across many hypotheses, the estimated optimal procedure provides an empirical alternative hypothesis by adapting to underlying patterns of departure from the null. Proposed multiple testing procedures based on the empirical alternative are evaluated through simulations and an application to gene expression microarray data. Compared to a standard multiple testing procedure, it is not unusual for …
Estimating Genome-Wide Copy Number Using Allele Specific Mixture Models, Wenyi Wang, Benilton Carvalho, Nate Miller, Jonathan Pevsner, Aravinda Chakravarti, Rafael A. Irizarry
Johns Hopkins University, Dept. of Biostatistics Working Papers
Genomic changes such as copy number alterations are thought to be one of the major underlying causes of human phenotypic variation among normal and disease subjects [23,11,25,26,5,4,7,18]. These include chromosomal regions with so-called copy number alterations: instead of the expected two copies, a section of the chromosome for a particular individual may have zero copies (homozygous deletion), one copy (hemizygous deletion), or more than two copies (amplification). The canonical example is Down syndrome, which is caused by an extra copy of chromosome 21. Identification of such abnormalities in smaller regions has been of great interest, because it is believed to …
Exploration Of Distributional Models For A Novel Intensity-Dependent Normalization, Nicola Lama, Patrizia Boracchi, Elia Mario Biganzoli
COBRA Preprint Series
Currently used gene intensity-dependent normalization methods, based on regression smoothing techniques, usually approach the two problems of location bias detrending and data re-scaling without taking into account the censoring characteristic of certain gene expressions produced by experiment measurement constraints or by previous normalization steps. Moreover, the bias vs variance balance control of normalization procedures is not often discussed but left to the user's experience. Here an approximate maximum likelihood procedure to fit a model smoothing the dependences of log-fold gene expression differences on average gene intensities is presented. Central tendency and scaling factor were modeled by means of B-splines smoothing …
A Unifying Approach For Haplotype Analysis Of Quantitative Traits In Family-Based Association Studies: Testing And Estimating Gene-Environment Interactions With Complex Exposure Variables, Stijn Vansteelandt, Christoph Lange
COBRA Preprint Series
We propose robust and efficient tests and estimators for gene-environment/gene-drug interactions in family-based association studies. The methodology is designed for studies in which haplotypes, quantitative phenotypes and complex exposure/treatment variables are analyzed. Using causal inference methodology, we derive family-based association tests and estimators for the genetic main effects and the interactions. The tests and estimators are robust against population admixture and stratification without requiring adjustment for confounding variables. We illustrate the practical relevance of our approach by an application to a COPD study. The data analysis suggests a gene-environment interaction between a SNP in the Serpine gene and smok- …
Structural Inference In Transition Measurement Error Models For Longitudinal Data, Wenqin Pan, Xihong Lin, Donglin Zeng
Harvard University Biostatistics Working Paper Series
No abstract provided.
Estimation In Semiparametric Transition Measurement Error Models For Longitudinal Data, Wenqin Pan, Donglin Zeng, Xihong Lin
Harvard University Biostatistics Working Paper Series
No abstract provided.
Nonparametric Regression Using Local Kernel Estimating Equations For Correlated Failure Time Data, Zhangsheng Yu, Xihong Lin
Harvard University Biostatistics Working Paper Series
No abstract provided.
Causal Inference In Hybrid Intervention Trials Involving Treatment Choice, Qi Long, Rod Little, Xihong Lin
Harvard University Biostatistics Working Paper Series
No abstract provided.
A Comparison Of Methods For Estimating The Causal Effect Of A Treatment In Randomized Clinical Trials Subject To Noncompliance, Rod Little, Qi Long, Xihong Lin
Harvard University Biostatistics Working Paper Series
No abstract provided.
Group Additive Regression Models For Genomic Data Analysis, Yihui Luan, Hongzhe Li
UPenn Biostatistics Working Papers
One important problem in genomic research is to identify genomic features such as gene expression data or DNA single nucleotide polymorphisms (SNPs) that are related to clinical phenotypes. Often these genomic data can be naturally divided into biologically meaningful groups such as genes belonging to the same pathways or SNPs within genes. In this paper, we propose group additive regression models and a group gradient descent boosting procedure for identifying groups of genomic features that are related to clinical phenotypes. Our simulation results show that by dividing the variables into appropriate groups, we can obtain better identification of the group …
Extensions To Gene Set Enrichment, Zhen Jiang, Robert Gentleman
Bioconductor Project Working Papers
Motivation: Gene Set Enrichment Analysis (GSEA) has been developed recently to capture moderate but coordinated changes in the expression of sets of functionally related genes. We propose a number of extensions to GSEA that use different statistics to describe the association between genes and the phenotype of interest. We make use of dimension reduction procedures, such as principal component analysis, to identify gene sets containing coordinated genes. We also address the problem of overlap among gene sets.
Results: We applied our methods to data from a clinical trial in acute lymphoblastic leukemia (ALL) [1]. We identified interesting …
Supervised Detection Of Conserved Motifs In DNA Sequences With Cosmo, Oliver Bembom, Sunduz Keles, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
A number of computational methods have been proposed for identifying transcription factor binding sites from a set of unaligned sequences that are thought to share the motif in question. We here introduce an algorithm, called cosmo, that allows this search to be supervised by specifying a set of constraints that the position weight matrix of the unknown motif must satisfy. Such constraints may be formulated, for example, on the basis of prior knowledge about the structure of the transcription factor in question. The algorithm is based on the same two-component multinomial mixture model used by MEME, with stronger reliance, however, …
FDR And Bayesian Multiple Comparisons Rules, Peter Muller, Giovanni Parmigiani, Kenneth Rice
Johns Hopkins University, Dept. of Biostatistics Working Papers
We discuss Bayesian approaches to multiple comparison problems, using a decision theoretic perspective to critically compare competing approaches. We set up decision problems that lead to the use of FDR-based rules and generalizations. Alternative definitions of the probability model and the utility function lead to different rules and problem-specific adjustments. Using a loss function that controls realized FDR we derive an optimal Bayes rule that is a variation of the Benjamini and Hochberg (1995) procedure. The cutoff is based on increments in ordered posterior probabilities instead of ordered p-values. Throughout the discussion we take a Bayesian perspective. In particular, …
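For orientation, the classical Benjamini and Hochberg (1995) step-up procedure that the abstract refers to can be sketched as follows. This is the standard p-value version, not the authors' Bayes rule based on posterior probabilities; the function name `benjamini_hochberg` and the example inputs are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Standard step-up rule: find the largest rank k such that the k-th
    smallest p-value is at most k * alpha / m, then reject the k
    hypotheses with the smallest p-values.
    """
    m = len(p_values)
    # Indices of the hypotheses sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank  # keep the largest rank that passes its threshold
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, chosen only to exercise the function.
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(example, alpha=0.05))
```

Because the rule is step-up, a small p-value that misses its own threshold can still be rejected when a larger p-value further down the ordering passes its threshold.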
Exploration, Normalization, And Genotype Calls Of High Density Oligonucleotide SNP Array Data, Benilton Carvalho, Terence P. Speed, Rafael A. Irizarry
Johns Hopkins University, Dept. of Biostatistics Working Papers
In most microarray technologies, a number of critical steps are required to convert raw intensity measurements into the data relied upon by data analysts, biologists and clinicians. These data manipulations, referred to as preprocessing, can influence the quality of the ultimate measurements. In the last few years, the high-throughput measurement of gene expression has been the most popular application of microarray technology. For this application, various groups have demonstrated that the use of modern statistical methodology can substantially improve accuracy and precision of gene expression measurements, relative to ad-hoc procedures introduced by designers and manufacturers of the technology. Currently, other applications …
Multivariate Analysis And Visualization Of Splicing Correlations In Single-Gene Transcriptomes, Mark C. Emerick, Giovanni Parmigiani, William S. Agnew
Johns Hopkins University, Dept. of Biostatistics Working Papers
Through ‘combinatorial splicing’, RNA metabolism may create enormous structural diversity in the proteome. Functional interactions among multiple alternative domains can have a disproportionate impact on the phenotype, requiring integrated RNA-level regulation of molecular composition. Splicing correlations within molecules expressed from a single gene, where these effects would be greatest, provide valuable clues to functional relationships and targets for splicing regulation. We present tools to visualize complex splicing patterns in full-length cDNA libraries. Developmental changes in pair-wise correlations are presented vectorially in ‘clock plots’ and linkage grids. Higher-order correlations are assessed via a loglinear model and Monte Carlo analysis with an …
A Faster Circular Binary Segmentation Algorithm For The Analysis Of Array CGH Data, E. S. Venkatraman, Adam Olshen
Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series
Motivation: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number (Olshen et al., 2004). The algorithm tests for change-points using a maximal t-statistic with a permutation reference distribution to obtain the corresponding p-value. The number of computations required for the maximal test statistic is O(N^2), where N is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands of markers and highlights the …
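To make the cost described in this abstract concrete, here is a minimal brute-force sketch of a maximal t-type statistic over all arcs with a permutation p-value. It illustrates the expensive baseline only, not the authors' faster algorithm. The function names, the use of a single global standard deviation, and the add-one p-value correction are simplifying assumptions.

```python
import math
import random


def max_arc_t_statistic(x):
    """Largest absolute t-like statistic comparing an arc (i, j] with the rest."""
    n = len(x)
    total = sum(x)
    # Prefix sums give each arc sum in O(1), so the double loop is O(N^2).
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value)
    mean = total / n
    # Assumption: one global standard deviation scales every comparison.
    sd = math.sqrt(sum((value - mean) ** 2 for value in x) / (n - 1))
    if sd == 0:
        return 0.0
    best = 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i  # number of markers inside the arc
            if k == n:
                continue  # at least one marker must stay outside the arc
            arc_sum = prefix[j] - prefix[i]
            inside = arc_sum / k
            outside = (total - arc_sum) / (n - k)
            t = (inside - outside) / (sd * math.sqrt(1.0 / k + 1.0 / (n - k)))
            best = max(best, abs(t))
    return best


def permutation_p_value(x, n_perm=200, seed=0):
    """Estimate the p-value of the maximal statistic by shuffling the markers."""
    rng = random.Random(seed)
    observed = max_arc_t_statistic(x)
    shuffled = list(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_arc_t_statistic(shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

Each permutation repeats the full O(N^2) scan, which is why this reference approach becomes impractical as the number of markers grows.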
PLASQ: A Generalized Linear Model-Based Procedure To Determine Allelic Dosage In Cancer Cells From SNP Array Data, Thomas Laframboise, David P. Harrington, Barbara A. Weir
Harvard University Biostatistics Working Paper Series
No abstract provided.
Selecting 'Significant' Differentially Expressed Genes From The Combined Perspective Of The Null And The Alternative, Beatrijs Moerkerke, Els Goetghebeur
Harvard University Biostatistics Working Paper Series
No abstract provided.
Poor Performance Of Bootstrap Confidence Intervals For The Location Of A Quantitative Trait Locus, Ani Manichaikul, Josee Dupuis, Saunak Sen, Karl W. Broman
Johns Hopkins University, Dept. of Biostatistics Working Papers
The aim of many genetic studies is to locate the genomic regions (called quantitative trait loci, QTLs) that contribute to variation in a quantitative trait (such as body weight). Confidence intervals for the locations of QTLs are particularly important for the design of further experiments to identify the gene or genes responsible for the effect. Likelihood support intervals are the most widely used method to obtain confidence intervals for QTL location, but the non-parametric bootstrap has also been recommended. Through extensive computer simulation, we show that bootstrap confidence intervals are poorly behaved and so should not be used in this …
Feature-Level Exploration Of The Choe Et Al. Affymetrix GeneChip Control Dataset, Rafael A. Irizarry, Leslie Cope, Zhijin Wu
Johns Hopkins University, Dept. of Biostatistics Working Papers
We describe why the Choe et al. control dataset should not be used to assess GeneChip expression measures.
Genome Scanning Methods For Comparing Sequences Between Groups, With Application To HIV Vaccine Trials, Peter B. Gilbert, Chunyuan Wu, David V. Jobes
UW Biostatistics Working Paper Series
Consider a placebo-controlled preventive HIV vaccine efficacy trial. An HIV amino acid sequence is measured from each volunteer who acquires HIV, and these sequences are aligned together with the reference HIV sequence represented in the vaccine. We develop genome scanning methods to identify HIV positions at which the amino acids in sequences from infected vaccine recipients tend to be more divergent from the corresponding reference amino acid than the amino acids in sequences from infected placebo recipients. We consider five two-sample test statistics, based on Euclidean, Mahalanobis, and Kullback-Leibler divergence measures. Weights are incorporated to reflect biological information contained in …
2^K Factorials In Blocks Of Size 2, With Application To Two-Color Microarray Experiments, Kathleen F. Kerr
UW Biostatistics Working Paper Series
When a two-level design must be run in blocks of size two, there is a unique blocking scheme that enables estimation of all the main effects. Unfortunately this design does not enable estimation of any two-factor interactions. When the experimental goal is to estimate all main effects and two-factor interactions, it is necessary to combine replicates of the experiment that use different blocking schemes. In this paper we identify such designs for up to eight factors that enable estimation of all main effects and two-factor interactions with the fewest number of replications. In addition, we give a construction for general …
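The first two claims of this abstract can be checked numerically with a small sketch. It assumes that the blocking scheme in question pairs each run with its mirror image (all factor levels flipped). That pairing is an assumption made here for illustration, and the function names are hypothetical.

```python
from itertools import combinations, product


def foldover_blocks(k):
    """Pair every run of a 2^k design with its mirror image (all signs flipped)."""
    blocks = []
    seen = set()
    for run in product((-1, 1), repeat=k):
        if run in seen:
            continue
        mirror = tuple(-level for level in run)
        seen.update((run, mirror))
        blocks.append((run, mirror))
    return blocks


def main_effects_vary_within_blocks(blocks):
    """True if every factor changes level inside every block."""
    return all(a[f] != b[f] for a, b in blocks for f in range(len(a)))


def two_factor_interactions_confounded(blocks):
    """True if every two-factor interaction column is constant inside every block."""
    k = len(blocks[0][0])
    return all(
        a[i] * a[j] == b[i] * b[j]
        for a, b in blocks
        for i, j in combinations(range(k), 2)
    )


if __name__ == "__main__":
    blocks = foldover_blocks(3)
    print(len(blocks), "blocks of size 2")
    print("main effects estimable within blocks:", main_effects_vary_within_blocks(blocks))
    print("two-factor interactions confounded:", two_factor_interactions_confounded(blocks))
```

Under this pairing each factor changes level within every block, so main effects can be estimated from within-block differences, while every product of two factors is constant within a block and is therefore confounded with block effects.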
Multiple Tests Of Association With Biological Annotation Metadata, Sandrine Dudoit, Sunduz Keles, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
We propose a general and formal statistical framework for the multiple tests of associations between known fixed features of a genome and unknown parameters of the distribution of variable features of this genome in a population of interest. The known fixed gene-annotation profiles, corresponding to the fixed features of the genome, may concern Gene Ontology (GO) annotation, pathway membership, regulation by particular transcription factors, nucleotide sequences, or protein sequences. The unknown gene-parameter profiles, corresponding to the variable features of the genome, may be, for example, regression coefficients relating genome-wide transcript levels or DNA copy numbers to possibly censored biological and …
Nonparametric Pathway-Based Regression Models For Analysis Of Genomic Data, Zhi Wei, Hongzhe Li
UPenn Biostatistics Working Papers
High-throughput genomic data provide an opportunity for identifying pathways and genes that are related to various clinical phenotypes. Besides these genomic data, another valuable source of data is the biological knowledge about genes and pathways that might be related to the phenotypes of many complex diseases. Databases of such knowledge are often called the metadata. In microarray data analysis, such metadata are currently explored in post hoc ways by gene set enrichment analysis but have hardly been utilized in the modeling step. We propose to develop and evaluate a pathway-based gradient descent boosting procedure for nonparametric pathways-based regression (NPR) analysis to …
Visualizing Genomic Data, Robert Gentleman, Florian Hahne, Wolfgang Huber
Bioconductor Project Working Papers
The advent of experimental techniques capable of probing biomolecules and cells at high levels of resolution has led to a rapid change in the methods used for the analysis of experimental molecular biology data. In this article we give an overview of visualization techniques and methods that can be used to assess various aspects of genomic data.