Open Access. Powered by Scholars. Published by Universities.®
- Discipline
-
- Physical Sciences and Mathematics (10)
- Statistics and Probability (10)
- Epidemiology (6)
- Medicine and Health Sciences (6)
- Public Health (6)
-
- Categorical Data Analysis (5)
- Genetics (5)
- Longitudinal Data Analysis and Time Series (5)
- Survival Analysis (5)
- Microarrays (4)
- Biostatistics (2)
- Statistical Methodology (2)
- Statistical Theory (2)
- Design of Experiments and Sample Surveys (1)
- Laboratory and Basic Science Research (1)
- Multivariate Analysis (1)
- Statistical Models (1)
- Keyword
-
- Genetics (5)
- Gene Set Enrichment; microarray; ALL; (1)
- Alternative p-values (1)
- As-treated analysis; Per-protocol analysis; Causal inference; Instrumental variables; Principal stratification; Propensity scores (1)
- Asymptotic bias and variance; Clustered survival data; Efficiency; Estimating equation; Kernel smoothing; Marginal model; Sandwich estimator (1)
-
- Asymptotic bias; EM algorithm; Maximum likelihood estimator; Measurement error; Structural modeling; Transitional Models (1)
- Asymptotic efficiency; Conditional score method; Functional modeling; Measurement error; Longitudinal data; Semiparametric inference; Transition models (1)
- BLUPs; Kernel function; Model/variable selection; Nonparametric regression; Penalized likelihood; REML; Score test; Smoothing parameter; Support vector machines (1)
- Balanced testing (1)
- Block design (1)
- Blocked factorial (1)
- Clinical trials; Doubly randomized preference trials; EM algorithm; Partically randomized preference trials; Randomization; Selection bias (1)
- Decision problems; Multiplicities; False discovery rate (1)
- Empirical Bayes; False discovery rate; Clustering; Density estimation (1)
- Factorial Design (1)
- False Discovery Rate; Genetics; High Dimensional Data; Human Immunode Effciency Virus; Kullback-Leibler; Mahalanobis; Multinomial; Sequence Analysis (1)
- Fractional factorial (1)
- Gene Set Enrichment; microarray; ALL; (1)
- Gene expression (1)
- Graphical methods; Hierarchical models; Interactions; Lasso; Log-linear models; Variable selection (1)
- Microarray (1)
- Visualization (1)
Articles 1 - 22 of 22
Full-Text Articles in Computational Biology
Semiparametric Regression Of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines And Linear Mixed Models, Dawei Liu, Xihong Lin, Debashis Ghosh
Semiparametric Regression Of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines And Linear Mixed Models, Dawei Liu, Xihong Lin, Debashis Ghosh
Harvard University Biostatistics Working Paper Series
No abstract provided.
Penalized Likelihood And Bayesian Methods For Sparse Contingency Tables: An Analysis Of Alternative Splicing In Full-Length Cdna Libraries, Corinne Dahinden, Giovanni Parmigiani, Mark C. Emerick, Peter Buhlmann
Penalized Likelihood And Bayesian Methods For Sparse Contingency Tables: An Analysis Of Alternative Splicing In Full-Length Cdna Libraries, Corinne Dahinden, Giovanni Parmigiani, Mark C. Emerick, Peter Buhlmann
Johns Hopkins University, Dept. of Biostatistics Working Papers
We develop methods to perform model selection and parameter estimation in loglinear models for the analysis of sparse contingency tables to study the interaction of two or more factors. Typically, datasets arising from so-called full-length cDNA libraries, in the context of alternatively spliced genes, lead to such sparse contingency tables. Maximum Likelihood estimation of log-linear model coefficients fails to work because of zero cell entries. Therefore new methods are required to estimate the coefficients and to perform model selection. Our suggestions include computationally efficient penalization (Lasso-type) approaches as well as Bayesian methods using MCMC. We compare these procedures in a …
Multiple Testing With An Empirical Alternative Hypothesis, James E. Signorovitch
Multiple Testing With An Empirical Alternative Hypothesis, James E. Signorovitch
Harvard University Biostatistics Working Paper Series
An optimal multiple testing procedure is identified for linear hypotheses under the general linear model, maximizing the expected number of false null hypotheses rejected at any significance level. The optimal procedure depends on the unknown data-generating distribution, but can be consistently estimated. Drawing information together across many hypotheses, the estimated optimal procedure provides an empirical alternative hypothesis by adapting to underlying patterns of departure from the null. Proposed multiple testing procedures based on the empirical alternative are evaluated through simulations and an application to gene expression microarray data. Compared to a standard multiple testing procedure, it is not unusual for …
Estimating Genome-Wide Copy Number Using Allele Specific Mixture Models, Wenyi Wang , Benilton Caravalho, Nate Miller, Jonathan Pevsner, Aravinda Chakravarti, Rafael A. Irizarry
Estimating Genome-Wide Copy Number Using Allele Specific Mixture Models, Wenyi Wang , Benilton Caravalho, Nate Miller, Jonathan Pevsner, Aravinda Chakravarti, Rafael A. Irizarry
Johns Hopkins University, Dept. of Biostatistics Working Papers
Genomic changes such as copy number alterations are thought to be one of the major underlying causes of human phenotypic variation among normal and disease subjects [23,11,25,26,5,4,7,18]. These include chromosomal regions with so-called copy number alterations: instead of the expected two copies, a section of the chromosome for a particular individual may have zero copies (homozygous deletion), one copy (hemizygous deletions), or more than two copies (amplifications). The canonical example is Down syndrome which is caused by an extra copy of chromosome 21. Identification of such abnormalities in smaller regions has been of great interest, because it is believed to …
Exploration Of Distributional Models For A Novel Intensity-Dependent Normalization , Nicola Lama, Patrizia Boracchi, Elia Mario Biganzoli
Exploration Of Distributional Models For A Novel Intensity-Dependent Normalization , Nicola Lama, Patrizia Boracchi, Elia Mario Biganzoli
COBRA Preprint Series
Currently used gene intensity-dependent normalization methods, based on regression smoothing techniques, usually approach the two problems of location bias detrending and data re-scaling without taking into account the censoring characteristic of certain gene expressions produced by experiment measurement constraints or by previous normalization steps. Moreover, the bias vs variance balance control of normalization procedures is not often discussed but left to the user's experience. Here an approximate maximum likelihood procedure to fit a model smoothing the dependences of log-fold gene expression differences on average gene intensities is presented. Central tendency and scaling factor were modeled by means of B-splines smoothing …
Structural Inference In Transition Measurement Error Models For Longitudinal Data, Wenqin Pan, Xihong Lin, Donglin Zeng
Structural Inference In Transition Measurement Error Models For Longitudinal Data, Wenqin Pan, Xihong Lin, Donglin Zeng
Harvard University Biostatistics Working Paper Series
No abstract provided.
Estimation In Semiparametric Transition Measurement Error Models For Longitudinal Data, Wenqin Pan, Donglin Zeng, Xihong Lin
Estimation In Semiparametric Transition Measurement Error Models For Longitudinal Data, Wenqin Pan, Donglin Zeng, Xihong Lin
Harvard University Biostatistics Working Paper Series
No abstract provided.
Nonparametric Regression Using Local Kernel Estimating Equations For Correlated Failure Time Data, Zhangsheng Yu, Xihong Lin
Nonparametric Regression Using Local Kernel Estimating Equations For Correlated Failure Time Data, Zhangsheng Yu, Xihong Lin
Harvard University Biostatistics Working Paper Series
No abstract provided.
Causal Inference In Hybrid Intervention Trials Involving Treatment Choice, Qi Long, Rod Little, Xihong Lin
Causal Inference In Hybrid Intervention Trials Involving Treatment Choice, Qi Long, Rod Little, Xihong Lin
Harvard University Biostatistics Working Paper Series
No abstract provided.
A Comparison Of Methods For Estimating The Causal Effect Of A Treatment In Randomized Clinical Trials Subject To Noncompliance, Rod Little, Qi Long, Xihong Lin
A Comparison Of Methods For Estimating The Causal Effect Of A Treatment In Randomized Clinical Trials Subject To Noncompliance, Rod Little, Qi Long, Xihong Lin
Harvard University Biostatistics Working Paper Series
No abstract provided.
Group Additive Regression Models For Genomic Data Analysis, Yihui Luan, Hongzhe Li
Group Additive Regression Models For Genomic Data Analysis, Yihui Luan, Hongzhe Li
UPenn Biostatistics Working Papers
One important problem in genomic research is to identify genomic features such as gene expression data or DNA single nucleotide polymorphisms (SNPs) that are related to clinical phenotypes. Often these genomic data can be naturally divided into biologically meaningful groups such as genes belonging to the same pathways or SNPs within genes. In this paper, we propose group additive regression models and a group gradient descent boosting procedure for identifying groups of genomic features that are related to clinical phenotypes. Our simulation results show that by dividing the variables into appropriate groups, we can obtain better identification of the group …
Extensions To Gene Set Enrichment, Zhen Jiang, Robert Gentleman
Extensions To Gene Set Enrichment, Zhen Jiang, Robert Gentleman
Bioconductor Project Working Papers
Motivation: Gene Set Enrichment Analysis (GSEA) has been developed recently to capture moderate but coordinated changes in the expression of sets of functionally related genes. We propose number of extensions to GSEA, which uses different statistics to describe the association between genes and phenotype of interest. We make use of dimension reduction procedures, such as principle component analysis to identify gene sets containing coordinated genes. We also address the problem of overlapping among gene sets in this paper.
Results: We applied our methods to the data come from a clinical trial in acute lymphoblastic leukemia (ALL) [1]. We identified interesting …
Fdr And Bayesian Multiple Comparisons Rules, Peter Muller, Giovanni Parmigiani, Kenneth Rice
Fdr And Bayesian Multiple Comparisons Rules, Peter Muller, Giovanni Parmigiani, Kenneth Rice
Johns Hopkins University, Dept. of Biostatistics Working Papers
We discuss Bayesian approaches to multiple comparison problems, using a decision theoretic perspective to critically compare competing approaches. We set up decision problems that lead to the use of FDR-based rules and generalizations. Alternative definitions of the probability model and the utility function lead to different rules and problem-specific adjustments. Using a loss function that controls realized FDR we derive an optimal Bayes rule that is a variation of the Benjamini and Hochberg (1995) procedure. The cutoff is based on increments in ordered posterior probabilities instead of ordered p- values. Throughout the discussion we take a Bayesian perspective. In particular, …
Exploration, Normalization, And Genotype Calls Of High Density Oligonucleotide Snp Array Data, Benilton Carvalho, Terence P. Speed, Rafael A. Irizarry
Exploration, Normalization, And Genotype Calls Of High Density Oligonucleotide Snp Array Data, Benilton Carvalho, Terence P. Speed, Rafael A. Irizarry
Johns Hopkins University, Dept. of Biostatistics Working Papers
In most microarray technologies, a number of critical steps are required to convert raw intensity measurements into the data relied upon by data analysts, biologists and clinicians. These data manipulations, referred to as preprocessing, can influence the quality of the ultimate measurements. In the last few years, the high-throughput measurement of gene expression is the most popular application of microarray technology. For this application, various groups have demonstrated that the use of modern statistical methodology can substantially improve accuracy and precision of gene expression measurements, relative to ad-hoc procedures introduced by designers and manufacturers of the technology. Currently, other applications …
Multivariate Analysis And Visualization Of Splicing Correlations In Single-Gene Transcriptomes, Mark C. Emerick, Giovanni Parmigiani, William S. Agnew
Multivariate Analysis And Visualization Of Splicing Correlations In Single-Gene Transcriptomes, Mark C. Emerick, Giovanni Parmigiani, William S. Agnew
Johns Hopkins University, Dept. of Biostatistics Working Papers
Through ‘combinatorial splicing’, RNA metabolism may create enormous structural diversity in the proteome. Functional interactions among multiple alternative domains can have a disproportionate impact on the phenotype, requiring integrated RNA-level regulation of molecular composition. Splicing correlations within molecules expressed from a single gene, where these effects would be greatest, provide valuable clues to functional relationships and targets for splicing regulation. We present tools to visualize complex splicing patterns in full-length cDNA libraries. Developmental changes in pair-wise correlations are presented vectorially in ‘clock plots’ and linkage grids. Higher-order correlations are assessed via a loglinear model and Monte Carlo analysis with an …
Plasq: A Generalized Linear Model-Based Procedure To Determine Allelic Dosage Ini Cancer Cells From Snp Array Data, Thomas Laframboise, David P. Harrington, Barbara A. Weir
Plasq: A Generalized Linear Model-Based Procedure To Determine Allelic Dosage Ini Cancer Cells From Snp Array Data, Thomas Laframboise, David P. Harrington, Barbara A. Weir
Harvard University Biostatistics Working Paper Series
No abstract provided.
Selecting 'Significant' Differentially Expressed Genes From The Combined Perspective Of The Null And The Alternative, Beatrijs Moerkerke, Els Goetghebeur
Selecting 'Significant' Differentially Expressed Genes From The Combined Perspective Of The Null And The Alternative, Beatrijs Moerkerke, Els Goetghebeur
Harvard University Biostatistics Working Paper Series
No abstract provided.
Feature-Level Exploration Of The Choe Et Al. Affymetrix Genechip Control Dataset, Rafael A. Irizarry, Leslie Cope, Zhijin Wu
Feature-Level Exploration Of The Choe Et Al. Affymetrix Genechip Control Dataset, Rafael A. Irizarry, Leslie Cope, Zhijin Wu
Johns Hopkins University, Dept. of Biostatistics Working Papers
We describe why the Choe et al. control dataset should not be used to assess GeneChip expression measures.
Genome Scanning Methods For Comparing Sequences Between Groups, With Application To Hiv Vaccine Trials, Peter B. Gilbert, Chunyuan Wu, David V. Jobes
Genome Scanning Methods For Comparing Sequences Between Groups, With Application To Hiv Vaccine Trials, Peter B. Gilbert, Chunyuan Wu, David V. Jobes
UW Biostatistics Working Paper Series
Consider a placebo-controlled preventive HIV vaccine efficacy trial. An HIV amino acid sequence is measured from each volunteer who acquires HIV, and these sequences are aligned together with the reference HIV sequence represented in the vaccine. We develop genome scanning methods to identify HIV positions at which the amino acids in sequences from infected vaccine recipients tend to be more divergent from the corresponding reference amino acid than the amino acids in sequences from infected placebo recipients. We consider five two-sample test statistics, based on Euclidean, Mahalanobis, and Kullback-Leibler divergence measures. Weights are incorporated to reflect biological information contained in …
2^K Factorials In Blocks Of Size 2, With Application To Two-Color Microarray Experiments, Kathleen F. Kerr
2^K Factorials In Blocks Of Size 2, With Application To Two-Color Microarray Experiments, Kathleen F. Kerr
UW Biostatistics Working Paper Series
When a two-level design must be run in blocks of size two, there is a unique blocking scheme that enables estimation of all the main effects. Unfortunately this design does not enable estimation of any two-factor interactions. When the experimental goal is to estimate all main effects and two-factor interactions, it is necessary to combine replicates of the experiment that use different blocking schemes. In this paper we identify such designs for up to eight factors that enable estimation of all main effects and two-factor interactions with the fewest number of replications. In addition, we give a construction for general …
Nonparametric Pathway-Based Regression Models For Analysis Of Genomic Data, Zhi Wei, Hongzhe Li
Nonparametric Pathway-Based Regression Models For Analysis Of Genomic Data, Zhi Wei, Hongzhe Li
UPenn Biostatistics Working Papers
High-throughout genomic data provide an opportunity for identifying pathways and genes that are related to various clinical phenotypes. Besides these genomic data, another valuable source of data is the biological knowledge about genes and pathways that might be related to the phenotypes of many complex diseases. Databases of such knowledge are often called the metadata. In microarray data analysis, such metadata are currently explored in post hoc ways by gene set enrichment analysis but have hardly been utilized in the modeling step. We propose to develop and evaluate a pathway-based gradient descent boosting procedure for nonparametric pathways-based regression(NPR) analysis to …
Visualizing Genomic Data, Robert Gentleman, Florian Hahne, Wolfgang Huber
Visualizing Genomic Data, Robert Gentleman, Florian Hahne, Wolfgang Huber
Bioconductor Project Working Papers
The advent of experimental techniques capable of probing biomolecules and cells at high levels of resolution has led to a rapid change in the methods used for the analysis of experimental molecular biology data. In this article we give an overview over visualization techniques and methods that can be used to assess various aspects of genomic data.