Open Access. Powered by Scholars. Published by Universities.®

Genetics and Genomics Commons

Articles 1 - 23 of 23

Full-Text Articles in Genetics and Genomics

Minimum Description Length Measures Of Evidence For Enrichment, Zhenyu Yang, David R. Bickel Dec 2010

COBRA Preprint Series

In order to functionally interpret differentially expressed genes or other discovered features, researchers seek to detect enrichment in the form of overrepresentation of discovered features associated with a biological process. Most enrichment methods treat the p-value as the measure of evidence using a statistical test such as the binomial test, Fisher's exact test or the hypergeometric test. However, the p-value is not interpretable as a measure of evidence apart from adjustments in light of the sample size. As a measure of evidence supporting one hypothesis over the other, the Bayes factor (BF) overcomes this drawback of the p-value but lacks …


A Bayesian Shared Component Model For Genetic Association Studies, Juan J. Abellan, Carlos Abellan, Juan R. Gonzalez Nov 2010

COBRA Preprint Series

We present a novel approach to address genome association studies between single nucleotide polymorphisms (SNPs) and disease. We propose a Bayesian shared component model to tease out the genotype information that is common to cases and controls from that which is specific to cases only. This allows detection of the SNPs that show the strongest association with the disease. The model can be applied to case-control studies with more than one disease. In fact, we illustrate the use of this model with a dataset of 23,418 SNPs from a case-control study by The Wellcome Trust Case Control Consortium (2007) …


Minimum Description Length And Empirical Bayes Methods Of Identifying SNPs Associated With Disease, Ye Yang, David R. Bickel Nov 2010

COBRA Preprint Series

The goal of determining which of hundreds of thousands of SNPs are associated with disease poses one of the most challenging multiple testing problems. Using the empirical Bayes approach, the local false discovery rate (LFDR) estimated using popular semiparametric models has enjoyed success in simultaneous inference. However, the estimated LFDR can be biased because the semiparametric approach tends to overestimate the proportion of the non-associated single nucleotide polymorphisms (SNPs). One of the negative consequences is that, like conventional p-values, such LFDR estimates cannot quantify the amount of information in the data that favors the null hypothesis of no disease-association.

We …


A Novel Totivirus And Piscine Reovirus (PRV) In Atlantic Salmon (Salmo salar) With Cardiomyopathy Syndrome (CMS), Torstein Tengs Nov 2010

Dr. Torstein Tengs

BACKGROUND: Cardiomyopathy syndrome (CMS) is a severe disease affecting large farmed Atlantic salmon. Mortality often appears without prior clinical signs, typically shortly prior to slaughter. We recently reported the finding and the complete genomic sequence of a novel piscine reovirus (PRV), which is associated with another cardiac disease in Atlantic salmon: heart and skeletal muscle inflammation (HSMI). In the present work we have studied whether PRV or other infectious agents may be involved in the etiology of CMS. RESULTS: Using high throughput sequencing on heart samples from natural outbreaks of CMS and from fish experimentally challenged with material from fish diagnosed with CMS …


Sample Size And Statistical Power Considerations In High-Dimensionality Data Settings: A Comparative Study Of Classification Algorithms, Yu Guo, Armin Garber, Raji Balasubramanian Sep 2010

Raji Balasubramanian

Background: Data generated using ‘omics’ technologies are characterized by high dimensionality, where the number of features measured per subject vastly exceeds the number of subjects in the study. In this paper, we consider issues relevant in the design of biomedical studies in which the goal is the discovery of a subset of features and an associated algorithm that can predict a binary outcome, such as disease status. We compare the performance of four commonly used classifiers (K-Nearest Neighbors, Prediction Analysis for Microarrays, Random Forests and Support Vector Machines) in high-dimensionality data settings. We evaluate the effects of varying levels of …


A Perturbation Method For Inference On Regularized Regression Estimates, Jessica Minnier, Lu Tian, Tianxi Cai Aug 2010

Harvard University Biostatistics Working Paper Series

No abstract provided.


Heart And Skeletal Muscle Inflammation Of Farmed Salmon Is Associated With Infection With A Novel Reovirus, Torstein Tengs Jul 2010

Dr. Torstein Tengs

Atlantic salmon (Salmo salar L.) mariculture has been associated with epidemics of infectious diseases that threaten not only local production, but also wild fish coming into close proximity to marine pens and fish escaping from them. Heart and skeletal muscle inflammation (HSMI) is a frequently fatal disease of farmed Atlantic salmon. First recognized in one farm in Norway in 1999, HSMI was subsequently implicated in outbreaks in other farms in Norway and the United Kingdom. Although pathology and disease transmission studies indicated an infectious basis, efforts to identify an agent were unsuccessful. Here we provide evidence that HSMI is associated …


The Strength Of Statistical Evidence For Composite Hypotheses: Inference To The Best Explanation, David R. Bickel Jun 2010

COBRA Preprint Series

A general function to quantify the weight of evidence in a sample of data for one hypothesis over another is derived from the law of likelihood and from a statistical formalization of inference to the best explanation. For a fixed parameter of interest, the resulting weight of evidence that favors one composite hypothesis over another is the likelihood ratio using the parameter value consistent with each hypothesis that maximizes the likelihood function over the parameter of interest. Since the weight of evidence is generally only known up to a nuisance parameter, it is approximated by replacing the likelihood function with …


Non-Prejudiced Detection And Characterization Of Genetic Modifications, Torstein Tengs Jun 2010

Dr. Torstein Tengs

The application of gene technology is becoming widespread, largely thanks to the rapid increase in technology, resource, and knowledge availability. Consequently, the diversity and number of genetically modified organisms (GMOs) that may find their way into the food chain or the environment, intentionally or unintentionally, are rapidly growing. From a safety point of view the ability to detect and characterize in detail any GMO, independent of publicly available information, is fundamental. Pre-release risk assessments of GMOs are required in most jurisdictions and are usually based on application of technologies with limited ability to detect unexpected rearrangements and insertions. We present …


Powerful SNP Set Analysis For Case-Control Genome Wide Association Studies, Michael C. Wu, Peter Kraft, Michael P. Epstein, Deanne M. Taylor, Stephen J. Chanock, David J. Hunter, Xihong Lin May 2010

Harvard University Biostatistics Working Paper Series

No abstract provided.


Comparison Of Nine Different Real-Time PCR Chemistries For Qualitative And Quantitative Applications In GMO Detection, Torstein Tengs Mar 2010

Dr. Torstein Tengs

Several techniques have been developed for detection and quantification of genetically modified organisms, but quantitative real-time PCR is by far the most popular approach. Among the most commonly used real-time PCR chemistries are TaqMan probes and SYBR green, but many other detection chemistries have also been developed. Because their performance has never been compared systematically, here we present an extensive evaluation of some promising chemistries: sequence-unspecific DNA labeling dyes (SYBR green), primer-based technologies (AmpliFluor, Plexor, Lux primers), and techniques involving double-labeled probes, comprising hybridization (molecular beacon) and hydrolysis (TaqMan, CPT, LNA, and MGB) probes, based on recently published experimental data. …


Wavelet-Based Functional Linear Mixed Models: An Application To Measurement Error–Corrected Distributed Lag Models, Elizabeth J. Malloy, Jeffrey S. Morris, Sara D. Adar, Helen Suh, Diane R. Gold, Brent A. Coull Jan 2010

Jeffrey S. Morris

Frequently, exposure data are measured over time on a grid of discrete values that collectively define a functional observation. In many applications, researchers are interested in using these measurements as covariates to predict a scalar response in a regression setting, with interest focusing on the most biologically relevant time window of exposure. One example is in panel studies of the health effects of particulate matter (PM), where particle levels are measured over time. In such studies, there are many more values of the functional data than observations in the data set so that regularization of the corresponding functional regression coefficient …


Members’ Discoveries: Fatal Flaws In Cancer Research, Jeffrey S. Morris Jan 2010

Jeffrey S. Morris

A recent article published in The Annals of Applied Statistics (AOAS) by two MD Anderson researchers, Keith Baggerly and Kevin Coombes, dissects results from a highly influential series of medical papers involving genomics-driven personalized cancer therapy, and outlines a series of simple yet fatal flaws that raise serious questions about the veracity of the original results. Having immediate and strong impact, this paper, along with related work, is providing the impetus for new standards of reproducibility in scientific research.


Statistical Contributions To Proteomic Research, Jeffrey S. Morris, Keith A. Baggerly, Howard B. Gutstein, Kevin R. Coombes Jan 2010

Jeffrey S. Morris

Proteomic profiling has the potential to impact the diagnosis, prognosis, and treatment of various diseases. A number of different proteomic technologies are available that allow us to look at many proteins at once, and all of them yield complex data that raise significant quantitative challenges. Inadequate attention to these quantitative issues can prevent these studies from achieving their desired goals, and can even lead to invalid results. In this chapter, we describe various ways the involvement of statisticians or other quantitative scientists in the study team can contribute to the success of proteomic research, and we outline some of the …


Informatics And Statistics For Analyzing 2-D Gel Electrophoresis Images, Andrew W. Dowsey, Jeffrey S. Morris, Howard G. Gutstein, Guang Z. Yang Jan 2010

Jeffrey S. Morris

Whilst recent progress in ‘shotgun’ peptide separation by integrated liquid chromatography and mass spectrometry (LC/MS) has enabled its use as a sensitive analytical technique, proteome coverage and reproducibility are still limited and obtaining enough replicate runs for biomarker discovery is a challenge. For these reasons, recent research demonstrates the continuing need for protein separation by two-dimensional gel electrophoresis (2-DE). However, with traditional 2-DE informatics, the digitized images are reduced to symbolic data through spot detection and quantification before proteins are compared for differential expression by spot matching. Recently, a more robust and automated paradigm has emerged where gels are directly …


Bayesian Random Segmentation Models To Identify Shared Copy Number Aberrations For Array CGH Data, Veerabhadran Baladandayuthapani, Yuan Ji, Rajesh Talluri, Luis E. Nieto-Barajas, Jeffrey S. Morris Jan 2010

Jeffrey S. Morris

Array-based comparative genomic hybridization (aCGH) is a high-resolution, high-throughput technique for studying the genetic basis of cancer. The resulting data consist of log fluorescence ratios as a function of the genomic DNA location and provide a cytogenetic representation of the relative DNA copy number variation. Analysis of such data typically involves estimation of the underlying copy number state at each location and segmenting regions of DNA with similar copy number states. Most current methods proceed by modeling a single sample/array at a time, and thus fail to borrow strength across multiple samples to infer shared regions of copy number aberrations. …


Discrete Nonparametric Algorithms For Outlier Detection With Genomic Data, Debashis Ghosh Jan 2010

Debashis Ghosh

In high-throughput studies involving genetic data such as from gene expression microarrays, differential expression analysis between two or more experimental conditions has been a very common analytical task. Much of the resulting literature on multiple comparisons has paid relatively little attention to the choice of test statistic. In this article, we focus on the issue of choice of test statistic based on a special pattern of differential expression. The approach here is based on recasting multiple comparisons procedures for assessing outlying expression values. A major complication is that the resulting p-values are discrete; some theoretical properties of sequential testing …


Detecting Outlier Genes From High-Dimensional Data: A Fuzzy Approach, Debashis Ghosh Jan 2010

Debashis Ghosh

A recent finding in cancer research has been the characterization of previously undiscovered chromosomal abnormalities in several types of solid tumors. This was found based on analyses of high-throughput data from gene expression microarrays and motivated the development of so-called 'outlier' tests for differential expression. One statistical issue was the potential discreteness of the test statistics. Using ideas from fuzzy set theory, we develop fuzzy outlier detection algorithms that have links to ideas in multiple comparisons. Two- and K-sample extensions are considered. The methodology is illustrated by application to two microarray studies.


Links Between Analysis Of Surrogate Endpoints And Endogeneity, Debashis Ghosh, Jeremy M. Taylor, Michael R. Elliott Jan 2010

Debashis Ghosh

There has been substantive interest in the assessment of surrogate endpoints in medical research. These are measures which could potentially replace "true" endpoints in clinical trials and lead to studies that require less follow-up. Recent research in the area has focused on assessments using causal inference frameworks. Beginning with a simple model for associating the surrogate and true endpoints in the population, we approach the problem as one of endogenous covariates. An instrumental variables estimator and general two-stage algorithm is proposed. Existing surrogacy frameworks are then evaluated in the context of the model. A numerical example is used to illustrate …


Meta-Analysis For Surrogacy: Accelerated Failure Time Models And Semicompeting Risks Modelling, Debashis Ghosh, Jeremy M. Taylor, Daniel J. Sargent Jan 2010

Debashis Ghosh

There has been great recent interest in the medical and statistical literature in the assessment and validation of surrogate endpoints as proxies for clinical endpoints in medical studies. More recently, authors have focused on using meta-analytical methods for quantification of surrogacy. In this article, we extend existing procedures for analysis based on the accelerated failure time model to this setting. An advantage of this approach relative to the proportional hazards model is that it allows for analysis in the semi-competing risks setting, where we constrain the surrogate endpoint to occur before the true endpoint. A novel principal components procedure is …


Spline-Based Models For Predictiveness Curves, Debashis Ghosh, Michael Sabel Jan 2010

Debashis Ghosh

A biomarker is defined to be a biological characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention. The use of biomarkers in cancer has been advocated for a variety of purposes, which include use as surrogate endpoints, early detection of disease, proxies for environmental exposure and risk prediction. We deal with the latter issue in this paper. Several authors have proposed use of the predictiveness curve for assessing the capacity of a biomarker for risk prediction. For most situations, it is reasonable to assume monotonicity of …


Combining Multiple Models With Survival Data: The Phase Algorithm, Debashis Ghosh, Zheng Yuan Jan 2010

Debashis Ghosh

In many scientific studies, one common goal is to develop good prediction rules based on a set of available measurements. This paper proposes a model averaging methodology using proportional hazards regression models to construct new estimators of predicted survival probabilities. A screening step based on an adaptive searching algorithm is used to handle large numbers of covariates. The finite-sample properties of the proposed methodology are assessed using simulation studies. Application of the method to a cancer biomarker study is also given.


Semiparametric Analysis Of Recurrent Events: Artificial Censoring, Truncation, Pairwise Estimation And Inference, Debashis Ghosh Dec 2009

Debashis Ghosh

The analysis of recurrent failure time data from longitudinal studies can be complicated by the presence of dependent censoring. A substantial literature has developed based on an artificial censoring device. We explore in this article the connection between this class of methods and truncated data structures. In addition, a new procedure is developed for estimation and inference in a joint model for recurrent events and dependent censoring. Estimation proceeds using a mixed U-statistic based estimating function approach. New resampling-based methods for variance estimation and model checking are also described. The methods are illustrated by application to …