Open Access. Powered by Scholars. Published by Universities.®

Life Sciences Commons

Genomics

Medicine and Health Sciences

Articles 91 - 117 of 117

Full-Text Articles in Life Sciences

Genetic Predictors Of Metabolic Side Effects Of Diuretic Therapy, Jorge L. Del Aguila Aug 2014

Dissertations & Theses (Open Access)

Thiazide diuretics are a recommended first-line monotherapy for hypertension (i.e., SBP > 140 mmHg or DBP > 90 mmHg). Even so, diuretics are associated with adverse metabolic side effects, such as hyperlipidemia, hyperglycemia, and hypokalemia, which increase the risk of developing type II diabetes. This thesis used three analytical strategies to identify and quantify genetic factors that contribute to the development of adverse metabolic effects due to thiazide diuretic treatment. I performed a genome-wide association study (GWAS) and meta-analysis of the change in fasting plasma glucose and triglycerides in response to HCTZ from two different clinical trials: the Pharmacogenomic Evaluation of Antihypertensive Responses …


Evaluation Of Eight Live Attenuated Vaccine Candidates For Protection Against Challenge With Virulent Mycobacterium Avium Subspecies Paratuberculosis In Mice, John Bannantine, Jamie L. Everman, Sasha J. J.Rose, Lmar Babrak, Robab Katani, Raul G. Barletta, Adel M. Talaat, Yrjö J. Gröhn, Yung-Fu Chang, Vivek Kapur, Luiz E. Bermudez Jul 2014

School of Veterinary and Biomedical Sciences: Faculty Publications

Johne's disease is caused by Mycobacterium avium subsp. paratuberculosis (MAP), which results in serious economic losses worldwide in farmed livestock such as cattle, sheep, and goats. To control this disease, an effective vaccine with minimal adverse effects is needed. In order to identify a live vaccine for Johne's disease, we evaluated eight attenuated mutant strains of MAP using a C57BL/6 mouse model. The persistence of the vaccine candidates was measured at 6, 12, and 18 weeks post vaccination. Only strains 320, 321, and 329 colonized both the liver and spleens up until the 12-week time point. The remaining five mutants …


How To Get The Most From Microarray Data: Advice From Reverse Genomics, Ivan P. Gorlov, Ji-Yeon Yang, Jinyoung Byun, Christopher Logothetis, Olga Y. Gorlova, Kim-Anh Do, Christopher Amos Mar 2014

Dartmouth Scholarship

Whole-genome profiling of gene expression is a powerful tool for identifying cancer-associated genes. Genes differentially expressed between normal and tumorous tissues are usually considered to be cancer associated. We recently demonstrated that the analysis of interindividual variation in gene expression can be useful for identifying cancer-associated genes. The goal of this study was to identify the best microarray data–derived predictor of known cancer-associated genes. We found that the traditional approach of identifying cancer genes, namely identifying differentially expressed genes, is not very efficient. The analysis of interindividual variation of gene expression in tumor samples identifies cancer-associated genes more effectively. The results …


Bayesian Joint Selection Of Genes And Pathways: Applications In Multiple Myeloma Genomics, Lin Zhang, Jeffrey S. Morris, Jiexin Zhang, Robert Orlowski, Veerabhadran Baladandayuthapani Jan 2014

Jeffrey S. Morris

It is well established that the development of a disease, especially cancer, is a complex process that results from the joint effects of multiple genes involved in various molecular signaling pathways. In this article, we propose methods to discover genes and molecular pathways significantly associated with clinical outcomes in cancer samples. We exploit the natural hierarchical structure of genes related to a given pathway as a group of interacting genes to conduct selection of both pathways and genes. We posit the problem in a hierarchical structured variable selection (HSVS) framework to analyze the corresponding gene expression data. HSVS methods conduct …


Finding Fault?: Exploring Legal Duties To Return Incidental Findings In Genomic Research, Elizabeth R. Pike, Karen H. Rothenberg, Benjamin E. Berkman Jan 2014

Faculty Scholarship

The use of whole genome sequencing in biomedical research is expected to produce dramatic advances in human health. The increasing use of this powerful, data-rich new technology in research, however, will inevitably give rise to incidental findings (IFs), findings with individual health or reproductive significance that are beyond the aims of the particular research, and the related questions of whether and to what extent researchers have an ethical obligation to return IFs. Many have concluded that researchers have an ethical obligation to return some findings in some circumstances, but have provided vague or context-dependent approaches to determining which IFs must …


A Systems Biology Approach To Detect Eqtls Associated With Mirna And Mrna Co-Expression Networks In The Nucleus Accumbens Of Chronic Alcoholic Patients, Mohammed Mamdani Jan 2014

Theses and Dissertations

Alcohol Dependence (AD) is a chronic substance use disorder with moderate heritability (60%). Linkage and genome-wide association studies (GWAS) have implicated a number of loci; however, the molecular mechanisms underlying AD are unclear. Advances in systems biology allow genome-wide expression data to be integrated with genetic data to detect expression quantitative trait loci (eQTL), polymorphisms that regulate gene expression levels, influence phenotypes and are significantly enriched among validated genetic signals for many commonly studied traits including AD.

We integrated genome-wide mRNA and miRNA expression data with genotypic data from the nucleus accumbens (NAc), a major addiction-related brain region, of 36 …


Advanced Molecular Biologic Techniques In Toxicologic Disease, Jeanine Ward, Gyongyi Szabo, David Mcmanus, Edward Boyer Oct 2012

Gyongyi Szabo

The advancement of molecular biologic techniques and their capabilities to answer questions pertaining to mechanisms of pathophysiologic events have greatly expanded over the past few years. In particular, these opportunities have provided researchers and clinicians alike the framework with which to answer clinical questions not amenable to elucidation using previous, more antiquated methods. Utilizing extremely small molecules, namely microRNA, DNA, protein, and nanoparticles, we discuss the background and utility of these approaches to the progressive, practicing physician. Finally, we consider the application of these tools employed as future bedside point of care tests, aiding in the ultimate goal of …


Technical Desiderata For The Integration Of Genomic Data Into Electronic Health Records., Daniel R Masys, Gail P Jarvik, Neil F Abernethy, Nicholas R Anderson, George J Papanicolaou, Dina N Paltoo, Mark A Hoffman, Isaac S Kohane, Howard P Levy Jun 2012

Manuscripts, Articles, Book Chapters and Other Papers

The era of "Personalized Medicine," guided by individual molecular variation in DNA, RNA, expressed proteins and other forms of high volume molecular data brings new requirements and challenges to the design and implementation of Electronic Health Records (EHRs). In this article we describe the characteristics of biomolecular data that differentiate it from other classes of data commonly found in EHRs, enumerate a set of technical desiderata for its management in healthcare settings, and offer a candidate technical approach to its compact and efficient representation in operational systems.


Elucidating The Igfbp2 Signaling Pathway In Glioma Development And Progression, Kristen M. Holmes May 2012

Dissertations & Theses (Open Access)

Diffuse gliomas are highly lethal central nervous system malignancies which, unfortunately, are the most common primary brain tumor and also the least responsive to the very few therapeutic modalities currently available to treat them. IGFBP2 is a newly recognized oncogene that is operative in multiple cancer types, including glioma, and shows promise for a targeted therapeutic approach. Elevated IGFBP2 expression is present in high-grade glioma and correlates with poor survival. We have previously demonstrated that IGFBP2 induces glioma development and progression in a spontaneous glioma mouse model, which highlighted its significance and potential for future therapy. However, we did not …


Tissue Sampling Methods And Standards For Vertebrate Genomics, Pamela B. Y. Wong, Edward O. Wiley, Warren E. Johnson, Oliver A. Ryder, Stephen J. O'Brien, David Haussler, Klaus-Peter Koepfli, Marlys L. Houck, Polina L. Perelman, Gabriela Mastromonaco, Andrew C. Bentley, Byrappa Venkatesh, Ya-Ping Zhang, Robert W. Murphy, Genome 10k Project Community Of Scientists Jan 2012

Biology Faculty Articles

The recent rise in speed and efficiency of new sequencing technologies has facilitated high-throughput sequencing, assembly and analyses of genomes, advancing ongoing efforts to analyze genetic sequences across major vertebrate groups. Standardized procedures in acquiring high quality DNA and RNA and establishing cell lines from target species will facilitate these initiatives. We provide a legal and methodological guide according to four standards of acquiring and storing tissue for the Genome 10K Project and similar initiatives as follows: four-star (banked tissue/cell cultures, RNA from multiple types of tissue for transcriptomes, and sufficient flash-frozen tissue for 1 mg of DNA, all from …


Integrative Bayesian Analysis Of High-Dimensional Multi-Platform Genomics Data, Wenting Wang, Veerabhadran Baladandayuthapani, Jeffrey S. Morris, Bradley M. Broom, Ganiraju C. Manyam, Kim-Anh Do Jan 2012

Jeffrey S. Morris

Motivation: Analyzing data from multi-platform genomics experiments combined with patients’ clinical outcomes helps us understand the complex biological processes that characterize a disease, as well as how these processes relate to the development of the disease. Current approaches to integrating these data are limited in that they do not consider the fundamental biological relationships that exist among the data from the different platforms.

Statistical Model: We propose an integrative Bayesian analysis of genomics data (iBAG) framework for identifying important genes/biomarkers that are associated with clinical outcome. This framework uses a hierarchical modeling technique to combine the data obtained from multiple platforms …


Members’ Discoveries: Fatal Flaws In Cancer Research, Jeffrey S. Morris Jan 2010

Jeffrey S. Morris

A recent article published in The Annals of Applied Statistics (AOAS) by two MD Anderson researchers, Keith Baggerly and Kevin Coombes, dissects results from a highly influential series of medical papers involving genomics-driven personalized cancer therapy, and outlines a series of simple yet fatal flaws that raise serious questions about the veracity of the original results. Having immediate and strong impact, this paper, along with related work, is providing the impetus for new standards of reproducibility in scientific research.


Statistical Contributions To Proteomic Research, Jeffrey S. Morris, Keith A. Baggerly, Howard B. Gutstein, Kevin R. Coombes Jan 2010

Jeffrey S. Morris

Proteomic profiling has the potential to impact the diagnosis, prognosis, and treatment of various diseases. A number of different proteomic technologies are available that allow us to look at many proteins at once, and all of them yield complex data that raise significant quantitative challenges. Inadequate attention to these quantitative issues can prevent these studies from achieving their desired goals, and can even lead to invalid results. In this chapter, we describe various ways the involvement of statisticians or other quantitative scientists in the study team can contribute to the success of proteomic research, and we outline some of the …


Bayesian Random Segmentation Models To Identify Shared Copy Number Aberrations For Array Cgh Data, Veerabhadran Baladandayuthapani, Yuan Ji, Rajesh Talluri, Luis E. Nieto-Barajas, Jeffrey S. Morris Jan 2010

Jeffrey S. Morris

Array-based comparative genomic hybridization (aCGH) is a high-resolution high-throughput technique for studying the genetic basis of cancer. The resulting data consists of log fluorescence ratios as a function of the genomic DNA location and provides a cytogenetic representation of the relative DNA copy number variation. Analysis of such data typically involves estimation of the underlying copy number state at each location and segmenting regions of DNA with similar copy number states. Most current methods proceed by modeling a single sample/array at a time, and thus fail to borrow strength across multiple samples to infer shared regions of copy number aberrations. …


Detecting Outlier Genes From High-Dimensional Data: A Fuzzy Approach, Debashis Ghosh Jan 2010

Debashis Ghosh

A recent finding in cancer research has been the characterization of previously undiscovered chromosomal abnormalities in several types of solid tumors. This was found based on analyses of high-throughput data from gene expression microarrays and motivated the development of so-called 'outlier' tests for differential expression. One statistical issue was the potential discreteness of the test statistics. Using ideas from fuzzy set theory, we develop fuzzy outlier detection algorithms that have links to ideas in multiple comparisons. Two- and K-sample extensions are considered. The methodology is illustrated by application to two microarray studies.


Microbial Nad Metabolism: Lessons From Comparative Genomics, Francesca Gazzaniga, Rebecca Stebbins, Sheila Z. Chang, Mark A. Mcpeek, Charles Brenner Sep 2009

Dartmouth Scholarship

NAD is a coenzyme for redox reactions and a substrate of NAD-consuming enzymes, including ADP-ribose transferases, Sir2-related protein lysine deacetylases, and bacterial DNA ligases. Microorganisms that synthesize NAD from as few as one to as many as five of the six identified biosynthetic precursors have been identified. De novo NAD synthesis from aspartate or tryptophan is neither universal nor strictly aerobic. Salvage NAD synthesis from nicotinamide, nicotinic acid, nicotinamide riboside, and nicotinic acid riboside occurs via modules of different genes. Nicotinamide salvage genes nadV and pncA, found in distinct bacteria, appear to have spread throughout the tree of life …


The Genome-Enabled Electronic Medical Record., M A Hoffman Feb 2007

Manuscripts, Articles, Book Chapters and Other Papers

The integration of patient-specific genomic information into the electronic medical record (EMR) will create many opportunities to improve patient care. Key to the successful incorporation of genomic information into the EMR will be the development of laboratory information systems capable of appropriately formatting molecular diagnostic and cytogenetic findings in the EMR. Due to the lack of granular genomics-related content in existing medical vocabularies, the adoption of new standards for describing clinically significant genomic information will be an important step toward recognizing the genome-enabled EMR. Appropriate capture of patient-specific genomic results in the EMR will generate new opportunities to utilize this …


Alternative Probeset Definitions For Combining Microarray Data Across Studies Using Different Versions Of Affymetrix Oligonucleotide Arrays, Jeffrey S. Morris, Chunlei Wu, Kevin R. Coombes, Keith A. Baggerly, Jing Wang, Li Zhang Dec 2006

Jeffrey S. Morris

Many published microarray studies have small to moderate sample sizes, and thus have low statistical power to detect significant relationships between gene expression levels and outcomes of interest. By pooling data across multiple studies, however, we can gain power, enabling us to detect new relationships. This type of pooling is complicated by the fact that gene expression measurements from different microarray platforms are not directly comparable. In this chapter, we discuss two methods for combining information across different versions of Affymetrix oligonucleotide arrays. Each involves a new approach for combining probes on the array into probesets. The first approach involves …


Some Statistical Issues In Microarray Gene Expression Data, Matthew S. Mayo, Byron J. Gajewski, Jeffrey S. Morris Jun 2006

Jeffrey S. Morris

In this paper we discuss some of the statistical issues that should be considered when conducting experiments involving microarray gene expression data. We discuss statistical issues related to preprocessing the data as well as the analysis of the data. Analysis of the data is discussed in three contexts: class comparison, class prediction and class discovery. We also review the methods used in two studies that are using microarray gene expression to assess the effect of exposure to radiofrequency (RF) fields on gene expression. Our intent is to provide a guide for radiation researchers when conducting studies involving microarray gene expression …


Shrinkage Estimation For Sage Data Using A Mixture Dirichlet Prior, Jeffrey S. Morris, Keith A. Baggerly, Kevin R. Coombes Mar 2006

Jeffrey S. Morris

Serial Analysis of Gene Expression (SAGE) is a technique for estimating the gene expression profile of a biological sample. Any efficient inference in SAGE must be based upon efficient estimates of these gene expression profiles, which consist of the estimated relative abundances for each mRNA species present in the sample. The data from SAGE experiments are counts for each observed mRNA species, and can be modeled using a multinomial distribution with two characteristics: skewness in the distribution of relative abundances and small sample size relative to the dimension. As a result of these characteristics, a given SAGE sample will fail …


An Introduction To High-Throughput Bioinformatics Data, Keith A. Baggerly, Kevin R. Coombes, Jeffrey S. Morris Mar 2006

Jeffrey S. Morris

High throughput biological assays supply thousands of measurements per sample, and the sheer amount of related data increases the need for better models to enhance inference. Such models, however, are more effective if they take into account the idiosyncrasies associated with the specific methods of measurement: where the numbers come from. We illustrate this point by describing three different measurement platforms: microarrays, serial analysis of gene expression (SAGE), and proteomic mass spectrometry.


Bayesian Mixture Models For Gene Expression And Protein Profiles, Michele Guindani, Kim-Anh Do, Peter Mueller, Jeffrey S. Morris Mar 2006

Jeffrey S. Morris

We review the use of semi-parametric mixture models for Bayesian inference in high throughput genomic data. We discuss three specific approaches for microarray data, for protein mass spectrometry experiments, and for SAGE data. For the microarray data and the protein mass spectrometry we assume group comparison experiments, i.e., experiments that seek to identify genes and proteins that are differentially expressed across two biologic conditions of interest. For the SAGE data example we consider inference for a single biologic sample.


Pooling Information Across Different Studies And Oligonucleotide Microarray Chip Types To Identify Prognostic Genes For Lung Cancer., Jeffrey S. Morris, Guosheng Yin, Keith A. Baggerly, Chunlei Wu, Li Zhang Dec 2005

Jeffrey S. Morris

Our goal in this work is to pool information across microarray studies conducted at different institutions using two different versions of Affymetrix chips to identify genes whose expression levels offer information on lung cancer patients’ survival above and beyond the information provided by readily available clinical covariates. We combine information across chip types by identifying “matching probes” present on both chips, and then assembling them into new probesets based on Unigene clusters. This method yields comparable expression level quantifications across chips without sacrificing much precision or significantly altering the relative ordering of the samples. We fit a series of multivariable …


The Importance Of Experimental Design In Proteomic Mass Spectrometry Experiments: Some Cautionary Tales, Jeffrey S. Morris, Jianhua Hu, Kevin R. Coombes, Keith A. Baggerly Mar 2005

Jeffrey S. Morris

Proteomic expression patterns derived from mass spectrometry have been put forward as potential biomarkers for the early diagnosis of cancer and other diseases. This approach has generated much excitement and has led to a large number of new experiments and vast amounts of new data. The data, derived at great expense, can have very little value if careful attention is not paid to the experimental design and analysis. Using examples from surface-enhanced laser desorption/ionisation time-of-flight (SELDI-TOF) and matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF) experiments, we describe several experimental design issues that can corrupt a dataset. Fortunately, the problems we identify can be …


Genomic And Proteomic Profiling Of Responses To Toxic Metals In Human Lung Cells, Angeline S. Andrew, Amy J. Warren, Aaron Barchowsky, Kaili A. Temple, Linda Klei, Nicole V. Soucy, Kimberly A. O'Hara, Joshua W. Hamilton May 2003

Dartmouth Scholarship

Examining global effects of toxic metals on gene expression can be useful for elucidating patterns of biological response, discovering underlying mechanisms of toxicity, and identifying candidate metal-specific genetic markers of exposure and response. Using a 1,200 gene nylon array, we examined changes in gene expression following low-dose, acute exposures of cadmium, chromium, arsenic, nickel, or mitomycin C (MMC) in BEAS-2B human bronchial epithelial cells. Total RNA was isolated from cells exposed to 3 µM Cd(II) (as cadmium chloride), 10 µM Cr(VI) (as sodium dichromate), 3 µg/cm2 Ni(II) (as nickel subsulfide), 5 µM or 50 µM As(III) (as sodium arsenite), or …


Bayesian Shrinkage Estimation Of The Relative Abundance Of Mrna Transcripts Using Sage, Jeffrey S. Morris, Keith A. Baggerly, Kevin R. Coombes Mar 2003

Jeffrey S. Morris

Serial analysis of gene expression (SAGE) is a technology for quantifying gene expression in biological tissue that yields count data that can be modeled by a multinomial distribution with two characteristics: skewness in the relative frequencies and small sample size relative to the dimension. As a result of these characteristics, a given SAGE sample may fail to capture a large number of expressed mRNA species present in the tissue. Empirical estimators of mRNA species’ relative abundance effectively ignore these missing species, and as a result tend to overestimate the abundance of the scarce observed species comprising a vast majority of …


Selecting Differentially Expressed Genes From Microarray Experiments, Margaret S. Pepe, Gary M. Longton, Garnet L. Anderson, Michel Schummer Jan 2003

UW Biostatistics Working Paper Series

High throughput technologies, such as gene expression arrays and protein mass spectrometry, allow one to simultaneously evaluate thousands of potential biomarkers that distinguish different tissue types. Of particular interest here is cancer versus normal organ tissues. We consider statistical methods to rank genes (or proteins) with regard to differential expression between tissues. Various statistical measures are considered and we argue that two measures related to the Receiver Operating Characteristic Curve are particularly suitable for this purpose. We also propose that sampling variability in the gene rankings be quantified and suggest using the “selection probability function”, the probability distribution of rankings …