Open Access. Powered by Scholars. Published by Universities.®

Microarrays Commons

Life Sciences

Articles 1 - 30 of 81

Full-Text Articles in Microarrays

Bayesian Methods For Graphical Models With Neighborhood Selection., Sagnik Bhadury Dec 2022

Electronic Theses and Dissertations

Graphical models determine associations between variables through the notion of conditional independence. Gaussian graphical models are a widely used class of such models, where the relationships are formalized by non-null entries of the precision matrix. However, in high-dimensional cases, covariance estimates are typically unstable. Moreover, it is natural to expect only a few significant associations to be present in many realistic applications. This necessitates the injection of sparsity techniques into the estimation method. Classical frequentist methods, like GLASSO, use penalization techniques for this purpose. Fully Bayesian methods, on the contrary, are slow because they require iteratively sampling over a quadratic …


Gene Set Testing By Distance Correlation, Sho-Hsien Su Dec 2020

Graduate Theses and Dissertations

Pathways are the functional building blocks of complex diseases such as cancers. Pathway-level studies may provide insights into some important biological processes. The gene set test is an important tool to study the differential expression of a gene set between two groups, e.g., cancer vs normal. The differential expression of a gene set could be due to the difference in mean, variability, or both. However, most existing gene set tests only target the mean difference but overlook other types of differential expression. In this thesis, we propose to use the recently developed distance correlation for gene set testing. To assess the …
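As a rough illustration of the statistic named in the abstract above, the sample distance correlation between a gene set's expression matrix and a group label can be sketched as follows. This is a minimal sketch and not the thesis's procedure: the function names, the 0/1 encoding of the group label, and the toy data are assumptions made for illustration, and the truncated abstract does not say how significance is assessed.

```python
import math


def _centered_distances(rows):
    """Pairwise Euclidean distance matrix of the rows, double-centered."""
    n = len(rows)
    d = [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d]
    grand_mean = sum(row_mean) / n
    # d is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x_rows, y_rows):
    """Sample distance correlation between two sets of paired observations.

    x_rows and y_rows are lists of equal length; each element is a sequence
    of floats (one observation). Returns a value in [0, 1].
    """
    n = len(x_rows)
    if n != len(y_rows):
        raise ValueError("x_rows and y_rows must have the same number of rows")
    a = _centered_distances(x_rows)
    b = _centered_distances(y_rows)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    if denom == 0.0:
        return 0.0  # one side is constant, so no dependence can be measured
    return math.sqrt(max(dcov2, 0.0) / denom)


# Hypothetical toy data: 4 samples, a gene set of 2 genes, labels 0 = normal, 1 = cancer.
expression = [[1.0, 2.1], [1.2, 1.9], [3.0, 0.5], [3.3, 0.4]]
group = [[0.0], [0.0], [1.0], [1.0]]
gene_set_dcor = distance_correlation(expression, group)
```

Because the statistic is built from pairwise distances rather than means, it can respond to differences in variability as well as location, which is the motivation the abstract gives.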


Statistical Approaches Of Gene Set Analysis With Quantitative Trait Loci For High-Throughput Genomic Studies., Samarendra Das Dec 2020

Electronic Theses and Dissertations

Recently, gene set analysis has become the first choice for gaining insights into the underlying complex biology of diseases through high-throughput genomic studies, such as Microarrays, bulk RNA-Sequencing, single cell RNA-Sequencing, etc. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. Further, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. Hence, a comprehensive overview of the available gene set analysis approaches used for different high-throughput genomic studies is provided. The analysis of gene sets is usually carried out based on …


Gene Expression Profiling In Salmonella Choleraesuis-Infected Porcine Lung Using A Long Oligonucleotide Microarray, Shu-Hong Zhao, Daniel Kuhar, Joan K. Lunney, Harry Dawson, Catherine Guidry, Jolita J. Uthe, Shawn M. D. Bearson, Justin Recknor, Dan Nettleton, Christopher K. Tuggle Jul 2019

Dan Nettleton

Understanding the transcriptional response to pathogenic bacterial infection within food animals is of fundamental and applied interest. To determine the transcriptional response to Salmonella enterica serovar Choleraesuis (SC) infection, a 13,297-oligonucleotide swine array was used to analyze RNA from control, 24-h postinoculation (hpi), and 48-hpi porcine lung tissue from pigs infected with SC. In total, 57 genes showed differential expression (p < 0.001; false discovery rate = 12%). Quantitative real-time PCR (qRT-PCR) of 61 genes was used to confirm the microarray results and to identify pathways responding to infection. Of the 33 genes identified by microarray analysis as differentially expressed, 23 were confirmed by qRT-PCR results. A novel finding was that two transglutaminase family genes (TGM1 and TGM3) showed dramatic increases in expression postinoculation; combined with several other apoptotic genes, they indicated the induction of apoptotic pathways during SC infection. A predominant T helper 1-type immune response occurred during infection, with interferon …


Laser Microdissection Of Narrow Sheath Mutant Maize Uncovers Novel Gene Expression In The Shoot Apical Meristem, Xiaolan Zhang, Shahinez Madi, Lisa Borsuk, Dan Nettleton, Robert J. Elshire, Brent Buckner, Diane Janick-Buckner, Jon Beck, Marja Timmermans, Patrick S. Schnable, Michael J. Scanlon Jul 2019

Dan Nettleton

Microarrays enable comparative analyses of gene expression on a genomic scale; however, these experiments frequently identify an abundance of differentially expressed genes such that it may be difficult to identify discrete functional networks that are hidden within large microarray datasets. Microarray analyses in which mutant organisms are compared to nonmutant siblings can be especially problematic when the gene of interest is expressed in relatively few cells. Here, we describe the use of laser microdissection microarray to perform transcriptional profiling of the maize shoot apical meristem (SAM), a ~100-μm pillar of organogenic cells that is required for leaf initiation. Microarray analyses …


Scanning Microarrays At Multiple Intensities Enhances Discovery Of Differentially Expressed Genes, David S. Skibbe, Xiujuan Wang, Xuefeng Zhao, Lisa A. Borsuk, Dan Nettleton, Patrick S. Schnable Jul 2019

Dan Nettleton

Motivation: Scanning parameters are often overlooked when optimizing microarray experiments. A scanning approach that extends the dynamic data range by acquiring multiple scans of different intensities has been developed.

Results: Data from each of three scan intensities (low, medium, high) were analyzed separately using multiple scan and linear regression approaches to identify and compare the sets of genes that exhibit statistically significant differential expression. In the multiple scan approach only one-third of the differentially expressed genes were shared among the three intensities, and each scan intensity identified unique sets of differentially expressed genes. The set of differentially expressed genes from …


Microarray Gene Expression Profiles Of Fasting Induced Changes In Liver And Adipose Tissues Of Pigs Expressing The Melanocortin-4 Receptor D298N Variant, Sender Lkhagvadorj, Long Qu, Weiguo Cai, Oliver P. Couture, C. Richard Barb, Gary J. Hausman, Dan Nettleton, Lloyd L. Anderson, Jack C. M. Dekkers, Christopher K. Tuggle Jul 2019

Dan Nettleton

Transcriptional profiling coupled with blood metabolite analyses was used to identify porcine genes and pathways that respond to a fasting treatment or to a D298N missense mutation in the melanocortin-4 receptor (MC4R) gene. Gilts (12 homozygous for D298 and 12 homozygous for N298) were either fed ad libitum or fasted for 3 days. Fasting decreased body weight, backfat, and serum urea concentration and increased serum nonesterified fatty acid. In response to fasting, 7,029 genes in fat and 1,831 genes in liver were differentially expressed (DE). MC4R genotype did not significantly affect gene expression, body weight, backfat depth, or any measured …


Analysis Of Porcine Transcriptional Response To Salmonella Enterica Serovar Choleraesuis Suggests Novel Targets Of NFkappaB Are Activated In The Mesenteric Lymph Node, Yanfang Wang, Oliver P. Couture, Long Qu, Jolita J. Uthe, Shawn M. D. Bearson, Daniel Kuhar, Joan K. Lunney, Dan Nettleton, Jack C. M. Dekkers, Christopher K. Tuggle Jul 2019

Dan Nettleton

Background: Specific knowledge of the molecular pathways controlling host-pathogen interactions can increase our understanding of immune response biology as well as provide targets for drug development and genetic improvement of disease resistance. Toward this end, we have characterized the porcine transcriptional response to Salmonella enterica serovar Choleraesuis (S. Choleraesuis), a Salmonella serovar that predominately colonizes swine, yet can cause serious infections in human patients. Affymetrix technology was used to screen for differentially expressed genes in pig mesenteric lymph nodes (MLN) responding to infection with S. Choleraesuis at acute (8 hours (h), 24 h and 48 h post-inoculation (pi)) and chronic …


Comparative Gene Expression Profiles Between Heterotic And Non-Heterotic Hybrids Of Tetraploid Medicago Sativa, Xuehui Li, Yanling Wei, Dan Nettleton, E. Charles Brummer Jul 2019

Dan Nettleton

Background: Heterosis, the superior performance of hybrids relative to parents, has clear agricultural value, but its genetic control is unknown. Our objective was to test the hypotheses that hybrids expressing heterosis for biomass yield would show more gene expression levels that were different from midparental values and outside the range of parental values than hybrids that do not exhibit heterosis.

Results: We tested these hypotheses in three Medicago sativa (alfalfa) genotypes and their three hybrids, two of which expressed heterosis for biomass yield and a third that did not, using Affymetrix M. truncatula GeneChip arrays. Alfalfa hybridized to approximately 47% …


Distinct Peripheral Blood RNA Responses To Salmonella In Pigs Differing In Salmonella Shedding Levels: Intersection Of IFNG, TLR And miRNA Pathways, Ting-Hua Huang, Jolita J. Uthe, Shawn M. D. Bearson, Cumhur Yusuf Demirkale, Dan Nettleton, Susan Knetter, Curtis Christian, Amanda E. Ramer-Tait, Michael J. Wannemuehler, Christopher K. Tuggle Jul 2019

Dan Nettleton

Transcriptomic analysis of the response to bacterial pathogens has been reported for several species, yet few studies have investigated the transcriptional differences in whole blood in subjects that differ in their disease response phenotypes. Salmonella species infect many vertebrate species, and pigs colonized with Salmonella enterica serovar Typhimurium (ST) are usually asymptomatic, making detection of these Salmonella-carrier pigs difficult. The variable fecal shedding of Salmonella is an important cause of foodborne illness and zoonotic disease. To investigate gene pathways and biomarkers associated with the variance in Salmonella shedding following experimental inoculation, we initiated the first analysis of the whole …


Unique Genome-Wide Transcriptome Profiles Of Chicken Macrophages Exposed To Salmonella-Derived Endotoxin, Ceren Ciraci, Christopher K. Tuggle, Michael J. Wannemuehler, Dan Nettleton, Susan J. Lamont Jul 2019

Dan Nettleton

Background: Macrophages play essential roles in both innate and adaptive immune responses. Bacteria require endotoxin, a complex lipopolysaccharide, for outer membrane permeability and the host interprets endotoxin as a signal to initiate an innate immune response. The focus of this study is kinetic and global transcriptional analysis of the chicken macrophage response to in vitro stimulation with endotoxin from Salmonella typhimurium-798.

Results: The 38535-probeset Affymetrix GeneChip Chicken Genome array was used to profile transcriptional response to endotoxin 1, 2, 4, and 8 hours post stimulation (hps). Using a maximum FDR (False Discovery Rate) of 0.05 to declare genes as differentially …


Feature Selection For Longitudinal Data By Using Sign Averages To Summarize Gene Expression Values Over Time, Suyan Tian, Chi Wang Mar 2019

Biostatistics Faculty Publications

With the rapid evolution of high-throughput technologies, time series/longitudinal high-throughput experiments have become possible and affordable. However, the development of statistical methods dealing with gene expression profiles across time points has not kept up with the explosion of such data. The feature selection process is of critical importance for longitudinal microarray data. In this study, we proposed aggregating a gene’s expression values across time into a single value using the sign average method, thereby degrading a longitudinal feature selection process into a classic one. Regularized logistic regression models with pseudogenes (i.e., the sign average of genes across time as predictors) …


Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan Mar 2019

COBRA Preprint Series

One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes which provide insight into the disease's process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins resulting in enormous data sets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards …


Non-Invasive Analysis Of The Sputum Transcriptome Discriminates Clinical Phenotypes Of Asthma, Xiting Yan Jan 2019

Yale Day of Data

Whole transcriptome-wide gene expression profiles in the sputum and circulation from 100 asthma patients were measured using the Affymetrix HuGene 1.0ST arrays. Unsupervised clustering analysis based on pathways from KEGG was used to identify TEA clusters of patients from the sputum gene expression profiles. The identified TEA clusters have significantly different pre-bronchodilator FEV1, bronchodilator responsiveness, exhaled nitric oxide levels, history of hospitalization for asthma and history of intubation. Evaluation of TEA clusters in children from the Asthma BRIDGE cohort confirmed the identified differences in intubation and hospitalization. Furthermore, evaluation of the TH2 gene signatures suggested a much lower prevalence of …


A Novel Pathway-Based Distance Score Enhances Assessment Of Disease Heterogeneity In Gene Expression, Yunqing Liu, Xiting Yan Jan 2019

Yale Day of Data

Distance-based unsupervised clustering of gene expression data is commonly used to identify heterogeneity in biologic samples. However, high noise levels in gene expression data and the relatively high correlation between genes are often encountered, so traditional distances such as Euclidean distance may not be effective at discriminating the biological differences between samples. In this study, we developed a novel computational method to assess the biological differences based on pathways by assuming that ontologically defined biological pathways in biologically similar samples have similar behavior. Application of this distance score results in more accurate, robust, and biologically meaningful clustering results in both …


Microarray Data Analysis And Classification Of Cancers, Grant Gates Jan 2019

Williams Honors College, Honors Research Projects

When it comes to cancer, there is no standardized approach for identifying new cancer classes nor is there a standardized approach for assigning cancer tumors to existing classes. These two ideas are known as class discovery and class prediction. For a cancer patient to receive proper treatment, it is important that the type of cancer be accurately identified. For my Senior Honors Project, I would like to use this opportunity to research a topic in bioinformatics. Bioinformatics incorporates a few different subjects into one including biology, computer science and statistics. An intricate method for class discovery and class prediction is …


Power In Pairs: Assessing The Statistical Value Of Paired Samples In Tests For Differential Expression, John R. Stevens, Jennifer S. Herrick, Roger K. Wolff, Martha L. Slattery Dec 2018

Mathematics and Statistics Faculty Publications

Background: When genomics researchers design a high-throughput study to test for differential expression, some biological systems and research questions provide opportunities to use paired samples from subjects, and researchers can plan for a certain proportion of subjects to have paired samples. We consider the effect of this paired samples proportion on the statistical power of the study, using characteristics of both count (RNA-Seq) and continuous (microarray) expression data from a colorectal cancer study.

Results: We demonstrate that a higher proportion of subjects with paired samples yields higher statistical power, for various total numbers of samples, and for various strengths of …


Computational Modelling Of Human Transcriptional Regulation By An Information Theory-Based Approach, Ruipeng Lu Apr 2018

Electronic Thesis and Dissertation Repository

ChIP-seq experiments can identify the genome-wide binding site motifs of a transcription factor (TF) and determine its sequence specificity. Multiple algorithms were developed to derive TF binding site (TFBS) motifs from ChIP-seq data, including the entropy minimization-based Bipad that can derive both contiguous and bipartite motifs. Prior studies applying these algorithms to ChIP-seq data only analyzed a small number of top peaks with the highest signal strengths, biasing their resultant position weight matrices (PWMs) towards consensus-like, strong binding sites; nor did they derive bipartite motifs, disabling the accurate modelling of binding behavior of dimeric TFs.

This thesis presents a novel …


hpcNMF: A High-Performance Toolbox For Non-Negative Matrix Factorization, Karthik Devarajan, Guoli Wang Feb 2016

COBRA Preprint Series

Non-negative matrix factorization (NMF) is a widely used machine learning algorithm for dimension reduction of large-scale data. It has found successful applications in a variety of fields such as computational biology, neuroscience, natural language processing, information retrieval, image processing and speech recognition. In bioinformatics, for example, it has been used to extract patterns and profiles from genomic and text-mining data as well as in protein sequence and structure analysis. While the scientific performance of NMF is very promising in dealing with high dimensional data sets and complex data structures, its computational cost is high and sometimes could be critical for …


Models For HSV Shedding Must Account For Two Levels Of Overdispersion, Amalia Magaret Jan 2016

UW Biostatistics Working Paper Series

We have frequently implemented crossover studies to evaluate new therapeutic interventions for genital herpes simplex virus infection. The outcome measured to assess the efficacy of interventions on herpes disease severity is the viral shedding rate, defined as the frequency of detection of HSV on the genital skin and mucosa. We performed a simulation study to ascertain whether our standard model, which we have used previously, was appropriately considering all the necessary features of the shedding data to provide correct inference. We simulated shedding data under our standard, validated assumptions and assessed the ability of 5 different models to reproduce the …


Graph-Based Regularization In Machine Learning: Discovering Driver Modules In Biological Networks, Xi Gao Jan 2015

Theses and Dissertations

Curiosity of human nature drives us to explore the origins of what makes each of us different. From ancient legends and mythology, Mendel's law, Punnett square to modern genetic research, we carry on this old but eternal question. Thanks to technological revolution, today's scientists try to answer this question using easily measurable gene expression and other profiling data. However, the exploration can easily get lost in the data of growing volume, dimension, noise and complexity. This dissertation is aimed at developing new machine learning methods that take data from different classes as input, augment them with knowledge of feature relationships, …


Bayesian Joint Selection Of Genes And Pathways: Applications In Multiple Myeloma Genomics, Lin Zhang, Jeffrey S. Morris, Jiexin Zhang, Robert Orlowski, Veerabhadran Baladandayuthapani Jan 2014

Jeffrey S. Morris

It is well-established that the development of a disease, especially cancer, is a complex process that results from the joint effects of multiple genes involved in various molecular signaling pathways. In this article, we propose methods to discover genes and molecular pathways significantly associated with clinical outcomes in cancer samples. We exploit the natural hierarchal structure of genes related to a given pathway as a group of interacting genes to conduct selection of both pathways and genes. We posit the problem in a hierarchical structured variable selection (HSVS) framework to analyze the corresponding gene expression data. HSVS methods conduct …


Methods For Integrative Analysis Of Genomic Data, Paul Manser Jan 2014

Theses and Dissertations

In recent years, the development of new genomic technologies has allowed for the investigation of many regulatory epigenetic marks besides expression levels, on a genome-wide scale. As the price for these technologies continues to decrease, study sizes will not only increase, but several different assays are beginning to be used for the same samples. It is therefore desirable to develop statistical methods to integrate multiple data types that can handle the increased computational burden of incorporating large data sets. Furthermore, it is important to develop sound quality control and normalization methods as technical errors can compound when integrating multiple genomic …


Global Quantitative Assessment Of The Colorectal Polyp Burden In Familial Adenomatous Polyposis Using A Web-Based Tool, Patrick M. Lynch, Jeffrey S. Morris, William A. Ross, Miguel A. Rodriguez-Bigas, Juan Posadas, Rossa Khalaf, Diane M. Weber, Valerie O. Sepeda, Bernard Levin, Imad Shureiqi Jan 2013

Jeffrey S. Morris

Background: Accurate measures of the total polyp burden in familial adenomatous polyposis (FAP) are lacking. Current assessment tools include polyp quantitation in limited-field photographs and qualitative total colorectal polyp burden by video.

Objective: To develop global quantitative tools of the FAP colorectal adenoma burden.

Design: A single-arm, phase II trial.

Patients: Twenty-seven patients with FAP.

Intervention: Treatment with celecoxib for 6 months, with before-treatment and after-treatment videos posted to an intranet with an interactive site for scoring.

Main Outcome Measurements: Global adenoma counts and sizes (grouped into categories: less than 2 mm, 2-4 mm, and greater than 4 mm) were …


Bayesian Methods For Expression-Based Integration, Elizabeth M. Jennings, Jeffrey S. Morris, Raymond J. Carroll, Ganiraju C. Manyam, Veera Baladandayuthapani Dec 2012

Jeffrey S. Morris

We propose methods to integrate data across several genomic platforms using a hierarchical Bayesian analysis framework that incorporates the biological relationships among the platforms to identify genes whose expression is related to clinical outcomes in cancer. This integrated approach combines information across all platforms, leading to increased statistical power in finding these predictive genes, and further provides mechanistic information about the manner in which the gene affects the outcome. We demonstrate the advantages of the shrinkage estimation used by this approach through a simulation, and finally, we apply our method to a Glioblastoma Multiforme dataset and identify several genes potentially …


Statistical Methods For Proteomic Biomarker Discovery Based On Feature Extraction Or Functional Modeling Approaches, Jeffrey S. Morris Jan 2012

Jeffrey S. Morris

In recent years, developments in molecular biotechnology have led to the increased promise of detecting and validating biomarkers, or molecular markers that relate to various biological or medical outcomes. Proteomics, the direct study of proteins in biological samples, plays an important role in the biomarker discovery process. These technologies produce complex, high dimensional functional and image data that present many analytical challenges that must be addressed properly for effective comparative proteomics studies that can yield potential biomarkers. Specific challenges include experimental design, preprocessing, feature extraction, and statistical analysis accounting for the inherent multiple testing issues. This paper reviews various computational …


Integrative Bayesian Analysis Of High-Dimensional Multi-Platform Genomics Data, Wenting Wang, Veerabhadran Baladandayuthapani, Jeffrey S. Morris, Bradley M. Broom, Ganiraju C. Manyam, Kim-Anh Do Jan 2012

Jeffrey S. Morris

Motivation: Analyzing data from multi-platform genomics experiments combined with patients’ clinical outcomes helps us understand the complex biological processes that characterize a disease, as well as how these processes relate to the development of the disease. Current integration approaches that treat the data are limited in that they do not consider the fundamental biological relationships that exist among the data from platforms.

Statistical Model: We propose an integrative Bayesian analysis of genomics data (iBAG) framework for identifying important genes/biomarkers that are associated with clinical outcome. This framework uses a hierarchical modeling technique to combine the data obtained from multiple platforms …


A Bayesian Model Averaging Approach For Observational Gene Expression Studies, Xi Kathy Zhou, Fei Liu, Andrew J. Dannenberg Jun 2011

COBRA Preprint Series

Identifying differentially expressed (DE) genes associated with a sample characteristic is the primary objective of many microarray studies. As more and more studies are carried out with observational rather than well controlled experimental samples, it becomes important to evaluate and properly control the impact of sample heterogeneity on DE gene finding. Typical methods for identifying DE genes require ranking all the genes according to a pre-selected statistic based on a single model for two or more group comparisons, with or without adjustment for other covariates. Such single model approaches unavoidably result in model misspecification, which can lead to increased error …


Clustering With Exclusion Zones: Genomic Applications, Mark Segal, Yuanyuan Xiao, Fred Huffer Dec 2010

Mark R Segal

Methods for formally evaluating the clustering of events in space or time, notably the scan statistic, have been richly developed and widely applied. In order to utilize the scan statistic and related approaches, it is necessary to know the extent of the spatial or temporal domains wherein the events arise. Implicit in their usage is that these domains have no “holes”—hereafter “exclusion zones”—regions in which events a priori cannot occur. However, in many contexts, this requirement is not met. When the exclusion zones are known, it is straightforward to correct the scan statistic for their occurrence by simply adjusting the …


Minimum Description Length Measures Of Evidence For Enrichment, Zhenyu Yang, David R. Bickel Dec 2010

COBRA Preprint Series

In order to functionally interpret differentially expressed genes or other discovered features, researchers seek to detect enrichment in the form of overrepresentation of discovered features associated with a biological process. Most enrichment methods treat the p-value as the measure of evidence using a statistical test such as the binomial test, Fisher's exact test or the hypergeometric test. However, the p-value is not interpretable as a measure of evidence apart from adjustments in light of the sample size. As a measure of evidence supporting one hypothesis over the other, the Bayes factor (BF) overcomes this drawback of the p-value but lacks …
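For readers unfamiliar with the conventional p-value approach that the abstract above contrasts with the Bayes factor, a one-sided hypergeometric test for overrepresentation can be sketched as below. This illustrates only the standard baseline, not the minimum description length measure the preprint proposes; the function name, argument names, and counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_universe, n_in_category, n_discovered, n_overlap):
    """One-sided hypergeometric p-value for overrepresentation.

    Returns P(X >= n_overlap), where X counts how many of the n_discovered
    features fall in a category of size n_in_category when the features are
    drawn without replacement from a universe of n_universe features.
    """
    upper = min(n_in_category, n_discovered)
    total = comb(n_universe, n_discovered)
    tail = sum(
        comb(n_in_category, k) * comb(n_universe - n_in_category, n_discovered - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 genes measured, 3 annotated to a biological process,
# 3 declared differentially expressed, and all 3 of those are in the process.
p_value = hypergeom_enrichment_pvalue(10, 3, 3, 3)
```

With these toy counts only one of the 120 possible draws of three genes lies entirely inside the category, so the p-value is 1/120. For this one-sided question the result matches a one-sided Fisher's exact test on the corresponding 2x2 table.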