Non-Invasive Analysis Of The Sputum Transcriptome Discriminates Clinical Phenotypes Of Asthma, 2019 Yale University School of Medicine
Non-Invasive Analysis Of The Sputum Transcriptome Discriminates Clinical Phenotypes Of Asthma, Xiting Yan
Yale Day of Data
Whole-transcriptome gene expression profiles in the sputum and circulation from 100 asthma patients were measured using Affymetrix HuGene 1.0ST arrays. Unsupervised clustering analysis based on KEGG pathways was used to identify TEA clusters of patients from the sputum gene expression profiles. The identified TEA clusters have significantly different pre-bronchodilator FEV1, bronchodilator responsiveness, exhaled nitric oxide levels, history of hospitalization for asthma and history of intubation. Evaluation of TEA clusters in children from the Asthma BRIDGE cohort confirmed the identified differences in intubation and hospitalization. Furthermore, evaluation of the TH2 gene signatures suggested a much lower prevalence ...
A Novel Pathway-Based Distance Score Enhances Assessment Of Disease Heterogeneity In Gene Expression, 2019 Yale University School of Public Health
A Novel Pathway-Based Distance Score Enhances Assessment Of Disease Heterogeneity In Gene Expression, Yunqing Liu, Xiting Yan
Yale Day of Data
Distance-based unsupervised clustering of gene expression data is commonly used to identify heterogeneity in biologic samples. However, high noise levels in gene expression data and the relatively high correlation between genes are often encountered, so traditional distances such as Euclidean distance may not be effective at discriminating the biological differences between samples. In this study, we developed a novel computational method to assess the biological differences based on pathways by assuming that ontologically defined biological pathways in biologically similar samples have similar behavior. Application of this distance score results in more accurate, robust, and biologically meaningful clustering results in both ...
Power In Pairs: Assessing The Statistical Value Of Paired Samples In Tests For Differential Expression, 2018 Utah State University
Power In Pairs: Assessing The Statistical Value Of Paired Samples In Tests For Differential Expression, John R. Stevens, Jennifer S. Herrick, Roger K. Wolff, Martha L. Slattery
Mathematics and Statistics Faculty Publications
Background: When genomics researchers design a high-throughput study to test for differential expression, some biological systems and research questions provide opportunities to use paired samples from subjects, and researchers can plan for a certain proportion of subjects to have paired samples. We consider the effect of this paired samples proportion on the statistical power of the study, using characteristics of both count (RNA-Seq) and continuous (microarray) expression data from a colorectal cancer study.
Results: We demonstrate that a higher proportion of subjects with paired samples yields higher statistical power, for various total numbers of samples, and for various strengths of ...
Innate Immunity, The Hepatic Extracellular Matrix, And Liver Injury: Mathematical Modeling Of Metastatic Potential And Tumor Development In Alcoholic Liver Disease., Shanice V. Hudson
Electronic Theses and Dissertations
The overarching goals of the current work are to fill key gaps in the current understanding of alcohol consumption and the risk of metastasis to the liver. Considering the evidence this research group has compiled confirming that the hepatic matrisome responds dynamically to injury, an altered extracellular matrix (ECM) profile appears to be a key feature of pre-fibrotic inflammatory injury in the liver. This group has demonstrated that the hepatic ECM responds dynamically to alcohol exposure, in particular, sensitizing the liver to LPS-induced inflammatory damage. Although the study of alcohol in its role as a contributing factor to oncogenesis and ...
Predictions Generated From A Simulation Engine For Gene Expression Micro-Arrays For Use In Research Laboratories, 2018 Southern Methodist University
Predictions Generated From A Simulation Engine For Gene Expression Micro-Arrays For Use In Research Laboratories, Gopinath R. Mavankal, John Blevins, Dominique Edwards, Monnie Mcgee, Andrew Hardin
SMU Data Science Review
In this paper we introduce the technical components, the biology and the data science involved in the use of microarray technology in biological and clinical research. We discuss how the laborious experimental protocols used in laboratories to obtain these data could benefit from simulations of the data. We discuss the approach used in the simulation engine from . We use this simulation engine to generate a prediction tool in Power BI, a Microsoft business intelligence tool for analytics and data visualization. This tool could be used in any laboratory using micro-arrays to improve experimental design by comparing how predicted ...
Analysis Challenges For High Dimensional Data, 2018 The University of Western Ontario
Analysis Challenges For High Dimensional Data, Bangxin Zhao
Electronic Thesis and Dissertation Repository
In this thesis, we propose new methodologies targeting the areas of high-dimensional variable screening, influence measures and post-selection inference. We propose a new estimator for the correlation between the response and high-dimensional predictor variables, and based on this estimator we develop a new screening technique, termed Dynamic Tilted Current Correlation Screening (DTCCS), for high-dimensional variable screening. DTCCS is capable of picking up the relevant predictor variables within a finite number of steps. The DTCCS method takes the popular sure independence screening (SIS) method and the high-dimensional ordinary least squares projection (HOLP) approach as its special cases.
Two methods ...
A Comparison Of Unsupervised Methods For Dna Microarray Leukemia Data, 2018 East Tennessee State University
A Comparison Of Unsupervised Methods For Dna Microarray Leukemia Data, Denise Harness
Appalachian Student Research Forum
Advancements in DNA microarray data sequencing have created the need for sophisticated machine learning algorithms and feature selection methods. Probabilistic graphical models, in particular, have been used to identify whether microarrays or genes cluster together in groups of individuals having a similar diagnosis. These clusters of genes are informative, but can be misleading when every gene is used in the calculation. First, feature reduction techniques are explored; however, the size and nature of the data prevent traditional techniques from working efficiently. Our method is to use the partial correlations between the features to create a precision matrix and predict which ...
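The link between a precision matrix and partial correlations mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard identity rho_ij = -p_ij / sqrt(p_ii * p_jj) for a precision matrix P; the function name `partial_correlations` and the example matrix are hypothetical and are not taken from the cited work.

```python
import math

def partial_correlations(precision):
    """Convert a precision (inverse covariance) matrix into partial correlations.

    Uses the standard identity rho_ij = -p_ij / sqrt(p_ii * p_jj);
    the diagonal is set to 1.0 by convention.
    """
    n = len(precision)
    result = []
    for i in range(n):
        row = []
        for j in range(n):
            if i == j:
                row.append(1.0)
            else:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                row.append(-precision[i][j] / scale)
        result.append(row)
    return result

# Hypothetical 3-feature precision matrix: features 0 and 2 are
# conditionally independent given feature 1 (zero off-diagonal entry).
example_precision = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]
example_partial = partial_correlations(example_precision)
```

A zero entry in the precision matrix maps to a zero partial correlation, which is why sparse precision matrices are commonly read as graphs of conditional dependence between features.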
Contributions To Statistical Testing, Prediction, And Modeling, 2017 University of New Mexico
Contributions To Statistical Testing, Prediction, And Modeling, John C. Pesko
Mathematics & Statistics ETDs
1. "Parametric Bootstrap (PB) and Objective Bayesian (OB) Testing with Applications to Heteroscedastic ANOVA": For one-way heteroscedastic ANOVA, we show a close relationship between the PB and OB approaches to significance testing, demonstrating the conditions for which the two approaches are equivalent. Using a simulation study, PB and OB performance is compared to a test based on the predictive distribution as well as the unweighted test of Akritas & Papadatos (2004). We extend this work to the RCBD with subsampling model, and prove a repeated sampling property and large sample property for general OB significance testing.
2. "Early Identification of Binswanger ...
The Generalized Monotone Incremental Forward Stagewise Method For Modeling Longitudinal, Clustered, And Overdispersed Count Data: Application Predicting Nuclear Bud And Micronuclei Frequencies, 2017 Virginia Commonwealth University
The Generalized Monotone Incremental Forward Stagewise Method For Modeling Longitudinal, Clustered, And Overdispersed Count Data: Application Predicting Nuclear Bud And Micronuclei Frequencies, Rebecca Lehman
Theses and Dissertations
With the influx of high-dimensional data there is an immediate need for statistical methods that are able to handle situations when the number of predictors greatly exceeds the number of samples. One such area of growth is in examining how environmental exposures to toxins impact the body long term. The cytokinesis-block micronucleus assay can measure the genotoxic effect of exposure as a count outcome. To investigate potential biomarkers, high-throughput assays that assess gene expression and methylation have been developed. It is of interest to identify biomarkers or molecular features that are associated with elevated micronuclei (MN) or nuclear bud (Nbud ...
Integration Of Multi-Platform High-Dimensional Omic Data, 2016 The University of Texas Graduate School of Biomedical Sciences at Houston
Integration Of Multi-Platform High-Dimensional Omic Data, Xuebei An
UT GSBS Dissertations and Theses (Open Access)
The development of high-throughput biotechnologies has made data accessible from different platforms, including RNA sequencing, copy number variation, DNA methylation, protein lysate arrays, etc. The high-dimensional omic data derived from different technological platforms have been extensively used to facilitate comprehensive understanding of disease mechanisms and to determine personalized health treatments. Although vital to the progress of clinical research, the high-dimensional multi-platform data impose new challenges for data analysis. Numerous approaches have been proposed to integrate multi-platform omic data; however, few have efficiently and simultaneously addressed the problems that arise from high dimensionality and complex correlations.
In my dissertation, I ...
Hpcnmf: A High-Performance Toolbox For Non-Negative Matrix Factorization, 2016 Fox Chase Cancer Center
Hpcnmf: A High-Performance Toolbox For Non-Negative Matrix Factorization, Karthik Devarajan, Guoli Wang
COBRA Preprint Series
Non-negative matrix factorization (NMF) is a widely used machine learning algorithm for dimension reduction of large-scale data. It has found successful applications in a variety of fields such as computational biology, neuroscience, natural language processing, information retrieval, image processing and speech recognition. In bioinformatics, for example, it has been used to extract patterns and profiles from genomic and text-mining data as well as in protein sequence and structure analysis. While the scientific performance of NMF is very promising in dealing with high dimensional data sets and complex data structures, its computational cost is high and sometimes could be critical for ...
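To make the factorization idea concrete, here is a minimal pure-Python sketch of NMF using the classic multiplicative update rules for the squared (Frobenius) error. This is an illustrative assumption only: it is not the hpcNMF toolbox, and the helper names (`nmf`, `matmul`, `frobenius_error`), the small matrix, and the iteration count are hypothetical.

```python
import random

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate non-negative V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start so the multiplicative updates can move.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors

# Hypothetical non-negative data matrix (4 samples x 3 features).
data = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 1.0, 2.0], [4.0, 3.0, 5.0]]
w_factor, h_factor, error_trace = nmf(data, rank=2)
```

Each iteration costs several dense matrix products, which gives a sense of why the computational cost noted in the abstract grows quickly on large, high-dimensional data sets.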
Models For Hsv Shedding Must Account For Two Levels Of Overdispersion, 2016 University of Washington - Seattle Campus
Models For Hsv Shedding Must Account For Two Levels Of Overdispersion, Amalia Magaret
UW Biostatistics Working Paper Series
We have frequently implemented crossover studies to evaluate new therapeutic interventions for genital herpes simplex virus infection. The outcome measured to assess the efficacy of interventions on herpes disease severity is the viral shedding rate, defined as the frequency of detection of HSV on the genital skin and mucosa. We performed a simulation study to ascertain whether our standard model, which we have used previously, was appropriately considering all the necessary features of the shedding data to provide correct inference. We simulated shedding data under our standard, validated assumptions and assessed the ability of 5 different models to reproduce the ...
Development In Normal Mixture And Mixture Of Experts Modeling, 2016 University of Kentucky
Development In Normal Mixture And Mixture Of Experts Modeling, Meng Qi
Theses and Dissertations--Statistics
In this dissertation, we first consider the problem of testing homogeneity and order in a contaminated normal model when the data are correlated under some known covariance structure. To address this problem, we developed a moment-based homogeneity and order test, and designed weights for the test statistics to increase the power of the homogeneity test. We applied our test to microarray data on Down’s syndrome. This dissertation also studies a singular Bayesian information criterion (sBIC) for a bivariate hierarchical mixture model with varying weights, and develops a new data-dependent information criterion (sFLIC). We apply our model and criteria to birth-weight ...
A Weighted Gene Co-Expression Network Analysis For Streptococcus Sanguinis Microarray Experiments, 2016 Virginia Commonwealth University
A Weighted Gene Co-Expression Network Analysis For Streptococcus Sanguinis Microarray Experiments, Erik C. Dvergsten
Theses and Dissertations
Streptococcus sanguinis is a gram-positive, non-motile bacterium native to human mouths. It is the primary cause of endocarditis and is also responsible for tooth decay. Two-component systems (TCSs) are commonly found in bacteria. In response to environmental signals, TCSs may regulate the expression of virulence factor genes.
Gene co-expression networks are exploratory tools used to analyze system-level gene functionality. A gene co-expression network consists of gene expression profiles represented as nodes and gene connections, which occur if two genes are significantly co-expressed. An adjacency function transforms the similarity matrix containing co-expression similarities into the adjacency matrix containing connection strengths. Gene ...
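The similarity-to-adjacency step described in this abstract can be sketched as follows. The sketch assumes absolute Pearson correlation as the co-expression similarity and a power ("soft-threshold") adjacency function a_ij = s_ij^beta, a common choice in weighted co-expression analysis; the abstract does not say which adjacency function the thesis uses, and the function names, toy profiles, and beta = 6 are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def similarity_matrix(profiles):
    """Co-expression similarity: absolute Pearson correlation per gene pair."""
    n = len(profiles)
    return [[abs(pearson(profiles[i], profiles[j])) for j in range(n)]
            for i in range(n)]

def power_adjacency(similarity, beta=6):
    """Soft-threshold adjacency: raise each similarity to the power beta."""
    return [[s ** beta for s in row] for row in similarity]

# Hypothetical expression profiles: 3 genes measured in 4 samples.
toy_profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with gene 0
    [4.0, 1.0, 3.0, 2.0],   # weakly (negatively) correlated with gene 0
]
toy_similarity = similarity_matrix(toy_profiles)
toy_adjacency = power_adjacency(toy_similarity, beta=6)
```

With beta = 6, the weak similarity of 0.4 between genes 0 and 2 shrinks to a connection strength of about 0.004, while the perfect correlation between genes 0 and 1 stays at 1. Strong connections are kept and weak ones suppressed, without imposing a hard cutoff.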
Transcriptomic Analyses Of Onecut1 And Onecut2 Deficient Retinas, 2015 Iowa State University
Transcriptomic Analyses Of Onecut1 And Onecut2 Deficient Retinas, Jillian J. Goetz, Jeffrey M. Trimarchi
Genetics, Development and Cell Biology Publications
In this article, we further explore the data generated for the research article “Onecut1 and Onecut2 play critical roles in the development of the mouse retina”. To better understand the functionality of the Onecut family of transcription factors in retinogenesis, we investigated the retinal transcriptomes of developing and mature mice to identify genes with differential expression. This data article reports the full transcriptomes resulting from these experiments and provides tables detailing the differentially expressed genes between wildtype and Onecut1 or 2 deficient retinas. The raw array data of our transcriptomes as generated using Affymetrix microarrays are available on the NCBI ...
Bayesian Joint Selection Of Genes And Pathways: Applications In Multiple Myeloma Genomics, 2014 The University of Texas MD Anderson Cancer Center
Bayesian Joint Selection Of Genes And Pathways: Applications In Multiple Myeloma Genomics, Lin Zhang, Jeffrey S. Morris, Jiexin Zhang, Robert Orlowski, Veerabhadran Baladandayuthapani
Jeffrey S. Morris
It is well established that the development of a disease, especially cancer, is a complex process that results from the joint effects of multiple genes involved in various molecular signaling pathways. In this article, we propose methods to discover genes and molecular pathways significantly associated with clinical outcomes in cancer samples. We exploit the natural hierarchical structure of genes related to a given pathway as a group of interacting genes to conduct selection of both pathways and genes. We posit the problem in a hierarchical structured variable selection (HSVS) framework to analyze the corresponding gene expression data. HSVS methods conduct ...
Normal Mixture And Contaminated Model With Nuisance Parameter And Applications, 2014 University of Kentucky
Normal Mixture And Contaminated Model With Nuisance Parameter And Applications, Qian Fan
Theses and Dissertations--Statistics
This paper intends to find the proper hypothesis and test statistic for testing the existence of bilateral contamination when a nuisance parameter is present. The test statistic is based on method-of-moments estimators. A union-intersection test is used to test whether the distribution of the population can be described by a bilaterally contaminated normal model with unknown variance. This paper also develops a hierarchical normal mixture model (HNM) and applies it to birth weight data. The EM algorithm is employed for parameter estimation, and a singular Bayesian information criterion (sBIC) is applied to choose the number of components. We also proposed a singular flexible information ...
Contaminated Chi-Square Modeling And Its Application In Microarray Data Analysis, 2014 University of Kentucky
Contaminated Chi-Square Modeling And Its Application In Microarray Data Analysis, Feng Zhou
Theses and Dissertations--Statistics
Mixture modeling has numerous applications. One of particular interest is microarray data analysis. My dissertation research is focused on Contaminated Chi-Square (CCS) modeling and its application to microarray data. A moment-based method and two likelihood-based methods, the Modified Likelihood Ratio Test (MLRT) and the Expectation-Maximization (EM) Test, are developed for testing the omnibus null hypothesis of no contamination of a central chi-square distribution by a non-central chi-square distribution. When the omnibus null hypothesis is rejected, we further developed the moment-based test and the EM test for testing an extra component to the Contaminated Chi-Square (CCS+EC) Model. The moment-based approach is easy ...
Methods For Integrative Analysis Of Genomic Data, 2014 Virginia Commonwealth University
Methods For Integrative Analysis Of Genomic Data, Paul Manser
Theses and Dissertations
In recent years, the development of new genomic technologies has allowed for the investigation of many regulatory epigenetic marks besides expression levels, on a genome-wide scale. As the price for these technologies continues to decrease, study sizes will not only increase, but several different assays are beginning to be used for the same samples. It is therefore desirable to develop statistical methods to integrate multiple data types that can handle the increased computational burden of incorporating large data sets. Furthermore, it is important to develop sound quality control and normalization methods as technical errors can compound when integrating multiple genomic ...
Integrative Biomarker Identification And Classification Using High Throughput Assays, 2013 The University of Texas Graduate School of Biomedical Sciences at Houston
Integrative Biomarker Identification And Classification Using High Throughput Assays, Pan Tong
UT GSBS Dissertations and Theses (Open Access)
It is well accepted that tumorigenesis is a multi-step procedure involving aberrant functioning of genes regulating cell proliferation, differentiation, apoptosis, genome stability, angiogenesis and motility. To obtain a full understanding of tumorigenesis, it is necessary to collect information on all aspects of cell activity. Recent advances in high-throughput technologies allow biologists to generate massive amounts of data, more than might have been imagined decades ago. These advances have made it possible to launch comprehensive projects such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), which systematically characterize the molecular fingerprints of cancer cells using gene expression, methylation, copy number, microRNA and SNP microarrays ...