Open Access. Powered by Scholars. Published by Universities.®

Microarrays Commons

Articles 1 - 30 of 143

Full-Text Articles in Microarrays

Bayesian Methods For Graphical Models With Neighborhood Selection., Sagnik Bhadury Dec 2022

Electronic Theses and Dissertations

Graphical models determine associations between variables through the notion of conditional independence. Gaussian graphical models are a widely used class of such models, where the relationships are formalized by non-null entries of the precision matrix. However, in high-dimensional cases, covariance estimates are typically unstable. Moreover, it is natural to expect only a few significant associations to be present in many realistic applications. This necessitates the injection of sparsity techniques into the estimation method. Classical frequentist methods, like GLASSO, use penalization techniques for this purpose. Fully Bayesian methods, on the contrary, are slow because they require iteratively sampling over a quadratic …


Beta Mixture And Contaminated Model With Constraints And Application With Micro-Array Data, Ya Qi Jan 2022

Theses and Dissertations--Statistics

This dissertation research concentrates on the Contaminated Beta (CB) model and its application in micro-array data analysis. The Modified Likelihood Ratio Test (MLRT) introduced by [Chen et al., 2001] is used for testing the omnibus null hypothesis of no contamination of Beta(1,1) ([Dai and Charnigo, 2008]). We design constraints for the two-component CB model, which put the mode toward the left end of the distribution to reflect the abundance of small p-values in micro-array data, to increase the test power. A three-component CB model might be useful when distinguishing highly differentially expressed genes from moderately differentially expressed genes. If the null hypothesis above …


Gene Set Testing By Distance Correlation, Sho-Hsien Su Dec 2020

Graduate Theses and Dissertations

Pathways are the functional building blocks of complex diseases such as cancers. Pathway-level studies may provide insights into some important biological processes. The gene set test is an important tool for studying the differential expression of a gene set between two groups, e.g., cancer vs. normal. The differential expression of a gene set could be due to a difference in mean, variability, or both. However, most existing gene set tests target only the mean difference and overlook other types of differential expression. In this thesis, we propose to use the recently developed distance correlation for gene set testing. To assess the …
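As an illustration of the distance correlation statistic named in this abstract, the following minimal sketch computes the sample distance correlation between two one-dimensional variables. It is not the gene set test proposed in the thesis (the abstract is truncated before the test is described); the function names are hypothetical and the toy data are made up.

```python
import math


def _centered_distances(values):
    """Return the double-centered matrix of pairwise absolute distances."""
    n = len(values)
    dist = [[abs(values[i] - values[j]) for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in dist]  # matrix is symmetric, so these are also column means
    grand_mean = sum(row_means) / n
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    # Squared sample distance covariance and distance variances.
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    dvar_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0  # at least one variable is constant
    return math.sqrt(max(dcov2, 0.0) / denom)


# Toy example: y depends on x only through its square, so the Pearson
# correlation is zero while the distance correlation is clearly positive.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]
print(distance_correlation(x, y))
```

The toy example shows why such a statistic is attractive for the situation the abstract describes: a dependence that leaves the mean relationship flat is invisible to Pearson correlation but not to distance correlation.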


Statistical Approaches Of Gene Set Analysis With Quantitative Trait Loci For High-Throughput Genomic Studies., Samarendra Das Dec 2020

Electronic Theses and Dissertations

Recently, gene set analysis has become the first choice for gaining insights into the underlying complex biology of diseases through high-throughput genomic studies, such as Microarrays, bulk RNA-Sequencing, single cell RNA-Sequencing, etc. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. Further, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. Hence, a comprehensive overview of the available gene set analysis approaches used for different high-throughput genomic studies is provided. The analysis of gene sets is usually carried out based on …


Detecting Differentially Co-Expressed Gene Modules Via The Edge-Count Test, Anne Gratius Lin Dec 2019

Graduate Theses and Dissertations

Background

Gene expression profiling by microarray has been used to uncover molecular variations in many different diseases. Complementary to conventional differential expression analysis, differential co-expression analysis can identify gene markers at the systematic and granular levels. There are three aspects of differential co-expression network analysis: global topological comparison of networks, identification of differential co-expression clusters, and identification of differentially co-expressed genes and gene pairs. To date, most of the available methods still rely on Pearson’s correlation coefficient despite its insensitivity to nonlinear relationships.

Results

Here we present an approach that is robust to nonlinearity by using the edge-count test for differential co-expression analysis. …


Classification Of Coronary Artery Disease In Non-Diabetic Patients Using Artificial Neural Networks, Demond Handley Oct 2019

Annual Symposium on Biomathematics and Ecology Education and Research

No abstract provided.


Gene Expression Profiling In Salmonella Choleraesuis-Infected Porcine Lung Using A Long Oligonucleotide Microarray, Shu-Hong Zhao, Daniel Kuhar, Joan K. Lunney, Harry Dawson, Catherine Guidry, Jolita J. Uthe, Shawn M. D. Bearson, Justin Recknor, Dan Nettleton, Christopher K. Tuggle Jul 2019

Dan Nettleton

Understanding the transcriptional response to pathogenic bacterial infection within food animals is of fundamental and applied interest. To determine the transcriptional response to Salmonella enterica serovar Choleraesuis (SC) infection, a 13,297-oligonucleotide swine array was used to analyze RNA from control, 24-h postinoculation (hpi), and 48-hpi porcine lung tissue from pigs infected with SC. In total, 57 genes showed differential expression (p < 0.001; false discovery rate = 12%). Quantitative real-time PCR (qRT-PCR) of 61 genes was used to confirm the microarray results and to identify pathways responding to infection. Of the 33 genes identified by microarray analysis as differentially expressed, 23 were confirmed by qRT-PCR results. A novel finding was that two transglutaminase family genes (TGM1 and TGM3) showed dramatic increases in expression postinoculation; combined with several other apoptotic genes, they indicated the induction of apoptotic pathways during SC infection. A predominant T helper 1-type immune response occurred during infection, with interferon …


Laser Microdissection Of Narrow Sheath Mutant Maize Uncovers Novel Gene Expression In The Shoot Apical Meristem, Xiaolan Zhang, Shahinez Madi, Lisa Borsuk, Dan Nettleton, Robert J. Elshire, Brent Buckner, Diane Janick-Buckner, Jon Beck, Marja Timmermans, Patrick S. Schnable, Michael J. Scanlon Jul 2019

Dan Nettleton

Microarrays enable comparative analyses of gene expression on a genomic scale; however, these experiments frequently identify such an abundance of differentially expressed genes that it may be difficult to identify discrete functional networks hidden within large microarray datasets. Microarray analyses in which mutant organisms are compared to nonmutant siblings can be especially problematic when the gene of interest is expressed in relatively few cells. Here, we describe the use of laser microdissection microarray to perform transcriptional profiling of the maize shoot apical meristem (SAM), a ~100-μm pillar of organogenic cells that is required for leaf initiation. Microarray analyses …


Scanning Microarrays At Multiple Intensities Enhances Discovery Of Differentially Expressed Genes, David S. Skibbe, Xiujuan Wang, Xuefeng Zhao, Lisa A. Borsuk, Dan Nettleton, Patrick S. Schnable Jul 2019

Dan Nettleton

Motivation: Scanning parameters are often overlooked when optimizing microarray experiments. A scanning approach that extends the dynamic data range by acquiring multiple scans of different intensities has been developed.

Results: Data from each of three scan intensities (low, medium, high) were analyzed separately using multiple scan and linear regression approaches to identify and compare the sets of genes that exhibit statistically significant differential expression. In the multiple scan approach only one-third of the differentially expressed genes were shared among the three intensities, and each scan intensity identified unique sets of differentially expressed genes. The set of differentially expressed genes from …


Microarray Gene Expression Profiles Of Fasting Induced Changes In Liver And Adipose Tissues Of Pigs Expressing The Melanocortin-4 Receptor D298N Variant, Sender Lkhagvadorj, Long Qu, Weiguo Cai, Oliver P. Couture, C. Richard Barb, Gary J. Hausman, Dan Nettleton, Lloyd L. Anderson, Jack C. M. Dekkers, Christopher K. Tuggle Jul 2019

Dan Nettleton

Transcriptional profiling coupled with blood metabolite analyses was used to identify porcine genes and pathways that respond to a fasting treatment or to a D298N missense mutation in the melanocortin-4 receptor (MC4R) gene. Gilts (12 homozygous for D298 and 12 homozygous for N298) were either fed ad libitum or fasted for 3 days. Fasting decreased body weight, backfat, and serum urea concentration and increased serum nonesterified fatty acid. In response to fasting, 7,029 genes in fat and 1,831 genes in liver were differentially expressed (DE). MC4R genotype did not significantly affect gene expression, body weight, backfat depth, or any measured …


Analysis Of Porcine Transcriptional Response To Salmonella Enterica Serovar Choleraesuis Suggests Novel Targets Of NFkappaB Are Activated In The Mesenteric Lymph Node, Yanfang Wang, Oliver P. Couture, Long Qu, Jolita J. Uthe, Shawn M. D. Bearson, Daniel Kuhar, Joan K. Lunney, Dan Nettleton, Jack C. M. Dekkers, Christopher K. Tuggle Jul 2019

Dan Nettleton

Background: Specific knowledge of the molecular pathways controlling host-pathogen interactions can increase our understanding of immune response biology as well as provide targets for drug development and genetic improvement of disease resistance. Toward this end, we have characterized the porcine transcriptional response to Salmonella enterica serovar Choleraesuis (S. Choleraesuis), a Salmonella serovar that predominately colonizes swine, yet can cause serious infections in human patients. Affymetrix technology was used to screen for differentially expressed genes in pig mesenteric lymph nodes (MLN) responding to infection with S. Choleraesuis at acute (8 hours (h), 24 h and 48 h post-inoculation (pi)) and chronic …


Comparative Gene Expression Profiles Between Heterotic And Non-Heterotic Hybrids Of Tetraploid Medicago Sativa, Xuehui Li, Yanling Wei, Dan Nettleton, E. Charles Brummer Jul 2019

Dan Nettleton

Background: Heterosis, the superior performance of hybrids relative to parents, has clear agricultural value, but its genetic control is unknown. Our objective was to test the hypotheses that hybrids expressing heterosis for biomass yield would show more gene expression levels that were different from midparental values and outside the range of parental values than hybrids that do not exhibit heterosis.

Results: We tested these hypotheses in three Medicago sativa (alfalfa) genotypes and their three hybrids, two of which expressed heterosis for biomass yield and a third that did not, using Affymetrix M. truncatula GeneChip arrays. Alfalfa hybridized to approximately 47% …


Distinct Peripheral Blood RNA Responses To Salmonella In Pigs Differing In Salmonella Shedding Levels: Intersection Of IFNG, TLR And miRNA Pathways, Ting-Hua Huang, Jolita J. Uthe, Shawn M. D. Bearson, Cumhur Yusuf Demirkale, Dan Nettleton, Susan Knetter, Curtis Christian, Amanda E. Ramer-Tait, Michael J. Wannemeuhler, Christopher K. Tuggle Jul 2019

Dan Nettleton

Transcriptomic analysis of the response to bacterial pathogens has been reported for several species, yet few studies have investigated the transcriptional differences in whole blood in subjects that differ in their disease response phenotypes. Salmonella species infect many vertebrate species, and pigs colonized with Salmonella enterica serovar Typhimurium (ST) are usually asymptomatic, making detection of these Salmonella-carrier pigs difficult. The variable fecal shedding of Salmonella is an important cause of foodborne illness and zoonotic disease. To investigate gene pathways and biomarkers associated with the variance in Salmonella shedding following experimental inoculation, we initiated the first analysis of the whole …


Unique Genome-Wide Transcriptome Profiles Of Chicken Macrophages Exposed To Salmonella-Derived Endotoxin, Ceren Ciraci, Christopher K. Tuggle, Michael J. Wannemeuhler, Dan Nettleton, Susan J. Lamont Jul 2019

Dan Nettleton

Background: Macrophages play essential roles in both innate and adaptive immune responses. Bacteria require endotoxin, a complex lipopolysaccharide, for outer membrane permeability and the host interprets endotoxin as a signal to initiate an innate immune response. The focus of this study is kinetic and global transcriptional analysis of the chicken macrophage response to in vitro stimulation with endotoxin from Salmonella typhimurium-798.

Results: The 38535-probeset Affymetrix GeneChip Chicken Genome array was used to profile transcriptional response to endotoxin 1, 2, 4, and 8 hours post stimulation (hps). Using a maximum FDR (False Discovery Rate) of 0.05 to declare genes as differentially …
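The abstract above declares genes differentially expressed at a maximum false discovery rate of 0.05 but does not say, in the visible text, which procedure was used. As a hedged illustration only, the sketch below applies the Benjamini-Hochberg step-up procedure, one common way to control the FDR; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Benjamini-Hochberg step-up rule: sort the p-values, find the largest
    rank k with p_(k) <= k * fdr / m, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Made-up p-values for four genes, in arbitrary order.
p = [0.039, 0.001, 0.216, 0.008]
print(benjamini_hochberg(p, fdr=0.05))  # [False, True, False, True]
```

With four tests the thresholds are 0.0125, 0.025, 0.0375 and 0.05, so only the two smallest p-values pass.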


Feature Selection For Longitudinal Data By Using Sign Averages To Summarize Gene Expression Values Over Time, Suyan Tian, Chi Wang Mar 2019

Biostatistics Faculty Publications

With the rapid evolution of high-throughput technologies, time series/longitudinal high-throughput experiments have become possible and affordable. However, the development of statistical methods dealing with gene expression profiles across time points has not kept up with the explosion of such data. The feature selection process is of critical importance for longitudinal microarray data. In this study, we proposed aggregating a gene’s expression values across time into a single value using the sign average method, thereby reducing a longitudinal feature selection process to a classic one. Regularized logistic regression models with pseudogenes (i.e., the sign average of genes across time as predictors) …


Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan Mar 2019

COBRA Preprint Series

One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes which provide insight into the disease's process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins resulting in enormous data sets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards …


Non-Invasive Analysis Of The Sputum Transcriptome Discriminates Clinical Phenotypes Of Asthma, Xiting Yan Jan 2019

Yale Day of Data

Whole-transcriptome gene expression profiles in the sputum and circulation from 100 asthma patients were measured using the Affymetrix HuGene 1.0ST arrays. Unsupervised clustering analysis based on pathways from KEGG was used to identify TEA clusters of patients from the sputum gene expression profiles. The identified TEA clusters have significantly different pre-bronchodilator FEV1, bronchodilator responsiveness, exhaled nitric oxide levels, history of hospitalization for asthma and history of intubation. Evaluation of TEA clusters in children from the Asthma BRIDGE cohort confirmed the identified differences in intubation and hospitalization. Furthermore, evaluation of the TH2 gene signatures suggested a much lower prevalence of …


A Novel Pathway-Based Distance Score Enhances Assessment Of Disease Heterogeneity In Gene Expression, Yunqing Liu, Xiting Yan Jan 2019

Yale Day of Data

Distance-based unsupervised clustering of gene expression data is commonly used to identify heterogeneity in biologic samples. However, high noise levels in gene expression data and the relatively high correlation between genes are often encountered, so traditional distances such as Euclidean distance may not be effective at discriminating the biological differences between samples. In this study, we developed a novel computational method to assess the biological differences based on pathways by assuming that ontologically defined biological pathways in biologically similar samples have similar behavior. Application of this distance score results in more accurate, robust, and biologically meaningful clustering results in both …


Estimation And Variable Selection In High-Dimensional Settings With Mismeasured Observations, Michael Byrd Jan 2019

Statistical Science Theses and Dissertations

Understanding high-dimensional data has become essential for practitioners across many disciplines. The general increase in ability to collect large amounts of data has prompted statistical methods to adapt for the rising number of possible relationships to be uncovered. The key to this adaptation has been the notion of sparse models, or, rather, models where most relationships between variables are assumed to be negligible at best. Driving these sparse models have been constraints on the solution set, yielding regularization penalties imposed on the optimization procedure. While these penalties have found great success, they are typically formulated with strong assumptions on the …


Microarray Data Analysis And Classification Of Cancers, Grant Gates Jan 2019

Williams Honors College, Honors Research Projects

When it comes to cancer, there is no standardized approach for identifying new cancer classes nor is there a standardized approach for assigning cancer tumors to existing classes. These two ideas are known as class discovery and class prediction. For a cancer patient to receive proper treatment, it is important that the type of cancer be accurately identified. For my Senior Honors Project, I would like to use this opportunity to research a topic in bioinformatics. Bioinformatics incorporates a few different subjects into one including biology, computer science and statistics. An intricate method for class discovery and class prediction is …


Power In Pairs: Assessing The Statistical Value Of Paired Samples In Tests For Differential Expression, John R. Stevens, Jennifer S. Herrick, Roger K. Wolff, Martha L. Slattery Dec 2018

Mathematics and Statistics Faculty Publications

Background: When genomics researchers design a high-throughput study to test for differential expression, some biological systems and research questions provide opportunities to use paired samples from subjects, and researchers can plan for a certain proportion of subjects to have paired samples. We consider the effect of this paired samples proportion on the statistical power of the study, using characteristics of both count (RNA-Seq) and continuous (microarray) expression data from a colorectal cancer study.

Results: We demonstrate that a higher proportion of subjects with paired samples yields higher statistical power, for various total numbers of samples, and for various strengths of …


Innate Immunity, The Hepatic Extracellular Matrix, And Liver Injury: Mathematical Modeling Of Metastatic Potential And Tumor Development In Alcoholic Liver Disease., Shanice V. Hudson Dec 2018

Electronic Theses and Dissertations

The overarching goals of the current work are to fill key gaps in the current understanding of alcohol consumption and the risk of metastasis to the liver. Considering the evidence this research group has compiled confirming that the hepatic matrisome responds dynamically to injury, an altered extracellular matrix (ECM) profile appears to be a key feature of pre-fibrotic inflammatory injury in the liver. This group has demonstrated that the hepatic ECM responds dynamically to alcohol exposure, in particular, sensitizing the liver to LPS-induced inflammatory damage. Although the study of alcohol in its role as a contributing factor to oncogenesis and …


Predictions Generated From A Simulation Engine For Gene Expression Micro-Arrays For Use In Research Laboratories, Gopinath R. Mavankal, John Blevins, Dominique Edwards, Monnie Mcgee, Andrew Hardin Jul 2018

SMU Data Science Review

In this paper we introduce the technical components, biology, and data science involved in the use of microarray technology in biological and clinical research. We discuss how the laborious experimental protocols used in laboratories to obtain these data could benefit from simulations of the data. We discuss the approach used in the simulation engine from [7]. We use this simulation engine to generate a prediction tool in Power BI, a Microsoft business intelligence tool for analytics and data visualization [22]. This tool could be used in any laboratory using micro-arrays to improve experimental design by comparing how predicted …


Analysis Challenges For High Dimensional Data, Bangxin Zhao Apr 2018

Electronic Thesis and Dissertation Repository

In this thesis, we propose new methodologies targeting the areas of high-dimensional variable screening, influence measures and post-selection inference. We propose a new estimator for the correlation between the response and high-dimensional predictor variables, and based on this estimator we develop a new screening technique, termed Dynamic Tilted Current Correlation Screening (DTCCS), for high-dimensional variable screening. DTCCS is capable of picking up the relevant predictor variables within a finite number of steps. The DTCCS method includes the widely used sure independence screening (SIS) method and the high-dimensional ordinary least squares projection (HOLP) approach as special cases.

Two methods …


Computational Modelling Of Human Transcriptional Regulation By An Information Theory-Based Approach, Ruipeng Lu Apr 2018

Electronic Thesis and Dissertation Repository

ChIP-seq experiments can identify the genome-wide binding site motifs of a transcription factor (TF) and determine its sequence specificity. Multiple algorithms were developed to derive TF binding site (TFBS) motifs from ChIP-seq data, including the entropy minimization-based Bipad that can derive both contiguous and bipartite motifs. Prior studies applying these algorithms to ChIP-seq data only analyzed a small number of top peaks with the highest signal strengths, biasing their resultant position weight matrices (PWMs) towards consensus-like, strong binding sites; nor did they derive bipartite motifs, disabling the accurate modelling of binding behavior of dimeric TFs.

This thesis presents a novel …
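To make the information-theoretic view of binding site motifs in this abstract more concrete, here is a minimal sketch that builds a position frequency matrix from aligned sites and computes the per-position information content in bits. It is not Bipad or the thesis's method; it assumes a uniform background over A, C, G and T, applies no small-sample correction, and uses hypothetical function names and made-up sites.

```python
import math

BASES = "ACGT"


def position_frequencies(sites):
    """Return one {base: frequency} dict per column of the aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    freqs = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        freqs.append({base: column.count(base) / len(sites) for base in BASES})
    return freqs


def information_content(freqs):
    """Per-position information in bits: 2 minus the Shannon entropy."""
    bits = []
    for column in freqs:
        entropy = -sum(f * math.log2(f) for f in column.values() if f > 0)
        bits.append(2.0 - entropy)
    return bits


# Four made-up aligned sites: the first three positions are fully
# conserved, the last position is uniform over the four bases.
sites = ["ACGT", "ACGA", "ACGC", "ACGG"]
print(information_content(position_frequencies(sites)))
```

Summing the per-position values gives one number per motif, which is one way to see why analysing only the strongest peaks would bias a derived matrix toward high-information, consensus-like sites.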


A Comparison Of Unsupervised Methods For DNA Microarray Leukemia Data, Denise Harness Apr 2018

Appalachian Student Research Forum

Advancements in DNA microarray data sequencing have created the need for sophisticated machine learning algorithms and feature selection methods. Probabilistic graphical models, in particular, have been used to identify whether microarrays or genes cluster together in groups of individuals having a similar diagnosis. These clusters of genes are informative, but can be misleading when every gene is used in the calculation. First, feature reduction techniques are explored; however, the size and nature of the data prevent traditional techniques from working efficiently. Our method is to use the partial correlations between the features to create a precision matrix and predict which …


Contributions To Statistical Testing, Prediction, And Modeling, John C. Pesko Mar 2017

Mathematics & Statistics ETDs

1. "Parametric Bootstrap (PB) and Objective Bayesian (OB) Testing with Applications to Heteroscedastic ANOVA": For one-way heteroscedastic ANOVA, we show a close relationship between the PB and OB approaches to significance testing, demonstrating the conditions for which the two approaches are equivalent. Using a simulation study, PB and OB performance is compared to a test based on the predictive distribution as well as the unweighted test of Akritas & Papadatos (2004). We extend this work to the RCBD with subsampling model, and prove a repeated sampling property and large sample property for general OB significance testing.

2. "Early Identification of …


The Generalized Monotone Incremental Forward Stagewise Method For Modeling Longitudinal, Clustered, And Overdispersed Count Data: Application Predicting Nuclear Bud And Micronuclei Frequencies, Rebecca Lehman Jan 2017

Theses and Dissertations

With the influx of high-dimensional data there is an immediate need for statistical methods that are able to handle situations when the number of predictors greatly exceeds the number of samples. One such area of growth is in examining how environmental exposures to toxins impact the body long term. The cytokinesis-block micronucleus assay can measure the genotoxic effect of exposure as a count outcome. To investigate potential biomarkers, high-throughput assays that assess gene expression and methylation have been developed. It is of interest to identify biomarkers or molecular features that are associated with elevated micronuclei (MN) or nuclear bud (Nbud) …


Integration Of Multi-Platform High-Dimensional Omic Data, Xuebei An May 2016

Dissertations & Theses (Open Access)

The development of high-throughput biotechnologies has made data accessible from different platforms, including RNA sequencing, copy number variation, DNA methylation, protein lysate arrays, etc. The high-dimensional omic data derived from different technological platforms have been extensively used to facilitate comprehensive understanding of disease mechanisms and to determine personalized health treatments. Although vital to the progress of clinical research, the high-dimensional multi-platform data impose new challenges for data analysis. Numerous studies have been proposed to integrate multi-platform omic data; however, few have efficiently and simultaneously addressed the problems that arise from high dimensionality and complex correlations.

In my dissertation, I …


hpcNMF: A High-Performance Toolbox For Non-Negative Matrix Factorization, Karthik Devarajan, Guoli Wang Feb 2016

COBRA Preprint Series

Non-negative matrix factorization (NMF) is a widely used machine learning algorithm for dimension reduction of large-scale data. It has found successful applications in a variety of fields such as computational biology, neuroscience, natural language processing, information retrieval, image processing and speech recognition. In bioinformatics, for example, it has been used to extract patterns and profiles from genomic and text-mining data as well as in protein sequence and structure analysis. While the scientific performance of NMF is very promising in dealing with high dimensional data sets and complex data structures, its computational cost is high and sometimes could be critical for …
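For readers unfamiliar with NMF, the following minimal sketch factorizes a small non-negative matrix V into W and H using the classic multiplicative updates for squared (Frobenius) error. It is not the toolbox's API or its algorithms, which the visible abstract does not describe; every function name, parameter and the toy matrix are assumptions chosen for illustration, and the pure-Python loops also hint at why the computational cost grows quickly on large data.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        numer = matmul(wt, v)
        denom = matmul(matmul(wt, w), h)
        h = [[h[k][j] * numer[k][j] / (denom[k][j] + eps) for j in range(m)] for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        numer = matmul(v, ht)
        denom = matmul(w, matmul(h, ht))
        w = [[w[i][k] * numer[i][k] / (denom[i][k] + eps) for k in range(rank)] for i in range(n)]
    return w, h


# Toy non-negative matrix of rank 2 (row 2 = 2 * row 1, row 4 = 2 * row 1 + row 3).
v = [[1, 2, 3], [2, 4, 6], [1, 1, 1], [3, 5, 7]]
w, h = nmf(v, rank=2)
print(frobenius_error(v, w, h))
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout, which is the property that makes the factors interpretable as additive patterns.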