Open Access. Powered by Scholars. Published by Universities.®

Bioinformatics Commons

Articles 1 - 13 of 13

Full-Text Articles in Bioinformatics

Verrucous Carcinoma Of The Vulva: Patterns Of Care And Treatment Outcomes., Sara M. Dryden, Leonid B. Reshko, Jeremy T. Gaskins, Scott R. Silva Nov 2021

Faculty Scholarship

Background: Verrucous vulvar carcinoma (VC) is an uncommon and distinct histologic subtype of squamous cell carcinoma (SCC). The available literature on VC is currently limited to case reports and small single institution studies. Aims: The goals of this study were to analyze data from the National Cancer Database (NCDB) to quantitate the incidence of VC and to investigate the effects of patient demographics, tumor characteristics, and treatment regimens on overall survival (OS) in women with verrucous vulvar carcinoma. Methods and results: Patients diagnosed with vulvar SCC or VC between the years of 2004 and 2016 were identified in the NCDB. …


Statistical Approaches Of Gene Set Analysis With Quantitative Trait Loci For High-Throughput Genomic Studies., Samarendra Das Dec 2020

Electronic Theses and Dissertations

Recently, gene set analysis has become the first choice for gaining insights into the underlying complex biology of diseases through high-throughput genomic studies, such as microarrays, bulk RNA-Sequencing, and single-cell RNA-Sequencing. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. However, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. Hence, a comprehensive overview of the available gene set analysis approaches used for different high-throughput genomic studies is provided. The analysis of gene sets is usually carried out based on …


Modified-Half-Normal Distribution And Different Methods To Estimate Average Treatment Effect., Jingchao Sun Dec 2020

Electronic Theses and Dissertations

This dissertation consists of three projects related to the Modified-Half-Normal distribution and causal inference. In my first project, a new distribution called the Modified-Half-Normal distribution was introduced. I explored a few of its distributional properties, procedures for generating random samples based on Bayesian approaches, and parameter estimation based on the method of moments. The second project deals with selection bias in estimating the average treatment effect (ATE) from observational data. I combined the propensity-score-based inverse probability of treatment weighting (IPTW) method with the directed acyclic graph (DAG) to address this problem. The third project …
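The IPTW idea named in this abstract can be illustrated with a minimal sketch. It is not the dissertation's method: it assumes the propensity scores are already estimated, omits the DAG-based covariate selection entirely, and uses the normalized (Hajek) form of the weighted estimator. The function name `iptw_ate` and its arguments are hypothetical.

```python
def iptw_ate(treated, outcome, propensity):
    """Normalized IPTW estimate of the average treatment effect.

    treated    -- 0/1 treatment indicators (hypothetical input)
    outcome    -- observed outcomes, one per subject
    propensity -- assumed-known P(treated = 1 | covariates) per subject
    """
    sum_w1 = sum_wy1 = 0.0  # weighted totals for treated subjects
    sum_w0 = sum_wy0 = 0.0  # weighted totals for control subjects
    for t, y, e in zip(treated, outcome, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        if t:
            w = 1.0 / e            # treated subjects weighted by 1 / e
            sum_w1 += w
            sum_wy1 += w * y
        else:
            w = 1.0 / (1.0 - e)    # controls weighted by 1 / (1 - e)
            sum_w0 += w
            sum_wy0 += w * y
    if sum_w1 == 0.0 or sum_w0 == 0.0:
        raise ValueError("need at least one treated and one control subject")
    # Difference of the two weighted outcome means
    return sum_wy1 / sum_w1 - sum_wy0 / sum_w0
```

With equal propensity scores the weights cancel and the estimate reduces to the plain difference in group means, which is a quick sanity check on the weighting.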


Designing And Sample Size Calculation In Presence Of Heterogeneity In Biological Studies Involving High-Throughput Data., Sudhir Srivastava Aug 2019

Electronic Theses and Dissertations

Study design and sample size determination are important for conducting high-throughput biological experiments such as proteomics experiments and RNA-Seq expression studies, and lead to a better understanding of the complex mechanisms underlying various biological processes. Variation in the biological data or in the technical approaches to data collection leads to heterogeneity among the samples under study. We critically examined the issues of technical and biological heterogeneity. Quantitative measurements based on liquid chromatography (LC) coupled with mass spectrometry (MS) often suffer from missing values (MVs) and data heterogeneity. We considered a proteomics data set generated from human kidney …


Bayesian Analytical Approaches For Metabolomics: A Novel Method For Molecular Structure-Informed Metabolite Interaction Modeling, A Novel Diagnostic Model For Differentiating Myocardial Infarction Type, And Approaches For Compound Identification Given Mass Spectrometry Data., Patrick J. Trainor Aug 2018

Electronic Theses and Dissertations

Metabolomics, the study of small molecules in biological systems, has enjoyed great success in enabling researchers to examine disease-associated metabolic dysregulation and has been utilized for the discovery of biomarkers of disease and phenotypic states. In spite of recent technological advances in the analytical platforms utilized in metabolomics and the proliferation of tools for the analysis of metabolomics data, significant challenges in metabolomics data analyses remain. In this dissertation, we present three of these challenges and Bayesian methodological solutions for each. In the first part we develop a new methodology to serve as a basis for making higher order inferences in metabolomics, …


Region Based Gene Expression Via Reanalysis Of Publicly Available Microarray Data Sets., Ernur Saka May 2018

Electronic Theses and Dissertations

A DNA microarray is a high-throughput technology used to identify relative gene expression. One of the most widely used platforms is the Affymetrix® GeneChip® technology which detects gene expression levels based on probe sets composed of a set of twenty-five nucleotide probes designed to hybridize with specific gene targets. Given a particular Affymetrix® GeneChip® platform, the design of the probes is fixed. However, the method of analysis is dynamic in nature due to the ability to annotate and group probes into uniquely defined groupings. This is particularly important since publicly available repositories of microarray datasets, such as ArrayExpress and NCBI’s …


Functional Data Analysis Methods For Predicting Disease Status., Sarah Kendrick Dec 2017

Electronic Theses and Dissertations

Introduction: Differential scanning calorimetry (DSC) is used to determine thermally-induced conformational changes of biomolecules within a blood plasma sample. Recent research has indicated that DSC curves (or thermograms) may have different characteristics based on disease status and, thus, may be useful as a monitoring and diagnostic tool for some diseases. Since thermograms are curves measured over a range of temperature values, they are often considered as functional data. In this dissertation we propose and apply functional data analysis (FDA) techniques to analyze DSC data from the Lupus Family Registry and Repository (LFRR). The aim is to develop FDA methods to …


Novel Statistical Approaches For Missing Values In Truncated High-Dimensional Metabolomics Data With A Detection Threshold., Jasmit Sureshkumar Shah May 2017

Electronic Theses and Dissertations

Despite considerable advances in high-throughput technology over the last decade, new challenges have emerged related to the analysis, interpretation, and integration of high-dimensional data. The arrival of omics datasets has contributed to the rapid improvement of systems biology, which seeks to understand complex biological systems. Metabolomics is an emerging omics field in which mass spectrometry technologies generate high-dimensional datasets. As this area advances, better analysis methods are needed to provide correct and adequate results. While in other omics sectors such as genomics or proteomics there has been, and continues to be, critical understanding …


Integrated Analysis Of miRNA/mRNA Expression And Gene Methylation Using Sparse Canonical Correlation Analysis., Dake Yang May 2016

Electronic Theses and Dissertations

MicroRNAs (miRNAs) are a large number of small endogenous non-coding RNA molecules (18-25 nucleotides in length) which regulate expression of genes post-transcriptionally. While a variety of algorithms exist for determining the targets of miRNAs, they are generally based on sequence information and frequently produce lists consisting of thousands of genes. Canonical correlation analysis (CCA) is a multivariate statistical method that can be used to find linear relationships between two data sets, and here we apply CCA to find the linear combination of differentially expressed miRNAs and their corresponding target genes having maximal negative correlation. Due to the high dimensionality, sparse …


optCluster: An R Package For Determining The Optimal Clustering Algorithm And Optimal Number Of Clusters., Michael N. Sekula May 2015

Electronic Theses and Dissertations

Determining the best clustering algorithm and ideal number of clusters for a particular dataset is a fundamental difficulty in unsupervised clustering analysis. In biological research, data generated from Next Generation Sequencing technology and microarray gene expression data are becoming more and more common, so new tools and resources are needed to group such high dimensional data using clustering analysis. Different clustering algorithms can group data very differently. Therefore, there is a need to determine the best groupings in a given dataset using the most suitable clustering algorithm for that data. This paper presents the R package optCluster as an efficient …


Summary Of Survival Analysis With SAS Procedures., Derek Duane Childers 1990- May 2015

Electronic Theses and Dissertations

The research conducted for this thesis was performed to summarize some of the most commonly used survival analysis techniques and to create a single macro that provides the solutions for these techniques. The techniques that this thesis focuses on include survival and hazard functions, mean and median survival times, life tables, the log-rank test, proportional hazards/model building, and competing risks. To further analyze these survival analysis techniques, I use the Bone Marrow Transplantation for Leukemia dataset. This trial consists of patients with either acute myelocytic leukemia (AML, 99 patients) or acute lymphoblastic leukemia (ALL, 38 patients). There …


Statistical Methods For Assessing Treatment Effects For Observational Studies., Kristopher C. Gardner 1984- May 2014

Electronic Theses and Dissertations

Though randomized clinical trials (RCTs) are the gold standard for comparing treatments, they are often infeasible, exclude clinically important subjects, or represent an idealized medical setting rather than real practice. Observational data provide an opportunity to study practice-based evidence, but also present challenges for analysis. Traditional statistical methods that are suitable for RCTs may be inadequate for observational studies. In this project, four of the most popular statistical methods for observational studies (ANCOVA, propensity score matching, regression with the propensity score as a covariate, and instrumental variables (IV)) are investigated through application to MarketScan insurance claims data. …


Compound Identification Using Penalized Linear Regression., Ruiqi Liu May 2013

Electronic Theses and Dissertations

In this study, we propose a new method for compound identification using penalized linear regression. Compound identification is often achieved by matching experimental mass spectra to the mass spectra stored in a reference library based on mass spectral similarity. In the context of linear regression, the response variable is an experimental mass spectrum (i.e., the query) and all the compounds in the reference library are the independent variables. However, the number of compounds in the reference library is much larger than the range of m/z values, so the data become high dimensional and suffer from singularity. For …
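The regression setup described in this abstract can be sketched as follows. The abstract is truncated before it names the penalty, so the ridge (L2) penalty used here is an assumption, as is the rule of reporting the library compound with the largest coefficient as the match. The function names `ridge_coefficients` and `best_match` are hypothetical, and the tiny spectra in the usage note are made-up illustrations, not data from the study.

```python
def ridge_coefficients(library, query, lam=1.0):
    """Solve (X^T X + lam * I) beta = X^T y by Gaussian elimination.

    library -- list of p reference spectra, each a list of n intensities
               (the columns of X); p may exceed n
    query   -- experimental spectrum y with n intensities
    lam     -- ridge penalty; must be positive so the system is non-singular
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    p = len(library)
    # Normal-equation matrix with the ridge term on the diagonal
    a = [[sum(u * v for u, v in zip(library[i], library[j]))
          + (lam if i == j else 0.0) for j in range(p)] for i in range(p)]
    b = [sum(u * v for u, v in zip(library[i], query)) for i in range(p)]
    # Forward elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            f = a[r][col] / a[col][col]
            for c in range(col, p):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        s = sum(a[r][c] * beta[c] for c in range(r + 1, p))
        beta[r] = (b[r] - s) / a[r][r]
    return beta


def best_match(library, query, lam=1.0):
    """Index of the library spectrum with the largest ridge coefficient."""
    beta = ridge_coefficients(library, query, lam)
    return max(range(len(beta)), key=lambda i: beta[i])
```

For example, with four library spectra over three m/z bins, `best_match([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]], [0, 0, 1])` selects the third spectrum. Without the penalty term, the four-by-four matrix X^T X would be singular because there are more compounds than m/z bins, which is the singularity problem the abstract refers to.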