Open Access. Powered by Scholars. Published by Universities.®

Life Sciences Commons

Articles 1 - 12 of 12

Full-Text Articles in Life Sciences

Metrics For Comparison Of Complex Networks, Clarissa Reyes Dec 2023

Open Access Theses & Dissertations

Heuristic network statistics are used as a preliminary approach to identify change across networks. In networks where there is known node correspondence (KNC), conventional network comparison methods include taking a norm of the difference matrix, or calculating dissimilarity measures like DeltaCon and cut distance. Since different KNC measures provide varying insight into the network comparison problem, we propose employing Rank Score Characteristic Functions (RSCFs) and the rank-score process as a method for reaching a consensus when ranking quantified change across multiple pairs of networks, which is particularly useful for ranking change across subpopulations or subgraphs. Additionally, we propose a …
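One of the conventional KNC comparisons mentioned above, a norm of the difference matrix, can be sketched as follows. This is a minimal illustration, not the thesis's method: the choice of the Frobenius norm, the function name `frobenius_distance`, and the two small example graphs are all assumptions made for the example.

```python
import math


def frobenius_distance(a, b):
    """Frobenius norm of A - B for two adjacency matrices of equal shape.

    Assumes known node correspondence: row/column i refers to the same
    node in both networks.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    squared = sum(
        (x - y) ** 2
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )
    return math.sqrt(squared)


# Hypothetical 3-node undirected networks on the same node set.
g1 = [[0, 1, 1],
      [1, 0, 0],
      [1, 0, 0]]
g2 = [[0, 1, 0],
      [1, 0, 1],
      [0, 1, 0]]

distance = frobenius_distance(g1, g2)
print(distance)
```

A larger value indicates more changed edges between the two networks; identical networks give a distance of zero.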


Developing And Applying Computational Algorithms To Reveal Health-Related Biomolecular Interactions, Yixin Xie May 2022

Open Access Theses & Dissertations

Computational biology is an interdisciplinary area that applies computational approaches to biological big data, including protein amino acid sequences, genetic sequences, etc., and is widely used to analyze protein-protein interactions, make predictions in drug discovery, develop vaccines, etc. Popular methods include mathematical modeling, molecular dynamics simulations, data science methodology, etc. With the help of computational algorithms and applications, drug development is much faster than traditional processes, as it reduces risks early on in a drug discovery process and helps researchers select target candidates that have the highest potential for success. In my doctoral research, I applied multi-scale computational approaches to …


Statistical Analysis Of Genetic Sequence Variants In Whole Exome Sequencing Data From Patients With Prostate Cancer, Kelvin Ofori-Minta Aug 2021

Open Access Theses & Dissertations

A single variation in the genetic sequence within the DNA of an organism could easily lead to beneficial, detrimental, or neutral effects. More often than not, these effects are detrimental rather than beneficial. Many biomedical and bioinformatics studies have been conducted to determine the genetic cause of prostate cancer (PrCa), which is still the second leading cause of cancer-related death among men in the United States, and an appreciable effort in statistical bioinformatics research has been directed towards this aim. Through statistical analyses of a set of whole exome sequencing data from patients with PrCa obtained via The Cancer Genome …


The Hybridizing Ions Treatment (HIT) Method Development And Computational Study On SARS-CoV-2 E Protein, Shengjie Sun May 2021

Open Access Theses & Dissertations

Fast and accurate calculations of the electrostatic features for highly charged biomolecules such as DNA, RNA, and highly charged proteins are crucial but challenging tasks. Traditional implicit solvent methods calculate the electrostatic features fast, but they are not able to balance the high net charges in the biomolecules effectively. Explicit solvent methods add unbalanced ions to neutralize the highly charged biomolecules in molecular dynamics simulations, which require more expensive computing resources. Here we developed a novel method, the Hybridizing Ions Treatment (HIT) method, which hybridizes the implicit solvent method with the explicit method to realistically calculate the electrostatic potential for highly …


Gene Selection And Classification In High-Throughput Biological Data With Integrated Machine Learning Algorithms And Bioinformatics Approaches, Abhijeet R Patil May 2021

Open Access Theses & Dissertations

With the rise of high-throughput technologies in biomedical research, large volumes of expression profiling, methylation profiling, and RNA-sequencing data are being generated. These high-dimensional data have a large number of features with a small number of samples, a characteristic called the "curse of dimensionality." The selection of optimal features, which largely affects the performance of classification algorithms in machine learning models, has led to challenging problems in bioinformatics analyses of such high-dimensional datasets. In this work, I focus on the design of two-stage frameworks of feature selection and classification and their applications in multiple sets of colorectal cancer data. The first …


Combination Of Resampling Based Lasso Feature Selection And Ensembles Of Regularized Regression Models, Abhijeet R. Patil Jan 2019

Open Access Theses & Dissertations

In high-dimensional data, the performance of various classifiers is largely dependent on the selection of important features. Most of the individual classifiers using existing feature selection (FS) methods do not perform well for highly correlated data. Obtaining important features using the FS method and selecting the best-performing classifier is a challenging task in high-throughput data. In this research, we propose a combination of resampling-based least absolute shrinkage and selection operator (LASSO) feature selection (RLFS) and ensembles of regularized regression models (ERRM) capable of handling data with high correlation structures. The ERRM boosts the prediction accuracy with …


Integrated Statistical And Machine Learning Algorithms For Predicting And Classifying G Protein-Coupled Receptors, Fredrick Ayivor Jan 2018

Open Access Theses & Dissertations

G protein-coupled receptors (GPCRs) are transmembrane proteins with important functions in signal transduction and often serve as drug targets. With increasing availability of protein sequence information, there is much interest in computationally predicting GPCRs and classifying them according to their biological roles. Such predictions are cost-efficient and can be valuable guides for designing wet lab experiments to help elucidate signaling pathways and expedite drug discovery. There are existing computational tools for GPCR prediction that involve principal component analysis (PCA), intimate sorting (IS), support vector machine, and random forest (RF) techniques using various sequence-derived features. While accuracies of over 90% …


Sample Size Estimation For Genomics Experiments With Dependent End Points, Desmond Koomson Jan 2016

Open Access Theses & Dissertations

In typical genomics studies involving numerous association tests of gene mutations with a disease, error rate control via multiplicity adjustment is paramount because even if all genes were to be non-differentially associated, we would still make some false positives. Many methods exist that incorporate the control of multiplicity for normally distributed endpoints in sample size estimation, but none addresses the issue for non-normally correlated endpoints.

One common practice in the literature is to assume an equal correlation among all differentially associated or expressed genes, thereby using the generalized binomial or beta-binomial model to compute the comparison-wise power of detecting these …
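The multiplicity problem described in this abstract can be illustrated with a small calculation. This is a simplified sketch under stated assumptions, not the thesis's approach: it treats the tests as independent (the thesis concerns dependent endpoints), uses the Bonferroni correction as one familiar adjustment, and the gene count and significance level are hypothetical.

```python
def expected_false_positives(num_tests, alpha):
    """Expected number of false positives when every null hypothesis is true."""
    return num_tests * alpha


def familywise_error_rate(num_tests, alpha):
    """Probability of at least one false positive, ASSUMING independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests, alpha):
    """Per-test significance level under the Bonferroni adjustment."""
    return alpha / num_tests


# Hypothetical study: 20,000 genes tested at a nominal 5% level.
num_genes = 20000
alpha = 0.05

print(expected_false_positives(num_genes, alpha))
print(familywise_error_rate(num_genes, alpha))
print(familywise_error_rate(num_genes, bonferroni_threshold(num_genes, alpha)))
```

Without adjustment, the family-wise error rate is essentially 1 for this many tests; with the adjusted per-test threshold it stays at or below the nominal level, at the cost of power, which is why sample size estimation has to account for the adjustment.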


Secondary Structure Prediction Of Long RNA Sequences Based On Inversion Excursions And A Modularized MapReduce Framework, Daniel Tesfai Yehdego Jan 2012

Open Access Theses & Dissertations

Ribonucleic acid (RNA) molecules and their secondary structures play important roles in many biological processes including gene expression and regulation. The genomes of many viruses are also RNA molecules. Since secondary structures are crucial for RNA functionality, computational predictions of the RNA secondary structures have been widely studied. However, the tremendous demands on computer memory and computing time for complex secondary structures limit existing thermodynamically based structure prediction algorithms to handling only short RNA sequences with a few hundred bases. One approach to overcoming this limitation is to first cut long RNA sequences into shorter, non-overlapping …


Generalized Linear Latent Mixed Modeling Of Functional Independent Measures And Patient Outcomes, Maduranga Kasun Dassanayake Jan 2012

Open Access Theses & Dissertations

The Functional Independent Measure (FIM) is one of the most widely accepted functional assessment measures used in the rehabilitation community. Past research studies have investigated the relationship between place of discharge, admission FIM scores or FIM difference scores, and patients' characteristics and found relationships between those variables. However, most of these studies fail to account for the multi-layered multidimensionality of the FIM and the measurement error associated with the FIM items. This study utilizes Generalized Linear Latent Mixed Models (GLLAMM) and Structural Equation Models (SEM) to assess which patient characteristics are associated with FIM difference scores and the structural relationship …


Distributional Properties Of Inversions And Segmentation Algorithms For RNA Sequences, Sameera Dhananjaya Viswakula Jan 2011

Open Access Theses & Dissertations

Ribonucleic acid (RNA) is a long single-stranded molecule made up of four types of nucleotide bases: Adenine (A), Cytosine (C), Guanine (G) and Uracil (U). It folds back on itself and forms C-G and A-U complementary base pairs. The set of such hydrogen-bonded pairs in an RNA molecule is called its secondary structure. Knowing the secondary structure of RNA is useful for understanding its biological function. Prediction of RNA secondary structure from the nucleotide sequence has been an important bioinformatics problem for over two decades.

The work in this thesis is motivated by the need to improve the secondary structure …
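The definition of secondary structure given above, a set of C-G and A-U base pairs, can be sketched as a simple data structure with a validity check. This is an illustrative example only: the function name `is_valid_structure`, the representation as index pairs, the rule that each base joins at most one pair, and the sample sequence are assumptions, not taken from the thesis.

```python
# Complementary pairs named in the abstract: C-G and A-U (in either order).
COMPLEMENTARY = {("C", "G"), ("G", "C"), ("A", "U"), ("U", "A")}


def is_valid_structure(sequence, pairs):
    """Check that `pairs` (0-based index pairs) is a plausible secondary structure.

    Each pair must join complementary bases, and each base may take part
    in at most one pair (an assumption of this sketch).
    """
    used = set()
    for i, j in pairs:
        in_range = 0 <= i < len(sequence) and 0 <= j < len(sequence)
        if not in_range or i == j:
            return False
        if i in used or j in used:
            return False
        if (sequence[i], sequence[j]) not in COMPLEMENTARY:
            return False
        used.update((i, j))
    return True


# Hypothetical hairpin: a three-pair stem closing a four-base loop.
hairpin = "GGGAAAUCCC"
stem = [(0, 9), (1, 8), (2, 7)]

print(is_valid_structure(hairpin, stem))
print(is_valid_structure(hairpin, [(3, 4)]))
```

The first call checks three G-C pairs and the second an A-A pair, which is not complementary.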


Computational Methods Of Hidden Markov Models With Respect To CpG Island Prediction In DNA Sequences, Roberto Angel Ortega Jan 2011

Open Access Theses & Dissertations

Hidden Markov models (HMMs) are a specific case of Markov models where, contrary to Markov chains, the observer is unaware of what state the model is in when a symbol is observed. Like Markov chains, HMMs assume that the future state of a sequence is dependent only on the current state of the sequence. The parameters associated with HMMs are transition and emission probabilities, where transition probabilities are the probabilities of moving from one state to another, and emission probabilities are the probabilities of observing a symbol given that it came from a specific state.

The structure of …
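The roles of transition and emission probabilities described in this abstract can be sketched with the standard forward algorithm, which sums over all hidden state paths to obtain the probability of an observed sequence. This is a minimal illustration, not the thesis's model: the two state names, the start distribution, and every probability value below are hypothetical.

```python
def forward_probability(observations, states, start, trans, emit):
    """Probability of `observations` under an HMM, via the forward algorithm."""
    if not observations:
        raise ValueError("observations must be non-empty")
    # alpha[s]: probability of the symbols seen so far, ending in state s.
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for symbol in observations[1:]:
        alpha = {
            t: emit[t][symbol] * sum(alpha[s] * trans[s][t] for s in states)
            for t in states
        }
    return sum(alpha.values())


# Hypothetical two-state model over DNA symbols (all values are made up).
states = ("island", "background")
start = {"island": 0.5, "background": 0.5}
trans = {
    "island": {"island": 0.9, "background": 0.1},
    "background": {"island": 0.1, "background": 0.9},
}
emit = {
    "island": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "background": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}

print(forward_probability("CGCG", states, start, trans, emit))
print(forward_probability("ATAT", states, start, trans, emit))
```

Because the hidden state is never observed, the algorithm carries a probability for every state at each position instead of committing to a single path.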