Full-Text Articles in Life Sciences
Metrics For Comparison Of Complex Networks, Clarissa Reyes
Open Access Theses & Dissertations
Heuristic network statistics are used as a preliminary approach to identifying change across networks. For networks with known node correspondence (KNC), conventional comparison methods include taking a norm of the difference matrix or calculating dissimilarity measures such as DeltaCon and cut distance. Since different KNC measures provide varying insight into the network comparison problem, we propose employing Rank Score Characteristic Functions (RSCFs) and the rank-score process as a method for reaching a consensus when ranking quantified change across multiple pairs of networks, which is particularly useful for ranking change across subpopulations or subgraphs. Additionally, we propose a …
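For two networks on the same node set (known node correspondence), the simplest comparison mentioned above is a norm of the difference of their adjacency matrices. The Python sketch below computes two such KNC measures, the Frobenius norm of A − B and the number of node pairs whose edge status differs, and then combines the rankings they produce by averaging ranks. The mean-rank consensus is a deliberately simple stand-in chosen for illustration; it is not the RSCF or rank-score procedure proposed in the thesis, and all function names are hypothetical.

```python
import math

Matrix = list[list[float]]


def frobenius_distance(a: Matrix, b: Matrix) -> float:
    """Frobenius norm of the difference matrix A - B (nodes must correspond by index)."""
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def edge_difference(a: Matrix, b: Matrix) -> int:
    """Number of node pairs i < j whose edge presence differs (undirected graphs)."""
    n = len(a)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if (a[i][j] != 0) != (b[i][j] != 0)
    )


def ranks(values: list[float]) -> list[float]:
    """Rank values from largest (rank 1) to smallest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: -values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def consensus_ranking(pairs, measures) -> list[float]:
    """Mean rank of each network pair across several change measures (1 = most change)."""
    per_measure = [ranks([m(a, b) for a, b in pairs]) for m in measures]
    return [sum(r[k] for r in per_measure) / len(per_measure) for k in range(len(pairs))]


# Usage: a path graph, a triangle, and an empty graph on the same three nodes.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
empty = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
pairs = [(path, triangle), (path, empty), (triangle, empty)]
print(consensus_ranking(pairs, [frobenius_distance, edge_difference]))  # [3.0, 2.0, 1.0]
```

Here both measures agree, so the consensus is trivial; the point of a consensus method is to resolve cases where different KNC measures order the pairs differently.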
Developing And Applying Computational Algorithms To Reveal Health-Related Biomolecular Interactions, Yixin Xie
Open Access Theses & Dissertations
Computational biology is an interdisciplinary field that applies computational approaches to biological big data, such as protein amino acid sequences and genetic sequences, and it is widely used to analyze protein-protein interactions, make predictions in drug discovery, and develop vaccines. Popular methods include mathematical modeling, molecular dynamics simulations, and data science methodology. With the help of computational algorithms and applications, drug development can proceed much faster than traditional processes, because these tools reduce risk early in the drug discovery process and help researchers select the target candidates with the highest potential for success. In my doctoral research, I applied multi-scale computational approaches to …
Statistical Analysis Of Genetic Sequence Variants In Whole Exome Sequencing Data From Patients With Prostate Cancer, Kelvin Ofori-Minta
Open Access Theses & Dissertations
A single variation in the genetic sequence within an organism's DNA can lead to beneficial, detrimental, or neutral effects, and more often than not these effects are detrimental rather than beneficial. Many biomedical and bioinformatics studies have sought to determine the genetic causes of prostate cancer (PrCa), which is still the second leading cause of cancer-related death among men in the United States, and statistical bioinformatics research has devoted appreciable effort to this aim. Through statistical analyses of a set of whole exome sequencing data from patients with PrCa obtained via The Cancer Genome …
The Hybridizing Ions Treatment (HIT) Method Development And Computational Study On SARS-CoV-2 E Protein, Shengjie Sun
Open Access Theses & Dissertations
Fast and accurate calculation of the electrostatic features of highly charged biomolecules, such as DNA, RNA, and highly charged proteins, is crucial but challenging. Traditional implicit solvent methods calculate electrostatic features quickly, but they cannot effectively balance the high net charges of these biomolecules. Explicit solvent methods add ions to neutralize highly charged biomolecules in molecular dynamics simulations, which requires more expensive computing resources. Here we developed a novel method, the Hybridizing Ions Treatment (HIT) method, which hybridizes the implicit solvent method with the explicit method to realistically calculate the electrostatic potential for highly …
Gene Selection And Classification In High-Throughput Biological Data With Integrated Machine Learning Algorithms And Bioinformatics Approaches, Abhijeet R Patil
Open Access Theses & Dissertations
With the rise of high-throughput technologies in biomedical research, large volumes of expression profiling, methylation profiling, and RNA-sequencing data are being generated. These high-dimensional datasets have a large number of features but a small number of samples, a situation associated with the "curse of dimensionality." Selecting optimal features, which strongly affects the performance of classification algorithms in machine learning models, therefore poses challenging problems in the bioinformatics analysis of such datasets. In this work, I focus on the design of two-stage frameworks of feature selection and classification and their applications to multiple colorectal cancer datasets. The first …
Combination Of Resampling Based Lasso Feature Selection And Ensembles Of Regularized Regression Models, Abhijeet R. Patil
Open Access Theses & Dissertations
In high-dimensional data, the performance of various classifiers depends largely on the selection of important features. Most individual classifiers built on existing feature selection (FS) methods do not perform well on highly correlated data. Obtaining important features with an FS method and selecting the best-performing classifier is a challenging task in high-throughput data. In this research, we propose a combination of resampling-based least absolute shrinkage and selection operator (LASSO) feature selection (RLFS) and ensembles of regularized regression models (ERRM) capable of handling data with high correlation structures. The ERRM boosts the prediction accuracy with …
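The resampling idea behind lasso-based feature selection can be sketched as follows: draw bootstrap samples, fit a lasso on each, and record how often each feature receives a nonzero coefficient. This is a minimal pure-Python sketch under assumed settings (coordinate descent, centered data without an intercept, a fixed penalty `lam`, and hypothetical helpers such as `selection_frequency`); it is not the RLFS implementation from the thesis, and the ERRM ensemble stage is not shown.

```python
import random


def soft_threshold(z: float, t: float) -> float:
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1 (no intercept; center data first)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # current residual y - Xb
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the partial residual (adds back b_j's own contribution)
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            delta = soft_threshold(rho, lam) / col_sq[j] - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] += delta
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def center(X, y):
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    return [[row[j] - means[j] for j in range(p)] for row in X], [v - y_mean for v in y]


def selection_frequency(X, y, lam, n_resamples=50, seed=0):
    """Fraction of bootstrap lasso fits in which each feature has a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample rows with replacement
        Xb, yb = center([X[i] for i in idx], [y[i] for i in idx])
        for j, coef in enumerate(lasso_cd(Xb, yb, lam)):
            if coef != 0.0:
                counts[j] += 1
    return [c / n_resamples for c in counts]
```

Features whose selection frequency exceeds a chosen cutoff would then be passed to the classification stage; both the cutoff and `lam` are modeling choices rather than fixed values.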
Integrated Statistical And Machine Learning Algorithms For Predicting And Classifying G Protein-Coupled Receptors, Fredrick Ayivor
Open Access Theses & Dissertations
G protein-coupled receptors (GPCRs) are transmembrane proteins with important functions in signal transduction, and they often serve as drug targets. With the increasing availability of protein sequence information, there is much interest in computationally predicting GPCRs and classifying them according to their biological roles. Such predictions are cost-efficient and can be valuable guides for designing wet-lab experiments that help elucidate signaling pathways and expedite drug discovery. Existing computational tools for GPCR prediction use principal component analysis (PCA), intimate sorting (IS), support vector machines (SVM), and random forests (RF) with various sequence-derived features. While accuracies of over 90% …
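Sequence-derived features turn a variable-length protein sequence into a fixed-length numeric vector that classifiers such as SVMs or random forests can consume. The abstract does not say which features were used; as an assumed illustration, the sketch below computes two common ones, amino acid composition (20 values) and dipeptide composition (400 values). The function names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq: str) -> list[float]:
    """Fraction of each standard amino acid; non-standard symbols (e.g. X) are ignored."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for ch in seq.upper():
        if ch in counts:
            counts[ch] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """Fraction of each of the 400 ordered residue pairs among adjacent positions."""
    s = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(s) - 1):
        pair = s[i:i + 2]
        if pair in counts:  # skip windows that contain a non-standard residue
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def feature_vector(seq: str) -> list[float]:
    """420-dimensional input for a downstream classifier (SVM, random forest, ...)."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)
```

Composition features discard residue order beyond adjacent pairs, which keeps the vector length independent of sequence length at the cost of positional information.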
Sample Size Estimation For Genomics Experiments With Dependent End Points, Desmond Koomson
Open Access Theses & Dissertations
In typical genomics studies involving numerous association tests of gene mutations with a disease, controlling the error rate through multiplicity adjustment is paramount, because even if no gene were truly associated with the disease, we would still expect some false positives. Many methods exist that incorporate multiplicity control into sample size estimation for normally distributed endpoints, but none addresses the issue for correlated non-normal endpoints.
One common practice in the literature is to assume an equal correlation among all differentially associated or expressed genes, thereby using the generalized binomial or beta-binomial model to compute the comparison-wise power of detecting these …
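The beta-binomial approach described above can be sketched in a few lines. Assuming each of the `m` truly associated genes is detected with the same comparison-wise power and that detection indicators share a common pairwise correlation `rho`, the number detected follows a beta-binomial distribution, and a sample size can be chosen as the smallest `n` that makes detecting at least `r` of them sufficiently likely. The per-gene power here comes from a large-sample two-sided z-test at the Bonferroni level, which is an assumption for illustration only (the thesis targets non-normal endpoints); all function names and parameters are hypothetical.

```python
import math
from statistics import NormalDist


def _log_beta(x: float, y: float) -> float:
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def beta_binomial_pmf(k: int, m: int, mean: float, rho: float) -> float:
    """P(K = k) for K ~ BetaBinomial with E[K] = m * mean and pairwise correlation rho (0 < rho < 1)."""
    a = mean * (1 - rho) / rho  # chosen so that 1 / (a + b + 1) == rho
    b = (1 - mean) * (1 - rho) / rho
    log_choose = math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
    return math.exp(log_choose + _log_beta(k + a, m - k + b) - _log_beta(a, b))


def prob_at_least(r: int, m: int, mean: float, rho: float) -> float:
    """P(K >= r): chance of detecting at least r of the m associated genes."""
    if mean >= 1.0:  # every gene detected with certainty (avoids a zero Beta parameter)
        return 1.0
    if mean <= 0.0:
        return 1.0 if r <= 0 else 0.0
    return sum(beta_binomial_pmf(k, m, mean, rho) for k in range(r, m + 1))


def per_gene_power(n: int, effect: float, alpha: float, n_tests: int) -> float:
    """Approximate comparison-wise power of a two-sided z-test at the Bonferroni level alpha / n_tests."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / (2 * n_tests))
    shift = effect * math.sqrt(n)
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


def min_sample_size(r, m, rho, effect, alpha, n_tests, target=0.8, n_max=10_000):
    """Smallest n with P(detect at least r of m associated genes) >= target, or None."""
    for n in range(2, n_max + 1):
        power = per_gene_power(n, effect, alpha, n_tests)
        if prob_at_least(r, m, power, rho) >= target:
            return n
    return None
```

For example, `min_sample_size(8, 10, 0.1, 0.5, 0.05, 1000)` searches for the smallest per-test sample size that gives an 80% chance of detecting at least 8 of 10 associated genes among 1,000 tests.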
Secondary Structure Prediction Of Long RNA Sequences Based On Inversion Excursions And A Modularized MapReduce Framework, Daniel Tesfai Yehdego
Open Access Theses & Dissertations
Ribonucleic acid (RNA) molecules and their secondary structures play important roles in many biological processes, including gene expression and regulation. The genomes of many viruses are also RNA molecules. Since secondary structures are crucial for RNA function, computational prediction of RNA secondary structure has been widely studied. However, the heavy demands on computer memory and computing time posed by complex secondary structures limit existing thermodynamics-based prediction algorithms to short RNA sequences of a few hundred bases. One approach to overcoming this limitation is to first cut long RNA sequences into shorter, non-overlapping …
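The cut-fold-merge strategy can be sketched as a map step that folds each segment independently and a reduce step that reassembles the segment structures in order. This sketch makes several simplifying assumptions that differ from the thesis: segments are cut at fixed lengths rather than at points chosen from inversion excursions, each segment is folded with the Nussinov base-pair maximization algorithm (allowing only C-G and A-U pairs) instead of a thermodynamic model, and Python's built-in `map` stands in for a Hadoop MapReduce job. All function names are hypothetical.

```python
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}  # Watson-Crick pairs only


def _pair_score(dp, i, j, k):
    """Pairs in seq[i..j] if i pairs with k: inside (i+1..k-1) + 1 + right part (k+1..j)."""
    right = dp[k + 1][j] if k + 1 <= j else 0
    return dp[i + 1][k - 1] + 1 + right


def nussinov(seq: str, min_loop: int = 3) -> str:
    """Dot-bracket structure maximizing the number of nested base pairs (Nussinov algorithm)."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # option 1: position i stays unpaired
            for k in range(i + min_loop + 1, j + 1):  # option 2: i pairs with some k
                if (seq[i], seq[k]) in PAIRS:
                    best = max(best, _pair_score(dp, i, j, k))
            dp[i][j] = best
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:  # traceback: recover one optimal set of pairs
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS and dp[i][j] == _pair_score(dp, i, j, k):
                structure[i], structure[k] = "(", ")"
                stack.append((i + 1, k - 1))
                if k + 1 <= j:
                    stack.append((k + 1, j))
                break
    return "".join(structure)


def segment(seq: str, length: int) -> list[tuple[int, str]]:
    """Cut seq into consecutive non-overlapping pieces, keyed by their start offset."""
    return [(start, seq[start:start + length]) for start in range(0, len(seq), length)]


def map_fold(item: tuple[int, str]) -> tuple[int, str]:
    start, piece = item
    return start, nussinov(piece)


def reduce_merge(results) -> str:
    """Reassemble segment structures in sequence order, whatever order they arrived in."""
    return "".join(structure for _, structure in sorted(results))


def predict_long(seq: str, segment_length: int = 200) -> str:
    return reduce_merge(map(map_fold, segment(seq, segment_length)))
```

Because each segment is folded in isolation, no base pair can span two segments; this is why the choice of cut points matters for the quality of the merged structure.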
Generalized Linear Latent Mixed Modeling Of Functional Independent Measures And Patient Outcomes, Maduranga Kasun Dassanayake
Open Access Theses & Dissertations
The Functional Independent Measure (FIM) is one of the most widely accepted functional assessment measures in the rehabilitation community. Previous studies have investigated the relationships among place of discharge, admission FIM scores or FIM difference scores, and patient characteristics, and have found associations among these variables. However, most of these studies fail to account for the multi-layered multidimensionality of the FIM and the measurement error associated with the FIM items. This study uses Generalized Linear Latent Mixed Models (GLLAMM) and Structural Equation Models (SEM) to assess which patient characteristics are associated with FIM difference scores and the structural relationship …
Distributional Properties Of Inversions And Segmentation Algorithms For RNA Sequences, Sameera Dhananjaya Viswakula
Open Access Theses & Dissertations
Ribonucleic acid (RNA) is a long, single-stranded molecule made up of four types of nucleotide bases: adenine (A), cytosine (C), guanine (G), and uracil (U). It folds back on itself and forms complementary C-G and A-U base pairs. The set of such hydrogen-bonded pairs in an RNA molecule is called its secondary structure. Knowing the secondary structure of an RNA is useful for understanding its biological function. Predicting RNA secondary structure from the nucleotide sequence has been an important bioinformatics problem for over two decades.
The work in this thesis is motivated by the need to improve the secondary structure …
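A secondary structure, as defined above, is a set of base pairs, and it is often written in dot-bracket notation, where matching parentheses mark paired positions and dots mark unpaired ones. The notation choice and the helper names below are illustrative assumptions, not taken from the thesis; the check allows only the C-G and A-U pairs named in the text.

```python
COMPLEMENTARY = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def pairs_from_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Convert dot-bracket notation into a sorted list of (i, j) base-pair positions."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def is_valid_structure(seq: str, structure: str) -> bool:
    """True if structure is balanced, matches seq in length, and pairs only complementary bases."""
    if len(seq) != len(structure):
        return False
    try:
        pairs = pairs_from_dot_bracket(structure)
    except ValueError:
        return False
    return all((seq[i], seq[j]) in COMPLEMENTARY for i, j in pairs)
```

A stack is the natural data structure here because nested pairs close in the reverse order in which they open.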
Computational Methods Of Hidden Markov Models With Respect To CpG Island Prediction In DNA Sequences, Roberto Angel Ortega
Open Access Theses & Dissertations
Hidden Markov models (HMMs) are a class of Markov models in which, unlike in an ordinary Markov chain, the observer does not know which state the model was in when each symbol was emitted; only the symbols are observed. Like Markov chains, HMMs assume that the next state of a sequence depends only on its current state. The parameters of an HMM are its transition and emission probabilities: transition probabilities give the probability of moving from one state to another, and emission probabilities give the probability of observing a symbol given that it was emitted from a specific state.
The structure of …
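A common way to apply an HMM of this kind to CpG island prediction is to label each position with a hidden state, island (`+`) or background (`-`), and recover the most likely state path with the Viterbi algorithm. The two-state, single-nucleotide model and every probability below are made-up values for illustration, not parameters from the thesis; in practice they would be estimated from sequence data (for example, with the Baum-Welch algorithm).

```python
import math

STATES = ("+", "-")  # "+" = CpG island, "-" = background (hypothetical two-state model)
START = {"+": 0.5, "-": 0.5}
TRANS = {"+": {"+": 0.9, "-": 0.1}, "-": {"+": 0.1, "-": 0.9}}
EMIT = {
    "+": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    "-": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
}


def viterbi(seq: str) -> str:
    """Most probable hidden state path for seq, computed in log space."""
    if not seq:
        return ""
    log = math.log
    scores = {s: log(START[s]) + log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for symbol in seq[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # best predecessor state p for landing in state s at this position
            p = max(STATES, key=lambda q: scores[q] + log(TRANS[q][s]))
            new_scores[s] = scores[p] + log(TRANS[p][s]) + log(EMIT[s][symbol])
            pointers[s] = p
        scores = new_scores
        backpointers.append(pointers)
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):  # walk back from the last position
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("AT" * 5 + "CG" * 8 + "AT" * 5))  # 10 "-", then 16 "+", then 10 "-"
```

Working in log space avoids numerical underflow on long sequences, since the raw path probability is a product of many values below 1.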