Open Access. Powered by Scholars. Published by Universities.®

Life Sciences Commons

Articles 1 - 30 of 34

Full-Text Articles in Life Sciences

Model-Based Deep Autoencoders For Clustering Single-Cell RNA Sequencing Data With Side Information, Xiang Lin Dec 2023

Dissertations

Clustering analysis has been conducted extensively in single-cell RNA sequencing (scRNA-seq) studies. scRNA-seq can profile tens of thousands of genes' activities within a single cell. Thousands or tens of thousands of cells can be captured simultaneously in a typical scRNA-seq experiment. Biologists would like to cluster these cells to explore and elucidate cell types or subtypes. Numerous methods have been designed for clustering scRNA-seq data. Yet single-cell technologies have developed so fast in the past few years that existing methods have not caught up with these rapid changes and fail to fulfil their full potential. For instance, besides profiling transcription …


Quantifying Balance: Computational And Learning Frameworks For The Characterization Of Balance In Bipedal Systems, Kubra Akbas Aug 2023

Dissertations

In clinical practice and general healthcare settings, the lack of reliable and objective balance and stability assessment metrics hinders the tracking of patient performance progression during rehabilitation; the assessment of bipedal balance plays a crucial role in understanding stability and falls in humans and other bipeds, while providing clinicians important information regarding rehabilitation outcomes. Bipedal balance has often been examined through kinematic or kinetic quantities, such as the Zero Moment Point and Center of Pressure; however, analyzing balance specifically through the body's Center of Mass (COM) state offers a holistic and easily comprehensible view of balance and stability.

Building upon …


Machine Learning And Network Embedding Methods For Gene Co-Expression Networks, Niloofar Aghaieabiane May 2023

Dissertations

High-throughput technologies such as DNA microarrays and RNA-seq are used to measure the expression levels of large numbers of genes simultaneously. To support the extraction of biological knowledge, individual gene expression levels are transformed into Gene Co-expression Networks (GCNs). GCNs are analyzed to discover gene modules. GCN construction and analysis has been a well-studied topic for nearly two decades. While new types of sequencing and the corresponding data are now available, the software package WGCNA and its most recent variants are still widely used, contributing to biological discovery.

The discovery of biologically significant modules of genes from raw expression data is …


Deep Hybrid Modeling Of Neuronal Dynamics Using Generative Adversarial Networks, Soheil Saghafi May 2023

Dissertations

Mechanistic modeling and machine learning methods are powerful techniques for approximating biological systems and making accurate predictions from data. However, when used in isolation these approaches suffer from distinct shortcomings: model and parameter uncertainty limit mechanistic modeling, whereas machine learning methods disregard the underlying biophysical mechanisms. This dissertation constructs Deep Hybrid Models that address these shortcomings by combining deep learning with mechanistic modeling. In particular, this dissertation uses Generative Adversarial Networks (GANs) to provide an inverse mapping of data to mechanistic models and identifies the distributions of mechanistic model parameters coherent to the data.

Chapter 1 provides background information on …


One-Stage Blind Source Separation Via A Sparse Autoencoder Framework, Jason Anthony Dabin May 2022

Dissertations

Blind source separation (BSS) is the process of recovering individual source transmissions from a received mixture of co-channel signals without a priori knowledge of the channel mixing matrix or transmitted source signals. The received co-channel composite signal is considered to be captured across an antenna array or sensor network and is assumed to contain sparse transmissions, as users are active and inactive aperiodically over time. An unsupervised machine learning approach using an artificial feedforward neural network sparse autoencoder with one hidden layer is formulated for blindly recovering the channel matrix and source activity of co-channel transmissions. The BSS sparse autoencoder …


Methods For Extending Biomedical Reference Ontologies And Interface Terminologies For EHR Text Annotation, Vipina Kuttichi Keloth May 2021

Dissertations

Biomedical ontologies and terminologies are a cornerstone in various electronic health record systems (EHRs) for encoding information related to diseases, diagnoses, treatments, etc. Ontologies in general represent entities (concepts) and events along with all interdependent properties and relationships in an efficient way to facilitate easy access, retrieval and sharing. With the landscape of medicine rapidly changing, biomedical ontologies and terminologies need to rapidly evolve to support interoperability, medical coding, record keeping, and healthcare activities in general, and to facilitate interdisciplinary research. Extending ontologies by identifying new and missing concepts plays a vital role in the maintenance of ontologies to keep …


Development Of Deep Learning Neural Network For Ecological And Medical Images, Shaobo Liu May 2021

Dissertations

Deep learning in computer vision and image processing has attracted attention from various fields, including ecology and medical imaging. Ecologists are interested in finding an effective model structure to classify different species. Traditional deep learning models based on convolutional neural networks, such as LeNet, AlexNet, VGG models, residual neural networks, and inception models, are first used to classify bee wing and butterfly datasets. However, insufficient data samples and unbalanced samples in each class have caused poor accuracy. To improve the test accuracy, data augmentation and transfer learning are applied. Recently developed deep learning framework based on mathematical morphology …


Enrichment Of Ontologies Using Machine Learning And Summarization, Hao Liu Aug 2020

Dissertations

Biomedical ontologies are structured knowledge systems in biomedicine. They play a major role in enabling precise communication in support of healthcare applications, e.g., Electronic Healthcare Records (EHR) systems. Biomedical ontologies are used in many different contexts to facilitate information and knowledge management. The most widely used clinical ontology is SNOMED CT. Placing a new concept into its proper position in an ontology is a fundamental task in its lifecycle of curation and enrichment.

A large biomedical ontology, which typically consists of many tens of thousands of concepts and relationships, can be viewed as a complex network with concepts as …


Cancer Risk Prediction With Whole Exome Sequencing And Machine Learning, Abdulrhman Fahad M Aljouie Dec 2019

Dissertations

Accurate cancer risk and survival time prediction are important problems in personalized medicine, where disease diagnosis and prognosis are tuned to individuals based on their genetic material. Cancer risk prediction informs decisions about regular screening, which helps to detect disease at an early stage and therefore increases the probability of successful treatment. Cancer risk prediction is a challenging problem. Lifestyle, environment, family history, and genetic predisposition are some of the factors that influence disease onset. Cancer risk prediction based on predisposing genetic variants has been studied extensively. Most studies have examined the predictive ability of variants in known …


Model-Based Deep Autoencoders For Characterizing Discrete Data With Application To Genomic Data Analysis, Tian Tian May 2019

Dissertations

Deep learning techniques have achieved tremendous success in a wide range of real applications in recent years. For dimension reduction, deep neural networks (DNNs) provide a natural choice to parameterize a non-linear transforming function that maps the original high-dimensional data to a lower-dimensional latent space. An autoencoder is a kind of DNN used to learn efficient feature representations in an unsupervised manner. Deep autoencoders have been widely explored and applied to the analysis of continuous data, while they are understudied for characterizing discrete data. This dissertation focuses on developing model-based deep autoencoders for modeling discrete data. A motivating example of …


PolyA DB3: A Database Cataloging Polyadenylation Sites (PAS) Across Different Species And Their Conservation, Ram Mohan Nambiar Dec 2018

Theses

Polyadenylation is an important process occurring in messenger RNA that involves cleavage of the 3’ end of nascent mRNAs and addition of poly(A) tails. For this thesis, I present PolyA DB3, a database cataloging cleavage and polyadenylation sites (PASs) in several genomes, specifically human, mouse, rat and chicken. This database is based on deep sequencing data. PASs are mapped by the 3’ region extraction and deep sequencing (3’READS) method, ensuring unequivocal PAS identification. A large volume of data based on diverse biological samples is used to increase PAS coverage and provide PAS usage information. Strand-specific RNA-seq data were used to extend annotated 3’ ends …


Hypoxic And Viral Contributions To The Etiopathogenesis Of Schizophrenia: A Whole Transcriptome Analysis, Kathryn A. Gorski May 2018

Theses

Schizophrenia is a mental illness with a complex and as yet unclear etiology. It is highly heritable and has a strong polygenic character; however, studies examining the genetics of schizophrenia have not sufficiently explained all variability in its prevalence. Environmental causes are theorized to make a non-trivial contribution to the pathoetiology of schizophrenia, including interactions with genetic components, but these mechanisms remain unclear. Analyzing schizophrenia dysfunction using transcriptomic approaches is a paradigm still in its infancy, and fewer studies still have examined non-neurological contributions to schizophrenia pathology with next generation sequencing technologies. This pilot study uses several …


Gene Network Understanding And Analysis, Maria E. Somoza May 2016

Theses

A gene regulatory network (GRN) is a collection of regulators that interact with each other in the cell to govern the gene expression levels of mRNA and proteins. These regulators can be DNA, RNA, proteins, or their complexes. Transcriptional gene regulation is an important mechanism whose in-depth study can lead to various practical applications and a greater understanding of how organisms control their cellular behavior. Among the most widely studied organisms in gene regulatory network research are Mycobacterium tuberculosis and Corynebacterium glutamicum ATCC 13032.

Gene co-expression networks are of biological interest due to co-expressed genes which are …


Using The KDJ As A Trading Strategy On Biotech Companies, Shijie Zha May 2016

Theses

Mean Reversion is the most commonly used model in quantitative trading. This model is associated with several factors, such as the ma5 and ma10 lines. These factors are the most significant in stock markets. However, the disadvantages of this model are lag and inaccuracy.

In this research, we obtain historical and current stock data with a web crawler, analyze the quantitative data, and build a new model involving the KDJ. Taking biotech companies traded in the United States and B-shares traded in China as the research subjects, the results show increased profits compared with the Mean Reversion model. It also shows …
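The listing does not say how the thesis computes the KDJ. As a rough illustration only, the sketch below follows one commonly used convention: a 9-period window, 1/3 smoothing weights, and K and D seeded at 50. These parameters and the function name `kdj` are assumptions for illustration, not details taken from the thesis.

```python
def kdj(highs, lows, closes, n=9):
    """Return a list of (K, D, J) tuples, one per bar.

    Assumed convention (not from the thesis):
      RSV = 100 * (close - lowest low) / (highest high - lowest low) over n bars
      K   = 2/3 * previous K + 1/3 * RSV
      D   = 2/3 * previous D + 1/3 * K
      J   = 3 * K - 2 * D
    """
    k, d = 50.0, 50.0  # conventional seed values
    result = []
    for i in range(len(closes)):
        start = max(0, i - n + 1)
        highest = max(highs[start:i + 1])
        lowest = min(lows[start:i + 1])
        if highest == lowest:
            rsv = 50.0  # flat window: treat as neutral to avoid dividing by zero
        else:
            rsv = 100.0 * (closes[i] - lowest) / (highest - lowest)
        k = (2.0 / 3.0) * k + (1.0 / 3.0) * rsv
        d = (2.0 / 3.0) * d + (1.0 / 3.0) * k
        j = 3.0 * k - 2.0 * d
        result.append((k, d, j))
    return result
```

A trading rule built on these values, for example acting when K crosses D, would be a further assumption that the truncated abstract does not confirm.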


Unsupervised Gene Regulatory Network Inference On Microarray Data, Nidhi Radia May 2015

Theses

Obtaining gene regulatory networks (GRNs) from expression data is a challenging and crucial task. Many computational methods and algorithms have been developed to infer gene networks from gene expression data, which are usually obtained from microarray experiments. A gene network is a way to depict the relations among clusters of genes. To infer gene networks, the unsupervised method is used in this study. The two types of data used are time-series data and steady-state data. The data are analyzed using various tools containing different algorithms and concepts. GRNs from time-series data tools are obtained using the Time-delayed Algorithm for the …


Exact Genome Alignment, Nandini Ghosh May 2015

Theses

The increase in the volume of genomic data due to the decrease in the cost of whole genome sequencing techniques has opened up new avenues of research in the field of Bioinformatics, like comparative genomics and evolutionary dynamics. The fundamental task in these studies is to align the genome sequences accurately. Sequence alignment helps to identify regions of similarity between the sequences to establish their functional, evolutionary and structural relationship. The thesis investigates the performance of two sequence alignment programs: LASTZ, a faster hash-table-based method, and SSEARCH, a slower but more rigorous Smith-Waterman-based approach, on whole genome …
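The abstract describes SSEARCH as Smith-Waterman based. As a minimal sketch of that underlying technique, the function below computes the best local alignment score with a linear gap penalty. The scoring values and the name `smith_waterman_score` are illustrative assumptions. It does not reproduce SSEARCH itself, which uses substitution matrices, affine gap penalties, and heavy optimization.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Illustrative scoring (assumed): +2 match, -1 mismatch, -2 per gap position.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                            # start a new local alignment
                prev[j - 1] + substitution,   # align a[i-1] with b[j-1]
                prev[j] + gap,                # gap in b
                curr[j - 1] + gap,            # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

The table has len(a) × len(b) cells, so the run time grows with the product of the sequence lengths. This is why heuristic, seed-based tools such as LASTZ are faster on whole genomes.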


Identifying Modifier Genes In SMA Model Mice, Weiting Xu May 2015

Theses

Spinal Muscular Atrophy (SMA) involves the loss of nerve cells called motor neurons in the spinal cord and is classified as a motor neuron disease. It affects 1 in 5,000-10,000 newborns and is one of the leading genetic causes of infant death in the USA. Mutations in the SMN1, UBA1, DYNC1H1 and VAPB genes cause spinal muscular atrophy. Extra copies of the SMN2 gene modify the severity of spinal muscular atrophy. Mutations in SMN1 (Survival of Motor Neuron 1) are the main cause of SMA, which follows autosomal recessive inheritance. SMN1 gene mutations lead to a shortage of the SMN protein and SMN protein forms SMN complex …


Rice And Mouse Quantitative Phenotype Prediction In Genome-Wide Association Studies With Support Vector Regression, Abdulrhman Fahad M. Aljouie Jan 2015

Theses

Quantitative phenotype prediction from genotype data is significant for pathogenesis, crop yields, and immunity tests. The scientific community has conducted many studies to find models with high predictive ability for unobserved quantitative phenotypes. Early genome-wide association studies (GWAS) focused on genetic variants that are associated with disease or phenotype; however, these variants mainly cover a small portion of the whole genetic variance, and therefore the effectiveness of predictions obtained using this information may be limited [1].

Instead, this study shows prediction ability from whole genome single nucleotide polymorphism (SNP) data of 1,940 genotyped stock mice with ~12k SNPs, and 413 …


Cancer Risk Prediction With Next Generation Sequencing Data Using Machine Learning, Nihir Patel Jan 2015

Theses

The use of computational biology for next generation sequencing (NGS) analysis is rapidly increasing in genomics research. However, the effectiveness of NGS data to predict disease abundance is as yet unclear. This research investigates the problem in the whole exome NGS data of chronic lymphocytic leukemia (CLL) available at dbGaP. Initially, raw reads from samples are aligned to the human reference genome using the Burrows-Wheeler Aligner. From the samples, structural variants, namely, Single Nucleotide Polymorphism (SNP) and Insertion Deletion (INDEL) are identified and are filtered using SAMtools as well as with the Genome Analysis Toolkit (GATK). Subsequently, the variants are …


Risk Prediction With Genomic Data, Bharati Jadhav May 2014

Theses

Genome wide association study (GWAS) is widely used with various machine learning algorithms to predict disease risk. This thesis investigates this widely used approach of GWAS using Single Nucleotide Polymorphism (SNP) genotype data and a novel approach of disease risk prediction with whole exome sequencing data, namely Whole Exome Wide Association Study (WEWAS). It further applies a discriminating machine learning algorithm, namely a Support Vector Machine (SVM) with different Kernel functions. For this study, only SNPs generated using genotyping technology, which focuses more on common variants, are used initially for disease prediction. Later, the whole exome data generated using Next …


Comparison Of Different Differential Expression Analysis Tools For RNA-Seq Data, Junfei Zhu Jan 2014

Theses

In molecular biology research, RNA-seq is a relatively new method for transcriptome profiling. It utilizes next generation sequencing technology to provide a huge amount of information about the variety and abundance of RNA present in an organism of interest at a specific state and a given time. One of the most important tasks of RNA-seq analysis is finding genes that are expressed differently in different subject groups. Many differential expression analysis tools for RNA-seq have been developed, but there is no gold standard in this field. In this research, four commonly used tools (DESeq, edgeR, limma, and cuffdiff) are …


PolyASeeker: A Computational Framework For Identifying Polyadenylation Cleavage Site From RNA-Seq, Xiao Ling May 2013

Theses

Alternative polyadenylation (APA) of mRNA plays a crucial role in post-transcriptional gene regulation. Recently, advances in next generation sequencing technology have made it possible to efficiently characterize the transcriptome and identify the 3’ end of polyadenylated RNAs. However, no comprehensive bioinformatic pipelines have fulfilled this goal. PolyASeeker, a computational framework for identifying polyadenylation cleavage sites from RNA-Seq data, is proposed in this thesis. Using a simulated RNA-seq dataset, a novel method is developed to evaluate the performance of the proposed framework versus the traditional A-stretch approach, and to compute accurate precision and recall values that previous estimates could not provide. …


Performance Comparison Of Five RNA-Seq Alignment Tools, Yuanpeng Lu May 2013

Theses

Aligning millions of short reads to a reference genome is a critical task in high throughput sequencing. In recent years, a large number of mapping algorithms have been developed, all of which have in common that they align a vast number of reads to genomic or transcriptomic sequences. RNA-Seq data is discrete in nature; therefore, with reasonable gene models and comparative metrics, RNA-Seq data can be simulated with sufficient accuracy to enable meaningful benchmarking of alignment algorithms. To provide guidance in the choice of alignment algorithms, five different alignment tools for RNA-Seq data are evaluated. In order to compare the …


A GPU Program To Compute SNP-SNP Interactions In Genome-Wide Association Studies, Srividya Ramakrishnan May 2013

Theses

With recent advances in next generation sequencing technologies, short read sequences of the human genome have become more accessible. Paired-end sequencing of short reads is currently the most sensitive method for detecting somatic mutations that arise during tumor development. In this study, a novel approach to optimize the detection of structural variants using a new short read alignment program is presented.

Pairwise interaction effects of Single Nucleotide Polymorphisms (SNPs) have been shown to help uncover the underlying complex disease traits. Computing the disease risk based on the interaction effects of SNPs in a case-control study is a …


Genome Wide Search For Pseudo Knotted Non-Coding RNAs, Meghana S. Vasavada May 2013

Theses

Non-coding RNAs (ncRNAs) are the functional RNA molecules that are involved in many biological processes including gene regulation, chromosome replication and RNA modification. Searching genomes using computational methods has become an important asset for prediction and annotation of ncRNAs. To annotate an individual genome for a specific family of ncRNAs, a computational tool is interpreted to scan through the genome and align its sequence segments to some structure model for the ncRNA family. With the recent advances in detecting an ncRNA in the genome, heuristic techniques are designed to perform an accurate search and sequence-structure alignment. This study uses a …


RNA-Sequence Analysis Of Human Melanoma Cells, Jharna Miya May 2013

Theses

RNA sequencing refers to the use of high-throughput sequencing technologies to sequence cDNA in order to obtain complete information about a sample’s RNA content. The objective of this study is to analyze these data from different aspects and to characterize gene expression. Besides this characterization, the data were also used to investigate the effect of sequencing depth on gene expression measurements.

This research focuses on quantitative measurement of expression levels of genes and their transcripts. In this study, complementary DNA fragments of cultured human melanoma cells are sequenced and a total of 139,501,106 200-bp reads …


A Comparative Analysis Of Machine Learning Algorithms For Genome Wide Association Studies, Neha Singh May 2012

Theses

Variations present in the human genome play a vital role in the emergence of genetic disorders and abnormal traits. Single Nucleotide Polymorphisms (SNPs) are considered the most common source of genetic variation. Genome Wide Association Studies (GWAS) probe the variations present in the human population and find their association with complex genetic disorders. Nowadays, advances in technology and a drastic reduction in the cost of Genome Wide Association Studies provide a plethora of genomic data that delivers a huge amount of information about these variations to analyze. In fact, there is a significant difference in the pace of data generation and …


Phenotype Prediction And Feature Selection In Genome-Wide Association Studies, Andrew Roberts May 2012

Theses

Genome wide association studies (GWAS) search for correlations between single nucleotide polymorphisms (SNPs) in a subject genome and an observed phenotype. GWAS can be used to generate models for predicting phenotype based on genotype, as well as aiding in identification of specific genes affecting the biological mechanism underlying the phenotype.

In this investigation, phenotype prediction models are constructed from GWAS training data and are evaluated for performance on test data. Three methods are used to rank SNPs by their correlation with the phenotype: the univariate Wald test, a multivariate, support vector machine (SVM) based technique, and a hybrid method where …


Data Mining Of Tetraloop-Tetraloop Receptors In RNA XML Files, Sinan Ramazanoglu May 2012

Theses

RNA (ribonucleic acid) motifs are tertiary structures that play an important role in the folding mechanism of the RNA molecule. The overall function of an RNA motif depends on the specific bp (base pair) sequence that constitutes its secondary structure. Data mining is a novel method for both discovering potential tertiary structures within DNA (deoxyribonucleic acid), RNA, and protein molecules and storing the information in databases. The RNA motif of interest is the tetraloop-tetraloop receptor, which is composed of a highly conserved 11 nt (nucleotide) sequence and a tetraloop with the generic form of GNRA (where N = any base …


Fast Program For Sequence Alignment Using Partition Function Posterior Probabilities, Meera Prasad May 2011

Theses

The key requirements of a good sequence alignment tool are high accuracy and fast execution. The existing Probalign program is a highly accurate tool for sequence alignment of both proteins and nucleotides. However, its execution time is fairly high. The focus, therefore, is to reduce the running time of the existing version of Probalign while maintaining its current accuracy level.

The thesis conducts a detailed analysis of the performance of Probalign to bring down the running time of the existing code. A modified version of Probalign, Version 1.4, is released. A new program for sequence alignment with faster computation is …