Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Computer Sciences

Bioinformatics

Articles 1 - 30 of 121

Full-Text Articles in Physical Sciences and Mathematics

Comparative Analyses Of De Novo Transcriptome Assembly Pipelines For Diploid Wheat, Natasha Pavlovikj May 2022

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Gene expression and transcriptome analysis are currently one of the main focuses of research for a great number of scientists. However, the assembly of raw sequence data to obtain a draft transcriptome of an organism is a complex multi-stage process usually composed of pre-processing, assembling, and post-processing. Each of these stages includes multiple steps such as data cleaning, error correction and assembly validation. Different combinations of steps, as well as different computational methods for the same step, generate transcriptome assemblies with different accuracy. Thus, using a combination that generates more accurate assemblies is crucial for any novel biological discoveries. Implementing …


Building A Learning Healthcare System: A Path To Optimizing Big Health Data To Inform Clinical Care Decisions, Danne Charlotte Emily Elbers Jan 2022

Graduate College Dissertations and Theses

The explosive growth of data and computing power of the last decades has had large impacts on a myriad of domains, not least on one of society’s most complex systems: healthcare. In this work, a version of the resulting Learning Healthcare System (LHS) is explored and elements of it have been implemented and are in use at the Department of Veterans’ Affairs today. After an overview of what an LHS is and what it could be once executed in its full form, the chapters will describe in detail some of the individual elements and how they address cogs …


Novel Natural Language Processing Models For Medical Terms And Symptoms Detection In Twitter, Farahnaz Golrooy Motlagh Jan 2022

Browse all Theses and Dissertations

This dissertation focuses on disambiguation of language use on Twitter about drug use, consumption types of drugs, drug legalization, ontology-enhanced approaches, and data-driven prediction analysis by developing novel NLP models. Three technical aims comprise this work: (a) leveraging pattern recognition techniques to improve the quality and quantity of crawled Twitter posts related to drug abuse; (b) using an expert-curated, domain-specific DsOn ontology model that improves knowledge extraction in the form of drug-to-symptom and drug-to-side effect relations; and (c) modeling the prediction of public perception of the drug’s legalization and the sentiment analysis of drug consumption on Twitter. We collected …


Enhancing Microbiome Host Disease Prediction With Variational Autoencoders, Celeste Manughian-Peter Aug 2021

Computational and Data Sciences (MS) Theses

Advancements in genetic sequencing methods for microbiomes in recent decades have permitted the collection of taxonomic and functional profiles of microbial communities, accelerating the discovery of the functional aspects of the microbiome and generating an increased interest among clinicians in applying these techniques with patients. This advancement has coincided with software and hardware improvements in the field of machine learning and deep learning. Combined, these advancements suggest further potential for progress in disease diagnosis and treatment in humans. The ability to classify a human microbiome profile into a disease category, and additionally identify the differentiating factors within the profile between …


Spaceflight And The Differential Gene Expression Of Human Stem Cell-Derived Cardiomyocytes, Eugenie Zhu May 2021

Master's Projects

The National Aeronautics and Space Administration (NASA) has performed many experiments on the International Space Station (ISS) to further understand how conditions in space can affect life on Earth. This project analyzed GLDS-258, a gene set from NASA’s GeneLab repository which examines the impact of microgravity on human induced pluripotent stem-cell-derived cardiomyocytes (hiPSC-CMs). While many datasets have been run through NASA’s RNA-Seq Consensus Pipeline (RCP) to study differential gene expression in space, a Homo sapiens dataset has yet to be analyzed using the RCP. The aim of this project was to run the first Homo sapiens dataset, GLDS-258, through the …


Using Deep Learning To Analyze Materials In Medical Images, Carson Molder May 2021

Computer Science and Computer Engineering Undergraduate Honors Theses

Modern deep learning architectures have become increasingly popular in medicine, especially for analyzing medical images. In some medical applications, deep learning image analysis models have been more accurate at predicting medical conditions than experts. Deep learning has also been effective for material analysis on photographs. We aim to leverage deep learning to perform material analysis on medical images. Because material datasets for medicine are scarce, we first introduce a texture dataset generation algorithm that automatically samples desired textures from annotated or unannotated medical images. Second, we use a novel Siamese neural network called D-CNN to predict patch similarity and build …


TruncTrimmer: A First Step Towards Automating Standard Bioinformatic Analysis, Z. Gunner Lawless, Dana Dittoe, Dale R. Thompson, Steven C. Ricke May 2021

Computer Science and Computer Engineering Undergraduate Honors Theses

Bioinformatic analysis is a time-consuming process for labs performing research on various microbiomes. Researchers use tools like Qiime2 to help standardize the bioinformatic analysis methods, but even large, extensible platforms like Qiime2 have drawbacks due to the attention required by researchers. In this project, we propose to automate additional standard lab bioinformatic procedures by eliminating the existing manual process of determining the trim and truncate locations for paired end 2 sequences. We introduce a new Qiime2 plugin called TruncTrimmer to automate the process that usually requires the researcher to make a decision on where to trim and truncate manually after …


Analysis Of Subtelomeric REXTAL Assemblies Using QUAST, Tunazzina Islam, Desh Ranjan, Mohammad Zubair, Eleanor Young, Ming Xiao, Harold Riethman Jan 2021

Computer Science Faculty Publications

Genomic regions of high segmental duplication content and/or structural variation have led to gaps and misassemblies in the human reference sequence, and are refractory to assembly from whole-genome short-read datasets. Human subtelomere regions are highly enriched in both segmental duplication content and structural variations, and as a consequence are both impossible to assemble accurately and highly variable from individual to individual. Recently, we developed a pipeline for improved region-specific assembly called Regional Extension of Assemblies Using Linked-Reads (REXTAL). In this study, we evaluate REXTAL and genome-wide assembly (Supernova) approaches on 10X Genomics linked-reads data sets partitioned and barcoded using the …


The Role Of Software Engineering In Bioinformatics, Brendan Sean Lawlor Jan 2021

Theses

This thesis proposes that by applying state-of-the-art software engineering tools, techniques and frameworks to currently recognised challenges in bioinformatics, improved outcomes can be attained in that field. It begins by decomposing software engineering into two categories, namely process and architecture, and choosing two key challenges in the practice of bioinformatics: reproducibility and scalability. The body of the thesis is an exploration of the intersection between these two software engineering categories and these two bioinformatics challenges. The question is asked: Can best practices in professional software engineering be applied to address key issues in the bioinformatics domain, creating positive outcomes? And …


An Automated Method To Enrich And Expand Consumer Health Vocabularies Using GloVe Word Embeddings, Mohammed Ibrahim Jan 2021

Graduate Theses and Dissertations

Clear language makes communication easier between any two parties. However, a layman may have difficulty communicating with a professional due to not understanding the specialized terms common to the domain. In healthcare, it is rare to find a layman knowledgeable in medical jargon, which can lead to poor understanding of their condition and/or treatment. To bridge this gap, several professional vocabularies and ontologies have been created to map laymen medical terms to professional medical terms and vice versa. Many of the presented vocabularies are built manually or semi-automatically requiring large investments of time and human effort and consequently the slow …


Ensemble Protein Inference Evaluation, Kyle Lee Lucke Jan 2021

Graduate Student Theses, Dissertations, & Professional Papers

Protein inference is becoming an increasingly important tool that aids in the characterization of complex proteomes and analysis of complex protein samples. In bottom-up shotgun proteomics experiments the metrics for evaluation (like AUC and calibration error) are based on an often imperfect target-decoy database. These metrics make the inherent assumption that all of the proteins in the target set are present in the sample being analyzed. In general, this is not the case; the target set is typically a mix of present and absent proteins. To objectively evaluate inference methods, protein standard datasets are used. These datasets are special in …


Soda: An Open-Source Library For Visualizing Biological Sequence Annotation, Jack W. Roddy, Travis J. Wheeler Jan 2021

Graduate Student Theses, Dissertations, & Professional Papers

Genome annotation is the process of identifying and labeling known genetic sequences or features within a genome. Across the various subfields within modern molecular biology, there is a common need for the visualization of such annotations. Genomic data is often visualized on web browser platforms, providing users with easy access to visualization tools without the need for installing any software or, in many cases, underlying datasets. While there exists a broad range of web-based visualization tools, there is, to my knowledge, no lightweight, modern library tailored towards the visualization of genomic data. Instead, developers charged with the task of producing …


Development Of Computational Tools To Target MicroRNA, Luo Song Dec 2020

Dissertations & Theses (Open Access)

MicroRNAs (a.k.a. miRNAs) play an important role in disease development. However, few of their structures have been determined and structure-based computational methods remain challenging in accurately predicting their interactions with small molecules. To address this issue, my thesis develops integrated approaches to screening for novel inhibitors by targeting specific structure motifs in miRNAs. The project starts with implementing a tool to find potential miRNA targets with desired motifs. I combined both sequence information of miRNAs and known RNA structure data from Protein Data Bank (PDB) to predict the miRNA structure and identify the motif to target, then I …


New Methods For Deep Learning Based Real-Valued Inter-Residue Distance Prediction, Jacob Barger Nov 2020

Theses

Background: Much of the recent success in protein structure prediction has been a result of accurate protein contact prediction--a binary classification problem. Dozens of methods, built from various types of machine learning and deep learning algorithms, have been published over the last two decades for predicting contacts. Recently, many groups, including Google DeepMind, have demonstrated that reformulating the problem as a multi-class classification problem is a more promising direction to pursue. As an alternative approach, we recently proposed real-valued distance predictions, formulating the problem as a regression problem. The nuances of protein 3D structures make this formulation appropriate, allowing predictions …
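
To make the three formulations concrete, the sketch below starts from real-valued inter-residue distances and derives both binary contacts and multi-class distance bins. The 8 Å contact cutoff is a commonly used convention assumed here, and the bin edges, function names, and toy matrix are hypothetical; none of them are taken from the thesis.

```python
from bisect import bisect_right

def distances_to_contacts(dist, cutoff=8.0):
    """Binary formulation: 1 if two residues are closer than `cutoff` angstroms."""
    return [[1 if d < cutoff else 0 for d in row] for row in dist]

def distances_to_bins(dist, edges=(4.0, 8.0, 12.0, 16.0)):
    """Multi-class formulation: index of the distance interval each value falls in."""
    return [[bisect_right(edges, d) for d in row] for row in dist]

# Toy 3-residue distance matrix (angstroms); a regression model would predict these directly.
toy = [
    [0.0, 3.8, 9.5],
    [3.8, 0.0, 6.1],
    [9.5, 6.1, 0.0],
]
contacts = distances_to_contacts(toy)
bins = distances_to_bins(toy)
```

The real-valued matrix carries strictly more information than either derived form, which is one way to see why a regression formulation can be attractive.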


Formal Concept Analysis Applications In Bioinformatics, Sarah Roscoe Nov 2020

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Bioinformatics is an important field that seeks to solve biological problems with the help of computation. One specific field in bioinformatics is that of genomics, the study of genes and their functions. Genomics can provide valuable analysis of how genes interact with their environment. One way to measure this interaction is through gene expression data, which shows whether (and how much) a certain gene activates in a situation. Analyzing this data can be critical for predicting diseases or other biological reactions. One method used for analysis is Formal Concept Analysis (FCA), a computing technique based …
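
As a minimal sketch of what FCA computes, the example below treats genes as objects and the conditions under which they are expressed as attributes, then enumerates every formal concept (a set of genes paired with exactly the conditions they all share). The gene names, conditions, and brute-force enumeration are assumptions for illustration, practical only for tiny contexts, and are not the method used in the thesis.

```python
from itertools import combinations

def common_attributes(objects, context, all_attributes):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set(all_attributes)
    for obj in objects:
        shared &= context[obj]
    return shared

def objects_having(attributes, context):
    """Objects that possess every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}

def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every attribute subset."""
    all_attributes = set().union(*context.values())
    concepts = set()
    for size in range(len(all_attributes) + 1):
        for subset in combinations(sorted(all_attributes), size):
            extent = objects_having(set(subset), context)
            intent = common_attributes(extent, context, all_attributes)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

# Hypothetical binarized expression context: gene -> conditions where it is expressed.
context = {
    "geneA": {"heat", "drought"},
    "geneB": {"heat"},
    "geneC": {"drought", "cold"},
}
concepts = formal_concepts(context)
```

Each resulting pair groups genes that respond to the same set of conditions, which is the kind of structure an expression analysis can then inspect.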


Characterizing The Behavior Of Mutated Proteins With EMCAP: The Energy Minimization Curve Analysis Pipeline, Matthew Lee, Bodi Van Roy, Filip Jagodzinski Oct 2020

WWU Honors College Senior Projects

Studies of protein mutants in wet laboratory experiments are expensive and time consuming. Computational experiments that simulate the motions of protein with amino acid substitutions can complement wet lab experiments for studying the effects of mutations. In this work we present a computational pipeline that performs exhaustive single-point amino acid substitutions in silico. We perform energy minimization as part of molecular dynamics (MD) of our generated mutant proteins, and the wild type, and log the energy potentials for each step of the simulations. We motivate several metrics that rely on the energy minimization curves of the wild type and mutant, …


Machine Learning With Digital Signal Processing For Rapid And Accurate Alignment-Free Genome Analysis: From Methodological Design To A COVID-19 Case Study, Gurjit Singh Randhawa Jun 2020

Electronic Thesis and Dissertation Repository

In the field of bioinformatics, taxonomic classification is the scientific practice of identifying, naming, and grouping of organisms based on their similarities and differences. The problem of taxonomic classification is of immense importance considering that nearly 86% of existing species on Earth and 91% of marine species remain unclassified. Due to the magnitude of the datasets, the need exists for an approach and software tool that is scalable enough to handle large datasets and can be used for rapid sequence comparison and analysis. We propose ML-DSP, a stand-alone alignment-free software tool that uses Machine Learning and Digital Signal Processing to …
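
The abstract is cut off before it describes the method, so the following is only a generic sketch of how digital signal processing can support alignment-free comparison: each sequence is encoded as a numeric signal, transformed with a discrete Fourier transform, and compared through the correlation of the magnitude spectra. The integer encoding, the equal-length requirement, and all function names are assumptions made for illustration and are not taken from ML-DSP itself.

```python
import cmath
import math

# Assumed numeric encoding of nucleotides; other representations are possible.
ENCODING = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}

def magnitude_spectrum(seq):
    """Magnitudes of the discrete Fourier transform of the encoded sequence."""
    signal = [ENCODING[base] for base in seq]
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length, non-constant vectors."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)

def spectral_distance(seq_a, seq_b):
    """Distance in [0, 1]: 0 for perfectly correlated spectra, 1 for anti-correlated."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch assumes sequences of equal length")
    return (1 - pearson(magnitude_spectrum(seq_a), magnitude_spectrum(seq_b))) / 2

same = spectral_distance("ACGTACGT", "ACGTACGT")
different = spectral_distance("ACGTACGT", "AAAACCCC")
```

A matrix of such pairwise distances could then serve as input features for a supervised classifier, with no sequence alignment step involved.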


A Survey Of Feature Extraction And Fusion Of Deep Learning For Detection Of Abnormalities In Video Endoscopy Of Gastrointestinal-Tract, Hussam Ali, Muhammad Sharif, Mussarat Yasmin, Mubashir Husain Rehmani, Farhan Riaz Apr 2020

Publications

A standard screening procedure involves video endoscopy of the Gastrointestinal tract. It is a less invasive method which is practiced for early diagnosis of gastric diseases. Manual inspection of a large number of gastric frames is an exhaustive, time-consuming task, and requires expertise. Conversely, several computer-aided diagnosis systems have been proposed by researchers to cope with the dilemma of manual inspection of the massive volume of frames. This article gives an overview of different available alternatives for automated inspection, detection, and classification of various GI abnormalities. Also, this work elaborates techniques associated with content-based image retrieval and automated systems for …


Color-Based Template Selection For Detection Of Gastric Abnormalities In Video Endoscopy, Hussam Ali, Muhammad Sharif, Mussarat Yasmin, Mubashir Husain Rehmani Feb 2020

Publications

Computer-aided diagnosis of gastric diseases from endoscopy frames is an important task. It benefits both the patient and the gastroenterologist in terms of time, money and, most importantly, health. Colors are the basic visual features of endoscopic images and also provide clues about abnormal regions in endoscopy frames. A variety of color spaces are available for representation of color frames. However, we are not certain which color space is more suitable for representing color features of gastric images. This paper presents a comparison of color features in different color spaces for detection of abnormal areas in chromoendoscopy (CH) frames. In addition, …


Using CUDA To Enhance Data Processing Of Variant Call Format Files For Statistical Genetic Analysis, Heather Mckinnon Jan 2020

All Graduate Projects

Utilizing the power of GPU parallel processing with CUDA can speed up the processing of Variant Call Format (VCF) files and statistical analysis of genomic data. A software package designed toward this purpose would be beneficial to genetic researchers by saving them time which they could spend on other aspects of their research. A data set containing genetics from a study of trichome production in Mimulus guttatus, or yellow monkey flower, was used to develop a package to test the effectiveness of GPU parallel processing versus serial executions. After a serial version of the code was generated and benchmarked, OpenACC …
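
As a rough illustration of the serial, per-variant work that such a package would aim to parallelize, the sketch below parses one VCF data line and computes the non-reference allele frequency across samples. The choice of statistic, the function name, and the example line are assumptions for illustration; the project's actual computations are not described in the visible part of the abstract.

```python
import re

def alt_allele_frequency(vcf_line):
    """Fraction of called alleles that are non-reference in one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Column 9 (index 8) is FORMAT; sample columns start at index 9.
    gt_index = fields[8].split(":").index("GT")
    alt_count = 0
    total = 0
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        for allele in re.split(r"[/|]", genotype):  # unphased "/" or phased "|"
            if allele == ".":  # missing call
                continue
            total += 1
            if allele != "0":  # "0" denotes the reference allele
                alt_count += 1
    return alt_count / total if total else None

# Hypothetical data line with three samples, one of them uncalled.
line = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t1|1:9\t./.:0"
freq = alt_allele_frequency(line)
```

Because each line is processed independently of the others, this kind of loop is a natural candidate for the data-parallel execution the abstract describes.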


Towards Personalized Medicine: Computational Approaches For Drug Repurposing And Cell Type Identification, Azam Peyvandipour Jan 2020

Wayne State University Dissertations

The traditional drug discovery process is extremely slow and costly. More than 90% of drugs fail to pass beyond the early stage of development and toxicity tests, and many of the drugs that go through early phases of the clinical trials fail because of adverse reactions, side effects, or lack of efficiency. In spite of unprecedented investments in research and development (R&D), the number of new FDA-approved drugs remains low, reflecting the limitations of the current R&D model.

In this context, finding new disease indications for existing drugs sidesteps these issues and can therefore increase the available therapeutic choices at …


9th Annual Postdoctoral Science Symposium, University Of Texas MD Anderson Cancer Center Postdoctoral Association Sep 2019

Annual Postdoctoral Science Symposium Abstracts

The mission of the Annual Postdoctoral Science Symposium (APSS) is to provide a platform for talented postdoctoral fellows throughout the Texas Medical Center to present their work to a wider audience. The MD Anderson Postdoctoral Association convened its inaugural Annual Postdoctoral Science Symposium (APSS) on August 4, 2011.

The APSS provides a professional venue for postdoctoral scientists to develop, clarify, and refine their research as a result of formal reviews and critiques of faculty and other postdoctoral scientists. Additionally, attendees discuss current research on a broad range of subjects while promoting academic interactions and enrichment and developing new collaborations.


Model-Based Deep Autoencoders For Characterizing Discrete Data With Application To Genomic Data Analysis, Tian Tian May 2019

Dissertations

Deep learning techniques have achieved tremendous successes in a wide range of real applications in recent years. For dimension reduction, deep neural networks (DNNs) provide a natural choice to parameterize a non-linear transforming function that maps the original high dimensional data to a lower dimensional latent space. An autoencoder is a kind of DNN used to learn efficient feature representations in an unsupervised manner. Deep autoencoders have been widely explored and applied to the analysis of continuous data, but they are understudied for characterizing discrete data. This dissertation focuses on developing model-based deep autoencoders for modeling discrete data. A motivating example of …


Poriferal Vision, Saketh Saxena May 2019

Master's Projects

Sponges provide nourishment as well as a habitat for various aquatic organisms. Anatomically, sponges are made up of soft tissue with a silica based exoskeleton which serves both as support and protection for the underlying tissue. The exoskeleton persists after the tissue decomposes, and microscopic parts of the exoskeleton break away to form spicules. Oceanographic studies have shown that the density of the sponge spicules is a good indicator of the sponge population in an area. This measure can be used to study sponge population dynamics over time. The spicule density is measured by imaging spicules from samples of water …


Simplicity Diffexpress: A Bespoke Cloud-Based Interface For RNA-Seq Differential Expression Modeling And Analysis, Cintia C. Palu, Marcelo Ribeiro-Alves, Yanxin Wu, Brendan Lawlor, Pavel V. Baranov, Brian Kelly, Paul Walsh May 2019

Department of Computer Science Publications

One of the key challenges for transcriptomics-based research is not only the processing of large data but also modeling the complexity of features that are sources of variation across samples, which is required for an accurate statistical analysis. Therefore, our goal is to foster access for wet lab researchers to bioinformatics tools, in order to enhance their ability to explore biological aspects and validate hypotheses with robust analysis. In this context, user-friendly interfaces can enable researchers to apply computational biology methods without requiring bioinformatics expertise. Such bespoke platforms can improve the quality of the findings by allowing the researcher to …


Designing Computational Biology Workflows With Perl - Part 1, Esma Yildirim May 2019

Open Educational Resources

This material introduces Linux file system structures and demonstrates how to use commands to communicate with the operating system through a Terminal program. Basic program structures and the system() function of Perl are discussed. A brief introduction to gene-sequencing terminology and file formats is given.


Designing Computational Biology Workflows With Perl - Part 1, Esma Yildirim May 2019

Open Educational Resources

This material introduces the AWS console interface, describes how to create an instance on AWS with the VMI provided and connect to that machine instance using the SSH protocol. Once connected, it requires the students to write a script to enter the data folder, which includes gene-sequencing input files, and print the first five lines of each file remotely. The same exercise can be applied if the VMI is installed on a local machine using virtualization software (e.g. Oracle VirtualBox). In this case, the Terminal program of the VMI can be used to do the exercise.
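
The exercise described above can be sketched as follows. The course material uses Perl; this version is written in Python purely for illustration, and the function name and the `data` folder name in the usage note are hypothetical.

```python
from pathlib import Path

def head_of_each_file(folder, n=5):
    """Return the first `n` lines of every regular file in `folder`, keyed by file name."""
    heads = {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        # errors="replace" keeps the sketch from failing on binary inputs.
        with path.open(errors="replace") as handle:
            # zip stops after n items, so at most n lines are read from each file.
            heads[path.name] = [line.rstrip("\n") for _, line in zip(range(n), handle)]
    return heads
```

Calling `head_of_each_file("data")` on the instance and printing each entry would reproduce the output the exercise asks for.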


Designing Computational Biology Workflows With Perl - Part 2, Esma Yildirim May 2019

Open Educational Resources

This material introduces the AWS console interface, describes how to create an instance on AWS with the VMI provided and connect to that machine instance using the SSH protocol. Once connected, it requires the students to write a script to automate the tasks to create VCF files from two different sample genomes belonging to E.coli microorganisms by using the FASTA and FASTQ files in the input folder of the virtual machine. The same exercise can be applied if the VMI is installed on a local machine using virtualization software (e.g. Oracle VirtualBox). In this case, the Terminal program of the …


Designing Computational Biology Workflows With Perl - Part 2, Esma Yildirim May 2019

Open Educational Resources

This material briefly reintroduces the DNA double helix structure, explains SNP and INDEL mutations in genes and describes FASTA, FASTQ, BAM and VCF file formats. It also explains the index creation, alignment, sorting, marking duplicates and variant calling steps of a simple preprocessing workflow and how to write a Perl script to automate the execution of these steps on a Virtual Machine Image.
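
To illustrate one of the file formats mentioned above, here is a minimal sketch of reading FASTQ records, written in Python rather than the course's Perl. It assumes the simple four-line record layout (header, sequence, separator, quality) and Phred+33 quality encoding; the function name and sample read are hypothetical.

```python
def parse_fastq(lines):
    """Parse four-line FASTQ records into (read_id, sequence, quality_scores) tuples."""
    records = []
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        sequence = next(it).strip()
        next(it)  # the "+" separator line carries no data needed here
        quality = next(it).strip()
        # Assumed Phred+33 encoding: score = ASCII code of the character minus 33.
        scores = [ord(char) - 33 for char in quality]
        records.append((header[1:], sequence, scores))
    return records

reads = parse_fastq(["@read1", "ACGT", "+", "II5!"])
```

Aligners and variant callers consume these per-base quality scores in the later steps of the workflow.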


Designing Computational Biology Workflows With Perl - Part 1 & 2, Esma Yildirim May 2019

Open Educational Resources

This manual guides the instructor in combining the partial files of the virtual machine image to construct the sequencer.ova file. It is accompanied by the partial files of the virtual machine image.