Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

2021

Bioinformatics

Articles 1 - 16 of 16

Full-Text Articles in Physical Sciences and Mathematics

The Bioinformatics Virtual Coordination Network: An Open-Source And Interactive Learning Environment, Benjamin J. Tully, Joy Buongiorno, Ashley B. Cohen, Jacob A. Cram, Arkadiy I. Garber, Sarah K. Hu, Arianna I. Krinos, Philip T. Leftwich, Alexis J. Marshall, Ella T. Sieradzki, Daan R. Speth, Elizabeth A. Suter, Christopher B. Trivedi, Luis E. Valentin-Alvarado, Jake L. Weissman, BVCN Instructor Consortium Oct 2021

Faculty Works: Biology, Chemistry, and Environmental Studies

Lockdowns and “stay-at-home” orders, starting in March 2020, shuttered bench- and field-dependent research across the world as a consequence of the global COVID-19 pandemic. The pandemic continues to have an impact on research progress and career development, especially for graduate students and early career researchers, as strict social-distancing limitations stifle ongoing research and impede in-person educational programs. The goal of the Bioinformatics Virtual Coordination Network (BVCN) was to reduce some of these impacts by helping research biologists learn new skills and initiate computational projects as alternative ways to carry out their research. The BVCN was founded in April …


Enhancing Microbiome Host Disease Prediction With Variational Autoencoders, Celeste Manughian-Peter Aug 2021

Computational and Data Sciences (MS) Theses

Advancements in genetic sequencing methods for microbiomes in recent decades have permitted the collection of taxonomic and functional profiles of microbial communities, accelerating the discovery of the functional aspects of the microbiome and generating an increased interest among clinicians in applying these techniques with patients. This advancement has coincided with software and hardware improvements in the field of machine learning and deep learning. Combined, these advancements implicate further potential for progress in disease diagnosis and treatment in humans. The ability to classify a human microbiome profile into a disease category, and additionally identify the differentiating factors within the profile between …


Visualizing Amino Acid Substitutions In A Physicochemical Vector Space, Louis R. Nemzer Jul 2021

Chemistry and Physics Faculty Articles

A three-dimensional representation of the twenty proteinogenic amino acids in a physicochemical space is presented. Vectors corresponding to amino acid substitutions are classified based on whether they are accessible via a single-nucleotide mutation. It is shown that the standard genetic code establishes a “choice architecture” that permits nearly independent tuning of the properties related to size and those related to hydrophobicity. This work sheds light on the metarules of evolvability that may have shaped the standard genetic code to increase the probability that adaptive point mutations will be generated. An illustration of the usefulness of visualizing amino acid substitutions in …
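The classification step mentioned in this abstract, deciding whether one amino acid can be reached from another by a single-nucleotide mutation, can be illustrated with a short sketch. This is not the author's code; it is a minimal example that assumes the standard genetic code, and the names CODON_TABLE and single_nucleotide_accessible are hypothetical, chosen only for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def single_nucleotide_accessible(aa_from, aa_to):
    """Return True if some codon for aa_from becomes a codon for aa_to
    after changing exactly one nucleotide (hypothetical helper)."""
    for codon, aa in CODON_TABLE.items():
        if aa != aa_from:
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a mutation
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == aa_to:
                    return True
    return False


if __name__ == "__main__":
    # Methionine (ATG) reaches isoleucine (ATA) with one change.
    print(single_nucleotide_accessible("M", "I"))
    # Tryptophan (TGG) needs two changes to reach phenylalanine (TTT or TTC).
    print(single_nucleotide_accessible("W", "F"))
```

A substitution vector in the physicochemical space would then be labeled by the result of this check for its start and end residues. The choice of coordinates for each amino acid is left out here because the listing does not state them.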


Spaceflight And The Differential Gene Expression Of Human Stem Cell-Derived Cardiomyocytes, Eugenie Zhu May 2021

Master's Projects

The National Aeronautics and Space Administration (NASA) has performed many experiments on the International Space Station (ISS) to further understand how conditions in space can affect life on Earth. This project analyzed GLDS-258, a gene set from NASA’s GeneLab repository which examines the impact of microgravity on human induced pluripotent stem-cell-derived cardiomyocytes (hiPSC-CMs). While many datasets have been run through NASA’s RNA-Seq Consensus Pipeline (RCP) to study differential gene expression in space, a Homo sapiens dataset has yet to be analyzed using the RCP. The aim of this project was to run the first Homo sapiens dataset, GLDS-258, through the …


Gene Selection And Classification In High-Throughput Biological Data With Integrated Machine Learning Algorithms And Bioinformatics Approaches, Abhijeet R Patil May 2021

Open Access Theses & Dissertations

With the rise of high-throughput technologies in biomedical research, large volumes of expression profiling, methylation profiling, and RNA-sequencing data are being generated. These high-dimensional data have a large number of features with a small number of samples, a characteristic called the "curse of dimensionality." The selection of optimal features, which largely affects the performance of classification algorithms in machine learning models, has led to challenging problems in bioinformatics analyses of such high-dimensional datasets. In this work, I focus on the design of two-stage frameworks of feature selection and classification and their applications in multiple sets of colorectal cancer data. The first …


Simulation Of The Interaction Between Striated Muscle UNC-45 And Transcription Factor GATA-4, Drake Alexander Duncan May 2021

Electronic Theses and Dissertations

Striated Muscle UNC-45, also known as UNC-45b, is an important protein that acts as a chaperone for myosin in cardiac and skeletal muscles, binding to myosin at its C-terminal UCS domain and regulating its assembly into thick filaments and sarcomeric structures. The UCS domain contains a large loop that is believed to be the first point of interaction between myosin and UNC-45b. GATA-4 is an essential transcription factor that facilitates transcription of several genes in cardiac development, particularly alpha-heavy chain myosin in heart tissue. Recently, studies have shown that GATA-4 interacts with UNC-45b and that GATA-4 binds …


Using Deep Learning To Analyze Materials In Medical Images, Carson Molder May 2021

Computer Science and Computer Engineering Undergraduate Honors Theses

Modern deep learning architectures have become increasingly popular in medicine, especially for analyzing medical images. In some medical applications, deep learning image analysis models have been more accurate at predicting medical conditions than experts. Deep learning has also been effective for material analysis on photographs. We aim to leverage deep learning to perform material analysis on medical images. Because material datasets for medicine are scarce, we first introduce a texture dataset generation algorithm that automatically samples desired textures from annotated or unannotated medical images. Second, we use a novel Siamese neural network called D-CNN to predict patch similarity and build …


TruncTrimmer: A First Step Towards Automating Standard Bioinformatic Analysis, Z. Gunner Lawless, Dana Dittoe, Dale R. Thompson, Steven C. Ricke May 2021

Computer Science and Computer Engineering Undergraduate Honors Theses

Bioinformatic analysis is a time-consuming process for labs performing research on various microbiomes. Researchers use tools like Qiime2 to help standardize the bioinformatic analysis methods, but even large, extensible platforms like Qiime2 have drawbacks due to the attention required by researchers. In this project, we propose to automate additional standard lab bioinformatic procedures by eliminating the existing manual process of determining the trim and truncate locations for paired end 2 sequences. We introduce a new Qiime2 plugin called TruncTrimmer to automate the process that usually requires the researcher to make a decision on where to trim and truncate manually after …


In Silico Identification Of Toxins And Their Effect On Host Pathways: Feature Extraction, Classification And Pathway Prediction, Rishika Sen Dr. Jan 2021

Doctoral Theses

Identification of toxins, which are either proteins or small molecules, from pathogens is of paramount importance due to their crucial role as first-line invaders infiltrating a host, often leading to infection of the host. These toxins can affect specific proteins, like enzymes that catalyze metabolic pathways, affect metabolites that form the basis of metabolic reactions, and prevent the progression of those pathways, or more generally they may affect the regular functioning of other proteins in signaling pathways in the host. In this regard, the thesis addresses the problem of identification of toxins, and the effect of perturbations by toxins on …


Machine Learning And Bioinformatic Insights Into Key Enzymes For A Bio-Based Circular Economy, Japheth E. Gado Jan 2021

Theses and Dissertations--Chemical and Materials Engineering

The world is presently faced with a sustainability crisis; it is becoming increasingly difficult to meet the energy and material needs of a growing global population without depleting and polluting our planet. Greenhouse gases released from the continuous combustion of fossil fuels engender accelerated climate change, and plastic waste accumulates in the environment. There is a need for a circular economy, where energy and materials are renewably derived from waste items, rather than by consuming limited resources. Deconstruction of the recalcitrant linkages in natural and synthetic polymers is crucial for a circular economy, as deconstructed monomers can be used to manufacture …


The Role Of Software Engineering In Bioinformatics, Brendan Sean Lawlor Jan 2021

Theses

This thesis proposes that by applying state-of-the-art software engineering tools, techniques and frameworks to currently recognised challenges in bioinformatics, improved outcomes can be attained in that field. It begins by decomposing software engineering into two categories, namely process and architecture, and choosing two key challenges in the practice of bioinformatics: reproducibility and scalability. The body of the thesis is an exploration of the intersection between these two software engineering categories and these two bioinformatics challenges. The question is asked: Can best practices in professional software engineering be applied to address key issues in the bioinformatics domain, creating positive outcomes? And …


An Automated Method To Enrich And Expand Consumer Health Vocabularies Using GloVe Word Embeddings, Mohammed Ibrahim Jan 2021

Graduate Theses and Dissertations

Clear language makes communication easier between any two parties. However, a layman may have difficulty communicating with a professional due to not understanding the specialized terms common to the domain. In healthcare, it is rare to find a layman knowledgeable in medical jargon, which can lead to poor understanding of their condition and/or treatment. To bridge this gap, several professional vocabularies and ontologies have been created to map laymen medical terms to professional medical terms and vice versa. Many of the presented vocabularies are built manually or semi-automatically requiring large investments of time and human effort and consequently the slow …


Ensemble Protein Inference Evaluation, Kyle Lee Lucke Jan 2021

Graduate Student Theses, Dissertations, & Professional Papers

Protein inference is becoming an increasingly important tool that aids in the characterization of complex proteomes and analysis of complex protein samples. In bottom-up shotgun proteomics experiments, the metrics for evaluation (like AUC and calibration error) are based on an often imperfect target-decoy database. These metrics make the inherent assumption that all of the proteins in the target set are present in the sample being analyzed. In general, this is not the case; the target set is typically a mix of present and absent proteins. To objectively evaluate inference methods, protein standard datasets are used. These datasets are special in …


Analysis Of Subtelomeric REXTAL Assemblies Using QUAST, Tunazzina Islam, Desh Ranjan, Mohammad Zubair, Eleanor Young, Ming Xiao, Harold Riethman Jan 2021

Computer Science Faculty Publications

Genomic regions of high segmental duplication content and/or structural variation have led to gaps and misassemblies in the human reference sequence, and are refractory to assembly from whole-genome short-read datasets. Human subtelomere regions are highly enriched in both segmental duplication content and structural variations, and as a consequence are both impossible to assemble accurately and highly variable from individual to individual. Recently, we developed a pipeline for improved region-specific assembly called Regional Extension of Assemblies Using Linked-Reads (REXTAL). In this study, we evaluate REXTAL and genome-wide assembly (Supernova) approaches on 10X Genomics linked-reads data sets partitioned and barcoded using the …


Soda: An Open-Source Library For Visualizing Biological Sequence Annotation, Jack W. Roddy, Travis J. Wheeler Jan 2021

Graduate Student Theses, Dissertations, & Professional Papers

Genome annotation is the process of identifying and labeling known genetic sequences or features within a genome. Across the various subfields within modern molecular biology, there is a common need for the visualization of such annotations. Genomic data is often visualized on web browser platforms, providing users with easy access to visualization tools without the need for installing any software or, in many cases, underlying datasets. While there exists a broad range of web-based visualization tools, there is, to my knowledge, no lightweight, modern library tailored towards the visualization of genomic data. Instead, developers charged with the task of producing …


A Review And Evaluation Of Techniques For Improved Feature Detection In Mass Spectrometry Data, Annika R. Tostengard, Rob Smith Jan 2021

Graduate Student Theses, Dissertations, & Professional Papers

Mass spectrometry (MS) is used in analysis of chemical samples to identify the molecules present and their quantities. This analytical technique has applications in many fields, from pharmacology to space exploration. Its impacts on medicine are particularly significant, since MS aids in the identification of molecules associated with disease; for instance, in proteomics, MS allows researchers to identify proteins that are associated with autoimmune disorders, cancers, and other conditions. Since the applications are so wide-ranging and the tool is ubiquitous across so many fields, it is critical that the analytical methods used to collect data are sound.

Data analysis in …