Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Bioinformatics

2019

Articles 1 - 29 of 29

Full-Text Articles in Computer Sciences

Cancer Risk Prediction With Whole Exome Sequencing And Machine Learning, Abdulrhman Fahad M Aljouie Dec 2019

Dissertations

Accurate cancer risk and survival time prediction are important problems in personalized medicine, where disease diagnosis and prognosis are tuned to individuals based on their genetic material. Cancer risk prediction provides an informed decision about making regular screening that helps to detect disease at the early stage and therefore increases the probability of successful treatments. Cancer risk prediction is a challenging problem. Lifestyle, environment, family history, and genetic predisposition are some factors that influence the disease onset. Cancer risk prediction based on predisposing genetic variants has been studied extensively. Most studies have examined the predictive ability of variants in known …


Game-Assisted Rehabilitation For Post-Stroke Survivors, Hee-Tae Jung Oct 2019

Doctoral Dissertations

Stroke is a leading cause of permanent impairments among its survivors. Although patients need intensive, longitudinal rehabilitation to regain their pre-stroke function, they show poor engagement with and adherence to rehabilitation therapies, which hampers their recovery. As a means to enhance stroke survivors' motivation, engagement, and adherence to intensive and longitudinal rehabilitation, the use of games in stroke rehabilitation has received attention from research and clinical communities. In order to realize this, it is important to take a holistic, end-to-end research approach that encompasses 1) the development of game technologies that are not only entertaining but also …


Enhancing Timeliness Of Drug Overdose Mortality Surveillance: A Machine Learning Approach, Patrick J. Ward, Peter J. Rock, Svetla Slavova, April M. Young, Terry L. Bunn, Ramakanth Kavuluru Oct 2019

Kentucky Injury Prevention and Research Center Faculty Publications

BACKGROUND: Timely data is key to effective public health responses to epidemics. Drug overdose deaths are identified in surveillance systems through ICD-10 codes present on death certificates. ICD-10 coding takes time, but free-text information is available on death certificates prior to ICD-10 coding. The objective of this study was to develop a machine learning method to classify free-text death certificates as drug overdoses to provide faster drug overdose mortality surveillance.

METHODS: Using 2017–2018 Kentucky death certificate data, free-text fields were tokenized and features were created from these tokens using natural language processing (NLP). Word, bigram, and trigram features were created …
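The feature-creation step described in the METHODS paragraph can be illustrated with a minimal sketch. It assumes a simple lowercase, alphanumeric tokenizer and raw counts; the study's actual tokenization and weighting are not given in the excerpt, and the function names `tokenize` and `ngram_features` and the sample string are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_features(text, max_n=3):
    """Count word, bigram, and trigram features in one free-text field."""
    tokens = tokenize(text)
    features = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the token list.
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


# Made-up example text, not taken from any death certificate.
example = ngram_features("Acute fentanyl toxicity")
```

The resulting counts could then be passed to any classifier; which classifier the authors used is not stated in the visible part of the abstract.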


Texture-Based Deep Neural Network For Histopathology Cancer Whole Slide Image (WSI) Classification, Nelson Zange Tsaku Aug 2019

Master of Science in Computer Science Theses

Automatic histopathological Whole Slide Image (WSI) analysis for cancer classification has been highlighted along with the advancements in microscopic imaging techniques. However, manual examination and diagnosis with WSIs is time-consuming and tiresome. Recently, deep convolutional neural networks have succeeded in histopathological image analysis. In this paper, we propose a novel cancer texture-based deep neural network (CAT-Net) that learns scalable texture features from histopathological WSIs. The innovation of CAT-Net is twofold: (1) capturing invariant spatial patterns by dilated convolutional layers and (2) reducing model complexity while improving performance. Moreover, CAT-Net can provide discriminative texture patterns formed on cancerous regions of histopathological …


Effective Statistical Energy Function Based Protein Un/Structure Prediction, Avdesh Mishra Aug 2019

University of New Orleans Theses and Dissertations

Proteins are an important component of living organisms, composed of one or more polypeptide chains, each containing hundreds or even thousands of amino acids of 20 standard types. The structure of a protein from the sequence determines crucial functions of proteins such as initiating metabolic reactions, DNA replication, cell signaling, and transporting molecules. In the past, proteins were considered to always have a well-defined stable shape (structured proteins); however, it has recently been shown that there exist intrinsically disordered proteins (IDPs), which lack a fixed or ordered 3D structure, have dynamic characteristics, and therefore exist in multiple states. Based on …


Designing And Sample Size Calculation In Presence Of Heterogeneity In Biological Studies Involving High-Throughput Data., Sudhir Srivastava Aug 2019

Electronic Theses and Dissertations

The designing and determination of sample size are important for conducting high-throughput biological experiments such as proteomics experiments and RNA-Seq expression studies, thus leading to better understanding of complex mechanisms underlying various biological processes. The variations in the biological data or technical approaches to data collection lead to heterogeneity for the samples under study. We critically worked on the issues of technical and biological heterogeneity. The quantitative measurements based on liquid chromatography (LC) coupled with mass spectrometry (MS) often suffer from the problem of missing values (MVs) and data heterogeneity. We considered a proteomics data set generated from human kidney …


High Performance Computing Techniques To Better Understand Protein Conformational Space, Arpita Joshi Aug 2019

Graduate Doctoral Dissertations

This thesis presents an amalgamation of high performance computing techniques to get better insight into protein molecular dynamics. Key aspects of protein function and dynamics can be learned from their conformational space. Datasets that represent the complex nuances of a protein molecule are high dimensional. Efficient dimensionality reduction becomes indispensable for the analysis of such large datasets. Dimensionality reduction forms a substantial portion of this work and its application has been explored for other datasets as well. It begins with the parallelization of a known non-linear feature reduction algorithm called Isomap. The code for the algorithm was re-written in C …


IAMHAPPY: Towards An IoT Knowledge-Based Cross-Domain Well-Being Recommendation System For Everyday Happiness, Amelia Gyrard, Amit Sheth Jul 2019

Kno.e.sis Publications

Nowadays, healthy lifestyle, fitness, and diet habits have become central applications in our daily life. Positive psychology such as well-being and happiness is the ultimate dream of everyday people’s feelings (even without being aware of it). Wearable devices are being increasingly employed to support well-being and fitness. Those devices produce physiological signals that are analyzed by machines to understand emotions and physical state. The Internet of Things (IoT) technology connects (wearable) devices to the Internet to easily access and process data, even using Web technologies (aka Web of Things).

We design IAMHAPPY, an innovative IoT-based well-being recommendation system to encourage every …


High-Performance Computing Frameworks For Large-Scale Genome Assembly, Sayan Goswami Jun 2019

LSU Doctoral Dissertations

Genome sequencing technology has witnessed tremendous progress in terms of throughput and cost per base pair, resulting in an explosion in the size of data. Typical de Bruijn graph-based assembly tools demand a lot of processing power and memory and cannot assemble big datasets unless running on a scaled-up server with terabytes of RAM or a scaled-out cluster with several dozen nodes. In the first part of this work, we present a distributed next-generation sequence (NGS) assembler called Lazer, that achieves both scalability and memory efficiency by using partitioned de Bruijn graphs. By enhancing the memory-to-disk swapping and reducing the …
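To make the idea of a partitioned de Bruijn graph concrete, here is a minimal sketch. It is not Lazer's implementation: the partitioning rule (a CRC32 hash of each node label modulo the partition count) and the function names `kmers` and `partitioned_de_bruijn` are assumptions chosen only for illustration.

```python
import zlib
from collections import defaultdict


def kmers(read, k):
    """Yield every substring of length k in a read."""
    return (read[i:i + k] for i in range(len(read) - k + 1))


def partitioned_de_bruijn(reads, k, num_partitions):
    """Build de Bruijn edges ((k-1)-mer prefix -> (k-1)-mer suffix) split into partitions.

    Each node is assigned to exactly one partition by hashing its label, so
    every partition could in principle be held or processed separately.
    The hashing rule is an assumption, not the scheme used by any real assembler.
    """
    partitions = [defaultdict(set) for _ in range(num_partitions)]
    for read in reads:
        for kmer in kmers(read, k):
            prefix, suffix = kmer[:-1], kmer[1:]
            # zlib.crc32 is deterministic across runs, unlike Python's built-in hash().
            part = zlib.crc32(prefix.encode("ascii")) % num_partitions
            partitions[part][prefix].add(suffix)
    return partitions


# Toy usage: one short read, 3-mers, four partitions.
parts = partitioned_de_bruijn(["ACGT"], k=3, num_partitions=4)
```

Because a node's outgoing edges all live in one partition, a traversal only needs to load the partition that owns the current node.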


Model-Based Deep Autoencoders For Characterizing Discrete Data With Application To Genomic Data Analysis, Tian Tian May 2019

Dissertations

Deep learning techniques have achieved tremendous successes in a wide range of real applications in recent years. For dimension reduction, deep neural networks (DNNs) provide a natural choice to parameterize a non-linear transforming function that maps the original high dimensional data to a lower dimensional latent space. An autoencoder is a kind of DNN used to learn efficient feature representation in an unsupervised manner. Deep autoencoders have been widely explored and applied to the analysis of continuous data, but they are understudied for characterizing discrete data. This dissertation focuses on developing model-based deep autoencoders for modeling discrete data. A motivating example of …


Simplicity Diffexpress: A Bespoke Cloud-Based Interface For RNA-Seq Differential Expression Modeling And Analysis, Cintia C. Palu, Marcelo Ribeiro-Alves, Yanxin Wu, Brendan Lawlor, Pavel V. Baranov, Brian Kelly, Paul Walsh May 2019

Department of Computer Science Publications

One of the key challenges for transcriptomics-based research is not only the processing of large data but also modeling the complexity of features that are sources of variation across samples, which is required for an accurate statistical analysis. Therefore, our goal is to foster access for wet lab researchers to bioinformatics tools, in order to enhance their ability to explore biological aspects and validate hypotheses with robust analysis. In this context, user-friendly interfaces can enable researchers to apply computational biology methods without requiring bioinformatics expertise. Such bespoke platforms can improve the quality of the findings by allowing the researcher to …


Designing Computational Biology Workflows With Perl - Part 1, Esma Yildirim May 2019

Open Educational Resources

This material introduces Linux File System structures and demonstrates how to use commands to communicate with the operating system through a Terminal program. Basic program structures and the system() function of Perl are discussed. A brief introduction to gene-sequencing terminology and file formats is given.


Designing Computational Biology Workflows With Perl - Part 1, Esma Yildirim May 2019

Open Educational Resources

This material introduces the AWS console interface and describes how to create an instance on AWS with the VMI provided and connect to that machine instance using the SSH protocol. Once connected, it requires the students to write a script to enter the data folder, which includes gene-sequencing input files, and print the first five lines of each file remotely. The same exercise can be applied if the VMI is installed on a local machine using virtualization software (e.g. Oracle VirtualBox). In this case, the Terminal program of the VMI can be used to do the exercise.
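The exercise of printing the first five lines of every file in a data folder can be sketched as follows. The course material uses Perl; this sketch is in Python purely for illustration, and the folder name `data` and the function names are assumptions rather than names taken from the material.

```python
from itertools import islice
from pathlib import Path


def head_of_each_file(folder, n=5):
    """Return a mapping from file name to that file's first n lines."""
    heads = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            # errors="replace" keeps the sketch from failing on odd bytes.
            with path.open("r", errors="replace") as handle:
                heads[path.name] = [line.rstrip("\n") for line in islice(handle, n)]
    return heads


def print_heads(folder, n=5):
    """Print the first n lines of each file under a small header line."""
    for name, lines in head_of_each_file(folder, n).items():
        print(f"== {name} ==")
        for line in lines:
            print(line)
```

After connecting to the instance over SSH, calling `print_heads("data")` from the directory that contains the (assumed) `data` folder would list each input file's opening lines.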


Designing Computational Biology Workflows With Perl - Part 2, Esma Yildirim May 2019

Open Educational Resources

This material introduces the AWS console interface, describes how to create an instance on AWS with the VMI provided and connect to that machine instance using the SSH protocol. Once connected, it requires the students to write a script to automate the tasks to create VCF files from two different sample genomes belonging to E. coli microorganisms by using the FASTA and FASTQ files in the input folder of the virtual machine. The same exercise can be applied if the VMI is installed on a local machine using virtualization software (e.g. Oracle VirtualBox). In this case, the Terminal program of the …


Designing Computational Biology Workflows With Perl - Part 2, Esma Yildirim May 2019

Open Educational Resources

This material briefly reintroduces the DNA double helix structure, explains SNP and INDEL mutations in genes and describes FASTA, FASTQ, BAM and VCF file formats. It also explains the index creation, alignment, sorting, marking duplicates and variant calling steps of a simple preprocessing workflow and how to write a Perl script to automate the execution of these steps on a Virtual Machine Image.


GOGO: An Improved Algorithm To Measure The Semantic Similarity Between Gene Ontology Terms, Chenguang Zhao May 2019

Master's Theses

Measuring the semantic similarity between Gene Ontology (GO) terms is an essential step in functional bioinformatics research. We implemented a software tool named GOGO for calculating the semantic similarity between GO terms. GOGO has the advantages of both information-content-based and hybrid methods, such as Resnik’s and Wang’s methods. Moreover, GOGO is relatively fast and does not need to calculate information content (IC) from a large gene annotation corpus but still has the advantage of using IC. This is achieved by considering the number of child nodes in the GO directed acyclic graphs when calculating the semantic contribution of an ancestor node …


Designing Computational Biology Workflows With Perl - Part 1 & 2, Esma Yildirim May 2019

Open Educational Resources

This manual guides the instructor in combining the partial files of the virtual machine image to construct the sequencer.ova file. It is accompanied by the partial files of the virtual machine image.


A Machine Learning Technology For Rapid Detection Of Carbon Nanotubes/DNA Hybridization In Biosensor Healthcare Applications, Steven K. Ang May 2019

Master's Theses

In molecular biology, the term “DNA hybridization” generally refers to the process of forming a double stranded nucleic acid from joining two complementary strands of DNA. The degree of genetic similarity of the DNA resulting from hybridization can be detected either by using the chemical characteristics of DNA samples or by utilizing reliable biosensors which transform the chemical characteristics into a source of electrical measurements. In past research about such sensors, known as DNA Hybridization Detection Systems, the thermal and electrical characteristics of carbon nanotubes are utilized to detect whether hybridization takes place or not. However, human interpretation of …


Highly Accurate Fragment Library For Protein Fold Recognition, Wessam Elhefnawy Apr 2019

Computer Science Theses & Dissertations

Proteins play a crucial role in living organisms as they perform many vital tasks in every living cell. Knowledge of protein folding has a deep impact on understanding the heterogeneity and molecular functions of proteins. Such information leads to crucial advances in drug design and disease understanding. Fold recognition is a key step in the protein structure discovery process, especially when traditional computational methods fail to yield convincing structural homologies. In this work, we present a new protein fold recognition approach using machine learning and data mining methodologies.

First, we identify a protein structural fragment library (Frag-K) composed of a …


Computational Analysis Of Large-Scale Trends And Dynamics In Eukaryotic Protein Family Evolution, Joseph Boehm Ahrens Mar 2019

FIU Electronic Theses and Dissertations

The myriad protein-coding genes found in present-day eukaryotes arose from a combination of speciation and gene duplication events, spanning more than one billion years of evolution. Notably, as these proteins evolved, the individual residues at each site in their amino acid sequences were replaced at markedly different rates. The relationship between protein structure, protein function, and site-specific rates of amino acid replacement is a topic of ongoing research. Additionally, there is much interest in the different evolutionary constraints imposed on sequences related by speciation (orthologs) versus sequences related by gene duplication (paralogs). A principal aim of this dissertation is to …


Data Analytics Pipeline For RNA Structure Analysis Via SHAPE, Quinn Nelson Mar 2019

UNO Student Research and Creative Activity Fair

Coxsackievirus B3 (CVB3) is a cardiovirulent enterovirus from the family Picornaviridae. The RNA genome houses an internal ribosome entry site (IRES) in the 5’ untranslated region (5’UTR) that enables cap-independent translation. Ample evidence suggests that the structure of the 5’UTR is a critical element for virulence. We probe RNA structure in solution using base-specific modifying agents such as dimethyl sulfate as well as backbone targeting agents such as N-methylisatoic anhydride used in Selective 2’-Hydroxyl Acylation Analyzed by Primer Extension (SHAPE). We have developed a pipeline that merges and evaluates base-specific and SHAPE data together with statistical analyses that provide confidence …


Question Answering For Suicide Risk Assessment Using Reddit, Amanuel Alambo, Usha Lokala, Ugur Kursuncu, Krishnaprasad Thirunarayan, Amelia Gyrard, Randon S. Welton, Jyotishman Pathak, Amit P. Sheth Feb 2019

Kno.e.sis Publications

Mental Health America designed ten questionnaires that are used to determine the risk of mental disorders. They are also commonly used by Mental Health Professionals (MHPs) to assess suicidality. Specifically, the Columbia Suicide Severity Rating Scale (C-SSRS), a widely used suicide assessment questionnaire, helps MHPs determine the severity of suicide risk and offer an appropriate treatment. A major challenge in suicide treatment is the social stigma that makes the patient reluctant to discuss his/her conditions with an MHP, which leads to inaccurate assessment and treatment of patients. On the other hand, the same patient is comfortable freely discussing his/her mental …


Gene Ontology-Guided Force-Directed Visualization Of Protein Interaction Networks, James Lowell King Jan 2019

CCE Theses and Dissertations

Protein interaction data is being generated at unprecedented rates thanks to advancements made in high throughput techniques such as mass spectrometry and DNA microarrays. Biomedical researchers, operating under budgetary constraints, have found it difficult to scale their efforts to keep up with the ever-increasing amount of available data. They often lack the resources and manpower required to analyze the data using existing methodologies. These research deficiencies impede our ability to understand diseases, delay the advancement of clinical therapeutics, and ultimately cost lives.

One of the most commonly used techniques to analyze protein interaction data is the construction and visualization of …


Citationally Enhanced Semantic Literature Based Discovery, John David Fleig Jan 2019

CCE Theses and Dissertations

We are living in the age of information. The ever-increasing flow of data and publications poses a monumental bottleneck to scientific progress because, despite its amazing abilities, the human mind is woefully inadequate at processing such a vast quantity of multidimensional information. The small bits of flotsam and jetsam that we leverage belie the amount of useful information beneath the surface. It is imperative that automated tools exist to better search, retrieve, and summarize this content. Combinations of document indexing and search engines can quickly find you a document whose content best matches your query - if …


Empathi: An Ontology For Emergency Managing And Planning About Hazard Crisis, Manas Gaur, Kaeedeh Shekarpour, Amelia Gyrard, Amit P. Sheth Jan 2019

Kno.e.sis Publications

In the domain of emergency management during hazard crises, having sufficient situational awareness information is critical. It requires capturing and integrating information from sources such as satellite images, local sensors and social media content generated by local people.
A major obstacle to capturing, representing and integrating such heterogeneous and diverse information is the lack of an ontology that properly conceptualizes this domain and aggregates and unifies datasets. Thus, in this paper, we introduce the empathi ontology, which conceptualizes the core concepts describing the domain of emergency managing and planning of hazard crises.
Although empathi has a coarse-grained view, it considers the necessary …


Adaptive Knowledge Networks: A Time Capsule, Swati Padhee, Anurag Illendula, Amit Sheth, Krishnaprasad Thirunarayan, Valerie L. Shalin Jan 2019

Kno.e.sis Publications

❖ Real-world events are dynamic in nature: periodic events (e.g., US Presidential Election) and non-periodic events (e.g., Cyclone Idai).

❖ Need for real-time predictive analysis, trend analysis, spatio-temporal decision making, public opinion analysis for events.

❖ The current state of the art curates dynamic knowledge graphs from structured text.

❖ We propose creating an Adaptive Knowledge Network from incoming real-time multimodal spatio-temporally evolving data.


Relation Prediction Over Biomedical Knowledge Bases For Drug Repositioning, Mehmet Bakal Jan 2019

Theses and Dissertations--Computer Science

Identifying new potential treatment options for medical conditions that cause human disease burden is a central task of biomedical research. Since all candidate drugs cannot be tested with animal and clinical trials, in vitro approaches are first attempted to identify promising candidates. Likewise, identifying other essential relations (e.g., causation, prevention) between biomedical entities is also critical to understand biomedical processes. Hence, it is crucial to develop automated relation prediction systems that can yield plausible biomedical relations to expedite the discovery process. In this dissertation, we demonstrate three approaches to predict treatment relations between biomedical entities for the drug repositioning task …


Automatic Identification Of Individual Drugs In Death Certificates, Soon Jye Kho, Amit Sheth, Olivier Bodenreider Jan 2019

Kno.e.sis Publications

Background:

Establishing trends of drug overdoses requires the identification of individual drugs in death certificates, not supported by coding with the International Classification of Diseases. However, identifying drug mentions from the literal portion of death certificates remains challenging due to the variability of drug names.

Objectives:

To automatically identify individual drugs in death certificates.

Methods:

We use RxNorm to collect variants for drug names (generic names, synonyms, brand names) and we algorithmically generate common misspellings. We use this automatically compiled list to identify drug mentions from 703,106 death certificates and compare the performance of our automated approach to that of …
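The lexicon-based matching described in the Methods paragraph can be illustrated with a rough sketch. It assumes that "common misspellings" can be approximated by all single-character edits (deletion, substitution, insertion, adjacent transposition), which is only one possible rule and not necessarily the authors'. The drug list is hard-coded here rather than collected from RxNorm, and the names `single_edit_variants`, `build_lexicon`, and `find_drug_mentions` are hypothetical.

```python
import re
import string


def single_edit_variants(name):
    """Generate every string one edit away from name (an assumed misspelling model)."""
    name = name.lower()
    letters = string.ascii_lowercase
    variants = set()
    for i in range(len(name) + 1):
        for ch in letters:
            variants.add(name[:i] + ch + name[i:])          # insertion
    for i in range(len(name)):
        variants.add(name[:i] + name[i + 1:])               # deletion
        for ch in letters:
            variants.add(name[:i] + ch + name[i + 1:])      # substitution
    for i in range(len(name) - 1):
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])  # transposition
    variants.discard(name)
    return variants


def build_lexicon(drug_names):
    """Map each exact name and each generated variant to its canonical drug name."""
    lexicon = {drug.lower(): drug for drug in drug_names}
    for drug in drug_names:
        for variant in single_edit_variants(drug):
            # Exact names were registered first, so they are never overwritten.
            lexicon.setdefault(variant, drug)
    return lexicon


def find_drug_mentions(text, lexicon):
    """Return canonical drug names for every single-word token found in the lexicon."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [lexicon[token] for token in tokens if token in lexicon]


# Illustrative drug names and text; neither comes from the study's data.
lexicon = build_lexicon(["fentanyl", "heroin"])
mentions = find_drug_mentions("acute fentanly and heroine intoxication", lexicon)
```

This sketch handles single-word names only; multi-word brand names and collisions between short drug names and ordinary words would need additional handling.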


Exploring Strategies To Integrate Disparate Bioinformatics Datasets, Charbel Bader Fakhry Jan 2019

Walden Dissertations and Doctoral Studies

Distinct bioinformatics datasets make it challenging for bioinformatics specialists to locate the required datasets and unify their format for result extraction. The purpose of this single case study was to explore strategies to integrate distinct bioinformatics datasets. The technology acceptance model was used as the conceptual framework to understand the perceived usefulness and ease of use of integrating bioinformatics datasets. The population of this study included bioinformatics specialists of a research institution in Lebanon that has strategies to integrate distinct bioinformatics datasets. The data collection process included interviews with 6 bioinformatics specialists and reviewing 27 organizational documents relating to integrating …