Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Articles 1 - 7 of 7

Full-Text Articles in Computer Sciences

Language Models For Rare Disease Information Extraction: Empirical Insights And Model Comparisons, Shashank Gupta Jan 2024

Theses and Dissertations--Computer Science

End-to-end relation extraction (E2ERE) is a crucial task in natural language processing (NLP) that involves identifying and classifying semantic relationships between entities in text. This thesis compares three paradigms for E2ERE in biomedicine, focusing on rare diseases with discontinuous and nested entities. We evaluate pipelines that apply Named Entity Recognition (NER) followed by Relation Extraction (RE), sequence-to-sequence models, and generative pre-trained transformer (GPT) models using the RareDis information extraction dataset. Our findings indicate that pipeline models are the most effective, followed closely by sequence-to-sequence models. GPT models, despite having eight times as many parameters, perform worse than sequence-to-sequence models and …


Relation Prediction Over Biomedical Knowledge Bases For Drug Repositioning, Mehmet Bakal Jan 2019

Theses and Dissertations--Computer Science

Identifying new potential treatment options for medical conditions that cause human disease burden is a central task of biomedical research. Since not all candidate drugs can be tested in animal and clinical trials, in vitro approaches are first attempted to identify promising candidates. Likewise, identifying other essential relations (e.g., causation, prevention) between biomedical entities is critical to understanding biomedical processes. Hence, it is crucial to develop automated relation prediction systems that can yield plausible biomedical relations to expedite the discovery process. In this dissertation, we demonstrate three approaches to predict treatment relations between biomedical entities for the drug repositioning task …


Ultra-Fast And Memory-Efficient Lookups For Cloud, Networked Systems, And Massive Data Management, Ye Yu Jan 2018

Theses and Dissertations--Computer Science

Systems that process big data (e.g., high-traffic networks and large-scale storage) prefer data structures and algorithms with small memory footprints and fast processing speeds. Efficient and fast algorithms play an essential role in system design, despite improvements in hardware. This dissertation is organized around a novel algorithm called Othello Hashing. Othello Hashing supports ultra-fast and memory-efficient key-value lookup, and it fits the requirements of the core algorithms of many large-scale systems and big data applications. Using Othello Hashing, combined with domain expertise in cloud, computer networks, big data, and bioinformatics, I developed the following applications that resolve several major …


Scalable Feature Selection And Extraction With Applications In Kinase Polypharmacology, Derek Jones Jan 2018

Theses and Dissertations--Computer Science

To reduce the time and cost of drug discovery, machine learning is being used to automate much of the work in this process. However, the size and complex nature of molecular data make the application of machine learning especially challenging. Much work must go into engineering the features that are then used to train machine learning models, which costs considerable time and requires the knowledge of domain experts to be most effective. The purpose of this work is to demonstrate data-driven approaches to perform the feature selection and extraction steps in …


Automated Tree-Level Forest Quantification Using Airborne Lidar, Hamid Hamraz Jan 2018

Theses and Dissertations--Computer Science

Traditional forest management relies on a small field sample and interpretation of aerial photography, which are not only costly to carry out but also yield inaccurate estimates of the entire forest in question. Airborne light detection and ranging (LiDAR) is a remote sensing technology that records point clouds representing the 3D structure of a forest canopy and the terrain underneath. We present a method for segmenting individual trees from the LiDAR point clouds without making prior assumptions about tree crown shapes and sizes. We then present a method that vertically stratifies the point cloud into an overstory and multiple understory tree …


Novel Computational Methods For Transcript Reconstruction And Quantification Using RNA-Seq Data, Yan Huang Jan 2015

Theses and Dissertations--Computer Science

The advent of RNA-seq technologies provides an unprecedented opportunity to precisely profile the mRNA transcriptome of a specific cell population. It helps reveal the characteristics of the cell under a particular condition, such as a disease. It is now possible to discover mRNA transcripts not cataloged in existing databases, in addition to assessing the identities and quantities of the known transcripts in a given sample or cell. However, each sequence read obtained from an RNA-seq experiment is only a short fragment of the original transcript. How to recapitulate the mRNA transcriptome from short RNA-seq reads remains a challenging problem. We …


A Novel Computational Framework For Transcriptome Analysis With RNA-Seq Data, Yin Hu Jan 2013

Theses and Dissertations--Computer Science

Advances in high-throughput sequencing technologies and their application to mRNA transcriptome sequencing (RNA-seq) have enabled comprehensive and unbiased profiling of the landscape of transcription in a cell. To address current limitations in the accuracy and scalability of transcriptome analysis, a novel computational framework has been developed for large-scale RNA-seq datasets with no dependence on transcript annotations. Directly from raw reads, a probabilistic approach is first applied to infer the best transcript fragment alignments from paired-end reads. Empowered by the identification of alternative splicing modules, this framework then performs precise and efficient differential analysis at automatically detected …