Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 9 of 9

Full-Text Articles in Physical Sciences and Mathematics

BERT Efficacy On Scientific And Medical Datasets: A Systematic Literature Review, Clayton Cohn Nov 2020

College of Computing and Digital Media Dissertations

Bidirectional Encoder Representations from Transformers (BERT) [Devlin et al., 2018] has been shown to be effective at modeling a multitude of datasets across a wide variety of Natural Language Processing (NLP) tasks; however, little research has been done regarding BERT’s effectiveness at modeling domain-specific datasets. Specifically, scientific and medical datasets present a particularly difficult challenge in NLP, as these types of corpora are often rife with technical jargon that is largely absent from the canonical corpora that BERT and other transfer learning models were originally trained on. This thesis is a Systematic Literature Review (SLR) of twenty-seven studies that were …


A Study Of Information Bots And Knowledge Bots, Amartya Hatua Aug 2020

Dissertations

This dissertation studies different aspects of information bots and knowledge bots. The research contributes to a better understanding of the characteristics of information bots, as well as of the patterns and factors responsible for information diffusion in a social network. It also shows how these factors can be used to predict information diffusion for a particular topic in a social network. The second part of the research focuses on strategies for improving the knowledge base of knowledge bots, and two different approaches are studied. In the first approach, knowledge is transferred …


Automatic Learning Of Document Section Structure For Ontology-Based Semantic Search, Deya Banisakher Jul 2020

FIU Electronic Theses and Dissertations

Modeling natural human behavior in understanding written language is crucial for developing true artificial intelligence. For people, words convey certain semantic concepts. Documents, in turn, represent an abstract concept: they are collections of text organized in some logical structure, that is, sentences, paragraphs, sections, and so on. Like words, these document structures convey a logical flow of semantic concepts. Machines, however, view words only as spans of characters and documents as mere collections of free text, missing the underlying meaning of the words and the logical structure of the documents.

Automatic semantic concept detection is the process by which the …


Improved Chinese Language Processing For An Open Source Search Engine, Xianghong Sun May 2020

Master's Projects

Natural Language Processing (NLP) is the analysis of human languages by computers. NLP spans many areas, including speech recognition, natural language understanding, and natural language generation.

Information retrieval and natural language processing for Asian languages present their own set of challenges not found in Indo-European languages. These include text segmentation, named entity recognition in unsegmented text, and part-of-speech tagging. In this report, we describe our implementation of, and experiments with, improvements to the Chinese language processing sub-component of an open source search engine, Yioop. In particular, we rewrote …


Identifying Privacy Policy In Service Terms Using Natural Language Processing, Ange-Thierry Ishimwe May 2020

Computer Science and Computer Engineering Undergraduate Honors Theses

When technology (tech) companies realized that data on people's activities, from mobile applications to the wider internet, could be sold to advertisers for a profit, the Big Data era began, with tech companies collecting as much user data as possible. One benefit of this new era is the creation of new types of jobs, such as data scientist and Big Data engineer. However, this new era has also made data privacy one of the most debated topics. A myriad of complaints have been raised about data privacy, such as how much access …


Identifying External Cross-References Using Natural Language Processing (NLP), Elham Rahmani Apr 2020

Electronic Thesis and Dissertation Repository

[Context and motivation] Software engineers build systems that must comply with relevant regulations. These regulations are stated in authoritative documents from which regulatory requirements need to be elicited. A project contract contains cross-references to these regulatory requirements in external documents. [Problem] In sizable software projects, exploring and identifying the regulatory requirements in voluminous textual data is enormously time-consuming (and hence costly) and error-prone. [Principal idea and novelty] We use Natural Language Processing (NLP), pattern recognition, and web scraping techniques to automatically extract external cross-references from contractual requirements and prepare a map representing related external cross-references …


Robust Neural Machine Translation, Abdul Rafae Khan Feb 2020

Dissertations, Theses, and Capstone Projects

This thesis aims for general, robust Neural Machine Translation (NMT) that is agnostic to the test domain. NMT has achieved high quality on benchmarks with closed datasets such as WMT and NIST, but it can fail when the translation input contains noise due to, for example, mismatched domains or spelling errors. The standard solution is to apply domain adaptation or data augmentation to build a domain-dependent system. However, in real life, input noise varies widely in domain and type and is unknown during training. This thesis introduces five general approaches to improve NMT accuracy and …


Deep Neural Architectures For End-To-End Relation Extraction, Tung Tran Jan 2020

Theses and Dissertations--Computer Science

The rapid pace of scientific and technological advancement has led to meteoric growth in knowledge, as evidenced by a sharp increase in the number of scholarly publications in recent years. PubMed, for example, archives more than 30 million biomedical articles across various domains and covers a wide range of topics including medicine, pharmacy, biology, and healthcare. Social media and digital journalism have similarly experienced their own accelerated growth in the age of big data. Hence, there is a compelling need for ways to organize and distill the vast, fragmented body of information (often unstructured in the form of natural …


Word Embedding Driven Concept Detection In Philosophical Corpora, Dylan Hayton-Ruffner Jan 2020

Honors Projects

During the course of research, scholars often explore large textual databases for segments of text relevant to their conceptual analyses. This study proposes, develops, and evaluates two algorithms for automated concept detection in theoretical corpora: ACS and WMD retrieval. Both novel algorithms are compared with keyword retrieval on a test set from the Digital Ricoeur corpus tagged by scholarly experts. WMD retrieval outperforms keyword search on the concept detection task. Thus, WMD retrieval is a promising tool for concept detection and information retrieval systems focused on theoretical corpora.