Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Data Science

Articles 1 - 30 of 99

Full-Text Articles in Entire DC Network

Utilizing Remote Sensing Technology To Relocate Lubra Village And Visualize Flood Damages, Ronan Wallace Dec 2022

Mathematics, Statistics, and Computer Science Honors Projects

As weather patterns change worldwide, isolated communities impacted by climate change go unnoticed and we need community and habitat-conscious solutions. In Himalayan Mustang, Nepal, indigenous Lubra village faces threats of increasing flash flooding. After every flood, residual concrete-like sediment hardens across the riverbed, causing the riverbed elevation to rise. As elevation increases, sediment encroaches on Lubra’s agricultural fields and homes, magnifying flood vulnerability. In the last monsoon season alone, the village witnessed floods swallowing several fields and damaging two homes. One solution considers relocating the village to a new location entirely. However, relocation poses a challenging task, as eight centuries …


Pyseg: A Python Package For 2D Material Flake Localization, Segmentation, And Thickness Prediction, Diana B. Horangic Dec 2022

Student Research Projects

Thin materials are of interest for their extraordinary physical, mechanical, thermal, electrical, and optical properties. Monolayers and bilayers of 2D materials can be manufactured through a variety of exfoliation methods. To determine layer thickness, Raman spectroscopy or other methods like Rayleigh scattering are used. These methods are, however, slow, and they require equipment beyond an optical microscope. A Python package that automates flake identification processes was built, with access solely to RGB data from an optical microscope assumed. My package, pyseg, localizes flakes on a substrate and then makes a rough estimate of their thickness from first principles. It can …


Spatial Validation Of Agent-Based Models, Kristoffer Wikstrom, Hal T. Nelson Dec 2022

Public Administration Faculty Publications and Presentations

This paper adapts an existing techno–social agent-based model (ABM) to develop a new framework for spatially validating ABMs. The ABM simulates citizen opposition to locally unwanted land uses, using historical data from an energy infrastructure siting process in Southern California. Spatial theory, as well as the model’s design, suggests that adequate validation requires multiple tests rather than relying solely on a single test statistic. A pattern-oriented modeling approach was employed that first mapped real and simulated citizen comments across the US Census tract. The suite of spatial tests included Global Moran’s I, complemented with bivariate correlations, as well as …
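As a rough illustration of the Global Moran's I statistic named in this abstract, the following minimal sketch computes it in plain Python. The tract values and the adjacency matrix are hypothetical and are not taken from the paper; the function follows the textbook definition, not necessarily the authors' implementation.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` and an n x n spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of all spatial weights (the normalising constant W).
    total_weight = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between every pair of locations.
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    squared = sum(d * d for d in dev)
    return (n / total_weight) * (cross / squared)

# Hypothetical data: comment counts in four tracts laid out in a row,
# where each tract neighbours only the tracts directly beside it.
counts = [1, 2, 3, 4]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i(counts, adjacency))
```

Positive values indicate that similar values cluster among neighbours; in this toy example the smoothly increasing counts give a value of about 0.33.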


The Interaction Of Normalisation And Clustering In Sub-Domain Definition For Multi-Source Transfer Learning Based Time Series Anomaly Detection, Matthew Nicholson, Rahul Agrahari, Clare Conran, Haythem Assem, John D. Kelleher Dec 2022

Articles

This paper examines how data normalisation and clustering interact in the definition of sub-domains within multi-source transfer learning systems for time series anomaly detection. The paper introduces a distinction between (i) clustering as a primary/direct method for anomaly detection, and (ii) clustering as a method for identifying sub-domains within the source or target datasets. Reporting the results of three sets of experiments, we find that normalisation after feature extraction and before clustering results in the best performance for anomaly detection. Interestingly, we find that in the multi-source transfer learning scenario clustering on the target dataset and identifying subdomains in the …


Safe Sharing For Sensitive Data, Kristi Thompson Dec 2022

Western Libraries Presentations

This workshop focused on the question of when and how human subjects' data can be safely shared. It introduced the basics of data anonymization and discussed how to tell if a dataset has been de-identified. Case studies of successful anonymization and some spectacular failures were shared.


Identity Term Sampling For Measuring Gender Bias In Training Data, Nasim Sobhani, Sarah Jane Delany Dec 2022

Conference Papers

Predictions from machine learning models can reflect biases in the data on which they are trained. Gender bias has been identified in natural language processing systems such as those used for recruitment. Developing approaches to mitigate gender bias in training data typically requires isolating the effect of gender on the output. While it is possible to isolate and identify gender for some types of training data, e.g. CVs in recruitment, for most textual corpora there is no obvious gender label. This paper proposes a general approach to measure …


A Multistate Competing Risks Framework For Preconception Prediction Of Pregnancy Outcomes, Kaitlyn Cook, Neil J. Perkins, Enrique Schisterman, Sebastien Haneuse Dec 2022

Statistical and Data Sciences: Faculty Publications

Background: Preconception pregnancy risk profiles—characterizing the likelihood that a pregnancy attempt results in a full-term birth, preterm birth, clinical pregnancy loss, or failure to conceive—can provide critical information during the early stages of a pregnancy attempt, when obstetricians are best positioned to intervene to improve the chances of successful conception and full-term live birth. Yet the task of constructing and validating risk assessment tools for this earlier intervention window is complicated by several statistical features: the final outcome of the pregnancy attempt is multinomial in nature, and it summarizes the results of two intermediate stages, conception and gestation, whose outcomes …


Natural Language Processing For Disaster Tweets, Akinyemi D. Apampa, Nan Li Dec 2022

Publications and Research

Our goal is to establish an automatic model that identifies which tweets are about natural disasters based on the content of the tweets. Our method is to construct a decision tree based on keyword searching. We will construct the model using 7,645 tweets and test our model on 3,465 tweets as an assessment of the performance.
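To make the idea of a decision tree built on keyword searching concrete, here is a minimal hand-written sketch. The keyword lists and the tree structure are illustrative assumptions only; the abstract does not state which keywords or splits the authors used.

```python
import re

# Hypothetical keyword sets, chosen for illustration only.
HAZARD_WORDS = {"earthquake", "flood", "wildfire", "hurricane"}
FIGURATIVE_WORDS = {"lol", "literally", "mixtape"}
RESPONSE_WORDS = {"evacuate", "evacuation", "rescued"}

def is_disaster_tweet(text):
    """Tiny two-level decision tree over keyword-presence tests."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    if words & HAZARD_WORDS:
        # A figurative cue overrides the hazard keyword.
        return not (words & FIGURATIVE_WORDS)
    # No hazard keyword: fall back to an emergency-response cue.
    return bool(words & RESPONSE_WORDS)

print(is_disaster_tweet("Flood waters rising near the bridge!"))
print(is_disaster_tweet("my mixtape is a wildfire lol"))
```

In practice the splits would be learned from the labelled training tweets and then scored on the held-out test tweets, rather than written by hand as here.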


Extracellular DNases Facilitate Antagonism And Coexistence In Bacterial Competitor-Sensing Interference Competition, Aoi Ogawa, Christophe Golé, Maria Bermudez, Odrine Habarugira, Gabrielle Joslin, Taylor Mccain, Autumn Mineo, Jennifer Wise, Julie Xiong, Katherine Yan, Jan A.C. Vriezen Nov 2022

Biological Sciences: Faculty Publications

Over the last 4 decades, the rate of discovery of novel antibiotics has decreased drastically, ending the era of fortuitous antibiotic discovery. A better understanding of the biology of bacteriogenic toxins potentially helps to prospect for new antibiotics. To initiate this line of research, we quantified antagonists from two different sites at two different depths of soil and found the relative number of antagonists to correlate with the bacterial load and carbon-to-nitrogen (C/N) ratio of the soil. Consecutive studies show the importance of antagonist interactions between soil isolates and the lack of a predicted role for nutrient availability and, therefore, …


A New Kind Of Data Science: The Need For Ethical Analytics, Jonathan Boardman Nov 2022

Published and Grey Literature from PhD Candidates

Ethics can no longer be regarded as an add-on in data science and analytics. This paper argues for the necessity of formalizing a new, practically-oriented sub-discipline of AI ethics by outlining the needs, highlighting shortcomings in current approaches, and providing a framework for ethical analytics, which is concerned with the study of the ethical issues surrounding the development, deployment, and/or dissemination of ML/AI systems and data science research, as well as the development of tools and procedures to mitigate ethical harms. While data science and machine learning are primarily concerned with data from start to finish, ethical analytics is concerned …


Design Of Secure Communication Schemes To Provide Authentication And Integrity Among The IoT Devices, Vidya Rao Dr. Nov 2022

Technical Collection

The rapid growth of Internet-of-Things (IoT) applications has increased the number of end devices communicating over the Internet. These end devices are built with limited resources and are low-power and battery-powered. Such resource-constrained devices are exposed to various security and privacy concerns when communicating over the public Internet. It is therefore essential to provide lightweight security solutions that safeguard data and user privacy. Elliptic Curve Cryptography (ECC) can be used to generate digital signatures and to encrypt data. The method can be evaluated on a real-time testbed deployed using Raspberry Pi 3 devices, with every transmitted message subjected to ECC. …


Getting Started Analyzing Data In SPSS, Kristi Thompson Nov 2022

Western Libraries Presentations

SPSS is a popular package for analyzing data. This session will discuss how to get started on a simple quantitative analysis project using SPSS. Topics covered will include getting summary statistics, creating and modifying variables, creating graphs, running simple analyses, and interpreting SPSS output.


LSTM-SDM: An Integrated Framework Of LSTM Implementation For Sequential Data Modeling, Hum Nath Bhandari, Binod Rimal, Nawa Raj Pokhrel, Ramchandra Rimal, Keshab R. Dahal Nov 2022

Arts & Sciences Faculty Publications

LSTM-SDM is a Python-based integrated computational framework built on top of TensorFlow/Keras and written in Jupyter notebooks. It provides several object-oriented functionalities for implementing single-layer and multilayer LSTM models for sequential data modeling and time series forecasting. Multiple subroutines are combined to create a user-friendly environment that facilitates data exploration and visualization, normalization and input preparation, hyperparameter tuning, performance evaluation, visualization of results, and statistical analysis. We used the LSTM-SDM framework to predict the stock market index and observed impressive results. The framework can be generalized to solve several other real-world time series problems.


'Flux+Mutability': A Conditional Generative Approach To One-Class Classification And Anomaly Detection, Cristiano Fanelli, James Giroux, Z. Papandreou Nov 2022

Arts & Sciences Articles

Anomaly detection is becoming increasingly popular within the experimental physics community. At experiments such as the Large Hadron Collider, interest in anomaly detection is growing as a way to find new physics beyond the Standard Model. This paper details the implementation of a novel Machine Learning architecture, called Flux+Mutability, which combines cutting-edge conditional generative models with clustering algorithms. In the 'flux' stage we learn the distribution of a reference class. The 'mutability' stage at inference addresses whether data significantly deviates from the reference class. We demonstrate the validity of our approach and its connection to multiple problems spanning from one-class classification to anomaly …


Recall Distortion In Neural Network Pruning And The Undecayed Pruning Algorithm, Aidan Good, Jiaqi Lin, Hannah Sieg, Mikey Ferguson, Xin Yu, Shandian Zhe, Jerzy Wieczorek, Thiago Serra Nov 2022

Faculty Conference Papers and Presentations

Pruning techniques have been successfully used in neural networks to trade accuracy for sparsity. However, the impact of network pruning is not uniform: prior work has shown that the recall for underrepresented classes in a dataset may be more negatively affected. In this work, we study such relative distortions in recall by hypothesizing an intensification effect that is inherent to the model. Namely, that pruning makes recall relatively worse for a class with recall below accuracy and, conversely, that it makes recall relatively better for a class with recall above accuracy. In addition, we propose a new pruning algorithm aimed …
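The comparison at the heart of this abstract, per-class recall against overall accuracy, can be sketched as follows. The labels and predictions are hypothetical, and the snippet only computes the two quantities being compared; it does not perform pruning or reproduce the proposed algorithm.

```python
from collections import Counter

def recall_by_class(y_true, y_pred):
    """Fraction of each true class that the model predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

def accuracy(y_true, y_pred):
    """Fraction of all examples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labels: class "b" is underrepresented.
y_true = ["a", "a", "a", "a", "b", "b"]
y_pred = ["a", "a", "a", "a", "b", "a"]

acc = accuracy(y_true, y_pred)
recalls = recall_by_class(y_true, y_pred)
# Classes whose recall falls below accuracy are the ones the abstract
# hypothesizes will get relatively worse after pruning.
below_accuracy = [label for label, r in recalls.items() if r < acc]
print(acc, recalls, below_accuracy)
```

Repeating this measurement before and after pruning a model would show whether the gap between a class's recall and overall accuracy widens, which is the intensification effect the authors hypothesize.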


An Investigation Of The Reconstruction Capacity Of Stacked Convolutional Autoencoders For Log-Mel-Spectrograms, Anastasia Natsiou, Luca Longo, Seán O'Leary Oct 2022

Conference Papers

In audio processing applications, the generation of expressive sounds based on high-level representations is in high demand. These representations can be used to manipulate the timbre and influence the synthesis of creative instrumental notes. Modern algorithms, such as neural networks, have inspired the development of expressive synthesizers based on musical instrument timbre compression. Unsupervised deep learning methods can achieve audio compression by training the network to learn a mapping from waveforms or spectrograms to low-dimensional representations. This study investigates the use of stacked convolutional autoencoders for the compression of time-frequency audio representations for a variety of instruments for a single …


Tutorial: Neuro-Symbolic AI For Mental Healthcare, Kaushik Roy, Usha Lokala, Manas Gaur, Amit Sheth Oct 2022

Publications

Artificial Intelligence (AI) systems for mental healthcare (MHCare) have proliferated as the importance of early interventions for patients with chronic mental health (MH) conditions has been recognized. Social media (SocMedia) emerged as the go-to platform for supporting patients seeking MHCare. The creation of peer-support groups without social stigma has resulted in patients transitioning from clinical settings to SocMedia-supported interactions for quick help. Researchers started exploring SocMedia content in search of cues that showcase correlation or causation between different MH conditions to design better interventional strategies. User-level classification-based AI systems were designed to leverage diverse SocMedia data from various MH conditions, …


Fellowship Application Sample, John Dowd Oct 2022

ICS Fellow Applications

No abstract provided.


The Link Between Democratic Institutions And Population Health In The American States, Julianna Pacheco, Scott Lacombe Oct 2022

Government: Faculty Publications

Context: This project investigates the role of state-level institutions in explaining variation in population health in the American states. Although cross-national research has established the positive effects of democracy on population health, little attention has been given to subnational units. The authors leverage a new data set to understand how political accountability and a system of checks and balances are associated with state population health. Methods: The authors estimate error correction models and two-way fixed effects models to estimate how the strength of state-level democratic institutions is associated with infant mortality rates, life expectancy, and midlife mortality. Findings: The authors …


Learning To Automate Follow-Up Question Generation Using Process Knowledge For Depression Triage On Reddit Posts, Shrey Gupta, Anmol Agarwal, Manas Gaur, Kaushik Roy, Vignesh Narayanan, Ponnurangam Kumaraguru, Amit Sheth Oct 2022

Publications

Conversational Agents (CAs) powered with deep language models (DLMs) have shown tremendous promise in the domain of mental health. Prominently, the CAs have been used to provide informational or therapeutic services (e.g., cognitive behavioral therapy) to patients. However, the utility of CAs to assist in mental health triaging has not been explored in the existing work as it requires a controlled generation of follow-up questions (FQs), which are often initiated and guided by the mental health professionals (MHPs) in clinical settings. In the context of 'depression', our experiments show that DLMs coupled with process knowledge in a mental health questionnaire …


“Be A Pattern For The World”: The Development Of A Dark Patterns Detection Tool To Prevent Online User Loss, Jordan Donnelly, Alan Downley, Yunpeng Liu, Yufei Su, Quanwei Sun, Lan Zeng, Andrea Curley, Damian Gordon, Paul Kelly, Dympna O'Sullivan, Anna Becevel Sep 2022

Articles

Dark Patterns are designed to trick users into sharing more information or spending more money than they intended, by configuring online interactions to confuse or pressure them. They are highly varied in form and are therefore difficult to classify and detect. This research aims to develop a framework for the automated detection of potential instances of web-based dark patterns, and from there to develop a defensive software tool that helps detect and highlight these patterns.


Self-Supervised Learning For Invariant Representations From Multi-Spectral And SAR Images, Pallavi Jain, Bianca Schoen Phelan, Robert J. Ross Sep 2022

Articles

Self-supervised learning (SSL) has become the new state of the art in several domain classification and segmentation tasks. One popular category of SSL is distillation networks such as Bootstrap Your Own Latent (BYOL). This work proposes RS-BYOL, which builds on BYOL in the remote sensing (RS) domain where data are non-trivially different from natural RGB images. Since multi-spectral (MS) and synthetic aperture radar (SAR) sensors provide varied spectral and spatial resolution information, we utilise them as an implicit augmentation to learn invariant feature embeddings. In order to learn RS based invariant features with SSL, we trained RS-BYOL in two ways, …


Deep Learning Fusion Of Satellite And Social Information To Estimate Human Migratory Flows, Daniel Runfola, Heather Baier, Laura Mills, Maeve Naughton-Rockwell, Anthony Stefanidis Sep 2022

Arts & Sciences Articles

Human migratory decisions are driven by a wide range of factors, including economic and environmental conditions, conflict, and evolving social dynamics. These factors are reflected in disparate data sources, including household surveys, satellite imagery, and even news and social media. Here, we present a deep learning-based data fusion technique integrating satellite and census data to estimate migratory flows from Mexico to the United States. We leverage a three-stage approach, in which we (1) construct a matrix-based representation of socioeconomic information for each municipality in Mexico, (2) implement a convolutional neural network with both satellite imagery and the constructed …


Implementing GitHub Actions Continuous Integration To Reduce Error Rates In Ecological Data Collection, Albert Y. Kim, Valentine Herrmann, Ross Barreto, Brianna Calkins, Erika Gonzalez-Akre, Daniel J. Johnson, Jennifer A. Jordan, Lukas Magee, Ian R. Mcgregor, Nicolle Montero, Karl Novak, Teagan Rogers, Jessica Shue, Kristina J. Anderson-Teixeira Sep 2022

Statistical and Data Sciences: Faculty Publications

Accurate field data are essential to understanding ecological systems and forecasting their responses to global change. Yet, data collection errors are common, and data analysis often lags far enough behind its collection that many errors can no longer be corrected, nor can anomalous observations be revisited. Needed is a system in which data quality assurance and control (QA/QC), along with the production of basic data summaries, can be automated immediately following data collection.

Here, we implement and test a system to satisfy these needs. For two annual tree mortality censuses and a dendrometer band survey at two forest research sites, …


Developing Research Data Management Services In A Regional Comprehensive University: The Case Of Central Washington University, Ping Fu, Maurice Blackson, Maura Valentino Aug 2022

Library Scholarship

This study aims to analyze the needs of researchers in a regional comprehensive university for research data management services; discuss the options for developing a research data management program at the university; and then propose a phased three-year implementation plan for the university libraries. The method was to design a survey to collect information from researchers and assess and evaluate their needs for research data management services. The results show that researchers’ needs in a regional comprehensive university could be quite different from those of researchers in a research-intensive university. Also, the results verify the hypothesis that researchers in the …


“I Think I Discovered A Military Base In The Middle Of The Ocean”—Null Island, The Most Real Of Fictional Places, Levente Juhasz, Peter Mooney Aug 2022

GIS Center

This paper explores Null Island, a fictional place located at 0° latitude and 0° longitude in the WGS84 (World Geodetic System 1984) geographic coordinate system. Null Island is erroneously associated with large amounts of geographic data in a wide variety of location-based services, place databases, social media and web-based maps. Whereas it was originally considered a joke within the geospatial community, this article will demonstrate implications of its existence, both technological and social in nature, promoting Null Island as a fundamental issue of geographic information that requires more widespread awareness. The article summarizes error sources that lead to data being …
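A simple check for records that have landed on Null Island might look like the sketch below. The record layout, field names, and tolerance are assumptions made for illustration; the paper's own detection methods are not described in this excerpt.

```python
def is_null_island(lat, lon, tolerance=1e-9):
    """True when a WGS84 coordinate pair sits at (0, 0), within a tolerance."""
    return abs(lat) <= tolerance and abs(lon) <= tolerance

# Hypothetical records; for example, a failed geocoding step may leave
# default zeros in both coordinate fields.
records = [
    {"name": "Maynooth", "lat": 53.38, "lon": -6.59},
    {"name": "unknown", "lat": 0.0, "lon": 0.0},
    {"name": "Quito", "lat": -0.18, "lon": -78.47},
]

suspect = [r["name"] for r in records if is_null_island(r["lat"], r["lon"])]
print(suspect)
```

Both coordinates must be zero for a record to be flagged, so legitimate points on the equator or the prime meridian alone are not affected.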


Data Engineering Techniques And Designs With Music Generating Neural Networks, Noah Solomon Aug 2022

Honors Program Theses and Projects

The generation of music artificially is an interesting concept to many and has received a lot of attention in recent years. The advancement of neural networks has allowed for the creation of models that can seemingly generate music creatively to mimic a specific genre or composer. This project delved deep into the many ways to construct music generating neural networks and compared different model architectures and data engineering techniques. Three main types of models were implemented and the resulting generated music was evaluated with respect to the melody, note agreeableness, and rhythm. These models used the Bach Chorales corpus as …


Automated Identification Of Astronauts On Board The International Space Station: A Case Study In Space Archaeology, Rao Hamza Ali, Amir Kanan Kashefi, Alice C. Gorman, Justin St. P. Walsh, Erik J. Linstead Aug 2022

Art Faculty Articles and Research

We develop and apply a deep learning-based computer vision pipeline to automatically identify crew members in archival photographic imagery taken on-board the International Space Station. Our approach is able to quickly tag thousands of images from public and private photo repositories without human supervision with high degrees of accuracy, including photographs where crew faces are partially obscured. Using the results of our pipeline, we carry out a large-scale network analysis of the crew, using the imagery data to provide novel insights into the social interactions among crew during their missions.


Estimating The Health Effects Of Adding Bicycle And Pedestrian Paths At The Census Tract Level: Multiple Model Comparison, Ross J. Gore, Christopher Lynch, Craig Jordan, Andrew Collins, R. Michael Robinson, Gabrielle Fuller, Pearson Ames, Prateek Keerthi, Yash Kandukuri Aug 2022

VMASC Publications

Background: Adding bicycle and pedestrian paths to an area can lead to improved health outcomes for residents over time. However, quantitatively determining which areas benefit more from bicycle and pedestrian paths, how many miles of bicycle and pedestrian paths are needed, and the health outcomes that may be most improved remain open questions.

Objective: Our work provides and evaluates a methodology that offers actionable insight for city-level planners, public health officials, and decision makers tasked with the question “To what extent will adding specified bicycle and pedestrian path mileage to a census tract improve residents’ health outcomes over time?” …


A Comparative Study On Deep Learning Models For Text Classification Of Unstructured Medical Notes With Various Levels Of Class Imbalance, Hongxia Lu, Louis Ehwerhemuepha, Cyril Rakovski Jul 2022

Mathematics, Physics, and Computer Science Faculty Articles and Research

Background

Discharge medical notes written by physicians contain important information about the health condition of patients. Many deep learning algorithms have been successfully applied to extract important information from unstructured medical notes data that can entail subsequent actionable results in the medical domain. This study aims to explore the model performance of various deep learning algorithms in text classification tasks on medical notes with respect to different disease class imbalance scenarios.

Methods

In this study, we employed seven artificial intelligence models, a CNN (Convolutional Neural Network), a Transformer encoder, a pretrained BERT (Bidirectional Encoder Representations from Transformers), and four typical …