Open Access. Powered by Scholars. Published by Universities.®

Medicine and Health Sciences Commons

Full-Text Articles in Medicine and Health Sciences

Deep Active Learning For Classifying Cancer Pathology Reports, Kevin De Angeli, Shang Gao, Mohammed Alawad, Hong-Jun Yoon, Noah Schaefferkoetter, Xiao-Cheng Wu, Eric B. Durbin, Jennifer Doherty, Antoinette Stroup, Linda Coyle, Lynne Penberthy, Georgia Tourassi, Mar 2021

Kentucky Cancer Registry Faculty Publications

Background: Automated text classification has many important applications in the clinical setting; however, obtaining labelled data for training machine learning and deep learning models is often difficult and expensive. Active learning techniques may mitigate this challenge by reducing the amount of labelled data required to effectively train a model. In this study, we analyze the effectiveness of 11 active learning algorithms for classifying subsite and histology from cancer pathology reports, using a convolutional neural network as the text classification model.

Results: We compare the performance of each active learning strategy using two differently sized datasets and two different classification tasks. …
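As an illustration of the general idea in the abstract above, here is a minimal sketch of one commonly used active learning strategy, entropy-based uncertainty sampling. The listing does not say which 11 strategies the study evaluates, so this should be read as a generic example rather than the authors' method. The function names and the probability values are hypothetical; in practice the probabilities would come from the classifier's softmax output on the unlabelled pool.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool_probs, k):
    """Return indices of the k unlabelled documents with the highest entropy.

    pool_probs: list of per-document class probability lists.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical model outputs for four unlabelled reports (three classes each).
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # near-uniform prediction, highest entropy
    [0.60, 0.30, 0.10],
    [0.50, 0.50, 0.00],
]

# Indices of the two reports that would be sent to a human annotator next.
selected = select_most_uncertain(pool_probs, 2)
print(selected)
```

In a full loop, the selected reports would be labelled, added to the training set, and the model retrained before the next round of selection.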


Limitations Of Transformers On Clinical Text Classification, Shang Gao, Mohammed Alawad, Michael Todd Young, John Gounley, Noah Schaefferkoetter, Hong-Jun Yoon, Xiao-Cheng Wu, Eric B. Durbin, Jennifer Doherty, Antoinette Stroup, Linda Coyle, Georgia D. Tourassi, Feb 2021

Kentucky Cancer Registry Faculty Publications

Bidirectional Encoder Representations from Transformers (BERT) and BERT-based approaches are the current state of the art in many natural language processing (NLP) tasks; however, their application to document classification on long clinical texts is limited. In this work, we introduce four methods to scale BERT, which by default can only handle input sequences up to approximately 400 words long, to perform document classification on clinical texts several thousand words long. We compare these methods against two much simpler architectures, a word-level convolutional neural network and a hierarchical self-attention network, and show that BERT often cannot beat these simpler baselines when classifying …
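The abstract does not describe its four scaling methods in this listing, so the following is only a generic sketch of one widely used workaround for fixed-length input limits: split a long document into overlapping windows, score each window separately, and average the per-window class scores. The window size of 400 words mirrors the approximate limit quoted above; the stride, the function names, and the averaging step are assumptions made for illustration, not details taken from the paper.

```python
def split_into_chunks(words, window=400, stride=200):
    """Split a list of words into overlapping windows of at most `window` words."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    chunks = []
    start = 0
    while True:
        chunks.append(words[start:start + window])
        if start + window >= len(words):
            break  # the current window already reaches the end of the document
        start += stride
    return chunks


def average_scores(chunk_scores):
    """Combine per-chunk class scores into one document-level score vector."""
    n = len(chunk_scores)
    return [sum(column) / n for column in zip(*chunk_scores)]


# Hypothetical 1000-word document.
document = ["w%d" % i for i in range(1000)]
chunks = split_into_chunks(document)
print(len(chunks), [len(c) for c in chunks])

# Hypothetical per-chunk class probabilities from some classifier.
doc_scores = average_scores([[0.2, 0.8], [0.4, 0.6]])
print(doc_scores)
```

Overlapping windows reduce the chance that a relevant phrase is cut in half at a chunk boundary, at the cost of scoring some words more than once.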