Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Enhancing Health Tweet Classification: An Evaluation Of Transformer-Based Models For Comprehensive Analysis, Foram Pankajbhai Patel May 2023

Computer Science and Engineering Theses

The task of health tweet classification entails identifying whether a given tweet is health-related or not. While existing research in this area has made significant progress in classifying tweets into specific sub-domains of health, such as mental health, COVID-19, or specific diseases, there is a need for a more comprehensive approach that considers a broader range of health-related topics. This thesis addresses this need by proposing a diverse and comprehensive dataset that includes various existing health-related datasets, data collected through a keyword-based approach, and manually annotated data. However, the use of health-related keywords in a figurative or non-health context poses …


Randomized And Evolutionary Approaches To Dataset Characterization, Feature Weighting, And Sampling In K-Nearest Neighbors, Suryoday Basak May 2020

Computer Science and Engineering Theses

K-Nearest Neighbors (KNN) has remained one of the most popular methods for supervised machine learning tasks. However, its performance often depends on the characteristics of the dataset and on appropriate feature scaling. In this thesis, characteristics of a dataset that make it suitable for use with KNN are explored. As part of this, two new measures of dataset dispersion, mean neighborhood target variance (MNTV) and mean neighborhood target entropy (MNTE), are developed to help estimate the performance to expect from KNN regressors and classifiers, respectively. It is empirically demonstrated that these measures of dispersion can be indicative …
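
The abstract names MNTV and MNTE but does not define them here. The sketch below shows one plausible reading, offered only as an illustration: for each point, take its k nearest neighbors, compute the variance (regression) or Shannon entropy (classification) of the neighbors' targets, and average over all points. The function names, the Euclidean metric, the exclusion of the point itself, and the default `k=3` are all assumptions and may differ from the thesis.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest rows to each row of X (Euclidean, self excluded)."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor (assumption)
    return np.argsort(dists, axis=1)[:, :k]

def mean_neighborhood_target_variance(X, y, k=3):
    """Hypothetical MNTV: average variance of targets within each k-neighborhood."""
    neighbor_targets = y[knn_indices(X, k)]
    return float(np.mean(np.var(neighbor_targets, axis=1)))

def mean_neighborhood_target_entropy(X, y, k=3):
    """Hypothetical MNTE: average Shannon entropy (bits) of labels per k-neighborhood."""
    entropies = []
    for labels in y[knn_indices(X, k)]:
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        entropies.append(-np.sum(p * np.log2(p)))
    return float(np.mean(entropies))

# Toy data: two well-separated 1-D clusters.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_clean = np.array([0, 0, 0, 1, 1, 1])  # labels follow the clusters
y_mixed = np.array([0, 1, 0, 1, 0, 1])  # labels alternate within clusters
y_const = np.ones(6)                    # constant regression target

mnte_clean = mean_neighborhood_target_entropy(X, y_clean, k=2)
mnte_mixed = mean_neighborhood_target_entropy(X, y_mixed, k=2)
mntv_const = mean_neighborhood_target_variance(X, y_const, k=2)
```

Under this reading, lower values mean that nearby points tend to share similar targets, which is the situation in which KNN would be expected to do well.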


Teaching Computers To Teach Themselves: Synthesizing Training Data Based On Human-Perceived Elements, James Little May 2019

Honors Projects

Isolation-Based Scene Generation (IBSG) is a process for creating synthetic datasets made to train machine learning detectors and classifiers. In this project, we formalize the IBSG process and describe the scenarios—object detection and object classification given audio or image input—in which it can be useful. We then look at the Stanford Street View House Number (SVHN) dataset and build several different IBSG training datasets based on existing SVHN data. We try to improve the compositing algorithm used to build the IBSG dataset so that models trained with synthetic data perform as well as models trained with the original SVHN training …


Topological And Feature Based Identification Of Hole Boundaries In Point Cloud Data And Differentiation Between Surface And Physical Holes, Aaqif Muhtasim Dec 2018

Computer Science and Engineering Theses

With autonomous agents becoming prominent in everyday life, processing their surroundings into understandable features is becoming increasingly important. 3D point clouds play a major role in the perception of such agents, so the ability to correctly decipher features from point clouds is crucial to planning the actions that the agent needs to undertake. This thesis analyzes holes found in point clouds, based on two approaches that center on topological data analysis and local point set features, respectively. It studies how each of the methods works and how a combination …


Classification Of Clinical Narratives Using Convolutional Neural Network, Nikit Rajiv Lonari Dec 2018

Computer Science and Engineering Theses

Patient safety is a key aspect of good consumer care. When an individual is hospitalized or receives medication, the family wants the patient's safety to take priority over all other factors. For instance, a drug can either cure the disease or give rise to an adverse event. A drug administered for an indicated condition has substantial power to reduce or cure a disease, and to prevent it from recurring, but at the risk of side effects. At present, there are several methods in patient safety and in particular in the area of signal detection …


From Text Classification To Image Clustering, Problems Less Optimized, Amirhossein Herandi May 2018

Computer Science and Engineering Theses

Machine Learning is thriving. Every industry is using its techniques in some way to improve efficiency and revenue. However, research attention is not divided equally among the different areas and problems that this field can tackle and analyze. Currently, Computer Vision is the one area receiving very extensive attention from researchers and companies alike, and as a result it has seen an amazing boost in recent years. This ranges from the well-known problems of classification that use discriminative models all the way to more novel problems that use generative models such as style …