Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Machine Learning

Databases and Information Systems


Articles 1 - 30 of 52

Full-Text Articles in Physical Sciences and Mathematics

Your Cursor Reveals: On Analyzing Workers’ Browsing Behavior And Annotation Quality In Crowdsourcing Tasks, Pei-Chi Lo, Ee-Peng Lim Oct 2023

Research Collection School Of Computing and Information Systems

In this work, we investigate the connection between browsing behavior and task quality of crowdsourcing workers performing annotation tasks that require information judgements. Such information judgements are often required to derive ground truth answers to information retrieval queries. We explore the use of workers’ browsing behavior to directly determine their annotation result quality. We hypothesize user attention to be the main factor contributing to a worker’s annotation quality. To predict annotation quality at the task level, we model two aspects of task-specific user attention, also known as general and semantic user attentions. Both aspects of user attention can be …


Data-Driven 2D Materials Discovery For Next-Generation Electronics, Zeyu Zhang Aug 2023

Dissertations

The development of material discovery and design has lasted centuries in human history. After the concepts of modern chemistry and material science were established, the strategy of material discovery relied on experiments. Such a strategy becomes expensive and time-consuming with the increasing number of materials nowadays. Therefore, a novel strategy that is faster and more comprehensive is urgently needed. In this dissertation, an experiment-guided material discovery strategy is developed and explained using metal-organic frameworks (MOFs) as instances. The advent of π-stacked layered MOFs, which offer electrical conductivity on top of permanent porosity and high surface area, opened up new …


A Study Of Various Data Sizes Using Machine Learning, Sochaeta Koeum May 2023

Electronic Theses, Projects, and Dissertations

Social media is a great domain for news consumption; however, it is referred to as a double-edged sword. While it is user-friendly and low-cost, social media is the reason why fake news can spread rapidly, which is detrimental to society, businesses, and many consumers. Therefore, fake news detection is an emerging field. However, some challenges have restricted other researchers from developing a universal machine learning model that is fast, efficient, and reliable to stop the proliferation because of the lack of resources available, such as large-sized datasets. The goal of this culminating experience project is to explore how varying datasets …


ChatGPT As Metamorphosis Designer For The Future Of Artificial Intelligence (AI): A Conceptual Investigation, Amarjit Kumar Singh (Library Assistant), Dr. Pankaj Mathur (Deputy Librarian) Mar 2023

Library Philosophy and Practice (e-journal)

Purpose: The purpose of this research paper is to explore ChatGPT’s potential as an innovative design tool for the future development of artificial intelligence. Specifically, this conceptual investigation aims to analyze ChatGPT’s capabilities as a tool for designing and developing near-human intelligent systems for future use in the field of Artificial Intelligence (AI). The paper also analyzes the strengths and weaknesses of ChatGPT as a tool and identifies possible areas for improvement in its development and implementation. This investigation focused on the various features and functions of ChatGPT that …


Data Poisoning: A New Threat To Artificial Intelligence, Nary Simms Jan 2023

Mathematics and Computer Science Capstones

Artificial Intelligence (AI) is rapidly being deployed in a number of fields, from banking and finance to healthcare, robotics, transportation, military, e-commerce and social networks. Grand View Research estimates that the global AI market was worth 93.5 billion in 2021 and that it will increase at a compound annual growth rate (CAGR) of 38.1% from 2022 to 2030. According to a 2020 MIT Sloan Management survey, 87% of multinational corporations believe that AI technology will provide a competitive edge. Artificial Intelligence relies heavily on datasets to train its models. The more data, the better it learns and predicts. However, …


Sequence Checking And Deduplication For Existing Fingerprint Databases, Tahsin Islam Sakif Jan 2023

Graduate Theses, Dissertations, and Problem Reports

Biometric technology is a rapidly evolving field with applications that range from access to devices to border crossing and entry/exit processes. Large-scale applications to collect biometric data, such as border crossings, result in multimodal biometric databases containing thousands of identities. However, due to human operator error, these databases often contain many instances of image mislabeling and misclassification; such errors stem from the lack of training and the throughput pressure that operators face. Multiple entries from the same individual may be assigned to a different identity. Rolled fingerprints may be labeled as flat images, a face image entered into a …


Champions For Social Good: How Can We Discover Social Sentiment And Attitude-Driven Patterns In Prosocial Communication?, Raghava Rao Mukkamala, Robert J. Kauffman, Helle Zinner Henriksen Jan 2023

Research Collection School Of Computing and Information Systems

The UN High Commissioner on Refugees (UNHCR) is pursuing a social media strategy to inform people about displaced populations and refugee emergencies. It is actively engaging public figures to increase awareness through its prosocial communications and improve social informedness and support for policy changes in its services. We studied the Twitter communications of UNHCR social media champions and investigated their role as high-profile influencers. In this study, we offer a design science research and data analytics framework and propositions based on the social informedness theory we propose in this paper to assess communication about UNHCR’s mission. Two variables—refugee-emergency and champion …


Supervised Representation Learning For Improving Prediction Performance In Medical Decision Support Applications, Phawis Thammasorn May 2022

Graduate Theses and Dissertations

Machine learning approaches for prediction play an integral role in modern-day decision support systems. An integral part of the process is extracting variables of interest, or features, to describe the input data. Then, the variables are utilized for training machine-learning algorithms to map from the variables to the target output. After the training, the model is validated with either validation or testing data before making predictions with a new dataset. Despite the straightforward workflow, the process relies heavily on good feature representation of data. Engineering suitable representation eases the subsequent actions and copes with many practical issues that potentially prevent the …


Using A BERT-Based Ensemble Network For Abusive Language Detection, Noah Ballinger May 2022

Computer Science and Computer Engineering Undergraduate Honors Theses

Over the past two decades, online discussion has skyrocketed in scope and scale. However, so has the amount of toxicity and offensive posts on social media and other discussion sites. Despite this rise in prevalence, the ability to automatically moderate online discussion platforms has seen minimal development. Recently, though, as the capabilities of artificial intelligence (AI) continue to improve, the potential of AI-based detection of harmful internet content has become a real possibility. In the past couple years, there has been a surge in performance on tasks in the field of natural language processing, mainly due to the development of …


Entity Based Sentiment Analysis For Textual Health Advice, Dae Lim Chung Apr 2022

Computer Science Senior Theses

This work explores entity-based sentiment analysis for textual health advice through deep learning. We fine-tuned a pretrained BERT model to analyze sentiments across five predetermined categories (food, medicine, disease, exercise, and vitality) for three sentiments: positive, negative, and neutral. An original annotated medical dataset from Dartmouth College’s Persist Lab was used to conduct the experiments. To tailor the data for entity-based sentiment analysis, we explored data transformation techniques to generate optimum training examples. During the experiments, we were able to discover that the wide variety …


Caption And Image Based Next-Word Auto-Completion, Meet Patel Jan 2022

Master's Projects

With the increasing number of options or choices in terms of entities like products, movies, songs, etc. which are now available to users, they try to save time by looking for an application or system that provides automatic recommendations. Recommender systems are automated computing processes that leverage concepts of Machine Learning, Data Mining and Artificial Intelligence towards generating product recommendations based on a user’s preferences. These systems have given a significant boost to businesses across multiple segments as a result of reduced human intervention. One similar aspect of this is content writing. It would save users a lot of time …


Information Extraction And Classification On Journal Papers, Lei Yu Nov 2021

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

The importance of journals for diffusing the results of scientific research has increased considerably. In the digital era, Portable Document Format (PDF) became the established format of electronic journal articles. This structured form, combined with a regular and wide dissemination, spread scientific advancements easily and quickly. However, the rapidly increasing number of published scientific articles requires more time and effort for systematic literature reviews, searches and screens. The comprehension and extraction of useful information from the digital documents is also a challenging task, due to the complex structure of PDF.

To help a soil science team from the United States …


A Survey On ML4VIS: Applying Machine Learning Advances To Data Visualization, Qianwen Wang, Zhutian Chen, Yong Wang, Huamin Qu Aug 2021

Research Collection School Of Computing and Information Systems

Inspired by the great success of machine learning (ML), researchers have applied ML techniques to visualizations to achieve a better design, development, and evaluation of visualizations. This branch of studies, known as ML4VIS, has gained increasing research attention in recent years. To successfully adapt ML techniques for visualizations, a structured understanding of the integration of ML4VIS is needed. In this article, we systematically survey 88 ML4VIS studies, aiming to answer two motivating questions: “what visualization processes can be assisted by ML?” and “how ML techniques can be used to solve visualization problems?” This survey reveals seven main processes where …


Signal Processing And Data Analysis For Real-Time Intermodal Freight Classification Through A Multimodal Sensor System., Enrique J. Sanchez Headley Jul 2021

Graduate Theses and Dissertations

Identifying freight patterns in transit is a common need among commercial and municipal entities. For example, the allocation of resources among Departments of Transportation is often predicated on an understanding of freight patterns along major highways. There exist multiple sensor systems to detect and count vehicles at areas of interest. Many of these sensors are limited in their ability to detect more specific features of vehicles in traffic or are unable to perform well in adverse weather conditions. Despite this limitation, to date there is little comparative analysis among Laser Imaging, Detection, and Ranging (LIDAR) sensors for freight detection …


Soarnet, Deep Learning Thermal Detection For Free Flight, Jake T. Tallman Jun 2021

Master's Theses

Thermals are regions of rising hot air formed on the ground through the warming of the surface by the sun. Thermals are commonly used by birds and glider pilots to extend flight duration, increase cross-country distance, and conserve energy. This kind of powerless flight using natural sources of lift is called soaring. Once a thermal is encountered, the pilot flies in circles to keep within the thermal, gaining altitude before flying off to the next thermal and towards the destination. A single thermal can net a pilot thousands of feet of elevation gain; however, estimating thermal locations is not …


Online Review Analysis From Two Perspectives: Customers And Business Owners, Eunjung Lee May 2021

Theses and Dissertations

As online reviews become increasingly prevalent, both online businesses and customers face big data challenges. Individuals are now relying on reviews derived from websites where the reliability of a source depends on the reviewers. Customers spend much time and effort looking for reviews that are useful for them. Accordingly, online review platforms aim to explore various approaches to select useful reviews and present them to customers. At the same time, for business owners, marketers, and e-commerce managers, it has become an essential strategy in recent years to collect as many online reviews as possible. If marketers and managers are able …


Two Essays On Leveraging Analytics To Improve Healthcare, Deepika Gopukumar May 2021

Theses and Dissertations

Healthcare costs have continued to increase over the past few years despite various policies, efforts, and initiatives taken by the government. The Centers for Medicare and Medicaid Services (CMS) still projects them to grow over the next few years. Readmissions have been a major contributor to the increase in costs. To get a perspective, considering the fact that at least 9% of individuals who had COVID-19 were likely to get readmitted shortly, according to a study by the Centers for Disease Control and Prevention (CDC) COVID-19 response team, along with …


Machine Learning Approaches To Dribble Hand-Off Action Classification With SportVU NBA Player Coordinate Data, Dembe Stephanos May 2021

Electronic Theses and Dissertations

Recently, strategies of National Basketball Association teams have evolved with the skillsets of players and the emergence of advanced analytics. One of the most effective actions in dynamic offensive strategies in basketball is the dribble hand-off (DHO). This thesis proposes an architecture for a classification pipeline for detecting DHOs in an accurate and automated manner. This pipeline consists of a combination of player tracking data and event labels, a rule set to identify candidate actions, manually reviewing game recordings to label the candidates, and embedding player trajectories into hexbin cell paths before passing the completed training set to the classification …


A New Feature Selection Method Based On Class Association Rule, Sami A. Al-Dhaheri Feb 2021

Dissertations, Theses, and Capstone Projects

Feature selection is a key process for supervised learning algorithms. It involves discarding irrelevant attributes from the training dataset from which the models are derived. One of the vital feature selection approaches is Filtering, which often uses mathematical models to compute the relevance for each feature in the training dataset and then sorts the features into descending order based on their computed scores. However, most Filtering methods face several challenges including, but not limited to, merely considering feature-class correlation when defining a feature’s relevance; additionally, not recommending which subset of features to retain. Leaving this decision to the end-user may …
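
The Filtering idea described in this abstract (score every feature, then sort the scores in descending order) can be sketched as follows. This is only an illustration under stated assumptions: it scores features by absolute Pearson correlation with a numeric class label, which is the plain feature-class correlation the abstract names as a limitation, not the class association rule method the dissertation proposes. The names `pearson`, `rank_features`, `toy_columns`, and `toy_labels` are hypothetical, and the data is made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column carries no information about the class.
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(columns, labels):
    """Score each feature column against the labels, highest score first."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one informative feature, one uninformative, one constant.
toy_columns = {
    "f_signal": [1.0, 2.0, 3.0, 4.0],
    "f_noise": [1.0, -1.0, -1.0, 1.0],
    "f_constant": [5.0, 5.0, 5.0, 5.0],
}
toy_labels = [0, 0, 1, 1]

ranking = rank_features(toy_columns, toy_labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The sketch returns a full ranking and leaves the choice of how many features to keep to the caller, which is exactly the end-user decision the abstract identifies as a weakness of most Filtering methods.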


Visual Analysis Of Discrimination In Machine Learning, Qianwen Wang, Zhenghua Xu, Zhutian Chen, Yong Wang, Shixia Liu, Huamin Qu Feb 2021

Research Collection School Of Computing and Information Systems

The growing use of automated decision-making in critical applications, such as crime prediction and college admission, has raised questions about fairness in machine learning. How can we decide whether different treatments are reasonable or discriminatory? In this paper, we investigate discrimination in machine learning from a visual analytics perspective and propose an interactive visualization tool, DiscriLens, to support a more comprehensive analysis. To reveal detailed information on algorithmic discrimination, DiscriLens identifies a collection of potentially discriminatory itemsets based on causal modeling and classification rules mining. By combining an extended Euler diagram with a matrix-based visualization, we develop a novel set …


Data: The Good, The Bad And The Ethical, John D. Kelleher, Filipe Cabral Pinto, Luis M. Cortesao Dec 2020

Articles

It is often the case with new technologies that it is very hard to predict their long-term impacts and as a result, although new technology may be beneficial in the short term, it can still cause problems in the longer term. This is what happened with oil by-products in different areas: the use of plastic as a disposable material did not take into account the hundreds of years necessary for its decomposition and its related long-term environmental damage. Data is said to be the new oil. The message to be conveyed is associated with its intrinsic value. But as in …


Cross Dataset Evaluation For IoT Network Intrusion Detection, Anjum Farah Dec 2020

Theses and Dissertations

With the advent of Internet of Things (IoT) technology, the need to ensure the security of an IoT network has become important. There are several intrusion detection systems (IDS) that are available for analyzing and predicting network anomalies and threats. However, it is challenging to evaluate them to realistically estimate their performance when deployed. A lot of research has been conducted where the training and testing are done using the same simulated dataset. However, realistically, a network on which an intrusion detection model is deployed will be very different from the network on which it was trained. The aim of …


Using Object Detection Algorithm And Optical Character Recognition To Read Data From Alphanumeric Tags In Text, Ana Bazerque, Davi Moraes, Marcela Souza Oct 2020

ICT

The present document explores the use of machine learning techniques, specifically supervised learning and classification. It applies those techniques to create a solution for a real world company that provides medical products and services to hospitals. This project will deal with streamlining the calibration of medical weighing scales. The developed application will use object detection and character recognition to identify and classify a digital image of a scale’s tag, and fill in a form with the corresponding data. The main reason for the need of this application is to avoid human errors and automate the collection of data from the …


Cover Song Identification - A Novel Stem-Based Approach To Improve Song-To-Song Similarity Measurements, Lavonnia Newman, Dhyan Shah, Chandler Vaughn, Faizan Javed Sep 2020

SMU Data Science Review

Music is incorporated into our daily lives whether intentional or unintentional. It evokes responses and behavior so much so there is an entire study dedicated to the psychology of music. Music creates the mood for dancing, exercising, creative thought or even relaxation. It is a powerful tool that can be used in various venues and through advertisements to influence and guide human reactions. Music is also often "borrowed" in the industry today. The practices of sampling and remixing music in the digital age have made cover song identification an active area of research. While most of this research is focused …


Machine Learning Applications For Drug Repurposing, Hansaim Lim Sep 2020

Dissertations, Theses, and Capstone Projects

The cost of bringing a drug to market is astounding and the failure rate is intimidating. Drug discovery has been of limited success under the conventional reductionist model of one-drug-one-gene-one-disease paradigm, where a single disease-associated gene is identified and a molecular binder to the specific target is subsequently designed. Under the simplistic paradigm of drug discovery, a drug molecule is assumed to interact only with the intended on-target. However, small molecular drugs often interact with multiple targets, and those off-target interactions are not considered under the conventional paradigm. As a result, drug-induced side effects and adverse reactions are often neglected …


Visual Analytics Of Electronic Health Records With A Focus On Acute Kidney Injury, Sheikh S. Abdullah Jul 2020

Electronic Thesis and Dissertation Repository

The increasing use of electronic platforms in healthcare has resulted in the generation of unprecedented amounts of data in recent years. The amount of data available to clinical researchers, physicians, and healthcare administrators continues to grow, which creates an untapped resource with the ability to improve the healthcare system drastically. Despite the enthusiasm for adopting electronic health records (EHRs), some recent studies have shown that EHR-based systems hardly improve the ability of healthcare providers to make better decisions. One reason for this inefficacy is that these systems do not allow for human-data interaction in a manner that fits and supports …


Dynamic Fraud Detection Via Sequential Modeling, Panpan Zheng May 2020

Graduate Theses and Dissertations

The impacts of information revolution are omnipresent from life to work. The web services have significantly changed our living styles in daily life, such as Facebook for communication and Wikipedia for knowledge acquirement. Besides, varieties of information systems, such as data management system and management information system, make us work more efficiently. However, it is usually a double-edged sword. With the popularity of web services, relevant security issues are arising, such as fake news on Facebook and vandalism on Wikipedia, which definitely impose severe security threats to OSNs and their legitimate participants. Likewise, office automation incurs another challenging security issue, …


Chaff From The Wheat: Characterizing And Determining Valid Bug Reports, Yuanrui Fan, Xin Xia, David Lo, Ahmed E. Hassan May 2020

Research Collection School Of Computing and Information Systems

Developers use bug reports to triage and fix bugs. When triaging a bug report, developers must decide whether the bug report is valid (i.e., a real bug). A large number of bug reports are submitted every day, and many of them end up being invalid. Manually determining valid bug reports is a difficult and tedious task. Thus, an approach that can automatically analyze the validity of a bug report and determine whether a report is valid can help developers prioritize their triaging tasks and avoid wasting time and effort on invalid bug reports. In this study, motivated by the …


Information Extraction From Biomedical Text Using Machine Learning, Deepti Garg Dec 2019

Master's Projects

Inadequate drug experimental data and the use of unlicensed drugs may cause adverse drug reactions, especially in pediatric populations. Every year the U.S. Food and Drug Administration approves human prescription drugs for marketing. The labels associated with these drugs include information about clinical trials and drug response in pediatric populations. In order for doctors to make an informed decision about the safety and effectiveness of these drugs for children, there is a need to analyze complex and often unstructured drug labels. In this work, first, an exploratory analysis of drug labels using a Natural Language Processing pipeline is performed. Second, …


Multimodal Data Analytics And Fusion For Data Science, Haiman Tian Jun 2019

FIU Electronic Theses and Dissertations

Advances in technologies have rapidly accumulated a zettabyte of “new” data every two years. The huge amount of data has a powerful impact on various areas in science and engineering and generates enormous research opportunities, which calls for the design and development of advanced approaches in data analytics. Given such demands, data science has become an emerging hot topic in both industry and academia, ranging from basic business solutions, technological innovations, and multidisciplinary research to political decisions, urban planning, and policymaking. Within the scope of this dissertation, a multimodal data analytics and fusion framework is proposed for data-driven knowledge discovery …