Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Open Access. Powered by Scholars. Published by Universities.®

Machine Learning

PDF

Software Engineering

Institution
Publication Year
Publication
Publication Type

Articles 1 - 30 of 38

Full-Text Articles in Computer Sciences

Testsgd: Interpretable Testing Of Neural Networks Against Subtle Group Discrimination, Mengdi Zhang, Jun Sun, Jingyi Wang, Bing Sun Sep 2023

Testsgd: Interpretable Testing Of Neural Networks Against Subtle Group Discrimination, Mengdi Zhang, Jun Sun, Jingyi Wang, Bing Sun

Research Collection School Of Computing and Information Systems

Discrimination has been shown in many machine learning applications, which calls for sufficient fairness testing before their deployment in ethic-relevant domains. One widely concerning type of discrimination, testing against group discrimination, mostly hidden, is much less studied, compared with identifying individual discrimination. In this work, we propose TestSGD, an interpretable testing approach which systematically identifies and measures hidden (which we call ‘subtle’) group discrimination of a neural network characterized by conditions over combinations of the sensitive attributes. Specifically, given a neural network, TestSGD first automatically generates an interpretable rule set which categorizes the input space into two groups. Alongside, TestSGD …


Stream-Evolving Bot Detection Framework Using Graph-Based And Feature-Based Approaches For Identifying Social Bots On Twitter, Eiman Alothali Jun 2023

Stream-Evolving Bot Detection Framework Using Graph-Based And Feature-Based Approaches For Identifying Social Bots On Twitter, Eiman Alothali

Dissertations

This dissertation focuses on the problem of evolving social bots in online social networks, particularly Twitter. Such accounts spread misinformation and inflate social network content to mislead the masses. The main objective of this dissertation is to propose a stream-based evolving bot detection framework (SEBD), which was constructed using both graph- and feature-based models. It was built using Python, a real-time streaming engine (Apache Kafka version 3.2), and our pretrained model (bot multi-view graph attention network (Bot-MGAT)). The feature-based model was used to identify predictive features for bot detection and evaluate the SEBD predictions. The graph-based model was used to …


Ai Applications On Planetary Rovers, Alexis David Pascual Mar 2023

Ai Applications On Planetary Rovers, Alexis David Pascual

Electronic Thesis and Dissertation Repository

The rise in the number of robotic missions to space is paving the way for the use of artificial intelligence and machine learning in the autonomy and augmentation of rover operations. For one, more rovers mean more images, and more images mean more data bandwidth required for downlinking as well as more mental bandwidth for analyzing the images. On the other hand, light-weight, low-powered microrover platforms are being developed to accommodate the drive for planetary exploration. As a result of the mass and power constraints, these microrover platforms will not carry typical navigational instruments like a stereocamera or a laser …


Chatgpt As Metamorphosis Designer For The Future Of Artificial Intelligence (Ai): A Conceptual Investigation, Amarjit Kumar Singh (Library Assistant), Dr. Pankaj Mathur (Deputy Librarian) Mar 2023

Chatgpt As Metamorphosis Designer For The Future Of Artificial Intelligence (Ai): A Conceptual Investigation, Amarjit Kumar Singh (Library Assistant), Dr. Pankaj Mathur (Deputy Librarian)

Library Philosophy and Practice (e-journal)

Abstract

Purpose: The purpose of this research paper is to explore ChatGPT’s potential as an innovative designer tool for the future development of artificial intelligence. Specifically, this conceptual investigation aims to analyze ChatGPT’s capabilities as a tool for designing and developing near about human intelligent systems for futuristic used and developed in the field of Artificial Intelligence (AI). Also with the helps of this paper, researchers are analyzed the strengths and weaknesses of ChatGPT as a tool, and identify possible areas for improvement in its development and implementation. This investigation focused on the various features and functions of ChatGPT that …


On The Pursuit Of Developer Happiness: Webcam-Based Eye Tracking And Affect Recognition In The Ide, Tamsin Rogers Jan 2023

On The Pursuit Of Developer Happiness: Webcam-Based Eye Tracking And Affect Recognition In The Ide, Tamsin Rogers

Honors Theses

Recent research highlights the viability of webcam-based eye tracking as a low-cost alternative to dedicated remote eye trackers. Simultaneously, research shows the importance of understanding emotions of software developers, where it was found that emotions have significant effects on productivity, code quality, and team dynamics. In this paper, we present our work towards an integrated eye-tracking and affect recognition tool for use during software development. This combined approach could enhance our understanding of software development by combining information about the code developers are looking at, along with the emotions they experience. The presented tool utilizes an unmodified webcam to capture …


Liquid Tab, Nathan Hulet Jan 2023

Liquid Tab, Nathan Hulet

Williams Honors College, Honors Research Projects

Guitar transcription is a complex task requiring significant time, skill, and musical knowledge to achieve accurate results. Since most music is recorded and processed digitally, it would seem like many tools to digitally analyze and transcribe the audio would be available. However, the problem of automatic transcription presents many more difficulties than are initially evident. There are multiple ways to play a guitar, many diverse styles of playing, and every guitar sounds different. These problems become even more difficult considering the varying qualities of recordings and levels of background noise.

Machine learning has proven itself to be a flexible tool …


A Symbolic Music Transformer For Real-Time Expressive Performance And Improvisation, Arnav Shirodkar Jan 2023

A Symbolic Music Transformer For Real-Time Expressive Performance And Improvisation, Arnav Shirodkar

Senior Projects Fall 2023

With the widespread proliferation of AI technology, deep architectures — many of which are based on neural networks — have been incredibly successful in a variety of different research areas and applications. Within the relatively new domain of Music Information Retrieval (MIR), deep neural networks have also been successful for a variety of tasks, including tempo estimation, beat detection, genre classification, and more. Drawing inspiration from projects like George E. Lewis's Voyager and Al Biles's GenJam, two pioneering endeavors in human-computer interaction, this project attempts to tackle the problem of expressive music generation and seeks to create a Symbolic Music …


Adaptive Fairness Improvement Based Causality Analysis, Mengdi Zhang, Jun Sun Nov 2022

Adaptive Fairness Improvement Based Causality Analysis, Mengdi Zhang, Jun Sun

Research Collection School Of Computing and Information Systems

Given a discriminating neural network, the problem of fairness improvement is to systematically reduce discrimination without significantly scarifies its performance (i.e., accuracy). Multiple categories of fairness improving methods have been proposed for neural networks, including pre-processing, in-processing and postprocessing. Our empirical study however shows that these methods are not always effective (e.g., they may improve fairness by paying the price of huge accuracy drop) or even not helpful (e.g., they may even worsen both fairness and accuracy). In this work, we propose an approach which adaptively chooses the fairness improving method based on causality analysis. That is, we choose the …


Supporting The Discovery, Reuse, And Validation Of Cybersecurity Requirements At The Early Stages Of The Software Development Lifecycle, Jessica Antonia Steinmann Oct 2022

Supporting The Discovery, Reuse, And Validation Of Cybersecurity Requirements At The Early Stages Of The Software Development Lifecycle, Jessica Antonia Steinmann

Doctoral Dissertations and Master's Theses

The focus of this research is to develop an approach that enhances the elicitation and specification of reusable cybersecurity requirements. Cybersecurity has become a global concern as cyber-attacks are projected to cost damages totaling more than $10.5 trillion dollars by 2025. Cybersecurity requirements are more challenging to elicit than other requirements because they are nonfunctional requirements that requires cybersecurity expertise and knowledge of the proposed system. The goal of this research is to generate cybersecurity requirements based on knowledge acquired from requirements elicitation and analysis activities, to provide cybersecurity specifications without requiring the specialized knowledge of a cybersecurity expert, and …


Data Preprocessing For Machine Learning Modules, Rawan El Moghrabi Aug 2022

Data Preprocessing For Machine Learning Modules, Rawan El Moghrabi

Undergraduate Student Research Internships Conference

Data preprocessing is an essential step when building machine learning solutions. It significantly impacts the success of machine learning modules and the output of these algorithms. Typically, data preprocessing is made-up of data sanitization, feature engineering, normalization, and transformation. This paper outlines the data preprocessing methodology implemented for a data-driven predictive maintenance solution. The above-mentioned project entails acquiring historical electrical data from industrial assets and creating a health index indicating each asset's remaining useful life. This solution is built using machine learning algorithms and requires several data processing steps to increase the solution's accuracy and efficiency. In this project, the …


Development Of Software Tools For Efficient And Sustainable Process Development And Improvement, Jake P. Stengel Jun 2022

Development Of Software Tools For Efficient And Sustainable Process Development And Improvement, Jake P. Stengel

Theses and Dissertations

Infrastructure is a key component in the well-being of our society that leads to its growth, development, and productive operations. A well-built infrastructure allows the community to be more competitive and promotes economic advancement. In 2021, the ASCE (American Society of Civil Engineers) ranked the American infrastructure as substandard, with an overall grade of C-. The overall ranking suffers when key infrastructure categories are not maintained according to the needs of the population. Therefore, there is a need to consider alternative methods to improve our infrastructure and make it more sustainable to enhance the overall grade. One of the challenges …


Machine Learning With Big Data For Electrical Load Forecasting, Alexandra L'Heureux Jun 2022

Machine Learning With Big Data For Electrical Load Forecasting, Alexandra L'Heureux

Electronic Thesis and Dissertation Repository

Today, the amount of data collected is exploding at an unprecedented rate due to developments in Web technologies, social media, mobile and sensing devices and the internet of things (IoT). Data is gathered in every aspect of our lives: from financial information to smart home devices and everything in between. The driving force behind these extensive data collections is the promise of increased knowledge. Therefore, the potential of Big Data relies on our ability to extract value from these massive data sets. Machine learning is central to this quest because of its ability to learn from data and provide data-driven …


Legislative Language For Success, Sanjana Gundala Jun 2022

Legislative Language For Success, Sanjana Gundala

Master's Theses

Legislative committee meetings are an integral part of the lawmaking process for local and state bills. The testimony presented during these meetings is a large factor in the outcome of the proposed bill. This research uses Natural Language Processing and Machine Learning techniques to analyze testimonies from California Legislative committee meetings from 2015-2016 in order to identify what aspects of a testimony makes it successful. A testimony is considered successful if the alignment of the testimony matches the bill outcome (alignment is "For" and the bill passes or alignment is "Against" and the bill fails). The process of finding what …


The Bracelet: An American Sign Language (Asl) Interpreting Wearable Device, Samuel Aba, Ahmadre Darrisaw, Pei Lin, Thomas Leonard May 2022

The Bracelet: An American Sign Language (Asl) Interpreting Wearable Device, Samuel Aba, Ahmadre Darrisaw, Pei Lin, Thomas Leonard

Chancellor’s Honors Program Projects

No abstract provided.


Efficacy Of Reported Issue Times As A Means For Effort Estimation, Paul Phillip Maclean Jan 2022

Efficacy Of Reported Issue Times As A Means For Effort Estimation, Paul Phillip Maclean

Graduate Theses, Dissertations, and Problem Reports

Software effort is a measure of manpower dedicated to developing and maintaining and software. Effort estimation can help project managers monitor their software, teams, and timelines. Conversely, improper effort estimation can result in budget overruns, delays, lost contracts, and accumulated Technical Debt (TD). Issue Tracking Systems (ITS) have become mainstream project management tools, with over 65,000 companies using Jira alone. ITS are an untapped resource for issue resolution effort research. Related work investigates issue effort for specific issue types, usually Bugs or similar. They model their developer-documented issue resolution times using features from the issues themselves. This thesis explores a …


On The Documentation Of Refactoring Types, Eman Abdullah Alomar, Jiaqian Liu, Kenneth Addo, Mohamed Wiem Mkaouer, Christian D. Newman, Ali Ouni, Zhe Yu Dec 2021

On The Documentation Of Refactoring Types, Eman Abdullah Alomar, Jiaqian Liu, Kenneth Addo, Mohamed Wiem Mkaouer, Christian D. Newman, Ali Ouni, Zhe Yu

Articles

Commit messages are the atomic level of software documentation. They provide a natural language description of the code change and its purpose. Messages are critical for software maintenance and program comprehension. Unlike documenting feature updates and bug fixes, little is known about how developers document their refactoring activities. Specifically, developers can perform multiple refactoring operations, including moving methods, extracting classes, renaming attributes, for various reasons, such as improving software quality, managing technical debt, and removing defects. Yet, there is no systematic study that analyzes the extent to which the documentation of refactoring accurately describes the refactoring operations performed at the …


Learning To Interpret Knowledge From Software Q&A Sites, Bowen Xu Aug 2021

Learning To Interpret Knowledge From Software Q&A Sites, Bowen Xu

Dissertations and Theses Collection (Open Access)

Nowadays, software question and answer (SQA) data has become a treasure for software engineering as it contains a huge volume of programming knowledge. That knowledge can be interpreted in many different ways to support various software activities, such as code recommendation, program repair, and so on. In this dissertation, we interpret SQA data by addressing three novel research problems.

The first research problem is about linkable knowledge unit prediction. In this problem, a question and its answers within a post in Stack Overflow are considered as a knowledge unit (KU). KUs often contain semantically relevant knowledge, and thus linkable for …


Machine Learning Approaches To Dribble Hand-Off Action Classification With Sportvu Nba Player Coordinate Data, Dembe Stephanos May 2021

Machine Learning Approaches To Dribble Hand-Off Action Classification With Sportvu Nba Player Coordinate Data, Dembe Stephanos

Electronic Theses and Dissertations

Recently, strategies of National Basketball Association teams have evolved with the skillsets of players and the emergence of advanced analytics. One of the most effective actions in dynamic offensive strategies in basketball is the dribble hand-off (DHO). This thesis proposes an architecture for a classification pipeline for detecting DHOs in an accurate and automated manner. This pipeline consists of a combination of player tracking data and event labels, a rule set to identify candidate actions, manually reviewing game recordings to label the candidates, and embedding player trajectories into hexbin cell paths before passing the completed training set to the classification …


Visual Analysis Of Discrimination In Machine Learning, Qianwen Wang, Zhenghua Xu, Zhutian Chen, Yong Wang, Shixia Liu, Huamin Qu Feb 2021

Visual Analysis Of Discrimination In Machine Learning, Qianwen Wang, Zhenghua Xu, Zhutian Chen, Yong Wang, Shixia Liu, Huamin Qu

Research Collection School Of Computing and Information Systems

The growing use of automated decision-making in critical applications, such as crime prediction and college admission, has raised questions about fairness in machine learning. How can we decide whether different treatments are reasonable or discriminatory? In this paper, we investigate discrimination in machine learning from a visual analytics perspective and propose an interactive visualization tool, DiscriLens, to support a more comprehensive analysis. To reveal detailed information on algorithmic discrimination, DiscriLens identifies a collection of potentially discriminatory itemsets based on causal modeling and classification rules mining. By combining an extended Euler diagram with a matrix-based visualization, we develop a novel set …


Source Code Comment Classification Artificial Intelligence, Cole Sutyak Jan 2021

Source Code Comment Classification Artificial Intelligence, Cole Sutyak

Williams Honors College, Honors Research Projects

Source code comment classification is an important problem for future machine learning solutions. In particular, supervised machine learning solutions that have largely subjective data labels but are difficult to obtain the labels for. Machine learning problems are problems largely because of a lack of data. In machine learning solutions, it is better to have a large amount of mediocre data than it is to have a small amount of good data. While the mediocre data might not produce the best accuracy, it produces the best results because there is much more to learn from the problem.

In this project, data …


Distributed Load Testing By Modeling And Simulating User Behavior, Chester Ira Parrott Dec 2020

Distributed Load Testing By Modeling And Simulating User Behavior, Chester Ira Parrott

LSU Doctoral Dissertations

Modern human-machine systems such as microservices rely upon agile engineering practices which require changes to be tested and released more frequently than classically engineered systems. A critical step in the testing of such systems is the generation of realistic workloads or load testing. Generated workload emulates the expected behaviors of users and machines within a system under test in order to find potentially unknown failure states. Typical testing tools rely on static testing artifacts to generate realistic workload conditions. Such artifacts can be cumbersome and costly to maintain; however, even model-based alternatives can prevent adaptation to changes in a system …


Optimized Machine Learning Models Towards Intelligent Systems, Mohammadnoor Ahmad Mohammad Injadat Jul 2020

Optimized Machine Learning Models Towards Intelligent Systems, Mohammadnoor Ahmad Mohammad Injadat

Electronic Thesis and Dissertation Repository

The rapid growth of the Internet and related technologies has led to the collection of large amounts of data by individuals, organizations, and society in general [1]. However, this often leads to information overload which occurs when the amount of input (e.g. data) a human is trying to process exceeds their cognitive capacities [2]. Machine learning (ML) has been proposed as one potential methodology capable of extracting useful information from large sets of data [1]. This thesis focuses on two applications. The first is education, namely e-Learning environments. Within this field, this thesis proposes different optimized ML ensemble models to …


Automated Anomaly Detection And Localization System For A Microservices Based Cloud System, Priyanka Prakash Naikade Jul 2020

Automated Anomaly Detection And Localization System For A Microservices Based Cloud System, Priyanka Prakash Naikade

Electronic Thesis and Dissertation Repository

Context: With an increasing number of applications running on a microservices-based cloud system (such as AWS, GCP, IBM Cloud), it is challenging for the cloud providers to offer uninterrupted services with guaranteed Quality of Service (QoS) factors. Problem Statement: Existing monitoring frameworks often do not detect critical defects among a large volume of issues generated, thus affecting recovery response times and usage of maintenance human resource. Also, manually tracing the root causes of the issues requires a significant amount of time. Objective: The objective of this work is to: (i) detect performance anomalies, in real-time, through monitoring KPIs (Key Performance …


Toward The Automatic Classification Of Self-Affirmed Refactoring, Mohamed Wiem Mkaouer, Eman Abdullah Alomar, Ali Ouni May 2020

Toward The Automatic Classification Of Self-Affirmed Refactoring, Mohamed Wiem Mkaouer, Eman Abdullah Alomar, Ali Ouni

Articles

The concept of Self-Affirmed Refactoring (SAR) was introduced to explore how developers document their refactoring activities in commit messages, i.e., developers explicit documentation of refactoring operations intentionally introduced during a code change. In our previous study, we have manually identified refactoring patterns and defined three main common quality improvement categories including internal quality attributes, external quality attributes, and code smells, by only considering refactoring-related commits. However, this approach heavily depends on the manual inspection of commit messages. In this paper, we propose a two-step approach to first identify whether a commit describes developer-related refactoring events, then to classify it according …


Chaff From The Wheat: Characterizing And Determining Valid Bug Reports, Yuanrui Fan, Xin Xia, David Lo, Ahmed E. Hassan May 2020

Chaff From The Wheat: Characterizing And Determining Valid Bug Reports, Yuanrui Fan, Xin Xia, David Lo, Ahmed E. Hassan

Research Collection School Of Computing and Information Systems

Developers use bug reports to triage and fix bugs. When triaging a bug report, developers must decide whether the bug report is valid (i.e., a real bug). A large amount of bug reports are submitted every day, with many of them end up being invalid reports. Manually determining valid bug report is a difficult and tedious task. Thus, an approach that can automatically analyze the validity of a bug report and determine whether a report is valid can help developers prioritize their triaging tasks and avoid wasting time and effort on invalid bug reports. In this study, motivated by the …


Estimating Refactoring Efforts For Architecture Technical Debt, Samir Deeb Jan 2020

Estimating Refactoring Efforts For Architecture Technical Debt, Samir Deeb

Graduate Theses, Dissertations, and Problem Reports

Paying-off the Architectural Technical Debt by refactoring the flawed code is important to control the debt and to keep it as low as possible. Project Managers tend to delay paying off this debt because they face difficulties in comparing the cost of the refactoring against the benefits they gain. For these managers to decide whether to refactor or to postpone, they need to estimate the cost and the efforts required to conduct these refactoring activities as well as to decide which flaws have higher priority to be refactored among others.

Our research is based on a dataset used by other …


A Study Of Face Embedding In Face Recognition, Khanh Duc Le Mar 2019

A Study Of Face Embedding In Face Recognition, Khanh Duc Le

Master's Theses

Face Recognition has been a long-standing topic in computer vision and pattern recognition field because of its wide and important applications in our daily lives such as surveillance system, access control, and so on. The current modern face recognition model, which keeps only a couple of images per person in the database, can now recognize a face with high accuracy. Moreover, the model does not need to be retrained every time a new person is added to the database.

By using the face dataset from Digital Democracy, the thesis will explore the capability of this model by comparing it with …


Dish: Democracy In State Houses, Nicholas A. Russo Feb 2019

Dish: Democracy In State Houses, Nicholas A. Russo

Master's Theses

In our current political climate, state level legislators have become increasingly impor- tant. Due to cuts in funding and growing focus at the national level, public oversight for these legislators has drastically decreased. This makes it difficult for citizens and activists to understand the relationships and commonalities between legislators. This thesis provides three contributions to address this issue. First, we created a data set containing over 1200 features focused on a legislator’s activity on bills. Second, we created embeddings that represented a legislator’s level of activity and engagement for a given bill using a custom model called Democracy2Vec. Third, we …


A Machine Learning Recommender Model For Ride Sharing Based On Rider Characteristics And User Threshold Time, Govind Pramod Yatnalkar Jan 2019

A Machine Learning Recommender Model For Ride Sharing Based On Rider Characteristics And User Threshold Time, Govind Pramod Yatnalkar

Theses, Dissertations and Capstones

In the present age, human life is prospering incredibly due to the 4th Industrial Revolution or The Age of Digitization and Computing. The ubiquitous availability of the Internet and advanced computing systems have resulted in the rapid development of smart cities. From connected devices to live vehicle tracking, technology is taking the field of transportation to a new level. An essential part of the transportation domain in smart cities is Ride Sharing. It is an excellent solution to issues like pollution, traffic, and the rapid consumption of fuel. Even though Ride Sharing has several benefits, the current usage is …


Assessing The Quality Of Software Development Tutorials Available On The Web, Manziba A. Nishi Jan 2019

Assessing The Quality Of Software Development Tutorials Available On The Web, Manziba A. Nishi

Theses and Dissertations

Both expert and novice software developers frequently access software development resources available on the Web in order to lookup or learn new APIs, tools and techniques. Software quality is affected negatively when developers fail to find high-quality information relevant to their problem. While there is a substantial amount of freely available resources that can be accessed online, some of the available resources contain information that suffers from error proneness, copyright infringement, security concerns, and incompatible versions. Use of such toxic information can have a strong negative effect on developer’s efficacy. This dissertation focuses specifically on software tutorials, aiming to automatically …