Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Machine Learning

Theory and Algorithms

Articles 1 - 30 of 39

Full-Text Articles in Physical Sciences and Mathematics

Hypothyroid Disease Analysis By Using Machine Learning, Sanjana Seelam Dec 2023

Electronic Theses, Projects, and Dissertations

Thyroid illness frequently manifests as hypothyroidism, and most people with hypothyroidism are female. Because the majority of people are unaware of the illness, it is quickly becoming more serious. It is crucial to catch it early so that medical professionals can treat it more effectively and prevent it from getting worse. Predicting illness is a challenging task, and machine learning aids disease prediction greatly. Feature selection strategies have also made disease assessment and prediction easier. To properly monitor and cure this illness, accurate detection is essential. In order …


Towards Long-Term Fairness In Sequential Decision Making, Yaowei Hu Dec 2023

Graduate Theses and Dissertations

With the development of artificial intelligence, automated decision-making systems are increasingly integrated into various applications, such as hiring, loans, education, recommendation systems, and more. These machine learning algorithms are expected to facilitate faster, more accurate, and impartial decision-making compared to human judgments. Nevertheless, these expectations are not always met in practice due to biased training data, leading to discriminatory outcomes. In contemporary society, there is broad consensus that discrimination must be countered, which has led the EU and the US to enact laws and regulations that prohibit discrimination based on factors such as gender, age, race, and religion. Consequently, addressing algorithmic discrimination has …


How I Read An Article That Uses Machine Learning Methods, Aziz Nazha, Olivier Elemento, Shannon Mcweeney, Moses Miles, Torsten Haferlach Aug 2023

Kimmel Cancer Center Faculty Papers

No abstract provided.


A Novel Approach To Extending Music Using Latent Diffusion, Keon Roohparvar, Franz J. Kurfess Jun 2023

Master's Theses

Using deep learning to synthetically generate music is a research domain that has gained more attention from the public in the past few years. A subproblem of music generation is music extension, or the task of taking existing music and extending it. This work proposes the Continuer Pipeline, a novel technique that uses deep learning to take music and extend it in 5-second increments. It does this by treating the musical generation process as an image generation problem; we utilize latent diffusion models (LDMs) to generate spectrograms, which are image representations of music. The Continuer Pipeline is able to …


Eddy Current Defect Response Analysis Using Sum Of Gaussian Methods, James William Earnest May 2023

Theses and Dissertations

This dissertation studies methods to automatically detect eddy current differential coil defect signatures and approximate them as a summed collection of Gaussian functions (SoG). Datasets varying in material, defect size, inspection frequency, and coil diameter were investigated. Dimensionally reduced representations of the defect responses were obtained using common existing reduction methods, as well as novel enhancements to them based on SoG representations. The efficacy of the SoG-enhanced representations was studied using common interpretable Machine Learning (ML) classifier designs, with the SoG representations yielding significant improvements in common analysis metrics.
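
As a rough illustration of what a sum-of-Gaussians representation looks like, the equation below assumes a one-dimensional defect signal s(t). The dissertation's exact parameterization, which may for example be two-dimensional impedance-plane data, is not given in this abstract. A sum of K Gaussians approximates the signal as:

```latex
s(t) \approx \sum_{k=1}^{K} a_k \exp\left(-\frac{(t - \mu_k)^2}{2\sigma_k^2}\right)
```

Each component carries only three parameters: an amplitude a_k, a center \mu_k, and a width \sigma_k. The 3K fitted values can therefore serve as a compact, interpretable feature vector for a classifier.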


A Machine Learning Approach For Predicting Clinical Trial Patient Enrollment In Drug Development Portfolio Demand Planning, Ahmed Shoieb May 2023

Masters Theses

One of the biggest challenges the clinical research industry currently faces is the accurate forecasting of patient enrollment (namely, if and when a clinical trial will achieve full enrollment), as the stochastic behavior of enrollment can significantly contribute to delays in the development of new drugs, increases in the duration and costs of clinical trials, and the over- or underestimation of clinical supply. This study proposes a Machine Learning model using a Fully Convolutional Network (FCN) that is trained on a dataset of 100,000 patient enrollment data points including patient age, patient gender, patient disease, investigational product, study phase, blinded …


ChatGPT As Metamorphosis Designer For The Future Of Artificial Intelligence (AI): A Conceptual Investigation, Amarjit Kumar Singh (Library Assistant), Dr. Pankaj Mathur (Deputy Librarian) Mar 2023

Library Philosophy and Practice (e-journal)

Abstract

Purpose: The purpose of this research paper is to explore ChatGPT’s potential as an innovative design tool for the future development of artificial intelligence. Specifically, this conceptual investigation aims to analyze ChatGPT’s capabilities as a tool for designing and developing near-human intelligent systems for future use and development in the field of Artificial Intelligence (AI). The paper also analyzes the strengths and weaknesses of ChatGPT as a tool and identifies possible areas for improvement in its development and implementation. This investigation focused on the various features and functions of ChatGPT that …


Liquid Tab, Nathan Hulet Jan 2023

Williams Honors College, Honors Research Projects

Guitar transcription is a complex task requiring significant time, skill, and musical knowledge to achieve accurate results. Since most music is recorded and processed digitally, one might expect many tools for digitally analyzing and transcribing audio to be available. However, automatic transcription presents many more difficulties than are initially evident. There are multiple ways to play a guitar, many diverse playing styles, and every guitar sounds different. These problems become even more difficult given the varying quality of recordings and levels of background noise.

Machine learning has proven itself to be a flexible tool …


Coded Distributed Function Computation, Pedro J. Soto Jun 2022

Dissertations, Theses, and Capstone Projects

A ubiquitous problem in computer science research is the optimization of computation on large data sets. Such computations are usually too large to be performed on one machine, so the task needs to be distributed across a network of machines. However, a common problem in distributed computing is mitigating delays caused by faulty machines. This can be done by using coding theory to optimize the amount of redundancy needed to handle such faults. This problem differs from classical coding theory since it is concerned with the dynamic coded computation on data rather than just statically …
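
The sketch below makes coded redundancy concrete using one textbook scheme. It is assumed here for illustration and is not taken from the dissertation. A matrix-vector product A x is split into two row blocks, and a third worker computes a parity block, so the full result survives any single slow or faulty worker. The function names `encode`, `decode`, and `matvec` and the worker labels are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]
Vector = List[float]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product (the work each worker performs)."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def encode(a: Matrix) -> Dict[str, Matrix]:
    """Split A into row blocks A1, A2 and add the parity block A1 + A2."""
    if len(a) % 2:
        raise ValueError("this sketch assumes an even number of rows")
    half = len(a) // 2
    a1, a2 = a[:half], a[half:]
    parity = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(a1, a2)]
    return {"w1": a1, "w2": a2, "w3": parity}


def decode(results: Dict[str, Vector]) -> Vector:
    """Rebuild A x from the results of any two of the three workers."""
    if "w1" in results and "w2" in results:
        y1, y2 = results["w1"], results["w2"]
    elif "w1" in results and "w3" in results:
        y1 = results["w1"]
        y2 = [p - u for p, u in zip(results["w3"], y1)]  # A2 x = (A1 + A2) x - A1 x
    elif "w2" in results and "w3" in results:
        y2 = results["w2"]
        y1 = [p - v for p, v in zip(results["w3"], y2)]  # A1 x = (A1 + A2) x - A2 x
    else:
        raise ValueError("need results from at least two workers")
    return y1 + y2
```

With three workers for two blocks, the master can ignore any one straggler. Plain replication would need four workers (two copies of each block) to tolerate the same single failure.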


Legislative Language For Success, Sanjana Gundala Jun 2022

Master's Theses

Legislative committee meetings are an integral part of the lawmaking process for local and state bills. The testimony presented during these meetings is a large factor in the outcome of the proposed bill. This research uses Natural Language Processing and Machine Learning techniques to analyze testimonies from California Legislative committee meetings from 2015-2016 in order to identify which aspects of a testimony make it successful. A testimony is considered successful if the alignment of the testimony matches the bill outcome (alignment is "For" and the bill passes, or alignment is "Against" and the bill fails). The process of finding what …
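
The success criterion stated above can be written as a small labeling function. This sketch assumes alignments are stored as the strings "For" and "Against" and bill outcomes as booleans. The name `is_successful` is hypothetical, and testimonies with any other alignment are rejected rather than guessed at.

```python
def is_successful(alignment: str, bill_passed: bool) -> bool:
    """Label a testimony successful when its alignment matches the bill outcome."""
    if alignment == "For":
        return bill_passed  # supported a bill that passed
    if alignment == "Against":
        return not bill_passed  # opposed a bill that failed
    # Hypothetical policy: other alignments are not covered by the criterion.
    raise ValueError(f"unsupported alignment: {alignment!r}")
```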


A Machine Learning And Deep Learning Framework For Binary, Ternary, And Multiclass Emotion Classification Of Covid-19 Vaccine-Related Tweets, Aditya Dubey May 2022

Honors Scholar Theses

My research mines public emotion toward the Covid-19 vaccine based on Twitter data collected over the past 6-12 months. This project centers on building and developing machine learning and deep learning models to perform natural language processing of short-form text, in our case tweets. These tweets are all vaccine-related, and the goal of the classification task is for our models to accurately classify a tweet into one of four emotion groups: Apprehension/Anticipation, Sadness/Anger/Frustration, Joy/Humor/Sarcasm, and Gratitude/Relief. Given this data and the goal of the paper, we aim to answer the following questions: (1) Can a framework be …


The Executive’s Guide To Getting AI Wrong, Jerrold Soh May 2022

Asian Management Insights

It’s all math. Really.


Information Extraction And Classification On Journal Papers, Lei Yu Nov 2021

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

The importance of journals for disseminating the results of scientific research has increased considerably. In the digital era, Portable Document Format (PDF) has become the established format for electronic journal articles. This structured form, combined with regular and wide dissemination, spreads scientific advances easily and quickly. However, the rapidly increasing number of published scientific articles requires more time and effort for systematic literature reviews, searches, and screening. Comprehending and extracting useful information from these digital documents is also challenging because of the complex structure of PDF.

To help a soil science team from the United States …


Analytical Models For Traffic Congestion And Accident Analysis, Hongrui Liu, Rahul Ramachandra Shetty Nov 2021

Mineta Transportation Institute Publications

In the US, over 38,000 people die in road crashes each year, and 2.35 million are injured or disabled, according to the statistics report from the Association for Safe International Road Travel (ASIRT) in 2020. In addition, traffic congestion keeping Americans stuck on the road wastes millions of hours and billions of dollars each year. Using statistical techniques and machine learning algorithms, this research developed accurate predictive models for traffic congestion and road accidents to increase understanding of the complex causes of these challenging issues. The research used US Accidents data consisting of 49 variables describing 4.2 million accident records …


Machine Learning With Topological Data Analysis, Ephraim Robert Love May 2021

Doctoral Dissertations

Topological Data Analysis (TDA) is a relatively new focus in the fields of statistics and machine learning. Methods that exploit the geometry of data, such as clustering, have proven theoretically and empirically invaluable. TDA provides a general framework within which to study topological invariants (shapes) of data, which are more robust to noise and can recover information about higher-dimensional features that is not immediately apparent in the data. A common tool for conducting TDA is persistent homology, which measures the significance of these invariants. Persistent homology has prominent realizations in methods of data visualization, statistics, and machine learning. Extending ML with …
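
The sketch below illustrates how persistent homology measures the significance of a feature, using only the simplest case. It computes 0-dimensional persistence (connected components) of a point cloud as a distance threshold grows, tracking merges with union-find. It is an illustrative assumption, not the dissertation's pipeline, and it ignores higher-dimensional features such as loops. The function name is hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Sequence[float]


def zero_dim_persistence(points: List[Point]) -> List[Tuple[float, float]]:
    """(birth, death) bars for connected components as the distance threshold grows."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Edges enter the filtration in order of increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    bars: List[Tuple[float, float]] = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # two components merge: one of them dies here
            bars.append((0.0, length))
    if points:
        bars.append((0.0, math.inf))  # the last component never dies
    return bars
```

For two well-separated pairs of points, the output contains two short bars ending at 1 for the within-pair merges and one long bar ending at 5. That long bar is the signature of two clusters that persist over a wide range of scales.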


Machine Learning Approaches To Dribble Hand-Off Action Classification With SportVU NBA Player Coordinate Data, Dembe Stephanos May 2021

Electronic Theses and Dissertations

Recently, strategies of National Basketball Association teams have evolved with the skillsets of players and the emergence of advanced analytics. One of the most effective actions in dynamic offensive strategies in basketball is the dribble hand-off (DHO). This thesis proposes an architecture for a classification pipeline for detecting DHOs in an accurate and automated manner. This pipeline consists of a combination of player tracking data and event labels, a rule set to identify candidate actions, manually reviewing game recordings to label the candidates, and embedding player trajectories into hexbin cell paths before passing the completed training set to the classification …


Visual Analysis Of Discrimination In Machine Learning, Qianwen Wang, Zhenghua Xu, Zhutian Chen, Yong Wang, Shixia Liu, Huamin Qu Feb 2021

Research Collection School Of Computing and Information Systems

The growing use of automated decision-making in critical applications, such as crime prediction and college admission, has raised questions about fairness in machine learning. How can we decide whether different treatments are reasonable or discriminatory? In this paper, we investigate discrimination in machine learning from a visual analytics perspective and propose an interactive visualization tool, DiscriLens, to support a more comprehensive analysis. To reveal detailed information on algorithmic discrimination, DiscriLens identifies a collection of potentially discriminatory itemsets based on causal modeling and classification rules mining. By combining an extended Euler diagram with a matrix-based visualization, we develop a novel set …


Using Torchattacks To Improve The Robustness Of Models With Adversarial Training, William S. Matos Díaz Jan 2021

Cybersecurity: Deep Learning Driven Cybersecurity Research in a Multidisciplinary Environment

Adversarial training has proven to be one of the most successful ways to defend models against adversarial examples. It consists of training a model on adversarial examples to improve its robustness. In this experiment, Torchattacks, a PyTorch library for generating adversarial examples more easily, was used to determine which attack was the strongest. The strongest attack was then used to train the model and make it more robust against adversarial examples. The experiments were performed on the MNIST and CIFAR-10 datasets. Both datasets were tested using PGD, FGSM, and …
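
The adversarial-training loop described above can be sketched without any deep learning framework. This sketch swaps the thesis's PyTorch models and Torchattacks for a two-feature logistic regression and a hand-written FGSM step, x_adv = x + eps * sign(grad_x L). For logistic loss the input gradient is (p - y) * w, so no autodiff is needed. All function names and the toy data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Example = Tuple[List[float], int]  # (features, label in {0, 1})


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def fgsm(x: List[float], y: int, w: List[float], b: float, eps: float) -> List[float]:
    """Fast Gradient Sign Method: move each feature by eps to increase the loss."""
    p = sigmoid(dot(w, x) + b)
    grad_x = [(p - y) * wi for wi in w]  # d(cross-entropy)/dx for logistic regression
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad_x)]


def train(data: List[Example], eps: float, epochs: int = 50,
          lr: float = 0.1, seed: int = 0) -> Tuple[List[float], float]:
    """SGD where every example is replaced by its FGSM perturbation."""
    rng = random.Random(seed)
    examples = list(data)
    w = [0.0] * len(examples[0][0])
    b = 0.0
    for _ in range(epochs):
        rng.shuffle(examples)
        for x, y in examples:
            x_adv = fgsm(x, y, w, b, eps)  # attack the current model
            g = sigmoid(dot(w, x_adv) + b) - y  # dL/dz for logistic loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x_adv)]
            b -= lr * g
    return w, b


def robust_accuracy(data: List[Example], w: List[float], b: float, eps: float) -> float:
    """Fraction of examples still classified correctly after an FGSM attack."""
    correct = 0
    for x, y in data:
        x_adv = fgsm(x, y, w, b, eps)
        pred = 1 if sigmoid(dot(w, x_adv) + b) >= 0.5 else 0
        correct += pred == y
    return correct / len(data)
```

Setting eps to 0 in `robust_accuracy` gives ordinary clean accuracy, which makes it easy to compare clean and attacked performance of the same model.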


K-Nearest Neighbors Density-Based Clustering, Avory C. Bryant Jan 2021

Theses and Dissertations

Traditional density-based clustering approaches rely on a distance-based parameter to define data connectivity and density. However, an appropriate value of this parameter can be difficult to determine as it is highly dependent on the underlying distribution of the data. In particular, distribution parameters affect the scale of inter-group distances (e.g., variance); this dependence leads to a well-known inability to simultaneously detect clusters at varying levels of density. In this work, connectivity and density are defined according to the rank-order induced by the distance metric (i.e., invariant to the expected scale of the distances). Connectivity by k-nearest neighbors and density by …
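
The rank-order idea can be illustrated with a simplified connectivity rule. This sketch is an assumption for illustration, not the thesis's algorithm, and it does not reproduce the thesis's density criterion. Two points are linked when each is among the other's k nearest neighbors, and clusters are the connected components of that mutual-kNN graph. Because only neighbor ranks matter, no distance threshold has to be tuned. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Set

Point = Sequence[float]


def knn_sets(points: List[Point], k: int) -> List[Set[int]]:
    """Index sets of the k nearest neighbors of every point (brute force)."""
    neighbors = []
    for i, p in enumerate(points):
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append({j for _, j in ranked[:k]})
    return neighbors


def mutual_knn_clusters(points: List[Point], k: int) -> List[int]:
    """Label connected components of the mutual k-nearest-neighbor graph."""
    neighbors = knn_sets(points, k)
    labels = [-1] * len(points)
    next_label = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first traversal over mutual-kNN edges only
            i = stack.pop()
            for j in neighbors[i]:
                if i in neighbors[j] and labels[j] == -1:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels
```

In the test data, one group has a spacing of 0.01 and the other a spacing of 10. With k = 2, both still come out as single clusters, because each point's neighbor ranks are the same in the two groups.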


Random Search Plus: A More Effective Random Search For Machine Learning Hyperparameters Optimization, Bohan Li Dec 2020

Masters Theses

Hyperparameter optimization has always been key to improving the performance of machine learning models. There are many methods of hyperparameter optimization; popular ones include grid search, random search, manual search, Bayesian optimization, and population-based optimization. Random search requires less computation than grid search, but at some cost in accuracy. This thesis proposes a more effective random search method, named random search plus, that builds on traditional random search and hyperparameter space separation. The thesis shows empirically that random search plus is more effective than random search. There are some case …
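
The sketch below shows plain random search alongside one possible reading of "hyperparameter space separation". The separation step is an assumption for illustration: it splits one hyperparameter's range into equal sub-intervals and spends an equal share of the trial budget in each. The thesis's actual random search plus procedure may differ, and all function names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Space = Dict[str, Tuple[float, float]]  # name -> (low, high)
Params = Dict[str, float]
Objective = Callable[[Params], float]  # higher is better


def random_search(objective: Objective, space: Space, n_trials: int,
                  seed: int = 0) -> Tuple[Params, float]:
    """Sample every hyperparameter uniformly at random; keep the best trial."""
    rng = random.Random(seed)
    best_params: Optional[Params] = None
    best_score = float("-inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    if best_params is None:
        raise ValueError("n_trials must be at least 1")
    return best_params, best_score


def split_space(space: Space, name: str, parts: int) -> List[Space]:
    """Hypothetical separation step: cut one range into equal sub-intervals."""
    lo, hi = space[name]
    width = (hi - lo) / parts
    return [{**space, name: (lo + i * width, lo + (i + 1) * width)}
            for i in range(parts)]


def separated_random_search(objective: Objective, space: Space, split_on: str,
                            parts: int, n_trials: int,
                            seed: int = 0) -> Tuple[Params, float]:
    """Run random search inside every sub-space and return the overall best."""
    per_part = max(1, n_trials // parts)
    results = [random_search(objective, sub, per_part, seed + i)
               for i, sub in enumerate(split_space(space, split_on, parts))]
    return max(results, key=lambda r: r[1])
```

Splitting guarantees that every sub-interval of the chosen hyperparameter receives some trials, which plain random search only achieves in expectation.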


Achieving Causal Fairness In Machine Learning, Yongkai Wu May 2020

Graduate Theses and Dissertations

Fairness is a social norm and a legal requirement in today's society. Many laws and regulations (e.g., the Equal Credit Opportunity Act of 1974) have been established to prohibit discrimination and enforce fairness on several grounds, such as gender, age, sexual orientation, race, and religion, referred to as sensitive attributes. Nowadays machine learning algorithms are extensively applied to make important decisions in many real-world applications, e.g., employment, admission, and loans. Traditional machine learning algorithms aim to maximize predictive performance, e.g., accuracy. Consequently, certain groups may get unfairly treated when those algorithms are applied for decision-making. Therefore, it is an imperative …


Supervised Machine Learning Models For Fake News Detection, Andrea Lopez, Adelo Vieira, Zafar Ahsan, Farooq Sabib, Shirley Marinho Jun 2019

ICT

Fake news or the distribution of disinformation has become one of the most challenging issues in society. News and information are churned out across online websites and platforms in real-time, with little or no way for the viewing public to determine what is real or manufactured. But an awareness of what we are consuming online is becoming apparent and efforts are underway to explore how we separate fake content from genuine and truthful information. The most challenging part of fake news is determining how to spot it. In technology, there are ways to help us do this. Supervised machine learning …


Machine Learning Pipeline For Exoplanet Classification, George Clayton Sturrock, Brychan Manry, Sohail Rafiqi May 2019

SMU Data Science Review

Planet identification has typically been a task performed exclusively by teams of astronomers and astrophysicists using methods and tools accessible only to those with years of academic education and training. NASA’s Exoplanet Exploration program has introduced modern satellites capable of capturing a vast array of data about celestial objects of interest to assist with researching these objects. The availability of satellite data has opened up the task of planet identification to individuals capable of writing and interpreting machine learning models. In this study, several classification models and datasets are utilized to assign a probability of an observation being an exoplanet. …


Dish: Democracy In State Houses, Nicholas A. Russo Feb 2019

Master's Theses

In our current political climate, state-level legislators have become increasingly important. Due to cuts in funding and a growing focus on the national level, public oversight of these legislators has drastically decreased. This makes it difficult for citizens and activists to understand the relationships and commonalities between legislators. This thesis provides three contributions to address this issue. First, we created a data set containing over 1200 features focused on a legislator’s activity on bills. Second, we created embeddings that represented a legislator’s level of activity and engagement for a given bill using a custom model called Democracy2Vec. Third, we …


Improving VIX Futures Forecasts Using Machine Learning Methods, James Hosker, Slobodan Djurdjevic, Hieu Nguyen, Robert Slater Jan 2019

SMU Data Science Review

The problem of forecasting market volatility is a difficult task for most fund managers. Volatility forecasts are used for risk management, alpha (risk) trading, and the reduction of trading friction. Improving the forecasts of future market volatility assists fund managers in adding or reducing risk in their portfolios as well as in increasing hedges to protect their portfolios in anticipation of a market sell-off event. Our analysis compares three existing financial models that forecast future market volatility using the Chicago Board Options Exchange Volatility Index (VIX) to six machine/deep learning supervised regression methods. This analysis determines which models provide best …


A Machine Learning Recommender Model For Ride Sharing Based On Rider Characteristics And User Threshold Time, Govind Pramod Yatnalkar Jan 2019

Theses, Dissertations and Capstones

In the present age, human life is prospering incredibly due to the 4th Industrial Revolution or The Age of Digitization and Computing. The ubiquitous availability of the Internet and advanced computing systems have resulted in the rapid development of smart cities. From connected devices to live vehicle tracking, technology is taking the field of transportation to a new level. An essential part of the transportation domain in smart cities is Ride Sharing. It is an excellent solution to issues like pollution, traffic, and the rapid consumption of fuel. Even though Ride Sharing has several benefits, the current usage is …


Randomized Algorithms For Preconditioner Selection With Applications To Kernel Regression, Conner Dipaolo Jan 2019

HMC Senior Theses

The task of choosing a preconditioner $M$ to use when solving a linear system $Ax=b$ with iterative methods is often tedious and most methods remain ad-hoc. This thesis presents a randomized algorithm to make this chore less painful through use of randomized algorithms for estimating traces. In particular, we show that the preconditioner stability $\|I - M^{-1}A\|_F$, known to forecast preconditioner quality, can be computed in the time it takes to run a constant number of iterations of conjugate gradients through use of sketching methods. This is in spite of folklore which …
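
The sketch below shows the randomized idea under one simplifying assumption: a Jacobi (diagonal) preconditioner, so that applying M^{-1} is a cheap division. Write E = I - M^{-1}A. Then ||E||_F^2 = tr(E^T E), which equals the expected value of ||E z||^2 over random sign vectors z. Hutchinson's trace estimator therefore needs only products E z = z - M^{-1}(A z): one multiplication by A and one preconditioner solve per sample. This standard estimator is swapped in for illustration and is not necessarily the thesis's sketching method. The function names are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(a: Matrix, x: Vector) -> Vector:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def stability_estimate(a: Matrix, diag: Vector, n_samples: int, seed: int = 0) -> float:
    """Hutchinson estimate of ||I - M^{-1} A||_F for a diagonal M = diag(diag)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(len(a))]  # Rademacher probe
        az = matvec(a, z)  # one product with A
        ez = [zi - yi / di for zi, yi, di in zip(z, az, diag)]  # E z = z - M^{-1}(A z)
        total += sum(v * v for v in ez)  # ||E z||^2, an unbiased sample of tr(E^T E)
    return math.sqrt(total / n_samples)


def stability_exact(a: Matrix, diag: Vector) -> float:
    """Reference value from the explicit matrix E = I - M^{-1} A (small n only)."""
    n = len(a)
    return math.sqrt(sum(
        ((1.0 if i == j else 0.0) - a[i][j] / diag[i]) ** 2
        for i in range(n) for j in range(n)
    ))
```

The exact value is computed here only to check the estimate. In practice, forming M^{-1}A explicitly is exactly what the randomized estimate avoids.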


Quantifying Human Biological Age: A Machine Learning Approach, Syed Ashiqur Rahman Jan 2019

Graduate Theses, Dissertations, and Problem Reports

Quantifying human biological age is an important and difficult challenge. Different biomarkers and numerous approaches have been studied for biological age prediction, each with its advantages and limitations. In this work, we first introduce a new anthropometric measure (called Surface-based Body Shape Index, SBSI) that accounts for both body shape and body size, and evaluate its performance as a predictor of all-cause mortality. We analyzed data from the National Health and Nutrition Examination Survey (NHANES). Based on the analysis, we introduce a new body shape index constructed from four important anthropometric determinants of body shape and body size: body …


Object-Based Supervised Machine Learning Regional-Scale Land-Cover Classification Using High Resolution Remotely Sensed Data, Christopher A. Ramezan Jan 2019

Graduate Theses, Dissertations, and Problem Reports

High spatial resolution (HR) (1 m – 5 m) remotely sensed data in conjunction with supervised machine learning classification are commonly used to construct land-cover classifications. Despite the increasing availability of HR data, most studies investigating HR remotely sensed data and associated classification methods employ relatively small study areas. This work therefore drew on a 2,609 km², regional-scale study in northeastern West Virginia, USA, to investigate a number of core aspects of HR land-cover supervised classification using machine learning. Issues explored include training sample selection, cross-validation parameter tuning, the choice of machine learning algorithm, training sample set size, and feature selection. A …


Optimization Of Fantasy Basketball Lineups Via Machine Learning, James Earl Jan 2019

Senior Honors Theses

Machine learning is providing a way to glean never-before-known insights from the data that gets recorded every day. This paper examines the application of machine learning to the novel field of Daily Fantasy Basketball. The particularities of the fantasy basketball ruleset and playstyle are discussed, and then the results of a data science case study are reviewed. The data set consists of player performance statistics as well as Fantasy Points, implied team total, DvP, and player status. The end goal is to evaluate how accurately the computer can predict a player’s fantasy performance based on a chosen feature …