Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Data Science

2022

Machine Learning

Articles 1 - 30 of 31

Full-Text Articles in Physical Sciences and Mathematics

Integrated Machine Learning And Optimization Approaches, Dogacan Yilmaz Dec 2022

Dissertations

This dissertation focuses on the integration of machine learning and optimization. Specifically, novel machine learning-based frameworks are proposed to help solve a broad range of well-known operations research problems to reduce the solution times. The first study presents a bidirectional Long Short-Term Memory framework to learn optimal solutions to sequential decision-making problems. Computational results show that the framework significantly reduces the solution time of benchmark capacitated lot-sizing problems without much loss in feasibility and optimality. Also, models trained using shorter planning horizons can successfully predict the optimal solution of the instances with longer planning horizons. For the hardest data set, …


Application Of Distributed Fiber-Optic Sensing For Pressure Predictions And Multiphase Flow Characterization, Gerald Kelechi Ekechukwu Dec 2022

LSU Doctoral Dissertations

In the oil and gas industry, distributed fiber-optic sensing (DFOS) has the potential to revolutionize well and reservoir surveillance applications. Fiber-optic sensors are becoming increasingly common because of their chemical passivity and freedom from magnetic interference, the possibility of flexible installations that could be behind the casing, on the tubing, or run on wireline, and the potential for densely distributed measurements along the entire length of the fiber. The main objectives of my research are to develop and demonstrate novel signal processing and machine learning computational techniques and workflows on DFOS data for a variety of …


Investigating Applications Of Deep Learning For Diagnosis Of Post Traumatic Elbow Disease, Hugh James Dec 2022

McKelvey School of Engineering Theses & Dissertations

Traumatic events such as dislocation, breaks, and arthritis of musculoskeletal joints can cause the development of post-traumatic joint contracture (PTJC). Clinically, noninvasive techniques such as Magnetic Resonance Imaging (MRI) scans are used to analyze the disease. Such procedures require a patient to remain still for long periods of time and can be expensive. Additionally, years of practice and experience are required for clinicians to recognize the diseased anterior capsule region and make an accurate diagnosis. Manual tracing of the anterior capsule is done to help with diagnosis but is subjective and time-consuming. As a result, there is …


Artificial Intelligence In The Medical Field: Medical Review Sentiment Analysis, Nicholas Podlesak Dec 2022

Honors Capstones

In this research project, the ability of natural language processing techniques to accurately classify medical text was measured to reinforce the relevance of artificial intelligence in the medical field. Sentiment analyses (analyses to determine whether the text was positive or negative) were performed on the prescription drug reviews in an open-source dataset using four different models: a lexical model, a neural network, a support vector machine, and a logistic regression model. Each model’s effectiveness was gauged by its ability to correctly classify unlabeled drug reviews (i.e., a percentage representing accuracy). The machine learning models were able to accurately classify the text, while the lexical …


Enhancing The Performance Of The MTCNN For The Classification Of Cancer Pathology Reports: From Data Annotation To Model Deployment, Kevin De Angeli Dec 2022

Doctoral Dissertations

Information contained in electronic health records (EHR), combined with the latest advances in machine learning (ML), has the potential to revolutionize the medical sciences. In particular, information contained in cancer pathology reports is essential to investigate cancer trends across the country. Unfortunately, large parts of the information in EHRs are stored as unstructured free text, which limits their usability and research potential. To overcome this accessibility barrier, cancer registries depend on expert personnel who read, interpret, and extract relevant information. Naturally, as the number of stored pathology reports increases every day, depending on human experts presents scalability challenges. Recently, …


Identity Term Sampling For Measuring Gender Bias In Training Data, Nasim Sobhani, Sarah Jane Delany Dec 2022

Conference Papers

Predictions from machine learning models can reflect biases in the data on which they are trained. Gender bias has been identified in natural language processing systems such as those used for recruitment. Approaches to mitigating gender bias in training data typically need to isolate the effect of gender on the output. While it is possible to isolate and identify gender for some types of training data, e.g. CVs in recruitment, for most textual corpora there is no obvious gender label. This paper proposes a general approach to measure …


Performance Enhancement Of Hyperspectral Semantic Segmentation Leveraging Ensemble Networks, Nicholas Soucy Dec 2022

Electronic Theses and Dissertations

Hyperspectral image (HSI) semantic segmentation is a growing field within computer vision, machine learning, and forestry. Due to the separate nature of these communities, research applying deep learning techniques to ground-type semantic segmentation needs improvement, along with working to bring the research and expectations of these three communities together. Semantic segmentation consists of classifying individual pixels within the image based on the features present. Many issues need to be resolved in HSI semantic segmentation including data preprocessing, feature reduction, semantic segmentation techniques, and adversarial training. In this thesis, we tackle these challenges by employing ensemble methods for HSI semantic segmentation. …


A New Kind Of Data Science: The Need For Ethical Analytics, Jonathan Boardman Nov 2022

Published and Grey Literature from PhD Candidates

Ethics can no longer be regarded as an add-on in data science and analytics. This paper argues for the necessity of formalizing a new, practically-oriented sub-discipline of AI ethics by outlining the needs, highlighting shortcomings in current approaches, and providing a framework for ethical analytics, which is concerned with the study of the ethical issues surrounding the development, deployment, and/or dissemination of ML/AI systems and data science research, as well as the development of tools and procedures to mitigate ethical harms. While data science and machine learning are primarily concerned with data from start to finish, ethical analytics is concerned …


Classification Of Pixel Tracks To Improve Track Reconstruction From Proton-Proton Collisions, Kebur Fantahun, Jobin Joseph, Halle Purdom, Nibhrat Lohia Sep 2022

SMU Data Science Review

In this paper, machine learning techniques are used to reconstruct particle collision pathways. CERN (Conseil européen pour la recherche nucléaire) uses a massive underground particle collider, called the Large Hadron Collider or LHC, to produce particle collisions at extremely high speeds. There are several layers of detectors in the collider that track the pathways of particles as they collide. The data produced from collisions contain a large amount of background noise, i.e., decays from known particle collisions produce fake signals. In particular, in the first layer of the detector, the pixel tracker, there is an overwhelming amount of background noise that …


Mathematical Models Yield Insights Into CNNs: Applications In Natural Image Restoration And Population Genetics, Ryan Cecil Aug 2022

Electronic Theses and Dissertations

Due to a rise in computational power, machine learning (ML) methods have become the state of the art in a variety of fields. Known to be black-box approaches, however, these methods are oftentimes not well understood. In this work, we utilize our understanding of model-based approaches to derive insights into Convolutional Neural Networks (CNNs). In the field of Natural Image Restoration, we focus on the image denoising problem. Recent work has demonstrated the potential of mathematically motivated CNN architectures that learn both 'geometric' and nonlinear higher order features and corresponding regularizers. We extend this work by showing that not only can geometric features …


Better Understanding Genomic Architecture With The Use Of Applied Statistics And Explainable Artificial Intelligence, Jonathon C. Romero Aug 2022

Doctoral Dissertations

With the continuous improvements in biological data collection, new techniques are needed to better understand the complex relationships in genomic and other biological data sets. Explainable Artificial Intelligence (X-AI) techniques like Iterative Random Forest (iRF) excel at finding interactions within data, such as genomic epistasis. Here, the introduction of new methods to mine for these complex interactions is shown in a variety of scenarios. The application of iRF as a method for Genomic Wide Epistasis Studies shows that the method is robust in finding interacting sets of features in synthetic data, without requiring the exponentially increasing computation time of many …


Predicting Order Status Using XGBoost, Kegan J. Penovich Aug 2022

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Invista, a Koch subsidiary, is a multinational producer of fibers, resins, and intermediates, particularly nylon. Keeping the company operating required it to take over 1.5 million orders over the course of - years, less than a third of which arrived on time. Orders arriving other than when expected can cause many problems for any company. While arriving late is a clear problem, it is also troublesome for them to arrive early. In the face of this, it becomes important to be able to tell a priori whether an order will arrive on time.

To address this problem, we made use of …


Deep Learning For Detecting Trees In The Urban Environment From LIDAR, Julian R. Rice Aug 2022

Master's Theses

Cataloguing and classifying trees in the urban environment is a crucial step in urban and environmental planning. However, manual collection and maintenance of this data is expensive and time-consuming. Algorithmic approaches that rely on remote sensing data have been developed for tree detection in forests, though they generally struggle in the more varied urban environment. This work proposes a novel method for the detection of trees in the urban environment that applies deep learning to remote sensing data. Specifically, we train a PointNet-based neural network to predict tree locations directly from LIDAR data augmented with multi-spectral imaging. We compare this …


State-Based Biological Communication, Nathan Clement Aug 2022

All Theses

Allostery (1) is the process through which proteins self-regulate in response to various stimuli. Allosteric interactions occur between nonadjacent spatially distant residues (1), and they are exhibited through the correlated motions (2) and momenta of participating residues. The location of allosteric sites in proteins can be determined experimentally but computational methods to predict the location of allosteric sites are being developed as well (2-4, 10). Experimental and computational methodologies for locating allosteric sites can be used to design specific targeted drug delivery (5-6, 19), but these methods have not yet …


Computational Models To Detect Radiation In Urban Environments: An Application Of Signal Processing Techniques And Neural Networks To Radiation Data Analysis, Jose Nicolas Gachancipa Jul 2022

Beyond: Undergraduate Research Journal

Radioactive sources, such as uranium-235, are nuclides that emit ionizing radiation, and which can be used to build nuclear weapons. In public areas, the presence of a radioactive nuclide can present a risk to the population, and therefore, it is imperative that threats are identified by radiological search and response teams in a timely and effective manner. In urban environments, such as densely populated cities, radioactive sources may be more difficult to detect, since background radiation produced by surrounding objects and structures (e.g., buildings, cars) can hinder the effective detection of unnatural radioactive material. This article presents a computational model …


Machine Learning With Big Data For Electrical Load Forecasting, Alexandra L'Heureux Jun 2022

Electronic Thesis and Dissertation Repository

Today, the amount of data collected is exploding at an unprecedented rate due to developments in Web technologies, social media, mobile and sensing devices and the internet of things (IoT). Data is gathered in every aspect of our lives: from financial information to smart home devices and everything in between. The driving force behind these extensive data collections is the promise of increased knowledge. Therefore, the potential of Big Data relies on our ability to extract value from these massive data sets. Machine learning is central to this quest because of its ability to learn from data and provide data-driven …


A Machine Learning Approach To Revenue Generation Within The Professional Hair Care Industry, Alexander K. Sepenu, Linda Eliasen Jun 2022

SMU Data Science Review

The cosmetic and beauty industry continues to grow and evolve to satisfy its patrons. In the United States, the industry is heavily science-driven, innovative, and fast-paced, suggesting that to remain productive and profitable, companies must seek smart alternatives to their current modus operandi or risk losing out on this multi-billion-dollar industry to fierce competition. In this paper, the authors seek to utilize machine learning models such as clustering and regression to improve the efficiency of current sales and customer segmentation models to help HairCo (pseudonym for confidentiality), a professional hair products manufacturer, strategize their marketing and sales efforts for revenue …


Analysis Of The Electric Power Outage Data And Prediction Of Electric Power Outage For Major Metropolitan Areas In Texas Using Machine Learning And Time Series Methods, Renfeng Wang, Venkata Leela 'Mg' Vanga, Zachary B. Zaiken, Jonathan Bennett Jun 2022

SMU Data Science Review

With growing energy usage, power outages affect millions of households. This case study focuses on gathering power outage historical data, modifying the data to attach weather attributes, and gathering ERCOT energy market conditions for the Dallas-Fort Worth and Houston metropolitan areas of Texas. The transformed data is then analyzed using machine learning algorithms including, but not limited to, Regression, Random Forests and XGBoost to consider current weather and ERCOT features and predict power outage percentage for locations. Time series models and serially correlated models, including Autoregression and Vector Autoregression, are also trained on the transformed data. This study also focuses on …


Machine Learning And The Network Analysis Of Ethereum Trading Data, Santosh Sivakumar Jun 2022

Dartmouth College Undergraduate Theses

Since their conception, cryptocurrencies have captured the public interest, motivating a growing body of research aimed at exploring blockchain-based transactions. This said, little work has been done to draw conclusions from transaction patterns, particularly in the realm of predicting cryptocurrency price movements. Moreover, research in the cryptocurrency sphere largely focuses on Bitcoin, paying little attention to Ethereum, Bitcoin's second-in-line with respect to market capitalization. In this paper, we construct hourly networks for a year of Ethereum transactions, using computed graph metrics as features in a series of machine learning models. We find that regression-based approaches to predicting Ether prices/price deltas …


Legislative Language For Success, Sanjana Gundala Jun 2022

Master's Theses

Legislative committee meetings are an integral part of the lawmaking process for local and state bills. The testimony presented during these meetings is a large factor in the outcome of the proposed bill. This research uses Natural Language Processing and Machine Learning techniques to analyze testimonies from California Legislative committee meetings from 2015-2016 in order to identify what aspects of a testimony make it successful. A testimony is considered successful if the alignment of the testimony matches the bill outcome (alignment is "For" and the bill passes or alignment is "Against" and the bill fails). The process of finding what …
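
The success criterion stated in this abstract can be sketched in a few lines of Python. This is only an illustration of the rule as worded above; the `Testimony` class and its field names are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Testimony:
    """Hypothetical record for one testimony (names are assumptions)."""
    alignment: str     # "For" or "Against", as in the abstract
    bill_passed: bool  # outcome of the bill the testimony addressed


def is_successful(testimony: Testimony) -> bool:
    """Return True when the testimony's alignment matches the bill outcome."""
    if testimony.alignment == "For":
        return testimony.bill_passed
    if testimony.alignment == "Against":
        return not testimony.bill_passed
    # Any other alignment value is outside the rule quoted in the abstract.
    raise ValueError(f"unexpected alignment: {testimony.alignment!r}")
```

Under this reading, a "For" testimony on a failed bill and an "Against" testimony on a passed bill are both labeled unsuccessful.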


Building An Artificial Intelligence Framework For Hypertension Diagnosis: A Use Case Of The Problem List Curation, Ketemwabi Yves Shamavu May 2022

Theses & Dissertations

Hypertension is the world's leading factor in cardiovascular disease. Forty-seven percent or close to one in two Americans aged 18 and older are affected. It predicts approximately a thousand deaths per day. Based on recent statistics from the Centers for Disease Control and Prevention, one in three patients with hypertension does not know they are hypertensive. Seventy-five percent of hypertensive patients have uncontrolled hypertension - meaning that they are not treated to target. While there is extensive literature on hypertension diagnosis and management, there is an apparent gap in understanding and acknowledging that a person is hypertensive. Moreover, blood pressure …


Attempting To Predict The Unpredictable: March Madness, Coleton Kanzmeier May 2022

Theses/Capstones/Creative Projects

Each year, millions upon millions of individuals fill out at least one, if not hundreds, of March Madness brackets. People test their luck every year, whether for fun, with friends or family, or even to win some money. Some people rely on their basketball knowledge, whereas others know it is called March Madness for a reason and take a shot in the dark. Others have even tried using statistics to give them an edge. I intend to follow a similar approach, using statistics to my advantage. The end goal is to predict this year’s (2022) March Madness bracket. To achieve …


The Bracelet: An American Sign Language (ASL) Interpreting Wearable Device, Samuel Aba, Ahmadre Darrisaw, Pei Lin, Thomas Leonard May 2022

Chancellor’s Honors Program Projects

No abstract provided.


Quadratic Neural Network Architecture As Evaluated Relative To Conventional Neural Network Architecture, Reid Taylor Apr 2022

Senior Theses

Current work in the field of deep learning and neural networks revolves around several variations of the same mathematical model for associative learning. These variations, while significant and exceptionally applicable in the real world, fail to push the limits of modern computational prowess. This research does just that: by leveraging high order tensors in place of 2nd order tensors, quadratic neural networks can be developed, allowing for substantially more complex machine learning models that permit self-interactions of collected and analyzed data. This research shows the theorization and development of the mathematical model necessary for such an idea to …


Netsec: Real-Time And Scalable Malware Traffic Detection Within IoT Networks, Ethan Weitkamp, Yusuke Satani, Peilong Li, Jingwen Wang Jan 2022

Summer Scholarship, Creative Arts and Research Projects (SCARP)

Detecting malicious network traffic in real time has become a crucial requirement at smart communities for elderly care and medical facilities with the prevalence of Internet-of-things (IoT) devices. Existing machine learning based solutions for network traffic malware detection often fail to scale with the exponential increase of IoT devices at the facility and to detect malicious traffic with desirable low latency. In this paper we seek to fill the gap by designing a scalable end-to-end network traffic analyzing system that permits real-time malware detection. By leveraging distributed systems such as Apache Kafka and Apache Spark, the system has demonstrated scalable …


A Machine Learning Algorithm Improves Surface Freeze-Thaw Classification, Fredrick Bunt Jan 2022

Graduate Student Theses, Dissertations, & Professional Papers

The frozen or thawed state of the land surface is an important factor affecting a wide range of natural processes such as surface water movement, the carbon cycle, and ecosystem development. It is also important for human endeavors such as permafrost engineering and agricultural planning. This makes having an accurate record important. The Freeze-Thaw (FT) Earth System Data Record (FT-ESDR) is a global, daily product that strives to be a reliable record of the FT ground state. In its current form, the FT-ESDR uses annual regression analysis of reanalysis surface air temperatures (SAT) and brightness temperatures (Tb) at each grid …


Integrated Gradients Is A Nonlinear Generalization Of The Industry Standard Approach To Variable Attribution For Credit Risk Models, Jonathan Boardman, Md Shafiul Alam, Xiao Huang, Ying Xie Jan 2022

Published and Grey Literature from PhD Candidates

In modern society, epistemic uncertainty limits trust in financial relationships, necessitating transparency and accountability mechanisms for both consumers and lenders. One upshot is that credit risk assessments must be explainable to the consumer. In the United States regulatory milieu, this entails both the identification of key factors in a decision and the provision of consistent actions that would improve standing. The traditionally accepted approach to explainable credit risk modeling involves generating scores with Generalized Linear Models (GLMs) - usually logistic regression, calculating the contribution of each predictor to the total points lost from the theoretical maximum, and generating reason codes …


Hydrocarbon Pay Zone Prediction Using AI Neural Network Modeling, Darren D. Guedon Jan 2022

Graduate Theses, Dissertations, and Problem Reports

This paper captures the ability of AI neural network technology to analyze petrophysical datasets for pattern recognition and accurate prediction of the pay zone of a vertical well from the Santa Fe field in Kansas.

During this project, data from 10 completed wells in the Santa Fe field were gathered, resulting in a dataset with 25,580 records, ten predictors (log data), and a single binary output (Yes or No) to identify the availability of hydrocarbon over a half-foot depth segment in the well. Several models composed of different predictor combinations were also tested to determine how impactful some logs …


Batch Normalization Preconditioning For Neural Network Training, Susanna Luisa Gertrude Lange Jan 2022

Theses and Dissertations--Mathematics

Batch normalization (BN) is a popular and ubiquitous method in deep learning that has been shown to decrease training time and improve generalization performance of neural networks. Despite its success, BN is not theoretically well understood. It is not suitable for use with very small mini-batch sizes or online learning. In this work, we propose a new method called Batch Normalization Preconditioning (BNP). Instead of applying normalization explicitly through a batch normalization layer as is done in BN, BNP applies normalization by conditioning the parameter gradients directly during training. This is designed to improve the Hessian matrix of the loss …


Predicting Outcomes Of El Clásico Using Random Forests And Extreme Gradient Boosting, Emanuel Jarquin Jan 2022

CMC Senior Theses

In the modern era, sports betting is becoming increasingly popular. This is especially true in the realm of soccer (or ‘football’ as it is known outside the United States). As a result, the concept of attempting to predict the outcomes of soccer matches using machine learning has garnered much attention in recent years. In this thesis, I utilize well-known machine learning techniques to predict the outcomes of El Clásico matchups and compare the predictive performance of these techniques. The predictive methods employed for this thesis are random forests using the party package in R and extreme gradient boosting using the …