Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Data Science

Theses/Dissertations

2020

Articles 1 - 30 of 79

Full-Text Articles in Physical Sciences and Mathematics

Distributed Load Testing By Modeling And Simulating User Behavior, Chester Ira Parrott Dec 2020

LSU Doctoral Dissertations

Modern human-machine systems such as microservices rely upon agile engineering practices, which require changes to be tested and released more frequently than in classically engineered systems. A critical step in testing such systems is the generation of realistic workloads, or load testing. Generated workload emulates the expected behaviors of users and machines within a system under test in order to find potentially unknown failure states. Typical testing tools rely on static testing artifacts to generate realistic workload conditions. Such artifacts can be cumbersome and costly to maintain; however, even model-based alternatives can prevent adaptation to changes in a system …
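The abstract is cut off before it describes the behavior model, but one common way to emulate user behavior in load testing is a Markov chain over user actions. The minimal Python sketch below assumes that approach; the action names and probabilities in `TRANSITIONS` are hypothetical and are not taken from the dissertation.

```python
import random

# Hypothetical transition probabilities between user actions (not from the dissertation).
# In practice these would be estimated from production access logs.
TRANSITIONS: dict[str, dict[str, float]] = {
    "home": {"search": 0.6, "product": 0.3, "exit": 0.1},
    "search": {"product": 0.7, "home": 0.2, "exit": 0.1},
    "product": {"cart": 0.4, "search": 0.4, "exit": 0.2},
    "cart": {"checkout": 0.5, "product": 0.3, "exit": 0.2},
    "checkout": {"exit": 1.0},
}


def simulate_session(rng: random.Random, start: str = "home", max_steps: int = 50) -> list[str]:
    """Walk the Markov chain from `start` until "exit" or until `max_steps` actions."""
    state = start
    path = [state]
    while state != "exit" and len(path) < max_steps:
        options = TRANSITIONS[state]
        # Draw the next action in proportion to its transition probability.
        state = rng.choices(list(options), weights=list(options.values()))[0]
        path.append(state)
    return path


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so generated workloads are reproducible
    for _ in range(3):
        print(" -> ".join(simulate_session(rng)))
```

A load generator could replay each simulated session as a sequence of requests; re-fitting the table from fresh logs is one way such a model might follow changes in the system under test.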


Bayesian Semi-Supervised Keyphrase Extraction And Jackknife Empirical Likelihood For Assessing Heterogeneity In Meta-Analysis, Guanshen Wang Dec 2020

Statistical Science Theses and Dissertations

This dissertation investigates: (1) A Bayesian Semi-supervised Approach to Keyphrase Extraction with Only Positive and Unlabeled Data, (2) Jackknife Empirical Likelihood Confidence Intervals for Assessing Heterogeneity in Meta-analysis of Rare Binary Events.

In the big data era, people are blessed with a huge amount of information. However, the availability of information may also pose great challenges. One big challenge is how to extract useful yet succinct information in an automated fashion. As one of the first few efforts, keyphrase extraction methods summarize an article by identifying a list of keyphrases. Many existing keyphrase extraction methods focus on the unsupervised setting, …


Analysis Of GitHub Pull Requests, Canon Ellis Dec 2020

Computer Science and Engineering Theses and Dissertations

The popularity of the software repository site GitHub has driven wider use of the pull-based development model. An essential part of pull-based development is the creation of pull requests. Pull requests often must be reviewed by an individual before they are approved and merged into the master branch of a software repository. The review process can be time-consuming and can cause a relatively large amount of lost development time. This paper examines thousands of pull requests to identify the most valuable pull request metadata. We then introduce metrics for comparing the metadata of pull requests to understand …


Designing Surveys On Youth Immigration Reform: Lessons From The 2016 CCES Anomaly, Saige Calkins Dec 2020

Masters Theses

Even with the clear advantages of internet-based survey research, there is still some uncertainty about which survey methods are most conducive to an online platform. Most of the survey methods literature, whether focusing on online, telephone, or in-person formats, tends to observe little to no difference in results across survey modes. Despite this, little research has focused on the interaction effect between survey formatting, in terms of design and framing, and public opinion on social issues, specifically child immigration policies, a recent topic of popular debate. This paper examines an anomalous result found within the 2016 …


Machine Learning Model Selection For Predicting Global Bathymetry, Nicholas P. Moran Dec 2020

University of New Orleans Theses and Dissertations

This work is concerned with the viability of machine learning (ML) for training models that predict global bathymetry, and with whether there is a best-fit model for predicting that bathymetry. The goal is to investigate whether ML can be used in future prediction models and to experiment with multiple trained models to determine an optimal selection. Ocean features were aggregated from a set of external studies and placed into two-minute spatial grids representing the earth's oceans. A set of regression models, classification models, and a novel classification model were then fit to this data and …
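The abstract says ocean features were placed into two-minute spatial grids. Assuming "two minute" means two arc-minutes on a regular latitude/longitude grid (an assumption; the thesis's exact gridding is not described here), a point can be assigned to its cell as in this Python sketch.

```python
import math

CELLS_PER_DEGREE = 30  # 60 arc-minutes per degree / 2 arc-minutes per cell
N_ROWS = 180 * CELLS_PER_DEGREE  # 5400 latitude bands
N_COLS = 360 * CELLS_PER_DEGREE  # 10800 longitude bands


def grid_cell(lat: float, lon: float) -> tuple[int, int]:
    """Map (lat, lon) in degrees to a (row, col) index on a global 2-arc-minute grid.

    Row 0 begins at latitude -90; column 0 begins at longitude -180.
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    lon = ((lon + 180.0) % 360.0) - 180.0  # wrap longitude into [-180, 180)
    row = math.floor((lat + 90.0) * CELLS_PER_DEGREE)
    col = math.floor((lon + 180.0) * CELLS_PER_DEGREE)
    # Clamp so lat = 90 and floating-point edge cases stay inside the grid.
    return min(row, N_ROWS - 1), min(col, N_COLS - 1)
```

Features from different source studies that land in the same cell could then be combined (for example, averaged) into a single training row per cell.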


Automatically Classifying Non-Functional Requirements With Feature Extraction And Supervised Machine Learning Techniques, Mahtab Ezzatikarami Dec 2020

Electronic Thesis and Dissertation Repository

Abstract. Context and Motivation: Non-functional requirements (NFRs) of a system need to be classified into different types such as usability, performance, etc. This would enable stakeholders to ensure the completeness of their work by extracting specific NFRs related to their expertise. Question/Problem: Because of the size and complexity of requirement specification documents, the manual classification of NFRs is time-consuming, labour-intensive, and error-prone. We thus need an automated solution that can provide a highly accurate and efficient categorization of NFRs. Principal ideas/results: In this investigation, using natural language processing and supervised machine learning (SML) techniques, we investigate with feature extraction techniques …


Classifying Imbalanced Financial Fraud Data Utilizing Enhanced Random Forest Algorithm, Charles Gardner Dec 2020

Master of Science in Computer Science Theses

Imbalanced datasets pose a distinct challenge for machine learning, requiring specialized approaches to correctly classify the minority class. Financial fraud detection involves highly imbalanced datasets, with class ratios as extreme as 0.01% fraudulent to 99.99% regular transactions. In financial fraud detection it is essential to identify all frauds, even if the precision of some classifications is low. I developed a random forest assembly that separates fraudulent transactions into tiers of precision. With this approach, 96% of fraudulent transactions are identified, an 8% increase in recall compared to standard approaches. 59% of fraud classifications' precision increases by 10% …
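To see why recall matters more than raw accuracy at this level of imbalance, the Python sketch below computes accuracy, recall, and precision from scratch on synthetic labels. It does not reproduce the thesis's tiered random forest; it only illustrates the metrics the abstract reports.

```python
def confusion_counts(y_true: list[int], y_pred: list[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating label 1 as fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": tp / (tp + fn) if tp + fn else 0.0,  # share of frauds caught
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # share of fraud flags that are correct
    }


if __name__ == "__main__":
    # Synthetic labels at the 0.01% fraud rate: 1 fraud in 10,000 transactions.
    y_true = [1] + [0] * 9_999
    always_legit = [0] * 10_000  # a "classifier" that never flags fraud
    print(summarize(y_true, always_legit))
```

The trivial classifier scores 99.99% accuracy while its recall is zero, which is why fraud work typically optimizes recall and then manages the resulting precision.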


Nonparametric Bayesian Deep Learning For Scientific Data Analysis, Devanshu Agrawal Dec 2020

Doctoral Dissertations

Deep learning (DL) has emerged as the leading paradigm for predictive modeling in a variety of domains, especially those involving large volumes of high-dimensional spatio-temporal data such as images and text. With the rise of big data in scientific and engineering problems, there is now considerable interest in the research and development of DL for scientific applications. The scientific domain, however, poses unique challenges for DL, including special emphasis on interpretability and robustness. In particular, a priority of the Department of Energy (DOE) is the research and development of probabilistic ML methods that are robust to overfitting and offer reliable …


Exploration Of Mid To Late Paleozoic Tectonics Along The Cincinnati Arch Using GIS And Python To Automate Geologic Data Extraction From Disparate Sources, Kenneth Steven Boling Dec 2020

Doctoral Dissertations

Structure contour maps are one of the most common methods of visualizing geologic horizons as three-dimensional surfaces. In addition to their practical applications in the oil and gas and mining industries, these maps can be used to evaluate the relationships of different geologic units in order to unravel the tectonic history of an area. The construction of high-resolution regional structure contour maps of a particular geologic horizon requires a significant volume of data that must be compiled from all available surface and subsurface sources. Processing these data using conventional methods and even basic GIS tools can be tedious and very …


Unifying Chemistry And Machine Learning For The Study Of Noncovalent Interactions, Jacob A. Townsend Dec 2020

Doctoral Dissertations

Gas separations are in great demand for carbon emission reduction, natural gas purification, oxygen isolation, and much more. Many of these separations rely on cost-prohibitive methods such as cryogenic distillation or strong-binding solvents. As a result, novel materials are being developed to subvert the energetic expense of gas separation processes. These studies focus on improving the performance of alternative materials, including (but not limited to) metal-organic frameworks, covalent organic frameworks, dense polymeric membranes, porous polymers, and ionic liquids.

In this work, the atomistic effects of functional units are explored for gas separations processes using electronic structure theory and machine learning. …


Data And Assessment Management In Collegiate Recreation, Jeana Carow Dec 2020

Graduate Theses and Dissertations

Collegiate recreation programs and centers typically provide traditional programming space in addition to a range of physical activity spaces and resources, as a valuable part of the student experience. External pressure to identify and communicate departmental value and impact on the campus community has led collegiate recreation departments to use data to communicate the effectiveness and impact of their work. The purpose of the study was to identify the data collection and assessment management practices of collegiate recreation departments, particularly focusing on the organization of data and assessment strategies as well as data collection, storage, reporting, analyzing, and …


Exploring Information For Quantum Machine Learning Models, Michael Telahun Dec 2020

Electronic Theses and Dissertations

Quantum computing performs calculations by using physical phenomena and principles of quantum mechanics to solve problems. This form of computation has been shown theoretically to provide speedups for some problems relative to modern-day classical processing. The use of quantum phenomena in machine learning has accordingly become an area of much anticipation. The work here develops models in two software frameworks, TensorFlow Quantum (TFQ) and PennyLane, for machine learning purposes. Both models use an information encoding technique, amplitude encoding, to prepare states in a quantum learning model. This thesis explores both the capacity for amplitude encoding to provide enriched state …
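Amplitude encoding stores a classical vector of length up to 2^n in the amplitudes of an n-qubit state, so the vector must be zero-padded to a power-of-two length and scaled to unit L2 norm. The plain-Python sketch below shows only that classical preprocessing step and is not the thesis's code; PennyLane's `AmplitudeEmbedding` template exposes `pad_with` and `normalize` options for the same two steps.

```python
import math


def amplitude_encode(x: list[float]) -> tuple[list[float], int]:
    """Pad x with zeros to length 2**n and scale it to unit L2 norm.

    Returns (amplitudes, n_qubits). The squared amplitudes sum to 1,
    as required for the amplitudes of a valid quantum state.
    """
    if not x:
        raise ValueError("cannot encode an empty vector")
    n_qubits = max(1, math.ceil(math.log2(len(x))))
    padded = [float(v) for v in x] + [0.0] * (2**n_qubits - len(x))
    norm = math.sqrt(sum(v * v for v in padded))
    if norm == 0.0:
        raise ValueError("cannot encode the zero vector")
    return [v / norm for v in padded], n_qubits


if __name__ == "__main__":
    amps, n = amplitude_encode([3.0, 4.0, 0.0])
    print(n, amps)
```

Three features need two qubits (four amplitudes), so `[3, 4, 0]` becomes `[0.6, 0.8, 0.0, 0.0]` after padding and normalization. Note that normalization discards the vector's overall magnitude, which is one reason the choice of encoding affects what a model can learn.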


Hierarchical Aggregation Of Multidimensional Data For Efficient Data Mining, Safaa Khalil Alwajidi Dec 2020

Dissertations

Big data analysis is essential for many smart applications in areas such as connected healthcare, intelligent transportation, human activity recognition, and environment and climate change monitoring. Traditional data mining algorithms do not scale well to big data because of the enormous number of data points and the velocity of their generation. Mining and learning from big data require time- and memory-efficient techniques, albeit at the cost of a possible loss in accuracy. This research focuses on mining big data using aggregated data as input. We developed a data structure for aggregating data at multiple resolutions. …
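The abstract does not describe the data structure itself. As a hedged illustration of the general idea of aggregating at multiple resolutions, the Python sketch below builds a pyramid of mergeable summaries (count, sum, minimum, maximum) over a one-dimensional series, where each level halves the resolution of the level below. `Summary`, `merge`, and `build_pyramid` are hypothetical names, not the dissertation's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Mergeable summary of a contiguous run of raw values."""
    count: int
    total: float
    lo: float
    hi: float

    @property
    def mean(self) -> float:
        return self.total / self.count


def merge(a: Summary, b: Summary) -> Summary:
    # Counts and sums add; min and max combine, so these four statistics
    # stay exact for the raw points a merged node covers.
    return Summary(a.count + b.count, a.total + b.total, min(a.lo, b.lo), max(a.hi, b.hi))


def build_pyramid(values: list[float]) -> list[list[Summary]]:
    """Level 0 summarizes each raw value; each higher level merges adjacent pairs."""
    level = [Summary(1, v, v, v) for v in values]
    pyramid = [level]
    while len(level) > 1:
        level = [
            merge(level[i], level[i + 1]) if i + 1 < len(level) else level[i]
            for i in range(0, len(level), 2)
        ]
        pyramid.append(level)
    return pyramid


if __name__ == "__main__":
    pyramid = build_pyramid([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
    for depth, level in enumerate(pyramid):
        print(depth, [round(s.mean, 2) for s in level])
```

A mining algorithm could run on a coarse level for speed and drop to finer levels only where accuracy matters, which is the time-versus-accuracy trade-off the abstract describes.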


Incorporating Shear Resistance Into Debris Flow Triggering Model Statistics, Noah J. Lyman Dec 2020

Master's Theses

Several regions of the Western United States use statistical binary classification models to predict and manage the probability of debris flow initiation after wildfires. As wildfires and high-intensity rainfall events have become more frequent, so has development in the steep, mountainous terrain where these events arise. This intersection brings an increasing need to derive improved results from existing models, or to develop new models, to reduce the economic and human impacts of debris flows. Any development or change to these models could also theoretically increase the ease of collection, processing, and …


Computational Behavioral Analytics: Estimating Psychological Traits In Foreign Languages, Kristopher Wayne Reese Dec 2020

Electronic Theses and Dissertations

The proliferation of technology into the workplace has increased the threat of loss of intellectual property (IP) and of classified and proprietary information for companies, governments, and academics. This can cause economic damage to the creators of new IP, to companies, and to whole economies. This technology proliferation has also assisted terror groups and lone wolf actors in pushing their message to a larger audience or in finding similar tribal groups that share common, sometimes flawed, beliefs across various social media platforms. These challenges have prompted numerous studies in psycholinguistics, as well as commercial tools, that aim to help identify potential threats …


Carbon Metabolism In Cave Subaerial Biofilms, Victoria E. Frazier Dec 2020

Masters Theses

Subaerial biofilms (SABs) grow at the interface between the atmosphere and rock surfaces in terrestrial and subterranean environments around the world. Multi-colored SABs colonizing relatively dry and nutrient-limited cave surfaces are known to contain microbes putatively involved in chemolithoautotrophic processes using inorganic carbon like carbon dioxide (CO2) or methane (CH4). However, the importance of CO2 and CH4 to SAB biomass production has not been quantified, the environmental conditions influencing biomass production and diversity have not been thoroughly evaluated, and stable carbon and nitrogen isotope compositions have yet to be determined from epigenic cave SABs. …


Development Of Reduced Order Models Using Reservoir Simulation And Physics Informed Machine Learning Techniques, Mark V. Behl Jr Nov 2020

LSU Master's Theses

Reservoir simulation is the industry standard for prediction and characterization of processes in the subsurface. However, simulation is computationally expensive and time consuming. This study explores reduced order models (ROMs) as an appropriate alternative. ROMs that use neural networks effectively capture nonlinear dependencies, and only require available operational data as inputs. Neural networks are a black box and difficult to interpret, however. Physics informed neural networks (PINNs) provide a potential solution to these shortcomings, but have not yet been applied extensively in petroleum engineering.

A mature black-oil simulation model from the Volve public data release was used to generate training data …


Making Sense Of Online Public Health Debates With Visual Analytics Systems, Anton Ninkov Nov 2020

Electronic Thesis and Dissertation Repository

Online debates occur frequently and on a wide variety of topics. Particularly, online debates about various public health topics (e.g., vaccines, statins, cannabis, dieting plans) are prevalent in today’s society. These debates are important because of the real-world implications they can have on public health. Therefore, it is important for public health stakeholders (i.e., those with a vested interest in public health) and the general public to have the ability to make sense of these debates quickly and effectively. This dissertation investigates ways of enabling sense-making of these debates with the use of visual analytics systems (VASes). VASes are computational …


Ensemble Labeling Towards Scientific Information Extraction (ELSIE), Erin Murphy Nov 2020

College of Computing and Digital Media Dissertations

Extracting scientific facts from unstructured text is difficult because of the ambiguity of the language and the complexity of the scientific named entities and relations to be extracted. This problem is well illustrated by the extraction of polymer names and their properties. Even when the property is a temperature, identifying the polymer name associated with the temperature may require expertise because of acronyms, synonyms, complicated naming conventions, and the fact that new polymer names are being “introduced” to the vernacular as polymer science advances. While there exist domain-specific machine learning toolkits …


Using Spatial Analysis And Machine Learning Techniques To Develop A Comprehensive Highway-Rail Grade Crossing Consolidation Model, Samira Soleimani Oct 2020

LSU Doctoral Dissertations

The safety of highway-railroad grade crossings (HRGCs) remains an issue in the United States of America (USA). A grade crossing is where a railroad crosses a road at the same level, without an overpass or underpass. To improve crossing safety, crossing conditions should be examined from several aspects such as engineering design (speed limit, warning signs, etc.), road condition (number of lanes, surface markings, etc.), rail design (the type of track, ballast, etc.), temporal variables (weather, visibility, time of day, lighting, etc.), social variables (population, race, etc.), and last but not least, spatial variables (the type …


Machine Learning Applications For Drug Repurposing, Hansaim Lim Sep 2020

Dissertations, Theses, and Capstone Projects

The cost of bringing a drug to market is astounding and the failure rate is intimidating. Drug discovery has been of limited success under the conventional reductionist model of one-drug-one-gene-one-disease paradigm, where a single disease-associated gene is identified and a molecular binder to the specific target is subsequently designed. Under the simplistic paradigm of drug discovery, a drug molecule is assumed to interact only with the intended on-target. However, small molecular drugs often interact with multiple targets, and those off-target interactions are not considered under the conventional paradigm. As a result, drug-induced side effects and adverse reactions are often neglected …


A Data Exploration Of Jeopardy! From 1984 To The Present, Brian S. Hamilton Sep 2020

Dissertations, Theses, and Capstone Projects

The game show Jeopardy! has been on the air in its current iteration, hosted by Alex Trebek, since 1984. During this time, it has accumulated data on clues, contestants, and possible strategies for winning. Using a crowd-sourced archive called J! Archive, this project seeks to find trends in the topics that the game covers and to take a deeper look into the performance of its contestants. It employs topic modeling, a text-analysis method, to organize the hundreds of thousands of archived clues, and statistical analysis to rate the performance of contestants by gender. Using web-based visualization tools, the data is shown in an …


Sensory Stressors Impact Species Responses Across Local And Continental Scales, Ashley A. Wilson Sep 2020

Master's Theses

Pervasive growth in industrialization and advances in technology now expose much of the world to anthropogenic night light and noise (ANLN), which pose a global environmental challenge in terrestrial environments. An estimated one-tenth of the planet’s land area experiences artificial light at night, rising to 23% if skyglow is included. Moreover, anthropogenic noise is associated with urban development and transportation networks; the ecological impact of roads alone is estimated to affect one-fifth of the total land cover of the United States and is increasing in extent and intensity. Existing research involving impacts of light or noise …


Tommy John Surgery: Potential Risk Factors And Causes In Major League Pitchers, Ethan Rhinehart Aug 2020

Analytics Capstones

Since 1974, over 270 Tommy John surgeries have been performed on pitchers at the major league level. Thousands more surgeries have been performed on minor league, college, high school, and youth pitchers. As more biomechanical and statistical research has been conducted over the past few decades, a clearer picture has emerged of some of the risks and causes that lead to serious elbow injuries in pitchers. This paper explores the research surrounding several of those factors, including pitching mechanics, pitch velocity, and pitch type. Using a data set composed of major league pitchers who have undergone Tommy John surgery …


Impact Of Lost Gas Tax Revenue Due To Sale Of Electric Vehicles: Analysis And Recommendations For The 50 States, Jennifer Ricciuti Aug 2020

Analytics Capstones

Although states might have policy reasons to encourage the use of electric vehicles (EVs), future U.S. EV sales represent a significant loss of gas tax revenue for each state, because these vehicles do not require gasoline to operate. Over the last three years, the number of electric vehicle registrations has doubled and is steadily increasing as people become more economically and ecologically minded. EVs are proving to be an optimal choice for car purchasers over standard internal combustion engine (ICE) vehicles, as research has shown that electric vehicles are superior for exhibiting …


Team Formation Using Recommendation Systems, Shreyas Patil Aug 2020

Theses

The importance of team formation has long been recognized, but finding the most effective team among the available human resources is a problem that persists to this day. Members with complementary skills, along with a few must-have behavioral traits such as trust and collaborativeness, are the key ingredients behind team synergy and performance. This thesis designs and implements two different algorithms for the team formation problem using ideas adapted from the recommender systems literature. One of the proposed solutions uses the Glicko-2 rating system to rate employees’ skills, which can easily separate …


Blockchain Technology And Freight Forwarder Exploration Of Implications Focused On Practitioners In Shanghai, Johannes Van Bohemen Aug 2020

World Maritime University Dissertations

No abstract provided.


How Port Logistics Competitiveness Evolves Among Major Ports In China And Europe (1998-2018), Jiawei Wang Aug 2020

World Maritime University Dissertations

No abstract provided.


Discrimination Of Leucine And Isoleucine In De Novo Peptide Sequencing Using Deep Neural Networks, Bingran Shen Aug 2020

Electronic Thesis and Dissertation Repository

De novo peptide sequencing from tandem MS data is a key technology in proteomics for understanding the structure of proteins, especially for previously unseen sequences. Although this technique has advanced rapidly in recent years and become more effective, one crucial problem remains unsolved: because leucine and isoleucine are isomers, they are practically indistinguishable in de novo sequencing using traditional tandem MS data. Some experimental approaches, such as the EThCD fragmentation process, have been attempted to resolve this ambiguity. In this study, we took a data-focused approach rather than only looking for characteristic satellite ions produced by the EThCD …
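The ambiguity arises because the two residues share the elemental formula C6H11NO, so any fragment containing one has exactly the same mass as the corresponding fragment containing the other. The short Python check below computes both monoisotopic residue masses from standard monoisotopic element masses; it illustrates the problem only and is not part of the thesis's method.

```python
MONOISOTOPIC_MASS = {  # element masses in daltons
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}

# In-chain residue formulas: the free amino acid (C6H13NO2) minus one H2O.
RESIDUE_FORMULA = {
    "Leu": {"C": 6, "H": 11, "N": 1, "O": 1},
    "Ile": {"C": 6, "H": 11, "N": 1, "O": 1},
}


def residue_mass(name: str) -> float:
    """Sum element masses weighted by their counts in the residue formula."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in RESIDUE_FORMULA[name].items())


if __name__ == "__main__":
    for name in RESIDUE_FORMULA:
        print(f"{name}: {residue_mass(name):.5f} Da")
```

Both lines print 113.08406 Da, so precursor and ordinary fragment masses alone cannot separate the two; discriminating evidence has to come from other features, such as the side-chain fragmentation products the abstract mentions.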


Ranking Comments: An Entropy-Based Method With Word Embedding Clustering, Yuyang Zhang Aug 2020

Electronic Thesis and Dissertation Repository

Automatically ranking comments by relevance plays an important role in text mining and text summarization. In this thesis, we first introduce a new text digitalization method: the bag of word clusters model. Unlike the traditional bag of words model, which treats each word as an independent item, we group semantically related words into clusters using pre-trained word2vec word embeddings and represent each comment as a distribution of word clusters. This method can extract both semantic and statistical information from texts. Next, we propose an unsupervised ranking algorithm that identifies relevant comments by their distance to the “ideal” comment. The …
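A minimal Python sketch of the bag of word clusters representation and distance-based ranking, under stated assumptions: the `word_to_cluster` mapping is hypothetical (in the thesis it comes from clustering pre-trained word2vec embeddings), Euclidean distance is an assumed choice, and the "ideal" comment's distribution is supplied by the caller because the abstract does not say how it is built. The entropy weighting named in the title is not modeled.

```python
import math
from collections import Counter


def cluster_distribution(tokens: list[str], word_to_cluster: dict[str, int], n_clusters: int) -> list[float]:
    """Represent a comment as the fraction of its known words falling in each cluster."""
    counts = Counter(word_to_cluster[t] for t in tokens if t in word_to_cluster)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * n_clusters  # no word of the comment is in the vocabulary
    return [counts[c] / total for c in range(n_clusters)]


def rank_by_distance(comments: list[list[str]], ideal: list[float], word_to_cluster: dict[str, int]) -> list[int]:
    """Return comment indices ordered from closest to farthest from `ideal`."""
    scored = [
        (math.dist(cluster_distribution(tokens, word_to_cluster, len(ideal)), ideal), idx)
        for idx, tokens in enumerate(comments)
    ]
    return [idx for _, idx in sorted(scored)]


if __name__ == "__main__":
    # Hypothetical clusters: 0 = battery terms, 1 = screen terms, 2 = chatter.
    word_to_cluster = {"battery": 0, "charge": 0, "screen": 1, "display": 1, "lol": 2}
    comments = [["battery", "charge"], ["lol", "lol"], ["screen", "battery"]]
    ideal = [0.5, 0.5, 0.0]  # assumed target distribution
    print(rank_by_distance(comments, ideal, word_to_cluster))
```

This prints `[2, 0, 1]`: the comment mixing battery and screen terms matches the assumed ideal exactly, while the pure chatter comment ranks last. Because "battery" and "charge" share a cluster, they count toward the same dimension, which is the semantic grouping that distinguishes this model from a plain bag of words.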