Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
- Institution
- Southern Methodist University (4)
- University of New Mexico (3)
- City University of New York (CUNY) (2)
- Georgia Southern University (2)
- University of Montana (2)
- Western University (2)
- Kennesaw State University (1)
- Louisiana State University (1)
- Murray State University (1)
- Old Dominion University (1)
- Portland State University (1)
- Selected Works (1)
- SelectedWorks (1)
- The University of Akron (1)
- University of Louisville (1)
- University of Massachusetts Amherst (1)
- University of Nebraska - Lincoln (1)
- University of New Hampshire (1)
- Utah State University (1)
- Publication
- SMU Data Science Review (4)
- Electronic Theses and Dissertations (3)
- Electronic Thesis and Dissertation Repository (2)
- Graduate Student Theses, Dissertations, & Professional Papers (2)
- All Graduate Theses and Dissertations, Spring 1920 to Summer 2023 (1)
- Computational Modeling & Simulation Engineering Theses & Dissertations (1)
- Computer Science ETDs (1)
- Department of Statistics: Dissertations, Theses, and Student Work (1)
- Dissertations, Theses, and Capstone Projects (1)
- Doctor of Data Science and Analytics Dissertations (1)
- Doctoral Dissertations (1)
- Electrical and Computer Engineering ETDs (1)
- Honors Theses and Capstones (1)
- John E. Sawyer (1)
- LSU Doctoral Dissertations (1)
- Mathematics & Statistics ETDs (1)
- Murray State Theses and Dissertations (1)
- Open Educational Resources (1)
- Peter Austin (1)
- Systems Science Friday Noon Seminar Series (1)
- Williams Honors College, Honors Research Projects (1)
Articles 1 - 28 of 28
Full-Text Articles in Physical Sciences and Mathematics
Differentiation Of Human, Dog, And Cat Hair Fibers Using DART-TOFMS And Machine Learning, Laura Ahumada, Erin R. Mcclure-Price, Chad Kwong, Edgard O. Espinoza, John Santerre
SMU Data Science Review
Hair is found in over 90% of crime scenes and has long been analyzed as trace evidence. However, recent reviews of traditional hair fiber analysis techniques, primarily morphological examination, have cast doubt on its reliability. To address these concerns, this study employed machine learning algorithms, specifically Linear Discriminant Analysis (LDA) and Random Forest, on Direct Analysis in Real Time time-of-flight mass spectra collected from human, cat, and dog hair samples. The objective was to develop a chemistry- and statistics-based classification method for unbiased taxonomic identification of hair. The results of the study showed that LDA and Random Forest were highly …
Comparison Of Sampling Methods For Predicting Wine Quality Based On Physicochemical Properties, Robert Burigo, Scott Frazier, Eli Kravez, Nibhrat Lohia
SMU Data Science Review
Numerous studies have used the physicochemical properties of wine to predict its quality. Given the nature of these properties, the data is inherently skewed. Previous works have focused on a handful of sampling techniques to balance the data. This research compares multiple sampling techniques for predicting the target with limited data. For this purpose, an ensemble model is used to evaluate the different techniques. This research found no evidence that any specific oversampling method improves a random forest classifier on a multi-class problem.
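As a rough illustration of what "sampling techniques to balance the data" can mean, the sketch below shows plain random oversampling, one of the simplest such techniques. It is not taken from the paper: the function name `random_oversample`, the toy feature rows, and the quality labels are all hypothetical, and the study itself may have compared different or more elaborate methods.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows that belong to this class
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: one feature per wine, labels skewed toward quality 5
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
labels = [5, 5, 5, 5, 6, 7]

balanced_rows, balanced_labels = random_oversample(rows, labels)
class_counts = Counter(balanced_labels)
print(class_counts)
```

In practice, resampling of this kind is normally applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.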
Applications Of Transfer Learning From Malicious To Vulnerable Binaries, Sean Patrick Mcnulty
Graduate Student Theses, Dissertations, & Professional Papers
Malware detection and vulnerability detection are important cybersecurity tasks. Previous research has successfully applied a variety of machine learning methods to both. However, despite their potential synergies, previous research has yet to unite these two tasks. Given the recent success of transfer learning in many domains, such as language modeling and image recognition, this thesis investigated the use of transfer learning to improve vulnerability detection. Specifically, we pre-trained a series of models to detect malicious binaries and used the weights from those models to kickstart the detection of vulnerable binaries. In our study, we also investigated five different data representations …
Applications Of Machine Learning Algorithms In Materials Science And Bioinformatics, Mohammed Quazi
Mathematics & Statistics ETDs
The piezoelectric response has been a measure of interest in density functional theory (DFT) for micro-electromechanical systems (MEMS) since the inception of MEMS technology. Piezoelectric-based MEMS devices find wide applications in automobiles, mobile phones, healthcare devices, and silicon chips for computers, to name a few. Piezoelectric properties of doped aluminum nitride (AlN) have been under investigation in materials science for piezoelectric thin films because of its wide range of device applicability. In this research using rigorous DFT calculations, high throughput ab-initio simulations for 23 AlN alloys are generated.
This research is the first to report strong enhancements of piezoelectric properties …
A Course In Data Science: R And Prediction Modeling, Adam Kapelner
Open Educational Resources
This is a self-contained course in data science and machine learning using R. It covers philosophy of modeling with data, prediction via linear models, machine learning including support vector machines and random forests, probability estimation and asymmetric costs using logistic regression and probit regression, underfitting vs. overfitting, model validation, handling missingness and much more. There is formal instruction of data manipulation using dplyr and data.table, visualization using ggplot2 and statistical computing.
Early-Warning Alert Systems For Financial-Instability Detection: An HMM-Driven Approach, Xing Gu
Electronic Thesis and Dissertation Repository
Regulators’ early intervention is crucial when the financial system is experiencing difficulties. Financial stability must be preserved to avert banks’ bailouts, which hugely drain government’s financial resources. Detecting periods of financial crisis in advance entails the development and customisation of accurate and robust quantitative techniques. The goal of this thesis is to construct automated systems via the interplay of various mathematical and statistical methodologies to signal financial instability episodes in the near-term horizon. These signal alerts could provide regulatory bodies with the capacity to initiate an appropriate response that will thwart or at least minimise the occurrence of a financial crisis. …
Intra-Hour Solar Forecasting Using Cloud Dynamics Features Extracted From Ground-Based Infrared Sky Images, Guillermo Terrén-Serrano
Electrical and Computer Engineering ETDs
Due to the increasing use of photovoltaic systems, power grids are vulnerable to the projection of shadows from moving clouds. An intra-hour solar forecast provides power grids with the capability of automatically controlling the dispatch of energy, reducing the additional cost for a guaranteed, reliable supply of energy (i.e., energy storage). This dissertation introduces a novel sky imager consisting of a long-wave radiometric infrared camera and a visible light camera with a fisheye lens. The imager is mounted on a solar tracker to maintain the Sun in the center of the images throughout the day, reducing the scattering effect produced …
Finding The Best Predictors For Foot Traffic In US Seafood Restaurants, Isabel Paige Beaulieu
Honors Theses and Capstones
COVID-19 caused state and nation-wide lockdowns, which altered human foot traffic, especially in restaurants. The seafood sector in particular suffered greatly: illegal fishing increased, its goods are perishable, it is seasonal in some places, and imports and exports slowed. Foot traffic data helps business owners decide how much to order, how many employees to schedule, etc. One issue is that the data is very expensive, hard to get, and not available until months after it is recorded. Our goal is to not only find covariates that …
A Non-Deterministic Deep Learning Based Surrogate For Ice Sheet Modeling, Hannah Jordan
Graduate Student Theses, Dissertations, & Professional Papers
Surrogate modeling is a new and expanding field in the world of deep learning, providing a computationally inexpensive way to approximate results from computationally demanding high-fidelity simulations. Ice sheet modeling is one such computationally expensive application; the model used in this study currently requires between 10 and 20 minutes to complete one simulation. While this is adequate for certain applications, it makes sampling approaches to statistical inference infeasible. This issue can be overcome by using a surrogate model to approximate the ice sheet model, bringing the time to produce output down to a tenth …
Comparing Machine Learning Techniques With State-Of-The-Art Parametric Prediction Models For Predicting Soybean Traits, Susweta Ray
Department of Statistics: Dissertations, Theses, and Student Work
Soybean is a significant source of protein and oil, and is also widely used as animal feed. Thus, developing lines that are superior in terms of yield, protein, and oil content is important to feed the ever-growing population. Unlike high-cost phenotyping, genotyping is both cost- and time-efficient for breeders, whereas evaluating new lines in different environments (location-year combinations) can be costly. Several genomic prediction (GP) methods have been developed to use the marker and environment data effectively to predict the yield or other relevant phenotypic traits of crops. Our study compares a conventional GP method (GBLUP), a …
Applications Of Machine Learning In High-Frequency Trade Direction Classification, Jared E. Hansen
All Graduate Theses and Dissertations, Spring 1920 to Summer 2023
The correct assignment of trades as buyer-initiated or seller-initiated is paramount in many quantitative finance studies. Simple decision rule methods have been used for signing trades because many data sets available to researchers do not include the sign of each trade executed. By utilizing these decision rule methods, as well as engineering new variables from available data, we have demonstrated that machine learning models outperform prior methods for accurately signing trades as buys and sells, achieving state-of-the-art results. The best model developed was 4.5 percentage points more accurate than older methods when predicting on unseen data. Since finance and economics …
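The abstract does not say which decision rules were used. As one well-known example of a simple rule for signing trades, the sketch below implements the tick rule: a trade above the previous trade price is signed as a buy, a trade below it as a sell, and a trade at the same price inherits the sign of the last price change. The function name `tick_rule` and the price series are hypothetical and chosen only for illustration.

```python
def tick_rule(prices):
    """Sign each trade as +1 (buy), -1 (sell), or 0 (unclassified)
    by comparing its price with the previous trade price."""
    signs = []
    last_sign = 0  # no information before the first price change
    for i, price in enumerate(prices):
        if i > 0:
            if price > prices[i - 1]:
                last_sign = 1
            elif price < prices[i - 1]:
                last_sign = -1
            # equal price: keep the sign of the last price change
        signs.append(last_sign)
    return signs


# Hypothetical sequence of trade prices
prices = [10.00, 10.01, 10.01, 10.00, 10.00, 10.02]
signs = tick_rule(prices)
print(signs)  # [0, 1, 1, -1, -1, 1]
```

Outputs of rules like this one could then serve as input variables to a machine learning classifier, which is roughly the kind of combination the abstract describes.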
Data-Driven Investment Decisions In P2P Lending: Strategies Of Integrating Credit Scoring And Profit Scoring, Yan Wang
Doctor of Data Science and Analytics Dissertations
In this dissertation, we develop and discuss several loan evaluation methods to guide the investment decisions for peer-to-peer (P2P) lending. In evaluating loans, credit scoring and profit scoring are the two widely utilized approaches. Credit scoring aims at minimizing the risk while profit scoring aims at maximizing the profit. This dissertation addresses the strengths and weaknesses of each scoring method by integrating them in various ways in order to provide the optimal investment suggestions for different investors. Before developing the methods for loan evaluation at the individual level, we applied the state-of-the-art method called the Long Short Term Memory (LSTM) …
Evaluating An Ordinal Output Using Data Modeling, Algorithmic Modeling, And Numerical Analysis, Martin Keagan Wynne Brown
Murray State Theses and Dissertations
Data and algorithmic modeling are two different approaches used in predictive analytics. The models discussed from these two approaches include the proportional odds logit model (POLR), the vector generalized linear model (VGLM), the classification and regression tree model (CART), and the random forests model (RF). Patterns in the data were analyzed using trigonometric polynomial approximations and Fast Fourier Transforms. Predictive modeling is used frequently in statistics and data science to find the relationship between the explanatory (input) variables and a response (output) variable. Both approaches prove advantageous in different cases depending on the data set. In our case, the data …
Habitat Associations And Reproduction Of Fishes On The Northwestern Gulf Of Mexico Shelf Edge, Elizabeth Marie Keller
LSU Doctoral Dissertations
Several of the northwestern Gulf of Mexico (GOM) shelf-edge banks provide critical hard bottom habitat for coral and fish communities, supporting a wide diversity of ecologically and economically important species. These sites may be fish aggregation and spawning sites and provide important habitat for fish growth and reproduction. Already designated as habitat areas of particular concern, many of these banks are also under consideration for inclusion in the expansion of the Flower Garden Banks National Marine Sanctuary. This project aimed to gain a more comprehensive understanding of the communities and fish species on shelf-edge banks by way of gonad histology, …
Semi-Supervised Regression With Generative Adversarial Networks Using Minimal Labeled Data, Greg Olmschenk
Dissertations, Theses, and Capstone Projects
This work studies the generalization of semi-supervised generative adversarial networks (GANs) to regression tasks. A novel feature layer contrasting optimization function, in conjunction with a feature matching optimization, allows the adversarial network to learn from unannotated data and thereby reduce the number of labels required to train a predictive network. An analysis of simulated training conditions is performed to explore the capabilities and limitations of the method. In concert with the semi-supervised regression GANs, an improved label topology and upsampling technique for multi-target regression tasks are shown to reduce data requirements. Improvements are demonstrated on a wide variety of vision …
Machine Learning In Support Of Electric Distribution Asset Failure Prediction, Robert D. Flamenbaum, Thomas Pompo, Christopher Havenstein, Jade Thiemsuwan
SMU Data Science Review
In this paper, we present novel approaches to predicting asset failure in the electric distribution system. Failures in overhead power lines and their associated equipment, in particular, pose significant financial and environmental threats to electric utilities. Electric device failure furthermore poses a burden on customers and can pose serious risk to life and livelihood. Working with asset data acquired from an electric utility in Southern California, and incorporating environmental and geospatial data from around the region, we applied a Random Forest methodology to predict which overhead distribution lines are most vulnerable to failure. Our results provide evidence …
Statistical And Machine Learning Methods Evaluated For Incorporating Soil And Weather Into Corn Nitrogen Recommendations, Curtis J. Ransom, Newell R. Kitchen, James J. Camberato, Paul R. Carter, Richard B. Ferguson, Fabián G. Fernández, David W. Franzen, Carrie A. M. Laboski, D. Brenton Myers, Emerson D. Nafziger, John E. Sawyer, John F. Shanahan
John E. Sawyer
Nitrogen (N) fertilizer recommendation tools could be improved for estimating corn (Zea mays L.) N needs by incorporating site-specific soil and weather information. However, an evaluation of analytical methods is needed to determine the success of incorporating this information. The objectives of this research were to evaluate statistical and machine learning (ML) algorithms for utilizing soil and weather information for improving corn N recommendation tools. Eight algorithms [stepwise, ridge regression, least absolute shrinkage and selection operator (Lasso), elastic net regression, principal component regression (PCR), partial least squares regression (PLSR), decision tree, and random forest] were evaluated using a dataset …
A Data-Driven Approach For Modeling Agents, Hamdi Kavak
Computational Modeling & Simulation Engineering Theses & Dissertations
Agents are commonly created on a set of simple rules driven by theories, hypotheses, and assumptions. Such a modeling premise makes limited use of real-world data and is challenged when modeling real-world systems due to the lack of empirical grounding. Simultaneously, the last decade has witnessed the production and availability of large-scale data from various sensors that carry behavioral signals. These data sources have the potential to change the way we create agent-based models, from simple rules to data-driven behaviors. Despite this opportunity, the literature has neglected to offer a modeling approach to generate granular agent behaviors from data, creating …
Data Patterns Discovery Using Unsupervised Learning, Rachel A. Lewis
Electronic Theses and Dissertations
Self-care activities classification poses significant challenges in identifying children’s unique functional abilities and needs within the exceptional children healthcare system. The accuracy of diagnosing a child’s self-care problem, such as toileting or dressing, is highly influenced by an occupational therapist’s experience and time constraints. Thus, there is a need for objective means to detect and predict in advance the self-care problems of children with physical and motor disabilities. We use clustering to discover interesting information from self-care problems, perform automatic classification of binary data, and discover outliers. The advantages are twofold: the advancement of knowledge on identifying self-care problems in …
Longitudinal Tracking Of Physiological State With Electromyographic Signals, Robert Warren Stallard
Electronic Theses and Dissertations
Electrophysiological measurements have been used in recent history to classify instantaneous physiological configurations, e.g., hand gestures. This work investigates the feasibility of working with changes in physiological configurations over time (i.e., longitudinally) using a variety of algorithms from the machine learning domain. We demonstrate a high degree of classification accuracy for a binary classification problem derived from electromyography measurements before and after a 35-day bedrest. The problem difficulty is increased with a more dynamic experiment testing for changes in astronaut sensorimotor performance by taking electromyography and force plate measurements before, during, and after a jump from a small platform. A …
Cognitive Virtual Admissions Counselor, Kumar Raja Guvindan Raju, Cory Adams, Raghuram Srinivas
SMU Data Science Review
In this paper, we present a cognitive virtual admissions counselor for the Master of Science in Data Science program at Southern Methodist University. The virtual admissions counselor is a system capable of providing potential students accurate information at the time that they want to know it. After the evaluation of multiple technologies, Amazon’s LEX was selected to serve as the core technology for the virtual counselor chatbot. Student surveys were leveraged to collect and generate training data to deploy the natural language capability. The cognitive virtual admissions counselor platform is currently capable of providing an end-to-end conversational dialog to …
Comparing Various Machine Learning Statistical Methods Using Variable Differentials To Predict College Basketball, Nicholas Bennett
Williams Honors College, Honors Research Projects
The purpose of this Senior Honors Project is to research, study, and demonstrate newfound knowledge of various machine learning statistical techniques that are not covered in the University of Akron’s statistics major curriculum. This report will be an overview of three machine-learning methods that were used to predict NCAA Basketball results, specifically, the March Madness tournament. The inputs to these methods, models, and tests will include numerous variables kept throughout the season for each team, along with a couple of variables that are used by the selection committee when tournament teams are being picked. The end goal is to find …
Classification With Large Sparse Datasets: Convergence Analysis And Scalable Algorithms, Xiang Li
Electronic Thesis and Dissertation Repository
Large and sparse datasets, such as user ratings over a large collection of items, are common in the big data era. Many applications need to classify the users or items based on the high-dimensional and sparse data vectors, e.g., to predict the profitability of a product or the age group of a user, etc. Linear classifiers are popular choices for classifying such datasets because of their efficiency. In order to classify the large sparse data more effectively, the following important questions need to be answered.
1. Sparse data and convergence behavior. How different properties of a dataset, such as …
Audio-Based Productivity Forecasting Of Construction Cyclic Activities, Chris A. Sabillon
Electronic Theses and Dissertations
Because heavy construction equipment is costly, project managers must be able to monitor its performance promptly. This cannot be achieved through traditional management techniques, which are based on direct observation or on estimations from historical data. Some manufacturers have started to integrate their proprietary technologies, but construction contractors are unlikely to have a fleet of entirely new, single-manufacturer equipment for this to represent a solution. Third-party automated approaches include the use of active sensors such as accelerometers and gyroscopes, passive technologies such as computer vision and image processing, and audio signal processing. Hitherto, most …
Towards Deeper Understanding In Neuroimaging, Rex Devon Hjelm
Computer Science ETDs
Neuroimaging is a growing domain of research, with advances in machine learning having tremendous potential to expand understanding in neuroscience and improve public health. Deep neural networks have recently and rapidly achieved historic success in numerous domains, and as a consequence have completely redefined the landscape of automated learners, giving promise of significant advances in numerous domains of research. Despite recent advances and advantages over traditional machine learning methods, deep neural networks have yet to permeate significantly into neuroscience studies, particularly as a tool for discovery. This dissertation presents well-established and novel tools for unsupervised learning which aid in …
Incorporating Boltzmann Machine Priors For Semantic Labeling In Images And Videos, Andrew Kae
Doctoral Dissertations
Semantic labeling is the task of assigning category labels to regions in an image. For example, a scene may consist of regions corresponding to categories such as sky, water, and ground, or parts of a face such as eyes, nose, and mouth. Semantic labeling is an important mid-level vision task for grouping and organizing image regions into coherent parts. Labeling these regions allows us to better understand the scene itself as well as properties of the objects in the scene, such as their parts, location, and interaction within the scene. Typical approaches for this task include the conditional random field …
Using Methods From The Data-Mining And Machine-Learning Literature For Disease Classification And Prediction: A Case Study Examining Classification Of Heart Failure Subtypes, Peter C. Austin
Peter Austin
OBJECTIVE: Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine-learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines.
STUDY DESIGN AND SETTING: We compared the performance of these classification methods with that of conventional classification trees to classify patients with heart failure (HF) …
Bayesian And Related Methods: Techniques Based On Bayes' Theorem, Mehmet Vurkaç
Systems Science Friday Noon Seminar Series
Bayes' theorem is a simple algebraic consequence of conditional probability. Yet, its consequences are critical to philosophy, society, and technology. Starting from its simple derivation, we will show how its interpretation in terms of base rates (priors) and class-conditional likelihoods illuminates everyday problems in medicine and law, and provides signal processing, communications, machine learning, model selection, and other applications of statistics with powerful classification and estimation tools. Next, we will briefly examine some of the ways in which this theorem can be adapted to include multiple attributes, contexts, hypotheses, and levels of risk. Methods derived from or related to Bayes’ …
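To make the role of base rates and class-conditional likelihoods concrete, here is a minimal sketch of Bayes' theorem applied to a diagnostic test, P(D | +) = P(+ | D) P(D) / [P(+ | D) P(D) + P(+ | not D) P(not D)]. The numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) and the function name `posterior` are illustrative assumptions, not figures from the seminar.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Probability of disease given a positive test, via Bayes' theorem.

    prior               -- base rate P(D)
    sensitivity         -- class-conditional likelihood P(+ | D)
    false_positive_rate -- class-conditional likelihood P(+ | not D)
    """
    # Total probability of a positive test (the evidence term)
    evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / evidence


# Hypothetical numbers: rare condition, fairly accurate test
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(round(p, 3))  # roughly 0.161
```

Even with a test that is right 95% of the time, the low base rate keeps the posterior near 16%, which is the kind of everyday medical reasoning the abstract alludes to.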