Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 16 of 16

Full-Text Articles in Physical Sciences and Mathematics

Investigating The Effect Of Greediness On The Coordinate Exchange Algorithm For Generating Optimal Experimental Designs, William Thomas Gullion May 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Design of Experiments (DoE) is the field of statistics concerned with helping researchers maximize the amount of information they gain from their experiments. Recently, researchers have been turning to optimal experimental designs instead of classical/catalog experimental designs. One of the most popular algorithms used today to generate optimal designs is the Coordinate Exchange (CEXCH) Algorithm. CEXCH is known to be a greedy algorithm, which means it tends to favor immediate, locally best designs instead of globally optimal designs. Previous research demonstrated that this tradeoff was efficacious in that it reduced the cost of a single run of CEXCH and allowed …
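As a rough illustration of the greedy behaviour this abstract describes, the Python sketch below implements a generic, textbook-style coordinate exchange for a D-optimality criterion. Everything specific in it is an assumption made for illustration and is not taken from the thesis: the main-effects model with intercept, the two-level candidate set, the improvement tolerance, and the function names `det`, `d_criterion`, and `coordinate_exchange`.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0  # treated as singular
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return d


def d_criterion(design):
    """det(X'X) for an assumed main-effects model with intercept."""
    x = [[1.0] + [float(v) for v in row] for row in design]
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    return det(xtx)


def coordinate_exchange(design, levels=(-1.0, 1.0), tol=1e-9):
    """Greedy passes: change one coordinate at a time, keeping only
    changes that strictly improve the criterion; stop when a full pass
    over all coordinates yields no improvement."""
    design = [list(row) for row in design]
    best = d_criterion(design)
    improved = True
    while improved:
        improved = False
        for i in range(len(design)):
            for j in range(len(design[i])):
                current = design[i][j]
                for level in levels:
                    if level == current:
                        continue
                    design[i][j] = level
                    value = d_criterion(design)
                    if value > best + tol:
                        best, current, improved = value, level, True
                design[i][j] = current  # keep the best level found
    return design, best
```

Starting this sketch from a design in which every coordinate equals 1 shows the local trap: any single-coordinate change leaves at least one factor column identical to the intercept column, so the criterion stays at zero and the search stops at once. Random restarts are the usual way to reduce the chance of ending in such a design.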


The Influence Of Instrumental Sources Of Variance On Mass Spectral Comparison Algorithms, Isabel Cristina Galvez Valencia Jan 2023

Graduate Theses, Dissertations, and Problem Reports

Current search algorithms for the identification of substances based only on their electron ionization mass spectra provide the correct compound as their top result approximately 80% of the time. One contributing factor to the ~20% deviation in the first-hit recognition rate is that traditional algorithms work by comparing the unknown spectrum to an ‘ideal’ or consensus spectrum of each reference compound. The inclusion of replicate reference spectra in a database has been shown to improve the probability of ranking the correct identity in the number one position, but the variance in ion abundances caused by different conditions or different instruments …


Classification Of Pixel Tracks To Improve Track Reconstruction From Proton-Proton Collisions, Kebur Fantahun, Jobin Joseph, Halle Purdom, Nibhrat Lohia Sep 2022

SMU Data Science Review

In this paper, machine learning techniques are used to reconstruct particle collision pathways. CERN (Conseil européen pour la recherche nucléaire) uses a massive underground particle collider, called the Large Hadron Collider or LHC, to produce particle collisions at extremely high speeds. There are several layers of detectors in the collider that track the pathways of particles as they collide. The data produced from collisions contain a large amount of background noise, i.e., decays from known particle collisions that produce fake signals. In particular, in the first layer of the detector, the pixel tracker, there is an overwhelming amount of background noise that …


Use Of Linear Discriminant Analysis In Song Classification: Modeling Based On Wilco Albums, Caroline Pollard May 2021

Honors Theses

The study of music recommender algorithms is a relatively new area of study. Although these algorithms serve a variety of functions, they primarily help advertise and suggest music to users on music streaming services. This thesis explores the use of linear discriminant analysis in music categorization for the purpose of serving as a cheaper and simpler content-based recommender algorithm. The use of linear discriminant analysis was tested by creating linear discriminant functions that classify Wilco’s songs into their respective albums, specifically A.M., Yankee Hotel Foxtrot, and Sky Blue Sky. Four sample songs were chosen from each album, and song data was …


Yelp’s Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels Aug 2018

SMU Data Science Review

In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that reviews …


Deep Machine Learning For Mechanical Performance And Failure Prediction, Elijah Reber, Nickolas D. Winovich, Guang Lin Aug 2018

The Summer Undergraduate Research Fellowship (SURF) Symposium

Deep learning has provided opportunities for advancement in many fields. One such opportunity is being able to accurately predict real-world events. Ensuring proper motor function and being able to predict energy output are valuable to owners of wind turbines. In this paper, we look at how effective a deep neural network is at predicting the failure or energy output of a wind turbine. A data set was obtained that contained sensor data from 17 wind turbines over 13 months, measuring numerous variables, such as spindle speed, blade position, and whether or not the wind turbine experienced …


An Exploratory Statistical Method For Finding Interactions In A Large Dataset With An Application Toward Periodontal Diseases, Joshua Lambert Jan 2017

Theses and Dissertations--Epidemiology and Biostatistics

It is estimated that periodontal diseases affect up to 90% of the adult population. Given the complexity of the host environment, many factors contribute to expression of the disease. Age, gender, socioeconomic status, smoking status, and race/ethnicity are all known risk factors, as well as a handful of known comorbidities. Certain vitamins and minerals have been shown to be protective for the disease, while some toxins and chemicals have been associated with an increased prevalence. The role of toxins, chemicals, vitamins, and minerals in relation to disease is believed to be complex and potentially modified by known risk factors. A …


On A Multiple-Choice Guessing Game, Ryan Cushman, Adam J. Hammett Apr 2016

The Research and Scholarship Symposium (2013-2019)

We consider the following game (a generalization of a binary version explored by Hammett and Oman): the first player (“Ann”) chooses a (uniformly) random integer from the first n positive integers, which is not revealed to the second player (“Gus”). Then, Gus presents Ann with a k-option multiple choice question concerning the number she chose, to which Ann truthfully replies. After a predetermined number m of these questions have been asked, Gus attempts to guess the number chosen by Ann. Gus wins if he guesses Ann’s number. Our goal is to determine every m-question algorithm which maximizes the probability of …
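The counting argument behind this game can be sketched in a few lines of Python. This is a minimal illustration, not the authors' algorithm or result: it assumes Gus wins only by naming Ann's number exactly and that each question splits the remaining candidates into at most k answer classes, and the function names are hypothetical. Because m answers can separate at most k^m classes and Gus can be right about only one number per class, no strategy wins with probability above min(k^m, n)/n. Asking for the base-k digits of Ann's number minus one is one strategy that attains this bound.

```python
from fractions import Fraction


def max_win_probability(n, k, m):
    """Upper bound min(k**m, n) / n on Gus's chance of guessing Ann's number."""
    return Fraction(min(k ** m, n), n)


def digit_strategy_wins(n, k, m):
    """Count the secrets Gus identifies when question j asks for the
    j-th base-k digit of (secret - 1) and he then guesses the smallest
    number consistent with all m answers."""
    wins = 0
    for secret in range(1, n + 1):
        answers = [((secret - 1) // k ** j) % k for j in range(m)]
        guess = 1 + sum(a * k ** j for j, a in enumerate(answers))
        if guess == secret:
            wins += 1
    return wins
```

For example, with n = 10, k = 3, and m = 2 the digit strategy identifies 9 of the 10 possible numbers, which matches the bound of 9/10.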


Implementation And Application Of The Curds And Whey Algorithm To Regression Problems, John Kidd May 2014

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

A common statistical problem is trying to predict two or more variables using a set of predictor variables. The simplest model for this situation is called multivariate linear regression. This method uses each set of predictor variables to predict each of the response variables separately. This approach seems counter-intuitive as any possible relationship between the variables being predicted is ignored.

Breiman and Friedman found a way to take advantage of relationships among the response variables to increase the accuracy of the predictions for each of the predicted variables with an algorithm they called Curds and Whey. It uses other statistical …


Calculating Ellipse Overlap Areas, Gary B. Hughes, Mohcine Chraibi Jan 2014

Statistics

We present an approach for finding the overlap area between two ellipses that does not rely on proxy curves. The Gauss-Green formula is used to determine a segment area between two points on an ellipse. Overlap between two ellipses is calculated by combining the areas of appropriate segments and polygons in each ellipse. For four of the ten possible orientations of two ellipses, the method requires numerical determination of transverse intersection points. Approximate intersection points can be determined by solving the two implicit ellipse equations simultaneously. Alternative approaches for finding transverse intersection points are available using tools from algebraic geometry, …
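For an axis-aligned ellipse centred at the origin, the segment area mentioned in this abstract has a closed form that is easy to check numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: it parameterises the ellipse as x = a cos t, y = b sin t, takes the two end points as parametric angles t1 ≤ t2 (not polar angles), and uses hypothetical function names. Applying the Gauss-Green formula A = ½∮(x dy − y dx) to the arc from t1 to t2 and the chord that closes it gives ½ab(Δt − sin Δt), with Δt = t2 − t1.

```python
import math


def ellipse_segment_area(a, b, t1, t2):
    """Area between the arc from parametric angle t1 to t2
    (counter-clockwise) and the chord joining its end points."""
    dt = t2 - t1
    if not 0.0 <= dt <= 2.0 * math.pi:
        raise ValueError("expected t1 <= t2 <= t1 + 2*pi")
    return 0.5 * a * b * (dt - math.sin(dt))


def segment_area_by_line_integral(a, b, t1, t2, steps=2000):
    """Numerical check: evaluate 0.5 * integral of (x dy - y dx) along
    the arc with the midpoint rule, then add the closing chord."""
    h = (t2 - t1) / steps
    arc = 0.0
    for i in range(steps):
        t = t1 + (i + 0.5) * h
        x, y = a * math.cos(t), b * math.sin(t)
        dx, dy = -a * math.sin(t) * h, b * math.cos(t) * h
        arc += 0.5 * (x * dy - y * dx)
    x1, y1 = a * math.cos(t1), b * math.sin(t1)
    x2, y2 = a * math.cos(t2), b * math.sin(t2)
    chord = 0.5 * (x2 * y1 - x1 * y2)  # straight line from point 2 back to point 1
    return arc + chord
```

With Δt = π the closed form returns half the ellipse area, πab/2, and with Δt = 2π it returns the full area πab, which gives two quick sanity checks.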


Empirical Sampling From Permutation Space With Unique Patterns, Justice I. Odiase May 2012

Journal of Modern Applied Statistical Methods

The exact distribution of a test statistic ultimately guarantees that the probability of a Type I error is exactly α. Several methods for estimating the exact distribution of a test statistic have evolved over the years with inherent computational problems and varying degrees of accuracy. The unique pattern of permutations resulting from using experimental data to sample within the permutation space without the risk of repeating permutations is identified. The method presented circumvents the theoretical requirements of asymptotic procedures and the computational difficulties associated with an exhaustive enumeration of permutations. Results show that time and space complexities are drastically reduced …


Autonomous Entropy-Based Intelligent Experimental Design, Nabin Kumar Malakar Jan 2011

Legacy Theses & Dissertations (2009 - 2024)

The aim of this thesis is to explore the application of probability and information theory in experimental design, and to do so in a way that combines what we know about inference and inquiry in a comprehensive and consistent manner.


Dynamic Model Pooling Methodology For Improving Aberration Detection Algorithms, Brenton J. Sellati Jan 2010

Masters Theses 1911 - February 2014

Syndromic surveillance is defined generally as the collection and statistical analysis of data which are believed to be leading indicators for the presence of deleterious activities developing within a system. Conceptually, syndromic surveillance can be applied to any discipline in which it is important to know when external influences manifest themselves in a system by forcing it to depart from its baseline. Comparisons of syndromic surveillance systems have led to mixed results, where models that dominate in one performance metric are often sorely deficient in another. This results in a zero-sum trade-off where one performance metric must be afforded greater …


A Comparison For Longitudinal Data Missing Due To Truncation, Rong Liu Jan 2006

Theses and Dissertations

Many longitudinal clinical studies suffer from patient dropout. Often the dropout is nonignorable and the missing mechanism needs to be incorporated in the analysis. The methods handling missing data make various assumptions about the missing mechanism, and their utility in practice depends on whether these assumptions apply in a specific application. Ramakrishnan and Wang (2005) proposed a method (MDT) to handle nonignorable missing data, where missing is due to the observations exceeding an unobserved threshold. Assuming that the observations arise from a truncated normal distribution, they suggested an EM algorithm to simplify the estimation. In this dissertation the EM algorithm is …


Quantifying The Effects Of Correlated Covariates On Variable Importance Estimates From Random Forests, Ryan Vincent Kimes Jan 2006

Theses and Dissertations

Recent advances in computing technology have led to the development of algorithmic modeling techniques. These methods can be used to analyze data which are difficult to analyze using traditional statistical models. This study examined the effectiveness of variable importance estimates from the random forest algorithm in identifying the true predictor among a large number of candidate predictors. A simulation study was conducted using twenty different levels of association among the independent variables and seven different levels of association between the true predictor and the response. We conclude that the random forest method is an effective classification tool when the goals …


G-Computation Estimation Of Nonparametric Causal Effects On Time-Dependent Mean Outcomes In Longitudinal Studies, Romain Neugebauer, Mark J. Van Der Laan Jul 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Two approaches to Causal Inference based on Marginal Structural Models (MSM) have been proposed. They provide different representations of causal effects with distinct causal parameters. Initially, a parametric MSM approach to Causal Inference was developed: it relies on correct specification of a parametric MSM. Recently, a new approach based on nonparametric MSM was introduced. This latter approach does not require the assumption of a correctly specified MSM and thus is more realistic if one believes that correct specification of a parametric MSM is unlikely in practice. However, this approach was described only for investigating causal effects on mean outcomes collected …