Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 16 of 16
Full-Text Articles in Physical Sciences and Mathematics
Investigating The Effect Of Greediness On The Coordinate Exchange Algorithm For Generating Optimal Experimental Designs, William Thomas Gullion
All Graduate Theses and Dissertations, Spring 1920 to Summer 2023
Design of Experiments (DoE) is the field of statistics concerned with helping researchers maximize the amount of information they gain from their experiments. Recently, researchers have been turning to optimal experimental designs instead of classical/catalog experimental designs. One of the most popular algorithms used today to generate optimal designs is the Coordinate Exchange (CEXCH) Algorithm. CEXCH is known to be a greedy algorithm, which means it tends to favor immediate, locally best designs instead of globally optimal designs. Previous research demonstrated that this tradeoff was efficacious in that it reduced the cost of a single run of CEXCH and allowed …
The Influence Of Instrumental Sources Of Variance On Mass Spectral Comparison Algorithms, Isabel Cristina Galvez Valencia
Graduate Theses, Dissertations, and Problem Reports
Current search algorithms for the identification of substances based only on their electron ionization mass spectra provide the correct compound as their top result approximately 80% of the time. One contributing factor to the ~20% deviation in the first-hit recognition rate is that traditional algorithms work by comparing the unknown spectrum to an ‘ideal’ or consensus spectrum of each reference compound. The inclusion of replicate reference spectra in a database has been shown to improve the probability of ranking the correct identity in the number one position, but the variance in ion abundances caused by different conditions or different instruments …
Classification Of Pixel Tracks To Improve Track Reconstruction From Proton-Proton Collisions, Kebur Fantahun, Jobin Joseph, Halle Purdom, Nibhrat Lohia
SMU Data Science Review
In this paper, machine learning techniques are used to reconstruct particle collision pathways. CERN (Conseil européen pour la recherche nucléaire) uses a massive underground particle collider, called the Large Hadron Collider or LHC, to produce particle collisions at extremely high speeds. There are several layers of detectors in the collider that track the pathways of particles as they collide. The data produced from collisions contain an excessive amount of background noise, i.e., decays from known particle collisions produce fake signals. In particular, in the first layer of the detector, the pixel tracker, there is an overwhelming amount of background noise that …
Use Of Linear Discriminant Analysis In Song Classification: Modeling Based On Wilco Albums, Caroline Pollard
Honors Theses
Music recommender algorithms are a relatively new area of study. Although these algorithms serve a variety of functions, they primarily help advertise and suggest music to users on music streaming services. This thesis explores the use of linear discriminant analysis in music categorization for the purpose of serving as a cheaper and simpler content-based recommender algorithm. The use of linear discriminant analysis was tested by creating linear discriminant functions that classify Wilco’s songs into their respective albums, specifically A.M., Yankee Hotel Foxtrot, and Sky Blue Sky. Four sample songs were chosen from each album, and song data was …
Yelp’s Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels
SMU Data Science Review
In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that reviews …
Deep Machine Learning For Mechanical Performance And Failure Prediction, Elijah Reber, Nickolas D. Winovich, Guang Lin
The Summer Undergraduate Research Fellowship (SURF) Symposium
Deep learning has provided opportunities for advancement in many fields. One such opportunity is being able to accurately predict real world events. Ensuring proper motor function and being able to predict energy output is a valuable asset for owners of wind turbines. In this paper, we look at how effective a deep neural network is at predicting the failure or energy output of a wind turbine. A data set was obtained that contained sensor data from 17 wind turbines over 13 months, measuring numerous variables, such as spindle speed and blade position and whether or not the wind turbine experienced …
An Exploratory Statistical Method For Finding Interactions In A Large Dataset With An Application Toward Periodontal Diseases, Joshua Lambert
Theses and Dissertations--Epidemiology and Biostatistics
It is estimated that Periodontal Diseases affect up to 90% of the adult population. Given the complexity of the host environment, many factors contribute to expression of the disease. Age, Gender, Socioeconomic Status, Smoking Status, and Race/Ethnicity are all known risk factors, as well as a handful of known comorbidities. Certain vitamins and minerals have been shown to be protective for the disease, while some toxins and chemicals have been associated with an increased prevalence. The role of toxins, chemicals, vitamins, and minerals in relation to disease is believed to be complex and potentially modified by known risk factors. A …
On A Multiple-Choice Guessing Game, Ryan Cushman, Adam J. Hammett
The Research and Scholarship Symposium (2013-2019)
We consider the following game (a generalization of a binary version explored by Hammett and Oman): the first player (“Ann”) chooses a (uniformly) random integer from the first n positive integers, which is not revealed to the second player (“Gus”). Then, Gus presents Ann with a k-option multiple choice question concerning the number she chose, to which Ann truthfully replies. After a predetermined number m of these questions have been asked, Gus attempts to guess the number chosen by Ann. Gus wins if he guesses Ann’s number. Our goal is to determine every m-question algorithm which maximizes the probability of …
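The game described above can be explored with a short sketch. It is an illustration under stated assumptions, not the authors' result: the hypothetical `win_probability` function evaluates one particular strategy, in which each question splits Gus's remaining candidates into k groups of nearly equal size and Gus finally guesses the first remaining candidate. The helper names and the choice of contiguous groups are assumptions made for this example.

```python
from fractions import Fraction


def split_evenly(candidates, k):
    """Split a list into k contiguous parts whose sizes differ by at most one."""
    q, r = divmod(len(candidates), k)
    parts, start = [], 0
    for i in range(k):
        size = q + (1 if i < r else 0)
        parts.append(candidates[start:start + size])
        start += size
    return parts


def win_probability(n, k, m):
    """Exact win probability of the balanced-splitting strategy (an assumed strategy)."""
    wins = 0
    for secret in range(1, n + 1):  # Ann's number is uniform on 1..n
        candidates = list(range(1, n + 1))
        for _ in range(m):
            # Gus asks: "which of these k groups contains your number?"
            parts = split_evenly(candidates, k)
            candidates = next(p for p in parts if secret in p)
        # Gus guesses the first number still consistent with Ann's answers.
        if candidates[0] == secret:
            wins += 1
    return Fraction(wins, n)


print(win_probability(10, 2, 2))
print(win_probability(100, 2, 3))
```

Because m answers with k options each can distinguish at most k^m outcomes, no strategy can win with probability above min(k^m, n)/n; the balanced split in this sketch reaches that bound.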
Implementation And Application Of The Curds And Whey Algorithm To Regression Problems, John Kidd
All Graduate Theses and Dissertations, Spring 1920 to Summer 2023
A common statistical problem is trying to predict two or more variables using a set of predictor variables. The simplest model for this situation is called multivariate linear regression. This method uses each set of predictor variables to predict each of the response variables separately. This approach seems counter-intuitive as any possible relationship between the variables being predicted is ignored.
Breiman and Friedman found a way to take advantage of relationships among the response variables to increase the accuracy of the predictions for each of the predicted variables with an algorithm they called Curds and Whey. It uses other statistical …
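The baseline that the abstract contrasts with Curds and Whey, fitting each response separately, can be sketched in a few lines. This is a minimal pure-Python illustration, not the thesis implementation: the function names `solve`, `ols`, and `multivariate_ols` are hypothetical, the fit uses the normal equations, and Curds and Whey itself is not implemented here.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X beta = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def multivariate_ols(X, Y):
    """Fit one independent regression per response column of Y."""
    q = len(Y[0])
    return [ols(X, [row[j] for row in Y]) for j in range(q)]


# Toy data: an intercept column plus one predictor x, with two responses
# constructed as y1 = 1 + 2x and y2 = 3 - x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [[1.0, 3.0], [3.0, 2.0], [5.0, 1.0], [7.0, 0.0]]
coefficients = multivariate_ols(X, Y)
print(coefficients)
```

Each column of `Y` is fitted without reference to the others, so any correlation between the responses goes unused. That unused information is what the abstract says Curds and Whey tries to exploit.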
Calculating Ellipse Overlap Areas, Gary B. Hughes, Mohcine Chraibi
Statistics
We present an approach for finding the overlap area between two ellipses that does not rely on proxy curves. The Gauss-Green formula is used to determine a segment area between two points on an ellipse. Overlap between two ellipses is calculated by combining the areas of appropriate segments and polygons in each ellipse. For four of the ten possible orientations of two ellipses, the method requires numerical determination of transverse intersection points. Approximate intersection points can be determined by solving the two implicit ellipse equations simultaneously. Alternative approaches for finding transverse intersection points are available using tools from algebraic geometry, …
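As a rough illustration of the segment-area step, consider the simplest case: an ellipse centered at the origin with semi-axes a and b aligned to the coordinate axes, parametrized as (a cos t, b sin t). Applying the Gauss-Green formula, area = (1/2)∮(x dy − y dx), to the sector between parameter angles t1 and t2 gives (a b / 2)(t2 − t1). Subtracting the signed triangle formed by the origin and the two endpoints, (a b / 2) sin(t2 − t1), leaves the segment between the chord and the arc. The function name and the restriction to an unrotated, origin-centered ellipse are assumptions for this sketch; the paper's handling of general positions and intersection points is not reproduced.

```python
import math


def ellipse_segment_area(a, b, t1, t2):
    """Area between the chord and the counterclockwise arc from parameter t1 to t2.

    Assumes the ellipse x = a*cos(t), y = b*sin(t), centered at the origin
    with axes aligned to the coordinate axes.
    """
    dt = (t2 - t1) % (2.0 * math.pi)  # counterclockwise sweep in [0, 2*pi)
    # Sector area (a*b/2)*dt minus signed triangle area (a*b/2)*sin(dt).
    return 0.5 * a * b * (dt - math.sin(dt))


# Half of an ellipse with semi-axes 2 and 1: the chord is the major axis.
half = ellipse_segment_area(2.0, 1.0, 0.0, math.pi)
# Quarter-turn segment of the unit circle.
quarter = ellipse_segment_area(1.0, 1.0, 0.0, math.pi / 2)
print(half, quarter)
```

For a sweep larger than π the triangle term changes sign, so the same expression adds the triangle to the sector instead of subtracting it.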
Empirical Sampling From Permutation Space With Unique Patterns, Justice I. Odiase
Journal of Modern Applied Statistical Methods
The exact distribution of a test statistic ultimately guarantees that the probability of a Type I error is exactly α. Several methods for estimating the exact distribution of a test statistic have evolved over the years with inherent computational problems and varying degrees of accuracy. The unique pattern of permutations resulting from using experimental data to sample within the permutation space without the risk of repeating permutations is identified. The method presented circumvents the theoretical requirements of asymptotic procedures and the computational difficulties associated with an exhaustive enumeration of permutations. Results show that time and space complexities are drastically reduced …
Autonomous Entropy-Based Intelligent Experimental Design, Nabin Kumar Malakar
Legacy Theses & Dissertations (2009 - 2024)
The aim of this thesis is to explore the application of probability and information theory in experimental design, and to do so in a way that combines what we know about inference and inquiry in a comprehensive and consistent manner.
Dynamic Model Pooling Methodology For Improving Aberration Detection Algorithms, Brenton J. Sellati
Masters Theses 1911 - February 2014
Syndromic surveillance is defined generally as the collection and statistical analysis of data which are believed to be leading indicators for the presence of deleterious activities developing within a system. Conceptually, syndromic surveillance can be applied to any discipline in which it is important to know when external influences manifest themselves in a system by forcing it to depart from its baseline. Comparisons of syndromic surveillance systems have led to mixed results, where models that dominate in one performance metric are often sorely deficient in another. This results in a zero-sum trade-off where one performance metric must be afforded greater …
A Comparison For Longitudinal Data Missing Due To Truncation, Rong Liu
Theses and Dissertations
Many longitudinal clinical studies suffer from patient dropout. Often the dropout is nonignorable and the missing mechanism needs to be incorporated in the analysis. The methods handling missing data make various assumptions about the missing mechanism, and their utility in practice depends on whether these assumptions apply in a specific application. Ramakrishnan and Wang (2005) proposed a method (MDT) to handle nonignorable missing data, where missing is due to the observations exceeding an unobserved threshold. Assuming that the observations arise from a truncated normal distribution, they suggested an EM algorithm to simplify the estimation. In this dissertation the EM algorithm is …
Quantifying The Effects Of Correlated Covariates On Variable Importance Estimates From Random Forests, Ryan Vincent Kimes
Theses and Dissertations
Recent advances in computing technology have led to the development of algorithmic modeling techniques. These methods can be used to analyze data which are difficult to analyze using traditional statistical models. This study examined the effectiveness of variable importance estimates from the random forest algorithm in identifying the true predictor among a large number of candidate predictors. A simulation study was conducted using twenty different levels of association among the independent variables and seven different levels of association between the true predictor and the response. We conclude that the random forest method is an effective classification tool when the goals …
G-Computation Estimation Of Nonparametric Causal Effects On Time-Dependent Mean Outcomes In Longitudinal Studies, Romain Neugebauer, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Two approaches to Causal Inference based on Marginal Structural Models (MSM) have been proposed. They provide different representations of causal effects with distinct causal parameters. Initially, a parametric MSM approach to Causal Inference was developed: it relies on correct specification of a parametric MSM. Recently, a new approach based on nonparametric MSM was introduced. This latter approach does not require the assumption of a correctly specified MSM and thus is more realistic if one believes that correct specification of a parametric MSM is unlikely in practice. However, this approach was described only for investigating causal effects on mean outcomes collected …