Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics and Probability

Utah State University

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Theses/Dissertations

Random forests

Full-Text Articles in Physical Sciences and Mathematics

A Comparison Of R, SAS, And Python Implementations Of Random Forests, Breckell Soifua, Aug 2018

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

The Random Forest method is a useful machine learning tool developed by Leo Breiman. It has many implementations across programming languages, the most popular of which are in R, SAS, and Python. In this paper, we conduct a comprehensive comparison of these implementations with regard to accuracy, variable importance measurements, and timing. The comparison uses a variety of real and simulated data with different levels of classification difficulty, numbers of predictors, and sample sizes. It shows unexpectedly different results among the three implementations.
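Variable importance is one of the comparison criteria, and implementations can compute it in different ways. As a hedged illustration only, the sketch below computes permutation importance (the mean drop in accuracy after shuffling one predictor) for any fitted classifier exposed as a plain Python function. The names `accuracy` and `permutation_importance` are hypothetical. Unlike the out-of-bag version reported by R's randomForest package, this sketch permutes one fixed dataset, so its numbers would not match any of the implementations compared in the report.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Predictor = Callable[[Row], int]


def accuracy(predict: Predictor, X: Sequence[Row], y: Sequence[int]) -> float:
    """Fraction of rows whose predicted label equals the true label."""
    correct = sum(1 for row, label in zip(X, y) if predict(row) == label)
    return correct / len(y)


def permutation_importance(predict: Predictor, X: Sequence[Row], y: Sequence[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when one feature column is shuffled.

    A large drop means the model relies on that feature; a drop near zero
    means the model makes little use of it.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row, replacing feature j with its shuffled value.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row: Row) -> int:
    """Hypothetical stand-in for a fitted forest: it only looks at feature 0."""
    return int(row[0] > 0.5)


if __name__ == "__main__":
    data_rng = random.Random(1)
    X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
    y = [int(row[0] > 0.5) for row in X]
    print(permutation_importance(toy_model, X, y))
```

Because `toy_model` ignores feature 1, shuffling that column cannot change any prediction, so its importance is exactly zero, while shuffling feature 0 destroys most of the accuracy.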


Imputation For Random Forests, Joshua Young Aug 2017

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

This project introduces two new methods for imputing missing data in random forests. The new methods are compared with other frequently used imputation methods, including those in the randomForest package in R. To test the effectiveness of these methods, missing values are introduced into datasets under two missing-data mechanisms: missing at random (MAR) and missing completely at random (MCAR). After imputation, random forests are run on the data and prediction accuracies are obtained. Because speed is an important aspect of computing, the speeds of all tested methods are also compared.

One of the new methods …
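The two missing-data mechanisms named above can be simulated in a few lines. The sketch below does not reproduce either of the report's new methods, which the abstract does not describe; its helper names (`mask_mcar`, `mask_mar`, `impute_median`) are hypothetical. It drops values completely at random, drops them at random conditional on a fully observed driver variable, and fills gaps with the column median, a fast baseline similar in spirit to `na.roughfix` in R's randomForest package for numeric columns.

```python
import random
import statistics
from typing import List, Optional

Column = List[Optional[float]]


def mask_mcar(values: List[float], rate: float, rng: random.Random) -> Column:
    """MCAR: every entry is dropped with the same probability, unrelated to any data."""
    return [None if rng.random() < rate else v for v in values]


def mask_mar(target: List[float], driver: List[float], threshold: float,
             rate_low: float, rate_high: float, rng: random.Random) -> Column:
    """MAR: whether target[i] goes missing depends only on the fully observed driver[i]."""
    masked: Column = []
    for t, d in zip(target, driver):
        rate = rate_high if d > threshold else rate_low
        masked.append(None if rng.random() < rate else t)
    return masked


def impute_median(values: Column) -> List[float]:
    """Fill missing entries with the median of the observed ones (a rough, fast baseline)."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


if __name__ == "__main__":
    rng = random.Random(0)
    driver = [float(i) for i in range(100)]
    target = [2.0 * d for d in driver]
    mcar = mask_mcar(target, rate=0.2, rng=rng)
    mar = mask_mar(target, driver, threshold=49.5, rate_low=0.0, rate_high=1.0, rng=rng)
    # Under this MAR mask only small targets survive, so the median fill (49.0)
    # badly underestimates the missing values (100.0 to 198.0).
    print(impute_median(mcar)[:5], impute_median(mar)[-1])
```

With `rate_low=0.0` and `rate_high=1.0`, the MAR mask removes every target whose driver exceeds the threshold, so a single median fill lands far below the true missing values. Under MCAR the observed values remain a random sample, so the same fill is not systematically shifted.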


Tree-Based Regression For Interval-Valued Data, Chih-Ching Yeh Aug 2017

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Regression methods for interval-valued data have been increasingly studied in recent years. Most existing work focuses on linear models, yet many practical problems are nonlinear, so developing nonlinear regression tools for interval-valued data is crucial. In this project, we propose a tree-based regression method for interval-valued data that applies to both linear and nonlinear problems. Unlike linear regression models, which usually require additional constraints to ensure that the predicted interval length is positive, the proposed method estimates the regression function nonparametrically, so the …
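The abstract is truncated before it describes the proposed method, so the following is only an assumed illustration of why a nonparametric tree can sidestep the positivity constraint: a small regression tree on one predictor whose leaves predict the average lower bound and the average upper bound of the training intervals they contain. Because every training interval has an upper bound at least as large as its lower bound, the two averages keep that order, so no predicted interval has negative length. The names `fit_tree` and `predict`, the split criterion (summed squared error of both bounds), and the stopping rules are hypothetical choices, not the author's.

```python
import random
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (lower, upper) with lower <= upper


def _sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def _cost(ys: List[Interval]) -> float:
    """Split criterion: squared error of the lower bounds plus that of the upper bounds."""
    return _sse([lo for lo, _ in ys]) + _sse([hi for _, hi in ys])


def _leaf(ys: List[Interval]) -> Interval:
    # Averaging each bound separately keeps upper >= lower, because every
    # training interval already satisfies upper_i >= lower_i.
    n = len(ys)
    return (sum(lo for lo, _ in ys) / n, sum(hi for _, hi in ys) / n)


def fit_tree(xs: List[float], ys: List[Interval], max_depth: int = 2, min_leaf: int = 2) -> Dict:
    """Greedy binary tree on one predictor; each leaf stores an interval."""
    if max_depth == 0 or len(xs) < 2 * min_leaf:
        return {"leaf": _leaf(ys)}
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None  # (cost, threshold, split position in `order`)
    for k in range(min_leaf, len(order) - min_leaf + 1):
        x_left, x_right = xs[order[k - 1]], xs[order[k]]
        if x_left == x_right:
            continue  # no threshold separates equal x values
        cost = _cost([ys[i] for i in order[:k]]) + _cost([ys[i] for i in order[k:]])
        if best is None or cost < best[0]:
            best = (cost, (x_left + x_right) / 2, k)
    if best is None:
        return {"leaf": _leaf(ys)}
    _, threshold, k = best
    left, right = order[:k], order[k:]
    return {
        "t": threshold,
        "left": fit_tree([xs[i] for i in left], [ys[i] for i in left], max_depth - 1, min_leaf),
        "right": fit_tree([xs[i] for i in right], [ys[i] for i in right], max_depth - 1, min_leaf),
    }


def predict(tree: Dict, x: float) -> Interval:
    """Follow splits down to a leaf and return its interval."""
    while "leaf" not in tree:
        tree = tree["left"] if x <= tree["t"] else tree["right"]
    return tree["leaf"]


if __name__ == "__main__":
    rng = random.Random(3)
    xs = [rng.uniform(0, 10) for _ in range(60)]
    ys = []
    for x in xs:
        lower = (x - 5) ** 2 + rng.uniform(-1, 1)  # nonlinear lower bound
        ys.append((lower, lower + rng.uniform(0, 2)))
    tree = fit_tree(xs, ys, max_depth=3, min_leaf=3)
    for x in (0.0, 5.0, 10.0):
        print(x, predict(tree, x))
```

A linear model fitted separately to each bound offers no such guarantee, because the two fitted lines can cross; the leaf averages here cannot.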