Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons


Articles 1 - 8 of 8

Full-Text Articles in Statistical Models

Differentiation Of Human, Dog, And Cat Hair Fibers Using DART TOFMS And Machine Learning, Laura Ahumada, Erin R. McClure-Price, Chad Kwong, Edgard O. Espinoza, John Santerre Dec 2023


SMU Data Science Review

Hair is found in over 90% of crime scenes and has long been analyzed as trace evidence. However, recent reviews of traditional hair fiber analysis techniques, primarily morphological examination, have cast doubt on its reliability. To address these concerns, this study employed machine learning algorithms, specifically Linear Discriminant Analysis (LDA) and Random Forest, on Direct Analysis in Real Time time-of-flight mass spectra collected from human, cat, and dog hair samples. The objective was to develop a chemistry- and statistics-based classification method for unbiased taxonomic identification of hair. The results of the study showed that LDA and Random Forest were highly …


Modeling Species Distribution And Habitat Suitability Of American Ginseng (Panax Quinquefolius) In Virginia, Jacob D. J. Peters May 2020


Masters Theses, 2020-current

American ginseng (Panax quinquefolius) is a well-known and sought-after medicinal plant native to North America that is facing increased threat of extinction due to overharvesting, herbivory, and habitat loss. Species distribution and habitat suitability models may be valuable to landowners interested in sustainable harvest or to institutions interested in the conservation and restoration of the species. With unequal sampling efforts across a region of interest, it is likely that some locations with appropriate habitat may be misrepresented in model predictions. This study refined a state-derived species distribution model for ginseng through increased sampling effort across the Cumberland Plateau …


Evaluating An Ordinal Output Using Data Modeling, Algorithmic Modeling, And Numerical Analysis, Martin Keagan Wynne Brown Jan 2020


Murray State Theses and Dissertations

Data and algorithmic modeling are two different approaches used in predictive analytics. The models discussed from these two approaches include the proportional odds logit model (POLR), the vector generalized linear model (VGLM), the classification and regression tree model (CART), and the random forests model (RF). Patterns in the data were analyzed using trigonometric polynomial approximations and Fast Fourier Transforms. Predictive modeling is used frequently in statistics and data science to find the relationship between the explanatory (input) variables and a response (output) variable. Both approaches prove advantageous in different cases depending on the data set. In our case, the data …
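As a rough illustration of the proportional odds logit model named in this abstract, the sketch below computes category probabilities for an ordinal response. It assumes the common parameterization P(Y <= j | x) = logistic(theta_j - beta . x); the function names, cutpoints, and coefficient are hypothetical and are not taken from the thesis.

```python
import math


def logistic(z):
    """Standard logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def proportional_odds_probs(cutpoints, beta, x):
    """Category probabilities under a proportional odds logit model.

    Assumes P(Y <= j | x) = logistic(cutpoints[j] - beta . x) with strictly
    increasing cutpoints. Returns len(cutpoints) + 1 probabilities.
    """
    eta = sum(b * xi for b, xi in zip(beta, x))
    # Cumulative probabilities for each cutpoint; the last category closes at 1.
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for cum in cumulative:
        # Each category probability is a difference of adjacent cumulatives.
        probs.append(cum - previous)
        previous = cum
    return probs


# Hypothetical values chosen only for illustration.
example_probs = proportional_odds_probs(cutpoints=[-1.0, 1.0], beta=[0.5], x=[2.0])
```

The same coefficient vector shifts every cumulative logit by the same amount, which is the "proportional odds" assumption; only the cutpoints differ between categories.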


Using Random Forests To Estimate Win Probability Before Each Play Of An NFL Game, Dennis Lock, Dan Nettleton Jul 2019


Dan Nettleton

Before any play of a National Football League (NFL) game, the probability that a given team will win depends on many situational variables (such as time remaining, yards to go for a first down, field position and current score) as well as the relative quality of the two teams as quantified by the Las Vegas point spread. We use a random forest method to combine pre-play variables to estimate Win Probability (WP) before any play of an NFL game. When a subset of NFL play-by-play data for the 12 seasons from 2001 to 2012 is used as a training dataset, …


Decision Trees And Their Application For Classification And Regression Problems, Obinna Chilezie Njoku May 2019


MSU Graduate Theses

Tree methods are some of the best and most commonly used methods in the field of statistical learning. They are widely used in classification and regression modeling. This thesis introduces the concept and focuses on decision trees, such as Classification and Regression Trees (CART), used for classification and regression predictive modeling problems. We also introduce some ensemble methods, such as bagging, random forests, and boosting, which were developed to improve the performance and accuracy of classification and regression tree models. This work also provides an in-depth understanding of how the CART models are constructed, …
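To make the CART idea concrete, here is a minimal sketch of how a classification tree might choose a single split on one numeric feature by minimizing weighted Gini impurity. It is an illustration under simplifying assumptions (one feature, one split, no pruning), not the thesis's implementation; the function names and toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold on one numeric feature minimizing weighted Gini impurity.

    Candidate thresholds are midpoints between consecutive distinct sorted values.
    Returns (threshold, weighted_impurity), or (None, gini(labels)) if no split
    reduces the impurity.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_impurity = None, gini(labels)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


# Toy data, invented for illustration only.
feature = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
classes = ["a", "a", "a", "b", "b", "b"]
threshold, impurity = best_split(feature, classes)
```

A full tree would apply this search recursively to each resulting subset and across all features; ensemble methods such as bagging and random forests then average many such trees fit to resampled data.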


Best Probable Subset: A New Method For Reducing Data Dimensionality In Linear Regression, Elieser Nodarse Apr 2019


FIU Electronic Theses and Dissertations

Regression is a statistical technique for modeling the relationship between a dependent variable Y and one or more predictor variables, also known as regressors. In the broad field of regression, there exists a special case in which the relationship between the dependent variable and the regressor(s) is linear. This is known as linear regression.

The purpose of this paper is to create a useful method that effectively selects a subset of regressors when dealing with high-dimensional data and/or collinearity in linear regression. As the name suggests, high-dimensional data occur when the number of predictor variables is far …
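The collinearity problem mentioned above can be illustrated with a small diagnostic. The sketch below computes the variance inflation factor (VIF) for the special case of two predictors, where VIF = 1 / (1 - r^2) and r is their Pearson correlation. This is a standard diagnostic shown for context only; it is not the Best Probable Subset method, and the function names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors.

    With two predictors, the R^2 from regressing one on the other equals r^2,
    so VIF = 1 / (1 - r^2). Undefined (division by zero) if |r| == 1.
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Toy predictors, invented for illustration only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2_nearly_collinear = [1.1, 1.9, 3.2, 3.9, 5.1]
x2_uncorrelated = [1.0, -2.0, 0.0, 2.0, -1.0]

vif_high = vif_two_predictors(x1, x2_nearly_collinear)
vif_low = vif_two_predictors(x1, x2_uncorrelated)
```

A VIF near 1 indicates that the predictors carry largely independent information, while a large VIF signals that coefficient estimates will be unstable, which is one motivation for selecting a subset of regressors.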


Assessment And Correction Of Lidar-Derived DEMs In The Coastal Marshes Of Louisiana, William M. Lauve Mar 2019


LSU Master's Theses

The advent of airborne light detection and ranging (lidar) has resulted in expansive, precise digital elevation models (DEMs). DEMs are essential for modeling complex systems, such as the coastal land margin of Louisiana. They are used for many applications (e.g., tide, storm surge, and ecological modeling) and by diverse groups (e.g., state and federal agencies, NGOs, and academia). However, in a marsh environment, it is difficult for airborne lidar to produce accurate bare-earth measurements, and even accurate elevations are rarely verified by ground-truth data. The accuracy of lidar in marshes is limited by the sensor’s resolution …


Survival Prediction For Brain Tumor Patients Using Gene Expression Data, Vinicius Bonato May 2010


Dissertations & Theses (Open Access)

Brain tumors are among the most aggressive types of cancer in humans, with an estimated median survival time of 12 months and only 4% of patients surviving more than 5 years after diagnosis. Until recently, brain tumor prognosis has been based only on clinical information such as tumor grade and patient age, but there are reports indicating that molecular profiling of gliomas can reveal subgroups of patients with distinct survival rates. We hypothesize that coupling molecular profiling of brain tumors with clinical information might improve predictions of patient survival time and, consequently, better guide future treatment decisions. …