Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Statistics and Probability

Regression

Articles 1 - 30 of 92

Full-Text Articles in Entire DC Network

Examining The Interaction Between Calcium Supplement Use, Demographics, And Lifestyle Factors On Bone Health In Women, Vix Talbot Jun 2024

University Honors Theses

Osteoporosis is a condition that poses a significant health threat, particularly among women during the menopause transition, when accelerated bone loss increases fracture risk. Calcium supplementation has been shown to be an important intervention to mitigate bone mineral density (BMD) decline during this and other periods of life. However, the efficacy of calcium supplementation is influenced by various individual factors, including demographics and lifestyle habits. This study investigates the effect of calcium supplement use and several interaction terms on bone health in women. Multiple linear regression analysis is employed to assess the impact of these factors on BMD. Data from …
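
As a rough illustration of what a multiple linear regression with an interaction term looks like, the sketch below fits y = b0 + b1*x1 + b2*x2 + b3*(x1*x2) by least squares on simulated data. It is not the thesis's model: the variable names (`supplement`, `age`, `bmd`), the coefficients, and the data are all hypothetical and chosen only to show the mechanics.

```python
import numpy as np

def fit_ols_with_interaction(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + b3*(x1*x2)."""
    # Design matrix: intercept, two main effects, and their product.
    X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated data; every name and number here is hypothetical.
rng = np.random.default_rng(0)
n = 500
supplement = rng.integers(0, 2, size=n).astype(float)  # 0/1 indicator
age = rng.normal(0.0, 1.0, size=n)                      # standardized covariate
true_coef = np.array([1.0, 0.3, -0.2, 0.15])
bmd = (true_coef[0]
       + true_coef[1] * supplement
       + true_coef[2] * age
       + true_coef[3] * supplement * age
       + rng.normal(0.0, 0.05, size=n))

coef = fit_ols_with_interaction(supplement, age, bmd)
print(dict(zip(["intercept", "supplement", "age", "supplement:age"], coef.round(3))))
```

A nonzero coefficient on the product column means the slope for one predictor depends on the level of the other, which is the sense in which an interaction term modifies a main effect.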


Defensive Impact Wins: Developing A New Method To Rate Individual Defense In NBA Games, Dylan J. Stiles Jan 2024

Honors Theses and Capstones

With the analytics revolution in sports in the past 20 years, it seems that everything that can be quantified is. In basketball, though, trying to break the game down into a set of numbers comes with a unique problem. While we've come up with a good set of advanced numbers to measure offensive efficiency, defense is fundamentally harder to quantify. The game is played five-on-five, but it has often been popular or convenient to model defense as a set of five one-on-one games. As defenses became more complex into the 2010s, this methodology became increasingly insignificant. …


The Use Of Regularization To Detect Racial Inequities In Pay Equity Studies: An Empirical Study And Reflections On Regulation Methods, Christopher M. Peña Nov 2023

Electronic Theses and Dissertations

Since the late 1970s, multiple linear regression has been the preferred method for identifying discrimination in pay. An empirical study on this topic was conducted using quantitative critical methods. A literature review first examined conflicting views on using multiple linear regression in pay equity studies. The review found that multiple linear regression is used so prevalently in pay equity studies because the courts and practitioners have widely accepted it and because of its simplicity and ability to parse multiple sources of variance simultaneously. Commentaries in the literature cautioned about errors in model specification, the use of tainted variables, and the …


Examining Model Complexity's Effects When Predicting Continuous Measures From Ordinal Labels, Mckade S. Thomas May 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Many real world problems require the prediction of ordinal variables where the values are a set of categories with an ordering to them. However, in many of these cases the categorical nature of the ordinal data is not a desirable outcome. As such, regression models treat ordinal variables as continuous and do not bind their predictions to discrete categories. Prior research has found that these models are capable of learning useful information between the discrete levels of the ordinal labels they are trained on, but complex models may learn ordinal labels too closely, missing the information between levels. In this …


The 2015 NCAA Cost-Of-Attendance Stipend And Its Effects On Institutional Financial Aid Packages, Sara Greene Apr 2023

Honors Theses

In 2015, the National Collegiate Athletic Association (NCAA) allowed “Cost of Attendance” (COA) stipends to be offered to athletic recruits for Division I schools. These stipends are intended to allow schools to grant aid to student-athletes beyond a full-ride scholarship to cover additional costs imposed on student-athletes. These stipends created an opportunity for the “Autonomy” Power 5 programs to utilize a competitive tactic to try to win over the top recruits. There is evidence that these COA stipends have caused an increase in the estimated cost of attendance reported by the university. This paper examines if the COA stipends have …


Analyzing Relationships With Machine Learning, Oscar Ko Feb 2023

Dissertations, Theses, and Capstone Projects

Procedurally, this project aims to take a dataset, analyze it, and offer insights to the audience in an easy-to-digest format. Conceptually, this project will seek to explore questions like: “Do couples that meet through online dating or dating apps have higher or lower quality relationships?”, “Can any features in this dataset help predict how a subject would rate their relationship quality?”, and “What other insights can I derive from using machine learning for exploratory analysis?” The intended audience for this project is anyone interested in romantic relationships or machine learning.

The dataset is from a Stanford University survey, “How Couples …


Predicting Insulin Pump Therapy Settings, Riccardo L. Ferraro, David Grijalva, Alex Trahan Sep 2022

SMU Data Science Review

Millions of people live with diabetes worldwide [7]. To mitigate some of the many symptoms associated with diabetes, an estimated 350,000 people in the United States rely on insulin pumps [17]. For many of these people, how effectively their insulin pump performs is the difference between sleeping through the night and a life-threatening emergency treatment at a hospital. Three programmed insulin pump therapy settings governing effective insulin pump function are: Basal Rate (BR), Insulin Sensitivity Factor (ISF), and Carbohydrate Ratio (ICR). For many people using insulin pumps, these therapy settings are often not correct, given their physiological needs. While …


Contributions To Random Forest Variable Importance With Applications In R, Kelvyn K. Bladen Aug 2022

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

A major focus in statistics is building and improving computational algorithms that can use data to predict a response. Two fundamental camps of research arise from such a goal. The first camp is researching ways to get more accurate predictions. Many sophisticated methods, collectively known as machine learning methods, have been developed for this very purpose. One such method that is widely used across industry and many other areas of investigation is called Random Forests.

The second camp of research is that of improving the interpretability of machine learning methods. This is worthy of attention when analysts desire to optimize …


The Short-Term Effects Of Fine Airborne Particulate Matter And Climate On COVID-19 Disease Dynamics, El Hussain Shamsa, Kezhong Zhang Jun 2022

Medical Student Research Symposium

Background: Despite more than 60% of the United States population being fully vaccinated, COVID-19 cases continue to spike in a temporal pattern. These patterns in COVID-19 incidence and mortality may be linked to short-term changes in environmental factors.

Methods: Nationwide, county-wise measurements for COVID-19 cases and deaths, fine-airborne particulate matter (PM2.5), and maximum temperature were obtained from March 20, 2020 to March 20, 2021. Multivariate Linear Regression was used to analyze the association between environmental factors and COVID-19 incidence and mortality rates in each season. Negative Binomial Regression was used to analyze daily fluctuations of COVID-19 cases …


Assessing The Influence Of Health Policy And Population Mobility On COVID-19 Spread In Arkansas, Tayden Barretto May 2022

Industrial Engineering Undergraduate Honors Theses

The outbreak of COVID-19 has created a major crisis across the world since its start in 2019, and its influence on every realm of society is undeniable. Globally, more than 500 million cases have been recorded since March 2020, with almost 6 million deaths. In the wake of this crisis, many governments and health organizations have taken steps and precautions to mitigate its spread. These steps involve public mandates of information, reducing frequency of personal contact, and use of masks to minimize the risk of transmission. Current access to mobility data released from Google detailing population movements has provided a …


Analysis Of Minor League Rule Changes Effect On Stolen Bases, Zachary Houghtaling Jan 2022

Williams Honors College, Honors Research Projects

This study uses various statistical analyses to evaluate the justification of rule changes for Major League Baseball that were implemented within the Minor Leagues during the 2021 minor league season. The primary focus of the study is predicting how some of these Minor League rule changes could affect the stolen base success rate and the number of attempts per game within the Major Leagues. A survey was conducted to evaluate how fans feel about stolen bases within the current game and if rules should be altered to increase the number of stolen bases that occur. Additionally, recorded Major and Minor …


Statistical Theory For Specialized Linear Regression Adjustment Methods Compared To Multiple Linear Regression In The Presence And Absence Of Interaction Effects, Leon Su Jan 2022

Theses and Dissertations--Statistics

When building models to investigate outcomes and variables of interest, researchers often want to adjust for other variables. There is a variety of ways that these adjustments are performed. In this work, we will consider four approaches to adjustment utilized by researchers in various fields. We will compare the efficacy of these methods to what we call the “true model method”, fitting a multiple linear regression model in which adjustment variables are model covariates. Our goal is to show that these adjustment methods have inferior performance to the true model method by comparing model parameter estimates, power, type I error, …


(R1239) A New Type II Half Logistic-G Family Of Distributions With Properties, Regression Models, System Reliability And Applications, Emrah Altun, Morad Alizadeh, Haitham M. Yousof, Mahdi Rasekhi, G. G. Hamedani Dec 2021

Applications and Applied Mathematics: An International Journal (AAM)

This study proposes a new family of distributions based on the half logistic distribution. With the new family, the baseline distributions gain flexibility through additional shape parameters. The important statistical properties of the proposed family are derived. A new generalization of the Weibull distribution is used to introduce a location-scale regression model for the censored response variable. The utility of the introduced models is demonstrated in survival analysis and estimation of the system reliability. Three data sets are analyzed. According to the empirical results, it is observed that the proposed family gives better results than other existing models.


Comparison Of Statistical Methods For Modeling Count Data With An Application To Length Of Hospital Stay, Gustavo A. Fernandez Dec 2021

Theses and Dissertations

Hospital length of stay (LOS) is a key indicator of hospital care management efficiency, cost of care, and hospital planning. Therefore, understanding hospital LOS variability is always an important healthcare focus. Hospital LOS data are count data, with discrete and nonnegative values, typically right-skewed, and often exhibiting excessive zeros. Numerous studies have been conducted to model hospital LOS to identify significant predictors contributing to its variability. Many researchers have used linear regression with or without logarithmic transformation of the outcome variable LOS, or logistic regression on a dichotomized LOS. These regression methods usually violate models’ assumptions and are subject …


Aggregating Twitter Text Through Generalized Linear Regression Models For Tweet Popularity Prediction And Automatic Topic Classification, Chen Mo, Jingjing Yin, Isaac Chun-Hai Fung, Zion Tse Nov 2021

Department of Biostatistics, Epidemiology, and Environmental Health Sciences Faculty Publications

Social media platforms have become accessible resources for health data analysis. However, the advanced computational techniques involved in big data text mining and analysis are challenging for public health data analysts to apply. This study proposes and explores the feasibility of a novel yet straightforward method by regressing the outcome of interest on the aggregated influence scores for association and/or classification analyses based on generalized linear models. The method reduces the document term matrix by transforming text data into a continuous summary score, thereby reducing the data dimension substantially and easing the data sparsity issue of the term matrix. To …


Empirical Modeling Of Tilt-Rotor Aerodynamic Performance, Michael C. Stratton Oct 2021

Mechanical & Aerospace Engineering Theses & Dissertations

There has been increasing interest in the performance of electric vertical takeoff and landing (eVTOL) aircraft. The propellers used for the eVTOL propulsion systems experience a broad range of aerodynamic conditions, not typically experienced by propellers in forward flight, that includes large incidence angles relative to the oncoming airflow. Formal experiment design and analysis techniques featuring response surface methods were applied to a subscale, tilt-rotor wind tunnel test for three-, four-, five-, and six-blade, 16-inch-diameter propeller configurations in support of development of the NASA LA-8 aircraft. Investigation of low-speed performance included a maximum speed of 12 m/s and …


Association Between Stream Impairment By Mercury And Superfund Sites In The Conterminous USA, Karessa L. Manning May 2021

Masters Theses

Mercury is a natural element that can cause harm to the brain, heart, kidneys, lungs, and immune system, especially to fetuses developing in the womb. Many natural and anthropogenic factors contribute to mercury in the environment, such as geologic deposits, landfills, gold and silver mining operations, cement production, and atmospheric deposition. Mercury has been identified as a contaminant of concern at many National Priority List (NPL) sites; however, studies on contamination at NPL sites are often only conducted on a local level. This study aimed to analyze the potential connection between mercury-contaminated NPL sites and the presence of mercury-impaired …


An Evaluation Of Knot Placement Strategies For Spline Regression, William Klein Jan 2021

CMC Senior Theses

Regression splines have an established value for producing quality fit at a relatively low-degree polynomial. This paper explores the implications of adopting new methods for knot selection in tandem with established methodology from the current literature. Structural features of generated datasets, as well as residuals collected from sequential iterative models, are used to augment the equidistant knot selection process. From analyzing a simulated dataset and an application to the Racial Animus dataset, I find that a B-spline basis paired with equally-spaced knots remains the best choice when data are evenly distributed, even when structural features of a dataset are known …


A Statistical Learning Regression Model Utilized To Determine Predictive Factors Of Social Distancing During COVID-19 Pandemic, Timothy A. Smith, Albert J. Boquet, Matthew V. Chin Nov 2020

Publications

In an application of the mathematical theory of statistics, predictive regression modelling can be used to determine if there is a trend to predict the response variable of social distancing in terms of multiple input “predictor” variables. In this study the social distancing is measured as the percentage reduction in average mobility by GPS records, and the mathematical results obtained are interpreted to determine what factors drive that response. This study was done on county level data from the state of Florida during the COVID-19 pandemic, and it is found that the most deterministic predictors are county population density …


A Monte Carlo Analysis Of Ordinary Least Squares Versus Equal Weights, James Brewer Ayres Oct 2020

Masters Theses & Specialist Projects

Equal weights are an alternative weighting procedure to the optimal weights offered by ordinary least squares regression analysis. Also called unit weights, equal weights are formed by standardizing scores on the predictor variables and averaging these standardized scores to create a composite score. Research is limited regarding the conditions under which equal weights result in cross-validated R² values that meet or exceed optimal weights. In this study, I explored the effect of various predictor-criterion correlations, predictor intercorrelations, and sample sizes to determine the relative performance of equal and optimal weighting schemes upon cross-validation. Results indicated that optimally weighted predictors explained …
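
The procedure described here (standardize each predictor, average the standardized scores, then compare cross-validated R² against least-squares weights) can be sketched in a few lines. This is a minimal illustration on simulated data, not the thesis's Monte Carlo design: the sample sizes, the true coefficients, and all function names are assumptions made for the example, and the composite assumes every predictor is positively related to the criterion.

```python
import numpy as np

def standardize(X):
    """Convert each column to z-scores (mean 0, sample SD 1)."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def equal_weights_composite(X):
    """Equal (unit) weights: average the standardized predictor scores."""
    return standardize(X).mean(axis=1)

def ols_predictions(X_train, y_train, X_test):
    """Estimate least-squares weights on one sample and apply them to another."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    beta, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ beta

def squared_correlation(a, b):
    """Squared Pearson correlation, used here as the cross-validated R²."""
    return np.corrcoef(a, b)[0, 1] ** 2

# Hypothetical simulation: small calibration sample, large holdout sample.
rng = np.random.default_rng(42)
n_train, n_test, p = 30, 1000, 4
X_train = rng.normal(size=(n_train, p))
X_test = rng.normal(size=(n_test, p))
true_beta = np.ones(p)
y_train = X_train @ true_beta + rng.normal(size=n_train)
y_test = X_test @ true_beta + rng.normal(size=n_test)

r2_equal = squared_correlation(equal_weights_composite(X_test), y_test)
r2_ols = squared_correlation(ols_predictions(X_train, y_train, X_test), y_test)
print(f"equal weights R² = {r2_equal:.3f}, OLS weights R² = {r2_ols:.3f}")
```

Equal weights need no calibration sample, so only the OLS weights carry estimation error into the holdout sample. Which scheme comes out ahead depends on conditions like those the abstract lists.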


Linear Methods For Regression With Small Sample Sizes Relative To The Number Of Variables., Rajesh Sikder Aug 2020

Electronic Theses and Dissertations

In data sets where there are a small number of observations but a large number of variables observed for each observation, ordinary least squares estimation cannot be used for regression models. There are many alternatives, including stepwise regression, penalized methods such as ridge regression and the LASSO, and methods based on derived inputs such as principal components regression and partial least squares regression. In this thesis, these five methods are described. K-fold cross validation is also discussed as a way of determining regularization parameters for each method. The performance of these methods in estimation and prediction is also examined through …
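
To make the idea of choosing a regularization parameter by K-fold cross validation concrete, here is a minimal sketch for one of the listed methods, ridge regression, in a setting with more variables than observations. The closed-form ridge solution is standard. The data, the candidate penalty grid, and the function names are hypothetical, and nothing here reproduces the thesis's own simulations.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centered data; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    # (Xc'Xc + lam*I) is invertible for lam > 0, even when p > n.
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return intercept, coef

def kfold_cv_mse(X, y, lam, k=5, seed=0):
    """Average held-out mean squared error over k folds for one penalty value."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        b0, b = ridge_fit(X[train], y[train], lam)
        pred = b0 + X[fold] @ b
        errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errors))

# Hypothetical data with more variables (p = 100) than observations (n = 40).
rng = np.random.default_rng(1)
n, p = 40, 100
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0
y = X @ beta + rng.normal(scale=0.5, size=n)

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]  # illustrative grid of penalties
cv_errors = {lam: kfold_cv_mse(X, y, lam) for lam in lambdas}
best_lam = min(cv_errors, key=cv_errors.get)
print("CV error by penalty:", cv_errors)
print("selected penalty:", best_lam)
```

The same loop applies to the other methods by swapping `ridge_fit` for a different fitting routine and tuning that method's own parameter, for example the number of components in principal components regression.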


Using Stability To Select A Shrinkage Method, Dean Dustin May 2020

Department of Statistics: Dissertations, Theses, and Student Work

Shrinkage methods are estimation techniques based on optimizing expressions to find which variables to include in an analysis, typically a linear regression. The general form of these expressions is the sum of an empirical risk plus a complexity penalty based on the number of parameters. Many shrinkage methods are known to satisfy an ‘oracle’ property, meaning that asymptotically they select the correct variables and estimate their coefficients efficiently. In Section 1.2, we show oracle properties in two general settings. The first uses a log likelihood in place of the empirical risk and allows a general class of penalties. The second …
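
The "empirical risk plus complexity penalty" form can be written out as follows. The notation is an assumption made for illustration and is not taken from the dissertation: L is a loss function, P a penalty, and λ ≥ 0 a tuning parameter. Ridge regression and the LASSO are shown as two familiar special cases with squared-error loss.

```latex
\hat{\beta}_{\lambda}
  = \arg\min_{\beta \in \mathbb{R}^{p}}
    \left\{ \frac{1}{n} \sum_{i=1}^{n} L\!\left(y_i,\, x_i^{\top}\beta\right)
            + \lambda\, P(\beta) \right\},
\qquad
P_{\text{ridge}}(\beta) = \sum_{j=1}^{p} \beta_j^{2},
\qquad
P_{\text{LASSO}}(\beta) = \sum_{j=1}^{p} \lvert \beta_j \rvert .
```

Replacing the averaged loss with a negative log likelihood gives the kind of likelihood-based setting the abstract mentions.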


Introduction To Research Statistical Analysis: An Overview Of The Basics, Christian Vandever Apr 2020

HCA Healthcare Journal of Medicine

This article covers many statistical ideas essential to research statistical analysis. Sample size is explained through the concepts of statistical significance level and power. Variable types and definitions are included to clarify necessities for how the analysis will be interpreted. Categorical and quantitative variable types are defined, as well as response and predictor variables. Statistical tests described include t-tests, ANOVA and chi-square tests. Multiple regression is also explored for both logistic and linear regression. Finally, the most common statistics produced by these methods are explored.


An Exploration Of Link Functions Used In Ordinal Regression, Thomas J. Smith, David A. Walker, Cornelius M. Mckenna Apr 2020

Journal of Modern Applied Statistical Methods

The purpose of this study is to examine issues involved with the choice of a link function in generalized linear models with ordinal outcomes. Issues including distributional appropriateness, link specificity, and palindromic invariance are discussed, and an exemplar analysis is provided using the Pew Research Center 25th anniversary of the Web Omnibus Survey data. Simulated data are used to compare the relative palindromic invariance of four distinct indices of determination/discrimination, including a newly proposed index by Smith et al. (2017).


Evaluation Of Relationship Between Lead-Dust Loading, Lead-Dust Concentration, And Total Dust Loading Metrics Across Multiple Data Sets, Charles Bevington Dec 2019

Capstone Experience

Lead-dust monitoring studies report values as either lead-dust loadings (µg/ft²) or as lead-dust concentrations (µg/g). It is rare for studies to report both metrics. When only lead-dust loading values are present, professionals require an approach to estimate lead-dust concentration values. A literature search identified five studies that contained raw data for both lead-dust loading and lead-dust concentration. An additional thirty-two studies had summary statistics available for both lead-dust loading and lead-dust concentration. Studies with raw data were used to develop an empirically based loading-to-concentration statistical relationship. Raw data sets were critically evaluated to determine whether elimination or …


Joint Asymptotics For Smoothing Spline Semiparametric Nonlinear Models, Jiahui Yu Oct 2019

Doctoral Dissertations

We study the joint asymptotics of general smoothing spline semiparametric models in the settings of density estimation and regression. We provide a systematic framework which incorporates many existing models as special cases, and further allows for nonlinear relationships between the finite-dimensional Euclidean parameter and the infinite-dimensional functional parameter. For both density estimation and regression, we establish the local existence and uniqueness of the penalized likelihood estimators for our proposed models. In the density estimation setting, we prove joint consistency and obtain the rates of convergence of the joint estimator in an appropriate norm. The convergence rate of the parametric component …


Semi-Supervised Regression With Generative Adversarial Networks Using Minimal Labeled Data, Greg Olmschenk Sep 2019

Dissertations, Theses, and Capstone Projects

This work studies the generalization of semi-supervised generative adversarial networks (GANs) to regression tasks. A novel feature layer contrasting optimization function, in conjunction with a feature matching optimization, allows the adversarial network to learn from unannotated data and thereby reduce the number of labels required to train a predictive network. An analysis of simulated training conditions is performed to explore the capabilities and limitations of the method. In concert with the semi-supervised regression GANs, an improved label topology and upsampling technique for multi-target regression tasks are shown to reduce data requirements. Improvements are demonstrated on a wide variety of vision …


Successful Shot Locations And Shot Types Used In NCAA Men’s Division I Basketball, Olivia D. Perrin Aug 2019

All NMU Master's Theses

The primary purpose of the current study was to investigate the effect of court location (distance and angle from basket) and shot types used on shot success in NCAA Men’s DI basketball during the 2017-18 season. A secondary purpose was to further expand the analysis based on two additional factors: player position (guard, forward, or center) and team ranking. All statistical analyses were completed in RStudio, and three binomial logistic regression analyses were performed to evaluate factors that influence shot success: one for all two- and three-point shot attempts, one for only two-point attempts, and one for only …


Visualization And Machine Learning Techniques For NASA’s EM-1 Big Data Problem, Antonio P. Garza III, Jose Quinonez, Misael Santana, Nibhrat Lohia May 2019

SMU Data Science Review

In this paper, we help NASA solve three Exploration Mission-1 (EM-1) challenges: data storage, computation time, and visualization of complex data. NASA is studying one year of trajectory data to determine available launch opportunities (about 90 TB of data). We improve data storage by introducing a cloud-based solution that provides elasticity and server upgrades. This migration will save $120k in infrastructure costs every four years, and potentially avoid schedule slips. Additionally, it increases computational efficiency by 125%. We further enhance computation via machine learning techniques that use the classic orbital elements to predict valid trajectories. Our machine learning model decreases trajectory …


A Self-Contained Course In The Mathematical Theory Of Statistics For Scientists & Engineers With An Emphasis On Predictive Regression Modeling & Financial Applications., Tim Smith Apr 2019

Timothy Smith

Preface & Acknowledgments

This textbook is designed for a higher-level undergraduate, perhaps even first-year graduate, course for engineering or science students who are interested in gaining knowledge of using data analysis to make predictive models. While there is no statistical prerequisite knowledge required to read this book, due to the fact that the study is designed for the reader to truly understand the underlying theory rather than just learn how to read computer output, it would be best read with some familiarity with elementary statistics. The book is self-contained and the only true prerequisite knowledge is a solid …