Full-Text Articles in Entire DC Network
Defensive Impact Wins: Developing A New Method To Rate Individual Defense In NBA Games, Dylan J. Stiles
Honors Theses and Capstones
With the analytics revolution in sports in the past 20 years, it seems that everything that can be quantified is. In basketball, though, trying to break the game down into a set of numbers comes with a unique problem. While we've come up with a good set of advanced numbers to measure offensive efficiency, defense is fundamentally harder to quantify. The game is played five on five, but it has often been popular or convenient to model defense as a set of five one-on-one games. As defenses became more complex into the 2010s, this methodology became less meaningful. …
The Short-Term Effects Of Fine Airborne Particulate Matter And Climate On COVID-19 Disease Dynamics, El Hussain Shamsa, Kezhong Zhang
Medical Student Research Symposium
Background: Despite more than 60% of the United States population being fully vaccinated, COVID-19 cases continue to spike in a temporal pattern. These patterns in COVID-19 incidence and mortality may be linked to short-term changes in environmental factors.
Methods: Nationwide, county-wise measurements for COVID-19 cases and deaths, fine airborne particulate matter (PM2.5), and maximum temperature were obtained from March 20, 2020 to March 20, 2021. Multivariate Linear Regression was used to analyze the association between environmental factors and COVID-19 incidence and mortality rates in each season. Negative Binomial Regression was used to analyze daily fluctuations of COVID-19 cases …
Assessing The Influence Of Health Policy And Population Mobility On COVID-19 Spread In Arkansas, Tayden Barretto
Industrial Engineering Undergraduate Honors Theses
The outbreak of COVID-19 has created a major crisis across the world since its start in 2019, and its influence on every realm of society is undeniable. Globally, more than 500 million cases have been recorded since March 2020, with almost 6 million deaths. In the wake of this crisis, many governments and health organizations have taken steps and precautions to mitigate its spread. These steps involve public mandates of information, reducing frequency of personal contact, and use of masks to minimize the risk of transmission. Current access to mobility data released from Google detailing population movements has provided a …
Analysis Of Minor League Rule Changes Effect On Stolen Bases, Zachary Houghtaling
Williams Honors College, Honors Research Projects
This study uses various statistical analyses to evaluate the justification of rule changes for Major League Baseball that were implemented within the Minor Leagues during the 2021 minor league season. The primary focus of the study is predicting how some of these Minor League rule changes could affect the stolen base success rate and the number of attempts per game within the Major Leagues. A survey was conducted to evaluate how fans feel about stolen bases within the current game and if rules should be altered to increase the number of stolen bases that occur. Additionally, recorded Major and Minor …
Statistical Theory For Specialized Linear Regression Adjustment Methods Compared To Multiple Linear Regression In The Presence And Absence Of Interaction Effects, Leon Su
Theses and Dissertations--Statistics
When building models to investigate outcomes and variables of interest, researchers often want to adjust for other variables. There is a variety of ways that these adjustments are performed. In this work, we will consider four approaches to adjustment utilized by researchers in various fields. We will compare the efficacy of these methods to what we call the “true model method”, fitting a multiple linear regression model in which adjustment variables are model covariates. Our goal is to show that these adjustment methods have inferior performance to the true model method by comparing model parameter estimates, power, type I error, …
An Evaluation Of Knot Placement Strategies For Spline Regression, William Klein
CMC Senior Theses
Regression splines have an established value for producing quality fit at a relatively low-degree polynomial. This paper explores the implications of adopting new methods for knot selection in tandem with established methodology from the current literature. Structural features of generated datasets, as well as residuals collected from sequential iterative models are used to augment the equidistant knot selection process. From analyzing a simulated dataset and an application to the Racial Animus dataset, I find that a B-spline basis paired with equally-spaced knots remains the best choice when data are evenly distributed, even when structural features of a dataset are known …
Introduction To Research Statistical Analysis: An Overview Of The Basics, Christian Vandever
HCA Healthcare Journal of Medicine
This article covers many statistical ideas essential to research statistical analysis. Sample size is explained through the concepts of statistical significance level and power. Variable types and definitions are included to clarify necessities for how the analysis will be interpreted. Categorical and quantitative variable types are defined, as well as response and predictor variables. Statistical tests described include t-tests, ANOVA and chi-square tests. Multiple regression is also explored for both logistic and linear regression. Finally, the most common statistics produced by these methods are explored.
Joint Asymptotics For Smoothing Spline Semiparametric Nonlinear Models, Jiahui Yu
Doctoral Dissertations
We study the joint asymptotics of general smoothing spline semiparametric models in the settings of density estimation and regression. We provide a systematic framework which incorporates many existing models as special cases, and further allows for nonlinear relationships between the finite-dimensional Euclidean parameter and the infinite-dimensional functional parameter. For both density estimation and regression, we establish the local existence and uniqueness of the penalized likelihood estimators for our proposed models. In the density estimation setting, we prove joint consistency and obtain the rates of convergence of the joint estimator in an appropriate norm. The convergence rate of the parametric component …
Visualization And Machine Learning Techniques For NASA’s EM-1 Big Data Problem, Antonio P. Garza III, Jose Quinonez, Misael Santana, Nibhrat Lohia
SMU Data Science Review
In this paper, we help NASA solve three Exploration Mission-1 (EM-1) challenges: data storage, computation time, and visualization of complex data. NASA is studying one year of trajectory data to determine available launch opportunities (about 90 TB of data). We improve data storage by introducing a cloud-based solution that provides elasticity and server upgrades. This migration will save $120k in infrastructure costs every four years, and potentially avoid schedule slips. Additionally, it increases computational efficiency by 125%. We further enhance computation via machine learning techniques that use the classic orbital elements to predict valid trajectories. Our machine learning model decreases trajectory …
Best Probable Subset: A New Method For Reducing Data Dimensionality In Linear Regression, Elieser Nodarse
FIU Electronic Theses and Dissertations
Regression is a statistical technique for modeling the relationship between a dependent variable Y and two or more predictor variables, also known as regressors. In the broad field of regression, there exists a special case in which the relationship between the dependent variable and the regressor(s) is linear. This is known as linear regression.
The purpose of this paper is to create a useful method that effectively selects a subset of regressors when dealing with high dimensional data and/or collinearity in linear regression. As the name suggests, high dimensional data occurs when the number of predictor variables is far …
Identifying Key Factors Associated With High Risk Asthma Patients To Reduce The Cost Of Health Resources Utilization, Amani Ahmad
LSU Master's Theses
Asthma is associated with frequent use of primary health services and places a burden on the United States economy. Identifying key factors associated with increased cost of asthma is an essential step to improve practices of asthma management.
The aim of this study was to identify factors associated with overutilization of primary health services and increased cost via claims data, and to explore the effectiveness of a case management program in reducing overall asthma-related cost.
Claims data analysis for Medicaid-insured asthma patients in Louisiana was conducted. Asthma patients were identified using their ICD-9 and ICD-10 codes, forward variable …
Yelp’s Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels
SMU Data Science Review
In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that reviews …
Sabermetrics - Statistical Modeling Of Run Creation And Prevention In Baseball, Parker Chernoff
FIU Electronic Theses and Dissertations
The focus of this thesis was to investigate which baseball metrics are most conducive to run creation and prevention. Stepwise regression and Liu estimation were used to formulate two models for the dependent variables and also used for cross validation. Finally, the predicted values were fed into the Pythagorean Expectation formula to predict a team’s most important goal: winning.
Each model fit strongly and collinearity amongst offensive predictors was considered using variance inflation factors. Hits, walks, and home runs allowed, infield putouts, errors, defense-independent earned run average ratio, defensive efficiency ratio, saves, runners left on base, shutouts, and walks per …
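As a rough illustration of the Pythagorean Expectation step mentioned in the abstract above, the sketch below computes an expected winning percentage from runs scored and runs allowed. It assumes the classic form with exponent 2; the thesis may use a different exponent, and the function name and season totals here are hypothetical.

```python
def pythagorean_expectation(runs_scored: float, runs_allowed: float,
                            exponent: float = 2.0) -> float:
    """Expected winning percentage from runs scored and runs allowed.

    Assumes the classic form RS^k / (RS^k + RA^k) with k = 2 by default.
    """
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    if rs + ra == 0:
        raise ValueError("at least one run total must be positive")
    return rs / (rs + ra)


# Hypothetical season totals, used only for illustration.
win_pct = pythagorean_expectation(runs_scored=800, runs_allowed=700)
expected_wins = win_pct * 162  # scaled to a 162-game regular season
```

In a workflow like the one the abstract describes, the two run totals would presumably be the predicted values from the run creation and run prevention models rather than observed totals.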
A New Diagnostic Test For Regression, Yun Shi
Electronic Thesis and Dissertation Repository
A new diagnostic test for regression and generalized linear models is discussed. The test is based on testing whether residuals that are close together in the linear space of one of the covariates are correlated. This is a generalization of the famous problem of spurious correlation in time series regression. A full model building approach for the case of regression was developed in Mahdi (2011, Ph.D. Thesis, Western University, “Diagnostic Checking, Time Series and Regression”) using an iterative generalized least squares algorithm. Simulation experiments were reported that demonstrate the validity and utility of this approach but no actual applications were …
Using Regression Models To Analyze Randomized Trials: Asymptotically Valid Hypothesis Tests Despite Incorrectly Specified Models, Michael Rosenblum, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Regression models are often used to test for cause-effect relationships from data collected in randomized trials or experiments. This practice has deservedly come under heavy scrutiny, since commonly used models such as linear and logistic regression will often not capture the actual relationships between variables, and incorrectly specified models potentially lead to incorrect conclusions. In this paper, we focus on hypothesis tests of whether the treatment given in a randomized trial has any effect on the mean of the primary outcome, within strata of baseline variables such as age, sex, and health status. Our primary concern is ensuring that such …
The Cross-Validated Adaptive Epsilon-Net Estimator, Mark J. Van Der Laan, Sandrine Dudoit, Aad W. Van Der Vaart
U.C. Berkeley Division of Biostatistics Working Paper Series
Suppose that we observe a sample of independent and identically distributed realizations of a random variable. Assume that the parameter of interest can be defined as the minimizer, over a suitably defined parameter space, of the expectation (with respect to the distribution of the random variable) of a particular (loss) function of a candidate parameter value and the random variable. Examples of commonly used loss functions are the squared error loss function in regression and the negative log-density loss function in density estimation. Minimizing the empirical risk (i.e., the empirical mean of the loss function) over the entire parameter space …
Asymptotics Of Cross-Validated Risk Estimation In Estimator Selection And Performance Assessment, Sandrine Dudoit, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Risk estimation is an important statistical question for the purposes of selecting a good estimator (i.e., model selection) and assessing its performance (i.e., estimating generalization error). This article introduces a general framework for cross-validation and derives distributional properties of cross-validated risk estimators in the context of estimator selection and performance assessment. Arbitrary classes of estimators are considered, including density estimators and predictors for both continuous and polychotomous outcomes. Results are provided for general full data loss functions (e.g., absolute and squared error, indicator, negative log density). A broad definition of cross-validation is used in order to cover leave-one-out cross-validation, V-fold …
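To make the idea of cross-validated risk for estimator selection concrete, here is a minimal sketch of V-fold cross-validation under squared error loss. It is not the framework developed in the paper: the two candidate estimators (sample mean and sample median as constant predictors), the toy data, and all function names are assumptions chosen for illustration.

```python
import random
import statistics


def squared_error(y, prediction):
    """Squared error loss for a single observation."""
    return (y - prediction) ** 2


def v_fold_cv_risk(data, fit, loss, v=5, seed=0):
    """Average held-out loss of the estimator `fit` across V folds.

    `fit` maps a training sample to a constant prediction; `loss` scores
    one held-out observation against that prediction.
    """
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::v] for k in range(v)]  # V disjoint validation sets
    total, count = 0.0, 0
    for fold in folds:
        held_out = set(fold)
        training = [data[i] for i in indices if i not in held_out]
        prediction = fit(training)  # estimator sees training data only
        for i in fold:
            total += loss(data[i], prediction)
            count += 1
    return total / count


# Toy outcomes with one outlier (hypothetical values).
outcomes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
candidates = {"mean": statistics.mean, "median": statistics.median}
risks = {name: v_fold_cv_risk(outcomes, fit, squared_error)
         for name, fit in candidates.items()}
best = min(risks, key=risks.get)  # candidate with the smallest CV risk
```

Each candidate is refit on the training portion of every split and scored only on observations it did not see, which is what lets the same quantity serve both for selecting an estimator and for assessing its performance.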
Locally Efficient Estimation Of Regression Parameters Using Current Status Data, Chris Andrews, Mark J. Van Der Laan, James M. Robins
U.C. Berkeley Division of Biostatistics Working Paper Series
In biostatistics applications interest often focuses on the estimation of the distribution of a time-variable T. If one only observes whether or not T exceeds an observed monitoring time C, then the data structure is called current status data, also known as interval censored data, case I. We consider this data structure extended to allow the presence of both time-independent covariates and time-dependent covariate processes that are observed until the monitoring time. We assume that the monitoring process satisfies coarsening at random.
Our goal is to estimate the regression parameter beta of the regression model T = Z*beta+epsilon where the …
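The current status data structure described above can be sketched with a small simulation: the event time T is never recorded, only the monitoring time C and whether the event had occurred by C. The distributions below (exponential T, uniform C drawn independently of T) and the function name are assumptions made purely for illustration, and covariates are omitted.

```python
import random


def simulate_current_status(n, seed=0):
    """Return n current status observations as (c, delta) pairs.

    delta = 1 if the latent event time T is at most the monitoring
    time C, and 0 if T exceeds C. T itself is discarded.
    """
    rng = random.Random(seed)
    observations = []
    for _ in range(n):
        t = rng.expovariate(1.0)     # latent event time T (assumed exponential)
        c = rng.uniform(0.0, 3.0)    # monitoring time C (assumed uniform)
        delta = 1 if t <= c else 0   # all that is learned about T
        observations.append((c, delta))
    return observations


observations = simulate_current_status(5)
```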
Bivariate Current Status Data, Mark J. Van Der Laan, Nicholas P. Jewell
U.C. Berkeley Division of Biostatistics Working Paper Series
In many applications, it is often of interest to estimate a bivariate distribution of two survival random variables. Observation of such random variables is often incomplete. If one only observes whether or not each of the individual survival times exceeds a common observed monitoring time C, then the data structure is referred to as bivariate current status data (Wang and Ding, 2000). For such data, we show that the identifiable part of the joint distribution is represented by three univariate cumulative distribution functions, namely the two marginal cumulative distribution functions, and the bivariate cumulative distribution function evaluated on the …