Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Articles 1 - 21 of 21

Full-Text Articles in Entire DC Network

The Short-Term Effects Of Fine Airborne Particulate Matter And Climate On Covid-19 Disease Dynamics, El Hussain Shamsa, Kezhong Zhang Jun 2022

Medical Student Research Symposium

Background: Despite more than 60% of the United States population being fully vaccinated, COVID-19 cases continue to spike in a temporal pattern. These patterns in COVID-19 incidence and mortality may be linked to short-term changes in environmental factors.

Methods: Nationwide, county-wise measurements for COVID-19 cases and deaths, fine airborne particulate matter (PM2.5), and maximum temperature were obtained from March 20, 2020 to March 20, 2021. Multivariate Linear Regression was used to analyze the association between environmental factors and COVID-19 incidence and mortality rates in each season. Negative Binomial Regression was used to analyze daily fluctuations of COVID-19 cases …


Aggregating Twitter Text Through Generalized Linear Regression Models For Tweet Popularity Prediction And Automatic Topic Classification, Chen Mo, Jingjing Yin, Isaac Chun-Hai Fung, Zion Tse Nov 2021

Department of Biostatistics, Epidemiology, and Environmental Health Sciences Faculty Publications

Social media platforms have become accessible resources for health data analysis. However, the advanced computational techniques involved in big data text mining and analysis are challenging for public health data analysts to apply. This study proposes and explores the feasibility of a novel yet straightforward method by regressing the outcome of interest on the aggregated influence scores for association and/or classification analyses based on generalized linear models. The method reduces the document term matrix by transforming text data into a continuous summary score, thereby reducing the data dimension substantially and easing the data sparsity issue of the term matrix. To …
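The aggregation idea this abstract describes can be sketched in a few lines. Everything below is an illustrative assumption, not the authors' implementation: the influence scores are simply each term's correlation with the outcome, estimated on the same simulated data, and a single OLS fit stands in for the generalized linear model.

```python
import numpy as np

# Hypothetical sketch: collapse a document-term matrix X (docs x terms)
# into one continuous score per document by weighting each term with an
# "influence score" w, then regress the outcome on that single score.
rng = np.random.default_rng(0)
n_docs, n_terms = 200, 50
X = rng.poisson(1.0, size=(n_docs, n_terms)).astype(float)  # term counts

# Simulated outcome; the true term weights are unknown in practice
true_w = rng.normal(0, 1, n_terms)
y = X @ true_w + rng.normal(0, 1, n_docs)

# Assumed influence scores: each term's correlation with the outcome
# (in practice they might come from a held-out corpus or a lexicon)
w = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_terms)])

score = X @ w                                   # one number per document
# Ordinary least squares of y on the single aggregated score
A = np.column_stack([np.ones(n_docs), score])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ beta
r2 = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(r2, 3))  # the design shrank from 50 columns to 1
```

The dimension reduction is the point: the sparse 50-column term matrix becomes a single continuous summary score before any model is fit.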


A Statistical Learning Regression Model Utilized To Determine Predictive Factors Of Social Distancing During Covid-19 Pandemic, Timothy A. Smith, Albert J. Boquet, Matthew V. Chin Nov 2020

Publications

In an application of the mathematical theory of statistics, predictive regression modelling can be used to determine whether there is a trend that predicts the response variable of social distancing in terms of multiple input “predictor” variables. In this study social distancing is measured as the percentage reduction in average mobility from GPS records, and the mathematical results obtained are interpreted to determine what factors drive that response. This study was done on county-level data from the state of Florida during the COVID-19 pandemic, and it is found that the most deterministic predictors are county population density …


A Monte Carlo Analysis Of Ordinary Least Squares Versus Equal Weights, James Brewer Ayres Oct 2020

Masters Theses & Specialist Projects

Equal weights are an alternative weighting procedure to the optimal weights offered by ordinary least squares regression analysis. Also called unit weights, equal weights are formed by standardizing scores on the predictor variables and averaging these standardized scores to create a composite score. Research is limited regarding the conditions under which equal weights result in cross-validated R² values that meet or exceed those of optimal weights. In this study, I explored the effect of various predictor-criterion correlations, predictor intercorrelations, and sample sizes to determine the relative performance of equal and optimal weighting schemes upon cross-validation. Results indicated that optimally weighted predictors explained …
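The two weighting schemes the abstract compares can be illustrated on simulated data. This is a sketch under assumed data, not the thesis's simulation design: three predictors, a fixed coefficient vector, and in-sample fit only (the study's focus is cross-validated performance).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 3
X = rng.normal(size=(n, p))
y = X @ np.array([0.5, 0.4, 0.3]) + rng.normal(size=n)

# Optimal weights: ordinary least squares on all predictors
A = np.column_stack([np.ones(n), X])
b_ols, *_ = np.linalg.lstsq(A, y, rcond=None)

# Equal (unit) weights: standardize each predictor, average into a composite
Z = (X - X.mean(axis=0)) / X.std(axis=0)
composite = Z.mean(axis=1)

def r2(pred, obs):
    return 1 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)

r2_ols = r2(A @ b_ols, y)
# Regress y on the composite so the two R^2 values are comparable
C = np.column_stack([np.ones(n), composite])
b_eq, *_ = np.linalg.lstsq(C, y, rcond=None)
r2_eq = r2(C @ b_eq, y)
print(r2_ols >= r2_eq)  # in-sample, OLS can never do worse
```

In-sample, OLS fits at least as well by construction; the interesting question the thesis studies is which scheme holds up better on new data, where optimal weights can overfit.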


Using Stability To Select A Shrinkage Method, Dean Dustin May 2020

Department of Statistics: Dissertations, Theses, and Student Work

Shrinkage methods are estimation techniques based on optimizing expressions to find which variables to include in an analysis, typically a linear regression. The general form of these expressions is the sum of an empirical risk plus a complexity penalty based on the number of parameters. Many shrinkage methods are known to satisfy an ‘oracle’ property meaning that asymptotically they select the correct variables and estimate their coefficients efficiently. In Section 1.2, we show oracle properties in two general settings. The first uses a log likelihood in place of the empirical risk and allows a general class of penalties. The second …
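The generic objective the abstract describes, an empirical risk plus a complexity penalty based on the number of parameters, can be made concrete with a small all-subsets search. The data, the squared-error risk, and the penalty weight lam are illustrative assumptions, not the dissertation's setup.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n, p = 100, 5
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)  # only vars 0, 1 matter

def rss(cols):
    # Empirical risk (residual sum of squares) for a candidate subset
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.sum((y - A @ b) ** 2)

lam = 2.0  # assumed penalty per parameter (AIC-like)
best = min(
    (subset for k in range(p + 1) for subset in combinations(range(p), k)),
    key=lambda s: rss(s) + lam * len(s),
)
print(sorted(best))  # with strong signals, often recovers the support {0, 1}
```

The oracle property the abstract refers to says that, asymptotically, such a criterion selects exactly the true variables and estimates their coefficients as efficiently as if the true model were known in advance.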


Best Probable Subset: A New Method For Reducing Data Dimensionality In Linear Regression, Elieser Nodarse Apr 2019

FIU Electronic Theses and Dissertations

Regression is a statistical technique for modeling the relationship between a dependent variable Y and two or more predictor variables, also known as regressors. In the broad field of regression, there exists a special case in which the relationship between the dependent variable and the regressor(s) is linear. This is known as linear regression.

The purpose of this paper is to create a useful method that effectively selects a subset of regressors when dealing with high dimensional data and/or collinearity in linear regression. As the name suggests, high dimensional data occurs when the number of predictor variables is far …


Sabermetrics - Statistical Modeling Of Run Creation And Prevention In Baseball, Parker Chernoff Mar 2018

FIU Electronic Theses and Dissertations

The focus of this thesis was to investigate which baseball metrics are most conducive to run creation and prevention. Stepwise regression and Liu estimation were used to formulate two models for the dependent variables and also used for cross validation. Finally, the predicted values were fed into the Pythagorean Expectation formula to predict a team’s most important goal: winning.

Each model fit strongly and collinearity amongst offensive predictors was considered using variance inflation factors. Hits, walks, and home runs allowed, infield putouts, errors, defense-independent earned run average ratio, defensive efficiency ratio, saves, runners left on base, shutouts, and walks per …
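The Pythagorean Expectation formula the abstract feeds its predictions into is a well-known sabermetric identity (due to Bill James): expected winning percentage from runs scored and runs allowed. The function below uses the classic exponent of 2; the example team totals are made up.

```python
# Pythagorean Expectation: expected win percentage from runs scored (rs)
# and runs allowed (ra), with the classic exponent of 2
def pythagorean_win_pct(rs, ra, exponent=2):
    return rs ** exponent / (rs ** exponent + ra ** exponent)

# Hypothetical team scoring 800 runs and allowing 700
print(round(pythagorean_win_pct(800, 700), 3))  # ≈ 0.566
```

A team with equal runs scored and allowed is predicted to play .500 ball, and the formula is symmetric in that sense; variants tune the exponent (e.g., around 1.83 for MLB data), which is why it is left as a parameter here.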


Trend And Acceleration: A Multi-Model Approach To Key West Sea Level Rise, John Tenenholtz Nov 2017

FIU Electronic Theses and Dissertations

Sea level rise (SLR) varies depending on location. It is therefore important to local residents, businesses and government to analyze SLR locally. Further, because of increasing ice melt and other effects of climate change, rates of SLR may change. It is therefore also important to evaluate rates of change of SLR, which we call sea level acceleration (SLA) or deceleration.

The present thesis will review the annual average sea level data compiled at the Key West tidal gauge in Key West, Florida. We use a multi-model approach that compares the results of various models on that data set. The goal …


Predicting Successful Long-Term Weight Loss From Short-Term Weight-Loss Outcomes: New Insights From A Dynamic Energy Balance Model (The Pounds Lost Study), Diana Thomas, W Andrada Ivanescu, Corby K. Martin, Steven B. Heymsfield, Kaitlyn Marshall, Victoria E. Bodrato, Donald Williamson, Stephen Anton, Frank M. Sacks, Donna Ryan, George A. Bray Mar 2015

Department of Applied Mathematics and Statistics Faculty Scholarship and Creative Works

Background: Currently, early weight-loss predictions of long-term weight-loss success rely on fixed percent-weight-loss thresholds.

Objective: The objective was to develop thresholds during the first 3 mo of intervention that include the influence of age, sex, baseline weight, percent weight loss, and deviations from expected weight to predict whether a participant is likely to lose 5% or more body weight by year 1.

Design: Data consisting of month 1, 2, 3, and 12 treatment weights were obtained from the 2-y Preventing Obesity Using Novel Dietary Strategies (POUNDS Lost) intervention. Logistic regression models that included covariates of age, height, sex, baseline weight, …


On The Causal Interpretation Of Race In Regressions Adjusting For Confounding And Mediating Variables, Tyler J. Vanderweele, Whitney Robinson Nov 2013

Harvard University Biostatistics Working Paper Series

We consider different possible interpretations of the “effect of race” when regressions are run with race as an exposure variable, controlling also for various confounding and mediating variables. When adjustment is made for socioeconomic status early in a person's life, we discuss under what contexts the regression coefficients for race can be interpreted as corresponding to the extent to which a racial disparity would remain if various socioeconomic distributions early in life across racial groups could be equalized. When adjustment is also made for adult socioeconomic status, we note how the overall disparity can be decomposed into the portion that …


The Em Algorithm For Group Testing Regression Models Under Matrix Pooling, Christopher R. Bilder, Boan Zhang Oct 2009

Department of Statistics: Faculty Publications

No abstract provided.


Using Regression Models To Analyze Randomized Trials: Asymptotically Valid Hypothesis Tests Despite Incorrectly Specified Models, Michael Rosenblum, Mark J. Van Der Laan Jan 2008

U.C. Berkeley Division of Biostatistics Working Paper Series

Regression models are often used to test for cause-effect relationships from data collected in randomized trials or experiments. This practice has deservedly come under heavy scrutiny, since commonly used models such as linear and logistic regression will often not capture the actual relationships between variables, and incorrectly specified models potentially lead to incorrect conclusions. In this paper, we focus on hypothesis tests of whether the treatment given in a randomized trial has any effect on the mean of the primary outcome, within strata of baseline variables such as age, sex, and health status. Our primary concern is ensuring that such …


On Time Series Analysis Of Public Health And Biomedical Data, Scott L. Zeger, Rafael A. Irizarry, Roger D. Peng Sep 2004

Johns Hopkins University, Dept. of Biostatistics Working Papers

A time series is a sequence of observations made over time. Examples in public health include daily ozone concentrations, weekly admissions to an emergency department or annual expenditures on health care in the United States. Time series models are used to describe the dependence of the response at each time on predictor variables including covariates and possibly previous values in the series. Time series methods are necessary to account for the correlation among repeated responses over time. This paper gives an overview of time series ideas and methods used in public health research.
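The core point of the abstract, that repeated responses over time are correlated and a model must account for that dependence, can be illustrated with an AR(1) process, a standard building block of time series models. The series and the autoregressive parameter phi below are simulated assumptions, not data from the paper.

```python
import numpy as np

# Simulate an AR(1) series: each value depends on the previous one
rng = np.random.default_rng(3)
phi, n = 0.7, 5000
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal()

# Sample lag-1 autocorrelation: adjacent observations are clearly dependent
r1 = np.corrcoef(y[:-1], y[1:])[0, 1]
print(round(r1, 2))  # close to phi = 0.7
```

A regression that treats these 5,000 points as independent would badly overstate its effective sample size; time series methods exist precisely to correct inference for this serial correlation.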


The Cross-Validated Adaptive Epsilon-Net Estimator, Mark J. Van Der Laan, Sandrine Dudoit, Aad W. Van Der Vaart Feb 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Suppose that we observe a sample of independent and identically distributed realizations of a random variable. Assume that the parameter of interest can be defined as the minimizer, over a suitably defined parameter space, of the expectation (with respect to the distribution of the random variable) of a particular (loss) function of a candidate parameter value and the random variable. Examples of commonly used loss functions are the squared error loss function in regression and the negative log-density loss function in density estimation. Minimizing the empirical risk (i.e., the empirical mean of the loss function) over the entire parameter space …


To Model Or Not To Model? Competing Modes Of Inference For Finite Population Sampling, Rod Little Nov 2003

The University of Michigan Department of Biostatistics Working Paper Series

Finite population sampling is perhaps the only area of statistics where the primary mode of analysis is based on the randomization distribution, rather than on statistical models for the measured variables. This article reviews the debate between design and model-based inference. The basic features of the two approaches are illustrated using the case of inference about the mean from stratified random samples. Strengths and weaknesses of design-based and model-based inference for surveys are discussed. It is suggested that models that take into account the sample design and make weak parametric assumptions can produce reliable and efficient inferences in survey settings. …
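The stratified-sample mean the article uses as its illustration is a weighted average of stratum sample means, with each weight equal to that stratum's share of the population. The stratum sizes and means below are hypothetical numbers, chosen only to show the arithmetic.

```python
import numpy as np

# Design-based estimator of the population mean from a stratified sample:
# ybar_st = sum over strata h of W_h * ybar_h, with W_h = N_h / N
pop_sizes = np.array([600, 300, 100])         # stratum sizes N_h
stratum_means = np.array([10.0, 20.0, 50.0])  # sample means ybar_h

W = pop_sizes / pop_sizes.sum()               # stratum weights W_h
ybar_st = np.sum(W * stratum_means)
print(ybar_st)  # 0.6*10 + 0.3*20 + 0.1*50 = 17.0
```

The design-based view justifies this estimator by the randomization alone; the model-based view would instead posit a model for the measurements within strata and derive the same weighting as a consequence.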


Asymptotics Of Cross-Validated Risk Estimation In Estimator Selection And Performance Assessment, Sandrine Dudoit, Mark J. Van Der Laan Feb 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

Risk estimation is an important statistical question for the purposes of selecting a good estimator (i.e., model selection) and assessing its performance (i.e., estimating generalization error). This article introduces a general framework for cross-validation and derives distributional properties of cross-validated risk estimators in the context of estimator selection and performance assessment. Arbitrary classes of estimators are considered, including density estimators and predictors for both continuous and polychotomous outcomes. Results are provided for general full data loss functions (e.g., absolute and squared error, indicator, negative log density). A broad definition of cross-validation is used in order to cover leave-one-out cross-validation, V-fold …
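V-fold cross-validation, one of the schemes covered by the abstract's broad definition, can be sketched end to end. The estimator here is deliberately trivial (the sample mean under squared-error loss), and the data and choice of V are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=100)
V = 5
folds = np.array_split(rng.permutation(100), V)  # random index partition

risks = []
for v in range(V):
    held_out = folds[v]
    train = np.concatenate([folds[u] for u in range(V) if u != v])
    estimate = x[train].mean()                   # "fit" on training folds
    # Validation loss: mean squared error on the held-out fold
    risks.append(np.mean((x[held_out] - estimate) ** 2))

cv_risk = np.mean(risks)
print(round(cv_risk, 2))  # near Var(x) = 1 for this simple estimator
```

The paper's results characterize the distribution of exactly this kind of averaged held-out risk, uniformly over candidate estimators, which is what licenses using it to pick the best one.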


Partial Auc Estimation And Regression, Lori E. Dodd, Margaret S. Pepe Jan 2003

UW Biostatistics Working Paper Series

Accurate disease diagnosis is critical for health care. New diagnostic and screening tests must be evaluated for their abilities to discriminate disease from non-diseased states. The partial area under the ROC curve (partial AUC) is a measure of diagnostic test accuracy. We present an interpretation of the partial AUC that gives rise to a new non-parametric estimator. This estimator is more robust than existing estimators, which make parametric assumptions. We show that the robustness is gained with only a moderate loss in efficiency. We describe a regression modelling framework for making inference about covariate effects on the partial AUC. Such …
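An empirical version of the partial AUC, the area under the ROC curve restricted to a clinically relevant false-positive range, can be computed nonparametrically from test scores. The simulated scores and the FPR cutoff of 0.2 below are illustrative assumptions, not the estimator proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(6)
diseased = rng.normal(1.5, 1.0, 500)  # diseased subjects score higher
healthy = rng.normal(0.0, 1.0, 500)

# Empirical ROC: sweep thresholds from high to low
thresholds = np.sort(np.concatenate([diseased, healthy]))[::-1]
tpr = np.array([(diseased >= t).mean() for t in thresholds])
fpr = np.array([(healthy >= t).mean() for t in thresholds])

# Restrict to FPR <= 0.2 and integrate by the trapezoidal rule
mask = fpr <= 0.2
f, s = fpr[mask], tpr[mask]
p_auc = np.sum(np.diff(f) * (s[1:] + s[:-1]) / 2)
print(round(p_auc, 3))
```

Restricting to low false-positive rates matters for screening tests, where only that region of the ROC curve is operationally relevant; the maximum possible partial AUC over FPR in [0, 0.2] is 0.2, so values are often rescaled against that bound.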


Locally Efficient Estimation Of Regression Parameters Using Current Status Data, Chris Andrews, Mark J. Van Der Laan, James M. Robins Sep 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

In biostatistics applications interest often focuses on the estimation of the distribution of a time-variable T. If one only observes whether or not T exceeds an observed monitoring time C, then the data structure is called current status data, also known as interval censored data, case I. We consider this data structure extended to allow the presence of both time-independent covariates and time-dependent covariate processes that are observed until the monitoring time. We assume that the monitoring process satisfies coarsening at random.

Our goal is to estimate the regression parameter beta of the regression model T = Z*beta+epsilon where the …


Bivariate Current Status Data, Mark J. Van Der Laan, Nicholas P. Jewell Sep 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

In many applications, it is often of interest to estimate a bivariate distribution of two survival random variables. Complete observation of such random variables is often unavailable. If one only observes whether or not each of the individual survival times exceeds a common observed monitoring time C, then the data structure is referred to as bivariate current status data (Wang and Ding, 2000). For such data, we show that the identifiable part of the joint distribution is represented by three univariate cumulative distribution functions, namely the two marginal cumulative distribution functions, and the bivariate cumulative distribution function evaluated on the …


Self-Consistency: A Fundamental Concept In Statistics, Thaddeus Tarpey, Bernard Flury Aug 1996

Mathematics and Statistics Faculty Publications

The term “self-consistency” was introduced in 1989 by Hastie and Stuetzle to describe the property that each point on a smooth curve or surface is the mean of all points that project orthogonally onto it. We generalize this concept to self-consistent random vectors: a random vector Y is self-consistent for X if E[X|Y] = Y almost surely. This allows us to construct a unified theoretical basis for principal components, principal curves and surfaces, principal points, principal variables, principal modes of variation and other statistical methods. We provide some general results on self-consistent random variables, give …
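The defining property E[X|Y] = Y can be checked by Monte Carlo in the simplest nontrivial case: for standard normal X, the two-point summary Y = sign(X)·E[|X|] (the two principal points ±√(2/π)) is self-consistent for X. The simulation below is an illustration of the definition, not an example from the paper.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=200_000)

c = np.sqrt(2 / np.pi)  # E[|X|] for a standard normal
y = np.sign(x) * c      # Y takes the two values +c and -c

# Check E[X | Y] = Y on each value of Y
e_pos = x[y > 0].mean()  # E[X | Y = +c], should be close to +c
e_neg = x[y < 0].mean()  # E[X | Y = -c], should be close to -c
print(round(e_pos, 2), round(e_neg, 2))  # approximately 0.8 and -0.8
```

Conditioning on Y = +c is the same as conditioning on X > 0, and the mean of a half-normal is exactly √(2/π) ≈ 0.798, so the property holds by construction; the Monte Carlo just makes it visible.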


Generating Unbiased Ratio And Regression Estimators, William (Bill) H. Williams Jun 1991

Publications and Research

Standard ratio and regression estimators are only conditionally unbiased. The paper uses split-sample techniques to develop unbiased versions.