Open Access. Powered by Scholars. Published by Universities.®
Articles 1 - 21 of 21
Full-Text Articles in Entire DC Network
The Short-Term Effects Of Fine Airborne Particulate Matter And Climate On Covid-19 Disease Dynamics, El Hussain Shamsa, Kezhong Zhang
Medical Student Research Symposium
Background: Despite more than 60% of the United States population being fully vaccinated, COVID-19 cases continue to spike in a temporal pattern. These patterns in COVID-19 incidence and mortality may be linked to short-term changes in environmental factors.
Methods: Nationwide, county-wise measurements for COVID-19 cases and deaths, fine airborne particulate matter (PM2.5), and maximum temperature were obtained from March 20, 2020 to March 20, 2021. Multivariate Linear Regression was used to analyze the association between environmental factors and COVID-19 incidence and mortality rates in each season. Negative Binomial Regression was used to analyze daily fluctuations of COVID-19 cases …
Aggregating Twitter Text Through Generalized Linear Regression Models For Tweet Popularity Prediction And Automatic Topic Classification, Chen Mo, Jingjing Yin, Isaac Chun-Hai Fung, Zion Tse
Department of Biostatistics, Epidemiology, and Environmental Health Sciences Faculty Publications
Social media platforms have become accessible resources for health data analysis. However, the advanced computational techniques involved in big data text mining and analysis are challenging for public health data analysts to apply. This study proposes and explores the feasibility of a novel yet straightforward method by regressing the outcome of interest on the aggregated influence scores for association and/or classification analyses based on generalized linear models. The method reduces the document term matrix by transforming text data into a continuous summary score, thereby reducing the data dimension substantially and easing the data sparsity issue of the term matrix. To …
A Statistical Learning Regression Model Utilized To Determine Predictive Factors Of Social Distancing During Covid-19 Pandemic, Timothy A. Smith, Albert J. Boquet, Matthew V. Chin
Publications
In an application of the mathematical theory of statistics, predictive regression modelling can be used to determine if there is a trend to predict the response variable of social distancing in terms of multiple input "predictor" variables. In this study, social distancing is measured as the percentage reduction in average mobility from GPS records, and the mathematical results obtained are interpreted to determine what factors drive that response. This study was done on county-level data from the state of Florida during the COVID-19 pandemic, and it is found that the most deterministic predictors are county population density …
A Monte Carlo Analysis Of Ordinary Least Squares Versus Equal Weights, James Brewer Ayres
Masters Theses & Specialist Projects
Equal weights are an alternative weighting procedure to the optimal weights offered by ordinary least squares regression analysis. Also called unit weights, equal weights are formed by standardizing scores on the predictor variables and averaging these standardized scores to create a composite score. Research is limited regarding the conditions under which equal weights result in cross-validated R² values that meet or exceed optimal weights. In this study, I explored the effect of various predictor-criterion correlations, predictor intercorrelations, and sample sizes to determine the relative performance of equal and optimal weighting schemes upon cross-validation. Results indicated that optimally weighted predictors explained …
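The contrast the thesis studies can be sketched in a few lines of numpy. This is a minimal illustration on invented synthetic data, not the thesis's design: it fits OLS weights, builds a unit-weight composite of standardized predictors, and compares in-sample correlations with the criterion.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3

# Synthetic data: mildly intercorrelated predictors, linear criterion plus noise.
X = rng.normal(size=(n, p)) + 0.5 * rng.normal(size=(n, 1))
y = X @ np.array([0.5, 0.4, 0.3]) + rng.normal(scale=1.0, size=n)

# Optimal (OLS) weights via least squares, with an intercept column.
D = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(D, y, rcond=None)

# Equal (unit) weights: standardize each predictor, then average.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
composite = Z.mean(axis=1)

# In-sample correlation with the criterion for each weighting scheme.
r_ols = np.corrcoef(D @ beta, y)[0, 1]
r_eq = np.corrcoef(composite, y)[0, 1]
print(f"R (OLS): {r_ols:.3f}  R (equal weights): {r_eq:.3f}")
```

In-sample, OLS can never lose this comparison; the thesis's question is what happens under cross-validation, where the optimal weights' overfitting can erase their advantage.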
Using Stability To Select A Shrinkage Method, Dean Dustin
Department of Statistics: Dissertations, Theses, and Student Work
Shrinkage methods are estimation techniques based on optimizing expressions to find which variables to include in an analysis, typically a linear regression. The general form of these expressions is the sum of an empirical risk plus a complexity penalty based on the number of parameters. Many shrinkage methods are known to satisfy an ‘oracle’ property meaning that asymptotically they select the correct variables and estimate their coefficients efficiently. In Section 1.2, we show oracle properties in two general settings. The first uses a log likelihood in place of the empirical risk and allows a general class of penalties. The second …
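The "empirical risk plus complexity penalty" form can be made concrete with the LASSO, one widely used shrinkage method (a sketch for illustration only; the dissertation's penalty class is more general). The numpy coordinate-descent solver below minimizes (1/2n)·‖y − Xb‖² + λ·‖b‖₁ on invented data and sets irrelevant coefficients exactly to zero, i.e. it selects variables:

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]   # partial residual excluding b_j
            rho = X[:, j] @ r_j / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b

rng = np.random.default_rng(1)
n, p = 100, 6
X = rng.normal(size=(n, p))
true_b = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0])  # only 2 active variables
y = X @ true_b + rng.normal(scale=0.5, size=n)

b_hat = lasso_cd(X, y, lam=0.2)
print("estimated coefficients:", np.round(b_hat, 2))
```

The "oracle" question the abstract raises is whether, as n grows, such a procedure both recovers the correct support and estimates the active coefficients as efficiently as if the support were known.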
Best Probable Subset: A New Method For Reducing Data Dimensionality In Linear Regression, Elieser Nodarse
FIU Electronic Theses and Dissertations
Regression is a statistical technique for modeling the relationship between a dependent variable Y and two or more predictor variables, also known as regressors. In the broad field of regression, there exists a special case in which the relationship between the dependent variable and the regressor(s) is linear. This is known as linear regression.
The purpose of this paper is to create a useful method that effectively selects a subset of regressors when dealing with high-dimensional data and/or collinearity in linear regression. As the name suggests, high-dimensional data occurs when the number of predictor variables is far …
Sabermetrics - Statistical Modeling Of Run Creation And Prevention In Baseball, Parker Chernoff
FIU Electronic Theses and Dissertations
The focus of this thesis was to investigate which baseball metrics are most conducive to run creation and prevention. Stepwise regression and Liu estimation were used to formulate two models for the dependent variables and also used for cross validation. Finally, the predicted values were fed into the Pythagorean Expectation formula to predict a team’s most important goal: winning.
Each model fit strongly and collinearity amongst offensive predictors was considered using variance inflation factors. Hits, walks, and home runs allowed, infield putouts, errors, defense-independent earned run average ratio, defensive efficiency ratio, saves, runners left on base, shutouts, and walks per …
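The Pythagorean Expectation formula mentioned above has a simple closed form (shown with the classic exponent of 2; the season totals below are invented for illustration):

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Bill James's Pythagorean Expectation: RS^k / (RS^k + RA^k)."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)

# Hypothetical season totals: 800 runs scored, 700 allowed, 162 games.
pct = pythagorean_win_pct(800, 700)
print(f"expected winning pct: {pct:.3f}  (~{pct * 162:.0f} wins)")
```

Feeding model-predicted run totals into this formula, as the thesis does, turns run-level regression output into a win projection.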
Trend And Acceleration: A Multi-Model Approach To Key West Sea Level Rise, John Tenenholtz
FIU Electronic Theses and Dissertations
Sea level rise (SLR) varies depending on location. It is therefore important to local residents, businesses and government to analyze SLR locally. Further, because of increasing ice melt and other effects of climate change, rates of SLR may change. It is therefore also important to evaluate rates of change of SLR, which we call sea level acceleration (SLA) or deceleration.
The present thesis will review the annual average sea level data compiled at the Key West tidal gauge in Key West, Florida. We use a multi-model approach that compares the results of various models on that data set. The goal …
Predicting Successful Long-Term Weight Loss From Short-Term Weight-Loss Outcomes: New Insights From A Dynamic Energy Balance Model (The Pounds Lost Study), Diana Thomas, W Andrada Ivanescu, Corby K. Martin, Steven B. Heymsfield, Kaitlyn Marshall, Victoria E. Bodrato, Donald Williamson, Stephen Anton, Frank M. Sacks, Donna Ryan, George A. Bray
Department of Applied Mathematics and Statistics Faculty Scholarship and Creative Works
Background: Currently, early weight-loss predictions of long-term weight-loss success rely on fixed percent-weight-loss thresholds.
Objective: The objective was to develop thresholds during the first 3 mo of intervention that include the influence of age, sex, baseline weight, percent weight loss, and deviations from expected weight to predict whether a participant is likely to lose 5% or more body weight by year 1.
Design: Data consisting of month 1, 2, 3, and 12 treatment weights were obtained from the 2-y Preventing Obesity Using Novel Dietary Strategies (POUNDS Lost) intervention. Logistic regression models that included covariates of age, height, sex, baseline weight, …
On The Causal Interpretation Of Race In Regressions Adjusting For Confounding And Mediating Variables, Tyler J. Vanderweele, Whitney Robinson
Harvard University Biostatistics Working Paper Series
We consider different possible interpretations of the “effect of race” when regressions are run with race as an exposure variable, controlling also for various confounding and mediating variables. When adjustment is made for socioeconomic status early in a person's life, we discuss under what contexts the regression coefficients for race can be interpreted as corresponding to the extent to which a racial disparity would remain if various socioeconomic distributions early in life across racial groups could be equalized. When adjustment is also made for adult socioeconomic status, we note how the overall disparity can be decomposed into the portion that …
The Em Algorithm For Group Testing Regression Models Under Matrix Pooling, Christopher R. Bilder, Boan Zhang
Department of Statistics: Faculty Publications
No abstract provided.
Using Regression Models To Analyze Randomized Trials: Asymptotically Valid Hypothesis Tests Despite Incorrectly Specified Models, Michael Rosenblum, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Regression models are often used to test for cause-effect relationships from data collected in randomized trials or experiments. This practice has deservedly come under heavy scrutiny, since commonly used models such as linear and logistic regression will often not capture the actual relationships between variables, and incorrectly specified models potentially lead to incorrect conclusions. In this paper, we focus on hypothesis tests of whether the treatment given in a randomized trial has any effect on the mean of the primary outcome, within strata of baseline variables such as age, sex, and health status. Our primary concern is ensuring that such …
On Time Series Analysis Of Public Health And Biomedical Data, Scott L. Zeger, Rafael A. Irizarry, Roger D. Peng
Johns Hopkins University, Dept. of Biostatistics Working Papers
A time series is a sequence of observations made over time. Examples in public health include daily ozone concentrations, weekly admissions to an emergency department or annual expenditures on health care in the United States. Time series models are used to describe the dependence of the response at each time on predictor variables including covariates and possibly previous values in the series. Time series methods are necessary to account for the correlation among repeated responses over time. This paper gives an overview of time series ideas and methods used in public health research.
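The idea of regressing the response on its own previous values is captured by the simplest such model, an AR(1). The numpy sketch below (a toy simulation, not one of the paper's examples) simulates y_t = φ·y_{t−1} + e_t and recovers φ by lag-1 least squares:

```python
import numpy as np

rng = np.random.default_rng(2)
phi_true, n = 0.7, 5000

# Simulate an AR(1) series: each value depends on the previous one.
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + rng.normal()

# Estimate phi by regressing y_t on y_{t-1} (no intercept; mean is zero).
y_lag, y_now = y[:-1], y[1:]
phi_hat = (y_lag @ y_now) / (y_lag @ y_lag)
print(f"true phi = {phi_true}, estimated phi = {phi_hat:.3f}")
```

Ignoring this serial correlation, as the abstract notes, is exactly what makes ordinary i.i.d. regression methods give misleading standard errors for time-series data.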
The Cross-Validated Adaptive Epsilon-Net Estimator, Mark J. Van Der Laan, Sandrine Dudoit, Aad W. Van Der Vaart
U.C. Berkeley Division of Biostatistics Working Paper Series
Suppose that we observe a sample of independent and identically distributed realizations of a random variable. Assume that the parameter of interest can be defined as the minimizer, over a suitably defined parameter space, of the expectation (with respect to the distribution of the random variable) of a particular (loss) function of a candidate parameter value and the random variable. Examples of commonly used loss functions are the squared error loss function in regression and the negative log-density loss function in density estimation. Minimizing the empirical risk (i.e., the empirical mean of the loss function) over the entire parameter space …
To Model Or Not To Model? Competing Modes Of Inference For Finite Population Sampling, Rod Little
The University of Michigan Department of Biostatistics Working Paper Series
Finite population sampling is perhaps the only area of statistics where the primary mode of analysis is based on the randomization distribution, rather than on statistical models for the measured variables. This article reviews the debate between design and model-based inference. The basic features of the two approaches are illustrated using the case of inference about the mean from stratified random samples. Strengths and weaknesses of design-based and model-based inference for surveys are discussed. It is suggested that models that take into account the sample design and make weak parametric assumptions can produce reliable and efficient inferences in survey settings. …
Asymptotics Of Cross-Validated Risk Estimation In Estimator Selection And Performance Assessment, Sandrine Dudoit, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Risk estimation is an important statistical question for the purposes of selecting a good estimator (i.e., model selection) and assessing its performance (i.e., estimating generalization error). This article introduces a general framework for cross-validation and derives distributional properties of cross-validated risk estimators in the context of estimator selection and performance assessment. Arbitrary classes of estimators are considered, including density estimators and predictors for both continuous and polychotomous outcomes. Results are provided for general full data loss functions (e.g., absolute and squared error, indicator, negative log density). A broad definition of cross-validation is used in order to cover leave-one-out cross-validation, V-fold …
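V-fold cross-validated risk estimation, one instance of the broad definition the paper covers, can be sketched generically in numpy (synthetic data and a squared-error loss chosen for illustration; the function names are invented):

```python
import numpy as np

def vfold_cv_risk(X, y, fit, predict, loss, V=5, seed=0):
    """V-fold cross-validated risk: average validation-set loss over V splits."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, V)
    risks = []
    for v in range(V):
        val = folds[v]
        train = np.concatenate([folds[u] for u in range(V) if u != v])
        model = fit(X[train], y[train])
        risks.append(loss(y[val], predict(model, X[val])).mean())
    return float(np.mean(risks))

# Example: squared-error risk of an OLS predictor on synthetic data.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 2))
y = X @ np.array([1.0, -1.0]) + rng.normal(scale=0.5, size=300)

fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict = lambda b, X: X @ b
sq_loss = lambda y, yhat: (y - yhat) ** 2

risk = vfold_cv_risk(X, y, fit, predict, sq_loss, V=5)
print(f"5-fold CV estimate of squared-error risk: {risk:.3f}")
```

Swapping in a different `loss` (absolute error, indicator, negative log density) or a different `fit` is what lets the same scheme rank arbitrary candidate estimators, as in the paper's general framework.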
Partial Auc Estimation And Regression, Lori E. Dodd, Margaret S. Pepe
UW Biostatistics Working Paper Series
Accurate disease diagnosis is critical for health care. New diagnostic and screening tests must be evaluated for their abilities to discriminate disease from non-diseased states. The partial area under the ROC curve (partial AUC) is a measure of diagnostic test accuracy. We present an interpretation of the partial AUC that gives rise to a new non-parametric estimator. This estimator is more robust than existing estimators, which make parametric assumptions. We show that the robustness is gained with only a moderate loss in efficiency. We describe a regression modelling framework for making inference about covariate effects on the partial AUC. Such …
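A nonparametric partial-AUC estimate consistent with the pairwise interpretation sketched above (a hedged illustration on simulated binormal scores, not necessarily the paper's exact estimator): the partial AUC over false-positive rates in [0, t₀] is the fraction of (diseased, healthy) score pairs that are correctly ordered and whose healthy score falls in the top t₀ fraction of healthy scores.

```python
import numpy as np

def partial_auc(y_diseased, y_healthy, t0):
    """Nonparametric partial AUC over false-positive rates in [0, t0]."""
    q = np.quantile(y_healthy, 1.0 - t0)   # threshold region where FPR <= t0
    d = y_diseased[:, None]                # broadcast to all pairs
    h = y_healthy[None, :]
    return float(np.mean((d > h) & (h >= q)))

rng = np.random.default_rng(4)
diseased = rng.normal(loc=1.5, size=2000)   # test scores higher with disease
healthy = rng.normal(loc=0.0, size=2000)

pauc = partial_auc(diseased, healthy, t0=0.2)
print(f"partial AUC on FPR in [0, 0.2]: {pauc:.3f}  (max possible = 0.2)")
```

Restricting to low false-positive rates like this is what makes the partial AUC relevant for screening tests, where only the left end of the ROC curve matters clinically.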
Locally Efficient Estimation Of Regression Parameters Using Current Status Data, Chris Andrews, Mark J. Van Der Laan, James M. Robins
U.C. Berkeley Division of Biostatistics Working Paper Series
In biostatistics applications interest often focuses on the estimation of the distribution of a time-variable T. If one only observes whether or not T exceeds an observed monitoring time C, then the data structure is called current status data, also known as interval censored data, case I. We consider this data structure extended to allow the presence of both time-independent covariates and time-dependent covariate processes that are observed until the monitoring time. We assume that the monitoring process satisfies coarsening at random.
Our goal is to estimate the regression parameter beta of the regression model T = Z*beta+epsilon where the …
Bivariate Current Status Data, Mark J. Van Der Laan, Nicholas P. Jewell
U.C. Berkeley Division of Biostatistics Working Paper Series
In many applications, it is often of interest to estimate a bivariate distribution of two survival random variables. Observation of such random variables is often incomplete. If one only observes whether or not each of the individual survival times exceeds a common observed monitoring time C, then the data structure is referred to as bivariate current status data (Wang and Ding, 2000). For such data, we show that the identifiable part of the joint distribution is represented by three univariate cumulative distribution functions, namely the two marginal cumulative distribution functions, and the bivariate cumulative distribution function evaluated on the …
Self-Consistency: A Fundamental Concept In Statistics, Thaddeus Tarpey, Bernard Flury
Mathematics and Statistics Faculty Publications
The term "self-consistency" was introduced in 1989 by Hastie and Stuetzle to describe the property that each point on a smooth curve or surface is the mean of all points that project orthogonally onto it. We generalize this concept to self-consistent random vectors: a random vector Y is self-consistent for X if E[X|Y] = Y almost surely. This allows us to construct a unified theoretical basis for principal components, principal curves and surfaces, principal points, principal variables, principal modes of variation and other statistical methods. We provide some general results on self-consistent random variables, give …
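The defining condition E[X|Y] = Y can be checked numerically for a standard example (a Monte Carlo sketch, assuming the known result that the two principal points of a standard normal are ±√(2/π)):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=1_000_000)

# Candidate self-consistent summary of X ~ N(0,1): the two principal
# points +/- sqrt(2/pi), assigned according to the sign of X.
c = np.sqrt(2 / np.pi)
y = np.where(x > 0, c, -c)

# Self-consistency E[X|Y] = Y: the conditional mean of X given each
# value of Y should reproduce that value.
mean_pos = x[y > 0].mean()
mean_neg = x[y < 0].mean()
print(f"E[X | Y=+c] ~ {mean_pos:.4f}, target  c = {c:.4f}")
print(f"E[X | Y=-c] ~ {mean_neg:.4f}, target -c = {-c:.4f}")
```

The same check, with Y replaced by a projection onto a curve or a subspace, is what unifies principal points, principal curves, and principal components under the paper's framework.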
Generating Unbiased Ratio And Regression Estimators, William (Bill) H. Williams
Publications and Research
Standard ratio and regression estimators are only conditionally unbiased. The paper uses split-sample techniques to develop unbiased versions.