Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Statistical Theory

Selected Works

Articles 1 - 30 of 59

Full-Text Articles in Statistics and Probability

Propensity Score Analysis With Matching Weights, Liang Li Jun 2017

Liang Li

Propensity score analysis is one of the most widely used methods for studying causal treatment effects in observational studies. This paper studies treatment effect estimation with the method of matching weights. This method resembles propensity score matching but offers a number of new features, including efficient estimation, rigorous variance calculation, simple asymptotics, statistical tests of balance, a clearly identified target population with an optimal sampling property, and no need to choose a matching algorithm or caliper size. In addition, we propose the mirror histogram as a useful tool for graphically displaying balance. The method also shares some features of the inverse …
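The weighting idea described above can be sketched briefly. The matching weight here is min(e, 1 − e) divided by the propensity of the treatment actually received, as in Li's line of work; the data below are hypothetical and the helper name is illustrative:

```python
import numpy as np

def matching_weights(ps, z):
    """Matching weight: min(e, 1 - e) divided by the propensity of the
    treatment actually received (e for treated, 1 - e for controls)."""
    ps = np.asarray(ps, dtype=float)
    z = np.asarray(z, dtype=int)
    received = np.where(z == 1, ps, 1.0 - ps)  # P(treatment received | X)
    return np.minimum(ps, 1.0 - ps) / received

# Hypothetical estimated propensity scores and treatment indicators.
ps = np.array([0.2, 0.5, 0.8, 0.9])
z = np.array([1, 0, 1, 0])
w = matching_weights(ps, z)
# A weighted difference in outcome means would then target the
# matched population emphasized by min(e, 1 - e).
```

Units near equipoise (e close to 0.5) receive weight near one; units the other arm rarely resembles are down-weighted.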


Random Regression Models Based On The Elliptically Contoured Distribution Assumptions With Applications To Longitudinal Data, Alfred A. Bartolucci, Shimin Zheng, Sejong Bae, Karan P. Singh May 2017

Shimin Zheng

We generalize Lyles et al.’s (2000) random regression models for longitudinal data, accounting for both undetectable values and informative drop-outs in the distribution assumptions. Our models are constructed on the generalized multivariate theory which is based on the Elliptically Contoured Distribution (ECD). The estimation of the fixed parameters in the random regression models is invariant under the normal or the ECD assumptions. For the Human Immunodeficiency Virus Epidemiology Research Study data, ECD models fit the data better than classical normal models according to the Akaike (1974) Information Criterion. We also note that both univariate distributions of the random intercept and …


Evaluation Of Progress Towards The Unaids 90-90-90 Hiv Care Cascade: A Description Of Statistical Methods Used In An Interim Analysis Of The Intervention Communities In The Search Study, Laura Balzer, Joshua Schwab, Mark J. Van Der Laan, Maya L. Petersen Feb 2017

Laura B. Balzer

WHO guidelines call for universal antiretroviral treatment, and UNAIDS has set a global target to virally suppress most HIV-positive individuals. Accurate estimates of population-level coverage at each step of the HIV care cascade (testing, treatment, and viral suppression) are needed to assess the effectiveness of "test and treat" strategies implemented to achieve this goal. The data available to inform such estimates, however, are susceptible to informative missingness: the number of HIV-positive individuals in a population is unknown; individuals tested for HIV may not be representative of those whom a testing intervention fails to reach, and HIV-positive individuals with a viral …


Auxiliary Likelihood-Based Approximate Bayesian Computation In State Space Models, Worapree Ole Maneesoonthorn Dec 2015

Worapree Ole Maneesoonthorn

A new approach to inference in state space models is proposed, using approximate Bayesian computation (ABC). ABC avoids evaluation of an intractable likelihood by matching summary statistics computed from observed data with statistics computed from data simulated from the true process, based on parameter draws from the prior. Draws that produce a 'match' between observed and simulated summaries are retained, and used to estimate the inaccessible posterior; exact inference being feasible only if the statistics are sufficient. With no reduction to sufficiency being possible in the state space setting, we pursue summaries via the maximization of an auxiliary likelihood function. …
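The rejection-sampling core of ABC described above can be sketched in a toy Gaussian setting, where the auxiliary-model MLE reduces to the sample mean; the model and all numbers are illustrative, not the paper's state space setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def abc_reject(y_obs, n_draws=20000, keep=200):
    """ABC rejection: draw theta from the prior, simulate data, and keep
    the draws whose auxiliary-model MLE (here simply the sample mean of a
    toy Gaussian auxiliary model) is closest to that of the observed data."""
    s_obs = y_obs.mean()                      # auxiliary MLE of observed data
    theta = rng.normal(0.0, 5.0, n_draws)     # prior draws
    sims = rng.normal(theta[:, None], 1.0, (n_draws, y_obs.size))
    s_sim = sims.mean(axis=1)                 # auxiliary MLEs of simulations
    idx = np.argsort(np.abs(s_sim - s_obs))[:keep]
    return theta[idx]                         # approximate posterior sample

y_obs = rng.normal(2.0, 1.0, 100)             # hypothetical observed data
post = abc_reject(y_obs)
# The retained draws should concentrate near the data-generating value 2.0.
```

In a genuine state space application the sample mean would be replaced by the maximizer of a tractable auxiliary likelihood, which is the paper's point.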


Shrinkage Estimation For Multivariate Hidden Markov Mixture Models, Mark Fiecas, Jürgen Franke, Rainer Von Sachs, Joseph Tadjuidje Dec 2015

Mark Fiecas

Motivated by a changing market environment over time, we consider high-dimensional data, such as financial returns, generated by a hidden Markov model which allows for switching between different regimes or states. To get more stable estimates of the covariance matrices of the different states, potentially driven by a number of observations which is small compared to the dimension, we apply shrinkage and combine it with an EM-type algorithm. This approach yields better and more stable estimates of the covariance matrices, which allows for improved reconstruction of the hidden Markov chain. In addition to a simulation study and the …
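A generic linear shrinkage step of the kind alluded to above pulls the sample covariance toward a scaled identity; the sketch below uses a fixed shrinkage weight and omits the state-specific EM weighting of the paper, so both the target and the weight are assumptions:

```python
import numpy as np

def shrink_cov(x, lam):
    """Linear shrinkage of the sample covariance toward a scaled identity:
    (1 - lam) * S + lam * (tr(S)/p) * I. A generic regularizer in the
    spirit of Ledoit-Wolf shrinkage."""
    x = np.asarray(x, dtype=float)
    s = np.cov(x, rowvar=False)
    p = s.shape[0]
    target = (np.trace(s) / p) * np.eye(p)
    return (1.0 - lam) * s + lam * target

rng = np.random.default_rng(1)
x = rng.normal(size=(20, 50))   # n = 20 observations, p = 50 dimensions
sig = shrink_cov(x, lam=0.5)
# The shrunk estimate is full rank and well conditioned even though n < p,
# whereas the raw sample covariance here has rank at most n - 1 = 19.
```

In the hidden Markov setting, each state's covariance would be shrunk separately within the E/M iterations, with observations weighted by their state membership probabilities.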


Estimation Of Reliability In Multicomponent Stress-Strength Based On Generalized Rayleigh Distribution, Gadde Srinivasa Rao Nov 2015

Srinivasa Rao Gadde Dr.

A multicomponent system of k components having strengths following k independent and identically distributed random variables X1, X2, ..., Xk, with each component experiencing a random stress Y, is considered. The system is regarded as alive only if at least s out of k (s < k) strengths exceed the stress. The reliability of such a system is obtained when the strength and stress variates follow a generalized Rayleigh distribution with different shape parameters. Reliability is estimated using the maximum likelihood (ML) method of estimation in samples drawn from strength and stress distributions; the reliability estimators are compared asymptotically. Monte-Carlo …
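The s-out-of-k reliability just described can be checked by simulation. The sketch below assumes a common scale parameter and uses inverse-CDF sampling from the generalized Rayleigh distribution F(x) = (1 − exp(−(λx)²))^α; the shape values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def gen_rayleigh(alpha, lam, size):
    """Inverse-CDF sampling from the generalized Rayleigh distribution
    F(x) = (1 - exp(-(lam * x)**2))**alpha."""
    u = rng.uniform(size=size)
    return np.sqrt(-np.log(1.0 - u ** (1.0 / alpha))) / lam

def reliability_sk(s, k, a_strength, a_stress, lam=1.0, n_sim=100000):
    """Monte Carlo estimate of R(s,k) = P(at least s of the k component
    strengths exceed a common random stress)."""
    x = gen_rayleigh(a_strength, lam, (n_sim, k))   # component strengths
    y = gen_rayleigh(a_stress, lam, (n_sim, 1))     # shared stress
    return np.mean((x > y).sum(axis=1) >= s)

r = reliability_sk(s=2, k=3, a_strength=2.0, a_stress=1.0)
# A larger strength shape parameter pushes the reliability toward one.
```

For these shapes the integral P(at least 2 of 3 exceed Y) works out to 24/35 ≈ 0.686, which the simulation should reproduce closely.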


An Omnibus Nonparametric Test Of Equality In Distribution For Unknown Functions, Alexander Luedtke, Marco Carone, Mark Van Der Laan Oct 2015

Alex Luedtke

We present a novel family of nonparametric omnibus tests of the hypothesis that two unknown but estimable functions are equal in distribution when applied to the observed data structure. We developed these tests, which represent a generalization of the maximum mean discrepancy tests described in Gretton et al. [2006], using recent developments from the higher-order pathwise differentiability literature. Despite their complex derivation, the associated test statistics can be expressed rather simply as U-statistics. We study the asymptotic behavior of the proposed tests under the null hypothesis and under both fixed and local alternatives. We provide examples to which our tests …


Nonparametric Methods For Doubly Robust Estimation Of Continuous Treatment Effects, Edward Kennedy, Zongming Ma, Matthew Mchugh, Dylan Small Jun 2015

Edward H. Kennedy

Continuous treatments (e.g., doses) arise often in practice, but available causal effect estimators require either parametric models for the effect curve or else consistent estimation of a single nuisance function. We propose a novel doubly robust kernel smoothing approach, which requires only mild smoothness assumptions on the effect curve and allows for misspecification of either the treatment density or outcome regression. We derive asymptotic properties and also discuss an approach for data-driven bandwidth selection. The methods are illustrated via simulation and in a study of the effect of nurse staffing on hospital readmissions penalties.


Semiparametric Causal Inference In Matched Cohort Studies, Edward Kennedy, Arvid Sjolander, Dylan Small Jun 2015

Edward H. Kennedy

Odds ratios can be estimated in case-control studies using standard logistic regression, ignoring the outcome-dependent sampling. In this paper we discuss an analogous result for treatment effects on the treated in matched cohort studies. Specifically, in studies where a sample of treated subjects is observed along with a separate sample of possibly matched controls, we show that efficient and doubly robust estimators of effects on the treated are computationally equivalent to standard estimators, which ignore the matching and exposure-based sampling. This is not the case for general average effects. We also show that matched cohort studies are often more efficient …


Using The Bootstrap For Estimating The Sample Size In Statistical Experiments, Maher Qumsiyeh Feb 2015

Maher Qumsiyeh

Efron’s (1979) bootstrap has been shown to be an effective method for statistical estimation and testing. It provides better estimates than normal approximations for studentized means, least squares estimates and many other statistics of interest. It can be used to select the active factors - factors that have an effect on the response - in experimental designs. This article shows that the bootstrap can be used to determine the sample size or the number of runs required to achieve a certain confidence level in statistical experiments.
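One plausible reading of this idea, sketched under the assumption that the goal is a target confidence-interval half-width for a mean: resample a pilot sample at each candidate size and pick the smallest size that meets the target. The pilot data, target, and function name are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

def smallest_n(pilot, target_halfwidth, n_boot=2000, grid=range(10, 500, 10)):
    """For each candidate sample size n, draw bootstrap resamples of size n
    from the pilot data, average the estimated standard error of the mean,
    and return the smallest n whose 95% normal-theory CI half-width
    meets the target."""
    pilot = np.asarray(pilot, dtype=float)
    for n in grid:
        res = rng.choice(pilot, size=(n_boot, n), replace=True)
        se = res.std(axis=1, ddof=1).mean() / np.sqrt(n)
        if 1.96 * se <= target_halfwidth:
            return n
    return None

pilot = rng.normal(10.0, 2.0, 30)           # hypothetical pilot sample
n_req = smallest_n(pilot, target_halfwidth=0.5)
```

The appeal over the textbook formula n ≈ (1.96 σ / h)² is that the bootstrap uses the pilot data's own distribution rather than a normality assumption.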


Comparison Of Re-Sampling Methods To Generalized Linear Models And Transformations In Factorial And Fractional Factorial Designs, Maher Qumsiyeh, Gerald Shaughnessy Feb 2015

Maher Qumsiyeh

Experimental situations in which observations are not normally distributed frequently occur in practice. A common situation occurs when responses are discrete in nature, for example counts. One way to analyze such experimental data is to use a transformation for the responses; another is to use a link function based on a generalized linear model (GLM) approach. Re-sampling is employed as an alternative method to analyze non-normal, discrete data. Results are compared to those obtained by the previous two methods.


Optimal Restricted Estimation For More Efficient Longitudinal Causal Inference, Edward Kennedy, Marshall Joffe, Dylan Small Dec 2014

Edward H. Kennedy

Efficient semiparametric estimation of longitudinal causal effects is often analytically or computationally intractable. We propose a novel restricted estimation approach for increasing efficiency, which can be used with other techniques, is straightforward to implement, and requires no additional modeling assumptions.


A Review Of Frequentist Tests For The 2x2 Binomial Trial, Chris Lloyd Dec 2014

Chris J. Lloyd

The 2x2 binomial trial is the simplest of data structures, yet its statistical analysis and the issues it raises have been debated and revisited for over 70 years. Which analysis should biomedical researchers use in applications? In this review, we consider frequentist tests only, specifically tests that control size either exactly or very nearly so. These procedures can be classified as conditional and unconditional. Amongst tests motivated by a conditional model, Lancaster’s mid-p and Liebermeister’s test are less conservative than Fisher’s classical test, but do not control type 1 error. Within the conditional framework, only Fisher’s test can be …


An Outlier Robust Block Bootstrap For Small Area Estimation, Payam Mokhtarian, Ray Chambers Mar 2014

Payam Mokhtarian

Small area inference based on mixed models, i.e. models that contain both fixed and random effects, is the industry standard for this field, allowing between-area heterogeneity to be represented by random area effects. Use of the linear mixed model is ubiquitous in this context, with maximum likelihood, or its close relative, REML, the standard method for estimating the parameters of this model. These parameter estimates, and in particular the resulting predicted values of the random area effects, are then used to construct empirical best linear unbiased predictors (EBLUPs) of the unknown small area means. It is now well known …


Adaptive Pair-Matching In The Search Trial And Estimation Of The Intervention Effect, Laura Balzer, Maya L. Petersen, Mark J. Van Der Laan Jan 2014

Laura B. Balzer

In randomized trials, pair-matching is an intuitive design strategy to protect study validity and to potentially increase study power. In a common design, candidate units are identified, and their baseline characteristics used to create the best n/2 matched pairs. Within the resulting pairs, the intervention is randomized, and the outcomes measured at the end of follow-up. We consider this design to be adaptive, because the construction of the matched pairs depends on the baseline covariates of all candidate units. As a consequence, the observed data cannot be considered as n/2 independent, identically distributed (i.i.d.) pairs of units, as current practice assumes. …


Spectral Density Shrinkage For High-Dimensional Time Series, Mark Fiecas, Rainer Von Sachs Dec 2013

Mark Fiecas

Time series data obtained from neurophysiological signals is often high-dimensional and the length of the time series is often short relative to the number of dimensions. Thus, it is difficult or sometimes impossible to compute statistics that are based on the spectral density matrix because these matrices are numerically unstable. In this work, we discuss the importance of regularization for spectral analysis of high-dimensional time series and propose shrinkage estimation for estimating high-dimensional spectral density matrices. The shrinkage estimator is derived from a penalized log-likelihood, and the optimal penalty parameter has a closed-form solution, which can be estimated using the …


Estimating Effects On Rare Outcomes: Knowledge Is Power, Laura B. Balzer, Mark J. Van Der Laan May 2013

Laura B. Balzer

Many of the secondary outcomes in observational studies and randomized trials are rare. Methods for estimating causal effects and associations with rare outcomes, however, are limited, and this represents a missed opportunity for investigation. In this article, we construct a new targeted minimum loss-based estimator (TMLE) for the effect of an exposure or treatment on a rare outcome. We focus on the causal risk difference and statistical models incorporating bounds on the conditional risk of the outcome, given the exposure and covariates. By construction, the proposed estimator constrains the predicted outcomes to respect this model knowledge. Theoretically, this bounding provides …


On The Exact Size Of Multiple Comparison Tests, Chris Lloyd Dec 2012

Chris J. Lloyd

No abstract provided.


Theory And Methods For Gini Coefficients Partitioned By Quantile Range, Chaitra Nagaraja Dec 2012

Chaitra H Nagaraja

The Gini coefficient is frequently used to measure inequality in populations. However, it is possible that inequality levels may change over time differently for disparate subgroups which cannot be detected with population-level estimates only. Therefore, it may be informative to examine inequality separately for these segments. The case where the population is split into two segments based on non-overlapping quantile ranges is examined. Asymptotic theory is derived and practical methods to estimate standard errors and construct confidence intervals using resampling methods are developed. An application to per capita income across census tracts using American Community Survey data is considered.
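The segment-wise idea can be illustrated with the standard sorted-index formula for the Gini coefficient; the split at the median and the lognormal incomes below are illustrative, not the paper's American Community Survey data:

```python
import numpy as np

def gini(y):
    """Gini coefficient via the sorted-index formula:
    G = 2 * sum(i * y_(i)) / (n * sum(y)) - (n + 1) / n."""
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    i = np.arange(1, n + 1)
    return 2.0 * np.sum(i * y) / (n * y.sum()) - (n + 1.0) / n

rng = np.random.default_rng(4)
income = rng.lognormal(mean=10.0, sigma=0.8, size=10000)
g_all = gini(income)

# Segment-specific Gini for two non-overlapping quantile ranges
# (bottom and top halves of the distribution):
median = np.quantile(income, 0.5)
g_low = gini(income[income <= median])
g_high = gini(income[income > median])
```

Comparing g_low and g_high over time would reveal subgroup trends that the population-level g_all alone cannot; the paper's contribution is the asymptotic and resampling theory that attaches standard errors to such segment estimates.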


Fixed Bandwidth Theory For Tail Index Estimation, Tucker Mcelroy, Chaitra H. Nagaraja Dec 2012

Chaitra H Nagaraja

No abstract provided.


On The Size Accuracy Of Combination Tests, Chris Lloyd Dec 2012

Chris J. Lloyd

One element of the analysis of adaptive clinical trials is combining the evidence from several (often two) stages. When the endpoint is binary, standard single-stage test statistics do not control size well, and the combined test might not be valid if the single-stage tests are not. The purpose of this paper is to numerically and theoretically examine the extent to which combining basic test statistics mitigates or magnifies the size violation of the final test.


Obtaining Critical Values For Test Of Markov Regime Switching, Douglas G. Steigerwald, Valerie Bostwick Oct 2012

Douglas G. Steigerwald

For Markov regime-switching models, testing for the possible presence of more than one regime requires the use of a non-standard test statistic. Carter and Steigerwald (forthcoming, Journal of Econometric Methods) derive in detail the analytic steps needed to implement the test of Markov regime-switching proposed by Cho and White (2007, Econometrica). We summarize the implementation steps and address the computational issues that arise. A new command to compute regime-switching critical values, rscv, is introduced and presented in the context of empirical research.


Big Data And The Future, Sherri Rose Jul 2012

Sherri Rose

No abstract provided.


Variances For Maximum Penalized Likelihood Estimates Obtained Via The Em Algorithm, Mark Segal, Peter Bacchetti, Nicholas Jewell Apr 2012

Mark R Segal

We address the problem of providing variances for parameter estimates obtained under a penalized likelihood formulation through use of the EM algorithm. The proposed solution represents a synthesis of two existent techniques. Firstly, we exploit the supplemented EM algorithm developed in Meng and Rubin (1991) that provides variance estimates for maximum likelihood estimates obtained via the EM algorithm. Their procedure relies on evaluating the Jacobian of the mapping induced by the EM algorithm. Secondly, we utilize a result from Green (1990) that provides an expression for the Jacobian of the mapping induced by the EM algorithm applied to a penalized …


Backcalculation Of Hiv Infection Rates, Peter Bacchetti, Mark Segal, Nicholas Jewell Apr 2012

Mark R Segal

Backcalculation is an important method for reconstructing past rates of human immunodeficiency virus (HIV) infection and for estimating current prevalence of HIV infection and future incidence of acquired immunodeficiency syndrome (AIDS). This paper reviews the backcalculation techniques, focusing on the key assumptions of the method, including the necessary information regarding incubation, reporting delay, and models for the infection curve. A summary is given of the extent to which the appropriate external information is available and whether checks of the relevant assumptions are possible through use of data on AIDS incidence from surveillance systems. A likelihood approach to backcalculation is described …


Loss Function Based Ranking In Two-Stage, Hierarchical Models, Rongheng Lin, Thomas A. Louis, Susan M. Paddock, Greg Ridgeway Mar 2012

Rongheng Lin

Several authors have studied the performance of optimal, squared error loss (SEL) estimated ranks. Though these are effective, in many applications interest focuses on identifying the relatively good (e.g., in the upper 10%) or relatively poor performers. We construct loss functions that address this goal and evaluate candidate rank estimates, some of which optimize specific loss functions. We study performance for a fully parametric hierarchical model with a Gaussian prior and Gaussian sampling distributions, evaluating performance for several loss functions. Results show that though SEL-optimal ranks and percentiles do not specifically focus on classifying with respect to a percentile cut …


Testing For Regime Switching: A Comment, Douglas Steigerwald, Andrew Carter Dec 2011

Douglas G. Steigerwald

An autoregressive model with Markov regime switching is analyzed to examine the properties of the quasi-likelihood ratio test developed by Cho and White (2007). For such a model, we show that consistency of the quasi-maximum likelihood estimator for the population parameter values, on which consistency of the test is based, does not hold. We describe a condition that ensures consistency of the estimator and discuss the consistency of the test in the absence of consistency of the estimator.


Some Non-Asymptotic Properties Of Parametric Bootstrap P-Values, Chris Lloyd Dec 2011

Chris J. Lloyd

The bootstrap P-value is the exact tail probability of a test statistic, calculated assuming the nuisance parameter equals the null maximum likelihood (ML) estimate. For discrete data, bootstrap P-values perform amazingly well even for small samples, even as standard first-order methods perform surprisingly poorly. Why is this? Detailed numerical calculations in Lloyd (2012a) strongly suggest that the good performance of bootstrap is not explained by asymptotics. In this paper, I establish several desirable non-asymptotic properties of bootstrap P-values. The most important of these is that bootstrap will correct ‘bad’ ordering of the sample space which leads to a more …
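The definition in the first sentence can be made concrete for a two-sample binomial comparison: fix the nuisance (common) proportion at its null ML estimate and compute the tail probability of the observed statistic under that null. The choice of statistic and the counts below are illustrative; simulation stands in for exact enumeration:

```python
import numpy as np

rng = np.random.default_rng(5)

def bootstrap_pvalue(x1, n1, x2, n2, n_sim=200000):
    """Parametric bootstrap P-value for H0: p1 = p2 against p1 > p2.
    The nuisance parameter (the common p) is fixed at its null MLE, the
    pooled proportion, and the tail probability of the observed
    difference in proportions is evaluated by simulation."""
    p0 = (x1 + x2) / (n1 + n2)                 # null ML estimate of p
    t_obs = x1 / n1 - x2 / n2                  # observed statistic
    s1 = rng.binomial(n1, p0, n_sim) / n1
    s2 = rng.binomial(n2, p0, n_sim) / n2
    return np.mean(s1 - s2 >= t_obs)

p = bootstrap_pvalue(x1=18, n1=20, x2=11, n2=20)
```

Because both binomial counts are discrete, the tail probability could also be enumerated exactly over all (x1, x2) pairs, which is the "exact" calculation the abstract refers to.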


Asymptotic Theory For Cross-Validated Targeted Maximum Likelihood Estimation, Wenjing Zheng, Mark J. Van Der Laan Jul 2011

Wenjing Zheng

We consider a targeted maximum likelihood estimator of a path-wise differentiable parameter of the data generating distribution in a semi-parametric model based on observing n independent and identically distributed observations. The targeted maximum likelihood estimator (TMLE) uses V-fold sample splitting for the initial estimator in order to make the TMLE maximally robust in its bias reduction step. We prove a general theorem that states asymptotic efficiency (and thereby regularity) of the targeted maximum likelihood estimator when the initial estimator is consistent and a second order term converges to zero in probability at a rate faster than the square root of …


Cross-Validated Targeted Minimum-Loss-Based Estimation, Wenjing Zheng, Mark Van Der Laan Dec 2010

Wenjing Zheng

No abstract provided.