Statistical Theory

Selected Works


Full-Text Articles in Statistics and Probability

Propensity Score Analysis With Matching Weights, Liang Li Jun 2017

Liang Li

Propensity score analysis is one of the most widely used methods for studying causal treatment effects in observational studies. This paper studies treatment effect estimation with the method of matching weights. This method resembles propensity score matching but offers a number of new features, including efficient estimation, rigorous variance calculation, simple asymptotics, statistical tests of balance, a clearly identified target population with an optimal sampling property, and no need to choose a matching algorithm or caliper size. In addition, we propose the mirror histogram as a useful tool for graphically displaying balance. The method also shares some features of the inverse …
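
As a concrete illustration of the weighting scheme described above, here is a minimal Python sketch of matching weights as defined in the matching-weights literature (e.g., Li and Greene, 2013); the function names and the simple weighted-means estimator are illustrative, not the paper's own code:

```python
import numpy as np

def matching_weights(ps, z):
    """Matching weight for each subject: min(e, 1 - e) divided by the
    probability of the treatment actually received, where e is the
    estimated propensity score and z the treatment indicator."""
    ps = np.asarray(ps, dtype=float)
    z = np.asarray(z, dtype=int)
    received = np.where(z == 1, ps, 1.0 - ps)   # P(observed treatment | X)
    return np.minimum(ps, 1.0 - ps) / received

def weighted_effect(y, z, w):
    """Weighted difference in means, a simple treatment-effect estimate
    on the matching-weight target population."""
    y, z, w = (np.asarray(a, dtype=float) for a in (y, z, w))
    mu1 = np.sum(w * z * y) / np.sum(w * z)
    mu0 = np.sum(w * (1 - z) * y) / np.sum(w * (1 - z))
    return mu1 - mu0
```

A mirror histogram of this kind typically plots the propensity scores of the treated above the horizontal axis and those of the controls mirrored below it, with the weighted counts overlaid so balance can be read off directly.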


Random Regression Models Based On The Elliptically Contoured Distribution Assumptions With Applications To Longitudinal Data, Alfred A. Bartolucci, Shimin Zheng, Sejong Bae, Karan P. Singh May 2017

Shimin Zheng

We generalize the random regression models of Lyles et al. (2000) for longitudinal data, accounting for both undetectable values and informative drop-outs in the distributional assumptions. Our models are built on generalized multivariate theory based on the Elliptically Contoured Distribution (ECD). The estimation of the fixed parameters in the random regression models is invariant under the normal or the ECD assumptions. For the Human Immunodeficiency Virus Epidemiology Research Study data, ECD models fit the data better than classical normal models according to the Akaike (1974) Information Criterion. We also note that both univariate distributions of the random intercept and …


Evaluation Of Progress Towards The Unaids 90-90-90 Hiv Care Cascade: A Description Of Statistical Methods Used In An Interim Analysis Of The Intervention Communities In The Search Study, Laura Balzer, Joshua Schwab, Mark J. Van Der Laan, Maya L. Petersen Feb 2017

Laura B. Balzer

WHO guidelines call for universal antiretroviral treatment, and UNAIDS has set a global target to virally suppress most HIV-positive individuals. Accurate estimates of population-level coverage at each step of the HIV care cascade (testing, treatment, and viral suppression) are needed to assess the effectiveness of "test and treat" strategies implemented to achieve this goal. The data available to inform such estimates, however, are susceptible to informative missingness: the number of HIV-positive individuals in a population is unknown; individuals tested for HIV may not be representative of those whom a testing intervention fails to reach; and HIV-positive individuals with a viral …


Functional Car Models For Spatially Correlated Functional Datasets, Lin Zhang, Veerabhadran Baladandayuthapani, Hongxiao Zhu, Keith A. Baggerly, Tadeusz Majewski, Bogdan Czerniak, Jeffrey S. Morris Jan 2016

Jeffrey S. Morris

We develop a functional conditional autoregressive (CAR) model for spatially correlated data for which functions are collected on areal units of a lattice. Our model performs functional response regression while accounting for spatial correlations with potentially nonseparable and nonstationary covariance structure, in both the space and functional domains. We show theoretically that our construction leads to a CAR model at each functional location, with spatial covariance parameters varying and borrowing strength across the functional domain. Using basis transformation strategies, the nonseparable spatial-functional model is computationally scalable to enormous functional datasets, generalizable to different basis functions, and can be used on …
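
For readers unfamiliar with conditional autoregressive models, the standard CAR conditional specification, indexed here by a functional location t, is sketched below in LaTeX; the notation (adjacency weights w_ij, spatial parameters rho(t) and tau^2(t) varying over the functional domain) is ours, not necessarily the authors' exact parameterization:

```latex
Y_i(t) \mid \{Y_j(t) : j \neq i\} \;\sim\;
\mathcal{N}\!\left( \rho(t) \sum_{j \sim i} \frac{w_{ij}}{w_{i+}}\, Y_j(t),\;
                    \frac{\tau^2(t)}{w_{i+}} \right),
\qquad w_{i+} = \sum_{j} w_{ij}.
```

Letting rho(t) and tau^2(t) vary smoothly over t is what allows the spatial covariance parameters to borrow strength across the functional domain, as the abstract describes.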


Auxiliary Likelihood-Based Approximate Bayesian Computation In State Space Models, Worapree Ole Maneesoonthorn Dec 2015

Worapree Ole Maneesoonthorn

A new approach to inference in state space models is proposed, using approximate Bayesian computation (ABC). ABC avoids evaluation of an intractable likelihood by matching summary statistics computed from observed data with statistics computed from data simulated from the true process, based on parameter draws from the prior. Draws that produce a 'match' between observed and simulated summaries are retained and used to estimate the inaccessible posterior; exact inference is feasible only if the statistics are sufficient. With no reduction to sufficiency being possible in the state space setting, we pursue summaries via the maximization of an auxiliary likelihood function. …
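
To fix ideas, here is a minimal Python sketch of plain rejection ABC; the paper's contribution lies in the choice of summaries (maximizers of an auxiliary likelihood), for which the generic summarize callback below merely stands in:

```python
import numpy as np

def abc_rejection(prior_draw, simulate, summarize, y_obs,
                  n_draws=100_000, quantile=0.01):
    """Rejection ABC: keep the prior draws whose simulated summaries
    fall closest to the observed summaries."""
    s_obs = np.asarray(summarize(y_obs))
    thetas, dists = [], []
    for _ in range(n_draws):
        theta = prior_draw()                   # draw from the prior
        s_sim = np.asarray(summarize(simulate(theta)))
        thetas.append(theta)
        dists.append(np.linalg.norm(s_sim - s_obs))
    thetas, dists = np.asarray(thetas), np.asarray(dists)
    keep = dists <= np.quantile(dists, quantile)  # retain the closest 'matches'
    return thetas[keep]                           # approximate posterior sample
```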


Shrinkage Estimation For Multivariate Hidden Markov Mixture Models, Mark Fiecas, Jürgen Franke, Rainer Von Sachs, Joseph Tadjuidje Dec 2015

Mark Fiecas

Motivated by a changing market environment over time, we consider high-dimensional data, such as financial returns, generated by a hidden Markov model which allows for switching between different regimes or states. To get more stable estimates of the covariance matrices of the different states, potentially driven by a number of observations that is small compared to the dimension, we apply shrinkage and combine it with an EM-type algorithm. This approach yields better and more stable estimates of the covariance matrices, which allows for improved reconstruction of the hidden Markov chain. In addition to a simulation study and the …
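
A minimal Python sketch of the linear shrinkage step, assuming a scaled-identity target and a fixed intensity lambda; the paper's target, data-driven intensity, and EM details may differ:

```python
import numpy as np

def shrink_covariance(S, lam):
    """Shrink a (possibly ill-conditioned) sample covariance S toward a
    scaled identity target: (1 - lam) * S + lam * (tr(S)/p) * I."""
    p = S.shape[0]
    target = (np.trace(S) / p) * np.eye(p)
    return (1.0 - lam) * S + lam * target

# In an EM pass for a hidden Markov mixture, the M-step covariance of
# state k would be the responsibility-weighted sample covariance,
# shrunk before the next E-step (hypothetical usage):
#   S_k = weighted_cov(X, gamma[:, k])
#   Sigma_k = shrink_covariance(S_k, lam)
```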


Estimation Of Reliability In Multicomponent Stress-Strength Based On Generalized Rayleigh Distribution, Gadde Srinivasa Rao Nov 2015

Srinivasa Rao Gadde Dr.

We consider a multicomponent system of k components whose strengths are independent and identically distributed random variables X1, X2, ..., Xk, with each component experiencing a random stress Y. The system is regarded as alive only if at least s out of k (s < k) strengths exceed the stress. The reliability of such a system is obtained when the strength and stress variates follow a generalized Rayleigh distribution with different shape parameters. Reliability is estimated using the maximum likelihood (ML) method of estimation in samples drawn from the strength and stress distributions, and the reliability estimators are compared asymptotically. Monte-Carlo …
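
The reliability quantity described above has a classical closed form (Bhattacharyya and Johnson, 1974); a sketch in LaTeX, writing F_X for the common strength distribution and F_Y for the stress distribution (our notation):

```latex
R_{s,k} \;=\; P\bigl(\text{at least } s \text{ of } X_1,\dots,X_k \text{ exceed } Y\bigr)
        \;=\; \sum_{i=s}^{k} \binom{k}{i}
              \int_{0}^{\infty} \bigl[1 - F_X(y)\bigr]^{\,i}\, F_X(y)^{\,k-i}\, \mathrm{d}F_Y(y).
```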


An Omnibus Nonparametric Test Of Equality In Distribution For Unknown Functions, Alexander Luedtke, Marco Carone, Mark Van Der Laan Oct 2015

Alex Luedtke

We present a novel family of nonparametric omnibus tests of the hypothesis that two unknown but estimable functions are equal in distribution when applied to the observed data structure. We developed these tests, which represent a generalization of the maximum mean discrepancy tests described in Gretton et al. [2006], using recent developments from the higher-order pathwise differentiability literature. Despite their complex derivation, the associated test statistics can be expressed rather simply as U-statistics. We study the asymptotic behavior of the proposed tests under the null hypothesis and under both fixed and local alternatives. We provide examples to which our tests …
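
Since the proposed statistics reduce to U-statistics, the Gretton et al. (2006) building block is easy to exhibit; a Python sketch of the unbiased squared-MMD estimate with a Gaussian kernel (the paper's extension to estimated functions and its higher-order corrections are not captured here):

```python
import numpy as np

def mmd2_ustat(x, y, bandwidth=1.0):
    """Unbiased U-statistic estimate of the squared maximum mean
    discrepancy between samples x (m, d) and y (n, d), Gaussian kernel."""
    def gram(a, b):
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))
    m, n = len(x), len(y)
    kxx, kyy, kxy = gram(x, x), gram(y, y), gram(x, y)
    np.fill_diagonal(kxx, 0.0)   # drop i == j terms for unbiasedness
    np.fill_diagonal(kyy, 0.0)
    return (kxx.sum() / (m * (m - 1))
            + kyy.sum() / (n * (n - 1))
            - 2.0 * kxy.sum() / (m * n))
```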


Nonparametric Methods For Doubly Robust Estimation Of Continuous Treatment Effects, Edward Kennedy, Zongming Ma, Matthew Mchugh, Dylan Small Jun 2015

Edward H. Kennedy

Continuous treatments (e.g., doses) arise often in practice, but available causal effect estimators require either parametric models for the effect curve or else consistent estimation of a single nuisance function. We propose a novel doubly robust kernel smoothing approach, which requires only mild smoothness assumptions on the effect curve and allows for misspecification of either the treatment density or outcome regression. We derive asymptotic properties and also discuss an approach for data-driven bandwidth selection. The methods are illustrated via simulation and in a study of the effect of nurse staffing on hospital readmissions penalties.
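
A sketch of the doubly robust mapping used in this line of work, in our notation, with pi the conditional treatment density and mu the outcome regression (details may differ from the paper): each observation is mapped to a pseudo-outcome

```latex
\xi(Z;\pi,\mu) \;=\; \frac{Y - \mu(X,A)}{\pi(A \mid X)} \int \pi(A \mid x)\,\mathrm{d}P(x)
                   \;+\; \int \mu(x,A)\,\mathrm{d}P(x),
```

and local kernel regression of xi on the treatment A then traces out the effect curve; consistency requires only one of pi or mu to be correctly specified.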


Semiparametric Causal Inference In Matched Cohort Studies, Edward Kennedy, Arvid Sjolander, Dylan Small Jun 2015

Edward H. Kennedy

Odds ratios can be estimated in case-control studies using standard logistic regression, ignoring the outcome-dependent sampling. In this paper we discuss an analogous result for treatment effects on the treated in matched cohort studies. Specifically, in studies where a sample of treated subjects is observed along with a separate sample of possibly matched controls, we show that efficient and doubly robust estimators of effects on the treated are computationally equivalent to standard estimators, which ignore the matching and exposure-based sampling. This is not the case for general average effects. We also show that matched cohort studies are often more efficient …


Using The Bootstrap For Estimating The Sample Size In Statistical Experiments, Maher Qumsiyeh Feb 2015

Maher Qumsiyeh

Efron’s (1979) bootstrap has been shown to be an effective method for statistical estimation and testing. It provides better estimates than normal approximations for studentized means, least squares estimates, and many other statistics of interest. It can be used to select the active factors - factors that have an effect on the response - in experimental designs. This article shows that the bootstrap can be used to determine the sample size, or the number of runs, required to achieve a certain confidence level in statistical experiments.
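
A minimal Python sketch of the idea: resample a pilot sample at candidate sizes and take the smallest n whose bootstrap confidence interval is tight enough. The percentile interval, target half-width, and grid below are illustrative assumptions, not the article's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_halfwidth(pilot, n, n_boot=2000, level=0.95):
    """Half-width of a bootstrap percentile CI for the mean at sample
    size n, estimated by resampling the pilot sample."""
    means = np.array([rng.choice(pilot, size=n, replace=True).mean()
                      for _ in range(n_boot)])
    lo, hi = np.quantile(means, [(1 - level) / 2, (1 + level) / 2])
    return (hi - lo) / 2.0

def required_n(pilot, target, n_grid):
    """Smallest candidate n whose bootstrap CI half-width meets target."""
    for n in sorted(n_grid):
        if bootstrap_halfwidth(pilot, n) <= target:
            return n
    return None   # no candidate size was sufficient
```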


Comparison Of Re-Sampling Methods To Generalized Linear Models And Transformations In Factorial And Fractional Factorial Designs, Maher Qumsiyeh, Gerald Shaughnessy Feb 2015

Maher Qumsiyeh

Experimental situations in which observations are not normally distributed frequently occur in practice. A common situation occurs when responses are discrete in nature, for example counts. One way to analyze such experimental data is to use a transformation for the responses; another is to use a link function based on a generalized linear model (GLM) approach. Re-sampling is employed as an alternative method to analyze non-normal, discrete data. Results are compared to those obtained by the previous two methods.


Optimal Restricted Estimation For More Efficient Longitudinal Causal Inference, Edward Kennedy, Marshall Joffe, Dylan Small Dec 2014

Edward H. Kennedy

Efficient semiparametric estimation of longitudinal causal effects is often analytically or computationally intractable. We propose a novel restricted estimation approach for increasing efficiency, which can be used with other techniques, is straightforward to implement, and requires no additional modeling assumptions.


A Review Of Frequentist Tests For The 2x2 Binomial Trial, Chris Lloyd Dec 2014

Chris J. Lloyd

The 2x2 binomial trial is the simplest of data structures, yet its statistical analysis and the issues it raises have been debated and revisited for over 70 years. Which analysis should biomedical researchers use in applications? In this review, we consider frequentist tests only, specifically tests that control size either exactly or very nearly exactly. These procedures can be classified as conditional and unconditional. Amongst tests motivated by a conditional model, Lancaster’s mid-p and Liebermeister’s test are less conservative than Fisher’s classical test, but do not control type 1 error. Within the conditional framework, only Fisher’s test can be …
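
For concreteness, the conditional tests mentioned are all computed from the hypergeometric distribution of one cell given the table margins; a Python sketch of Fisher's exact one-sided p-value and Lancaster's mid-p correction:

```python
from scipy.stats import hypergeom

def conditional_p_values(x, n1, n2, total_successes):
    """One-sided conditional p-values for a 2x2 table: x successes in
    group 1 of size n1, group 2 of size n2, given the success margin.
    Conditionally on the margins, X is hypergeometric."""
    rv = hypergeom(n1 + n2, total_successes, n1)
    p_fisher = rv.sf(x - 1)              # P(X >= x): Fisher's exact test
    p_mid = rv.sf(x) + 0.5 * rv.pmf(x)   # Lancaster's mid-p
    return p_fisher, p_mid
```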


An Outlier Robust Block Bootstrap For Small Area Estimation, Payam Mokhtarian, Ray Chambers Mar 2014

Payam Mokhtarian

Small area inference based on mixed models, i.e. models that contain both fixed and random effects, is the industry standard for this field, allowing between-area heterogeneity to be represented by random area effects. Use of the linear mixed model is ubiquitous in this context, with maximum likelihood, or its close relative REML, as the standard method for estimating the parameters of this model. These parameter estimates, and in particular the resulting predicted values of the random area effects, are then used to construct empirical best linear unbiased predictors (EBLUPs) of the unknown small area means. It is now well known …


Adaptive Pair-Matching In The Search Trial And Estimation Of The Intervention Effect, Laura Balzer, Maya L. Petersen, Mark J. Van Der Laan Jan 2014

Laura B. Balzer

In randomized trials, pair-matching is an intuitive design strategy to protect study validity and to potentially increase study power. In a common design, candidate units are identified, and their baseline characteristics are used to create the best n/2 matched pairs. Within the resulting pairs, the intervention is randomized, and the outcomes are measured at the end of follow-up. We consider this design to be adaptive, because the construction of the matched pairs depends on the baseline covariates of all candidate units. As a consequence, the observed data cannot be considered as n/2 independent, identically distributed (i.i.d.) pairs of units, as current practice assumes. …


An Asymptotically Minimax Kernel Machine, Debashis Ghosh Jan 2014

Debashis Ghosh

Recently, a class of machine learning-inspired procedures, termed kernel machine methods, has been extensively developed in the statistical literature. It has been shown to have large power for a wide class of problems and applications in genomics and brain imaging. Many authors have exploited an equivalence between kernel machines and mixed effects models and used attendant estimation and inferential procedures. In this note, we construct a so-called `adaptively minimax' kernel machine. Such a construction highlights the role of thresholding in the observation space and limits on the interpretability of such kernel machines.


On Likelihood Ratio Tests When Nuisance Parameters Are Present Only Under The Alternative, Cz Di, K-Y Liang Jan 2014

Chongzhi Di

In parametric models, when one or more parameters disappear under the null hypothesis, the likelihood ratio test statistic does not converge to a chi-square distribution. Rather, its limiting distribution is shown to be equivalent to that of the supremum of a squared Gaussian process. However, the limiting distribution is analytically intractable for most examples, and approximation- or simulation-based methods must be used to calculate the p-values. In this article, we investigate conditions under which the asymptotic distributions have analytically tractable forms, based on the principal component decomposition of Gaussian processes. When these conditions are not satisfied, the principal …


Spectral Density Shrinkage For High-Dimensional Time Series, Mark Fiecas, Rainer Von Sachs Dec 2013

Mark Fiecas

Time series data obtained from neurophysiological signals are often high-dimensional, and the length of the time series is often short relative to the number of dimensions. Thus, it is difficult or sometimes impossible to compute statistics that are based on the spectral density matrix, because these matrices are numerically unstable. In this work, we discuss the importance of regularization for spectral analysis of high-dimensional time series and propose shrinkage estimation for estimating high-dimensional spectral density matrices. The shrinkage estimator is derived from a penalized log-likelihood, and the optimal penalty parameter has a closed-form solution, which can be estimated using the …


Beta Binomial Regression, Joseph M. Hilbe Oct 2013

Joseph M Hilbe

A monograph on how to construct, interpret, and evaluate beta, beta-binomial, and zero-inflated beta-binomial regression models, with Stata and R code used for the examples.
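
The monograph's own examples are in Stata and R; as a language-neutral illustration, here is a minimal Python sketch of a beta-binomial regression negative log-likelihood with a logit link for the mean and a common dispersion (one common parameterization, not necessarily the monograph's):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln, expit

def betabin_negll(params, X, y, n):
    """Beta-binomial NLL: mean mu = expit(X @ beta), dispersion
    phi = exp(log_phi), beta shapes a = mu*phi, b = (1 - mu)*phi."""
    beta, log_phi = params[:-1], params[-1]
    mu = expit(X @ beta)
    phi = np.exp(log_phi)
    a, b = mu * phi, (1.0 - mu) * phi
    # The log C(n, y) term is constant in the parameters and is dropped.
    return -np.sum(betaln(y + a, n - y + b) - betaln(a, b))

# Hypothetical usage, for a design matrix X, counts y, and trials n:
#   fit = minimize(betabin_negll, x0=np.zeros(X.shape[1] + 1),
#                  args=(X, y, n), method="BFGS")
```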


Estimating Effects On Rare Outcomes: Knowledge Is Power, Laura B. Balzer, Mark J. Van Der Laan May 2013

Laura B. Balzer

Many of the secondary outcomes in observational studies and randomized trials are rare. Methods for estimating causal effects and associations with rare outcomes, however, are limited, and this represents a missed opportunity for investigation. In this article, we construct a new targeted minimum loss-based estimator (TMLE) for the effect of an exposure or treatment on a rare outcome. We focus on the causal risk difference and statistical models incorporating bounds on the conditional risk of the outcome, given the exposure and covariates. By construction, the proposed estimator constrains the predicted outcomes to respect this model knowledge. Theoretically, this bounding provides …


On The Exact Size Of Multiple Comparison Tests, Chris Lloyd Dec 2012

Chris J. Lloyd

No abstract provided.


Theory And Methods For Gini Coefficients Partitioned By Quantile Range, Chaitra Nagaraja Dec 2012

Chaitra H Nagaraja

The Gini coefficient is frequently used to measure inequality in populations. However, it is possible that inequality levels may change over time differently for disparate subgroups which cannot be detected with population-level estimates only. Therefore, it may be informative to examine inequality separately for these segments. The case where the population is split into two segments based on non-overlapping quantile ranges is examined. Asymptotic theory is derived and practical methods to estimate standard errors and construct confidence intervals using resampling methods are developed. An application to per capita income across census tracts using American Community Survey data is considered.
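
A Python sketch of the estimator being partitioned: the standard Gini coefficient, plus a toy split at a single quantile; the paper's asymptotic theory and resampling-based intervals are not reproduced here:

```python
import numpy as np

def gini(x):
    """Gini coefficient, G = sum_{i,j} |x_i - x_j| / (2 n^2 xbar),
    computed via the equivalent O(n log n) sorted-sample formula."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1.0) / n

def gini_by_quantile_split(x, q=0.5):
    """Gini computed separately below and above the q-th quantile,
    an illustrative version of partitioning by quantile range."""
    x = np.asarray(x, dtype=float)
    cut = np.quantile(x, q)
    return gini(x[x <= cut]), gini(x[x > cut])
```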


Fixed Bandwidth Theory For Tail Index Estimation, Tucker Mcelroy, Chaitra H. Nagaraja Dec 2012

Chaitra H Nagaraja

No abstract provided.


On The Size Accuracy Of Combination Tests, Chris Lloyd Dec 2012

Chris J. Lloyd

One element of the analysis of adaptive clinical trials is combining the evidence from several (often two) stages. When the endpoint is binary, standard single-stage test statistics do not control size well, and the combined test might not be valid if the single-stage tests are not. The purpose of this paper is to numerically and theoretically examine the extent to which combining basic test statistics mitigates or magnifies the size violation of the final test.
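
As background, here is a Python sketch of the classical inverse-normal combination of two stage-wise one-sided p-values, the kind of combination rule whose size behavior the paper examines (the pre-specified weight is an illustrative choice, not the paper's proposal):

```python
import numpy as np
from scipy.stats import norm

def inverse_normal_combination(p1, p2, w1=0.5):
    """Combine stage-wise p-values: z = sqrt(w1)*z1 + sqrt(w2)*z2 is
    standard normal under the null when w1 + w2 = 1."""
    w2 = 1.0 - w1
    z = np.sqrt(w1) * norm.isf(p1) + np.sqrt(w2) * norm.isf(p2)
    return norm.sf(z)   # combined one-sided p-value
```

If the stage-wise p-values are only approximately uniform under the null, as is typical with binary endpoints, the combined p-value inherits that size error, which is the phenomenon studied here.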


Obtaining Critical Values For Test Of Markov Regime Switching, Douglas G. Steigerwald, Valerie Bostwick Oct 2012

Douglas G. Steigerwald

For Markov regime-switching models, testing for the possible presence of more than one regime requires the use of a non-standard test statistic. Carter and Steigerwald (forthcoming, Journal of Econometric Methods) derive in detail the analytic steps needed to implement the test of Markov regime-switching proposed by Cho and White (2007, Econometrica). We summarize the implementation steps and address the computational issues that arise. A new command to compute regime-switching critical values, rscv, is introduced and presented in the context of empirical research.


Big Data And The Future, Sherri Rose Jul 2012

Sherri Rose

No abstract provided.


Targeted Maximum Likelihood Estimation For Dynamic Treatment Regimes In Sequential Randomized Controlled Trials, Paul Chaffee, Mark J. Van Der Laan Jun 2012

Paul H. Chaffee

Sequential Randomized Controlled Trials (SRCTs) are rapidly becoming essential tools in the search for optimized treatment regimes in ongoing treatment settings. Analyzing data for multiple time-point treatments with a view toward optimal treatment regimes is of interest in many types of afflictions: HIV infection, Attention Deficit Hyperactivity Disorder in children, leukemia, prostate cancer, renal failure, and many others. Methods for analyzing data from SRCTs exist but they are either inefficient or suffer from the drawbacks of estimating equation methodology. We describe an estimation procedure, targeted maximum likelihood estimation (TMLE), which has been fully developed and implemented in point treatment settings, …


Variances For Maximum Penalized Likelihood Estimates Obtained Via The Em Algorithm, Mark Segal, Peter Bacchetti, Nicholas Jewell Apr 2012

Mark R Segal

We address the problem of providing variances for parameter estimates obtained under a penalized likelihood formulation through use of the EM algorithm. The proposed solution represents a synthesis of two existing techniques. First, we exploit the supplemented EM algorithm developed in Meng and Rubin (1991), which provides variance estimates for maximum likelihood estimates obtained via the EM algorithm. Their procedure relies on evaluating the Jacobian of the mapping induced by the EM algorithm. Second, we utilize a result from Green (1990) that provides an expression for the Jacobian of the mapping induced by the EM algorithm applied to a penalized …
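
As a reminder of the Meng and Rubin (1991) identity the synthesis builds on (a standard result, stated in our notation): with DM the Jacobian of the EM mapping at the converged estimate and I_oc the expected complete-data information,

```latex
V \;=\; I_{oc}^{-1}\,\bigl(I - DM\bigr)^{-1}
  \;=\; I_{oc}^{-1} \;+\; I_{oc}^{-1}\, DM\,\bigl(I - DM\bigr)^{-1},
```

so that Green's (1990) expression for the Jacobian of the penalized-EM mapping can be substituted into the same formula.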


Backcalculation Of Hiv Infection Rates, Peter Bacchetti, Mark Segal, Nicholas Jewell Apr 2012

Mark R Segal

Backcalculation is an important method of reconstructing past rates of human immunodeficiency virus (HIV) infection and for estimating current prevalence of HIV infection and future incidence of acquired immunodeficiency syndrome (AIDS). This paper reviews the backcalculation techniques, focusing on the key assumptions of the method, including the necessary information regarding incubation, reporting delay, and models for the infection curve. A summary is given of the extent to which the appropriate external information is available and whether checks of the relevant assumptions are possible through use of data on AIDS incidence from surveillance systems. A likelihood approach to backcalculation is described …
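
Backcalculation rests on a convolution identity; a sketch, writing g(s) for the infection rate at calendar time s and F for the incubation-period distribution (reporting delay omitted):

```latex
\mathbb{E}\bigl[A(t)\bigr] \;=\; \int_{0}^{t} g(s)\, F(t - s)\,\mathrm{d}s,
```

where A(t) is cumulative AIDS incidence by time t; given F and observed AIDS counts, the infection curve g is recovered by deconvolution.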