Physical Sciences and Mathematics Commons
Statistical Theory: Selected Works

Articles 31 - 60 of 77

Full-Text Articles in Physical Sciences and Mathematics

Loss Function Based Ranking In Two-Stage, Hierarchical Models, Rongheng Lin, Thomas A. Louis, Susan M. Paddock, Greg Ridgeway Mar 2012

Rongheng Lin

Several authors have studied the performance of optimal, squared error loss (SEL) estimated ranks. Though these are effective, in many applications interest focuses on identifying the relatively good (e.g., in the upper 10%) or relatively poor performers. We construct loss functions that address this goal and evaluate candidate rank estimates, some of which optimize specific loss functions. We study performance for a fully parametric hierarchical model with a Gaussian prior and Gaussian sampling distributions, evaluating performance for several loss functions. Results show that though SEL-optimal ranks and percentiles do not specifically focus on classifying with respect to a percentile cut …
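
As a concrete illustration of the SEL machinery discussed above, the sketch below computes SEL-optimal ranks from posterior draws: each unit's rank is computed within every draw, ranks are averaged over draws, and the integer rank estimates are the ranks of those posterior means. This is general background on the approach, not the authors' code, and the simulated "posterior" is a stand-in.

    import numpy as np

    # Stand-in posterior: S draws of K unit-specific effects from a toy
    # Gaussian-Gaussian model (purely illustrative).
    rng = np.random.default_rng(0)
    theta_true = rng.normal(size=20)
    draws = theta_true + rng.normal(scale=0.5, size=(1000, 20))

    # Rank of each unit within each posterior draw (1 = smallest).
    ranks_per_draw = draws.argsort(axis=1).argsort(axis=1) + 1

    # SEL-optimal integer ranks are the ranks of the posterior mean ranks.
    mean_ranks = ranks_per_draw.mean(axis=0)
    sel_ranks = mean_ranks.argsort().argsort() + 1

    # Percentiles, relevant when the loss targets an upper-10% style cut.
    sel_percentiles = (sel_ranks - 0.5) / len(sel_ranks)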


Simulating Non-Normal Distributions With Specified L-Moments And L-Correlations, Todd C. Headrick, Mohan D. Pant Jan 2012

Todd Christopher Headrick

This paper derives a procedure for simulating continuous non-normal distributions with specified L-moments and L-correlations in the context of power method polynomials of order three. It is demonstrated that the proposed procedure has computational advantages over the traditional product-moment procedure in terms of solving for intermediate correlations. Simulation results also demonstrate that the proposed L-moment-based procedure is an attractive alternative to the traditional procedure when distributions with more severe departures from normality are considered. Specifically, estimates of L-skew and L-kurtosis are superior to the conventional estimates of skew and kurtosis in terms of both relative bias and relative standard error. …
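
A note on the quantities being compared: sample L-moments, L-skew, and L-kurtosis can be computed directly with Hosking's standard unbiased estimators. The sketch below is general background, not code from the paper.

    import numpy as np

    def sample_l_moments(x):
        """Return (L-mean, L-scale, L-skew, L-kurtosis) via Hosking's
        unbiased estimators based on order statistics."""
        x = np.sort(np.asarray(x, dtype=float))
        n = len(x)
        i = np.arange(1, n + 1)
        b0 = x.mean()
        b1 = np.sum((i - 1) / (n - 1) * x) / n
        b2 = np.sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x) / n
        b3 = np.sum((i - 1) * (i - 2) * (i - 3)
                    / ((n - 1) * (n - 2) * (n - 3)) * x) / n
        l1, l2 = b0, 2 * b1 - b0
        l3 = 6 * b2 - 6 * b1 + b0
        l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
        return l1, l2, l3 / l2, l4 / l2

    rng = np.random.default_rng(1)
    print(sample_l_moments(rng.exponential(size=5000)))
    # Exponential population values: L-skew = 1/3, L-kurtosis = 1/6.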


Proportional Mean Residual Life Model For Right-Censored Length-Biased Data, Gary Kwun Chuen Chan, Ying Qing Chen, Chongzhi Di Jan 2012

Chongzhi Di

To study disease association with risk factors in epidemiologic studies, cross-sectional sampling is often more focused and less costly for recruiting study subjects who have already experienced initiating events. For time-to-event outcome, however, such a sampling strategy may be length-biased. Coupled with censoring, analysis of length-biased data can be quite challenging, due to the so-called “induced informative censoring” in which the survival time and censoring time are correlated through a common backward recurrence time. We propose to use the proportional mean residual life model of Oakes and Dasu (1990) for analysis of censored length-biased survival data. Several nonstandard data structures, …
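
For context, the Oakes and Dasu (1990) model referenced here has the well-known proportional mean residual life form (stated from general background, in LaTeX notation):

    m(t \mid Z) = E[T - t \mid T > t, Z] = m_0(t) \exp(\beta^\top Z),

so covariates act multiplicatively on the remaining life expectancy m(t | Z) rather than on the hazard.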


Testing For Regime Switching: A Comment, Douglas Steigerwald, Andrew Carter Dec 2011

Douglas G. Steigerwald

We analyze an autoregressive model with Markov regime switching that bears on the properties of the quasi-likelihood ratio test developed by Cho and White (2007). For such a model, we show that consistency of the quasi-maximum likelihood estimator for the population parameter values, on which consistency of the test is based, does not hold. We describe a condition that ensures consistency of the estimator and discuss the consistency of the test in the absence of consistency of the estimator.
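
For orientation, a generic two-regime switching autoregression of the kind at issue is, schematically (a stylized form, not necessarily the exact specification analyzed):

    y_t = \mu_{S_t} + \rho\, y_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \text{i.i.d. } N(0, \sigma^2),

with S_t \in \{0, 1\} an unobserved first-order Markov chain; the quasi-likelihood ratio test asks whether a single regime suffices, i.e., \mu_0 = \mu_1.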


Some Non-Asymptotic Properties Of Parametric Bootstrap P-Values, Chris Lloyd Dec 2011

Chris J. Lloyd

The bootstrap P-value is the exact tail probability of a test statistic, calculated assuming the nuisance parameter equals the null maximum likelihood (ML) estimate. For discrete data, bootstrap P-values perform amazingly well even for small samples, even when standard first order methods perform surprisingly poorly. Why is this? Detailed numerical calculations in Lloyd (2012a) strongly suggest that the good performance of bootstrap is not explained by asymptotics. In this paper, I establish several desirable non-asymptotic properties of bootstrap P-values. The most important of these is that bootstrap will correct ‘bad’ ordering of the sample space which leads to a more …
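
In symbols, writing \hat\lambda_0 for the null ML estimate of the nuisance parameter and T for the test statistic, the definition in the opening sentence is

    p_{\text{boot}}(y) = \Pr_{\hat\lambda_0}\{\, T(Y) \ge T(y) \,\},

the exact tail probability of T computed at the estimated nuisance value.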


Asymptotic Theory For Cross-Validated Targeted Maximum Likelihood Estimation, Wenjing Zheng, Mark J. Van Der Laan Jul 2011

Wenjing Zheng

We consider a targeted maximum likelihood estimator of a path-wise differentiable parameter of the data generating distribution in a semi-parametric model based on observing n independent and identically distributed observations. The targeted maximum likelihood estimator (TMLE) uses V-fold sample splitting for the initial estimator in order to make the TMLE maximally robust in its bias reduction step. We prove a general theorem that states asymptotic efficiency (and thereby regularity) of the targeted maximum likelihood estimator when the initial estimator is consistent and a second order term converges to zero in probability at a rate faster than the square root of …
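
Schematically (general TMLE background rather than the paper's precise statement), the analysis rests on an expansion of the form

    \hat\psi_n - \psi_0 = (P_n - P_0)\, D^*(P_0) + R_2,

where D^* is the efficient influence curve; asymptotic efficiency follows when the second-order remainder satisfies R_2 = o_P(n^{-1/2}), which is the type of rate condition the truncated sentence above is describing.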


Multilevel Latent Class Models With Dirichlet Mixing Distribution, Chong-Zhi Di, Karen Bandeen-Roche Jan 2011

Chongzhi Di

Latent class analysis (LCA) and latent class regression (LCR) are widely used for modeling multivariate categorical outcomes in social sciences and biomedical studies. Standard analyses assume data of different respondents to be mutually independent, excluding application of the methods to familial and other designs in which participants are clustered. In this paper, we consider multilevel latent class models, in which sub-population mixing probabilities are treated as random effects that vary among clusters according to a common Dirichlet distribution. We apply the Expectation-Maximization (EM) algorithm for model fitting by maximum likelihood (ML). This approach works well, but is computationally intensive when …
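
In outline, for binary items m = 1, ..., M on member i of cluster j, the model class described takes the form (a schematic, not necessarily the paper's exact notation):

    \pi_j \sim \text{Dirichlet}(\alpha), \qquad
    P(Y_{ij} = y \mid \pi_j) = \sum_{c=1}^{C} \pi_{jc} \prod_{m=1}^{M} p_{cm}^{y_m} (1 - p_{cm})^{1 - y_m},

so the class-mixing probabilities \pi_j vary across clusters while the item-response probabilities p_{cm} are shared.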


Likelihood Ratio Testing For Admixture Models With Application To Genetic Linkage Analysis, Chong-Zhi Di, Kung-Yee Liang Jan 2011

Chongzhi Di

We consider likelihood ratio tests (LRT) and their modifications for homogeneity in admixture models. The admixture model is a special case of the two-component mixture model, where one component is indexed by an unknown parameter while the parameter value for the other component is known. It has been widely used in genetic linkage analysis under heterogeneity, in which the kernel distribution is binomial. For such models, it has long been recognized that testing for homogeneity is nonstandard and the LRT statistic does not converge to a conventional chi-squared distribution. In this paper, we investigate the asymptotic behavior of the LRT for …
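
Concretely, the admixture model described has the density

    f(x; \gamma, \theta) = (1 - \gamma)\, f(x; \theta_0) + \gamma\, f(x; \theta),

with \theta_0 known. Homogeneity corresponds to \gamma = 0 or \theta = \theta_0, and the test is nonstandard because each parameter is unidentified when the other sits at its null value.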


Cross-Validated Targeted Minimum-Loss-Based Estimation, Wenjing Zheng, Mark Van Der Laan Dec 2010

Wenjing Zheng

No abstract provided.


Accurately Sized Test Statistics With Misspecified Conditional Homoskedasticity, Douglas Steigerwald, Jack Erb Dec 2010

Douglas G. Steigerwald

We study the finite-sample performance of test statistics in linear regression models where the error dependence is of unknown form. With an unknown dependence structure there is traditionally a trade-off between the maximum lag over which the correlation is estimated (the bandwidth) and the amount of heterogeneity in the process. When allowing for heterogeneity, through conditional heteroskedasticity, the correlation at far lags is generally omitted and the resultant inflation of the empirical size of test statistics has long been recognized. To allow for correlation at far lags we study test statistics constructed under the possibly misspecified assumption of conditional homoskedasticity. …
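
For reference, the familiar Bartlett-kernel (HAC) variance estimator that embodies the bandwidth trade-off described above is (standard background, not the paper's proposal):

    \hat{V} = \hat\Gamma_0 + \sum_{j=1}^{b} \Bigl(1 - \frac{j}{b+1}\Bigr)\bigl(\hat\Gamma_j + \hat\Gamma_j^\top\bigr),
    \qquad
    \hat\Gamma_j = \frac{1}{n} \sum_{t=j+1}^{n} x_t \hat{u}_t \hat{u}_{t-j}\, x_{t-j}^\top,

where b is the bandwidth; correlation at lags beyond b is omitted, which is the source of the size inflation the abstract mentions.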


The Underground Economy Of Fake Antivirus Software, Douglas Steigerwald, Brett Stone-Gross, Ryan Abman, Richard Kemmerer, Christopher Kruegel, Giovanni Vigna Dec 2010

Douglas G. Steigerwald

Fake antivirus (AV) programs have been utilized to defraud millions of computer users into paying as much as one hundred dollars for a phony software license. As a result, fake AV software has evolved into one of the most lucrative criminal operations on the Internet. In this paper, we examine the operations of three large-scale fake AV businesses, lasting from three months to more than two years. More precisely, we present the results of our analysis on a trove of data obtained from several backend servers that the cybercriminals used to drive their scam operations. Our investigations reveal that these …


Computing Highly Accurate Confidence Limits From Discrete Data Using Importance Sampling, Chris Lloyd Dec 2010

Chris J. Lloyd

For discrete parametric models, approximate confidence limits perform poorly from a strict frequentist perspective. In principle, exact and optimal confidence limits can be computed using the formula of Buehler (1957), Lloyd and Kabaila (2003). So-called profile upper limits (Kabaila & Lloyd, 2001) are closely related to Buehler limits and have extremely good properties. Both profile and Buehler limits depend on the probability of a certain tail set as a function of the unknown parameters. Unfortunately, this probability surface is not computable for realistic models. In this paper, importance sampling is used to estimate the surface and hence the confidence limits. …
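
The computational idea can be sketched in a few lines. The code below illustrates importance sampling for a tail-probability surface with a binomial kernel; it is an illustration of the general technique, not the paper's algorithm, and every setting in it is hypothetical.

    import numpy as np
    from scipy import stats

    # Estimate pi(theta) = P_theta(T >= t_obs) over a grid of theta using a
    # single sample drawn at an envelope value theta_s, reweighted by
    # likelihood ratios f(y; theta) / f(y; theta_s).
    rng = np.random.default_rng(2)
    n, t_obs, theta_s = 50, 34, 0.6
    y = rng.binomial(n, theta_s, size=200_000)
    hit = y >= t_obs

    theta_grid = np.linspace(0.3, 0.9, 61)
    log_f_s = stats.binom.logpmf(y, n, theta_s)
    pi_hat = np.array([
        np.mean(np.exp(stats.binom.logpmf(y, n, th) - log_f_s) * hit)
        for th in theta_grid
    ])
    # pi_hat estimates the tail-probability surface; weight variance grows
    # as theta moves away from theta_s, so the envelope choice matters.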


Curriculum Vitae, Tatiyana V. Apanasovich Oct 2010

Tatiyana V Apanasovich

No abstract provided.


Computing Highly Accurate Or Exact P-Values Using Importance Sampling (Revised), Chris Lloyd Jan 2010

Chris J. Lloyd

Especially for discrete data, standard first order P-values can suffer from poor accuracy, even for quite large sample sizes. Moreover, different test statistics can give practically different results. There are several approaches to computing P-values which do not suffer these defects, such as parametric bootstrap P-values or the partially maximised P-values of Berger & Boos (1994).

Both these methods require computing the exact tail probability of the approximate P-value as a function of the nuisance parameter(s), known as the significance profile. For most practical problems this is not computationally feasible. I develop an importance sampling approach to this problem. A …
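
For the second method mentioned, the Berger & Boos (1994) partially maximized P-value has the standard form

    p_\gamma(y) = \sup_{\lambda \in C_\gamma(y)} \Pr_\lambda\{\, T(Y) \ge T(y) \,\} + \gamma,

where C_\gamma(y) is a confidence set for the nuisance parameter \lambda with coverage 1 - \gamma. Both it and the parametric bootstrap P-value require the significance profile \Pr_\lambda\{T \ge t_{\text{obs}}\} as a function of \lambda, which is what the importance sampling targets.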


Statistical Simulation: Power Method Polynomials And Other Transformations, Todd C. Headrick Jan 2010

Todd Christopher Headrick

Although power method polynomials based on the standard normal distribution have been used in many different contexts for the past 30 years, it was not until recently that the probability density function (pdf) and cumulative distribution function (cdf) were derived and made available. Focusing on both univariate and multivariate nonnormal data generation, Statistical Simulation: Power Method Polynomials and Other Transformations presents techniques for conducting a Monte Carlo simulation study. It shows how to use power method polynomials for simulating univariate and multivariate nonnormal distributions with specified cumulants and correlation matrices. The book first explores the methodology underlying the power method, …
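
To make the underlying transformation concrete, here is a minimal sketch of a third-order power method polynomial. The coefficients are arbitrary placeholders, not values from the book, where they would instead be solved from target cumulants.

    import numpy as np

    # Third-order power method transform: a polynomial of a standard normal.
    rng = np.random.default_rng(3)
    z = rng.standard_normal(1_000_000)

    c1, c2, c3 = 0.90, 0.15, 0.03   # hypothetical coefficients
    c0 = -c2                        # centers Y, since E[Z^2] = 1
    y = c0 + c1 * z + c2 * z**2 + c3 * z**3

    # Inspect the non-normal shape this particular polynomial produces.
    m, s = y.mean(), y.std()
    print("skew:", np.mean(((y - m) / s) ** 3))
    print("excess kurtosis:", np.mean(((y - m) / s) ** 4) - 3)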


A Statistical Framework For The Analysis Of Chip-Seq Data, Pei Fen Kuan, Dongjun Chung, Guangjin Pan, James A. Thomson, Ron Stewart, Sunduz Keles Nov 2009

Sunduz Keles

Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) has revolutionized experiments for genome-wide profiling of DNA-binding proteins, histone modifications, and nucleosome occupancy. As the cost of sequencing is decreasing, many researchers are switching from microarray-based technologies (ChIP-chip) to ChIP-Seq for genome-wide study of transcriptional regulation. Despite its increasing and well-deserved popularity, there is little work that investigates and accounts for sources of biases in the ChIP-Seq technology. These biases typically arise from both the standard pre-processing protocol and the underlying DNA sequence of the generated data.

We study data from a naked DNA sequencing experiment, which sequences non-cross-linked DNA after deproteinizing and …


Multilevel Functional Principal Component Analysis, Chong-Zhi Di, Ciprian M. Crainiceanu, Brian S. Caffo, Naresh M. Punjabi Jan 2009

Chongzhi Di

The Sleep Heart Health Study (SHHS) is a comprehensive landmark study of sleep and its impacts on health outcomes. A primary metric of the SHHS is the in-home polysomnogram, which includes two electroencephalographic (EEG) channels for each subject, at two visits. The volume and importance of these data present enormous challenges for analysis. To address these challenges, we introduce multilevel functional principal component analysis (MFPCA), a novel statistical methodology designed to extract core intra- and inter-subject geometric components of multilevel functional data. Though motivated by the SHHS, the proposed methodology is generally applicable, with potential relevance to many modern scientific …
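
In brief, the MFPCA decomposition has the form (following the general structure of the method; notation may differ from the paper):

    X_{ij}(t) = \mu(t) + \eta_j(t) + \sum_k \xi_{ik}\, \phi^{(1)}_k(t) + \sum_l \zeta_{ijl}\, \phi^{(2)}_l(t),

where i indexes subjects and j visits; the first sum captures subject-level (between-subject) components and the second visit-level (within-subject) components, with principal component scores \xi_{ik} and \zeta_{ijl}.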


Nonparametric Signal Extraction And Measurement Error In The Analysis Of Electroencephalographic Activity During Sleep, Ciprian M. Crainiceanu, Brian S. Caffo, Chong-Zhi Di, Naresh M. Punjabi Jan 2009

Chongzhi Di

We introduce methods for signal and associated variability estimation based on hierarchical nonparametric smoothing with application to the Sleep Heart Health Study (SHHS). SHHS is the largest electroencephalographic (EEG) collection of sleep-related data, which contains, at each visit, two quasi-continuous EEG signals for each subject. The signal features extracted from EEG data are then used in second level analyses to investigate the relation between health, behavioral, or biometric outcomes and sleep. Using subject-specific signals estimated with known variability in a second level regression becomes a nonstandard measurement error problem. We propose and implement methods that take into account cross-sectional and …


Generalized Multilevel Functional Regression, Ciprian M. Crainiceanu, Ana-Maria Staicu, Chong-Zhi Di Jan 2009

Chongzhi Di

We introduce Generalized Multilevel Functional Linear Models (GMFLMs), a novel statistical framework for regression models where exposure has a multilevel functional structure. We show that GMFLMs are, in fact, generalized multilevel mixed models. Thus, GMFLMs can be analyzed using the mixed effects inferential machinery and can be generalized within a well-researched statistical framework. We propose and compare two methods for inference: (1) a two-stage frequentist approach; and (2) a joint Bayesian analysis. Our methods are motivated by and applied to the Sleep Heart Health Study, the largest community cohort study of sleep. However, our methods are general and easy to …
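
Schematically, the outcome model couples a generalized linear model with a functional term. A generic form (not the paper's full multilevel notation) is

    g\{ E(Y_i \mid X_i) \} = \alpha + \int_0^1 X_i(t)\, \beta(t)\, dt + Z_i^\top b,

and expanding X_i and \beta in a common basis turns the integral into a sum of scores times coefficients, which is how such models reduce to the mixed-model machinery the abstract invokes.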


Bootstrap P-Values In Discrete Models: Asymptotic And Non-Asymptotic Effects, Chris Lloyd Dec 2008

Chris J. Lloyd

(This paper is a major revision of http://works.bepress.com/chris_lloyd/15/.) Standard first order P-values suffer from two important drawbacks. First, even for quite large sample sizes they can misrepresent the exact significance, which depends on nuisance parameters unspecified under the null. For most discrete models, accuracy is variable and breaks down completely at the boundary. Second, different test statistics can give practically different results.

The bootstrap P-value is the exact significance with the null maximum likelihood (ML) estimate of the nuisance parameter substituted. We show that bootstrap P-values based on different first order statistics differ to second order. We also show …


Adaptive Estimation, Douglas G. Steigerwald Dec 2007

Douglas G. Steigerwald

No abstract provided.


The Black Swan: Praise And Criticism, Peter H. Westfall, Joseph M. Hilbe Aug 2007

Joseph M Hilbe

No abstract provided.


The Power Method Transformation: Its Probability Density Function, Distribution Function, And Its Further Use For Fitting Data, Todd C. Headrick, Rhonda K. Kowalchuk Mar 2007

Todd Christopher Headrick

The power method polynomial transformation is a popular algorithm used for simulating non-normal distributions because of its simplicity and ease of execution. The primary limitations of the power method transformation are that its probability density function (pdf) and cumulative distribution function (cdf) are unknown. In view of this, the power method’s pdf and cdf are derived in general form. More specific properties are also derived for determining if a given transformation will also have an associated pdf in the context of polynomials of order three and five. Numerical examples and parametric plots of power method densities are provided to confirm …
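
The key result can be stated compactly: if p is a strictly increasing polynomial and Y = p(Z) with Z standard normal, then for y = p(z)

    F_Y(y) = \Phi(z), \qquad f_Y(y) = \frac{\phi(z)}{p'(z)},

so a valid pdf requires p'(z) > 0 for all z, which is the check the abstract describes for polynomials of order three and five.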


A Note On Empirical Likelihood Inference Of Residual Life Regression, Ying Qing Chen, Yichuan Zhao Dec 2006

Yichuan Zhao

Mean residual life function, or life expectancy, is an important function to characterize the distribution of residual life. The proportional mean residual life model by Oakes and Dasu (1990) is a regression tool to study the association between life expectancy and its associated covariates. Although semiparametric inference procedures have been proposed in the literature, the accuracy of such procedures may be low when the censoring proportion is relatively large. In this paper, the semiparametric inference procedures are studied with an empirical likelihood ratio method. An empirical likelihood confidence region is constructed for the regression parameters. The proposed method is further compared …


Noise Reduced Realized Volatility: A Kalman Filter Approach, Douglas Steigerwald, John Owens Dec 2005

Douglas G. Steigerwald

How should one remove microstructure noise from high-frequency asset prices? We show how to use the Kalman filter to efficiently remove microstructure noise.
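
A minimal sketch of the idea, under an assumed local-level model in which the observed log-price is an efficient-price random walk plus i.i.d. microstructure noise (this model and the variances below are illustrative assumptions, not the paper's specification):

    import numpy as np

    def kalman_denoise(p, q, r):
        """Scalar Kalman filter. p: observed log-prices; q: variance of
        efficient-price increments; r: variance of microstructure noise.
        Returns the filtered efficient-price estimates."""
        m, v = p[0], r
        out = np.empty_like(p)
        for t, obs in enumerate(p):
            v_pred = v + q                 # predict
            k = v_pred / (v_pred + r)      # Kalman gain
            m = m + k * (obs - m)          # update with new observation
            v = (1 - k) * v_pred
            out[t] = m
        return out

    rng = np.random.default_rng(4)
    true = np.cumsum(rng.normal(scale=0.001, size=1000))  # efficient price
    obs = true + rng.normal(scale=0.0005, size=1000)      # add noise
    filtered = kalman_denoise(obs, q=0.001**2, r=0.0005**2)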


On Optimizing Multi-Level Designs: Power Under Budget Constraints, Todd C. Headrick, Bruno D. Zumbo Jan 2005

Todd Christopher Headrick

This paper derives a procedure for efficiently allocating the number of units in multi-level designs given prespecified power levels. The derivation of the procedure is based on a constrained optimization problem that maximizes a general form of a ratio of expected mean squares subject to a budget constraint. The procedure makes use of variance component estimates to optimize designs during the budget formulating stages. The method provides more general closed form solutions than other currently available formulae. As such, the proposed procedure allows for the determination of the optimal numbers of units for studies that involve more complex designs. A …
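
A standard two-level special case conveys the flavor of such closed-form solutions (textbook background, not necessarily the paper's more general formula). With budget C = J(c_1 + c_2 n) for J clusters of n units each, cluster cost c_1, per-unit cost c_2, and between- and within-cluster variance components \sigma^2_b and \sigma^2_w, the variance-minimizing cluster size is

    n^* = \sqrt{ \frac{c_1\, \sigma^2_w}{c_2\, \sigma^2_b} },

after which the number of clusters J follows from the budget constraint.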


More Powerful Unconditional Tests Of No Treatment Effect From Binary Matched Pairs, Chris Lloyd Dec 2004

Chris J. Lloyd

This is the working paper version that preceded the paper "A New Exact and More Powerful Unconditional Test of no Treatment Effect from Binary Matched Pairs" published in Biometrics 76 (also on this site: http://works.bepress.com/chris_lloyd/3/).


Identifying A Source Of Financial Volatility, Douglas G. Steigerwald, Richard Vagnoni Dec 2004

Douglas G. Steigerwald

How should one combine stock and option markets in models of trade and asset price volatility? We address this question, paying particular attention to the identification of parameters of interest.


Inferring Information Frequency And Quality, Douglas G. Steigerwald, John Owens Dec 2004

Douglas G. Steigerwald

We develop a microstructure model that, in contrast to previous models, allows one to estimate the frequency and quality of private information. In addition, the model produces stationary asset price and trading volume series. We find evidence that information arrives frequently within a day and that this information is of high quality. The frequent arrival of information, while in contrast to previous microstructure model estimates, accords with nonmodel-based estimates and the related literature testing the mixture-of-distributions hypothesis. To determine if the estimates are correctly reflecting the arrival of latent information, we estimate the parameters over half-hour intervals within the day. …


Consumption Function, Douglas G. Steigerwald Dec 2003

Douglas G. Steigerwald

No abstract provided.