Open Access. Powered by Scholars. Published by Universities.®

Statistical Methodology Commons

2012

Articles 1 - 30 of 44

Full-Text Articles in Statistical Methodology

Group Testing Regression Models, Boan Zhang Nov 2012

Department of Statistics: Dissertations, Theses, and Student Work

Group testing, where groups of individual specimens are composited to test for the presence or absence of a disease (or some other binary characteristic), is a procedure commonly used to reduce the costs of screening a large number of individuals. Statistical research in group testing has traditionally focused on a homogeneous population, where individuals are assumed to have the same probability of having a disease. However, individuals often have different risks of positivity, so recent research has examined regression models that allow for heterogeneity among individuals within the population. This dissertation focuses on two problems involving group testing regression models. …


Obtaining Critical Values For Test Of Markov Regime Switching, Douglas G. Steigerwald, Valerie Bostwick Oct 2012

Douglas G. Steigerwald

For Markov regime-switching models, testing for the possible presence of more than one regime requires the use of a non-standard test statistic. Carter and Steigerwald (forthcoming, Journal of Econometric Methods) derive in detail the analytic steps needed to implement the test of Markov regime-switching proposed by Cho and White (2007, Econometrica). We summarize the implementation steps and address the computational issues that arise. A new command to compute regime-switching critical values, rscv, is introduced and presented in the context of empirical research.
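The general recipe behind such a command can be sketched in a few lines: simulate the test statistic under the null of a single regime, then read off upper quantiles as critical values. The R sketch below is a generic illustration only, with a placeholder sup-type statistic standing in for the Cho and White (2007) QLR statistic; it is not the rscv implementation.

```r
# Generic Monte Carlo critical values: simulate the null distribution of a
# statistic and take its upper quantiles. The statistic here is a simple
# placeholder, not the Cho-White QLR statistic that rscv computes.
set.seed(42)
sim.stat <- function(n) {
  y <- rnorm(n)                                       # data under H0: one regime
  max(abs(cumsum(y - mean(y)))) / (sd(y) * sqrt(n))   # placeholder sup-type stat
}
null.draws <- replicate(5000, sim.stat(200))
quantile(null.draws, c(0.90, 0.95, 0.99))             # simulated critical values
```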


Quest For Continuous Improvement: Gathering Feedback And Data Through Multiple Methods To Evaluate And Improve A Library's Discovery Tool, Jeanne M. Brown Oct 2012

Library Faculty Presentations

Summon at UNLV

  • Implemented fall 2011: a web-scale discovery tool
  • Expectations for Summon
  • Continuous Summon Improvement (CSI) Group

The environment

  • User changes
  • Library changes
  • Vendor changes
  • Product changes
  • Complex information environment
  • Change + complexity = need to assess using multiple streams of feedback


A Doubling Technique For The Power Method Transformations, Mohan D. Pant, Todd C. Headrick Oct 2012

Mohan Dev Pant

Power method polynomials are used for simulating non-normal distributions with specified product moments or L-moments. The power method is capable of producing distributions with extreme values of skew (L-skew) and kurtosis (L-kurtosis). However, these distributions can be extremely peaked and thus not representative of real-world data. To obviate this problem, two families of distributions are introduced based on a doubling technique with symmetric standard normal and logistic power method distributions. The primary focus of the methodology is in the context of L-moment theory. As such, L-moment based systems of equations are derived for simulating univariate and multivariate non-normal distributions with …


Finding A Better Confidence Interval For A Single Regression Changepoint Using Different Bootstrap Confidence Interval Procedures, Bodhipaksha Thilakarathne Oct 2012

Electronic Theses and Dissertations

Recently, a number of papers have been published in the area of regression changepoints, but there is little literature concerning confidence intervals for regression changepoints. The purpose of this paper is to find a better bootstrap confidence interval for a single regression changepoint ("better" meaning a confidence interval with minimal length and coverage probability close to the nominal confidence level). Several methods will be used to construct bootstrap confidence intervals, and the best-performing interval among them will be presented.
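One concrete instance of such a procedure (a percentile bootstrap around a grid-search least-squares changepoint estimate; the data, grid, and interval type below are illustrative assumptions, not necessarily the paper's preferred method):

```r
# Percentile bootstrap CI for a single changepoint in piecewise-linear regression.
set.seed(1)
n <- 100; x <- seq(0, 10, length.out = n)
y <- 2 + 0.5 * x + 1.5 * pmax(x - 6, 0) + rnorm(n)    # true changepoint at x = 6

fit.cp <- function(x, y) {                            # grid-search LS estimate
  cands <- quantile(x, probs = seq(0.1, 0.9, by = 0.01))
  rss <- sapply(cands, function(cp) deviance(lm(y ~ x + pmax(x - cp, 0))))
  cands[which.min(rss)]
}

boot.cp <- replicate(199, {                           # case-resampling bootstrap
  i <- sample(n, replace = TRUE)
  fit.cp(x[i], y[i])
})
quantile(boot.cp, c(0.025, 0.975))                    # percentile interval
```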


Adventures In Library Salary Surveys, Scott L. Schaffer Aug 2012

UVM Libraries Conference Day

Salary surveys are an important tool for the library community and the administrators and boards responsible for the oversight of libraries. However, such assessments must be constructed and analyzed with great care. The Vermont Library Association Personnel Committee has conducted three salary surveys over the past several years, one focusing on academic libraries and two on public libraries. Significant issues have included confidentiality, participation rate, definitions, length and difficulty of questions, collection of data, and representativeness. Suggestions and lessons learned will be shared.


An L-Moment-Based Analog For The Schmeiser-Deutsch Class Of Distributions, Todd C. Headrick, Mohan D. Pant Aug 2012

Mohan Dev Pant

This paper characterizes the conventional moment-based Schmeiser-Deutsch (S-D) class of distributions through the method of L-moments. The system can be used in a variety of settings such as simulation or modeling various processes. A procedure is also described for simulating S-D distributions with specified L-moments and L-correlations. The Monte Carlo results presented in this study indicate that the estimates of L-skew, L-kurtosis, and L-correlation associated with the S-D class of distributions are substantially superior to their corresponding conventional product-moment estimators in terms of relative bias—most notably when sample sizes are small.
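For readers unfamiliar with L-moments, the sketch below computes Hosking's unbiased sample L-moments and the L-skew and L-kurtosis ratios; these are the standard formulas, not code from the article.

```r
# Unbiased sample L-moments via probability-weighted moments b0..b3.
samlmom <- function(x) {
  x <- sort(x); n <- length(x); i <- seq_len(n)
  b0 <- mean(x)
  b1 <- sum((i - 1) / (n - 1) * x) / n
  b2 <- sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x) / n
  b3 <- sum((i - 1) * (i - 2) * (i - 3) / ((n - 1) * (n - 2) * (n - 3)) * x) / n
  l2 <- 2 * b1 - b0
  c(l1 = b0, l2 = l2,
    t3 = (6 * b2 - 6 * b1 + b0) / l2,                 # L-skew
    t4 = (20 * b3 - 30 * b2 + 12 * b1 - b0) / l2)     # L-kurtosis
}
set.seed(7)
samlmom(rt(30, df = 3))   # small heavy-tailed sample; L-ratios remain bounded
```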


A Study Of Data Editing In Other Countries And A Multivariate Outlier Detection Method Based On Normal Mixture Models (Masayoshi Takahashi; Selective Editing), Masayoshi Takahashi Aug 2012

Masayoshi Takahashi

No abstract provided.


Big Data And The Future, Sherri Rose Jul 2012

Sherri Rose

No abstract provided.


A Prior-Free Framework Of Coherent Inference And Its Derivation Of Simple Shrinkage Estimators, David R. Bickel Jun 2012

COBRA Preprint Series

The reasoning behind uses of confidence intervals and p-values in scientific practice may be made coherent by modeling the inferring statistician or scientist as an idealized intelligent agent. With other things equal, such an agent regards a hypothesis coinciding with a confidence interval of a higher confidence level as more certain than a hypothesis coinciding with a confidence interval of a lower confidence level. The agent uses different confidence interval methods depending on what information is available. The coherence requirement means all levels of certainty of hypotheses about the parameter agree with the same distribution of certainty over parameter …


Methods For Shape-Constrained Kernel Density Estimation, Mark A. Wolters Jun 2012

Electronic Thesis and Dissertation Repository

Nonparametric density estimators are used to estimate an unknown probability density while making minimal assumptions about its functional form. Although the low reliance of nonparametric estimators on modelling assumptions is a benefit, their performance will be improved if auxiliary information about the density's shape is incorporated into the estimate. Auxiliary information can take the form of shape constraints, such as unimodality or symmetry, that the estimate must satisfy. Finding the constrained estimate is usually a difficult optimization problem, however, and a consistent framework for finding estimates across a variety of problems is lacking.

It is proposed to find shape-constrained density …
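A much simpler device than the dissertation's optimization framework, shown only to make the idea of a shape constraint concrete: symmetry about a chosen center can be imposed by pooling the data with its reflection before applying an ordinary kernel estimate.

```r
# Symmetry-constrained KDE by data reflection about a center c:
# the pooled sample {x, 2c - x} is exactly symmetric about c.
sym.kde <- function(x, center, ...) density(c(x, 2 * center - x), ...)

set.seed(5)
x <- rexp(200)                                        # clearly asymmetric data
plot(sym.kde(x, center = median(x)), main = "Symmetry-constrained KDE")
```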


Targeted Maximum Likelihood Estimation For Dynamic Treatment Regimes In Sequential Randomized Controlled Trials, Paul Chaffee, Mark J. Van Der Laan Jun 2012

Paul H. Chaffee

Sequential Randomized Controlled Trials (SRCTs) are rapidly becoming essential tools in the search for optimized treatment regimes in ongoing treatment settings. Analyzing data for multiple time-point treatments with a view toward optimal treatment regimes is of interest in many types of afflictions: HIV infection, Attention Deficit Hyperactivity Disorder in children, leukemia, prostate cancer, renal failure, and many others. Methods for analyzing data from SRCTs exist, but they are either inefficient or suffer from the drawbacks of estimating equation methodology. We describe an estimation procedure, targeted maximum likelihood estimation (TMLE), which has been fully developed and implemented in point treatment settings, …


Investigation Of Trends And Predictive Effectiveness Of Crash Severity Models, James E. Mooradian Jun 2012

Master's Theses

This thesis describes analysis using ordinal logistic regression to uncover temporal patterns in the severity level (fatal, serious injury, minor injury, slight injury or no injury) for persons involved in highway crashes in Connecticut, focusing on the demographic split between senior travelers (65 years and over) and non-senior travelers. Existing state sources provide data describing the time and weather conditions for each crash and the vehicles and persons involved over the time period from 1995 to 2009 as well as the traffic volumes and the characteristics of the roads on which these crashes occurred. Findings indicate an overall increase in …
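The core model is a proportional-odds (ordinal logistic) regression, which in R can be fit with MASS::polr. The sketch below uses simulated data with hypothetical variables, not the thesis's Connecticut crash data.

```r
library(MASS)
set.seed(3)
n <- 500
d <- data.frame(senior = rbinom(n, 1, 0.25),          # hypothetical: driver 65+
                speed  = rnorm(n, 50, 10))            # hypothetical covariate
eta <- with(d, 0.6 * senior + 0.03 * speed) + rlogis(n)
d$severity <- cut(eta, c(-Inf, 1.5, 3, 4.5, Inf),
                  labels = c("no injury", "slight", "minor", "serious"),
                  ordered_result = TRUE)
fit <- polr(severity ~ senior + speed, data = d, Hess = TRUE)
summary(fit)   # positive coefficients shift probability toward higher severity
```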


A Logistic L-Moment-Based Analog For The Tukey G-H, G, H, And H-H System Of Distributions, Todd C. Headrick, Mohan D. Pant Jun 2012

Mohan Dev Pant

This paper introduces a standard logistic L-moment-based system of distributions. The proposed system is an analog to the standard normal conventional moment-based Tukey g-h, g, h, and h-h system of distributions. The system also consists of four classes of distributions and is referred to as (i) asymmetric γ-κ, (ii) log-logistic γ, (iii) symmetric κ, and (iv) asymmetric κ_L-κ_R. The system can be used in a variety of settings such as simulation or modeling events—most notably when heavy-tailed distributions are of interest. A procedure is also described for simulating γ-κ, γ, κ, and κ_L-κ_R distributions with specified L-moments and L-correlations. The …


A Method For Simulating Nonnormal Distributions With Specified L-Skew, L-Kurtosis, And L-Correlation, Todd C. Headrick, Mohan D. Pant May 2012

Mohan Dev Pant

This paper introduces two families of distributions referred to as the symmetric κ and asymmetric κ_L-κ_R distributions. The families are based on transformations of standard logistic pseudo-random deviates. The primary focus of the theoretical development is in the contexts of L-moments and the L-correlation. Also included is the development of a method for specifying distributions with controlled degrees of L-skew, L-kurtosis, and L-correlation. The method can be applied in a variety of settings such as Monte Carlo studies, simulation, or modeling events. It is also demonstrated that estimates of L-skew, L-kurtosis, and L-correlation are superior to conventional product-moment estimates of …


Simulating Non-Normal Distributions With Specified L-Moments And L-Correlations, Todd C. Headrick, Mohan D. Pant May 2012

Mohan Dev Pant

This paper derives a procedure for simulating continuous non-normal distributions with specified L-moments and L-correlations in the context of power method polynomials of order three. It is demonstrated that the proposed procedure has computational advantages over the traditional product-moment procedure in terms of solving for intermediate correlations. Simulation results also demonstrate that the proposed L-moment-based procedure is an attractive alternative to the traditional procedure when distributions with more severe departures from normality are considered. Specifically, estimates of L-skew and L-kurtosis are superior to the conventional estimates of skew and kurtosis in terms of both relative bias and relative standard error. …


Confidence Intervals For The Selected Population In Randomized Trials That Adapt The Population Enrolled, Michael Rosenblum May 2012

Johns Hopkins University, Dept. of Biostatistics Working Papers

It is a challenge to design randomized trials when it is suspected that a treatment may benefit only certain subsets of the target population. In such situations, trial designs have been proposed that modify the population enrolled based on an interim analysis, in a preplanned manner. For example, if there is early evidence that the treatment only benefits a certain subset of the population, enrollment may then be restricted to this subset. At the end of such a trial, it is desirable to draw inferences about the selected population. We focus on constructing confidence intervals for the average treatment effect …


Using The R Library Rpanel For Gui-Based Simulations In Introductory Statistics Courses, Ryan M. Allison May 2012

Statistics

As a student, I noticed that the statistical package R (http://www.r-project.org) would offer several benefits in the classroom. One benefit is that R is free and open source, so unlike commercial statistical packages it costs instructors and students nothing to use. Students could therefore continue using the program after their statistics courses and into their professional careers, and it is valuable to expose students while they are in school to a tool that professionals use in industry. R also has powerful …
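As a hedged illustration of the kind of GUI-based simulation described here (assuming only the rpanel basics rp.control and rp.slider; the demo itself is mine, not the paper's), a slider can drive a central limit theorem demonstration:

```r
library(rpanel)   # GUI controls built on tcltk

# Redraw a histogram of sample means whenever the slider moves.
clt.draw <- function(panel) {
  means <- replicate(500, mean(rexp(panel$n)))
  hist(means, breaks = 30, xlab = "sample mean",
       main = paste("Means of Exp(1) samples, n =", panel$n))
  panel   # rpanel action functions must return the panel
}
panel <- rp.control(title = "CLT demo", n = 5)
rp.slider(panel, n, from = 2, to = 200, action = clt.draw, title = "sample size")
```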


Differential Patterns Of Interaction And Gaussian Graphical Models, Masanao Yajima, Donatello Telesca, Yuan Ji, Peter Muller Apr 2012

COBRA Preprint Series

We propose a methodological framework to assess heterogeneous patterns of association amongst components of a random vector expressed as a Gaussian directed acyclic graph. The proposed framework is likely to be useful when primary interest focuses on potential contrasts characterizing the association structure between known subgroups of a given sample. We provide inferential frameworks as well as an efficient computational algorithm to fit such a model and illustrate its validity through a simulation. We apply the model to Reverse Phase Protein Array data on Acute Myeloid Leukemia patients to show the contrast of association structure between refractory patients and relapsed …


Variances For Maximum Penalized Likelihood Estimates Obtained Via The Em Algorithm, Mark Segal, Peter Bacchetti, Nicholas Jewell Apr 2012

Mark R Segal

We address the problem of providing variances for parameter estimates obtained under a penalized likelihood formulation through use of the EM algorithm. The proposed solution represents a synthesis of two existent techniques. Firstly, we exploit the supplemented EM algorithm developed in Meng and Rubin (1991) that provides variance estimates for maximum likelihood estimates obtained via the EM algorithm. Their procedure relies on evaluating the Jacobian of the mapping induced by the EM algorithm. Secondly, we utilize a result from Green (1990) that provides an expression for the Jacobian of the mapping induced by the EM algorithm applied to a penalized …


Backcalculation Of Hiv Infection Rates, Peter Bacchetti, Mark Segal, Nicholas Jewell Apr 2012

Mark R Segal

Backcalculation is an important method for reconstructing past rates of human immunodeficiency virus (HIV) infection and for estimating current prevalence of HIV infection and future incidence of acquired immunodeficiency syndrome (AIDS). This paper reviews backcalculation techniques, focusing on the key assumptions of the method, including the necessary information regarding incubation, reporting delay, and models for the infection curve. A summary is given of the extent to which the appropriate external information is available and whether checks of the relevant assumptions are possible through use of data on AIDS incidence from surveillance systems. A likelihood approach to backcalculation is described …


Loss Function Based Ranking In Two-Stage, Hierarchical Models, Rongheng Lin, Thomas A. Louis, Susan M. Paddock, Greg Ridgeway Mar 2012

Rongheng Lin

Several authors have studied the performance of optimal, squared error loss (SEL) estimated ranks. Though these are effective, in many applications interest focuses on identifying the relatively good (e.g., in the upper 10%) or relatively poor performers. We construct loss functions that address this goal and evaluate candidate rank estimates, some of which optimize specific loss functions. We study performance for a fully parametric hierarchical model with a Gaussian prior and Gaussian sampling distributions, evaluating performance for several loss functions. Results show that though SEL-optimal ranks and percentiles do not specifically focus on classifying with respect to a percentile cut …


On The Order Statistics Of Standard Normal-Based Power Method Distributions, Todd C. Headrick, Mohan D. Pant Mar 2012

Mohan Dev Pant

This paper derives a procedure for determining the expectations of order statistics associated with the standard normal distribution (Z) and its powers of order three and five (Z^3 and Z^5). The procedure is demonstrated for sample sizes of n ≤ 9. It is shown that Z^3 and Z^5 have expectations of order statistics that are functions of the expectations for Z and can be expressed in terms of explicit elementary functions for sample sizes of n ≤ 5. For sample sizes of n = 6, 7 the expectations of the order statistics for Z, Z^3, and Z^5 only require a …
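These expectations are easy to check by simulation; the sketch below approximates the expected order statistics of Z, Z^3, and Z^5 for n = 5 by Monte Carlo (my own check, not the paper's analytic derivation).

```r
set.seed(9)
n <- 5; reps <- 100000
Z <- matrix(rnorm(n * reps), nrow = reps)
exp.os <- function(M) colMeans(t(apply(M, 1, sort)))  # E[X_(1)], ..., E[X_(n)]
round(rbind(Z = exp.os(Z), Z3 = exp.os(Z^3), Z5 = exp.os(Z^5)), 3)
```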


Avoiding Boundary Estimates In Linear Mixed Models Through Weakly Informative Priors, Yeojin Chung, Sophia Rabe-Hesketh, Andrew Gelman, Jingchen Liu, Vincent Dorie Feb 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

Variance parameters in mixed or multilevel models can be difficult to estimate, especially when the number of groups is small. We propose a maximum penalized likelihood approach which is equivalent to estimating variance parameters by their marginal posterior mode, given a weakly informative prior distribution. By choosing the prior from the gamma family with at least 1 degree of freedom, we ensure that the prior density is zero at the boundary and thus the marginal posterior mode of the group-level variance will be positive. The use of a weakly informative prior allows us to stabilize our estimates while remaining faithful …
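A toy version of the idea (my own illustration on a simple normal-means model, not the authors' software): a gamma prior with shape 2 on the group-level standard deviation has density zero at the boundary, so the penalized objective keeps the mode away from zero even when maximum likelihood would collapse to it.

```r
set.seed(1)
J <- 8; se <- 1
m <- rnorm(J, 0, sqrt(0.2^2 + se^2))   # group means; small true sd often drives ML to 0

# Penalized negative log-likelihood for the group-level sd tau, with mu
# profiled out at mean(m) and a Gamma(shape = 2) log-prior as the penalty.
negpen <- function(tau) {
  -sum(dnorm(m, mean(m), sqrt(tau^2 + se^2), log = TRUE)) -
    dgamma(tau, shape = 2, rate = 0.5, log = TRUE)
}
optimize(negpen, interval = c(1e-8, 10))$minimum   # positive, off the boundary
```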


A Doubling Method For The Generalized Lambda Distribution, Todd C. Headrick, Mohan D. Pant Feb 2012

Mohan Dev Pant

This paper introduces a new family of generalized lambda distributions (GLDs) based on a method of doubling symmetric GLDs. The focus of the development is in the context of L-moments and L-correlation theory. As such, included is the development of a procedure for specifying double GLDs with controlled degrees of L-skew, L-kurtosis, and L-correlations. The procedure can be applied in a variety of settings such as modeling events and Monte Carlo or simulation studies. Further, it is demonstrated that estimates of L-skew, L-kurtosis, and L-correlation are substantially superior to conventional product-moment estimates of skew, kurtosis, and Pearson correlation in terms …


Targeted Maximum Likelihood Estimation Of Natural Direct Effects, Wenjing Zheng, Mark Van Der Laan Jan 2012

Wenjing Zheng

In many causal inference problems, one is interested in the direct causal effect of an exposure on an outcome of interest that is not mediated by certain intermediate variables. Robins and Greenland (1992) and Pearl (2001) formalized the definition of two types of direct effects (natural and controlled) under the counterfactual framework. The efficient scores (under a nonparametric model) for the various natural effect parameters and their general robustness conditions, as well as an estimating equation based estimator using the efficient score, are provided in Tchetgen Tchetgen and Shpitser (2011b). In this article, we apply the targeted maximum likelihood framework …


Characterizing Tukey H And Hh-Distributions Through L-Moments And The L-Correlation, Todd C. Headrick, Mohan D. Pant Jan 2012

Mohan Dev Pant

This paper introduces the Tukey family of symmetric h and asymmetric hh-distributions in the contexts of univariate L-moments and the L-correlation. Included is the development of a procedure for specifying nonnormal distributions with controlled degrees of L-skew, L-kurtosis, and L-correlations. The procedure can be applied in a variety of settings such as modeling events (e.g., risk analysis, extreme events) and Monte Carlo or simulation studies. Further, it is demonstrated that estimates of L-skew, L-kurtosis, and L-correlation are substantially superior to conventional product-moment estimates of skew, kurtosis, and Pearson correlation in terms of both relative bias and efficiency when heavy-tailed distributions …


Statistical Methods For Proteomic Biomarker Discovery Based On Feature Extraction Or Functional Modeling Approaches, Jeffrey S. Morris Jan 2012

Jeffrey S. Morris

In recent years, developments in molecular biotechnology have led to the increased promise of detecting and validating biomarkers, or molecular markers that relate to various biological or medical outcomes. Proteomics, the direct study of proteins in biological samples, plays an important role in the biomarker discovery process. These technologies produce complex, high dimensional functional and image data that present many analytical challenges that must be addressed properly for effective comparative proteomics studies that can yield potential biomarkers. Specific challenges include experimental design, preprocessing, feature extraction, and statistical analysis accounting for the inherent multiple testing issues. This paper reviews various computational …


Integrative Bayesian Analysis Of High-Dimensional Multi-Platform Genomics Data, Wenting Wang, Veerabhadran Baladandayuthapani, Jeffrey S. Morris, Bradley M. Broom, Ganiraju C. Manyam, Kim-Anh Do Jan 2012

Jeffrey S. Morris

Motivation: Analyzing data from multi-platform genomics experiments combined with patients’ clinical outcomes helps us understand the complex biological processes that characterize a disease, as well as how these processes relate to the development of the disease. Current integration approaches that treat the data independently are limited in that they do not consider the fundamental biological relationships that exist among the data from different platforms.

Statistical Model: We propose an integrative Bayesian analysis of genomics data (iBAG) framework for identifying important genes/biomarkers that are associated with clinical outcome. This framework uses a hierarchical modeling technique to combine the data obtained from multiple platforms …


R Code: A Non-Iterative Implementation Of Tango's Score Confidence Interval For A Paired Difference Of Proportions, Zhao Yang Jan 2012

Zhao (Tony) Yang, Ph.D.

For matched-pair binary data, a variety of approaches have been proposed for the construction of a confidence interval (CI) for the difference of marginal probabilities between two procedures. The score-based approximate CI has been shown to outperform other asymptotic CIs. Tango’s method provides a score CI by inverting a score test statistic using an iterative procedure. In the developed R code, we propose an efficient non-iterative method with a closed-form expression to calculate Tango’s CIs. Examples illustrate the practical application of the new approach.
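The closed-form expression itself is in the article and is not reproduced here; as a hedged stand-in, the sketch below numerically inverts a likelihood-ratio test for the paired difference, a related (but not identical) asymptotic interval for the same parameter.

```r
# CI for delta = p1 - p2 = q10 - q01 from matched-pair counts
# (n11, n10, n01, n00), by inverting the multinomial likelihood-ratio test.
paired.lr.ci <- function(n11, n10, n01, n00, level = 0.95) {
  counts <- c(n11, n10, n01, n00); n <- sum(counts)
  ll <- function(q) sum(ifelse(counts > 0, counts * log(q), 0))
  prof <- function(delta) {                 # profile log-likelihood at fixed delta
    q01s <- (max(0, -delta) + (1 - delta) / 2) / 2   # feasible starting values
    q11s <- (1 - 2 * q01s - delta) / 2
    obj <- function(par) {                  # par = (q11, q01); q10 = q01 + delta
      q <- c(par[1], par[2] + delta, par[2], 1 - par[1] - 2 * par[2] - delta)
      if (min(q) <= 0) return(1e10)         # outside the probability simplex
      -ll(q)
    }
    -optim(c(q11s, q01s), obj)$value
  }
  lhat <- ll(counts / n)                    # unconstrained maximum
  dhat <- (n10 - n01) / n
  f <- function(d) 2 * (lhat - prof(d)) - qchisq(level, 1)
  c(lower = uniroot(f, c(-1 + 1e-6, dhat))$root,
    upper = uniroot(f, c(dhat, 1 - 1e-6))$root)
}
paired.lr.ci(43, 12, 5, 40)                 # hypothetical example counts
```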