Open Access. Powered by Scholars. Published by Universities.®

Statistical Methodology Commons


947 Full-Text Articles · 1,222 Authors · 207,691 Downloads · 59 Institutions

All Articles in Statistical Methodology


947 full-text articles. Page 1 of 26.

Mechanistic Mathematical Models: An Underused Platform For Hpv Research, Marc Ryser, Patti Gravitt, Evan R. Myers 2017 George Washington University

Global Health Faculty Publications

Health economic modeling has become an invaluable methodology for the design and evaluation of clinical and public health interventions against the human papillomavirus (HPV) and associated diseases. At the same time, relatively little attention has been paid to a different yet complementary class of models, namely that of mechanistic mathematical models. The primary focus of mechanistic mathematical models is to better understand the intricate biologic mechanisms and dynamics of disease. Inspired by a long and successful history of mechanistic modeling in other biomedical fields, we highlight several areas of HPV research where mechanistic models have the potential to advance the ...
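Purely as an illustration of what a mechanistic model looks like in code (none of this is from the article; the model form, parameter values, and function name are invented), here is a minimal SIS-type compartmental model integrated with Euler steps:

```python
# Minimal mechanistic (SIS-type) transmission model, integrated with Euler steps.
# All parameter values (beta, gamma, step size) are illustrative, not from the article.

def simulate_sis(beta, gamma, i0=0.01, t_max=100.0, dt=0.1):
    """Integrate dI/dt = beta*I*(1-I) - gamma*I; return infected fraction over time."""
    i, trajectory = i0, [i0]
    for _ in range(int(t_max / dt)):
        di = beta * i * (1.0 - i) - gamma * i
        i = max(0.0, min(1.0, i + dt * di))   # keep the fraction in [0, 1]
        trajectory.append(i)
    return trajectory

traj = simulate_sis(beta=0.5, gamma=0.2)
```

With transmission rate beta above recovery rate gamma, the trajectory settles near the endemic equilibrium 1 − gamma/beta (here 0.6), which is the kind of dynamic insight these models are used for.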


Gilmore Girls And Instagram: A Statistical Look At The Popularity Of The Television Show Through The Lens Of An Instagram Page, Brittany Simmons 2017 Chapman University

Student Research Day Abstracts and Posters

After going on the Warner Brothers Tour in December of 2015, I created a Gilmore Girls Instagram account. This account, which started off as a way for me to create edits of the show and post my photos from the tour, turned into something bigger than I ever could have imagined. In just over a year I have over 55,000 followers. I post content including revival news, merchandise, and edits of the show that have been featured in Entertainment Weekly, Bustle, E! News, People Magazine, Yahoo News, & GilmoreNews.

I created a dataset of qualitative and quantitative outcomes from my ...


Comparison Of Survival Curves Between Cox Proportional Hazards, Random Forests, And Conditional Inference Forests In Survival Analysis, Brandon Weathers, Richard Cutler Dr. 2017 Utah State University

All Graduate Plan B and other Reports

Survival analysis methods are a mainstay of the biomedical fields but are finding increasing use in other disciplines, including finance and engineering. A widely used tool in survival analysis is the Cox proportional hazards regression model. For this model, all the predicted survivor curves have the same basic shape, which may not be a good approximation to reality. In contrast, Random Survival Forests do not make the proportional hazards assumption and have the flexibility to model survivor curves of quite different shapes for different groups of subjects. We applied both techniques to a number of publicly available ...
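As a reminder of what a survivor curve is, here is a from-scratch Kaplan-Meier estimator on fabricated right-censored data; a generic sketch, not code from the report:

```python
# Pure-Python Kaplan-Meier estimator of a survivor curve. The data below are
# made up for demonstration (events: 1 = event observed, 0 = right-censored).

def kaplan_meier(times, events):
    """Return a list of (event time, survival probability) steps."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, curve = 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data[i:] if tt == t and e == 1)
        ties = sum(1 for tt, e in data[i:] if tt == t)
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk   # product-limit update
            curve.append((t, surv))
        n_at_risk -= ties
        i += ties
    return curve

km = kaplan_meier([2, 3, 3, 5, 7, 8], [1, 1, 0, 1, 0, 1])
```

Both methods compared in the report ultimately produce curves of this kind; the difference lies in how the curves are allowed to vary across groups of subjects.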


Denoising Tandem Mass Spectrometry Data, Felix Offei 2017 East Tennessee State University

Electronic Theses and Dissertations

Protein identification using tandem mass spectrometry (MS/MS) has proven to be an effective way to identify proteins in a biological sample. An observed spectrum is constructed from the data produced by the tandem mass spectrometer. A protein can be identified if the observed spectrum aligns with the theoretical spectrum. However, data generated by the tandem mass spectrometer are affected by errors thus making protein identification challenging in the field of proteomics. Some of these errors include wrong calibration of the instrument, instrument distortion and noise. In this thesis, we present a pre-processing method, which focuses on the removal of ...
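As a generic illustration of spectrum pre-processing (this is not the thesis's method; the peak list and the median-based threshold rule are invented for the example), one might filter a peak list by intensity:

```python
# Minimal intensity-threshold denoiser for a peak list: keep peaks whose intensity
# exceeds a multiple of the median intensity. Generic illustration only; the thesis
# develops its own pre-processing method. Peaks are fabricated (m/z, intensity) pairs.
from statistics import median

def denoise(peaks, factor=2.0):
    cutoff = factor * median(inten for _, inten in peaks)
    return [(mz, inten) for mz, inten in peaks if inten > cutoff]

spectrum = [(100.1, 5.0), (150.2, 80.0), (175.3, 4.0), (210.4, 60.0),
            (250.5, 3.0), (300.6, 95.0), (310.7, 6.0)]
clean = denoise(spectrum)
```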


Error Costs, Legal Standards Of Proof And Statistical Significance, Michelle Burtis, Jonah B. Gelbach, Bruce H. Kobayashi 2017 Charles River Associates (CRA) International

Faculty Scholarship

The relationship between legal standards of proof and thresholds of statistical significance is a well-known and studied phenomenon in the academic literature. Moreover, the distinction between the two has been recognized in law. For example, in Matrixx v. Siracusano, the Court unanimously rejected the petitioner’s argument that the issue of materiality in a securities class action can be defined by the presence or absence of a statistically significant effect. However, in other contexts, thresholds based on fixed significance levels imported from academic settings continue to be used as a legal standard of proof. Our positive analysis demonstrates how a ...


Estimating Autoantibody Signatures To Detect Autoimmune Disease Patient Subsets, Zhenke Wu, Livia Casciola-Rosen, Ami A. Shah, Antony Rosen, Scott L. Zeger 2017 Department of Biostatistics and Michigan Institute of Data Science, University of Michigan

Johns Hopkins University, Dept. of Biostatistics Working Papers

Autoimmune diseases are characterized by highly specific immune responses against molecules in self-tissues. Different autoimmune diseases are characterized by distinct immune responses, making autoantibodies useful for diagnosis and prediction. In many diseases, the targets of autoantibodies are incompletely defined. Although the technologies for autoantibody discovery have advanced dramatically over the past decade, each of these techniques generates hundreds of possibilities, which are onerous and expensive to validate. We set out to establish a method to greatly simplify autoantibody discovery, using a pre-filtering step to define subgroups with similar specificities based on migration of labeled, immunoprecipitated proteins on sodium dodecyl sulfate ...


Session D-5: Informal Comparative Inference: What Is It?, Karen Togliatti 2017 Illinois Mathematics and Science Academy

Professional Learning Day

Come and experience a hands-on task that has middle-school students grapple with informal inferential reasoning. Three key principles of informal inference – data as evidence, probabilistic language, and generalizing ‘beyond the data’ – will be discussed as students build and analyze distributions to answer the question, “Does hand dominance play a role in throwing accuracy?” Connections to the CCSSM statistics standards for middle school will be highlighted.


Evaluation Of Progress Towards The Unaids 90-90-90 Hiv Care Cascade: A Description Of Statistical Methods Used In An Interim Analysis Of The Intervention Communities In The Search Study, Laura Balzer, Joshua Schwab, Mark J. van der Laan, Maya L. Petersen 2017 Department of Biostatistics, Harvard T.H. Chan School of Public Health

U.C. Berkeley Division of Biostatistics Working Paper Series

WHO guidelines call for universal antiretroviral treatment, and UNAIDS has set a global target to virally suppress most HIV-positive individuals. Accurate estimates of population-level coverage at each step of the HIV care cascade (testing, treatment, and viral suppression) are needed to assess the effectiveness of "test and treat" strategies implemented to achieve this goal. The data available to inform such estimates, however, are susceptible to informative missingness: the number of HIV-positive individuals in a population is unknown; individuals tested for HIV may not be representative of those whom a testing intervention fails to reach, and HIV-positive individuals with a viral ...


Interweaving Markov Chain Monte Carlo Strategies For Efficient Estimation Of Dynamic Linear Models, Matthew Simpson, Jarad Niemi, Vivekananda Roy 2017 University of Missouri

Statistics Publications

In dynamic linear models (DLMs) with unknown fixed parameters, a standard Markov chain Monte Carlo (MCMC) sampling strategy is to alternate sampling of latent states conditional on fixed parameters and sampling of fixed parameters conditional on latent states. In some regions of the parameter space, this standard data augmentation (DA) algorithm can be inefficient. To improve efficiency, we apply the interweaving strategies of Yu and Meng to DLMs. For this, we introduce three novel alternative DAs for DLMs: the scaled errors, wrongly scaled errors, and wrongly scaled disturbances. With the latent states and the less well known scaled disturbances, this ...
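A minimal sketch of the standard DA sampler the abstract refers to, for a toy local-level DLM with fabricated data and illustrative inverse-gamma priors. This is not the authors' sampler, and it omits the interweaving step that is the paper's contribution:

```python
import math, random

# Standard data-augmentation Gibbs for a local-level DLM:
#   y_t = mu_t + v_t,  v_t ~ N(0, V);   mu_t = mu_{t-1} + w_t,  w_t ~ N(0, W).
# Alternates (i) single-site draws of latent states given variances and
# (ii) conjugate draws of V, W given states. Data, priors, and iteration
# count are all illustrative.
random.seed(1)

def sample_inv_gamma(shape, rate):
    return rate / random.gammavariate(shape, 1.0)

y = [0.3, 0.5, 0.1, 0.9, 1.2, 1.0, 1.4, 1.3]   # toy series
T = len(y)
mu = y[:]                                       # initialize states at observations
V, W = 1.0, 1.0

for _ in range(200):
    # (i) states: the full conditional of each mu_t is normal (flat prior on mu_0)
    for t in range(T):
        prec, mean_num = 1.0 / V, y[t] / V
        if t > 0:
            prec += 1.0 / W; mean_num += mu[t - 1] / W
        if t < T - 1:
            prec += 1.0 / W; mean_num += mu[t + 1] / W
        mu[t] = random.gauss(mean_num / prec, math.sqrt(1.0 / prec))
    # (ii) variances: inverse-gamma updates with IG(2, 1) priors
    sse_v = sum((y[t] - mu[t]) ** 2 for t in range(T))
    sse_w = sum((mu[t] - mu[t - 1]) ** 2 for t in range(1, T))
    V = sample_inv_gamma(2.0 + T / 2.0, 1.0 + sse_v / 2.0)
    W = sample_inv_gamma(2.0 + (T - 1) / 2.0, 1.0 + sse_w / 2.0)
```

Interweaving alternates this update with an equivalent one expressed in a different augmentation (e.g. the disturbances), which is what rescues efficiency in the regions of the parameter space where this plain scheme mixes poorly.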


The Logic And Limits Of Event Studies In Securities Fraud Litigation, Jill E. Fisch, Jonah B. Gelbach, Jonathan Klick 2017 University of Pennsylvania Law School

Faculty Scholarship

Event studies have become increasingly important in securities fraud litigation after the Supreme Court’s decision in Halliburton II. Litigants have used event study methodology, which empirically analyzes the relationship between the disclosure of corporate information and the issuer’s stock price, to provide evidence in the evaluation of key elements of federal securities fraud, including materiality, reliance, causation, and damages. As the use of event studies grows and they increasingly serve a gatekeeping function in determining whether litigation will proceed beyond a preliminary stage, it will be critical for courts to use them correctly.
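The core computation of an event study can be sketched in a few lines: fit a market model by least squares over an estimation window, then take the event-day abnormal return as actual minus predicted. All returns below are fabricated for illustration:

```python
# Toy market-model event study: OLS fit of stock returns on market returns over an
# estimation window, then abnormal return = actual - predicted on the event day.
# Every number here is fabricated.

def ols_line(x, y):
    """Closed-form simple-regression fit; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
            sum((xi - mx) ** 2 for xi in x)
    return my - slope * mx, slope

market = [0.01, -0.02, 0.015, 0.005, -0.01, 0.02, 0.0, 0.012]
stock  = [0.012, -0.018, 0.02, 0.004, -0.008, 0.025, 0.001, 0.015]

alpha, beta = ols_line(market, stock)
event_market, event_stock = -0.005, -0.06      # event-day returns
abnormal = event_stock - (alpha + beta * event_market)
```

A large negative abnormal return like this one is the kind of evidence litigants offer on materiality and loss causation; the Article's concern is how courts should weigh it.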

This Article explores an array ...


Calculating Power By Bootstrap, With An Application To Cluster-Randomized Trials, Ken Kleinman, Susan S. Huang 2017 University of Massachusetts Amherst, School of Public Health and Health Sciences

eGEMs (Generating Evidence & Methods to improve patient outcomes)

Background: A key requirement for a useful power calculation is that the calculation mimic the data analysis that will be performed on the actual data, once it is observed. Close approximations may be difficult to achieve using analytic solutions, however, and thus Monte Carlo approaches, including both simulation and bootstrap resampling, are often attractive. One setting in which this is particularly true is cluster-randomized trial designs. However, Monte Carlo approaches are useful in many additional settings as well. Calculating power for cluster-randomized trials using analytic or simulation-based methods is frequently unsatisfactory due to the complexity of the data analysis methods ...
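The simulation half of the Monte Carlo approach can be illustrated with the simplest possible example, power for a two-sample z-test. The paper's cluster-randomized setting with bootstrap resampling is more involved; effect size, n, and alpha here are arbitrary:

```python
import math, random

# Simulation-based power: repeatedly generate data under the alternative, run the
# planned test, and report the fraction of rejections. Parameters are illustrative.
random.seed(7)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simulated_power(effect, n, alpha=0.05, n_sims=2000):
    """Fraction of simulated two-sample trials in which the z-test rejects H0."""
    rejections = 0
    for _ in range(n_sims):
        a = [random.gauss(0.0, 1.0) for _ in range(n)]
        b = [random.gauss(effect, 1.0) for _ in range(n)]
        z = (sum(b) / n - sum(a) / n) / math.sqrt(2.0 / n)
        p = 2.0 * (1.0 - normal_cdf(abs(z)))
        rejections += p < alpha
    return rejections / n_sims

power = simulated_power(effect=0.5, n=64)
```

The key property the Background paragraph demands holds by construction: the simulated analysis is exactly the analysis that will be run on the real data.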


It's All About Balance: Propensity Score Matching In The Context Of Complex Survey Data, David Lenis, Trang Q. Nguyen, Nian Dong, Elizabeth A. Stuart 2017 Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health

Johns Hopkins University, Dept. of Biostatistics Working Papers

Many research studies aim to draw causal inferences using data from large, nationally representative survey samples, and many of these studies use propensity score matching to make those causal inferences as rigorous as possible given the non-experimental nature of the data. However, very few applied studies are careful about incorporating the survey design with the propensity score analysis, which may mean that the results don’t generate population inferences. This may be because few methodological studies examine how to best combine these methods. Furthermore, even fewer of the methodological studies incorporate different non-response mechanisms in their analysis. This study examines ...
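A toy sketch of the combination the abstract highlights: nearest-neighbor matching on propensity scores while carrying the survey weights into the outcome contrast. The scores are treated as precomputed, every number is fabricated, and weighting matched pairs by the treated unit's survey weight is one convention among several:

```python
# Nearest-neighbor propensity score matching that keeps the survey design weights
# in the outcome contrast. Tuples are (propensity score, outcome, survey weight);
# all values are fabricated and the scores are assumed precomputed.

treated = [(0.62, 10.0, 2.0), (0.35, 8.0, 1.0), (0.80, 12.0, 1.5)]
control = [(0.60, 9.0, 1.0), (0.30, 7.5, 2.5), (0.78, 10.5, 1.0), (0.10, 5.0, 3.0)]

def weighted_matched_effect(treated, control):
    """Match each treated unit to the nearest control score; average pair
    differences using the treated units' survey weights."""
    num = den = 0.0
    for ps_t, y_t, w_t in treated:
        ps_c, y_c, w_c = min(control, key=lambda c: abs(c[0] - ps_t))
        num += w_t * (y_t - y_c)
        den += w_t
    return num / den

effect = weighted_matched_effect(treated, control)
```

Dropping the weights from the final average is exactly the shortcut the abstract warns can break the link back to population-level inference.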


Time Series Copulas For Heteroskedastic Data, Michael S. Smith, Worapree Maneesoonthorn, Ruben Loaiza-Maya 2017 Melbourne Business School

Michael Stanley Smith

We propose parametric copulas that capture serial dependence in stationary heteroskedastic time series. We develop our copula for first order Markov series, and extend it to higher orders and multivariate series. We derive the copula of a volatility proxy, based on which we propose new measures of volatility dependence, including co-movement and spillover in multivariate series. In general, these depend upon the marginal distributions of the series. Using exchange rate returns, we show that the resulting copula models can capture their marginal distributions more accurately than univariate and multivariate GARCH models, and produce more accurate value at risk forecasts.
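A stripped-down cousin of the models proposed here: a first-order Markov series whose serial dependence comes from a Gaussian copula and whose marginal is exponential. Both rho and the marginal rate are arbitrary choices for illustration, not from the paper:

```python
import math, random
from statistics import NormalDist

# Simulate a stationary series with Gaussian-copula serial dependence (parameter
# rho) and an exponential marginal. rho and rate are illustrative choices.
random.seed(3)
phi = NormalDist()          # standard normal cdf

def simulate_copula_series(n, rho, rate=1.0):
    z = random.gauss(0.0, 1.0)
    series = []
    for _ in range(n):
        u = phi.cdf(z)                             # uniform via the copula
        series.append(-math.log(1.0 - u) / rate)   # exponential inverse cdf
        z = rho * z + math.sqrt(1.0 - rho ** 2) * random.gauss(0.0, 1.0)
    return series

x = simulate_copula_series(n=500, rho=0.8)
```

The appeal, as in the abstract, is the clean separation: the copula carries all the serial (and volatility) dependence, while the marginal can be fit to the data separately.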


Pointwise Influence Matrices For Functional-Response Regression, Philip T. Reiss, Lei Huang, Pei-Shien Wu, Huaihou Chen, Stan Colcombe 2016 New York University School of Medicine

Philip T. Reiss

We extend the notion of an influence or hat matrix to regression with functional responses and scalar predictors. For responses depending linearly on a set of predictors, our definition is shown to reduce to the conventional influence matrix for linear models. The pointwise degrees of freedom, the trace of the pointwise hat matrix, are shown to have an adaptivity property that motivates a two-step bivariate smoother for modeling nonlinear dependence on a single predictor. This procedure adapts to varying complexity of the nonlinear model at different locations along the function, and thereby achieves better performance than competing tensor product smoothers ...
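The scalar-response special case is easy to verify numerically: for linear regression the hat matrix is H = X(X'X)^{-1}X', and trace(H) equals the number of fitted parameters. A tiny pure-Python check with an arbitrary design matrix (hand-inverting the 2x2 matrix X'X):

```python
# Verify trace(H) = number of parameters for ordinary linear regression,
# H = X (X'X)^{-1} X'. The design matrix is arbitrary: intercept + one predictor.

X = [[1.0, 0.0],
     [1.0, 1.0],
     [1.0, 2.0],
     [1.0, 3.0]]

# X'X and its 2x2 inverse
xtx = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
inv = [[xtx[1][1] / det, -xtx[0][1] / det],
       [-xtx[1][0] / det, xtx[0][0] / det]]

# trace(H) = sum of leverages h_ii = x_i (X'X)^{-1} x_i'
trace_h = sum(
    sum(X[i][a] * inv[a][b] * X[i][b] for a in range(2) for b in range(2))
    for i in range(len(X))
)
```

The paper's pointwise degrees of freedom generalize exactly this trace to a response observed along a function.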


Penalized Nonparametric Scalar-On-Function Regression Via Principal Coordinates, Philip T. Reiss, David L. Miller, Pei-Shien Wu, Wen-Yu Hua 2016 New York University School of Medicine

Philip T. Reiss

A number of classical approaches to nonparametric regression have recently been extended to the case of functional predictors. This paper introduces a new method of this type, which extends intermediate-rank penalized smoothing to scalar-on-function regression. The core idea is to regress the response on leading principal coordinates defined by a relevant distance among the functional predictors, while applying a ridge penalty. Our publicly available implementation, based on generalized additive modeling software, allows for fast optimal tuning parameter selection and for extensions to multiple functional predictors, exponential family-valued responses, and mixed-effects models. In an application to signature verification data, the proposed ...
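In its simplest form the core idea reduces to ridge regression on precomputed coordinates, beta = (Z'Z + lambda*I)^{-1} Z'y. In this sketch the "principal coordinates" are just made-up columns rather than coordinates derived from distances between functional predictors, and the penalty value is arbitrary:

```python
# Closed-form ridge regression on two leading "principal coordinates", treated as
# precomputed. All numbers are fabricated; the 2x2 system is inverted by hand.

Z = [[1.0, 0.5], [0.8, -0.2], [-0.3, 0.9], [-1.5, -1.2]]   # coordinates
y = [2.1, 1.5, 0.2, -3.0]
lam = 0.1                                                  # ridge penalty

ztz = [[sum(r[i] * r[j] for r in Z) + (lam if i == j else 0.0) for j in range(2)]
       for i in range(2)]
zty = [sum(r[i] * yi for r, yi in zip(Z, y)) for i in range(2)]
det = ztz[0][0] * ztz[1][1] - ztz[0][1] * ztz[1][0]
beta = [(ztz[1][1] * zty[0] - ztz[0][1] * zty[1]) / det,
        (ztz[0][0] * zty[1] - ztz[1][0] * zty[0]) / det]
```

The paper's contribution lies in how the coordinates are built (from a distance among functional predictors) and in the machinery for tuning lambda; the ridge solve itself is the easy part.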


Improving Power In Group Sequential, Randomized Trials By Adjusting For Prognostic Baseline Variables And Short-Term Outcomes, Tianchen Qian, Michael Rosenblum, Huitong Qiu 2016 Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health

Johns Hopkins University, Dept. of Biostatistics Working Papers

In group sequential designs, adjusting for baseline variables and short-term outcomes can lead to increased power and reduced sample size. We derive formulas for the precision gain from such variable adjustment using semiparametric estimators for the average treatment effect, and give new results on what conditions lead to substantial power gains and sample size reductions. The formulas reveal how the impact of prognostic variables on the precision gain is modified by the number of pipeline participants, analysis timing, enrollment rate, and treatment effect heterogeneity, when the semiparametric estimator uses correctly specified models. Given set prognostic value of baseline variables and ...


Stochastic Optimization Of Adaptive Enrichment Designs For Two Subpopulations, Aaron Fisher, Michael Rosenblum 2016 Harvard T.H. Chan School of Public Health

Johns Hopkins University, Dept. of Biostatistics Working Papers

An adaptive enrichment design is a randomized trial that allows enrollment criteria to be modified at interim analyses, based on a preset decision rule. When there is prior uncertainty regarding treatment effect heterogeneity, these trial designs can provide improved power for detecting treatment effects in subpopulations. We present a simulated annealing approach to search over the space of decision rules and other parameters for an adaptive enrichment design. The goal is to minimize the expected number enrolled or expected duration, while preserving the appropriate power and Type I error rate. We also explore the benefits of parallel computation in the ...
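A generic simulated annealing loop over a small discrete space shows the search strategy in miniature. The cost function here is a toy stand-in; the real objective involves expected sample size and duration under power and Type I error constraints:

```python
import math, random

# Generic simulated annealing over a 1-D integer design space. The quadratic cost
# is a toy stand-in for the trial-design objective described in the abstract.
random.seed(11)

def cost(x):
    return (x - 17) ** 2          # toy objective with minimum at x = 17

def anneal(x0, n_iters=3000, temp0=10.0):
    x, best = x0, x0
    for k in range(n_iters):
        temp = temp0 / (1 + k)                 # cooling schedule
        cand = x + random.choice([-1, 1])      # neighboring design
        delta = cost(cand) - cost(x)
        # accept improvements always, worsenings with probability exp(-delta/temp)
        if delta <= 0 or random.random() < math.exp(-delta / max(temp, 1e-9)):
            x = cand
        if cost(x) < cost(best):
            best = x
    return best

best = anneal(x0=0)
```

Each cost evaluation in the real problem is itself a simulation of the trial, which is why the abstract goes on to discuss parallel computation.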


Monte Carlo Methods In Bayesian Inference: Theory, Methods And Applications, Huarui Zhang 2016 University of Arkansas, Fayetteville

Theses and Dissertations

Monte Carlo methods are becoming more and more popular in statistics due to the fast development of efficient computing technologies. One of the major beneficiaries of this advent is the field of Bayesian inference. The aim of this thesis is two-fold: (i) to explain the theory justifying the validity of the simulation-based schemes in a Bayesian setting (why they should work) and (ii) to apply them in several different types of data analysis that a statistician has to routinely encounter. In Chapter 1, I introduce key concepts in Bayesian statistics. Then we discuss Monte Carlo Simulation methods in detail. Our ...
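The "why they should work" argument is easiest to see in a conjugate example, where a plain Monte Carlo average of posterior draws can be checked against the closed-form posterior mean. Data and prior below are illustrative:

```python
import random

# Plain Monte Carlo for a conjugate Beta-Binomial posterior: sample from the
# posterior, average, and compare with the exact posterior mean. The data and
# the uniform Beta(1, 1) prior are illustrative.
random.seed(5)

successes, failures = 7, 3
a0, b0 = 1.0, 1.0
a, b = a0 + successes, b0 + failures          # Beta posterior parameters

draws = [random.betavariate(a, b) for _ in range(100_000)]
mc_mean = sum(draws) / len(draws)
exact_mean = a / (a + b)
```

By the law of large numbers the Monte Carlo average converges to the exact value; in non-conjugate problems, where no closed form exists, the same averaging is all one has.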


Rao-Lovric And The Triwizard Point Null Hypothesis Tournament, Shlomo Sawilowsky 2016 Wayne State University

Journal of Modern Applied Statistical Methods

The debate over whether the point null hypothesis is ever literally true cannot be resolved, because there are three competing statistical systems claiming ownership of the construct. The local resolution depends on personal acclimatization to a Fisherian, Frequentist, or Bayesian orientation (or an unexpected fourth champion if decision theory is allowed to compete). Implications of Rao and Lovric’s proposed Hodges-Lehmann paradigm are discussed in the Appendix.


Censoring Unbiased Regression Trees And Ensembles, Jon Arni Steingrimsson, Liqun Diao, Robert L. Strawderman 2016 Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health

Johns Hopkins University, Dept. of Biostatistics Working Papers

This paper proposes a novel approach to building regression trees and ensemble learning in survival analysis. By first extending the theory of censoring unbiased transformations, we construct observed data estimators of full data loss functions in cases where responses can be right censored. This theory is used to construct two specific classes of methods for building regression trees and regression ensembles that respectively make use of Buckley-James and doubly robust estimating equations for a given full data risk function. For the particular case of squared error loss, we further show how to implement these algorithms using existing software (e.g ...
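The simplest censoring unbiased transformation is inverse-probability-of-censoring weighting (IPCW): uncensored responses are reweighted by the probability of remaining uncensored. In this sketch the censoring distribution is assumed known (exponential with rate 0.1) and the data are fabricated; the paper builds the richer Buckley-James and doubly robust estimators on the same principle:

```python
import math

# IPCW estimate of a mean from right-censored data. Each observed (uncensored)
# response is weighted by 1/G(t), where G(t) = P(censoring time > t) is assumed
# known here (exponential censoring, rate 0.1). Data are fabricated.

def ipcw_mean(times, events, censor_rate=0.1):
    """Weighted mean of event times; events: 1 = observed, 0 = censored."""
    num = den = 0.0
    for t, e in zip(times, events):
        if e == 1:
            g = math.exp(-censor_rate * t)   # survivor function of censoring
            num += t / g
            den += 1.0 / g
    return num / den

est = ipcw_mean([1.2, 3.4, 0.7, 5.1, 2.2, 4.0], [1, 0, 1, 1, 1, 0])
```

In practice G is unknown and must be estimated (e.g. by Kaplan-Meier applied to the censoring times), which is where the unbiasedness theory the paper extends does its work.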


Digital Commons powered by bepress