Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Statistical Theory

2015

Articles 1 - 30 of 55

Full-Text Articles in Statistics and Probability

Inequality In Treatment Benefits: Can We Determine If A New Treatment Benefits The Many Or The Few?, Emily Huang, Ethan Fang, Daniel Hanley, Michael Rosenblum Dec 2015

Johns Hopkins University, Dept. of Biostatistics Working Papers

The primary analysis in many randomized controlled trials focuses on the average treatment effect and does not address whether treatment benefits are widespread or limited to a select few. This problem affects many disease areas, since it stems from how randomized trials, often the gold standard for evaluating treatments, are designed and analyzed. Our goal is to learn about the fraction who benefit from a treatment, based on randomized trial data. We consider the case where the outcome is ordinal, with binary outcomes as a special case. In general, the fraction who benefit is a non-identifiable parameter, and the best …
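For the binary special case mentioned above, one way to see why the fraction who benefit is not identifiable is to compute the range of values compatible with the two arm-level success rates. The sketch below uses the standard Fréchet-Hoeffding bounds on a joint probability given its marginals; it is an illustration under that assumption, not necessarily the estimator developed in the paper, and the function name `benefit_fraction_bounds` is hypothetical.

```python
from typing import Tuple


def benefit_fraction_bounds(p_treat: float, p_control: float) -> Tuple[float, float]:
    """Bounds on P(success under treatment AND failure under control).

    Only the marginal success probabilities are identified by a randomized
    trial, so the joint probability can lie anywhere in the Frechet-Hoeffding
    range computed here (binary outcome, illustrative sketch).
    """
    for p in (p_treat, p_control):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    lower = max(0.0, p_treat - p_control)   # P(Y1=1) + P(Y0=0) - 1, floored at 0
    upper = min(p_treat, 1.0 - p_control)   # cannot exceed either marginal
    return lower, upper


if __name__ == "__main__":
    print(benefit_fraction_bounds(0.6, 0.4))
```

With success rates of 0.6 and 0.4, the same average treatment effect of 0.2 is compatible with anywhere from about 20% to 60% of participants benefiting.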


Niche-Based Modeling Of Japanese Stiltgrass (Microstegium Vimineum) Using Presence-Only Information, Nathan Bush Nov 2015

Masters Theses

The Connecticut River watershed is experiencing a rapid invasion of aggressive non-native plant species, which threaten watershed function and structure. Volunteer-based monitoring programs such as the University of Massachusetts’ OutSmart Invasives Species Project, Early Detection Distribution Mapping System (EDDMapS) and the Invasive Plant Atlas of New England (IPANE) have gathered valuable invasive plant data. These programs provide a unique opportunity for researchers to model invasive plant species utilizing citizen-sourced data. This study took advantage of these large data sources to model invasive plant distribution and to determine environmental and biophysical predictors that are most influential in dispersion, and to identify …


Estimation Of Reliability In Multicomponent Stress-Strength Based On Generalized Rayleigh Distribution, Gadde Srinivasa Rao Nov 2015

Srinivasa Rao Gadde Dr.

A multicomponent system of k components is considered, with strengths that are independently and identically distributed random variables x1, x2, ..., xk and with each component experiencing a random stress Y. The system is regarded as alive only if at least s out of k (s < k) strengths exceed the stress. The reliability of such a system is obtained when the strength and stress variates follow generalized Rayleigh distributions with different shape parameters. Reliability is estimated using the maximum likelihood (ML) method in samples drawn from the strength and stress distributions; the reliability estimators are compared asymptotically. Monte-Carlo …
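The s-out-of-k reliability described above can be approximated by simulation. The sketch below is not the ML estimator from the paper; it assumes the generalized Rayleigh distribution has CDF F(x) = (1 - exp(-(λx)^2))^α with a common scale λ = 1, and the function names are hypothetical.

```python
import numpy as np


def sample_generalized_rayleigh(shape, scale, size, rng):
    """Inverse-CDF sampling, assuming F(x) = (1 - exp(-(scale * x)**2))**shape."""
    u = rng.random(size)
    return np.sqrt(-np.log1p(-u ** (1.0 / shape))) / scale


def mc_reliability(s, k, alpha, beta, n_sims=100_000, seed=0):
    """Monte Carlo estimate of P(at least s of k strengths exceed the stress).

    alpha: shape parameter of the strength distribution (assumed form above)
    beta:  shape parameter of the stress distribution
    """
    if not 1 <= s <= k:
        raise ValueError("require 1 <= s <= k")
    rng = np.random.default_rng(seed)
    strengths = sample_generalized_rayleigh(alpha, 1.0, (n_sims, k), rng)
    stress = sample_generalized_rayleigh(beta, 1.0, (n_sims, 1), rng)
    survivors = (strengths > stress).sum(axis=1)  # strengths above the common stress
    return float((survivors >= s).mean())


if __name__ == "__main__":
    print(mc_reliability(s=2, k=4, alpha=2.0, beta=1.0))
```

Under the assumed CDF, a single component (k = 1, s = 1) has reliability α/(α + β), which gives a quick sanity check on the simulation.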


Bayes Multiple Binary Classifier - How To Make Decisions Like A Bayesian, Wensong Wu Nov 2015

Mathematics Colloquium Series

This presentation will start with a general introduction to Bayesian statistics, which has become popular in the era of big data. Then we consider a two-class classification problem, where the goal is to predict the class membership of M units based on the values of high-dimensional categorical predictor variables as well as both the values of predictor variables and the class membership of other N independent units. We focus on applying generalized linear regression models with Boolean expressions of categorical predictors. We consider a Bayesian and decision-theoretic framework, and develop a general form of Bayes multiple binary classification functions with …


Approaches For Detection Of Unstable Processes: A Comparative Study, Yerriswamy Wooluru, D. R. Swamy, P. Nagesh Nov 2015

Journal of Modern Applied Statistical Methods

A process is stable only when the parameters of the distribution of a process or product characteristic remain the same over time. Only a stable process has the ability to perform in a predictable manner over time. Statistical analysis of process data usually assumes that the data are obtained from a stable process. In the absence of control charts, the hypothesis of process stability is usually assessed by visual examination of the pattern in the run chart. In this paper, appropriate statistical approaches are adopted to detect instability in the process, and their performance is compared with that of the run chart of considerably shorter length …


Contrails: Causal Inference Using Propensity Scores, Dean S. Barron Nov 2015

Journal of Modern Applied Statistical Methods

Contrails are clouds caused by airplane exhausts, which geologists contend decrease daily temperature ranges on Earth. Following the 2001 World Trade Center attack, cancelled domestic flights triggered the first absence of contrails in decades. The resulting exceptional data enabled causal inference analysis by propensity score matching. The estimated contrail effect was 6.8981°F.


The Bayes Factor For Case-Control Studies With Misclassified Data, Tzesan Lee Nov 2015

Journal of Modern Applied Statistical Methods

The question of how to test whether data collected for a case-control study are misclassified was investigated. A mixed approach was employed to calculate the Bayes factor to assess the validity of the null hypothesis of no misclassification. A real-world data set on the association between lung cancer and smoking status was used as an example to illustrate the proposed method.


Statistical Modeling Of Migration Attractiveness Of The Eu Member States, Tatiana Tikhomirova, Yulia Lebedeva Nov 2015

Journal of Modern Applied Statistical Methods

The relationship between the migration attractiveness of the European Union countries and their level of socio-economic development is investigated. An approach is proposed to identify the influence of socio-economic characteristics on migration by aggregating them and reducing their diversity, and by substantiating the cause-and-effect relationships of the studied phenomenon. A stable classification scheme for countries is developed according to their migration attractiveness on aggregate factors, and an econometric binary choice model using panel data for 2008-2010 is then applied to quantify the impact of the designed aggregate factors on immigration and emigration.


Bayesian Analysis Under Progressively Censored Rayleigh Data, Gyan Prakash Nov 2015

Journal of Modern Applied Statistical Methods

The one-parameter Rayleigh model is considered as an underlying model for evaluating the properties of the Bayes estimator under Progressive Type-II right censored data. The One-Sample Bayes prediction bound length (OSBPBL) is also measured. A comparative study of Bayes estimation based on two different asymmetric loss functions is presented. A simulation study was used to evaluate their comparative properties.


An Empirical Study On Different Ranking Methods For Effective Data Classification, Ilangovan Sangaiah, A. Vincent Antony Kumar, Appavu Balamurugan Nov 2015

Journal of Modern Applied Statistical Methods

Ranking is the attribute selection technique used in the pre-processing phase to emphasize the most relevant attributes, which makes classification models simpler and easier to understand. It is an important and central task for information retrieval applications such as web search engines, recommendation systems, and advertisement systems. A comparison between eight ranking methods was conducted. Ten different learning algorithms (NaiveBayes, J48, SMO, JRIP, Decision table, RandomForest, Multilayerperceptron, Kstar) were used to test the accuracy. The ranking methods with different supervised learning algorithms give different results for balanced accuracy. It was shown that the selection of ranking methods could be …


Two Stage Robust Ridge Method In A Linear Regression Model, Adewale Folaranmi Lukman, Oyedeji Isola Osowole, Kayode Ayinde Nov 2015

Journal of Modern Applied Statistical Methods

Two Stage Robust Ridge Estimators based on the robust estimators M, MM, S, and LTS are examined in the presence of autocorrelation, multicollinearity and outliers as alternatives to the Ordinary Least Squares (OLS) estimator. The estimator based on the S estimator performs better. Mean square error was used as the criterion for examining the performance of these estimators.


Semi-Parametric Non-Proportional Hazard Model With Time Varying Covariate, Kazeem A. Adeleke, Alfred A. Abiodun, R. A. Ipinyomi Nov 2015

Journal of Modern Applied Statistical Methods

The application of survival analysis has extended the importance of statistical methods for time-to-event data that incorporate time-dependent covariates. The Cox proportional hazards model is one such method that is widely used. An extension of the Cox model with time-dependent covariates was adopted for cases where the proportionality assumption is violated. The purpose of this study is to validate the model assumption when the hazard rate varies with time. This approach is applied to model data on duration of infertility subject to a time-varying covariate. Validity is assessed by a set of simulation experiments and results indicate that a non proportional …


Structural Properties Of Transmuted Weibull Distribution, Kaisar Ahmad, S. P. Ahmad, A. Ahmed Nov 2015

Journal of Modern Applied Statistical Methods

The transmuted Weibull distribution and a related special case are introduced. Estimates of the parameters are obtained by using a new method of moments.


New Entropy Estimators With Smaller Root Mean Squared Error, Amer Ibrahim Al-Omari Nov 2015

Journal of Modern Applied Statistical Methods

New estimators of the entropy of a continuous random variable are suggested. The proposed estimators are investigated under simple random sampling (SRS), ranked set sampling (RSS), and double ranked set sampling (DRSS) methods. The estimators are compared with the Vasicek (1976) and Al-Omari (2014) entropy estimators theoretically and by simulation in terms of the root mean squared error (RMSE) and bias values. The results indicate that the suggested estimators have smaller RMSE and bias values than the competing estimators introduced by Vasicek (1976) and Al-Omari (2014).
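As a point of reference for the comparison above, the baseline Vasicek (1976) spacing estimator under simple random sampling can be sketched as follows. This is only the classical estimator, not the new estimators proposed in the paper; the function name and the choice of window size `m` are illustrative assumptions.

```python
import numpy as np


def vasicek_entropy(sample, m):
    """Vasicek (1976) m-spacing entropy estimate for a continuous sample.

    Computes (1/n) * sum_i log( n / (2m) * (X_(i+m) - X_(i-m)) ), where order
    statistics with index below 1 or above n are replaced by the sample
    minimum and maximum. Tied observations give a zero spacing and -inf.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    if not 1 <= m < n / 2:
        raise ValueError("window size m must satisfy 1 <= m < n / 2")
    idx = np.arange(n)
    upper = x[np.minimum(idx + m, n - 1)]  # X_(i+m), clamped at the maximum
    lower = x[np.maximum(idx - m, 0)]      # X_(i-m), clamped at the minimum
    return float(np.mean(np.log(n / (2.0 * m) * (upper - lower))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # The true differential entropy of Uniform(0, 1) is 0.
    print(vasicek_entropy(rng.random(5000), m=35))
```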


Caution For Software Use Of New Statistical Methods (R), Akiva J. Lorenz, Barry S. Markman, Shlomo Sawilowsky Nov 2015

Journal of Modern Applied Statistical Methods

Open source programming languages such as R allow statisticians to develop and rapidly disseminate advanced procedures, but sometimes at the expense of a proper vetting process. A new example is the least trimmed squares regression available in R’s lqs() in the MASS library. It produces pretty regression lines, particularly in the presence of outliers. However, this procedure lacks a defined standard error, and thus it should be avoided.


Inferences About The Skipped Correlation Coefficient: Dealing With Heteroscedasticity And Non-Normality, Rand Wilcox Nov 2015

Journal of Modern Applied Statistical Methods

A common goal is testing the hypothesis that Pearson’s correlation is zero and typically this is done based on Student’s T test. There are, however, several well-known concerns. First, Student’s T is sensitive to heteroscedasticity. That is, when it rejects, it is reasonable to conclude that there is dependence, but in terms of making a decision about the strength of the association, it is unsatisfactory. Second, Pearson’s correlation is not robust: it can poorly reflect the strength of the association. Even a single outlier can have a tremendous impact on the usual estimate of Pearson’s correlation, which can result in …


Resolving The Issue Of How Reliability Is Related To Statistical Power: Adhering To Mathematical Definitions, Donald W. Zimmerman, Bruno D. Zumbo Nov 2015

Journal of Modern Applied Statistical Methods

Reliability in classical test theory is a population-dependent concept, defined as a ratio of true-score variance and observed-score variance, where observed-score variance is a sum of true and error components. On the other hand, the power of a statistical significance test is a function of the total variance, irrespective of its decomposition into true and error components. For that reason, the reliability of a dependent variable is a function of the ratio of true-score variance and observed-score variance, whereas statistical power is a function of the sum of the same two variances. Controversies about how reliability is related to statistical …
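The two quantities contrasted above can be written out explicitly. The symbols below ($\sigma_T^2$ for true-score variance, $\sigma_E^2$ for error variance, $\sigma_X^2$ for observed-score variance, $\rho_{XX'}$ for reliability) are notation assumed here for illustration; the abstract itself does not fix a notation.

```latex
\[
  \sigma_X^2 = \sigma_T^2 + \sigma_E^2,
  \qquad
  \rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2}
             = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}.
\]
```

Reliability therefore depends on the ratio of the two components, while power, as stated above, depends only on their sum $\sigma_X^2$. For example, $(\sigma_T^2, \sigma_E^2) = (3, 1)$ and $(1, 3)$ give the same total variance of 4 but reliabilities of 0.75 and 0.25.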


In (Partial) Defense Of .05, Thomas R. Knapp Nov 2015

Journal of Modern Applied Statistical Methods

Researchers are frequently chided for choosing the .05 alpha level as the determiner of statistical significance (or non-significance). A partial justification is provided.


The Distribution Of The Inverse Square Root Transformed Error Component Of The Multiplicative Time Series Model, Bright F. Ajibade, Chinwe R. Nwosu, J. I. Mbegdu Nov 2015

Journal of Modern Applied Statistical Methods

The probability density function, mean and variance of the inverse square-root transformed left-truncated $N(1, \sigma^2)$ error component $e_t^* = 1/\sqrt{e_t}$ of the multiplicative time series model were established. A comparison of key statistical properties of $e_t^*$ and $e_t$ confirmed normality with mean 1 but with $\mathrm{Var}(e_t^*) \approx \tfrac{1}{4}\,\mathrm{Var}(e_t)$ when $\sigma \le 0.14$. Hence $\sigma \le 0.14$ is the required condition for successful transformation.
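The factor of one quarter in the variance is what a first-order Taylor (delta-method) expansion around the mean would suggest. The derivation below is a heuristic sketch under the assumption that $\sigma$ is small enough for the truncation to be negligible; it is not the exact derivation from the truncated density used in the paper.

```latex
\[
  g(e) = e^{-1/2}, \qquad
  g'(e) = -\tfrac{1}{2}\, e^{-3/2}, \qquad
  g''(e) = \tfrac{3}{4}\, e^{-5/2},
\]
\[
  \mathrm{Var}(e_t^*) \approx \bigl[g'(1)\bigr]^2 \sigma^2 = \tfrac{1}{4}\,\sigma^2,
  \qquad
  \mathrm{E}(e_t^*) \approx g(1) + \tfrac{1}{2}\, g''(1)\, \sigma^2
                    = 1 + \tfrac{3}{8}\,\sigma^2 .
\]
```

At $\sigma = 0.14$ the second-order mean correction is $\tfrac{3}{8}(0.14)^2 \approx 0.007$, so the mean stays close to 1.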


Front Matter, Jmasm Editors Nov 2015

Journal of Modern Applied Statistical Methods


Vol. 14, No. 2 (Full Issue), Jmasm Editors Nov 2015

Journal of Modern Applied Statistical Methods


Monte Carlo Comparison Of The Parameter Estimation Methods For The Two-Parameter Gumbel Distribution, Demet Aydin, Birdal Şenoğlu Nov 2015

Journal of Modern Applied Statistical Methods

The performance of seven different parameter estimation methods for the Gumbel distribution is compared using numerical simulations. The estimation methods used in this study are the method of moments (ME), the method of maximum likelihood (ML), the method of modified maximum likelihood (MML), the method of least squares (LS), the method of weighted least squares (WLS), the method of percentile (PE) and the method of probability weighted moments (PWM). Performance of the estimators is compared with respect to their biases, MSE and deficiency (Def) values via Monte-Carlo simulation. A Monte Carlo simulation study showed that the method of PWM was …
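Of the seven methods listed above, the method of moments is the simplest to write down. The sketch below assumes the maximum-type Gumbel distribution with location μ and scale β, whose mean is μ + γβ (γ is the Euler-Mascheroni constant) and whose variance is π²β²/6. The function name is hypothetical and the simulation settings are illustrative, not those of the study.

```python
import numpy as np


def gumbel_moment_estimates(sample):
    """Method-of-moments estimates (location, scale) for a Gumbel (max) sample.

    Solves mean = mu + gamma * beta and variance = (pi * beta)**2 / 6
    using the sample mean and the unbiased sample variance.
    """
    x = np.asarray(sample, dtype=float)
    scale = np.sqrt(6.0) * x.std(ddof=1) / np.pi
    location = x.mean() - np.euler_gamma * scale
    return float(location), float(scale)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.gumbel(loc=2.0, scale=3.0, size=1000)
    print(gumbel_moment_estimates(data))
```

Repeating this over many simulated samples and recording bias and MSE would be one way to set up a comparison of the kind the abstract describes.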


A Robust Panel Unit Root Test In The Presence Of Cross Sectional Dependence, Nurul Sima Mohamad Shariff, Nor Aishah Hamzah Nov 2015

Journal of Modern Applied Statistical Methods

Problems arise in testing the stationarity of the panel in the presence of cross sectional dependence and outliers. The currently available panel unit root tests are strongly affected by the presence of outliers. As such, this article introduces an alternative test which is robust to outliers and cross sectional dependence. The performance and robustness of the proposed test are discussed and comparisons are made to the existing tests via simulation studies.


Jmasm34: Two Group Program For Cohen's D, Hedges’ G, η², R²adj, ω², ε², Confidence Intervals, And Power, David A. Walker Nov 2015

Journal of Modern Applied Statistical Methods

The purpose of this research is to provide an application for users interested in an SPSS syntax program to determine an array of commonly-employed effect sizes and confidence intervals not readily available in SPSS functionality, such as the standardized mean difference and r-related squared indices, for a between-group design.
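The standardized mean difference mentioned above can be illustrated with a short sketch. The code is Python rather than the SPSS syntax the article provides, the function names are hypothetical, and Hedges' g uses the common approximate small-sample correction 1 - 3/(4(n1 + n2) - 9) rather than the exact gamma-function form.

```python
import math
from statistics import mean, variance


def cohens_d(group1, group2):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)


def hedges_g(group1, group2):
    """Hedges' g: Cohen's d times an approximate small-sample bias correction."""
    n_total = len(group1) + len(group2)
    correction = 1.0 - 3.0 / (4.0 * n_total - 9.0)
    return cohens_d(group1, group2) * correction


if __name__ == "__main__":
    treated = [2, 4, 6]
    control = [1, 3, 5]
    print(cohens_d(treated, control), hedges_g(treated, control))
```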


An Omnibus Nonparametric Test Of Equality In Distribution For Unknown Functions, Alexander Luedtke, Marco Carone, Mark Van Der Laan Oct 2015

Alex Luedtke

We present a novel family of nonparametric omnibus tests of the hypothesis that two unknown but estimable functions are equal in distribution when applied to the observed data structure. We developed these tests, which represent a generalization of the maximum mean discrepancy tests described in Gretton et al. [2006], using recent developments from the higher-order pathwise differentiability literature. Despite their complex derivation, the associated test statistics can be expressed rather simply as U-statistics. We study the asymptotic behavior of the proposed tests under the null hypothesis and under both fixed and local alternatives. We provide examples to which our tests …


C-Learning: A New Classification Framework To Estimate Optimal Dynamic Treatment Regimes, Baqun Zhang, Min Zhang Aug 2015

The University of Michigan Department of Biostatistics Working Paper Series

Personalizing treatment to accommodate patient heterogeneity and the evolving nature of a disease over time has received considerable attention lately. A dynamic treatment regime is a set of decision rules, each corresponding to a decision point, that determine the next treatment based on each individual’s own available characteristics and treatment history up to that point. We show that identifying the optimal dynamic treatment regime can be recast as a sequential classification problem and is equivalent to sequentially minimizing a weighted expected misclassification error. This general classification perspective targets the exact goal of optimally individualizing treatments and is new and fundamentally …


Nonparametric Methods For Doubly Robust Estimation Of Continuous Treatment Effects, Edward Kennedy, Zongming Ma, Matthew Mchugh, Dylan Small Jun 2015

Edward H. Kennedy

Continuous treatments (e.g., doses) arise often in practice, but available causal effect estimators require either parametric models for the effect curve or else consistent estimation of a single nuisance function. We propose a novel doubly robust kernel smoothing approach, which requires only mild smoothness assumptions on the effect curve and allows for misspecification of either the treatment density or outcome regression. We derive asymptotic properties and also discuss an approach for data-driven bandwidth selection. The methods are illustrated via simulation and in a study of the effect of nurse staffing on hospital readmissions penalties.


Semiparametric Causal Inference In Matched Cohort Studies, Edward Kennedy, Arvid Sjolander, Dylan Small Jun 2015

Edward H. Kennedy

Odds ratios can be estimated in case-control studies using standard logistic regression, ignoring the outcome-dependent sampling. In this paper we discuss an analogous result for treatment effects on the treated in matched cohort studies. Specifically, in studies where a sample of treated subjects is observed along with a separate sample of possibly matched controls, we show that efficient and doubly robust estimators of effects on the treated are computationally equivalent to standard estimators, which ignore the matching and exposure-based sampling. This is not the case for general average effects. We also show that matched cohort studies are often more efficient …


A Study Of The Parametric And Nonparametric Linear-Circular Correlation Coefficient, Robin Tu Jun 2015

Statistics

Circular statistics are specialized statistical methods that deal specifically with directional data. Angular data require specialized techniques because angles are defined modulo 2π (in radians) or modulo 360 (in degrees).

Correlation, typically in terms of Pearson’s correlation coefficient, is a measure of association between two linear random variables x and y. In this paper, the specific circular technique of the parametric and nonparametric linear-circular correlation coefficient will be explored where correlation is no longer between two linear variables x and y, but between a linear random variable x and circular random variable θ.

A simulation …
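The parametric linear-circular correlation described above is commonly computed from three ordinary Pearson correlations: between x and cos θ, between x and sin θ, and between cos θ and sin θ. The sketch below implements that standard formula (often attributed to Mardia). It is offered as an illustration, on the assumption that this is the parametric coefficient the thesis studies; the function name is hypothetical.

```python
import numpy as np


def linear_circular_correlation(x, theta):
    """Parametric linear-circular correlation between x and angles theta (radians).

    Equals the multiple correlation from regressing x on cos(theta) and
    sin(theta), so the result lies in [0, 1].
    """
    x = np.asarray(x, dtype=float)
    c, s = np.cos(theta), np.sin(theta)
    r_xc = np.corrcoef(x, c)[0, 1]
    r_xs = np.corrcoef(x, s)[0, 1]
    r_cs = np.corrcoef(c, s)[0, 1]
    r_squared = (r_xc**2 + r_xs**2 - 2.0 * r_xc * r_xs * r_cs) / (1.0 - r_cs**2)
    return float(np.sqrt(max(r_squared, 0.0)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=200)
    x = 1.0 + 2.0 * np.cos(theta) + rng.normal(scale=1.0, size=200)
    print(linear_circular_correlation(x, theta))
```

Because the coefficient uses only cos θ and sin θ, it is unchanged when angles are shifted by full turns, which respects the modulo 2π structure noted above.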


Maximum Likelihood Estimation Of The Kumaraswamy Exponential Distribution With Applications, K. A. Adepoju, O. I. Chukwu May 2015

Journal of Modern Applied Statistical Methods

The Kumaraswamy exponential distribution, a generalization of the exponential, is developed as a model for problems in environmental studies, survival analysis and reliability. The estimation of parameters is approached by maximum likelihood and the observed information matrix is derived. The proposed models are applied to three real data sets.