Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Statistical Theory

Articles 1591 - 1620 of 1673

Full-Text Articles in Statistics and Probability

Double Median Ranked Set Sample: Comparing To Other Double Ranked Samples For Mean And Ratio Estimators, Hani M. Samawi, Eman M. Tawalbeh Nov 2002

Journal of Modern Applied Statistical Methods

Double median ranked set sample (DMRSS) and its properties for estimating the population mean, when the underlying distribution is assumed to be symmetric about its mean, are introduced. Also, the performance of DMRSS with respect to other ranked set samples and double ranked set samples, for estimating the population mean and ratio, is considered. Real data that consist of heights and diameters of 399 trees are used to illustrate the procedure. The analysis and simulation indicate that using DMRSS for estimating the population mean is more efficient than using the other ranked samples and double ranked samples schemes except in …


On The Misuse Of Confidence Intervals For Two Means In Testing For The Significance Of The Difference Between The Means, George W. Ryan, Steven D. Leadbetter Nov 2002

Journal of Modern Applied Statistical Methods

Comparing individual confidence intervals of two population means is an incorrect procedure for determining the statistical significance of the difference between the means. We show conditions where confidence intervals for the means from two independent samples overlap and the difference between the means is in fact significant.
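The point can be illustrated with a small numerical sketch. The summary statistics below are hypothetical (they are not taken from the article), and a large-sample normal approximation is assumed in place of whatever test the authors used. The two 95% intervals overlap, yet the two-sample test rejects equality of the means at the 5% level.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, se, level=0.95):
    """Normal-approximation confidence interval for a single mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * se, mean + z * se

def two_sample_z(mean1, se1, mean2, se2):
    """Two-sided z test for the difference of two independent means."""
    se_diff = sqrt(se1 ** 2 + se2 ** 2)  # standard error of the difference
    z = (mean2 - mean1) / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical summary statistics (assumed for illustration only).
ci1 = z_interval(10.0, 0.6)
ci2 = z_interval(12.0, 0.6)
overlap = ci1[1] > ci2[0] and ci2[1] > ci1[0]
z, p = two_sample_z(10.0, 0.6, 12.0, 0.6)

print(f"CI 1: {ci1[0]:.3f} to {ci1[1]:.3f}")
print(f"CI 2: {ci2[0]:.3f} to {ci2[1]:.3f}")
print(f"intervals overlap: {overlap}, z = {z:.3f}, p = {p:.4f}")
```

The overlap check is conservative because the standard error of the difference, sqrt(se1^2 + se2^2), is smaller than se1 + se2, which is what comparing the two separate intervals implicitly uses.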


Fermat, Schubert, Einstein, And Behrens-Fisher: The Probable Difference Between Two Means When σ_1^2≠σ_2^2, Shlomo S. Sawilowsky Nov 2002

Journal of Modern Applied Statistical Methods

The history of the Behrens-Fisher problem and some approximate solutions are reviewed. In outlining relevant statistical hypotheses on the probable difference between two means, the importance of the Behrens-Fisher problem from a theoretical perspective is acknowledged, but it is concluded that this problem is irrelevant for applied research in psychology, education, and related disciplines. The focus is better placed on “shift in location” and, more importantly, “shift in location and change in scale” treatment alternatives.


Best Regression Model Using Information Criteria, Phill Gagné, C. Mitchell Dayton Nov 2002

Journal of Modern Applied Statistical Methods

The accuracy of AIC and BIC is evaluated under simulated multiple regression conditions, varying number of total and valid predictors, R^2, and n. AIC and BIC were increasingly accurate as n increased and as total predictors decreased. Interactions of the ratio of valid/total predictors affected accuracy.
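For reference, the abstract does not state which forms of the criteria were used. A common formulation for a Gaussian linear regression with n observations, k estimated parameters, and residual sum of squares RSS (all three symbols are assumptions introduced here) is, up to an additive constant that does not affect model ranking:

```latex
\mathrm{AIC} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k,
\qquad
\mathrm{BIC} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + k \ln n .
```

The model with the smallest value is selected. Because ln n exceeds 2 once n is at least 8, BIC penalizes each additional predictor more heavily than AIC in all but very small samples.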


On The Estimation Of Binomial Success Probability With Zero Occurrence In Sample, Mehdi Razzaghi Nov 2002

Journal of Modern Applied Statistical Methods

The problem of estimating the probability of a rare event when the sample shows no incidence of the event is considered. Several methodologies based on various statistical techniques are described and their relative performances are investigated. A decision theoretic approach for estimation of response probability when the sample contains zero responses is examined in depth. The properties of each method are discussed and an example from teratology is used to provide illustration and to demonstrate the results.


Null Distribution Of The Likelihood Ratio Statistic For Feed-Forward Neural Networks, Douglas Landsittel, Harshinder Singh, Vincent C. Arena, Stewart J. Anderson Nov 2002

Journal of Modern Applied Statistical Methods

Despite recent publications exploring model complexity with modern regression methods, their dimensionality is rarely quantified in practice and the distributions of related test statistics are not well characterized. Through a simulation study, we describe the null distribution of the likelihood ratio statistic for several different feed-forward neural network models.


A Simulation Study Of The Impact Of Forecast Recovery For Control Charts Applied To ARMA Processes, John N. Dyer, B. Michael Adams, Michael D. Conerly Nov 2002

Journal of Modern Applied Statistical Methods

Forecast-based schemes are often used to monitor autocorrelated processes, but the resulting forecast recovery has a significant effect on the performance of control charts. This article describes forecast recovery for autocorrelated processes, and the resulting simulation study is used to explain the performance of control charts applied to forecast errors.


Accounting For Non-Independent Observations In 2×2 Tables, With Application To Correcting For Family Clustering In Exposure-Risk Relationship Studies, Leslie A. Kalish, Katherine A. Riester, Stuart J. Pocock Nov 2002

Journal of Modern Applied Statistical Methods

Participants in epidemiologic studies may not represent statistically independent observations. We consider modifications to conventional analyses of 2×2 tables, including Fisher’s exact test and confidence intervals, to account for correlated observations in this setting. An example is provided, assessing the robustness of conclusions from a published analysis.


Combining Quantum Mechanical Calculations And A χ^2 Fit In A Potential Energy Function For The CO_2 + O^+ Reaction, Ellen F. Sawilowsky Nov 2002

Journal of Modern Applied Statistical Methods

In order to compute a highly accurate statistical rate constant for the CO2 + O+ reaction, it is necessary to first calculate the potential energy of the system at many different geometric configurations. Quantum mechanical calculations are very time-consuming, making it difficult to obtain a sufficient number to allow for accurate interpolation. The number of quantum mechanical calculations required can be significantly reduced by using known relations in classical physics to calculate energy for configurations where the oxygen is relatively far from the CO2. A chi-squared fit to quantum mechanical points is obtained for these configurations, and the resulting …


Type I Error Rates For Rank-Based Tests Of Homogeneity Of Slopes, Alan J. Klockars, Tim P. Moses Nov 2002

Journal of Modern Applied Statistical Methods

The purpose of this study was to explicate two issues concerning the standard and rank based test of homogeneity of slopes. Two alternative ranking methods intended to address nonnormality and additive treatment effect patterns were developed and compared in terms of their ability to control Type I error. The results replicated previous findings of inflated Type I error rates with leptokurtic curves and with rank based tests with some patterns of additive treatment effects. The new nonparametric procedures generally control Type I error although they were slightly inflated with skewed distributions.


Exploration Of Distributions Of Ratio Of Partial Sum Of Sample Eigenvalues When All Population Eigenvalues Are The Same, Moonseong Heo Nov 2002

Journal of Modern Applied Statistical Methods

This paper explores empirically the first two moments of ratio of the partial sum of the first two sample eigenvalues to the sum of all eigenvalues when the population eigenvalues of a covariance matrix are all the same. Estimation of the first two moments can be practically crucial in assessing non-randomness of observed patterns on planar graphical displays based on lower rank approximations of data matrices. For derivation of the moments, exact and large sample asymptotic distributions of the sample ratios are reviewed but neither can be applicable to derivation of the moments. Therefore, I rely on simulations, where data …


On Distribution Function Estimation Using Double Ranked Set Samples With Application, Walid A. Abu-Dayyeh, Hani M. Samawi, Lara A. Bani-Hani Nov 2002

Journal of Modern Applied Statistical Methods

As a variation of ranked set sampling (RSS), double ranked set sampling (DRSS) was introduced by Al-Saleh and Al-Kadiri (2000), and it has been used only for estimating the mean of the population. In this paper DRSS will be used for estimating the distribution function (cdf). The efficiency of the proposed estimators will be obtained when ranking is perfect. Some inference on the distribution function will be drawn based on the Kolmogorov-Smirnov statistic. It will be shown that using DRSS will increase the efficiency in this case.


A Program For Generating All Permutations Of {1, 2, ..., N}, Robert Disario Nov 2002

Journal of Modern Applied Statistical Methods

A Visual Basic program that generates all permutations of {1, 2, ..., n} is presented. The procedure for running the program as an Excel macro is described. An application is presented which involves selecting permutations which meet a specific constraint.
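The article's program is written in Visual Basic and is not reproduced here. As a rough sketch of the same idea in Python, the standard-library `itertools.permutations` can enumerate all permutations of {1, 2, ..., n} and a predicate can filter those meeting a constraint. The function names and the particular constraint (no element left in its original position) are assumptions chosen for illustration, not the constraint used in the article.

```python
from itertools import permutations

def constrained_permutations(n, keep):
    """Yield each permutation of 1..n for which keep(perm) is true."""
    for perm in permutations(range(1, n + 1)):
        if keep(perm):
            yield perm

def is_derangement(perm):
    """Hypothetical constraint: no value sits in its own position."""
    return all(value != position
               for position, value in enumerate(perm, start=1))

all_perms = list(permutations(range(1, 5)))
derangements = list(constrained_permutations(4, is_derangement))

print(len(all_perms), "permutations of {1, 2, 3, 4}")
print(len(derangements), "of them satisfy the constraint")
```

Enumeration grows as n!, so this brute-force filter is practical only for small n.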


Chronic Disease Data And Analysis: Current State Of The Field, Ralph D'Agostino Sr., Lisa M. Sullivan Nov 2002

Journal of Modern Applied Statistical Methods

Chronic disease usually spans years of a person’s lifetime and includes a disease free period, a preclinical, or latent period, where there are few overt signs of disease, a clinical period where the disease manifests and is eventually diagnosed, and a follow-up period where the disease might progress steadily or remain stable. It is often of interest to investigate the relationship between risk factors measured at a point in time (usually during the disease free or preclinical period), and the development of disease at some future point (e.g., 10 years later). We outline some popular designs for the identification of …


Locally Efficient Estimation With Bivariate Right Censored Data, Christopher M. Quale, Mark J. Van Der Laan, James M. Robins Oct 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

Estimation for bivariate right censored data is a problem that has had much study over the past 15 years. In this paper we propose a new class of estimators for the bivariate survivor function based on locally efficient estimation. The locally efficient estimator takes bivariate estimators Fn and Gn of the distributions of the time variables T1,T2 and the censoring variables C1,C2, respectively, and maps them to the resulting estimator. If Fn and Gn are consistent estimators of F and G, respectively, then the resulting estimator will be nonparametrically efficient (thus the term "locally efficient"). However, if either Fn or …


Fast Fifth-Order Polynomial Transforms For Generating Univariate And Multivariate Nonnormal Distributions, Todd C. Headrick Oct 2002

Todd Christopher Headrick

A general procedure is derived for simulating univariate and multivariate nonnormal distributions using polynomial transformations of order five. The procedure allows for the additional control of the fifth and sixth moments. The ability to control higher moments increases the precision in the approximations of nonnormal distributions and lowers the skew and kurtosis boundary relative to the competing procedures considered. Tabled values of constants are provided for approximating various probability density functions. A numerical example is worked to demonstrate the multivariate procedure. The results of a Monte Carlo simulation are provided to demonstrate that the procedure generates specified population parameters and …


The Analysis Of Placement Values For Evaluating Discriminatory Measures, Margaret S. Pepe, Tianxi Cai Sep 2002

UW Biostatistics Working Paper Series

The idea of using measurements such as biomarkers, clinical data, or molecular biology assays for classification and prediction is popular in modern medicine. The scientific evaluation of such measures includes assessing the accuracy with which they predict the outcome of interest. Receiver operating characteristic curves are commonly used for evaluating the accuracy of diagnostic tests. They can be applied more broadly, indeed to any problem involving classification to two states or populations (D = 0 or D = 1). We show that the ROC curve can be interpreted as a cumulative distribution function for the discriminatory measure Y in the …


Accelerated Hazards Model: Method, Theory And Applications, Ying Qing Chen, Nicholas P. Jewell, Jingrong Yang Sep 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

In an accelerated hazards model, the hazard functions of a failure time are related through the time scale-change, which is often a function of covariates and associated parameters. When the hazard functions have special properties, such as monotonicity in time, the parameters may be clinically meaningful in measuring a treatment effect. This paper reviews methodological and theoretical development of this model. Applications of the accelerated hazards model, including sample size calculation in clinical trials, are also explored.


Locally Efficient Estimation Of Regression Parameters Using Current Status Data, Chris Andrews, Mark J. Van Der Laan, James M. Robins Sep 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

In biostatistics applications interest often focuses on the estimation of the distribution of a time-variable T. If one only observes whether or not T exceeds an observed monitoring time C, then the data structure is called current status data, also known as interval censored data, case I. We consider this data structure extended to allow the presence of both time-independent covariates and time-dependent covariate processes that are observed until the monitoring time. We assume that the monitoring process satisfies coarsening at random.

Our goal is to estimate the regression parameter beta of the regression model T = Z*beta+epsilon where the …


Case-Control Current Status Data, Nicholas P. Jewell, Mark J. Van Der Laan Sep 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

Current status observation on survival times has recently been widely studied. An extreme form of interval censoring, this data structure refers to situations where the only available information on a survival random variable, T, is whether or not T exceeds a random independent monitoring time C, a binary random variable, Y. To date, nonparametric analyses of current status data have assumed the availability of i.i.d. random samples of the random variable (Y, C), or a similar random sample at each of a set of fixed monitoring times. In many situations, it is useful to consider a case-control sampling scheme. Here, …


Why Prefer Double Robust Estimates? Illustration With Causal Point Treatment Studies, Romain Neugebauer, Mark J. Van Der Laan Sep 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

In point treatment marginal structural models with treatment A, outcome Y and covariates W, causal parameters can be estimated under the assumption of no unobserved confounders. Three estimates can be used: the G-computation, Inverse Probability of Treatment Weighted (IPTW) or Double Robust (DR) estimates. The properties of the IPTW and DR estimates are known under an assumption on the treatment mechanism that we name "Experimental Treatment Assignment" (ETA) assumption. We show that the DR estimating function is unbiased when the ETA assumption is violated if the model used to regress Y on A and W is correctly specified. The practical …


Bivariate Current Status Data, Mark J. Van Der Laan, Nicholas P. Jewell Sep 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

In many applications, it is often of interest to estimate a bivariate distribution of two survival random variables. Observation of such random variables is often incomplete. If one only observes whether or not each of the individual survival times exceeds a common observed monitoring time C, then the data structure is referred to as bivariate current status data (Wang and Ding, 2000). For such data, we show that the identifiable part of the joint distribution is represented by three univariate cumulative distribution functions, namely the two marginal cumulative distribution functions, and the bivariate cumulative distribution function evaluated on the …


Current Status Data: Review, Recent Developments And Open Problems, Nicholas P. Jewell, Mark J. Van Der Laan Sep 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

Researchers working with survival data are by now adept at handling issues associated with incomplete data, particularly those associated with various forms of censoring. An extreme form of interval censoring, known as current status observation, refers to situations where the only available information on a survival random variable T is whether or not T exceeds a random independent monitoring time C. This article contains a brief review of the extensive literature on the analysis of current status data, discussing the implications of response-based sampling on these methods. The majority of the paper introduces some recent extensions of these ideas to …


Semiparametric Regression Analysis On Longitudinal Pattern Of Recurrent Gap Times, Ying Qing Chen, Mei-Cheng Wang, Yijian Huang Aug 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

In longitudinal studies, individual subjects may experience recurrent events of the same type over a relatively long period of time. The longitudinal pattern of the gaps between the successive recurrent events is often of great research interest. In this article, the probability structure of the recurrent gap times is first explored in the presence of censoring. According to the discovered structure, we introduce the proportional reverse-time hazards models with unspecified baseline functions to accommodate heterogeneous individual underlying distributions, when the longitudinal pattern parameter is of main interest. Inference procedures are proposed and studied by way of proper riskset construction. The …


Multiple Hypothesis Testing In Microarray Experiments, Sandrine Dudoit, Juliet Popper Shaffer, Jennifer C. Boldrick Aug 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

DNA microarrays are a new and promising biotechnology which allows the monitoring of expression levels in cells for thousands of genes simultaneously. An important and common question in microarray experiments is the identification of differentially expressed genes, i.e., genes whose expression levels are associated with a response or covariate of interest. The biological question of differential expression can be restated as a problem in multiple hypothesis testing: the simultaneous test for each gene of the null hypothesis of no association between the expression levels and the responses or covariates. As a typical microarray experiment measures expression levels for thousands of …


Estimation Of The Bivariate Survival Function With Generalized Bivariate Right Censored Data Structures, Sunduz Keles, Mark J. Van Der Laan, James M. Robins Aug 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

We propose a bivariate survival function estimator for a general right censored data structure that includes a time dependent covariate process. Firstly, an initial estimator that generalizes Dabrowska's (1988) estimator is introduced. We obtain this estimator by a general methodology of constructing estimating functions in censored data models. The initial estimator is guaranteed to improve on Dabrowska's estimator and remains consistent and asymptotically linear under informative censoring schemes if the censoring mechanism is estimated consistently. We then construct an orthogonalized estimating function which results in a more robust and efficient estimator than our initial estimator. A simulation study demonstrates the …


Nonlinear Regression Based On Ranks, Ashebar Abebe Jun 2002

Dissertations

This study presents robust methods for estimating parameters of nonlinear regression models. The proposed methods obtain estimates by minimizing rank-based dispersions instead of the Euclidean norm. We focus on the Wilcoxon and generalized signed-rank dispersion functions. Asymptotic properties of the estimators are established under mild regularity conditions similar to those used in least squares and least absolute deviations estimation. The study also shows that by considering the generalized signed-rank dispersion we obtain a class of estimators that encompasses most of the existing popular nonlinear regression estimators. As in linear models, these rank-based procedures provide estimators that are highly efficient. This …


Shifting Goals And Mounting Challenges For Statistical Methodology, Pranab K. Sen May 2002

Journal of Modern Applied Statistical Methods

Modern interdisciplinary research in statistical science encompasses a wide field: agriculture, biology, biomedical sciences along with bioinformatics, clinical sciences, education, environmental and public health disciplines, genomic science, industry, molecular genetics, socio-behavior, socio-economics, toxicology, and a variety of other disciplines. Statistical science has historically had mathematical perspectives dominating theoretical and methodological developments. Yet, the advent of modern information technology has opened the doors for highly computation intensive statistical tools (i.e., software), wherein mathematical aspects are often de-emphasized. Knowledge discovery and data mining (KDDM) is now becoming a dominating force, with bioinformatics as a notable example. In view of this apparent discordance …


Combining Two Nonparametric Tests Of Location, R. Clifford Blair May 2002

Journal of Modern Applied Statistical Methods

A distribution-free test is proposed whose power is similar to that of the Wilcoxon Rank-Sum or Terry-Hoeffding Normal Scores tests depending on which of these two tests is more powerful in a given data analysis situation, regardless of the population. This new statistic is distribution-free, and adds no new assumptions to those associated with the constituent tests. A table of critical values for the new statistic is given and some of its Type I error and power properties are examined.


Power Analyses When Comparing Trimmed Means, Rand R. Wilcox, H. J. Keselman May 2002

Journal of Modern Applied Statistical Methods

Given a random sample from each of two independent groups, this article takes up the problem of estimating power, as well as a power curve, when comparing 20% trimmed means with a percentile bootstrap method. Many methods were considered, but only one was found to be satisfactory in terms of obtaining both a point estimate of power as well as a (one-sided) confidence interval. The method is illustrated with data from a reading study where theory suggests two groups should differ but nonsignificant results were obtained.
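The building block named in this abstract, a percentile bootstrap comparison of 20% trimmed means, might be sketched as follows. This is not the authors' power-estimation method, only the underlying interval for the difference in trimmed means. The data, the function names, the number of resamples, and the seed are all assumptions made for illustration.

```python
import random
from statistics import mean

def trimmed_mean(values, proportion=0.2):
    """Mean after dropping the lowest and highest `proportion` of values."""
    ordered = sorted(values)
    g = int(proportion * len(ordered))  # observations trimmed from each tail
    return mean(ordered[g:len(ordered) - g])

def percentile_bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for trimmed_mean(x) - trimmed_mean(y)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        bx = rng.choices(x, k=len(x))  # resample each group with replacement
        by = rng.choices(y, k=len(y))
        diffs.append(trimmed_mean(bx) - trimmed_mean(by))
    diffs.sort()
    lower = round(alpha / 2 * n_boot)
    upper = n_boot - lower - 1
    return diffs[lower], diffs[upper]

# Hypothetical scores for two independent groups.
group_a = [12, 15, 11, 19, 14, 13, 40, 16, 12, 15]
group_b = [10, 9, 14, 11, 8, 12, 10, 35, 9, 11]
low, high = percentile_bootstrap_ci(group_a, group_b)
print(f"95% percentile bootstrap CI for the difference: ({low:.2f}, {high:.2f})")
```

The groups are judged to differ at level alpha when the interval excludes zero. Trimming limits the influence of the single extreme score in each hypothetical group.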