Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics and Probability

Bias

Articles 1 - 30 of 54

Full-Text Articles in Physical Sciences and Mathematics

Utilizing Markov Chains To Estimate Allele Progression Through Generations, Ronit Gandhi Jan 2023

Honors Theses

All populations display patterns in allele frequencies over time. Some alleles cease to exist, while some grow to become the norm. These frequencies can shift or stay constant based on the conditions the population lives in. If the population is in Hardy-Weinberg equilibrium, the allele frequencies stay constant. Most populations, however, have bias from environmental factors, sexual preferences, other organisms, etc. We propose a stochastic Markov chain model to study allele progression across generations. In such a model, the allele frequencies in the next generation depend only on the frequencies in the current one.

We use this model to track a recessive allele …
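
As a rough illustration of the Markov property described in this abstract, the sketch below simulates a Wright-Fisher-style chain in Python. The Wright-Fisher sampling scheme, the function names, and the `fitness_aa` parameter are assumptions made for illustration; the excerpt above does not specify the thesis's own transition model.

```python
import random


def expected_next_freq(q, fitness_aa=1.0):
    """Expected frequency of recessive allele `a` after one round of selection.

    Assumes random mating, with genotypes AA and Aa at relative fitness 1
    and genotype aa at relative fitness `fitness_aa` (hypothetical model).
    """
    p = 1.0 - q
    mean_fitness = p * p + 2 * p * q + fitness_aa * q * q
    return (p * q + fitness_aa * q * q) / mean_fitness


def next_allele_count(count, pop_size, fitness_aa=1.0, rng=random):
    """Draw the next generation's count of `a`; it depends only on `count`."""
    n_alleles = 2 * pop_size
    if count in (0, n_alleles):  # loss and fixation are absorbing states
        return count
    q_sel = expected_next_freq(count / n_alleles, fitness_aa)
    # Binomial draw: each of the 2N allele copies is `a` with probability q_sel.
    return sum(1 for _ in range(n_alleles) if rng.random() < q_sel)


def simulate(count, pop_size, generations, fitness_aa=1.0, seed=0):
    """Return the allele-count trajectory over the given number of generations."""
    rng = random.Random(seed)
    path = [count]
    for _ in range(generations):
        path.append(next_allele_count(path[-1], pop_size, fitness_aa, rng))
    return path
```

With `fitness_aa=1.0` the expected frequency is unchanged from one generation to the next, which matches the Hardy-Weinberg case; values below 1 push the recessive allele's expected frequency downward, while random drift still acts in any finite population.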


Generalized Ratio-Cum-Product Estimator For Finite Population Mean Under Two-Phase Sampling Scheme, Gajendra Kumar Vishwakarma, Sayed Mohammed Zeeshan Jun 2021

Journal of Modern Applied Statistical Methods

A method to lower the MSE of a proposed estimator relative to the MSE of the linear regression estimator under a two-phase sampling scheme is developed. Estimators are developed to estimate the mean of the variate under study with the help of an auxiliary variate (which is unknown but can be accessed conveniently and economically). Mean square error expressions are obtained for the proposed estimators. In addition, optimal sample sizes are obtained under the given cost function. A comparison study has been carried out to establish conditions under which the developed estimators are more effective than other estimators. The …


Bias Of Rank Correlation Under A Mixture Model, Russell Land Jan 2021

Electronic Theses and Dissertations

This thesis project will analyze the bias in mixture models when contaminated data is present. Specifically, we will analyze the relationship between the bias and the mixing proportion, p, for the rank correlation methods Spearman’s Rho and Kendall’s Tau. We will first look at the history of the two non-parametric rank correlation methods and the sample and population definitions will be introduced. Copulas will be introduced to show a few ways we can define these correlation methods. After that, mixture models will be defined and the main theorem will be stated and proved. As an example, we will apply this …
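
For readers unfamiliar with the two rank correlation measures named in this abstract, the following is a minimal Python sketch of their usual sample definitions. It assumes data without ties, and the function names are illustrative; it does not reproduce the thesis's mixture-model analysis.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Sample Kendall's tau (tau-a): (concordant - discordant) / number of pairs."""
    pairs = list(combinations(range(len(x)), 2))
    score = 0
    for i, j in pairs:
        s = (x[i] - x[j]) * (y[i] - y[j])
        score += (s > 0) - (s < 0)  # +1 concordant, -1 discordant, 0 tied
    return score / len(pairs)


def ranks(values):
    """Ranks 1..n of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_rho(x, y):
    """Sample Spearman's rho without ties: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

Both functions return 1 for perfectly concordant data and -1 for perfectly reversed data.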


A New Exponential Approach For Reducing The Mean Squared Errors Of The Estimators Of Population Mean Using Conventional And Non-Conventional Location Parameters, Housila P. Singh, Anita Yadav May 2020

Journal of Modern Applied Statistical Methods

Classes of ratio-type estimators t (say) and ratio-type exponential estimators te (say) of the population mean are proposed, and their biases and mean squared errors under large sample approximation are presented. It is shown that the class of ratio-type exponential estimators te provides estimators more efficient than the ratio-type estimators.


The Importance Of Type I Error Rates When Studying Bias In Monte Carlo Studies In Statistics, Michael Harwell Feb 2020

Journal of Modern Applied Statistical Methods

Two common outcomes of Monte Carlo studies in statistics are bias and Type I error rate. Several versions of bias statistics exist but all employ arbitrary cutoffs for deciding when bias is ignorable or non-ignorable. This article argues Type I error rates should be used when assessing bias.


Efficient Class Of Estimators For Finite Population Mean Using Auxiliary Information In Two-Occasion Successive Sampling, G. N. Singh, Mohd Khalid Apr 2019

Journal of Modern Applied Statistical Methods

In the case of sampling on two occasions, a class of estimators is considered which uses information on the first occasion as well as the second occasion in order to estimate the population mean on the current (second) occasion. The usefulness of auxiliary information in enhancing the efficiency of this estimation is examined through the class of proposed estimators. Some properties of the class of estimators and a strategy of optimum replacement are discussed. The proposed class of estimators was empirically compared with the sample mean estimator in the case of no matching. The established optimum estimator, which is a …


A Strategy For Using Bias And RMSE As Outcomes In Monte Carlo Studies In Statistics, Michael Harwell Mar 2019

Journal of Modern Applied Statistical Methods

To help ensure important patterns of bias and accuracy are detected in Monte Carlo studies in statistics this paper proposes conditioning bias and root mean square error (RMSE) measures on estimated Type I and Type II error rates. A small Monte Carlo study is used to illustrate this argument.
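
The two outcome measures named in this abstract can be computed in a Monte Carlo study roughly as follows. This Python sketch is illustrative only: the estimator (sample variance with divisor n), the standard normal sampler, and all function names are assumptions, and it does not implement the paper's proposal of conditioning on estimated Type I and Type II error rates.

```python
import math
import random


def monte_carlo_bias_rmse(estimator, true_value, sampler, n, reps, seed=0):
    """Estimate bias and RMSE of `estimator` from `reps` simulated samples of size n."""
    rng = random.Random(seed)
    estimates = [estimator(sampler(rng, n)) for _ in range(reps)]
    bias = sum(estimates) / reps - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / reps
    return bias, math.sqrt(mse)


def normal_sample(rng, n):
    """Draw n observations from a standard normal distribution (variance 1)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def variance_divide_by_n(sample):
    """Variance estimator with divisor n, which is biased downward."""
    m = sum(sample) / len(sample)
    return sum((v - m) ** 2 for v in sample) / len(sample)


bias, rmse = monte_carlo_bias_rmse(
    variance_divide_by_n, 1.0, normal_sample, n=5, reps=20000
)
print(f"bias = {bias:.3f}, RMSE = {rmse:.3f}")
```

For this estimator the theoretical bias is -σ²/n, that is -0.2 for n = 5 and σ² = 1, so the simulated bias should land close to that value.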


A Comparison Of Bayesian Estimation Techniques In A Multidimensional Two-Parameter Partial Credit Item Response Model, Peiyan Liu Jan 2019

Electronic Theses and Dissertations

Bayesian estimation methods have shown better performance than the traditional Marginal Maximum Likelihood (MML) estimation method for parameter estimation in relatively simple item response models. However, extant literature is lacking on the investigation of Bayesian parameter estimation approaches for a multidimensional two-parameter partial credit (M2PPC) model; therefore, this simulation study investigated the performance of two Bayesian Markov Chain Monte Carlo (MCMC) algorithms: Gibbs Sampler and Hamiltonian Monte Carlo-No-U-Turn-Sampler (HMC-NUTS) for M2PPC models' parameter estimation. It compared the estimation accuracy and computing speed in different combinations of situations, including prior choices, test lengths, and the relationships between dimensions.

The datasets …


Controlling For Confounding Via Propensity Score Methods Can Result In Biased Estimation Of The Conditional AUC: A Simulation Study, Hadiza I. Galadima, Donna K. McClish Jan 2019

Community & Environmental Health Faculty Publications

In the medical literature, there has been an increased interest in evaluating the association between exposure and outcomes using nonrandomized observational studies. However, because assignments to exposure are not random in observational studies, comparisons of outcomes between exposed and nonexposed subjects must account for the effect of confounders. Propensity score methods have been widely used to control for confounding when estimating exposure effects. Previous studies have shown that conditioning on the propensity score results in biased estimation of the conditional odds ratio and hazard ratio. However, research is lacking on the performance of propensity score methods for covariate adjustment when estimating the …


Evaluation Of Using The Bootstrap Procedure To Estimate The Population Variance, Nghia Trong Nguyen May 2018

Electronic Theses and Dissertations

The bootstrap procedure is widely used in nonparametric statistics to generate an empirical sampling distribution from a given sample data set for a statistic of interest. Generally, the results are good for location parameters such as population mean, median, and even for estimating a population correlation. However, the results for a population variance, which is a spread parameter, are not as good due to the resampling nature of the bootstrap method. Bootstrap samples are constructed using sampling with replacement; consequently, groups of observations with zero variance manifest in these samples. As a result, a bootstrap variance estimator will carry a …
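
The resampling effect described in this abstract can be seen in a small experiment. The Python sketch below is illustrative only: the data values, the number of resamples, and the function name are assumptions and do not come from the thesis.

```python
import random
import statistics


def bootstrap_variances(data, n_boot, seed=0):
    """Sample variance (divisor n - 1) of each bootstrap resample of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # rng.choices draws n values with replacement, i.e. one bootstrap sample.
    return [statistics.variance(rng.choices(data, k=n)) for _ in range(n_boot)]


data = [2.1, 3.4, 1.9, 5.6, 4.2]  # hypothetical sample
boot = bootstrap_variances(data, n_boot=5000)

# Ratio of the average bootstrap variance to the original sample variance.
ratio = statistics.mean(boot) / statistics.variance(data)
# Share of resamples in which every drawn value is identical (zero variance).
zero_share = sum(v == 0 for v in boot) / len(boot)
print(f"ratio = {ratio:.3f}, zero-variance share = {zero_share:.4f}")
```

Because each resample is drawn from the empirical distribution, whose variance is (n - 1)/n times the sample variance, the ratio should sit near 0.8 for n = 5, which shows the downward bias of the naive bootstrap variance estimate.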


Investigation Of Model Stacking For Drug Sensitivity Prediction, Kevin Matlock, Carlos De Niz, Raziur Rahman, Souparno Ghosh, Ranadip Pal Jan 2018

Department of Statistics: Faculty Publications

Background: A significant problem in precision medicine is the prediction of drug sensitivity for individual cancer cell lines. Predictive models such as Random Forests have shown promising performance while predicting from individual genomic features such as gene expressions. However, accessibility of various other forms of data types including information on multiple tested drugs necessitates the examination of designing predictive models incorporating the various data types.

Results: We explore the predictive performance of model stacking and the effect of stacking on the predictive bias and squared error. In addition, we discuss the analytical underpinnings supporting the advantages of stacking in reducing …


Impact Of Home Visit Capacity On Genetic Association Studies Of Late-Onset Alzheimer's Disease, David W. Fardo, Laura E. Gibbons, Shubhabrata Mukherjee, M. Maria Glymour, Wayne McCormick, Susan M. McCurry, James D. Bowen, Eric B. Larson, Paul K. Crane Aug 2017

Biostatistics Faculty Publications

INTRODUCTION—Findings for genetic correlates of late-onset Alzheimer's disease (LOAD) in studies that rely solely on clinic visits may differ from those with capacity to follow participants unable to attend clinic visits.

METHODS—We evaluated previously identified LOAD-risk single nucleotide variants in the prospective Adult Changes in Thought study, comparing hazard ratios (HRs) estimated using the full data set of both in-home and clinic visits (n = 1697) to HRs estimated using only data that were obtained from clinic visits (n = 1308). Models were adjusted for age, sex, principal components to account for ancestry, and additional health indicators.

RESULTS …


Exponential Chain Dual To Ratio Cum Dual To Product Estimator For Finite Population Mean In Double Sampling Scheme, Yater Tato, B. K. Singh Jun 2017

Applications and Applied Mathematics: An International Journal (AAM)

This paper considers an exponential chain dual to ratio cum dual to product estimator for estimating a finite population mean using two auxiliary variables in a double sampling scheme when the information on another additional auxiliary variable is available along with the main auxiliary variable. The expressions for bias and mean square error of the asymptotically optimum estimator are identified in two different cases. The optimum values of the first-phase and second-phase sample sizes have been obtained for a fixed survey cost. To illustrate the results, theoretical and empirical studies have also been carried out to judge the merits …


Effective Estimation Strategy Of Finite Population Variance Using Multi-Auxiliary Variables In Double Sampling, Reba Maji, G. N. Singh, Arnab Bandyopadhyay May 2017

Journal of Modern Applied Statistical Methods

Estimation of population variance in two-phase (double) sampling is considered using information on multiple auxiliary variables. An unbiased estimator is proposed and its properties are studied under two different structures. The superiority of the suggested estimator over some contemporary estimators of population variance was established through empirical studies from a natural and an artificially generated dataset.


Efficient And Unbiased Estimation Procedure Of Population Mean In Two-Phase Sampling, Reba Maji, Arnab Bandyopadhyay, G. N. Singh Nov 2016

Journal of Modern Applied Statistical Methods

In this paper, an unbiased regression-ratio type estimator has been developed for estimating the population mean using two auxiliary variables in double sampling. Its properties are studied under two different cases. Empirical studies and graphical simulation have been done to demonstrate the efficiency of the proposed estimator over other estimators.


An Improved Generalized Estimation Procedure Of Current Population Mean In Two-Occasion Successive Sampling, G. N. Singh, Alok Kumar Singh, Anup Kumar Sharma Nov 2016

Journal of Modern Applied Statistical Methods

The present work is an attempt to make use of several auxiliary variables on both occasions for improving the precision of estimates for the current population mean in two-occasion successive sampling. A generalized exponential-cum-regression type estimator of the current population mean is proposed and its optimum replacement strategy has been discussed. Empirical studies are carried out to show the dominance of the proposed estimation procedure over the sample mean estimator and natural successive sampling estimator. Empirical results have been interpreted and suitable recommendations are put forward to survey practitioners.


Estimation Of P(X > Y) When X And Y Are Dependent Random Variables Using Different Bivariate Sampling Schemes, Hani M. Samawi, Amal Helu, Haresh Rochani, Jingjing Yin, Daniel Linder Sep 2016

Biostatistics Faculty Publications

Stress-strength models have been intensively investigated in the literature with regard to estimating the reliability θ = P (X > Y) using parametric and nonparametric approaches under different sampling schemes when X and Y are independent random variables. In this paper, we consider the problem of estimating θ when (X, Y) are dependent random variables with a bivariate underlying distribution. The empirical and kernel estimates of θ = P (X > Y), based on bivariate ranked set sampling (BVRSS) are considered, when (X, Y) are paired dependent continuous random variables. The estimators obtained are compared to their counterpart, bivariate simple random …


Almost Unbiased Estimator Using Known Value Of Population Parameter(s) In Sample Surveys, Rajesh Singh, S.B. Gupta, Sachin Malik May 2016

Journal of Modern Applied Statistical Methods

An almost unbiased estimator using the known value of some population parameter(s) is proposed. A class of estimators is defined which includes the estimators of Singh and Solanki (2012), Sahai and Ray (1980), Sisodiya and Dwivedi (1981), Singh, Chauhan, Sawan, and Smarandache (2007), Upadhyaya and Singh (1984), and Singh and Tailor (2003). Under the simple random sampling without replacement (SRSWOR) scheme, the expressions for bias and mean square error (MSE) are derived. Numerical illustrations are given.


Separation Of Points And Interval Estimation In Mixed Dose-Response Curves With Selective Component Labeling, Darl D. Flake II May 2016

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Dose-response experiments are those that involve giving subjects different amounts of a treatment and observing the outcome. For example, plants may be given fertilizer and their growth could be measured or cancer patients could be given different doses of chemotherapy and their response could be monitored. These experiments are used to understand the relationship between the amount of, and response to, the treatment. Logistic regression models are often used to summarize data from these types of experiments. The dose-response experiment that motivated this dissertation involved treating a grain-pest with a pesticide. Some of the beetles had genes that made them …


A Comparison Of Semi-Parametric And Nonparametric Methods For Estimating Mean Time To Event For Randomly Left Censored Data, Farzana Chowdhury, Jahida Gulshan, Syed Shahadat Hossain May 2015

Journal of Modern Applied Statistical Methods

The aim of this study was to make a comparison among existing estimation methods (Kaplan-Meier, Nelson-Aalen and Regression on Ordered Statistics (ROS)) for randomly left censored time to event data under selected distributions and for different levels of censoring and sample sizes in order to determine the strength of these methods based on simulated data. Comparisons among the methods are made on the basis of unbiasedness and Monte Carlo Standard Error of the summary statistics (mean time to event) obtained by those methods under different conditions.


Assessing Agreement Between Two Measurement Systems: An Alternative To The Limits Of Agreement Approach, Nathaniel Stevens, S H. Steiner, R J. Mackay Jan 2015

Mathematics

The comparison of two measurement systems is important in medical and other contexts. A common goal is to decide if a new measurement system agrees suitably with an existing one, and hence whether the two can be used interchangeably. Various methods for assessing interchangeability are available, the most popular being the limits of agreement approach due to Bland and Altman. In this article, we review the challenges of this technique and propose a model-based framework for comparing measurement systems that overcomes those challenges. The proposal is based on a simple metric, the probability of agreement, and a corresponding plot which …


The Number Of Subjects Per Variable Required In Linear Regression Analyses, Peter Austin, Ewout Steyerberg Jan 2015

Peter Austin

Objectives: To determine the number of independent variables that can be included in a linear regression model.

Study Design and Setting: We used a series of Monte Carlo simulations to examine the impact of the number of subjects per variable (SPV) on the accuracy of estimated regression coefficients and standard errors, on the empirical coverage of estimated confidence intervals, and on the accuracy of the estimated R2 of the fitted model.

Results: A minimum of approximately two SPV tended to result in estimation of regression coefficients with relative bias of less than 10%. Furthermore, with this minimum number of SPV, …


Optimal Full Matching For Survival Outcomes: A Method That Merits More Widespread Use, Peter Austin, Elizabeth Stuart Dec 2014

Peter Austin

Matching on the propensity score is a commonly used analytic method for estimating the effects of treatments on outcomes. Commonly used propensity score matching methods include nearest neighbor matching and nearest neighbor caliper matching. Rosenbaum (1991) proposed an optimal full matching approach, in which matched strata are formed consisting of either one treated subject and at least one control subject or one control subject and at least one treated subject. Full matching has been used rarely in the applied literature. Furthermore, its performance for use with survival outcomes has not been rigorously evaluated. We propose a method to use full …


Estimation Of Gumbel Parameters Under Ranked Set Sampling, Omar M. Yousef, Sameer A. Al-Subh Nov 2014

Journal of Modern Applied Statistical Methods

Consider the MLEs (maximum likelihood estimators) of the parameters of the Gumbel distribution using SRS (simple random sample) and RSS (ranked set sample) and the MOMEs (method of moment estimators) and REGs (regression estimators) based on SRS. A comparison between these estimators using bias and MSE (mean square error) was performed using simulation. It appears that the MLE based on RSS can be a robust competitor to the MLE based on SRS.


Evaluating Predictors Of An Individual’s Dietary Intake Latent Value Under Different Mixed Models, Shuli Yu Aug 2014

Doctoral Dissertations

The accurate estimation of an individual’s usual dietary intake is important since the estimates are essential to uncover diet-disease relationships. This study explores a more accurate method to estimate an individual’s latent value of usual dietary intake when it is repeatedly measured using a 24-hour dietary recall (24HR) and seven-day dietary recall (7DDR), accounting for random measurement error and bias. The performance of the (empirical) predictor of a subject’s latent value obtained under the finite population mixed model (FPMM) framework is compared with those obtained under the usual mixed model and the measurement error model through a simulation study. …


Median Based Modified Ratio Estimators With Known Quartiles Of An Auxiliary Variable, Jambulingam Subramani, G Prabavathy May 2014

Journal of Modern Applied Statistical Methods

New median based modified ratio estimators for estimating a finite population mean using quartiles and functions of an auxiliary variable are proposed. The bias and mean squared error of the proposed estimators are obtained, and the mean squared error of the proposed estimators is compared with that of the usual simple random sampling without replacement (SRSWOR) sample mean, ratio estimator, a few existing modified ratio estimators, the linear regression estimator and median based ratio estimator for certain natural populations. A numerical study shows that the proposed estimators perform better than existing estimators; in addition, it is shown that the proposed median based …


Population Mean Estimation With Sub Sampling The Non-Respondents Using Two Phase Sampling, Sunil Kumar, M Viswanathaiah May 2014

Journal of Modern Applied Statistical Methods

The problem of non-response in double (or two phase) sampling is addressed using combined ratio, product and regression estimators. Expressions of bias and MSE for these estimators are obtained. Comparisons of a proposed strategy with a usual unbiased estimator and other estimators are carried out and results obtained are illustrated numerically using an empirical sample.


Separate Ratio-Type Estimators Of Population Mean In Stratified Random Sampling, Rajesh Tailor, Hilal A. Lone May 2014

Journal of Modern Applied Statistical Methods

Separate ratio-type estimators for population mean with their properties are considered. Some separate ratio-type estimators for population mean using known parameters of an auxiliary variate are proposed. The bias and mean squared error of the proposed estimators are obtained up to the first degree of approximation. It is shown that the proposed estimators are more efficient than unbiased estimators in stratified random sampling and usual separate ratio estimators under certain obtained conditions. To judge the merits of the proposed estimators, an empirical study was conducted.


Double Propensity-Score Adjustment: A Solution To Design Bias Or Bias Due To Incomplete Matching, Peter Austin Dec 2013

Peter Austin

Propensity-score matching is frequently used to reduce the effects of confounding when using observational data to estimate the effects of treatments. Matching allows one to estimate the average effect of treatment in the treated. Rosenbaum and Rubin coined the term "bias due to incomplete matching" to describe the bias that can occur when some treated subjects are excluded from the matched sample because no appropriate control subject was available. The presence of incomplete matching raises important questions around the generalizability of estimated treatment effects to the entire population of treated subjects. We describe an analytic solution to address the bias …


Estimation Of Variance Using Known Coefficient Of Variation And Median Of An Auxiliary Variable, J. Subramani, G. Kumarapandiyan May 2013

Journal of Modern Applied Statistical Methods

A modified ratio type variance estimator for estimating population variance of a study variable when the population median and coefficient of variation of an auxiliary variable are known is proposed. The bias and mean squared error of the proposed estimator are derived and conditions under which the proposed estimator performs better than the traditional ratio type variance estimators and modified ratio type variance estimators are obtained. A numerical study shows that the proposed estimator performs better than the traditional ratio type variance estimator and existing modified ratio type variance estimators.