Statistics and Probability Commons

Articles 1 - 26 of 26

Full-Text Articles in Statistics and Probability

The Distribution Of The Significance Level, Paul O. Monnu Jan 2024

Electronic Theses and Dissertations

Reporting the p-value is customary when conducting a test of hypothesis or significance. The p-value is the probability, computed assuming the null hypothesis is true, that a hypothetical second sample would yield a test statistic at least as extreme as the one observed. The significance level is a statistic that we are interested in investigating; being a statistic, it has a distribution. For the F-test in a one-way ANOVA and the t-tests for population means, we define the significance level, its observed value, and the observed significance level. It is possible to derive the distribution of the significance level. The t-test and the F-test are not without controversy. Specifically, we demonstrate that as sample size …
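
As a quick illustration of the significance level as a statistic with its own distribution (a hypothetical sketch, not code from the thesis): under the null hypothesis the p-value of a t-test is uniform on (0, 1), while under an alternative its distribution shifts toward zero.

```python
# Sketch: simulate the null and alternative distributions of the p-value
# for a one-sample t-test. All settings here are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
pvals_null = [stats.ttest_1samp(rng.normal(0.0, 1.0, 30), 0.0).pvalue
              for _ in range(5000)]
pvals_alt = [stats.ttest_1samp(rng.normal(0.5, 1.0, 30), 0.0).pvalue
             for _ in range(5000)]

print(np.mean(np.array(pvals_null) < 0.05))  # near 0.05: p-value is Uniform(0, 1) under H0
print(np.mean(np.array(pvals_alt) < 0.05))   # well above 0.05: distribution shifts toward 0
```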


Statistical Inference On Lung Cancer Screening Using The National Lung Screening Trial Data., Farhin Rahman Aug 2023

Electronic Theses and Dissertations

This dissertation consists of three research projects on cancer screening probability modeling. In these projects, the three key modeling parameters for cancer screening (sensitivity, sojourn time, and transition density) were estimated, and the long-term outcomes (including overdiagnosis), the optimal screening time/age, the lead time distribution, and the probability of overdiagnosis at a future screening time were simulated to provide a statistical perspective on the effectiveness of cancer screening programs. In the first part of this dissertation, statistical inference was conducted for male and female smokers using the National Lung Screening Trial (NLST) chest X-ray data. A …


Bayesian Methods For Graphical Models With Neighborhood Selection., Sagnik Bhadury Dec 2022

Electronic Theses and Dissertations

Graphical models determine associations between variables through the notion of conditional independence. Gaussian graphical models are a widely used class of such models, where the relationships are formalized by non-zero entries of the precision matrix. However, in high-dimensional cases, covariance estimates are typically unstable. Moreover, in many realistic applications it is natural to expect only a few significant associations to be present. This necessitates the injection of sparsity techniques into the estimation method. Classical frequentist methods, like GLASSO, use penalization techniques for this purpose. Fully Bayesian methods, by contrast, are slow because they require iteratively sampling over a quadratic …
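
For context, the frequentist baseline mentioned above can be sketched in a few lines (an illustration using scikit-learn's GLASSO implementation, not the Bayesian neighborhood-selection method developed in the thesis):

```python
# Sketch: GLASSO recovers a sparse precision matrix by l1-penalized likelihood.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
p = 5
# True precision is tri-diagonal: only neighboring variables are conditionally dependent.
prec = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(prec), size=500)

model = GraphicalLasso(alpha=0.05).fit(X)
# Non-zero off-diagonal entries of the estimate correspond to edges of the graph.
print(np.round(model.precision_, 2))
```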


Confidence Interval For The Mean Of A Beta Distribution, Sean Rangel Dec 2021

Electronic Theses and Dissertations

Statistical inference for the mean of a beta distribution has become increasingly popular in various fields of academic research. In this study, we develop a novel statistical model from likelihood-based techniques to evaluate various confidence interval techniques for the mean of a beta distribution. Simulation studies are implemented to compare the performance of the confidence intervals. In addition, we apply the confidence intervals to real biological data gathered by the Department of Biology at Stephen F. Austin State University and provide recommendations on best practice.
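
One simple likelihood-based interval of the kind being compared can be sketched as follows (a hypothetical parametric-bootstrap baseline, not necessarily one of the thesis's intervals; all parameter values are made up):

```python
# Sketch: parametric-bootstrap CI for the beta mean a/(a+b) via scipy MLE fits.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.beta(2.0, 5.0, size=80)                    # illustrative sample; true mean = 2/7

a, b, _, _ = stats.beta.fit(x, floc=0, fscale=1)   # MLE with support fixed to (0, 1)
boot_means = np.empty(1000)
for i in range(boot_means.size):
    xb = rng.beta(a, b, size=x.size)               # resample from the fitted model
    ab, bb, _, _ = stats.beta.fit(xb, floc=0, fscale=1)
    boot_means[i] = ab / (ab + bb)

lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"estimated mean {a / (a + b):.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```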


Bias Of Rank Correlation Under A Mixture Model, Russell Land Jan 2021

Electronic Theses and Dissertations

This thesis project will analyze the bias in mixture models when contaminated data are present. Specifically, we will analyze the relationship between the bias and the mixing proportion, p, for the rank correlation methods Spearman’s Rho and Kendall’s Tau. We will first look at the history of the two non-parametric rank correlation methods and introduce the sample and population definitions. Copulas will be introduced to show a few ways we can define these correlation methods. After that, mixture models will be defined and the main theorem will be stated and proved. As an example, we will apply this …
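
The attenuation the thesis quantifies is easy to see empirically (an illustrative simulation under assumed Gaussian components, not the thesis's theoretical result):

```python
# Sketch: Spearman's rho and Kendall's tau shrink toward 0 when a fraction p
# of the sample is replaced by independent contaminating noise.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, p = 1000, 0.2                                   # sample size, mixing proportion
clean = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=n)
noise = rng.multivariate_normal([0, 0], [[1.0, 0.0], [0.0, 1.0]], size=n)
mask = rng.random(n) < p                           # which observations are contaminated
xy = np.where(mask[:, None], noise, clean)

rho, _ = stats.spearmanr(xy[:, 0], xy[:, 1])
tau, _ = stats.kendalltau(xy[:, 0], xy[:, 1])
print(rho, tau)                                    # both attenuated relative to the clean sample
```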


Use Of Research Tradition And Design In Program Evaluation: An Explanatory Mixed Methods Study Of Practitioners’ Methodological Choices, Margaret Schultz Patel Jan 2021

Electronic Theses and Dissertations

The goal of this explanatory sequential mixed methods study was threefold: to assess whether there were observable trends, associations, or group differences in evaluation methodology by setting and content area in published evaluations from the past ten years (quantitative); to illuminate how evaluation practitioners selected these methodologies (qualitative); and to assess how emergent findings from each phase fit together or helped contextualize each other. In this study, methodology was operationalized as research tradition, and method was operationalized as research design. For phase one (quantitative), a systematized ten-year review of five peer-reviewed evaluation journals was conducted and coded by journal, research tradition, research …


Statistical Approaches Of Gene Set Analysis With Quantitative Trait Loci For High-Throughput Genomic Studies., Samarendra Das Dec 2020

Electronic Theses and Dissertations

Recently, gene set analysis has become the first choice for gaining insights into the underlying complex biology of diseases through high-throughput genomic studies such as microarrays, bulk RNA-sequencing, and single-cell RNA-sequencing. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. However, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. Hence, a comprehensive overview of the available gene set analysis approaches used for different high-throughput genomic studies is provided. The analysis of gene sets is usually carried out based on …


Using Saddlepoint Approximations And Likelihood-Based Methods To Conduct Statistical Inference For The Mean Of The Beta Distribution, Bryn Brakefield May 2020

Electronic Theses and Dissertations

The prevalence of conducting statistical inference for the mean of the beta distribution has been rising in various fields of academic research, such as immunology, where proportions of rare cell population subsets are analyzed. For our purposes, we will address this statistical inference problem by using likelihood-based approaches to hypothesis testing, along with a relatively new statistical method called saddlepoint approximations. Through simulation work, we will compare the performance of these statistical procedures and provide both the statistical and scientific communities with recommendations on best practices.


Assessing Robustness Of The Rasch Mixture Model To Detect Differential Item Functioning - A Monte Carlo Simulation Study, Jinjin Huang Jan 2020

Electronic Theses and Dissertations

Measurement invariance is crucial for an effective and valid measure of a construct. Invariance holds when the latent trait varies consistently across subgroups; in other words, the mean differences among subgroups are due only to true latent ability differences. Differential item functioning (DIF) occurs when measurement invariance is violated. There are two kinds of traditional tools for DIF detection: non-parametric methods and parametric methods. Mantel-Haenszel (MH), SIBTEST, and standardization are examples of non-parametric DIF detection methods. The majority of parametric DIF detection methods are item response theory (IRT) based. Both non-parametric and parametric methods compare differences among subgroups …


Generalizations Of The Arcsine Distribution, Rebecca Rasnick May 2019

Electronic Theses and Dissertations

The arcsine distribution describes the fraction of time one player is winning in a fair coin-toss game and has been studied for over a hundred years. There has been little further work on how the distribution changes when the coin tosses are not fair, or when a player has already won the initial coin tosses or, equivalently, starts with a lead. This thesis will first cover a proof of the arcsine distribution. Then, we explore how the distribution changes when the coin is unfair. Finally, we will explore the distribution when one person has won the first …
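
The classical law itself is easy to check by simulation (a sketch of the fair-coin case only, not the thesis's generalizations):

```python
# Sketch: the fraction of time a player is strictly ahead in a fair coin-toss
# game approximately follows the arcsine CDF F(x) = (2/pi) * arcsin(sqrt(x)).
import numpy as np

rng = np.random.default_rng(3)
games, tosses = 20000, 1000
walks = rng.choice([-1, 1], size=(games, tosses)).cumsum(axis=1)
frac_leading = (walks > 0).mean(axis=1)      # fraction of tosses player 1 is strictly ahead

x = 0.1
print((frac_leading <= x).mean())            # empirical CDF at x
print((2 / np.pi) * np.arcsin(np.sqrt(x)))   # arcsine CDF, about 0.205
```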


Wald Confidence Intervals For A Single Poisson Parameter And Binomial Misclassification Parameter When The Data Is Subject To Misclassification, Nishantha Janith Chandrasena Poddiwala Hewage Aug 2018

Electronic Theses and Dissertations

This thesis is based on a Poisson model that uses both error-free data and error-prone data subject to misclassification in the form of false-negative and false-positive counts. We present maximum likelihood estimators (MLEs), Fisher's information, and Wald statistics for the Poisson rate parameter and the two misclassification parameters. Next, we invert the Wald statistics to get asymptotic confidence intervals for the Poisson rate parameter and the false-negative rate parameter. The coverage and width properties for various sample size and parameter configurations are studied via a simulation study. Finally, we apply the MLEs and confidence intervals to one real data set and another realistic …
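
For background, the error-free version of the interval being inverted looks like this (a sketch with made-up counts; the thesis's model adds the two misclassification parameters):

```python
# Sketch: Wald CI for a Poisson rate from the MLE and its Fisher information.
# For n iid Poisson(lambda) counts, I(lambda) = n / lambda, so SE = sqrt(lambda_hat / n).
import numpy as np
from scipy import stats

counts = np.array([3, 5, 2, 4, 6, 1, 3, 5])   # hypothetical error-free counts
lam_hat = counts.mean()                        # MLE of the rate
se = np.sqrt(lam_hat / counts.size)
z = stats.norm.ppf(0.975)
print(f"lambda_hat = {lam_hat:.3f}, 95% Wald CI: "
      f"({lam_hat - z * se:.3f}, {lam_hat + z * se:.3f})")
```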


Evaluation Of Using The Bootstrap Procedure To Estimate The Population Variance, Nghia Trong Nguyen May 2018

Electronic Theses and Dissertations

The bootstrap procedure is widely used in nonparametric statistics to generate an empirical sampling distribution from a given sample data set for a statistic of interest. Generally, the results are good for location parameters such as the population mean and median, and even for estimating a population correlation. However, the results for a population variance, which is a spread parameter, are not as good, due to the resampling nature of the bootstrap method. Bootstrap samples are constructed using sampling with replacement; consequently, groups of repeated observations, which contribute zero variance, manifest in these samples. As a result, a bootstrap variance estimator will carry a …
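
The downward bias described here can be demonstrated in a few lines (an illustrative simulation, not the thesis's study design):

```python
# Sketch: the mean of bootstrap sample variances targets ((n-1)/n) * s^2,
# so it systematically underestimates the population variance (here 1).
import numpy as np

rng = np.random.default_rng(11)
n, B, reps = 15, 1000, 1000
bias = []
for _ in range(reps):
    x = rng.normal(0.0, 1.0, n)                      # true variance is 1
    boot = rng.choice(x, size=(B, n), replace=True)  # B bootstrap resamples
    bias.append(boot.var(axis=1, ddof=1).mean() - 1.0)

print(np.mean(bias))   # negative, roughly -1/n on average
```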


Examination And Comparison Of The Performance Of Common Non-Parametric And Robust Regression Models, Gregory F. Malek Aug 2017

Electronic Theses and Dissertations

This work investigated common alternatives to the least-squares regression method in the presence of non-normally distributed errors. An initial literature review identified a variety of alternative methods, including Theil Regression, Wilcoxon Regression, Iteratively Re-Weighted Least Squares, Bounded-Influence Regression, and Bootstrapping methods. These methods were evaluated using a simple simulated example data set, as well as various real data sets, including math proficiency data, Belgian telephone call data, and faculty …
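
As a flavor of the comparison (a hypothetical two-method example using SciPy's estimators, not the thesis's full study):

```python
# Sketch: ordinary least squares vs. the Theil-Sen slope (median of pairwise
# slopes) on data with heavy-tailed, outlier-prone errors.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = np.linspace(0, 10, 50)
y = 2.0 + 3.0 * x + stats.t.rvs(df=1.5, size=x.size, random_state=rng)

ols = stats.linregress(x, y)
slope, intercept, lo_s, hi_s = stats.theilslopes(y, x)
print("OLS slope:", round(ols.slope, 3))     # can be dragged far from 3 by outliers
print("Theil-Sen slope:", round(slope, 3))   # typically stays near the true 3
```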


A Distribution Of The First Order Statistic When The Sample Size Is Random, Vincent Z. Forgo Mr May 2017

Electronic Theses and Dissertations

Statistical distributions, also known as probability distributions, are used to model random experiments. Probability distributions are described by probability density functions (pdf) and cumulative distribution functions (cdf). They are widely used in engineering, actuarial science, computer science, the biological sciences, physics, and other areas of study. Statistics are used to draw conclusions about the population through probability models. Sample statistics such as the minimum, first quartile, median, third quartile, and maximum, referred to as the five-number summary, are examples of order statistics. The minimum and maximum observations are important in extreme value theory. This paper will …
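
The key identity behind a random sample size is P(X(1) > x) = E[(1 - F(x))^N]; here is a quick check under assumed uniform observations and a geometric N (an illustrative choice, not necessarily the paper's):

```python
# Sketch: survival function of the minimum when N ~ Geometric(p) on {1, 2, ...}
# and the X_i are U(0, 1), so 1 - F(x) = 1 - x.
import numpy as np

rng = np.random.default_rng(9)
p, x, reps = 0.3, 0.25, 20000

N = rng.geometric(p, size=reps)              # random sample sizes
mins = np.array([rng.random(n).min() for n in N])
print((mins > x).mean())                     # empirical P(min > x)

# E[(1-x)^N] for geometric N: p*(1-x) / (1 - (1-p)*(1-x)), about 0.474 here
print(p * (1 - x) / (1 - (1 - p) * (1 - x)))
```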


Development And Properties Of Kernel-Based Methods For The Interpretation And Presentation Of Forensic Evidence, Douglas Armstrong Jan 2017

Electronic Theses and Dissertations

The inference of the source of forensic evidence is related to model selection. Many forms of evidence can only be represented by complex, high-dimensional random vectors and cannot be assigned a likelihood structure. A common approach to circumvent this is to measure the similarity between pairs of objects composing the evidence. Such methods are ad hoc and unstable approaches to the judicial inference process. While these methods address the dimensionality issue, they also engender dependencies between scores whenever two scores share an object in common, and these dependencies are not taken into account in these models. The model developed in this research captures …


Comparison Of Different Methods For Estimating Log-Normal Means, Qi Tang May 2014

Electronic Theses and Dissertations

The log-normal distribution is a popular model in many areas, especially in biostatistics and survival analysis, where the data tend to be right-skewed. In our research, a total of ten different estimators of the log-normal mean are compared theoretically. Simulations are done using different values of the parameters and sample size. As a result of the comparison, a "degree-of-freedom adjusted" maximum likelihood estimator and a Bayesian estimator under quadratic loss are the best when using the mean square error (MSE) as a criterion. The ten estimators are applied to a real dataset, an environmental study from the Naval Construction Battalion Center (NCBC), …
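
Two of the simpler competitors can be written down directly (an illustrative pair under made-up parameters; the thesis compares ten estimators):

```python
# Sketch: naive sample mean vs. the plug-in MLE exp(mu_hat + sigma_hat^2 / 2)
# for the mean of a log-normal distribution.
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 1.0, 1.2
x = rng.lognormal(mu, sigma, size=50)

logs = np.log(x)
mle = np.exp(logs.mean() + logs.var(ddof=0) / 2)   # MLE of E[X]
print(np.exp(mu + sigma**2 / 2), x.mean(), mle)    # true mean vs. the two estimates
```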


Some New Probability Distributions Based On Random Extrema And Permutation Patterns, Jie Hao May 2014

Electronic Theses and Dissertations

In this paper, we study a new family of random variables that arise as the distribution of extrema of a random number N of independent and identically distributed random variables X1, X2, ..., XN, where each Xi has a common continuous distribution with support on [0,1]. The general scheme is first outlined, and the SUG and CSUG models are introduced in detail, where Xi is distributed as U[0,1]. Some features of the proposed distributions can be studied via their mean, variance, moments, and moment-generating function. Moreover, we make some other choices for …


Comparing K Population Means With No Assumption About The Variances, Tony Yaacoub Jan 2014

Electronic Theses and Dissertations

In the analysis of most statistically designed experiments, it is common to assume equal variances along with the assumptions that the sample measurements are independent and normally distributed. Under these three assumptions, a likelihood ratio test is used to test for the difference in population means. Typically, the assumption of independence can be justified based on the sampling method used by the researcher. The likelihood ratio test is robust to the assumption of normality. However, the equality of variances is often difficult to justify. It has been found that the assumption of equal variances cannot be made even after transforming …


Generalized Weibull And Inverse Weibull Distributions With Applications, Valeriia Sherina Jan 2014

Electronic Theses and Dissertations

In this thesis, new classes of Weibull and inverse Weibull distributions including the generalized new modified Weibull (GNMW), gamma-generalized inverse Weibull (GGIW), the weighted proportional inverse Weibull (WPIW) and inverse new modified Weibull (INMW) distributions are introduced. The GNMW contains several sub-models including the new modified Weibull (NMW), generalized modified Weibull (GMW), modified Weibull (MW), Weibull (W) and exponential (E) distributions, just to mention a few. The class of WPIW distributions contains several models such as length-biased, hazard and reverse hazard proportional inverse Weibull, proportional inverse Weibull, inverse Weibull, inverse exponential, inverse Rayleigh, and Fréchet distributions as special cases. Included …


Finding A Better Confidence Interval For A Single Regression Changepoint Using Different Bootstrap Confidence Interval Procedures, Bodhipaksha Thilakarathne Oct 2012

Electronic Theses and Dissertations

Recently a number of papers have been published in the area of regression changepoints, but there is not much literature concerning confidence intervals for regression changepoints. The purpose of this paper is to find a better bootstrap confidence interval for a single regression changepoint. ("Better" means a confidence interval with minimum length and coverage probability close to the nominal confidence level.) Several methods will be used to find bootstrap confidence intervals. Among those methods, a better confidence interval will be presented.


Estimating The Difference Of Percentiles From Two Independent Populations., Romual Eloge Tchouta Aug 2008

Electronic Theses and Dissertations

We first consider confidence intervals for a normal percentile, an exponential percentile, and a uniform percentile. Then we develop confidence intervals for the difference of percentiles from two independent normal populations, two independent exponential populations, and two independent uniform populations. In our study, we mainly focus on maximum likelihood methods to develop our confidence intervals. The efficiency of this approach is examined via coverage rates obtained in a simulation study done with the statistical software R.
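
For the normal case, the maximum-likelihood construction can be sketched as follows (a hypothetical example via the delta method; the exponential and uniform cases differ):

```python
# Sketch: Wald-type CI for the difference of two normal p-th percentiles,
# using the MLE mu_hat + z_p * sigma_hat and the delta-method variance
# (sigma^2 / n) * (1 + z_p^2 / 2).
import numpy as np
from scipy import stats

def percentile_mle(x, p):
    zp = stats.norm.ppf(p)
    mu, sig = x.mean(), x.std(ddof=0)              # normal MLEs
    return mu + zp * sig, sig**2 / x.size * (1 + zp**2 / 2)

rng = np.random.default_rng(4)
e1, v1 = percentile_mle(rng.normal(10, 2, 100), 0.9)
e2, v2 = percentile_mle(rng.normal(8, 3, 120), 0.9)
d, se = e1 - e2, np.sqrt(v1 + v2)
z = stats.norm.ppf(0.975)
print(f"difference: {d:.3f}, 95% CI ({d - z * se:.3f}, {d + z * se:.3f})")
```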


Interval Estimation For The Ratio Of Percentiles From Two Independent Populations., Pius Matheka Muindi Aug 2008

Electronic Theses and Dissertations

Percentiles are used every day in descriptive statistics and data analysis. In real life, many quantities are normally distributed, and normal percentiles are often used to describe those quantities. In the life sciences, distributions like the exponential, uniform, Weibull, and many others are used to model rates, claims, pensions, etc. The need to compare two or more independent populations often arises in data analysis. The ratio of percentiles is just one of the many ways of comparing populations. This thesis constructs a large-sample confidence interval for the ratio of percentiles whose underlying distributions are known. A simulation study is conducted to evaluate …


New Technique For Imputing Missing Item Responses For An Ordinal Variable: Using Tennessee Youth Risk Behavior Survey As An Example., Andaleeb Abrar Ahmed Dec 2007

Electronic Theses and Dissertations

Surveys ordinarily ask questions on an ordinal scale and often result in missing data. We suggest a regression-based technique for imputing missing ordinal data. A multilevel cumulative logit model was used, with the assumption that observed responses to certain key variables can serve as covariates in predicting missing item responses of an ordinal variable. Individual predicted probabilities at each response level were obtained. Average individual predicted probabilities for each response level were used to randomly impute the missing responses using a uniform distribution. Finally, likelihood ratio chi-square statistics were used to compare the imputed and observed distributions. Two other …


A Statistical Evaluation Of Algorithms For Independently Seeding Pseudo-Random Number Generators Of Type Multiplicative Congruential (Lehmer-Class)., Robert Grisham Stewart Aug 2007

Electronic Theses and Dissertations

To be effective, a linear congruential random number generator (LCG) should produce values that are (a) uniformly distributed on the unit interval (0,1) excluding endpoints and (b) substantially free of serial correlation. It has been found that many statistical methods produce inflated Type I error rates for correlated observations. Theoretically, independently seeding an LCG under the following conditions attenuates serial correlation: (a) simple random sampling of seeds, (b) non-replicate streams, (c) non-overlapping streams, and (d) non-adjoining streams. Accordingly, 4 algorithms (each satisfying at least 1 condition) were developed: (a) zero-leap, (b) fixed-leap, (c) scaled random-leap, and (d) unscaled random-leap. Note …
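
For context, a Lehmer-class generator is only a few lines (the classic Park-Miller "minimal standard" parameters, shown for illustration; this is not the thesis's evaluated implementation):

```python
# Sketch: multiplicative congruential generator X_{n+1} = 16807 * X_n mod (2^31 - 1).
M = 2**31 - 1   # Mersenne prime modulus
A = 16807       # primitive root mod M, giving period M - 1

def lehmer_stream(seed: int, n: int) -> list[float]:
    """Return n uniforms on (0, 1), endpoints excluded."""
    x = seed % M
    if x == 0:
        raise ValueError("seed must not be a multiple of the modulus")
    out = []
    for _ in range(n):
        x = (A * x) % M
        out.append(x / M)
    return out

# Two streams from different seeds; how such seeds should be chosen
# (non-replicate, non-overlapping, non-adjoining) is the question studied above.
print(lehmer_stream(12345, 5))
print(lehmer_stream(67890, 5))
```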


Comparing The Statistical Tests For Homogeneity Of Variances., Zhiqiang Mu Aug 2006

Electronic Theses and Dissertations

Testing the homogeneity of variances is an important problem in many applications since statistical methods of frequent use, such as ANOVA, assume equal variances for two or more groups of data. However, testing the equality of variances is a difficult problem due to the fact that many of the tests are not robust against non-normality. It is known that the kurtosis of the distribution of the source data can affect the performance of the tests for variance. We review the classical tests and their latest, more robust modifications, some other tests that have recently appeared in the literature, and use …
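
The sensitivity to kurtosis is easy to exhibit (an illustrative run with heavy-tailed data; the thesis's comparison is far more systematic):

```python
# Sketch: with equal variances but heavy tails (t with 3 df), Bartlett's test
# is prone to inflated Type I error, while Levene's test with the median
# center (Brown-Forsythe) behaves closer to nominal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
groups = [stats.t.rvs(df=3, size=60, random_state=rng) for _ in range(3)]

print(stats.bartlett(*groups).pvalue)                 # sensitive to kurtosis
print(stats.levene(*groups, center='median').pvalue)  # more robust alternative
```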


Bayesian Reference Inference On The Ratio Of Poisson Rates., Changbin Guo May 2006

Electronic Theses and Dissertations

Bayesian reference analysis is a method of determining the prior under the Bayesian paradigm. It incorporates as little prior information as possible, so that inference is dominated by the data from the experiment. Estimation of the ratio of two independent Poisson rates is a common practical problem. In this thesis, the method of reference analysis is applied to derive the posterior distribution of the ratio of two independent Poisson rates, and then to construct point and interval estimates based on the reference posterior. In addition, the frequentist coverage property of HPD intervals is verified through simulation.
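
A simplified version of such a coverage check can be sketched as follows (hedged: this uses independent Jeffreys Gamma(1/2) priors on each rate and equal-tailed rather than HPD intervals, so it only approximates the reference analysis studied in the thesis):

```python
# Sketch: frequentist coverage of a 95% posterior interval for lambda1/lambda2.
import numpy as np

rng = np.random.default_rng(6)
lam1, lam2, n1, n2 = 4.0, 2.0, 30, 30
reps, cover = 1000, 0

for _ in range(reps):
    x1 = rng.poisson(lam1, n1).sum()
    x2 = rng.poisson(lam2, n2).sum()
    # posterior draws of each rate (Gamma(x + 1/2, rate n)), then of the ratio
    r = rng.gamma(x1 + 0.5, 1 / n1, 5000) / rng.gamma(x2 + 0.5, 1 / n2, 5000)
    lo, hi = np.percentile(r, [2.5, 97.5])
    cover += (lo <= lam1 / lam2 <= hi)

print(cover / reps)   # ideally near 0.95
```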