Open Access. Powered by Scholars. Published by Universities.®

Statistical Methodology Commons

Electronic Theses and Dissertations

Articles 1 - 30 of 39

Full-Text Articles in Statistical Methodology

Assessment Of Method Effects Of Keying And Wording In Instruments: A Mixed-Methods Explanatory Sequential Study, Lin Ma Mar 2024

This dissertation presents an innovative approach to examining the keying method, wording method, and construct validity on psychometric instruments. By employing a mixed methods explanatory sequential design, the effects of keying and wording in two psychometric assessments were examined and validated. Those two self-report psychometric assessments were the Effortful Control assessment (Ellis & Rothbart, 2001) and the Grit assessment (Duckworth & Quinn, 2009). Moreover, the quantitative phase utilized structural equation modeling to analyze 2,104 students’ responses and assess the construct of keying and wording. Various hypothetical models were investigated and evaluated. The reliability of each construct in each method was …


The Distribution Of The Significance Level, Paul O. Monnu Jan 2024

Reporting the p-value is customary when conducting a test of hypothesis or significance. The likelihood of getting a fictitious second sample and presuming the null hypothesis is correct is the p-value. The significance level is a statistic that interests us to investigate. Being a statistic, it has a distribution. For the F-test in a one-way ANOVA and the t-tests for population means, we define the significance level, its observed value, and the observed significance level. It is possible to derive the significance level distribution. The t-test and the F-test are not without controversy. Specifically, we demonstrate that as sample size …


Bayesian Strategies For Propensity Score Estimation In Causal Inference., Uthpala I. Wanigasekara Dec 2023

Causal inference is a method used in various fields to draw causal conclusions based on data. It involves using assumptions, study designs, and estimation strategies to minimize the impact of confounding variables. Propensity scores are used to estimate outcome effects through matching methods, stratification, weighting methods, and the Covariate Balancing Propensity Score method. However, they can be sensitive to estimation techniques and can lead to unstable findings. Researchers have proposed integrating weighting with regression adjustment in parametric models to improve causal inference validity. The first project focuses on Bayesian joint and two-stage methods for propensity score analysis. Propensity score modeling …


A Data-Driven Multi-Regime Approach For Predicting Real-Time Energy Consumption Of Industrial Machines., Abdulgani Kahraman Aug 2023

This thesis focuses on methods for improving energy consumption prediction performance in complex industrial machines. Working with real-world industrial machines brings several challenges, including data access, algorithmic bias, data privacy, and the interpretation of machine learning algorithms. To effectively manage energy consumption in the industrial sector, it is essential to develop a framework that enhances prediction performance, reduces energy costs, and mitigates air pollution in heavy industrial machine operations. This study aims to assist managers in making informed decisions and driving the transition towards green manufacturing. The energy consumption of industrial machinery is substantial, and the recent increase in CO2 …


An Analysis Of All-Cause Mortality On Patients With Sickle Cell Disease And Kidney Disease Using Propensity Score Matching, Adam Garrison May 2023

In this work, we provide an overview of the Cox proportional hazards model for time to event or survival analysis and the notion of propensity score matching to deal with confounding factors. A full analysis is reported in Chapter 2 concerning mortality for in-center dialysis patients with sickle cell disease to demonstrate the application of a general analysis strategy that has some logistical benefits over more traditional approaches to accounting for confounding variables. We also provide some insight and discussions on the challenges and future research questions that will emerge when trying to implement this strategy as a monitoring tool …


Bayesian Methods For Graphical Models With Neighborhood Selection., Sagnik Bhadury Dec 2022

Graphical models determine associations between variables through the notion of conditional independence. Gaussian graphical models are a widely used class of such models, where the relationships are formalized by non-null entries of the precision matrix. However, in high-dimensional cases, covariance estimates are typically unstable. Moreover, it is natural to expect only a few significant associations to be present in many realistic applications. This necessitates the injection of sparsity techniques into the estimation method. Classical frequentist methods, like GLASSO, use penalization techniques for this purpose. Fully Bayesian methods, on the contrary, are slow because they require iteratively sampling over a quadratic …


Examining The Credibility Of Story-Based Causal Methodologies, Megan E. Kauffmann Jan 2022

The purpose of this study was to explore how evaluators justify using story-based methodologies when examining causality. The two primary research questions of the study included: 1) what arguments are made by evaluators to justify the credibility of story-based causal methodologies to evaluation stakeholders; and 2) from the perspective of evaluators, how do contextual factors influence whether story-based causal methodologies are perceived as credible by evaluation stakeholders? A case study was conducted to examine the cases of four evaluators who had experience implementing a story-based methodology in an evaluation. Data collection procedures included two interviews with each participant and a …


Mis-Specification Of Functional Forms In Growth Mixture Modeling: A Monte Carlo Simulation, Richa Ghevarghese Jan 2022

Growth mixture modeling (GMM) is a methodological tool used to represent heterogeneity in longitudinal datasets through the identification of unobserved subgroups following qualitatively and quantitatively distinct trajectories in a population. These growth trajectories or functional forms are informed by the underlying developmental theory, are distinct to each subgroup, and form the core assumptions of the model. Therefore, the accuracy of the assumed functional forms of growth strongly influences substantive research and theories of growth. While there is evidence of mis-specified functional forms of growth in GMM literature, the weight of this violation has been largely overlooked. Current solutions to circumvent …


Application Of An Organizational Evaluation Capacity Assessment In A Multinational NGO: A Case Study To Support Applied Practice, Ryan James Smyth Jan 2022

As evaluation capacity building (ECB) has rapidly emerged as a practice in human service organizations and as a field of academic inquiry, attention has focused on methods of evaluation capacity building while assessment of organizational evaluation capacity (EC) has lagged behind. To examine the practice of organizational evaluation capacity assessment, this dissertation presents two separate but related studies. In sub-study 1, I present a qualitative evidence synthesis of the research theorizing organizational evaluation capacity models. In sub-study 2, I support the implementation of one of the tools from the evidence-synthesis at a multinational human service organization. I use a concurrent …


Estimating Treatment Effect On Medical Cost And Examining Medical Cost Trajectory Using Splines And Change Point Techniques., Indranil Ghosh Dec 2021

In a world of growing medical needs, the cost of healthcare is, alongside clinical outcomes, one of the important aspects to evaluate. The cost of treatment can be a decisive factor when choosing between two treatment options that are equally likely to be effective. In the literature, the most commonly used measure of treatment cost is the cumulative lifetime cost since the diagnosis of a disease. While it provides a bird's-eye view of the treatment cost, it fails to capture the underlying pattern of the treatment cost trajectory. We developed a marginal structural functional model (MSFM) using an …


Bayesian Variable Selection Strategies In Longitudinal Mixture Models And Categorical Regression Problems., Md Nazir Uddin Aug 2021

In this work, we seek to develop a variable screening and selection method for Bayesian mixture models with longitudinal data. To develop this method, we consider data from the Health and Retirement Survey (HRS) conducted by University of Michigan. Considering yearly out-of-pocket expenditures as the longitudinal response variable, we consider a Bayesian mixture model with $K$ components. The data consist of a large collection of demographic, financial, and health-related baseline characteristics, and we wish to find a subset of these that impact cluster membership. An initial mixture model without any cluster-level predictors is fit to the data through an MCMC …


Evaluation Of The Effect Of The Clinical-Decision-Support Systems On Diabetes Management: A Multivariate Meta-Analysis Comparison With Univariate Meta-Analysis, Abdelfattah Elbarsha Jan 2021

The advantage of meta-analysis lies in its ability to provide a quantitative summary of the findings from multiple studies. The first aim of this dissertation was to conduct a simulation study to understand which factors (sample size, between-study correlation, and percent of missing data) have a significant effect on meta-analysis estimates and whether univariate and multivariate meta-analysis would produce different estimates.

The second goal of this study was to evaluate the effect of clinical decision support systems (CDSS) on diabetes care management by conducting three separate univariate meta-analyses and one multivariate meta-analysis. CDSS are health information …


The Combined Impact Of Continuous And Ordinal Auxiliary Variables On Missing Data Imputation In SEM, Salina Wu Whitaker Jan 2021

“Modern” methods of addressing missing data using full-information maximum-likelihood (FIML) have become mainstays in SEM analyses. FIML allows the inclusion of auxiliary variables which carry information that is related to missing values and can reduce bias in parameter estimates. Past research has illustrated the benefits of auxiliary variable inclusion under different missingness conditions (MCAR and MNAR; e.g., Enders, 2008), missingness proportions (e.g., Collins et al., 2001), and although limited, missingness patterns (e.g., Yoo, 2009) in FIML analyses. While past studies have focused on the effects of either continuous or ordinal auxiliary variables, no study has included both types in their …


Statistical Approaches Of Gene Set Analysis With Quantitative Trait Loci For High-Throughput Genomic Studies., Samarendra Das Dec 2020

Recently, gene set analysis has become the first choice for gaining insights into the underlying complex biology of diseases through high-throughput genomic studies, such as Microarrays, bulk RNA-Sequencing, single cell RNA-Sequencing, etc. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. Further, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. Hence, a comprehensive overview of the available gene set analysis approaches used for different high-throughput genomic studies is provided. The analysis of gene sets is usually carried out based on …


Is The Reliability Of Objective Originality Scores Confounded By Elaboration?, Shannon Marie Maio Jan 2020

The increased use of text-mining models as a scoring mechanism for divergent thinking (DT) tasks has sparked concerns about the ways in which automated Originality scores may be influenced by other dimensions of DT, especially Elaboration. The debate centers around the question of whether too much variance in automated Originality scores is accounted for by the number of words a participant uses in a response (i.e., Elaboration), and, thus, how the influence of Elaboration can affect the reliability of Originality scores. Here, a partial correlation analysis, in conjunction with text-mining and psychometric modeling, is conducted to test the degree to …


Multiple Imputation Using Influential Exponential Tilting In Case Of Non-Ignorable Missing Data, Kavita Gohil Jan 2020

Modern research strategies rely predominantly on three steps: data collection, data analysis, and inference. If the data are not collected as designed, researchers may face the challenge of incomplete data, especially when the missingness is non-ignorable. These situations affect the subsequent steps of evaluation and make them difficult to perform. Inference with incomplete data is a challenging task in data analysis and clinical trials when the missing data are related to the condition under study. Moreover, results obtained from incomplete data are prone to biases. Parameter estimation with non-ignorable missing data is even more challenging to handle and extract useful …


Generalization Of Kullback-Leibler Divergence For Multi-Stage Diseases: Application To Diagnostic Test Accuracy And Optimal Cut-Points Selection Criterion, Chen Mo Jan 2020

The Kullback-Leibler divergence (KL), which captures the disparity between two distributions, has been considered as a measure for determining the diagnostic performance of an ordinal diagnostic test. This study applies KL and further generalizes it to comprehensively measure the diagnostic accuracy test for multi-stage (K > 2) diseases, named generalized total Kullback-Leibler divergence (GTKL). Also, GTKL is proposed as an optimal cut-points selection criterion for discriminating subjects among different disease stages. Moreover, the study investigates a variety of applications of GTKL on measuring the rule-in/out potentials in the single-stage and multi-stage levels. Intensive simulation studies are conducted to compare the performance …
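As a rough illustration of the quantity this abstract starts from, the sketch below computes the ordinary two-distribution KL divergence for a discrete (ordinal) test result. It is not the GTKL measure proposed in the thesis, whose definition is not given in the truncated abstract. The function name `kl_divergence` and the two example distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) for two discrete distributions.

    p and q are equal-length sequences of probabilities over the same
    ordered categories (for example, the levels of an ordinal test).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of categories")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # a zero-probability category contributes nothing
        if qi == 0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


# Hypothetical distributions of a 4-level ordinal test result
# in two disease stages (illustrative numbers only).
stage_a = [0.60, 0.25, 0.10, 0.05]
stage_b = [0.10, 0.20, 0.30, 0.40]

print(kl_divergence(stage_a, stage_b))
print(kl_divergence(stage_b, stage_a))
```

The divergence is not symmetric, so the two printed values differ; which direction (or combination of directions) a diagnostic measure should use is a modeling choice that this sketch leaves open.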


Nonparametric Misclassification Simulation And Extrapolation Method And Its Application, Congjian Liu Jan 2020

The misclassification simulation extrapolation (MC-SIMEX) method proposed by Küchenhoff et al. is a general method of handling categorical data with measurement error. It consists of two steps, the simulation and extrapolation steps. In the simulation step, it simulates observations with varying degrees of measurement error. Then parameter estimators for varying degrees of measurement error are obtained based on these observations. In the extrapolation step, it uses a parametric extrapolation function to obtain the parameter estimators for data with no measurement error. However, as shown in many studies, the parameter estimators are still biased as a result of the parametric extrapolation …


Communications And Methodologies In Crime Geography: Contemporary Approaches To Disseminating Criminal Incidence And Research, Mitchell Ogden Dec 2019

Many tools exist to assist law enforcement agencies in mitigating criminal activity. For centuries, academics used statistics in the study of crime and criminals, and more recently, police departments make use of spatial statistics and geographic information systems in that pursuit. Clustering and hot spot methods of analysis are popular in this application for their relative simplicity of interpretation and ease of process. With recent advancements in geospatial technology, it is easier than ever to publicly share data through visual communication tools like web applications and dashboards. Sharing data and results of analyses boosts transparency and the public image of …


Comparison Of Imputation Methods For Mixed Data Missing At Random, Kaitlyn Heidt May 2019

A statistician's job is to produce statistical models. When these models are precise and unbiased, we can relate them to new data appropriately. However, when data sets have missing values, the assumptions of statistical methods are violated and the results are biased. The statistician's objective is to implement methods that produce unbiased and accurate results. Research in missing data is becoming popular as modern methods that produce unbiased and accurate results are emerging, such as the MICE package in R, a statistical software environment. Using real data, we compare four common imputation methods in the MICE package at different levels of missingness. The …


A Comparison Of Bayesian Estimation Techniques In A Multidimensional Two-Parameter Partial Credit Item Response Model, Peiyan Liu Jan 2019

Bayesian estimation methods have shown better performance than the traditional Marginal Maximum Likelihood (MML) estimation method for parameter estimation in relatively simple item response models. However, the extant literature lacks investigation of Bayesian parameter estimation approaches for a multidimensional two-parameter partial credit (M2PPC) model. This simulation study therefore investigated the performance of two Bayesian Markov Chain Monte Carlo (MCMC) algorithms, the Gibbs Sampler and the Hamiltonian Monte Carlo No-U-Turn Sampler (HMC-NUTS), for parameter estimation in M2PPC models. It compared estimation accuracy and computing speed across different combinations of conditions, including prior choices, test lengths, and the relationships between dimensions.

The datasets …


Variable Selection In Accelerated Failure Time (AFT) Frailty Models: An Application Of Penalized Quasi-Likelihood, Sarbesh R. Pandeya Jan 2019

Variable selection is one of the standard ways of selecting models in large scale datasets. It has applications in many fields of research study, especially in large multi-center clinical trials. One of the prominent methods in variable selection is the penalized likelihood, which is both consistent and efficient. However, the penalized selection is significantly challenging under the influence of random (frailty) covariates. It is even more complicated when there is involvement of censoring as it may not have a closed-form solution for the marginal log-likelihood. Therefore, we applied the penalized quasi-likelihood (PQL) approach that approximates the solution for such a …


Bayesian Analytical Approaches For Metabolomics: A Novel Method For Molecular Structure-Informed Metabolite Interaction Modeling, A Novel Diagnostic Model For Differentiating Myocardial Infarction Type, And Approaches For Compound Identification Given Mass Spectrometry Data., Patrick J. Trainor Aug 2018

Metabolomics, the study of small molecules in biological systems, has enjoyed great success in enabling researchers to examine disease-associated metabolic dysregulation and has been utilized for the discovery of biomarkers of disease and phenotypic states. In spite of recent technological advances in the analytical platforms utilized in metabolomics and the proliferation of tools for the analysis of metabolomics data, significant challenges in metabolomics data analyses remain. In this dissertation, we present three of these challenges and Bayesian methodological solutions for each. In the first part we develop a new methodology to serve as a basis for making higher order inferences in metabolomics, …


Generalized Spatiotemporal Modeling And Causal Inference For Assessing Treatment Effects For Multiple Groups For Ordinal Outcome., Soutik Ghosal Aug 2018

This dissertation consists of three projects and can be categorized in two broad research areas: generalized spatiotemporal modeling and causal inference based on observational data. In the first project, I introduce a Bayesian hierarchical mixed effect hurdle model with a nested random effect structure to model the count for primary care providers and understand their spatial and temporal variation. This study further enables us to identify the health professional shortage areas and the possible impacting factors. In the second project, I have unified popular parametric and nonparametric propensity score-based methods to assess the treatment effect of multiple groups for ordinal …


Spatio-Temporal Dynamics Of Atlantic Cod Bycatch In The Maine Lobster Fishery And Its Impacts On Stock Assessment, Robert E. Boenish May 2018

One of the most iconic fish species in the world, the Atlantic cod (Gadus morhua, hereafter, cod) has been a mainstay in the North Atlantic for centuries. While many global fish stocks have come under increased pressure with the advent of new, more efficient fishing technology in the mid-20th century, exceptional pressure has been placed on this prized gadoid. Bycatch, or the unintended catch of organisms, is one of the biggest global fisheries issues. As a direct result of the failed recovery of cod in the Gulf of Maine (GoM), attention has turned to possible sources of unaccounted catch. Among the most …


Evaluation Of Using The Bootstrap Procedure To Estimate The Population Variance, Nghia Trong Nguyen May 2018

The bootstrap procedure is widely used in nonparametric statistics to generate an empirical sampling distribution from a given sample data set for a statistic of interest. Generally, the results are good for location parameters such as population mean, median, and even for estimating a population correlation. However, the results for a population variance, which is a spread parameter, are not as good due to the resampling nature of the bootstrap method. Bootstrap samples are constructed using sampling with replacement; consequently, groups of observations with zero variance manifest in these samples. As a result, a bootstrap variance estimator will carry a …
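The resampling step described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's procedure: the function names `bootstrap_variances` and `zero_variance_fraction`, the sample values, and the choice of the sample variance with an n - 1 divisor are all hypothetical. It draws resamples with replacement, computes the variance of each, and reports how often a resample consists of a single repeated value and therefore has zero variance.

```python
import random
import statistics


def bootstrap_variances(sample, n_boot=2000, seed=0):
    """Return the sample variance of each of n_boot bootstrap resamples.

    Each resample has the same size as `sample` and is drawn with
    replacement, so the same observation can appear more than once.
    """
    rng = random.Random(seed)
    n = len(sample)
    variances = []
    for _ in range(n_boot):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        variances.append(statistics.variance(resample))
    return variances


def zero_variance_fraction(variances):
    """Fraction of bootstrap resamples whose variance is exactly zero."""
    return sum(1 for v in variances if v == 0) / len(variances)


# Hypothetical small sample (illustrative numbers only).
sample = [2.1, 3.4, 5.0, 7.7]
boot = bootstrap_variances(sample)

print("sample variance:", statistics.variance(sample))
print("mean bootstrap variance:", statistics.fmean(boot))
print("share of zero-variance resamples:", zero_variance_fraction(boot))
```

With only four distinct observations, a resample made of one value repeated four times is not rare, which is one way to see why very small samples make the bootstrap variance estimate unreliable.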


Some New And Generalized Distributions Via Exponentiation, Gamma And Marshall-Olkin Generators With Applications, Hameed Abiodun Jimoh Jan 2018

Three new generalized distributions developed via competing risk, gamma generator, Marshall-Olkin generator and exponentiation techniques are proposed and studied. Structural properties including quantile functions, hazard rate functions, moments, conditional moments, mean deviations, Rényi entropy, distribution of order statistics and maximum likelihood estimates are presented. Monte Carlo simulation is employed to examine the performance of the proposed distributions. Applications of the generalized distributions to real lifetime data are presented to illustrate the usefulness of the models.


A Cross-Sectional Exploration Of Household Financial Reactions And Homebuyer Awareness Of Registered Sex Offenders In A Rural, Suburban, And Urban County., John Charles Navarro Aug 2017

As stigmatized persons, registered sex offenders betoken instability in communities. Depressed home sale values are associated with the presence of registered sex offenders even though the public is largely unaware of the presence of registered sex offenders. Using a spatial multilevel approach within the social disorganization framework, the current study examines how registered sex offenders influence the sale values of homes sold in 2015 in three U.S. counties (rural, suburban, and urban) located in Illinois and Kentucky. Homebuyers were surveyed to examine whether awareness of local registered sex offenders and the homebuyer’s community type operate as moderators between home selling …


Denoising Tandem Mass Spectrometry Data, Felix Offei May 2017

Protein identification using tandem mass spectrometry (MS/MS) has proven to be an effective way to identify proteins in a biological sample. An observed spectrum is constructed from the data produced by the tandem mass spectrometer. A protein can be identified if the observed spectrum aligns with the theoretical spectrum. However, data generated by the tandem mass spectrometer are affected by errors thus making protein identification challenging in the field of proteomics. Some of these errors include wrong calibration of the instrument, instrument distortion and noise. In this thesis, we present a pre-processing method, which focuses on the removal of noisy …


Newsvendor Models With Monte Carlo Sampling, Ijeoma W. Ekwegh Aug 2016

The newsvendor model is used in solving inventory problems in which demand is random. In this thesis, we will focus on a method of using Monte Carlo sampling to estimate the order quantity that either maximizes revenue or minimizes cost given that demand is uncertain. Given data, the Monte Carlo approach will be used in sampling data over scenarios and also estimating the probability density function. A bootstrapping process yields an empirical distribution for the order quantity that will maximize the expected profit. Finally, this method will be used …
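A minimal sketch of this idea, assuming the textbook newsvendor profit (unit selling price times units sold, minus unit cost times units ordered) rather than whatever objective the thesis actually uses. The function names, the prices, and the simulated normal demand are all hypothetical. The sketch picks the order quantity that maximizes average profit over sampled demand scenarios, then bootstraps the scenarios to get an empirical distribution of that quantity.

```python
import random


def average_profit(q, demands, price, cost):
    """Average profit of ordering q units over the sampled demand scenarios."""
    return sum(price * min(q, d) - cost * q for d in demands) / len(demands)


def best_order_quantity(demands, price, cost):
    """Order quantity maximizing average profit over the scenarios.

    The average profit is piecewise linear and concave in q with kinks at
    the sampled demands, so it is enough to check those values.
    """
    candidates = sorted(set(demands))
    return max(candidates, key=lambda q: average_profit(q, demands, price, cost))


def bootstrap_order_quantities(demands, price, cost, n_boot=100, seed=0):
    """Empirical distribution of the best order quantity under resampling."""
    rng = random.Random(seed)
    n = len(demands)
    results = []
    for _ in range(n_boot):
        resample = [demands[rng.randrange(n)] for _ in range(n)]
        results.append(best_order_quantity(resample, price, cost))
    return results


# Hypothetical inputs: 100 simulated demand scenarios, price 5, cost 2.
rng = random.Random(42)
demands = [max(0.0, rng.gauss(100, 20)) for _ in range(100)]

q_star = best_order_quantity(demands, price=5.0, cost=2.0)
q_boot = sorted(bootstrap_order_quantities(demands, price=5.0, cost=2.0))

print("estimated order quantity:", q_star)
print("bootstrap 5th and 95th percentiles:", q_boot[5], q_boot[94])
```

The spread of the bootstrap quantities gives a rough sense of how sensitive the recommended order is to the particular demand scenarios that were sampled.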