Open Access. Powered by Scholars. Published by Universities.®

Statistical Methodology Commons

1,077 Full-Text Articles 1,669 Authors 292,579 Downloads 99 Institutions

All Articles in Statistical Methodology

Research In Short Term Actuarial Modeling, Elijah Howells 2020 California State University, San Bernardino

Electronic Theses, Projects, and Dissertations

This paper covers mathematical methods used to conduct actuarial analysis in the short term, such as policy deductible analysis, maximum covered loss analysis, and mixtures of distributions. Assessment of a loss variable's distribution under the effect of a policy deductible, as well as one with an implemented maximum covered loss, and under both a policy deductible and maximum covered loss will also be covered. The derivation, meaning, and use of cost per loss and cost per payment will be discussed, as will those of an aggregate sum distribution, stop loss policy, and maximum likelihood estimation. For each topic, special ...
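As a rough illustration of the cost per loss and cost per payment quantities named in the abstract, the sketch below assumes an exponentially distributed loss with mean `theta`, an ordinary deductible `d`, and a maximum covered loss `u` with `d < u`. The distribution and all function names are assumptions chosen for illustration; they are not taken from the paper.

```python
import math

def limited_expected_value(theta, x):
    """E[min(X, x)] for X ~ Exponential with mean theta (assumed loss model)."""
    return theta * (1.0 - math.exp(-x / theta))

def cost_per_loss(theta, d, u):
    """Expected insurer payment per loss: E[min(X, u)] - E[min(X, d)]."""
    return limited_expected_value(theta, u) - limited_expected_value(theta, d)

def cost_per_payment(theta, d, u):
    """Expected payment given that a payment is made, i.e. given X > d."""
    prob_payment = math.exp(-d / theta)  # P(X > d) for the exponential
    return cost_per_loss(theta, d, u) / prob_payment

# Hypothetical policy: mean loss 1000, deductible 100, maximum covered loss 5000.
per_loss = cost_per_loss(1000.0, 100.0, 5000.0)
per_payment = cost_per_payment(1000.0, 100.0, 5000.0)
```

The cost per payment is never smaller than the cost per loss, because it conditions on the loss exceeding the deductible.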


Sensitivity Analysis For Incomplete Data And Causal Inference, Heng Chen 2020 Southern Methodist University

Statistical Science Theses and Dissertations

In this dissertation, we explore sensitivity analyses under three different types of incomplete data problems: missing outcomes, missing outcomes together with missing predictors, and potential outcomes in the Rubin causal model (RCM). The first sensitivity analysis is conducted for the missing completely at random (MCAR) assumption in frequentist inference; the second is conducted for the missing at random (MAR) assumption in likelihood inference; the third is conducted for a novel assumption, the "sixth assumption", proposed for the robustness of the instrumental variable estimand in causal inference.


Evaluation Of The Utility Of Informative Priors In Bayesian Structural Equation Modeling With Small Samples, Hao Ma 2020 Southern Methodist University

Department of Education Policy and Leadership Theses and Dissertations

The estimation of parameters in structural equation modeling (SEM) has been primarily based on the maximum likelihood estimator (MLE) and relies on large sample asymptotic theory. Consequently, the results of the SEM analyses with small samples may not be as satisfactory as expected. In contrast, informative priors typically do not require a large sample, and they may be helpful for improving the quality of estimates in the SEM models with small samples. However, the role of informative priors in the Bayesian SEM has not been thoroughly studied to date. Given the limited body of evidence, specifying effective informative priors remains ...


Introduction To Research Statistical Analysis: An Overview Of The Basics, Christian Vandever 2020 HCA Healthcare

HCA Healthcare Journal of Medicine

This article covers many statistical ideas essential to research statistical analysis. Sample size is explained through the concepts of statistical significance level and power. Variable types and definitions are included to clarify necessities for how the analysis will be interpreted. Categorical and quantitative variable types are defined, as well as response and predictor variables. Statistical tests described include t-tests, ANOVA and chi-square tests. Multiple regression is also explored for both logistic and linear regression. Finally, the most common statistics produced by these methods are explored.


Demand Forecasting In Wholesale Alcohol Distribution: An Ensemble Approach, Tanvi Arora, Rajat Chandna, Stacy Conant, Bivin Sadler, Robert Slater 2020 Southern Methodist University

SMU Data Science Review

In this paper, historical data from a wholesale alcoholic beverage distributor was used to forecast sales demand. Demand forecasting is a vital part of the sale and distribution of many goods. Accurate forecasting can be used to optimize inventory, improve cash flow, and enhance customer service. However, demand forecasting is a challenging task due to the many unknowns that can impact sales, such as the weather and the state of the economy. While many studies focus effort on modeling consumer demand and endpoint retail sales, this study focused on demand forecasting from the distributor perspective. An ensemble approach was applied ...


Rmse-Minimizing Confidence Intervals For The Binomial Parameter, Kexin Feng 2020 William & Mary

Undergraduate Honors Theses

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed Bernoulli($p$) random variables with unknown parameter $p$ satisfying $0 < p < 1$. Let $X = \sum_{i=1}^{n} X_i$ be the number of successes in the $n$ mutually independent Bernoulli trials. The maximum likelihood estimator of $p$ is $\hat{p} = X/n$. For fixed $n$ and $\alpha$, there are $n + 1$ distinct $100(1 - \alpha)\%$ confidence intervals associated with $X = 0, 1, 2, \ldots, n$. Currently there is no known exact confidence interval for $p$. Our goal is to construct the confidence interval for $p$ whose actual coverage is closest to the stated coverage, using the root mean squared error, RMSE, to measure the difference between the actual coverage and the stated coverage. The approximate confidence interval for $p$ developed here minimizes the RMSE for a sample size $n$ and a significance level $\alpha$.
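To make the idea of actual versus stated coverage concrete, the sketch below computes the actual coverage of an interval procedure at a given p by summing binomial probabilities over the outcomes whose interval contains p, then takes the RMSE against the stated coverage over a grid of p values. The Wald interval is used only as a familiar stand-in, and the evenly spaced grid is an assumption; neither is claimed to be the construction or the RMSE definition used in the thesis.

```python
import math
from statistics import NormalDist

def wald_interval(x, n, alpha):
    """Wald interval for p (illustrative stand-in, not the thesis's interval)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)

def actual_coverage(p, n, alpha, interval=wald_interval):
    """Probability that the interval built from X ~ Binomial(n, p) contains p."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n, alpha)
        if lo <= p <= hi:
            total += math.comb(n, x) * p**x * (1 - p) ** (n - x)
    return total

def coverage_rmse(n, alpha, interval=wald_interval, grid_size=999):
    """RMSE between actual and stated coverage over an assumed grid of p values."""
    ps = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    sq_errors = [
        (actual_coverage(p, n, alpha, interval) - (1 - alpha)) ** 2 for p in ps
    ]
    return math.sqrt(sum(sq_errors) / len(sq_errors))

# Hypothetical setting: n = 20 trials, alpha = 0.05.
rmse_wald = coverage_rmse(20, 0.05)
```

Any other interval procedure with the same `(x, n, alpha)` signature can be passed as `interval` to compare RMSE values.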


Visualization And Joint Analysis Of Monitored Multivariate Spatio-Temporal Data With Applications To Forest Fire Modelling And Sports Analytics, Devan Becker 2020 The University of Western Ontario

Electronic Thesis and Dissertation Repository

This thesis develops and applies novel techniques for the study of complex data structures with applications to wildland fire analytics and sports analytics. We consider situations where different models share information, including many different variables recorded simultaneously in aerial wildland fire fighting, how the frequency and severity of wildland fires are related, and how the shot locations of hockey players can be decomposed into spatial components that are shared across different players.

The first study analyzes flight patterns while fighting a wildland fire using several outlier detection techniques. These techniques applied several definitions of "outlier" to determine whether or not ...


Power And Statistical Significance In Securities Fraud Litigation, Jill E. Fisch, Jonah B. Gelbach 2020 University of Pennsylvania Law School

Faculty Scholarship at Penn Law

Event studies, a half-century-old approach to measuring the effect of events on stock prices, are now ubiquitous in securities fraud litigation. In determining whether the event study demonstrates a price effect, expert witnesses typically base their conclusion on whether the results are statistically significant at the 95% confidence level, a threshold that is drawn from the academic literature. As a positive matter, this represents a disconnect with legal standards of proof. As a normative matter, it may reduce enforcement of fraud claims because litigation event studies typically involve quite low statistical power even for large-scale frauds.

This paper, written for ...


Session 11 - Methods: Bootstrap Control Chart For Pareto Percentiles, Ruth Burkhalter 2020 University of South Dakota

SDSU Data Science Symposium

Lifetime percentile is an important indicator of product reliability. However, the sampling distribution of a percentile estimator for any lifetime distribution is not bell-shaped. As a result, the well-known Shewhart-type control chart cannot be applied to monitor the product lifetime percentiles. In this presentation, bootstrap control charts based on the maximum likelihood estimator (MLE) are proposed for monitoring Pareto percentiles. An intensive simulation study is conducted to compare the performance of the proposed MLE bootstrap control chart and the Shewhart-type control chart.


A Two-Stage Hybrid Model By Using Artificial Neural Networks As Feature Construction Algorithms, Yan Wang, Sherry Ni, Brian Stone 2020 Kennesaw State University

Grey Literature from PhD Candidates

We propose a two-stage hybrid approach with neural networks as the new feature construction algorithms for bankcard response classifications. The hybrid model uses a very simple neural network structure as the new feature construction tool in the first stage, then the newly created features are used as the additional input variables in logistic regression in the second stage. The model is compared with the traditional one-stage model in credit customer response classification. It is observed that the proposed two-stage model outperforms the one-stage model in terms of accuracy, the area under the ROC curve, and KS statistic. By creating new ...


An Automatic Interaction Detection Hybrid Model For Bankcard Response Classification, Yan Wang, Sherry Ni, Brian Stone 2020 Kennesaw State University

Grey Literature from PhD Candidates

Data mining techniques have numerous applications in bankcard response modeling. Logistic regression has been used as the standard modeling tool in the financial industry because of its consistently good performance and its interpretability. In this paper, we propose a hybrid bankcard response model, which integrates decision tree-based chi-square automatic interaction detection (CHAID) into logistic regression. In the first stage of the hybrid model, CHAID analysis is used to detect potential variable interactions. Then in the second stage, these potential interactions serve as additional input variables in logistic regression. The motivation of the proposed hybrid model ...


A Xgboost Risk Model Via Feature Selection And Bayesian Hyper-Parameter Optimization, Yan Wang, Sherry Ni 2020 Kennesaw State University

Grey Literature from PhD Candidates

This paper aims to explore models based on the extreme gradient boosting (XGBoost) approach for business risk classification. Feature selection (FS) algorithms and hyper-parameter optimizations are simultaneously considered during model training. The five most commonly used FS methods, including weight by Gini, weight by Chi-square, hierarchical variable clustering, weight by correlation, and weight by information, are applied to alleviate the effect of redundant features. Two hyper-parameter optimization approaches, random search (RS) and Bayesian tree-structured Parzen Estimator (TPE), are applied in XGBoost. The effects of different FS and hyper-parameter optimization methods on the model performance are investigated by the Wilcoxon Signed Rank ...


Methodological Issues Of Spatial Agent-Based Models, Steven Manson, Li An, Keith C. Clarke, Alison Heppenstall, Jennifer Koch, Brittany Krzyzanowski, Fraser Morgan, David O'Sullivan, Bryan C. Runck, Eric Shook, Leigh Tesfatsion 2020 University of Minnesota - Twin Cities

Economics Publications

Agent based modeling (ABM) is a standard tool that is useful across many disciplines. Despite widespread and mounting interest in ABM, even broader adoption has been hindered by a set of methodological challenges that run from issues around basic tools to the need for a more complete conceptual foundation for the approach. After several decades of progress, ABMs remain difficult to develop and use for many students, scholars, and policy makers. This difficulty holds especially true for models designed to represent spatial patterns and processes across a broad range of human, natural, and human-environment systems. In this paper, we describe ...


Process Based Analysis Of Fluvial Stratigraphic Record: Middle Pennsylvanian Allegheny Formation, North-Central Wv, Oluwasegun O. Abatan 2020 West Virginia University

Graduate Theses, Dissertations, and Problem Reports

Fluvial deposits represent some of the best hydrocarbon reservoirs, but the quality of fluvial reservoirs varies depending on the reservoir architecture, which is controlled by allogenic and autogenic processes. Allogenic controls, including paleoclimate, tectonics, and glacio-eustasy, have long been debated as dominant controls in the deposition of fluvial strata. However, recent research has questioned the validity of this cyclicity and may indicate major influence from autogenic controls. To further investigate allogenic controls on stratal order, I analyzed the facies architecture, geomorphology, paleohydrology, and the stratigraphic framework of the Middle Pennsylvanian Allegheny Formation (MPAF), a fluvial depositional system in the Appalachian ...


Knot Selection In Sparse Gaussian Processes With A Variational Objective Function, Nathaniel Garton, Jarad Niemi, Alicia Carriquiry 2020 Iowa State University

Statistics Publications

Sparse, knot‐based Gaussian processes have enjoyed considerable success as scalable approximations of full Gaussian processes. Certain sparse models can be derived through specific variational approximations to the true posterior, and knots can be selected to minimize the Kullback‐Leibler divergence between the approximate and true posterior. While this has been a successful approach, simultaneous optimization of knots can be slow due to the number of parameters being optimized. Furthermore, there have been few proposed methods for selecting the number of knots, and no experimental results exist in the literature. We propose using a one‐at‐a‐time knot selection ...


Generalized Matrix Decomposition Regression: Estimation And Inference For Two-Way Structured Data, Yue Wang, Ali Shojaie, Tim Randolph, Jing Ma 2019 University of Washington

UW Biostatistics Working Paper Series

Analysis of two-way structured data, i.e., data with structures among both variables and samples, is becoming increasingly common in ecology, biology and neuroscience. Classical dimension-reduction tools, such as the singular value decomposition (SVD), may perform poorly for two-way structured data. The generalized matrix decomposition (GMD, Allen et al., 2014) extends the SVD to two-way structured data and thus constructs singular vectors that account for both structures. While the GMD is a useful dimension-reduction tool for exploratory analysis of two-way structured data, it is unsupervised and cannot be used to assess the association between such data and an outcome of ...


Inference Of Heterogeneity In Meta-Analysis Of Rare Binary Events And Rss-Structured Cluster Randomized Studies, Chiyu Zhang 2019 Southern Methodist University

Statistical Science Theses and Dissertations

This dissertation contains two topics: (1) A Comparative Study of Statistical Methods for Quantifying and Testing Between-study Heterogeneity in Meta-analysis with Focus on Rare Binary Events; (2) Estimation of Variances in Cluster Randomized Designs Using Ranked Set Sampling.

Meta-analysis, the statistical procedure for combining results from multiple studies, has been widely used in medical research to evaluate intervention efficacy and safety. In many practical situations, the variation of treatment effects among the collected studies, often measured by the heterogeneity parameter, may exist and can greatly affect the inference about effect sizes. Comparative studies have been done for only one or ...


Statistical Inference For Networks Of High-Dimensional Point Processes, Xu Wang, Mladen Kolar, Ali Shojaie 2019 University of Washington - Seattle Campus

UW Biostatistics Working Paper Series

Fueled in part by recent applications in neuroscience, high-dimensional Hawkes processes have become a popular tool for modeling the network of interactions among multivariate point process data. While evaluating the uncertainty of the network estimates is critical in scientific applications, existing methodological and theoretical work has focused only on estimation. To bridge this gap, this paper proposes a high-dimensional statistical inference procedure with theoretical guarantees for multivariate Hawkes processes. Key to this inference procedure is a new concentration inequality on the first- and second-order statistics for integrated stochastic processes, which summarize the entire history of the process. We apply this ...


Evaluation Of Modern Missing Data Handling Methods For Coefficient Alpha, Katerina Matysova 2019 University of Nebraska - Lincoln

Public Access Theses and Dissertations from the College of Education and Human Sciences

When assessing a certain characteristic or trait using a multiple item measure, quality of that measure can be assessed by examining the reliability. To avoid multiple time points, reliability can be represented by internal consistency, which is most commonly calculated using Cronbach’s coefficient alpha. Almost every time human participants are involved in research, there is missing data involved. Missing data means that even though complete data were expected to be collected, some data are missing. Missing data can follow different patterns as well as be the result of different mechanisms. One traditional way to deal with missing data is ...
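For readers unfamiliar with coefficient alpha, the sketch below computes it for fully observed data using the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. It deliberately handles complete data only and does not implement any of the missing data methods the dissertation evaluates; the function name and the toy data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Coefficient alpha for complete data.

    rows: list of respondents, each a sequence of k item scores (no missing values).
    """
    k = len(rows[0])
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)

# Toy data (hypothetical): 4 respondents answering 3 items.
scores = [(2, 3, 3), (4, 4, 5), (1, 2, 2), (5, 4, 5)]
alpha = cronbach_alpha(scores)
```

When every item gives identical scores for each respondent, the formula returns exactly 1, the upper bound for internal consistency.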


Phylogenetic Comparative Methods And The Evolution Of Multivariate Phenotypes, Dean C. Adams, Michael L. Collyer 2019 Iowa State University

Ecology, Evolution and Organismal Biology Publications

Evolutionary biology is multivariate, and advances in phylogenetic comparative methods for multivariate phenotypes have surged to accommodate this fact. Evolutionary trends in multivariate phenotypes are derived from distances and directions between species in a multivariate phenotype space. For these patterns to be interpretable, phenotypes should be characterized by traits in commensurate units and scale. Visualizing such trends, as is achieved with phylomorphospaces, should continue to play a prominent role in macroevolutionary analyses. Evaluating phylogenetic generalized least squares (PGLS) models (e.g., phylogenetic analysis of variance and regression) is valuable, but using parametric procedures is limited to only a few phenotypic ...

