Open Access. Powered by Scholars. Published by Universities.®

Statistical Methodology Commons

955 Full-Text Articles 1,427 Authors 242,218 Downloads 82 Institutions

All Articles in Statistical Methodology

955 full-text articles. Page 1 of 27.

Estimation In High-Dimensional Factor Models With Structural Instabilities, Wen Gao 2018 University of Windsor

Major Papers

In this major paper, we use high-dimensional models to analyze macroeconomic data that are influenced by a break point. In particular, we consider detecting the break point and study how the number of factors and the factor loadings change under structural instability.

Concretely, we propose two factor models that describe the processes of the pre- and post-break periods. Then, we consider the break point as known or unknown. In both situations, we derive the shrinkage estimators by minimizing the penalized least squares function and calculate the estimators of the numbers of pre- and post-break factors ...


Minimizing The Perceived Financial Burden Due To Cancer, Hassan Azhar, Zoheb Allam, Gino Varghese, Daniel W. Engels, Sajiny John 2018 Southern Methodist University

SMU Data Science Review

In this paper, we present a regression model that predicts the perceived financial burden that a cancer patient experiences in the treatment and management of the disease. Cancer patients do not fully understand the burden associated with the cost of cancer, and their lack of understanding can increase the difficulties associated with living with the disease, in particular coping with the cost. The relationship between demographic characteristics and financial burden was examined in order to better understand the characteristics of a cancer patient and their burden, while all-subsets regression was used to determine the best predictors of financial burden. Age ...


Yelp’s Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels 2018 Southern Methodist University

SMU Data Science Review

In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that ...


Testing Hypotheses Of Covariance Structure In Multivariate Data, Miguel Fonseca, Arkadiusz Koziol, Roman Zmyslony 2018 NOVA University of Lisbon

Electronic Journal of Linear Algebra

In this paper, a new approach is given for testing hypotheses on the structure of covariance matrices in double multivariate data. It is proved that the ratio of the positive and negative parts of best unbiased estimators (BUE) provides an F-test for independence of block variables in double multivariate models.


Robust Inference For The Stepped Wedge Design, James P. Hughes, Patrick J. Heagerty, Fan Xia, Yuqi Ren 2018 University of Washington - Seattle Campus

UW Biostatistics Working Paper Series

Based on a permutation argument, we derive a closed form expression for an estimate of the treatment effect, along with its standard error, in a stepped wedge design. We show that these estimates are robust to misspecification of both the mean and covariance structure of the underlying data-generating mechanism, thereby providing a robust approach to inference for the treatment effect in stepped wedge designs. We use simulations to evaluate the type I error and power of the proposed estimate and to compare the performance of the proposed estimate to the optimal estimate when the correct model specification is known. The ...


A Comparison Of R, SAS, And Python Implementations Of Random Forests, Breckell Soifua 2018 Utah State University

All Graduate Plan B and other Reports

The Random Forest method is a useful machine learning tool developed by Leo Breiman. There are many existing implementations across different programming languages, the most popular of which are in R, SAS, and Python. In this paper, we conduct a comprehensive comparison of these implementations with regard to accuracy, variable importance measurements, and timing. This comparison was done on a variety of real and simulated data with different classification difficulty levels, numbers of predictors, and sample sizes. The comparison shows unexpectedly different results between the three implementations.


Bayesian Analytical Approaches For Metabolomics : A Novel Method For Molecular Structure-Informed Metabolite Interaction Modeling, A Novel Diagnostic Model For Differentiating Myocardial Infarction Type, And Approaches For Compound Identification Given Mass Spectrometry Data., Patrick J. Trainor 2018 University of Louisville

Electronic Theses and Dissertations

Metabolomics, the study of small molecules in biological systems, has enjoyed great success in enabling researchers to examine disease-associated metabolic dysregulation and has been utilized for the discovery of biomarkers of disease and phenotypic states. In spite of recent technological advances in the analytical platforms utilized in metabolomics and the proliferation of tools for the analysis of metabolomics data, significant challenges in metabolomics data analyses remain. In this dissertation, we present three of these challenges and Bayesian methodological solutions for each. In the first part we develop a new methodology to serve as a basis for making higher order inferences in metabolomics ...


Generalized Spatiotemporal Modeling And Causal Inference For Assessing Treatment Effects For Multiple Groups For Ordinal Outcome., Soutik Ghosal 2018 University of Louisville

Electronic Theses and Dissertations

This dissertation consists of three projects and can be categorized in two broad research areas: generalized spatiotemporal modeling and causal inference based on observational data. In the first project, I introduce a Bayesian hierarchical mixed effect hurdle model with a nested random effect structure to model the count for primary care providers and understand their spatial and temporal variation. This study further enables us to identify the health professional shortage areas and the possible impacting factors. In the second project, I have unified popular parametric and nonparametric propensity score-based methods to assess the treatment effect of multiple groups for ordinal ...


Bayesian Sparse Propensity Score Estimation For Unit Nonresponse, Hejian Sang, Gyuhyeong Goh, Jae Kwang Kim 2018 Iowa State University

Statistics Preprints

Nonresponse weighting adjustment using propensity score is a popular method for handling unit nonresponse. However, including all available auxiliary variables into the propensity model can lead to inefficient and inconsistent estimation, especially with high-dimensional covariates. In this paper, a new Bayesian method using the Spike-and-Slab prior is proposed for sparse propensity score estimation. The proposed method is not based on any model assumption on the outcome variable and is computationally efficient. Instead of doing model selection and parameter estimation separately as in many frequentist methods, the proposed method simultaneously selects the sparse response probability model and provides consistent parameter ...


Predictions Generated From A Simulation Engine For Gene Expression Micro-Arrays For Use In Research Laboratories, Gopinath R. Mavankal, John Blevins, Dominique Edwards, Monnie McGee, Andrew Hardin 2018 Southern Methodist University

SMU Data Science Review

In this paper we introduce the technical components, the biology, and the data science involved in the use of microarray technology in biological and clinical research. We discuss how the laborious experimental protocols used in laboratories to obtain these data could benefit from simulations of the data. We discuss the approach used in the simulation engine from [7]. We use this simulation engine to generate a prediction tool in Power BI, a Microsoft business intelligence tool for analytics and data visualization [22]. This tool could be used in any laboratory using micro-arrays to improve experimental design by comparing how predicted ...


Data Scientist’s Analysis Toolbox: Comparison Of Python, R, And SAS Performance, Jim Brittain, Mariana Cendon, Jennifer Nizzi, John Pleis 2018 Southern Methodist University

SMU Data Science Review

A quantitative analysis will be performed on experiments utilizing three different tools used for Data Science. The analysis will include replication of analyses along with comparisons of code length, output, and results. Qualitative data will supplement the quantitative findings. The conclusion will provide data-supported guidance on the correct tool to use for common situations in the field of Data Science.


Hierarchical Bayesian Data Fusion Using Autoencoders, Yevgeniy Vladimirovich Reznichenko 2018 Marquette University

Master's Theses (2009 -)

In this thesis, a novel method for tracker fusion is proposed and evaluated for vision-based tracking. This work combines three distinct popular techniques into a recursive Bayesian estimation algorithm. First, semi-supervised learning approaches are used to partition data and to train a deep neural network that is capable of capturing normal visual tracking operation and is able to detect anomalous data. We compare various methods by examining their respective receiver operating characteristic (ROC) curves, which represent the trade-off between specificity and sensitivity for various detection threshold levels. Next, we incorporate the trained neural networks into an existing data ...


Combining Academics And Social Engagement: A Major-Specific Early Alert Method To Counter Student Attrition In Science, Technology, Engineering, And Mathematics, Andrew J. Sage, Cinzia Cervato, Ulrike Genschel, Craig Ogilvie 2018 Iowa State University

Geological and Atmospheric Sciences Publications

Students are most likely to leave science, technology, engineering, and mathematics (STEM) majors during their first year of college. We developed an analytic approach using random forests to identify at-risk students. This method is deployable midway through the first semester and accounts for academic preparation, early engagement in university life, and performance on midterm exams. By accounting for cognitive and noncognitive factors, our method achieves stronger predictive performance than would be possible using cognitive or noncognitive factors alone. We show that it is more difficult to predict whether students will leave STEM than whether they will leave the institution. More ...


Improving Shewhart Control Chart Performance In The Presence Of Measurement Error Using Multiple Measurements And Two-Stage Sampling, Kenneth W. Linna 2018 Auburn University Montgomery

Journal of International & Interdisciplinary Business Research

The usual Shewhart control chart efficiently detects large shifts in the mean of a quality characteristic and has been extensively studied in the literature. Most proposed alternatives to the Shewhart chart aim either to improve the signal performance for smaller mean shifts or to reduce the sampling effort required to detect a larger shift. Measurement error has been shown in the literature to result in reduced power to detect process shifts. The combination of multiple measurements and two-stage sampling is considered here as a strategy for both regaining power lost due to measurement error and specifically tuning the charts for shifts ...
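As a rough illustration of why repeated measurements can help, the sketch below computes control limits for a subgroup-mean chart under a simple additive error model. This is not the article's method: the function name `xbar_limits`, the assumption of independent additive measurement error, and the parameter values are all hypothetical.

```python
import math


def xbar_limits(mu0, sigma_process, sigma_meas, n, m=1, width=3.0):
    """Control limits for the mean of n units, each measured m times.

    Assumed model (not from the article): observed value = true value +
    independent measurement error, so averaging m measurements of one unit
    leaves a variance of sigma_process**2 + sigma_meas**2 / m.
    """
    var_per_unit = sigma_process ** 2 + sigma_meas ** 2 / m
    std_error = math.sqrt(var_per_unit / n)
    return mu0 - width * std_error, mu0 + width * std_error


if __name__ == "__main__":
    # One measurement per unit versus four measurements per unit.
    print(xbar_limits(10.0, 1.0, 1.0, n=4, m=1))
    print(xbar_limits(10.0, 1.0, 1.0, n=4, m=4))
```

Under this assumed model, increasing `m` shrinks only the measurement-error part of the variance, so the limits tighten toward those of an error-free chart but never past them.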


An Empirical Analysis Of Climatic, Geographic, And Cultural Determinants Of International Tourism, Ethan Straus 2018 Union College

Honors Theses

Each year, billions of people visit different countries all around the world. For many of those countries, tourism is their primary industry, leading to millions of jobs and dollars in revenue. It is expected that by 2020 total International Tourism Receipts will reach 2 trillion US dollars annually. Currently, tourism employs an estimated 200 million people around the world. With the continued progression of climate change, the tourism industry is facing a newfound threat. Global temperatures and the sea level are both expected to rise significantly by the end of the century. Additionally, the Intergovernmental Panel on Climate Change has ...


Inversion Copulas From Nonlinear State Space Models With An Application To Inflation Forecasting, Michael S. Smith, Worapree Ole Maneesoonthorn 2018 Melbourne Business School

Michael Stanley Smith

We propose the construction of copulas through the inversion of nonlinear state space models. These copulas allow for new time series models that have the same serial dependence structure as a state space model, but with an arbitrary marginal distribution, and flexible density forecasts. We examine the time series properties of the copulas, outline serial dependence measures, and estimate the models using likelihood-based methods. Copulas constructed from three example state space models are considered: a stochastic volatility model with an unobserved component, a Markov switching autoregression, and a Gaussian linear unobserved component model. We show that all three inversion copulas ...


Spatio-Temporal Dynamics Of Atlantic Cod Bycatch In The Maine Lobster Fishery And Its Impacts On Stock Assessment, Robert E. Boenish 2018 University of Maine

Electronic Theses and Dissertations

One of the most iconic fish species in the world, the Atlantic cod (Gadus morhua, hereafter, cod) has been a mainstay in the North Atlantic for centuries. While many global fish stocks have received increased pressure with the advent of new, more efficient fishing technology in the mid-20th century, exceptional pressure has been placed on this prized gadoid. Bycatch, or the unintended catch of organisms, is one of the biggest global fisheries issues. As a direct result of the failed recovery of cod in the Gulf of Maine (GoM), attention has turned to possible sources of unaccounted catch. Among the most prominent is ...


Discrete Ranked Set Sampling, Heng Cui 2018 Southern Methodist University

Statistical Science Theses and Dissertations

Ranked set sampling (RSS) is an efficient data collection framework compared to simple random sampling (SRS). It is widely used in various application areas such as agriculture, environment, sociology, and medicine, especially in situations where measurement is expensive but ranking is less costly. Most past research in RSS focused on situations where the underlying distribution is continuous. However, it is not unusual to have a discrete data generation mechanism. Estimating statistical functionals is challenging as ties may truly exist in discrete RSS. In this thesis, we started with estimating the cumulative distribution function (CDF) in discrete RSS. We proposed two ...
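For readers unfamiliar with the design, the following Python sketch shows a balanced ranked set sample drawn from a discrete distribution. It is only an illustration under stated assumptions, not the thesis's estimators: the names `ranked_set_sample` and `roll_die` are hypothetical, ranking is assumed to be perfect (by true value), and ties are broken by sort order.

```python
import random


def ranked_set_sample(draw, set_size, cycles, rng):
    """Return set_size * cycles measured values from a balanced RSS design.

    In each cycle, for every rank r, a fresh set of set_size units is drawn,
    the units are ranked, and only the r-th ranked unit is measured.
    """
    measured = []
    for _ in range(cycles):
        for rank in range(set_size):
            # Assumption: perfect ranking by true value; ties fall in sort order.
            ranked = sorted(draw(rng) for _ in range(set_size))
            measured.append(ranked[rank])
    return measured


def roll_die(rng):
    """Toy discrete population: a fair six-sided die, so ties are common."""
    return rng.randint(1, 6)


if __name__ == "__main__":
    rng = random.Random(0)
    print(ranked_set_sample(roll_die, set_size=3, cycles=4, rng=rng))
```

With `set_size=3` and `cycles=4`, 36 units are drawn and ranked but only 12 are measured, which is the setting where cheap ranking and expensive measurement make the design attractive.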


Evaluation Of Using The Bootstrap Procedure To Estimate The Population Variance, Nghia Trong Nguyen 2018 Stephen F Austin State University

Electronic Theses and Dissertations

The bootstrap procedure is widely used in nonparametric statistics to generate an empirical sampling distribution from a given sample data set for a statistic of interest. Generally, the results are good for location parameters such as population mean, median, and even for estimating a population correlation. However, the results for a population variance, which is a spread parameter, are not as good due to the resampling nature of the bootstrap method. Bootstrap samples are constructed using sampling with replacement; consequently, groups of observations with zero variance manifest in these samples. As a result, a bootstrap variance estimator will carry a ...
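To make the resampling mechanism concrete, the Python sketch below draws bootstrap resamples with replacement and records the sample variance of each one. It is an illustration only, not the thesis's procedure: the function name `bootstrap_variances`, the toy data, and the seed are assumptions.

```python
import random
import statistics


def bootstrap_variances(sample, n_resamples=2000, seed=0):
    """Return the sample variance of each bootstrap resample of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(n_resamples):
        # Sampling with replacement: the same observation may be picked repeatedly.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistics.variance(resample))
    return replicates


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]  # toy data, chosen only for illustration
    reps = bootstrap_variances(data)
    degenerate = sum(1 for v in reps if v == 0)
    print("share of zero-variance resamples:", degenerate / len(reps))
    print("mean bootstrap variance:", statistics.mean(reps))
    print("sample variance of the data:", statistics.variance(data))
```

For three distinct observations, 3 of the 27 equally likely ordered resamples repeat a single value, so roughly one resample in nine has zero variance. This is the kind of degenerate resample the abstract refers to.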


Analysis Challenges For High Dimensional Data, Bangxin Zhao 2018 The University of Western Ontario

Electronic Thesis and Dissertation Repository

In this thesis, we propose new methodologies targeting the areas of high-dimensional variable screening, influence measures, and post-selection inference. We propose a new estimator for the correlation between the response and high-dimensional predictor variables, and based on the estimator we develop a new screening technique termed Dynamic Tilted Current Correlation Screening (DTCCS) for high-dimensional variable screening. DTCCS is capable of picking up the relevant predictor variables within a finite number of steps. The DTCCS method takes the widely used sure independence screening (SIS) method and the high-dimensional ordinary least squares projection (HOLP) approach as its special cases.

Two methods ...

