Open Access. Powered by Scholars. Published by Universities.®

Statistical Methodology Commons

Applied Statistics

2019

Articles 1 - 19 of 19

Full-Text Articles in Statistical Methodology

Communications And Methodologies In Crime Geography: Contemporary Approaches To Disseminating Criminal Incidence And Research, Mitchell Ogden Dec 2019

Electronic Theses and Dissertations

Many tools exist to assist law enforcement agencies in mitigating criminal activity. For centuries, academics have used statistics in the study of crime and criminals, and more recently, police departments have made use of spatial statistics and geographic information systems in that pursuit. Clustering and hot spot methods of analysis are popular in this application for their relative simplicity of interpretation and ease of process. With recent advancements in geospatial technology, it is easier than ever to publicly share data through visual communication tools like web applications and dashboards. Sharing data and results of analyses boosts transparency and the public image of …


A “How-To” Manual For Doing Standard Statistics In R, Elizabeth Newton Oct 2019

OER Textbooks

This “How To….” Manual is intended to assist the new user in implementing standard statistical methods, both parametric and non-parametric, using R statistical software. Its focus is on R implementation, not statistical theory. It includes the R commands, with examples, for the following: proportion tests, t-tests, ANOVA, variance tests, several correlation measures and regression models, Mann-Whitney-Wilcoxon tests, Kruskal-Wallis tests, chi-squared tests, multiple pairwise comparisons and effect sizes. Basic graphical methods are also illustrated.

[See note on 2024 update below.]


Optimal Design For A Causal Structure, Zaher Kmail Aug 2019

Department of Statistics: Dissertations, Theses, and Student Work

Linear models and mixed models are important statistical tools. But in many natural phenomena, there is more than one endogenous variable involved and these variables are related in a sophisticated way. Structural Equation Modeling (SEM) is often used to model the complex relationships between the endogenous and exogenous variables. It was first implemented in research to estimate the strength and direction of direct and indirect effects among variables and to measure the relative magnitude of each causal factor.

Historically, traditional optimal design theory focuses on univariate linear, nonlinear, and mixed models. There is no current literature on the subject of …


Allocative Poisson Factorization For Computational Social Science, Aaron Schein Jul 2019

Doctoral Dissertations

Social science data often comes in the form of high-dimensional discrete data such as categorical survey responses, social interaction records, or text. These data sets exhibit high degrees of sparsity, missingness, overdispersion, and burstiness, all of which present challenges to traditional statistical modeling techniques. The framework of Poisson factorization (PF) has emerged in recent years as a natural way to model high-dimensional discrete data sets. This framework assumes that each observed count in a data set is a Poisson random variable $y \sim \mathrm{Pois}(\mu)$ whose rate parameter $\mu$ is a function of shared model parameters. This thesis examines a specific …
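As a rough illustration of the Poisson factorization idea described in this abstract, the sketch below simulates a small count matrix. It assumes one common bilinear form for the rate, $\mu_{ij} = \sum_k \theta_{ik} \phi_{kj}$; the abstract only says the rate is "a function of shared model parameters", so this form, the names `theta` and `phi`, and the toy values are illustrative assumptions, not the thesis's model.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def poisson_rates(theta, phi):
    """Assumed bilinear rate: mu[i][j] = sum_k theta[i][k] * phi[k][j]."""
    n_cols = len(phi[0])
    return [
        [sum(t * phi[k][j] for k, t in enumerate(row)) for j in range(n_cols)]
        for row in theta
    ]


def simulate_counts(theta, phi, seed=0):
    """Sample y[i][j] ~ Pois(mu[i][j]) independently for every cell."""
    rng = random.Random(seed)
    return [[sample_poisson(mu, rng) for mu in row] for row in poisson_rates(theta, phi)]


# Hypothetical toy factors: 3 rows, 2 latent components, 3 columns.
theta = [[2.0, 0.0], [0.5, 1.5], [0.0, 3.0]]
phi = [[1.0, 0.2, 0.0], [0.0, 0.5, 2.0]]

rates = poisson_rates(theta, phi)
counts = simulate_counts(theta, phi)
```

Cells whose rate is exactly zero always produce a zero count, which is one way such a model can reflect the sparsity mentioned in the abstract.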


Samples, Unite! Understanding The Effects Of Matching Errors On Estimation Of Total When Combining Data Sources, Benjamin Williams May 2019

Statistical Science Theses and Dissertations

Much recent research has focused on methods for combining a probability sample with a non-probability sample to improve estimation by making use of information from both sources. If units exist in both samples, it becomes necessary to link the information from the two samples for these units. Record linkage is a technique to link records from two lists that refer to the same unit but lack a unique identifier across both lists. Record linkage assigns a probability to each potential pair of records from the lists so that principled matching decisions can be made. Because record linkage is a probabilistic …


What Makes A Good Research Consultant?, Justin Harding, Samantha Estrada, Michael Floren May 2019

The Qualitative Report

Statistical and research consulting is defined as the collaboration of a statistician or methodologist with another professional for devising solutions to research problems. An in-depth, interview qualitative approach was taken to answer the research question of what makes a good research consultant. The authors interviewed four faculty members in the field of statistics and research methods and two experienced graduate student consultants. In-depth, face-to-face interviews revealed common themes regarding consultancy skills, resourcefulness, communication and interpersonal skills. The participants discussed how to improve consulting sessions and deal with clients with different statistics levels and backgrounds. Participants felt there was no difference …


Visualization And Machine Learning Techniques For NASA’s EM-1 Big Data Problem, Antonio P. Garza III, Jose Quinonez, Misael Santana, Nibhrat Lohia May 2019

SMU Data Science Review

In this paper, we help NASA solve three Exploration Mission-1 (EM-1) challenges: data storage, computation time, and visualization of complex data. NASA is studying one year of trajectory data to determine available launch opportunities (about 90TBs of data). We improve data storage by introducing a cloud-based solution that provides elasticity and server upgrades. This migration will save $120k in infrastructure costs every four years, and potentially avoid schedule slips. Additionally, it increases computational efficiency by 125%. We further enhance computation via machine learning techniques that use the classic orbital elements to predict valid trajectories. Our machine learning model decreases trajectory …


Dynamic Sampling Versions Of Popular SPC Charts For Big Data Analysis, Samuel Anyaso-Samuel May 2019

Boise State University Theses and Dissertations

The statistical process control (SPC) chart is an effective tool for the analysis, interpretation, and visualization of data from sequential processes. Commonly used SPC charts such as the Shewhart, CUSUM and EWMA charts are widely implemented in detecting distributional shifts in various processes. With recent scientific and technological advancements, massive amounts of data continue to be generated by production, medical, agricultural and many other industrial processes. Conventional SPC charts have significant drawbacks in monitoring such processes, specifically when the velocity of the data flow is greater than the run time of the monitoring procedure. In the literature, dynamic sampling control …
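For readers unfamiliar with the charts named in this abstract, here is a minimal sketch of a conventional EWMA chart with time-varying control limits. It is the textbook chart, not the dynamic sampling version the thesis develops; the function name `ewma_chart` and the parameter defaults (`lam=0.2`, `L=3.0`) are illustrative assumptions.

```python
import math


def ewma_chart(xs, mu0, sigma, lam=0.2, L=3.0):
    """Return (t, z_t, out_of_control) for each observation.

    z_t = lam * x_t + (1 - lam) * z_{t-1}, with z_0 = mu0.
    Limits: mu0 +/- L * sigma * sqrt(lam / (2 - lam) * (1 - (1 - lam)^(2t))).
    """
    z = mu0
    results = []
    for t, x in enumerate(xs, start=1):
        z = lam * x + (1 - lam) * z
        half_width = L * sigma * math.sqrt(
            lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        )
        results.append((t, z, abs(z - mu0) > half_width))
    return results


# Hypothetical process: in control for 5 steps, then the mean shifts by 3 sigma.
observations = [0.0] * 5 + [3.0] * 10
chart = ewma_chart(observations, mu0=0.0, sigma=1.0)
first_signal = next(t for t, _, flagged in chart if flagged)
```

A smaller `lam` gives more weight to history and makes the chart more sensitive to small sustained shifts, at the cost of reacting more slowly to large ones.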


Best Probable Subset: A New Method For Reducing Data Dimensionality In Linear Regression, Elieser Nodarse Apr 2019

FIU Electronic Theses and Dissertations

Regression is a statistical technique for modeling the relationship between a dependent variable Y and two or more predictor variables, also known as regressors. In the broad field of regression, there exists a special case in which the relationship between the dependent variable and the regressor(s) is linear. This is known as linear regression.

The purpose of this paper is to create a useful method that effectively selects a subset of regressors when dealing with high dimensional data and/or collinearity in linear regression. As the name suggests, high dimensional data occurs when the number of predictor variables is far …


Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan Mar 2019

COBRA Preprint Series

One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes which provide insight into the disease's process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins resulting in enormous data sets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards …


Session: 4 Multilinear Subspace Learning And Its Applications To Machine Learning, Randy Hoover, Kyle Caudle Dr., Karen Braman Dr. Feb 2019

SDSU Data Science Symposium

Multi-dimensional data analysis has seen increased interest in recent years. With more and more data arriving as 2-dimensional arrays (images) as opposed to 1-dimensional arrays (signals), new methods for dimensionality reduction, data analysis, and machine learning have been pursued. Most notable have been the Canonical Decomposition/Parallel Factors (commonly referred to as CP) and Tucker decompositions (commonly regarded as a high order SVD: HOSVD). In the current research we present an alternate method for computing singular value and eigenvalue decompositions on multi-way data through an algebra of circulants and illustrate their application to two well-known machine learning methods: Multi-Linear Principal Component …


Modeling Stochastically Intransitive Relationships In Paired Comparison Data, Ryan Patrick Alexander McShane Jan 2019

Statistical Science Theses and Dissertations

If the Warriors beat the Rockets and the Rockets beat the Spurs, does that mean that the Warriors are better than the Spurs? Sophisticated fans would argue that the Warriors are better by the transitive property, but could Spurs fans make a legitimate argument that their team is better despite this chain of evidence?

We first explore the nature of intransitive (rock-scissors-paper) relationships with a graph theoretic approach to the method of paired comparisons framework popularized by Kendall and Smith (1940). Then, we focus on the setting where all pairs of items, teams, players, or objects have been compared to …


Statistical Designs For Network A/B Testing, Victoria V. Pokhilko Jan 2019

Theses and Dissertations

A/B testing refers to the statistical procedure of experimental design and analysis to compare two treatments, A and B, applied to different testing subjects. It is widely used by technology companies such as Facebook, LinkedIn, and Netflix, to compare different algorithms, web-designs, and other online products and services. The subjects participating in these online A/B testing experiments are users who are connected in different scales of social networks. Two connected subjects are similar in terms of their social behaviors, education and financial background, and other demographic aspects. Hence, it is only natural to assume that their reactions to online products …


Controlling For Confounding Via Propensity Score Methods Can Result In Biased Estimation Of The Conditional AUC: A Simulation Study, Hadiza I. Galadima, Donna K. McClish Jan 2019

Community & Environmental Health Faculty Publications

In the medical literature, there has been an increased interest in evaluating association between exposure and outcomes using nonrandomized observational studies. However, because assignments to exposure are not random in observational studies, comparisons of outcomes between exposed and nonexposed subjects must account for the effect of confounders. Propensity score methods have been widely used to control for confounding, when estimating exposure effect. Previous studies have shown that conditioning on the propensity score results in biased estimation of conditional odds ratio and hazard ratio. However, research is lacking on the performance of propensity score methods for covariate adjustment when estimating the …


Variable Selection In Accelerated Failure Time (AFT) Frailty Models: An Application Of Penalized Quasi-Likelihood, Sarbesh R. Pandeya Jan 2019

Electronic Theses and Dissertations

Variable selection is one of the standard ways of selecting models in large scale datasets. It has applications in many fields of research study, especially in large multi-center clinical trials. One of the prominent methods in variable selection is the penalized likelihood, which is both consistent and efficient. However, the penalized selection is significantly challenging under the influence of random (frailty) covariates. It is even more complicated when there is involvement of censoring as it may not have a closed-form solution for the marginal log-likelihood. Therefore, we applied the penalized quasi-likelihood (PQL) approach that approximates the solution for such a …


Quantifying Human Biological Age: A Machine Learning Approach, Syed Ashiqur Rahman Jan 2019

Graduate Theses, Dissertations, and Problem Reports

Quantifying human biological age is an important and difficult challenge. Different biomarkers and numerous approaches have been studied for biological age prediction, each with its advantages and limitations. In this work, we first introduce a new anthropometric measure (called Surface-based Body Shape Index, SBSI) that accounts for both body shape and body size, and evaluate its performance as a predictor of all-cause mortality. We analyzed data from the National Health and Nutrition Examination Survey (NHANES). Based on the analysis, we introduce a new body shape index constructed from four important anthropometric determinants of body shape and body size: body …


Bayesian Analysis For The Intraclass Model And For The Quantile Semiparametric Mixed-Effects Double Regression Models, Duo Zhang Jan 2019

Dissertations, Master's Theses and Master's Reports

This dissertation consists of three distinct but related research projects. The first two projects focus on objective Bayesian hypothesis testing and estimation for the intraclass correlation coefficient in linear models. The third project deals with Bayesian quantile inference for the semiparametric mixed-effects double regression models. In the first project, we derive the Bayes factors based on the divergence-based priors for testing the intraclass correlation coefficient (ICC). Hypothesis testing of the ICC is used to test for uncorrelatedness in multilevel modeling, and it has not been well studied from an objective Bayesian perspective. Simulation results show that the two sorts …


Statistical Methods For Joint Analysis Of Multiple Phenotypes And Their Applications For PheWAS, Xueling Li Jan 2019

Dissertations, Master's Theses and Master's Reports

Genome-wide association studies (GWAS) have successfully detected tens of thousands of robust SNP-trait associations. Earlier research has primarily focused on association studies of genetic variants and some well-defined functions or phenotypic traits. Emerging evidence suggests that pleiotropy, the phenomenon of one genetic variant affecting multiple phenotypes, is widespread, especially in complex human diseases. Therefore, individual phenotype analyses may lose statistical power to identify the underlying genetic mechanism. In contrast with single phenotype analyses, joint analysis of multiple phenotypes exploits the correlations between phenotypes and aggregates multiple weak marginal effects and is therefore likely to provide new insights into the functional consequences …


Statistical Modeling Of Influenza-Like-Illness In Montana Using Spatial And Temporal Methods, Benjamin A. Stark Jan 2019

Graduate Student Theses, Dissertations, & Professional Papers

The relationship between air pollution and public health has historically been an important question in science. It has long been hypothesized that severe air pollution conditions have negative implications for basic human health. Such studies focus primarily on areas that are prone to severe degrees of human pollution. Research on less populated areas is scarce, and this scarcity raises the question of how such pollution dynamics (human-made and natural) influence human health in more rural areas.

The aim of this study is to explore this gap in research; in particular, we explore possible links between air pollution …