Open Access. Powered by Scholars. Published by Universities.®

Multivariate Analysis Commons

Open Access. Powered by Scholars. Published by Universities.®

Applied Statistics

2020

Articles 1 - 12 of 12

Full-Text Articles in Multivariate Analysis

Variation In Personality Among Semi-Wild Myanmar Timber Elephants, Sateesh Venkatesh Dec 2020

Variation In Personality Among Semi-Wild Myanmar Timber Elephants, Sateesh Venkatesh

Theses and Dissertations

This study examines two personality traits: exploration and neophobia, which could influence human-elephant conflicts. Thirty-one semi-wild elephants were tested over two trials using a custom novel puzzle tube containing three tasks and three rewards. Our studies show that elephants do vary significantly between individuals in both exploration and neophobia.


A Geochemical And Statistical Investigation Of The Big Four Springs Region In Southern Missouri, Jordan Jasso Vega Aug 2020

A Geochemical And Statistical Investigation Of The Big Four Springs Region In Southern Missouri, Jordan Jasso Vega

MSU Graduate Theses

The Big Four Springs region hosts four major first-order magnitude springs in southern Missouri and northern Arkansas. These springs are Big Spring (Carter County, MO), Greer Spring (Oregon County, MO), Mammoth Spring (Fulton County, AR), and Hodgson Mill Spring (Ozark County, MO). Based on historic dye traces and hydrogeological investigations, these springs drain an area of approximately 1500 square miles and collectively discharge an average of 780 million gallons of water per day. The rocks from youngest to oldest that are found in Big Four Springs region are the Cotter and Jefferson City Dolomite (Ordovician), Roubidoux Formation (Ordovician), Gasconade Dolomite …


Improving The Quality And Design Of Retrospective Clinical Outcome Studies That Utilize Electronic Health Records, Oliwier Dziadkowiec, Jeffery Durbin, Vignesh Jayaraman Muralidharan, Megan Novak, Brendon Cornett Jul 2020

Improving The Quality And Design Of Retrospective Clinical Outcome Studies That Utilize Electronic Health Records, Oliwier Dziadkowiec, Jeffery Durbin, Vignesh Jayaraman Muralidharan, Megan Novak, Brendon Cornett

HCA Healthcare Journal of Medicine

Electronic health records (EHRs) are an excellent source for secondary data analysis. Studies based on EHR-derived data, if designed properly, can answer previously unanswerable clinical research questions. In this paper we will highlight the benefits of large retrospective studies from secondary sources such as EHRs, examine retrospective cohort and case-control study design challenges, as well as methodological and statistical adjustment that can be made to overcome some of the inherent design limitations, in order to increase the generalizability, validity and reliability of the results obtained from these studies.


Effect Of Predictor Dependence On Variable Selection For Linear And Log-Linear Regression, Apu Chandra Das Jul 2020

Effect Of Predictor Dependence On Variable Selection For Linear And Log-Linear Regression, Apu Chandra Das

Graduate Theses and Dissertations

We propose a Bayesian approach to the Dirichlet-Multinomial (DM) regression model, which uses horseshoe, Laplace, and horseshoe plus priors for shrinkage and selection. The Dirichlet-Multinomial model can be used to find the significant association between a set of available covariates and taxa for a microbiome sample. We incorporate the covariates in a log-linear regression framework. We design a simulation study to make a comparison among the performance of the three shrinkage priors in terms of estimation accuracy and the ability to detect true signals. Our results have clearly separated the performance of the three priors and indicated that the horseshoe …


Learning Networks With Categorical Data Using Distance Correlation, And A Novel Graph-Based Multivariate Test, Jian Tinker Jul 2020

Learning Networks With Categorical Data Using Distance Correlation, And A Novel Graph-Based Multivariate Test, Jian Tinker

Graduate Theses and Dissertations

We study the use of distance correlation for statistical inference on categorical data, especially the induction of probability networks. Szekely et al. first defined distance correlation for continuous variables in [42], and Zhang translated the concept into the categorical setting in [57] by defining dCor(X,Y) for categorical variables X = (x1,...,xI) and Y = (y1,...,yJ) where P(X=xi)=[pi]i and P(Y=yi)=[pi]j with the formula [Please open the document]

Part I of the dissertation covers the background we need to understand this formula, and prepares us to analyze the properties and performance of its applications.

Part II then presents the main results of …


Assessing Differential Item Functioning In The Perceived Stress Scale, Nana Amma Berko Asamoah Jul 2020

Assessing Differential Item Functioning In The Perceived Stress Scale, Nana Amma Berko Asamoah

Graduate Theses and Dissertations

When an item on a test functions differently for subgroups of respondents with respect to an exogenous variable (or covariate) after conditioning on the latent variable of interest, the item is said to exhibit Differential Item Functioning (DIF). The 10-item Perceived Stress Scale (PSS10) is administered to respondents via MTurk to quantify “perceived stress” and identify if items on the scale function differently for specific subgroups defined by age, sex, race, marital status, number of children, employment status and social media usage.

The purpose of this study was to compare traditional DIF detection approaches (Mantel-Haenszel, logistic regression, likelihood ratio test …


Quantitative Model For Setting Manufacturer's Suggested Retail Price, Peter Byrd, Jonathan Knowles, Dmitry Andreev, Jacob Turner, Brian Mente, Laroux Wallace Jan 2020

Quantitative Model For Setting Manufacturer's Suggested Retail Price, Peter Byrd, Jonathan Knowles, Dmitry Andreev, Jacob Turner, Brian Mente, Laroux Wallace

SMU Data Science Review

In this paper, we present a quantitative approach to model the manufacturer’s suggested retail price (MSRP) for children’s doll- houses and establish relationships among key features that contribute most to establishing MSRP. Determination of the MSRP is a critical step in how consumers respond with their wallets when purchasing an item. KidKraft, a global leader in toys and juvenile products, sets MSRP subjectively using product experts. The process is arduous and time consuming requiring the focus of specialized resources and knowledge of the interaction between key attributes and their impact on consumer value. An accurate prediction of MSRP during the …


Zero-Inflated Longitudinal Mixture Model For Stochastic Radiographic Lung Compositional Change Following Radiotherapy Of Lung Cancer, Viviana A. Rodríguez Romero Jan 2020

Zero-Inflated Longitudinal Mixture Model For Stochastic Radiographic Lung Compositional Change Following Radiotherapy Of Lung Cancer, Viviana A. Rodríguez Romero

Theses and Dissertations

Compositional data (CD) is mostly analyzed as relative data, using ratios of components, and log-ratio transformations to be able to use known multivariable statistical methods. Therefore, CD where some components equal zero represent a problem. Furthermore, when the data is measured longitudinally, observations are spatially related and appear to come from a mixture population, the analysis becomes highly complex. For this matter, a two-part model was proposed to deal with structural zeros in longitudinal CD using a mixed-effects model. Furthermore, the model has been extended to the case where the non-zero components of the vector might a two component mixture …


Theory Of Principal Components For Applications In Exploratory Crime Analysis And Clustering, Daniel Silva Jan 2020

Theory Of Principal Components For Applications In Exploratory Crime Analysis And Clustering, Daniel Silva

All Graduate Theses, Dissertations, and Other Capstone Projects

The purpose of this paper is to develop the theory of principal components analysis succinctly from the fundamentals of matrix algebra and multivariate statistics. Principal components analysis is sometimes used as a descriptive technique to explain the variance-covariance or correlation structure of a dataset. However, most often, it is used as a dimensionality reduction technique to visualize a high dimensional dataset in a lower dimensional space. Principal components analysis accomplishes this by using the first few principal components, provided that they account for a substantial proportion of variation in the original dataset. In the same way, the first few principal …


Nonparametric Analysis Of Clustered And Multivariate Data, Yue Cui Jan 2020

Nonparametric Analysis Of Clustered And Multivariate Data, Yue Cui

Theses and Dissertations--Statistics

In this dissertation, we investigate three distinct but interrelated problems for nonparametric analysis of clustered data and multivariate data in pre-post factorial design.

In the first project, we propose a nonparametric approach for one-sample clustered data in pre-post intervention design. In particular, we consider the situation where for some clusters all members are only observed at either pre or post intervention but not both. This type of clustered data is referred to us as partially complete clustered data. Unlike most of its parametric counterparts, we do not assume specific models for data distributions, intra-cluster dependence structure or variability, in effect …


Nonparametric Tests Of Lack Of Fit For Multivariate Data, Yan Xu Jan 2020

Nonparametric Tests Of Lack Of Fit For Multivariate Data, Yan Xu

Theses and Dissertations--Statistics

A common problem in regression analysis (linear or nonlinear) is assessing the lack-of-fit. Existing methods make parametric or semi-parametric assumptions to model the conditional mean or covariance matrices. In this dissertation, we propose fully nonparametric methods that make only additive error assumptions. Our nonparametric approach relies on ideas from nonparametric smoothing to reduce the test of association (lack-of-fit) problem into a nonparametric multivariate analysis of variance. A major problem that arises in this approach is that the key assumptions of independence and constant covariance matrix among the groups will be violated. As a result, the standard asymptotic theory is not …


Three Essays On Health Economics And Policy Evaluation, Shishir Shakya Jan 2020

Three Essays On Health Economics And Policy Evaluation, Shishir Shakya

Graduate Theses, Dissertations, and Problem Reports

This dissertation consists of three essays on the U.S. Health care policy. Each paragraph below refers to the three abstracts for the three chapters in this dissertation, respectively. I provide quantitative evidence on how much Prescription Drug Monitoring Programs (PDMPs) affects the retail opioid prescribing behaviors. Using the American Community Survey (ACS), I retrieve county-level high dimensional panel data set from 2010 to 2017. I employ three separate identification strategies: difference-in-difference, double selection post-LASSO, and spatial difference-in-difference. I compare how the retail opioid prescribing behaviors of counties, that are mandatory for prescribers to check the PDMP before prescribing controlled substances …