Open Access. Powered by Scholars. Published by Universities.®

Multivariate Analysis Commons

521 Full-Text Articles | 770 Authors | 249,635 Downloads | 96 Institutions

All Articles in Multivariate Analysis

Leveraging Reviews To Improve User Experience, Anthony Schams, Iram Bakhtiar, Cristina Stanley 2019 Southern Methodist University

SMU Data Science Review

In this paper, we explore and present a method of finding characteristics of a restaurant from its reviews using machine learning algorithms. We begin by building models to predict the ratings of individual reviews using text and categorical features, in order to examine the efficacy of the algorithms for the task. Both XGBoost and logistic regression are examined. With these models, our goal is then to identify key phrases in reviews that are correlated with positive and negative experiences. Our analysis makes use of review data made publicly available by Yelp. Key bigrams extracted were non-specific to the …


Comparison Of Imputation Methods For Mixed Data Missing At Random, Kaitlyn Heidt 2019 East Tennessee State University

Electronic Theses and Dissertations

A statistician's job is to produce statistical models. When these models are precise and unbiased, we can relate them to new data appropriately. However, when data sets have missing values, the assumptions of statistical methods are violated, which produces biased results. The statistician's objective is to implement methods that produce unbiased and accurate results. Research in missing data is becoming popular as modern methods that produce unbiased and accurate results are emerging, such as the MICE package in R, a statistical software environment. Using real data, we compare four common imputation methods in the MICE package at different levels of missingness. The …


A Systematic Assessment Of Socio-Economic Impacts Of Prolonged Episodic Volcano Crises, Justin Peers 2019 East Tennessee State University

Electronic Theses and Dissertations

Uncertainty surrounding volcanic activity can lead to socio-economic crises with or without an eruption as demonstrated by the post-1978 response to unrest of Long Valley Caldera (LVC), CA. Extensive research in physical sciences provides a foundation on which to assess direct impacts of hazards, but fewer resources have been dedicated towards understanding human responses to volcanic risk. To evaluate natural hazard risk issues at LVC, a multi-hazard, mail-based, household survey was conducted to compare perceptions of volcanic, seismic, and wildfire hazards. Impacts of volcanic activity on housing prices and businesses were examined at the county-level for three volcanoes with a …


Best Probable Subset: A New Method For Reducing Data Dimensionality In Linear Regression, Elieser Nodarse 2019 Florida International University

FIU Electronic Theses and Dissertations

Regression is a statistical technique for modeling the relationship between a dependent variable Y and one or more predictor variables, also known as regressors. In the broad field of regression, there exists a special case in which the relationship between the dependent variable and the regressor(s) is linear. This is known as linear regression.

The purpose of this paper is to create a useful method that effectively selects a subset of regressors when dealing with high-dimensional data and/or collinearity in linear regression. As the name suggests, high-dimensional data occurs when the number of predictor variables is far …


Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan 2019 Temple University

COBRA Preprint Series

One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes which provide insight into the disease's process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins resulting in enormous data sets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards …


Predicting Unplanned Medical Visits Among Patients With Diabetes Using Machine Learning, Arielle Selya, Eric L. Johnson 2019 Sanford Health

SDSU Data Science Symposium

Diabetes poses a variety of medical complications to patients, resulting in a high rate of unplanned medical visits, which are costly to patients and healthcare providers alike. However, unplanned medical visits by their nature are very difficult to predict. The current project draws upon electronic medical records (EMRs) of adult patients with diabetes who received care at Sanford Health between 2014 and 2017. Various machine learning methods were used to predict which patients have had an unplanned medical visit based on a variety of EMR variables (age, BMI, blood pressure, number of prescriptions, number of diagnoses on problem list, A1C, …


Nonparametric Depth And Quantile Regression For Functional Data, Joydeep Chowdhury, Probal Chaudhuri 2019 Indian Statistical Institute, Kolkata

Journal Articles

We investigate nonparametric regression methods based on spatial depth and quantiles when the response and the covariate are both functions. As in classical quantile regression for finite dimensional data, regression techniques developed here provide insight into the influence of the functional covariate on different parts, like the center as well as the tails, of the conditional distribution of the functional response. Depth and quantile based nonparametric regression methods are useful to detect heteroscedasticity in functional regression. We derive the asymptotic behavior of the nonparametric depth and quantile regression estimates, which depend on the small ball probabilities in the covariate space. …


Estimation Of Multivariate Asset Models With Jumps, Angela Loregian, Laura Ballotta, Gianluca Fusai, Marcos Fabricio Perez 2019 ARPM

Business Faculty Publications

We propose a consistent and computationally efficient two-step methodology for the estimation of multidimensional non-Gaussian asset models built using Levy processes. The proposed framework allows for dependence between assets and different tail behaviors and jump structures for each asset. Our procedure can be applied to portfolios with a large number of assets as it is immune to estimation dimensionality problems. Simulations show good finite sample properties and significant efficiency gains. This method is especially relevant for risk management purposes such as, for example, the computation of portfolio Value at Risk and intra-horizon Value at Risk, as we show in detail …


Modeling Stochastically Intransitive Relationships In Paired Comparison Data, Ryan Patrick Alexander McShane 2019 Southern Methodist University

Statistical Science Theses and Dissertations

If the Warriors beat the Rockets and the Rockets beat the Spurs, does that mean that the Warriors are better than the Spurs? Sophisticated fans would argue that the Warriors are better by the transitive property, but could Spurs fans make a legitimate argument that their team is better despite this chain of evidence?

We first explore the nature of intransitive (rock-scissors-paper) relationships with a graph theoretic approach to the method of paired comparisons framework popularized by Kendall and Smith (1940). Then, we focus on the setting where all pairs of items, teams, players, or objects have been compared to …
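As a rough illustration of the rock-scissors-paper idea in the abstract above, the following minimal sketch counts circular triads (three items that each beat one of the others) in a complete set of paired comparisons. It is not the author's method: the function name `count_circular_triads`, the `beats` data layout, and the third game result (Spurs beating Warriors) are assumptions made only for this example.

```python
from itertools import combinations


def count_circular_triads(beats):
    """Count triples whose results form a cycle (a beats b, b beats c, c beats a).

    `beats` maps each team to the set of teams it defeated. Every pair is
    assumed to have been compared exactly once (a complete tournament).
    """
    teams = sorted(beats)
    cycles = 0
    for a, b, c in combinations(teams, 3):
        # Within a triple of a complete tournament, the results are either
        # transitive (win counts 2, 1, 0) or circular (each team wins once).
        wins = [
            (b in beats[a]) + (c in beats[a]),
            (a in beats[b]) + (c in beats[b]),
            (a in beats[c]) + (b in beats[c]),
        ]
        if wins == [1, 1, 1]:
            cycles += 1
    return cycles


# Hypothetical results: the first two follow the abstract's example; the
# third (Spurs beat Warriors) is invented here to close the cycle.
results = {
    "Warriors": {"Rockets"},
    "Rockets": {"Spurs"},
    "Spurs": {"Warriors"},
}
print(count_circular_triads(results))
```

With these three invented results the single triple is circular, so no ranking of the teams is consistent with every game; replacing the third result with a Warriors win over the Spurs would make the triple transitive.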


Measuring Trace Element Concentrations In Artiodactyl Cannonbones Using Portable X-Ray Fluorescence, Joshua L. Henderson 2019 Central Washington University

All Master's Theses

Artiodactyl bones are the most common faunal remains found in Washington prehistoric archaeology sites, but they are often too fragmented to accurately identify a family, genus, or species. Traditional faunal analysis can only organize unidentifiable bone fragments into size class, and chemical methods often require the destruction of bone samples. In this thesis research, I tested a new, nondestructive faunal analysis technique using portable X-ray fluorescence (pXRF) to measure trace element concentrations in comparative collection and archaeological bone samples. Using cannonbones from five different artiodactyl species, I collected trace element data from 50 comparative collection specimens and 18 archaeological specimens …


The Effect Of Vegetative Structure On Nest-Burrow Selection By The Western Burrowing Owl: Comparing Traditional Methods To Photogrammetry With An Unmanned Aerial System, Dylan J. Steffen 2019 Fort Hays State University

Master's Theses

The shortgrass prairie ecoregion in the United States has been reduced to 52% of its historical extent, contributing to reduced habitat for native species. One such species is the Burrowing Owl (Athene cunicularia). The Western Burrowing Owl subspecies (A. c. hypugaea) is listed as a Species of Special Concern in nearly every western and midwestern state, including Kansas where it is designated as a Tier II Species of Greatest Conservation Need. Habitat destruction due to conversion to cropland, increasing use of pesticides, and reduction in burrowing mammal abundance are the primary threats that have led to …


The Dark Sky Character Of Archaeological Landscapes: Cultural Meaning And Conservation Strategies, Frank Prendergast 2019 Technological University Dublin

Book/Book Chapter

This paper presents the first ever study of light pollution at selected Irish prehistoric archaeological landscapes. The concepts of cosmology and landscape are first briefly described and followed by a summary of early human settlement of the island. Building on this, the extant corpus of early prehistoric megalithic burial tombs is illustrated to show their contrasting distribution patterns and typology. Analysis of tomb locations using nearest-neighbour statistical methods reveals evidence of intentional clustering. Further geo-statistical analysis identifies the geographical locations and the density ranking of these nucleated clusters - a feature especially evident in the passage tomb tradition on this …


Compound-Specific Isotope Analysis Of Amino Acids In Biological Tissues: Applications In Forensic Entomology, Food Authentication And Soft-Biometrics In Humans, Mayara Patricia Viana de Matos 2019 West Virginia University

Graduate Theses, Dissertations, and Problem Reports

In this work we demonstrate the power of compound-specific isotope analysis (CSIA) to analyze proteinaceous biological materials in three distinct forensic applications: 1) linking necrophagous blow flies in different life stages to their primary carrion diet; 2) identifying the harvesting area of oysters for food authentication purposes; and 3) predicting biometric traits of humans from their hair.

In the first application, we measured the amino-acid-level fractionation that occurs at each major life stage of Calliphora vicina (Robineau-Desvoidy) (Diptera: Calliphoridae) blow flies. Adult blow flies oviposited on raw pork muscle, beef muscle, or chicken liver. Larvae, pupae …


Biodiversity And Distribution Of Benthic Foraminifera In Harrington Sound, Bermuda: The Effects Of Physical And Geochemical Factors On Dominant Taxa, Nam Le 2019 Colby College

Honors Theses

Harrington Sound, Bermuda, is a nearly enclosed lagoon acting as a subtropical/tropical, carbonate-rich basin in which carbonate sediments, reef patches, and carbonate-producing organisms accumulate. Here, one of the most important calcareous groups is the Foraminifera. Analyses of common benthic orders, including miliolids (Quinqueloculina and Triloculina spp.) and rotaliids (Homotrema rubrum, Elphidium spp., and Ammonia beccarii), are essential in understanding past and present environmental conditions affecting the island's coastal environment. These taxa have been studied previously; however, factors explaining their individual patterns of abundance in the Sound are not well detailed. The goal of this study is …


Data Patterns Discovery Using Unsupervised Learning, Rachel A. Lewis 2019 Georgia Southern University

Electronic Theses and Dissertations

Self-care activities classification poses significant challenges in identifying children’s unique functional abilities and needs within the exceptional children healthcare system. The accuracy of diagnosing a child's self-care problem, such as toileting or dressing, is highly influenced by an occupational therapist's experience and time constraints. Thus, there is a need for objective means to detect and predict in advance the self-care problems of children with physical and motor disabilities. We use clustering to discover interesting information from self-care problems, perform automatic classification of binary data, and discover outliers. The advantages are twofold: the advancement of knowledge on identifying self-care problems in …


Essays On Mixture Models, Trevor R. Camper 2019 Georgia Southern University

Electronic Theses and Dissertations

When considering statistical scenarios where one can sample from populations that are not of interest for the purposes of a study, bivariate mixture models can be used to study the effect that this missampling can have on parameter estimation. In this thesis, we will examine the effect that bivariate mixture models have on two statistical constructs: Cronbach's alpha (Cronbach, 1951) and Spearman's rho (Spearman, 1904). Chapter 1 will introduce notions of mixture models and the definition of bias under mixture models, which will serve as the central concept of this thesis. Chapter 2 will investigate a particular psychometric issue known as insufficient …


Variable Selection In Accelerated Failure Time (AFT) Frailty Models: An Application Of Penalized Quasi-Likelihood, Sarbesh R. Pandeya 2019 Georgia Southern University

Electronic Theses and Dissertations

Variable selection is one of the standard ways of selecting models in large scale datasets. It has applications in many fields of research study, especially in large multi-center clinical trials. One of the prominent methods in variable selection is the penalized likelihood, which is both consistent and efficient. However, the penalized selection is significantly challenging under the influence of random (frailty) covariates. It is even more complicated when there is involvement of censoring as it may not have a closed-form solution for the marginal log-likelihood. Therefore, we applied the penalized quasi-likelihood (PQL) approach that approximates the solution for such a …


A New Independence Measure And Its Applications In High Dimensional Data Analysis, Chenlu Ke 2019 University of Kentucky

Theses and Dissertations--Statistics

This dissertation has three consecutive topics. First, we propose a novel class of independence measures for testing independence between two random vectors based on the discrepancy between the conditional and the marginal characteristic functions. If one of the variables is categorical, our asymmetric index extends the typical ANOVA to a kernel ANOVA that can test a more general hypothesis of equal distributions among groups. The index is also applicable when both variables are continuous. Second, we develop a sufficient variable selection procedure based on the new measure in a large p small n setting. Our approach incorporates marginal information between …


Transforms In Sufficient Dimension Reduction And Their Applications In High Dimensional Data, Jiaying Weng 2019 University of Kentucky

Theses and Dissertations--Statistics

The big data era poses great challenges as well as opportunities for researchers to develop efficient statistical approaches to analyze massive data. Sufficient dimension reduction (SDR) is an important tool of this kind in modern data analysis and has received extensive attention in both academia and industry.

In this dissertation, we introduce inverse regression estimators using Fourier transforms, which are superior to the existing SDR methods in two respects: (1) they avoid slicing the response variable, and (2) they can be readily extended to solve the high-dimensional data problem. For the ultra-high dimensional problem, we investigate both eigenvalue decomposition and minimum …


Composite Nonparametric Tests In High Dimension, Alejandro G. Villasante Tezanos 2019 University of Kentucky

Theses and Dissertations--Statistics

This dissertation focuses on the problem of making high-dimensional inference for two or more groups. Here, high-dimensional means that both the sample size (n) and dimension (p) tend to infinity, possibly at different rates. Classical approaches for group comparisons fail in the high-dimensional situation, in the sense that they have incorrect size and low power. Much has been done in recent years to overcome these problems. However, these recent works make restrictive assumptions in terms of the number of treatments to be compared and/or the distribution of the data. This research aims to (1) propose and investigate refined …

