Open Access. Powered by Scholars. Published by Universities.®

Multivariate Analysis Commons

Open Access. Powered by Scholars. Published by Universities.®

Applied Statistics

2019

Institution
Keyword
Publication
Publication Type

Articles 1 - 16 of 16

Full-Text Articles in Multivariate Analysis

Function Space Tensor Decomposition And Its Application In Sports Analytics, Justin Reising Dec 2019

Function Space Tensor Decomposition And Its Application In Sports Analytics, Justin Reising

Electronic Theses and Dissertations

Recent advancements in sports information and technology systems have ushered in a new age of applications of both supervised and unsupervised analytical techniques in the sports domain. These automated systems capture large volumes of data points about competitors during live competition. As a result, multi-relational analyses are gaining popularity in the field of Sports Analytics. We review two case studies of dimensionality reduction with Principal Component Analysis and latent factor analysis with Non-Negative Matrix Factorization applied in sports. Also, we provide a review of a framework for extending these techniques for higher order data structures. The primary scope of this …


Habitat Associations And Reproduction Of Fishes On The Northwestern Gulf Of Mexico Shelf Edge, Elizabeth Marie Keller Nov 2019

Habitat Associations And Reproduction Of Fishes On The Northwestern Gulf Of Mexico Shelf Edge, Elizabeth Marie Keller

LSU Doctoral Dissertations

Several of the northwestern Gulf of Mexico (GOM) shelf-edge banks provide critical hard bottom habitat for coral and fish communities, supporting a wide diversity of ecologically and economically important species. These sites may be fish aggregation and spawning sites and provide important habitat for fish growth and reproduction. Already designated as habitat areas of particular concern, many of these banks are also under consideration for inclusion in the expansion of the Flower Garden Banks National Marine Sanctuary. This project aimed to gain a more comprehensive understanding of the communities and fish species on shelf-edge banks by way of gonad histology, …


Classification Of Coronary Artery Disease In Non-Diabetic Patients Using Artificial Neural Networks, Demond Handley Oct 2019

Classification Of Coronary Artery Disease In Non-Diabetic Patients Using Artificial Neural Networks, Demond Handley

Annual Symposium on Biomathematics and Ecology Education and Research

No abstract provided.


Identifying Risk Factors Related To Premature Birth Through Binary Logistic And Proportional Odds Ordinal Logistic Regression, Clayton Elwood Aug 2019

Identifying Risk Factors Related To Premature Birth Through Binary Logistic And Proportional Odds Ordinal Logistic Regression, Clayton Elwood

Electronic Theses and Dissertations

Premature birth has been identified as the single greatest cause of death worldwide in children under the age of five. This thesis will implement binary logistic regression and proportional odds ordinal logistic regression to predict different levels of premature birth and identify associated risk factors. The models will be built from the Center for Disease Control and Prevention's 2014 Vital Statistics Natality Birth Data containing nearly 4 million live births within the United States. Odds ratios and confidence intervals on risk factors were produced utilizing binary logistic regression.


Optimal Design For A Causal Structure, Zaher Kmail Aug 2019

Optimal Design For A Causal Structure, Zaher Kmail

Department of Statistics: Dissertations, Theses, and Student Work

Linear models and mixed models are important statistical tools. But in many natural phenomena, there is more than one endogenous variable involved and these variables are related in a sophisticated way. Structural Equation Modeling (SEM) is often used to model the complex relationships between the endogenous and exogenous variables. It was first implemented in research to estimate the strength and direction of direct and indirect effects among variables and to measure the relative magnitude of each causal factor.

Historically, traditional optimal design theory focuses on univariate linear, nonlinear, and mixed models. There is no current literature on the subject of …


Leveraging Reviews To Improve User Experience, Anthony Schams, Iram Bakhtiar, Cristina Stanley May 2019

Leveraging Reviews To Improve User Experience, Anthony Schams, Iram Bakhtiar, Cristina Stanley

SMU Data Science Review

In this paper, we will explore and present a method of finding characteristics of a restaurant using its reviews through machine learning algorithms. We begin by building models to predict the ratings of individual reviews using text and categorical features. This is to examine the efficacy of the algorithms to the task. Both XGBoost and logistic regression will be examined. With these models, our goal is then to identify key phrases in reviews that are correlated with positive and negative experience. Our analysis makes use of review data publicly made available by Yelp. Key bigrams extracted were non-specific to the …


Best Probable Subset: A New Method For Reducing Data Dimensionality In Linear Regression, Elieser Nodarse Apr 2019

Best Probable Subset: A New Method For Reducing Data Dimensionality In Linear Regression, Elieser Nodarse

FIU Electronic Theses and Dissertations

Regression is a statistical technique for modeling the relationship between a dependent variable Y and two or more predictor variables, also known as regressors. In the broad field of regression, there exists a special case in which the relationship between the dependent variable and the regressor(s) is linear. This is known as linear regression.

The purpose of this paper is to create a useful method that effectively selects a subset of regressors when dealing with high dimensional data and/or collinearity in linear regression. As the name depicts it, high dimensional data occurs when the number of predictor variables is far …


Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan Mar 2019

Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan

COBRA Preprint Series

One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes which provide insight into the disease's process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins resulting in enormous data sets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards …


Predicting Unplanned Medical Visits Among Patients With Diabetes Using Machine Learning, Arielle Selya, Eric L. Johnson Feb 2019

Predicting Unplanned Medical Visits Among Patients With Diabetes Using Machine Learning, Arielle Selya, Eric L. Johnson

SDSU Data Science Symposium

Diabetes poses a variety of medical complications to patients, resulting in a high rate of unplanned medical visits, which are costly to patients and healthcare providers alike. However, unplanned medical visits by their nature are very difficult to predict. The current project draws upon electronic health records (EMR’s) of adult patients with diabetes who received care at Sanford Health between 2014 and 2017. Various machine learning methods were used to predict which patients have had an unplanned medical visit based on a variety of EMR variables (age, BMI, blood pressure, # of prescriptions, # of diagnoses on problem list, A1C, …


Nonparametric Depth And Quantile Regression For Functional Data, Joydeep Chowdhury, Probal Chaudhuri Feb 2019

Nonparametric Depth And Quantile Regression For Functional Data, Joydeep Chowdhury, Probal Chaudhuri

Journal Articles

We investigate nonparametric regression methods based on spatial depth and quantiles when the response and the covariate are both functions. As in classical quantile regression for finite dimensional data, regression techniques developed here provide insight into the influence of the functional covariate on different parts, like the center as well as the tails, of the conditional distribution of the functional response. Depth and quantile based nonparametric regression methods are useful to detect heteroscedasticity in functional regression. We derive the asymptotic behavior of the nonparametric depth and quantile regression estimates, which depend on the small ball probabilities in the covariate space. …


Modeling Stochastically Intransitive Relationships In Paired Comparison Data, Ryan Patrick Alexander Mcshane Jan 2019

Modeling Stochastically Intransitive Relationships In Paired Comparison Data, Ryan Patrick Alexander Mcshane

Statistical Science Theses and Dissertations

If the Warriors beat the Rockets and the Rockets beat the Spurs, does that mean that the Warriors are better than the Spurs? Sophisticated fans would argue that the Warriors are better by the transitive property, but could Spurs fans make a legitimate argument that their team is better despite this chain of evidence?

We first explore the nature of intransitive (rock-scissors-paper) relationships with a graph theoretic approach to the method of paired comparisons framework popularized by Kendall and Smith (1940). Then, we focus on the setting where all pairs of items, teams, players, or objects have been compared to …


The Dark Sky Character Of Archaeological Landscapes: Cultural Meaning And Conservation Strategies, Frank Prendergast Jan 2019

The Dark Sky Character Of Archaeological Landscapes: Cultural Meaning And Conservation Strategies, Frank Prendergast

Book/Book Chapter

This paper presents the first ever study of light pollution at selected Irish prehistoric archaeological landscapes. The concepts of cosmology and landscape are first briefly described and followed by a summary of early human settlement of the island. Building on this, the extant corpus of early prehistoric megalithic burial tombs is illustrated to show their contrasting distribution patterns and typology. Analysis of tomb locations using nearest-neighbour statistical methods reveals evidence of intentional clustering. Further geo-statistical analysis identifies the geographical locations and the density ranking of these nucleated clusters - a feature especially evident in the passage tomb tradition on this …


Biodiversity And Distribution Of Benthic Foraminifera In Harrington Sound, Bermuda: The Effects Of Physical And Geochemical Factors On Dominant Taxa, Nam Le Jan 2019

Biodiversity And Distribution Of Benthic Foraminifera In Harrington Sound, Bermuda: The Effects Of Physical And Geochemical Factors On Dominant Taxa, Nam Le

Honors Theses

Harrington Sound, Bermuda, is a nearly enclosed lagoon acting as a subtropical/tropical, carbonate-rich basin in which carbonate sediments, reef patches, and carbonate-producing organisms accumulate. Here, one of the most important calcareous groups is the Foraminifera. Analyses of common benthic orders, including miliolids (Quinqueloculina and Triloculina spp.) and rotaliids (Homotrema rubrum, Elphidium spp., and Ammonia beccarii), are essential in understanding past and present environmental conditions affecting the island's coastal environment. These taxa have been studied previously; however, factors explaining their individual patterns of abundance in the Sound are not well detailed. The goal of this study is …


Variable Selection In Accelerated Failure Time (Aft) Frailty Models: An Application Of Penalized Quasi-Likelihood, Sarbesh R. Pandeya Jan 2019

Variable Selection In Accelerated Failure Time (Aft) Frailty Models: An Application Of Penalized Quasi-Likelihood, Sarbesh R. Pandeya

Electronic Theses and Dissertations

Variable selection is one of the standard ways of selecting models in large scale datasets. It has applications in many fields of research study, especially in large multi-center clinical trials. One of the prominent methods in variable selection is the penalized likelihood, which is both consistent and efficient. However, the penalized selection is significantly challenging under the influence of random (frailty) covariates. It is even more complicated when there is involvement of censoring as it may not have a closed-form solution for the marginal log-likelihood. Therefore, we applied the penalized quasi-likelihood (PQL) approach that approximates the solution for such a …


Essays On Mixture Models, Trevor R. Camper Jan 2019

Essays On Mixture Models, Trevor R. Camper

Electronic Theses and Dissertations

When considering statistical scenarios where one can sample from populations that are not of interest for the purposes of a study, bivariate mixture models can be used to study the effect that this missampling can have on parameter estimation. In this thesis, we will examine the behavior that bivariate mixture models have on two statistical constructs: Cronbach's alpha \cite{C51}, and Spearman's rho \cite{S04}. Chapter 1 will introduce notions of mixture models and the definition of bias under mixture models which will serve as the central concept of this thesis. Chapter 2 will investigate a particular psychometric issue known as insufficient …


Data Patterns Discovery Using Unsupervised Learning, Rachel A. Lewis Jan 2019

Data Patterns Discovery Using Unsupervised Learning, Rachel A. Lewis

Electronic Theses and Dissertations

Self-care activities classification poses significant challenges in identifying children’s unique functional abilities and needs within the exceptional children healthcare system. The accuracy of diagnosing a child's self-care problem, such as toileting or dressing, is highly influenced by an occupational therapists’ experience and time constraints. Thus, there is a need for objective means to detect and predict in advance the self-care problems of children with physical and motor disabilities. We use clustering to discover interesting information from self-care problems, perform automatic classification of binary data, and discover outliers. The advantages are twofold: the advancement of knowledge on identifying self-care problems in …