Open Access. Powered by Scholars. Published by Universities.®

Multivariate Analysis Commons

332 Full-Text Articles 535 Authors 116,029 Downloads 69 Institutions

All Articles in Multivariate Analysis

332 full-text articles. Page 1 of 13.

Quantitative Model For Setting Manufacturer's Suggested Retail Price, Peter Byrd, Jonathan Knowles, Dmitry Andreev, Jacob Turner, Brian Mente, LaRoux Wallace 2020 Southern Methodist University

SMU Data Science Review

In this paper, we present a quantitative approach to model the manufacturer’s suggested retail price (MSRP) for children’s dollhouses and establish relationships among key features that contribute most to establishing MSRP. Determination of the MSRP is a critical step in how consumers respond with their wallets when purchasing an item. KidKraft, a global leader in toys and juvenile products, sets MSRP subjectively using product experts. The process is arduous and time-consuming, requiring the focus of specialized resources and knowledge of the interaction between key attributes and their impact on consumer value. An accurate prediction of MSRP ...


Generalized Matrix Decomposition Regression: Estimation And Inference For Two-Way Structured Data, Yue Wang, Ali Shojaie, Tim Randolph, Jing Ma 2019 University of Washington

UW Biostatistics Working Paper Series

Analysis of two-way structured data, i.e., data with structures among both variables and samples, is becoming increasingly common in ecology, biology and neuroscience. Classical dimension-reduction tools, such as the singular value decomposition (SVD), may perform poorly for two-way structured data. The generalized matrix decomposition (GMD, Allen et al., 2014) extends the SVD to two-way structured data and thus constructs singular vectors that account for both structures. While the GMD is a useful dimension-reduction tool for exploratory analysis of two-way structured data, it is unsupervised and cannot be used to assess the association between such data and an outcome of ...


Statistical Inference For Networks Of High-Dimensional Point Processes, Xu Wang, Mladen Kolar, Ali Shojaie 2019 University of Washington - Seattle Campus

UW Biostatistics Working Paper Series

Fueled in part by recent applications in neuroscience, high-dimensional Hawkes processes have become a popular tool for modeling the network of interactions among multivariate point process data. While evaluating the uncertainty of the network estimates is critical in scientific applications, existing methodological and theoretical work has focused only on estimation. To bridge this gap, this paper proposes a high-dimensional statistical inference procedure with theoretical guarantees for multivariate Hawkes processes. Key to this inference procedure is a new concentration inequality on the first- and second-order statistics for integrated stochastic processes, which summarize the entire history of the process. We apply this ...


Function Space Tensor Decomposition And Its Application In Sports Analytics, Justin Reising 2019 East Tennessee State University

Electronic Theses and Dissertations

Recent advancements in sports information and technology systems have ushered in a new age of applications of both supervised and unsupervised analytical techniques in the sports domain. These automated systems capture large volumes of data points about competitors during live competition. As a result, multi-relational analyses are gaining popularity in the field of Sports Analytics. We review two case studies of dimensionality reduction with Principal Component Analysis and latent factor analysis with Non-Negative Matrix Factorization applied in sports. We also review a framework for extending these techniques to higher-order data structures. The primary scope of this ...


#46 - America's Response To President Trump's Tweets, Amanda Friend 2019 University of West Georgia

Georgia Undergraduate Research Conference (GURC)

Purpose: The purpose of this study was to examine Trump’s tweets during his first six months in office. Because Trump uses Twitter as his main form of communication, it is important for journalists and individuals to follow his tweets.

Research Questions: The analysis covers how many times people shared positive or negative tweets and whether people shared more issue-based tweets. This study emphasizes President Trump’s most popular tweets and how people responded to his first six months on Twitter.

Method: The tweets were coded with a key using content analysis to ...


Classification Of Coronary Artery Disease In Non-Diabetic Patients Using Artificial Neural Networks, Demond Handley 2019 Illinois State University

Annual Symposium on Biomathematics and Ecology: Education and Research

No abstract provided.


Identifying Risk Factors Related To Premature Birth Through Binary Logistic And Proportional Odds Ordinal Logistic Regression, Clayton Elwood 2019 Duquesne University

Electronic Theses and Dissertations

Premature birth has been identified as the single greatest cause of death worldwide in children under the age of five. This thesis will implement binary logistic regression and proportional odds ordinal logistic regression to predict different levels of premature birth and identify associated risk factors. The models will be built from the Centers for Disease Control and Prevention's 2014 Vital Statistics Natality Birth Data containing nearly 4 million live births within the United States. Odds ratios and confidence intervals on risk factors were produced utilizing binary logistic regression.


Optimal Design For A Causal Structure, Zaher Kmail 2019 University of Nebraska-Lincoln

Dissertations and Theses in Statistics

Linear models and mixed models are important statistical tools. But in many natural phenomena, there is more than one endogenous variable involved and these variables are related in a sophisticated way. Structural Equation Modeling (SEM) is often used to model the complex relationships between the endogenous and exogenous variables. It was first implemented in research to estimate the strength and direction of direct and indirect effects among variables and to measure the relative magnitude of each causal factor.

Historically, traditional optimal design theory focuses on univariate linear, nonlinear, and mixed models. There is no current literature on the subject of ...


Classification With The Matrix-Variate-T Distribution, Geoffrey Z. Thompson, Ranjan Maitra, William Q. Meeker, Ashraf Bastawros 2019 Iowa State University

Ashraf Bastawros

Matrix-variate distributions can intuitively model the dependence structure of matrix-valued observations that arise in applications with multivariate time series, spatio-temporal or repeated measures. This paper develops an Expectation-Maximization algorithm for discriminant analysis and classification with matrix-variate t-distributions. The methodology shows promise on simulated datasets and when applied to the forensic matching of fractured surfaces or the classification of functional magnetic resonance, satellite, or hand-gesture images.


Taking Multiple Regression Analysis To Task: A Review Of Mindware: Tools For Smart Thinking, By Richard Nisbett (2015), Jason Makansi 2019 Pearl Street Inc.

Numeracy

Richard Nisbett. 2015. Mindware: Tools for Smart Thinking. (New York, NY: Farrar, Straus and Giroux). 336 pp. ISBN: 9780374536244

Nisbett, a psychologist, may not achieve his stated goal of teaching readers to “effortlessly” extend their common sense when it comes to quantitative analysis applied to everyday issues, but his critique of multiple regression analysis (MRA) in the middle chapters of Mindware is worth attention from, and contemplation by, the QL/QR and Numeracy community. While in at least one other source, Nisbett’s critique has been called a “crusade” against MRA, what he really advocates is that it not be ...


Implementation Of Multivariate Artificial Neural Networks Coupled With Genetic Algorithms For The Multi-Objective Property Prediction And Optimization Of Emulsion Polymers, David Chisholm 2019 California Polytechnic State University, San Luis Obispo

Master's Theses and Project Reports

Machine learning has been gaining popularity over the past few decades as computers have become more advanced. On a fundamental level, machine learning consists of the use of computerized statistical methods to analyze data and discover trends that may not have been obvious or otherwise observable previously. These trends can then be used to make predictions on new data and explore entirely new design spaces. Methods vary from simple linear regression to highly complex neural networks, but the end goal is similar. The application of these methods to material property prediction and new material discovery has been of high interest ...


Blacklegged Tick (Ixodes Scapularis) Distribution In Maine, USA, As Related To Climate Change, White-Tailed Deer, And The Landscape, Susan P. Elias 2019 University of Maine

Electronic Theses and Dissertations

Lyme disease is caused by the bacterial spirochete Borrelia burgdorferi, which is transmitted through the bite of an infected blacklegged (deer) tick (Ixodes scapularis). Geographic invasion of I. scapularis in North America has been attributed to causes including 20th century reforestation and suburbanization, burgeoning populations of the white-tailed deer (Odocoileus virginianus) which is the primary reproductive host of I. scapularis, tick-associated non-native plant invasions, and climate change. Maine, USA, is a high Lyme disease incidence state, with a history of increasing I. scapularis abundance and northward range expansion. This thesis addresses the question: “To what extent has the range expansion ...


Leveraging Reviews To Improve User Experience, Anthony Schams, Iram Bakhtiar, Cristina Stanley 2019 Southern Methodist University

SMU Data Science Review

In this paper, we will explore and present a method of finding characteristics of a restaurant using its reviews through machine learning algorithms. We begin by building models to predict the ratings of individual reviews using text and categorical features. This is to examine the efficacy of the algorithms for the task. Both XGBoost and logistic regression will be examined. With these models, our goal is then to identify key phrases in reviews that are correlated with positive and negative experiences. Our analysis makes use of review data made publicly available by Yelp. Key bigrams extracted were non-specific to the ...


Comparison Of Imputation Methods For Mixed Data Missing At Random, Kaitlyn Heidt 2019 East Tennessee State University

Electronic Theses and Dissertations

A statistician's job is to produce statistical models. When these models are precise and unbiased, we can relate them to new data appropriately. However, when data sets have missing values, the assumptions of statistical methods are violated, which produces biased results. The statistician's objective is to implement methods that produce unbiased and accurate results. Research in missing data is becoming popular as modern methods that produce unbiased and accurate results are emerging, such as MICE in the statistical software R. Using real data, we compare four common imputation methods in the MICE package in R at different levels of ...


Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan 2019 Temple University

COBRA Preprint Series

One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes which provide insight into the disease's process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins resulting in enormous data sets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional ...


Predicting Unplanned Medical Visits Among Patients With Diabetes Using Machine Learning, Arielle Selya, Eric L. Johnson 2019 Sanford Health

SDSU Data Science Symposium

Diabetes poses a variety of medical complications to patients, resulting in a high rate of unplanned medical visits, which are costly to patients and healthcare providers alike. However, unplanned medical visits by their nature are very difficult to predict. The current project draws upon electronic health records (EMRs) of adult patients with diabetes who received care at Sanford Health between 2014 and 2017. Various machine learning methods were used to predict which patients have had an unplanned medical visit based on a variety of EMR variables (age, BMI, blood pressure, # of prescriptions, # of diagnoses on problem list, A1C, HDL ...


Estimation Of Multivariate Asset Models With Jumps, Angela Loregian, Laura Ballotta, Gianluca Fusai, Marcos Fabricio Perez 2019 ARPM

Business Faculty Publications

We propose a consistent and computationally efficient two-step methodology for the estimation of multidimensional non-Gaussian asset models built using Lévy processes. The proposed framework allows for dependence between assets and different tail behaviors and jump structures for each asset. Our procedure can be applied to portfolios with a large number of assets as it is immune to estimation dimensionality problems. Simulations show good finite sample properties and significant efficiency gains. This method is especially relevant for risk management purposes such as, for example, the computation of portfolio Value at Risk and intra-horizon Value at Risk, as we show in detail ...


Sparse Model Identification And Learning For Ultra-High-Dimensional Additive Partially Linear Models, Xinyi Li, Li Wang, Dan Nettleton 2019 University of North Carolina at Chapel Hill

Statistics Publications

The additive partially linear model (APLM) combines the flexibility of nonparametric regression with the parsimony of regression models, and has been widely used as a popular tool in multivariate nonparametric regression to alleviate the “curse of dimensionality”. A natural question raised in practice is the choice of structure in the nonparametric part, i.e., whether the continuous covariates enter into the model in linear or nonparametric form. In this paper, we present a comprehensive framework for simultaneous sparse model identification and learning for ultra-high-dimensional APLMs where both the linear and nonparametric components are possibly larger than the sample size. We ...


Biodiversity And Distribution Of Benthic Foraminifera In Harrington Sound, Bermuda: The Effects Of Physical And Geochemical Factors On Dominant Taxa, Nam Le 2019 Colby College

Honors Theses

Harrington Sound, Bermuda, is a nearly enclosed lagoon acting as a subtropical/tropical, carbonate-rich basin in which carbonate sediments, reef patches, and carbonate-producing organisms accumulate. Here, one of the most important calcareous groups is the Foraminifera. Analyses of common benthic orders, including miliolids (Quinqueloculina and Triloculina spp.) and rotaliids (Homotrema rubrum, Elphidium spp., and Ammonia beccarii), are essential in understanding past and present environmental conditions affecting the island's coastal environment. These taxa have been studied previously; however, factors explaining their individual patterns of abundance in the Sound are not well detailed. The goal of this study is to understand ...


Essays On Mixture Models, Trevor R. Camper 2019 Georgia Southern University

Electronic Theses and Dissertations

When considering statistical scenarios where one can sample from populations that are not of interest for the purposes of a study, bivariate mixture models can be used to study the effect that this missampling can have on parameter estimation. In this thesis, we will examine the effect that bivariate mixture models have on two statistical constructs: Cronbach's alpha and Spearman's rho. Chapter 1 will introduce notions of mixture models and the definition of bias under mixture models, which will serve as the central concept of this thesis. Chapter 2 will investigate a particular psychometric ...

