Open Access. Powered by Scholars. Published by Universities.®

Multivariate Analysis Commons


2016


Articles 1 - 23 of 23

Full-Text Articles in Multivariate Analysis

Tutorial For Using The Center For High Performance Computing At The University Of Utah And An Example Using Random Forest, Stephen Barton Dec 2016


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Random forests are memory-intensive machine learning algorithms, and most computers would fail at building models from datasets with millions of observations. Using the Center for High Performance Computing (CHPC) at the University of Utah and an airline on-time arrival dataset with 7 million observations from the U.S. Department of Transportation Bureau of Transportation Statistics, we built 316 models by adjusting the depth of the trees and the randomness of each forest, and compared the accuracy and time each took. Using this dataset, we discovered that substantial restrictions to the size of trees, observations allowed for each tree, and variables …


A Traders Guide To The Predictive Universe- A Model For Predicting Oil Price Targets And Trading On Them, Jimmie Harold Lenz Dec 2016


Doctor of Business Administration Dissertations

At heart, every trader loves volatility; this is where return on investment comes from, and this is what drives the proverbial “positive alpha.” As a trader, understanding the probabilities related to the volatility of prices is key; however, if you could also predict future prices reliably, the world would be your oyster. To this end, I have achieved three goals with this dissertation: to develop a model to predict future short-term prices (direction and magnitude), to test it effectively by generating consistent profits with a trading model developed for this purpose, and to write a paper that anyone with …


Biogeographical Patterns Of Soil Microbial Communities: Ecological, Structural, And Functional Diversity And Their Application To Soil Provenance, Natalie Damaso Oct 2016


FIU Electronic Theses and Dissertations

The current ecological hypothesis states that the soil type (e.g., chemical and physical properties) determines which microbes occupy a particular soil and provides the foundation for soil provenance studies. Just as human profiles are used to determine a match between evidence from a crime scene and a suspect, a soil microbial profile can be used to determine a match between soil found on the suspect’s shoes or clothing and the soil at a crime scene. However, for a robust tool to be applied in forensic applications, an understanding of the uncertainty associated with any comparisons and the parameters that can significantly …


Development Of Anatomical And Functional Magnetic Resonance Imaging Measures Of Alzheimer Disease, Samaneh Kazemifar Oct 2016


Electronic Thesis and Dissertation Repository

Alzheimer disease is considered to be a progressive neurodegenerative condition, clinically characterized by cognitive dysfunction and memory impairments. Incorporating imaging biomarkers in the early diagnosis and monitoring of disease progression is increasingly important in the evaluation of novel treatments. The purpose of the work in this thesis was to develop and evaluate novel structural and functional biomarkers of disease to improve Alzheimer disease diagnosis and treatment monitoring. Our overarching hypothesis is that magnetic resonance imaging methods that sensitively measure brain structure and functional impairment have the potential to identify people with Alzheimer’s disease prior to the onset of cognitive decline. …


Advanced Data Analysis - Lecture Notes, Erik B. Erhardt, Edward J. Bedrick, Ronald M. Schrader Oct 2016


Open Textbooks

Lecture notes for Advanced Data Analysis (ADA1 Stat 427/527 and ADA2 Stat 428/528), Department of Mathematics and Statistics, University of New Mexico, Fall 2016-Spring 2017. Additional material including RMarkdown templates for in-class and homework exercises, datasets, R code, and video lectures are available on the course websites: https://statacumen.com/teaching/ada1 and https://statacumen.com/teaching/ada2 .

Contents

I ADA1: Software

  • 0 Introduction to R, Rstudio, and ggplot

II ADA1: Summaries and displays, and one-, two-, and many-way tests of means

  • 1 Summarizing and Displaying Data
  • 2 Estimation in One-Sample Problems
  • 3 Two-Sample Inferences
  • 4 Checking Assumptions
  • 5 One-Way Analysis of Variance

III ADA1: Nonparametric, categorical, …


Implementing Some Basic Simulation Designs Using The Simsem Package In R, Keith A. Markus Sep 2016


Open Educational Resources

The purpose of this tutorial is to provide a very basic introduction to implementing three simple research designs using the simsem package in R. R is an open source statistical computing environment (R Core Team, 2015). For more information about R, see the R Project homepage (https://www.r-project.org/) and the Comprehensive R Archive Network (CRAN) web page (https://cran.r-project.org/). The lavaan package provides functions for fitting and evaluating structural equation models (Rosseel, 2012). For further information about the lavaan package including tutorials, see the lavaan Project web page (http://lavaan.ugent.be/). The simsem package (Pornprasertmanit, Miller & Schoemann, 2016) provides functions to facilitate structural …


The Influence Of The Electric Supply Industry On Economic Growth In Less Developed Countries, Edward Richard Bee Aug 2016


Dissertations

This study measures the impact that electrical outages have on manufacturing production in 135 less developed countries using stochastic frontier analysis and data from World Bank’s Investment Climate surveys. Outages of electricity, for firms with and without backup power sources, are the most frequently cited constraint on manufacturing growth in these surveys.

Outages are shown to reduce output below the production frontier by almost five percent in Africa and by a lower percentage in South Asia, Southeast Asia and the Middle East and North Africa. Production response to outages is quadratic in form. Outages also increase labor cost, reduce exports …


Regional Dynamic Price Relationships Of Distillers Dried Grains In U.S. Feed Markets, Matthew Fulton Johnson Aug 2016


Masters Theses

Distillers dried grains with solubles (DDGS) is now a mainstream substitute in U.S. animal feed rations. DDGS is rich in fat and protein content and serves as a competitive feed source in livestock markets. The objective of this study is to identify dynamic price relationships among DDGS, corn, soybean meal, and livestock outputs in the context of specific livestock sectors and their geographic locations. Four locations, each associated with a predominant livestock sector, are selected for analysis by measuring the density and relative proportion of a livestock sector’s grain consumption at the county level. A vector error correction model is applied to post-mandate …


Variable Selection Via Penalized Regression And The Genetic Algorithm Using Information Complexity, With Applications For High-Dimensional -Omics Data, Tyler J. Massaro Aug 2016


Doctoral Dissertations

This dissertation is a collection of examples, algorithms, and techniques for researchers interested in selecting influential variables from statistical regression models. Chapters 1, 2, and 3 provide background information that will be used throughout the remaining chapters, on topics including but not limited to information complexity, model selection, covariance estimation, stepwise variable selection, penalized regression, and especially the genetic algorithm (GA) approach to variable subsetting.

In chapter 4, we fully develop the framework for performing GA subset selection in logistic regression models. We present advantages of this approach against stepwise and elastic net regularized regression in selecting variables from a …


Quantifying Transit Access In New York City: Formulating An Accessibility Index For Analyzing Spatial And Social Patterns Of Public Transportation, Maxwell S. Siegel May 2016


Theses and Dissertations

This paper analyzes accessibility within New York City’s transportation system by creating unique accessibility indices. The indices are detailed and implemented using GIS to analyze the distribution of transit need and access. Regression analyses highlight relationships between demographics and accessibility, and recommendations for transit expansion are presented.


Multivariate Thinking In An Intro Stats Course – Is It Possible?, Beverly Wood May 2016


Publications

Many of our students have an intuitive sense that there is more to the story than univariate or bivariate data can tell us. We can acknowledge and encourage that habit of digging deeper by demonstrating some ways to look at additional variables. Simpson’s paradox and side-by-side scatter plots provide a glimpse of more complex analyses that are accessible to students in an introductory course, with or without strong quantitative skills.
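
As a rough illustration of the Simpson’s paradox idea mentioned in this abstract, the sketch below compares success rates within subgroups and after pooling. The counts, the group labels "A" and "B", the subgroup labels, and the helper names are all illustrative assumptions, not material from the listed publication.

```python
# Illustrative counts (not from the source): (successes, trials) per subgroup.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, trials):
    """Success proportion for one cell."""
    return successes / trials


def pooled_rate(groups):
    """Success proportion after collapsing over the subgroup variable."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(t for _, t in groups.values())
    return successes / trials


# Within every subgroup, A has the higher success rate ...
a_wins_each_subgroup = all(
    rate(*counts["A"][g]) > rate(*counts["B"][g]) for g in ("small", "large")
)
# ... yet the pooled comparison points the other way.
b_wins_pooled = pooled_rate(counts["B"]) > pooled_rate(counts["A"])

for name, groups in counts.items():
    per_group = {g: round(rate(*cell), 3) for g, cell in groups.items()}
    print(name, per_group, "pooled:", round(pooled_rate(groups), 3))
```

The reversal arises because the two groups are distributed very differently across the subgroups, which is the kind of third-variable effect an introductory course can show with a small table.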


Bivariate Negative Binomial Hurdle With Random Spatial Effects, Robert Mcnutt Apr 2016


Dissertations

Count data with excess zeros widely occur in ecology, epidemiology, marketing, and many other disciplines. Mixture distributions consisting of a point mass at zero and a separate discrete distribution are often employed in regression models to account for excessive zero observations in the data. While Poisson models are very popular for count data, Negative Binomial models provide greater flexibility due to their ability to account for overdispersion.

This research focuses on developing a method for analyzing bivariate count data with excess zeros collected over a lattice. A bivariate Zero-Inflated Negative Binomial Hurdle (ZINBH) regression model with spatial random effects is …
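
For readers unfamiliar with the terms in this abstract, the standard textbook forms can be written as follows. This is a generic univariate sketch, not the dissertation’s bivariate spatial model; the symbols \mu (mean), k (dispersion), \pi (zero probability), and f (count pmf) are notation assumed here.

```latex
% Poisson: the variance is forced to equal the mean
\mathbb{E}[Y] = \mu, \qquad \operatorname{Var}(Y) = \mu

% Negative Binomial (NB2 form, dispersion k > 0): an extra quadratic term
% allows the variance to exceed the mean (overdispersion)
\mathbb{E}[Y] = \mu, \qquad \operatorname{Var}(Y) = \mu + \frac{\mu^{2}}{k}

% Hurdle model: point mass \pi at zero, zero-truncated count pmf f for y >= 1
P(Y = 0) = \pi, \qquad
P(Y = y) = (1 - \pi)\,\frac{f(y)}{1 - f(0)}, \quad y = 1, 2, \dots
```

As k grows large the Negative Binomial variance approaches the Poisson variance, which is why the Negative Binomial is described as the more flexible choice.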


Hpcnmf: A High-Performance Toolbox For Non-Negative Matrix Factorization, Karthik Devarajan, Guoli Wang Feb 2016


COBRA Preprint Series

Non-negative matrix factorization (NMF) is a widely used machine learning algorithm for dimension reduction of large-scale data. It has found successful applications in a variety of fields such as computational biology, neuroscience, natural language processing, information retrieval, image processing and speech recognition. In bioinformatics, for example, it has been used to extract patterns and profiles from genomic and text-mining data as well as in protein sequence and structure analysis. While the scientific performance of NMF is very promising in dealing with high-dimensional data sets and complex data structures, its computational cost is high and can sometimes be critical for …


Models For Hsv Shedding Must Account For Two Levels Of Overdispersion, Amalia Magaret Jan 2016


UW Biostatistics Working Paper Series

We have frequently implemented crossover studies to evaluate new therapeutic interventions for genital herpes simplex virus infection. The outcome measured to assess the efficacy of interventions on herpes disease severity is the viral shedding rate, defined as the frequency of detection of HSV on the genital skin and mucosa. We performed a simulation study to ascertain whether our standard model, which we have used previously, was appropriately considering all the necessary features of the shedding data to provide correct inference. We simulated shedding data under our standard, validated assumptions and assessed the ability of 5 different models to reproduce the …


The Finney County, Kansas Community Assessment Process: Fact Book, Debra J. Bolton Phd, Shannon L. Dick M.S. Jan 2016


NPP eBooks

This multi-lingual/multi-cultural study was called the Community Assets Process by the groups that “commissioned” it (Finnup Foundation, Finney County K-State Research & Extension, Western Kansas Community Foundation, Finney County United Way, Finney County Health Department, United Methodist Community Health Center (UMMAM), Center for Children and Families, Garden City Recreation Commission, and the Garden City Cultural Relations Board), because we intend for this to be an ongoing discussion.

An objective, for those promoting the study, was to connect foundation, state, and federal funding with activities or services that addressed the true needs of people living in Finney County. The group was looking …


Spatiotemporal Meta-Analysis: Reviewing Health Psychology Phenomena Over Space And Time., Blair T. Johnson Jan 2016


CHIP Documents

This supplemental material is meant to support this article:

Johnson, B. T., Crowley, E., & Marrouch, N. Spatiotemporal meta-analysis: Reviewing health psychology phenomena over space and time. Health Psychology Review.

Specifically, it is a database of GDPs per capita for nations in the world between 1800 and 2015. It is archived here to support an online supplement to this article.

GDP per capita


Online Variational Bayes Inference For High-Dimensional Correlated Data, Sylvie T. Kabisa, Jeffrey S. Morris, David Dunson Jan 2016


Jeffrey S. Morris

High-dimensional data with hundreds of thousands of observations are becoming commonplace in many disciplines. The analysis of such data poses many computational challenges, especially when the observations are correlated over time and/or across space. In this paper we propose flexible hierarchical regression models for analyzing such data that accommodate serial and/or spatial correlation. We address the computational challenges involved in fitting these models by adopting an approximate inference framework. We develop an online variational Bayes algorithm that works by incrementally reading the data into memory one portion at a time. The performance of the method is assessed through simulation studies. …


Development In Normal Mixture And Mixture Of Experts Modeling, Meng Qi Jan 2016


Theses and Dissertations--Statistics

In this dissertation, we first consider the problem of testing homogeneity and order in a contaminated normal model when the data are correlated under some known covariance structure. To address this problem, we develop a moment-based homogeneity and order test and design weights for the test statistics to increase the power of the homogeneity test. We apply our test to microarray data on Down’s syndrome. This dissertation also studies a singular Bayesian information criterion (sBIC) for a bivariate hierarchical mixture model with varying weights, and develops a new data-dependent information criterion (sFLIC). We apply our model and criteria to birth-weight and gestational …


Provision Of Hospital-Based Palliative Care And The Impact On Organizational And Patient Outcomes, Marisa L. Roczen Jan 2016


Theses and Dissertations

Hospital-based palliative care services aim to streamline medical care for patients with chronic and potentially life-limiting illnesses by focusing on individual patient needs, efficient use of hospital resources, and providing guidance for patients, patients’ families and clinical providers toward making optimal decisions concerning a patient’s care. This study examined the nature of palliative care provision in U.S. hospitals and its impact on selected organizational and patient outcomes, including hospital costs, length of stay, in-hospital mortality, and transfer to hospice. Hospital costs and length of stay are viewed as important economic indicators. Specifically, lower hospital costs may increase a hospital’s profit …


The Relationship Between Exercise And Depression And Anxiety In College Students, Joshua Frank, Dr. Amy Adkins, Nathan Thomas, Dr. Danielle Dick Jan 2016


Undergraduate Research Posters

The literature shows an inverse association between exercise and mental disorders. The aim of this study is to further elaborate on this association with regard to exercise and its relationship with anxiety and depression in a college sample. The subject group focused on seniors in the Spit for Science data set, which incorporated a total of 821 students. Physical activity was assessed using the International Physical Activity Questionnaire (IPAQ) to estimate the overall metabolic equivalents (METs) each student spent in walking, moderate, or vigorous activity levels in the previous week. Sum scores were used to measure depression and anxiety. Overall, the …


A Comparison Of The Utility Of Craniometric And Dental Morphological Data For Assessing Biodistance And Sex-Differential Migration In The Pacific Islands, Brittney A. Eubank Jan 2016


Graduate Student Theses, Dissertations, & Professional Papers

Genetic analyses of maternally inherited mitochondrial DNA and the paternally inherited Y-chromosome yield contrasting pictures of the movement of peoples into the Pacific Islands. A possible explanation for this discrepancy is a matrilocal residency pattern practiced by early Pacific settlers, in which Melanesian men were brought into settler communities to intermarry with local women, yielding a higher intrapopulation variance and lower interpopulation variance in males compared to females. This research investigates the possibility of sex-differential migration in the Oceanic populations of Easter Island, Fiji, Guam, Mokapu, and New Britain through analysis of biodistance based on dental morphological trait frequencies and craniometric measures …


Dimension Reduction And Variable Selection, Hossein Moradi Rekabdarkolaee Jan 2016


Theses and Dissertations

High-dimensional data are becoming increasingly available as data collection technology advances. Over the last decade, significant developments have been taking place in high-dimensional data analysis, driven primarily by a wide range of applications in many fields such as genomics, signal processing, and environmental studies. Statistical techniques such as dimension reduction and variable selection play important roles in high dimensional data analysis. Sufficient dimension reduction provides a way to find the reduced space of the original space without a parametric model. This method has been widely applied in many scientific fields such as genetics, brain imaging analysis, econometrics, environmental sciences, etc. …


Shrinkage Estimation For Multivariate Hidden Markov Mixture Models, Mark Fiecas, Jürgen Franke, Rainer Von Sachs, Joseph Tadjuidje Dec 2015


Mark Fiecas

Motivated by a market environment that changes over time, we consider high-dimensional data such as financial returns, generated by a hidden Markov model that allows for switching between different regimes or states. To get more stable estimates of the covariance matrices of the different states, which may be based on a number of observations that is small compared to the dimension, we apply shrinkage and combine it with an EM-type algorithm. This approach yields more stable estimates of the covariance matrix, which allows for improved reconstruction of the hidden Markov chain. In addition to a simulation study and the …
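
To give a sense of what covariance shrinkage means here, the following minimal sketch shrinks a sample covariance matrix toward a scaled identity target. The abstract does not state the shrinkage target or how the intensity is chosen, so the identity target, the fixed `intensity` value, the toy data, and the function names are all assumptions for illustration; the hidden Markov and EM steps are omitted entirely.

```python
def sample_covariance(rows):
    """Unbiased sample covariance of a list of equal-length observations."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


def shrink_to_identity(cov, intensity):
    """Convex combination of cov and (average variance) * identity.

    `intensity` is a hypothetical tuning parameter in [0, 1]:
    0 returns cov unchanged, 1 returns the scaled identity target.
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    p = len(cov)
    avg_var = sum(cov[i][i] for i in range(p)) / p
    return [
        [
            (1 - intensity) * cov[i][j] + (intensity * avg_var if i == j else 0.0)
            for j in range(p)
        ]
        for i in range(p)
    ]


# Toy data (hypothetical): 3 observations in 3 dimensions, so the sample
# covariance is rank-deficient and a natural candidate for shrinkage.
data = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [3.0, 4.0, 1.0]]
S = sample_covariance(data)
S_shrunk = shrink_to_identity(S, 0.5)
```

The shrunk matrix keeps the total variance (trace) of the sample estimate while damping the off-diagonal entries, which is one common way to stabilize an estimate when observations are few relative to the dimension.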