Open Access. Powered by Scholars. Published by Universities.®

Multivariate Analysis Commons

Applied Statistics

Selected Works

Articles 1 - 30 of 49

Full-Text Articles in Multivariate Analysis

Multivariate Spectral Analysis Of CRISM Data To Characterize The Composition Of Mawrth Vallis, Melissa Luna Mar 2018

Melissa Luna

No abstract provided.


Bayesian Function-On-Function Regression For Multi-Level Functional Data, Mark J. Meyer, Brent A. Coull, Francesco Versace, Paul Cinciripini, Jeffrey S. Morris Jan 2015

Jeffrey S. Morris

Medical and public health research increasingly involves the collection of more and more complex and high dimensional data. In particular, functional data, where the unit of observation is a curve or set of curves that are finely sampled over a grid, is frequently obtained. Moreover, researchers often sample multiple curves per person resulting in repeated functional measures. A common question is how to analyze the relationship between two functional variables. We propose a general function-on-function regression model for repeatedly sampled functional data, presenting a simple model as well as a more extensive mixed model framework, along with multiple functional posterior …


Functional Regression, Jeffrey S. Morris Jan 2015

Jeffrey S. Morris

Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain, and the observed data consist of a sample of functions taken from some population, sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the development of this field, which has accelerated in the past 10 years to become one of the fastest growing areas of statistics, fueled by the growing number of applications yielding this type of data. One unique characteristic of FDA is the need to combine information both across and within functions, which Ramsay and …


Depicting Estimates Using The Intercept In Meta-Regression Models: The Moving Constant Technique, Blair T. Johnson Dr., Tania B. Huedo-Medina Dr. Aug 2014

Blair T. Johnson

In any scientific discipline, the ability to portray research patterns graphically often aids greatly in interpreting a phenomenon. In part to depict phenomena, the statistics and capabilities of meta-analytic models have grown increasingly sophisticated. Accordingly, this article details how to move the constant in weighted meta-analysis regression models (viz. “meta-regression”) to illuminate the patterns in such models across a range of complexities. Although it is commonly ignored in practice, the constant (or intercept) in such models can be indispensable when it is not relegated to its usual static role. The moving constant technique makes possible estimates and confidence intervals at …
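
As a rough illustration of the idea of moving the intercept, the sketch below re-centers a single moderator at a chosen value `c` so that the intercept of a weighted regression becomes the estimated effect at `c`. This is a minimal sketch, not the authors' procedure: it assumes a fixed-effect meta-regression with inverse-variance weights and a normal-approximation interval, and the function name `estimate_at` and its parameters are hypothetical.

```python
import numpy as np

def estimate_at(x, y, v, c, z=1.96):
    """Estimated effect and interval at moderator value c (hypothetical helper).

    x: moderator values, y: effect sizes, v: sampling variances.
    Assumes a fixed-effect meta-regression with inverse-variance weights.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    v = np.asarray(v, dtype=float)
    # Centering the moderator at c makes the intercept the fitted value at x = c.
    X = np.column_stack([np.ones_like(x), x - c])
    W = np.diag(1.0 / v)
    # Under the fixed-effect assumption, (X' W X)^-1 is the coefficient covariance.
    cov = np.linalg.inv(X.T @ W @ X)
    beta = cov @ X.T @ W @ y
    se = np.sqrt(cov[0, 0])
    return beta[0], beta[0] - z * se, beta[0] + z * se
```

Calling `estimate_at` repeatedly over a grid of `c` values would trace out a fitted line with an interval band, which is one way to depict the kind of pattern the abstract describes.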


A Comparative Analysis Of Decision Trees Vis-À-Vis Other Computational Data Mining Techniques In Automotive Insurance Fraud Detection, Adrian Gepp, Kuldeep Kumar, J Holton Wilson, Sukanto Bhattacharya Jul 2014

Kuldeep Kumar

No abstract provided.


From Amazon To Apple: Modeling Online Retail Sales, Purchase Incidence And Visit Behavior, Anastasios Panagiotelis, Michael S. Smith, Peter Danaher Dec 2013

Michael Stanley Smith

In this study we propose a multivariate stochastic model for website visit duration, page views, purchase incidence and the sale amount for online retailers. The model is constructed by composition from carefully selected distributions, and involves copula components. It allows for the strong nonlinear relationships between the sales and visit variables to be explored in detail, and can be used to construct sales predictions. The model is readily estimated using maximum likelihood, making it an attractive choice in practice given the large sample sizes that are commonplace in online retail studies. We examine a number of top-ranked U.S. online retailers, …


Spectral Density Shrinkage For High-Dimensional Time Series, Mark Fiecas, Rainer Von Sachs Dec 2013

Mark Fiecas

Time series data obtained from neurophysiological signals is often high-dimensional and the length of the time series is often short relative to the number of dimensions. Thus, it is difficult or sometimes impossible to compute statistics that are based on the spectral density matrix because these matrices are numerically unstable. In this work, we discuss the importance of regularization for spectral analysis of high-dimensional time series and propose shrinkage estimation for estimating high-dimensional spectral density matrices. The shrinkage estimator is derived from a penalized log-likelihood, and the optimal penalty parameter has a closed-form solution, which can be estimated using the …


Global Quantitative Assessment Of The Colorectal Polyp Burden In Familial Adenomatous Polyposis Using A Web-Based Tool, Patrick M. Lynch, Jeffrey S. Morris, William A. Ross, Miguel A. Rodriguez-Bigas, Juan Posadas, Rossa Khalaf, Diane M. Weber, Valerie O. Sepeda, Bernard Levin, Imad Shureiqi Jan 2013

Jeffrey S. Morris

Background: Accurate measures of the total polyp burden in familial adenomatous polyposis (FAP) are lacking. Current assessment tools include polyp quantitation in limited-field photographs and qualitative total colorectal polyp burden by video.

Objective: To develop global quantitative tools of the FAP colorectal adenoma burden.

Design: A single-arm, phase II trial.

Patients: Twenty-seven patients with FAP.

Intervention: Treatment with celecoxib for 6 months, with before-treatment and after-treatment videos posted to an intranet with an interactive site for scoring.

Main Outcome Measurements: Global adenoma counts and sizes (grouped into categories: less than 2 mm, 2-4 mm, and greater than 4 mm) were …


SAS Macro: Kappa Statistic For Clustered Matched-Pair Data, Zhao Yang Jan 2013

Zhao (Tony) Yang, Ph.D.

The SAS macro was developed to calculate the kappa statistic for clustered matched-pair data.


A Comparison Of Periodic Autoregressive And Dynamic Factor Models In Intraday Energy Demand Forecasting, Thomas Mestekemper, Goeran Kauermann, Michael Smith Dec 2012

Michael Stanley Smith

We suggest a new approach for forecasting energy demand at an intraday resolution. Demand in each intraday period is modeled using semiparametric regression smoothing to account for calendar and weather components. Residual serial dependence is captured by one of two multivariate stationary time series models, with dimension equal to the number of intraday periods. These are a periodic autoregression and a dynamic factor model. We show the benefits of our approach in the forecasting of district heating demand in a steam network in Germany and aggregate electricity demand in the state of Victoria, Australia. In both studies, accounting for weather …


Bayesian Approaches To Copula Modelling, Michael S. Smith Dec 2012

Michael Stanley Smith

Copula models have become one of the most widely used tools in the applied modelling of multivariate data. Similarly, Bayesian methods are increasingly used to obtain efficient likelihood-based inference. However, to date, there has been only limited use of Bayesian approaches in the formulation and estimation of copula models. This article aims to address this shortcoming in two ways. First, to introduce copula models and aspects of copula theory that are especially relevant for a Bayesian analysis. Second, to outline Bayesian approaches to formulating and estimating copula models, and their advantages over alternative methods. Copulas covered include Archimedean, copulas constructed …


Time Series, Unit Roots, And Cointegration: An Introduction, Lonnie K. Stevans Dec 2012

Lonnie K. Stevans

The econometric literature on unit roots took off after the publication of the paper by Nelson and Plosser (1982) that argued that most macroeconomic series have unit roots and that this is important for the analysis of macroeconomic policy. Yule (1926) suggested that regressions based on trending time series data can be spurious. This problem of spurious correlation was further pursued by Granger and Newbold (1974) and this also led to the development of the concept of cointegration (lack of cointegration implies spurious regression). The pathbreaking paper by Granger (1981), first presented at a conference at the University of Florida …


Statistical Methods For Proteomic Biomarker Discovery Based On Feature Extraction Or Functional Modeling Approaches, Jeffrey S. Morris Jan 2012

Jeffrey S. Morris

In recent years, developments in molecular biotechnology have led to the increased promise of detecting and validating biomarkers, or molecular markers that relate to various biological or medical outcomes. Proteomics, the direct study of proteins in biological samples, plays an important role in the biomarker discovery process. These technologies produce complex, high dimensional functional and image data that present many analytical challenges that must be addressed properly for effective comparative proteomics studies that can yield potential biomarkers. Specific challenges include experimental design, preprocessing, feature extraction, and statistical analysis accounting for the inherent multiple testing issues. This paper reviews various computational …


Integrative Bayesian Analysis Of High-Dimensional Multi-Platform Genomics Data, Wenting Wang, Veerabhadran Baladandayuthapani, Jeffrey S. Morris, Bradley M. Broom, Ganiraju C. Manyam, Kim-Anh Do Jan 2012

Jeffrey S. Morris

Motivation: Analyzing data from multi-platform genomics experiments combined with patients’ clinical outcomes helps us understand the complex biological processes that characterize a disease, as well as how these processes relate to the development of the disease. Current integration approaches that treat the data are limited in that they do not consider the fundamental biological relationships that exist among the data from platforms.

Statistical Model: We propose an integrative Bayesian analysis of genomics data (iBAG) framework for identifying important genes/biomarkers that are associated with clinical outcome. This framework uses a hierarchical modeling technique to combine the data obtained from multiple platforms …


A Comparative Analysis Of Decision Trees Vis-À-Vis Other Computational Data Mining Techniques In Automotive Insurance Fraud Detection, Adrian Gepp, Kuldeep Kumar, J Holton Wilson, Sukanto Bhattacharya Dec 2011

Adrian Gepp

No abstract provided.


Modeling Dependence Using Skew T Copulas: Bayesian Inference And Applications, Michael S. Smith, Quan Gan, Robert Kohn Dec 2011

Michael Stanley Smith

[THIS IS AN AUGUST 2010 REVISION THAT REPLACES ALL PREVIOUS VERSIONS.]

We construct a copula from the skew t distribution of Sahu, Dey & Branco (2003). This copula can capture asymmetric and extreme dependence between variables, and is one of the few copulas that can do so and still be used in high dimensions effectively. However, it is difficult to estimate the copula model by maximum likelihood when the multivariate dimension is high, or when some or all of the marginal distributions are discrete-valued, or when the parameters in the marginal distributions and copula are estimated jointly. We therefore propose …


Estimation Of Copula Models With Discrete Margins Via Bayesian Data Augmentation, Michael S. Smith, Mohamad A. Khaled Dec 2011

Michael Stanley Smith

Estimation of copula models with discrete margins is known to be difficult beyond the bivariate case. We show how this can be achieved by augmenting the likelihood with latent variables, and computing inference using the resulting augmented posterior. To evaluate this we propose two efficient Markov chain Monte Carlo sampling schemes. One generates the latent variables as a block using a Metropolis-Hastings step with a proposal that is close to its target distribution; the other generates them one at a time. Our method applies to all parametric copulas where the conditional copula functions can be evaluated, not just elliptical copulas …


Identifying Unique Neighborhood Characteristics To Guide Health Planning For Stroke And Heart Attack: Fuzzy Cluster And Discriminant Analyses Approaches, Ashley Pedigo, William Seaver, Agricola Odoi Jul 2011

Agricola Odoi

Background: Socioeconomic, demographic, and geographic factors are known determinants of stroke and myocardial infarction (MI) risk. Clustering of these factors in neighborhoods needs to be taken into consideration during planning, prioritization and implementation of health programs intended to reduce disparities. Given the complex and multidimensional nature of these factors, multivariate methods are needed to identify neighborhood clusters of these determinants so as to better understand the unique neighborhood profiles. This information is critical for evidence-based health planning and service provision. Therefore, this study used a robust multivariate approach to classify neighborhoods and identify their socio-demographic characteristics so as to provide …


CV, Lorán Chollete Jan 2011

Lorán Chollete

No abstract provided.


International Diversification: An Extreme Value Approach, Lorán Chollete, Victor De La Peña, Ching-Chih Lu Jan 2011

Lorán Chollete

No abstract provided.


Rejoinder: Estimation Issues For Copulas Applied To Marketing Data, Peter Danaher, Michael Smith Dec 2010

Michael Stanley Smith

Estimating copula models using Bayesian methods presents some subtle challenges, ranging from specification of the prior to computational tractability. There is also some debate about what is the most appropriate copula to employ from those available. We address these issues here and conclude by discussing further applications of copula models in marketing.


Forecasting Television Ratings, Peter Danaher, Tracey Dagger, Michael Smith Dec 2010

Michael Stanley Smith

Despite the state of flux in media today, television remains the dominant player globally for advertising spend. Since television advertising time is purchased on the basis of projected future ratings, and ad costs have skyrocketed, there is increasing pressure to forecast television ratings accurately. Previous forecasting methods are not generally very reliable and many have not been validated, but more distressingly, none have been tested in today’s multichannel environment. In this study we compare 8 different forecasting models, ranging from a naïve empirical method to a state-of-the-art Bayesian model-averaging method. Our data come from a recent time period, 2004-2008 in …


Windows Executable For Gaussian Copula With NBD Margins, Michael S. Smith Dec 2010

Michael Stanley Smith

This is an example Windows 32-bit program to estimate a Gaussian copula model with NBD margins. The margins are estimated first using MLE, and the copula second using Bayesian MCMC. The model was discussed in Danaher & Smith (2011; Marketing Science) as example 4 (section 4.2).


Modeling Multivariate Distributions Using Copulas: Applications In Marketing, Peter J. Danaher, Michael S. Smith Dec 2010

Michael Stanley Smith

In this research we introduce a new class of multivariate probability models to the marketing literature. Known as “copula models”, they have a number of attractive features. First, they permit the combination of any univariate marginal distributions that need not come from the same distributional family. Second, a particular class of copula models, called “elliptical copulas”, have the property that they increase in complexity at a much slower rate than existing multivariate probability models as the number of dimensions increases. Third, they are very general, encompassing a number of existing multivariate models, and provide a framework for generating many more. …


Bicycle Commuting In Melbourne During The 2000s Energy Crisis: A Semiparametric Analysis Of Intraday Volumes, Michael S. Smith, Goeran Kauermann Dec 2010

Michael Stanley Smith

Cycling is attracting renewed attention as a mode of transport in western urban environments, yet the determinants of usage are poorly understood. In this paper we investigate some of these using intraday bicycle volumes collected via induction loops located at ten bike paths in the city of Melbourne, Australia, between December 2005 and June 2008. The data are hourly counts at each location, with temporal and spatial disaggregation allowing for the impact of meteorology to be measured accurately for the first time. Moreover, during this period petrol prices varied dramatically and the data also provide a unique opportunity to assess …


Modeling Longitudinal Data Using A Pair-Copula Decomposition Of Serial Dependence, Michael S. Smith, Aleksey Min, Carlos Almeida, Claudia Czado Nov 2010

Michael Stanley Smith

Copulas have proven to be very successful tools for the flexible modelling of cross-sectional dependence. In this paper we express the dependence structure of continuous-valued time series data using a sequence of bivariate copulas. This corresponds to a type of decomposition recently called a ‘vine’ in the graphical models literature, where each copula is entitled a ‘pair-copula’. We propose a Bayesian approach for the estimation of this dependence structure for longitudinal data. Bayesian selection ideas are used to identify any independence pair-copulas, with the end result being a parsimonious representation of a time-inhomogeneous Markov process of varying order. Estimates are …


Estimating Confidence Intervals For Eigenvalues In Exploratory Factor Analysis, Ross Larsen, Russell Warne Jul 2010

Russell T Warne

Exploratory factor analysis (EFA) has become a common procedure in educational and psychological research. In the course of performing an EFA, researchers often base the decision of how many factors to retain on the eigenvalues for the factors. However, many researchers do not realize that eigenvalues, like all sample statistics, are subject to sampling error, which means that confidence intervals (CIs) can be estimated for each eigenvalue. In the present article, we demonstrate two methods of estimating CIs for eigenvalues: one based on the mathematical properties of the central limit theorem, and the other based on bootstrapping. References to appropriate …
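
The bootstrap approach mentioned in this abstract can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it assumes a case-resampling bootstrap, eigenvalues taken from the sample correlation matrix, and percentile intervals. The function name `bootstrap_eigenvalue_cis` and its parameters are hypothetical.

```python
import numpy as np

def bootstrap_eigenvalue_cis(data, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap intervals for correlation-matrix eigenvalues.

    data: array of shape (n_cases, n_variables).
    Returns (lower, upper), each of length n_variables, ordered from the
    largest eigenvalue to the smallest.
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    n, p = data.shape
    boot = np.empty((n_boot, p))
    for b in range(n_boot):
        # Resample cases (rows) with replacement.
        sample = data[rng.integers(0, n, size=n)]
        corr = np.corrcoef(sample, rowvar=False)
        # eigvalsh returns ascending eigenvalues; reverse to descending.
        boot[b] = np.linalg.eigvalsh(corr)[::-1]
    lower = np.quantile(boot, alpha / 2, axis=0)
    upper = np.quantile(boot, 1 - alpha / 2, axis=0)
    return lower, upper
```

An interval for a given eigenvalue that includes 1.0 would suggest that a retention rule based on eigenvalues greater than 1 is not clear-cut for that factor in the sample at hand.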


International Diversification: A Copula Approach, Lorán Chollete, Victor De La Peña, Ching-Chih Lu Jan 2010

Lorán Chollete

No abstract provided.


Wavelet-Based Functional Linear Mixed Models: An Application To Measurement Error–Corrected Distributed Lag Models, Elizabeth J. Malloy, Jeffrey S. Morris, Sara D. Adar, Helen Suh, Diane R. Gold, Brent A. Coull Jan 2010

Jeffrey S. Morris

Frequently, exposure data are measured over time on a grid of discrete values that collectively define a functional observation. In many applications, researchers are interested in using these measurements as covariates to predict a scalar response in a regression setting, with interest focusing on the most biologically relevant time window of exposure. One example is in panel studies of the health effects of particulate matter (PM), where particle levels are measured over time. In such studies, there are many more values of the functional data than observations in the data set so that regularization of the corresponding functional regression coefficient …


Members’ Discoveries: Fatal Flaws In Cancer Research, Jeffrey S. Morris Jan 2010

Jeffrey S. Morris

A recent article published in The Annals of Applied Statistics (AOAS) by two MD Anderson researchers, Keith Baggerly and Kevin Coombes, dissects results from a highly influential series of medical papers involving genomics-driven personalized cancer therapy, and outlines a series of simple yet fatal flaws that raises serious questions about the veracity of the original results. Having immediate and strong impact, this paper, along with related work, is providing the impetus for new standards of reproducibility in scientific research.