Open Access. Powered by Scholars. Published by Universities.®

Applied Statistics Commons

Articles 1 - 30 of 66

Full-Text Articles in Applied Statistics

Multivariate Spectral Analysis Of CRISM Data To Characterize The Composition Of Mawrth Vallis, Melissa Luna Mar 2018

Melissa Luna

No abstract provided.


Methods For Scalar-On-Function Regression, Philip T. Reiss, Jeff Goldsmith, Han Lin Shang, R. Todd Ogden Jul 2017

Philip T. Reiss

Recent years have seen an explosion of activity in the field of functional data analysis (FDA), in which curves, spectra, images, etc. are considered as basic functional data units. A central problem in FDA is how to fit regression models with scalar responses and functional data points as predictors. We review some of the main approaches to this problem, categorizing the basic model types as linear, nonlinear and nonparametric. We discuss publicly available software packages, and illustrate some of the procedures by application to a functional magnetic resonance imaging dataset.


Penalized Nonparametric Scalar-On-Function Regression Via Principal Coordinates, Philip T. Reiss, David L. Miller, Pei-Shien Wu, Wen-Yu Hua Dec 2016

Philip T. Reiss

A number of classical approaches to nonparametric regression have recently been extended to the case of functional predictors. This paper introduces a new method of this type, which extends intermediate-rank penalized smoothing to scalar-on-function regression. The core idea is to regress the response on leading principal coordinates defined by a relevant distance among the functional predictors, while applying a ridge penalty. Our publicly available implementation, based on generalized additive modeling software, allows for fast optimal tuning parameter selection and for extensions to multiple functional predictors, exponential family-valued responses, and mixed-effects models. In an application to signature verification data, the proposed …


Modeling The Evolution Of Dynamic Brain Processes During An Associative Learning Experiment, Mark Fiecas, Hernando Ombao Dec 2015

Mark Fiecas

Our goal is to use local field potentials (LFPs) to rigorously study changes in neuronal activity in the hippocampus and the nucleus accumbens over the course of an associative learning experiment. We show that the spectral properties of the LFPs changed during the experiment. While many statistical models take into account nonstationarity within a single trial of the experiment, the evolution of brain dynamics across trials is often ignored. In this paper, we developed a novel time series model that captures both sources of nonstationarity. Under the proposed model we rigorously define the spectral density matrix so that it evolves …


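公的統計における欠測値補定の研究:多重代入法と単一代入法(高橋将宜) [A Study Of Missing-Value Imputation In Official Statistics: Multiple Imputation And Single Imputation], Masayoshi Takahashi Jun 2015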

Masayoshi Takahashi

No abstract provided.


Promoting Similarity Of Model Sparsity Structures In Integrative Analysis Of Cancer Genetic Data, Shuangge Ma Dec 2014

Shuangge Ma

In profiling studies, the analysis of a single dataset often leads to unsatisfactory results because of the small sample size. Multi-dataset analysis utilizes information across multiple independent datasets and outperforms single-dataset analysis. Among the available multi-dataset analysis methods, integrative analysis methods aggregate and analyze raw data and outperform meta-analysis methods, which analyze multiple datasets separately and then pool summary statistics. In this study, we conduct integrative analysis and marker selection under the heterogeneity structure, which allows different datasets to have overlapping but not necessarily identical sets of markers. Under certain scenarios, it is reasonable to expect some similarity of identified …


欠測値補定の診断手法としての多重代入法(高橋将宜) [Multiple Imputation As A Diagnostic Method For Missing-Value Imputation], Masayoshi Takahashi Sep 2014

Masayoshi Takahashi

No abstract provided.


Depicting Estimates Using The Intercept In Meta-Regression Models: The Moving Constant Technique, Blair T. Johnson Dr., Tania B. Huedo-Medina Dr. Aug 2014

Blair T. Johnson

In any scientific discipline, the ability to portray research patterns graphically often aids greatly in interpreting a phenomenon. In part to depict phenomena, the statistics and capabilities of meta-analytic models have grown increasingly sophisticated. Accordingly, this article details how to move the constant in weighted meta-analysis regression models (viz. “meta-regression”) to illuminate the patterns in such models across a range of complexities. Although it is commonly ignored in practice, the constant (or intercept) in such models can be indispensable when it is not relegated to its usual static role. The moving constant technique makes possible estimates and confidence intervals at …


Comparison Of Methods For Estimating The Effect Of Salvage Therapy In Prostate Cancer When Treatment Is Given By Indication, Jeremy Taylor, Jincheng Shen, Edward Kennedy, Lu Wang, Douglas Schaubel Dec 2013

Edward H. Kennedy

For patients who were previously treated for prostate cancer, salvage hormone therapy is frequently given when the longitudinal marker prostate-specific antigen begins to rise during follow-up. Because the treatment is given by indication, estimating the effect of the hormone therapy is challenging. In a previous paper we described two methods for estimating the treatment effect, called two-stage and sequential stratification. The two-stage method involves modeling the longitudinal and survival data. The sequential stratification method involves contrasts within matched sets of people, where each matched set includes people who did and did not receive hormone therapy. In this paper, we evaluate …


From Amazon To Apple: Modeling Online Retail Sales, Purchase Incidence And Visit Behavior, Anastasios Panagiotelis, Michael S. Smith, Peter Danaher Dec 2013

Michael Stanley Smith

In this study we propose a multivariate stochastic model for website visit duration, page views, purchase incidence and the sale amount for online retailers. The model is constructed by composition from carefully selected distributions, and involves copula components. It allows for the strong nonlinear relationships between the sales and visit variables to be explored in detail, and can be used to construct sales predictions. The model is readily estimated using maximum likelihood, making it an attractive choice in practice given the large sample sizes that are commonplace in online retail studies. We examine a number of top-ranked U.S. online retailers, …


Spectral Density Shrinkage For High-Dimensional Time Series, Mark Fiecas, Rainer Von Sachs Dec 2013

Mark Fiecas

Time series data obtained from neurophysiological signals is often high-dimensional and the length of the time series is often short relative to the number of dimensions. Thus, it is difficult or sometimes impossible to compute statistics that are based on the spectral density matrix because these matrices are numerically unstable. In this work, we discuss the importance of regularization for spectral analysis of high-dimensional time series and propose shrinkage estimation for estimating high-dimensional spectral density matrices. The shrinkage estimator is derived from a penalized log-likelihood, and the optimal penalty parameter has a closed-form solution, which can be estimated using the …


Hierarchical Vector Auto-Regressive Models And Their Applications To Multi-Subject Effective Connectivity, Cristina Gorrostieta, Mark Fiecas, Hernando Ombao, Erin Burke, Steven Cramer Oct 2013

Mark Fiecas

Vector auto-regressive (VAR) models typically form the basis for constructing directed graphical models for investigating connectivity in a brain network with brain regions of interest (ROIs) as nodes. There are limitations in the standard VAR models. The number of parameters in the VAR model increases quadratically with the number of ROIs and linearly with the order of the model; with so many parameters, the model could pose serious estimation problems. Moreover, when applied to imaging data, the standard VAR model does not account for variability in the connectivity structure across all subjects. In this paper, …


Instrumental Variable Analyses: Exploiting Natural Randomness To Understand Causal Mechanisms, Theodore Iwashyna, Edward Kennedy Dec 2012

Edward H. Kennedy

Instrumental variable analysis is a technique commonly used in the social sciences to provide evidence that a treatment causes an outcome, as contrasted with evidence that a treatment is merely associated with differences in an outcome. To extract such strong evidence from observational data, instrumental variable analysis exploits situations where some degree of randomness affects how patients are selected for a treatment. An instrumental variable is a characteristic of the world that leads some people to be more likely to get the specific treatment we want to study but does not otherwise change those patients’ outcomes. This seminar explains, in nonmathematical …


Theory And Methods For Gini Coefficients Partitioned By Quantile Range, Chaitra Nagaraja Dec 2012

Chaitra H Nagaraja

The Gini coefficient is frequently used to measure inequality in populations. However, it is possible that inequality levels may change over time differently for disparate subgroups which cannot be detected with population-level estimates only. Therefore, it may be informative to examine inequality separately for these segments. The case where the population is split into two segments based on non-overlapping quantile ranges is examined. Asymptotic theory is derived and practical methods to estimate standard errors and construct confidence intervals using resampling methods are developed. An application to per capita income across census tracts using American Community Survey data is considered.
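
As a rough illustration of the idea in this abstract, the sketch below computes a Gini coefficient separately for the observations below and above a sample quantile and attaches bootstrap standard errors. It is a minimal sketch under stated assumptions, not the estimator or the resampling scheme developed in the paper: the function names (`gini`, `segment_ginis`, `bootstrap_se`), the single cut point `q`, and the plain nonparametric bootstrap are all hypothetical choices made for illustration.

```python
import numpy as np

def gini(x):
    """Sample Gini coefficient of non-negative values (0 means perfect equality)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # Rank-based formula: G = 2 * sum(i * x_(i)) / (n * sum(x)) - (n + 1) / n
    return 2.0 * np.dot(ranks, x) / (n * total) - (n + 1.0) / n

def segment_ginis(x, q=0.5):
    """Gini within the lower and upper segments split at the q-th sample quantile.

    Assumption: a simple two-segment split; the paper's definition may differ.
    """
    x = np.asarray(x, dtype=float)
    cut = np.quantile(x, q)
    return gini(x[x <= cut]), gini(x[x > cut])

def bootstrap_se(x, stat, n_boot=500, seed=0):
    """Nonparametric bootstrap standard error(s) of stat(x)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    reps = [stat(rng.choice(x, size=x.size, replace=True)) for _ in range(n_boot)]
    return np.std(reps, axis=0, ddof=1)
```

Calling `segment_ginis(incomes, q=0.8)` followed by `bootstrap_se(incomes, lambda s: segment_ginis(s, q=0.8))` would give segment-level estimates with standard errors for a hypothetical array `incomes`; a population-level `gini(incomes)` can mask differences between the two segments.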


A Comparison Of Periodic Autoregressive And Dynamic Factor Models In Intraday Energy Demand Forecasting, Thomas Mestekemper, Goeran Kauermann, Michael Smith Dec 2012

Michael Stanley Smith

We suggest a new approach for forecasting energy demand at an intraday resolution. Demand in each intraday period is modeled using semiparametric regression smoothing to account for calendar and weather components. Residual serial dependence is captured by one of two multivariate stationary time series models, with dimension equal to the number of intraday periods. These are a periodic autoregression and a dynamic factor model. We show the benefits of our approach in the forecasting of district heating demand in a steam network in Germany and aggregate electricity demand in the state of Victoria, Australia. In both studies, accounting for weather …


Bayesian Approaches To Copula Modelling, Michael S. Smith Dec 2012

Michael Stanley Smith

Copula models have become one of the most widely used tools in the applied modelling of multivariate data. Similarly, Bayesian methods are increasingly used to obtain efficient likelihood-based inference. However, to date, there has been only limited use of Bayesian approaches in the formulation and estimation of copula models. This article aims to address this shortcoming in two ways. First, to introduce copula models and aspects of copula theory that are especially relevant for a Bayesian analysis. Second, to outline Bayesian approaches to formulating and estimating copula models, and their advantages over alternative methods. Copulas covered include Archimedean, copulas constructed …


Quantifying Temporal Correlations: A Test-Retest Evaluation Of Functional Connectivity In Resting-State fMRI, Mark Fiecas, Hernando Ombao, Dan Van Lunen, Richard Baumgartner, Alexandre Coimbra, Dai Feng Dec 2012

Mark Fiecas

There have been many interpretations of functional connectivity and proposed measures of temporal correlations between BOLD signals across different brain areas. These interpretations stem from the many studies on functional connectivity using resting-state fMRI data that have emerged in recent years. However, not all of these studies used the same metrics for quantifying the temporal correlations between brain regions. In this paper, we use a public-domain test–retest resting-state fMRI data set to perform a systematic investigation of the stability of the metrics that are often used in resting-state functional connectivity (FC) studies. The fMRI data set was collected across three different …


Obtaining Critical Values For Test Of Markov Regime Switching, Douglas G. Steigerwald, Valerie Bostwick Oct 2012

Douglas G. Steigerwald

For Markov regime-switching models, testing for the possible presence of more than one regime requires the use of a non-standard test statistic. Carter and Steigerwald (forthcoming, Journal of Econometric Methods) derive in detail the analytic steps needed to implement the test of Markov regime-switching proposed by Cho and White (2007, Econometrica). We summarize the implementation steps and address the computational issues that arise. A new command to compute regime-switching critical values, rscv, is introduced and presented in the context of empirical research.


Big Data And The Future, Sherri Rose Jul 2012

Sherri Rose

No abstract provided.


A Logistic L-Moment-Based Analog For The Tukey g-h, g, h, And h-h System Of Distributions, Todd C. Headrick, Mohan D. Pant Jun 2012

Mohan Dev Pant

This paper introduces a standard logistic L-moment-based system of distributions. The proposed system is an analog to the standard normal conventional moment-based Tukey g-h, g, h, and h-h system of distributions. The system also consists of four classes of distributions and is referred to as (i) asymmetric γ-κ, (ii) log-logistic γ, (iii) symmetric κ, and (iv) asymmetric κL-κR. The system can be used in a variety of settings such as simulation or modeling events—most notably when heavy-tailed distributions are of interest. A procedure is also described for simulating γ-κ, γ, κ, and κL-κR distributions with specified L-moments and L-correlations. The …


Targeted Maximum Likelihood Estimation Of Natural Direct Effects, Wenjing Zheng, Mark Van Der Laan Jan 2012

Wenjing Zheng

In many causal inference problems, one is interested in the direct causal effect of an exposure on an outcome of interest that is not mediated by certain intermediate variables. Robins and Greenland (1992) and Pearl (2001) formalized the definition of two types of direct effects (natural and controlled) under the counterfactual framework. The efficient scores (under a nonparametric model) for the various natural effect parameters and their general robustness conditions, as well as an estimating equation based estimator using the efficient score, are provided in Tchetgen Tchetgen and Shpitser (2011b). In this article, we apply the targeted maximum likelihood framework …


Testing For Regime Switching: A Comment, Douglas Steigerwald, Andrew Carter Dec 2011

Douglas G. Steigerwald

An autoregressive model with Markov-regime switching is analyzed that reflects on the properties of the quasi-likelihood ratio test developed by Cho and White (2007). For such a model, we show that consistency of the quasi-maximum likelihood estimator for the population parameter values, on which consistency of the test is based, does not hold. We describe a condition that ensures consistency of the estimator and discuss the consistency of the test in the absence of consistency of the estimator.


Modeling Dependence Using Skew t Copulas: Bayesian Inference And Applications, Michael S. Smith, Quan Gan, Robert Kohn Dec 2011

Michael Stanley Smith

[THIS IS AN AUGUST 2010 REVISION THAT REPLACES ALL PREVIOUS VERSIONS.]

We construct a copula from the skew t distribution of Sahu, Dey & Branco (2003). This copula can capture asymmetric and extreme dependence between variables, and is one of the few copulas that can do so and still be used in high dimensions effectively. However, it is difficult to estimate the copula model by maximum likelihood when the multivariate dimension is high, or when some or all of the marginal distributions are discrete-valued, or when the parameters in the marginal distributions and copula are estimated jointly. We therefore propose …


Estimation Of Copula Models With Discrete Margins Via Bayesian Data Augmentation, Michael S. Smith, Mohamad A. Khaled Dec 2011

Michael Stanley Smith

Estimation of copula models with discrete margins is known to be difficult beyond the bivariate case. We show how this can be achieved by augmenting the likelihood with latent variables, and computing inference using the resulting augmented posterior. To evaluate this we propose two efficient Markov chain Monte Carlo sampling schemes. One generates the latent variables as a block using a Metropolis-Hastings step with a proposal that is close to its target distribution, the other generates them one at a time. Our method applies to all parametric copulas where the conditional copula functions can be evaluated, not just elliptical copulas …


Asymptotic Theory For Cross-Validated Targeted Maximum Likelihood Estimation, Wenjing Zheng, Mark J. Van Der Laan Jul 2011

Wenjing Zheng

We consider a targeted maximum likelihood estimator of a path-wise differentiable parameter of the data generating distribution in a semi-parametric model based on observing n independent and identically distributed observations. The targeted maximum likelihood estimator (TMLE) uses V-fold sample splitting for the initial estimator in order to make the TMLE maximally robust in its bias reduction step. We prove a general theorem that states asymptotic efficiency (and thereby regularity) of the targeted maximum likelihood estimator when the initial estimator is consistent and a second order term converges to zero in probability at a rate faster than the square root of …


Rejoinder: Estimation Issues For Copulas Applied To Marketing Data, Peter Danaher, Michael Smith Dec 2010

Michael Stanley Smith

Estimating copula models using Bayesian methods presents some subtle challenges, ranging from specification of the prior to computational tractability. There is also some debate about what is the most appropriate copula to employ from those available. We address these issues here and conclude by discussing further applications of copula models in marketing.


Forecasting Television Ratings, Peter Danaher, Tracey Dagger, Michael Smith Dec 2010

Michael Stanley Smith

Despite the state of flux in media today, television remains the dominant player globally for advertising spend. Since television advertising time is purchased on the basis of projected future ratings, and ad costs have skyrocketed, there is increasing pressure to forecast television ratings accurately. Previous forecasting methods are not generally very reliable and many have not been validated, but more distressingly, none have been tested in today’s multichannel environment. In this study we compare 8 different forecasting models, ranging from a naïve empirical method to a state-of-the-art Bayesian model-averaging method. Our data come from a recent time period, 2004-2008 in …


Accurately Sized Test Statistics With Misspecified Conditional Homoskedasticity, Douglas Steigerwald, Jack Erb Dec 2010

Douglas G. Steigerwald

We study the finite-sample performance of test statistics in linear regression models where the error dependence is of unknown form. With an unknown dependence structure there is traditionally a trade-off between the maximum lag over which the correlation is estimated (the bandwidth) and the amount of heterogeneity in the process. When allowing for heterogeneity, through conditional heteroskedasticity, the correlation at far lags is generally omitted and the resultant inflation of the empirical size of test statistics has long been recognized. To allow for correlation at far lags we study test statistics constructed under the possibly misspecified assumption of conditional homoskedasticity. …


The Underground Economy Of Fake Antivirus Software, Douglas Steigerwald, Brett Stone-Gross, Ryan Abman, Richard Kemmerer, Christopher Kruegel, Giovanni Vigna Dec 2010

Douglas G. Steigerwald

Fake antivirus (AV) programs have been utilized to defraud millions of computer users into paying as much as one hundred dollars for a phony software license. As a result, fake AV software has evolved into one of the most lucrative criminal operations on the Internet. In this paper, we examine the operations of three large-scale fake AV businesses, lasting from three months to more than two years. More precisely, we present the results of our analysis on a trove of data obtained from several backend servers that the cybercriminals used to drive their scam operations. Our investigations reveal that these …


Windows Executable For Gaussian Copula With NBD Margins, Michael S. Smith Dec 2010

Michael Stanley Smith

This is an example 32-bit Windows program to estimate a Gaussian copula model with NBD margins. The margins are estimated first using MLE, and the copula second using Bayesian MCMC. The model was discussed in Danaher & Smith (2011; Marketing Science) as example 4 (section 4.2).