Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Applied Statistics

Articles 1 - 30 of 33

Full-Text Articles in Statistical Models

Stability Of Single-Parent Gene Expression Complementation In Maize Hybrids Upon Water Deficit Stress, Caroline Marcon, Anja Paschold, Waqas Ahmed Malik, Andrew Lithio, Jutta A. Baldauf, Lena Altrogge, Nina Opitz, Christa Lanz, Heiko Schoof, Dan Nettleton, Hans-Peter Piepho, Frank Hochholdinger Jul 2019

Dan Nettleton

Heterosis is the superior performance of F1 hybrids compared with their homozygous, genetically distinct parents. In this study, we monitored the transcriptomic divergence of the maize (Zea mays) inbred lines B73 and Mo17 and their reciprocal F1 hybrid progeny in primary roots under control and water deficit conditions simulated by polyethylene glycol treatment. Single-parent expression (SPE) of genes is an extreme instance of gene expression complementation, in which genes are active in only one of two parents but are expressed in both reciprocal hybrids. In this study, 1,997 genes only expressed in B73 and 2,024 genes …


Genomic Neighborhoods For Arabidopsis Retrotransposons: A Role For Targeted Integration In The Distribution Of The Metaviridae, Brooke D. Peterson-Burch, Dan Nettleton, Daniel F. Voytas Jul 2019

Dan Nettleton

Background: Retrotransposons are an abundant component of eukaryotic genomes. The high quality of the Arabidopsis thaliana genome sequence makes it possible to comprehensively characterize retroelement populations and explore factors that contribute to their genomic distribution.

Results: We identified the full complement of A. thaliana long terminal repeat (LTR) retroelements using RetroMap, a software tool that iteratively searches genome sequences for reverse transcriptases and then defines retroelement insertions. Relative ages of full-length elements were estimated by assessing sequence divergence between LTRs: the Pseudoviridae were significantly younger than the Metaviridae. All retroelement insertions were mapped onto the genome sequence and their distribution …


Empirical Bayes Analysis Of RNA-Seq Data For Detection Of Gene Expression Heterosis, Jarad Niemi, Eric Mittman, Will Landau, Dan Nettleton Jun 2019

Dan Nettleton

An important type of heterosis, known as hybrid vigor, refers to the enhancements in the phenotype of hybrid progeny relative to their inbred parents. Although hybrid vigor is extensively utilized in agriculture, its molecular basis is still largely unknown. In an effort to understand phenotypic heterosis at the molecular level, researchers are measuring transcript abundance levels of thousands of genes in parental inbred lines and their hybrid offspring using RNA sequencing (RNA-seq) technology. The resulting data allow researchers to search for evidence of gene expression heterosis as one potential molecular mechanism underlying heterosis of agriculturally important traits. The null hypotheses …


Penalized Nonparametric Scalar-On-Function Regression Via Principal Coordinates, Philip T. Reiss, David L. Miller, Pei-Shien Wu, Wen-Yu Hua Dec 2016

Philip T. Reiss

A number of classical approaches to nonparametric regression have recently been extended to the case of functional predictors. This paper introduces a new method of this type, which extends intermediate-rank penalized smoothing to scalar-on-function regression. The core idea is to regress the response on leading principal coordinates defined by a relevant distance among the functional predictors, while applying a ridge penalty. Our publicly available implementation, based on generalized additive modeling software, allows for fast optimal tuning parameter selection and for extensions to multiple functional predictors, exponential family-valued responses, and mixed-effects models. In an application to signature verification data, the proposed …


Depicting Estimates Using The Intercept In Meta-Regression Models: The Moving Constant Technique, Blair T. Johnson Dr., Tania B. Huedo-Medina Dr. Aug 2014

Blair T. Johnson

In any scientific discipline, the ability to portray research patterns graphically often aids greatly in interpreting a phenomenon. In part to depict phenomena, the statistics and capabilities of meta-analytic models have grown increasingly sophisticated. Accordingly, this article details how to move the constant in weighted meta-analysis regression models (viz. “meta-regression”) to illuminate the patterns in such models across a range of complexities. Although it is commonly ignored in practice, the constant (or intercept) in such models can be indispensable when it is not relegated to its usual static role. The moving constant technique makes possible estimates and confidence intervals at …


A General Framework For Infrastructure System Reliability Modelling And Analysis, Payam Mokhtarian, Mohammad-Reza Namazi-Rad, Tin Kin Ho, Mahmoud Efatmaneshnik Mar 2014

Payam Mokhtarian

An infrastructure system is inherently complex, with layers of both explicitly defined and hidden or subtle interfaces with other infrastructure systems and human users. High availability is desired, which implies stringent requirements on reliability and safety. Reliability analysis typically starts at the component or sub-system level and aggregates through the system functional hierarchy. Because of the system complexity, incorporating occurrences of all possible interactions and scenarios is not always practical, and failure data are often limited. Moreover, there are unobserved events among the sub-systems that are distributed either randomly or with a temporal trend. To facilitate reliability analysis amid the complex environment and uncertain …


From Amazon To Apple: Modeling Online Retail Sales, Purchase Incidence And Visit Behavior, Anastasios Panagiotelis, Michael S. Smith, Peter Danaher Dec 2013

Michael Stanley Smith

In this study we propose a multivariate stochastic model for website visit duration, page views, purchase incidence and the sale amount for online retailers. The model is constructed by composition from carefully selected distributions, and involves copula components. It allows for the strong nonlinear relationships between the sales and visit variables to be explored in detail, and can be used to construct sales predictions. The model is readily estimated using maximum likelihood, making it an attractive choice in practice given the large sample sizes that are commonplace in online retail studies. We examine a number of top-ranked U.S. online retailers, …


Spectral Density Shrinkage For High-Dimensional Time Series, Mark Fiecas, Rainer Von Sachs Dec 2013

Mark Fiecas

Time series data obtained from neurophysiological signals is often high-dimensional and the length of the time series is often short relative to the number of dimensions. Thus, it is difficult or sometimes impossible to compute statistics that are based on the spectral density matrix because these matrices are numerically unstable. In this work, we discuss the importance of regularization for spectral analysis of high-dimensional time series and propose shrinkage estimation for estimating high-dimensional spectral density matrices. The shrinkage estimator is derived from a penalized log-likelihood, and the optimal penalty parameter has a closed-form solution, which can be estimated using the …


Hierarchical Vector Auto-Regressive Models And Their Applications To Multi-Subject Effective Connectivity, Cristina Gorrostieta, Mark Fiecas, Hernando Ombao, Erin Burke, Steven Cramer Oct 2013

Mark Fiecas

Vector auto-regressive (VAR) models typically form the basis for constructing directed graphical models for investigating connectivity in a brain network with brain regions of interest (ROIs) as nodes. There are limitations in the standard VAR models. The number of parameters in the VAR model increases quadratically with the number of ROIs and linearly with the order of the model; because of this large number of parameters, the model can pose serious estimation problems. Moreover, when applied to imaging data, the standard VAR model does not account for variability in the connectivity structure across all subjects. In this paper, …
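
The parameter growth described in this abstract can be made concrete with a small count. The following is a minimal sketch, not taken from the paper: it assumes a standard VAR with one square coefficient matrix per lag and ignores intercepts and the noise covariance. The function name is hypothetical.

```python
def var_coefficient_count(num_rois: int, order: int) -> int:
    """Count autoregressive coefficients in a standard VAR(order) model.

    Assumption: one (num_rois x num_rois) coefficient matrix per lag;
    intercepts and the innovation covariance are not counted.
    """
    if num_rois < 1 or order < 1:
        raise ValueError("num_rois and order must be positive integers")
    return order * num_rois ** 2


# Doubling the number of ROIs quadruples the count (quadratic growth);
# doubling the order doubles it (linear growth).
for d, p in [(10, 1), (20, 1), (10, 2)]:
    print(f"ROIs={d}, order={p}: {var_coefficient_count(d, p)} coefficients")
```

Under these assumptions, even a modest network needs many coefficients per subject, which is the estimation problem the abstract refers to.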


Bayesian Approaches To Copula Modelling, Michael S. Smith Dec 2012

Michael Stanley Smith

Copula models have become one of the most widely used tools in the applied modelling of multivariate data. Similarly, Bayesian methods are increasingly used to obtain efficient likelihood-based inference. However, to date, there has been only limited use of Bayesian approaches in the formulation and estimation of copula models. This article aims to address this shortcoming in two ways. First, to introduce copula models and aspects of copula theory that are especially relevant for a Bayesian analysis. Second, to outline Bayesian approaches to formulating and estimating copula models, and their advantages over alternative methods. Copulas covered include Archimedean, copulas constructed …


Big Data And The Future, Sherri Rose Jul 2012

Sherri Rose

No abstract provided.


Modeling Dependence Using Skew T Copulas: Bayesian Inference And Applications, Michael S. Smith, Quan Gan, Robert Kohn Dec 2011

Michael Stanley Smith

[THIS IS AN AUGUST 2010 REVISION THAT REPLACES ALL PREVIOUS VERSIONS.]

We construct a copula from the skew t distribution of Sahu, Dey & Branco (2003). This copula can capture asymmetric and extreme dependence between variables, and is one of the few copulas that can do so and still be used in high dimensions effectively. However, it is difficult to estimate the copula model by maximum likelihood when the multivariate dimension is high, or when some or all of the marginal distributions are discrete-valued, or when the parameters in the marginal distributions and copula are estimated jointly. We therefore propose …


Estimation Of Copula Models With Discrete Margins Via Bayesian Data Augmentation, Michael S. Smith, Mohamad A. Khaled Dec 2011

Michael Stanley Smith

Estimation of copula models with discrete margins is known to be difficult beyond the bivariate case. We show how this can be achieved by augmenting the likelihood with latent variables, and computing inference using the resulting augmented posterior. To evaluate this we propose two efficient Markov chain Monte Carlo sampling schemes. One generates the latent variables as a block using a Metropolis-Hastings step with a proposal that is close to its target distribution; the other generates them one at a time. Our method applies to all parametric copulas where the conditional copula functions can be evaluated, not just elliptical copulas …


Modeling Multivariate Distributions Using Copulas: Applications In Marketing, Peter J. Danaher, Michael S. Smith Dec 2010

Michael Stanley Smith

In this research we introduce a new class of multivariate probability models to the marketing literature. Known as “copula models”, they have a number of attractive features. First, they permit the combination of any univariate marginal distributions that need not come from the same distributional family. Second, a particular class of copula models, called “elliptical copulas”, has the property that it increases in complexity at a much slower rate than existing multivariate probability models as the number of dimensions increases. Third, they are very general, encompassing a number of existing multivariate models, and provide a framework for generating many more. …


Bicycle Commuting In Melbourne During The 2000s Energy Crisis: A Semiparametric Analysis Of Intraday Volumes, Michael S. Smith, Goeran Kauermann Dec 2010

Michael Stanley Smith

Cycling is attracting renewed attention as a mode of transport in western urban environments, yet the determinants of usage are poorly understood. In this paper we investigate some of these using intraday bicycle volumes collected via induction loops located at ten bike paths in the city of Melbourne, Australia, between December 2005 and June 2008. The data are hourly counts at each location, with temporal and spatial disaggregation allowing for the impact of meteorology to be measured accurately for the first time. Moreover, during this period petrol prices varied dramatically and the data also provide a unique opportunity to assess …


The Generalized Shrinkage Estimator For The Analysis Of Functional Connectivity Of Brain Signals, Mark Fiecas, Hernando Ombao Dec 2010

Mark Fiecas

We develop a new statistical method for estimating functional connectivity between neurophysiological signals represented by a multivariate time series. We use partial coherence as the measure of functional connectivity. Partial coherence identifies the frequency bands that drive the direct linear association between any pair of channels. To estimate partial coherence, one would first need an estimate of the spectral density matrix of the multivariate time series. Parametric estimators of the spectral density matrix provide good frequency resolution but could be sensitive when the parametric model is misspecified. Smoothing-based nonparametric estimators are robust to model misspecification and are consistent but may …


An Introduction To Propensity-Score Methods For Reducing Confounding In Observational Studies, Peter C. Austin Dec 2010

Peter Austin

The propensity score is the probability of treatment assignment conditional on observed baseline characteristics. The propensity score allows one to design and analyze an observational (non-randomized) study so that it mimics some of the particular characteristics of a randomized controlled trial. In particular, the propensity score is a balancing score: conditional on the propensity score, the distribution of observed baseline covariates will be similar between treated and untreated subjects. We describe four different propensity score methods: matching on the propensity score, stratification on the propensity score, inverse probability of treatment weighting using the propensity score, and covariate adjustment using the …
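
As an illustration of one of the four methods named above, inverse probability of treatment weighting can be sketched in a few lines. This is a hedged example rather than the article's code: it assumes the propensity scores have already been estimated, uses the common weights for the average treatment effect (1/e for treated subjects, 1/(1-e) for untreated subjects), and the function name and toy values are hypothetical.

```python
def iptw_weights(treated, propensity):
    """Inverse probability of treatment weights for each subject.

    Assumptions: `treated` holds 0/1 indicators, `propensity` holds the
    already-estimated probability of treatment for the same subjects.
    A treated subject gets weight 1/e; an untreated subject gets 1/(1 - e).
    """
    weights = []
    for z, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 / e if z else 1.0 / (1.0 - e))
    return weights


# Toy data (made up for illustration): two treated and two untreated subjects.
toy_treated = [1, 0, 1, 0]
toy_propensity = [0.8, 0.2, 0.5, 0.5]
print(iptw_weights(toy_treated, toy_propensity))
```

Subjects whose observed treatment was unlikely given their covariates receive larger weights, which is how the weighted sample is made to resemble a randomized one.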


Modeling Longitudinal Data Using A Pair-Copula Decomposition Of Serial Dependence, Michael S. Smith, Aleksey Min, Carlos Almeida, Claudia Czado Nov 2010

Michael Stanley Smith

Copulas have proven to be very successful tools for the flexible modelling of cross-sectional dependence. In this paper we express the dependence structure of continuous-valued time series data using a sequence of bivariate copulas. This corresponds to a type of decomposition recently called a ‘vine’ in the graphical models literature, where each copula is termed a ‘pair-copula’. We propose a Bayesian approach for the estimation of this dependence structure for longitudinal data. Bayesian selection ideas are used to identify any independence pair-copulas, with the end result being a parsimonious representation of a time-inhomogeneous Markov process of varying order. Estimates are …


Men In Black: The Impact Of New Contracts On Football Referees’ Performances, Babatunde Buraimo, Alex Bryson, Rob Simmons Oct 2010

Dr Babatunde Buraimo

No abstract provided.


The 1905 Einstein Equation In A General Mathematical Analysis Model Of Quasars, Byron E. Bell May 2010

Byron E. Bell

The 1905 wave equation of Albert Einstein is a model that can be used in many areas, such as physics, applied mathematics, statistics, quantum chaos and financial mathematics. I will give a proof from the equation in A. Einstein’s paper “Zur Elektrodynamik bewegter Körper”. This will be done by removing the variable time (t) and the constant (c), the speed of light, from the above equation and looking at the factors that affect the model in a real analysis framework. The model is tested with the SDSS-DR5 Quasar Catalog (Schneider +, 2007). Keywords: direction cosine, apparent magnitudes of optical light; ultraviolet …


Fast Function-On-Scalar Regression With Penalized Basis Expansions, Philip T. Reiss, Lei Huang, Maarten Mennes Dec 2009

Lei Huang

Regression models for functional responses and scalar predictors are often fitted by means of basis functions, with quadratic roughness penalties applied to avoid overfitting. The fitting approach described by Ramsay and Silverman in the 1990s amounts to a penalized ordinary least squares (P-OLS) estimator of the coefficient functions. We recast this estimator as a generalized ridge regression estimator, and present a penalized generalized least squares (P-GLS) alternative. We describe algorithms by which both estimators can be implemented, with automatic selection of optimal smoothing parameters, in a more computationally efficient manner than has heretofore been available. We discuss pointwise confidence intervals …


Bayesian Inference For A Periodic Stochastic Volatility Model Of Intraday Electricity Prices, Michael S. Smith Dec 2009

Michael Stanley Smith

The Gaussian stochastic volatility model is extended to allow for periodic autoregressions (PAR) in both the level and log-volatility process. Each PAR is represented as a first order vector autoregression for a longitudinal vector of length equal to the period. The periodic stochastic volatility model is therefore expressed as a multivariate stochastic volatility model. Bayesian posterior inference is computed using a Markov chain Monte Carlo scheme for the multivariate representation. A circular prior that exploits the periodicity is suggested for the log-variance of the log-volatilities. The approach is applied to estimate a periodic stochastic volatility model for half-hourly electricity prices …


Bayesian Skew Selection For Multivariate Models, Michael S. Smith, Anastasios Panagiotelis Dec 2009

Michael Stanley Smith

We develop a Bayesian approach for the selection of skew in multivariate skew t distributions constructed through hidden conditioning in the manners suggested by either Azzalini and Capitanio (2003) or Sahu, Dey and Branco (2003). We show that the skew coefficients for each margin are the same for the standardized versions of both distributions. We introduce binary indicators to denote whether there is symmetry, or skew, in each dimension. We adopt a proper beta prior on each non-zero skew coefficient, and derive the corresponding prior on the skew parameters. In both distributions we show that as the degrees of freedom increases, …


Fast Function-On-Scalar Regression With Penalized Basis Expansions, Philip T. Reiss, Lei Huang, Maarten Mennes Dec 2009

Philip T. Reiss

Regression models for functional responses and scalar predictors are often fitted by means of basis functions, with quadratic roughness penalties applied to avoid overfitting. The fitting approach described by Ramsay and Silverman in the 1990s amounts to a penalized ordinary least squares (P-OLS) estimator of the coefficient functions. We recast this estimator as a generalized ridge regression estimator, and present a penalized generalized least squares (P-GLS) alternative. We describe algorithms by which both estimators can be implemented, with automatic selection of optimal smoothing parameters, in a more computationally efficient manner than has heretofore been available. We discuss pointwise confidence intervals …


A Statistical Framework For The Analysis Of ChIP-Seq Data, Pei Fen Kuan, Dongjun Chung, Guangjin Pan, James A. Thomson, Ron Stewart, Sunduz Keles Nov 2009

Sunduz Keles

Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) has revolutionized experiments for genome-wide profiling of DNA-binding proteins, histone modifications, and nucleosome occupancy. As the cost of sequencing is decreasing, many researchers are switching from microarray-based technologies (ChIP-chip) to ChIP-Seq for genome-wide study of transcriptional regulation. Despite its increasing and well-deserved popularity, there is little work that investigates and accounts for sources of biases in the ChIP-Seq technology. These biases typically arise from both the standard pre-processing protocol and the underlying DNA sequence of the generated data.

We study data from a naked DNA sequencing experiment, which sequences non-cross-linked DNA after deproteinizing and …


Are (The Log-Odds Of) Hospital Mortality Rates Normally Distributed In Ontario? Implications For Studying Variations In Outcomes Of Medical Care, Peter C. Austin Dec 2008

Peter Austin

Objective: Hierarchical regression models are used to examine variations in outcomes following the provision of medical care across providers. These models frequently assume a normal distribution for the provider-specific random effects. Poincaré said, “Everyone believes in the normal law, the experimenters because they imagine it a mathematical theorem, and the mathematicians because they think it an experimental fact”. Our objective was to examine the appropriateness of this assumption when examining variations in mortality.

Study design and setting: We used Bayesian model selection methods to compare hierarchical regression models in which the provider-specific random effects were either a normal distribution or …


Are Credit Constraints In Italy Really More Binding In The South?, Claudio Lupi Dec 2004

Claudio Lupi

This paper is motivated by a very practical question: are there significant geographical differences in the accessibility to the credit market on the part of Italian households? The investigation is carried out using a robust probit model. Estimation is carried out in a Bayesian framework. The results are somewhat surprising, showing that the area where households are more likely to be credit constrained is not the South, as could be easily imagined, but rather the highly developed and industrialized North-West.


Ensuring The Comparability Of Comparison Groups: Is Randomization Enough?, Vance Berger, Sherri Rose Dec 2003

Sherri Rose

It is widely believed that baseline imbalances in randomized trials must necessarily be random. In fact, there is a type of selection bias that can cause substantial, systematic and reproducible baseline imbalances of prognostic covariates even in properly randomized trials. It is possible, given complete data, to quantify both the susceptibility of a given trial to this type of selection bias and the extent to which selection bias appears to have caused either observable or unobservable baseline imbalances. Yet, in articles reporting on randomized trials, it is uncommon to find either these assessments or the information that would enable a …


Semiparametric Regression: An Exposition And Application To Print Advertising Data, Michael S. Smith, Robert Kohn, Sharat K. Mathur Dec 1999

Michael Stanley Smith

A new regression-based approach is proposed for modeling marketing databases. The approach is Bayesian and provides a number of significant improvements over current methods. Independent variables can enter into the model in either a parametric or nonparametric manner, significant variables can be identified from a large number of potential regressors, and an appropriate transformation of the dependent variable can be automatically selected from a discrete set of pre-specified candidate transformations. All these features are estimated simultaneously and automatically using a Bayesian hierarchical model coupled with a Gibbs sampling scheme. Being Bayesian, it is straightforward to introduce subjective information about …


Additive Nonparametric Regression With Autocorrelated Errors, Michael S. Smith, C Wong, Robert Kohn Dec 1997

Michael Stanley Smith

A Bayesian approach is presented for nonparametric estimation of an additive regression model with autocorrelated errors. Each of the potentially nonlinear components is modelled as a regression spline using many knots, while the errors are modelled by a high order stationary autoregressive process parameterised in terms of its autocorrelations. The distribution of significant knots and partial autocorrelations is accounted for using subset selection. Our approach also allows the selection of a suitable transformation of the dependent variable. All aspects of the model are estimated simultaneously using Markov chain Monte Carlo. It is shown empirically that the proposed approach works well …