Open Access. Powered by Scholars. Published by Universities.®

Multivariate Analysis Commons

COBRA

Articles 1 - 30 of 78

Full-Text Articles in Multivariate Analysis

Generalized Matrix Decomposition Regression: Estimation And Inference For Two-Way Structured Data, Yue Wang, Ali Shojaie, Tim Randolph, Jing Ma Dec 2019

UW Biostatistics Working Paper Series

Analysis of two-way structured data, i.e., data with structures among both variables and samples, is becoming increasingly common in ecology, biology and neuroscience. Classical dimension-reduction tools, such as the singular value decomposition (SVD), may perform poorly for two-way structured data. The generalized matrix decomposition (GMD, Allen et al., 2014) extends the SVD to two-way structured data and thus constructs singular vectors that account for both structures. While the GMD is a useful dimension-reduction tool for exploratory analysis of two-way structured data, it is unsupervised and cannot be used to assess the association between such data and an outcome of interest. …


Statistical Inference For Networks Of High-Dimensional Point Processes, Xu Wang, Mladen Kolar, Ali Shojaie Dec 2019

UW Biostatistics Working Paper Series

Fueled in part by recent applications in neuroscience, high-dimensional Hawkes processes have become a popular tool for modeling the network of interactions among multivariate point process data. While evaluating the uncertainty of the network estimates is critical in scientific applications, existing methodological and theoretical work has focused only on estimation. To bridge this gap, this paper proposes a high-dimensional statistical inference procedure with theoretical guarantees for multivariate Hawkes processes. Key to this inference procedure is a new concentration inequality on the first- and second-order statistics for integrated stochastic processes, which summarize the entire history of the process. We apply this …


Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan Mar 2019

COBRA Preprint Series

One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes which provide insight into the disease's process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins resulting in enormous data sets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards …


Studying The Optimal Scheduling For Controlling Prostate Cancer Under Intermittent Androgen Suppression, Sunil K. Dhar, Hans R. Chaudhry, Bruce G. Bukiet, Zhiming Ji, Nan Gao, Thomas W. Findley Jan 2017

Harvard University Biostatistics Working Paper Series

This retrospective study shows that, for the majority of patients, the correlation between PSA and testosterone during the on-treatment period is at least 0.90. Model-based duration calculations to control PSA levels during off-treatment are provided. There are two pairs of models. In one pair, the Generalized Linear Model and Mixed Model are both used to analyze the variability of PSA at the individual patient level by using the variable “Patient ID” as a repeated measure. In the second pair, Patient ID is not used as a repeated measure but additional baseline variables are included to analyze the variability of PSA.


Hpcnmf: A High-Performance Toolbox For Non-Negative Matrix Factorization, Karthik Devarajan, Guoli Wang Feb 2016

COBRA Preprint Series

Non-negative matrix factorization (NMF) is a widely used machine learning algorithm for dimension reduction of large-scale data. It has found successful applications in a variety of fields such as computational biology, neuroscience, natural language processing, information retrieval, image processing and speech recognition. In bioinformatics, for example, it has been used to extract patterns and profiles from genomic and text-mining data as well as in protein sequence and structure analysis. While the scientific performance of NMF is very promising in dealing with high dimensional data sets and complex data structures, its computational cost is high and sometimes could be critical for …


Models For HSV Shedding Must Account For Two Levels Of Overdispersion, Amalia Magaret Jan 2016

UW Biostatistics Working Paper Series

We have frequently implemented crossover studies to evaluate new therapeutic interventions for genital herpes simplex virus infection. The outcome measured to assess the efficacy of interventions on herpes disease severity is the viral shedding rate, defined as the frequency of detection of HSV on the genital skin and mucosa. We performed a simulation study to ascertain whether our standard model, which we have used previously, was appropriately considering all the necessary features of the shedding data to provide correct inference. We simulated shedding data under our standard, validated assumptions and assessed the ability of 5 different models to reproduce the …


Joint Estimation Of Multiple Graphical Models From High Dimensional Time Series, Huitong Qiu, Fang Han, Han Liu, Brian Caffo Nov 2013

Johns Hopkins University, Dept. of Biostatistics Working Papers

In this manuscript the problem of jointly estimating multiple graphical models in high dimensions is considered. It is assumed that the data are collected from n subjects, each of which consists of m non-independent observations. The graphical models of subjects vary, but are assumed to change smoothly corresponding to a measure of the closeness between subjects. A kernel based method for jointly estimating all graphical models is proposed. Theoretically, under a double asymptotic framework, where both (m,n) and the dimension d can increase, the explicit rate of convergence in parameter estimation is provided, thus characterizing the strength one can borrow …


Sparse Median Graphs Estimation In A High Dimensional Semiparametric Model, Fang Han, Han Liu, Brian Caffo Oct 2013

Johns Hopkins University, Dept. of Biostatistics Working Papers

In this manuscript a unified framework for conducting inference on complex aggregated data in high dimensional settings is proposed. The data are assumed to be a collection of multiple non-Gaussian realizations with underlying undirected graphical structures. Utilizing the concept of median graphs in summarizing the commonality across these graphical structures, a novel semiparametric approach to modeling such complex aggregated data is provided along with robust estimation of the median graph, which is assumed to be sparse. The estimator is proved to be consistent in graph recovery and an upper bound on the rate of convergence is given. Experiments on both …


A Bayesian Regression Tree Approach To Identify The Effect Of Nanoparticles Properties On Toxicity Profiles, Cecile Low-Kam, Haiyuan Zhang, Zhaoxia Ji, Tian Xia, Jeffrey I. Zinc, Andre Nel, Donatello Telesca Mar 2013

COBRA Preprint Series

We introduce a Bayesian multiple regression tree model to characterize relationships between physico-chemical properties of nanoparticles and their in-vitro toxicity over multiple doses and times of exposure. Unlike conventional models that rely on data summaries, our model solves the low sample size issue and avoids arbitrary loss of information by combining all measurements from a general exposure experiment across doses, times of exposure, and replicates. The proposed technique integrates Bayesian trees for modeling threshold effects and interactions, and penalized B-splines for dose and time-response surfaces smoothing. The resulting posterior distribution is sampled via a Markov Chain Monte Carlo algorithm. This …


Differential Patterns Of Interaction And Gaussian Graphical Models, Masanao Yajima, Donatello Telesca, Yuan Ji, Peter Muller Apr 2012

COBRA Preprint Series

We propose a methodological framework to assess heterogeneous patterns of association amongst components of a random vector expressed as a Gaussian directed acyclic graph. The proposed framework is likely to be useful when primary interest focuses on potential contrasts characterizing the association structure between known subgroups of a given sample. We provide inferential frameworks as well as an efficient computational algorithm to fit such a model and illustrate its validity through a simulation. We apply the model to Reverse Phase Protein Array data on Acute Myeloid Leukemia patients to show the contrast of association structure between refractory patients and relapsed …


Toxicity Profiling Of Engineered Nanomaterials Via Multivariate Dose Response Surface Modeling, Trina Patel, Donatello Telesca, Saji George, Andre Nel Dec 2011

COBRA Preprint Series

New generation in-vitro high throughput screening (HTS) assays for the assessment of engineered nanomaterials provide an opportunity to learn how these particles interact at the cellular level, particularly in relation to injury pathways. These types of assays are often characterized by small sample sizes, high measurement error and high dimensionality as multiple cytotoxicity outcomes are measured across an array of doses and durations of exposure. In this article we propose a probability model for toxicity profiling of engineered nanomaterials. A hierarchical framework is used to account for the multivariate nature of the data by modeling dependence between outcomes and thereby …


Likelihood Based Population Independent Component Analysis, Ani Eloyan, Ciprian M. Crainiceanu, Brian S. Caffo Nov 2011

Johns Hopkins University, Dept. of Biostatistics Working Papers

Independent component analysis (ICA) is a widely used technique for blind source separation, used heavily in several scientific research areas including acoustics, electrophysiology and functional neuroimaging. We propose a scalable two-stage iterative true group ICA methodology for analyzing population level fMRI data where the number of subjects is very large. The method is based on likelihood estimators of the underlying source densities and the mixing matrix. As opposed to many commonly used group ICA algorithms the proposed method does not require significant data reduction by a twofold singular value decomposition. In addition, the method can be applied to a large …


Multiple Testing Of Local Maxima For Detection Of Peaks In ChIP-Seq Data, Armin Schwartzman, Andrew Jaffe, Yulia Gavrilov, Clifford A. Meyer Aug 2011

Harvard University Biostatistics Working Paper Series

No abstract provided.


On The Covariate-Adjusted Estimation For An Overall Treatment Difference With Data From A Randomized Comparative Clinical Trial, Lu Tian, Tianxi Cai, Lihui Zhao, L. J. Wei Jul 2011

Harvard University Biostatistics Working Paper Series

No abstract provided.


A Unified Approach To Non-Negative Matrix Factorization And Probabilistic Latent Semantic Indexing, Karthik Devarajan, Guoli Wang, Nader Ebrahimi Jul 2011

COBRA Preprint Series

Non-negative matrix factorization (NMF) by the multiplicative updates algorithm is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into two matrices, W and H, each with nonnegative entries, V ~ WH. NMF has been shown to have a unique parts-based, sparse representation of the data. The nonnegativity constraints in NMF allow only additive combinations of the data which enables it to learn parts that have distinct physical representations in reality. In the last few years, NMF has been successfully applied in a variety of areas such as natural language processing, information retrieval, image processing, speech recognition …
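The multiplicative updates idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes the classical squared-error (Frobenius) objective with Lee-Seung style updates, which is not necessarily the objective studied in the paper; the function names (`nmf`, `matmul`, `frobenius_error`), the small constant `eps`, and the example matrix are hypothetical choices made for illustration only.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of the residual V - WH."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix V by WH using multiplicative updates.

    Assumption: squared-error objective; eps guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Random positive starting values keep every later iterate nonnegative.
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


if __name__ == "__main__":
    V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0]]  # made-up data
    W, H = nmf(V, rank=2)
    print("reconstruction error:", frobenius_error(V, W, H))
```

Because each update only multiplies the current entries by a ratio of nonnegative quantities, W and H stay nonnegative throughout, which is the constraint the abstract credits for the additive, parts-based representation.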


Multiple Testing Of Local Maxima For Detection Of Unimodal Peaks In 1D, Armin Schwartzman, Yulia Gavrilov, Robert J. Adler Jul 2011

Harvard University Biostatistics Working Paper Series

No abstract provided.


Component Extraction Of Complex Biomedical Signal And Performance Analysis Based On Different Algorithm, Hemant Pasusangai Kasturiwale Jun 2011

Johns Hopkins University, Dept. of Biostatistics Working Papers

Biomedical signals can arise from one or many sources, including the heart, brain and endocrine systems. Multiple sources pose a challenge to researchers because the signals may be contaminated with artifacts and noise. Biomedical time series signals include the electroencephalogram (EEG), the electrocardiogram (ECG), etc. The morphology of the cardiac signal is very important in most diagnostics based on the ECG. A diagnosis based on visual observation of the recorded ECG, EEG, etc. may not be accurate. To achieve better understanding, PCA (Principal Component Analysis) and ICA algorithms help in analyzing ECG signals. The immense scope in the field of biomedical-signal processing Independent Component Analysis( …


Gains In Power From Structured Two-Sample Tests Of Means On Graphs, Laurent Jacob, Pierre Neuvial, Sandrine Dudoit Oct 2010

U.C. Berkeley Division of Biostatistics Working Paper Series

We consider multivariate two-sample tests of means, where the location shift between the two populations is expected to be related to a known graph structure. An important application of such tests is the detection of differentially expressed genes between two patient populations, as shifts in expression levels are expected to be coherent with the structure of graphs reflecting gene properties such as biological process, molecular function, regulation, or metabolism. For a fixed graph of interest, we demonstrate that accounting for graph structure can yield more powerful tests under the assumption of smooth distribution shift on the graph. We also investigate …


A Unified Approach To Modeling Multivariate Binary Data Using Copulas Over Partitions, Bruce J. Swihart, Brian Caffo, Ciprian Crainiceanu Jul 2010

Johns Hopkins University, Dept. of Biostatistics Working Papers

Many seemingly disparate approaches for marginal modeling have been developed in recent years. We demonstrate that many current approaches for marginal modeling of correlated binary outcomes produce likelihoods that are equivalent to the proposed copula-based models herein. These general copula models of underlying latent threshold random variables yield likelihood based models for marginal fixed effects estimation and interpretation in the analysis of correlated binary data. Moreover, we propose a nomenclature and set of model relationships that substantially elucidates the complex area of marginalized models for binary data. A diverse collection of didactic mathematical and numerical examples are given to illustrate …


The Effect Of Correlation In False Discovery Rate Estimation, Armin Schwartzman, Xihong Lin Jul 2009

Harvard University Biostatistics Working Paper Series

No abstract provided.


Correlated Binary Regression Using Orthogonalized Residuals, Richard C. Zink, Bahjat F. Qaqish Mar 2009

COBRA Preprint Series

This paper focuses on marginal regression models for correlated binary responses when estimation of the association structure is of primary interest. A new estimating function approach based on orthogonalized residuals is proposed. This procedure allows a new representation and addresses some of the difficulties of the conditional-residual formulation of alternating logistic regressions of Carey, Zeger & Diggle (1993). The new method is illustrated with an analysis of data on impaired pulmonary function.


Group Comparison Of Eigenvalues And Eigenvectors Of Diffusion Tensors, Armin Schwartzman, Robert F. Dougherty, Jonathan E. Taylor Mar 2009

Harvard University Biostatistics Working Paper Series

No abstract provided.


Sparse Linear Discriminant Analysis For Simultaneous Testing For The Significance Of A Gene Set/Pathway And Gene Selection, Michael C. Wu, Lingson Zhang, Zhaoxi Wang, David C. Christiani, Xihong Lin Jan 2009

Harvard University Biostatistics Working Paper Series

No abstract provided.


Space-Time Regression Modeling Of Tree Growth Using The Skew-T Distribution, Farouk S. Nathoo Dec 2008

COBRA Preprint Series

In this article we present new statistical methodology for the analysis of repeated measures of spatially correlated growth data. Our motivating application, a ten year study of height growth in a plantation of even-aged white spruce, presents several challenges for statistical analysis. Here, the growth measurements arise from an asymmetric distribution, with heavy tails, and thus standard longitudinal regression models based on a Gaussian error structure are not appropriate. We seek more flexibility for modeling both skewness and fat tails, and achieve this within the class of skew-elliptical distributions. Within this framework, robust space-time regression models are formulated using random …


Limitations Of Remotely-Sensed Aerosol As A Spatial Proxy For Fine Particulate Matter, Christopher J. Paciorek, Yang Liu Sep 2008

Harvard University Biostatistics Working Paper Series

Recent research highlights the promise of remotely-sensed aerosol optical depth (AOD) as a proxy for ground-level PM2.5. Particular interest lies in the information on spatial heterogeneity potentially provided by AOD, with important application to estimating and monitoring pollution exposure for public health purposes. Given the temporal and spatio-temporal correlations reported between AOD and PM2.5, it is tempting to interpret the spatial patterns in AOD as reflecting patterns in PM2.5. Here we find only limited spatial associations of AOD from three satellite retrievals with PM2.5 over the eastern U.S. at the daily and yearly levels in 2004. We then …


Model-Based Clustering Of Methylation Array Data: A Recursive-Partitioning Algorithm For High-Dimensional Data Arising As A Mixture Of Beta Distributions, E. Andres Houseman, Brock C. Christensen, Ru-Fang Yeh, Carmen J. Marsit, Margaret R. Karagas, Margaret Wrensch, Heather H. Nelson, Joseph Wiemels, Shichun Zheng, John K. Wiencke, Karl T. Kelsey Jun 2008

Harvard University Biostatistics Working Paper Series

No abstract provided.


Empirical Null And False Discovery Rate Inference For Exponential Families, Armin Schwartzman Feb 2008

Harvard University Biostatistics Working Paper Series

No abstract provided.


Using Regression Models To Analyze Randomized Trials: Asymptotically Valid Hypothesis Tests Despite Incorrectly Specified Models, Michael Rosenblum, Mark J. Van Der Laan Jan 2008

U.C. Berkeley Division of Biostatistics Working Paper Series

Regression models are often used to test for cause-effect relationships from data collected in randomized trials or experiments. This practice has deservedly come under heavy scrutiny, since commonly used models such as linear and logistic regression will often not capture the actual relationships between variables, and incorrectly specified models potentially lead to incorrect conclusions. In this paper, we focus on hypothesis tests of whether the treatment given in a randomized trial has any effect on the mean of the primary outcome, within strata of baseline variables such as age, sex, and health status. Our primary concern is ensuring that such …


On The Merits Of Voxel-Based Morphometric Path-Analysis For Investigating Volumetric Mediation Of A Toxicant's Influence On Cognitive Function, Shu-Chih Su, Brian S. Caffo, Lynn E. Eberly, Elizabeth Garrett-Mayer, Walter F. Stewart, Sining Chen, David Yousem, Christos Davatzikos, Brian Schwartz Jan 2008

Johns Hopkins University, Dept. of Biostatistics Working Papers

We previously showed that lifetime cumulative lead dose, measured as lead concentration in the tibia bone by X-ray fluorescence, was associated with persistent and progressive declines in cognitive function and with decreases in MRI-based brain volumes in former lead workers. Moreover, larger region-specific brain volumes were associated with better cognitive function. These findings motivated us to explore a novel application of path analysis to evaluate effect mediation. Voxel-wise path analysis, at face value, represents the natural evolution of voxel-based morphometry methods to answer questions of mediation. Application of these methods to the former lead worker data demonstrated potential limitations in …


Simultaneous Confidence Intervals Based On The Percentile Bootstrap Approach, Micha Mandel, Rebecca A. Betensky Jun 2007

Harvard University Biostatistics Working Paper Series

No abstract provided.