Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Multivariate Analysis

Articles 121 - 150 of 160

Full-Text Articles in Statistical Models

Mixture Of Factor Analyzers With Information Criteria And The Genetic Algorithm, Esra Turan Aug 2010

Doctoral Dissertations

In this dissertation, we have developed and combined several statistical techniques in Bayesian factor analysis (BAYFA) and mixture of factor analyzers (MFA) to overcome the shortcomings of these existing methods. Information Criteria are brought into the context of the BAYFA model as a decision rule for choosing the number of factors m along with the Press and Shigemasu method, Gibbs Sampling and Iterated Conditional Modes deterministic optimization. Because of the sensitivity of BAYFA to the prior information of the factor pattern structure, the prior factor pattern structure is learned directly and adaptively from the given sample data using the Sparse Root algorithm. …


A Unified Approach To Modeling Multivariate Binary Data Using Copulas Over Partitions, Bruce J. Swihart, Brian Caffo, Ciprian Crainiceanu Jul 2010

Johns Hopkins University, Dept. of Biostatistics Working Papers

Many seemingly disparate approaches for marginal modeling have been developed in recent years. We demonstrate that many current approaches for marginal modeling of correlated binary outcomes produce likelihoods that are equivalent to the proposed copula-based models herein. These general copula models of underlying latent threshold random variables yield likelihood based models for marginal fixed effects estimation and interpretation in the analysis of correlated binary data. Moreover, we propose a nomenclature and set of model relationships that substantially elucidates the complex area of marginalized models for binary data. A diverse collection of didactic mathematical and numerical examples are given to illustrate …


Statistical Analysis Of Texas Holdem Poker, Daniel Bragonier Jun 2010

Statistics

Gathered lifetime online poker data for Mike Linn. Attempted to analyze the data to obtain information to maximize profit. Techniques included univariate analysis, regression analysis, ANOVA, logistic regression, and outlier analysis. After the analysis, nothing of supreme importance or substance was found. Encountered issues with too much power. Results led to plenty of statistical significance, but little practical significance. Results showed that the data did not provide all the answers that were being sought, but there was some value in examining the data in a strict statistical manner.
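
The "too much power" issue mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the project itself: the effect size, standard deviation, and sample sizes below are hypothetical, and a two-sample z-test with known standard deviation is assumed purely for simplicity. The point is that the same negligible effect moves from "not significant" to "overwhelmingly significant" as the sample grows, while its practical size stays the same.

```python
import math


def two_sample_z(mean_diff, sd, n_per_group):
    """Return (z, two-sided p-value) for a difference in means.

    Assumes two equal-sized groups with a common, known standard deviation.
    """
    se = sd * math.sqrt(2.0 / n_per_group)    # standard error of the difference
    z = mean_diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal tail probability
    return z, p


def cohens_d(mean_diff, sd):
    """Standardized effect size; it does not depend on the sample size."""
    return mean_diff / sd


# Hypothetical numbers: a difference of 0.02 standard deviations.
effect = cohens_d(0.02, 1.0)
small_n = two_sample_z(0.02, 1.0, 100)          # modest sample
large_n = two_sample_z(0.02, 1.0, 1_000_000)    # very large sample

print("effect size d =", effect)
print("n = 100 per group:       z = %.2f, p = %.3g" % small_n)
print("n = 1,000,000 per group: z = %.2f, p = %.3g" % large_n)
```

Reporting an effect size alongside the p-value, as `cohens_d` does here, is one common way to keep statistical and practical significance separate.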


Survival Prediction For Brain Tumor Patients Using Gene Expression Data, Vinicius Bonato May 2010

Dissertations & Theses (Open Access)

Brain tumor is one of the most aggressive types of cancer in humans, with an estimated median survival time of 12 months and only 4% of the patients surviving more than 5 years after disease diagnosis. Until recently, brain tumor prognosis has been based only on clinical information such as tumor grade and patient age, but there are reports indicating that molecular profiling of gliomas can reveal subgroups of patients with distinct survival rates. We hypothesize that coupling molecular profiling of brain tumors with clinical information might improve predictions of patient survival time and, consequently, better guide future treatment decisions. …


Wavelet-Based Functional Linear Mixed Models: An Application To Measurement Error–Corrected Distributed Lag Models, Elizabeth J. Malloy, Jeffrey S. Morris, Sara D. Adar, Helen Suh, Diane R. Gold, Brent A. Coull Jan 2010

Jeffrey S. Morris

Frequently, exposure data are measured over time on a grid of discrete values that collectively define a functional observation. In many applications, researchers are interested in using these measurements as covariates to predict a scalar response in a regression setting, with interest focusing on the most biologically relevant time window of exposure. One example is in panel studies of the health effects of particulate matter (PM), where particle levels are measured over time. In such studies, there are many more values of the functional data than observations in the data set so that regularization of the corresponding functional regression coefficient …


Members’ Discoveries: Fatal Flaws In Cancer Research, Jeffrey S. Morris Jan 2010

Jeffrey S. Morris

A recent article published in The Annals of Applied Statistics (AOAS) by two MD Anderson researchers—Keith Baggerly and Kevin Coombes—dissects results from a highly influential series of medical papers involving genomics-driven personalized cancer therapy, and outlines a series of simple yet fatal flaws that raise serious questions about the veracity of the original results. Having immediate and strong impact, this paper, along with related work, is providing the impetus for new standards of reproducibility in scientific research.


Statistical Contributions To Proteomic Research, Jeffrey S. Morris, Keith A. Baggerly, Howard B. Gutstein, Kevin R. Coombes Jan 2010

Jeffrey S. Morris

Proteomic profiling has the potential to impact the diagnosis, prognosis, and treatment of various diseases. A number of different proteomic technologies are available that allow us to look at many proteins at once, and all of them yield complex data that raise significant quantitative challenges. Inadequate attention to these quantitative issues can prevent these studies from achieving their desired goals, and can even lead to invalid results. In this chapter, we describe various ways the involvement of statisticians or other quantitative scientists in the study team can contribute to the success of proteomic research, and we outline some of the …


Informatics And Statistics For Analyzing 2-D Gel Electrophoresis Images, Andrew W. Dowsey, Jeffrey S. Morris, Howard G. Gutstein, Guang Z. Yang Jan 2010

Jeffrey S. Morris

Whilst recent progress in ‘shotgun’ peptide separation by integrated liquid chromatography and mass spectrometry (LC/MS) has enabled its use as a sensitive analytical technique, proteome coverage and reproducibility are still limited and obtaining enough replicate runs for biomarker discovery is a challenge. For these reasons, recent research demonstrates the continuing need for protein separation by two-dimensional gel electrophoresis (2-DE). However, with traditional 2-DE informatics, the digitized images are reduced to symbolic data through spot detection and quantification before proteins are compared for differential expression by spot matching. Recently, a more robust and automated paradigm has emerged where gels are directly …


Bayesian Random Segmentation Models To Identify Shared Copy Number Aberrations For Array CGH Data, Veerabhadran Baladandayuthapani, Yuan Ji, Rajesh Talluri, Luis E. Nieto-Barajas, Jeffrey S. Morris Jan 2010

Jeffrey S. Morris

Array-based comparative genomic hybridization (aCGH) is a high-resolution high-throughput technique for studying the genetic basis of cancer. The resulting data consists of log fluorescence ratios as a function of the genomic DNA location and provides a cytogenetic representation of the relative DNA copy number variation. Analysis of such data typically involves estimation of the underlying copy number state at each location and segmenting regions of DNA with similar copy number states. Most current methods proceed by modeling a single sample/array at a time, and thus fail to borrow strength across multiple samples to infer shared regions of copy number aberrations. …


Bayesian Inference For A Periodic Stochastic Volatility Model Of Intraday Electricity Prices, Michael S. Smith Dec 2009

Michael Stanley Smith

The Gaussian stochastic volatility model is extended to allow for periodic autoregressions (PAR) in both the level and log-volatility process. Each PAR is represented as a first order vector autoregression for a longitudinal vector of length equal to the period. The periodic stochastic volatility model is therefore expressed as a multivariate stochastic volatility model. Bayesian posterior inference is computed using a Markov chain Monte Carlo scheme for the multivariate representation. A circular prior that exploits the periodicity is suggested for the log-variance of the log-volatilities. The approach is applied to estimate a periodic stochastic volatility model for half-hourly electricity prices …


Bayesian Skew Selection For Multivariate Models, Michael S. Smith, Anastasios Panagiotelis Dec 2009

Michael Stanley Smith

We develop a Bayesian approach for the selection of skew in multivariate skew t distributions constructed through hidden conditioning in the manners suggested by either Azzalini and Capitanio (2003) or Sahu, Dey and Branco (2003). We show that the skew coefficients for each margin are the same for the standardized versions of both distributions. We introduce binary indicators to denote whether there is symmetry, or skew, in each dimension. We adopt a proper beta prior on each non-zero skew coefficient, and derive the corresponding prior on the skew parameters. In both distributions we show that as the degrees of freedom increases, …


A Statistical Framework For The Analysis Of ChIP-Seq Data, Pei Fen Kuan, Dongjun Chung, Guangjin Pan, James A. Thomson, Ron Stewart, Sunduz Keles Nov 2009

Sunduz Keles

Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) has revolutionized experiments for genome-wide profiling of DNA-binding proteins, histone modifications, and nucleosome occupancy. As the cost of sequencing is decreasing, many researchers are switching from microarray-based technologies (ChIP-chip) to ChIP-Seq for genome-wide study of transcriptional regulation. Despite its increasing and well-deserved popularity, there is little work that investigates and accounts for sources of biases in the ChIP-Seq technology. These biases typically arise from both the standard pre-processing protocol and the underlying DNA sequence of the generated data.

We study data from a naked DNA sequencing experiment, which sequences non-cross-linked DNA after deproteinizing and …


Using Cone Beam Computed Tomography To Identify A Prediction Model For Obstructive Sleep Apnea, Jodi Parker Sep 2009

Loma Linda University Electronic Theses, Dissertations & Projects

Introduction: Obstructive Sleep Apnea (OSA) patients have increased risk of morbidity and mortality. Early diagnosis may reduce morbidity and mortality. Prediction of OSA from imaging may help to identify OSA patients earlier in life. CBCT can be used for OSA diagnostic imaging due to its three-dimensional (3D) visualization of the upper airway and craniofacial complex. Magnification associated with conventional 2D radiography is eliminated with CBCT, and radiation to the patient is significantly less than previous modalities used to measure craniofacial & airway measurements associated with OSA. During a CBCT scan, the patient's image is taken supine, rather than the upright …


Correlated Binary Regression Using Orthogonalized Residuals, Richard C. Zink, Bahjat F. Qaqish Mar 2009

COBRA Preprint Series

This paper focuses on marginal regression models for correlated binary responses when estimation of the association structure is of primary interest. A new estimating function approach based on orthogonalized residuals is proposed. This procedure allows a new representation and addresses some of the difficulties of the conditional-residual formulation of alternating logistic regressions of Carey, Zeger & Diggle (1993). The new method is illustrated with an analysis of data on impaired pulmonary function.


Group Comparison Of Eigenvalues And Eigenvectors Of Diffusion Tensors, Armin Schwartzman, Robert F. Dougherty, Jonathan E. Taylor Mar 2009

Harvard University Biostatistics Working Paper Series

No abstract provided.


Space-Time Regression Modeling Of Tree Growth Using The Skew-T Distribution, Farouk S. Nathoo Dec 2008

COBRA Preprint Series

In this article we present new statistical methodology for the analysis of repeated measures of spatially correlated growth data. Our motivating application, a ten-year study of height growth in a plantation of even-aged white spruce, presents several challenges for statistical analysis. Here, the growth measurements arise from an asymmetric distribution with heavy tails, and thus standard longitudinal regression models based on a Gaussian error structure are not appropriate. We seek more flexibility for modeling both skewness and fat tails, and achieve this within the class of skew-elliptical distributions. Within this framework, robust space-time regression models are formulated using random …


Limitations Of Remotely-Sensed Aerosol As A Spatial Proxy For Fine Particulate Matter, Christopher J. Paciorek, Yang Liu Sep 2008

Harvard University Biostatistics Working Paper Series

Recent research highlights the promise of remotely-sensed aerosol optical depth (AOD) as a proxy for ground-level PM2.5. Particular interest lies in the information on spatial heterogeneity potentially provided by AOD, with important application to estimating and monitoring pollution exposure for public health purposes. Given the temporal and spatio-temporal correlations reported between AOD and PM2.5, it is tempting to interpret the spatial patterns in AOD as reflecting patterns in PM2.5. Here we find only limited spatial associations of AOD from three satellite retrievals with PM2.5 over the eastern U.S. at the daily and yearly levels in 2004. We then …


Using Regression Models To Analyze Randomized Trials: Asymptotically Valid Hypothesis Tests Despite Incorrectly Specified Models, Michael Rosenblum, Mark J. Van Der Laan Jan 2008

U.C. Berkeley Division of Biostatistics Working Paper Series

Regression models are often used to test for cause-effect relationships from data collected in randomized trials or experiments. This practice has deservedly come under heavy scrutiny, since commonly used models such as linear and logistic regression will often not capture the actual relationships between variables, and incorrectly specified models potentially lead to incorrect conclusions. In this paper, we focus on hypothesis tests of whether the treatment given in a randomized trial has any effect on the mean of the primary outcome, within strata of baseline variables such as age, sex, and health status. Our primary concern is ensuring that such …


Semiparametric Regression Of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines And Linear Mixed Models, Dawei Liu, Xihong Lin, Debashis Ghosh Nov 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


An Informative Bayesian Structural Equation Model To Assess Source-Specific Health Effects Of Air Pollution, Margaret C. Nikolov, Brent A. Coull, Paul J. Catalano, John J. Godleski Jul 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Mixed Multiplicative Factor Analysis Model For Air Pollution Exposure Assessment, Margaret C. Nikolov, Brent A. Coull, Paul J. Catalano, John J. Godleski Jul 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Semiparametric Bayesian Modeling Of Multivariate Average Bioequivalence, Pulak Ghosh Dr., Mithat Gonen May 2006

Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

Bioequivalence trials are usually conducted to compare two or more formulations of a drug. Simultaneous assessment of bioequivalence on multiple endpoints is called multivariate bioequivalence. Despite the fact that some tests for multivariate bioequivalence are suggested, current practice usually involves univariate bioequivalence assessments ignoring the correlations between the endpoints such as AUC and Cmax. In this paper we develop a semiparametric Bayesian test for bioequivalence under multiple endpoints. Specifically, we show how the correlation between the endpoints can be incorporated in the analysis and how this correlation affects the inference. Resulting estimates and posterior probabilities "borrow strength" from one another …


Semiparametric Latent Variable Regression Models For Spatio-Temporal Modeling Of Mobile Source Particles In The Greater Boston Area, Alexandros Gryparis, Brent A. Coull, Joel Schwartz, Helen H. Suh Apr 2006

Harvard University Biostatistics Working Paper Series

Traffic particle concentrations show considerable spatial variability within a metropolitan area. We consider latent variable semiparametric regression models for modeling the spatial and temporal variability of black carbon and elemental carbon concentrations in the greater Boston area. Measurements of these pollutants, which are markers of traffic particles, were obtained from several individual exposure studies conducted at specific household locations as well as 15 ambient monitoring sites in the city. The models allow for both flexible, nonlinear effects of covariates and for unexplained spatial and temporal variability in exposure. In addition, the different individual exposure studies recorded different surrogates of traffic …


Multiple Tests Of Association With Biological Annotation Metadata, Sandrine Dudoit, Sunduz Keles, Mark J. Van Der Laan Mar 2006

U.C. Berkeley Division of Biostatistics Working Paper Series

We propose a general and formal statistical framework for the multiple tests of associations between known fixed features of a genome and unknown parameters of the distribution of variable features of this genome in a population of interest. The known fixed gene-annotation profiles, corresponding to the fixed features of the genome, may concern Gene Ontology (GO) annotation, pathway membership, regulation by particular transcription factors, nucleotide sequences, or protein sequences. The unknown gene-parameter profiles, corresponding to the variable features of the genome, may be, for example, regression coefficients relating genome-wide transcript levels or DNA copy numbers to possibly censored biological and …


Test Statistics Null Distributions In Multiple Testing: Simulation Studies And Applications To Genomics, Katherine S. Pollard, Merrill D. Birkner, Mark J. Van Der Laan, Sandrine Dudoit Jul 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Multiple hypothesis testing problems arise frequently in biomedical and genomic research, for instance, when identifying differentially expressed or co-expressed genes in microarray experiments. We have developed generally applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for control of a broad class of Type I error rates, defined as tail probabilities and expected values for arbitrary functions of the numbers of false positives and rejected hypotheses (Dudoit and van der Laan, 2005; Dudoit et al., 2004a,b; Pollard and van der Laan, 2004; van der Laan et al., 2005, 2004a,b). As argued in the early article of Pollard and van der …


New Statistical Paradigms Leading To Web-Based Tools For Clinical/Translational Science, Knut M. Wittkowski May 2005

COBRA Preprint Series

As the field of functional genetics and genomics is beginning to mature, we become confronted with new challenges. The constant drop in price for sequencing and gene expression profiling as well as the increasing number of genetic and genomic variables that can be measured makes it feasible to address more complex questions. The success with rare diseases caused by single loci or genes has provided us with a proof-of-concept that new therapies can be developed based on functional genomics and genetics.

Common diseases, however, typically involve genetic epistasis, genomic pathways, and proteomic pattern. Moreover, to better understand the underlying biological …


Causal Inference In Longitudinal Studies With History-Restricted Marginal Structural Models, Romain Neugebauer, Mark J. Van Der Laan, Ira B. Tager Apr 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Causal Inference based on Marginal Structural Models (MSMs) is particularly attractive to subject-matter investigators because MSM parameters provide explicit representations of causal effects. We introduce History-Restricted Marginal Structural Models (HRMSMs) for longitudinal data for the purpose of defining causal parameters which may often be better suited for Public Health research. This new class of MSMs allows investigators to analyze the causal effect of a treatment on an outcome based on a fixed, shorter and user-specified history of exposure compared to MSMs. By default, the latter represents the treatment causal effect of interest based on a treatment history defined by the …


Survival Ensembles, Torsten Hothorn, Peter Buhlmann, Sandrine Dudoit, Annette M. Molinaro, Mark J. Van Der Laan Apr 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

We propose a unified and flexible framework for ensemble learning in the presence of censoring. For right-censored data, we introduce a random forest algorithm and a generic gradient boosting algorithm for the construction of prognostic models. The methodology is utilized for predicting the survival time of patients suffering from acute myeloid leukemia based on clinical and genetic covariates. Furthermore, we compare the diagnostic capabilities of the proposed censored data random forest and boosting methods applied to the recurrence free survival time of node positive breast cancer patients with previously published findings.


Combining Predictors For Classification Using The Area Under The ROC Curve, Margaret S. Pepe, Tianxi Cai, Zheng Zhang, Gary M. Longton Jan 2005

UW Biostatistics Working Paper Series

No single biomarker for cancer is considered adequately sensitive and specific for cancer screening. It is expected that the results of multiple markers will need to be combined in order to yield adequately accurate classification. Typically the objective function that is optimized for combining markers is the likelihood function. In this paper we consider an alternative objective function -- the area under the empirical receiver operating characteristic curve (AUC). We note that it yields consistent estimates of parameters in a generalized linear model for the risk score but does not require specifying the link function. Like logistic regression it yields …
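
The idea of using the empirical AUC as an objective function for combining markers can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the toy marker values are invented, the helper names are hypothetical, and a plain grid search over directions stands in for whatever optimization the paper uses. The empirical AUC is computed as the Mann-Whitney proportion of (case, control) pairs in which the case has the higher combined score, with ties counted as one half.

```python
import math


def empirical_auc(case_scores, control_scores):
    """Fraction of (case, control) pairs ranked correctly; ties count 1/2."""
    total = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                total += 1.0
            elif c == k:
                total += 0.5
    return total / (len(case_scores) * len(control_scores))


def combined_scores(points, theta):
    """Linear score cos(theta) * x1 + sin(theta) * x2 for two markers."""
    return [math.cos(theta) * x1 + math.sin(theta) * x2 for x1, x2 in points]


def best_direction(cases, controls, steps=360):
    """Grid search over directions; returns (best AUC, best theta)."""
    best_auc, best_theta = -1.0, 0.0
    for i in range(steps):
        theta = 2.0 * math.pi * i / steps
        auc = empirical_auc(combined_scores(cases, theta),
                            combined_scores(controls, theta))
        if auc > best_auc:
            best_auc, best_theta = auc, theta
    return best_auc, best_theta


# Invented toy data: (marker 1, marker 2) for diseased and non-diseased subjects.
cases = [(2.0, 1.1), (1.5, 2.5), (3.0, 0.6), (2.5, 2.2)]
controls = [(1.0, 1.5), (0.5, 0.4), (2.0, 0.1), (1.2, 2.0)]

auc_marker1 = empirical_auc([x1 for x1, _ in cases], [x1 for x1, _ in controls])
best_auc, best_theta = best_direction(cases, controls)
print("AUC of marker 1 alone:", auc_marker1)
print("best combined AUC on the grid:", best_auc)
```

Because the AUC depends only on the ranking of the scores, rescaling the coefficients by a positive constant leaves it unchanged, which is why the sketch searches over directions rather than over unconstrained coefficient pairs.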


Spatially Adaptive Bayesian P-Splines With Heteroscedastic Errors, Ciprian M. Crainiceanu, David Ruppert, Raymond J. Carroll Nov 2004

Johns Hopkins University, Dept. of Biostatistics Working Papers

An increasingly popular tool for nonparametric smoothing are penalized splines (P-splines) which use low-rank spline bases to make computations tractable while maintaining accuracy as good as smoothing splines. This paper extends penalized spline methodology by both modeling the variance function nonparametrically and using a spatially adaptive smoothing parameter. These extensions have been studied before, but never together and never in the multivariate case. This combination is needed for satisfactory inference and can be implemented effectively by Bayesian MCMC. The variance process controlling the spatially-adaptive shrinkage of the mean and the variance of the heteroscedastic error process are modeled as log-penalized …