Articles 1 - 17 of 17

Full-Text Articles in Numerical Analysis and Computation

Flexible Distributed Lag Models Using Random Functions With Application To Estimating Mortality Displacement From Heat-Related Deaths, Roger D. Peng Dec 2011

Johns Hopkins University, Dept. of Biostatistics Working Papers

No abstract provided.


Caching And Visualizing Statistical Analyses, Roger D. Peng, Duncan Temple Lang Jun 2009

Johns Hopkins University, Dept. of Biostatistics Working Papers

We present the cacher and CodeDepends packages for R, which provide tools for (1) caching and analyzing the code for statistical analyses and (2) distributing these analyses to others in an efficient manner over the web. The cacher package takes objects created by evaluating R expressions and stores them in key-value databases. These databases of cached objects can subsequently be assembled into “cache packages” for distribution over the web. The cacher package also provides tools to help readers examine the data and code in a statistical analysis and reproduce, modify, or improve upon the results. In addition, readers can easily …


Efficient Evaluation Of Ranking Procedures When The Number Of Units Is Large With Application To Snp Identification, Thomas A. Louis, Ingo Ruczinski Feb 2009

Johns Hopkins University, Dept. of Biostatistics Working Papers

Simulation-based assessment is a popular and frequently necessary approach to evaluating statistical procedures. The opportunity to exploit underlying mathematical relations is sometimes overlooked, and we focus on this aspect. We show how to take advantage of large-sample theory when conducting a simulation, using the analysis of genomic data as a motivating example. The approach uses convergence results to provide an approximation to smaller-sample results that are available only by simulation. We consider evaluating and comparing a variety of ranking-based methods for identifying the most highly associated SNPs in a genome-wide association study, derive integral equation representations …
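
A minimal example of the kind of simulation-based assessment of a ranking procedure discussed above: estimate by Monte Carlo the probability that the unit with the largest true effect is also ranked first by its noisy estimate. This is an illustrative sketch, not the authors' integral-equation method:

```python
import random

def prob_top_ranked_correct(true_effects, se=1.0, n_sim=2000, seed=7):
    """Monte Carlo estimate of the probability that the unit with the
    largest true effect is ranked first by its estimate, when estimates
    are the true effects plus independent Gaussian noise."""
    rng = random.Random(seed)
    best = max(range(len(true_effects)), key=lambda i: true_effects[i])
    hits = 0
    for _ in range(n_sim):
        est = [mu + rng.gauss(0, se) for mu in true_effects]
        if max(range(len(est)), key=lambda i: est[i]) == best:
            hits += 1
    return hits / n_sim
```

With a well-separated top effect (e.g. `[0, 0, 0, 5]` with unit noise) the probability is close to 1; shrinking the gap drives it down, and with many units (as in a genome-wide study) exact simulation becomes expensive, which motivates the large-sample approximations of the paper.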


Caching And Distributing Statistical Analyses In R, Roger D. Peng Apr 2008

Johns Hopkins University, Dept. of Biostatistics Working Papers

We present the cacher package for R, which provides tools for caching statistical analyses and for distributing these analyses to others in an efficient manner. The cacher package takes objects created by evaluating R expressions and stores them in key-value databases. These databases of cached objects can subsequently be assembled into packages for distribution over the web. The cacher package also provides tools to help readers examine the data and code in a statistical analysis and reproduce, modify, or improve upon the results. In addition, readers can easily conduct alternate analyses of the data. We describe the design and implementation …


A Method For Visualizing Multivariate Time Series Data, Roger D. Peng Feb 2008

Johns Hopkins University, Dept. of Biostatistics Working Papers

Visualization and exploratory analysis are an important part of any data analysis and are made more challenging when the data are voluminous and high-dimensional. One such example is environmental monitoring data, which are often collected over time and at multiple locations, resulting in a geographically indexed multivariate time series. Financial data, although not necessarily containing a geographic component, present another source of high-volume multivariate time series data. We present the mvtsplot function, which provides a method for visualizing multivariate time series data. We outline the basic design concepts and provide some examples of its usage by applying it to a …
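
One common device for such image-style displays of many series at once is to discretize each series into a few ordered levels (e.g. low/medium/high by its own terciles) and plot the codes as colored cells, rows for series and columns for time. The sketch below shows that discretization step in Python; it is an illustration of the general idea, and the R mvtsplot function's details may differ:

```python
def tercile_codes(series):
    """Code one time series into 0/1/2 (low/medium/high) using its
    own terciles, so every series is on a comparable color scale."""
    s = sorted(series)
    n = len(s)
    q1, q2 = s[n // 3], s[(2 * n) // 3]
    return [0 if x < q1 else (1 if x < q2 else 2) for x in series]

def mvts_image(data):
    """Rows = series, columns = time points; each cell is a
    tercile code ready to be rendered as a colored image."""
    return [tercile_codes(row) for row in data]
```

Because each row is coded against its own distribution, series with very different scales can share one display without one series washing out the others.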


Bayesian Analysis For Penalized Spline Regression Using WinBUGS, Ciprian M. Crainiceanu, David Ruppert, M.P. Wand Dec 2007

Johns Hopkins University, Dept. of Biostatistics Working Papers

Penalized splines can be viewed as BLUPs in a mixed model framework, which allows the use of mixed model software for smoothing. Thus, software originally developed for Bayesian analysis of mixed models can be used for penalized spline regression. Bayesian inference for nonparametric models enjoys the flexibility of nonparametric models and the exact inference provided by the Bayesian inferential machinery. This paper provides a simple, yet comprehensive, set of programs for the implementation of nonparametric Bayesian analysis in WinBUGS. MCMC mixing is substantially improved over previous versions by using low-rank thin-plate splines instead of a truncated polynomial basis. Simulation time …
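
The BLUP connection mentioned above is the standard mixed-model representation of a penalized spline (notation ours, for orientation; the paper's exact formulation may differ):

```latex
y = X\beta + Zu + \varepsilon, \qquad
u \sim N(0, \sigma_u^2 I_K), \qquad
\varepsilon \sim N(0, \sigma_\varepsilon^2 I_n),
```

where the columns of $Z$ are spline basis functions evaluated at the design points and $u$ holds the penalized coefficients. The smoothing parameter is the variance ratio $\lambda = \sigma_\varepsilon^2 / \sigma_u^2$, which is why mixed-model software, and Bayesian MCMC software for mixed models, can estimate the amount of smoothing automatically.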


Distributed Reproducible Research Using Cached Computations, Roger Peng, Sandrah P. Eckel Jun 2007

Johns Hopkins University, Dept. of Biostatistics Working Papers

The ability to make scientific findings reproducible is increasingly important in areas where substantive results are the product of complex statistical computations. Reproducibility can allow others to verify the published findings and conduct alternate analyses of the same data. A question that naturally arises is how one can conduct and distribute reproducible research. This question is relevant from the point of view of both the authors who want to make their research reproducible and readers who want to reproduce relevant findings reported in the scientific literature. We present a framework in which reproducible research can be conducted and distributed via …


A Reproducible Research Toolkit For R, Roger Peng May 2007

Johns Hopkins University, Dept. of Biostatistics Working Papers

We present a collection of R packages for conducting and distributing reproducible research using R, Sweave, and LaTeX. The collection consists of the cacheSweave, stashR, and SRPM packages which allow for the caching of computations in Sweave documents and the distribution of those cached computations via remotely accessible key-value databases. We describe the caching mechanism used by the cacheSweave package and tools that we have developed for authors and readers for the purposes of creating and interacting with reproducible documents.


Interacting With Local And Remote Data Repositories Using The stashR Package, Sandrah P. Eckel, Roger Peng Dec 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

The stashR package (a Set of Tools for Administering SHared Repositories) for R implements a simple key-value style database where character string keys are associated with data values. The key-value databases can be either stored locally on the user's computer or accessed remotely via the Internet. Methods specific to the stashR package allow users to share data repositories or access previously created remote data repositories. In particular, methods are available for the S4 classes localDB and remoteDB to insert, retrieve, or delete data from the database as well as to synchronize local copies of the data to the remote version …


Interacting With Data Using The Filehash Package For R, Roger Peng Jun 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

The filehash package for R implements a simple key-value style database where character string keys are associated with data values that are stored on disk. A simple interface is provided for inserting, retrieving, and deleting data from the database. Utilities are provided that allow filehash databases to be treated much as environments and lists are used in R. These utilities are provided to encourage interactive and exploratory analysis on large datasets. Three different file formats for representing the database are currently available and new formats can easily be incorporated by third parties for use in the filehash framework.


Semiparametric Regression In Capture-Recapture Modelling, O. Gimenez, C. Barbraud, Ciprian M. Crainiceanu, S. Jenouvrier, B.T. Morgan Dec 2004

Johns Hopkins University, Dept. of Biostatistics Working Papers

Capture-recapture models were developed to estimate survival using data arising from marking and monitoring wild animals over time. Variation in the survival process may be explained by incorporating relevant covariates. We develop nonparametric and semiparametric regression models for estimating survival in capture-recapture models. A fully Bayesian approach using MCMC simulations was employed to estimate the model parameters. The work is illustrated by a study of snow petrels, in which survival probabilities are expressed as nonlinear functions of a climate covariate, using data from a 40-year study of marked individuals nesting at Petrels Island, Terre Adélie.


Spatially Adaptive Bayesian P-Splines With Heteroscedastic Errors, Ciprian M. Crainiceanu, David Ruppert, Raymond J. Carroll Nov 2004

Johns Hopkins University, Dept. of Biostatistics Working Papers

Penalized splines (P-splines), which use low-rank spline bases to make computations tractable while maintaining accuracy comparable to smoothing splines, are an increasingly popular tool for nonparametric smoothing. This paper extends penalized spline methodology by both modeling the variance function nonparametrically and using a spatially adaptive smoothing parameter. These extensions have been studied before, but never together and never in the multivariate case. This combination is needed for satisfactory inference and can be implemented effectively by Bayesian MCMC. The variance process controlling the spatially-adaptive shrinkage of the mean and the variance of the heteroscedastic error process are modeled as log-penalized …
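
A generic form of the model class described above, in our notation rather than the paper's exact specification:

```latex
y_i = m(x_i) + \sigma(x_i)\,\varepsilon_i, \qquad
\varepsilon_i \overset{iid}{\sim} N(0, 1),
```

where the mean function $m(\cdot)$ is a penalized spline whose shrinkage (smoothing parameter) is allowed to vary over the predictor space, and the log variance function $\log \sigma^2(\cdot)$ is itself modeled nonparametrically rather than assumed constant.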


Studying Effects Of Primary Care Physicians And Patients On The Trade-Off Between Charges For Primary Care And Specialty Care Using A Hierarchical Multivariate Two-Part Model, John W. Robinson, Scott L. Zeger, Christopher B. Forrest Aug 2004

Johns Hopkins University, Dept. of Biostatistics Working Papers

Objective. To examine effects of primary care physicians (PCPs) and patients on the association between charges for primary care and specialty care in a point-of-service (POS) health plan.

Data Source. Claims from 1996 for 3,308 adult male POS plan members, each of whom was assigned to one of the 50 family practitioner-PCPs with the largest POS plan member-loads.

Study Design. A hierarchical multivariate two-part model was fitted using a Gibbs sampler to estimate PCPs' effects on patients' annual charges for two types of services, primary care and specialty care, the associations among PCPs' effects, and within-patient associations between charges for …


A Hierarchical Multivariate Two-Part Model For Profiling Providers' Effects On Healthcare Charges, John W. Robinson, Scott L. Zeger, Christopher B. Forrest Aug 2004

Johns Hopkins University, Dept. of Biostatistics Working Papers

Procedures for analyzing and comparing healthcare providers' effects on health services delivery and outcomes have been referred to as provider profiling. In a typical profiling procedure, patient-level responses are measured for clusters of patients treated by providers that, in turn, can be regarded as statistically exchangeable. Thus, a hierarchical model naturally represents the structure of the data. When provider effects on multiple responses are profiled, a multivariate model, rather than a series of univariate models, can capture associations among responses at both the provider and patient levels. When responses are in the form of charges for healthcare services and sampled …
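
For orientation, a generic hierarchical two-part specification for charges (our notation; the paper's model is multivariate and may differ in detail): one part models whether any charge occurs, the other the amount given that it does,

```latex
\Pr(Y_{ij} > 0 \mid b_i) = \operatorname{logit}^{-1}\!\left(x_{ij}^\top \beta + b_{i1}\right),
\qquad
\log Y_{ij} \mid Y_{ij} > 0,\, b_i \sim N\!\left(x_{ij}^\top \gamma + b_{i2},\, \sigma^2\right),
```

where $Y_{ij}$ is the charge for patient $j$ of provider $i$ and the provider-level random effects $b_i = (b_{i1}, b_{i2})$ induce associations between the occurrence and amount parts; a multivariate version adds such pairs for each response type and lets all the random effects be jointly distributed.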


Kernel Estimation Of Rate Function For Recurrent Event Data, Chin-Tsang Chiang, Mei-Cheng Wang, Chiung-Yu Huang Dec 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

Recurrent event data are largely characterized by the rate function, but smoothing techniques for estimating the rate function have not been rigorously developed or studied in the statistical literature. This paper considers the moment and least squares methods for estimating the rate function from recurrent event data. With an independent censoring assumption on the recurrent event process, we study statistical properties of the proposed estimators and propose bootstrap procedures for the bandwidth selection and for the approximation of confidence intervals in the estimation of the occurrence rate function. It is identified that the moment method without resmoothing via a smaller bandwidth …


Cross-Calibration Of Stroke Disability Measures: Bayesian Analysis Of Longitudinal Ordinal Categorical Data Using Negative Dependence, Giovanni Parmigiani, Heidi W. Ashih, Gregory P. Samsa, Pamela W. Duncan, Sue Min Lai, David B. Matchar Aug 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

It is common to assess disability of stroke patients using standardized scales, such as the Rankin Stroke Outcome Scale (RS) and the Barthel Index (BI). The Rankin Scale, which was designed for applications to stroke, is based on directly assessing the patient's global condition. The Barthel Index, which was designed for general applications, is based on a series of questions about the patient's ability to carry out 10 basic activities of daily living. As both scales are commonly used, but few studies use both, translating between scales is important in gaining an overall understanding of the efficacy of …


Checking Assumptions In Latent Class Regression Models Via A Markov Chain Monte Carlo Estimation Approach: An Application To Depression And Socio-Economic Status, Elizabeth Garrett, Richard Miech, Pamela Owens, William W. Eaton, Scott L. Zeger Jan 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

Latent class regression models are useful tools for assessing associations between covariates and latent variables. However, evaluation of key model assumptions cannot be performed using methods from standard regression models due to the unobserved nature of latent outcome variables. This paper presents graphical diagnostic tools to evaluate whether or not latent class regression models adhere to standard assumptions of the model: conditional independence and non-differential measurement. An integral part of these methods is the use of a Markov Chain Monte Carlo estimation procedure. Unlike standard maximum likelihood implementations for latent class regression model estimation, the MCMC approach allows us to …
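
The conditional-independence assumption being diagnosed above is the standard latent class factorization (our notation):

```latex
\Pr(Y_1, \dots, Y_J \mid C = c) = \prod_{j=1}^{J} \Pr(Y_j \mid C = c),
```

where $C$ is the latent class and $Y_1, \dots, Y_J$ are the observed indicators; non-differential measurement additionally requires that covariates $x$ affect each indicator only through the latent class, i.e. $\Pr(Y_j \mid C, x) = \Pr(Y_j \mid C)$.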