Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 57

Full-Text Articles in Physical Sciences and Mathematics

Optimal Spatial Prediction Using Ensemble Machine Learning, Molly M. Davies, Mark J. Van Der Laan Dec 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

Spatial prediction is an important problem in many scientific disciplines. Super Learner is an ensemble prediction approach related to stacked generalization that uses cross-validation to search for the optimal predictor amongst all convex combinations of a heterogeneous candidate set. It has been applied to non-spatial data, where theoretical results demonstrate it will perform asymptotically at least as well as the best candidate under consideration. We review these optimality properties and discuss the assumptions required in order for them to hold for spatial prediction problems. We present results of a simulation study confirming Super Learner works well in practice under a …
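
The central idea in this abstract is choosing, by cross-validation, the convex combination of candidate predictors with the lowest risk. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation and not the SuperLearner package's API: the two candidate learners (a global mean and ordinary least squares), the 5-fold split, squared-error loss, and the grid search over weights are all simplifications, and the function names are hypothetical. It also ignores the spatial dependence that the paper's assumptions are concerned with.

```python
import numpy as np


def fit_mean(X, y):
    """Candidate learner 1 (hypothetical): ignore covariates, predict the training mean."""
    m = float(np.mean(y))
    return lambda X_new: np.full(len(X_new), m)


def fit_ols(X, y):
    """Candidate learner 2 (hypothetical): ordinary least squares with an intercept."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta


def super_learner_weights(X, y, learners, n_folds=5, grid_size=101, seed=0):
    """Convex weights for two learners that minimize cross-validated squared error."""
    if len(learners) != 2:
        raise ValueError("this grid-search sketch handles exactly two learners")
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds  # balanced random fold labels
    # Z[i, k]: prediction for observation i from learner k, fit without i's fold.
    Z = np.zeros((len(y), len(learners)))
    for v in range(n_folds):
        train, valid = folds != v, folds == v
        for k, fit in enumerate(learners):
            Z[valid, k] = fit(X[train], y[train])(X[valid])
    # Search the simplex {(w, 1 - w)} for the lowest cross-validated risk.
    best_weights, best_risk = None, np.inf
    for w in np.linspace(0.0, 1.0, grid_size):
        weights = np.array([w, 1.0 - w])
        risk = float(np.mean((y - Z @ weights) ** 2))
        if risk < best_risk:
            best_weights, best_risk = weights, risk
    return best_weights, best_risk


# Simulated example with a linear signal, so OLS should receive most of the weight.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)
weights, cv_risk = super_learner_weights(X, y, [fit_mean, fit_ols])
```

To predict at new locations, one would refit each candidate on all the data and combine their predictions with these weights. With more than two candidates, the grid search would typically be replaced by a constrained least-squares solve over the simplex.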


Relating Nanoparticle Properties To Biological Outcomes In Exposure Escalation Experiments, Trina Patel, Cecile Low-Kam, Zhaoxia Ji, Haiyuan Zhang, Tian Xia, Andre E. Nel, Jeffrey I. Zink, Donatello Telesca Dec 2012

COBRA Preprint Series

A fundamental goal in nano-toxicology is to identify the physical and chemical properties of particles that are likely to explain biological hazard. The first line of screening for potentially adverse outcomes often consists of exposure escalation experiments, in which micro-organisms or cell lines are exposed to a battery of nanomaterials. We discuss a modeling strategy that relates the outcome of an exposure escalation experiment to nanoparticle properties. Our approach makes use of a hierarchical decision process in which we jointly identify particles that initiate adverse biological outcomes and explain the probability of this event in terms of the particles' physico-chemical descriptors. The …


A Regionalized National Universal Kriging Model Using Partial Least Squares Regression For Estimating Annual PM2.5 Concentrations In Epidemiology, Paul D. Sampson, Mark Richards, Adam A. Szpiro, Silas Bergen, Lianne Sheppard, Timothy V. Larson, Joel Kaufman Dec 2012

UW Biostatistics Working Paper Series

Many cohort studies in environmental epidemiology require accurate modeling and prediction of fine scale spatial variation in ambient air quality across the U.S. This modeling requires the use of small spatial scale geographic or “land use” regression covariates and some degree of spatial smoothing. Furthermore, the details of the prediction of air quality by land use regression and the spatial variation in ambient air quality not explained by this regression should be allowed to vary across the continent due to the large scale heterogeneity in topography, climate, and sources of air pollution. This paper introduces a regionalized national universal kriging …


Sensitivity Analysis For Causal Inference Under Unmeasured Confounding And Measurement Error Problems, Iván Díaz, Mark J. Van Der Laan Dec 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

In this paper we present a sensitivity analysis for drawing inferences about parameters that are not estimable from observed data without additional assumptions. We present the methodology using two different examples: a causal parameter that is not identifiable due to violations of the randomization assumption, and a parameter that is not estimable in the nonparametric model due to measurement error. Existing methods for tackling these problems assume a parametric model for the type of violation to the identifiability assumption, and require the development of new estimators and inference for every new model. The method we present can be used in …


Computationally Efficient Confidence Intervals For Cross-Validated Area Under The ROC Curve Estimates, Erin LeDell, Maya L. Petersen, Mark J. Van Der Laan Dec 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

In binary classification problems, the area under the ROC curve (AUC) is an effective measure of model performance. Cross-validation is commonly used as well, to assess how the results will generalize to an independent data set. To evaluate the quality of a cross-validated AUC estimate, we must obtain an estimate of its variance. For massive data sets, generating even a single performance estimate can be computationally expensive. Additionally, when using a complex prediction method, calculating the cross-validated AUC on even a relatively small data set can still require a …
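
A variance estimate that avoids refitting (no bootstrap) can be built from the influence curve of the empirical AUC. The sketch below is an assumption-laden illustration, not the paper's code: it computes a DeLong-type influence curve for each validation fold, averages the fold AUCs, and pools the mean squared influence-curve values across folds. The pooling rule, the 1.96 normal quantile, and the function names are choices made here for illustration.

```python
import numpy as np


def auc_and_ic(scores, labels):
    """Empirical AUC and its estimated influence curve at each observation.

    Ties count as 1/2. Labels must be 0/1 with both classes present.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    p = labels.mean()  # proportion of positives
    # For each positive: fraction of negatives it outranks.
    h1 = np.array([(np.sum(s > neg) + 0.5 * np.sum(s == neg)) / len(neg) for s in pos])
    # For each negative: fraction of positives that outrank it.
    h0 = np.array([(np.sum(pos > s) + 0.5 * np.sum(pos == s)) / len(pos) for s in neg])
    auc = float(h1.mean())
    ic = np.empty(len(scores))
    ic[labels == 1] = (h1 - auc) / p
    ic[labels == 0] = (h0 - auc) / (1.0 - p)
    return auc, ic


def cv_auc_ci(fold_scores, fold_labels, z=1.96):
    """Cross-validated AUC with an influence-curve-based confidence interval."""
    aucs, ic_second_moments, n = [], [], 0
    for s, y in zip(fold_scores, fold_labels):
        auc, ic = auc_and_ic(s, y)
        aucs.append(auc)
        ic_second_moments.append(np.mean(ic ** 2))
        n += len(y)
    estimate = float(np.mean(aucs))
    se = float(np.sqrt(np.mean(ic_second_moments) / n))
    return estimate, (estimate - z * se, estimate + z * se)


# Simulated validation-fold predictions: 5 folds of 100 observations each.
rng = np.random.default_rng(0)
fold_labels = [rng.integers(0, 2, 100) for _ in range(5)]
fold_scores = [y + rng.normal(size=100) for y in fold_labels]
est, (lo, hi) = cv_auc_ci(fold_scores, fold_labels)
```

The cost is one pass over each validation fold's predictions, which is why this style of interval scales to large data sets where resampling would not.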


A National Model Built With Partial Least Squares And Universal Kriging And Bootstrap-Based Measurement Error Correction Techniques: An Application To The Multi-Ethnic Study Of Atherosclerosis, Silas Bergen, Lianne Sheppard, Paul D. Sampson, Sun-Young Kim, Mark Richards, Sverre Vedal, Joel Kaufman, Adam A. Szpiro Dec 2012

UW Biostatistics Working Paper Series

Studies estimating health effects of long-term air pollution exposure often use a two-stage approach, building exposure models to assign individual-level exposures which are then used in regression analyses. This requires accurate exposure modeling and careful treatment of exposure measurement error. To illustrate the importance of carefully accounting for exposure model characteristics in two-stage air pollution studies, we consider a case study based on data from the Multi-Ethnic Study of Atherosclerosis (MESA). We present national spatial exposure models that use partial least squares and universal kriging to estimate annual average concentrations of four PM2.5 components: elemental carbon (EC), organic carbon (OC), …


Nonparametric Inference For Meta Analysis With Fixed Unknown, Study-Specific Parameters, Brian Claggett, Minge Xie, Lu Tian Nov 2012

Harvard University Biostatistics Working Paper Series

No abstract provided.


Statistical Inference When Using Data Adaptive Estimators Of Nuisance Parameters, Mark J. Van Der Laan Nov 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

To be concrete, we focus on estimation of the treatment-specific mean, controlling for all measured baseline covariates, based on observing n independent and identically distributed copies of a random variable consisting of baseline covariates, a subsequently assigned binary treatment, and a final outcome. The statistical model assumes only possible restrictions on the conditional distribution of treatment given the covariates, the so-called propensity score. Estimators of the treatment-specific mean involve estimation of the propensity score and/or of the conditional mean of the outcome given the treatment and covariates. In order to make these estimators asymptotically …
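
For orientation, the two standard estimators of the treatment-specific mean E[Y(1)] that this setting refers to, inverse probability weighting and its augmented (double robust) form, are sketched below. This is a textbook illustration, not the paper's proposal: it plugs in the true propensity score and a deliberately wrong outcome regression, which sidesteps the data-adaptive nuisance estimation the paper actually studies. The simulated data-generating process is an assumption chosen so that E[Y(1)] = 2.

```python
import numpy as np


def iptw_mean(y, a, g1):
    """IPTW estimate of E[Y(1)]; g1[i] = P(A = 1 | W_i)."""
    return float(np.mean(a * y / g1))


def aipw_mean(y, a, g1, q1):
    """Augmented IPTW estimate of E[Y(1)]; q1[i] = E[Y | A = 1, W_i]."""
    return float(np.mean(q1 + a / g1 * (y - q1)))


rng = np.random.default_rng(0)
n = 200_000
w = rng.normal(size=n)
g1 = 1.0 / (1.0 + np.exp(-0.5 * w))   # true propensity score
a = rng.binomial(1, g1)
y = 1.0 + a + w + rng.normal(size=n)  # so E[Y(1)] = 1 + 1 + E[W] = 2
q1_wrong = np.full(n, 1.0)            # deliberately misspecified outcome regression

iptw_est = iptw_mean(y, a, g1)
aipw_est = aipw_mean(y, a, g1, q1_wrong)
```

Because the propensity score is correct here, the augmented estimator stays consistent despite the wrong outcome model; with both nuisance functions estimated by flexible machine learning, valid inference requires the additional conditions that the abstract goes on to discuss.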


Treatment Selections Using Risk-Benefit Profiles Based On Data From Comparative Randomized Clinical Trials With Multiple Endpoints, Brian Claggett, Lu Tian, Davide Castagno, L. J. Wei Nov 2012

Harvard University Biostatistics Working Paper Series

No abstract provided.


Likelihood Ratio Tests For The Mean Structure Of Correlated Functional Processes, Ana-Maria Staicu, Yingxing Li, Ciprian Crainiceanu, David M. Ruppert Nov 2012

Johns Hopkins University, Dept. of Biostatistics Working Papers

The paper introduces a general framework for testing hypotheses about the structure of the mean function of complex functional processes. Important special cases of the proposed framework are: 1) testing the null hypothesis that the mean of a functional process is parametric against a nonparametric alternative; and 2) testing the null hypothesis that the means of two possibly correlated functional processes are equal or differ by only a simple parametric function. A global pseudo likelihood ratio test is proposed and its asymptotic distribution is derived. The size and power properties of the test are confirmed in realistic simulation scenarios. Finite …


Longitudinal Functional Models With Structured Penalties, Madan G. Kundu, Jaroslaw Harezlak, Timothy W. Randolph Nov 2012

Johns Hopkins University, Dept. of Biostatistics Working Papers

The collection of functional data, including longitudinal observations, is becoming increasingly common in many studies. For example, we use magnetic resonance (MR) spectra collected over time from late-stage HIV patients. MR spectroscopy (MRS) produces a spectrum that is a mixture of metabolite spectra, instrument noise, and a baseline profile. Analysis of such data typically proceeds in two separate steps: feature extraction and regression modeling. In contrast, a recently proposed approach for functional linear models, called partially empirical eigenvectors for regression (PEER) (Randolph, Harezlak and Feng, 2012), incorporates a priori knowledge via a scientifically informed penalty operator in the regression function …


PLS-ROG: Partial Least Squares With Rank Order Of Groups, Hiroyuki Yamamoto Oct 2012

COBRA Preprint Series

Partial least squares (PLS), a supervised dimensionality reduction method, has been widely used in metabolomics. PLS can separate sample scores by group in a low-dimensional subspace. However, it cannot use information about the rank order of groups. Such information is often available, for example when the concentration of a drug administered to animals is varied gradually. In this study, we propose partial least squares for rank order of groups (PLS-ROG), which accounts for both the separation and the rank order of groups.
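
To make the baseline concrete, the sketch below computes the first component of ordinary PLS discriminant analysis (PLS-DA) with dummy-coded group labels: the first weight vector is the leading left singular vector of the cross-covariance matrix between centered X and centered Y. This shows only the standard PLS step that PLS-ROG extends; the rank-order component of PLS-ROG is not reproduced, and the function name and simulated data are hypothetical.

```python
import numpy as np


def first_pls_component(X, groups):
    """First PLS-DA weight vector and sample scores for a vector of class labels."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    labels = np.unique(groups)
    # Dummy-code the groups: one indicator column per group.
    Y = (np.asarray(groups)[:, None] == labels[None, :]).astype(float)
    Yc = Y - Y.mean(axis=0)
    # The weight maximizing covariance with Y is the leading left singular vector of Xc^T Yc.
    U, _, _ = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    w = U[:, 0]
    return w, Xc @ w


# Hypothetical data: three dose groups that differ only along the first variable.
rng = np.random.default_rng(0)
groups = np.repeat([0, 1, 2], 20)
X = rng.normal(scale=0.1, size=(60, 5))
X[:, 0] += groups
w, scores = first_pls_component(X, groups)
```

Nothing in this objective rewards keeping the groups in their dose order along the score axis; here the order happens to be preserved because the signal is linear in dose, which is exactly the information PLS-ROG is designed to exploit explicitly.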


Statistical Hypothesis Test Of Factor Loading In Principal Component Analysis And Its Application To Metabolite Set Enrichment Analysis, Hiroyuki Yamamoto, Tamaki Fujimori, Hajime Sato, Gen Ishikawa, Kenjiro Kami, Yoshiaki Ohashi Oct 2012

COBRA Preprint Series

Principal component analysis (PCA) has been widely used to visualize high-dimensional metabolomic data in a two- or three-dimensional subspace. In metabolomics, some metabolites (e.g., the top 10) are often selected subjectively on the basis of their factor loadings in PCA, and biological inferences are then made for these metabolites. However, this approach may lead to biased biological inferences, because the metabolites are not selected objectively by a statistical criterion. We propose a statistical procedure that selects metabolites by a statistical hypothesis test of their factor loadings in PCA and then makes biological inferences by metabolite set enrichment analysis (MSEA) for these significant metabolites. This procedure depends …
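
One common way to make a factor loading testable is to define it as the Pearson correlation between a variable and a principal component score, then apply the usual t statistic for a correlation coefficient. The sketch below assumes that definition and autoscaled (unit-variance) variables; both are assumptions for illustration, and the truncated abstract does not confirm the exact test the authors use. The resulting statistics would be compared with a t distribution on n - 2 degrees of freedom (for example via a statistics library) and adjusted for multiple testing before any enrichment analysis.

```python
import numpy as np


def loading_t_stats(X, component=0):
    """Loadings as correlations with a PC score, plus their t statistics (n - 2 df)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Xs = Xc / Xc.std(axis=0, ddof=1)  # autoscaling (an assumption of this sketch)
    U, S, _ = np.linalg.svd(Xs, full_matrices=False)
    score = U[:, component] * S[component]
    # Factor loading = Pearson correlation of each variable with the PC score.
    r = np.array([np.corrcoef(Xs[:, j], score)[0, 1] for j in range(X.shape[1])])
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    return r, t


# Hypothetical data: two strongly correlated metabolites and one unrelated one.
rng = np.random.default_rng(0)
z = rng.normal(size=100)
X = np.column_stack([
    z + 0.1 * rng.normal(size=100),
    z + 0.1 * rng.normal(size=100),
    rng.normal(size=100),
])
r, t = loading_t_stats(X)
```

The sign of a principal component is arbitrary, so only the magnitude of r and t is meaningful when ranking metabolites.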


Decline In Health For Older Adults: 5-Year Change In 13 Key Measures Of Standardized Health, Paula H. Diehr, Stephen M. Thielke, Anne B. Newman, Calvin H. Hirsch, Russell Tracy Oct 2012

UW Biostatistics Working Paper Series

Introduction

The health of older adults declines over time, but there are many ways of measuring health. We examined whether all measures declined at the same rate, or whether some aspects of health were less sensitive to aging than others.

Methods

We compared the decline in 13 measures of physical, mental, and functional health from the Cardiovascular Health Study: hospitalization, bed days, cognition, extremity strength, feelings about life as a whole, satisfaction with the purpose of life, self-rated health, depression, digit symbol substitution test, grip strength, ADLs, IADLs, and gait speed. Each measure was standardized against self-rated health. We compared …


Methods For Evaluating Prediction Performance Of Biomarkers And Tests, Margaret Pepe, Holly Janes Oct 2012

UW Biostatistics Working Paper Series

This chapter describes and critiques methods for evaluating the performance of markers to predict risk of a current or future clinical outcome. We consider three criteria that are important for evaluating a risk model: calibration, benefit for decision making and accurate classification. We also describe and discuss a variety of summary measures in common use for quantifying predictive information, such as the area under the ROC curve and R-squared. The roles of, and problems with, recently proposed risk reclassification approaches are discussed in detail.


The Impact Of Covariance Misspecification In Multivariate Gaussian Mixtures On Estimation And Inference: An Application To Longitudinal Modeling, Brianna C. Heggeseth, Nicholas P. Jewell Oct 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

Multivariate Gaussian mixtures are a class of models that provide a flexible parametric approach for the representation of heterogeneous multivariate outcomes. When the outcome is a vector of repeated measurements taken on the same subject, there is often inherent dependence between observations. However, a common covariance assumption is conditional independence: given the mixture component label, a subject's repeated outcomes are independent. In this paper, we study, through asymptotic bias calculations and simulation, the impact of covariance misspecification in multivariate Gaussian mixtures. Although maximum likelihood estimators of regression and mixing probability parameters are not consistent under misspecification, they have little …


Borrowing Information Across Populations In Estimating Positive And Negative Predictive Values, Ying Huang, Youyi Fong, John Wei, Ziding Feng Oct 2012

UW Biostatistics Working Paper Series

A marker's capacity to predict risk of a disease depends on disease prevalence in the target population and its classification accuracy, i.e. its ability to discriminate diseased subjects from non-diseased subjects. The latter is often considered an intrinsic property of the marker; it is independent of disease prevalence and hence more likely to be similar across populations than risk prediction measures. In this paper, we are interested in evaluating the population-specific performance of a risk prediction marker in terms of positive predictive value (PPV) and negative predictive value (NPV) at given thresholds, when samples are available from the target population …
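
The dependence of predictive values on prevalence that motivates this work follows directly from Bayes' rule. The sketch below computes PPV and NPV from sensitivity and specificity at a fixed threshold and a given prevalence; the numbers are illustrative only and are not taken from the paper.

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """Positive and negative predictive values via Bayes' rule."""
    tp = sensitivity * prevalence                 # true positives (as a proportion)
    fp = (1.0 - specificity) * (1.0 - prevalence) # false positives
    tn = specificity * (1.0 - prevalence)         # true negatives
    fn = (1.0 - sensitivity) * prevalence         # false negatives
    return tp / (tp + fp), tn / (tn + fn)


# Same classification accuracy, three hypothetical target populations.
for prev in (0.01, 0.10, 0.30):
    ppv, npv = ppv_npv(0.90, 0.90, prev)
    print(f"prevalence={prev:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With sensitivity and specificity both 0.90, the PPV is 0.500 at 10% prevalence but only about 0.083 at 1% prevalence. This is why accuracy measures may transfer across populations while PPV and NPV generally do not.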


Causal Inference For Networks, Mark J. Van Der Laan Oct 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

Suppose that we observe a population of causally connected units according to a network. On each unit we observe a set of potentially connected units that contains the true connections, and a longitudinal data structure that includes time-dependent exposure or treatment, time-dependent covariates, and a final outcome of interest. The target quantity is defined as the mean outcome for this group of units if the exposures of the units were probabilistically assigned according to a known, specified mechanism; such a mechanism is called a stochastic intervention. Causal effects of interest are defined as contrasts of the mean of …


Targeted Learning Of The Probability Of Success Of An In Vitro Fertilization Program Controlling For Time-Dependent Confounders, Antoine Chambaz, Sherri Rose, Jean Bouyer, Mark J. Van Der Laan Oct 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

Infertility is a global public health issue and various treatments are available. In vitro fertilization (IVF) is an increasingly common treatment method, but accurately assessing the success of IVF programs has proven challenging since they consist of multiple cycles. We present a double robust semiparametric method that incorporates machine learning to estimate the probability of success (i.e., delivery resulting from embryo transfer) of a program of at most four IVF cycles in the French Devenir Après Interruption de la FIV (DAIFI) study and several simulation studies, controlling for time-dependent confounders. We find that the probability of success in the DAIFI …


Assessing The Causal Effect Of Policies: An Approach Based On Stochastic Interventions, Iván Díaz, Mark J. Van Der Laan Oct 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

Stochastic interventions are a powerful tool to define parameters that measure the causal effect of a realistic intervention that intends to alter the population distribution of an exposure. In this paper we follow the approach described in Díaz and van der Laan (2011) to define and estimate the effect of an intervention that is expected to cause a truncation in the population distribution of the exposure. The observed data parameter that identifies the causal parameter of interest is established, as well as its efficient influence function under the nonparametric model. Inverse probability of treatment weighted (IPTW), augmented IPTW and …
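
For a truncation-type stochastic intervention, the IPTW weight has a simple form: if the intervened exposure density is the natural density g(a | w) truncated to a ≤ c and renormalized, then g*(a | w) / g(a | w) equals 1{a ≤ c} / P(A ≤ c | W = w). The sketch below assumes a Gaussian exposure model with known mean and variance, a single fixed cutoff, and hypothetical function names; it illustrates only the IPTW estimator named in the abstract, not the augmented or targeted estimators, and in practice g would have to be estimated.

```python
import math

import numpy as np


def normal_cdf(x):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(x) / math.sqrt(2.0)))


def iptw_truncation(y, a, mu, sigma, cutoff):
    """IPTW estimate of E[Y] had A been drawn from g(. | W) truncated to A <= cutoff.

    Assumes A | W ~ Normal(mu(W), sigma^2) with mu and sigma known (an assumption).
    """
    keep = (a <= cutoff).astype(float)
    prob_below = normal_cdf((cutoff - mu) / sigma)  # P(A <= cutoff | W)
    return float(np.mean(y * keep / prob_below))


# Simulated data under the assumed exposure and outcome models.
rng = np.random.default_rng(0)
n = 200_000
w = rng.normal(size=n)
a = w + rng.normal(size=n)  # A | W ~ Normal(W, 1)
y = a + w + rng.normal(size=n)
est = iptw_truncation(y, a, mu=w, sigma=1.0, cutoff=1.0)
```

The weights grow without bound as P(A ≤ c | W) approaches zero, which is one motivation for the augmented and efficient-influence-function-based estimators the paper considers.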


Quantifying Alternative Splicing From Paired-End RNA-Sequencing Data, David Rossell, Camille Stephan-Otto Attolini, Manuel Kroiss, Almond Stöcker Sep 2012

COBRA Preprint Series

RNA-sequencing has revolutionized biomedical research and, in particular, our ability to study gene alternative splicing. The problem has important implications for human health, as alternative splicing is involved in malfunctions at the cellular level and multiple diseases. However, the high-dimensional nature of the data and the existence of experimental biases pose serious data analysis challenges. We find that the standard data summaries used to study alternative splicing are severely limited, as they ignore a substantial amount of valuable information. Current data analysis methods are based on such summaries and are hence sub-optimal. Further, they have limited flexibility in accounting for …


Robust Estimation Of Pure/Natural Direct Effects With Mediator Measurement Error, Eric J. Tchetgen Tchetgen, Sheng Hsuan Lin Sep 2012

COBRA Preprint Series

Recent developments in causal mediation analysis have offered new notions of direct and indirect effects that formalize more traditional and informal notions of mediation analysis emanating primarily from the social sciences. The pure or natural direct effect of Robins-Greenland-Pearl quantifies the causal effect of an exposure that is not mediated by a variable on the causal pathway to the outcome, and combines with the natural indirect effect to produce the total causal effect of the exposure. Sufficient conditions for identifying natural direct effects, which assume certain independencies among potential outcomes, were previously given, and a rich literature on estimation …
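
For reference, under the usual no-unmeasured-confounding conditions and with the mediator measured without error, the natural direct effect of changing the exposure from a* to a is identified by the standard mediation formula below (notation chosen here: W baseline covariates, M the mediator, Y(a, m) potential outcomes; discrete W and M for simplicity). The paper's contribution concerns the case where M is mismeasured, which this display does not address.

```latex
\mathrm{NDE}(a, a^*) = \mathbb{E}\{Y(a, M(a^*))\} - \mathbb{E}\{Y(a^*, M(a^*))\}
  = \sum_{w}\sum_{m} \big[\mathbb{E}(Y \mid A=a, M=m, W=w) - \mathbb{E}(Y \mid A=a^*, M=m, W=w)\big]
    \, P(M=m \mid A=a^*, W=w)\, P(W=w)

\mathrm{NIE}(a, a^*) = \mathbb{E}\{Y(a, M(a))\} - \mathbb{E}\{Y(a, M(a^*))\},
\qquad \mathrm{NDE}(a, a^*) + \mathrm{NIE}(a, a^*) = \mathbb{E}\{Y(a)\} - \mathbb{E}\{Y(a^*)\}
```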


Robust Estimation Of Pure/Natural Direct Effects With Mediator Measurement Error, Eric J. Tchetgen Tchetgen, Sheng Hsuan Lin Sep 2012

Harvard University Biostatistics Working Paper Series

No abstract provided.


Targeted Learning For Causality And Statistical Analysis In Medical Research, Sherri Rose, Richard J.C.M. Starmans, Mark J. Van Der Laan Aug 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

The authors present the use of targeted learning methods for medical research, prepared as a chapter for the upcoming book "Statistics: Discovering Your Future Power." The targeted learning framework involves the explicit specification of the data, model, and parameter. The estimators are double robust and efficient, and can incorporate machine learning procedures such as the super learner.


Modeling Sleep Fragmentation In Populations Of Sleep Hypnograms, Bruce J. Swihart, Naresh M. Punjabi, Ciprian M. Crainiceanu Aug 2012

Johns Hopkins University, Dept. of Biostatistics Working Papers

We introduce methods for the analysis of large populations of sleep architectures (hypnograms) that respect the 5-state 20-transition-type structure defined by the American Academy of Sleep Medicine. By applying these methods to the hypnograms of 5598 subjects from the Sleep Heart Health Study we: 1) provide the first analysis of sleep hypnogram data of such size and complexity in a community cohort with a 4-level comorbidity; 2) compare 5-state 20-transition-type sleep to 3-state 6-transition-type sleep for a check of feasibility and information-loss; 3) extend current approaches to multivariate survival data analysis to populations of time-to-transition processes; and 4) provide scalable …
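
The transition-type counts in this abstract follow from counting ordered pairs of distinct states: k states give k(k - 1) types, so 5 states give 20 and 3 states give 6. The sketch below tallies transitions in an epoch-by-epoch hypnogram and collapses the NREM stages to compare the two resolutions. The stage labels (W, N1, N2, N3, R) and the example hypnogram are assumptions for illustration, not the paper's coding.

```python
from collections import Counter
from itertools import permutations

# Assumed stage labels: wake, NREM stages N1-N3, and REM.
FIVE_STATES = ("W", "N1", "N2", "N3", "R")
THREE_STATES = ("W", "NREM", "R")


def count_transitions(hypnogram, states):
    """Count each ordered transition type between distinct states.

    With k states there are k * (k - 1) types: 20 for five states, 6 for three.
    """
    observed = Counter(
        (prev, curr)
        for prev, curr in zip(hypnogram, hypnogram[1:])
        if prev != curr  # consecutive epochs in the same stage are not transitions
    )
    return {pair: observed[pair] for pair in permutations(states, 2)}


def collapse_to_three_states(hypnogram):
    """Merge N1, N2 and N3 into a single NREM state."""
    return ["NREM" if s in ("N1", "N2", "N3") else s for s in hypnogram]


# A short hypothetical hypnogram (one label per scored epoch).
hypnogram = ["W", "W", "N1", "N2", "N2", "N3", "N2", "R", "R", "W"]
five = count_transitions(hypnogram, FIVE_STATES)
three = count_transitions(collapse_to_three_states(hypnogram), THREE_STATES)
```

Collapsing removes the within-NREM transitions (here N1 to N2, N2 to N3, and N3 to N2), which illustrates the kind of information loss the 5-state versus 3-state comparison is meant to quantify.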


A Phase I Bayesian Adaptive Design To Simultaneously Optimize Dose And Schedule Assignments Both Among And Within Patients, Thomas M. Braun, Jin Zhang Aug 2012

The University of Michigan Department of Biostatistics Working Paper Series

In traditional schedule or dose-schedule finding designs, patients are assumed to receive their assigned dose-schedule combination throughout the trial even though the combination may be found to have an undesirable toxicity profile, which contradicts actual clinical practice. Since no systematic approach exists to optimize intra-patient dose-schedule assignment, we propose a Phase I clinical trial design that extends existing approaches that optimize dose and schedule solely among patients by incorporating adaptive variations to dose-schedule assignments within patients as the study proceeds. Our design is based on a Bayesian non-mixture cure rate model that incorporates multiple administrations each patient receives with …


Fitting And Interpreting Continuous-Time Latent Markov Models For Panel Data, Jane M. Lange, Vladimir N. Minin Aug 2012

UW Biostatistics Working Paper Series

Multistate models are used to characterize disease processes within an individual. Clinical studies often observe the disease status of individuals at discrete time points, making exact times of transitions between disease states unknown. Such panel data pose considerable modeling challenges. Assuming the disease process progresses according to a standard continuous-time Markov chain (CTMC) yields tractable likelihoods, but the assumption of exponential sojourn time distributions is typically unrealistic. More flexible semi-Markov models permit generic sojourn distributions yet yield intractable likelihoods for panel data in the presence of reversible transitions. One attractive alternative is to assume that the disease process is characterized by …
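
The tractable CTMC likelihood mentioned above comes from the fact that, with generator matrix Q, the transition probabilities over an interval of length dt are P(dt) = exp(Q dt), so a panel of observed states contributes a product of matrix entries. The sketch below shows that standard CTMC panel likelihood, not the latent Markov model the paper proposes. A small scaling-and-squaring Taylor matrix exponential is written out to keep the block dependency-free; a real analysis would typically use scipy.linalg.expm. Function names are hypothetical.

```python
import numpy as np


def expm(M, terms=30):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(M, ord=1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A = M / (2 ** s)  # scale so the Taylor series converges quickly
    result = np.eye(len(M))
    term = np.eye(len(M))
    for k in range(1, terms):
        term = term @ A / k
        result = result + term
    for _ in range(s):  # undo the scaling by repeated squaring
        result = result @ result
    return result


def panel_log_likelihood(Q, times, states):
    """Log-likelihood of one subject's panel data under a CTMC with generator Q.

    Only the states at the observation times are known, so each interval
    contributes P(dt)[from_state, to_state] with P(dt) = expm(Q * dt).
    """
    ll = 0.0
    for k in range(1, len(times)):
        P = expm(Q * (times[k] - times[k - 1]))
        ll += np.log(P[states[k - 1], states[k]])
    return ll


# Two-state example (e.g., healthy = 0, diseased = 1) with reversible transitions.
Q = np.array([[-0.3, 0.3],
              [0.7, -0.7]])
ll = panel_log_likelihood(Q, times=[0.0, 1.0, 3.0], states=[0, 0, 1])
```

The exponential sojourn-time assumption the abstract criticizes is built into this construction: the diagonal entry -Q[i, i] is the constant exit rate from state i.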


Transitions Among Health States Using 12 Measures Of Successful Aging: Results From The Cardiovascular Health Study, Stephen Thielke, Paula Diehr Aug 2012

UW Biostatistics Working Paper Series

Introduction

Successful aging has many dimensions, which may manifest differently in men and women and at different ages. We sought to characterize one-year transitions in 12 measures of successful aging among a large cohort of older adults.

Methods

We analyzed twelve different measures of health in the Cardiovascular Health Study: self-rated health, ADLs, IADLs, depression, cognition, timed walk, number of days spent in bed, number of blocks walked, extremity strength, recent hospitalizations, feelings about life as a whole, and life satisfaction. We dichotomized responses for each variable into “healthy” or “sick”, and estimated the prevalence of the healthy state and …


Flexible Covariate-Adjusted Exact Tests For Randomized Studies, Alisa J. Stephens, Eric J. Tchetgen Tchetgen, Victor De Gruttola Aug 2012

Harvard University Biostatistics Working Paper Series

No abstract provided.


Locally Efficient Estimation Of Marginal Treatment Effects When Outcomes Are Correlated: Is The Prize Worth The Chase?, Alisa J. Stephens, Eric J. Tchetgen Tchetgen, Victor De Gruttola Aug 2012

Harvard University Biostatistics Working Paper Series

No abstract provided.