Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

2008


Articles 1 - 26 of 26

Full-Text Articles in Statistical Models

Spatial Misalignment In Time Series Studies Of Air Pollution And Health Data, Roger D. Peng, Michelle L. Bell Dec 2008

Johns Hopkins University, Dept. of Biostatistics Working Papers

Time series studies of environmental exposures often involve comparing daily changes in a toxicant measured at a point in space with daily changes in an aggregate measure of health. Spatial misalignment of the exposure and response variables can bias the estimation of health risk and the magnitude of this bias depends on the spatial variation of the exposure of interest. In air pollution epidemiology, there is an increasing focus on estimating the health effects of the chemical components of particulate matter. One issue that is raised by this new focus is the spatial misalignment error introduced by the lack of …
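The bias mechanism described here can be illustrated with a generic classical measurement-error sketch (a simplified stand-in, not the authors' spatial model): when the monitored exposure differs from the true exposure by an error whose variance grows with spatial heterogeneity, the estimated health-risk slope attenuates toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# True daily exposure (e.g., a spatially averaged pollutant level) and a
# health response linearly related to it; values here are illustrative.
true_beta = 0.5
exposure = rng.normal(10.0, 2.0, size=n)
response = true_beta * exposure + rng.normal(0.0, 1.0, size=n)

# A single monitor measures exposure with spatial error; the greater the
# spatial variation of the pollutant, the larger this error.
for error_sd in (0.0, 1.0, 2.0):
    measured = exposure + rng.normal(0.0, error_sd, size=n)
    slope = np.polyfit(measured, response, 1)[0]
    # Classical attenuation: slope ~= beta * var(x) / (var(x) + var(e))
    print(f"error sd {error_sd}: estimated slope {slope:.3f}")
```

With no error the slope recovers 0.5; with error variance equal to the exposure variance it attenuates toward half that value.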


Space-Time Regression Modeling Of Tree Growth Using The Skew-T Distribution, Farouk S. Nathoo Dec 2008

COBRA Preprint Series

In this article we present new statistical methodology for the analysis of repeated measures of spatially correlated growth data. Our motivating application, a ten year study of height growth in a plantation of even-aged white spruce, presents several challenges for statistical analysis. Here, the growth measurements arise from an asymmetric distribution, with heavy tails, and thus standard longitudinal regression models based on a Gaussian error structure are not appropriate. We seek more flexibility for modeling both skewness and fat tails, and achieve this within the class of skew-elliptical distributions. Within this framework, robust space-time regression models are formulated using random …


Predicting Intra-Urban Variation In Air Pollution Concentrations With Complex Spatio-Temporal Interactions, Adam A. Szpiro, Paul D. Sampson, Lianne Sheppard, Thomas Lumley, Sara D. Adar, Joel Kaufman Nov 2008

UW Biostatistics Working Paper Series

We describe a methodology for assigning individual estimates of long-term average air pollution concentrations that accounts for a complex spatio-temporal correlation structure and can accommodate unbalanced observations. This methodology has been developed as part of the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air), a prospective cohort study funded by the U.S. EPA to investigate the relationship between chronic exposure to air pollution and cardiovascular disease. Our hierarchical model decomposes the space-time field into a “mean” that includes dependence on covariates and spatially varying seasonal and long-term trends and a “residual” that accounts for spatially correlated deviations from the …


Optimal Cutpoint Estimation With Censored Data, Mithat Gonen, Camelia Sima Nov 2008

Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

We consider the problem of selecting an optimal cutpoint for a continuous marker when the outcome of interest is subject to right censoring. Maximal chi-square methods and methods based on receiver operating characteristic (ROC) curves are commonly used when the outcome is binary. In this article we show that selecting the cutpoint that maximizes the concordance, a metric similar to the area under an ROC curve, is equivalent to maximizing the Youden index, a popular criterion when the ROC curve is used to choose a threshold. We use this as a basis for proposing maximal concordance as a metric to use with …
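For the binary-outcome case mentioned above, the Youden criterion can be sketched directly. This toy implementation (an illustration of the criterion only, not the paper's censored-data method) scans observed marker values as candidate cutpoints:

```python
import numpy as np

def youden_optimal_cutpoint(marker, outcome):
    """Return the cutpoint maximizing the Youden index J = sens + spec - 1
    for a binary outcome (1 = event), scanning observed marker values."""
    marker = np.asarray(marker, dtype=float)
    outcome = np.asarray(outcome, dtype=int)
    best_c, best_j = None, -np.inf
    for c in np.unique(marker):
        pred = marker >= c                   # classify "high marker" as event
        sens = pred[outcome == 1].mean()     # true positive rate
        spec = (~pred)[outcome == 0].mean()  # true negative rate
        j = sens + spec - 1.0
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j

# Toy data: the marker separates the two groups imperfectly.
marker  = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9, 1.1, 1.3]
outcome = [0,   0,   0,   1,   0,   1,   1,   1]
cut, j = youden_optimal_cutpoint(marker, outcome)
# cut = 0.6, j = 0.75 (one misclassified non-event at 0.7)
```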


A Simple Index Of Smoking, Abhaya Indrayan Dr., Rajeev Kumar Mr., Shridhar Dwivedi Dr. Nov 2008

COBRA Preprint Series

Background: Cigarette smoking is implicated in a large number of diseases and other adverse health conditions. Among the dimensions of smoking are number of cigarettes smoked per day, duration of smoking, passive smoking, smoking of filter cigarettes, age at start, and duration elapsed since quitting by ex-smokers. The practice so far is to study most of these separately. We develop a simple index that integrates these dimensions of smoking into a single metric, and suggest that this index be developed further. Method: The index is developed under a series of natural assumptions. Broadly, these are (i) the burden of smoking …


The Strength Of Statistical Evidence For Composite Hypotheses With An Application To Multiple Comparisons, David R. Bickel Nov 2008

COBRA Preprint Series

The strength of the statistical evidence in a sample of data that favors one composite hypothesis over another may be quantified by the likelihood ratio using the parameter value consistent with each hypothesis that maximizes the likelihood function. Unlike the p-value and the Bayes factor, this measure of evidence is coherent in the sense that it cannot support a hypothesis over any hypothesis that it entails. Further, when comparing the hypothesis that the parameter lies outside a non-trivial interval to the hypotheses that it lies within the interval, the proposed measure of evidence almost always asymptotically favors the correct hypothesis …
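The described measure can be sketched for the simplest case of a normal mean with known variance, where each composite hypothesis is represented by the parameter value within it that maximizes the likelihood (a minimal profile-likelihood illustration, not the paper's general treatment):

```python
import numpy as np

def log_lik(mu, x, sigma=1.0):
    """Gaussian log-likelihood with known sigma (constants dropped)."""
    return -0.5 * np.sum((x - mu) ** 2) / sigma**2

def evidence_outside_vs_inside(x, delta, sigma=1.0):
    """Log likelihood ratio favoring H1: |mu| > delta over H0: |mu| <= delta,
    each hypothesis represented by its best-fitting parameter value."""
    xbar = float(np.mean(x))
    # Maximize over H0: nearest point of [-delta, delta] to the MLE xbar.
    mu0 = min(max(xbar, -delta), delta)
    # Maximize over H1: xbar itself if outside, else the nearer boundary
    # (the supremum over the open region, attained on its closure).
    mu1 = xbar if abs(xbar) > delta else (delta if xbar >= 0 else -delta)
    return log_lik(mu1, x, sigma) - log_lik(mu0, x, sigma)

x = np.array([1.2, 0.8, 1.5, 1.1, 0.9])   # sample mean 1.1
lr = evidence_outside_vs_inside(x, delta=0.5)
# Positive lr: the data favor the "outside the interval" hypothesis.
```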


Calibrating Parametric Subject-Specific Risk Estimation, Tianxi Cai, Lu Tian, Hajime Uno, Scott D. Solomon, L. J. Wei Oct 2008

Harvard University Biostatistics Working Paper Series

No abstract provided.


Multilevel Latent Class Models With Dirichlet Mixing Distribution, Chongzhi Di, Karen Bandeen-Roche Oct 2008

Johns Hopkins University, Dept. of Biostatistics Working Papers

Latent class analysis (LCA) and latent class regression (LCR) are widely used for modeling multivariate categorical outcomes in social sciences and biomedical studies. Standard analyses assume data of different respondents to be mutually independent, excluding application of the methods to familial and other designs in which participants are clustered. In this paper, we develop a multilevel latent class model, in which subpopulation mixing probabilities are treated as random effects that vary among clusters according to a common Dirichlet distribution. We apply the Expectation-Maximization (EM) algorithm for model fitting by maximum likelihood (ML). This approach works well, but is computationally intensive when …


Evaluating Subject-Level Incremental Values Of New Markers For Risk Classification Rule, Tianxi Cai, Lu Tian, Donald M. Lloyd-Jones, L. J. Wei Oct 2008

Harvard University Biostatistics Working Paper Series

No abstract provided.


Limitations Of Remotely-Sensed Aerosol As A Spatial Proxy For Fine Particulate Matter, Christopher J. Paciorek, Yang Liu Sep 2008

Harvard University Biostatistics Working Paper Series

Recent research highlights the promise of remotely-sensed aerosol optical depth (AOD) as a proxy for ground-level PM2.5. Particular interest lies in the information on spatial heterogeneity potentially provided by AOD, with important application to estimating and monitoring pollution exposure for public health purposes. Given the temporal and spatio-temporal correlations reported between AOD and PM2.5, it is tempting to interpret the spatial patterns in AOD as reflecting patterns in PM2.5. Here we find only limited spatial associations of AOD from three satellite retrievals with PM2.5 over the eastern U.S. at the daily and yearly levels in 2004. We then …


Expanded Technical Report: Mapping Ancient Forests: Bayesian Inference For Spatio-Temporal Trends In Forest Composition Using The Fossil Pollen Proxy Record, Christopher J. Paciorek, Jason S. Mclachlan Sep 2008

Harvard University Biostatistics Working Paper Series

No abstract provided.


Measurement Error Caused By Spatial Misalignment In Environmental Epidemiology, Alexandros Gryparis, Christopher J. Paciorek, Ariana Zeka, Joel Schwartz, Brent A. Coull Sep 2008

Harvard University Biostatistics Working Paper Series

No abstract provided.


Practical Large-Scale Spatio-Temporal Modeling Of Particulate Matter Concentrations, Christopher J. Paciorek, Jeff D. Yanosky, Robin C. Puett, Francine Laden, Helen H. Suh Sep 2008

Harvard University Biostatistics Working Paper Series

The last two decades have seen intense scientific and regulatory interest in the health effects of particulate matter (PM). Influential epidemiological studies that characterize chronic exposure of individuals rely on monitoring data that are sparse in space and time, so they often assign the same exposure to participants in large geographic areas and across time. We estimate monthly PM during 1988-2002 in a large spatial domain for use in studying health effects in the Nurses' Health Study. We develop a conceptually simple spatio-temporal model that uses a rich set of covariates. The model is used to estimate concentrations of PM10 …


Confidence Intervals For Negative Binomial Random Variables Of High Dispersion, David Shilane, Alan E. Hubbard, S N. Evans Aug 2008

U.C. Berkeley Division of Biostatistics Working Paper Series

This paper considers the problem of constructing confidence intervals for the mean of a Negative Binomial random variable based upon sampled data. When the sample size is large, we traditionally rely upon a Normal distribution approximation to construct these intervals. However, we demonstrate that the sample mean of highly dispersed Negative Binomials exhibits a slow convergence to the Normal in distribution as a function of the sample size. As a result, standard techniques (such as the Normal approximation and bootstrap) that construct confidence intervals for the mean will typically be too narrow and significantly undercover in the case of high …
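The undercoverage phenomenon is easy to reproduce in simulation. This sketch (with assumed parameter values, using the Gamma-Poisson mixture with dispersion parameter `size`, so a small `size` gives high dispersion) checks the empirical coverage of a nominal 95% Wald interval:

```python
import numpy as np

rng = np.random.default_rng(1)

def wald_ci(sample, z=1.96):
    """Normal-approximation (Wald) CI for the mean of a sample."""
    m = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(len(sample))
    return m - z * se, m + z * se

def nb_sample(mean, size, n, rng):
    """Negative Binomial draws via the Gamma-Poisson mixture."""
    lam = rng.gamma(shape=size, scale=mean / size, size=n)
    return rng.poisson(lam)

mean, size, n_per_sample = 10.0, 0.1, 30   # highly dispersed
n_sims, hits = 2000, 0
for _ in range(n_sims):
    s = nb_sample(mean, size, n_per_sample, rng).astype(float)
    lo, hi = wald_ci(s)
    hits += (lo <= mean <= hi)
coverage = hits / n_sims
print(f"empirical coverage of nominal 95% Wald CI: {coverage:.3f}")
```

With these illustrative settings the empirical coverage falls well short of the nominal 95%, consistent with the slow convergence the abstract describes.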


Joint Spatial Modeling Of Recurrent Infection And Growth With Processes Under Intermittent Observation, Farouk S. Nathoo Aug 2008

COBRA Preprint Series

In this article we present new statistical methodology for longitudinal studies in forestry where trees are subject to recurrent infection and the hazard of infection depends on tree growth over time. Understanding the nature of this dependence has important implications for reforestation and breeding programs. Challenges arise for statistical analysis in this setting with sampling schemes leading to panel data, exhibiting dynamic spatial variability, and incomplete covariate histories for hazard regression. In addition, data are collected at a large number of locations which poses computational difficulties for spatiotemporal modeling. A joint model for infection and growth is developed, wherein a …


Supervised Distance Matrices: Theory And Applications To Genomics, Katherine S. Pollard, Mark J. Van Der Laan Jun 2008

U.C. Berkeley Division of Biostatistics Working Paper Series

We propose a new approach to studying the relationship between a very high dimensional random variable and an outcome. Our method is based on a novel concept, the supervised distance matrix, which quantifies pairwise similarity between variables based on their association with the outcome. A supervised distance matrix is derived in two stages. The first stage involves a transformation based on a particular model for association. In particular, one might regress the outcome on each variable and then use the residuals or the influence curve from each regression as a data transformation. In the second stage, a choice of distance …
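A minimal sketch of the two-stage construction, using simple linear regression residuals as the first-stage transformation and Euclidean distance in the second stage (one of several choices the abstract mentions; details here are illustrative, not the authors' exact recipe):

```python
import numpy as np

def supervised_distance_matrix(X, y):
    """Sketch of a supervised distance matrix:
    (1) regress the outcome on each variable and keep the residual vector;
    (2) use Euclidean distance between residual vectors as the
        dissimilarity between variables."""
    n, p = X.shape
    resid = np.empty((n, p))
    for j in range(p):
        slope, intercept = np.polyfit(X[:, j], y, 1)
        resid[:, j] = y - (slope * X[:, j] + intercept)
    # Pairwise Euclidean distances between the p residual vectors.
    return np.sqrt(((resid[:, :, None] - resid[:, None, :]) ** 2).sum(axis=0))

rng = np.random.default_rng(2)
n = 100
y = rng.normal(size=n)
x1 = y + 0.1 * rng.normal(size=n)   # strongly associated with the outcome
x2 = y + 0.1 * rng.normal(size=n)   # carries a similar association to x1
x3 = rng.normal(size=n)             # unrelated to the outcome
D = supervised_distance_matrix(np.column_stack([x1, x2, x3]), y)
# x1 and x2 share the same association with y, so they end up "close";
# both are far from the uninformative x3.
```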


Confidence Intervals For The Population Mean Tailored To Small Sample Sizes, With Applications To Survey Sampling, Michael Rosenblum, Mark J. Van Der Laan Jun 2008

U.C. Berkeley Division of Biostatistics Working Paper Series

The validity of standard confidence intervals constructed in survey sampling is based on the central limit theorem. For small sample sizes, the central limit theorem may give a poor approximation, resulting in confidence intervals that are misleading. We discuss this issue and propose methods for constructing confidence intervals for the population mean tailored to small sample sizes.

We present a simple approach for constructing confidence intervals for the population mean based on tail bounds for the sample mean that are correct for all sample sizes. Bernstein's inequality provides one such tail bound. The resulting confidence intervals have guaranteed coverage probability …
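One such construction can be sketched by inverting Bernstein's inequality for variables known to lie in a bounded interval, using the conservative worst-case variance bound (a standard construction consistent with the description above; the paper's exact interval may differ):

```python
import math

def bernstein_ci(sample, alpha=0.05, low=0.0, high=1.0):
    """Finite-sample CI for the mean of variables known to lie in [low, high],
    by inverting Bernstein's inequality
        P(|mean_hat - mu| >= t) <= 2 exp(-n t^2 / (2 v + (2/3) M t)),
    with the conservative bounds v <= (high - low)^2 / 4 and M = high - low."""
    n = len(sample)
    mean_hat = sum(sample) / n
    M = high - low
    v = M * M / 4.0                  # worst-case variance on [low, high]
    c = math.log(2.0 / alpha)
    # Solve n t^2 - (2/3) M c t - 2 v c = 0 for the positive root t.
    b = (2.0 / 3.0) * M * c
    t = (b + math.sqrt(b * b + 8.0 * n * v * c)) / (2.0 * n)
    return max(low, mean_hat - t), min(high, mean_hat + t)

sample = [0.2, 0.4, 0.1, 0.5, 0.3, 0.2, 0.6, 0.4]   # n = 8, bounded in [0, 1]
lo, hi = bernstein_ci(sample)
```

For small samples the interval is wide, which is the price of guaranteed coverage at every sample size rather than asymptotically.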


A Monte Carlo Power Analysis Of Traditional Repeated Measures And Hierarchical Multivariate Linear Models In Longitudinal Data Analysis, Hua Fang, Gordon P. Brooks, Maria L. Rizzo, Kimberly A. Espy, Robert S. Barcikowski May 2008

Developmental Cognitive Neuroscience Laboratory: Faculty and Staff Publications

The power properties of traditional repeated measures and hierarchical linear models have not been clearly determined in the balanced design for longitudinal studies in the current literature. A Monte Carlo power analysis of traditional repeated measures and hierarchical multivariate linear models is presented under three variance-covariance structures. Results suggest that traditional repeated measures have higher power than hierarchical linear models for main effects, but lower power for interaction effects. Significant power differences are also exhibited when power is compared across different covariance structures. Results also supplement more comprehensive empirical indexes for estimating model precision via bootstrap estimates and the approximate …
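The Monte Carlo power idea itself is straightforward to sketch in a simpler setting (a one-sample test on simulated normal data, not the repeated-measures models compared in the paper): simulate many datasets under the alternative and record the fraction of rejections.

```python
import numpy as np

rng = np.random.default_rng(3)

def mc_power(effect, n, n_sims=4000, z_crit=1.96, rng=rng):
    """Monte Carlo power of a two-sided one-sample test: simulate
    N(effect, 1) samples and count rejections of H0: mu = 0."""
    rejections = 0
    for _ in range(n_sims):
        x = rng.normal(effect, 1.0, size=n)
        z = x.mean() / (x.std(ddof=1) / np.sqrt(n))
        rejections += abs(z) > z_crit
    return rejections / n_sims

# Power grows with both effect size and sample size.
print(mc_power(effect=0.5, n=20))
print(mc_power(effect=0.5, n=50))
```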


Mechanistic Home Range Models And Resource Selection Analysis: A Reconciliation And Unification, Paul R. Moorcroft, Alex Barnett Apr 2008

Dartmouth Scholarship

In the three decades since its introduction, resource selection analysis (RSA) has become a widespread method for analyzing spatial patterns of animal relocations obtained from telemetry studies. Recently, mechanistic home range models have been proposed as an alternative framework for studying patterns of animal space-use. In contrast to RSA models, mechanistic home range models are derived from underlying mechanistic descriptions of individual movement behavior and yield spatially explicit predictions for patterns of animal space-use. In addition, their mechanistic underpinning means that, unlike RSA, mechanistic home range models can also be used to predict changes in space-use following perturbation. In this …


A Method For Visualizing Multivariate Time Series Data, Roger D. Peng Feb 2008

Johns Hopkins University, Dept. of Biostatistics Working Papers

Visualization and exploratory analysis are important parts of any data analysis and are made more challenging when the data are voluminous and high-dimensional. One such example is environmental monitoring data, which are often collected over time and at multiple locations, resulting in a geographically indexed multivariate time series. Financial data, although not necessarily containing a geographic component, present another source of high-volume multivariate time series data. We present the mvtsplot function which provides a method for visualizing multivariate time series data. We outline the basic design concepts and provide some examples of its usage by applying it to a …


Jointly Modeling Continuous And Binary Outcomes For Boolean Outcomes: An Application To Modeling Hypertension, Xianbin Li, Brian S. Caffo, Elizabeth Stuart Feb 2008

Johns Hopkins University, Dept. of Biostatistics Working Papers

Binary outcomes defined by logical (Boolean) "and" or "or" operations on original continuous and discrete outcomes arise commonly in medical diagnoses and epidemiological research. In this manuscript, we consider applying the "or" operator to two continuous variables above a threshold and a binary variable, a setting that occurs frequently in the modeling of hypertension. Rather than modeling the resulting composite outcome defined by the logical operator, we present a method that models the original outcomes thus utilizing all information in the data, yet continues to yield conclusions on the composite scale. A stratified propensity score adjustment is proposed to account for …


Using Regression Models To Analyze Randomized Trials: Asymptotically Valid Hypothesis Tests Despite Incorrectly Specified Models, Michael Rosenblum, Mark J. Van Der Laan Jan 2008

U.C. Berkeley Division of Biostatistics Working Paper Series

Regression models are often used to test for cause-effect relationships from data collected in randomized trials or experiments. This practice has deservedly come under heavy scrutiny, since commonly used models such as linear and logistic regression will often not capture the actual relationships between variables, and incorrectly specified models potentially lead to incorrect conclusions. In this paper, we focus on hypothesis tests of whether the treatment given in a randomized trial has any effect on the mean of the primary outcome, within strata of baseline variables such as age, sex, and health status. Our primary concern is ensuring that such …
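The problem of testing validly despite model misspecification can be illustrated with a permutation test, a standard model-free alternative (explicitly not the authors' proposal, which retains the regression model): under randomization, re-randomizing the treatment labels gives an exact reference distribution for the mean difference with no model assumed.

```python
import numpy as np

rng = np.random.default_rng(4)

def permutation_test(outcome, treated, n_perm=5000, rng=rng):
    """Model-free permutation test of 'no treatment effect on the mean':
    re-randomize the treatment labels and compare mean differences.
    Valid by construction under randomization, with no regression model."""
    outcome = np.asarray(outcome, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    observed = outcome[treated].mean() - outcome[~treated].mean()
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(treated)
        diff = outcome[perm].mean() - outcome[~perm].mean()
        count += abs(diff) >= abs(observed)
    return count / n_perm   # permutation p-value

# Simulated trial: treatment shifts the outcome mean by 1.5.
n = 60
treated = np.repeat([True, False], n // 2)
outcome = rng.normal(0.0, 1.0, size=n) + 1.5 * treated
p = permutation_test(outcome, treated)
```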


Gone In 60 Seconds: The Absorption Of News In A High-Frequency Betting Market, Babatunde Buraimo, David Peel, Rob Simmons Jan 2008

Dr Babatunde Buraimo

This paper tests for efficiency in a betting market that offers high-frequency data, the Betfair betting exchange for wagering on outcomes of English Premier League soccer matches. We find clear evidence of rapid adjustment of prices to large disturbances (news). Full adjustment takes place within a one minute interval after the news. This suggests that this particular wagering market is not just efficient at pre-match prices but is also efficient in the face of events within games.


Implementation Of Uncertainty Propagation In Triton/Keno, Charlotta Sanders, Denis Beller Jan 2008

Reactor Campaign (TRP)

Monte Carlo methods are beginning to be used for three-dimensional fuel depletion analyses to compute various quantities of interest, including isotopic compositions of used nuclear fuel. The TRITON control module, available in the SCALE 5.1 code system, can perform three-dimensional (3-D) depletion calculations using either the KENO V.a or KENO-VI Monte Carlo transport codes, as well as the two-dimensional (2-D) NEWT discrete ordinates code. To overcome problems such as spatially nonuniform neutron flux and nonuniform statistical uncertainties in computed reaction rates and to improve the fidelity of calculations using Monte Carlo methods, uncertainty propagation is needed for depletion calculations.


Monaco/Mavric Evaluation For Facility Shielding And Dose Rate Analysis, Charlotta Sanders, Denis Beller Jan 2008

Reactor Campaign (TRP)

Given the dimensions of Global Nuclear Energy Partnership (GNEP) facilities and the large amount of shielding they require, advanced radiation shielding and dose computation techniques beyond today's capabilities will certainly be needed. With the Generation IV Nuclear Energy System Initiative, it will become increasingly important to be able to accurately model advanced Boiling Water Reactor and Pressurized Water Reactor facilities, and to calculate dose rates at all locations within a containment (e.g., resulting from radiation from the reactor as well as from the primary coolant loop) and adjoining structures (e.g., from the spent fuel pool).

The MAVRIC sequence is …


Application Of The Fractal Market Hypothesis For Modelling Macroeconomic Time Series, Jonathan Blackledge Jan 2008

Articles

This paper explores the conceptual background to financial time series analysis and financial signal processing in terms of the Efficient Market Hypothesis. By revisiting the principal conventional approaches to market analysis and the reasoning associated with them, we develop a Fractal Market Hypothesis that is based on the application of non-stationary fractional dynamics using an operator of the type

∂²/∂x² − σ^q(t) ∂^q(t)/∂t^q(t)

where σ^(−1) is the fractional diffusivity and q is the Fourier dimension which, for the topology considered (i.e. the one-dimensional case), is related to the Fractal …