Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

2006

Articles 1 - 30 of 172

Full-Text Articles in Statistics and Probability

Lehmann Family Of ROC Curves, Mithat Gonen, Glenn Heller Dec 2006

Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

Receiver operating characteristic (ROC) curves are useful in evaluating the ability of a continuous marker in discriminating between the two states of a binary outcome such as diseased/not diseased. The most popular parametric model for an ROC curve is the binormal model which assumes that the marker is normally distributed conditional on the outcome. Here we present an alternative to the binormal model based on the Lehmann family, also known as the proportional hazards specification. The resulting ROC curve and its functionals (such as the area under the curve) have simple analytic forms. We derive closed-form expressions for the asymptotic …
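
As a rough illustration of the "simple analytic forms" mentioned in the abstract above: if the marker's survival functions in the two outcome groups satisfy the Lehmann (proportional hazards) relationship S1(v) = S0(v)^theta, then the true-positive rate at false-positive rate t is t^theta, and the area under the curve is 1 / (1 + theta). The sketch below assumes that parameterization. The function names and the exponential markers used in the simulation check are illustrative assumptions, not taken from the paper.

```python
import random


def lehmann_roc(fpr, theta):
    """True-positive rate at a given false-positive rate, assuming S1 = S0**theta."""
    return fpr ** theta


def lehmann_auc(theta):
    """Area under ROC(t) = t**theta over [0, 1]."""
    return 1.0 / (1.0 + theta)


def simulated_auc(theta, n=20000, seed=0):
    """Monte Carlo estimate of P(diseased marker > non-diseased marker).

    Assumption for illustration only: exponential markers with rate 1
    (non-diseased) and rate theta (diseased), so that S1 = S0**theta.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        x0 = rng.expovariate(1.0)    # non-diseased marker value
        x1 = rng.expovariate(theta)  # diseased marker value
        wins += x1 > x0
    return wins / n


if __name__ == "__main__":
    theta = 0.5
    print("analytic AUC: ", lehmann_auc(theta))
    print("simulated AUC:", simulated_auc(theta))
```

Values of theta between 0 and 1 give a curve above the diagonal, and theta = 1 gives an uninformative marker with an AUC of 0.5.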


A Likelihood Based Method For Real Time Estimation Of The Serial Interval And Reproductive Number Of An Epidemic, Laura Forsberg White, Marcello Pagano Dec 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Numerical And Asymptotical Study Of Three-Dimensional Wave Packets In A Compressible Boundary Layer, Eric Forgoston, Michael Viergutz, Anatoli Tumin Dec 2006

Department of Applied Mathematics and Statistics Faculty Scholarship and Creative Works

A three-dimensional wave packet generated by a local disturbance in a two-dimensional hypersonic boundary layer flow is studied with the aid of the previously solved initial-value problem. The solution can be presented as a sum of modes consisting of continuous and discrete spectra of temporal stability theory. Two discrete modes, known as Mode S and Mode F, are of interest in high-speed flows since they may be involved in a laminar-turbulent transition scenario. The continuous and discrete spectra are analyzed numerically for a hypersonic flow. A comprehensive study of the spectrum is performed, including Reynolds number, Mach number and temperature …


A Semiparametric Approach For The Nonparametric Transformation Survival Model With Multiple Covariates, Xiao Song, Shuangge Ma, Jian Huang, Xiao-Hua Zhou Dec 2006

UW Biostatistics Working Paper Series

The nonparametric transformation model for survival time makes no parametric assumptions on either the transformation function or the error distribution, and this flexibility makes it appealing for modeling censored survival data. Current approaches for estimation of the regression parameters involve maximizing discontinuous objective functions, which are numerically infeasible to implement in the case of multiple covariates. Based on the partial rank estimator (Khan & Tamer, 2004), we propose a smoothed partial rank estimator which maximizes a …


Gamma Shape Mixtures For Heavy-Tailed Distributions, Sergio Venturini, Francesca Dominici, Giovanni Parmigiani Dec 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

An important question in health services research is the estimation of the proportion of medical expenditures that exceed a given threshold. Typically, medical expenditures present highly skewed, heavy tailed distributions, for which a) simple variable transformations are insufficient to achieve a tractable low-dimensional parametric form and b) nonparametric methods are not efficient in estimating exceedance probabilities for large thresholds. Motivated by this context, in this paper we propose a general Bayesian approach for the estimation of tail probabilities of heavy-tailed distributions, based on a mixture of gamma distributions in which the mixing occurs over the shape parameter. This family provides …


Topology Of Attractors From Two-Piece Expanding Maps, Youngna Choi Dec 2006

Department of Applied Mathematics and Statistics Faculty Scholarship and Creative Works

In this paper we study the topology of the invariant sets derived from two-piece expanding maps. We classify the conditions under which the invariant sets are topological attractors, and show that the set of attractors is open and dense in the set of invariant sets derived by two-piece expanding maps.


Semiparametric Regression Of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines And Linear Mixed Models, Dawei Liu, Xihong Lin, Debashis Ghosh Nov 2006

The University of Michigan Department of Biostatistics Working Paper Series

SUMMARY. We consider a semiparametric regression model that relates a normal outcome to covariates and a genetic pathway, where the covariate effects are modeled parametrically and the pathway effect of multiple gene expressions is modeled parametrically or nonparametrically using least squares kernel machines (LSKMs). This unified framework allows a flexible function for the joint effect of multiple genes within a pathway by specifying a kernel function and allows for the possibility that each gene expression effect might be nonlinear and the genes within the same pathway are likely to interact with each other in a complicated way. This semiparametric model …


Spatio-Temporal Analysis Of Areal Data And Discovery Of Neighborhood Relationships In Conditionally Autoregressive Models, Subharup Guha, Louise Ryan Nov 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Semiparametric Regression Of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines And Linear Mixed Models, Dawei Liu, Xihong Lin, Debashis Ghosh Nov 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Analysis Of Case-Control Age-At-Onset Data Using A Modified Case-Cohort Method, Bin Nan, Xihong Lin Nov 2006

The University of Michigan Department of Biostatistics Working Paper Series

Case-control designs are widely used in rare disease studies. In a typical case-control study, data are collected from a sample of all available subjects who have experienced a disease (cases) and a sub-sample of subjects who have not experienced the disease (controls) in a study cohort. Cases are often oversampled in case-control studies. Logistic regression is a common tool to estimate the relative risks of the disease associated with a set of covariates. Very often in such a study, the ages at onset of the disease for all cases and the ages at survey of controls are known. Standard logistic regression analysis using …


Gene Expression Patterns That Predict Sensitivity To Epidermal Growth Factor Receptor Tyrosine Kinase Inhibitors In Lung Cancer Cell Lines And Human Lung Tumors, Justin M. Balko, Anil Potti, Christopher Saunders, Arnold J. Stromberg, Eric B. Haura, Esther P. Black Nov 2006

Statistics Faculty Publications

BACKGROUND: Increased focus surrounds identifying patients with advanced non-small cell lung cancer (NSCLC) who will benefit from treatment with epidermal growth factor receptor (EGFR) tyrosine kinase inhibitors (TKI). EGFR mutation, gene copy number, coexpression of ErbB proteins and ligands, and epithelial to mesenchymal transition markers all correlate with EGFR TKI sensitivity, and while prediction of sensitivity using any one of the markers does identify responders, individual markers do not encompass all potential responders due to high levels of inter-patient and inter-tumor variability. We hypothesized that a multivariate predictor of EGFR TKI sensitivity based on gene expression data would offer a …


Smoothed Rank Regression With Censored Data, Glenn Heller Nov 2006

Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

A weighted rank estimating function is proposed to estimate the regression parameter vector in an accelerated failure time model with right censored data. In general, rank estimating functions are discontinuous in the regression parameter, creating difficulties in determining the asymptotic distribution of the estimator. A local distribution function is used to create a rank based estimating function that is continuous and monotone in the regression parameter vector. A weight is included in the estimating function to produce a bounded influence estimate. The asymptotic distribution of the regression estimator is developed and simulations are performed to examine its finite sample properties. …


Properties Of Monotonic Effects, Tyler J. Vanderweele, James M. Robins Nov 2006

COBRA Preprint Series

Various relationships are shown to hold between monotonic effects and weak monotonic effects and the monotonicity of certain conditional expectations. These relationships are considered for both binary and non-binary variables. Counterexamples are provided to show that the results do not hold under less restrictive conditions. The ideas of monotonic effects are furthermore used to relate signed edges on a directed acyclic graph to qualitative effect modification.


Multiple Testing With An Empirical Alternative Hypothesis, James E. Signorovitch Nov 2006

Harvard University Biostatistics Working Paper Series

An optimal multiple testing procedure is identified for linear hypotheses under the general linear model, maximizing the expected number of false null hypotheses rejected at any significance level. The optimal procedure depends on the unknown data-generating distribution, but can be consistently estimated. Drawing information together across many hypotheses, the estimated optimal procedure provides an empirical alternative hypothesis by adapting to underlying patterns of departure from the null. Proposed multiple testing procedures based on the empirical alternative are evaluated through simulations and an application to gene expression microarray data. Compared to a standard multiple testing procedure, it is not unusual for …


Doubly Penalized Buckley-James Method For Survival Data With High-Dimensional Covariates, Sijian Wang, Bin Nan, Ji Zhu, David G. Beer Nov 2006

The University of Michigan Department of Biostatistics Working Paper Series

Recent interest in cancer research focuses on predicting patients' survival by investigating gene expression profiles based on microarray analysis. We propose a doubly penalized Buckley-James method for the semiparametric accelerated failure time model to relate high-dimensional genomic data to censored survival outcomes, which uses a mixture of L1-norm and L2-norm penalties. Similar to the elastic-net method for linear regression model with uncensored data, the proposed method performs automatic gene selection and parameter estimation, where highly correlated genes are able to be selected (or removed) together. The two-dimensional tuning parameter is determined by cross-validation and uniform design. …


Ex Ante Choices Of Law And Forum: An Empirical Analysis Of Corporate Merger Agreements, Theodore Eisenberg, Geoffrey P. Miller Nov 2006

Cornell Law Faculty Publications

Legal scholars have focused much attention on the incorporation puzzle—why business corporations so heavily favor Delaware as the site of incorporation. This paper suggests that the focus on the incorporation decision overlooks a broader but intimately related set of questions. The choice of Delaware as a situs of incorporation is, effectively, a choice of law decision. A company electing to charter in Delaware selects Delaware law (and authorizes Delaware courts to adjudicate legal disputes) regarding the allocation of governance authority within the firm. In this sense, the incorporation decision is fundamentally similar to any setting in which a company selects …


A Note On Bias Due To Fitting Prospective Multivariate Generalized Linear Models To Categorical Outcomes Ignoring Retrospective Sampling Schemes, Bhramar Mukherjee, Ivy Liu Nov 2006

The University of Michigan Department of Biostatistics Working Paper Series

Outcome dependent sampling designs are commonly used in economics, market research and epidemiological studies. Case-control sampling design is a classic example of outcome dependent sampling, where exposure information is collected on subjects conditional on their disease status. In many situations, the outcome under consideration may have multiple categories instead of a simple dichotomization. For example, in a case-control study, there may be disease sub-classification among the “cases” based on progression of the disease, or in terms of other histological and morphological characteristics of the disease. In this note, we investigate the issue of fitting prospective multivariate generalized linear models to …


Exploiting Gene-Environment Independence For Analysis Of Case-Control Studies: An Empirical Bayes Approach To Trade Off Between Bias And Efficiency, Bhramar Mukherjee, Nilanjan Chatterjee Nov 2006

The University of Michigan Department of Biostatistics Working Paper Series

Standard prospective logistic regression analysis of case-control data often leads to very imprecise estimates of gene-environment interactions due to small numbers of cases or controls in cells of crossing genotype and exposure. In contrast, under the assumption of gene-environment independence, modern “retrospective” methods, including the “case-only” approach, can estimate the interaction parameters much more precisely, but they can be seriously biased when the underlying assumption of gene-environment independence is violated. In this article, we propose a novel approach to analyze case-control data that can relax the gene-environment independence assumption using an empirical Bayes framework. In the special case, involving a …


A Conversation With Harry Martz, Paul H. Kvam Nov 2006

Department of Math & Statistics Faculty Publications

Harry F. Martz was born June 16, 1942 and grew up in Cumberland, Maryland. He received a Bachelor of Science degree in mathematics (with a minor in physics) from Frostburg State University in 1964, and earned a Ph.D. in statistics at Virginia Polytechnic Institute and State University in 1968. He started his statistics career at Texas Tech University's Department of Industrial Engineering and Statistics right after graduation. In 1978, he joined the technical staff at Los Alamos National Laboratory (LANL) in Los Alamos, New Mexico after first working as Full Professor in the Department of Industrial Engineering at Utah State …


Comments On “On The Distribution Of The Product Of Independent Rayleigh Random Variables”, Saralees Nadarajah, Samuel Kotz Nov 2006

Department of Statistics: Faculty Publications

It is pointed out that the result in Salo et al. “The Distribution of the Product of Independent Rayleigh Random Variables” IEEE Trans. Antennas Propag., vol. 54, pp. 639–643, Feb. 2006, is a particular case of a much more general result known since the 1970s. A general technique (known as the H function technique) is described that can be used to derive a wide range of results similar to Salo et al.


Dynamic Modeling And Statistical Analysis Of Event Times, Edsel A. Pena Nov 2006

Faculty Publications

This review article provides an overview of recent work in the modeling and analysis of recurrent events arising in engineering, reliability, public health, biomedicine and other areas. Recurrent event modeling possesses unique facets making it different and more difficult to handle than single event settings. For instance, the impact of an increasing number of event occurrences needs to be taken into account, the effects of covariates should be considered, potential association among the interevent times within a unit cannot be ignored, and the effects of performed interventions after each event occurrence need to be factored in. A recent general class …


Security In Pervasive Computing: Current Status And Open Issues, Munirul Haque, Sheikh Iqbal Ahamed Nov 2006

Mathematics, Statistics and Computer Science Faculty Research and Publications

Millions of wireless device users are ever on the move, becoming more dependent on their PDAs, smart phones, and other handheld devices. With the advancement of pervasive computing, new and unique capabilities are available to aid mobile societies. The wireless nature of these devices has fostered a new era of mobility. Thousands of pervasive devices are able to arbitrarily join and leave a network, creating a nomadic environment known as a pervasive ad hoc network. However, mobile devices have vulnerabilities, and some are proving to be challenging. Security in pervasive computing is the most critical challenge. Security is needed to …


Large Cluster Asymptotics For GEE: Working Correlation Models, Hyoju Chung, Thomas Lumley Oct 2006

UW Biostatistics Working Paper Series

This paper presents large cluster asymptotic results for generalized estimating equations. The complexity of the working correlation model is characterized in terms of the number of working correlation components to be estimated. When the cluster size is relatively large, we may encounter a situation where a high-dimensional working correlation matrix is modeled and estimated from the data. In the present asymptotic setting, the cluster size and the complexity of the working correlation model grow with the number of independent clusters. We show the existence, weak consistency and asymptotic normality of marginal regression parameter estimators using the results of empirical process theory and …


Statistical Analysis Of Air Pollution Panel Studies: An Illustration, Holly Janes, Lianne Sheppard, Kristen Shepherd Oct 2006

UW Biostatistics Working Paper Series

The panel study design is commonly used to evaluate the short-term health effects of air pollution. Standard statistical methods for analyzing longitudinal data are available, but the literature reveals that the techniques are not well understood by practitioners. We illustrate these methods using data from the 1999 to 2002 Seattle panel study. Marginal, conditional, and transitional approaches for modeling longitudinal data are reviewed and contrasted with respect to their parameter interpretation and methods for accounting for correlation and dealing with missing data. We also discuss and illustrate techniques for controlling for time-dependent and time-independent confounding, and for exploring and summarizing …


Bayesian Hidden Markov Modeling Of Array CGH Data, Subharup Guha, Yi Li, Donna Neuberg Oct 2006

Harvard University Biostatistics Working Paper Series

Genomic alterations have been linked to the development and progression of cancer. The technique of Comparative Genomic Hybridization (CGH) yields data consisting of fluorescence intensity ratios of test and reference DNA samples. The intensity ratios provide information about the number of copies in DNA. Practical issues such as the contamination of tumor cells in tissue specimens and normalization errors necessitate the use of statistics for learning about the genomic alterations from array-CGH data. As increasing amounts of array CGH data become available, there is a growing need for automated algorithms for characterizing genomic profiles. Specifically, there is a need for …


Exploration Of Distributional Models For A Novel Intensity-Dependent Normalization, Nicola Lama, Patrizia Boracchi, Elia Mario Biganzoli Oct 2006

COBRA Preprint Series

Currently used gene intensity-dependent normalization methods, based on regression smoothing techniques, usually approach the two problems of location bias detrending and data re-scaling without taking into account the censoring characteristic of certain gene expressions produced by experiment measurement constraints or by previous normalization steps. Moreover, the bias vs variance balance control of normalization procedures is not often discussed but left to the user's experience. Here an approximate maximum likelihood procedure to fit a model smoothing the dependences of log-fold gene expression differences on average gene intensities is presented. Central tendency and scaling factor were modeled by means of B-splines smoothing …


Targeted Maximum Likelihood Learning, Mark J. Van Der Laan, Daniel Rubin Oct 2006

U.C. Berkeley Division of Biostatistics Working Paper Series

Suppose one observes a sample of independent and identically distributed observations from a particular data generating distribution. Suppose that one has available an estimate of the density of the data generating distribution such as a maximum likelihood estimator according to a given or data adaptively selected model. Suppose that one is concerned with estimation of a particular pathwise differentiable Euclidean parameter. A substitution estimator evaluating the parameter of the density estimator is typically too biased and might not even converge at the parametric rate: that is, the density estimator was targeted to be a good estimator of the density and …


Crude Cumulative Incidence In The Form Of A Horvitz-Thompson Like And Kaplan-Meier Like Estimator, Laura Antolini, Elia Mario Biganzoli, Patrizia Boracchi Oct 2006

COBRA Preprint Series

The link between the nonparametric estimator of the crude cumulative incidence of a competing risk and the Kaplan-Meier estimator is exploited. The equivalence of the nonparametric crude cumulative incidence to an inverse-probability-of-censoring weighted average of the sub-distribution function is proved. The link between the estimation of crude cumulative incidence curves and Gray's family of nonparametric tests is considered. The crude cumulative incidence is proved to be a Kaplan-Meier like estimator based on the sub-distribution hazard, i.e. the quantity on which Gray's family of tests is based. A standard probabilistic formalism is adopted to keep the note accessible to applied statisticians.
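
The equivalence described in the abstract above can be checked numerically on a toy data set. The sketch below is a minimal illustration, not the authors' code: it assumes no tied observation times and a status coding of 0 for censoring and 1, 2, ... for the causes of failure, and all function names and the example data are hypothetical. The first function accumulates the usual nonparametric crude cumulative incidence (overall Kaplan-Meier survival just before each event, times the cause-specific hazard increment). The second weights each observed event of the cause of interest by the inverse of the Kaplan-Meier estimate of the censoring survival function.

```python
def crude_cumulative_incidence(times, status, t, cause=1):
    """Sum over event times u <= t of S(u-) * dN_cause(u) / Y(u).

    status: 0 = censored; 1, 2, ... = cause of failure.
    Assumes no tied times (an illustrative simplification).
    """
    data = sorted(zip(times, status))
    n = len(data)
    surv = 1.0  # Kaplan-Meier survival from any cause, just before the current time
    cif = 0.0
    for k, (u, d) in enumerate(data):
        if u > t:
            break
        at_risk = n - k
        if d == cause:
            cif += surv / at_risk
        if d != 0:
            surv *= 1.0 - 1.0 / at_risk
    return cif


def ipcw_cumulative_incidence(times, status, t, cause=1):
    """Average of I(T_i <= t, cause_i = cause) / G(T_i-), G = censoring Kaplan-Meier."""
    data = sorted(zip(times, status))
    n = len(data)
    g = 1.0  # Kaplan-Meier estimate of the censoring survival, just before the current time
    total = 0.0
    for k, (u, d) in enumerate(data):
        if u > t:
            break
        at_risk = n - k
        if d == cause:
            total += 1.0 / g
        if d == 0:
            g *= 1.0 - 1.0 / at_risk
    return total / n


if __name__ == "__main__":
    # Hypothetical data: eight subjects with distinct follow-up times.
    times = [1, 2, 3, 4, 5, 6, 7, 8]
    status = [1, 0, 2, 1, 0, 1, 2, 0]
    print(crude_cumulative_incidence(times, status, t=8))
    print(ipcw_cumulative_incidence(times, status, t=8))
```

With untied data the two functions agree because the product of the overall survival and censoring survival estimates just before a time u equals the fraction of subjects still at risk at u.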


Procedure Models, C. F. Bartley, W. W. Watson Oct 2006

Publications (YM)

This procedure establishes the responsibilities and process for documenting activities that constitute scientific investigation modeling. Planning requirements for conducting modeling are contained in LP-2.29Q-BSC, Planning for Science Activities.