Digital Commons Network

Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability

2006

Articles 1 - 30 of 36

Full-Text Articles in Entire DC Network

Modeling The Incubation Period Of Anthrax, Ron Brookmeyer, Elizabeth Johnson, Sarah Barry Dec 2006

Ron Brookmeyer

Models of the incubation period of anthrax are important to public health planners because they can be used to predict the delay before outbreaks are detected, the size of an outbreak and the duration of time that persons should remain on antibiotics to prevent disease. The difficulty is that there is little direct data about the incubation period in humans. The objective of this paper is to develop and apply models for the incubation period of anthrax. Mechanistic models that account for the biology of spore clearance and germination are developed based on a competing risks formulation. The models predict …
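
A competing-risks formulation of this kind can be written compactly. As a minimal sketch (our notation, not necessarily the authors' exact model), suppose each of D inhaled spores is independently cleared at rate \theta and germinates at rate \lambda, with disease following the first germination. Then

    \Pr(\text{disease}) = 1 - \Bigl(\frac{\theta}{\lambda+\theta}\Bigr)^{D}, \qquad
    F(t) = \Pr(\text{incubation} \le t) = 1 - \Bigl(\frac{\theta + \lambda e^{-(\lambda+\theta)t}}{\lambda+\theta}\Bigr)^{D},

so larger doses shift the incubation distribution earlier, which is one qualitative prediction such models make.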


A Note On Empirical Likelihood Inference Of Residual Life Regression, Ying Qing Chen, Yichuan Zhao Dec 2006

Yichuan Zhao

The mean residual life function, or life expectancy, is an important function for characterizing the distribution of residual life. The proportional mean residual life model by Oakes and Dasu (1990) is a regression tool to study the association between life expectancy and its associated covariates. Although semiparametric inference procedures have been proposed in the literature, the accuracy of such procedures may be low when the censoring proportion is relatively large. In this paper, the semiparametric inference procedures are studied with an empirical likelihood ratio method. An empirical likelihood confidence region is constructed for the regression parameters. The proposed method is further compared …
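
For readers unfamiliar with the model, the Oakes-Dasu proportional mean residual life model specifies, for survival time T and covariate vector Z,

    m(t \mid Z) = E(T - t \mid T > t, Z) = m_0(t)\, e^{\beta' Z},

where m_0(t) is a baseline mean residual life function. The empirical likelihood approach studied here replaces normal-approximation intervals with a region of the generic form \{\beta : -2 \log R(\beta) \le \chi^2_{p,1-\alpha}\}, where R(\beta) is an empirical likelihood ratio; this display uses our notation, and the paper should be consulted for the precise ratio it constructs.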


Wavelet-Based Functional Mixed Models To Characterize Population Heterogeneity In Accelerometer Profiles: A Case Study, Jeffrey S. Morris, Cassandra Arroyo, Brent A. Coull, Louise M. Ryan, Steven L. Gortmaker Dec 2006

Jeffrey S. Morris

We present a case study illustrating the challenges of analyzing accelerometer data taken from a sample of children participating in an intervention study designed to increase physical activity. An accelerometer is a small device worn on the hip that records the minute-by-minute activity levels of the child throughout the day for each day it is worn. The resulting data are irregular functions characterized by many peaks representing short bursts of intense activity. We model these data using the wavelet-based functional mixed model. This approach incorporates multiple fixed effects and random effect functions of arbitrary form, the estimates of which are …


Alternative Probeset Definitions For Combining Microarray Data Across Studies Using Different Versions Of Affymetrix Oligonucleotide Arrays, Jeffrey S. Morris, Chunlei Wu, Kevin R. Coombes, Keith A. Baggerly, Jing Wang, Li Zhang Dec 2006

Jeffrey S. Morris

Many published microarray studies have small to moderate sample sizes, and thus have low statistical power to detect significant relationships between gene expression levels and outcomes of interest. By pooling data across multiple studies, however, we can gain power, enabling us to detect new relationships. This type of pooling is complicated by the fact that gene expression measurements from different microarray platforms are not directly comparable. In this chapter, we discuss two methods for combining information across different versions of Affymetrix oligonucleotide arrays. Each involves a new approach for combining probes on the array into probesets. The first approach involves …


An Econometric Method Of Correcting For Unit Nonresponse Bias In Surveys, Martin Ravallion, Anton Korinek, Johan Mistiaen Dec 2006

Martin Ravallion

Past approaches to correcting for unit nonresponse in sample surveys by re-weighting the data assume that the problem is ignorable within arbitrary subgroups of the population. Theory and evidence suggest that this assumption is unlikely to hold, and that household characteristics such as income systematically affect survey compliance. We show that this leaves a bias in the re-weighted data and we propose a method of correcting for this bias. The geographic structure of nonresponse rates allows us to identify a micro compliance function, which is then used to re-weight the unit-record data. An example is given for the US Current …
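
A stylized version of the correction can be sketched in a few lines (everything here, data and functional form included, is hypothetical): fit a logistic compliance function to area-level response rates as a function of an area characteristic such as income, then weight each responding unit by its inverse estimated compliance probability.

    # Hypothetical sketch of inverse-compliance reweighting (not the authors'
    # exact estimator): fit a logistic compliance function P(respond | x) to
    # area-level response rates, then weight respondents by 1 / P-hat.
    import numpy as np
    from scipy.optimize import minimize

    # Area-level data (invented): m_j respondents out of n_j sampled units,
    # with area mean log-income x_j driving compliance.
    x = np.array([9.5, 10.0, 10.4, 10.9, 11.3])   # area mean log-income
    n = np.array([400, 420, 390, 410, 380])        # units sampled per area
    m = np.array([352, 349, 300, 287, 243])        # units responding per area

    def negloglik(beta):
        # Binomial log-likelihood for a logistic compliance function.
        p = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x)))
        return -np.sum(m * np.log(p) + (n - m) * np.log(1.0 - p))

    fit = minimize(negloglik, x0=np.zeros(2), method="BFGS")
    b0, b1 = fit.x

    def weight(xi):
        # Inverse estimated compliance probability, 1/p = 1 + exp(-(b0 + b1*x)).
        return 1.0 + np.exp(-(b0 + b1 * xi))

    # Units with lower estimated compliance receive larger weights.
    print([round(weight(v), 2) for v in (9.5, 10.4, 11.3)])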


Identifying Important Explanatory Variables For Time-Varying Outcomes., Oliver Bembom, Maya L. Petersen, Mark J. Van Der Laan Dec 2006

Maya Petersen

This chapter describes a systematic and targeted approach for estimating the impact of each of a large number of baseline covariates on an outcome that is measured repeatedly over time. These variable importance estimates can be adjusted for a user-specified set of confounders and lend themselves in a straightforward way to obtaining confidence intervals and p-values. Hence, they can in particular be used to identify a subset of baseline covariates that are the most important explanatory variables for the time-varying outcome of interest. We illustrate the methodology in a data analysis aimed at finding mutations of the human immunodeficiency virus …
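
Schematically (our notation, not the chapter's exact estimand), the importance of baseline covariate A_j at level a for the outcome Y(t) at time t, adjusted for a confounder set W, is a parameter of the form

    \psi_j(a, t) = E_W\bigl[\, E\{Y(t) \mid A_j = a, W\} - E\{Y(t) \mid A_j = 0, W\} \,\bigr],

estimated for each candidate covariate and time point, with confidence intervals and p-values then used to rank the covariates.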


Identifying Important Explanatory Variables For Time-Varying Outcomes., Oliver Bembom, Maya L. Petersen, Mark J. Van Der Laan Dec 2006

Oliver Bembom

This chapter describes a systematic and targeted approach for estimating the impact of each of a large number of baseline covariates on an outcome that is measured repeatedly over time. These variable importance estimates can be adjusted for a user-specified set of confounders and lend themselves in a straightforward way to obtaining confidence intervals and p-values. Hence, they can in particular be used to identify a subset of baseline covariates that are the most important explanatory variables for the time-varying outcome of interest. We illustrate the methodology in a data analysis aimed at finding mutations of the human immunodeficiency virus …


Modeling An Outbreak Of Anthrax, Ron Brookmeyer Nov 2006

Ron Brookmeyer

Introduction

On October 2, 2001 a sixty-three-year-old Florida man who worked as a photo editor at a media publishing company was admitted to an emergency department complaining of nausea, vomiting, and fever. His symptoms began four days earlier on a recreational trip to North Carolina. The man died shortly thereafter. An astute clinician quickly made the surprising diagnosis of inhalational anthrax, which is a serious and deadly disease. The diagnosis was surprising because inhalational anthrax is extremely rare; only 18 cases were reported in the United States between 1900 and 1978. Public health officials at first believed that the Florida …


Optimizing The Expected Overlap Of Survey Samples Via The Northwest Corner Rule, Lenka Mach, Philip T. Reiss, Ioana Schiopu-Kratina Nov 2006

Philip T. Reiss

In survey sampling there is often a need to coordinate the selection of pairs of samples drawn from two overlapping populations so as to maximize or minimize their expected overlap, subject to constraints on the marginal probabilities determined by the respective designs. For instance, maximizing the expected overlap between repeated samples can stabilize the resulting estimates of change and reduce the costs of first contacts; minimizing the expected overlap can avoid overburdening respondents with multiple surveys. We focus on the important special case in which both samples are selected by simple random sampling without replacement (SRSWOR) conducted independently within each …
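
The northwest corner rule itself is a classical transportation-array algorithm and is easy to state in code. The sketch below is the generic rule, given row totals a and column totals b with equal sums; in the survey-coordination setting the margins come from the two designs' selection probabilities, and the paper's full procedure involves more than this single step.

    # Northwest corner rule: fill a transportation array greedily from the
    # top-left cell so that row sums match a and column sums match b.
    def northwest_corner(a, b):
        a, b = list(a), list(b)
        x = [[0.0] * len(b) for _ in range(len(a))]
        i = j = 0
        while i < len(a) and j < len(b):
            v = min(a[i], b[j])        # allocate as much as both margins allow
            x[i][j] = v
            a[i] -= v
            b[j] -= v
            if a[i] <= 1e-12:          # row exhausted: move down
                i += 1
            if b[j] <= 1e-12:          # column exhausted: move right
                j += 1
        return x

    # Toy marginals: row sums 0.5/0.5, column sums 0.3/0.7.
    for row in northwest_corner([0.5, 0.5], [0.3, 0.7]):
        print(row)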


PrepMS: TOF MS Data Graphical Preprocessing Tool, Yuliya V. Karpievitch, Elizabeth G. Hill, Adam J. Smolka, Jeffrey S. Morris, Kevin R. Coombes, Keith A. Baggerly, Jonas S. Almeida Nov 2006

Jeffrey S. Morris

We introduce a simple-to-use graphical tool that enables researchers to easily prepare time-of-flight mass spectrometry data for analysis. For ease of use, the graphical executable provides default parameter settings experimentally determined to work well in most situations. These values can be changed by the user if desired. PrepMS is a stand-alone application made freely available (open source), and is under the General Public License (GPL). Its graphical user interface, default parameter settings, and display plots allow PrepMS to be used effectively for data preprocessing, peak detection, and visual data quality assessment.
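
For readers who want a feel for what such preprocessing involves, here is an illustration only: a bare-bones pass (baseline subtraction, normalization, peak detection) built on generic SciPy routines. It is not PrepMS's actual algorithm, and all thresholds here are invented.

    # Illustrative TOF-MS preprocessing pass; not PrepMS's algorithm or defaults.
    import numpy as np
    from scipy.signal import find_peaks, medfilt

    rng = np.random.default_rng(0)
    mz = np.linspace(2000, 20000, 5000)
    spectrum = 50 * np.exp(-mz / 4000) + rng.normal(0, 1, mz.size)  # noisy decay
    spectrum[1200] += 40; spectrum[3100] += 25                      # two fake peaks

    baseline = medfilt(spectrum, kernel_size=201)   # crude running-median baseline
    signal = spectrum - baseline
    signal /= np.mean(np.abs(signal))               # total-intensity-style normalization

    peaks, props = find_peaks(signal, height=8.0, distance=20)
    print(mz[peaks])   # m/z locations of detected peaks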


Use Of Unbiased Estimating Equations To Estimate Correlation In Generalized Estimating Equation Analysis Of Longitudinal Trials, Wenguang Sun, Justine Shults, Mary Leonard Oct 2006

Justine Shults

In a recent publication, Wang and Carey (Journal of the American Statistical Association, 99, pp. 845-853, 2004) presented a new approach for estimation of the correlation parameters in the framework of generalized estimating equations (GEE). They considered correlated continuous, binary and count data with a generalized Markov correlation structure that includes the first-order autoregressive AR(1) and Markov structures as special cases. They made detailed comparisons with pseudo-likelihood (PL) and the first stage of quasi-least squares (QLS), a two-stage approach in the framework of generalized estimating equations (GEE). In this note we extend their comparisons for the second (bias corrected) stage …
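
As a point of reference (our illustration, simpler than the estimators compared in the note), an unbiased estimating equation for the AR(1) parameter \alpha can be built from lag-one products of standardized residuals e_{ij}, since E(e_{ij} e_{i,j+1}) = \alpha when the mean and variance models are correctly specified:

    \sum_{i=1}^{n} \sum_{j=1}^{t_i - 1} \bigl( e_{ij}\, e_{i,j+1} - \alpha \bigr) = 0
    \;\;\Longrightarrow\;\;
    \hat{\alpha} = \frac{\sum_i \sum_j e_{ij}\, e_{i,j+1}}{\sum_i (t_i - 1)}.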


Censored Data Regression In High-Dimension And Low-Sample Size Settings For Genomic Applications, Hongzhe Li Oct 2006

Hongzhe Li

New high-throughput technologies are generating various types of high-dimensional genomic and proteomic data and meta-data (e.g., networks and pathways) in order to obtain a systems-level understanding of various complex diseases such as human cancers and cardiovascular diseases. As the amount and complexity of the data increase and as the questions being addressed become more sophisticated, we face the great challenge of how to model such data in order to draw valid statistical and biological conclusions. One important problem in genomic research is to relate these high-throughput genomic data to various clinical outcomes, including possibly censored survival outcomes such as age …


Wavelet-Based Functional Mixed Model Analysis: Computational Considerations, Richard C. Herrick, Jeffrey S. Morris Aug 2006

Jeffrey S. Morris

Wavelet-based functional mixed model analysis is a new Bayesian method that extends mixed models to irregular functional data (Morris and Carroll, JRSS-B, 2006). These data sets are typically very large and can quickly run into memory and time constraints unless these issues are carefully dealt with in the software. We reduce runtime by (1) identifying and optimizing hotspots, (2) using wavelet compression to do less computation with minimal impact on results, and (3) dividing the code into multiple executables to be run in parallel using a grid computing resource. We discuss rules of thumb for estimating memory requirements and computation times in …
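
The wavelet-compression step is easy to illustrate. The sketch below (ours, not the authors' code) keeps only the largest discrete wavelet transform coefficients of a curve, using the PyWavelets package; downstream computation then touches far fewer nonzero numbers at little cost in accuracy.

    # Wavelet compression sketch: zero all but the largest DWT coefficients.
    import numpy as np
    import pywt

    rng = np.random.default_rng(1)
    t = np.linspace(0, 1, 1024)
    curve = np.sin(8 * np.pi * t) + 0.1 * rng.normal(size=t.size)

    coeffs = pywt.wavedec(curve, "db4")               # multilevel DWT
    flat = np.concatenate(coeffs)
    cutoff = np.quantile(np.abs(flat), 0.95)          # keep the largest 5%
    compressed = [np.where(np.abs(c) >= cutoff, c, 0.0) for c in coeffs]

    recon = pywt.waverec(compressed, "db4")[:curve.size]  # trim any padding
    rel_err = np.linalg.norm(recon - curve) / np.linalg.norm(curve)
    print(f"kept 5% of coefficients, relative L2 error = {rel_err:.3f}")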


Bayesian Sample Size Calculations In Phase Ii Clinical Trials Using A Mixture Of Informative Priors., Byron J. Gajewski, Matthew S. Mayo Aug 2006

Byron J Gajewski

A number of researchers have discussed phase II clinical trials from a Bayesian perspective. A recent article by Mayo and Gajewski focuses on sample size calculations, which they determine by specifying an informative prior distribution and then calculating a posterior probability that the true response will exceed a prespecified target. In this article, we extend these sample size calculations to include a mixture of informative prior distributions. The mixture comes from several sources of information. For example, consider information from two (or more) clinicians. The first clinician is pessimistic about the drug and the second clinician is optimistic. We tabulate …
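
To make the mixture-prior calculation concrete, the following sketch (notation and numbers ours) computes the posterior probability that the true response rate exceeds a target when the prior is a weighted mixture of Beta distributions, one pessimistic and one optimistic. The posterior is again a Beta mixture, with weights updated by each component's marginal likelihood.

    # Posterior under a Beta-mixture prior for a binomial response rate.
    import numpy as np
    from scipy.special import betaln
    from scipy.stats import beta

    def posterior_prob_exceeds(x, n, target, components):
        # components: list of (weight, a, b) Beta prior components.
        logw, post = [], []
        for w, a, b_ in components:
            # marginal likelihood of the data under component k (up to a constant)
            logw.append(np.log(w) + betaln(a + x, b_ + n - x) - betaln(a, b_))
            post.append((a + x, b_ + n - x))
        logw = np.array(logw)
        w_post = np.exp(logw - logw.max())
        w_post /= w_post.sum()                        # updated mixture weights
        return sum(w * beta.sf(target, a, b_) for w, (a, b_) in zip(w_post, post))

    # Pessimistic Beta(2, 8) versus optimistic Beta(8, 2), equally weighted.
    mix = [(0.5, 2, 8), (0.5, 8, 2)]
    print(posterior_prob_exceeds(x=12, n=25, target=0.30, components=mix))

A sample size calculation of this flavor would scan n and ask how large the trial must be for this posterior probability to clear a design threshold under plausible data.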


Regression Analysis With Categorized Regression Calibrated Exposure: Some Interesting Findings, Ingvild Dalen, John Buonaccorsi, Petter Laake, Anette Hjartaker, Magne Thoresen Jul 2006

John Buonaccorsi

Background: Regression calibration as a method for handling measurement error is becoming increasingly well-known and used in epidemiologic research. However, the standard version of the method is not appropriate for exposure analyzed on a categorical (e.g. quintile) scale, an approach commonly used in epidemiologic studies. A tempting solution could then be to use the predicted continuous exposure obtained through the regression calibration method and treat it as an approximation to the true exposure, that is, include the categorized calibrated exposure in the main regression analysis. Methods: We use semi-analytical calculations and simulations to evaluate the performance of the proposed approach …
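
The proposed approach lends itself to a quick simulation check. The following is a minimal sketch in the spirit of the paper's setup; all parameter values, and the linear calibration step, are our assumptions.

    # Compare quintile-score slopes using true vs. regression-calibrated exposure.
    import numpy as np
    rng = np.random.default_rng(2)

    n = 20000
    x = rng.normal(0, 1, n)               # true exposure
    w = x + rng.normal(0, 1, n)           # error-prone measurement
    y = 0.5 * x + rng.normal(0, 1, n)     # outcome

    # Regression calibration: E(X | W) is linear under joint normality; the
    # attenuation factor would be estimated from a validation study in practice.
    lam = np.cov(x, w)[0, 1] / np.var(w)
    xhat = lam * w

    def quintile_slope(exposure):
        # Score each subject by exposure quintile (0..4), regress y on the score.
        q = np.searchsorted(np.quantile(exposure, [0.2, 0.4, 0.6, 0.8]), exposure)
        q = (q - q.mean()) / q.std()
        return np.cov(q, y)[0, 1] / np.var(q)

    print("true-exposure quintile slope:      ", round(quintile_slope(x), 3))
    print("calibrated-exposure quintile slope:", round(quintile_slope(xhat), 3))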


Some Statistical Issues In Microarray Gene Expression Data, Matthew S. Mayo, Byron J. Gajewski, Jeffrey S. Morris Jun 2006

Jeffrey S. Morris

In this paper we discuss some of the statistical issues that should be considered when conducting experiments involving microarray gene expression data. We discuss statistical issues related to preprocessing the data as well as the analysis of the data. Analysis of the data is discussed in three contexts: class comparison, class prediction and class discovery. We also review the methods used in two studies that are using microarray gene expression to assess the effect of exposure to radiofrequency (RF) fields on gene expression. Our intent is to provide a guide for radiation researchers when conducting studies involving microarray gene expression …


Bayesian Models For Pooling Microarray Studies With Multiple Sources Of Replications, Erin M. Conlon, Joon J. Song, Jun S. Liu May 2006

Erin M. Conlon

Background: Biologists often conduct multiple but different cDNA microarray studies that all target the same biological system or pathway. Within each study, replicate slides within repeated identical experiments are often produced. Pooling information across studies can help more accurately identify true target genes. Here, we introduce a method to integrate multiple independent studies efficiently. Results: We introduce a Bayesian hierarchical model to pool cDNA microarray data across multiple independent studies to identify highly expressed genes. Each study has multiple sources of variation, i.e. replicate slides within repeated identical experiments. Our model produces the gene-specific posterior probability of differential expression, which …
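
One generic way to write such a pooling hierarchy (our notation; the paper's model differs in its details) is, for gene g, study s, experiment e, and replicate slide r,

    y_{gser} \sim N(\theta_{gse}, \sigma^2_s), \qquad
    \theta_{gse} \sim N(\mu_{gs}, \tau^2_s), \qquad
    \mu_{gs} \sim N(\gamma_g, \kappa^2_s),

with the study-level parameters tied together through the gene-level mean \gamma_g, on which posterior probabilities of high or differential expression are then computed.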


A Review Of Limdep 9.0 And Nlogit 4.0, Joseph Hilbe May 2006

Joseph M Hilbe

No abstract provided.


Mathematica 5.2: A Review, Joseph Hilbe May 2006

Joseph M Hilbe

No abstract provided.


Wavelet-Based Functional Mixed Models, Jeffrey S. Morris, Raymond J. Carroll Apr 2006

Jeffrey S. Morris

Increasingly, scientific studies yield functional data, in which the ideal units of observation are curves and the observed data consist of sets of curves that are sampled on a fine grid. We present new methodology that generalizes the linear mixed model to the functional mixed model framework, with model fitting done by using a Bayesian wavelet-based approach. This method is flexible, allowing functions of arbitrary form and the full range of fixed effects structures and between-curve covariance structures that are available in the mixed model framework. It yields nonparametric estimates of the fixed and random-effects functions as well as the …
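
The functional mixed model underlying this work has the generic form, for N observed curves stacked as Y(t),

    Y(t) = X\,B(t) + Z\,U(t) + E(t),

where X (N x p) and Z (N x m) are ordinary design matrices, B(t) collects p fixed-effect functions, U(t) collects m random-effect functions, and E(t) contains residual error processes. Fitting proceeds by applying a discrete wavelet transform to each curve and fitting the corresponding model to the wavelet coefficients. This display is a schematic of the published model, with our choice of symbols.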


Synchrony Of Change In Depressive Symptoms, Health Status, And Quality Of Life In Persons With Clinical Depression, Paula Diehr Apr 2006

Paula Diehr

BACKGROUND: Little is known about longitudinal associations among measures of depression, mental and physical health, and quality of life (QOL). We followed 982 clinically depressed persons to determine which measures changed and whether the change was synchronous with change in depressive symptoms. METHODS: Data were from the Longitudinal Investigation of Depression Outcomes (LIDO). Depressive symptoms, physical and mental health, and quality of life were measured at baseline, 6 weeks, 3 months, and 9 months. Change in the measures was examined over time and for persons with different levels of change in depressive symptoms. RESULTS: On average, all of the measures …


Shrinkage Estimation For SAGE Data Using A Mixture Dirichlet Prior, Jeffrey S. Morris, Keith A. Baggerly, Kevin R. Coombes Mar 2006

Jeffrey S. Morris

Serial Analysis of Gene Expression (SAGE) is a technique for estimating the gene expression profile of a biological sample. Any efficient inference in SAGE must be based upon efficient estimates of these gene expression profiles, which consist of the estimated relative abundances for each mRNA species present in the sample. The data from SAGE experiments are counts for each observed mRNA species, and can be modeled using a multinomial distribution with two characteristics: skewness in the distribution of relative abundances and small sample size relative to the dimension. As a result of these characteristics, a given SAGE sample will fail …
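
To make the shrinkage concrete, here is a hedged sketch (component choices and weights are ours) of posterior-mean estimation of the relative abundances under a two-component Dirichlet mixture prior, using the fact that the mixture is conjugate to the multinomial: sparsely observed tags are pulled toward the prior more strongly than well-sampled ones.

    # Posterior mean of multinomial proportions under a Dirichlet-mixture prior.
    import numpy as np
    from scipy.special import gammaln

    def log_marginal(x, alpha):
        # Dirichlet-multinomial marginal likelihood of counts x under Dir(alpha).
        n, A = x.sum(), alpha.sum()
        return (gammaln(A) - gammaln(A + n)
                + np.sum(gammaln(alpha + x) - gammaln(alpha)))

    def posterior_mean(x, components):
        # components: list of (weight, alpha_vector); the posterior is again a
        # Dirichlet mixture, so its mean is a weighted combination of means.
        logw = np.array([np.log(w) + log_marginal(x, a) for w, a in components])
        w = np.exp(logw - logw.max()); w /= w.sum()
        return sum(wk * (a + x) / (a + x).sum() for wk, (_, a) in zip(w, components))

    x = np.array([50, 3, 1, 0, 0])                  # observed tag counts
    flat = np.ones(5)                                # diffuse component
    spiky = np.array([10.0, 0.1, 0.1, 0.1, 0.1])     # skewed component
    print(posterior_mean(x, [(0.5, flat), (0.5, spiky)]))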


An Introduction To High-Throughput Bioinformatics Data, Keith A. Baggerly, Kevin R. Coombes, Jeffrey S. Morris Mar 2006

Jeffrey S. Morris

High throughput biological assays supply thousands of measurements per sample, and the sheer amount of related data increases the need for better models to enhance inference. Such models, however, are more effective if they take into account the idiosyncrasies associated with the specific methods of measurement: where the numbers come from. We illustrate this point by describing three different measurement platforms: microarrays, serial analysis of gene expression (SAGE), and proteomic mass spectrometry.


Bayesian Mixture Models For Gene Expression And Protein Profiles, Michele Guindani, Kim-Anh Do, Peter Mueller, Jeffrey S. Morris Mar 2006

Jeffrey S. Morris

We review the use of semi-parametric mixture models for Bayesian inference in high throughput genomic data. We discuss three specific approaches for microarray data, for protein mass spectrometry experiments, and for SAGE data. For the microarray and protein mass spectrometry data we assume group comparison experiments, i.e., experiments that seek to identify genes and proteins that are differentially expressed across two biologic conditions of interest. For the SAGE data example we consider inference for a single biologic sample.


Analysis Of Mass Spectrometry Data Using Bayesian Wavelet-Based Functional Mixed Models, Jeffrey S. Morris, Philip J. Brown, Keith A. Baggerly, Kevin R. Coombes Mar 2006

Jeffrey S. Morris

In this chapter, we demonstrate how to analyze MALDI-TOF/SELDI-TOF mass spectrometry data using the wavelet-based functional mixed model introduced by Morris and Carroll (2006), which generalizes the linear mixed model to the case of functional data. This approach models each spectrum as a function, and is very general, accommodating a broad class of experimental designs and allowing one to model nonparametric functional effects for various factors, which can be conditions of interest (e.g. cancer/normal) or experimental factors (blocking factors). Inference on these functional effects allows us to identify protein peaks related to various outcomes of interest, including dichotomous outcomes, categorical …


The "Duty" To Be A Rational Shareholder, David A. Hoffman Feb 2006

The "Duty" To Be A Rational Shareholder, David A. Hoffman

David A Hoffman

How and when do courts determine that corporate disclosures are actionable under the federal securities laws? The applicable standard is materiality: would a (mythical) reasonable investor have considered a given disclosure important? As I establish through empirical and statistical testing of approximately 500 cases analyzing the materiality standard, judicial findings of immateriality are remarkably common, and have been stable over time. Materiality's scope results in the dismissal of a large number of claims, and creates a set of cases in which courts attempt to explain and defend their vision of who is, and is not, a reasonable investor. Thus, materiality …


Investigating Omitted Variable Bias In Regression Parameter Estimation: A Genetic Algorithm Approach, Lonnie K. Stevans, David N. Sessions Jan 2006

Lonnie K. Stevans

Bias in regression estimates resulting from the omission of a correlated relevant variable is a well known phenomenon. In this study, we apply a genetic algorithm to estimate the missing variable and, using that estimated variable, demonstrate that significant bias in regression estimates can be substantially corrected with relatively high confidence in effective models. Our interest is restricted to the case of a missing binary indicator variable and the analytical properties of bias and MSE dominance of the resulting dependent error generated vector process. These findings are compared to prior results for the independent error proxy process. Simulations are run …
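
As a toy illustration of the search idea (all settings here, including the fitness function and GA parameters, are our assumptions, not the paper's), a genetic algorithm can search over candidate binary indicator vectors, scoring each by the residual sum of squares of the regression of y on (1, x, z).

    # Toy GA sketch for recovering an omitted binary indicator.
    import numpy as np
    rng = np.random.default_rng(3)

    n = 120
    x = rng.normal(size=n)
    z_true = (rng.random(n) < 0.5).astype(float)       # omitted binary variable
    y = 1.0 + 2.0 * x + 3.0 * z_true + rng.normal(0, 0.5, n)

    def rss(z):
        X = np.column_stack([np.ones(n), x, z])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ beta
        return r @ r

    pop = (rng.random((60, n)) < 0.5).astype(float)    # initial population
    for gen in range(200):
        scores = np.array([rss(z) for z in pop])
        elite = pop[np.argsort(scores)[:20]]           # truncation selection
        kids = []
        while len(kids) < 40:
            a, b = elite[rng.integers(20, size=2)]
            cut = rng.integers(1, n)
            child = np.concatenate([a[:cut], b[cut:]]) # one-point crossover
            flip = rng.random(n) < 0.01                # mutation
            child[flip] = 1.0 - child[flip]
            kids.append(child)
        pop = np.vstack([elite, kids])

    best = pop[np.argmin([rss(z) for z in pop])]
    # z and 1 - z fit equally well (the intercept absorbs the flip), so report
    # agreement with the true indicator up to relabeling.
    mtc = (best == z_true).mean()
    print("agreement with true indicator (up to relabeling):", max(mtc, 1.0 - mtc))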


Spatial-Temporal Data Mining Procedure: LASR, Xiao-Feng Wang, Jiayang Sun, Kath Bogie Jan 2006

Xiaofeng Wang

This paper is concerned with the statistical development of our spatial-temporal data mining procedure, LASR (pronounced "laser"). LASR is the abbreviation for Longitudinal Analysis with Self-Registration of large-p-small-n data. It was motivated by a study of "Neuromuscular Electrical Stimulation" experiments, where the data are noisy and heterogeneous, might not align from one session to another, and involve a large number of multiple comparisons. The three main components of LASR are: (1) data segmentation for separating heterogeneous data and for distinguishing outliers, (2) automatic approaches for spatial and temporal data registration, and (3) statistical smoothing mapping for identifying "activated" regions based …


Non-Normal Path Analysis In The Presence Of Measurement Error And Missing Data: A Bayesian Analysis Of Nursing Homes' Structure And Outcomes, Byron J. Gajewski, Robert Lee, Sarah Thompson, Nancy Dunton, Annette Becker, Valorie Coffland Jan 2006

Byron J Gajewski

Path analytic models are useful tools in quantitative nursing research. They allow researchers to hypothesize causal inferential paths and test the significance of these paths both directly and indirectly through a mediating variable. A standard statistical method in the path analysis literature is to treat the variables as having a normal distribution and to estimate paths using several least squares regression equations. The parameters corresponding to the direct paths have point and interval estimates based on normal distribution theory. Indirect paths are a product of the direct path from the independent variable to the mediating variable and the direct path …
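
In the simplest single-mediator case (a schematic in our notation, not the paper's full model), with mediator M and outcome Y,

    M = a\,X + \varepsilon_1, \qquad Y = c\,X + b\,M + \varepsilon_2,

the indirect path from X to Y through M is the product ab. A Bayesian fit yields the posterior distribution of ab directly from the posterior draws of a and b, which is convenient because the product is typically non-normal even when a and b are individually well approximated by normals.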


Inter-Rater Reliability Of Nursing Home Surveys: A Bayesian Latent Class Approach, Byron J. Gajewski, Sarah Thompson, Nancy Dunton, Annette Becker, Marcia Wrona Jan 2006

Byron J Gajewski

In the U.S., federal and state governments perform routine inspections of nursing homes. Results of the inspections allow government to generate fines for findings of non-compliance as well as allow consumers to rank facilities. The purpose of this study is to investigate the inter-rater reliability of the nursing home survey process. In general, the survey data involve 191 binary deficiency variables interpreted as 'deficient' or 'non-deficient'. To reduce the dimensionality of the problem, our proposed method involves two steps. First, we reduce the deficiency categories to sub-categories using previous nursing home studies. Second, looking at the State of Kansas specifically, …