Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics and Probability

Full-Text Articles in Physical Sciences and Mathematics

Applications Of Statistical Data Mining Methods, George Fernandez Apr 2004

Conference on Applied Statistics in Agriculture

Data mining is a collection of analytical techniques to uncover new trends and patterns in large databases. These data mining techniques stress visualization to thoroughly study the structure of data and to check the validity of statistical model fit to the data and lead to knowledge discovery. Data mining is an interdisciplinary research area spanning several disciplines such as database management, machine learning, statistical computing, and expert systems. Although data mining is a relatively new term, the technology is not. Data mining allows users to analyze data from many different dimensions or angles, explore and categorize it, and summarize the …


Editor's Preface And Table Of Contents, George A. Milliken Apr 2004

Conference on Applied Statistics in Agriculture

These proceedings contain papers presented in the sixteenth annual Kansas State University Conference on Applied Statistics in Agriculture, held in Manhattan, Kansas, April 25-27, 2004.


Resampling Methods For Estimating Functions With U-Statistic Structure, Wenyu Jiang, Jack Kalbfleisch Apr 2004

The University of Michigan Department of Biostatistics Working Paper Series

Suppose that inference about parameters of interest is to be based on an unbiased estimating function that is a U-statistic of degree 1 or 2. We define suitable studentized versions of such estimating functions and consider asymptotic approximations as well as an estimating function bootstrap (EFB) method based on resampling the estimated terms in the estimating functions. These methods are justified asymptotically and lead to confidence intervals produced directly from the studentized estimating functions. Particular examples in this class of estimating functions arise in La estimation as well as Wilcoxon rank regression and other related estimation problems. The proposed methods are …


Covariate Adjustment In The Analysis Of Microarray Data From Clinical Studies, Debashis Ghosh, Arul Chinnaiyan Apr 2004

The University of Michigan Department of Biostatistics Working Paper Series

There is tremendous scientific interest in the analysis of gene expression data in clinical settings, such as oncology. In this paper, we describe the importance of adjusting for confounders and other prognostic factors in order to select for differentially expressed genes for follow-up validation studies. We develop two approaches to the analysis of microarray data in nonrandomized clinical settings. The first is an extension of the current significance analysis of microarray procedures, where other covariates are taken into account. The second is a novel covariate-adjusted regression modelling based on the receiver operating characteristic curve for the analysis of gene expression …


One- And Two-Sample Nonparametric Inference Procedures In The Presence Of Dependent Censoring, Yuhyun Park, Lu Tian, L. J. Wei Apr 2004

Harvard University Biostatistics Working Paper Series

No abstract provided.


Evaluating Markers For Selecting A Patient's Treatment, Xiao Song, Margaret S. Pepe Apr 2004

UW Biostatistics Working Paper Series

Selecting the best treatment for a patient's disease may be facilitated by evaluating clinical characteristics or biomarker measurements at diagnosis. We consider how to evaluate the potential of such measurements to impact on treatment selection algorithms. For example, magnetic resonance neurographic imaging is potentially useful for deciding whether a patient should be treated surgically for carpal tunnel syndrome or if he/she should receive less invasive conservative therapy. We propose a graphical display, the selection impact (SI) curve, that shows the population response rate as a function of treatment selection criteria based on the marker. The curve can be useful for …


Nonparametric Control Chart For The Range, Arnold J. Stromberg Apr 2004

Statistics Faculty Patents

The method comprises establishing the number of subsets of a dataset that have a range of the difference between any two datapoints within the dataset, and computing a control chart for the range based thereon. In another aspect, a software program for accomplishing the method of the present invention is provided. The method of the invention allows monitoring variability of a product being produced by a particular piece of machinery, of a process conducted by the machinery, or of a product stream generated thereby, accurately detecting changes in variability in real time. The true distribution of the data is reflected, …


Mathematical And Empirical Modeling Of Chemical Reactions In A Microreactor, Jing Hu Apr 2004

Doctoral Dissertations

This dissertation is concerned with mathematical and empirical modeling to simulate three important chemical reactions (cyclohexene hydrogenation and dehydrogenation, preferential oxidation of carbon monoxide, and the Fischer-Tropsch (F-T) synthesis) in a microreaction system.

Empirical modeling and optimization techniques based on experimental design (Central Composite Design (CCD)) and response surface methodology were applied to these three chemical reactions. Regression models were built, and the operating conditions (such as temperature, the ratio of the reactants, and total flow rate) which maximize reactant conversion and product selectivity were determined for each reaction.

A probability model for predicting the probability that a certain species …


On Leverage In A Stochastic Volatility Model, Jun Yu Apr 2004

Research Collection School Of Economics

This paper is concerned with specification for modelling financial leverage effect in the context of stochastic volatility (SV) models. Two alternative specifications co-exist in the literature. One is the Euler approximation to the well known continuous time SV model with leverage effect and the other is the discrete time SV model of Jacquier, Polson and Rossi (2004, Journal of Econometrics, forthcoming). Using a Gaussian nonlinear state space form with uncorrelated measurement and transition errors, I show that it is easy to interpret the leverage effect in the conventional model whereas it is not clear how to obtain the leverage effect …


Regulatory Motif Finding By Logic Regression, Sunduz Keles, Mark J. Van Der Laan, Chris Vulpe Mar 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Multiple transcription factors coordinately control transcriptional regulation of genes in eukaryotes. Although multiple computational methods consider the identification of individual transcription factor binding sites (TFBSs), very few focus on the interactions between these sites. We consider finding transcription factor binding sites and their context specific interactions using microarray gene expression data. We devise a hybrid approach called LogicMotif composed of a TFBS identification method combined with the new regression methodology logic regression of Ruczinski et al. (2003). LogicMotif has two steps: First, potential binding sites are identified from transcription control regions of genes of interest. Various available methods can be …


Latest Tools For Viewing And Quality Checking Arm Data, S. Moore, Gary B. Hughes Mar 2004

Statistics

The DOE Atmospheric Radiation Measurement (ARM) program has acquired an incredibly large quantity of data over its period of operation, all of which must be reviewed in some manner in order to ensure that the data is of “known and reasonable” quality (ARM Science Plan). To accomplish this, Mission Research Corporation (MRC) coordinates with the ARM Data Quality Office to develop software tools that quality-check data products in a timely and continuous fashion. These tools work with the Data Quality Health and Status (DQ HandS) Explorer (Peppler et al. 2004) by analyzing ARM data streams, providing assessments of data quality …


A Statistical Method For Constructing Transcriptional Regulatory Networks Using Gene Expression And Sequence Data, Biao Xing, Mark J. Van Der Laan Mar 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Transcriptional regulation is one of the most important means of gene regulation. Uncovering transcriptional regulatory network helps us to understand the complex cellular process. In this paper, we describe a comprehensive statistical approach for constructing the transcriptional regulatory network using data of gene expression, promoter sequence, and transcription factor binding sites. Our simulation studies show that the overall and false positive error rates in the estimated transcriptional regulatory network are expected to be small if the systematic noise in the constructed feature matrix is small. Our analysis based on 658 microarray experiments on yeast gene expression programs and 46 transcription …


Causal Inference In Hybrid Intervention Trials Involving Treatment Choice, Qi Long, Rod Little, Xihong Lin Mar 2004

The University of Michigan Department of Biostatistics Working Paper Series

Randomized allocation of treatments is a cornerstone of experimental design, but has drawbacks when a limited set of individuals are willing to be randomized, or the act of randomization undermines the success of the treatment. Choice-based experimental designs allow a subset of the participants to choose their treatments. We discuss here causal inferences for experimental designs where some participants are randomly allocated to treatments and others receive their treatment preference. This paper was motivated by the “Women Take Pride” (WTP) study (Janevic et al., 2001), a doubly randomized preference trial (DRPT) to assess behavioral interventions for women with heart disease. …


Error Models For Microarray Intensities, Wolfgang Huber, Anja Von Heydebreck, Martin Vingron Mar 2004

Bioconductor Project Working Papers

We derive the additive-multiplicative error model for microarray intensities, and describe two applications. For the detection of differentially expressed genes, we obtain a statistic whose variance is approximately independent of the mean intensity. For the post hoc calibration (normalization) of data with respect to experimental factors, we describe a method for parameter estimation.
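
As a rough illustration of the idea in this abstract, the sketch below simulates intensities from one common form of an additive-multiplicative error model, y = a + mu * exp(eta) + nu, and applies an arsinh transform to make the spread roughly independent of the mean. The model form, the parameter values, the transform, and the function names are assumptions made for illustration; the abstract does not state them, and this is not the authors' implementation.

```python
import numpy as np


def simulate_intensities(mu, n, a=50.0, sigma_eta=0.2, sigma_nu=20.0, seed=0):
    """Draw n intensities y = a + mu * exp(eta) + nu (assumed model form).

    eta is the multiplicative error and nu the additive error; both are
    taken to be zero-mean Gaussian, which is an illustrative choice.
    """
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, sigma_eta, size=n)
    nu = rng.normal(0.0, sigma_nu, size=n)
    return a + mu * np.exp(eta) + nu


def arsinh_transform(y, a=50.0, sigma_eta=0.2, sigma_nu=20.0):
    """Approximately variance-stabilizing transform for the model above.

    It behaves linearly for small intensities (additive noise dominates)
    and logarithmically for large ones (multiplicative noise dominates).
    """
    return np.arcsinh((y - a) * sigma_eta / sigma_nu)


# Compare the spread at a low and a high true signal level.
low = simulate_intensities(mu=10.0, n=20000, seed=1)
high = simulate_intensities(mu=10000.0, n=20000, seed=2)

raw_ratio = high.std() / low.std()
stab_ratio = arsinh_transform(high).std() / arsinh_transform(low).std()

print(f"raw standard deviation ratio (high/low): {raw_ratio:.1f}")
print(f"transformed standard deviation ratio:    {stab_ratio:.2f}")
```

On the raw scale the standard deviation grows with the mean, while after the transform the two groups have a similar spread, which is the property the abstract describes for its test statistic.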


A Bayesian Hierarchical Approach To Multirater Correlated Roc Analysis, Tim Johnson, Valen Johnson Mar 2004

The University of Michigan Department of Biostatistics Working Paper Series

In a common ROC study design, several readers are asked to rate diagnostics of the same cases processed under different modalities. We describe a Bayesian hierarchical model that facilitates the analysis of this study design by explicitly modeling the three sources of variation inherent to it. In so doing, we achieve substantial reductions in the posterior uncertainty associated with estimates of the differences in areas under the estimated ROC curves and corresponding reductions in the mean squared error (MSE) of these estimates. Based on simulation studies, both the widths of confidence intervals and MSE of estimates of differences in the …


Loss-Based Cross-Validated Deletion/Substitution/Addition Algorithms In Estimation, Sandra E. Sinisi, Mark J. Van Der Laan Mar 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

In van der Laan and Dudoit (2003) we propose and theoretically study a unified loss function based statistical methodology, which provides a road map for estimation and performance assessment. Given a parameter of interest which can be described as the minimizer of the population mean of a loss function, the road map involves as important ingredients cross-validation for estimator selection and minimizing over subsets of basis functions the empirical risk of the subset-specific estimator of the parameter of interest, where the basis functions correspond to a parameterization of a specified subspace of the complete parameter space. In this article we …


Explaining Death Row's Population And Racial Composition, John H. Blume, Theodore Eisenberg, Martin T. Wells Mar 2004

Cornell Law Faculty Publications

Twenty-three years of murder and death sentence data show how murder demographics help explain death row populations. Nevada and Oklahoma are the most death-prone states; Texas's death sentence rate is below the national mean. Accounting for the race of murderers establishes that black representation on death row is lower than black representation in the population of murder offenders. This disproportion results from reluctance to seek or impose death in black defendant-black victim cases, which more than offsets eagerness to seek and impose death in black defendant-white victim cases. Death sentence rates in black defendant-white victim cases far exceed those in …


Attorney Fees In Class Action Settlements: An Empirical Study, Theodore Eisenberg, Geoffrey P. Miller Mar 2004

Cornell Law Faculty Publications

Study of two comprehensive class action case data sets covering 1993-2002 shows that the amount of client recovery is overwhelmingly the most important determinant of the attorney fee award. Even in cases in which the courts engage in the lodestar calculation (the product of reasonable hours and a reasonable hourly rate), the client's recovery generally explains the pattern of awards better than the lodestar. Thus, the time and expense of a lodestar calculation may be wasteful. We also find no robust evidence that either recoveries for plaintiffs or fees of their attorneys increased over time. The mean fee award in common …


An Investigation Of The Effects Of Correlation, Autocorrelation, And Sample Size In Classifier Fusion, Nathan J. Leap Mar 2004

Theses and Dissertations

This thesis extends the research found in Storm, Bauer, and Oxley, 2003. Data correlation effects and sample size effects on three classifier fusion techniques and one data fusion technique were investigated. Identification System Operating Characteristic Fusion (Haspert, 2000), the Receiver Operating Characteristic Within Fusion method (Oxley and Bauer, 2002), and a Probabilistic Neural Network were the three classifier fusion techniques; a Generalized Regression Neural Network was the data fusion technique. Correlation was injected into the data set both within a feature set (autocorrelation) and across feature sets for a variety of classification problems, and sample size was varied throughout. Total …


The Cross-Validated Adaptive Epsilon-Net Estimator, Mark J. Van Der Laan, Sandrine Dudoit, Aad W. Van Der Vaart Feb 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Suppose that we observe a sample of independent and identically distributed realizations of a random variable. Assume that the parameter of interest can be defined as the minimizer, over a suitably defined parameter space, of the expectation (with respect to the distribution of the random variable) of a particular (loss) function of a candidate parameter value and the random variable. Examples of commonly used loss functions are the squared error loss function in regression and the negative log-density loss function in density estimation. Minimizing the empirical risk (i.e., the empirical mean of the loss function) over the entire parameter space …
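
The general setting described here, choosing among candidate estimators by their cross-validated empirical risk under squared error loss, can be sketched as follows. This is plain V-fold cross-validation over polynomial regression fits, not the epsilon-net estimator of the paper; the simulated data, the candidate set, and the function name `cv_risk` are hypothetical choices for illustration.

```python
import numpy as np


def cv_risk(x, y, degree, n_folds=5, seed=0):
    """V-fold cross-validated squared-error risk of a polynomial fit."""
    rng = np.random.default_rng(seed)
    # The same seed gives the same folds for every candidate degree.
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    fold_risks = []
    for k in range(n_folds):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Minimize the empirical risk on the training sample.
        coefs = np.polyfit(x[train], y[train], degree)
        # Evaluate the loss on the held-out validation sample.
        pred = np.polyval(coefs, x[val])
        fold_risks.append(np.mean((y[val] - pred) ** 2))
    return float(np.mean(fold_risks))


# Simulated data with a quadratic regression function (assumed example).
rng = np.random.default_rng(42)
x = rng.uniform(-2.0, 2.0, size=200)
y = 1.0 + 0.5 * x - 1.5 * x**2 + rng.normal(0.0, 0.5, size=200)

candidates = [0, 1, 2, 3, 4, 5]
risks = {d: cv_risk(x, y, d) for d in candidates}
selected = min(risks, key=risks.get)

for d in candidates:
    print(f"degree {d}: cross-validated risk {risks[d]:.3f}")
print("selected degree:", selected)
```

Minimizing the empirical risk over the richest candidate alone would tend to overfit; comparing candidates on held-out data is what guards against that.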


A Bayesian Chi-Squared Test For Goodness Of Fit, Valen Johnson Feb 2004

The University of Michigan Department of Biostatistics Working Paper Series

This article describes an extension of classical χ² goodness-of-fit tests to Bayesian model assessment. The extension, which essentially involves evaluating Pearson's goodness-of-fit statistic at a parameter value drawn from its posterior distribution, has the important property that it is asymptotically distributed as a χ² random variable on K-1 degrees of freedom, independently of the dimension of the underlying parameter vector. By averaging over the posterior distribution of this statistic, a global goodness-of-fit diagnostic is obtained. Advantages of this diagnostic (which may be interpreted as the area under an ROC curve) include ease of interpretation, computational convenience, and favorable power properties. The proposed …
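
A minimal sketch of the construction described in this abstract, under simplifying assumptions: the data are modeled as normal with known standard deviation, the prior on the mean is flat (so the posterior is normal around the sample mean), and Pearson's statistic is computed over K equal-probability bins of the model's cumulative distribution function. The binning scheme, the model, and all names are illustrative assumptions rather than details taken from the abstract.

```python
import math

import numpy as np


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pearson_statistic_at(y, theta, sigma, n_bins):
    """Pearson's statistic evaluated at a single parameter value theta.

    Each observation is mapped through the model CDF and counted into
    n_bins equal-probability bins on [0, 1].
    """
    u = np.array([normal_cdf((v - theta) / sigma) for v in y])
    counts, _ = np.histogram(u, bins=n_bins, range=(0.0, 1.0))
    expected = len(y) / n_bins
    stat = float(np.sum((counts - expected) ** 2 / expected))
    return stat, counts


rng = np.random.default_rng(7)
sigma = 2.0          # known standard deviation (assumption)
n, n_bins = 500, 5   # sample size and number of bins K
y = rng.normal(1.0, sigma, size=n)

# One posterior draw of the mean under a flat prior: N(ybar, sigma^2 / n).
theta_draw = rng.normal(y.mean(), sigma / math.sqrt(n))
stat, counts = pearson_statistic_at(y, theta_draw, sigma, n_bins)

print("bin counts:", counts.tolist())
print(f"statistic at the posterior draw: {stat:.2f} "
      f"(reference distribution: chi-squared on {n_bins - 1} degrees of freedom)")
```

Repeating the last step over many posterior draws and averaging a summary of the resulting statistics would give a global diagnostic of the kind the abstract mentions.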


Multiple Imputation For Interval Censored Data With Auxiliary Variables, Chiu-Hsieh Hsu, Jeremy Taylor, Susan Murray Feb 2004

The University of Michigan Department of Biostatistics Working Paper Series

We propose a nonparametric multiple imputation scheme, NPMLE imputation, for the analysis of interval censored survival data. Features of the method are that it converts interval-censored data problems to complete data or right censored data problems to which many standard approaches can be used, and the measures of uncertainty are easily obtained. In addition to the event time of primary interest, there are frequently other auxiliary variables that are associated with the event time. For the goal of estimating the marginal survival distribution, these auxiliary variables may provide some additional information about the event time for the interval censored observations. …


Individualized Predictions Of Disease Progression Following Radiation Therapy For Prostate Cancer, Jeremy Taylor, Menggang Yu, Howard M. Sandler Feb 2004

The University of Michigan Department of Biostatistics Working Paper Series

Background: Following treatment for localized prostate cancer, men are monitored with serial PSA measurements. Refining the predictive value of post-treatment PSA determinations may add to clinical management and we have developed a model that predicts for an individual patient future PSA values and estimates the time to future clinical recurrence.

Methods: Data from 934 patients treated for prostate cancer between 1987 and 2000 were used to develop a comprehensive statistical model to fit the clinical recurrence events and pattern of PSA data. A logistic regression model was used for the probability of cure, non-linear hierarchical mixed models were used for …


Piecewise Constant Cross-Ratio Estimation For Association In Bivariate Survival Data With Application To Studying Markers Of Menopausal Transition, Bin Nan, Xihong Lin, Lynda D. Lisabet, Sioban Harlow Feb 2004

The University of Michigan Department of Biostatistics Working Paper Series

A question of significant interest in female reproductive aging is to identify bleeding criteria for the menopausal transition. Although various bleeding criteria, or markers, have been proposed for the menopausal transition, their validity has not been adequately examined. The Tremin Trust data are collected from a long-term cohort study that followed a group of women throughout their whole reproductive life, and provide a unique opportunity for assessing the association between age at onset of a bleeding marker and age at onset of menopause. Formal statistical analysis of this dependence is challenging given the fact that both the marker event and menopause …


Individual Prediction In Prostate Cancer Studies Using A Joint Longitudinal-Survival-Cure Model, Menggang Yu, Jeremy Taylor, Howard M. Sandler Feb 2004

The University of Michigan Department of Biostatistics Working Paper Series

For monitoring patients treated for prostate cancer, Prostate Specific Antigen (PSA) is measured periodically after they receive treatment. Increases in PSA are suggestive of recurrence of the cancer and are used in making decisions about possible new treatments. The data from studies of such patients typically consist of longitudinal PSA measurements, censored event times and baseline covariates. Methods for the combined analysis of both longitudinal and survival data have been developed in recent years, with the main emphasis being on modeling and estimation. We analyze data from a prostate cancer study that has been extended by adding a mixture structure …


Mixture Models For Assessing Differential Expression In Complex Tissues Using Microarray Data, Debashis Ghosh Feb 2004

The University of Michigan Department of Biostatistics Working Paper Series

The use of DNA microarrays has become quite popular in many scientific and medical disciplines, such as in cancer research. One common goal of these studies is to determine which genes are differentially expressed between cancer and healthy tissue, or more generally, between two experimental conditions. A major complication in the molecular profiling of tumors using gene expression data is that the data represent a combination of tumor and normal cells. Much of the methodology developed for assessing differential expression with microarray data has assumed that tissue samples are homogeneous. In this article, we outline a general framework for determining …


Optimal Sample Size For Multiple Testing: The Case Of Gene Expression Microarrays, Peter Muller, Giovanni Parmigiani, Christian Robert, Judith Rousseau Feb 2004

Johns Hopkins University, Dept. of Biostatistics Working Papers

We consider the choice of an optimal sample size for multiple comparison problems. The motivating application is the choice of the number of microarray experiments to be carried out when learning about differential gene expression. However, the approach is valid in any application that involves multiple comparisons in a large number of hypothesis tests. We discuss two decision problems in the context of this setup: the sample size selection and the decision about the multiple comparisons. We adopt a decision theoretic approach, using loss functions that combine the competing goals of discovering as many differentially expressed genes as possible, while keeping …


A Log-Normal Distribution Model Of The Effect Of Bacteria And Ear Fenestration On Hearing Loss: A Bayesian Approach, Byron J. Gajewski, Jack D. Sedwick, Patrick J. Antonelli Feb 2004

Byron J Gajewski

Chronic ear infection is a potentially life-threatening illness that medical doctors typically treat with ear surgery. Despite the success of this treatment, complications can occur due to bacterial infection. Surgeons believe that this infection causes the patient to have clinically significant hearing damage. In order to understand such complications, surgeons must quantify the effect of bacteria, their toxins and ear surgery on hearing loss. To this end, the other two authors of this paper performed two experiments on guinea pigs to measure hearing thresholds following a bacterial infection and surgery of the inner ear. The response variable in these experiments …


An Analysis Of The Periodicity Of The Cell Cycle And Apoptotic Regulatory Proteins In Prostate Xenografts Using Anova And Cosinor Methods, Aleen Hosdaghian Jan 2004

Theses

Circadian rhythms have been found in both plants and animals, in normal tissues as well as in most tumors and human cancers. By following these rhythms in healthy and cancerous tissue, it has been possible to find optimal times to deliver a dose of drug, such that efficacy is maximized and toxicity to normal tissues is minimized. In this study, the periodicity of several cell cycle and apoptotic regulatory proteins was studied in two prostate cancer models against a dietary therapeutic agent, Selenium. The ALVA-31 (androgen-independent) and PC-3 (androgen-independent) prostate cancer cell lines were grown in vivo, as a …


Calibrating Observed Differential Gene Expression For The Multiplicity Of Genes On The Array, Yingye Zheng, Margaret S. Pepe Jan 2004

UW Biostatistics Working Paper Series

In a gene expression array study, the expression levels of thousands of genes are monitored simultaneously across various biological conditions on a small set of subjects. One goal of such studies is to explore a large pool of genes in order to select a subset of genes that appear to be differently expressed for further investigation. Of particular interest here is how to select the top k genes once genes are ranked based on their evidence for differential expression in two tissue types. We consider statistical methods that provide a more rigorous and intuitively appealing selection process for k. We …