Statistics and Probability Commons

Articles 1 - 20 of 20

Full-Text Articles in Statistics and Probability

Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan Mar 2019

COBRA Preprint Series

One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes, thereby providing insight into the disease process. With rapid developments in high-throughput genomic technologies over the past two decades, the scientific community can monitor the expression levels of tens of thousands of genes and proteins, resulting in enormous data sets in which the number of genomic features far exceeds the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards …
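
As a concrete illustration of the univariate Cox screening mentioned above, the sketch below fits one Cox model per gene and applies a Benjamini-Hochberg adjustment. It is a minimal sketch under assumed column names and an assumed FDR threshold, not the unified feature-selection method proposed in the preprint.

import pandas as pd
from lifelines import CoxPHFitter
from statsmodels.stats.multitest import multipletests

def univariate_cox_screen(expr, time, event, fdr=0.05):
    """expr: samples x genes data frame; time, event: survival outcome per sample."""
    pvals = {}
    for gene in expr.columns:
        df = pd.DataFrame({"time": time, "event": event, gene: expr[gene]})
        cph = CoxPHFitter()
        cph.fit(df, duration_col="time", event_col="event")  # one gene at a time
        pvals[gene] = cph.summary.loc[gene, "p"]
    pvals = pd.Series(pvals)
    reject, adjusted, _, _ = multipletests(pvals.values, alpha=fdr, method="fdr_bh")
    return list(pvals.index[reject]), pd.Series(adjusted, index=pvals.index)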


A Prior-Free Framework Of Coherent Inference And Its Derivation Of Simple Shrinkage Estimators, David R. Bickel Jun 2012

COBRA Preprint Series

The reasoning behind uses of confidence intervals and p-values in scientific practice may be made coherent by modeling the inferring statistician or scientist as an idealized intelligent agent. Other things being equal, such an agent regards a hypothesis coinciding with a confidence interval of a higher confidence level as more certain than a hypothesis coinciding with a confidence interval of a lower confidence level. The agent uses different confidence interval methods depending on what information is available. The coherence requirement means all levels of certainty of hypotheses about the parameter agree with the same distribution of certainty over parameter …


A Proof Of Bell's Inequality In Quantum Mechanics Using Causal Interactions, James M. Robins, Tyler J. Vanderweele, Richard D. Gill Sep 2011

COBRA Preprint Series

We give a simple proof of Bell's inequality in quantum mechanics which, in conjunction with experiments, demonstrates that the local hidden variables assumption is false. The proof sheds light on relationships between the notion of causal interaction and interference between particles.
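
For context, a Bell-type inequality in its CHSH form is stated below; the excerpt does not say which form the proof addresses, so this is only the version most commonly tested experimentally.

% CHSH form of a Bell-type inequality: for measurement settings a, a' and b, b'
% and correlations E(.,.) of the outcomes, local hidden variables imply
\[
  \bigl| E(a,b) + E(a,b') + E(a',b) - E(a',b') \bigr| \le 2 ,
\]
% whereas quantum mechanics permits values up to 2\sqrt{2}, so experimental
% violations rule out the local hidden variables assumption.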


A Unified Approach To Non-Negative Matrix Factorization And Probabilistic Latent Semantic Indexing, Karthik Devarajan, Guoli Wang, Nader Ebrahimi Jul 2011

COBRA Preprint Series

Non-negative matrix factorization (NMF) by the multiplicative updates algorithm is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into two matrices, W and H, each with nonnegative entries, V ~ WH. NMF has been shown to provide a unique parts-based, sparse representation of the data. The nonnegativity constraints in NMF allow only additive combinations of the data, which enables it to learn parts that have distinct physical representations in reality. In the last few years, NMF has been successfully applied in a variety of areas such as natural language processing, information retrieval, image processing, speech recognition …
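
A minimal NumPy sketch of the multiplicative updates for V ~ WH under squared Euclidean error is given below; the preprint's unified treatment covers other divergences and the connection to probabilistic latent semantic indexing, which are not shown.

import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative m x n matrix V as V ~ W H with W (m x rank) and H (rank x n)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # multiplicative update keeps H nonnegative
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # multiplicative update keeps W nonnegative
    return W, H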


Propensity Score Analysis With Matching Weights, Liang Li May 2011

COBRA Preprint Series

Propensity score analysis is one of the most widely used methods for studying causal treatment effects in observational studies. This paper studies treatment effect estimation with the method of matching weights. This method resembles propensity score matching but offers a number of new features, including efficient estimation, rigorous variance calculation, simple asymptotics, statistical tests of balance, a clearly identified target population with an optimal sampling property, and no need to choose a matching algorithm or caliper size. In addition, we propose the mirror histogram as a useful tool for graphically displaying balance. The method also shares some features of the inverse …
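
To fix ideas, the sketch below computes weights of the min(e, 1-e) form used in the matching-weight literature and a weighted difference of outcome means; the paper's efficient estimator, variance calculation, and balance tests are not reproduced, and the logistic propensity model is an assumption.

import numpy as np
import statsmodels.api as sm

def matching_weights(Z, X):
    """Z: 0/1 treatment indicator; X: n x p covariate matrix."""
    e = sm.Logit(Z, sm.add_constant(X)).fit(disp=0).predict()  # estimated propensity scores
    # weight form follows the matching-weight literature, not necessarily this paper's notation
    return np.minimum(e, 1 - e) / np.where(Z == 1, e, 1 - e)

def weighted_mean_difference(Z, Y, w):
    """Crude treatment-effect estimate: difference of weighted outcome means."""
    return (np.sum(w * Z * Y) / np.sum(w * Z)
            - np.sum(w * (1 - Z) * Y) / np.sum(w * (1 - Z)))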


Minimum Description Length Measures Of Evidence For Enrichment, Zhenyu Yang, David R. Bickel Dec 2010

COBRA Preprint Series

In order to functionally interpret differentially expressed genes or other discovered features, researchers seek to detect enrichment in the form of overrepresentation of discovered features associated with a biological process. Most enrichment methods treat the p-value as the measure of evidence using a statistical test such as the binomial test, Fisher's exact test or the hypergeometric test. However, the p-value is not interpretable as a measure of evidence apart from adjustments in light of the sample size. As a measure of evidence supporting one hypothesis over the other, the Bayes factor (BF) overcomes this drawback of the p-value but lacks …
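
For comparison, the p-value-based approach the abstract refers to often amounts to a one-sided hypergeometric (Fisher-type) test of over-representation, as in the sketch below; the minimum description length measure of evidence proposed in the preprint is not shown.

from scipy.stats import hypergeom

def enrichment_pvalue(n_total, n_in_category, n_discovered, n_overlap):
    """P(overlap >= observed) when n_discovered features are drawn from n_total,
    of which n_in_category are annotated to the biological process of interest."""
    return hypergeom.sf(n_overlap - 1, n_total, n_in_category, n_discovered)

# e.g. enrichment_pvalue(20000, 150, 400, 12) for 12 annotated genes among 400 discoveries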


Minimum Description Length And Empirical Bayes Methods Of Identifying Snps Associated With Disease, Ye Yang, David R. Bickel Nov 2010

COBRA Preprint Series

The goal of determining which of hundreds of thousands of SNPs are associated with disease poses one of the most challenging multiple testing problems. Using the empirical Bayes approach, the local false discovery rate (LFDR) estimated using popular semiparametric models has enjoyed success in simultaneous inference. However, the estimated LFDR can be biased because the semiparametric approach tends to overestimate the proportion of the non-associated single nucleotide polymorphisms (SNPs). One of the negative consequences is that, like conventional p-values, such LFDR estimates cannot quantify the amount of information in the data that favors the null hypothesis of no disease-association.

We …
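
For orientation, the sketch below computes the standard two-group LFDR estimate that the abstract says can be biased, using a theoretical N(0,1) null and a crude central estimate of the null proportion; the preprint's corrected empirical Bayes and minimum description length estimators are not reproduced.

import numpy as np
from scipy.stats import norm, gaussian_kde

def lfdr_estimate(z):
    """z: per-SNP association z-scores; returns estimated local false discovery rates."""
    f = gaussian_kde(z)                               # estimate of the mixture density f(z)
    central = norm.cdf(1) - norm.cdf(-1)              # null probability of |z| < 1
    pi0 = min(1.0, np.mean(np.abs(z) < 1) / central)  # crude estimate of the null proportion
    return np.clip(pi0 * norm.pdf(z) / f(z), 0.0, 1.0)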


The Strength Of Statistical Evidence For Composite Hypotheses: Inference To The Best Explanation, David R. Bickel Jun 2010

COBRA Preprint Series

A general function to quantify the weight of evidence in a sample of data for one hypothesis over another is derived from the law of likelihood and from a statistical formalization of inference to the best explanation. For a fixed parameter of interest, the resulting weight of evidence that favors one composite hypothesis over another is the likelihood ratio using the parameter value consistent with each hypothesis that maximizes the likelihood function over the parameter of interest. Since the weight of evidence is generally only known up to a nuisance parameter, it is approximated by replacing the likelihood function with …
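
Written out, the weight of evidence described above is the ratio of likelihoods maximized over the parameter values consistent with each hypothesis; the nuisance-parameter approximation mentioned at the end of the excerpt is indicated only schematically.

% Weight of evidence for H1: theta in Theta_1 over H2: theta in Theta_2,
% given data x and likelihood L(theta; x) in the parameter of interest:
\[
  W(H_1 : H_2) \;=\;
  \frac{\sup_{\theta \in \Theta_1} L(\theta; x)}
       {\sup_{\theta \in \Theta_2} L(\theta; x)} ,
\]
% with L replaced by a profile (or otherwise approximated) likelihood when a
% nuisance parameter is present.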


Shrinkage Estimation Of Expression Fold Change As An Alternative To Testing Hypotheses Of Equivalent Expression, Zahra Montazeri, Corey M. Yanofsky, David R. Bickel Aug 2009

COBRA Preprint Series

Research on analyzing microarray data has focused on the problem of identifying differentially expressed genes to the neglect of the problem of how to integrate evidence that a gene is differentially expressed with information on the extent of its differential expression. Consequently, researchers currently prioritize genes for further study either on the basis of volcano plots or, more commonly, according to simple estimates of the fold change after filtering the genes with an arbitrary statistical significance threshold. While the subjective and informal nature of the former practice precludes quantification of its reliability, the latter practice is equivalent to using a …
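
As a toy illustration of shrinkage estimation of fold change, the sketch below applies simple normal-normal empirical Bayes shrinkage of per-gene log fold changes toward zero; it is not the estimator developed in the preprint, and the method-of-moments prior variance is an assumption.

import numpy as np

def shrink_log_fold_changes(lfc, s2):
    """lfc: observed per-gene log2 fold changes; s2: their estimated sampling variances."""
    tau2 = max(np.var(lfc) - np.mean(s2), 0.0)  # method-of-moments estimate of prior variance
    return tau2 / (tau2 + s2) * lfc             # posterior means: noisier genes shrink more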


Correlated Binary Regression Using Orthogonalized Residuals, Richard C. Zink, Bahjat F. Qaqish Mar 2009

COBRA Preprint Series

This paper focuses on marginal regression models for correlated binary responses when estimation of the association structure is of primary interest. A new estimating function approach based on orthogonalized residuals is proposed. This procedure allows a new representation and addresses some of the difficulties of the conditional-residual formulation of alternating logistic regressions of Carey, Zeger & Diggle (1993). The new method is illustrated with an analysis of data on impaired pulmonary function.


Validation Of Differential Gene Expression Algorithms: Application Comparing Fold Change Estimation To Hypothesis Testing, David R. Bickel, Corey M. Yanofsky Feb 2009

COBRA Preprint Series

Sustained research on the problem of determining which genes are differentially expressed on the basis of microarray data has yielded a plethora of statistical algorithms, each justified by theory, simulation, or ad hoc validation and yet differing in practical results from equally justified algorithms. The widespread confusion on which method to use in practice has been exacerbated by the finding that simply ranking genes by their fold changes sometimes outperforms popular statistical tests.

Algorithms may be compared by quantifying each method's error in predicting expression ratios, whether such ratios are defined across microarray channels or between two independent groups. For …
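
The validation idea in the sketch below follows the excerpt: each ranking method predicts per-gene log expression ratios from training data (zero for genes it does not select) and is scored by its squared error against ratios from independent data. The error metric, selection size, and t-test ranking are illustrative assumptions, not the paper's exact design.

import numpy as np
from scipy.stats import ttest_ind

def ratio_prediction_error(train_a, train_b, valid_a, valid_b, top_k=100, by="fold"):
    """train_*/valid_*: genes x replicates log2 expression for the two groups."""
    lfc_train = train_a.mean(axis=1) - train_b.mean(axis=1)
    if by == "fold":
        score = np.abs(lfc_train)                            # rank genes by absolute fold change
    else:
        score = -ttest_ind(train_a, train_b, axis=1).pvalue  # rank genes by t-test evidence
    selected = np.argsort(score)[::-1][:top_k]
    predicted = np.zeros_like(lfc_train)
    predicted[selected] = lfc_train[selected]                # unselected genes predicted as ratio 1
    lfc_valid = valid_a.mean(axis=1) - valid_b.mean(axis=1)
    return np.mean((predicted - lfc_valid) ** 2)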


Change-Point Problem And Regression: An Annotated Bibliography, Ahmad Khodadadi, Masoud Asgharian Nov 2008

COBRA Preprint Series

The problems of identifying changes at unknown times and of estimating the location of changes in stochastic processes are referred to as "the change-point problem" or, in the Eastern literature, as "disorder".

The change-point problem, first introduced in the quality control context, has since developed into a fundamental problem in the areas of statistical control theory, stationarity of a stochastic process, estimation of the current position of a time series, testing and estimation of change in the patterns of a regression model, and most recently in the comparison and matching of DNA sequences in microarray data analysis.

Numerous methodological approaches …
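
As a minimal example of the problem being surveyed, the sketch below locates a single shift in the mean of a sequence by least squares; the bibliography itself covers far more general change-point and regression settings.

import numpy as np

def single_mean_change_point(y):
    """Return the split index k (1 <= k < n) minimizing the two-segment residual sum of squares."""
    y = np.asarray(y, dtype=float)
    best_k, best_rss = None, np.inf
    for k in range(1, len(y)):
        rss = np.sum((y[:k] - y[:k].mean()) ** 2) + np.sum((y[k:] - y[k:].mean()) ** 2)
        if rss < best_rss:
            best_k, best_rss = k, rss
    return best_k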


The Strength Of Statistical Evidence For Composite Hypotheses With An Application To Multiple Comparisons, David R. Bickel Nov 2008

COBRA Preprint Series

The strength of the statistical evidence in a sample of data that favors one composite hypothesis over another may be quantified by the likelihood ratio using the parameter value consistent with each hypothesis that maximizes the likelihood function. Unlike the p-value and the Bayes factor, this measure of evidence is coherent in the sense that it cannot support a hypothesis over any hypothesis that it entails. Further, when comparing the hypothesis that the parameter lies outside a non-trivial interval to the hypotheses that it lies within the interval, the proposed measure of evidence almost always asymptotically favors the correct hypothesis …


A New Method For Constructing Exact Tests Without Making Any Assumptions, Karl H. Schlag Aug 2008

COBRA Preprint Series

We present a new method for constructing exact distribution-free tests (and confidence intervals) for variables that can generate more than two possible outcomes. This method separates the search for an exact test from the goal of creating a non-randomized test. Randomization is used to extend any exact test relating to means of variables with finitely many outcomes to variables with outcomes belonging to a given bounded set. Tests in terms of variance and covariance are reduced to tests relating to means. Randomness is then eliminated in a separate step. This method is used to create confidence intervals for the …
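
A hedged sketch of the randomization step described above: outcomes bounded in a known interval are converted to Bernoulli variables with the same mean, after which an exact binomial test of the mean applies with no distributional assumptions. The separate derandomization step and the variance and covariance reductions of the paper are not shown.

import numpy as np
from scipy.stats import binomtest

def exact_mean_test(x, mu0, lo=0.0, hi=1.0, alternative="greater", seed=0):
    """Exact test about E[X] for outcomes x known to lie in [lo, hi]."""
    rng = np.random.default_rng(seed)
    x01 = (np.asarray(x, dtype=float) - lo) / (hi - lo)  # rescale outcomes to [0, 1]
    y = (rng.random(len(x01)) <= x01).astype(int)        # Bernoulli draws with mean E[X01]
    p0 = (mu0 - lo) / (hi - lo)
    return binomtest(int(y.sum()), n=len(y), p=p0, alternative=alternative)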


Bringing Game Theory To Hypothesis Testing: Establishing Finite Sample Bounds On Inference, Karl H. Schlag Jun 2008

COBRA Preprint Series

Small sample properties are of fundamental interest when only limited data are available. Exact inference is limited both by the constraints imposed by specific nonrandomized tests and, of course, by the lack of more data. These effects can be separated, as we propose to evaluate a test by comparing its type II error to the minimal type II error among all tests for the given sample. Game theory is used to establish this minimal type II error; the associated randomized test is characterized as part of a Nash equilibrium of a fictitious game against nature. We use this method to investigate sequential …


Properties Of Monotonic Effects On Directed Acyclic Graphs, Tyler J. Vanderweele, James M. Robins Apr 2008

COBRA Preprint Series

Various relationships are shown to hold between monotonic effects and weak monotonic effects and the monotonicity of certain conditional expectations. Counterexamples are provided to show that the results do not hold under less restrictive conditions. Monotonic effects are furthermore used to relate signed edges on a causal directed acyclic graph to qualitative effect modification. The theory is applied to an example concerning the direct effect of smoking on cardiovascular disease, controlling for hypercholesterolemia. Monotonicity assumptions are used to construct a test for whether there is a variable that confounds the relationship between the mediator, hypercholesterolemia, and the outcome, cardiovascular disease.


Bootstrap Confidence Regions For Optimal Operating Conditions In Response Surface Methodology, Roger D. Gibb, I-Li Lu, Walter H. Carter Jr Nov 2007

COBRA Preprint Series

This article concerns the application of bootstrap methodology to construct a likelihood-based confidence region for operating conditions associated with the maximum of a response surface constrained to a specified region. Unlike classical methods based on the stationary point, proper interpretation of this confidence region does not depend on unknown model parameters. In addition, the methodology does not require the assumption of normally distributed errors. The approach is demonstrated for concave-down and saddle system cases in two dimensions. Simulation studies were performed to assess the coverage probability of these regions.

AMS 2000 subj Classification: 62F25, 62F40, 62F30, 62J05.

Key words: Stationary …
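
A minimal sketch of the bootstrap idea in the abstract above: fit a quadratic response surface in two factors, resample residuals, refit, and collect the constrained maximizer of each refitted surface; the cloud of maximizers is the raw material for a confidence region. The design-matrix form, box constraints, and residual bootstrap are illustrative assumptions rather than the paper's exact likelihood-based procedure.

import numpy as np
from scipy.optimize import minimize

def design(X):
    """Full quadratic model matrix in two factors."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

def surface_max(beta, bounds):
    """Constrained maximizer of the fitted surface over the rectangular region."""
    obj = lambda x: -(design(np.atleast_2d(np.asarray(x))) @ beta)[0]
    return minimize(obj, x0=[np.mean(b) for b in bounds], bounds=bounds).x

def bootstrap_optima(X, y, bounds=((-1.0, 1.0), (-1.0, 1.0)), B=500, seed=0):
    rng = np.random.default_rng(seed)
    D = design(X)
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ beta
    optima = []
    for _ in range(B):
        y_star = D @ beta + rng.choice(resid, size=len(y), replace=True)  # residual bootstrap
        beta_star, *_ = np.linalg.lstsq(D, y_star, rcond=None)
        optima.append(surface_max(beta_star, bounds))
    return np.array(optima)  # B x 2 array of bootstrap optimal operating conditions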


Review Of The Maximum Likelihood Functions For Right Censored Data. A New Elementary Derivation., Stefano Patti, Elia Biganzoli, Patrizia Boracchi May 2007

COBRA Preprint Series

Censoring is a well-known feature recurrent in the analysis of lifetime data, occurring when exact lifetimes can be collected for only a representative portion of the surveyed individuals. If lifetimes are known only to exceed some given values, the data are said to be right censored. In this paper we propose a systematization and a new derivation of the likelihood function for right-censored sampling schemes; calculations are reported and assumptions are carefully stated. The sampling schemes considered (Type I, Type II and random censoring) give rise to the same ML function. Only the knowledge of elementary probability …
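
For reference, the likelihood function whose derivation is revisited can be written as follows, with observed times t_i, event indicators delta_i (1 = observed event, 0 = right censored), density f and survival function S; under the three schemes it coincides up to factors that do not involve the parameter.

% Likelihood for a right-censored sample of size n:
\[
  L(\theta) \;=\; \prod_{i=1}^{n} f(t_i;\theta)^{\delta_i}\,
                  S(t_i;\theta)^{\,1-\delta_i} .
\]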


Properties Of Monotonic Effects, Tyler J. Vanderweele, James M. Robins Nov 2006

COBRA Preprint Series

Various relationships are shown to hold between monotonic effects and weak monotonic effects and the monotonicity of certain conditional expectations. This relationship is considered for both binary and non-binary variables. Counterexamples are provided to show that the results do not hold under less restrictive conditions. The ideas of monotonic effects are furthermore used to relate signed edges on a directed acyclic graph to qualitative effect modification.


New Statistical Paradigms Leading To Web-Based Tools For Clinical/Translational Science, Knut M. Wittkowski May 2005

COBRA Preprint Series

As the field of functional genetics and genomics begins to mature, we are confronted with new challenges. The constant drop in the price of sequencing and gene expression profiling, as well as the increasing number of genetic and genomic variables that can be measured, makes it feasible to address more complex questions. The success with rare diseases caused by single loci or genes has provided us with a proof of concept that new therapies can be developed based on functional genomics and genetics.

Common diseases, however, typically involve genetic epistasis, genomic pathways, and proteomic patterns. Moreover, to better understand the underlying biological …