Statistics and Probability | Open Access Articles

Semiparametric Regression Of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines And Linear Mixed Models, Dawei Liu, Xihong Lin, Debashis Ghosh

Harvard University Biostatistics Working Paper Series

No abstract provided.

Multiple Testing With An Empirical Alternative Hypothesis, James E. Signorovitch

Harvard University Biostatistics Working Paper Series

An optimal multiple testing procedure is identified for linear hypotheses under the general linear model, maximizing the expected number of false null hypotheses rejected at any significance level. The optimal procedure depends on the unknown data-generating distribution, but can be consistently estimated. Drawing information together across many hypotheses, the estimated optimal procedure provides an empirical alternative hypothesis by adapting to underlying patterns of departure from the null. Proposed multiple testing procedures based on the empirical alternative are evaluated through simulations and an application to gene expression microarray data. Compared to a standard multiple testing procedure, it is not unusual for …

Exploration Of Distributional Models For A Novel Intensity-Dependent Normalization, Nicola Lama, Patrizia Boracchi, Elia Mario Biganzoli

COBRA Preprint Series

Currently used gene intensity-dependent normalization methods, based on regression smoothing techniques, usually address the two problems of location-bias detrending and data rescaling without taking into account the censoring of certain gene expression values produced by experimental measurement constraints or by previous normalization steps. Moreover, control of the bias-variance balance in normalization procedures is seldom discussed and is instead left to the user's experience. Here, an approximate maximum likelihood procedure is presented for fitting a model that smooths the dependence of log-fold gene expression differences on average gene intensities. Central tendency and scaling factor were modeled by means of B-splines smoothing …

Structural Inference In Transition Measurement Error Models For Longitudinal Data, Wenqin Pan, Xihong Lin, Donglin Zeng

Harvard University Biostatistics Working Paper Series

No abstract provided.

Estimation In Semiparametric Transition Measurement Error Models For Longitudinal Data, Wenqin Pan, Donglin Zeng, Xihong Lin

Harvard University Biostatistics Working Paper Series

No abstract provided.

Nonparametric Regression Using Local Kernel Estimating Equations For Correlated Failure Time Data, Zhangsheng Yu, Xihong Lin

Harvard University Biostatistics Working Paper Series

No abstract provided.

Causal Inference In Hybrid Intervention Trials Involving Treatment Choice, Qi Long, Rod Little, Xihong Lin

Harvard University Biostatistics Working Paper Series

No abstract provided.

A Comparison Of Methods For Estimating The Causal Effect Of A Treatment In Randomized Clinical Trials Subject To Noncompliance, Rod Little, Qi Long, Xihong Lin

Harvard University Biostatistics Working Paper Series

No abstract provided.

Bounded Search For De Novo Identification Of Degenerate Cis-Regulatory Elements, Jonathan M. Carlson, Arijit Chakravarty, Radhika S. Khetani, Robert H. Gross

Dartmouth Scholarship

The identification of statistically overrepresented sequences in the upstream regions of coregulated genes should theoretically permit the identification of potential cis-regulatory elements. However, in practice many cis-regulatory elements are highly degenerate, precluding the use of an exhaustive word-counting strategy for their identification. While numerous methods exist for inferring base distributions using a position weight matrix, recent studies suggest that the independence assumptions inherent in the model, as well as the inability to reach a global optimum, limit this approach.

Genome Scanning Methods For Comparing Sequences Between Groups, With Application To HIV Vaccine Trials, Peter B. Gilbert, Chunyuan Wu, David V. Jobes

UW Biostatistics Working Paper Series

Consider a placebo-controlled preventive HIV vaccine efficacy trial. An HIV amino acid sequence is measured from each volunteer who acquires HIV, and these sequences are aligned together with the reference HIV sequence represented in the vaccine. We develop genome scanning methods to identify HIV positions at which the amino acids in sequences from infected vaccine recipients tend to be more divergent from the corresponding reference amino acid than the amino acids in sequences from infected placebo recipients. We consider five two-sample test statistics, based on Euclidean, Mahalanobis, and Kullback-Leibler divergence measures. Weights are incorporated to reflect biological information contained in …

2^K Factorials In Blocks Of Size 2, With Application To Two-Color Microarray Experiments, Kathleen F. Kerr

UW Biostatistics Working Paper Series

When a two-level design must be run in blocks of size two, there is a unique blocking scheme that enables estimation of all the main effects. Unfortunately, this design does not enable estimation of any two-factor interactions. When the experimental goal is to estimate all main effects and two-factor interactions, it is necessary to combine replicates of the experiment that use different blocking schemes. In this paper we identify such designs for up to eight factors that enable estimation of all main effects and two-factor interactions with the fewest replications. In addition, we give a construction for general …

Multiple Tests Of Association With Biological Annotation Metadata, Sandrine Dudoit, Sunduz Keles, Mark J. van der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

We propose a general and formal statistical framework for the multiple tests of associations between known fixed features of a genome and unknown parameters of the distribution of variable features of this genome in a population of interest. The known fixed gene-annotation profiles, corresponding to the fixed features of the genome, may concern Gene Ontology (GO) annotation, pathway membership, regulation by particular transcription factors, nucleotide sequences, or protein sequences. The unknown gene-parameter profiles, corresponding to the variable features of the genome, may be, for example, regression coefficients relating genome-wide transcript levels or DNA copy numbers to possibly censored biological and …

GPNN: Power Studies And Applications Of A Neural Network Method For Detecting Gene-Gene Interactions In Studies Of Human Disease, Alison A. Motsinger, Stephen L. Lee, George Mellick, Marylyn D. Ritchie

Dartmouth Scholarship

The identification and characterization of genes that influence the risk of common, complex multifactorial disease primarily through interactions with other genes and environmental factors remain a statistical and computational challenge in genetic epidemiology. We have previously introduced a genetic programming optimized neural network (GPNN) as a method for optimizing the architecture of a neural network to improve the identification of gene combinations associated with disease risk. The goal of this study was to evaluate the power of GPNN for identifying high-order gene-gene interactions. We were also interested in applying GPNN to a real data analysis in Parkinson's disease.

Impacts Of A Manure Composting Program On Stream Water Quality, A. Bekele, A. M. S. McFarland, A. J. Whisenant

Faculty Publications

In February 2001, the Texas Commission on Environmental Quality (TCEQ) adopted a Total Maximum Daily Load (TMDL) for soluble reactive phosphorus (SRP) along the North Bosque River. Within this TMDL, dairy waste application fields were identified as the major nonpoint-source contribution of nutrients. In September 2000, a manure composting program was initiated that resulted in about 500,000 metric tons of dairy manure being hauled to composting facilities and exported from the watershed through December 2004. To evaluate the impact of the manure composting program on stream water quality, storm event mean concentrations of nutrients and total suspended solids were compared …

Analyzing DNA Microarrays With Undergraduate Statisticians, Johanna S. Hardin, Laura Hoopes, Ryan Murphy '06

Pomona Faculty Publications and Research

With advances in technology, biologists have been saddled with high dimensional data that need modern statistical methodology for analysis. DNA microarrays are able to simultaneously measure thousands of genes (and the activity of those genes) in a single sample. Biologists use microarrays to trace connections between pathways or to identify all genes that respond to a signal. The statistical tools we usually teach our undergraduates are inadequate for analyzing thousands of measurements on tens of samples. The project materials include readings on microarrays as well as computer lab activities. The topics covered include image analysis, filtering and normalization techniques, and …
