Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Articles 1 - 15 of 15

Full-Text Articles in Statistics and Probability

Optimal Design Of Low-Density Snp Arrays For Genomic Prediction: Algorithm And Applications, Xiao-Lin Wu, Jiaqi Xu, Guofei Feng, George R. Wiggans, Jeremy F. Taylor, Jun He, Changsong Qian, Jiansheng Qiu, Barry Simpson, Jeremy Walker, Stewart Bauck Sep 2016

Department of Statistics: Faculty Publications

Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function facilitates maximization of non-gap map length and system information for the SNP chip, and the latter is computed either as locus-averaged (LASE) or haplotype-averaged Shannon entropy (HASE) and adjusted for uniformity of the SNP distribution. …
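
The locus-averaged Shannon entropy mentioned in the abstract can be illustrated with a minimal sketch. It assumes biallelic SNPs summarized by a single allele frequency per locus and entropy measured in bits; the function names are hypothetical, and the uniformity adjustment and map-length term of the MOLO objective are not reproduced here.

```python
import math


def snp_entropy(p):
    """Shannon entropy in bits of a biallelic locus with allele frequency p.

    Assumption: the locus has exactly two alleles with frequencies p and 1 - p.
    """
    if p <= 0.0 or p >= 1.0:
        # A monomorphic locus carries no information.
        return 0.0
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))


def locus_averaged_entropy(freqs):
    """Mean per-locus entropy over a candidate SNP set (illustrative only)."""
    if not freqs:
        raise ValueError("at least one allele frequency is required")
    return sum(snp_entropy(p) for p in freqs) / len(freqs)


# Hypothetical allele frequencies for two candidate panels.
panel_a = [0.5, 0.5, 0.4]
panel_b = [0.05, 0.9, 0.5]
print(locus_averaged_entropy(panel_a) > locus_averaged_entropy(panel_b))
```

Under this definition, a panel of SNPs with intermediate allele frequencies scores higher than one dominated by rare alleles, which is the intuition behind using entropy as an information measure when selecting SNPs.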


Methods To Account For Breed Composition In A Bayesian Gwas Method Which Utilizes Haplotype Clusters, Danielle F. Wilson-Wells Aug 2016

Department of Statistics: Dissertations, Theses, and Student Work

In livestock, prediction of an animal’s genetic merit using genomic information is becoming increasingly common. The models used to make these predictions typically assume that we are sampling from a homogeneous population. However, in both commercial and experimental populations the sire and dam of an individual may be a mixture of different breeds. Haplotype models can capture this population structure.

Two models based on breed-specific haplotype clusters were developed to account for differences across multiple breeds. The first model utilizes the breed composition of the individual, while the second utilizes the breed composition from the sire and dam. Haplotype …


Converting Heterogeneous Statistical Tables On The Web To Searchable Databases, David W. Embley, Mukkai S. Krishnamoorthy, George Nagy, Sharad C. Seth Feb 2016

School of Computing: Faculty Publications

Much of the world’s quantitative data reside in scattered web tables. For a meaningful role in Big Data analytics, the facts reported in these tables must be brought into a uniform framework. Based on a formalization of header-indexed tables, we proffer an algorithmic solution to end-to-end table processing for a large class of human-readable tables. The proposed algorithms transform header-indexed tables to a category table format that maps easily to a variety of industry-standard data stores for query processing. The algorithms segment table regions based on the unique indexing of the data region by header paths, classify table cells, and …


How Often Are Antibiotic-Resistant Bacteria Said To “Evolve” In The News?, Nina Singh, Matthew T. Sit, Deanna M. Chung, Ana A. Lopez, Ranil Weerackoon, Pamela J. Yeh Jan 2016

Department of Statistics: Faculty Publications

Media plays an important role in informing the general public about scientific ideas. We examine whether the word “evolve,” sometimes considered controversial by the general public, is frequently used in the popular press. Specifically, we ask how often articles discussing antibiotic resistance use the word “evolve” (or its lexemes) as opposed to alternative terms such as “emerge” or “develop.” We chose the topic of antibiotic resistance because it is a medically important issue; bacterial evolution is a central player in human morbidity and mortality. We focused on the most widely distributed newspapers written in English in the United States, United Kingdom, Canada, …


Systematic Evaluation Of The Impact Of Chip-Seq Read Designs On Genome Coverage, Peak Identification, And Allele-Specific Binding Detection, Qi Zhang, Xin Zeng, Sam Younkin, Trupti Kawli, Michael P. Snyder, Sündüz Keleş Jan 2016

Department of Statistics: Faculty Publications

Background: Chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments revolutionized genome-wide profiling of transcription factors and histone modifications. Although maturing sequencing technologies allow these experiments to be carried out with short (36–50 bps), long (75–100 bps), single-end, or paired-end reads, the impact of these read parameters on the downstream data analysis is not well understood. In this paper, we evaluate the effects of different read parameters on genome sequence alignment, coverage of different classes of genomic features, peak identification, and allele-specific binding detection.

Results: We generated 101 bps paired-end ChIP-seq data for many transcription factors from human GM12878 and MCF7 cell …


Enscat: Clustering Of Categorical Data Via Ensembling, Bertrand S. Clarke, Saeid Amiri, Jennifer L. Clarke Jan 2016

Department of Statistics: Faculty Publications

Background: Clustering is a widely used collection of unsupervised learning techniques for identifying natural classes within a data set. It is often used in bioinformatics to infer population substructure. Genomic data are often categorical and high dimensional, e.g., long sequences of nucleotides. This makes inference challenging: The distance metric is often not well-defined on categorical data; running time for computations using high dimensional data can be considerable; and the Curse of Dimensionality often impedes the interpretation of the results. Up to the present, however, the literature and software addressing clustering for categorical data has not yet led to a standard …


A Genomic Bayesian Multi-Trait And Multi-Environment Model, Osval A. Montesinos-López, Abelardo Montesinos-López, José Crossa, Fernando Toledo, Oscar Pérez-Hernández, Kent M. Eskridge, Jessica Rutkoski Jan 2016

Department of Statistics: Faculty Publications

When information on multiple genotypes evaluated in multiple environments is recorded, a multi-environment single trait model for assessing genotype × environment interaction (G×E) is usually employed. Comprehensive models that simultaneously take into account the correlated traits and trait × genotype × environment interaction (T×G×E) are lacking. In this research, we propose a Bayesian whole-genome prediction (WGP) model for analyzing multiple traits and multiple environments. For this model, we used Half-𝑡 priors on each standard deviation term and uniform priors on each correlation of the covariance matrix. These priors were not informative and led to posterior inferences that were …


Genomic Bayesian Prediction Model For Count Data With Genotype X Environment Interaction, Abelardo Montesinos-López, Osval A. Montesinos-López, José Crossa, Juan Burgueño, Kent M. Eskridge, Esteban Falconi-Castillo, Xinyao He, Pawan Singh, Karen Cichy Jan 2016

Department of Statistics: Faculty Publications

Genomic tools allow the study of the whole genome, and facilitate the study of genotype-environment combinations and their relationship with phenotype. However, most genomic prediction models developed so far are appropriate for Gaussian phenotypes. For this reason, appropriate genomic prediction models are needed for count data, since the conventional regression models used on count data with a large sample size (nT) and a small number of parameters (p) cannot be used for genomic-enabled prediction where the number of parameters (p) is larger than the sample size (nT). Here, we propose a Bayesian mixed-negative binomial (BMNB) genomic …


The Impact Of Hair Coat Color On Longevity Of Holstein Cows In The Tropics, C. N. Lee, K. S. Baek, A. Parkhurst Jan 2016

Department of Statistics: Faculty Publications

Background: Over two decades of observations in the field in South East Asia and Hawai‘i suggest that the majority of the commercial dairy herds are of black hair coat. Hence a simple study to determine the accuracy of the observation was conducted with two large dairy herds in Hawai‘i in the mid-1990s.

Methods: A retrospective study on longevity of Holstein cattle in the tropics was conducted using DairyComp-305 lactation information coupled with phenotypic evaluation of hair coat color in two large dairy farms. Cows were classified into 3 groups: a) black (B, >90%); b) black/white (BW, 50:50) and c) white (W, …


Sex-Specific Hippocampal 5-Hydroxymethylcytosine Is Disrupted In Response To Acute Stress, Ligia A. Papale, Sisi Li, Andy Madrid, Qi Zhang, Li Chen, Pankaj Chopra, Peng Jin, Sunduz Keles, Reid S. Alisch Jan 2016

Department of Statistics: Faculty Publications

Environmental stress is among the most important contributors to increased susceptibility to develop psychiatric disorders. While it is well known that acute environmental stress alters gene expression, the molecular mechanisms underlying these changes remain largely unknown. 5-hydroxymethylcytosine (5hmC) is a novel environmentally sensitive epigenetic modification that is highly enriched in neurons and is associated with active neuronal transcription. Recently, we reported a genome-wide disruption of hippocampal 5hmC in male mice following acute stress that was correlated with altered transcript levels of genes in known stress-related pathways. Since sex-specific endocrine mechanisms respond to environmental stimulus by altering the neuronal epigenome, we examined …


A Compendium Of Chromatin Contact Maps Reveals Spatially Active Regions In The Human Genome, Anthony D. Schmitt, Ming Hu, Inkyung Jung, Zheng Xu, Yunjiang Qiu, Catherine L. Tan, Yun Li, Shin Lin, Yiing Lin, Cathy L. Barr, Bing Ren Jan 2016

Department of Statistics: Faculty Publications

The three-dimensional configuration of DNA is integral to all nuclear processes in eukaryotes, yet our knowledge of the chromosome architecture is still limited. Genome-wide chromosome conformation capture studies have uncovered features of chromatin organization in cultured cells, but genome architecture in human tissues has yet to be explored. Here, we report the most comprehensive survey to date of chromatin organization in human tissues. Through integrative analysis of chromatin contact maps in 21 primary human tissues and cell types, we find topologically associating domains highly conserved in different tissues. We also discover genomic regions that exhibit unusually high levels of local …


Hiview: An Integrative Genome Browser To Leverage Hi‑C Results For The Interpretation Of Gwas Variants, Zheng Xu, Guosheng Zhang, Qing Duan, Shengjie Chai, Baqun Zhang, Cong Wu, Fulai Jin, Feng Yue, Yun Li, Ming Hu Jan 2016

Department of Statistics: Faculty Publications

Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with complex traits and diseases. However, most of them are located in the non-protein coding regions, and therefore it is challenging to hypothesize the functions of these non-coding GWAS variants. Recent large efforts such as the ENCODE and Roadmap Epigenomics projects have predicted a large number of regulatory elements. However, the target genes of these regulatory elements remain largely unknown. Chromatin conformation capture based technologies such as Hi-C can directly measure the chromatin interactions and have generated an increasingly comprehensive catalog of the interactome between the distal regulatory elements …


A Bayesian Gwas Method Utilizing Haplotype Clusters For A Composite Breed Population, Danielle F. Wilson-Wells, Stephen D. Kachman Jan 2016

Department of Statistics: Faculty Publications

Commercial beef cattle are often composites of multiple breeds. Current methods used to produce genomic predictors are based on the underlying assumption of animals being sampled from a homogeneous population. As a result, the predictors can perform poorly when used to predict the relative genetic merit of animals whose breed compositions differ. In part, this is due to the changes in linkage disequilibrium between the markers and the quantitative trait loci as we move from one breed to the next. An alternative model based on breed-specific haplotype clusters was developed to allow for differences in linkage disequilibrium across …


Design Of Probabilistic Random Forests With Applications To Anticancer Drug Sensitivity Prediction- 2016, Raziur Rahman, Saad Haider, Souparno Ghosh, Ranadip Pal Jan 2016

Department of Statistics: Faculty Publications

Random forests consisting of an ensemble of regression trees with equal weights are frequently used for design of predictive models. In this article, we consider an extension of the methodology by representing the regression trees in the form of probabilistic trees and analyzing the nature of heteroscedasticity. The probabilistic tree representation allows for analytical computation of confidence intervals (CIs), and the tree weight optimization is expected to provide stricter CIs with comparable performance in mean error. We approached the ensemble of probabilistic trees’ prediction from the perspectives of a mixture distribution and as a weighted sum of correlated random variables. …
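
The mixture-distribution view of a weighted tree ensemble can be sketched as follows. This is an illustration only: it assumes each tree reports a predictive mean and variance for a given input, and the function name and example numbers are hypothetical rather than taken from the article. The mean and variance of a finite mixture follow from the standard identities for weighted first and second moments.

```python
def mixture_mean_var(weights, means, variances):
    """Mean and variance of a finite mixture of per-tree predictive distributions.

    Assumption: tree i predicts a distribution with mean means[i] and
    variance variances[i]; weights are non-negative and are normalized here.
    """
    if not (len(weights) == len(means) == len(variances)) or not weights:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    w = [x / total for x in weights]
    # First moment: weighted average of the component means.
    mean = sum(wi * mi for wi, mi in zip(w, means))
    # Second moment: weighted average of (variance + mean^2) per component.
    second = sum(wi * (vi + mi * mi) for wi, mi, vi in zip(w, means, variances))
    return mean, second - mean * mean


# Two hypothetical trees with equal weights.
print(mixture_mean_var([0.5, 0.5], [0.0, 2.0], [1.0, 1.0]))
```

The mixture variance exceeds the average within-tree variance whenever the trees disagree on the mean, so an interval built from it reflects both per-tree uncertainty and between-tree spread.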


Species Discovery And Diversity In Lobocriconema (Criconematidae: Nematoda) And Related Plant-Parasitic Nematodes From North American Ecoregions, Tom Powers, Ernest C. Bernard, T. Harris, Robert Higgins, M. Olson, S. Olson, M. Lodema, Julianne N. Matczyszyn, P. Mullin, L. Sutton, K.S. Powers Jan 2016

Department of Statistics: Faculty Publications

There are many nematode species that, following formal description, are seldom mentioned again in the scientific literature. Lobocriconema thornei and L. incrassatum are two such species, described from North American forests, respectively 37 and 49 years ago. In the course of a 3-year nematode biodiversity survey of North American ecoregions, specimens resembling Lobocriconema species appeared in soil samples from both grassland and forested sites. Using a combination of molecular and morphological analyses, together with a set of species delimitation approaches, we have expanded the known range of these species, added to the species descriptions, and discovered a related group of …