Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

12,733 Full-Text Articles 20,123 Authors 6,911,751 Downloads 287 Institutions

All Articles in Statistics and Probability

12,733 full-text articles. Page 251 of 438.

Systematic Evaluation Of The Impact Of ChIP-Seq Read Designs On Genome Coverage, Peak Identification, And Allele-Specific Binding Detection, Qi Zhang, Xin Zeng, Sam Younkin, Trupti Kawli, Michael P. Snyder, Sündüz Keleş 2016 University of Nebraska-Lincoln

Department of Statistics: Faculty Publications

Background: Chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments revolutionized genome-wide profiling of transcription factors and histone modifications. Although maturing sequencing technologies allow these experiments to be carried out with short (36–50 bps), long (75–100 bps), single-end, or paired-end reads, the impact of these read parameters on the downstream data analysis is not well understood. In this paper, we evaluate the effects of different read parameters on genome sequence alignment, coverage of different classes of genomic features, peak identification, and allele-specific binding detection.

Results: We generated 101 bps paired-end ChIP-seq data for many transcription factors from human GM12878 and MCF7 cell …


EnsCat: Clustering Of Categorical Data Via Ensembling, Bertrand S. Clarke, Saeid Amiri, Jennifer L. Clarke 2016 University of Nebraska-Lincoln

Department of Statistics: Faculty Publications

Background: Clustering is a widely used collection of unsupervised learning techniques for identifying natural classes within a data set. It is often used in bioinformatics to infer population substructure. Genomic data are often categorical and high dimensional, e.g., long sequences of nucleotides. This makes inference challenging: the distance metric is often not well-defined on categorical data; running time for computations using high dimensional data can be considerable; and the Curse of Dimensionality often impedes the interpretation of the results. Up to the present, however, the literature and software addressing clustering for categorical data have not yet led to a standard …


A Genomic Bayesian Multi-Trait And Multi-Environment Model, Osval A. Montesinos-López, Abelardo Montesinos-López, José Crossa, Fernando Toledo, Oscar Pérez-Hernández, Kent M. Eskridge, Jessica Rutkoski 2016 Biometrics and Statistics Unit and Global Wheat Program of the International Maize and Wheat Improvement Center (CIMMYT)

Department of Statistics: Faculty Publications

When information on multiple genotypes evaluated in multiple environments is recorded, a multi-environment single trait model for assessing genotype × environment interaction (G×E) is usually employed. Comprehensive models that simultaneously take into account the correlated traits and trait × genotype × environment interaction (T×G×E) are lacking. In this research, we propose a Bayesian whole-genome prediction (WGP) model for analyzing multiple traits and multiple environments. For this model, we used Half-𝑡 priors on each standard deviation term and uniform priors on each correlation of the covariance matrix. These priors were not informative and led to posterior inferences that were …


Genomic Bayesian Prediction Model For Count Data With Genotype X Environment Interaction, Abelardo Montesinos-López, Osval A. Montesinos-López, José Crossa, Juan Burgueño, Kent M. Eskridge, Esteban Falconi-Castillo, Xinyao He, Pawan Singh, Karen Cichy 2016 Centro de Investigación en Matemáticas (CIMAT)

Department of Statistics: Faculty Publications

Genomic tools allow the study of the whole genome, and facilitate the study of genotype-environment combinations and their relationship with phenotype. However, most genomic prediction models developed so far are appropriate for Gaussian phenotypes. For this reason, appropriate genomic prediction models are needed for count data, since the conventional regression models used on count data with a large sample size (nT) and a small number of parameters (p) cannot be used for genomic-enabled prediction where the number of parameters (p) is larger than the sample size (nT). Here, we propose a Bayesian mixed-negative binomial (BMNB) genomic …


The Impact Of Hair Coat Color On Longevity Of Holstein Cows In The Tropics, C. N. Lee, K. S. Baek, A. Parkhurst 2016 University of Hawaii-Manoa

Department of Statistics: Faculty Publications

Background: Over two decades of observations in the field in South East Asia and Hawai‘i suggest that the majority of the commercial dairy herds are of black hair coat. Hence, a simple study to determine the accuracy of the observation was conducted with two large dairy herds in Hawai‘i in the mid-1990s.

Methods: A retrospective study on longevity of Holstein cattle in the tropics was conducted using DairyComp-305 lactation information coupled with phenotypic evaluation of hair coat color in two large dairy farms. Cows were classified into 3 groups: a) black (B, >90%); b) black/white (BW, 50:50) and c) white (W, …


Sex-Specific Hippocampal 5-Hydroxymethylcytosine Is Disrupted In Response To Acute Stress, Ligia A. Papale, Sisi Li, Andy Madrid, Qi Zhang, Li Chen, Pankaj Chopra, Peng Jin, Sunduz Keles, Reid S. Alisch 2016 University of Wisconsin

Department of Statistics: Faculty Publications

Environmental stress is among the most important contributors to increased susceptibility to develop psychiatric disorders. While it is well known that acute environmental stress alters gene expression, the molecular mechanisms underlying these changes remain largely unknown. 5-hydroxymethylcytosine (5hmC) is a novel environmentally sensitive epigenetic modification that is highly enriched in neurons and is associated with active neuronal transcription. Recently, we reported a genome-wide disruption of hippocampal 5hmC in male mice following acute stress that was correlated to altered transcript levels of genes in known stress-related pathways. Since sex-specific endocrine mechanisms respond to environmental stimulus by altering the neuronal epigenome, we examined …


A Compendium Of Chromatin Contact Maps Reveals Spatially Active Regions In The Human Genome, Anthony D. Schmitt, Ming Hu, Inkyung Jung, Zheng Xu, Yunjiang Qiu, Catherine L. Tan, Yun Li, Shin Lin, Yiing Lin, Cathy L. Barr, Bing Ren 2016 Arima Genomics Inc.

Department of Statistics: Faculty Publications

The three-dimensional configuration of DNA is integral to all nuclear processes in eukaryotes, yet our knowledge of the chromosome architecture is still limited. Genome-wide chromosome conformation capture studies have uncovered features of chromatin organization in cultured cells, but genome architecture in human tissues has yet to be explored. Here, we report the most comprehensive survey to date of chromatin organization in human tissues. Through integrative analysis of chromatin contact maps in 21 primary human tissues and cell types, we find topologically associating domains highly conserved in different tissues. We also discover genomic regions that exhibit unusually high levels of local …


HiView: An Integrative Genome Browser To Leverage Hi‑C Results For The Interpretation Of GWAS Variants, Zheng Xu, Guosheng Zhang, Qing Duan, Shengjie Chai, Baqun Zhang, Cong Wu, Fulai Jin, Feng Yue, Yun Li, Ming Hu 2016 University of North Carolina at Chapel Hill

Department of Statistics: Faculty Publications

Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with complex traits and diseases. However, most of them are located in the non-protein coding regions, and therefore it is challenging to hypothesize the functions of these non-coding GWAS variants. Recent large efforts such as the ENCODE and Roadmap Epigenomics projects have predicted a large number of regulatory elements. However, the target genes of these regulatory elements remain largely unknown. Chromatin conformation capture based technologies such as Hi-C can directly measure the chromatin interactions and have generated an increasingly comprehensive catalog of the interactome between the distal regulatory elements …


A Bayesian GWAS Method Utilizing Haplotype Clusters For A Composite Breed Population, Danielle F. Wilson-Wells, Stephen D. Kachman 2016 University of Nebraska - Lincoln

Department of Statistics: Faculty Publications

Commercial beef cattle are often composites of multiple breeds. Current methods used to produce genomic predictors are based on the underlying assumption of animals being sampled from a homogeneous population. As a result, the predictors can perform poorly when used to predict the relative genetic merit of animals whose breed composition is different. In part, this is due to the changes in linkage disequilibrium between the markers and the quantitative trait loci as we move from one breed to the next. An alternative model based on breed-specific haplotype clusters was developed to allow for differences in linkage disequilibrium across …


Design Of Probabilistic Random Forests With Applications To Anticancer Drug Sensitivity Prediction- 2016, Raziur Rahman, Saad Haider, Souparno Ghosh, Ranadip Pal 2016 Texas Tech University

Department of Statistics: Faculty Publications

Random forests consisting of an ensemble of regression trees with equal weights are frequently used for design of predictive models. In this article, we consider an extension of the methodology by representing the regression trees in the form of probabilistic trees and analyzing the nature of heteroscedasticity. The probabilistic tree representation allows for analytical computation of confidence intervals (CIs), and the tree weight optimization is expected to provide stricter CIs with comparable performance in mean error. We approached the ensemble of probabilistic trees’ prediction from the perspectives of a mixture distribution and as a weighted sum of correlated random variables. …
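
The mixture-distribution view mentioned in the abstract can be illustrated with a minimal sketch. It assumes each tree reports a predictive mean and variance for a sample, and it uses the standard moment formulas for a finite mixture; the function name `mixture_mean_variance` and the example numbers are hypothetical, and this is not claimed to be the authors' implementation or their weight-optimization procedure.

```python
def mixture_mean_variance(means, variances, weights):
    """Mean and variance of a weighted mixture of per-tree predictive distributions.

    Illustrative assumption: tree i contributes a distribution with mean means[i]
    and variance variances[i]; weights are non-negative and are normalized here.
    """
    total = sum(weights)
    w = [x / total for x in weights]  # normalize so the weights sum to 1
    mean = sum(wi * mi for wi, mi in zip(w, means))
    # Second moment of a mixture: sum_i w_i * (sigma_i^2 + mu_i^2)
    second_moment = sum(wi * (vi + mi * mi) for wi, mi, vi in zip(w, means, variances))
    return mean, second_moment - mean * mean


# Hypothetical per-tree predictions for a single sample.
tree_means = [1.0, 2.0, 3.0]
tree_variances = [0.5, 0.5, 0.5]
equal_weights = [1.0, 1.0, 1.0]

ensemble_mean, ensemble_var = mixture_mean_variance(tree_means, tree_variances, equal_weights)
print(ensemble_mean, ensemble_var)
```

The mixture variance is the weighted within-tree variance plus the spread of the tree means, so changing the weights changes the width of any interval derived from it.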


Effect Of An Interactive Component On Students' Conceptual Understanding Of Hypothesis Testing, Sarah Anne Inkpen 2016 Walden University

Walden Dissertations and Doctoral Studies

The Premier Technical College of Qatar (PTC-Q) has seen high failure rates among students taking a college statistics course. The students are English as a foreign language (EFL) learners in business studies and health sciences. Course delivery has involved conventional content/curriculum-centered instruction with minimal to no interactive components. The purpose of this quasi-experimental study was to assess the effectiveness of an interactive approach to teaching and learning statistics used in North America and the United Kingdom when used with EFL students in the Middle East. Guided by von Glasersfeld's constructivist framework, this study compared conceptual understanding between a convenience sample …


A Saddlepoint Approximation To Left-Tailed Hypothesis Tests Of Variance For Non-Normal Populations, Tyler L. Grimes 2016 University of North Florida

UNF Graduate Theses and Dissertations

When the variance of a single population needs to be assessed, the well-known chi-squared test of variance is often used but relies heavily on its normality assumption. For non-normal populations, few alternative tests have been developed to conduct left tailed hypothesis tests of variance. This thesis outlines a method for generating new test statistics using a saddlepoint approximation. Several novel test statistics are proposed. The type-I error rates and power of each test are evaluated using a Monte Carlo simulation study. One of the proposed test statistics, R_gamma2, controls type-I error rates better than existing tests, while having comparable power. …
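
For context, the classical left-tailed chi-squared test of variance that the abstract takes as its baseline can be written as below. This is the standard textbook form, not one of the saddlepoint-based statistics proposed in the thesis; the symbols $n$ (sample size), $S^2$ (sample variance), $\sigma_0^2$ (hypothesized variance), and $\alpha$ (significance level) are notation introduced here, and $\chi^2_{n-1,\,\alpha}$ denotes the lower $\alpha$ quantile.

```latex
H_0:\ \sigma^2 = \sigma_0^2 \quad \text{vs.} \quad H_1:\ \sigma^2 < \sigma_0^2,
\qquad
T = \frac{(n-1)\,S^2}{\sigma_0^2} \sim \chi^2_{n-1} \ \text{ under } H_0 \text{ for normal data},
\qquad
\text{reject } H_0 \text{ if } T < \chi^2_{n-1,\,\alpha}.
```

The reference distribution of $T$ holds only under normality, which is why the type-I error rate of this test can drift from $\alpha$ for non-normal populations.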


A New Right Tailed Test Of The Ratio Of Variances, Elizabeth Rochelle Lesser 2016 University of North Florida

UNF Graduate Theses and Dissertations

It is important to be able to compare variances efficiently and accurately regardless of the parent populations. This study proposes a new right-tailed test for the ratio of two variances using Edgeworth’s expansion. To study the Type I error rate and power, simulations were performed on the new test with various combinations of symmetric and skewed distributions. The test is found to have better-controlled Type I error rates than the existing tests, and it also has sufficient power. Therefore, the newly derived test provides a robust alternative to the existing methods.


Grief And Gratitude, Lynne Steuerle Schofield 2016 Swarthmore College

Mathematics & Statistics Faculty Works

No abstract provided.


Developing An Alternative Way To Analyze Nanostring Data, Shu Shen 2016 University of Kentucky

Theses and Dissertations--Statistics

Nanostring technology provides a new method to measure gene expressions. It's more sensitive than microarrays and able to do more gene measurements than RT-PCR with similar sensitivity. This system produces counts for each target gene and tabulates them. Counts can be normalized by using an Excel macro or nSolver before analysis. Both methods rely on data normalization prior to statistical analysis to identify differentially expressed genes. Alternatively, we propose to model gene expressions as a function of positive controls and reference gene measurements. Simulations and examples are used to compare this model with Nanostring normalization methods. The results show that …


Statistical Inference On Trimmed Means, Lorenz Curves, And Partial Area Under ROC Curves By Empirical Likelihood Method, Yumin Zhao 2016 University of Kentucky

Theses and Dissertations--Statistics

Traditionally, inference on trimmed means, Lorenz curves, and partial AUC (pAUC) under ROC curves has been based on the asymptotic normality of the statistics. Based on the theory of empirical likelihood, in this dissertation we developed novel methods to do statistical inference on trimmed means, Lorenz curves, and pAUC. A common characteristic among trimmed means, Lorenz curves, and pAUC is that their inferences are not based on the whole set of samples. Qin and Tsao (2002), Qin et al. (2013), and Qin et al. (2011) recently published their research on the inferences of trimmed means, Lorenz curves, …


Sample Size Estimation For Genomics Experiments With Dependent End Points, Desmond Koomson 2016 University of Texas at El Paso

Open Access Theses & Dissertations

In typical genomics studies involving numerous association tests of gene mutations with a disease, error rate control via multiplicity adjustment is paramount because even if all genes were to be non-differentially associated, we would still make some false positives. Many methods exist that incorporate the control of multiplicity for normally distributed endpoints in sample size estimation, but none addresses the issue for non-normally correlated endpoints.

One common practice in the literature is to assume an equal correlation among all differentially associated or expressed genes, thereby using the generalized binomial or beta-binomial model to compute the comparison-wise power of detecting these …


Anatomy, Implant Selection And Placement Influence Spine Mechanics Associated With Total Disc Replacement, Justin F.M. Hollenbeck 2016 University of Denver

Electronic Theses and Dissertations

Through aging and injury, the intervertebral disc of the lumbar spine can undergo degeneration, leading to collapse of the vertebrae and low back pain, a symptom that affects half the adult population in any given year. In an effort to reduce low back pain, total disc replacement treatment removes the degenerated disc, restores natural height and lordosis of the segment, and preserves motion at the joint. Patient anatomy, implant selection, and implant placement play significant roles in a patient's outcomes after total disc replacement surgery. Thus, the objective of the work presented in this thesis was to develop a suite …


Privacy And Accountability In Black-Box Medicine, Roger Allan Ford, W. Nicholson Price II 2016 University of New Hampshire School of Law

Law Faculty Scholarship

Black-box medicine—the use of big data and sophisticated machine learning techniques for health-care applications—could be the future of personalized medicine. Black-box medicine promises to make it easier to diagnose rare diseases and conditions, identify the most promising treatments, and allocate scarce resources among different patients. But to succeed, it must overcome two separate, but related, problems: patient privacy and algorithmic accountability. Privacy is a problem because researchers need access to huge amounts of patient health information to generate useful medical predictions. And accountability is a problem because black-box algorithms must be verified by outsiders to ensure they are accurate and …


Diversification And Market Neutral Portfolios In S&P500, Alan S. Agnew 2016 University of Akron

Williams Honors College, Honors Research Projects

Our goal is to investigate strategies to deal with the risks associated with holding assets in the stock market. We first deal with the risk of holding a specific stock through diversification. Later, we attempt to deal with market risk, which is the risk of the entire market going up and down. The data used in this project are daily adjusted closing prices of stocks listed in the S&P500 index from January 3rd, 2000 to December 31st, 2015, and the data are processed using the statistical software R.

Sections 2 through 4 of this …
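
The diversification idea described in the abstract can be sketched with the standard formula for an equally weighted portfolio. This is a minimal illustration in Python rather than the R code used in the project, and it assumes every stock has the same volatility and the same pairwise correlation; the function name `equal_weight_portfolio_vol` and the numbers 0.30 and 0.25 are hypothetical.

```python
import math


def equal_weight_portfolio_vol(n_assets, asset_vol, pairwise_corr):
    """Volatility of an equally weighted portfolio of n_assets stocks.

    Illustrative assumptions: identical volatility asset_vol for every stock and
    a common pairwise correlation pairwise_corr between any two stocks.
    """
    # Var = sigma^2 * (1/n + (1 - 1/n) * rho): the 1/n term is stock-specific
    # risk that diversification removes; the rho term is shared market risk.
    variance = asset_vol ** 2 * (1.0 / n_assets + (1.0 - 1.0 / n_assets) * pairwise_corr)
    return math.sqrt(variance)


for n in (1, 5, 25, 100):
    print(n, round(equal_weight_portfolio_vol(n, 0.30, 0.25), 4))
```

Under these assumptions, portfolio volatility falls as stocks are added but levels off at `asset_vol * sqrt(pairwise_corr)`, which is the market risk that diversification alone cannot remove.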

