Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Statistics and Probability

University of Nebraska - Lincoln

Clustering

Full-Text Articles in Entire DC Network

Genomic Prediction Using Canopy Coverage Image And Genotypic Information In Soybean Via A Hybrid Model, Reka Howard, Diego Jarquin Jan 2019

Department of Statistics: Faculty Publications

Prediction techniques are important in plant breeding because they provide a tool for selection that is more efficient and economical than traditional phenotypic and pedigree-based selection. Conventional genomic prediction models use molecular marker information to predict the phenotype. With the development of new phenomics techniques, we have the opportunity to collect image data on the plants and to extend the traditional genomic prediction models so that they incorporate a diverse set of information collected on the plants. In our research, we developed a hybrid matrix model that incorporates molecular marker and canopy coverage information as a weighted linear combination to predict …
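The idea of combining two information sources "as a weighted linear combination" can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the helper names `linear_kernel` and `hybrid_kernel`, the choice of a linear relationship matrix for each source, the constraint that the two weights sum to one, and the simulated data.

```python
import numpy as np

def linear_kernel(X):
    """Relationship matrix from column-centered features, scaled so its mean diagonal is 1.

    Hypothetical helper: the abstract does not say how each matrix is built.
    """
    Z = X - X.mean(axis=0)
    K = Z @ Z.T
    return K / np.mean(np.diag(K))

def hybrid_kernel(K_marker, K_canopy, w):
    """Weighted linear combination of two relationship matrices.

    Assumption: the weights are w and 1 - w with 0 <= w <= 1.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * K_marker + (1.0 - w) * K_canopy

# Simulated data for 6 lines: 50 marker scores and 10 canopy coverage measurements each.
rng = np.random.default_rng(0)
markers = rng.integers(0, 3, size=(6, 50)).astype(float)
canopy = rng.random((6, 10))

K = hybrid_kernel(linear_kernel(markers), linear_kernel(canopy), w=0.7)
```

In a sketch like this, `w` would typically be chosen by cross-validation; setting `w=1.0` recovers a marker-only matrix.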


Enscat: Clustering Of Categorical Data Via Ensembling, Bertrand S. Clarke, Saeid Amiri, Jennifer L. Clarke Jan 2016

Department of Statistics: Faculty Publications

Background: Clustering is a widely used collection of unsupervised learning techniques for identifying natural classes within a data set. It is often used in bioinformatics to infer population substructure. Genomic data are often categorical and high dimensional, e.g., long sequences of nucleotides. This makes inference challenging: the distance metric is often not well-defined on categorical data; running time for computations using high dimensional data can be considerable; and the Curse of Dimensionality often impedes the interpretation of the results. Up to the present, however, the literature and software addressing clustering for categorical data have not yet led to a standard …
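To make the distance problem concrete, one common choice for categorical sequences is the Hamming distance, the fraction of positions at which two sequences differ. The sketch below is an illustration only: the abstract does not say which dissimilarity the method uses, and the function name `hamming_matrix` and the toy sequences are hypothetical.

```python
import numpy as np

def hamming_matrix(data):
    """Pairwise Hamming distances (fraction of mismatching positions) between rows."""
    data = np.asarray(data)
    # Map category labels to integer codes so the comparison works for any label type.
    codes = np.unique(data, return_inverse=True)[1].reshape(data.shape)
    # Compare every pair of rows position by position, then average the mismatches.
    return (codes[:, None, :] != codes[None, :, :]).mean(axis=2)

# Three toy nucleotide sequences of length 4.
seqs = np.array([list("ACGT"), list("ACGA"), list("TTGA")])
D = hamming_matrix(seqs)
```

A matrix such as `D` could then be passed to any clustering routine that accepts precomputed dissimilarities. Note that the pairwise comparison uses memory proportional to the number of rows squared times the sequence length, which echoes the running-time concern raised above.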


A Comparison Of Population-Averaged And Cluster-Specific Approaches In The Context Of Unequal Probabilities Of Selection, Natalie A. Koziol May 2015

College of Education and Human Sciences: Dissertations, Theses, and Student Research

Sampling designs of large-scale, federally funded studies are typically complex, involving multiple design features (e.g., clustering, unequal probabilities of selection). Researchers must account for these features in order to obtain unbiased point estimators and make valid inferences about population parameters. Single-level (i.e., population-averaged) and multilevel (i.e., cluster-specific) methods provide two alternatives for modeling clustered data. Single-level methods rely on the use of adjusted variance estimators to account for dependency due to clustering, whereas multilevel methods incorporate the dependency into the specification of the model.
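As a rough illustration of what an adjusted variance estimator does in the single-level setting, the sketch below compares a naive standard error of a sample mean with a cluster-robust (linearization) standard error. It is an assumption-laden toy, not the estimators studied in the dissertation: it ignores unequal selection probabilities (no sampling weights), and the function name `mean_with_ses` and the data are hypothetical.

```python
import numpy as np

def mean_with_ses(y, cluster):
    """Sample mean with a naive and a cluster-robust standard error.

    Assumption: equal-probability sampling, so no weights are applied.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    ybar = y.mean()
    resid = y - ybar

    # Naive SE treats all n observations as independent.
    naive_se = np.sqrt(resid @ resid / (n - 1) / n)

    # Cluster-robust SE sums residuals within each cluster before squaring,
    # so within-cluster dependence is reflected in the variance estimate.
    labels, idx = np.unique(np.asarray(cluster), return_inverse=True)
    totals = np.bincount(idx.ravel(), weights=resid)
    g = labels.size
    robust_se = np.sqrt(g / (g - 1) * np.sum(totals ** 2)) / n
    return ybar, naive_se, robust_se

# Two clusters whose members are identical within cluster (strong dependence).
ybar, naive_se, robust_se = mean_with_ses([1, 1, 1, 5, 5, 5], [0, 0, 0, 1, 1, 1])
```

With strongly clustered data like this, the robust standard error is larger than the naive one, which is the kind of dependency a single-level analysis must correct for.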

Although the literature comparing single-level and multilevel approaches is vast, comparisons have been limited to the …