Full-Text Articles in Statistics and Probability
Hierarchical Clustering With Simple Matching And Joint Entropy Dissimilarity Measure, A. Mete Çilingtürk, Özlem Ergüt
Journal of Modern Applied Statistical Methods
Conventional clustering algorithms are restricted to data containing ratio- or interval-scale variables, for which distances can be used. Because social studies often involve only categorical data, the literature has been enriched with more elaborate techniques and algorithms for clustering categorical data. These techniques are based on similarity or dissimilarity matrices, and the algorithms use density-based or pattern-based approaches. A probabilistic approach to the similarity structure is proposed. In hierarchical clustering, the entropy dissimilarity measure gives results comparable to those of simple matching dissimilarity. It overcomes dimension increase through binarization of the categorical data. This approach is also functional with the clustering methods, …
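The simple matching dissimilarity mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual definition (the fraction of attributes on which two categorical records differ) and does not attempt the paper's joint entropy measure, whose definition is not given here. The function names and the survey records are hypothetical.

```python
from itertools import combinations


def simple_matching_dissimilarity(a, b):
    """Fraction of attributes on which two categorical records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def dissimilarity_matrix(records):
    """Symmetric matrix of pairwise simple matching dissimilarities."""
    n = len(records)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = simple_matching_dissimilarity(records[i], records[j])
    return d


# Hypothetical categorical records (illustration only, not data from the paper).
records = [
    ("urban", "employed", "renter"),
    ("urban", "employed", "owner"),
    ("rural", "retired", "owner"),
]
matrix = dissimilarity_matrix(records)
```

A matrix of this kind is the usual input to an agglomerative (hierarchical) clustering routine that accepts precomputed dissimilarities.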
Statistical Inference For Data Adaptive Target Parameters, Mark J. van der Laan, Alan E. Hubbard, Sara Kherad Pajouh
U.C. Berkeley Division of Biostatistics Working Paper Series
Suppose one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. To define our statistical target, we partition the sample into V equal-size subsamples and use this partitioning to define V splits into an estimation sample (one of the V subsamples) and the corresponding complementary parameter-generating sample, which is used to generate a target parameter. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a target parameter mapping, which represents the statistical target parameter generated by that parameter-generating …
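The sample-splitting step described in this abstract can be sketched as follows. This is only an illustration of the partitioning into V estimation samples and their complementary parameter-generating samples; the function name `v_fold_splits`, the random shuffling, and the fixed seed are assumptions, and the target-parameter algorithm itself is not shown because the abstract is truncated.

```python
import random


def v_fold_splits(n, v, seed=0):
    """Split indices 0..n-1 into V (parameter-generating, estimation) pairs.

    Each of the V equal-size subsamples serves once as the estimation
    sample; its complement is the parameter-generating sample.
    """
    if n % v != 0:
        raise ValueError("n must be divisible by v for equal-size subsamples")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # assumed: random assignment to folds
    size = n // v
    folds = [indices[k * size:(k + 1) * size] for k in range(v)]
    splits = []
    for k in range(v):
        estimation = folds[k]
        generating = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((generating, estimation))
    return splits


# Hypothetical sizes: 12 observations, V = 4 splits.
splits = v_fold_splits(n=12, v=4)
```

Each pair in `splits` holds the indices of one parameter-generating sample and its estimation sample, so the two never share an observation.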
The X-Alter Algorithm: A Parameter-Free Method Of Unsupervised Clustering, Thomas Laloë, Rémi Servien
Journal of Modern Applied Statistical Methods
Using quantization techniques, Laloë (2010) defined a new clustering algorithm called Alter. This L1-based algorithm is shown to be convergent but suffers from two major flaws: the number of clusters, K, must be supplied by the user, and the computational cost is high. This article adapts the X-means algorithm (Pelleg & Moore, 2000) to solve both problems.
Constructing A More Powerful Test In Two-Level Block Randomized Designs, Spyros Konstantopoulos
Journal of Modern Applied Statistical Methods
A more powerful test is proposed for the treatment effect in two-level block randomized designs where random assignment takes place at the first level. When clustering at the second level is assumed to be known, the proposed test produces higher estimates of power than the typical test.
Small-To-Medium Enterprises And Economic Growth: A Comparative Study Of Clustering Techniques, Karim K. Mardaneh
Journal of Modern Applied Statistical Methods
Small-to-medium enterprises (SMEs) in regional (non-metropolitan) areas are considered when economic planning may require large data sets and sophisticated clustering techniques. The economic growth of regional areas was investigated using four clustering algorithms. Empirical analysis demonstrated that the modified global k-means algorithm outperformed other algorithms.
Statistical Inference For Simultaneous Clustering Of Gene Expression Data, Katherine S. Pollard, Mark J. van der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Current methods for analysis of gene expression data are mostly based on clustering and classification of either genes or samples. We offer support for the idea that more complex patterns can be identified in the data if genes and samples are considered simultaneously. We formalize the approach and propose a statistical framework for two-way clustering. A simultaneous clustering parameter is defined as a function of the true data generating distribution, and an estimate is obtained by applying this function to the empirical distribution. We illustrate that a wide range of clustering procedures, including generalized hierarchical methods, can be defined as …