Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Full-Text Articles in Statistics and Probability

Hierarchical Clustering With Simple Matching And Joint Entropy Dissimilarity Measure, A. Mete Çilingtürk, Özlem Ergüt May 2014

Journal of Modern Applied Statistical Methods

Conventional clustering algorithms are restricted to data with ratio- or interval-scale variables, because they rely on distances. Since social studies often produce only categorical data, the literature has been enriched with more complex clustering techniques and algorithms for categorical data. These techniques are based on similarity or dissimilarity matrices, and the algorithms use density-based or pattern-based approaches. A probabilistic approach to the similarity structure is proposed. The entropy dissimilarity measure gives results comparable to those of simple matching dissimilarity in hierarchical clustering. It also avoids the increase in dimension caused by binarizing the categorical data. This approach also works with the clustering methods, …
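Simple matching dissimilarity, the baseline the abstract compares against, is the share of attributes on which two categorical records disagree. The sketch below pairs it with a naive average-linkage agglomerative clustering. It is an illustration under stated assumptions, not the article's method: the joint entropy dissimilarity is not reproduced because the abstract does not define it, and the function names and example records are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def simple_matching_dissimilarity(x: Sequence[str], y: Sequence[str]) -> float:
    """Share of attributes on which two categorical records disagree (0 = identical, 1 = all differ)."""
    if len(x) != len(y):
        raise ValueError("records must have the same number of attributes")
    return sum(a != b for a, b in zip(x, y)) / len(x)


def average_linkage(records: List[Sequence[str]], n_clusters: int) -> List[List[int]]:
    """Naive agglomerative clustering (average linkage) on simple matching dissimilarities.

    Returns clusters as lists of record indices. Hypothetical helper for illustration only.
    """
    n = len(records)
    d = [[simple_matching_dissimilarity(records[i], records[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with every record in its own cluster
    while len(clusters) > n_clusters:
        best_pair, best_dist = None, float("inf")
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean dissimilarity over all cross-cluster pairs.
            dist = sum(d[i][j] for i in clusters[a] for j in clusters[b]) / (
                len(clusters[a]) * len(clusters[b])
            )
            if dist < best_dist:
                best_pair, best_dist = (a, b), dist
        a, b = best_pair  # a < b, so deleting b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("red", "small", "round"),
]
print(average_linkage(records, 2))
```

No binarization is needed: the dissimilarity works directly on the category labels, so the number of attributes stays the same however many levels each attribute has.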


Statistical Inference For Data Adaptive Target Parameters, Mark J. van der Laan, Alan E. Hubbard, Sara Kherad Pajouh Jun 2013

U.C. Berkeley Division of Biostatistics Working Paper Series

Suppose one observes n i.i.d. copies of a random variable whose probability distribution is known to be an element of a particular statistical model. To define our statistical target, we partition the sample into V equal-size subsamples and use this partition to define V splits into an estimation sample (one of the V subsamples) and a corresponding complementary parameter-generating sample that is used to generate a target parameter. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a target parameter mapping, which represents the statistical target parameter generated by that parameter-generating …
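The sample-splitting scheme described above can be sketched as follows: each of the V subsamples serves once as the estimation sample, its complement is the parameter-generating sample, and the target chosen on the complement is estimated only on the held-out subsample. This is a minimal illustration, not the paper's estimator. The parameter-generating algorithm `select_feature` (pick the column with the largest absolute mean, so the target becomes "mean of that column") is hypothetical, and combining the V fold-specific estimates by a simple average is an assumption made here.

```python
import random
from statistics import mean
from typing import List, Sequence


def v_fold_splits(n: int, V: int, seed: int = 0) -> List[List[int]]:
    """Partition indices 0..n-1 into V subsamples whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[v::V] for v in range(V)]


def select_feature(rows: Sequence[Sequence[float]]) -> int:
    """Hypothetical parameter-generating algorithm: the column with the largest |sample mean|."""
    return max(range(len(rows[0])), key=lambda j: abs(mean(r[j] for r in rows)))


def data_adaptive_estimate(rows: Sequence[Sequence[float]], V: int = 5, seed: int = 0) -> float:
    estimates = []
    for est_idx in v_fold_splits(len(rows), V, seed):
        held_out = set(est_idx)
        # Parameter-generating sample: every observation outside the current subsample.
        gen_rows = [r for i, r in enumerate(rows) if i not in held_out]
        j = select_feature(gen_rows)  # target parameter generated on the complement
        # Estimation sample: the current subsample, used only to estimate that target.
        estimates.append(mean(rows[i][j] for i in est_idx))
    # Assumption: combine the V fold-specific estimates by simple averaging.
    return mean(estimates)
```

Keeping target selection and estimation on disjoint data is what allows the estimation step to treat the chosen target as fixed.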


The X-Alter Algorithm: A Parameter-Free Method Of Unsupervised Clustering, Thomas Laloë, Rémi Servien May 2013

Journal of Modern Applied Statistical Methods

Using quantization techniques, Laloë (2010) defined a new clustering algorithm called Alter. This L1-based algorithm is shown to be convergent but suffers from two major flaws: the number of clusters, K, must be supplied by the user, and the computational cost is high. This article adapts the X-means algorithm (Pelleg & Moore, 2000) to solve both problems.
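X-means, which the article adapts, avoids fixing K in advance by trying to split a cluster in two and keeping the split only when the Bayesian information criterion (BIC) improves. The sketch below illustrates that local split test on one-dimensional data. It uses squared-error 2-means splits and a shared-variance Gaussian BIC, so it is neither the L1-based Alter algorithm nor the authors' X-Alter procedure; the function names and the `min_size` threshold are hypothetical choices for illustration.

```python
import math
from typing import List, Tuple


def two_means(points: List[float], iters: int = 50) -> Tuple[List[float], List[float]]:
    """Split 1-D points into two groups with Lloyd's k-means (k = 2), seeded at the min and max."""
    a, b = min(points), max(points)
    left, right = list(points), []
    for _ in range(iters):
        left = [x for x in points if abs(x - a) <= abs(x - b)]
        right = [x for x in points if abs(x - a) > abs(x - b)]
        if not left or not right:
            break
        new_a, new_b = sum(left) / len(left), sum(right) / len(right)
        if new_a == a and new_b == b:
            break  # centroids stopped moving
        a, b = new_a, new_b
    return left, right


def bic(clusters: List[List[float]]) -> float:
    """BIC of a hard-assigned 1-D Gaussian mixture with one shared variance (higher is better)."""
    n = sum(len(c) for c in clusters)
    sse = 0.0
    for c in clusters:
        mu = sum(c) / len(c)
        sse += sum((x - mu) ** 2 for x in c)
    var = max(sse / n, 1e-12)  # pooled MLE variance, floored to avoid log(0)
    loglik = -0.5 * n * math.log(2 * math.pi * var) - sse / (2 * var)
    loglik += sum(len(c) * math.log(len(c) / n) for c in clusters)  # mixing weights
    k = len(clusters)
    n_params = (k - 1) + k + 1  # weights, means, shared variance
    return loglik - 0.5 * n_params * math.log(n)


def split_by_bic(points: List[float], min_size: int = 4) -> List[List[float]]:
    """Recursively split a cluster in two while the split raises BIC (X-means-style local test)."""
    if len(points) < min_size:
        return [points]
    left, right = two_means(points)
    if not left or not right or bic([left, right]) <= bic([points]):
        return [points]
    return split_by_bic(left, min_size) + split_by_bic(right, min_size)


data = [0.0, 0.1, 0.2, 0.3, 0.4, 10.0, 10.1, 10.2, 10.3, 10.4]
print(split_by_bic(data))
```

K is never supplied: it is whatever number of clusters survives the split tests.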


Constructing A More Powerful Test In Two-Level Block Randomized Designs, Spyros Konstantopoulos May 2013

Journal of Modern Applied Statistical Methods

A more powerful test is proposed for the treatment effect in two-level block randomized designs where random assignment takes place at the first level. When clustering at the second level is assumed to be known, the proposed test produces higher estimates of power than the typical test.


Small-To-Medium Enterprises And Economic Growth: A Comparative Study Of Clustering Techniques, Karim K. Mardaneh Nov 2012

Journal of Modern Applied Statistical Methods

This study considers small-to-medium enterprises (SMEs) in regional (non-metropolitan) areas, where economic planning may require large data sets and sophisticated clustering techniques. The economic growth of regional areas was investigated using four clustering algorithms. Empirical analysis showed that the modified global k-means algorithm outperformed the other algorithms.


Statistical Inference For Simultaneous Clustering Of Gene Expression Data, Katherine S. Pollard, Mark J. van der Laan Jul 2001

U.C. Berkeley Division of Biostatistics Working Paper Series

Current methods for the analysis of gene expression data are mostly based on clustering and classification of either genes or samples. We offer support for the idea that more complex patterns can be identified in the data if genes and samples are considered simultaneously. We formalize the approach and propose a statistical framework for two-way clustering. A simultaneous clustering parameter is defined as a function of the true data-generating distribution, and an estimate is obtained by applying this function to the empirical distribution. We illustrate that a wide range of clustering procedures, including generalized hierarchical methods, can be defined as …