Department of Computer Science
Vanderbilt University,
Nashville, TN 37235
IEEE Transactions on Systems, Man, and Cybernetics, vol. 28C, no. 2, pp. 219-230, May 1998.
The data exploration task can be divided into three interrelated subtasks: (i) feature selection, (ii) discovery, and (iii) interpretation. This paper describes an unsupervised discovery method whose biases are geared toward partitioning objects into clusters that improve interpretability. The algorithm, ITERATE, employs (i) a data ordering scheme and (ii) an iterative redistribution operator to produce maximally cohesive and distinct clusters. Cohesion, or intra-class similarity, is measured in terms of the match between individual objects and their assigned cluster prototype. Distinctness, or inter-class dissimilarity, is measured by an average of the variance of the distribution match between clusters. We demonstrate that interpretability, from a problem-solving viewpoint, is addressed by the intra- and inter-class measures. Empirical results demonstrate the properties of the discovery algorithm and its applications to problem solving.
Keywords: knowledge discovery, data mining, conceptual clustering, concept formation, criterion function, order bias, iterative redistribution.
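The iterative redistribution idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the ITERATE algorithm itself: the abstract does not give the criterion function or the data ordering scheme, so the code below assumes nominal attributes, a prototype made of per-attribute value frequencies, and a simple summed-probability match score. The names `prototype`, `match`, and `redistribute` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def prototype(members):
    """Per-attribute value frequencies for one cluster (assumed prototype form)."""
    n = len(members)
    n_attrs = len(members[0])
    return [
        {v: c / n for v, c in Counter(obj[i] for obj in members).items()}
        for i in range(n_attrs)
    ]


def match(obj, proto):
    """Stand-in match score: summed probability of the object's values
    under the cluster prototype (not the paper's criterion function)."""
    return sum(dist.get(v, 0.0) for v, dist in zip(obj, proto))


def redistribute(objects, labels, max_passes=20):
    """Reassign every object to its best-matching cluster until labels stop changing."""
    labels = list(labels)
    for _ in range(max_passes):
        # Group objects by their current label and rebuild the prototypes.
        clusters = {}
        for obj, lab in zip(objects, labels):
            clusters.setdefault(lab, []).append(obj)
        protos = {lab: prototype(m) for lab, m in clusters.items()}
        # Ties go to the smallest label because max() keeps the first maximum.
        new_labels = [
            max(sorted(protos), key=lambda lab: match(obj, protos[lab]))
            for obj in objects
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


objects = [
    ("red", "round"),
    ("red", "round"),
    ("red", "round"),
    ("blue", "square"),
    ("blue", "square"),
]
initial = [0, 0, 1, 1, 1]  # the third object starts in the wrong cluster
print(redistribute(objects, initial))
```

In this toy run the third object is moved to cluster 0 on the first pass, and the second pass makes no changes, so the result is `[0, 0, 0, 1, 1]`. Because the starting partition affects where the loop settles, a scheme for ordering the data before the initial partition is formed plausibly matters, which is consistent with the abstract pairing redistribution with a data ordering step.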