Clustering Sequence Data using Hidden Markov Model Representation
Cen Li and Gautam Biswas
Department of Computer Science, Vanderbilt University
Box 1679, Station B,
Nashville, TN 37235
in SPIE'99 Conference on Data Mining and Knowledge Discovery: Theory, Tools, and Technology,
pages 14-21, Orlando, FL, April 4-6, 1999
This paper proposes a clustering method for sequence data using hidden Markov model (HMM)
representation. The proposed methodology improves upon existing HMM-based clustering methods in two ways:
- it enables HMMs to dynamically change their model structure during the clustering process to obtain a better-fitting model for the data
- it provides an objective criterion function for selecting the optimal clustering partition.
The algorithm is presented in terms of four nested levels of searches:
- the search for the optimal number of clusters in a partition
- the search for the optimal structure for a given partition
- the search for the optimal HMM structure for each cluster
- the search for the optimal HMM parameters for each HMM
Preliminary results are given to support the proposed methodology.
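As a rough illustration of one of the four search levels listed above (choosing an HMM structure for a single cluster), the sketch below scores candidate state counts with a BIC-style criterion. It is not the authors' implementation. It assumes a discrete-output HMM, a "larger is better" form of BIC (log-likelihood minus a complexity penalty), and made-up log-likelihood values; the function names hmm_free_params, bic, and select_structure are hypothetical.

```python
import math


def hmm_free_params(n_states, n_symbols):
    """Count free parameters of a discrete-output HMM (assumed parameterization)."""
    transitions = n_states * (n_states - 1)   # each row of the transition matrix sums to 1
    emissions = n_states * (n_symbols - 1)    # each row of the emission matrix sums to 1
    initial = n_states - 1                    # initial state distribution sums to 1
    return transitions + emissions + initial


def bic(log_likelihood, n_params, n_obs):
    """BIC in 'larger is better' form: data fit minus a complexity penalty."""
    return log_likelihood - 0.5 * n_params * math.log(n_obs)


def select_structure(candidates, n_symbols, n_obs):
    """Return the state count with the highest BIC, plus all scores.

    candidates maps a number of states to the log-likelihood of an HMM
    of that size after its parameters have been fitted.
    """
    scores = {
        n_states: bic(loglik, hmm_free_params(n_states, n_symbols), n_obs)
        for n_states, loglik in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical fitted log-likelihoods for 2-, 3-, and 4-state HMMs.
fits = {2: -520.0, 3: -480.0, 4: -476.0}
best, scores = select_structure(fits, n_symbols=4, n_obs=400)
print(best, scores)
```

With these illustrative numbers, the small likelihood gain from a fourth state does not offset the added parameters, so the three-state model is preferred. The same compare-by-criterion pattern could in principle be repeated at the outer levels (partition structure and number of clusters), each with its own criterion.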
Keywords:
clustering, hidden Markov model, model selection, Bayesian Information Criterion (BIC), mutual information