Abstract
Hidden Markov Models (HMMs), i.e., doubly stochastic probabilistic networks, have been widely used to analyze time-series data such as those obtained in speech processing and molecular biology. A crucial issue in modeling time-series data with HMMs is determining the appropriate model architecture: the number of states and the links between them. While current HMM training procedures iteratively optimize model parameters, they usually require the model configuration to be fixed, so the task of configuring the model is done manually by trained experts. In this paper we present a procedure that automatically configures HMMs. It starts with a large, possibly over-fitted HMM and prunes it down to the appropriate complexity. The procedure can be seen as a generalization of the well-known iterative Baum-Welch algorithm. The parameter re-estimation formulas of our procedure can be formally derived, and its local convergence can be formally proved. Compared to existing methods, our procedure offers the following advantages: (1) better convergence characteristics than the standard Baum-Welch algorithm, (2) automatic reduction of the model to the right size, (3) better generalization, and (4) relative insensitivity to the initial model size. We demonstrate these features with empirical results on the problem of recognizing DNA promoter sequences.
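The start-large-and-prune idea the abstract describes can be illustrated with a minimal sketch. The code below is not the authors' procedure: it runs standard Baum-Welch re-estimation on a deliberately oversized discrete-observation HMM and then drops transitions whose re-estimated probabilities fall below a fixed threshold. The function names, the threshold eps, and the thresholding heuristic itself are illustrative assumptions.

    import numpy as np

    def baum_welch_step(A, B, pi, obs):
        """One standard Baum-Welch (EM) re-estimation step for a discrete HMM."""
        obs = np.asarray(obs)
        N, T = A.shape[0], len(obs)
        alpha = np.zeros((T, N))
        beta = np.zeros((T, N))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):                           # forward pass
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):                  # backward pass
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        likelihood = alpha[-1].sum()
        gamma = alpha * beta / likelihood               # state occupancy posteriors
        xi = np.zeros((N, N))                           # expected transition counts
        for t in range(T - 1):
            xi += np.outer(alpha[t], B[:, obs[t + 1]] * beta[t + 1]) * A / likelihood
        A_new = xi / gamma[:-1].sum(axis=0)[:, None]    # transition re-estimates
        B_new = np.zeros_like(B)
        for k in range(B.shape[1]):
            B_new[:, k] = gamma[obs == k].sum(axis=0)   # emission re-estimates
        B_new /= gamma.sum(axis=0)[:, None]
        return A_new, B_new, gamma[0], likelihood

    def prune(A, eps=1e-3):
        """Assumed heuristic: zero out negligible transitions, renormalize rows."""
        A = np.where(A < eps, 0.0, A)
        row_sums = A.sum(axis=1, keepdims=True)
        row_sums[row_sums == 0.0] = 1.0                 # avoid dividing by zero
        return A / row_sums

    # Usage: train an oversized 6-state model on a toy sequence, then prune.
    rng = np.random.default_rng(0)
    N, M = 6, 4
    A = rng.dirichlet(np.ones(N), size=N)               # random row-stochastic matrix
    B = rng.dirichlet(np.ones(M), size=N)
    pi = np.full(N, 1.0 / N)
    obs = rng.integers(0, M, size=200)
    for _ in range(20):                                 # EM iterations
        A, B, pi, _ = baum_welch_step(A, B, pi, obs)
    A = prune(A)                                        # remove negligible links

The paper's actual procedure derives its pruning re-estimates formally as a generalization of Baum-Welch rather than applying an ad hoc threshold; the sketch only conveys the overall start-large-and-shrink flow.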
Copyright information
© 1993 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Iwayama, M., Indurkhya, N., Motoda, H. (1993). A new algorithm for automatic configuration of Hidden Markov Models. In: Jantke, K.P., Kobayashi, S., Tomita, E., Yokomori, T. (eds) Algorithmic Learning Theory. ALT 1993. Lecture Notes in Computer Science, vol 744. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57370-4_51
DOI: https://doi.org/10.1007/3-540-57370-4_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57370-8
Online ISBN: 978-3-540-48096-9