Online Selective-Sample Learning of Hidden Markov Models for Sequence Classification

Kim, Minyoung

  • Received : 2015.08.14
  • Accepted : 2015.09.24
  • Published : 2015.09.25


We consider an online selective-sample learning problem for sequence classification, where the goal is to learn a predictive model from a stream of data samples whose class labels the algorithm can query selectively. Because the total number of permitted queries is limited, the key issue is choosing the most informative and salient samples whose class labels should be queried. Recently, several aggressive selective-sample algorithms have been proposed under a linear model for static (non-sequential) binary classification. We extend the idea to hidden Markov models for multi-class sequence classification by introducing measures of the novelty of an incoming sample and of the prediction confidence with respect to the current model, on which the query decision is based. We demonstrate the effectiveness of the proposed approach on several sequence classification datasets and tasks in online learning setups.
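
The query decision described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not define the novelty and confidence measures, so the ones used here are assumptions chosen for illustration (confidence as the margin between the two largest class posteriors, novelty as the per-frame negative log evidence). The function names, the thresholds, and the discrete-emission HMM parameterization are likewise hypothetical, and the model update that would follow each queried label is omitted.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(xs)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))

def hmm_loglik(obs, log_pi, log_A, log_B):
    """Log-likelihood of a discrete observation sequence under one HMM,
    computed with the forward algorithm in log space."""
    n = len(log_pi)
    alpha = [log_pi[i] + log_B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + log_A[i][j] for i in range(n)]) + log_B[j][o]
            for j in range(n)
        ]
    return logsumexp(alpha)

def should_query(obs, class_hmms, log_priors, conf_thresh, novelty_thresh):
    """Decide whether to ask for the label of `obs` (assumed criteria).

    class_hmms: one (log_pi, log_A, log_B) tuple per class (at least two).
    Returns (ask, confidence, novelty).
    """
    joint = [hmm_loglik(obs, *h) + lp for h, lp in zip(class_hmms, log_priors)]
    log_evidence = logsumexp(joint)
    post = sorted((math.exp(j - log_evidence) for j in joint), reverse=True)
    confidence = post[0] - post[1]          # assumed: top-two posterior margin
    novelty = -log_evidence / len(obs)      # assumed: per-frame surprise
    ask = confidence < conf_thresh or novelty > novelty_thresh
    return ask, confidence, novelty

def selective_sample(stream, class_hmms, log_priors, budget,
                     conf_thresh=0.5, novelty_thresh=2.0):
    """Scan a stream and return the indices whose labels would be queried,
    stopping once the query budget is spent. Model updates are omitted."""
    queried = []
    for t, obs in enumerate(stream):
        if len(queried) >= budget:
            break
        ask, _, _ = should_query(obs, class_hmms, log_priors,
                                 conf_thresh, novelty_thresh)
        if ask:
            queried.append(t)
    return queried
```

In this sketch a sample is queried when the current class-conditional HMMs either disagree little about its class or explain it poorly, which is one simple way to combine the two signals under a fixed query budget.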


Machine learning; Sequence classification; Online learning; Hidden Markov models


