Geodesic Clustering for Covariance Matrices

  • Lee, Haesung (Department of Statistics, Pennsylvania State University)
  • Ahn, Hyun-Jung (Kantar Health)
  • Kim, Kwang-Rae (School of Mathematical Sciences, University of Nottingham)
  • Kim, Peter T. (Department of Mathematics and Statistics, University of Guelph)
  • Koo, Ja-Yong (Department of Statistics, Korea University)
  • Received : 2015.04.02
  • Accepted : 2015.06.20
  • Published : 2015.07.31


The K-means clustering algorithm is a popular and widely used clustering method. This paper considers a geodesic clustering algorithm for data consisting of symmetric positive definite (SPD) matrices, such as covariance matrices, that combines the K-means framework with the Riemannian (non-Euclidean) geometric structure of the manifold of SPD matrices. A K-means algorithm alternates between two main steps, which require a dissimilarity measure between two matrix data points and a way of computing the centroid of the observations in each cluster. To exploit the Riemannian structure, we adopt the geodesic distance as the dissimilarity measure and the intrinsic mean as the centroid for SPD matrices. We demonstrate the proposed method through simulations and an application to real financial data.
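The two ingredients named in the abstract can be illustrated with a minimal sketch. The abstract does not state which Riemannian metric is used, so the code below assumes the common affine-invariant metric, under which the geodesic distance is d(A, B) = ||log(A^{-1/2} B A^{-1/2})||_F and the intrinsic mean is computed by a fixed-point iteration on averaged log maps. All function names (`geodesic_distance`, `intrinsic_mean`, `geodesic_kmeans`) and parameters are hypothetical, the random initialization is an arbitrary choice, and this is not the authors' implementation.

```python
import numpy as np

def _sym_fun(S, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh((S + S.T) / 2)  # symmetrize against round-off
    return (V * fun(w)) @ V.T

def geodesic_distance(A, B):
    """Assumed affine-invariant distance ||log(A^{-1/2} B A^{-1/2})||_F."""
    A_inv_half = _sym_fun(A, lambda w: w ** -0.5)
    M = A_inv_half @ B @ A_inv_half
    w = np.linalg.eigvalsh((M + M.T) / 2)
    return np.sqrt(np.sum(np.log(w) ** 2))

def intrinsic_mean(mats, n_iter=50, tol=1e-10):
    """Intrinsic (Frechet) mean via fixed-point iteration on log maps."""
    mu = np.mean(mats, axis=0)  # Euclidean mean is SPD, used as a start
    for _ in range(n_iter):
        half = _sym_fun(mu, np.sqrt)
        inv_half = _sym_fun(mu, lambda w: w ** -0.5)
        # Average of the whitened log maps of the data at the current mu.
        T = np.mean(
            [_sym_fun(inv_half @ X @ inv_half, np.log) for X in mats], axis=0
        )
        mu = half @ _sym_fun(T, np.exp) @ half
        if np.linalg.norm(T) < tol:
            break
    return mu

def geodesic_kmeans(mats, k, n_iter=20, seed=0):
    """K-means with geodesic distance and intrinsic-mean centroids."""
    mats = np.asarray(mats)
    rng = np.random.default_rng(seed)
    # Assumption: initialize centroids with k distinct data points.
    centers = mats[rng.choice(len(mats), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest centroid in geodesic distance.
        d = np.array([[geodesic_distance(c, X) for c in centers] for X in mats])
        labels = d.argmin(axis=1)
        # Update step: intrinsic mean of each non-empty cluster.
        new_centers = np.array([
            intrinsic_mean(mats[labels == j]) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Labels come from the most recent assignment step.
    return labels, centers
```

For example, calling `geodesic_kmeans(covs, k=2)` on a list of equally sized SPD matrices `covs` returns one cluster label per matrix together with the fitted centroids. Using eigendecompositions for the matrix square root, logarithm, and exponential keeps the sketch dependent on NumPy only.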


Supported by : National Research Foundation of Korea (NRF)

