A fast approximate fitting for mixture of multivariate skew t-distribution via EM algorithm

  • Kim, Seung-Gu (Department of Data and Information, Sangji University)
  • Received : 2020.01.13
  • Accepted : 2020.02.12
  • Published : 2020.03.31

Abstract

A mixture of multivariate canonical fundamental skew t-distributions (CFUST) has attracted interest in various fields, notably in the unsupervised learning community. However, fitting the model via the EM algorithm requires considerable processing time, mainly because the E-step must evaluate many multivariate t-cdfs (cumulative distribution functions). In this article, we provide an approximate but fast method that calculates them in a univariate fashion: each multivariate t-cdf is replaced by a product of successively conditional univariate t-cdfs obtained with a first-order Taylor approximation. By replacing all multivariate t-cdfs in the E-step with the proposed approximate versions, we obtain admissible fitting results, with an 85% reduction in processing time for the 5-dimensional skewness case of the Australian Institute of Sport data set. Rough properties, advantages, and limitations of this approach are also discussed.
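The idea of replacing one multivariate t-cdf by a product of conditional univariate t-cdfs can be illustrated with a minimal sketch. It is not the article's algorithm: the abstract does not give the exact form of the Taylor step, so the sketch assumes that each conditioning event X_k <= a_k is replaced by a single plug-in point, namely the truncated mean of X_k under its approximate conditional law. The function names (`t_pdf`, `t_cdf`, `cholesky`, `approx_mvt_cdf`) are hypothetical, the location is taken to be zero, and the Simpson-rule univariate cdf is only a dependency-free stand-in for a library routine.

```python
import math

def t_pdf(z, df):
    """Density of the standard univariate t-distribution."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + z * z / df) ** (-(df + 1) / 2)

def t_cdf(z, df, n=400):
    """Standard univariate t-cdf (df >= 1) by Simpson's rule.

    The substitution x = sqrt(df) * tan(theta) turns the density into
    c * cos(theta)**(df - 1) on a bounded interval. n must be even.
    """
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    top = math.atan(z / math.sqrt(df))
    h = top / n
    total = 1.0 + math.cos(top) ** (df - 1)  # the two endpoint terms
    for i in range(1, n):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    return 0.5 + c * total * h / 3

def cholesky(s):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    q = len(s)
    low = [[0.0] * q for _ in range(q)]
    for i in range(q):
        for j in range(i + 1):
            acc = s[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low

def approx_mvt_cdf(a, sigma, nu):
    """Approximate P(X <= a) for X ~ t_nu(0, sigma) using univariate t-cdfs only.

    Assumption of this sketch: coordinate j is conditioned on plug-in points
    for coordinates 0..j-1 instead of on the events X_k <= a_k.
    Requires nu > 1 when len(a) > 1 so that the truncated means exist.
    """
    q = len(a)
    low = cholesky(sigma)
    u = []        # whitened plug-in points, u = L^{-1} x by forward substitution
    prob = 1.0
    for j in range(q):
        df = nu + j                                   # conditional degrees of freedom
        m = sum(low[j][k] * u[k] for k in range(j))   # conditional location
        s = low[j][j] * math.sqrt((nu + sum(v * v for v in u)) / df)  # conditional scale
        z = (a[j] - m) / s
        f_j = t_cdf(z, df)
        prob *= f_j
        if j == q - 1:
            break
        # Plug-in point: E[X_j | X_j <= a_j] under the conditional t law.
        x = m - s * (df + z * z) / (df - 1) * t_pdf(z, df) / f_j
        u.append((x - m) / low[j][j])
    return prob

if __name__ == "__main__":
    sigma = [[1.0, 0.3, 0.2], [0.3, 2.0, 0.4], [0.2, 0.4, 1.5]]
    print(approx_mvt_cdf([0.5, -0.2, 1.0], sigma, nu=4.0))
```

The cost grows linearly in the number of univariate cdf evaluations, which is the source of the speed-up the abstract describes. In the one-dimensional case the product reduces to the exact univariate t-cdf, while for higher dimensions the result is only an approximation whose accuracy depends on the correlation structure and the degrees of freedom.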
