Comparison of Methods for Reducing the Dimension of Compositional Data with Zero Values

  • Received : 2012.05.02
  • Accepted : 2012.06.16
  • Published : 2012.07.31

Abstract

Compositional data consist of compositions, that is, non-negative vectors of proportions subject to a unit-sum constraint. Statistical analysis of such data is fundamental in disciplines such as petrology and archaeometry. Aitchison (1983) introduced log-contrast principal component analysis, which operates on logratio-transformed data, as a dimension-reduction technique for understanding and interpreting the structure of compositional data. However, this analysis cannot be applied when the data contain zero values, because the logarithm of zero is undefined. In this paper, we introduce four possible methods for reducing the dimension of compositional data with zero values. Two real data sets are analyzed with these methods, and the results are compared.
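To make the zero-value problem concrete, the following is a minimal sketch of a principal component analysis on centered-logratio (clr) transformed compositions, one common way of carrying out a log-contrast analysis. It is not the paper's code: the function names `clr` and `logcontrast_pca` and the toy data are hypothetical, and none of the four methods compared in the paper is reproduced here.

```python
import numpy as np

def clr(x):
    """Centered logratio transform: log of each part minus the row mean of logs."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def logcontrast_pca(x):
    """Eigen-decomposition of the covariance of clr-transformed compositions."""
    z = clr(x)
    zc = z - z.mean(axis=0)                      # center each clr coordinate
    cov = zc.T @ zc / (x.shape[0] - 1)           # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric eigen-solver
    order = np.argsort(eigvals)[::-1]            # sort by decreasing variance
    return eigvals[order], eigvecs[:, order]

# Hypothetical toy compositions (each row sums to 1, no zeros).
comps = np.array([[0.2, 0.3, 0.5],
                  [0.1, 0.6, 0.3],
                  [0.4, 0.4, 0.2],
                  [0.3, 0.2, 0.5]])
eigvals, eigvecs = logcontrast_pca(comps)

# A composition with a zero part: log(0) is -inf, so the transform breaks down.
with np.errstate(divide="ignore", invalid="ignore"):
    broken = clr(np.array([[0.0, 0.4, 0.6]]))
```

In this sketch the loadings of each leading component sum to zero, which is what makes the component a log-contrast, and the smallest eigenvalue is numerically zero because clr coordinates sum to zero within each row. The `broken` array contains no finite entries, which illustrates why compositions with zero values need separate treatment before any logratio-based analysis.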

References

  1. Aitchison, J. (1982). The statistical analysis of compositional data (with discussion), Journal of the Royal Statistical Society, Series B, 44, 139-177.
  2. Aitchison, J. (1983). Principal component analysis of compositional data, Biometrika, 70, 57-65. https://doi.org/10.1093/biomet/70.1.57
  3. Aitchison, J. (1986). The Statistical Analysis of Compositional Data, Chapman and Hall, New York.
  4. Bacon-Shone, J. (1992). Ranking methods for compositional data, Applied Statistics, 41, 533-537. https://doi.org/10.2307/2348087
  5. Baxter, M. J., Cool, H. E. M. and Heyworth, M. P. (1990). Principal component and correspondence analysis of compositional data: Some similarities, Journal of Applied Statistics, 17, 229-235. https://doi.org/10.1080/757582834
  6. Butler, J. C. (1976). Principal component analysis using the hypothetical closed array, Journal of Mathematical Geology, 8, 25-36. https://doi.org/10.1007/BF01039682
  7. Gnanadesikan, R. (1977). Methods for Statistical Data Analysis of Multivariate Observations, Wiley, New York.
  8. Gower, J. C. (1967). Multivariate analysis and multidimensional geometry, Statistician, 17, 13-28. https://doi.org/10.2307/2987199
  9. Huh, M.-H. (1999). Quantification Methods for Multivariate Data, Freedom Academy, Seoul.
  10. Jolliffe, I. T. (2002). Principal Component Analysis, 2nd Edition, Springer, New York.
  11. Kaiser, R. F. (1962). Composition and origin of glacial till, Mexico and Kasoag quadrangles, New York, Journal of Sedimentary Petrology, 32, 502-513.
  12. Le Maitre, R. W. (1968). Chemical variation within and between volcanic rock series - a statistical approach, Journal of Petrology, 9, 220-252. https://doi.org/10.1093/petrology/9.2.220
  13. Martin-Fernandez, J. A., Barcelo-Vidal, C. and Pawlowsky-Glahn, V. (2003). Dealing with zeros and missing values in compositional data sets using nonparametric imputation, Journal of Mathematical Geology, 35, 253-278. https://doi.org/10.1023/A:1023866030544
  14. Sibson, R. (1978). Studies in the robustness of multidimensional scaling, Journal of the Royal Statistical Society, Series B, 40, 234-238.
  15. Webb, W. N. and Briggs, L. I. (1966). The use of principal component analysis to screen mineralogical data, Journal of Geology, 74, 716-720. https://doi.org/10.1086/627206