Comparison of Methods for Reducing the Dimension of Compositional Data with Zero Values
Song, Taeg-Youn; Choi, Byung-Jin
Compositional data consist of compositions, that is, non-negative vectors of proportions subject to the unit-sum constraint. Statistical analysis of such data is fundamental in disciplines such as petrology and archaeometry. Aitchison (1983) introduced log-contrast principal component analysis, which operates on log-ratio transformed data, as a dimension-reduction technique for understanding and interpreting the structure of compositional data. However, this analysis cannot be used when the data contain zero values, because the logarithm of zero is undefined. In this paper, we introduce four possible methods for reducing the dimension of compositional data with zero values. Two real data sets are analyzed with these methods, and the results are compared.
Compositional data; dimension reduction; log-contrast principal component analysis; correspondence analysis; ranked data; quantification method
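To make the zero-value problem in the abstract concrete, the following is a minimal sketch of a log-contrast principal component analysis. It assumes the analysis is carried out as an eigen-decomposition of the covariance matrix of centered log-ratio (clr) transformed compositions. The function names `clr` and `log_contrast_pca` and the small data matrix are hypothetical illustrations, not taken from the paper, and none of the paper's four proposed methods is reproduced here.

```python
import numpy as np


def clr(x):
    """Centered log-ratio transform: log of each part minus the row mean of logs."""
    log_x = np.log(x)
    return log_x - log_x.mean(axis=1, keepdims=True)


def log_contrast_pca(x):
    """Eigen-decomposition of the covariance of clr-transformed compositions.

    Returns eigenvalues (descending), loadings (columns), and component scores.
    """
    x = np.asarray(x, dtype=float)
    # The log-ratio transform is undefined as soon as any part equals zero.
    if np.any(x <= 0):
        raise ValueError("log-ratios are undefined when a part is zero")
    z = clr(x)
    z = z - z.mean(axis=0)                    # center each clr column
    cov = z.T @ z / (z.shape[0] - 1)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # symmetric eigen-decomposition
    order = np.argsort(eigvals)[::-1]         # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, eigvecs, z @ eigvecs


# Hypothetical 3-part compositions (each row sums to 1); not data from the paper.
x = np.array([
    [0.2, 0.3, 0.5],
    [0.1, 0.6, 0.3],
    [0.4, 0.4, 0.2],
    [0.3, 0.2, 0.5],
])
eigvals, loadings, scores = log_contrast_pca(x)
```

Because every clr-transformed row sums to zero, the smallest eigenvalue is numerically zero and the loadings of the remaining components sum to zero, which is what makes them log-contrasts. Passing a composition with a zero part, such as `[0.0, 0.5, 0.5]`, raises the `ValueError`, which is the situation the methods compared in the paper are meant to handle.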
Aitchison, J. (1982). The statistical analysis of compositional data (with discussion), Journal of the Royal Statistical Society, Series B, 44, 139-177.
Aitchison, J. (1983). Principal component analysis of compositional data, Biometrika, 70, 57-65.
Aitchison, J. (1986). The Statistical Analysis of Compositional Data, Chapman and Hall, New York.
Bacon-Shone, J. (1992). Ranking methods for compositional data, Applied Statistics, 41, 533-537.
Baxter, M. J., Cool, H. E. M. and Heyworth, M. P. (1990). Principal component and correspondence analysis of compositional data: Some similarities, Journal of Applied Statistics, 17, 229-235.
Butler, J. C. (1976). Principal component analysis using the hypothetical closed array, Journal of Mathematical Geology, 8, 25-36.
Gnanadesikan, R. (1977). Methods for Statistical Data Analysis of Multivariate Observations, Wiley, New York.
Gower, J. C. (1967). Multivariate analysis and multidimensional geometry, Statistician, 17, 13-28.
Jolliffe, I. T. (2002). Principal Component Analysis, 2nd Edition, Springer, New York.
Kaiser, R. F. (1962). Composition and origin of glacial till, Mexico and Kasoag quadrangles, New York, Journal of Sedimentary Petrology, 32, 502-513.
Le Maitre, R. W. (1968). Chemical variation within and between volcanic rock series - a statistical approach, Journal of Petrology, 9, 220-252.
Martin-Fernandez, J. A., Barcelo-Vidal, C. and Pawlowsky-Glahn, V. (2003). Dealing with zeros and missing values in compositional data sets using nonparametric imputation, Journal of Mathematical Geology, 35, 253-278.
Sibson, R. (1978). Studies in the robustness of multidimensional scaling, Journal of the Royal Statistical Society, Series B, 40, 234-238.
Webb, W. N. and Briggs, L. I. (1966). The use of principal component analysis to screen mineralogical data, Journal of Geology, 74, 716-720.