A new classification method using penalized partial least squares

  • Kim, Yun-Dae (Department of Industrial and Management Engineering, POSTECH) ;
  • Jun, Chi-Hyuck (Department of Industrial and Management Engineering, POSTECH) ;
  • Lee, Hye-Seon (Department of Industrial and Management Engineering, POSTECH)
  • Received : 2011.08.16
  • Accepted : 2011.09.21
  • Published : 2011.10.01

Abstract

Classification derives, from a learning sample, a rule for assigning objects to one of several categories. A good classification model should classify new objects with a low misclassification rate. Many classification methods have been developed, including logistic regression, discriminant analysis, and classification trees. This paper presents a new classification method using penalized partial least squares (PLS). Penalized PLS can make the model more robust and remedy the multicollinearity problem. The proposed method is compared with logistic regression and PCA-based discriminant analysis on real and artificial data. The results indicate that the new method has better classification performance than the other methods.
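As a point of reference, one common way to write penalized PLS replaces the unit-norm constraint on each PLS weight vector with a penalized norm. This formulation is an assumption; the abstract does not give the authors' exact criterion. With centered X and y, a symmetric penalty matrix P and a tuning parameter λ ≥ 0, the weight vector solves the problem below, and the Lagrange condition gives its closed form:

```latex
\mathbf{w} = \arg\max_{\mathbf{w}} \; (\mathbf{w}^\top X^\top \mathbf{y})^2
\quad \text{subject to} \quad \mathbf{w}^\top (I + \lambda P)\,\mathbf{w} = 1,

(\mathbf{w}^\top X^\top \mathbf{y})\, X^\top \mathbf{y} = \mu\,(I + \lambda P)\,\mathbf{w}
\;\;\Longrightarrow\;\;
\mathbf{w} \propto (I + \lambda P)^{-1} X^\top \mathbf{y}.
```

The objective is proportional to the squared covariance between the score Xw and y. Setting λ = 0 recovers the ordinary PLS weight w ∝ Xᵀy, while λ > 0 shrinks w toward directions that P penalizes little. This is how such a penalty is meant to stabilize the weights when the predictors are strongly correlated.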

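Classification analysis derives a classification rule from a learning sample and then applies it to new samples to assign them to specific categories. Various classification methods have been developed to match the complexity of the data, but accurate classification remains difficult when the data are high-dimensional and the variables are highly correlated. This study proposes a classification method that is robust when the data dimension is relatively high and the variables are highly correlated. Partial least squares is a multivariate technique for continuous data that is known for high predictive power when the data are high-dimensional and the independent variables are highly correlated. We evaluate the performance of the classification method based on penalized partial least squares on real data and in simulations.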
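The abstract does not spell out the algorithm. The Python sketch below only illustrates the general idea of classifying with penalized PLS, under several assumptions. The class label is coded 0/1 and fitted as a PLS1 response. Each weight vector is shrunk as w ∝ (I + λP)⁻¹Xᵀy using a hypothetical second-difference penalty matrix P. An object is assigned to class 1 when its fitted value reaches 0.5. The names `fit_penalized_pls`, `predict_class` and `second_difference_penalty` are illustrative, as are the penalty matrix, λ = 10 and the cut-off. None of this is the authors' implementation, and the simulated data are not the paper's data.

```python
import numpy as np


def second_difference_penalty(p):
    """Hypothetical penalty matrix P = D^T D, where D takes second
    differences of neighbouring coefficients (a smoothness penalty)."""
    D = np.diff(np.eye(p), n=2, axis=0)  # shape (p - 2, p)
    return D.T @ D


def fit_penalized_pls(X, y, n_components=2, lam=1.0, penalty=None):
    """Sketch of penalized PLS1 on a 0/1-coded class label.

    Each weight vector is w ∝ (I + lam * P)^{-1} X_k^T y_k (assumed
    formulation); X_k and y_k are then deflated as in ordinary PLS1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean  # centre both blocks
    if penalty is None:
        penalty = second_difference_penalty(p)
    shrink = np.linalg.inv(np.eye(p) + lam * penalty)

    weights, loadings, y_loads = [], [], []
    for _ in range(n_components):
        w = shrink @ (Xk.T @ yk)   # penalized PLS weight
        w /= np.linalg.norm(w)
        t = Xk @ w                 # latent score
        tt = t @ t
        p_k = Xk.T @ t / tt        # X loading
        q_k = yk @ t / tt          # y loading
        Xk = Xk - np.outer(t, p_k) # deflate X
        yk = yk - q_k * t          # deflate y
        weights.append(w)
        loadings.append(p_k)
        y_loads.append(q_k)

    W = np.column_stack(weights)
    P_load = np.column_stack(loadings)
    q = np.array(y_loads)
    # Coefficients on the centred original X: beta = W (P^T W)^{-1} q
    beta = W @ np.linalg.solve(P_load.T @ W, q)
    return x_mean, y_mean, beta


def predict_class(model, X_new, threshold=0.5):
    """Assign class 1 when the fitted 0/1 response reaches the cut-off."""
    x_mean, y_mean, beta = model
    fitted = (np.asarray(X_new, dtype=float) - x_mean) @ beta + y_mean
    return (fitted >= threshold).astype(int)


# Usage on simulated data with highly correlated predictors
rng = np.random.default_rng(0)
n, p = 200, 30
latent = rng.normal(size=(n, 1))
X = latent + 0.3 * rng.normal(size=(n, p))  # every column shares one factor
y = (latent[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

model = fit_penalized_pls(X[:150], y[:150], n_components=2, lam=10.0)
acc = np.mean(predict_class(model, X[150:]) == y[150:])
print(f"hold-out accuracy: {acc:.2f}")
```

Setting `lam=0.0` reduces the weight step to ordinary PLS1. How to choose the number of components and λ (for example by cross-validation) is left open in this sketch.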
