Nearest Neighbor Based Prototype Classification Preserving Class Regions

  • Hwang, Doosung (Dept. of Software, Dankook University)
  • Kim, Daewon (Dept. of Applied Computer Engineering, Dankook University)
  • Received : 2016.10.06
  • Accepted : 2017.05.18
  • Published : 2017.10.31

Abstract

A prototype selection method chooses a small subset of training points from the full set of class data. As the data size increases, the selected prototypes play a significant role in covering class regions and in learning a discriminant rule. This paper discusses methods for selecting prototypes in a classification framework. We formulate prototype selection as a set covering optimization problem in which the sets are defined by a distance metric and the predefined classes. This formulation lets us focus on the prototypes of one class at a time, without considering the points of the other classes. Whether a training point becomes a prototype is decided by its number of neighbors and by whether it is preselected. In this setting, we propose a greedy algorithm that chooses the most relevant points for preserving the class-dominant regions. The proposed method is simple to implement, has no parameters to tune, and achieves better or comparable results on both artificial and real-world problems.
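
The greedy set covering idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify how the covering sets are built, so the sketch assumes a common class cover construction in which each training point owns a ball whose radius is the distance to its nearest point of another class, and covers the same-class points strictly inside that ball. The function names `greedy_class_cover` and `predict_nearest_prototype` and the Euclidean metric are likewise assumptions made for illustration.

```python
import numpy as np


def greedy_class_cover(X, y):
    """Greedy set cover over same-class balls (illustrative assumption).

    Returns the indices of the selected prototypes and each point's radius.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)

    # Pairwise Euclidean distances between all training points.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    # other[i, j] is True when points i and j belong to different classes.
    other = y[:, None] != y[None, :]

    # Assumed radius: distance from each point to its nearest other-class point
    # (infinite if the data contain a single class).
    radius = np.where(other, D, np.inf).min(axis=1)

    # covers[i, j]: point j has the same class as i and lies inside i's ball.
    covers = (~other) & (D < radius[:, None])
    # Every point covers itself, which guarantees that the loop terminates.
    np.fill_diagonal(covers, True)

    uncovered = np.ones(n, dtype=bool)
    prototypes = []
    while uncovered.any():
        # Pick the point that covers the most still-uncovered points.
        gain = (covers & uncovered[None, :]).sum(axis=1)
        best = int(gain.argmax())
        prototypes.append(best)
        uncovered &= ~covers[best]
    return prototypes, radius


def predict_nearest_prototype(X_train, y_train, prototypes, X_new):
    """Label each new point with the class of its nearest selected prototype."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_new = np.asarray(X_new, dtype=float)
    P = X_train[prototypes]
    d = np.linalg.norm(X_new[:, None, :] - P[None, :, :], axis=-1)
    return y_train[prototypes][d.argmin(axis=1)]


# Toy data: two well-separated clusters, one per class.
X_demo = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
y_demo = [0, 0, 0, 1, 1, 1]
protos_demo, radius_demo = greedy_class_cover(X_demo, y_demo)
labels_demo = predict_nearest_prototype(
    X_demo, y_demo, protos_demo, [[0.5, 0.5], [10.5, 10.5]]
)
```

On this toy data one prototype per class is enough, because each ball reaches every point of its own class before touching the other cluster. The sketch has no tunable parameters, which matches the property claimed in the abstract, but the radius rule is only one possible way to define the covering sets.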
