An Intelligence Support System Research on KTX Rolling Stock Failure Using Case-based Reasoning and Text Mining


  • Lee, Hyung Il (Department of Business Informatics, Graduate School, Hanyang University)
  • Kim, Jong Woo (School of Business, Hanyang University)
  • Received : 2019.11.14
  • Accepted : 2020.03.14
  • Published : 2020.03.31


KTX rolling stock is a system composed of numerous machines, electrical devices, and components, and maintaining it requires considerable expertise and experience. When a failure occurs, the time needed to resolve it and the quality of the repair depend on the maintainer's knowledge and experience, and so does the resulting availability of the vehicle. Although troubleshooting is generally based on fault manuals, experienced and skilled professionals can diagnose a problem and take action quickly by applying personal know-how. Because this knowledge is tacit, it is difficult to pass on completely to a successor, and earlier studies have developed case-based rolling stock expert systems to turn it into data-driven knowledge. Nonetheless, research on KTX rolling stock, the type most commonly used on main lines, and on systems that extract the meaning of text to search for similar cases is still lacking. This study therefore proposes an intelligent support system that provides an action guide for newly occurring failures by using the know-how of rolling stock maintenance experts as problem-solving cases. To this end, a case base was constructed from rolling stock failure data generated from 2015 to 2017, and an integrated dictionary was built separately from the case base to cover the essential terminology and failure codes specific to the railway rolling stock domain. Given a new failure, the system searches the case base for past cases, extracts the three most similar failure cases, and proposes the actions actually taken in those cases as a diagnostic guide.
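The retrieval step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it scores cases by cosine similarity over raw word counts and returns the three most similar ones. The case records, the field names `description` and `action`, and the function names are all hypothetical, and whitespace tokenization stands in for the domain dictionary the study builds.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_cases(query: str, case_base: List[Dict], k: int = 3) -> List[Dict]:
    """Return the k past cases whose descriptions are most similar to the query."""
    q = Counter(query.lower().split())
    scored = [
        (cosine_similarity(q, Counter(c["description"].lower().split())), c)
        for c in case_base
    ]
    # Sort by score only, so the case dictionaries are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [case for _, case in scored[:k]]


# Hypothetical case base; real records would come from the failure database.
case_base = [
    {"id": 1, "description": "traction motor overheating alarm", "action": "inspect cooling fan"},
    {"id": 2, "description": "door fails to close at station", "action": "replace door sensor"},
    {"id": 3, "description": "traction motor abnormal noise", "action": "check motor bearing"},
    {"id": 4, "description": "pantograph contact strip wear", "action": "replace contact strip"},
]

guide = top_k_cases("traction motor overheating", case_base)
for case in guide:
    print(case["id"], case["action"])
```

The actions attached to the retrieved cases form the proposed guide. Swapping the word-count vectors for reduced-dimension or embedding vectors would change only how `q` and the case vectors are built.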
Earlier case-based rolling stock failure expert systems retrieved cases by keyword matching. To compensate for the limitations of that approach, this study applied several dimensionality reduction techniques so that similarity is calculated with the semantic relationships among failure details taken into account, and verified their usefulness through experiments. Three algorithms were applied to extract the characteristics of a failure: Non-negative Matrix Factorization (NMF), Latent Semantic Analysis (LSA), and Doc2Vec. Similar cases were then retrieved by measuring the cosine distance between the resulting vectors. Precision, recall, and F-measure were used to assess the proposed actions. The three techniques were compared with two baselines: an algorithm that randomly extracts failure cases with an identical failure code, and an algorithm that applies cosine similarity directly to word-based vectors. Analysis of variance confirmed that the performance differences among the five algorithms were statistically significant. In addition, performance differences by number of dimensions were examined to derive settings suitable for practical application. The analysis showed that direct word-based cosine similarity outperformed the dimensionality reduction using NMF and LSA, and that the algorithm using Doc2Vec performed best. Furthermore, for the dimensionality reduction techniques, performance improved as the number of dimensions increased within an appropriate range.
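The evaluation measures named above can be illustrated with a short sketch. It assumes a set-based definition in which the actions proposed from the retrieved cases are compared with the actions actually taken for the new failure; the abstract does not spell out the exact definition, so this is one plausible reading, and the function name and sample data are hypothetical.

```python
from typing import Set, Tuple


def precision_recall_f1(proposed: Set[str], actual: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F-measure for proposed actions."""
    hits = len(proposed & actual)
    precision = hits / len(proposed) if proposed else 0.0
    recall = hits / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: three proposed actions, two of which were actually taken.
proposed = {"inspect cooling fan", "check motor bearing", "replace door sensor"}
actual = {"inspect cooling fan", "check motor bearing"}

p, r, f = precision_recall_f1(proposed, actual)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Here two of the three proposed actions match, so precision is 2/3, recall is 1, and the F-measure is 0.8.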
This study confirmed the usefulness of effective methods for extracting the characteristics of data and converting unstructured data when applying case-based reasoning in the specialized field of KTX rolling stock, where most attributes are recorded as text. Text mining is being studied for use in many areas, but studies using text data remain scarce in environments with many specialized terms and limited access to data, such as the one addressed here. In this regard, the study is significant in that it presents, for the first time, an intelligent diagnostic system that suggests actions by retrieving cases with text mining techniques that extract the characteristics of a failure, complementing keyword-based case search. It is expected to serve as a basic study for developing diagnostic systems that can be used directly in the field.

