Anomaly Detection Model Based on Semi-Supervised Learning Using LIME: Focusing on Semiconductor Process

  • Kang-Min An (Department of Management Consulting, Graduate School of Hanyang University)
  • Ju-Eun Shin (Department of Management Consulting, Graduate School of Hanyang University)
  • Dong Hyun Baek (Division of Business Administration, Hanyang University)
  • Received: 2022.11.24
  • Reviewed: 2022.12.07
  • Published: 2022.12.31

Abstract

Many recent studies have sought to improve quality by applying machine learning models to semiconductor manufacturing process data. In semiconductor manufacturing, however, good products far outnumber defective ones, so the data are severely imbalanced from a machine learning standpoint. The data also contain a very large number of features, so it is important to train on only the most important of them in order to increase accuracy and usability. This study proposes an anomaly detection methodology that can learn effectively despite the class imbalance and high dimensionality of semiconductor process data. The methodology first applies SMOTE (synthetic minority over-sampling technique) and RFECV (recursive feature elimination with cross-validation), and then applies the LIME (local interpretable model-agnostic explanations) algorithm. It analyzes the classification results of the anomaly classification model, identifies the causes of each anomaly, and derives the semiconductor processes that require action. The applicability and feasibility of the proposed methodology were confirmed through a case application.
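To make the oversampling step concrete, the following is a minimal sketch of SMOTE-style interpolation using only the Python standard library. It is not the authors' implementation: the abstract does not report any settings, so the function name `smote_oversample`, the neighbour count `k`, the random seed, and the toy "good" and "defective" readings are all assumptions made for illustration. The RFECV and LIME stages are not shown.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper for illustration; k and seed are assumed values.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Toy, made-up data: 8 "good" samples vs. 3 "defective" ones,
# each described by two sensor readings.
good = [(0.1 * i, 1.0) for i in range(8)]
defective = [(5.0, 5.0), (5.5, 5.2), (6.0, 4.8)]

# Generate just enough synthetic defective samples to balance the classes.
new_defective = smote_oversample(defective, n_new=len(good) - len(defective))
balanced_defective = defective + new_defective
print(len(good), len(balanced_defective))
```

Each synthetic point lies on a line segment between two real minority samples, so the minority class grows without exact duplication of existing records.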

Acknowledgement

This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2019S1A5C2A04083153).
