Store Sales Prediction Using Gradient Boosting Model

  • Choi, Jaeyoung (Library and Information Science, Sungkyunkwan University)
  • Yang, Heeyoon (Library and Information Science, Sungkyunkwan University)
  • Oh, Hayoung (College of Computing & Informatics, Sungkyunkwan University)
  • Received : 2020.11.09
  • Accepted : 2020.12.19
  • Published : 2021.02.28

Abstract

With the rapid development of machine learning, its applications have spread through both industry and daily life, and research that applies machine learning to financial data has also become active. In this paper, we apply machine learning to store sales data to build a sales prediction model and propose ways to use it in the fintech industry. We apply several missing-data handling methods and compare the store sales prediction performance of three gradient boosting algorithms: XGBoost, LightGBM, and CatBoost. The best performance was achieved by XGBoost trained on a dataset whose missing values were filled by median imputation, a single imputation method. Sales predictions from the resulting model can benefit both fintech companies and their customers: stores can use the predictions as grounds for receiving financial assistance in advance, before repaying their loans, and fintech companies can offer financial products at low risk to stores that are likely to repay.
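The two ingredients named in the abstract, median imputation followed by gradient boosting, can be illustrated with a minimal sketch. It is not the authors' pipeline and does not use XGBoost: the boosting loop is a hand-rolled version on one-split decision stumps with squared-error loss, written with the Python standard library only. The monthly sales figures, the function names, and the choice of the month index as the single feature are all assumptions made for illustration.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def fit_stump(xs, residuals):
    """Find the one-split stump on a 1-D feature that minimizes squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # drop the largest value so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return t, lm, rm


def fit_gbm(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        preds = [p + learning_rate * (lm if x <= t else rm) for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    base, lr, stumps = model
    return base + lr * sum(lm if x <= t else rm for t, lm, rm in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical monthly sales for one store; None marks a missing month.
monthly_sales = [120.0, None, 135.0, 150.0, None, 170.0, 180.0, 200.0]
months = list(range(1, len(monthly_sales) + 1))

filled = impute_median(monthly_sales)
model = fit_gbm(months, filled)

baseline_mse = mse(filled, [sum(filled) / len(filled)] * len(filled))
model_mse = mse(filled, [predict(model, m) for m in months])
print(filled)
print(baseline_mse, model_mse)
```

Because tree-based learners predict a constant beyond the last split, this toy model cannot extrapolate a trend from the month index alone; a real forecasting setup would typically use engineered features such as lagged sales.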
