Missing Data Modeling based on Matrix Factorization of Implicit Feedback Dataset

암시적 피드백 데이터의 행렬 분해 기반 누락 데이터 모델링

  • Ji, JiaQi (Department of Information Center, Hebei Normal University for Nationalities)
  • Chung, Yeongjee (Department of Computer and Software Engineering, Wonkwang University)
  • Received : 2019.03.11
  • Accepted : 2019.03.26
  • Published : 2019.05.31

Abstract

Data sparsity is one of the main challenges for recommender systems. A recommender system holds a massive amount of data, of which only a small part is observed; the rest is missing. Most studies assume that data are missing at random, so they train the recommendation model on the observed data only and then recommend items to users. In practice, however, data are not missing at random. In this research, we treat the missing data as negative examples of users' interest. Three sampling methods are integrated into the SVD++ algorithm, yielding the SVD++_W, SVD++_R, and SVD++_KNN algorithms. Experimental results show that the proposed sampling methods improve precision in Top-N recommendation over the baseline algorithms. Among the three improved algorithms, SVD++_KNN performs best, which suggests that the KNN sampling method is a more effective way to extract negative examples of users' interest.

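Data sparsity is one of the main challenges for recommender systems. A recommender system holds a large volume of data in which only a part is observed and the rest is missing. Most studies assume that data are missing from the dataset at random and train the recommendation model on the observed data only before recommending items to users. In practice, however, the missing data cannot be regarded as lost at random. In this study, missing data are treated as negative examples of users' interest. Three sampling approaches are combined with the SVD++ algorithm to propose the SVD++_W, SVD++_R, and SVD++_KNN algorithms. The experimental results show that the three proposed sampling approaches improve both precision and recall in Top-N recommendation over the baseline algorithms. In particular, SVD++_KNN performs best, which shows that the KNN sampling approach is the most efficient way to extract negative examples of users' interest.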
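As a rough illustration of treating missing data as negative examples, the following minimal sketch replaces a randomly chosen fraction of the missing entries in a rating matrix with a fixed negative value. The names `p` and `r_n` echo the symbols in the figure captions, but their exact definitions are not given on this page, so the semantics used here (`p` as the sampled fraction, `r_n` as the fill value) are assumptions, and the helper `fill_missing_at_random` is hypothetical rather than the authors' implementation.

```python
import numpy as np


def fill_missing_at_random(ratings, p=0.2, r_n=0.0, seed=0):
    """Return a copy of `ratings` in which a fraction `p` of the missing
    (NaN) entries is replaced by the assumed negative-example value `r_n`.

    Hypothetical helper: `p` and `r_n` are assumed meanings, not taken
    from the paper's definitions.
    """
    rng = np.random.default_rng(seed)
    filled = ratings.copy()
    # Coordinates (user, item) of every unobserved entry.
    missing = np.argwhere(np.isnan(ratings))
    n_pick = int(round(p * len(missing)))
    if n_pick == 0:
        return filled
    # Sample without replacement so each entry is filled at most once.
    picked = missing[rng.choice(len(missing), size=n_pick, replace=False)]
    filled[picked[:, 0], picked[:, 1]] = r_n
    return filled


# Toy user-item matrix: NaN marks a missing rating.
ratings = np.array([
    [5.0, np.nan, np.nan, 1.0],
    [np.nan, 4.0, np.nan, np.nan],
    [np.nan, np.nan, 3.0, np.nan],
])
filled = fill_missing_at_random(ratings, p=0.5, r_n=0.0)
```

The filled matrix could then be passed to any matrix factorization trainer in place of the observed-only data. Observed ratings are left unchanged, and only the sampled missing entries become explicit negatives.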

Fig. 1 Replacement of missing data with rn

Fig. 2 Random replacement of missing data with rn

Fig. 3 Flow chart of K-Nearest neighbor sample method

Fig. 4 The curve of precision when rn = 0 and a is changed

Fig. 5 The curve of precision when a = 0.2 and rn is changed

Fig. 6 The curve of precision when rn = 0 and p is changed

Fig. 7 The curve of precision when p = 0.2 and rn is changed

Fig. 8 The curve of precision when rn = 0, p = 1 and k is changed

Fig. 9 The curve of precision when rn = 0, k = 30 and p is changed

Fig. 10 The curve of precision when p = 0, k = 30 and rn is changed

Table 1 Parameter selection result

Table 2 Experimental result when recommender number is 10

Table 3 Experimental result when recommender number is 20

Table 4 Experimental result when recommender number is 30
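
The experimental tables report results for recommendation lists of 10, 20 and 30 items, and the abstract names precision in Top-N recommendation as the metric. The sketch below shows the usual definition of precision at N for a single user, assuming that "recommender number" refers to the list length N. The function name `precision_at_n` and the toy item identifiers are illustrative assumptions, not taken from the paper.

```python
def precision_at_n(recommended, relevant, n):
    """Fraction of the top-n recommended items that the user actually
    interacted with in the held-out test data.

    recommended: list of item ids ranked from best to worst
    relevant:    set of item ids found in the user's test data
    n:           length of the recommendation list (must be positive)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    top_n = recommended[:n]
    hits = sum(1 for item in top_n if item in relevant)
    return hits / n


# Toy example with hypothetical item ids.
recommended = ["a", "b", "c", "d"]
relevant = {"b", "d", "x"}
p_at_2 = precision_at_n(recommended, relevant, 2)  # one hit ("b") out of 2
p_at_4 = precision_at_n(recommended, relevant, 4)  # two hits ("b", "d") out of 4
```

A system-level score would typically be the mean of this value over all test users.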
