Prediction Model for Popularity of Online Articles based on Analysis of Hit Count

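  • Su-Do Kim (Institute for Rapid Social Change Research, Pusan National University)
  • Hwan-Gue Cho (Department of Computer Science and Engineering, Pusan National University)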
  • Received : 2012.02.02
  • Accepted : 2012.03.21
  • Published : 2012.04.28

Abstract

Online discussion bulletin boards in Korea are not only places where users exchange opinions but also a public sphere in which users debate and form public opinion. A topic sometimes provokes heated debate, and an individual article can become a political or social issue. In this paper, we propose a method for analyzing article popularity using information collected from two well-known discussion forums, AGORA and SEOPRISE, and we propose a model that predicts article popularity from the characteristics of the articles studied. Our experiments showed that the popularity of 87.52% of AGORA articles saturated within a day of submission, whereas the popularity of 39% of SEOPRISE articles was still growing 4 days after submission. We also observed a low correlation between the duration of an article's popularity and its hit count: a steady increase in an article's hit count does not necessarily mean that its final hit count at the saturation point will be high. We propose a new prediction model called 'baseline' and evaluate how well three models (SVM, similarity-based matching, and baseline) predict popular articles. The performance evaluation showed that the SVM model performs best in F-measure and precision, while the baseline model has the shortest running time.
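
The abstract's quantitative claims (the share of articles whose popularity saturates within a day, a low correlation between popularity duration and hit count, and precision and F-measure for predicting popular articles) rest on a few standard measurements. The following is a minimal sketch of how they could be computed from per-article hourly hit counts. It is not the authors' code, and several choices are assumptions: an article is treated as "saturated" once its cumulative hits reach 90% of its final total (the hypothetical `threshold` parameter), Kendall's tau-a is used as one possible rank correlation, and all function names are hypothetical. It does not implement the SVM, similarity-based matching, or baseline predictors, whose details the abstract does not give.

```python
from itertools import combinations


def saturation_hour(hourly_hits, threshold=0.9):
    """First hour (1-based) at which cumulative hits reach `threshold`
    of the article's final total. The 90% cut-off is an assumption."""
    total = sum(hourly_hits)
    if total == 0:
        return 0  # an article with no hits is treated as saturated immediately
    running = 0
    for hour, hits in enumerate(hourly_hits, start=1):
        running += hits
        if running >= threshold * total:
            return hour
    return len(hourly_hits)


def fraction_saturated_within(articles, hours=24, threshold=0.9):
    """Share of articles whose saturation hour is at most `hours`."""
    if not articles:
        return 0.0
    done = sum(1 for hits in articles if saturation_hour(hits, threshold) <= hours)
    return done / len(articles)


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.
    Tied pairs count as neither; there is no tie correction (unlike tau-b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


def precision_recall_f1(predicted, actual):
    """Precision, recall and F-measure (F1) for the set of article IDs
    predicted to be popular versus the set actually popular."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy hourly hit counts (made-up data, not from the paper).
    articles = [[60, 25, 10, 5], [1] * 48]
    print(fraction_saturated_within(articles))  # 0.5
    # Hypothetical popularity durations vs. final hit counts.
    print(round(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]), 3))  # 0.667
    scores = precision_recall_f1(["a", "b", "c"], ["b", "c", "d", "e"])
    print(tuple(round(v, 3) for v in scores))  # (0.667, 0.5, 0.571)
```

Tau-a is used here only because it is short to write out. On real hit-count data, which often contains ties, a tie-corrected variant such as tau-b (the default in `scipy.stats.kendalltau`) is probably the safer choice.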

Keywords

Prediction; Popularity; Online Articles; Online Communities

Acknowledgement

Supported by: National Research Foundation of Korea
