Movie Box-office Prediction using Deep Learning and Feature Selection: Focusing on Multivariate Time Series

  • Byun, Jun-Hyung (Dept. of Industrial Management Engineering, Korea University)
  • Kim, Ji-Ho (Dept. of Industrial Management Engineering, Korea University)
  • Choi, Young-Jin (Dept. of Industrial Management Engineering, Korea University)
  • Lee, Hong-Chul (Dept. of Industrial Management Engineering, Korea University)
  • Received : 2020.05.04
  • Accepted : 2020.06.01
  • Published : 2020.06.30

Abstract

Box-office prediction matters to movie stakeholders, so it is necessary both to predict box-office performance accurately and to identify the variables that influence it. In this paper, we propose a multivariate time series classification method combined with important-variable selection to improve box-office prediction accuracy. We collected daily data for South Korean movies from KOBIS and NAVER, selected important variables using Random Forest, and made predictions from the multivariate time series using Deep Learning. Based on the Korean screen quota system, we used Deep Learning to compare the accuracy of box-office predictions on the 73rd day after release when using only the important variables versus all variables, and tested whether the results were statistically significant. Three Deep Learning models were evaluated: Multi-Layer Perceptron, Fully Convolutional Neural Network, and Residual Network. The Residual Network trained on the important variables achieved the highest prediction accuracy, approximately 93%.

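The variable-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes binary success labels, one summary value per variable per movie rather than the full daily series, and it replaces a full Random Forest with a bagged ensemble of depth-1 trees (stumps) so that it runs on the standard library alone. The function names, the toy data, and the choice of keeping two variables are all hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels, features):
    """Return (feature, impurity decrease) for the best one-threshold split."""
    parent = gini(labels)
    best_feature, best_gain = None, 0.0
    for f in features:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            gain = parent - weighted
            if gain > best_gain:
                best_feature, best_gain = f, gain
    return best_feature, best_gain


def stump_forest_importance(rows, labels, n_trees=50, seed=0):
    """Impurity-based importance from bagged stumps (simplified forest)."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    n_candidates = max(1, int(n_features ** 0.5))  # random feature subset size
    importance = [0.0] * n_features
    for _ in range(n_trees):
        # Bootstrap sample of the movies (sampling with replacement).
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        candidates = rng.sample(range(n_features), n_candidates)
        feature, gain = best_split(sample_rows, sample_labels, candidates)
        if feature is not None:
            importance[feature] += gain
    total = sum(importance)
    return [v / total for v in importance] if total > 0 else importance


# Toy data (hypothetical): 60 movies, 5 variables; only variable 0 drives the label.
data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(5)] for _ in range(60)]
labels = [1 if row[0] > 0.5 else 0 for row in rows]

importance = stump_forest_importance(rows, labels)
# Keep the two highest-ranked variables as the "important" subset.
selected = sorted(range(5), key=lambda f: importance[f], reverse=True)[:2]
print(selected, [round(v, 3) for v in importance])
```

In practice the same ranking would more likely come from a library implementation, for example the `feature_importances_` attribute of scikit-learn's `RandomForestClassifier`, and the selected columns would then be used to slice the daily multivariate series before training the Deep Learning models.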
