Study on prediction for a film success using text mining

  • Lee, Sanghun (Onsol Communication) ;
  • Cho, Jangsik (Department of Information Statistics, Kyungsung University) ;
  • Kang, Changwan (Department of Data Information Science, Dongeui University) ;
  • Choi, Seungbae (Department of Data Information Science, Dongeui University)
  • Received : 2015.07.21
  • Accepted : 2015.09.16
  • Published : 2015.11.30

Abstract

Big data has recently become a key topic in academia, and its usefulness has spread beyond academia to central government, local governments and businesses, all of which are working to extract useful information from it. This study applies text mining to film reviews to analyze whether films succeed or fail at the box office. The data consist of film reviews from the portal site 'D' and the Korean Film Council (KOFIC), together with the average user rating (score) and the number of screens (screen number). Using the review-derived variables, the score and the screen number as explanatory variables and box-office success as the dependent variable, the paper proposes a logistic regression model that predicts whether a film will succeed. The proposed prediction model achieved a correct classification rate of 95.74%.
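The abstract describes a logistic regression in which the average user rating, the screen count and review-derived text variables explain a binary success/failure outcome, and the model is judged by its correct classification rate (the share of films whose predicted label matches the observed one). The sketch below illustrates that kind of model in plain Python; it is not the authors' implementation. The function names, the gradient-ascent fitting loop, the 0.5 cutoff and the idea of a single text feature (for example, a hypothetical share of positive terms in a film's reviews) are all illustrative assumptions, since the abstract does not specify how the text variables were built.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

# Assumed (hypothetical) column layout for each film:
# [average user score, screen count, review-derived text feature]


def apply_standardize(X: Sequence[Sequence[float]], means: Sequence[float],
                      sds: Sequence[float]) -> Matrix:
    """Scale films with the given column means and standard deviations."""
    return [[(row[j] - means[j]) / sds[j] for j in range(len(means))] for row in X]


def standardize(X: Sequence[Sequence[float]]) -> Tuple[Matrix, List[float], List[float]]:
    """Center and scale each column so screen counts (hundreds to thousands)
    and ratings (a few points) are on comparable scales for gradient ascent."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = []
    for j in range(p):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        sds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero for a constant column
    return apply_standardize(X, means, sds), means, sds


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _eta(w: Sequence[float], x: Sequence[float]) -> float:
    # Linear predictor b0 + b1*x1 + ... + bp*xp; w[0] is the intercept.
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X: Matrix, y: Sequence[int], lr: float = 0.1,
                 epochs: int = 2000) -> List[float]:
    """Fit P(success = 1 | x) = sigmoid(eta) by batch gradient ascent
    on the log-likelihood, whose gradient is sum_i (y_i - p_i) * x_i."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            r = yi - sigmoid(_eta(w, xi))  # residual y_i - p_i
            grad[0] += r
            for j in range(p):
                grad[j + 1] += r * xi[j]
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict(w: Sequence[float], X: Matrix, threshold: float = 0.5) -> List[int]:
    """Label a film a success (1) when its fitted probability reaches the cutoff."""
    return [1 if sigmoid(_eta(w, xi)) >= threshold else 0 for xi in X]


def correct_classification_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of films whose predicted label matches the observed label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In practice the rate would be reported on films held out from fitting, and a statistics package's maximum-likelihood routine would normally replace the hand-written gradient loop; the loop is kept here only so the sketch has no dependencies.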
