Sentiment analysis on movie review through building modified sentiment dictionary by movie genre

영역별 맞춤형 감성사전 구축을 통한 영화리뷰 감성분석

  • Received : 2016.03.11
  • Accepted : 2016.04.11
  • Published : 2016.06.30


With the growth of internet data and the rapid development of internet technology, "big data" analysis is being actively conducted to analyze enormous volumes of data for various purposes. In recent years in particular, a number of studies have applied text mining techniques to overcome the limitations of existing structured data analysis. Sentiment analysis, one area of text mining, is actively studied as a way to score opinions based on the distribution of word polarity in documents. Sentiment analysis usually relies on a sentiment dictionary that records the positivity and negativity of words. As part of this line of research, this study constructs sentiment dictionaries customized to a specific data domain. Using a common sentiment dictionary without considering the characteristics of the data domain cannot reflect contextual expressions that are used only in that domain. We can therefore expect that a modified sentiment dictionary customized to the data domain will improve the efficiency of sentiment analysis. Accordingly, this study aims to suggest a way to construct a customized dictionary that reflects the characteristics of the data domain. Specifically, movie review data are divided by genre, and genre-customized dictionaries are constructed. The performance of the customized dictionaries in sentiment analysis is then compared with that of a common sentiment dictionary. IMDb data are chosen as the subject of analysis, and movie reviews are categorized by genre. Six IMDb genres are selected: 'action', 'animation', 'comedy', 'drama', 'horror', and 'sci-fi'. The five highest-ranking and five lowest-ranking movies per genre are selected as the training data set, and two years of movie data, from September 2012 to June 2014, are collected as the test data set.
Using the SO-PMI (Semantic Orientation from Point-wise Mutual Information) technique, we build a customized sentiment dictionary for each genre and compare prediction accuracy on review ratings. The analysis shows that prediction using the customized dictionaries improves accuracy. The overall performance improvement is 2.82% and is statistically significant. In particular, the customized dictionary for 'sci-fi' yields the largest accuracy improvement among the six genres. Although this study shows the usefulness of customized dictionaries in sentiment analysis, further studies are required to generalize the results. In this study, we consider only adjectives as additional terms in the customized sentiment dictionaries. Other parts of speech, such as verbs and adverbs, could be considered to improve sentiment analysis performance. The customized sentiment dictionary approach should also be applied to other domains, such as product reviews.
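To make the SO-PMI step more concrete, the following is a minimal Python sketch of how a semantic-orientation score could be computed for candidate words from a tokenized review corpus. It is not the authors' implementation. The function name `so_pmi`, the document-level co-occurrence window, the additive smoothing constant, and the toy seed words and reviews are all assumptions made for illustration. The abstract does not specify these details, and the adjective-only filtering it mentions is omitted here.

```python
import math
from collections import Counter


def so_pmi(docs, pos_seeds, neg_seeds, smoothing=0.01):
    """Score each non-seed word by SO-PMI using document-level co-occurrence.

    docs      : list of token lists (one list per review)
    pos_seeds : positive seed words (assumed, chosen by the analyst)
    neg_seeds : negative seed words (assumed, chosen by the analyst)
    smoothing : small constant added to counts to avoid log(0) (assumption)
    """
    n_docs = len(docs)
    doc_sets = [set(tokens) for tokens in docs]
    seeds = set(pos_seeds) | set(neg_seeds)

    doc_freq = Counter()  # number of reviews containing each word
    co_freq = Counter()   # number of reviews containing both word and seed
    for words in doc_sets:
        doc_freq.update(words)
        present_seeds = seeds & words
        for word in words:
            for seed in present_seeds:
                co_freq[(word, seed)] += 1

    def pmi(word, seed):
        # PMI(word, seed) = log2( p(word, seed) / (p(word) * p(seed)) )
        p_joint = (co_freq[(word, seed)] + smoothing) / n_docs
        p_word = (doc_freq[word] + smoothing) / n_docs
        p_seed = (doc_freq[seed] + smoothing) / n_docs
        return math.log2(p_joint / (p_word * p_seed))

    scores = {}
    for word in doc_freq:
        if word in seeds:
            continue
        positive = sum(pmi(word, s) for s in pos_seeds)
        negative = sum(pmi(word, s) for s in neg_seeds)
        scores[word] = positive - negative  # > 0 leans positive, < 0 negative
    return scores


# Toy, made-up reviews for a single hypothetical genre.
toy_reviews = [
    ["great", "thrilling"],
    ["great", "thrilling"],
    ["terrible", "dull"],
    ["terrible", "dull"],
]
toy_scores = so_pmi(toy_reviews, pos_seeds=["great"], neg_seeds=["terrible"])
```

In a genre-specific setting, one such score table would be computed per genre from that genre's training reviews, and words with strongly positive or negative scores would be candidates for addition to the common dictionary. Any threshold for inclusion would have to be chosen separately.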


Sentiment Analysis; Sentiment Dictionary; PMI; SO-PMI


