Sentiment analysis on movie review through building modified sentiment dictionary by movie genre
Lee, Sang Hoon; Cui, Jing; Kim, Jong Woo;
Driven by the growth of internet data and the rapid development of internet technology, "big data" analysis is now actively conducted on enormous data sets for various purposes. In recent years in particular, many studies have applied text mining techniques to overcome the limitations of existing structured data analysis. Sentiment analysis, one branch of text mining, is actively studied as a way to score opinions based on the distribution of word polarity in documents. Sentiment analysis usually relies on a sentiment dictionary that records the positivity and negativity of vocabulary items. As part of this line of work, this study constructs a sentiment dictionary customized to a specific data domain. Using a common sentiment dictionary without considering the characteristics of the data domain cannot capture contextual expressions that are used only in that domain. We therefore expect that a modified sentiment dictionary customized to the data domain can improve the performance of sentiment analysis, and this study proposes a way to construct such a dictionary. Specifically, movie review data are divided by genre and a genre-customized dictionary is constructed for each genre. The performance of the customized dictionaries in sentiment analysis is then compared with that of a common sentiment dictionary. IMDb data are chosen as the subject of analysis, and movie reviews are categorized by genre. Six IMDb genres are selected: 'action', 'animation', 'comedy', 'drama', 'horror', and 'sci-fi'. For each genre, the five highest-ranking and five lowest-ranking movies are selected as the training data set, and two years of movie data, from September 2012 to June 2014, are collected as the test data set.
Using the SO-PMI (Semantic Orientation from Pointwise Mutual Information) technique, we build a customized sentiment dictionary for each genre and compare prediction accuracy on review ratings. The analysis shows that prediction with the customized dictionaries improves accuracy. The overall improvement is 2.82% and is statistically significant. In particular, the customized dictionary for 'sci-fi' yields the largest accuracy improvement among the six genres. Although this study shows the usefulness of customized dictionaries in sentiment analysis, further studies are required to generalize the results. In this study, we consider only adjectives as additional terms in the customized sentiment dictionary. Other parts of speech, such as verbs and adverbs, could be considered to improve sentiment analysis performance. The customized sentiment dictionary also needs to be applied to other domains, such as product reviews.
Keywords: Sentiment Analysis; Sentiment Dictionary; PMI; SO-PMI
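The SO-PMI idea named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `so_pmi` and `score_review`, the review-level co-occurrence counting, the additive `smoothing` constant, and the seed words are all assumptions made for illustration. A word's score is the sum of its PMI with positive seed words minus the sum of its PMI with negative seed words, so words that co-occur mostly with positive seeds receive positive scores. The sketch also scores every non-seed token, whereas the study adds only adjectives, which would require a part-of-speech filter not shown here.

```python
import math
from collections import Counter


def so_pmi(docs, pos_seeds, neg_seeds, smoothing=0.01):
    """Return {word: SO-PMI score} from review-level co-occurrence counts.

    docs is a list of token lists, one per review (assumed pre-tokenized).
    smoothing is a hypothetical additive constant that avoids log(0).
    """
    n_docs = len(docs)
    seeds = set(pos_seeds) | set(neg_seeds)
    doc_freq = Counter()  # number of reviews containing each token
    co_freq = Counter()   # number of reviews containing both word and seed

    for tokens in docs:
        vocab = set(tokens)
        doc_freq.update(vocab)
        for seed in vocab & seeds:
            for word in vocab - seeds:
                co_freq[(word, seed)] += 1

    def pmi(word, seed):
        # PMI(word, seed) = log2( p(word, seed) / (p(word) * p(seed)) )
        p_joint = (co_freq[(word, seed)] + smoothing) / n_docs
        p_word = doc_freq[word] / n_docs
        p_seed = doc_freq[seed] / n_docs
        return math.log2(p_joint / (p_word * p_seed))

    # Seeds that never occur in the corpus carry no information; skip them.
    pos = [s for s in pos_seeds if doc_freq[s] > 0]
    neg = [s for s in neg_seeds if doc_freq[s] > 0]

    scores = {}
    for word in doc_freq:
        if word in seeds:
            continue
        scores[word] = (sum(pmi(word, s) for s in pos)
                        - sum(pmi(word, s) for s in neg))
    return scores


def score_review(tokens, lexicon):
    """Sum the lexicon scores of a review's tokens; unknown words count as 0."""
    return sum(lexicon.get(token, 0.0) for token in tokens)


if __name__ == "__main__":
    # Toy genre corpus; the seed words are illustrative only.
    reviews = [
        ["gripping", "plot", "great"],
        ["gripping", "great", "cast"],
        ["dull", "plot", "terrible"],
        ["dull", "terrible", "pacing"],
    ]
    lexicon = so_pmi(reviews, pos_seeds=["great"], neg_seeds=["terrible"])
    for word in sorted(lexicon, key=lexicon.get, reverse=True):
        print(f"{word:10s} {lexicon[word]:+.2f}")
```

Running `so_pmi` separately on each genre's training reviews would give one lexicon per genre, and `score_review` could then be thresholded to predict whether a review is positive or negative. How the study maps scores to review ratings is not stated in the abstract, so that step is left open here.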
 Cited by
Public Sentiment Analysis of Korean Top-10 Companies: Big Data Approach Using Multi-categorical Sentiment Lexicon, 김서인; 김동성; 김종우, Journal of Intelligence and Information Systems, 2016, Vol.22, No.3, pp.45-69
Methodology for Identifying Issues of User Reviews from the Perspective of Evaluation Criteria: Focus on a Hotel Information Site, 변성호; 이동훈; 김남규, Journal of Intelligence and Information Systems, 2016, Vol.22, No.3, pp.23-43
A Text Mining Analysis for Research Trend about Information and Communication Technology in Construction Automation, 임시영; 김석, Korean Journal of Construction Engineering and Management, 2016, Vol.17, No.6, pp.13-23