Classification of ratings in online reviews

  • Choi, Dongjun (Department of Statistics, University of Seoul);
  • Choi, Hosik (Applied Information Statistics, Kyonggi University);
  • Park, Changyi (Department of Statistics, University of Seoul)
  • Received : 2016.06.29
  • Accepted : 2016.07.22
  • Published : 2016.07.31

Abstract

Sentiment analysis, or opinion mining, is a text mining technique used to identify an individual's subjective information or opinions from documents such as blogs, reviews, news articles, and social network posts. Previous studies on classifying the ratings of online reviews from their review texts have considered only binary classification. However, because reviews can be neutral as well as positive or negative, multi-class classification would be more appropriate than binary classification. To this end, we consider the multi-class classification of ratings based on review texts. In the preprocessing stage, we use the chi-square statistic to extract words associated with the ratings. The extracted words are then used as input variables for multi-class classifiers, such as support vector machines and the proportional odds model, and their predictive performances are compared.
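The abstract does not spell out how the chi-square statistic is computed for each word. A minimal sketch, assuming each word is scored by the Pearson chi-square statistic of a 2 x K contingency table (documents that contain the word or not, versus the K rating classes), that the top-k words are kept, and that they enter the classifiers as binary presence indicators. The function names `chi_square_scores`, `select_words`, and `to_features` and the toy reviews are hypothetical, for illustration only.

```python
from collections import Counter


def chi_square_scores(docs, ratings):
    """Score every word by the Pearson chi-square statistic of a 2 x K table.

    Rows: documents that contain the word vs. documents that do not.
    Columns: the K rating classes.
    Assumption (hypothetical): presence counts, not term frequencies.
    """
    if len(docs) != len(ratings):
        raise ValueError("docs and ratings must have the same length")
    n = len(docs)
    class_totals = Counter(ratings)      # documents per rating class
    present = {}                         # word -> Counter(class -> docs containing the word)
    for tokens, rating in zip(docs, ratings):
        for word in set(tokens):
            present.setdefault(word, Counter())[rating] += 1

    scores = {}
    for word, counts in present.items():
        n_with = sum(counts.values())
        n_without = n - n_with
        stat = 0.0
        for cls, col_total in class_totals.items():
            cells = ((counts[cls], n_with), (col_total - counts[cls], n_without))
            for observed, row_total in cells:
                expected = row_total * col_total / n
                # A word present in every document has an empty "absent" row; skip it.
                if expected > 0:
                    stat += (observed - expected) ** 2 / expected
        scores[word] = stat
    return scores


def select_words(docs, ratings, k):
    """Return the k highest-scoring words; the word itself is a secondary sort key."""
    scores = chi_square_scores(docs, ratings)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


def to_features(docs, vocabulary):
    """Binary input matrix: entry [i][j] is 1 if document i contains vocabulary[j]."""
    rows = []
    for tokens in docs:
        token_set = set(tokens)
        rows.append([1 if word in token_set else 0 for word in vocabulary])
    return rows


# Hypothetical toy reviews with three rating classes.
docs = [["great", "plot"], ["great", "acting"], ["boring", "plot"],
        ["boring", "long"], ["okay", "plot"], ["okay", "acting"]]
ratings = ["pos", "pos", "neg", "neg", "neu", "neu"]
scores = chi_square_scores(docs, ratings)
print(round(scores["plot"], 3))   # 0.0: "plot" appears once in every class
vocab = select_words(docs, ratings, 3)
X = to_features(docs, vocab)
```

In this toy data the class-specific words ("great", "boring", "okay") receive the largest scores, while "plot", spread evenly over the classes, scores zero and is dropped. Whether the authors used presence indicators or term frequencies as inputs is not stated in the abstract.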

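The proportional odds model handles ordered ratings through cumulative probabilities. The abstract does not give its parametrization; a common one, assumed here, is logit P(Y <= j | x) = theta_j - beta'x for j = 1, ..., J-1 with strictly increasing thresholds theta_1 < ... < theta_{J-1}, so a larger linear predictor beta'x shifts probability toward higher ratings. The sketch below only turns given parameters into class probabilities and a predicted rating; the parameter values and the names `class_probabilities` and `predict_rating` are hypothetical, and fitting beta and theta by maximum likelihood (which needs a numerical optimizer) is omitted.

```python
import math


def logistic(z):
    """Numerically stable logistic (inverse logit) function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def class_probabilities(x, beta, thresholds):
    """P(Y = j | x), j = 1..J, under logit P(Y <= j | x) = theta_j - beta'x.

    thresholds holds theta_1 < ... < theta_{J-1}; beta has one weight per
    input variable (for example, one per selected word).
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    eta = sum(b * xi for b, xi in zip(beta, x))                  # linear predictor beta'x
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]  # P(Y <= j | x)
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)       # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs


def predict_rating(x, beta, thresholds):
    """Most probable rating, numbered 1..J."""
    probs = class_probabilities(x, beta, thresholds)
    return max(range(len(probs)), key=probs.__getitem__) + 1


# Hypothetical parameters: three ordered ratings, two word indicators.
beta = [2.0, -2.0]          # e.g. weights for "great" and "boring"
thresholds = [-1.0, 1.0]
print(predict_rating([0, 0], beta, thresholds))   # 2 (neither word: middle rating)
print(predict_rating([1, 0], beta, thresholds))   # 3
print(predict_rating([0, 1], beta, thresholds))   # 1
```

Unlike a multi-class support vector machine, which treats the rating classes as unordered, this model uses a single coefficient vector beta shared by all cumulative splits, which is what makes it respect the ordering of the ratings.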
