Semi-supervised learning for sentiment analysis in mass social media
Hong, Sola; Chung, Yeounoh; Lee, Jee-Hyong
This paper aims to analyze users' emotions automatically from Twitter, a representative social network service (SNS). Building sentiment analysis models with machine learning techniques requires sentiment labels that mark each example as expressing positive or negative emotion. However, obtaining sentiment labels for tweets is very expensive. In this paper, we therefore propose a sentiment analysis model based on self-training, which uses "data without sentiment labels" as well as "data with sentiment labels". In self-training, labels for the "data without sentiment labels" are first assigned by a model built from the "data with sentiment labels", and the model is then updated using the "data with sentiment labels" together with the newly labeled data. Repeating this process gradually improves sentiment analysis performance. However, because the label of an unlabeled example never changes once it has been assigned, misclassifications of unlabeled data at an early stage affect model updates throughout the whole learning process. The labels of "data without sentiment labels" therefore need to be determined carefully.

To obtain high performance with self-training, we propose three policies for updating the "data with sentiment labels" and compare them. The first policy selects, among the newly labeled data, only the data whose confidence is higher than a given threshold. The second policy selects the same number of positive and negative examples from the newly labeled data in order to avoid the class imbalance problem. The third policy selects no more than a given maximum number of newly labeled data, so that the model is updated gradually rather than with a large amount of data at once. Experiments were conducted on the Stanford data set, with each tweet classified as positive or negative.
As a result, the model learned with the proposed policies outperformed both the model learned using only "data with sentiment labels" and the model learned by self-training with a regular model update policy.
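The self-training procedure and the three update policies described in the abstract can be sketched in pure Python as follows. This is a minimal sketch under our own assumptions, not the authors' implementation: the linear SVM is trained with Pegasos-style stochastic subgradient descent (our choice of solver), tweets are represented as bag-of-words features, and confidence is taken to be |w·x|/||w||, a proxy for the distance to the separating hyperplane. The names `featurize`, `dot`, `train_linear_svm`, `select`, and `self_train`, and every default parameter value, are hypothetical.

```python
import math
import random
from collections import Counter


def featurize(text):
    """Hypothetical bag-of-words features: lowercase whitespace tokens plus a bias term."""
    feats = {tok: float(n) for tok, n in Counter(text.lower().split()).items()}
    feats["__bias__"] = 1.0
    return feats


def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(data, lam=0.01, epochs=50, seed=0):
    """Linear SVM (hinge loss + L2) trained with Pegasos-style SGD (assumed solver).

    data: list of (features, label) pairs with label in {-1, +1}.
    """
    rng = random.Random(seed)
    w, t = {}, 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x, y = data[i]
            violated = y * dot(w, x) < 1.0  # hinge loss is active for this example
            for k in w:  # subgradient step on the L2 term shrinks every weight
                w[k] *= 1.0 - eta * lam
            if violated:
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def select(candidates, threshold=None, balance=False, max_new=None):
    """Apply the update policies to (index, predicted_label, confidence) triples.

    threshold -> policy 1: keep only predictions more confident than the threshold.
    balance   -> policy 2: keep equal numbers of positive and negative predictions.
    max_new   -> policy 3: add at most max_new examples per round.
    With no arguments every prediction is kept (our model of a "regular" update).
    """
    pool = sorted(candidates, key=lambda c: c[2], reverse=True)  # most confident first
    if threshold is not None:
        pool = [c for c in pool if c[2] > threshold]
    if balance:
        pos = [c for c in pool if c[1] == 1]
        neg = [c for c in pool if c[1] == -1]
        n = min(len(pos), len(neg))
        if max_new is not None:
            n = min(n, max_new // 2)  # respect the cap without breaking the balance
        pool = pos[:n] + neg[:n]
    elif max_new is not None:
        pool = pool[:max_new]
    return pool


def self_train(labeled, unlabeled, rounds=5, **policy):
    """Self-training loop; a pseudo-label is never revised once assigned."""
    labeled, remaining = list(labeled), list(unlabeled)
    w = train_linear_svm(labeled)
    for _ in range(rounds):
        if not remaining:
            break
        norm = math.sqrt(dot(w, w)) or 1.0
        # Confidence: |w.x| / ||w|| with the bias folded into w (hyperplane-distance proxy).
        scores = [dot(w, x) for x in remaining]
        cands = [(i, 1 if s >= 0 else -1, abs(s) / norm) for i, s in enumerate(scores)]
        chosen = select(cands, **policy)
        if not chosen:
            break
        taken = {i for i, _, _ in chosen}
        labeled += [(remaining[i], y) for i, y, _ in chosen]
        remaining = [x for i, x in enumerate(remaining) if i not in taken]
        w = train_linear_svm(labeled)  # retrain on original + pseudo-labeled data
    return w, labeled, remaining


if __name__ == "__main__":
    seed_data = [(featurize("good great happy"), 1), (featurize("bad awful sad"), -1)]
    pool = [featurize(t) for t in ["great day", "happy happy", "awful day", "sad news"]]
    _, lab, rem = self_train(seed_data, pool, rounds=3, balance=True, max_new=2)
    print(len(lab), "labeled,", len(rem), "still unlabeled")
```

Calling `select` with no policy arguments adds every newly labeled example in each round, which is how this sketch models the regular update policy used as a baseline; passing `threshold`, `balance`, and `max_new` individually or together corresponds to the first, second, and third policies. Because `self_train` never revisits a pseudo-label once it is added to `labeled`, the sketch also exhibits the early-error propagation risk the abstract describes.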
Keywords: Twitter; Sentiment analysis; Semi-supervised learning; Self-training; SVM
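Cited by

이민정 and 우지영, "Analysis of the Relationship between Dominance in Social Networks and Emotional Expression: An Application to an Alzheimer's Web Forum," Journal of the Korea Society of Computer and Information, vol. 20, no. 6, pp. 127-140, 2015.

안주영, 배정환, 한남기, and 송민, "A Study on 'Emotion Trigger', a Factor that Induces Emotions, Using Text Mining," Journal of Intelligence and Information Systems, vol. 21, no. 2, pp. 69-92, 2015.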