Development of Sentiment Analysis Model for the hot topic detection of online stock forums

온라인 주식 포럼의 핫토픽 탐지를 위한 감성분석 모형의 개발

Hong, Taeho;Lee, Taewon;Li, Jingjing

  • Received : 2015.09.09
  • Accepted : 2016.03.16
  • Published : 2016.03.31


Document classification based on emotional polarity has become an important emerging task owing to the explosion of data on the Web. In the big data age, there are too many information sources to consult when making decisions. For example, when considering travel to a city, a person may look for reviews through a search engine such as Google or on social networking services (SNSs) such as blogs, Twitter, and Facebook. The emotional polarity of positive and negative reviews helps the user decide whether to make the trip. Sentiment analysis of customer reviews has become an important research topic as data mining technology has been widely adopted for text mining of the Web. Sentiment analysis classifies documents with machine learning techniques such as decision trees, neural networks, and support vector machines (SVMs). It applies text mining to text on the Web to extract subjective information, and it is used to determine the attitude, position, and sensibility of people who write about various topics on the Web. Regardless of their polarity, emotional customer reviews are valuable material for analyzing customers' opinions. Because it relies on automated text mining techniques, sentiment analysis helps reveal quickly what customers really want. In this study, we developed a model that selects hot topics from user posts on China's online stock forum by using the k-means algorithm and the self-organizing map (SOM). In addition, we developed a detection model that predicts hot topics by using machine learning techniques such as logit, decision trees, and SVM.
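As a rough illustration of the clustering step, the sketch below runs a plain k-means (Lloyd's algorithm) on one numeric score per topic and treats the cluster with the higher centroid as "hot". The topic names, the score values, the choice of k = 2, and the rule for picking the hot cluster are all assumptions made for illustration; the abstract does not specify them.

```python
from statistics import mean


def kmeans_1d(values, k=2, max_iter=100):
    """Lloyd's algorithm on scalar values (k >= 2); returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted values.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(mean(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids no longer change
        centroids = new_centroids
    return centroids, labels


# Hypothetical per-topic scores (not data from the study).
topic_scores = {
    "topic_a": 0.91,
    "topic_b": 0.12,
    "topic_c": 0.85,
    "topic_d": 0.20,
    "topic_e": 0.15,
}

centroids, labels = kmeans_1d(list(topic_scores.values()), k=2)
hot_cluster = centroids.index(max(centroids))  # assumption: higher centroid = "hot"
hot_topics = [name for name, lab in zip(topic_scores, labels) if lab == hot_cluster]
print(hot_topics)
```

With real data the scores would come from the forum posts, and a SOM could be run on the same input to check whether it yields the same grouping.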
We employed sentiment analysis to develop our model for the selection and detection of hot topics from China's online stock forum. Sentiment analysis calculates a sentiment value for a document by comparing it against a polarity sentiment dictionary (positive or negative) and classifying it accordingly. The online stock forum is an attractive site because of its information about stock investment. Users post numerous texts about stock movements, analyzing the market in light of government policy announcements, market reports, reports from economic research institutes, and even rumors. We divided the online forum's topics into 21 categories to apply sentiment analysis. One hundred forty-four topics were selected from the 21 categories of the online stock forum. The posts were crawled to build a positive and negative text database. After preprocessing the text posted from March 2013 to February 2015, we ultimately obtained 21,141 posts on 88 topics. An interest index was defined to select the hot topics, and the k-means algorithm and SOM produced equivalent results on this data. We developed decision tree models to detect hot topics with three algorithms: CHAID, CART, and C4.5. CHAID performed worse than the other two. We also employed SVM to detect the hot topics from negative data. The SVM models were trained with the radial basis function (RBF) kernel, with parameters selected by grid search. Detecting hot topics through sentiment analysis gives investors the latest trends and hot topics in the stock forum, so that they no longer need to search through the vast amount of information on the Web. Our proposed model can also help to rapidly identify customers' signals or attitudes towards government policy and firms' products and services.
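The dictionary-based scoring described above can be sketched as follows. This is a minimal illustration, not the authors' formula: the abstract does not give the exact computation, so the normalized difference of positive and negative matches used here is an assumption, as are the function name `sentiment_score` and the tiny English word lists. Real forum posts in Chinese would also need word segmentation before this step.

```python
def sentiment_score(tokens, positive, negative):
    """Return a polarity score in [-1, 1] from lexicon matches.

    Assumed formula: (pos - neg) / (pos + neg); 0.0 when no token matches.
    """
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    matched = pos + neg
    return 0.0 if matched == 0 else (pos - neg) / matched


# Hypothetical polarity dictionaries (placeholders, not the study's lexicon).
POSITIVE = {"rise", "gain", "bullish"}
NEGATIVE = {"fall", "loss", "bearish"}

# A pre-tokenized post: two positive matches and one negative match.
post = ["market", "bullish", "gain", "despite", "loss"]
score = sentiment_score(post, POSITIVE, NEGATIVE)
label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
print(score, label)
```

Scores of this kind, aggregated per topic, could then serve as input features for the logit, decision tree, and SVM detectors mentioned in the abstract.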


Sentiment Analysis;Opinion Mining;SVM;Hot topic;Online forums




Supported by: National Research Foundation of Korea