Online news-based stock price forecasting considering homogeneity in the industrial sector

  • Seong, Nohyoon (School of Management Engineering, KAIST College of Business, Korea Advanced Institute of Science and Technology (KAIST)) ;
  • Nam, Kihwan (School of Business, College of Business, Hanyang University)
  • Received : 2017.11.10
  • Accepted : 2018.05.24
  • Published : 2018.06.30

Abstract

Because forecasting stock movements matters both academically and practically, stock price prediction has been studied extensively. This research is broadly divided by whether it uses structured or unstructured data and, more specifically, into technical analysis, fundamental analysis, and media effect analysis. In the big data era, work that combines stock price prediction with big data is actively under way, and studies built on large volumes of data rely mainly on machine learning techniques. Methods that incorporate media effects have attracted particular attention, and among them, studies that analyze online news to forecast stock prices have become the mainstream. Most previous studies that predict stock prices from online news perform sentiment analysis of the news, build a separate corpus for each company, and construct a dictionary by recording how the stock price responded in the past. Such studies therefore examine the impact of online news on individual companies: for example, stock movements of Samsung Electronics are predicted using only news about Samsung Electronics. More recently, methods that consider influence among highly related companies have also been studied: for example, stock movements of Samsung Electronics are predicted using news about both Samsung Electronics and a closely related company such as LG Electronics. These studies examine how news about a homogeneous industrial sector affects an individual company, and they define homogeneous sectors according to the Global Industry Classification Standard (GICS). In other words, they assume that the sectors defined by GICS are homogeneous.
However, existing studies are limited in that they either do not take into account influential companies with high relevance or do not reflect the heterogeneity that exists within the same GICS sector. Our examination of various sectors shows that some sectors are not homogeneous groups. To overcome this limitation, our study proposes a methodology that applies k-means clustering to reflect the heterogeneous effects within an industrial sector on stock prices. Multiple Kernel Learning (MKL) is mainly used to integrate data with different characteristics: it has several kernels, each of which receives different data and makes predictions from it. We used MKL to incorporate the effects of the target firm and its relevant firms simultaneously. Each kernel was assigned to predict stock prices from financial news variables of the target firm or of the groups of sector firms obtained by k-means cluster analysis. To show that the proposed methodology is appropriate, we conducted experiments on three years of online news and stock prices.
The results are as follows. (1) News about the industrial sectors related to the target company also contains meaningful information for predicting the target company's stock movements, and the machine learning algorithm predicts better when news about relevant companies is considered together with the target company's own news. (2) It is important to vary the number of clusters according to the level of homogeneity in the industrial sector. When stock prices within a sector are homogeneous, it is important to use the relational effect at the level of the whole industry group, without clustering, or to use only a small number of clusters. When stock prices within an industry group are heterogeneous, it is important to cluster the firms into groups. This study contributes by showing that firms classified under GICS are heterogeneous and by suggesting that relevance should be defined through machine learning and statistical analysis rather than simply taken from GICS. It also contributes by demonstrating the effectiveness of a prediction model that reflects this heterogeneity.
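
The two steps the abstract describes, grouping the firms of one sector with k-means and then combining one kernel per news source, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The firm returns, the news feature vectors, the farthest-first seeding, the RBF kernel, and the fixed kernel weights are all assumptions made for illustration; in Multiple Kernel Learning the kernel weights would be learned from data rather than fixed by hand.

```python
import math


def farthest_first_centers(points, k):
    """Deterministic seeding (an assumption): start from the first point,
    then repeatedly add the point farthest from all chosen centers."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def kmeans(points, k, max_iters=100):
    """Plain Lloyd's k-means on a list of equal-length float vectors."""
    centers = farthest_first_centers(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each firm joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def rbf(a, b, gamma=1.0):
    """RBF kernel between two feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def combined_kernel(day_a, day_b, weights, gamma=1.0):
    """Weighted sum of one RBF kernel per news source.
    The weights are fixed here; MKL would learn them."""
    return sum(w * rbf(day_a[s], day_b[s], gamma) for s, w in weights.items())


# Hypothetical daily returns for six firms in one sector (one row per firm).
returns = [
    [0.010, 0.012, -0.005, 0.008],
    [0.011, 0.010, -0.004, 0.009],
    [0.009, 0.013, -0.006, 0.007],
    [-0.020, 0.001, 0.015, -0.010],
    [-0.018, 0.000, 0.016, -0.011],
    [-0.021, 0.002, 0.014, -0.009],
]
labels, centers = kmeans(returns, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1] for this toy data

# Hypothetical news features for two trading days, one vector per source:
# news about the target firm and news about the firms in its cluster.
day_x = {"target": [0.2, 0.1], "cluster": [0.0, 0.3]}
day_y = {"target": [0.5, -0.1], "cluster": [0.1, 0.2]}
weights = {"target": 0.6, "cluster": 0.4}
similarity_same = combined_kernel(day_x, day_x, weights)
similarity_diff = combined_kernel(day_x, day_y, weights)
print(similarity_same, similarity_diff)
```

In this sketch, firms whose returns move together fall into the same cluster, and news from the target firm's cluster would feed its own kernel. Using `k=1` corresponds to treating the whole sector as one homogeneous group, while a larger `k` suits a more heterogeneous sector.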

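Because stock price prediction is an important problem both academically and practically, research on it has been actively pursued. With the arrival of the big data era, stock price prediction research that incorporates big data is also being actively conducted, and most of it applies machine learning to large volumes of data. Methods that incorporate media effects are attracting particular attention, and among these, studies that analyze online news and use it for stock price prediction predominate. Previous studies have mainly examined the impact of online news on individual companies. Methods that consider the mutual influence among highly related companies have also been studied recently. These examine effects within a homogeneous industrial sector, and previous studies define homogeneous sectors according to the Global Industry Classification Standard (GICS). That is, previous studies conducted their analyses under the assumption that sectors defined by GICS are homogeneous. However, they are limited in that they either made predictions without considering influential companies or failed to reflect the heterogeneity that exists within a sector. This study shows that heterogeneity exists within industrial sectors and, to address the limitation of previous studies that did not reflect it, applies K-means cluster analysis to propose a methodology that can reflect the homogeneous effects of industrial sectors on stock prices. To demonstrate that the methodology is appropriate, we ran experiments on three years of online news and stock prices. The proposed method produced good results in most cases, and the greater the heterogeneity within a GICS sector, the better the proposed method performed. This study is significant in that it shows that firms grouped under GICS do not have high homogeneity and demonstrates the effectiveness of a prediction model that reflects this.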
