Comparative Usefulness of Naver and Google Search Information in Predictive Models for Youth Unemployment Rate in Korea

  • Jung, Jae Un (Department of Management Information Systems, Dong-A University)
  • Received : 2018.06.15
  • Accepted : 2018.08.20
  • Published : 2018.08.28

Abstract

Web search query information has recently been applied in research on advanced predictive models. Google dominates the global web search market; in the Korean market, however, Naver holds the dominant share. Based on this difference, this study compares the utility of the Korean-language web search query information of Google and Naver in predictive models. To this end, it develops three ARIMA time-series models to estimate the youth unemployment rate in Korea. Model 1 uses only the youth unemployment rate series, whereas Models 2 and 3 add to Model 1 the Korean web search query information of Naver and Google, respectively. During the training period, Models 2 and 3 fit better than Model 1, and the two models were correlated with different search query terms. In predictive period 1 (continuous with the training period) and predictive period 2 (discontinuous with the training period), Model 3 performed best, and in predictive period 2 only Model 3 produced a significant prediction. This comparative study contributes to a general understanding of the usefulness of Korean web search query information from the Naver and Google search engines.
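The comparison between Model 1 and Models 2 and 3 amounts to fitting the same time-series model with and without a search-index regressor and then comparing out-of-sample errors. The following is a minimal plain-Python sketch of that idea, not the author's specification. The abstract does not report model orders, so it assumes an ARIMA(1,1,0) model with a constant, fitted by conditional least squares, and a search-augmented variant that adds the differenced search index as an exogenous regressor (effectively an ARX regression on first differences). The series `search` and `unemp` are synthetic and hypothetical, seasonality is ignored, and the 120/24 monthly train/test split, one-step-ahead forecasting, and the RMSE criterion are all illustrative assumptions.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def ols(rows, targets):
    """Ordinary least squares via the normal equations X'X w = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def design(y, x, t):
    """Regressors for the first difference dy[t]: a constant, the lagged
    difference dy[t-1], and, if x is given, the search-index difference dx[t]
    (assumes the search index for month t is already observed: a nowcasting setup)."""
    row = [1.0, y[t - 1] - y[t - 2]]
    if x is not None:
        row.append(x[t] - x[t - 1])
    return row


def fit_and_score(y, x, n_train):
    """Fit on y[:n_train]; return (coefficients, one-step-ahead RMSE on the rest)."""
    rows = [design(y, x, t) for t in range(2, n_train)]
    targets = [y[t] - y[t - 1] for t in range(2, n_train)]
    w = ols(rows, targets)
    errors = []
    for t in range(n_train, len(y)):
        d_hat = sum(wi * ri for wi, ri in zip(w, design(y, x, t)))
        errors.append(y[t] - (y[t - 1] + d_hat))  # undo the differencing
    return w, math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical data: 144 months of a random-walk search index and an
# unemployment-like series that tracks it with small noise.
rng = random.Random(42)
search = [50.0]
for _ in range(143):
    search.append(search[-1] + rng.gauss(0, 1))
unemp = [0.5 * s + rng.gauss(0, 0.05) for s in search]

w_base, rmse_base = fit_and_score(unemp, None, 120)    # analogue of Model 1
w_exog, rmse_exog = fit_and_score(unemp, search, 120)  # analogue of Models 2/3
print(f"baseline RMSE={rmse_base:.3f}  with search index RMSE={rmse_exog:.3f}")
```

Because the synthetic series is built from the search index by construction, the augmented model should show a much lower holdout RMSE here; with real Naver or Google query data, whether the regressor helps is the empirical question the study addresses. A window separated from the training data, like predictive period 2, is not modeled in this sketch.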
