A Customized Tourism System Using Log Data on Hadoop

  • 김강철 (School of Electrical, Electronic Communication and Computer Engineering, Chonnam National University)
  • Received : 2017.12.19
  • Accepted : 2018.04.15
  • Published : 2018.04.30

Abstract

As internet usage grows, a large amount of user behavior is recorded in log files, and research and industrial applications that exploit these logs have recently become more active. This paper uses Hadoop, an open-source distributed computing platform, and proposes a customized tourism system that analyzes the user behavior recorded in log files. The proposed system uses Google Analytics to obtain users' log files from the websites they visit, extracts search terms with MapReduce, and stores them in HDFS. It also uses the Octopus application to gather information about the sightseeing places and cities that travelers want to visit from travel guide websites, and extracts the features of those places with MapReduce. The system then suggests customized cities by matching the search terms against the city features. To increase the probability of a match, the NBP (next bit permutation) algorithm is used to rearrange the search terms and city features. To demonstrate the effectiveness of the proposed system, log data for 39 users are analyzed and customized cities are suggested.
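The abstract does not say how NBP is applied to the search terms and city features, so the following is only a minimal sketch under one plausible reading: each bitmask selects a subset of a user's search terms, next bit permutation steps through all subsets of a fixed size k, and a city is suggested when its feature set covers at least one such subset. The function names, the parameter k, and the sample terms and cities are hypothetical and do not come from the paper.

```python
from typing import Dict, Iterator, List, Set


def next_bit_permutation(v: int) -> int:
    """Return the next larger integer with the same number of set bits (v > 0)."""
    lowest = v & -v          # isolate the lowest set bit
    ripple = v + lowest      # carry through the lowest run of ones
    # Move the leftover ones of that run down to the least significant bits.
    return (((ripple ^ v) >> 2) // lowest) | ripple


def k_term_subsets(terms: List[str], k: int) -> Iterator[Set[str]]:
    """Yield every k-term subset of `terms`, one bitmask per subset."""
    n = len(terms)
    if k <= 0 or k > n:
        return
    mask = (1 << k) - 1      # smallest mask with exactly k bits set
    while mask < (1 << n):
        yield {terms[i] for i in range(n) if (mask >> i) & 1}
        mask = next_bit_permutation(mask)


def suggest_cities(terms: List[str],
                   city_features: Dict[str, Set[str]],
                   k: int) -> List[str]:
    """Return, sorted by name, the cities whose features cover some k-term subset."""
    hits = set()
    for subset in k_term_subsets(terms, k):
        for city, features in city_features.items():
            if subset <= features:   # every term in the subset is a city feature
                hits.add(city)
    return sorted(hits)


# Hypothetical data, for illustration only (not taken from the paper).
terms = ["beach", "temple", "market"]
city_features = {
    "CityA": {"temple", "market", "nightlife"},
    "CityB": {"beach", "nightlife"},
    "CityC": {"market", "lake"},
}
print(suggest_cities(terms, city_features, k=2))  # prints ['CityA']
```

Lowering k loosens the match (k=1 suggests every city that shares any single term), while raising it demands more overlap. Stepping through bitmasks avoids building the subsets up front, which is one reason a bit-level permutation might be preferred for this kind of matching.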
