Comparison and Application of Dynamic and Static Crawling for Extracting Product Data from Web Pages

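  • 김상혁 (Department of Electronic Engineering, Namseoul University) ;
  • 김정훈 (Department of Electronic Engineering, Namseoul University) ;
  • 이승대 (Department of Electronic Engineering, Namseoul University)
  • Received : 2023.10.17
  • Reviewed : 2023.12.27
  • Published : 2023.12.31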

Abstract

In this paper, we built a web page that makes it easy for consumers to find the promotional products currently offered at convenience stores. In the process, we compared and applied two crawling methods for extracting promotional product data: static crawling and dynamic crawling. Static crawling collects static data from a web page, whereas dynamic crawling collects data from pages that are generated dynamically. By comparing the two, we examined which method is more effective for extracting promotional product data. We then built the web page using static crawling, which proved the more effective of the two, grouped the 1+1 and 2+1 products into categories so that consumers can check them more easily, and added a search function.
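As a minimal sketch of the static approach described in the abstract, the Python example below parses product entries out of already-delivered HTML, groups them into 1+1 and 2+1 categories, and runs a keyword search. The markup, the `data-promo` attribute, the product names, and the function names are all hypothetical; the abstract does not state which tools or page structure the authors used. A dynamic crawler would instead have to drive a browser (for example through an automation tool such as Selenium) so that script-generated content exists before extraction, which is typically slower.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical markup: real convenience-store pages will differ.
SAMPLE_HTML = """
<ul>
  <li class="product" data-promo="1+1">Cola 500ml</li>
  <li class="product" data-promo="2+1">Cup Noodles</li>
  <li class="product" data-promo="1+1">Green Tea</li>
  <li class="notice">Offers end soon</li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collect (name, promo) pairs from <li class="product"> elements."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._promo = None  # promo type of the <li> currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "product":
            self._promo = attrs.get("data-promo")

    def handle_data(self, data):
        name = data.strip()
        if self._promo and name:
            self.products.append((name, self._promo))

    def handle_endtag(self, tag):
        if tag == "li":
            self._promo = None


def parse_products(html):
    """Static extraction: works only on HTML present in the response."""
    parser = ProductParser()
    parser.feed(html)
    return parser.products


def group_by_promo(products):
    """Group product names by promotion type, e.g. '1+1' or '2+1'."""
    groups = defaultdict(list)
    for name, promo in products:
        groups[promo].append(name)
    return dict(groups)


def search(products, keyword):
    """Case-insensitive substring search over product names."""
    keyword = keyword.lower()
    return [name for name, _ in products if keyword in name.lower()]


products = parse_products(SAMPLE_HTML)
print(group_by_promo(products))
print(search(products, "tea"))
```

In a real crawler, `SAMPLE_HTML` would be replaced by the body of an HTTP response; the parsing, grouping, and search steps stay the same as long as the product data is present in that response.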
