Efficient Design of Web Searching Robot Engine Using Distributed Processing Method with Javascript Function

Kim, Dae-Yu; Kim, Jung-Tae

doi:10.6109/JKIICE.2009.13.12.2595

한국정보통신학회논문지 (Journal of the Korea Institute of Information and Communication Engineering)

Vol. 13, No. 12 / pp. 2595-2602 / 2009 / 2234-4772 (pISSN) / 2288-4165 (eISSN)

한국정보통신학회 (The Korea Institute of Information and Communication Engineering)


Korean title (translated): Design of a Distributed-Processing Web Collection Robot with JavaScript Function Processing


Dae-Yu Kim (Mokwon University);
Jung-Tae Kim (Mokwon University)

Published: 2009.12.31

https://doi.org/10.6109/JKIICE.2009.13.12.2595


Abstract

This paper implements a web robot that uses Internet Explorer's Active Script Engine to process JavaScript function links, which conventional web collection robots cannot handle. The robot was also developed to measure how many pages are collected when such links are processed. To build it, we analyzed the architecture of existing web collection robots such as Googlebot and Naybot, implemented the components those robots rely on, and designed the robot as a distributed-processing system. We then added the proposed JavaScript processing model to the robot and evaluated its performance by comparing the number of pages collected from the bulletin boards of web sites that use JavaScript. For a web site with 1,000 board posts, a conventional web robot collected only 1 page, whereas the proposed robot collected more than 1,000 web pages.
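The gap between the two collection counts can be illustrated with a minimal Python sketch. It is not the authors' implementation, which executes scripts through Internet Explorer's Active Script Engine. Here a regular expression stands in for script execution, and the function name `goPage`, the URL pattern `list?page=N`, and the sample HTML are all hypothetical. The sketch only shows why a robot that follows plain `href` links misses board pages that are reachable solely through JavaScript function links.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects plain href links and JavaScript function links separately."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.plain_links = []   # links a conventional robot can follow
        self.script_links = []  # JavaScript calls a conventional robot skips

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = (attr_map.get("href") or "").strip()
        onclick = (attr_map.get("onclick") or "").strip()
        if href.lower().startswith("javascript:"):
            self.script_links.append(href[len("javascript:"):])
        elif onclick and href in ("", "#"):
            self.script_links.append(onclick)
        elif href:
            self.plain_links.append(urljoin(self.base_url, href))


# Hypothetical paging function: goPage(n) is assumed to load list?page=n.
# A real robot would run the script in an engine instead of pattern matching.
GO_PAGE = re.compile(r"goPage\((\d+)\)")


def resolve_script_link(call, base_url):
    """Maps a goPage(n) call to a URL, or returns None if unrecognized."""
    match = GO_PAGE.search(call)
    if match is None:
        return None
    return urljoin(base_url, f"list?page={match.group(1)}")


def collect(html, base_url):
    """Returns (links found conventionally, links found with script handling)."""
    extractor = LinkExtractor(base_url)
    extractor.feed(html)
    conventional = list(extractor.plain_links)
    resolved = [
        url
        for url in (resolve_script_link(c, base_url) for c in extractor.script_links)
        if url is not None
    ]
    return conventional, conventional + resolved


SAMPLE = """
<a href="/about">About</a>
<a href="javascript:goPage(2)">2</a>
<a href="#" onclick="goPage(3); return false;">3</a>
"""

if __name__ == "__main__":
    conventional, extended = collect(SAMPLE, "http://example.com/board/")
    print(len(conventional), "link(s) without script handling")
    print(len(extended), "link(s) with script handling")
```

On the sample markup, the conventional pass finds only the `/about` link, while the script-aware pass also recovers the two paging targets. A regular expression only works when the function's behavior is known in advance, which is presumably why the paper runs the page's own script in an engine instead.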


