Design and Implementation of Web Crawler with Real-Time Keyword Extraction based on the RAKE Algorithm
- Zhang, Fei (Dept. of Computer and Software, Hanyang University) ;
- Jang, Sunggyun (Dept. of Computer and Software, Hanyang University) ;
- Joe, Inwhee (Dept. of Computer and Software, Hanyang University)
- Published : 2017.11.01
We propose a web crawler system with a keyword extraction function. Most existing research on keyword extraction in text mining works on documents or corpora that have already been collected into databases. In contrast, the purpose of this paper is to build a real-time keyword extraction system that, while crawling the text of a web page, extracts the keywords of that text and stores them in the database together with it. To this end, we design and implement a crawler that incorporates the RAKE keyword extraction algorithm, so that keywords are extracted from the page content at the same time the content is crawled. In addition, we improve the performance of the RAKE algorithm by increasing the weight of important features, such as nouns appearing in the title. The experimental results show that this method outperforms the existing method and extracts keywords satisfactorily.
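The scoring idea can be made concrete with a small sketch. The code below is an illustrative, simplified RAKE (Rapid Automatic Keyword Extraction) with a title-weighting step, not the authors' implementation. Several parts are assumptions: the stopword list is a tiny hypothetical one, `TITLE_BOOST = 2.0` is an arbitrary multiplier, and because no part-of-speech tagger is used, "nouns appearing in the title" is approximated as any non-stopword title word. The database schema and the helper names `init_db` and `store_page` are hypothetical, and page fetching and HTML parsing are omitted; `store_page` only shows keywords being extracted and stored in the same transaction as the page text.

```python
import re
import sqlite3
from collections import defaultdict

# Tiny illustrative stopword list (assumption); practical RAKE setups use a much larger list.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "into", "is", "it", "of", "on", "or", "that", "the", "this", "to",
    "we", "which", "while", "with",
}

# Hypothetical multiplier for words that also occur in the page title.
TITLE_BOOST = 2.0


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and at stopwords."""
    phrases = []
    # Punctuation acts as a hard phrase delimiter.
    for fragment in re.split(r'[.,;:!?()\[\]"\n]+', text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9][a-z0-9'-]*", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake(text, title="", top_n=5):
    """Return the top_n (phrase, score) pairs, boosting words found in the title."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree: co-occurrences inside the phrase, counting the word itself.
            degree[word] += len(phrase)
    # Approximation (assumption): every non-stopword title word stands in for a "title noun".
    title_words = {w for p in candidate_phrases(title) for w in p}
    word_score = {}
    for word in freq:
        score = degree[word] / freq[word]  # standard RAKE word score deg(w)/freq(w)
        if word in title_words:
            score *= TITLE_BOOST
        word_score[word] = score
    # A phrase scores the sum of its word scores; repeated phrases are counted once.
    phrase_score = {p: sum(word_score[w] for w in p) for p in set(phrases)}
    # Sort by descending score, breaking ties alphabetically for stable output.
    ranked = sorted(phrase_score.items(), key=lambda kv: (-kv[1], " ".join(kv[0])))
    return [(" ".join(p), s) for p, s in ranked[:top_n]]


def init_db(conn):
    """Create hypothetical tables for crawled pages and their keywords."""
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT, body TEXT)")
        conn.execute("CREATE TABLE IF NOT EXISTS keywords (url TEXT, phrase TEXT, score REAL)")


def store_page(conn, url, title, body, top_n=5):
    """Extract keywords and store them together with the page in one transaction."""
    keywords = rake(body, title=title, top_n=top_n)
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR REPLACE INTO pages VALUES (?, ?, ?)", (url, title, body))
        conn.execute("DELETE FROM keywords WHERE url = ?", (url,))
        conn.executemany(
            "INSERT INTO keywords VALUES (?, ?, ?)",
            [(url, phrase, score) for phrase, score in keywords],
        )
    return keywords
```

For example, in `"deep learning models and web crawler design"` both candidate phrases score 9.0 under plain RAKE. With the title `"Web crawler"`, the words `web` and `crawler` are doubled, so `"web crawler design"` rises to 15.0 and ranks first. Running extraction inside the same transaction as the page insert is one simple way to keep the page text and its keywords consistent in the database.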