Study on Extraction of Keywords Using TF-IDF and Text Structure of Novels
You, Eun-Soon; Choi, Gun-Hee; Kim, Seung-Hoon
With the explosive growth of information about books, a growing number of customers find it difficult to choose one. This makes book recommendation systems increasingly important, since they can offer appropriate information about books and ultimately encourage customers to buy. However, existing recommendation systems based on bibliographic information or user data produce recommendations of questionable reliability. It is therefore necessary to incorporate semantic information extracted from the main text of a book into the recommendation system. As preliminary research toward this goal, this paper proposes a method for extracting keywords from the main text of novels using TF-IDF together with the novels' text structure. To this end, the texts of 100 novels were collected and divided into four structural elements: preface, dialogue, non-dialogue, and closing. The TF-IDF weight of each keyword was then calculated. The results show that keyword extraction accuracy improves by 42.1% when the preface and closing are included and dialogue is given more weight, compared with using the main body alone.
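The abstract does not specify how the text is segmented or what weights are used, so the following is only a minimal Python sketch of structure-weighted TF-IDF under stated assumptions, not the authors' implementation. Assumptions: dialogue is text inside double quotation marks; the first and last paragraphs stand in for the preface and closing; tokens come from a simple regex rather than the Korean morphological analyzer a real corpus would need; and the hypothetical `SECTION_WEIGHTS` table (dialogue = 2.0, everything else = 1.0) is illustrative. Each occurrence of a term adds its section's weight to the term frequency, which is then multiplied by the standard IDF, log(N / df).

```python
import math
import re
from collections import Counter

# Illustrative weights (assumption): the abstract only says dialogue gets more
# weight; these exact values are not taken from the paper.
SECTION_WEIGHTS = {"preface": 1.0, "dialogue": 2.0, "non_dialogue": 1.0, "closing": 1.0}

# Assumption: dialogue is any span inside straight or curly double quotes.
QUOTE_RE = re.compile(r'"([^"]*)"|“([^”]*)”')
# Simple regex tokenizer; a Korean corpus would need a morphological analyzer.
TOKEN_RE = re.compile(r"\w+")


def split_structure(paragraphs):
    """Split a novel (list of paragraphs) into the four structural elements.

    Assumption: first paragraph = preface, last paragraph = closing,
    everything in between = main body (dialogue + non-dialogue).
    """
    preface, body, closing = paragraphs[0], paragraphs[1:-1], paragraphs[-1]
    dialogue, non_dialogue = [], []
    for para in body:
        for match in QUOTE_RE.finditer(para):
            dialogue.append(match.group(1) or match.group(2) or "")
        non_dialogue.append(QUOTE_RE.sub(" ", para))  # narration with quotes removed
    return {
        "preface": preface,
        "dialogue": " ".join(dialogue),
        "non_dialogue": " ".join(non_dialogue),
        "closing": closing,
    }


def weighted_tf(sections, weights=SECTION_WEIGHTS):
    """Normalized term frequency in which each occurrence counts its section's weight."""
    counts = Counter()
    for name, text in sections.items():
        for token in TOKEN_RE.findall(text.lower()):
            counts[token] += weights[name]
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()} if total else {}


def structural_tfidf(novels, weights=SECTION_WEIGHTS):
    """Return one {term: TF-IDF} dict per novel, with IDF = log(N / df)."""
    tfs = [weighted_tf(split_structure(novel), weights) for novel in novels]
    n_docs = len(tfs)
    df = Counter(term for tf in tfs for term in tf)  # each term counted once per novel
    return [{t: v * math.log(n_docs / df[t]) for t, v in tf.items()} for tf in tfs]


def top_keywords(scores, k=5):
    """Highest-scoring terms; ties broken alphabetically for stable output."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


novels = [
    ["A quiet village.", 'The knight said "the dragon is near" and left.', "The dragon slept."],
    ["A busy city.", 'The merchant said "gold is scarce" today.', "The merchant counted gold."],
]
scores = structural_tfidf(novels)
print(top_keywords(scores[0], k=2))  # → ['dragon', 'near']
```

In this toy corpus, "near" occurs once and only inside dialogue, so the doubled dialogue weight ranks it above words that appear once in narration, such as "knight"; with uniform weights the two would tie. Comparing rankings under different weight tables against manually chosen keywords is one way to reproduce the kind of evaluation the abstract reports, but the weights and segmentation rules here are placeholders.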
Keywords: Keyword; TF-IDF; Novel Structure; Book Recommendation System; Dialog Weight
 Cited by
Analysis of Internet of Things Patents Using Social Network Analysis of Keyword Community Networks, 김도현; 김현희; 김동건; 조진남
The Korean Journal of Applied Statistics, 2016, Vol.29, No.4, pp.719-728