Study on Extraction of Keywords Using TF-IDF and Text Structure of Novels
 Authors
You, Eun-Soon; Choi, Gun-Hee; Kim, Seung-Hoon
 Abstract
With the explosive growth of information about books, a growing number of customers find it difficult to choose one. Against this backdrop, book recommendation systems are becoming more important: they supply relevant information about books and thereby encourage customers to buy. However, the results of existing recommendation systems, which rely on bibliographic information or user data, suffer from reliability problems. It is therefore necessary to incorporate semantic information extracted from the text of a book's main body into a recommendation system. As preliminary research toward that goal, this paper proposes a method for extracting keywords from the main body of novels using the TF-IDF method together with the text structure. The texts of 100 novels were collected and divided into four structural elements: preface, dialogue, non-dialogue, and closing. The TF-IDF weight of each keyword was then calculated. The results show that keyword extraction accuracy improves by 42.1% when greater weight is given to dialogue and the preface and closing are included, compared with using the main body alone.
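The idea of weighting structural elements differently before computing TF-IDF can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the weight values or the exact TF-IDF variant, so the section weights in `SECTION_WEIGHTS`, the function names `weighted_tf` and `tfidf_keywords`, and the plain `tf * log(N / df)` formula are all assumptions made for illustration. Tokenization (for example with a Korean morphological analyzer) is assumed to have happened already.

```python
import math
from collections import Counter

# Hypothetical weights: the abstract says only that dialogue receives
# more weight; the actual values used in the paper are not stated.
SECTION_WEIGHTS = {
    "preface": 1.0,
    "dialogue": 2.0,
    "non_dialogue": 1.0,
    "closing": 1.0,
}


def weighted_tf(sections, weights=SECTION_WEIGHTS):
    """Sum term counts over sections, scaling each count by its section weight.

    `sections` maps a section name to a list of already-tokenized terms.
    Sections without a configured weight default to 1.0.
    """
    tf = Counter()
    for name, tokens in sections.items():
        w = weights.get(name, 1.0)
        for tok in tokens:
            tf[tok] += w
    return tf


def tfidf_keywords(novels, top_k=3, weights=SECTION_WEIGHTS):
    """Return the top_k (term, score) pairs per novel.

    Uses the textbook form tf * log(N / df), which is an assumption here.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    n = len(novels)
    tfs = [weighted_tf(sections, weights) for sections in novels]

    # Document frequency: number of novels in which each term appears.
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())

    results = []
    for tf in tfs:
        scores = {t: f * math.log(n / df[t]) for t, f in tf.items()}
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append(ranked[:top_k])
    return results


if __name__ == "__main__":
    # Toy corpus of two "novels", each split into the four structural elements.
    novels = [
        {"preface": ["sea"], "dialogue": ["whale", "whale"],
         "non_dialogue": ["ship", "sea"], "closing": ["sea"]},
        {"preface": ["city"], "dialogue": ["love"],
         "non_dialogue": ["city", "sea"], "closing": []},
    ]
    for ranked in tfidf_keywords(novels):
        print(ranked)
```

In this toy corpus, a term that occurs in every novel gets an IDF of zero and drops to the bottom of the ranking, while terms spoken in dialogue are counted twice under the assumed weights. Changing `SECTION_WEIGHTS` is the only step needed to compare weighting schemes.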
 Keywords
Keyword; TF-IDF; Novel Structure; Book Recommendation System; Dialog Weight
 Language
Korean