An Optimal Weighting Method in Supervised Learning of Linguistic Model for Text Classification
 Authors
Mikawa, Kenta; Ishida, Takashi; Goto, Masayuki
 Abstract
This paper discusses a new weighting method for text analysis from the viewpoint of supervised learning. The term frequency-inverse document frequency (tf-idf) measure is a well-known weighting method for information retrieval, and it can also be used for text analysis. However, it is an empirical weighting method for information retrieval, and its effectiveness has not been established theoretically. A more effective weighting measure may therefore exist for document classification problems. In this study, we propose an optimal weighting method for document classification from the viewpoint of supervised learning. Because the proposed measure is learned from training data, it is better suited to text classification than the tf-idf measure. The effectiveness of the proposal is demonstrated through simulation experiments on the classification of newspaper articles and of customer reviews posted on a website.
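As a point of reference for the tf-idf baseline named in the abstract, the following is a minimal sketch of tf-idf weighting combined with cosine similarity in a vector space model. It is not the authors' proposed method, whose weights are learned from training data and are not specified on this page. The function names (`tfidf_vectors`, `cosine`, `classify`), the toy documents, the labels, and the nearest-neighbor decision rule are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (term -> weight) for tokenized documents."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    # One common idf variant: log(N / df). Other variants exist.
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in docs
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(query_tokens, train_vectors, train_labels, idf):
    """Assign the label of the most similar training document (assumed rule)."""
    # Terms unseen in training get weight 0.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query_tokens).items()}
    best = max(range(len(train_vectors)), key=lambda i: cosine(q, train_vectors[i]))
    return train_labels[best]


# Hypothetical toy training set, already tokenized.
train_docs = [
    ["stock", "market", "rises"],
    ["market", "shares", "fall"],
    ["team", "wins", "match"],
    ["player", "scores", "match"],
]
train_labels = ["economy", "economy", "sports", "sports"]

train_vectors, idf = tfidf_vectors(train_docs)
print(classify(["shares", "market"], train_vectors, train_labels, idf))
print(classify(["team", "scores"], train_vectors, train_labels, idf))
```

In this sketch the idf weights depend only on document frequencies and ignore the class labels. A supervised weighting method such as the one the abstract proposes would instead choose the per-term weights using the labeled training data.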
 Keywords
Text Classification; Weighting Method; Vector Space Model; Cosine Similarity
 Language
English
 Cited by
1.
Business Model Mining: Analyzing a Firm's Business Model with Text Mining of Annual Report, Industrial Engineering and Management Systems, 2014, 13(4), 432-441.
2.
Metric Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, 2015, 9(1), 1.
3.
Identifying Emerging Trends of Financial Business Method Patents, Sustainability, 2017, 9(9), 1670.