Study of Machine-Learning Classifier and Feature Set Selection for Intent Classification of Korean Tweets about Food Safety
Yeom, Ha-Neul; Hwang, Myunggwon; Hwang, Mi-Nyeong; Jung, Hanmin
In recent years, several studies have proposed using the Twitter micro-blogging service to track trends in online media and discussion. In this study, we examine the use of Twitter to track discussions of food safety in the Korean language. Because keyword use in most tweets is irregular, we focus on selecting an optimal machine-learning classifier and feature set for classifying collected tweets. We build classifier models using the Naive Bayes, Naive Bayes Multinomial, Support Vector Machine, and Decision Tree algorithms, all of which show good performance. To select an optimal feature set, we construct a basic feature set as a baseline for performance comparison, against which further test feature sets are evaluated. Experiments show that precision and F-measure are best when using a Naive Bayes Multinomial classifier model with a test feature set defined by extracting the Substantive, Predicate, Modifier, and Interjection parts of speech.
Keywords: Twitter; Tweets; Machine-learning Feature; Text Classification
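
The pipeline summarized in the abstract (filter tokens by part of speech, train a multinomial Naive Bayes model, then score it by precision and F-measure) can be illustrated with a minimal standard-library Python sketch. This is not the authors' implementation: the tag names in KEPT_POS, the class labels "relevant" and "irrelevant", the English placeholder tokens, and all function and class names are assumptions made for illustration, and a real system would obtain the tagged tokens from a Korean morphological analyzer.

```python
import math
from collections import Counter, defaultdict

# Hypothetical coarse part-of-speech tags (assumption); a real
# morphological analyzer would supply its own tag set.
KEPT_POS = {"substantive", "predicate", "modifier", "interjection"}


def extract_features(tagged_tokens, kept_pos=KEPT_POS):
    """Keep only the words whose POS tag belongs to the chosen feature set."""
    return [word for word, pos in tagged_tokens if pos in kept_pos]


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for words, label in zip(documents, labels):
            self.word_counts[label].update(words)
            self.vocabulary.update(words)
        return self

    def predict(self, words):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                if word not in self.vocabulary:
                    continue  # skip words never seen during training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def precision_recall_f1(gold, predicted, positive):
    """Precision, recall, and F-measure (F1) for one class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy, made-up training data: (tagged tokens, hypothetical label).
TRAIN = [
    ([("milk", "substantive"), ("spoiled", "predicate"), ("the", "particle")], "relevant"),
    ([("restaurant", "substantive"), ("poisoning", "substantive"), ("in", "particle")], "relevant"),
    ([("movie", "substantive"), ("fun", "predicate")], "irrelevant"),
    ([("weather", "substantive"), ("nice", "modifier")], "irrelevant"),
]

model = MultinomialNaiveBayes().fit(
    [extract_features(tokens) for tokens, _ in TRAIN],
    [label for _, label in TRAIN],
)
```

Comparing feature sets then amounts to changing `kept_pos`, retraining, and comparing the scores returned by `precision_recall_f1` on held-out tweets against those of the baseline feature set.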