- Volume 17 Issue 1
Time-Series based Dataset Selection Method for Effective Text Classification
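Korean title: 효율적인 문헌 분류를 위한 시계열 기반 데이터 집합 선정 기법 (Time-Series-Based Dataset Selection Technique for Efficient Document Classification)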
- Received : 2016.11.07
- Accepted : 2016.12.19
- Published : 2017.01.28
As Internet technology advances, the amount of data on the web is increasing sharply. Many studies have investigated incremental learning as a way to classify this growing data effectively. Web documents carry time-series information such as their publication dates, and reflecting this information in the classification process can make classification more effective. In this study, we analyze how the usage of words varies over time. Based on this analysis of time-series information, we propose an efficient classification method that divides the dataset into time-based subsets. For the experiment, we collected 1 million online news articles that include time-series information. We divide the dataset and classify it using SVM and
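The abstract does not state how the time-series variation of words is measured or how the dataset is divided. As an illustration only, the following Python sketch groups documents by publication period, builds a relative term-frequency distribution for each period, and keeps the periods whose distribution has a cosine similarity of at least a threshold to a target period. The function names, whitespace tokenization, the cosine measure, and the 0.5 threshold are all hypothetical assumptions, not the authors' method; the selected periods would then form the training set handed to a classifier such as SVM.

```python
from collections import Counter
from math import sqrt


def term_distribution(texts):
    """Relative term frequencies over all texts of one period (whitespace tokens)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def cosine(p, q):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * q[t] for t, w in p.items() if t in q)
    norm = sqrt(sum(w * w for w in p.values())) * sqrt(sum(w * w for w in q.values()))
    return dot / norm if norm else 0.0


def select_periods(docs, target, threshold=0.5):
    """Hypothetical selection rule: keep every period whose vocabulary
    distribution has cosine similarity >= threshold to the target period."""
    by_period = {}
    for period, text in docs:
        by_period.setdefault(period, []).append(text)
    dists = {p: term_distribution(texts) for p, texts in by_period.items()}
    return sorted(p for p, d in dists.items() if cosine(d, dists[target]) >= threshold)


# Toy (year, text) pairs; not data from the paper.
docs = [
    (2014, "smartphone market battery"),
    (2014, "election vote campaign"),
    (2015, "smartphone app market"),
    (2016, "smartphone app market growth"),
]
print(select_periods(docs, target=2016))  # → [2015, 2016]
```

Here 2014 is dropped because half of its vocabulary (election coverage) does not appear in 2016, giving a similarity of about 0.41. Lowering the threshold trades topical closeness for a larger training set, which is the kind of balance a time-based dataset selection has to strike.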