References
- Kim, Dowoo, & Koo, Moung-Wan (2017). Categorization of Korean news articles based on convolutional neural network using Doc2Vec and Word2Vec. Journal of KIISE, 44(7), 742-747. https://doi.org/10.5626/JOK.2017.44.7.742
- Kim, Pan-Jun (2016). An analytical study on performance factors of automatic classification based on machine learning. Journal of Korean Society for Information Management, 33(2), 33-59. http://dx.doi.org/10.3743/KOSIM.2016.33.2.033
- Lee, Jae-Yun (2005). An empirical study on improving the performance of text categorization considering the relationships between feature selection criteria and weighting methods. Journal of the Korean Library and Information Science Society, 39(2), 123-146. https://doi.org/10.4275/KSLIS.2005.39.2.123
- Chung, Yung-Mee. (2012). Research in information retrieval (Rev. ed.). Seoul: Yonsei University Press.
- Jin, Seol A, & Song, Min (2016). Topic modeling based interdisciplinarity measurement in the informatics related journals. Journal of Korean Society for Information Management, 33(1), 7-32. http://doi.org/10.3743/KOSIM.2016.33.1.007
- Choi, Sanghee, & Lee, Jae-Yun (2012). Usability analysis of structured abstracts in journal articles for document clustering. Journal of Korean Society for Information Management, 29(1), 331-349. http://dx.doi.org/10.3743/KOSIM.2012.29.1.331
- Atlig, C., Reyyan, K. O. C., & Yigit, T. A. K. A. (2017). Learning-based classification of natural science articles. International Journal of Scientific Research in Information Systems and Engineering (IJSRISE), 2(3), 20-26. http://www.ijsrise.com/index.php/IJSRISE/article/view/52
- Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A neural probabilistic language model. Journal of Machine Learning Research, 3(Feb), 1137-1155.
- Bhushan, S. B., Danti, A., & Fernandes, S. L. (2017). A novel integer representation based approach for classification of text documents. In Proceedings of the International Conference on Data Engineering and Communication Technology (pp. 557-564). Springer, Singapore.
- Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77-84. http://dx.doi.org/10.1145/2133806.2133826
- Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993-1022.
- Collobert, R., & Weston, J. (2008, July). A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning (pp. 160-167). ACM. https://doi.org/10.1145/1390156.1390177
- Dai, A. M., Olah, C., & Le, Q. V. (2015). Document embedding with paragraph vectors. arXiv preprint arXiv:1507.07998.
- Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391-407. https://doi.org/10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
- Forman, G. (2003). An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research, 3(Mar), 1289-1305.
- Fuhr, N., & Buckley, C. (1991). A probabilistic learning approach for document indexing. ACM Transactions on Information Systems (TOIS), 9(3), 223-248. https://doi.org/10.1145/125187.125189
- Harter, S. P. (1975). A probabilistic approach to automatic keyword indexing. Part II. An algorithm for probabilistic indexing. Journal of the American Society for Information Science, 26(5), 280-289. https://doi.org/10.1002/asi.4630260504
- Hofmann, T. (2017, August). Probabilistic latent semantic indexing. In ACM SIGIR Forum (Vol. 51, No. 2, pp. 211-218). ACM.
- Hughes, M., Li, I., Kotoulas, S., & Suzumura, T. (2017). Medical text classification using convolutional neural networks. Stud Health Technol Inform, 235, 246-50.
- Jiang, S., Lewris, J., Voltmer, M., & Wang, H. (2016, April). Integrating rich document representations for text classification. In Systems and Information Engineering Design Symposium (SIEDS), 2016 IEEE (pp. 303-308). IEEE. https://doi.org/10.1109/sieds.2016.7489319
- John, G. H., Kohavi, R., & Pfleger, K. (1994). Irrelevant features and the subset selection problem. In Proceedings of the Eleventh International Conference on Machine Learning (pp. 121-129). https://doi.org/10.1016/b978-1-55860-335-6.50023-4
- Koller, D., & Sahami, M. (1996). Toward optimal feature selection. Stanford InfoLab.
- Kusner, M., Sun, Y., Kolkin, N., & Weinberger, K. (2015, June). From word embeddings to document distances. In Proceedings of the 32nd International Conference on Machine Learning (pp. 957-966).
- Lau, J. H., & Baldwin, T. (2016). An empirical evaluation of doc2vec with practical insights into document embedding generation. arXiv preprint arXiv:1607.05368.
- Le, D. T., & Bernardi, R. (2012, July). Query classification using topic models and support vector machine. In Proceedings of ACL 2012 Student Research Workshop (pp. 19-24). Association for Computational Linguistics.
- Le, Q., & Mikolov, T. (2014, January). Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning (pp. 1188-1196).
- Lewis, D. D. (1992, February). Feature selection and feature extraction for text categorization. In Proceedings of the workshop on Speech and Natural Language for Computational Linguistics. https://doi.org/10.3115/1075527.1075574
- Li, C., Wang, H., Zhang, Z., Sun, A., & Ma, Z. (2016, July). Topic modeling for short texts with auxiliary word embeddings. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval (pp. 165-174). ACM. https://doi.org/10.1145/2911451.2911499
- Lilleberg, J., Zhu, Y., & Zhang, Y. (2015, July). Support vector machines and word2vec for text classification with semantic features. In Cognitive Informatics & Cognitive Computing (ICCI* CC), 2015 IEEE 14th International Conference on (pp. 136-140). IEEE. https://doi.org/10.1109/icci-cc.2015.7259377
- Liu, Y., Liu, Z., Chua, T. S., & Sun, M. (2015, January). Topical word embeddings. In AAAI (pp. 2418-2424).
- Luhn, H. P. (1957). A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1(4), 309-317. https://doi.org/10.1147/rd.14.0309
- Manning, C., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S., & McClosky, D. (2014). The Stanford CoreNLP natural language processing toolkit. In Proceedings of 52nd annual meeting of the association for computational linguistics: system demonstrations (pp. 55-60). https://doi.org/10.3115/v1/p14-5010
- Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111-3119).
- Mladenic, D., & Grobelnik, M. (1999). Predicting content from hyperlinks. In Proceedings of the ICML-99 Workshop on Machine Learning in Text Data Analysis, J. Stephan Institute.
- PubMed Central (2017). Retrieved from https://www.ncbi.nlm.nih.gov/pmc/
- Salton, G., & McGill, M. J. (1983). Introduction to modern information retrieval. New York: McGraw-Hill. 24-51.
- Tang, D., Qin, B., & Liu, T. (2015). Document modeling with gated recurrent neural network for sentiment classification. In Proceedings of the 2015 conference on empirical methods in natural language processing (pp. 1422-1432). https://doi.org/10.18653/v1/d15-1167
- Torkkola, K. (2004). Discriminative features for text document classification. Formal Pattern Analysis & Applications, 6(4), 301-308. https://doi.org/10.1007/s10044-003-0196-8
- Turian, J., Ratinov, L., & Bengio, Y. (2010, July). Word representations: a simple and general method for semi-supervised learning. In Proceedings of the 48th annual meeting of the association for computational linguistics (pp. 384-394). Association for Computational Linguistics.
- Wadbude, R., Gupta, V., Mekala, D., Jindal, J., & Karnick, H. (2016). User bias removal in fine grained sentiment analysis. arXiv preprint arXiv:1612.06821.
- Wang, P., Xu, B., Xu, J., Tian, G., Liu, C. L., & Hao, H. (2016). Semantic expansion using word embedding clustering and convolutional neural network for improving short text classification. Neurocomputing, 174, 806-814. https://doi.org/10.1016/j.neucom.2015.09.096
- Wang, S., & Manning, C. D. (2012, July). Baselines and bigrams: Simple, good sentiment and topic classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers-Volume 2 (pp. 90-94). Association for Computational Linguistics.
- Wang, Z., & Qian, X. (2008, December). Text categorization based on LDA and SVM. In Computer Science and Software Engineering, 2008 International Conference on (Vol. 1, pp. 674-677). IEEE. https://doi.org/10.1109/csse.2008.571
- Wei, X., & Croft, W. B. (2006, August). LDA-based document models for ad-hoc retrieval. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 178-185). ACM. https://doi.org/10.1145/1148170.1148204
- Xing, C., Wang, D., Zhang, X., & Liu, C. (2014, December). Document classification with distributions of word vectors. In Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific (pp. 1-5). IEEE.
- Yang, Y. (1999). An evaluation of statistical approaches to text categorization. Information retrieval, 1(1-2), 69-90. https://doi.org/10.1109/apsipa.2014.7041633 http://dx.doi.org/10.3743/KOSIM.2016.33.2.033