Two-Phase Clustering Method Considering Mobile App Trends


  • Heo, Jeong-Man (Dept. of Game Design & Development, SangMyung University) ;
  • Park, So-Young (Dept. of Game Design & Development, SangMyung University)
  • Received : 2014.11.13
  • Accepted : 2015.03.16
  • Published : 2015.04.30

Abstract

In this paper, we propose a method for clustering mobile apps using word clusters. Because mobile app trends change quickly, the proposed method does not rely on a predefined category system; instead, it applies a clustering algorithm to the mobile app set to group semantically similar apps. To alleviate the data sparseness problem caused by short mobile app description texts, the proposed method uses, for each word, not only the unigram but also the bigram, the trigram, and the cluster of the word. To cluster mobile apps accurately overall, the proposed method uses the word clusters to keep mobile app clusters from becoming exceedingly small or large. Experimental results show that using the word clusters improves overall accuracy by 22.18 percentage points, from 57.48% to 79.66%.
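
The feature set named in the abstract (unigram, bigram, trigram, and word cluster) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `WORD_CLUSTER` mapping, the feature prefixes, the whitespace tokenizer, and the use of cosine similarity to compare two descriptions. The sketch only shows why cluster features may help with short texts: two descriptions that share no words can still share word clusters.

```python
from collections import Counter
from math import sqrt

# Hypothetical word-to-cluster mapping (assumption: in practice it would be
# produced by a separate word-clustering step over the vocabulary).
WORD_CLUSTER = {
    "photo": "C1", "picture": "C1", "camera": "C1",
    "edit": "C2", "retouch": "C2",
    "taxi": "C3", "bus": "C3",
}

def extract_features(description, use_clusters=True):
    """Return a bag of unigram, bigram, trigram, and word-cluster features."""
    tokens = description.lower().split()  # naive whitespace tokenizer
    feats = Counter()
    for n in (1, 2, 3):
        for i in range(len(tokens) - n + 1):
            feats[f"{n}g:" + " ".join(tokens[i:i + n])] += 1
    if use_clusters:
        for tok in tokens:
            cluster = WORD_CLUSTER.get(tok)
            if cluster is not None:
                feats["wc:" + cluster] += 1
    return feats

def cosine(a, b):
    """Cosine similarity between two sparse feature bags."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Two short descriptions with no words in common.
app_a = "retouch picture"
app_b = "edit photo"

sim_plain = cosine(extract_features(app_a, False), extract_features(app_b, False))
sim_clustered = cosine(extract_features(app_a), extract_features(app_b))
```

With n-gram features alone the two descriptions have zero similarity, while adding the cluster features gives them a positive similarity. A clustering algorithm applied to such vectors would therefore have more evidence for grouping the two apps together.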
