Word Cluster-based Mobile Application Categorization

  • Heo, Jeongman (Dept. of Game Design & Development, SangMyung University)
  • Park, So-Young (Dept. of Game Design & Development, SangMyung University)
  • Received : 2013.12.19
  • Accepted : 2014.02.20
  • Published : 2014.03.31

Abstract

In this paper, we propose a mobile application categorization method that uses word cluster information. Because mobile application descriptions can be short, the proposed method uses as categorization features not only the words in the description but also the word cluster seed of each word. Because mobile application categories are fine-grained, the method generates word clusters by applying the K-means clustering algorithm to per-category word occurrence frequencies. Since a description can include passages unrelated to its category, such as installation specifications, the method selects and uses only those word clusters that are useful for categorization. Experiments show that using word cluster information improves categorization recall by 5.65%.
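The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the function names, the farthest-first centroid initialization, and the use of a cluster ID in place of the paper's word cluster seed are all assumptions made for illustration, and the step that selects only the useful clusters is omitted because the abstract does not say how it is done.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data: (category, tokenized description).
DOCS = [
    ("game", ["puzzle", "level", "score", "play"]),
    ("game", ["play", "score", "arcade", "level"]),
    ("finance", ["bank", "account", "transfer", "budget"]),
    ("finance", ["budget", "account", "stock", "bank"]),
]


def word_category_vectors(docs):
    """Map each word to its relative occurrence frequency in every category."""
    categories = sorted({cat for cat, _ in docs})
    counts = defaultdict(Counter)
    for cat, tokens in docs:
        for tok in tokens:
            counts[tok][cat] += 1
    vectors = {}
    for word, per_cat in counts.items():
        total = sum(per_cat.values())
        vectors[word] = [per_cat[c] / total for c in categories]
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Lloyd's K-means over word vectors; returns a word -> cluster id dict."""
    words = sorted(vectors)
    # Assumed initialization: deterministic farthest-first traversal.
    centroids = [list(vectors[words[0]])]
    while len(centroids) < k:
        far = max(words, key=lambda w: min(sq_dist(vectors[w], c) for c in centroids))
        centroids.append(list(vectors[far]))
    assignment = {}
    for _ in range(iterations):
        # Assignment step: attach each word to its nearest centroid.
        for w in words:
            dists = [sq_dist(vectors[w], c) for c in centroids]
            assignment[w] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [vectors[w] for w in words if assignment[w] == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def features(tokens, assignment):
    """Words of a description plus one cluster feature per known word."""
    feats = list(tokens)
    feats += [f"CLUSTER_{assignment[t]}" for t in tokens if t in assignment]
    return feats


VECTORS = word_category_vectors(DOCS)
CLUSTERS = kmeans(VECTORS, k=2)
print(features(["bank", "puzzle", "widget"], CLUSTERS))
```

A word unseen in training, such as `widget` here, contributes only its surface form. A seen word also contributes its cluster feature, which lets a short description share features with other descriptions that use different but similarly distributed words.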
