Document Summarization using Semantic Feature and Hadoop

  • Kim, Chul-Won (Department of Computer Engineering, Honam University)
  • Received : 2013.11.04
  • Accepted : 2013.12.20
  • Published : 2014.09.30

Abstract

This paper proposes a new document summarization method that uses semantic features extracted by distributed parallel processing on Hadoop. Because the semantic features are obtained by non-negative matrix factorization (NMF), the method represents the inherent structure of the documents well. In addition, running on Hadoop lets it summarize big-data document collections. The experimental results show that the proposed method can summarize large document collections that a single computer cannot process.
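
The abstract does not give the factorization or the sentence-scoring rule, so the following is only a minimal single-machine sketch of the general idea, not the paper's Hadoop implementation. It assumes a term-by-sentence count matrix, the standard Lee-Seung multiplicative updates for NMF, and a hypothetical scoring rule that weights each sentence by the relative strength of the semantic features it loads on. The function names `nmf` and `summarize` and the parameters `k` and `top_n` are illustrative choices.

```python
import numpy as np


def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix A (terms x sentences) as W @ H.

    Uses Lee-Seung multiplicative updates for the Frobenius-norm objective.
    Columns of W play the role of semantic features; H holds the weight of
    each feature in each sentence.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        # eps in the denominators guards against division by zero.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


def summarize(sentences, k=2, top_n=2):
    """Pick top_n sentences using NMF feature weights (assumed scoring rule)."""
    # Build a term-by-sentence count matrix from whitespace tokens.
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in s.lower().split():
            A[index[w], j] += 1.0

    _, H = nmf(A, k)

    # Assumption: a feature matters in proportion to its total weight in H,
    # and a sentence scores the weighted sum of its feature loadings.
    feature_weight = H.sum(axis=1) / H.sum()
    scores = feature_weight @ H

    # Keep the selected sentences in their original document order.
    chosen = sorted(np.argsort(scores)[::-1][:top_n])
    return [sentences[j] for j in chosen]
```

Calling `summarize(list_of_sentences, k=2, top_n=2)` returns two of the input sentences in document order. In a distributed setting the matrix products inside `nmf` would be the part split across workers, but how the paper partitions that work is not stated in the abstract.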
