Construction of Test Collection for Automatically Extracting Technological Knowledge

기술 지식 자동 추출을 위한 테스트 컬렉션 구축

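  • Sung-Ho Shin (Software Research Lab, Korea Institute of Science and Technology Information) ;
  • Yun-Soo Choi (Software Research Lab, Korea Institute of Science and Technology Information) ;
  • Sa-Kwang Song (Software Research Lab, Korea Institute of Science and Technology Information) ;
  • Sung-Pil Choi (Software Research Lab, Korea Institute of Science and Technology Information) ;
  • Hanmin Jung (Software Research Lab, Korea Institute of Science and Technology Information)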
  • Received : 2012.06.18
  • Accepted : 2012.07.04
  • Published : 2012.07.28


Over the last decade, the amount of information has grown rapidly owing to advances in the internet and computing technology, the spread of mobile devices and sensors, and social networks such as Facebook and Twitter. People who want to obtain important knowledge from databases are often overwhelmed by their size. Many studies have therefore addressed the automatic extraction of meaningful knowledge from large databases. Automatic knowledge extraction has become highly significant in the information technology field, but many challenges remain. To improve the effectiveness and efficiency of knowledge extraction systems, a test collection is essential. In this research, we introduce a test collection for automatic knowledge extraction. We name the test collection KEEC/KREC (KISTI Entity Extraction Collection/KISTI Relation Extraction Collection) and present the process and guidelines used to build it, as well as its features. Its main feature is that it is tagged by experts to guarantee the quality of the collection. The experts read documents and tag entities and the relations between entities using a tagging tool. KEEC/KREC is being used in research to evaluate system performance and will continue to contribute to future research.
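
As an illustration of how an expert-tagged collection of this kind is typically used to evaluate an extraction system, the following minimal sketch scores predicted entities against gold annotations with precision, recall, and F1. It is not the KEEC/KREC format or the authors' evaluation code: the `Entity` fields, the labels `TECH` and `ORG`, and the sample spans are all hypothetical, and exact-match scoring is an assumed choice.

```python
from typing import NamedTuple, Set, Tuple


class Entity(NamedTuple):
    """Hypothetical annotation record: a labeled character span in a document."""
    doc_id: str
    start: int
    end: int
    label: str


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Score predictions against gold annotations using exact matching.

    A prediction counts as correct only if document, span, and label all match.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical gold annotations, standing in for expert tags.
gold_entities = {
    Entity("d1", 0, 5, "TECH"),
    Entity("d1", 10, 18, "ORG"),
    Entity("d2", 3, 9, "TECH"),
}

# Hypothetical system output: one exact match, one wrong label, one spurious span.
predicted_entities = {
    Entity("d1", 0, 5, "TECH"),
    Entity("d2", 3, 9, "ORG"),
    Entity("d2", 20, 25, "TECH"),
}

precision, recall, f1 = precision_recall_f1(gold_entities, predicted_entities)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Relation annotations could be scored the same way by treating each (entity, relation type, entity) triple as the set element; partial-match scoring would need a different overlap test.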


Technological Knowledge;Test Collection;Named Entity Extraction;Relation Extraction

