A Study on Information Resource Evaluation for Text Categorization

Chung, Eun-Kyung

doi:10.3743/KOSIM.2007.24.4.305

Journal of the Korean Society for Information Management (정보관리학회지)

Volume 24, Issue 4 / Pages 305-321 / 2007 / 1013-0799 (pISSN) / 2586-2073 (eISSN)

Korean Society for Information Management (한국정보관리학회)

문서범주화 효율성 제고를 위한 정보원 평가에 관한 연구

Chung, Eun-Kyung (정은경), Library and Information Science, College of Social Sciences, Ewha Womans University

Published: 2007.12.31

https://doi.org/10.3743/KOSIM.2007.24.4.305

Abstract

The purpose of this study is to examine whether the information resources that human indexers consult during the indexing process are effective for text categorization. More specifically, information resources drawn from bibliographic information as well as from the full text were explored using a data set of typical scientific journal articles. The experimental results showed that categorization with information resources such as citation, source title, and title was not significantly different from categorization with full text, whereas categorization with keyword was significantly different from full text. These findings indicate that the information resources referenced by human indexers are good candidates for text categorization in automatic subject term assignment.

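This study treats the document components that indexers consult during subject indexing as information resources for text categorization, and aims to examine their effect on categorization performance. Whereas previous text categorization research has concentrated on full text, this study evaluates the effect of document components as information resources in order to determine whether they can be used efficiently for text categorization. With a data set of typical science and technology journal and conference proceedings papers, the information resources can be divided into those centered on body text and those centered on document components. The body-text resources consist of the main body itself and the introduction and conclusion; the document-component resources are title, citation, source, abstract, and keyword. The experimental results show that the citation, source, and title resources do not differ significantly from the body-text resource, while the keyword resource does differ significantly from it. These results show that the document components consulted by indexers can be used efficiently in place of the body text for text categorization.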
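The comparison described in the abstract (one categorizer per information resource, each scored against the same held-out documents) can be sketched as follows. This is only a minimal illustration under stated assumptions: the abstract does not name the classifier, the evaluation measure, or the significance test, so the multinomial Naive Bayes model, the accuracy measure, the field names (`title`, `keyword`, `full_text`), and the toy records below are all hypothetical stand-ins, and no significance testing is performed.

```python
import math
from collections import Counter, defaultdict

# Toy records invented for illustration; they are NOT the study's data.
TRAIN = [
    {"title": "support vector machines for text", "keyword": "learning classification",
     "full_text": "we train support vector machines to classify text documents", "label": "ml"},
    {"title": "naive bayes text learning", "keyword": "learning probability",
     "full_text": "a probabilistic learning model assigns documents to categories", "label": "ml"},
    {"title": "subject cataloging in libraries", "keyword": "cataloging indexing",
     "full_text": "catalogers examine the title and abstract to determine the subject", "label": "lis"},
    {"title": "indexing practice and subject analysis", "keyword": "indexing subject",
     "full_text": "indexers consult document components during subject analysis", "label": "lis"},
]
TEST = [
    {"title": "learning to classify text", "keyword": "classification",
     "full_text": "machines classify documents with a learning model", "label": "ml"},
    {"title": "subject indexing by catalogers", "keyword": "cataloging",
     "full_text": "catalogers and indexers determine the subject of documents", "label": "lis"},
]

def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; no stemming or stop list)."""
    return text.lower().split()

def train_nb(samples):
    """Fit a multinomial Naive Bayes model from (tokens, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the label with the highest Laplace-smoothed log posterior."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # tokens unseen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy_by_field(train, test, fields):
    """Train and score one categorizer per information resource (field)."""
    results = {}
    for field in fields:
        model = train_nb([(tokenize(d[field]), d["label"]) for d in train])
        hits = sum(predict_nb(model, tokenize(d[field])) == d["label"] for d in test)
        results[field] = hits / len(test)
    return results

if __name__ == "__main__":
    for field, acc in accuracy_by_field(TRAIN, TEST, ["title", "keyword", "full_text"]).items():
        print(f"{field}: accuracy = {acc:.2f}")
```

With a realistic data set, the per-resource scores would then be compared against the full-text score with a statistical test, which is the step this sketch leaves out.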
