Identification of Chinese Event Types Based on Local Feature Selection and Explicit Positive & Negative Feature Combination

  • Tan, Hongye (Department of Computer Science and Technology, Harbin Institute of Technology, School of Computer and Information Technology, Shanxi University) ;
  • Zhao, Tiejun (Department of Computer Science and Technology, Harbin Institute of Technology) ;
  • Wang, Haochang (Department of Computer Science and Technology, Harbin Institute of Technology) ;
  • Hong, Wan-Pyo (IT Division of Hansei University)
  • Published : 2007.09.30

Abstract

This paper proposes an approach to identifying Chinese event types that combines a good feature selection policy with a Maximum Entropy (ME) model. The approach not only alleviates the problem that classifiers perform poorly on small and difficult types, but also improves overall performance. Experiments on the ACE2005 corpus show satisfactory performance, with a macro-average F-measure of 83.5%. The main characteristics and ideas of the approach are: (1) An optimal feature set is built for each type by local feature selection, which safeguards the performance of each individual type. (2) Positive and negative features are explicitly distinguished and combined using one-sided metrics, which exploits the advantages of both kinds of features. (3) Wrapper methods are used to search for new features and to evaluate candidate feature subsets in order to obtain the optimal feature subset.
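
To make ideas (1) and (2) concrete, the following is a minimal sketch of per-type (local) feature selection with a one-sided metric, together with the macro-average F-measure used for evaluation. The abstract does not say which one-sided metric the authors use; the correlation coefficient (the signed square root of the chi-square statistic) is assumed here purely for illustration. The function names, the toy samples, and the cut-offs `n_pos` and `n_neg` are hypothetical. The ME classifier and the wrapper search of idea (3) are not covered.

```python
import math


def correlation_coefficient(a, b, c, d):
    """One-sided score from a 2x2 contingency table (assumed metric).

    a: samples of the target type that contain the feature
    b: samples of other types that contain the feature
    c: samples of the target type that lack the feature
    d: samples of other types that lack the feature
    A positive score marks a positive feature, a negative score a negative one.
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return math.sqrt(n) * (a * d - b * c) / math.sqrt(denom)


def select_local_features(samples, target_type, n_pos, n_neg):
    """Pick the top positive and top negative features for one event type.

    samples: list of (set_of_features, type_label) pairs.
    Returns (positive_features, negative_features), strongest first.
    """
    vocab = set()
    for feats, _ in samples:
        vocab |= feats
    scores = {}
    for f in vocab:
        a = sum(1 for feats, t in samples if f in feats and t == target_type)
        b = sum(1 for feats, t in samples if f in feats and t != target_type)
        c = sum(1 for feats, t in samples if f not in feats and t == target_type)
        d = sum(1 for feats, t in samples if f not in feats and t != target_type)
        scores[f] = correlation_coefficient(a, b, c, d)
    # Rank each side separately so the two kinds of features are combined explicitly.
    pos = sorted((f for f in vocab if scores[f] > 0), key=lambda f: (-scores[f], f))
    neg = sorted((f for f in vocab if scores[f] < 0), key=lambda f: (scores[f], f))
    return pos[:n_pos], neg[:n_neg]


def macro_f1(per_type_counts):
    """Macro-average F-measure: the unweighted mean of per-type F1 scores.

    per_type_counts: {type_label: (true_pos, false_pos, false_neg)}
    """
    f_scores = []
    for tp, fp, fn in per_type_counts.values():
        denom = 2 * tp + fp + fn
        f_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f_scores) / len(f_scores)


# Toy data (hypothetical): each sample is a bag of trigger/context words.
toy_samples = [
    ({"attack", "army"}, "Conflict"),
    ({"attack", "bomb"}, "Conflict"),
    ({"marry", "wedding"}, "Life"),
    ({"born", "wedding"}, "Life"),
]
positive, negative = select_local_features(toy_samples, "Conflict", n_pos=1, n_neg=1)
```

Because the selection is run once per event type, each type gets its own feature set, so a small type is not crowded out by features that mainly serve the large types. The macro average gives every type the same weight in the final score.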
