A Semi-automatic Construction method of a Named Entity Dictionary Based on Wikipedia
  • Journal title : Journal of KIISE
  • Volume 42, Issue 11,  2015, pp.1397-1403
  • Publisher : Korean Institute of Information Scientists and Engineers
  • DOI : 10.5626/JOK.2015.42.11.1397
 Title & Authors
A Semi-automatic Construction method of a Named Entity Dictionary Based on Wikipedia
Song, Yeongkil; Jeong, Seokwon; Kim, Harksoo
 
 Abstract
A named entity (NE) dictionary is an important resource for the performance of NE recognition. However, it is not easy to construct an NE dictionary manually, since human annotation is time-consuming and labor-intensive. To save construction time and reduce human labor, we propose a semi-automatic system for the construction of an NE dictionary. The proposed system constructs one pseudo-document of Wiki-categories per NE class by using an active learning technique. Then, it calculates similarities between Wiki entries and the pseudo-documents using the BM25 model, a well-known information retrieval model. Finally, it classifies each Wiki entry into an NE class based on these similarities. In experiments with three different types of NE class sets, the proposed system showed high performance (macro-average F1-score of 0.9028 and micro-average F1-score of 0.9554).
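
The similarity and classification steps described above can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: it assumes each Wiki entry is represented by its list of category names (used as the query), that each NE class already has a pseudo-document of category names, and that the common default parameters k1 = 1.2 and b = 0.75 are acceptable. The class names, category names, and the functions `bm25_score` and `classify` are hypothetical, and the active learning step that builds the pseudo-documents is omitted.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, doc_freq, n_docs, avg_len, k1=1.2, b=0.75):
    """Okapi BM25 score of one pseudo-document for one entry's categories.

    k1 and b are common defaults, assumed here; the abstract does not state them.
    """
    tf = Counter(doc_terms)
    score = 0.0
    for term in set(query_terms):
        if term not in tf:
            continue  # a term absent from the pseudo-document contributes nothing
        df = doc_freq[term]
        # Smoothed IDF variant that stays positive for any document frequency.
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        # Term-frequency saturation with document-length normalization.
        norm = tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score


def classify(entry_categories, pseudo_docs):
    """Return the NE class whose pseudo-document is most similar, plus all scores."""
    n_docs = len(pseudo_docs)
    avg_len = sum(len(d) for d in pseudo_docs.values()) / n_docs
    # Number of pseudo-documents in which each category term appears.
    doc_freq = Counter(t for d in pseudo_docs.values() for t in set(d))
    scores = {
        cls: bm25_score(entry_categories, d, doc_freq, n_docs, avg_len)
        for cls, d in pseudo_docs.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical pseudo-documents: one bag of Wiki-category names per NE class.
pseudo_docs = {
    "PERSON": ["births", "living_people", "singers", "actors"],
    "LOCATION": ["cities", "capitals", "populated_places", "countries"],
    "ORGANIZATION": ["companies", "universities", "political_parties"],
}

# Hypothetical Wiki entry described by its categories.
label, scores = classify(["singers", "living_people"], pseudo_docs)
print(label, scores)
```

In this toy setup the entry shares categories only with the PERSON pseudo-document, so that class receives the only nonzero score. A real system would also need a rule for entries whose best score is zero or too low to trust.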
 Keywords
named entity dictionary construction; Wikipedia; information retrieval method; active learning
 Language
Korean