Improving The Performance of Triple Generation Based on Distant Supervision By Using Semantic Similarity
  • Journal title : Journal of KIISE
  • Volume 43, Issue 6, 2016, pp. 653-661
  • Publisher : Korean Institute of Information Scientists and Engineers
  • DOI : 10.5626/JOK.2016.43.6.653
 Title & Authors
Yoon, Hee-Geun; Choi, Su Jeong; Park, Seong-Bae
 
 Abstract
Existing pattern-based triple generation systems built on distant supervision can be flawed by the distant supervision assumption itself, namely that every sentence mentioning an entity pair expresses the relation the knowledge base records for that pair. To mitigate errors caused by this overly strong assumption, previous studies have commonly used statistical information to measure the confidence of patterns. In this study, we propose a more accurate confidence measure based on the semantic similarity between patterns and properties. Word embedding, an unsupervised learning method, and WordNet-based similarity measures were adopted to learn the meanings of words and to measure semantic similarity. To resolve the language mismatch between patterns and properties, we adopted canonical correlation analysis (CCA) to align bilingual word embedding models, and a translation-based approach for the WordNet-based measure. In our experiments, the accuracy of triples filtered by the semantic similarity-based confidence measure was 16% higher than that of triples filtered by the statistics-based approach. These results suggest that the semantic similarity-based confidence measure is more effective than the statistics-based approach for generating high-quality triples.
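The abstract does not give the exact formula for the confidence measure. As a minimal sketch, assuming each pattern and each property is represented by the average of its word vectors, and that the bilingual vectors have already been aligned into one shared space (the role the abstract assigns to CCA), a pattern's confidence for a property can be read as the cosine between the two averages, and triples produced by low-confidence patterns are discarded. The function names, the toy vectors and candidates, and the 0.5 threshold are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Hypothetical toy vectors. In the setting described above, pattern words and
# property words come from different languages; here they are ASSUMED to have
# already been mapped into one shared space (e.g., by a CCA-based alignment).
toy_emb: Dict[str, Vector] = {
    "born": [0.9, 0.3, 0.0],
    "birth": [0.8, 0.4, 0.1],
    "place": [0.2, 0.9, 0.1],
    "visited": [0.0, 0.2, 0.9],
}

# Hypothetical candidates: (subject, property, object, pattern words, property words).
toy_candidates = [
    ("Alice", "birthPlace", "Paris", ["born"], ["birth", "place"]),
    ("Bob", "birthPlace", "Rome", ["visited"], ["birth", "place"]),
]


def mean_vector(words: Sequence[str], emb: Dict[str, Vector]) -> Vector:
    """Average the vectors of the words that have an embedding; skip the rest."""
    known = [emb[w] for w in words if w in emb]
    if not known:
        raise ValueError(f"no embedding for any of {list(words)}")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pattern_confidence(pattern_words, property_words, emb) -> float:
    """Assumed confidence of a pattern for a property: cosine of their mean vectors."""
    return cosine(mean_vector(pattern_words, emb), mean_vector(property_words, emb))


def filter_triples(candidates, emb, threshold: float = 0.5) -> List[Tuple[str, str, str]]:
    """Keep triples whose generating pattern is semantically close to the property."""
    kept = []
    for subj, prop, obj, pattern_words, property_words in candidates:
        if pattern_confidence(pattern_words, property_words, emb) >= threshold:
            kept.append((subj, prop, obj))
    return kept


if __name__ == "__main__":
    for *_, pattern_words, property_words in toy_candidates:
        score = pattern_confidence(pattern_words, property_words, toy_emb)
        print(pattern_words, round(score, 3))
    print(filter_triples(toy_candidates, toy_emb))
```

With these toy vectors the "born" pattern scores about 0.823 and the "visited" pattern about 0.289 against `birthPlace`, so only the Alice triple survives the threshold. Averaging word vectors is simply the most basic way to compose a multiword pattern; a WordNet-based similarity could be substituted for `cosine` without changing the filtering step, and raising the threshold trades recall for precision.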
 Keywords
triple generation; WordNet; word embedding; semantic similarity; canonical correlation analysis
 Language
Korean
 Cited by
1.
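"Triple Extraction for Knowledge Base Expansion," 윤희근, 박성배, Communications of the Korean Institute of Information Scientists and Engineers, Vol. 34, No. 8, pp. 17-24, 2016.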
2.
"Lexical Feature Selection for Classifying Risk Types of Social Issues," 오효정, 윤보현, 김찬영, KIPS Transactions on Software and Data Engineering, Vol. 5, No. 11, pp. 541-548, 2016.