TAKES: Two-step Approach for Knowledge Extraction in Biomedical Digital Libraries
 Title & Authors
Song, Min
 Abstract
This paper proposes a novel knowledge extraction system, TAKES (Two-step Approach for Knowledge Extraction System), which integrates techniques from Information Retrieval (IR), Information Extraction (IE), and Natural Language Processing (NLP). TAKES uses a keyphrase extraction-based query expansion technique to collect promising documents and a Conditional Random Field (CRF)-based machine learning technique to extract important biological entities and relations. It consists of two major components: DocSpotter, which queries and retrieves promising documents for extraction, and FCRF, a CRF-based entity extraction component. TAKES is applied to biological knowledge extraction, specifically to retrieving promising documents that contain Protein-Protein Interactions (PPI) and extracting PPI pairs. The paper investigates the research problems that arise in building a knowledge extraction system and reports a series of experiments that test the author's hypotheses. The findings are as follows. First, on three different test collections used to measure the performance of the query expansion technique, DocSpotter is robust and highly accurate compared to Okapi BM25 and SLIPPER. Second, the relation extraction algorithm, FCRF, is highly accurate in terms of F-Measure compared to four other competitive extraction algorithms: Support Vector Machine, Maximum Entropy, Single POS HMM, and Rapier.
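The abstract compares extraction algorithms by F-Measure without stating how the score is computed. As a rough illustration only, the sketch below scores a set of extracted PPI pairs against a gold-standard set. It assumes pairs are unordered (A-B equals B-A) and matched exactly by name; the function name `f_measure` and the protein names are hypothetical examples, not taken from the paper or its test collections.

```python
from typing import FrozenSet, Iterable, Set, Tuple


def _normalize(pairs: Iterable[Tuple[str, str]]) -> Set[FrozenSet[str]]:
    """Treat each interaction as an unordered pair (an assumption of this sketch)."""
    return {frozenset(pair) for pair in pairs}


def f_measure(
    predicted: Iterable[Tuple[str, str]],
    gold: Iterable[Tuple[str, str]],
    beta: float = 1.0,
) -> float:
    """Return the F-beta score of predicted pairs against gold pairs."""
    pred = _normalize(predicted)
    ref = _normalize(gold)

    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0

    if precision + recall == 0.0:
        return 0.0

    beta_sq = beta * beta
    return (1.0 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Illustrative data only: two of four predictions match the three gold pairs.
gold_pairs = [("RAD51", "BRCA2"), ("TP53", "MDM2"), ("GRB2", "SOS1")]
predicted_pairs = [
    ("BRCA2", "RAD51"),  # same pair as in gold, reversed order
    ("TP53", "MDM2"),
    ("TP53", "EGFR"),
    ("AKT1", "MTOR"),
]

score = f_measure(predicted_pairs, gold_pairs)
print(f"F1 = {score:.4f}")  # precision 2/4, recall 2/3, F1 = 4/7
```

With `beta=1.0` the score is the harmonic mean of precision and recall, so an extractor cannot score well by returning very few high-confidence pairs or by returning every candidate pair.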
 Keywords
Semantic Query Expansion; Information Extraction; Information Retrieval; Text Mining
 Language
English