A Term Importance-based Approach to Identifying Core Citations in Computational Linguistics Articles

  • Kang, In-Su (Dept. of Computer Science & Engineering, Kyungsung University)
  • Received : 2017.05.31
  • Accepted : 2017.09.04
  • Published : 2017.09.30

Abstract

Core citation recognition is the task of identifying the influential ones among the prior articles that a scholarly article cites. Previous approaches have employed citing-text occurrence information, textual similarities between the citing and cited articles, etc. This study proposes a term-based approach to core citation recognition, which exploits the importance of individual terms appearing in in-text citations to calculate an influence strength for each cited article. Term importance is computed from several kinds of frequency information: term frequency (tf) in the in-text citations, tf in the citing article, inverse sentence frequency in the citing article, and inverse document frequency in a collection of articles. Experiments on a previously published test set of computational linguistics articles show that the term-based approach performs comparably to the previous approaches. The proposed technique could easily be extended by employing other term units, such as n-grams and phrases, or by using new term-importance formulae.
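As a rough illustration of the idea described in the abstract, the following Python sketch scores each cited article by summing a weight over the terms in its in-text citations. The abstract does not give the actual formulae, so everything here is an assumption made for illustration: the function names (`inverse_frequency`, `citation_strength`), the smoothed logarithmic inverse frequencies, the log-damped article tf, and the choice to multiply the four signals together.

```python
import math
from collections import Counter


def inverse_frequency(n_units, n_units_with_term):
    """Smoothed inverse frequency, log((1 + N) / (1 + n)) + 1 (an assumed variant)."""
    return math.log((1 + n_units) / (1 + n_units_with_term)) + 1.0


def citation_strength(citing_sentences, citation_contexts, collection_df, n_docs):
    """Return a hypothetical influence-strength score per cited article.

    citing_sentences:  list of token lists, one per sentence of the citing article
    citation_contexts: dict, cited-article id -> tokens of its in-text citations
    collection_df:     dict, term -> number of collection articles containing it
    n_docs:            number of articles in the collection
    """
    # tf of each term in the whole citing article
    article_tf = Counter(t for sent in citing_sentences for t in sent)
    # number of sentences of the citing article that contain each term
    sentence_df = Counter(t for sent in citing_sentences for t in set(sent))
    n_sentences = len(citing_sentences)

    scores = {}
    for cited_id, tokens in citation_contexts.items():
        context_tf = Counter(tokens)  # tf within this article's in-text citations
        score = 0.0
        for term, tf_ctx in context_tf.items():
            isf = inverse_frequency(n_sentences, sentence_df.get(term, 0))
            idf = inverse_frequency(n_docs, collection_df.get(term, 0))
            # Assumed combination: product of the four frequency signals
            score += tf_ctx * math.log(1 + article_tf.get(term, 0)) * isf * idf
        scores[cited_id] = score
    return scores
```

Under these assumptions, a cited article whose citation context contains terms that recur throughout the citing article and are rare in the collection receives a higher score. Cited articles could then be ranked by score, or the score could be passed as a feature to a classifier.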
