An Experimental Study on the Effect of Domain Expertise on the Consistency of Relevance Judgements

  • Scholten, Stacey (Department of Library and Information Science, Yonsei University)
  • Moon, Sung-Been (Department of Library and Information Science, Yonsei University)
  • Received : 2021.08.16
  • Accepted : 2021.09.18
  • Published : 2021.09.30

Abstract

An online experiment was conducted to test the subject-knowledge view of relevance theory and to seek evidence of a conceptual basis for relevance. Six experts in Library and Information Science (LIS), nine Master's students of LIS, and twelve non-experts judged the relevance of 14 abstracts within the LIS domain and 14 abstracts outside it. Consistency among the judges was calculated by joint-probability agreement (PA) and by intraclass correlation coefficients (ICC) computed in IBM SPSS. Under PA, non-experts showed higher consensus regardless of the task or the division of groups. Under ICC, however, the Master's candidates, though not the experts, reached a higher level of consensus than the non-experts on the LIS task, and agreement on the non-LIS task was only poor to moderate for all groups. Only when the participants were analyzed as two groups (experts, including the Master's candidates, versus non-experts) did the expected trend of higher consistency among experts on the LIS task appear.
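The two consistency measures behave differently, which helps explain why they rank the groups differently. As a minimal illustrative sketch only (not the authors' SPSS procedure), assuming an items × raters matrix of ordinal relevance judgements and the Shrout-Fleiss ICC(2,k) form (two-way random effects, average measures; the abstract does not state which ICC variant was used), the two statistics could be computed as follows:

```python
import numpy as np

def percent_agreement(ratings: np.ndarray) -> float:
    """Joint-probability agreement (PA): for each item, the fraction of
    rater pairs assigning the same category, averaged over all items.
    `ratings` is an (n_items, n_raters) array of categorical judgements."""
    n_items, n_raters = ratings.shape
    n_pairs = n_raters * (n_raters - 1) / 2
    per_item = []
    for row in ratings:
        _, counts = np.unique(row, return_counts=True)
        agreeing = sum(c * (c - 1) / 2 for c in counts)  # agreeing pairs
        per_item.append(agreeing / n_pairs)
    return float(np.mean(per_item))

def icc2k(ratings: np.ndarray) -> float:
    """ICC(2,k) per Shrout & Fleiss: two-way random effects, absolute
    agreement, average of k raters, from the ANOVA mean squares."""
    n, k = ratings.shape
    grand = ratings.mean()
    ms_rows = k * ((ratings.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    ms_cols = n * ((ratings.mean(axis=0) - grand) ** 2).sum() / (k - 1)
    ss_err = (((ratings - grand) ** 2).sum()
              - (n - 1) * ms_rows - (k - 1) * ms_cols)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)

# Hypothetical data: 14 abstracts judged by 6 raters on a 4-point scale.
rng = np.random.default_rng(0)
judgements = rng.integers(1, 5, size=(14, 6)).astype(float)
print(f"PA = {percent_agreement(judgements):.3f}")
print(f"ICC(2,k) = {icc2k(judgements):.3f}")
```

Note that PA counts any two identical labels as agreement regardless of scale distance, while ICC treats the judgements as interval-scaled and is sensitive to between-rater variance, so the two measures can legitimately order the groups differently, as the findings above show.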
