A simulation study of rater agreement measures

  • Han, Kyung-Do (Department of Biostatistics, The Catholic University of Korea) ;
  • Park, Yong-Gyu (Department of Biostatistics, The Catholic University of Korea)
  • Received : 2011.11.08
  • Accepted : 2011.11.29
  • Published : 2012.01.31

Abstract

Many statistics, such as Cohen's (1960) ${\kappa}$, Scott's (1955) ${\pi}$, and Park and Park's (2007) H, have been proposed as measures of agreement representing inter-rater reliability. Using simulation, this study compared the bias, standard error (SE), mean squared error (MSE), and coefficient of variation (CV) of agreement measures for nominal and ordinal categories under balanced marginal distributions, and of measures for nominal categories under the two paradoxical situations that arise with unbalanced marginal distributions. In all cases, Gwet's (2001) $AC_1$ and H had the smallest SE and CV.
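As a rough illustration of how the measures compared in the study differ only in their chance-agreement terms, the sketch below (not the authors' code; the table is hypothetical) computes Cohen's ${\kappa}$, Scott's ${\pi}$, and Gwet's (2001) $AC_1$ from a square rater-by-rater count table. Park and Park's (2007) H is omitted because its definition is not reproduced here.

```python
import numpy as np

def agreement_measures(table):
    """Chance-corrected agreement measures for a square rater-by-rater
    count table: Cohen's kappa, Scott's pi, and Gwet's (2001) AC1.
    Park and Park's (2007) H is not reproduced here."""
    t = np.asarray(table, dtype=float)
    n = t.sum()
    k = t.shape[0]                          # number of categories
    po = np.trace(t) / n                    # observed proportion of agreement
    row, col = t.sum(axis=1) / n, t.sum(axis=0) / n

    pe_kappa = np.sum(row * col)            # Cohen: product of the raters' marginals
    pi_bar = (row + col) / 2                # average marginal per category
    pe_pi = np.sum(pi_bar ** 2)             # Scott: squared average marginals
    pe_ac1 = np.sum(pi_bar * (1 - pi_bar)) / (k - 1)   # Gwet's AC1 chance term

    cc = lambda pe: (po - pe) / (1 - pe)    # common chance-correction form
    return {"kappa": cc(pe_kappa), "pi": cc(pe_pi), "AC1": cc(pe_ac1)}

# Hypothetical 2x2 table with high raw agreement but unbalanced marginals,
# the kind of setting in which the kappa paradoxes arise.
print(agreement_measures([[80, 10], [5, 5]]))
```

For this hypothetical table the sketch gives ${\kappa} \approx 0.32$, ${\pi} \approx 0.31$, and $AC_1 \approx 0.81$, showing how ${\kappa}$ and ${\pi}$ can be low despite 85% raw agreement when the marginals are unbalanced.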


References

  1. Kwon, N. Y., Kim, J. G. and Park, Y. G. (2009). New paradoxes of the weighted agreement measure $H_w$ and ${\kappa}$. The Korean Journal of Applied Statistics, 22, 1073-1085. (in Korean)
  2. Kim, J. G., Park, M. H. and Park, Y. G. (2009). The measure of agreement H for $m{\times}m$ contingency tables. Communications of the Korean Statistical Society, 16, 753-762. (in Korean)
  3. Park, M. H. and Park, Y. G. (2007). A new measure of agreement to solve the two paradoxes of Cohen's kappa. The Korean Journal of Applied Statistics, 20, 117-132. (in Korean)
  4. Agresti, A. (2002). Categorical data analysis, Wiley, New York.
  5. Cicchetti, D. V. and Allison, T. (1971). A new procedure for assessing reliability of scoring EEG sleep recordings. American Journal of EEG Technology, 11, 101-109.
  6. Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46. https://doi.org/10.1177/001316446002000104
  7. Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213-220. https://doi.org/10.1037/h0026256
  8. Feinstein, A. R. and Cicchetti, D. V. (1990). High agreement but low kappa: 1. The problems of two paradoxes. Journal of Clinical Epidemiology, 43, 543-549. https://doi.org/10.1016/0895-4356(90)90158-L
  9. Fleiss, J. L. and Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33, 613-619. https://doi.org/10.1177/001316447303300309
  10. Gwet, K. (2001). Handbook of inter-rater reliability, STATAXIS Publishing company, Gaithersburg.
  11. Holley, J. W. and Guilford, J. P. (1964). A note on the G index of agreement. Educational and Psychological Measurement, 24, 749-753. https://doi.org/10.1177/001316446402400402
  12. Janson, S. and Vegelius, J. (1979). On generalizations of the G index and the PHI coefficient to nominal scales. Multivariate Behavioral Research, 14, 255-269. https://doi.org/10.1207/s15327906mbr1402_9
  13. Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly, 19, 321-325. https://doi.org/10.1086/266577
