A simulation study of rater agreement measures

  • Han, Kyung-Do (Department of Biostatistics, The Catholic University of Korea) ;
  • Park, Yong-Gyu (Department of Biostatistics, The Catholic University of Korea)
  • Received : 2011.11.08
  • Accepted : 2011.11.29
  • Published : 2012.01.31

Abstract

Many statistics, such as Cohen's (1960) ${\kappa}$, Scott's (1955) ${\pi}$, and Park and Park's (2007) H, have been proposed as measures of agreement representing inter-rater reliability. Using simulation, this study compared the bias, standard error (SE), mean squared error (MSE), and coefficient of variation (CV) of agreement measures for nominal and ordinal categories under balanced marginal distributions, and of measures for nominal categories under the two paradoxical situations that arise with unbalanced marginal distributions. In all cases, Gwet's (2001) $AC_1$ and H had the smallest SE and CV.
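As a rough illustration of how the measures compared in the study differ only in their chance-agreement terms, the sketch below (not the authors' code; the table is hypothetical) computes Cohen's ${\kappa}$, Scott's ${\pi}$, and Gwet's (2001) $AC_1$ from a square rater-by-rater count table. Park and Park's (2007) H is omitted because its definition is not reproduced here.

```python
import numpy as np

def agreement_measures(table):
    """Chance-corrected agreement measures for a square rater-by-rater
    count table: Cohen's kappa, Scott's pi, and Gwet's (2001) AC1.
    Park and Park's (2007) H is not reproduced here."""
    t = np.asarray(table, dtype=float)
    n = t.sum()
    k = t.shape[0]                          # number of categories
    po = np.trace(t) / n                    # observed proportion of agreement
    row, col = t.sum(axis=1) / n, t.sum(axis=0) / n

    pe_kappa = np.sum(row * col)            # Cohen: product of the raters' marginals
    pi_bar = (row + col) / 2                # average marginal per category
    pe_pi = np.sum(pi_bar ** 2)             # Scott: squared average marginals
    pe_ac1 = np.sum(pi_bar * (1 - pi_bar)) / (k - 1)   # Gwet's AC1 chance term

    cc = lambda pe: (po - pe) / (1 - pe)    # common chance-correction form
    return {"kappa": cc(pe_kappa), "pi": cc(pe_pi), "AC1": cc(pe_ac1)}

# Hypothetical 2x2 table with high raw agreement but unbalanced marginals,
# the kind of setting in which the kappa paradoxes arise.
print(agreement_measures([[80, 10], [5, 5]]))
```

For this hypothetical table the sketch gives ${\kappa} \approx 0.32$, ${\pi} \approx 0.31$, and $AC_1 \approx 0.81$, showing how ${\kappa}$ and ${\pi}$ can be low despite 85% raw agreement when the marginals are unbalanced.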


References

  1. Kwon, N. Y., Kim, J. G. and Park, Y. G. (2009). New paradoxes of the weighted agreement measure $H_w$ and ${\kappa}$. The Korean Journal of Applied Statistics, 22, 1073-1085. (in Korean)
  2. Kim, J. G., Park, M. H. and Park, Y. G. (2009). The measure of agreement H for $m{\times}m$ contingency tables. Communications of the Korean Statistical Society, 16, 753-762. (in Korean)
  3. Park, M. H. and Park, Y. G. (2007). A new measure of agreement to solve the two paradoxes of Cohen's kappa. The Korean Journal of Applied Statistics, 20, 117-132. (in Korean)
  4. Agresti, A. (2002). Categorical data analysis, Wiley, New York.
  5. Cicchetti, D. V. and Allison, T. (1971). A new procedure for assessing reliability of scoring EEG sleep recordings. American Journal of EEG Technology, 11, 101-109.
  6. Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46. https://doi.org/10.1177/001316446002000104
  7. Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213-220. https://doi.org/10.1037/h0026256
  8. Feinstein, A. R. and Cicchetti, D. V. (1990). High agreement but low kappa: 1. The problems of two paradoxes. Journal of Clinical Epidemiology, 43, 543-549. https://doi.org/10.1016/0895-4356(90)90158-L
  9. Fleiss, J. L. and Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33, 613-619. https://doi.org/10.1177/001316447303300309
  10. Gwet, K. (2001). Handbook of inter-rater reliability, STATAXIS Publishing company, Gaithersburg.
  11. Holley, J. W. and Guilford, J. P. (1964). A note on the G index of agreement. Educational and Psychological Measurement, 24, 749-753. https://doi.org/10.1177/001316446402400402
  12. Janson, S. and Vegelius, J. (1979). On generalizations of the G index and the PHI coefficient to nominal scales. Multivariate Behavioral Research, 14, 255-269. https://doi.org/10.1207/s15327906mbr1402_9
  13. Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly, 19, 321-325. https://doi.org/10.1086/266577
