 Title & Authors
A Study on Comparison of Generalized Kappa Statistics in Agreement Analysis
Kim, Min-Seon; Song, Ki-Jun; Nam, Chung-Mo; Jung, In-Kyung;
 Abstract
Agreement analysis is conducted to assess reliability among rating results obtained repeatedly on the same subjects by one or more raters. The kappa statistic is commonly used when the rating scale is categorical. The simple and weighted kappa statistics measure the degree of agreement between two raters, while the generalized kappa statistics measure the degree of agreement among more than two raters. In this paper, we compare the performance of four generalized kappa statistics proposed by Fleiss (1971), Conger (1980), Randolph (2005), and Gwet (2008a). We also examine how sensitive each of the four generalized kappa statistics is to the marginal probability distribution, depending on whether marginal balance and/or homogeneity hold. The performance of the four methods is compared in terms of relative bias and coverage rate through simulation studies under various scenarios with different numbers of raters, subjects, and categories. A real data example is also presented to illustrate the four methods.
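The four generalized kappa estimators compared in the paper all take the form kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreeing rater pairs; they differ only in how the chance-agreement term p_e is estimated. As an illustration only (not the authors' simulation code), the following Python sketch computes two of the four estimators, Fleiss' (1971) fixed-marginal kappa and Randolph's (2005) free-marginal kappa, from a subject-by-category count matrix; the function names and the example ratings are hypothetical.

import numpy as np

def observed_agreement(counts):
    """Proportion of agreeing rater pairs, averaged over subjects.

    counts : (N, k) array; counts[i, j] = number of raters assigning
    subject i to category j. Every row must sum to the same number of
    raters n (n >= 2).
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1)[0]                     # raters per subject
    return ((counts * (counts - 1)).sum(axis=1) / (n * (n - 1))).mean()

def fleiss_kappa(counts):
    """Fleiss (1971): chance agreement from the pooled (fixed) marginals."""
    counts = np.asarray(counts, dtype=float)
    p_o = observed_agreement(counts)
    p_j = counts.sum(axis=0) / counts.sum()       # marginal category proportions
    p_e = (p_j ** 2).sum()
    return (p_o - p_e) / (1 - p_e)

def randolph_kappa(counts):
    """Randolph (2005) free-marginal kappa: chance agreement is 1/k."""
    counts = np.asarray(counts, dtype=float)
    p_o = observed_agreement(counts)
    p_e = 1.0 / counts.shape[1]
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: 5 subjects, 4 raters, 3 categories.
ratings = np.array([[4, 0, 0],
                    [3, 1, 0],
                    [0, 4, 0],
                    [1, 1, 2],
                    [0, 0, 4]])
print("Fleiss kappa:  ", round(fleiss_kappa(ratings), 3))
print("Randolph kappa:", round(randolph_kappa(ratings), 3))

Because Fleiss' p_e is built from the pooled category marginals while Randolph's is simply 1/k, the two estimates diverge as the marginal distribution becomes unbalanced, which is the kind of sensitivity to the marginal probability distribution examined in the simulation study.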
 Keywords
Agreement; generalized kappa; marginal probability distribution
 Language
Korean
 Cited by
1.
Measurement of Inter-Rater Reliability in Systematic Review, Hanyang Medical Reviews, 2015, 35, 1, 44.
2.
Development of a scale to measure diabetes self-management behaviors among older Koreans with type 2 diabetes, based on the seven domains identified by the American Association of Diabetes Educators, Japan Journal of Nursing Science, 2016.
 References
1.
Berry, K. J. and Mielke, P. W. (1988). A generalization of Cohen's kappa agreement measure to interval measurement and multiple raters, Educational and Psychological Measurement, 48, 921-933.

2.
Brennan, R. L. and Prediger, D. J. (1981). Coefficient kappa: Some uses, misuses, and alternatives, Educational and Psychological Measurement, 41, 687-699.

3.
Cohen, J. (1960). A coefficient of agreement for nominal scales, Educational and Psychological Measurement, 20, 37-46.

4.
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit, Psychological Bulletin, 70, 213-220.

5.
Conger, A. J. (1980). Integration and generalization of kappas for multiple raters, Psychological Bulletin, 88, 322-328.

6.
Feinstein, A. R. and Cicchetti, D. V. (1990). High agreement but low kappa: I. The problems of two paradoxes, Journal of Clinical Epidemiology, 43, 543-549.

7.
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters, Psychological Bulletin, 76, 378-382.

8.
Gwet, K. L. (2008a). Computing inter-rater reliability and its variance in the presence of high agreement, British Journal of Mathematical and Statistical Psychology, 61, 29-48.

9.
Gwet, K. L. (2008b). Variance estimation of nominal-scale inter-rater reliability with random selection of raters, Psychometrika, 73, 407-430.

10.
Gwet, K. L. (2010). Handbook of Inter-Rater Reliability, 2nd edn. Advanced Analytics, LLC.

11.
Janson, H. and Olsson, U. (2001). A measure of agreement for interval or nominal multivariate observations, Educational and Psychological Measurement, 61, 277-289.

12.
Janson, H. and Olsson, U. (2004). A measure of agreement for interval or nominal multivariate observations by different sets of judges, Educational and Psychological Measurement, 64, 62-70.

13.
Park, M. H. and Park, Y. G. (2007). A new measure of agreement to resolve the two paradoxes of Cohen's kappa, The Korean Journal of Applied Statistics, 20, 117-132.

14.
Quenouille, M. H. (1949). Approximate tests of correlation in time-series, Journal of the Royal Statistical Society, Series B (Methodological), 11, 68-84.

15.
Randolph, J. J. (2005). Free-marginal multirater kappa: An alternative to Fleiss' fixed-marginal multirater kappa, Paper presented at the Joensuu University Learning and Instruction Symposium.

16.
Scott, W. (1955). Reliability of content analysis: The case of nominal scale coding, Public Opinion Quarterly, 19, 321-325.