Permutation p-values for specific-category kappa measure of agreement

Title & Authors
Permutation p-values for specific-category kappa measure of agreement
Um, Yonghwan

Abstract
Asymptotic tests are often unsuitable for the analysis of sparse ordered contingency tables, because asymptotic p-values may either overestimate or underestimate the true p-values. In this paper, we describe permutation procedures that compute exact or resampling p-values for weighted specific-category agreement in ordered $k \times k$ contingency tables. We use the weighted specific-category kappa proposed by Kvålseth to measure the extent to which two independent raters agree on specific categories. We carried out comparison studies of exact, resampling, and asymptotic p-values using $3 \times 3$ contingency data (real and artificial data sets) and $4 \times 4$ artificial contingency data.
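The resampling idea described in the abstract can be illustrated with a minimal Python sketch. This is not the procedure of the paper: the weighted specific-category kappa is not defined on this page, so the sketch substitutes the ordinary unweighted Cohen's kappa as a stand-in statistic. The function names `cohen_kappa` and `resampling_p_value`, the one-sided comparison, and the default number of resamples are all assumptions made for illustration. Shuffling one rater's labels keeps both sets of marginal totals fixed, which is the usual reference set for a permutation test on a contingency table.

```python
import random


def cohen_kappa(r1, r2, k):
    """Unweighted Cohen's kappa for two equal-length lists of labels 0..k-1.

    Stand-in statistic (assumption): the paper's weighted specific-category
    kappa is not reproduced here. Assumes the raters do not both put every
    item in one single category (otherwise 1 - p_e would be zero).
    """
    n = len(r1)
    # Observed proportion of items on which the raters agree.
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Agreement expected by chance from the two raters' marginal proportions.
    p_e = sum((r1.count(c) / n) * (r2.count(c) / n) for c in range(k))
    return (p_o - p_e) / (1 - p_e)


def resampling_p_value(r1, r2, k, n_resamples=10000, seed=0):
    """Monte Carlo permutation p-value (one-sided, large kappa = agreement)."""
    rng = random.Random(seed)
    observed = cohen_kappa(r1, r2, k)
    shuffled = list(r2)
    hits = 0
    for _ in range(n_resamples):
        # Shuffling one rater's labels preserves both margins of the table.
        rng.shuffle(shuffled)
        if cohen_kappa(r1, shuffled, k) >= observed:
            hits += 1
    # Proportion of resampled statistics at least as large as the observed one.
    return observed, hits / n_resamples


if __name__ == "__main__":
    # Artificial ratings of 12 items on a 3-category scale (made-up data).
    rater_a = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
    rater_b = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 0]
    kappa, p = resampling_p_value(rater_a, rater_b, k=3, n_resamples=5000)
    print(f"kappa = {kappa:.3f}, resampling p-value = {p:.4f}")
```

An exact p-value would replace the random shuffles with a full enumeration of all tables sharing the observed margins, which is feasible only for small or sparse tables; the resampling version trades that exactness for a fixed computational cost.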
Keywords
Contingency tables; permutation; p-values; weighted specific-category agreement
Language
Korean
References
1.
Agresti, A. (2002). Categorical data analysis, 2nd Ed., Wiley, New York.

2.
Berry, K. J., Johnston, J. E. and Mielke, P. W. (2006). Exact and resampling probability values for measures associated with ordered R by C contingency tables. Psychological Reports, 99, 231-238.

3.
Cicchetti, D. V. and Allison, T. (1971). A new procedure for assessing reliability of scoring EEG sleep recordings. The American Journal of EEG Technology, 11, 101-109.

4.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46.

5.
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213-220.

6.
Feinstein, A. R. and Cicchetti, D. V. (1990). High agreement but low kappa: I. The problems of two paradoxes. Journal of Clinical Epidemiology, 43, 543-549.

7.
Fisher, R. A. (1935). The design of experiments, Oliver & Boyd, Edinburgh.

8.
Fleiss, J. L. (1981). Statistical methods for rates and proportions, 2nd Ed., Wiley, New York.

9.
Fleiss, J. L. and Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33, 613-619.

10.
Good, P. I. (2000). Permutation tests: A practical guide to resampling methods for testing hypotheses, 2nd Ed., Springer-Verlag, New York.

11.
Good, P. I. (2001). Resampling methods: A practical guide to data analysis, 2nd Ed., Birkhäuser, Massachusetts.

12.
Han, K. D. and Park, Y. G. (2012). A simulation study of rater agreement measures. Journal of the Korean Data & Information Science Society, 23, 25-37.

13.
Holmes, C. B. (1979). Sample size in psychological research. Perceptual and Motor Skills, 49, 283-288.

14.
Holmes, C. B. (1990). The honest truth about lying with statistics, Thomas, Springfield, Illinois.

15.
Johnston, J. E., Berry, K. J. and Mielke, P. W. (2007). Permutation tests: Precision in estimating probability values. Perceptual and Motor Skills, 105, 915-920.

16.
Johnston, J. E., Berry, K. J. and Mielke, P. W. (2008). Resampling permutation probability values for weighted kappa. Psychological Reports, 103, 467-475.

17.
Kim, J. and Lee, J. D. (2014). Independence tests using coin package in R. Journal of the Korean Data & Information Science Society, 25, 1039-1055.

18.
Kraemer, H. C. (1983). Kappa coefficient. In Encyclopedia of Statistical Sciences 4, Wiley, New York, 352-354.

19.
Kvålseth, T. O. (1989). Note on Cohen's kappa. Psychological Reports, 65, 223-226.

20.
Kvålseth, T. O. (2003). Weighted specific-category kappa measure of interobserver agreement. Psychological Reports, 93, 1283-1290.

21.
Mielke, P. W. and Berry, K. J. (2001). Permutation methods: A distance function approach, Springer-Verlag, New York.

22.
Oleckno, W. A. (2008). Epidemiology: Concepts and methods, Waveland Press, Inc., Illinois.

23.
Patefield, W. M. (1981). Algorithm AS 159: An efficient method of generating random R $\times$ C tables with given row and column totals. Journal of the Royal Statistical Society C, 30, 91-97.

24.
Shoukri, M. M. (2004). Measures of interobserver agreement, CRC Press, Florida.

25.
Spitzer, R. L., Cohen, J., Fleiss, J. L. and Endicott, J. (1967). Quantification of agreement in psychiatric diagnosis. Archives of General Psychiatry, 17, 83-87.

26.
Upton, G. and Cook, I. (2002). Oxford dictionary of statistics, Oxford University Press, United Kingdom.

27.
Zhao, X. (2011). When to use Cohen's κ, if ever? International Communication Association 2011 Conference.