1. Agresti, A (1992). "Modelling patterns of agreement and disagreement." Stat Methods Med Res 1(2): 201-218. 
  2. Henriques T, Antunes L, Bernardes J, Matias M, Sato D, Costa-Santos C (2013). "Information-based measure of disagreement for more than two observers: a useful tool to compare the degree of observer disagreement." BMC Medical Research Methodology 13: 47.
  3. Altman, DG and Bland, JM (1994). "Diagnostic tests. 1: Sensitivity and specificity." BMJ 308(6943): 1552.
  4. Atkinson, G and Nevill, A (1997). "Comment on the Use of Concordance Correlation to Assess the Agreement between Two Variables." Biometrics 53: 775-777.
  5. Bartko, JJ (1966). "The intraclass correlation coefficient as a measure of reliability." Psychol Rep 19(1): 3-11.
  6. Bland, JM (2000). An introduction to medical statistics, Oxford University Press.
  7. Bland, JM (2004, May). "How do I analyse observer variation studies?" Retrieved 06-01-2010, from http://www-users.york.ac.uk/~mb55/meas/observer.pdf.
  8. Bland, JM and Altman, DG (1986). "Statistical methods for assessing agreement between two methods of clinical measurement." Lancet 1(8476): 307-310.
  9. Bland, JM and Altman, DG (1990). "A note on the use of the intraclass correlation coefficient in the evaluation of agreement between two methods of measurement." Comput Biol Med 20(5): 337-340.
  10. Bland, JM and Altman, DG (1995). "Comparing methods of measurement: why plotting difference against standard method is misleading." Lancet 346(8982): 1085-1087.
  11. Bland, JM and Altman, DG (2003). "Applying the right statistics: analyses of measurement studies." Ultrasound Obstet Gynecol 22(1): 85-93.
  12. Byrt, T, Bishop, J and Carlin, JB (1993). "Bias, prevalence and kappa." J Clin Epidemiol 46(5): 423-429.
  13. Chamberlain, J, Rogers, P, Price, JL, Ginks, S, Nathan, BE and Burn, I (1975). "Validity of clinical examination and mammography as screening tests for breast cancer." Lancet 2(7943): 1026-1030. 
  14. Cohen, J (1960). "A coefficient of agreement for nominal scales." Educational and Psychological Measurement 20: 37-46.
  15. Cohen, J (1968). "Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit." Psychological Bulletin 70: 213-220.
  16. Cook, R (2005). Kappa and its dependence on marginal rates. The Encyclopedia of Biostatistics. Armitage, P. New York, Wiley: 2166-2168. 
  17. Costa Santos C, Costa Pereira A, Bernardes J (2005). "Agreement studies in obstetrics and gynaecology: Inappropriateness, controversies and consequences." BJOG 112(5): 667-669.
  18. Costa-Santos C, Bernardes J, Ayres-de-Campos D, Costa A, Costa C (2011). "The limits of agreement and the intra-class correlation coefficient may be inconsistent in the interpretation of agreement." J Clin Epidemiol 64: 264-269.
  19. Costa-Santos C, Antunes L, Souto A, Bernardes J (2010). "Assessment of disagreement: a new information based approach." Ann Epidemiol 20: 552-558.
  20. de Vet, HC (2005). Observer Reliability and Agreement. Encyclopedia of Biostatistics. Armitage, P and Colton, T, John Wiley.
  21. Dice, L (1945). "Measurements of the amount of ecologic association between species." Ecology 26: 297-302.
  22. Fleiss, J (1971). "Measuring nominal scale agreement among many raters." Psychological Bulletin 76(5): 378-382.
  23. Fleiss, J (1986). Design and Analysis of Clinical Experiments. New York, John Wiley & Sons.
  24. Fleiss, J and Cohen, J (1973). "The equivalence of weighted kappa and intraclass correlation coefficient as measures of reliability." Educational and Psychological Measurement 33: 613-619.
  25. Fleiss, J, Levin, B and Paik, M (2003). Statistical Methods for Rates and Proportions, John Wiley & Sons, Inc.
  26. Giraudeau, B (1996). "Negative values of the intraclass correlation coefficient are not theoretically possible." J Clin Epidemiol 49(10): 1205-1206.
  27. Goodman, L and Kruskal, W (1954). "Measures of association for cross classifications." Journal of the American Statistical Association 49: 732-764.
  28. Gow, RM, Barrowman, NJ, Lai, L and Moher, D (2008). "A review of five cardiology journals found that observer variability of measured variables was infrequently reported." J Clin Epidemiol 61(4): 394-401. 
  29. Jakobsson, U and Westergren, A (2005). "Statistical methods for assessing agreement for ordinal data." Scand J Caring Sci 19(4): 427-431. 
  30. Khan, KS and Chien, PF (2001). "Evaluation of a clinical test. I: assessment of reliability." BJOG 108(6): 562-567. 
  31. Landis, JR and Koch, GG (1977). "The measurement of observer agreement for categorical data." Biometrics 33(1): 159-174.
  32. Lee, J, Koh, D and Ong, CN (1989). "Statistical evaluation of agreement between two methods for measuring a quantitative variable." Comput Biol Med 19(1): 61-70.
  33. Lin, LI (1989). "A concordance correlation coefficient to evaluate reproducibility." Biometrics 45(1): 255-268.
  34. Luiz, RR and Szklo, M (2005). "More than one statistical strategy to assess agreement of quantitative measurements may usefully be reported." J Clin Epidemiol 58(3): 215-216.
  35. Markus, H, Bland, JM, Rose, G, Sitzer, M and Siebler, M (1996). "How good is intercenter agreement in the identification of embolic signals in carotid artery disease?" Stroke 27(7): 1249-1252.
  36. McGraw, K and Wong, S (1996). "Forming inferences about some intraclass correlation coefficients." Psychological Methods 1: 30-46.
  37. Metz, CE (1978). "Basic principles of ROC analysis." Semin Nucl Med 8(4): 283-298.
  38. Muller, R and Buttner, P (1994). "A critical discussion of intraclass correlation coefficients." Stat Med 13(23-24): 2465-2476.
  39. Nickerson, C (1997). "A Note on 'A Concordance Correlation Coefficient to Evaluate Reproducibility'." Biometrics 53: 1503-1507.
  40. Plsek, PE and Greenhalgh, T (2001). "Complexity science: The challenge of complexity in health care." BMJ 323(7313): 625-628.
  41. Rogot, E and Goldberg, ID (1966). "A proposed index for measuring agreement in test-retest studies." J Chronic Dis 19(9): 991-1006.
  42. Rosner, B (1990). Fundamentals of biostatistics. Boston, PWS-Kent Publishing.
  43. Rothwell, PM (2000). "Analysis of agreement between measurements of continuous variables: general principles and lessons from studies of imaging of carotid stenosis." J Neurol 247(11): 825-834.
  44. Shoukri, M (2005). Agreement, Measurement of. Encyclopedia of Biostatistics. Armitage, P and Colton, T, John Wiley.
  45. Shoukri, M and Edge, V (1996). Statistical Methods for Health Sciences, CRC Press, Inc.
  46. Shrout, PE and Fleiss, JL (1979). "Intraclass correlations: uses in assessing rater reliability." Psychol Bull 86(2): 420-428.
  47. Uebersax, J (2000, 02 Oct 2009). "Raw Agreement Indices." Retrieved January 2010, from http://www.john-uebersax.com/stat/raw.htm.
  48. Uebersax, J (2009). "The Myth of Chance-Corrected Agreement." Retrieved January 2010, from http://www.john-uebersax.com/stat/kappa2.htm.
  49. Uebersax, JS (1987). "Diversity of decision-making models and the measurement of interrater agreement." Psychological Bulletin 101: 140-146.
  50. Uebersax, JS (1992). "Modeling approaches for the analysis of observer agreement." Invest Radiol 27(9): 738-743.
  51. Wilson, T, Holt, T and Greenhalgh, T (2001). "Complexity science: Complexity and clinical care." BMJ 323(7314): 685-688.

Last updated: March 2011