Biostatistics in psychiatry (25)
Kappa coefficient: a popular measure of rater agreement
Wan TANG 1*, Jun HU 2, Hui ZHANG 3, Pan WU 4, Hua HE 1,5
1 Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, United States
2 College of Basic Science and Information Engineering, Yunnan Agricultural University, Kunming, Yunnan Province, China
3 Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN, United States
4 Value Institute, Christiana Care Health System, Newark, DE, United States
5 Center of Excellence for Suicide Prevention, Canandaigua VA Medical Center, Canandaigua, NY, United States
*correspondence: ***********************.edu
A full-text Chinese translation of this article will be available at /cn on March 25, 2015.
Summary: In mental health and psychosocial studies it is often necessary to report on the between-rater agreement of measures used in the study. This paper discusses the concept of agreement, highlighting its fundamental difference from correlation. Several examples demonstrate how to compute the kappa coefficient – a popular statistic for measuring agreement – both by hand and by using statistical software packages such as SAS and SPSS. Real study data are used to illustrate how to use and interpret this coefficient in clinical research and practice. The article concludes with a discussion of the limitations of the coefficient.

Keywords: interrater agreement; kappa coefficient; weighted kappa; correlation

[Shanghai Arch Psychiatry. 2015; 27(1): 62-67. doi: 10.11919/j.issn.1002-0829.215010]
1. Introduction
For most physical illnesses such as high blood pressure and tuberculosis, definitive diagnoses can be made using medical devices such as a sphygmomanometer for blood pressure or an X-ray for tuberculosis. However, there are no error-free gold-standard physical indicators of mental disorders, so the diagnosis and severity of mental disorders typically depend on the use of instruments (questionnaires) that attempt to measure latent, multi-faceted constructs. For example, psychiatric diagnoses are often based on criteria specified in the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV),[1] published by the American Psychiatric Association. But different clinicians may have different opinions about the presence or absence of the specific symptoms required to determine the presence of a diagnosis, so there is typically no perfect agreement between evaluators. In this situation, statistical methods are needed to address variability in clinicians' ratings.
Cohen's kappa is a widely used index for assessing agreement between raters.[2] Although similar in appearance, agreement is a fundamentally different concept from correlation. To illustrate, consider an instrument with six items and suppose that two raters' ratings of the six items on a single subject are (3,5), (4,6), (5,7), (6,8), (7,9) and (8,10). Although the scores of the two raters are quite different, the Pearson correlation coefficient for the two scores is 1, indicating perfect correlation. The paradox occurs because there is a bias in the scoring that results in a consistent difference of 2 points between the two raters' scores for all 6 items of the instrument. Thus, although the ratings are perfectly correlated (precision), agreement between the two raters is quite poor. The kappa index, the most popular measure of rater agreement, resolves this problem by assessing both the bias and the precision of the raters' ratings.
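To make the contrast concrete, the following minimal sketch (not part of the original article) computes both statistics for the six paired ratings above; it assumes Python with scipy and scikit-learn available, and the unweighted kappa treats each item score as a nominal category.

```python
# Minimal illustrative sketch (not from the original article):
# contrast correlation with agreement for the six paired ratings above.
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

rater1 = [3, 4, 5, 6, 7, 8]
rater2 = [5, 6, 7, 8, 9, 10]   # rater 2 always scores 2 points higher

r, _ = pearsonr(rater1, rater2)
print(f"Pearson correlation: {r:.2f}")   # 1.00 -- perfect correlation

# Unweighted kappa treats each score as a nominal category; because the two
# raters never assign the same score, observed agreement is 0 and kappa is
# negative (about -0.12), i.e., worse than chance agreement.
print(f"Cohen's kappa: {cohen_kappa_score(rater1, rater2):.2f}")
```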
In addition to its applications to psychiatric diagnosis, the concept of agreement is also widely applied to assess the utility of diagnostic and screening tests. Diagnostic tests provide information about a patient's condition that clinicians often use when making decisions about the management of patients. Early detection of disease or of important changes in the clinical status of patients often leads to less suffering and quicker recovery, but false negative and false positive screening results can result in delayed treatment or in inappropriate treatment. Thus, when a new diagnostic or screening test is developed, it is critical to assess its accuracy by comparing test results with those from a gold or reference standard. When assessing such tests, it is incorrect to measure the correlation between the test results and the gold standard; the correct procedure is to assess the agreement of the test results with the gold standard.
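As an illustration of what agreement with a gold standard means numerically, the sketch below computes kappa by hand from a 2x2 table of screening results; the counts are hypothetical and invented for illustration, not data from the article.

```python
# Hand computation of kappa for a screening test vs. a gold standard.
# The 2x2 counts below are hypothetical, for illustration only.
#                  gold standard +   gold standard -
a, b = 40, 10      # test positive
c, d = 5, 45       # test negative
n = a + b + c + d

po = (a + d) / n                                      # observed agreement
pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2   # agreement expected by chance
kappa = (po - pe) / (1 - pe)

print(f"observed = {po:.2f}, chance = {pe:.2f}, kappa = {kappa:.2f}")
# observed = 0.85, chance = 0.50, kappa = 0.70
```

The (po - pe) / (1 - pe) form is the standard definition of kappa: observed agreement corrected for the agreement that would be expected by chance alone.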
