Biostatistics in psychiatry (25)
Kappa coefficient: a popular measure of rater agreement
Wan TANG 1*, Jun HU 2, Hui ZHANG 3, Pan WU 4, Hua HE 1,5
1 Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, United States
2 College of Basic Science and Information Engineering, Yunnan Agricultural University, Kunming, Yunnan Province, China
3 Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN, United States
4 Value Institute, Christiana Care Health System, Newark, DE, United States
5 Center of Excellence for Suicide Prevention, Canandaigua VA Medical Center, Canandaigua, NY, United States
*correspondence: ***********************.edu
A full-text Chinese translation of this article will be available at /cn on March 25, 2015.
Summary: In mental health and psychosocial studies it is often necessary to report on the between-rater agreement of measures used in the study. This paper discusses the concept of agreement, highlighting its fundamental difference from correlation. Several examples demonstrate how to compute the kappa coefficient – a popular statistic for measuring agreement – both by hand and by using statistical software packages such as SAS and SPSS. Real study data are used to illustrate how to use and interpret this coefficient in clinical research and practice. The article concludes with a discussion of the limitations of the coefficient.

Keywords: interrater agreement; kappa coefficient; weighted kappa; correlation

[Shanghai Arch Psychiatry. 2015; 27(1): 62-67. doi: 10.11919/j.issn.1002-0829.215010]
1. Introduction
For most physical illnesses such as high blood pressure and tuberculosis, definitive diagnoses can be made using medical devices such as a sphygmomanometer for blood pressure or an X-ray for tuberculosis. However, there are no error-free gold-standard physical indicators of mental disorders, so the diagnosis and severity of mental disorders typically depend on the use of instruments (questionnaires) that attempt to measure latent, multi-faceted constructs. For example, psychiatric diagnoses are often based on criteria specified in the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV),[1] published by the American Psychiatric Association. But different clinicians may have different opinions about the presence or absence of the specific symptoms required to determine the presence of a diagnosis, so there is typically no perfect agreement between evaluators. In this situation, statistical methods are needed to address variability in clinicians' ratings.
Cohen's kappa is a widely used index for assessing agreement between raters.[2] Although similar in appearance, agreement is a fundamentally different concept from correlation. To illustrate, consider an instrument with six items and suppose that two raters' ratings of the six items on a single subject are (3,5), (4,6), (5,7), (6,8), (7,9) and (8,10). Although the scores of the two raters are quite different, the Pearson correlation coefficient for the two scores is 1, indicating perfect correlation. The paradox occurs because there is a bias in the scoring that results in a consistent difference of 2 points between the two raters' scores for all 6 items of the instrument. Thus, although the ratings are perfectly correlated (precision), agreement between the two raters is quite poor. The kappa index, the most popular measure of rater agreement, resolves this problem by assessing both the bias and the precision of the raters' ratings.
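To make the contrast concrete, the following minimal sketch (not part of the original article) computes both statistics for the six paired ratings above; it assumes Python with scipy and scikit-learn available, and the unweighted kappa treats each item score as a nominal category.

```python
# Minimal illustrative sketch (not from the original article):
# contrast correlation with agreement for the six paired ratings above.
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

rater1 = [3, 4, 5, 6, 7, 8]
rater2 = [5, 6, 7, 8, 9, 10]   # rater 2 always scores 2 points higher

r, _ = pearsonr(rater1, rater2)
print(f"Pearson correlation: {r:.2f}")   # 1.00 -- perfect correlation

# Unweighted kappa treats each score as a nominal category; because the two
# raters never assign the same score, observed agreement is 0 and kappa is
# negative (about -0.12), i.e., worse than chance agreement.
print(f"Cohen's kappa: {cohen_kappa_score(rater1, rater2):.2f}")
```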
In addition to its applications to psychiatric diagnosis, the concept of agreement is also widely applied to assess the utility of diagnostic and screening tests. Diagnostic tests provide information about a patient's condition that clinicians often use when making decisions about the management of patients. Early detection of disease or of important changes in the clinical status of patients often leads to less suffering and quicker recovery, but false negative and false positive screening results can result in delayed treatment or in inappropriate treatment. Thus, when a new diagnostic or screening test is developed, it is critical to assess its accuracy by comparing test results with those from a gold or reference standard. When assessing such tests, it is incorrect to measure the correlation between the test results and the gold standard; the correct procedure is to assess the agreement of the test results with the gold standard.
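As an illustration of what agreement with a gold standard means numerically, the sketch below computes kappa by hand from a 2x2 table of screening results; the counts are hypothetical and invented for illustration, not data from the article.

```python
# Hand computation of kappa for a screening test vs. a gold standard.
# The 2x2 counts below are hypothetical, for illustration only.
#                  gold standard +   gold standard -
a, b = 40, 10      # test positive
c, d = 5, 45       # test negative
n = a + b + c + d

po = (a + d) / n                                      # observed agreement
pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2   # agreement expected by chance
kappa = (po - pe) / (1 - pe)

print(f"observed = {po:.2f}, chance = {pe:.2f}, kappa = {kappa:.2f}")
# observed = 0.85, chance = 0.50, kappa = 0.70
```

The (po - pe) / (1 - pe) form is the standard definition of kappa: observed agreement corrected for the agreement that would be expected by chance alone.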
