Intraclass Correlation Coefficient (ICC)

Intraclass correlation (ICC) is used to measure inter-rater reliability for two or more raters. It may also be used to assess test-retest reliability. ICC may be conceptualized as the ratio of between-groups variance to total variance, as elaborated below. A classic citation for intraclass correlation is Shrout and Fleiss (1979), though ICC is based on work going back before WWI.
ICC vs. Pearson r: ICC is preferred over Pearson's r only when the sample size is small (<15) or when there are more than two tests (one test, one retest) to be correlated. Because Pearson's r makes no assumptions about rater means, a separate test of means (such as a paired t-test) is needed to reveal whether inter-rater means differ. For small samples (<15), Pearson's r overestimates the test-retest correlation, and in this situation intraclass correlation is used instead of Pearson's r.
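For illustration, the short Python sketch below (with invented ratings) shows why Pearson's r alone cannot reveal a systematic difference between raters: rater B scores nearly every subject about two points higher than rater A, yet r remains close to 1.0, while a separate paired t-test on the same data exposes the difference in rater means.

```python
# Illustrative only: hypothetical ratings of 10 subjects by two raters.
from scipy.stats import pearsonr, ttest_rel

rater_a = [3, 5, 2, 4, 6, 3, 5, 4, 2, 6]
rater_b = [5, 8, 4, 6, 8, 5, 7, 7, 4, 8]   # roughly rater_a + 2: systematically higher

r, r_p = pearsonr(rater_a, rater_b)        # near 1.0: relative agreement looks perfect
t, t_p = ttest_rel(rater_a, rater_b)       # large |t|, small p: the rater means differ

print(f"Pearson r = {r:.2f} (p = {r_p:.4f})")
print(f"paired t  = {t:.2f} (p = {t_p:.4f})")
```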
Data setup: In using intraclass correlation for inter-rater reliability, one constructs a table in which column 1 is the target id (1, 2, ..., n) and subsequent columns are the raters (A, B, C, ...). The row variable is some grouping variable which is the target of the ratings, such as persons (Subject1, Subject2, etc.) or neighborhoods (E, W, N, S). The cell entries after the first id column are the raters' ratings of the target on some interval or interval-like variable, such as a Likert scale. The purpose of ICC is to assess the inter-rater (column) effect in relation to the grouping (row) effect, using two-way ANOVA.
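A minimal sketch of this layout in Python (the column names and ratings are invented for illustration): one row per target in wide format, with a "melted" long version that many ANOVA/ICC routines expect.

```python
import pandas as pd

# Wide layout described above: column 1 is the target id, remaining columns are raters.
wide = pd.DataFrame({
    "target":  [1, 2, 3, 4, 5, 6],
    "rater_A": [4, 2, 5, 3, 4, 1],
    "rater_B": [4, 3, 5, 3, 5, 1],
    "rater_C": [5, 2, 4, 3, 4, 2],
})

# Long layout: one row per (target, rater) rating, convenient for two-way ANOVA / ICC routines.
long = wide.melt(id_vars="target", var_name="rater", value_name="rating")
print(long.head())
```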
Interpretation: ICC is interpreted similarly to Kappa, discussed above. ICC will approach 1.0 when there is no variance within targets (ex., subjects, neighborhoods): for any target, all raters give the same ratings, indicating that the total variation in measurements on the Likert scale is due solely to the target (ex., subject, neighborhood) variable. That is, ICC will be high when any given row tends to have the same score across the columns (which are the raters). For instance, one may find all raters rate an item the same way for a given target, indicating that the total variation in the measure depends solely on the values of the variable being measured -- that is, there is perfect inter-rater reliability.
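As a toy numeric illustration of the variance-ratio idea (hypothetical numbers, using the simple population-variance decomposition rather than the ANOVA estimator): when every rater gives a target the same score, the within-target variance is zero, so the between-target share of the total variance -- and hence the ICC -- is 1.0.

```python
import numpy as np

# Rows = targets, columns = raters; every rater agrees perfectly within each row.
ratings = np.array([
    [5, 5, 5],
    [2, 2, 2],
    [4, 4, 4],
    [1, 1, 1],
])

within_target  = ratings.var(axis=1, ddof=0).mean()       # 0.0: no disagreement inside any row
between_target = ratings.mean(axis=1).var(ddof=0)          # all remaining variance is between rows
print(between_target / (between_target + within_target))   # 1.0 -> perfect inter-rater reliability
```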
In SPSS, select Analyze, Scale/Reliability Analysis; select your variables; click Statistics; in the Descriptives group, select Item and select Intraclass correlation coefficient; select a model from the Model drop-down list (ex., two-way mixed); select a type from the Type drop-down list (ex., consistency); Continue; OK. Models and Types are discussed below.
Models: ICC varies depending on whether the judges are all judges of interest or are conceived as a random sample of possible judges, whether all targets are rated or only a random sample, and whether reliability is to be measured based on individual ratings or on the mean rating of all judges. These considerations give rise to six forms of intraclass correlation, described in the classic article by Shrout and Fleiss (1979). In SPSS, the model and type are selected from the Model and Type drop-down lists in the Reliability Analysis: Statistics dialog, and single-measure and average-measure coefficients are reported for each model (3 models times single/average measures = the six Shrout-Fleiss forms of ICC).
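As a computational sketch of the six forms (this is our own illustration, not SPSS output; the function name and the example ratings are invented), the Shrout and Fleiss (1979) coefficients can be computed from the two-way ANOVA mean squares of a targets-by-raters table:

```python
import numpy as np

def shrout_fleiss_icc(x):
    """x: 2-D array-like, rows = targets/subjects, columns = raters/judges.
    Returns the six Shrout & Fleiss (1979) ICC forms as a dict."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)

    ss_total = ((x - grand) ** 2).sum()
    ss_rows = k * ((row_means - grand) ** 2).sum()      # between targets
    ss_cols = n * ((col_means - grand) ** 2).sum()      # between raters (judges)
    ss_within = ss_total - ss_rows                      # within targets (one-way error)
    ss_error = ss_total - ss_rows - ss_cols             # residual (two-way error)

    bms = ss_rows / (n - 1)                 # between-targets mean square
    wms = ss_within / (n * (k - 1))         # within-targets mean square
    jms = ss_cols / (k - 1)                 # between-judges mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square

    return {
        "ICC(1,1)": (bms - wms) / (bms + (k - 1) * wms),                        # one-way random, single
        "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),  # two-way random, single
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),                        # two-way mixed, single
        "ICC(1,k)": (bms - wms) / bms,                                          # one-way random, average
        "ICC(2,k)": (bms - ems) / (bms + (jms - ems) / n),                      # two-way random, average
        "ICC(3,k)": (bms - ems) / bms,                                          # two-way mixed, average
    }

# Hypothetical ratings: 6 targets (rows) rated by 3 judges (columns).
ratings = [[4, 4, 5], [2, 3, 2], [5, 5, 4], [3, 3, 3], [4, 5, 4], [1, 1, 2]]
for name, value in shrout_fleiss_icc(ratings).items():
    print(f"{name}: {value:.3f}")
```

The ",1" forms estimate the reliability of a single judge's rating; the ",k" forms estimate the reliability of the mean of all k judges (see Single versus average measures below).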
1. One-way random effects model. Judges/raters are conceived as being a random selection of possible raters/judges, who rate all targets of interest. That is, in this model judges are treated as a random sample and the focus of interest is a one-way ANOVA testing whether there is a subject/target effect. This model applies even when the researcher cannot associate a particular subject with a particular rater because information is lacking about which judge assigned which score to a subject. This would happen if the columns were first rating of a subject, second rating, third rating, etc., but a given rating (ex., the first rating) for one subject might be by a different judge than the first rating for another subject, etc. This in turn means there is no way to separate out a judge/rater effect. There would also be no way to separate out a judge/rater effect if each judge rates only one subject, even if it is known which judge assigned which score. In either of these situations the researcher uses a one-way random effects model. This model conceptualizes that there is a target/subject factor, with each observed actual subject representing a level of that target/subject factor. The rater/judge factor cannot be measured and is absorbed into error variance. The ICC is interpreted as the proportion of total variance that is associated with differences among the scores of the subjects.
2. Two-way random effects model. Judges are conceived as being a random selection from among all possible judges, and targets/subjects are conceived as being a random factor too. Raters rate all n subjects/targets, chosen at random from a pool of targets/subjects, and it is known how each judge rated each subject. The ICC is interpreted as the proportion of Subject plus Rater variance that is associated with differences among the scores of the subjects. The ICC is interpreted as being generalizable to all possible judges.
3. Two-way mixed model. All judges of interest rate all targets, which are a random sample. This is a mixed model because the judges are seen as a fixed effect (not as a random sample of all possible raters/judges) and the targets are a random effect. The ICC coefficients will be identical to those of the two-way random effects model, but the ICC is interpreted as not being generalizable beyond the given judges.
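If the third-party Python package pingouin is available (it is not mentioned above, so treat this as an optional convenience rather than part of the SPSS procedure), all six forms, together with F tests and confidence intervals, can be requested from the long-format table sketched earlier:

```python
import pandas as pd
import pingouin as pg  # third-party package: pip install pingouin

wide = pd.DataFrame({
    "target":  [1, 2, 3, 4, 5, 6],
    "rater_A": [4, 2, 5, 3, 4, 1],
    "rater_B": [4, 3, 5, 3, 5, 1],
    "rater_C": [5, 2, 4, 3, 4, 2],
})
long = wide.melt(id_vars="target", var_name="rater", value_name="rating")

# One row per ICC form: ICC1/ICC2/ICC3 (single measures) and ICC1k/ICC2k/ICC3k (average measures).
icc = pg.intraclass_corr(data=long, targets="target", raters="rater", ratings="rating")
print(icc[["Type", "Description", "ICC", "CI95%"]])
```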
Types: In the same SPSS Reliability Analysis: Statistics dialog, the Type drop-down list allows the researcher to specify one of two types of ICC computation:
1. Absolute agreement: Measures if raters assign the same absolute score. Absolute agreement is often used when systematic variability due to raters is relevant.
2. Consistency: Measures if raters' scores are highly correlated even if they are not identical in absolute terms. That is, raters are consistent as long as their relative ratings are similar. Consistency agreement is often used when systematic variability due to raters is irrelevant.
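The following hedged sketch (invented data in which rater B is always exactly two points above rater A) illustrates the distinction using the two-way mean squares: the consistency coefficient ignores the systematic offset, while the absolute agreement coefficient penalizes it.

```python
import numpy as np

# Rows = targets, columns = raters; rater B is always exactly 2 points above rater A.
x = np.array([[3, 5], [5, 7], [2, 4], [4, 6], [6, 8], [1, 3]], dtype=float)
n, k = x.shape
grand = x.mean()
bms = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)              # between-targets MS
jms = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)              # between-raters MS
ems = (((x - grand) ** 2).sum() - (n - 1) * bms - (k - 1) * jms) / ((n - 1) * (k - 1))  # residual MS

consistency = (bms - ems) / (bms + (k - 1) * ems)                      # ICC(3,1): offset ignored
absolute = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)   # ICC(2,1): offset penalized
print(f"consistency ICC = {consistency:.2f}, absolute agreement ICC = {absolute:.2f}")
```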
Single versus average measures: Each model has two versions of the intraclass correlation coefficient:
1. Single measure reliability: individual ratings constitute the unit of analysis. That is, single measure reliability gives the reliability for a single judge's rating. Use this if further research will use the ratings of a single rater.
2. Average measure reliability: the mean of all ratings is the unit of analysis. That is, average measure reliability gives the reliability of the mean of the ratings of all raters. Use this if the research design involves averaging multiple ratings for each item, perhaps because the researcher judges that using an individual rating would involve too much uncertainty. Note average measure reliability for either two-way random effects or two-way mixed models will be the same as Cronbach's alpha.
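The last point can be checked numerically. The sketch below (invented ratings) computes the average-measure consistency ICC, (BMS - EMS)/BMS, and Cronbach's alpha from the classical item-variance formula; the two values agree exactly.

```python
import numpy as np

# Rows = targets, columns = raters (hypothetical ratings).
x = np.array([[4, 4, 5], [2, 3, 2], [5, 5, 4], [3, 3, 3], [4, 5, 4], [1, 1, 2]], dtype=float)
n, k = x.shape
grand = x.mean()
bms = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)
jms = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)
ems = (((x - grand) ** 2).sum() - (n - 1) * bms - (k - 1) * jms) / ((n - 1) * (k - 1))

icc_avg_consistency = (bms - ems) / bms      # ICC(3,k): average-measure, consistency

# Cronbach's alpha from the classical item-variance formula.
item_vars = x.var(axis=0, ddof=1)            # variance of each rater's column across targets
total_var = x.sum(axis=1).var(ddof=1)        # variance of each target's summed score
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)

print(round(icc_avg_consistency, 6), round(alpha, 6))   # identical values
```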
Average measure reliability requires a reasonable number of judges to form a stable average. The number of judges required may be estimated beforehand as nj = ICC*(1 - rL) / [rL(1 - ICC*)], where nj is the number of judges needed, rL is the lower bound of the (1 - α)*100% confidence interval around the ICC obtained in a pilot study, and ICC* is the minimum level of ICC acceptable to the researcher (ex., .80).
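A worked example of this formula (the target ICC* of .80 and the pilot-study lower bound rL of .50 are hypothetical values):

```python
icc_star = 0.80   # minimum acceptable ICC
r_lower = 0.50    # lower confidence bound on the ICC from a pilot study (hypothetical)

n_judges = icc_star * (1 - r_lower) / (r_lower * (1 - icc_star))
print(n_judges)   # 4.0 -> about four judges would be needed
```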
Use in other contexts. ICC is sometimes used outside the context of inter-rater reliability. In general, ICC is a coefficient which approaches 1.0 as the between-groups effect (the row effect) becomes very large relative to the within-groups effect (the column effect), whatever the rows and columns represent. In this way ICC is a measure of homogeneity: it approaches 1.0 when any given row tends to have the same values for all columns. For instance, let columns be survey respondents and let rows be Census block numbers, and let the attribute measured be white=0/nonwhite=1. If blocks are homogeneous by race, any given row will tend to have mostly 0's or mostly 1's, and ICC will be high and positive. As a rule of thumb, when the row variable is some grouping or clustering variable, such as Census areas, ICC will tend to approach 1.0 as the clusters become smaller and more compact (ex., as one goes from metropolitan statistical areas to Census tracts to Census blocks). ICC is 0 when within-groups variance equals between-groups variance, indicating that the grouping variable has no effect. Though less common, note that ICC can become negative when the within-groups variance exceeds the between-groups variance.
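The Census-block example can be sketched as follows (invented white=0/nonwhite=1 codes; the one-way ICC(1,1) formula is used, treating blocks as groups): racially homogeneous blocks yield an ICC of 1.0, while mixed blocks yield a negative ICC because within-block variance exceeds between-block variance.

```python
import numpy as np

def icc_oneway(x):
    """One-way ICC(1,1): rows = groups (e.g., Census blocks), columns = observations per group."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    bms = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)                  # between-groups MS
    wms = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))     # within-groups MS
    return (bms - wms) / (bms + (k - 1) * wms)

# Hypothetical white=0 / nonwhite=1 codes for 3 respondents in each of 4 blocks.
homogeneous = [[0, 0, 0], [1, 1, 1], [0, 0, 0], [1, 1, 1]]   # blocks racially homogeneous
mixed = [[0, 1, 0], [1, 0, 1], [0, 1, 1], [1, 0, 0]]         # blocks racially mixed

print(icc_oneway(homogeneous))   # 1.0: all variation lies between blocks
print(icc_oneway(mixed))         # negative: within-block variance exceeds between-block variance
```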
