Experimental and quasi-experimental designs for research

Posted December 15, 2023 (author: awazliksikixtomhayayalax)

EXPERIMENTAL AND QUASI-EXPERIMENTAL DESIGNS FOR GENERALIZED CAUSAL INFERENCE

William R. Shadish
The University of Memphis

Thomas D. Cook
Northwestern University

Donald T. Campbell

Houghton Mifflin Company
Boston New York

Experiments and Generalized Causal Inference

Ex·per·i·ment (ĭk-spĕr′ə-mənt): [Middle English from Old French from Latin experimentum, from experiri, to try; see per- in Indo-European Roots.] n. Abbr. exp., expt. 1. a. A test under controlled conditions that is made to demonstrate a known truth, examine the validity of a hypothesis, or determine the efficacy of something previously untried. b. The process of conducting such a test; experimentation. 2. An innovative act or procedure: "Democracy is only an experiment in government" (William Ralph Inge).

Cause (kôz): [Middle English from Old French from Latin causa, reason, purpose.] n. 1. a. The producer of an effect, result, or consequence. b. The one, such as a person, an event, or a condition, that is responsible for an action or a result. v. 1. To be the cause of or reason for; result in. 2. To bring about or compel by authority or force.

To many historians and philosophers, the increased emphasis on experimentation in the 16th and 17th centuries marked the emergence of modern science from its roots in natural philosophy (Hacking, 1983). Drake (1981) cites Galileo's 1612 treatise Bodies That Stay Atop Water, or Move in It as ushering in modern experimental science, but earlier claims can be made favoring William Gilbert's 1600 study On the Loadstone and Magnetic Bodies, Leonardo da Vinci's (1452-1519) many investigations, and perhaps even the 5th-century B.C. philosopher Empedocles, who used various empirical demonstrations to argue against Parmenides (Jones, 1969a, 1969b). In the everyday sense of the term, humans have been experimenting with different ways of doing things from the earliest moments of their history. Such experimenting is as natural a part of our life as trying a new recipe or a different way of starting campfires.

However, the scientific revolution of the 17th century departed in three ways from the common use of observation in natural philosophy at that time. First, it increasingly used observation to correct errors in theory. Throughout history, natural philosophers often used observation in their theories, usually to win philosophical arguments by finding observations that supported their theories. However, they still subordinated the use of observation to the practice of deriving theories from "first principles," starting points that humans know to be true by our nature or by divine revelation (e.g., the assumed properties of the four basic elements of fire, water, earth, and air in Aristotelian natural philosophy). According to some accounts, this subordination of evidence to theory degenerated in the 17th century: "The Aristotelian principle of appealing to experience had degenerated among philosophers into dependence on reasoning supported by casual examples and the refutation of opponents by pointing to apparent exceptions not carefully examined" (Drake, 1981, p. xxi). When some 17th-century scholars then began to use observation to correct apparent errors in theoretical and religious first principles, they came into conflict with religious or philosophical authorities, as in the case of the Inquisition's demands that Galileo recant his account of the earth revolving around the sun. Given such hazards, the fact that the new experimental science tipped the balance toward observation and away from dogma is remarkable. By the time Galileo died, the role of systematic observation was firmly entrenched as a central feature of science, and it has remained so ever since (Harré, 1981).

Second, before the 17th century, appeals to experience were usually based on passive observation of ongoing systems rather than on observation of what happens after a system is deliberately changed. After the scientific revolution in the 17th century, the word experiment (terms in boldface in this book are defined in the Glossary) came to connote taking a deliberate action followed by systematic observation of what occurred afterward. As Hacking (1983) noted of Francis Bacon: "He taught that not only must we observe nature in the raw, but that we must also 'twist the lion's tale', that is, manipulate our world in order to learn its secrets" (p. 149). Although passive observation reveals much about the world, active manipulation is required to discover some of the world's regularities and possibilities (Greenwood, 1989). As a mundane example, stainless steel does not occur naturally; humans must manipulate it into existence. Experimental science came to be concerned with observing the effects of such manipulations.

Third, early experimenters realized the desirability of controlling extraneous influences that might limit or bias observation. So telescopes were carried to higher points at which the air was clearer, the glass for microscopes was ground ever more accurately, and scientists constructed laboratories in which it was possible to use walls to keep out potentially biasing ether waves and to use (eventually sterilized) test tubes to keep out dust or bacteria. At first, these controls were developed for astronomy, chemistry, and physics, the natural sciences in which interest in science first bloomed. But when scientists started to use experiments in areas such as public health or education, in which extraneous influences are harder to control (e.g., Lind, 1753), they found that the controls used in natural science in the laboratory worked poorly in these new applications. So they developed new methods of dealing with extraneous influence, such as random assignment (Fisher, 1925) or adding a nonrandomized control group (Coover & Angell, 1907). As theoretical and observational experience accumulated across these settings and topics, more sources of bias were identified and more methods were developed to cope with them (Dehue, 2000).

Today, the key feature common to all experiments is still to deliberately vary something so as to discover what happens to something else later, that is, to discover the effects of presumed causes. As laypersons we do this, for example, to assess what happens to our blood pressure if we exercise more, to our weight if we diet less, or to our behavior if we read a self-help book. However, scientific experimentation has developed increasingly specialized substance, language, and tools, including the practice of field experimentation in the social sciences that is the primary focus of this book. This chapter begins to explore these matters by (1) discussing the nature of causation that experiments test, (2) explaining the specialized terminology (e.g., randomized experiments, quasi-experiments) that describes social experiments, (3) introducing the problem of how to generalize causal connections from individual experiments, and (4) briefly situating the experiment within a larger literature on the nature of science.

EXPERIMENTS AND CAUSATION

A sensible discussion of experiments requires both a vocabulary for talking about causation and an understanding of key concepts that underlie that vocabulary.

Defining Cause, Effect, and Causal Relationships

Most people intuitively recognize causal relationships in their daily lives. For instance, you may say that another automobile's hitting yours was a cause of the damage to your car; that the number of hours you spent studying was a cause of your test grades; or that the amount of food a friend eats was a cause of his weight. You may even point to more complicated causal relationships, noting that a low test grade was demoralizing, which reduced subsequent studying, which caused even lower grades. Here the same variable (low grade) can be both a cause and an effect, and there can be a reciprocal relationship between two variables (low grades and not studying) that cause each other.

Despite this intuitive familiarity with causal relationships, a precise definition of cause and effect has eluded philosophers for centuries.1 Indeed, the definitions of terms such as cause and effect depend partly on each other and on the causal relationship in which both are embedded. So the 17th-century philosopher John Locke said: "That which produces any simple or complex idea, we denote by the general name cause, and that which is produced, effect" (1975, p. 324) and also: "A cause is that which makes any other thing, either simple idea, substance, or mode, begin to be; and an effect is that, which had its beginning from some other thing" (p. 325). Since then, other philosophers and scientists have given us useful definitions of the three key ideas (cause, effect, and causal relationship) that are more specific and that better illuminate how experiments work. We would not defend any of these as the true or correct definition, given that the latter has eluded philosophers for millennia; but we do claim that these ideas help to clarify the scientific practice of probing causes.

1. Our analysis reflects the use of the word causation in ordinary language, not the more detailed discussions of cause by philosophers. Readers interested in such detail may consult a host of works that we reference in this chapter, including Cook and Campbell (1979).

Cause

Consider the cause of a forest fire. We know that fires start in different ways: a match tossed from a car, a lightning strike, or a smoldering campfire, for example. None of these causes is necessary because a forest fire can start even when, say, a match is not present. Also, none of them is sufficient to start the fire. After all, a match must stay "hot" long enough to start combustion; it must contact combustible material such as dry leaves; there must be oxygen for combustion to occur; and the weather must be dry enough so that the leaves are dry and the match is not doused by rain. So the match is part of a constellation of conditions without which a fire will not result, although some of these conditions can usually be taken for granted, such as the availability of oxygen. A lighted match is, therefore, what Mackie (1974) called an inus condition: "an insufficient but nonredundant part of an unnecessary but sufficient condition" (p. 62; italics in original). It is insufficient because a match cannot start a fire without the other conditions. It is nonredundant only if it adds something fire-promoting that is uniquely different from what the other factors in the constellation (e.g., oxygen, dry leaves) contribute to starting a fire; after all, it would be harder to say whether the match caused the fire if someone else simultaneously tried starting it with a cigarette lighter. It is part of a sufficient condition to start a fire in combination with the full constellation of factors. But that condition is not necessary because there are other sets of conditions that can also start fires.

A research example of an inus condition concerns a new potential treatment for cancer. In the late 1990s, a team of researchers in Boston headed by Dr. Judah Folkman reported that a new drug called Endostatin shrank tumors by limiting their blood supply (Folkman, 1996). Other respected researchers could not replicate the effect even when using drugs shipped to them from Folkman's lab. Scientists eventually replicated the results after they had traveled to Folkman's lab to learn how to properly manufacture, transport, store, and handle the drug and how to inject it in the right location at the right depth and angle. One observer labeled these contingencies the "in-our-hands" phenomenon, meaning "even we don't know which details are important, so it might take you some time to work it out" (Rowe, 1999, p. 732). Endostatin was an inus condition. It was an insufficient cause by itself, and its effectiveness required it to be embedded in a larger set of conditions that were not even fully understood by the original investigators.

Most causes are more accurately called inus conditions. Many factors are usually required for an effect to occur, but we rarely know all of them and how they relate to each other. This is one reason that the causal relationships we discuss in this book are not deterministic but only increase the probability that an effect will occur (Eells, 1991; Holland, 1994). It also explains why a given causal relationship will occur under some conditions but not universally across time, space, human populations, or other kinds of treatments and outcomes that are more or less related to those studied. To different degrees, all causal relationships are context dependent, so the generalization of experimental effects is always at issue. That is why we return to such generalizations throughout this book.
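The inus logic of the forest-fire example can be checked mechanically with a small truth table. The Python sketch below is only an illustration and is not taken from the book: the factor names and the toy rule in `fire` (an igniter plus dry leaves, oxygen, and dry weather) are assumptions chosen to mirror the example, and real causal constellations are rarely known this completely.

```python
from itertools import product

# Hypothetical factor names mirroring the forest-fire example.
FACTORS = ("match", "lightning", "dry_leaves", "oxygen", "dry_weather")


def fire(match, lightning, dry_leaves, oxygen, dry_weather):
    """Assumed toy rule: an igniter plus the full background constellation."""
    background = dry_leaves and oxygen and dry_weather
    return (match and background) or (lightning and background)


def worlds():
    """Yield every present/absent combination of the factors."""
    for values in product([False, True], repeat=len(FACTORS)):
        yield dict(zip(FACTORS, values))


# Insufficient: some worlds contain a match and still have no fire.
match_is_sufficient = all(fire(**w) for w in worlds() if w["match"])

# Unnecessary: some worlds have a fire without any match.
match_is_necessary = not any(fire(**w) for w in worlds() if not w["match"])

# Nonredundant: in the match constellation, removing the match removes the fire.
constellation = dict(match=True, lightning=False,
                     dry_leaves=True, oxygen=True, dry_weather=True)
match_is_nonredundant = fire(**constellation) and not fire(
    **dict(constellation, match=False))

# Overdetermined case: with lightning also present, the fire occurs with or
# without the match, so it is harder to say that the match caused it.
overdetermined = dict(constellation, lightning=True)
fire_without_match = fire(**dict(overdetermined, match=False))

print(match_is_sufficient, match_is_necessary,
      match_is_nonredundant, fire_without_match)
```

Under these assumptions the match is neither sufficient nor necessary, yet it is a nonredundant part of one sufficient constellation, which is what the inus label asserts. The last check plays the role of the cigarette-lighter case in the text, with lightning standing in as the second igniter.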

Effect

We can better understand what an effect is through a counterfactual model that goes back at least to the 18th-century philosopher David Hume (Lewis, 1973, p. 556). A counterfactual is something that is contrary to fact. In an experiment, we observe what did happen when people received a treatment. The counterfactual is knowledge of what would have happened to those same people if they simultaneously had not received treatment. An effect is the difference between what did happen and what would have happened.

We cannot actually observe a counterfactual. Consider phenylketonuria (PKU), a genetically-based metabolic disease that causes mental retardation unless treated during the first few weeks of life. PKU is the absence of an enzyme that would otherwise prevent a buildup of phenylalanine, a substance toxic to the nervous system. When a restricted phenylalanine diet is begun early and maintained, retardation is prevented. In this example, the cause could be thought of as the underlying genetic defect, as the enzymatic disorder, or as the diet. Each implies a different counterfactual. For example, if we say that a restricted phenylalanine diet caused a decrease in PKU-based mental retardation in infants who are phenylketonuric at birth, the counterfactual is whatever would have happened had these same infants not received a restricted phenylalanine diet. The same logic applies to the genetic or enzymatic version of the cause. But it is impossible for these very same infants simultaneously to both have and not have the diet, the genetic disorder, or the enzyme deficiency.
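The definition of an effect as the difference between what did happen and what would have happened can be written down directly, provided we pretend to know both outcomes for each unit. The following Python sketch makes that pretence explicit. It is an illustration under stated assumptions, not the book's method: the `Unit` class, its field names, and every score are invented, and only a simulation can hold both outcomes for the same unit.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Unit:
    """One study unit with both potential outcomes (knowable only in a simulation)."""
    name: str
    y_if_treated: float
    y_if_untreated: float

    def effect(self) -> float:
        """Effect = outcome with treatment minus outcome without it."""
        return self.y_if_treated - self.y_if_untreated

    def observe(self, treated: bool) -> Tuple[float, Optional[float]]:
        """Return (observed outcome, counterfactual); the counterfactual is never seen."""
        observed = self.y_if_treated if treated else self.y_if_untreated
        return observed, None


# Invented scores, for illustration only.
units = [
    Unit("A", y_if_treated=100.0, y_if_untreated=70.0),
    Unit("B", y_if_treated=95.0, y_if_untreated=60.0),
    Unit("C", y_if_treated=105.0, y_if_untreated=80.0),
]

effects = [u.effect() for u in units]
average_effect = sum(effects) / len(effects)

print(effects, average_effect)
print(units[0].observe(treated=True))
```

The `observe` method is the point of the sketch: once a unit has been treated, real data supply only the first element of the pair, so `effect` cannot be computed for any actual unit and has to be approximated from some other source.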

So a central task for all cause-probing research is to create reasonable approximations to this physically impossible counterfactual. For instance, if it were ethical to do so, we might contrast phenylketonuric infants who were given the diet with other phenylketonuric infants who were not given the diet but who were similar in many ways to those who were (e.g., similar race, gender, age, socioeconomic status, health status). Or we might (if it were ethical) contrast infants who were not on the diet for the first 3 months of their lives with those same infants after they were put on the diet starting in the 4th month. Neither of these approximations is a true counterfactual. In the first case, the individual infants in the treatment condition are different from those in the comparison condition; in the second case, the identities are the same, but time has passed and many changes other than the treatment have occurred to the infants (including permanent damage done by phenylalanine during the first 3 months of life). So two central tasks in experimental design are creating a high-quality but necessarily imperfect source of counterfactual inference and understanding how this source differs from the treatment condition.

This counterfactual reasoning is fundamentally qualitative because causal inference, even in experiments, is fundamentally qualitative (Campbell, 1975; Shadish, 1995a; Shadish & Cook, 1999). However, some of these points have been formalized by statisticians into a special case that is sometimes called Rubin's Causal Model (Holland, 1986; Rubin, 1974, 1977, 1978, 1986). This book is not about statistics, so we do not describe that model in detail (West, Biesanz, & Pitts [2000] do so and relate it to the Campbell tradition). A primary emphasis of Rubin's model is the analysis of cause in experiments, and its basic premises are consistent with those of this book.2 Rubin's model has also been widely used to analyze causal inference in case-control studies in public health and medicine (Holland & Rubin, 1988), in path analysis in sociology (Holland, 1986), and in a paradox that Lord (1967) introduced into psychology (Holland & Rubin, 1983); and it has generated many statistical innovations that we cover later in this book. It is new enough that critiques of it are just now beginning to appear (e.g., Dawid, 2000; Pearl, 2000). What is clear, however, is that Rubin's is a very general model with obvious and subtle implications. Both it and the critiques of it are required material for advanced students and scholars of cause-probing methods.

2. However, Rubin's model is not intended to say much about the matters of causal generalization that we address in this book.

Causal Relationship

How do we know if cause and effect are related? In a classic analysis formalized by the 19th-century philosopher John Stuart Mill, a causal relationship exists if (1) the cause preceded the effect, (2) the cause was related to the effect, and (3) we can find no plausible alternative explanation for the effect other than the cause. These three characteristics mirror what happens in experiments in which (1) we manipulate the presumed cause and observe an outcome afterward; (2) we see whether variation in the cause is related to variation in the effect; and (3) we use various methods during the experiment to reduce the plausibility of other explanations for the effect, along with ancillary methods to explore the plausibility of those we cannot rule out (most of this book is about methods for doing this).

Hence experiments are well-suited to studying causal relationships. No other scientific method regularly matches the characteristics of causal relationships so well. Mill's analysis also points to the weakness of other methods. In many correlational studies, for example, it is impossible to know which of two variables came first, so defending a causal relationship between them is precarious. Understanding this logic of causal relationships and how its key terms, such as cause and effect, are defined helps researchers to critique cause-probing studies.

Causation, Correlation, and Confounds

A well-known maxim in research is: Correlation does not prove causation. This is so because we may not know which variable came first nor whether alternative explanations for the presumed effect exist. For example, suppose income and education are correlated. Do you have to have a high income before you can afford to pay for education, or do you first have to get a good education before you can get a better paying job? Each possibility may be true, and so both need investigation. But until those investigations are completed and evaluated by the scholarly community, a simple correlation does not indicate which variable came first. Correlations also do little to rule out alternative explanations for a relationship between two variables such as education and income. That relationship may not be causal at all but rather due to a third variable (often called a confound), such as intelligence or family socioeconomic status, that causes both high education and high income. For example, if high intelligence causes success in education and on the job, then intelligent people would have correlated education and incomes, not because education causes income (or vice versa) but because both would be caused by intelligence. Thus a central task in the study of experiments is identifying the different kinds of confounds that can operate in a particular research area and understanding the strengths and weaknesses associated with various ways of dealing with them.
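The education and income example can be simulated to show how a confound alone produces a correlation. The Python sketch below is a hedged illustration, not an analysis from the book: the variable names, the unit coefficients, the noise levels, the sample size, and the seed are all assumptions, and the model deliberately contains no causal path between education and income in either direction.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Assumed model: the confound drives both variables; neither drives the other.
intelligence = [rng.gauss(0, 1) for _ in range(n)]
education = [iq + rng.gauss(0, 1) for iq in intelligence]
income = [iq + rng.gauss(0, 1) for iq in intelligence]

r_raw = pearson(education, income)

# Because this is a simulation, the confound's contribution is known exactly
# and can be subtracted; real studies must estimate it instead.
edu_resid = [e - iq for e, iq in zip(education, intelligence)]
inc_resid = [i - iq for i, iq in zip(income, intelligence)]
r_adjusted = pearson(edu_resid, inc_resid)

print(round(r_raw, 2), round(r_adjusted, 2))
```

With these settings the raw correlation should come out near 0.5 and the adjusted one near zero, even though education never affects income in the model. In practice the confound is seldom measured this cleanly, which is one reason the text treats identifying confounds as a central task.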

Manipulable and Nonmanipulable Causes

In the intuitive understanding of experimentation that most people have, it makes sense to say, "Let's see what happens if we require welfare recipients to work"; but it makes no sense to say, "Let's see what happens if I change this adult male into a three-year-old girl." And so it is also in scientific experiments. Experiments explore the effects of things that can be manipulated, such as the dose of a medicine, the amount of a welfare check, the kind or amount of psychotherapy, or the number of children in a classroom. Nonmanipulable events (e.g., the explosion of a supernova) or attributes (e.g., people's ages, their raw genetic material, or their biological sex) cannot be causes in experiments because we cannot deliberately vary them to see what then happens. Consequently, most scientists and philosophers agree that it is much harder to discover the effects of nonmanipulable causes.

To be clear, we are not arguing that all causes must be manipulable, only that experimental causes must be so. Many variables that we correctly think of as causes are not directly manipulable. Thus it is well established that a genetic defect causes PKU even though that defect is not directly manipulable. We can investigate such causes indirectly in nonexperimental studies or even in experiments by manipulating biological processes that prevent the gene from exerting its influence, as through the use of diet to inhibit the gene's biological consequences. Both the nonmanipulable gene and the manipulable diet can be viewed as causes: both covary with PKU-based retardation, both precede the retardation, and it is possible to explore other explanations for the gene's and the diet's effects on cognitive functioning. However, investigating the manipulable diet as a cause has two important advantages over considering the nonmanipulable genetic problem as a cause. First, only the diet provides a direct action to solve the problem; and second, we will see that studying manipulable agents allows a higher quality source of counterfactual inference through such methods as random assignment. When individuals with the nonmanipulable genetic problem are compared with persons without it, the latter are likely to be different from the former in many ways other than the genetic defect. So the counterfactual inference about what would have happened to those with the PKU genetic defect is much more difficult to make.
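The advantage of a manipulable cause that can be randomly assigned, over a comparison of groups that already differ, can be seen in a short simulation. This Python sketch is an assumption-laden illustration rather than anything from the book: the outcome model, the constant `TRUE_EFFECT` of 5.0, the single `background` characteristic, and the rule that units with above-average background end up in the treated group are all invented for the example.

```python
import random

TRUE_EFFECT = 5.0        # assumed constant treatment effect
rng = random.Random(1)   # fixed seed so the run is repeatable
n = 10000


def outcome(background, treated):
    """Assumed model: background matters, treatment adds TRUE_EFFECT, plus noise."""
    return 2.0 * background + TRUE_EFFECT * treated + rng.gauss(0, 1)


def mean(values):
    return sum(values) / len(values)


def group_difference(assignments, backgrounds):
    """Mean outcome of treated units minus mean outcome of untreated units."""
    treated, untreated = [], []
    for is_treated, b in zip(assignments, backgrounds):
        (treated if is_treated else untreated).append(outcome(b, is_treated))
    return mean(treated) - mean(untreated)


backgrounds = [rng.gauss(0, 1) for _ in range(n)]

# Pre-existing groups: membership depends on the background characteristic.
pre_existing = [b > 0 for b in backgrounds]

# Random assignment: membership depends only on a coin flip.
randomized = [rng.random() < 0.5 for _ in range(n)]

diff_pre_existing = group_difference(pre_existing, backgrounds)
diff_randomized = group_difference(randomized, backgrounds)

print(round(diff_pre_existing, 2), round(diff_randomized, 2))
```

Under these assumptions the pre-existing groups overstate the effect because they also differ in background, whereas the randomized comparison should land close to 5.0. The randomized control group is still an imperfect stand-in for the counterfactual, but its differences from the treated group are due to chance rather than to how people came to be in each group.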

Nonetheless, nonmanipulable causes should be studied using whatever means are available and seem useful. This is true because such causes eventually help us to find manipulable agents that can then be used to ameliorate the problem at hand. The PKU example illustrates this. Medical researchers did not discover how to treat PKU effectively by first trying different diets with retarded children. They first discovered the nonmanipulable biological features of retarded children affected with PKU, finding abnormally high levels of phenylalanine and its associated metabolic and genetic problems in those children. Those findings pointed in certain ameliorative directions and away from others, leading scientists to experiment with treatments they thought might be effective and practical. Thus the new diet resulted from a sequence of studies with different immediate purposes, with different forms, and with varying degrees of uncertainty reduction. Some were experimental, but others were not.

Further, analogue experiments can sometimes be done on nonmanipulable causes, that is, experiments that manipulate an agent that is similar to the cause of interest. Thus we cannot change a person's race, but we can chemically induce skin pigmentation changes in volunteer individuals, though such analogues do not match the reality of being Black every day and everywhere for an entire life. Similarly, past events, which are normally nonmanipulable, sometimes constitute a natural experiment that may even have been randomized, as when the 1970 Vietnam-era draft lottery was used to investigate a variety of outcomes (e.g., Angrist, Imbens, & Rubin, 1996a; Notz, Staw, & Cook, 1971).

Although experimenting on manipulable causes makes the job of discovering their effects easier, experiments are far from perfect means of investigating causes. Sometimes experiments modify the conditions in which testing occurs in a way that reduces the fit between those conditions and the situation to which the results are to be generalized. Also, knowledge of the effects of manipulable causes tells nothing about how and why those effects occur. Nor do experiments answer many other questions relevant to the real world, for example, which questions are worth asking, how strong the need for treatment is, how a cause is distributed through society, whether the treatment is implemented with theoretical fidelity, and what value should be attached to the experimental results.

In addition, in experiments, we first manipulate a treatment and only then observe its effects; but in some other studies we first observe an effect, such as AIDS, and then search for its cause, whether manipulable or not. Experiments cannot help us with that search. Scriven (1976) likens such searches to detective work in which a crime has been committed (e.g., a robbery), the detectives observe a particular pattern of evidence surrounding the crime (e.g., the robber wore a baseball cap and a distinct jacket and used a certain kind of gun), and then the detectives search for criminals whose known method of operating (their modus operandi, or m.o.) includes this pattern. A criminal whose m.o. fits that pattern of evidence then becomes a suspect to be investigated further. Epidemiologists use a similar method, the case-control design (Ahlbom & Norell, 1990), in which they observe a particular health outcome (e.g., an increase in brain tumors) that is not seen in another group and then attempt to identify associated causes (e.g., increased cell phone use). Experiments do not aspire to answer all the kinds of questions, not even all the types of causal questions, that social scientists ask.

Causal Description and Causal Explanation

The unique strength of experimentation is in describing the consequences attributable to deliberately varying a treatment. We call this causal description. In contrast, experiments do less well in clarifying the mechanisms through which and the conditions under which that causal relationship holds, which is what we call causal explanation. For example, most children very quickly learn the descriptive causal relationship between flicking a light switch and obtaining illumination in a room. However, few children (or even adults) can fully explain why that light goes on. To do so, they would have to decompose the treatment (the act of flicking a light switch) into its causally efficacious features (e.g., closing an insulated circuit) and its nonessential features (e.g., whether the switch is thrown by hand or a motion detector). They would have to do the same for the effect (either incandescent or fluorescent light can be produced, but light will still be produced whether the light fixture is recessed or not). For full explanation, they would then have to show how the causally efficacious parts of the treatment influence the causally affected parts of the outcome through identified mediating processes (e.g., the passage of electricity through the circuit, the excitation of photons).3 Clearly, the cause of the light going on is a complex cluster of many factors. For those philosophers who equate cause with identifying that constellation of variables that necessarily, inevitably, and infallibly results in the effect (Beauchamp, 1974), talk of cause is not warranted until everything of relevance is known. For them, there is no causal description without causal explanation. Whatever the philosophic merits of their position, though, it is not practical to expect much current social science to achieve such complete explanation.

3. However, the full explanation a physicist would offer might be quite different from this electrician's explanation, perhaps invoking the behavior of subparticles. This difference indicates just how complicated is the notion of explanation and how it can quickly become quite complex once one shifts levels of analysis.

The practical importance of causal explanation is brought home when the switch fails to make the light go on and when replacing the light bulb (another easily learned manipulation) fails to solve the problem. Explanatory knowledge then offers clues about how to fix the problem, for example, by detecting and repairing a short circuit. Or if we wanted to create illumination in a place without lights and we had explanatory knowledge, we would know exactly which features of the cause-and-effect relationship are essential to create light and which are irrelevant. Our explanation might tell us that there must be a source of electricity but that that source could take several different molar forms, such as a battery, a generator, a windmill, or a solar array. There must also be a switch mechanism to close a circuit, but this could also take many forms, including the touching of two bare wires or even a motion detector that trips the switch when someone enters the room. So causal explanation is an important route to the generalization of causal descriptions because it tells us which features of the causal relationship are essential to transfer to other situations.

This benefit of causal explanation helps elucidate its priority and prestige in all sciences and helps explain why, once a novel and important causal relationship is discovered, the bulk of basic scientific effort turns toward explaining why and how it happens. Usually, this involves decomposing the cause into its causally effective parts, decomposing the effect into its causally affected parts, and identifying the processes through which the effective causal parts influence the causally affected outcome parts.

These examples also show the close parallel between descriptive and explanatory causation and molar and molecular causation.4 Descriptive causation usually concerns simple bivariate relationships between molar treatments and molar outcomes, molar here referring to a package that consists of many different parts. For instance, we may find that psychotherapy decreases depression, a simple descriptive causal relationship between a molar treatment package and a molar outcome. However, psychotherapy consists of such parts as verbal interactions, placebo-generating procedures, setting characteristics, time constraints, and payment for services. Similarly, many depression measures consist of items pertaining to the physiological, cognitive, and affective aspects of depression. Explanatory causation breaks these molar causes and effects into their molecular parts so as to learn, say, that the verbal interactions and the placebo features of therapy both cause changes in the cognitive symptoms of depression, but that payment for services does not do so even though it is part of the molar treatment package.

4. By molar, we mean something taken as a whole rather than in parts. An analogy is to physics, in which molar might refer to the properties or motions of masses, as distinguished from those of molecules or atoms that make up those masses.

If experiments are less able to provide this highly prized explanatory causal knowledge, why are experiments so central to science, especially to basic social science, in which theory and explanation are often the coin of the realm? The answer is that the dichotomy between descriptive and explanatory causation is less clear in scientific practice than in abstract discussions about causation. First, many causal explanations consist of chains of descriptive causal links in which one event causes the next. Experiments help to test the links in each chain. Second, experiments help distinguish between the validity of competing explanatory theories, for example, by testing competing mediating links proposed by those theories. Third, some experiments test whether a descriptive causal relationship varies in strength or direction under Condition A versus Condition B (then the condition is a moderator variable that explains the conditions under which the effect holds). Fourth, some experiments add quantitative or qualitative observations of the links in the explanatory chain (mediator variables) to generate and study explanations for the descriptive causal effect.

Experiments are also prized in applied areas of social science, in which the identification of practical solutions to social problems has as great or even greater priority than explanations of those solutions. After all, explanation is not always required for identifying practical solutions. Lewontin (1997) makes this point about the Human Genome Project, a coordinated multibillion-dollar research program to map the human genome that it is hoped eventually will clarify the genetic causes of diseases. Lewontin is skeptical about aspects of this search:

What is involved here is the difference between explanation and intervention. Many disorders can be explained by the failure of the organism to make a normal protein, a failure that is the consequence of a gene mutation. But intervention requires that the normal protein be provided at the right place in the right cells, at the right time and in the right amount, or else that an alternative way be found to provide normal cellular function. What is worse, it might even be necessary to keep the abnormal protein away from the cells at critical moments. None of these objectives is served by knowing the DNA sequence of the defective gene. (Lewontin, 1997, p. 29)

Practical applications are not immediately revealed by theoretical advance. Instead, to reveal them may take decades of follow-up work, including tests of simple descriptive causal relationships. The same point is illustrated by the cancer drug Endostatin, discussed earlier. Scientists knew the action of the drug occurred through cutting off tumor blood supplies; but to successfully use the drug to treat cancers in mice required administering it at the right place, angle, and depth, and those details were not part of the usual scientific explanation of the drug's effects.

In the end, then, causal descriptions and causal explanations are in delicate balance in experiments. What experiments do best is to improve causal descriptions; they do less well at explaining causal relationships. But most experiments can be designed to provide better explanations than is typically the case today. Further, in focusing on causal descriptions, experiments often investigate molar events that may be less strongly related to outcomes than are more molecular mediating processes, especially those processes that are closer to the outcome in the explanatory chain. However, many causal descriptions are still dependable and strong enough to be useful, to be worth making the building blocks around which important policies and theories are created. Just consider the dependability of such causal statements as that school desegregation causes white flight, or that outgroup threat causes ingroup cohesion, or that psychotherapy improves mental health, or that diet reduces the retardation due to PKU. Such dependable causal relationships are useful to policymakers, practitioners, and scientists alike.

MODERN DESCRIPTIONS OF EXPERIMENTS

Some of the terms used in describing modern experimentation (see Table 1.1) are unique, clearly defined, and consistently used; others are blurred and inconsistently used. The common attribute in all experiments is control of treatment (though control can take many different forms). So Mosteller (1990, p. 225) writes, "In an experiment the investigator controls the application of the treatment"; and Yaremko, Harari, Harrison, and Lynn (1986, p. 72) write, "one or more independent variables are manipulated to observe their effects on one or more dependent variables." However, over time many different experimental subtypes have developed in response to the needs and histories of different sciences (Winston, 1990; Winston & Blais, 1996).

TABLE 1.1 The Vocabulary of Experiments

Experiment: A study in which an intervention is deliberately introduced to observe its effects.

Randomized Experiment: An experiment in which units are assigned to receive the treatment or an alternative condition by a random process such as the toss of a coin or a table of random numbers.

Quasi-Experiment: An experiment in which units are not assigned to conditions randomly.

Natural Experiment: Not really an experiment because the cause usually cannot be manipulated; a study that contrasts a naturally occurring event such as an earthquake with a comparison condition.

Correlational Study: Usually synonymous with nonexperimental or observational study; a study that simply observes the size and direction of a relationship among variables.

Randomized Experiment

The most clearly described variant is the randomized experiment, widely credited to Sir Ronald Fisher (1925, 1926). It was first used in agriculture but later spread to other topic areas because it promised control over extraneous sources of variation without requiring the physical isolation of the laboratory. Its distinguishing feature is clear and important: the various treatments being contrasted (including no treatment at all) are assigned to experimental units5 by chance, for example, by coin toss or use of a table of random numbers. If implemented correctly, random assignment creates two or more groups of units that are probabilistically similar to each other on the average.6 Hence, any outcome differences that are observed between those groups at the end of a study are likely to be due to treatment, not to differences between the groups that already existed at the start of the study. Further, when certain assumptions are met, the randomized experiment yields an estimate of the size of a treatment effect that has desirable statistical properties, along with estimates of the probability that the true effect falls within a defined confidence interval. These features of experiments are so highly prized that in a research area such as medicine the randomized experiment is often referred to as the gold standard for treatment outcome research.7

Closely related to the randomized experiment is a more ambiguous and inconsistently used term, true experiment. Some authors use it synonymously with randomized experiment (Rosenthal & Rosnow, 1991). Others use it more generally to refer to any study in which an independent variable is deliberately manipulated (Yaremko et al., 1986) and a dependent variable is assessed. We shall not use the term at all given its ambiguity and given that the modifier true seems to imply restricted claims to a single correct experimental method.
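The claim that random assignment yields groups that are probabilistically similar on the average can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the baseline trait, its distribution, the volunteering probabilities, and the function name `baseline_gaps` are all hypothetical and do not come from the text. It contrasts a simulated coin toss with self-selection in which units with higher baseline scores volunteer more often.

```python
import random
import statistics


def baseline_gaps(n_units=1000, seed=0):
    """Return the treated-minus-control gap in a pre-existing trait
    under random assignment and under self-selection (both simulated)."""
    rng = random.Random(seed)
    # Hypothetical baseline trait measured before any treatment is given.
    baseline = [rng.gauss(50, 10) for _ in range(n_units)]

    def gap(in_treatment):
        treated = [b for b, t in zip(baseline, in_treatment) if t]
        control = [b for b, t in zip(baseline, in_treatment) if not t]
        return statistics.mean(treated) - statistics.mean(control)

    # Random assignment: a simulated coin toss for every unit.
    coin_toss = [rng.random() < 0.5 for _ in baseline]
    # Self-selection (assumed rule): high scorers volunteer more often.
    volunteered = [rng.random() < (0.8 if b > 50 else 0.2) for b in baseline]

    return {"random": gap(coin_toss), "self_selected": gap(volunteered)}


gaps = baseline_gaps()
print(gaps)
```

Under these assumptions the gap for the coin-toss groups should be close to zero, while the self-selected groups already differ before any treatment occurs. Any single random assignment can still produce some imbalance by chance, which is why the similarity holds only probabilistically.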

Quasi-Experiment

Much of this book focuses on a class of designs that Campbell and Stanley (1963) popularized as quasi-experiments.8 Quasi-experiments share with all other experiments a similar purpose, to test descriptive causal hypotheses about manipulable causes, as well as many structural details, such as the frequent presence of control groups and pretest measures, to support a counterfactual inference about what would have happened in the absence of treatment. But, by definition, quasi-experiments lack random assignment. Assignment to conditions is by means of self-selection, by which units choose treatment for themselves, or by means of administrator selection, by which teachers, bureaucrats, legislators, therapists, physicians, or others decide which persons should get which treatment. However, researchers who use quasi-experiments may still have considerable control over selecting and scheduling measures, over how nonrandom assignment is executed, over the kinds of comparison groups with which treatment groups are compared, and over some aspects of how treatment is scheduled. As Campbell and Stanley note:

There are many natural social settings in which the research person can introduce something like experimental design into his scheduling of data collection procedures (e.g., the when and to whom of measurement), even though he lacks the full control over the scheduling of experimental stimuli (the when and to whom of exposure and the ability to randomize exposures) which makes a true experiment possible. Collectively, such situations can be regarded as quasi-experimental designs. (Campbell & Stanley, 1963, p. 34)

5. Units can be people, animals, time periods, institutions, or almost anything else. Typically in field experimentation they are people or some aggregate of people, such as classrooms or work sites. In addition, a little thought shows that random assignment of units to treatments is the same as assignment of treatments to units, so these phrases are frequently used interchangeably.

6. The word probabilistically is crucial, as is explained in more detail in a later chapter.

7. Although the term randomized experiment is used this way consistently across many fields and in this book, statisticians sometimes use the closely related term random experiment in a different way to indicate experiments for which the outcome cannot be predicted with certainty (e.g., Hogg & Tanis, 1988).

8. Campbell (1957) first called these compromise designs but changed terminology very quickly; Rosenbaum (1995a) and Cochran (1965) refer to these as observational studies, a term we avoid because many people use it to refer to correlational or nonexperimental studies, as well. Greenberg and Shroder (1997) use quasi-experiment to refer to studies that randomly assign groups (e.g., communities) to conditions, but we would consider these group-randomized experiments (Murray, 1998).

In quasi-experiments,

the cause is manipulable and occurs before the effect is measured. However, quasi-experimental design features usually create less compelling support for counterfactual inferences. For example, quasi-experimental control groups may differ from the treatment condition in many systematic (nonrandom) ways other than the presence of the treatment. Many of these ways could be alternative explanations for the observed effect, and so researchers have to worry about ruling them out in order to get a more valid estimate of the treatment effect. By contrast, with random assignment the researcher does not have to think as much about all these alternative explanations. If correctly done, random assignment makes most of the alternatives less likely as causes of the observed treatment effect at the start of the study.

In quasi-experiments, the researcher has to enumerate alternative explanations one by one, decide which are plausible, and then use logic, design, and measurement to assess whether each one is operating in a way that might explain any observed effect. The difficulties are that these alternative explanations are never completely enumerable in advance, that some of them are particular to the context being studied, and that the methods needed to eliminate them from contention will vary from alternative to alternative and from study to study. For example, suppose two nonrandomly formed groups of children are studied, a volunteer treatment group that gets a new reading program and a control group of nonvolunteers who do not get it. If the treatment group does better, is it because of treatment or because the cognitive development of the volunteers was increasing more rapidly even before treatment began? (In a randomized experiment, maturation rates would have been probabilistically equal in both groups.) To assess this alternative, the researcher might add multiple pretests to reveal maturational trend before the treatment, and then compare that trend with the trend after treatment.

Another alternative explanation might be that the nonrandom control group included more disadvantaged children who had less access to books in their homes or who had parents who read to them less often. (In a randomized experiment, both groups would have had similar proportions of such children.) To assess this alternative, the experimenter may measure the number of books at home, parental time spent reading to children, and perhaps trips to libraries. Then the researcher would see if these variables differed across treatment and control groups in the hypothesized direction that could explain the observed treatment effect. Obviously, as the number of plausible alternative explanations increases, the design of the quasi-experiment becomes more intellectually demanding and complex, especially because we are never certain we have identified all the alternative explanations. The efforts of the quasi-experimenter start to look like attempts to bandage a wound that would have been less severe if random assignment had been used initially.
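The multiple-pretest check described for the reading program example can be sketched numerically. This is a minimal illustration, not a procedure given in the text: the scores are invented, the measurement occasions are assumed to be equally spaced, maturation is assumed to be linear, and the names `slope` and `gain_beyond_trend` are hypothetical.

```python
def slope(values):
    """Least-squares slope of values measured at equally spaced times 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    numerator = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    return numerator / denominator


def gain_beyond_trend(pretests, posttest):
    """Posttest minus the value projected by continuing the pretest trend one step."""
    projected = pretests[-1] + slope(pretests)
    return posttest - projected


# Invented group means at three pretests and one posttest.
volunteers_pre, volunteers_post = [40.0, 44.0, 48.0], 52.0
nonvolunteers_pre, nonvolunteers_post = [40.0, 41.0, 42.0], 43.0

raw_difference = volunteers_post - nonvolunteers_post
volunteer_gain = gain_beyond_trend(volunteers_pre, volunteers_post)
nonvolunteer_gain = gain_beyond_trend(nonvolunteers_pre, nonvolunteers_post)
print(raw_difference, volunteer_gain, nonvolunteer_gain)
```

In these invented data the volunteers score higher at posttest, yet neither group exceeds what its own pretest trend projects, so differential maturation would remain a plausible explanation for the raw difference. With a single pretest that pattern could not be seen.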

The ruling out of alternative hypotheses is closely related to a falsificationist logic popularized by Popper (1959). Popper noted how hard it is to be sure that a general conclusion (e.g., all swans are white) is correct based on a limited set of observations (e.g., all the swans I've seen were white). After all, future observations may change (e.g., some day I may see a black swan). So confirmation is logically difficult. By contrast, observing a disconfirming instance (e.g., a black swan) is sufficient, in Popper's view, to falsify the general conclusion that all swans are white. Accordingly, Popper urged scientists to try deliberately to falsify the conclusions they wish to draw rather than only to seek information corroborating them. Conclusions that withstand falsification are retained in scientific books or journals and treated as plausible until better evidence comes along. Quasi-experimentation is falsificationist in that it requires experimenters to identify a causal claim and then to generate and examine plausible alternative explanations that might falsify the claim.

However, such falsification can never be as definitive as Popper hoped. Kuhn (1962) pointed out that falsification depends on two assumptions that can never be fully tested. The first is that the causal claim is perfectly specified. But that is never the case. So many features of both the claim and the test of the claim are debatable, for example, which outcome is of interest, how it is measured, the conditions of treatment, who needs treatment, and all the many other decisions that researchers must make in testing causal relationships. As a result, disconfirmation often leads theorists to respecify part of their causal theories. For example, they might now specify novel conditions that must hold for their theory to be true and that were derived from the apparently disconfirming observations. Second, falsification requires measures that are perfectly valid reflections of the theory being tested. However, most philosophers maintain that all observation is theory-laden. It is laden both with intellectual nuances specific to the partially unique scientific understandings of the theory held by the individual or group devising the test and also with the experimenters' extrascientific wishes, hopes, aspirations, and broadly shared cultural assumptions and understandings. If measures are not independent of theories, how can they provide independent theory tests, including tests of causal theories? If the possibility of theory-neutral observations is denied, with them disappears the possibility of definitive knowledge both of what seems to confirm a causal claim and of what seems to disconfirm it.

Nonetheless, a fallibilist version of falsification is possible. It argues that studies of causal hypotheses can still usefully improve understanding of general trends despite ignorance of all the contingencies that might pertain to those trends. It argues that causal studies are useful even if we have to respecify the initial hypothesis repeatedly to accommodate new contingencies and new understandings. After all, those respecifications are usually minor in scope; they rarely involve wholesale overthrowing of general trends in favor of completely opposite trends.

Fallibilist falsification also assumes that theory-neutral observation is impossible but that observations can approach a more factlike status when they have been repeatedly made across different theoretical conceptions of a construct, across multiple kinds of measurements, and at multiple times. It also assumes that observations are imbued with multiple theories, not just one, and that different operational procedures do not share the same multiple theories. As a result, observations that repeatedly occur despite different theories being built into them have a special factlike status even if they can never be fully justified as completely theory-neutral facts.

In summary, then, fallible falsification is more than just seeing whether observations disconfirm a prediction. It involves discovering and judging the worth of ancillary assumptions about the restricted specificity of the causal hypothesis under test and also about the heterogeneity of theories, viewpoints, settings, and times built into the measures of the cause and effect and of any contingencies modifying their relationship.

It is neither feasible nor desirable to rule out all possible alternative interpretations of a causal relationship. Instead, only plausible alternatives constitute the major focus. This serves partly to keep matters tractable because the number of possible alternatives is endless. It also recognizes that many alternatives have no serious empirical or experiential support and do not warrant special attention. However, the lack of support can sometimes be deceiving. For example, the cause of stomach ulcers was long thought to be a combination of lifestyle (e.g., stress) and excess acid production. Few scientists seriously thought that ulcers were caused by a pathogen (e.g., virus, germ, bacteria) because it was assumed that an acid-filled stomach would destroy all living organisms. However, in 1982 Australian researchers Barry Marshall and Robin Warren discovered spiral-shaped bacteria, later named Helicobacter pylori (H. pylori), in ulcer patients' stomachs. With this discovery, the previously possible but implausible became plausible. By 1994, a U.S. National Institutes of Health Consensus Development Conference concluded that H. pylori was the major cause of most peptic ulcers. So labeling rival hypotheses as plausible depends not just on what is logically possible but on social consensus, shared experience, and empirical data.

Because such factors are often context specific, different substantive areas develop their own lore about which alternatives are important enough to need to be controlled, even developing their own methods for doing so. In early psychology, for example, a control group with pretest observations was invented to control for the plausible alternative explanation that, by giving practice in answering test content, pretests would produce gains in performance even in the absence of a treatment effect (Coover & Angell, 1907). Thus the focus on plausibility is a two-edged sword: it reduces the range of alternatives to be considered in quasi-experimental work, yet it also leaves the resulting causal inference vulnerable to the discovery that an implausible-seeming alternative may later emerge as a more likely causal agent.

Natural Experiment

The term natural experiment describes a naturally occurring contrast between a treatment and a comparison condition (Fagan, 1990; Meyer, 1995; Zeisel, 1973). Often the treatments are not even potentially manipulable, as when researchers retrospectively examined whether earthquakes in California caused drops in property values (Brunette, 1995; Murdoch, Singh, & Thayer, 1993). Yet plausible causal inferences about the effects of earthquakes are easy to construct and defend. After all, the earthquakes occurred before the observations on property values, and it is easy to see whether earthquakes are related to property values. A useful source of counterfactual inference can be constructed by examining property values in the same locale before the earthquake or by studying similar locales that did not experience an earthquake during the same time. If property values dropped right after the earthquake in the earthquake condition but not in the comparison condition, it is difficult to find an alternative explanation for that drop.

Natural experiments have recently gained a high profile in economics. Before the 1990s economists had great faith in their ability to produce valid causal inferences through statistical adjustments for initial nonequivalence between treatment and control groups. But two studies on the effects of job training programs showed that those adjustments produced estimates that were not close to those generated from a randomized experiment and were unstable across tests of the model's sensitivity (Fraker & Maynard, 1987; LaLonde, 1986). Hence, in their search for alternative methods, many economists came to do natural experiments, such as the economic study of the effects that occurred in the Miami job market when many prisoners were released from Cuban jails and allowed to come to the United States (Card, 1990). They assume that the release (or the timing of an earthquake) is independent of the ongoing processes that usually affect unemployment rates (or housing values). Later we explore the validity of this assumption; of its desirability there can be little question.
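The two counterfactual comparisons described for the earthquake example (the same locale before the event, and similar locales without one) can be combined in a simple difference-in-differences calculation. The sketch below is an illustration only: it is not the analysis used in the cited studies, the property values are invented, and the function name `difference_in_differences` is hypothetical. It also relies on the assumption discussed above, that the timing of the event is independent of other processes affecting property values.

```python
from statistics import mean


def difference_in_differences(event_before, event_after,
                              comparison_before, comparison_after):
    """Change in the event locale minus change in the comparison locale."""
    change_event = mean(event_after) - mean(event_before)
    change_comparison = mean(comparison_after) - mean(comparison_before)
    return change_event - change_comparison


# Invented property values (in thousands) for a few sampled homes.
quake_before = [200.0, 210.0, 190.0]
quake_after = [170.0, 180.0, 160.0]
similar_before = [195.0, 205.0, 200.0]
similar_after = [198.0, 202.0, 200.0]

effect = difference_in_differences(
    quake_before, quake_after, similar_before, similar_after
)
print(effect)
```

Here the comparison locale supplies the counterfactual change that would have been expected without an earthquake. If the independence assumption fails, for instance because the two locales were already on different price trends, the estimate would mix the effect of the earthquake with those other processes.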

Nonexperimental Designs

The terms correlational design, passive observational design, and nonexperimental design refer to situations in which a presumed cause and effect are identified and measured but in which other structural features of experiments are missing. Random assignment is not part of the design, nor are such design elements as pretests and control groups from which researchers might construct a useful counterfactual inference. Instead, reliance is placed on measuring alternative explanations individually and then statistically controlling for them. In cross-sectional studies in which all the data are gathered on the respondents at one time, the researcher may not even know if the cause precedes the effect. When these studies are used for causal purposes, the missing design features can be problematic unless much is already known about which alternative interpretations are plausible, unless those that are plausible can be validly measured, and unless the substantive model used for statistical adjustment is well-specified. These are difficult conditions to meet in the real world of research practice, and therefore many commentators doubt the potential of such designs to support strong causal inferences in most cases.

EXPERIMENTS AND THE GENERALIZATION OF CAUSAL CONNECTIONS

The strength of experimentation is its ability to illuminate causal inference. The weakness of experimentation is doubt about the extent to which that causal relationship generalizes. We hope that an innovative feature of this book is its focus on generalization. Here we introduce the general issues that are expanded in later chapters.

Most Experiments Are Highly Local But Have General Aspirations

Most experiments are highly localized and particularistic. They are almost always conducted in a restricted range of settings, often just one, with a particular version of one type of treatment rather than, say, a sample of all possible versions. Usually they have several measures, each with theoretical assumptions that are different from those present in other measures, but far from a complete set of all possible measures. Each experiment nearly always uses a convenient sample of people rather than one that reflects a well-described population; and it will inevitably be conducted at a particular point in time that rapidly becomes history.

Yet readers of experimental results are rarely concerned with what happened in that particular, past, local study. Rather, they usually aim to learn either about theoretical constructs of interest or about a larger policy. Theorists often want to

connect experimental results to theories with broad conceptual applicability, which requires generalization at the linguistic level of constructs rather than at the level of the operations used to represent these constructs in a given experiment. They nearly always want to generalize to more people and settings than are represented in a single experiment. Indeed, the value assigned to a substantive theory usually depends on how broad a range of phenomena the theory covers. Similarly, policymakers may be interested in whether a causal relationship would hold (probabilistically) across the many sites at which it would be implemented as a policy, an inference that requires generalization beyond the original experimental study context. Indeed, all human beings probably value the perceptual and cognitive stability that is fostered by generalizations. Otherwise, the world might appear as a buzzing cacophony of isolated instances requiring constant cognitive processing that would overwhelm our limited capacities.

In defining generalization as a problem, we do not assume that more broadly applicable results are always more desirable (Greenwood, 1989). For example, physicists who use particle accelerators to discover new elements may not expect that it would be desirable to introduce such elements into the world. Similarly, social scientists sometimes aim to demonstrate that an effect is possible and to understand its mechanisms without expecting that the effect can be produced more generally. For instance, when a "sleeper effect" occurs in an attitude change study involving persuasive communications, the implication is that change is manifest after a time delay but not immediately so. The circumstances under which this effect occurs turn out to be quite limited and unlikely to be of any general interest other than to show that the theory predicting it (and many other ancillary theories) may not be wrong (Cook, Gruder, Hennigan, & Flay, 1979). Experiments that demonstrate limited generalization may be just as valuable as those that demonstrate broad generalization.

Nonetheless, a conflict seems to exist between the localized nature of the causal knowledge that individual experiments provide and the more generalized causal goals that research aspires to attain. Cronbach and his colleagues (Cronbach et al., 1980; Cronbach, 1982) have made this argument most forcefully, and their works have contributed much to our thinking about causal generalization. Cronbach noted that each experiment consists of units that receive the experiences being contrasted, of the treatments themselves, of observations made on the units, and of the settings in which the study is conducted. Taking the first letter from each of these four words, he defined the acronym utos to refer to the "instances on which data are collected" (Cronbach, 1982, p. 78), that is, to the actual people, treatments, measures, and settings that were sampled in the experiment. He then defined two problems of generalization: (1) generalizing to the "domain about which [the] question is asked" (p. 79), which he called UTOS; and (2) generalizing to "units, treatments, variables, and settings not directly observed" (p. 83), which he called *UTOS.9

9. We oversimplify Cronbach's presentation here for pedagogical reasons. For example, Cronbach only used capital S, not small s, so that his system referred only to utoS, not utos. He offered diverse and not always consistent definitions of UTOS and *UTOS, in particular. And he does not use the word generalization in the broad way we do here.

I20 I 1. EXPERIMENTS AND GENERALIZED CAUSAL

INFERENCEOur theory of causal

generalization,

outlined below and

presented

in more de-tail in Chapters LL through 13, melds Cronbach's thinking

with our own

ideasabout

generalization

from previous works

(Cook,

1990, t99t;

Cook 6c Camp-bell,1979), creating a theory that is different

in modest ways

from both of thesepredecessors.

Our theory

is influenced by Cronbach's work in

two ways.

First, wefollow him by describing experiments consistently

throughout

this book as

con-sisting of the elements of units, treatments, observations,

and settingsrlo

thoughwe frequently substitute

persons

for units

given

tfield experimentation

isconducted with humans as

participants.

:We

hat most

also

often

substitute outcome

ob-seruations

given

the centrality of observations

about outcome

when examiningcausal relationships.

Second, we acknowledge that researchers are often interested in two kinds of generalization about each of these five elements, and that these two types are inspired by, but not identical to, the two kinds of generalization that Cronbach defined. We call these construct validity generalizations (inferences about the constructs that research operations represent) and external validity generalizations (inferences about whether the causal relationship holds over variation in persons, settings, treatment, and measurement variables).

Construct Validity: Causal Generalization as Representation

The first causal generalization problem concerns how to go from the particular units, treatments, observations, and settings on which data are collected to the higher order constructs these instances represent. These constructs are almost always couched in terms that are more abstract than the particular instances sampled in an experiment. The labels may pertain to the individual elements of the experiment (e.g., is the outcome measured by a given test best described as intelligence or as achievement?). Or the labels may pertain to the nature of relationships among elements, including causal relationships, as when cancer treatments are classified as cytotoxic or cytostatic depending on whether they kill tumor cells directly or delay tumor growth by modulating their environment.

Consider a randomized experiment by Fortin and Kirouac (1976). The treatment was a brief educational course administered by several nurses, who gave a tour of their hospital and covered some basic facts about surgery with individuals who were to have elective abdominal or thoracic surgery 15 to 20 days later in a single Montreal hospital. Ten specific outcome measures were used after the surgery, such as an activities of daily living scale and a count of the analgesics used to control pain. Now compare this study with its likely target constructs: whether patient education (the target cause) promotes physical recovery (the target effect) among surgical patients (the target population of units) in hospitals (the target universe of settings). Another example occurs in basic research, in which the question frequently arises as to whether the actual manipulations and measures used in an experiment really tap into the specific cause and effect constructs specified by the theory. One way to dismiss an empirical challenge to a theory is simply to make the case that the data do not really represent the concepts as they are specified in the theory.

10. We occasionally refer to time as a separate feature of experiments, following Campbell (1957) and Cook and Campbell (1979), because time can cut across the other factors independently. Cronbach did not include time in his notational system, instead incorporating time into treatment (e.g., the scheduling of treatment), observations (e.g., when measures are administered), or setting (e.g., the historical context of the experiment).

Empirical results often force researchers to change their initial understanding of what the domain under study is. Sometimes the reconceptualization leads to a more restricted inference about what has been studied. Thus the planned causal agent in the Fortin and Kirouac (1976) study, patient education, might need to be respecified as informational patient education if the information component of the treatment proved to be causally related to recovery from surgery but the tour of the hospital did not. Conversely, data can sometimes lead researchers to think in terms of target constructs and categories that are more general than those with which they began a research program. Thus the creative analyst of patient education studies might surmise that the treatment is a subclass of interventions that function by increasing "perceived control" or that recovery from surgery can be treated as a subclass of "personal coping." Subsequent readers of the study can even add their own interpretations, perhaps claiming that perceived control is really just a special case of the even more general self-efficacy construct. There is a subtle interplay over time among the original categories the researcher intended