8 Reichenbach's principle of the common cause


Natural selection


causes Y from the hypothesis that X and Y are joint effects of a common

cause. Still, if the argument merely pits the hypothesis that X causes Y

against the hypothesis that X and Y are causally independent, it makes

sense, and the law of likelihood explains why.28

I now want to examine a different approach to this testing problem. It

appeals to an idea that Hans Reichenbach (1956) called the principle of the

common cause. This principle says that if the variables X and Y are correlated, then either X causes Y, Y causes X, or X and Y are joint effects of a

common cause.29 These three possibilities define what it means for X and

Y to be causally connected. Reichenbach’s principle has been central to the

Bayes net literature in computer science; it is closely connected with the

causal Markov condition (see Spirtes et al. 2001 and Woodward 2003).

Although Reichenbach’s principle and the likelihood approach I have

taken may seem to be getting at the same thing, I think there is a deep

difference. In fact, if the likelihood approach is right, then Reichenbach’s

principle must be too strong. The likelihood approach does not say that X

and Y must be causally connected if they are correlated; it doesn’t even say

that they probably are. The most that the law of likelihood permits one to

conclude is that the hypothesis of causal connection is better supported by

correlational data than is the hypothesis of causal independence.

To delve deeper into the principle of the common cause, let’s begin

with an example that Reichenbach used to illustrate it. Consider an acting

troupe that travels around the country presenting plays. We follow the

company for several years, recording on each day whether the leading man

and the leading lady have upset stomachs. These data allow us to see how

frequently each of them gets sick and how frequently both of them get

sick. Suppose the following inequality is true:


(1) f(Actor 1 gets sick & Actor 2 gets sick) > f(Actor 1 gets sick) f(Actor 2 gets sick).



28 The complaint of Leroi et al. (1994) that the comparative method does not get at the causal basis

of selection (because it fails to pry apart selection-of from selection-for, on which see Sober 1984)

needs to be understood in this light.

29 Reichenbach additionally believed that when X and Y are correlated and neither causes the other,

not only does there exist a common cause of X and Y; in addition, if all the common causes

affecting X and Y are taken into account, they will screen off X from Y, meaning that the completely

specified common causes will render X and Y conditionally probabilistically independent of each

other. Results in quantum mechanics pertaining to the Bell inequality have led many to question

this screening-off requirement (see, for example, Van Fraassen 1982).







Figure 3.22 Although the principle of the common cause is sometimes described as saying that an ‘‘observed correlation’’ entails a causal connection, it is better to divide the inference into two steps: from the frequency data f(X&Y) > f(X)f(Y) to the probabilistic correlation Pr(X&Y) > Pr(X)Pr(Y), and from there to the conclusion that X and Y are causally connected.

Here f(e) means the frequency of days on which the event (type) e occurs.

For example, maybe each actor gets sick once every twenty days, but the frequency of days on which both get sick is greater than 1/400. If this inequality is big enough and we have enough data, our observations will license the inference that the following probabilistic inequality is true:

(2) For each day i, Pr(Actor 1 gets sick on day i & Actor 2 gets sick on day i) > Pr(Actor 1 gets sick on day i) Pr(Actor 2 gets sick on day i).

It is important to be clear on the difference between the observed association

described in (1) and the inferred correlation stated in (2); this distinction was

discussed in §2.18 in connection with the inductive sampling formulation of

the argument from design. The association is in our data. However, we do

not observe probabilities; rather, we infer them.30 Once our frequency data

permit (2) to be inferred, the principle of the common cause kicks in,

concluding that there is a causal connection between one actor’s getting sick

on a given day and the other’s getting sick then too. Perhaps the correlation

exists because the two actors eat in the same restaurants; if one of them eats

tainted food on a given day, the other probably does too. The two-step

inference just described is depicted in Figure 3.22.
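The two-step inference can be made concrete with a small simulation. In the sketch below (all probabilities are invented for illustration; nothing here comes from Reichenbach), a hypothetical common cause, a tainted shared meal, produces sample frequencies satisfying inequality (1):

```python
import random

# Sketch of the acting-troupe example with an explicit common cause.
# All probabilities below are invented for illustration.
random.seed(0)

DAYS = 100_000
P_TAINTED = 0.03         # hypothetical chance the shared meal is tainted
P_SICK_IF_TAINTED = 0.8  # hypothetical chance of sickness from tainted food
P_SICK_OTHERWISE = 0.02  # baseline sickness, independent between the actors

both = sick1 = sick2 = 0
for _ in range(DAYS):
    tainted = random.random() < P_TAINTED
    p = P_SICK_IF_TAINTED if tainted else P_SICK_OTHERWISE
    a1 = random.random() < p
    a2 = random.random() < p
    sick1 += a1
    sick2 += a2
    both += a1 and a2

f1, f2, f12 = sick1 / DAYS, sick2 / DAYS, both / DAYS
print(f"f(1&2) = {f12:.4f}  vs  f(1)f(2) = {f1 * f2:.4f}")
```

With these numbers the joint frequency comes out roughly ten times the product of the marginals, so inequality (1) holds comfortably in the sample.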

How could X and Y be associated in the data without being correlated?

Perhaps the sample size is too small. If you toss a pair of coins ten times, it

is possible that heads on one will be associated with heads on the other, in


30 In this inference from sample frequencies to probabilities, Bayesians will claim that prior probabilities are needed while frequentists will deny that this is necessary. Set that disagreement aside.



the sense that there is an inequality among the relevant frequencies. But

this may just be a fluke; the tosses may in fact be probabilistically independent of each other. One way to see whether this is so is to do a larger

experiment. If the association in the ten tosses is just a fluke, you expect

the association to disappear as sample size is increased.
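The point about flukes can be checked directly. In this sketch the two coins are probabilistically independent by construction; a sample of ten tosses can still exhibit a frequency inequality, but it fades as the sample grows:

```python
import random

# Two fair, independent coins: any sample "association" is a fluke.
random.seed(1)

def assoc(n):
    """Return f(H1 & H2) - f(H1) * f(H2) for n tosses of two fair coins."""
    h1 = [random.random() < 0.5 for _ in range(n)]
    h2 = [random.random() < 0.5 for _ in range(n)]
    f1 = sum(h1) / n
    f2 = sum(h2) / n
    f12 = sum(a and b for a, b in zip(h1, h2)) / n
    return f12 - f1 * f2

small = assoc(10)        # may be far from zero just by chance
large = assoc(1_000_000) # converges toward zero for independent coins
print(small, large)
```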

The principle of the common cause sounds like a sensible idea when it

is considered in connection with examples like Reichenbach’s acting

troupe. But is it always true? Quantum mechanics has alerted us to the

possibility that X and Y might be correlated without being causally

connected; maybe there are stable correlations that are just brute facts.

However, it is not necessary to consider the world of micro-physics to find

problems for Reichenbach’s principle. Yule (1926) described a class of

cases in which X and Y are causally independent though probabilistically

correlated. A hypothetical example of the kind of situation he had in

mind is provided by the positive association of sea levels in Venice and

bread prices in Britain over the past 200 years (Sober 2001). Since both

have increased monotonically, higher than average values of the one are

associated with higher than average values of the other. This association is

not due to sampling error; if yearly data were supplemented with monthly

data from the same 200 years, the pattern would persist. Nor is this

problem for Reichenbach’s principle restricted to time series data. Variables can be spatially rather than temporally associated, due to two causally

independent processes each leading a variable to monotonically increase

across some stretch of terrain. Suppose that bread prices on a certain date

in the year 2008 increase along a line that runs from southeast to

northwest Europe. And suppose that songbirds on that day are larger in

the northwest than they are in the southeast. If so, higher bread prices are

spatially associated with larger songbirds. And the association is not a

fluke, in that the pattern of association persists with larger sample size.

But still, songbird size and bread prices may well be causally independent.
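A toy calculation (with made-up numbers standing in for the two series) shows how strong such an association can be when both variables simply trend upward:

```python
# Two causally independent, monotonically increasing series -- stand-ins for
# Venetian sea levels and British bread prices. All numbers are invented.
years = range(200)
sea_level = [100.0 + 0.2 * t for t in years]       # steady linear rise
bread_price = [1.0 + 0.01 * t * t for t in years]  # different shape, also rising

mean_s = sum(sea_level) / len(sea_level)
mean_b = sum(bread_price) / len(bread_price)

# In almost every year, both are above their means or both are below.
agree = sum((s > mean_s) == (b > mean_b) for s, b in zip(sea_level, bread_price))
print(f"{agree} of {len(sea_level)} years agree in sign")
```

The association is not sampling error; adding more finely grained data from the same interval only reproduces the pattern.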

Reichenbach’s principle is too strong. The probabilistic correlation

between X and Y may be due to the fact that X and Y are causally

connected. However, to evaluate this possibility, we must consider

alternatives. If the alternatives we examine have lower likelihoods, relative

to data on observed frequencies, this provides evidence in favor of the

hypothesis of causal connection. On the other hand, if we consider an

alternative hypothesis that has the same likelihood as the hypothesis of

causal connection, then the data do not favor one hypothesis over the

other, or so the law of likelihood asserts. There is no iron law of metaphysics that says that a correlation between two variables must be due to



their being causally connected. Whether this is true in a given case should

be evaluated by considering the data and a set of alternative hypotheses,

not by appealing to a principle.

Those who accept Reichenbach’s principle invariably think that it is

useful as well as true. They do not affirm that correlation entails causal

connection only to deny that we can ever know that a correlation exists.31

Reichenbach’s treatment of the example of the two actors is entirely

typical. The data tell you that there is a correlation, and the correlation

tells you that there is a causal connection. This readiness to use Reichenbach’s principle to draw causal inferences from observed associations

suggests the following argument against the principle. Take a data set that

you think amply supports the claim that variables X and Y are probabilistically correlated. If you believe Reichenbach’s principle, you are

prepared to further conclude that X and Y must be causally connected.

But do you really believe that the data in front of you could not possibly

have been produced without X and Y being causally connected? For

example, take Allman et al.’s data set (§3.7). Surely it is not impossible that

each primate species (both those in the data set and those not included)

came to its values for X and Y by its own special suite of causal processes.

I do not say that this is true or even plausible, only that it is possible. This

is enough to show that Reichenbach’s principle is too strong.

Although the example I have considered to make my argument against

Reichenbach’s principle involves a data set in which two variables

monotonically increase with time, the same point holds for a data set in

which the variables each rise and fall irregularly but in seeming synchrony.

If a common cause model is plausible in this case, this is not because a

Reichenbachian principle says that it must be true. Rather, its credentials

need to be established within a contrastive inferential framework, whether

the governing principle is the law of likelihood or a model selection

criterion like AIC. For a monotonic data set, a fairly simple commoncause model and a somewhat more complex separate-cause model each fit

the data well, in which case the former will have a slightly better AIC score

than the latter. When the data set is a lot more complex, a common-cause

model that achieves good fit will have far fewer adjustable parameters than

a separate-cause model that does the same, in which case the difference in

their AIC scores will be more substantial. It does not much strain our


31 It is tempting to argue that Venetian sea levels and British bread prices really aren’t correlated because if enough data were drawn from times outside the 200-year period, the association would disappear. Well, maybe it would, but so what? Why must real correlations be temporally (and spatially) unbounded?



credulity to imagine that the steady rise in British bread prices and

Venetian sea levels is due to separate causes acting on each; the strain may

be more daunting for two time series that have lots of synchronous and

irregular wiggles. But this difference is a matter of degree and the relevant

inferential principles are the same. Strong metaphysics needs to be

replaced by more modest epistemology.32
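The AIC comparison just described can be put in schematic form. Everything in the sketch below is hypothetical (the log-likelihoods and parameter counts are invented); it only illustrates how the size of the AIC difference tracks the number of extra parameters a separate-cause model needs:

```python
# AIC = 2k - 2 ln L, where k counts adjustable parameters; lower is better.
# All numbers below are invented for illustration.
def aic(log_likelihood, n_params):
    return 2 * n_params - 2 * log_likelihood

# Monotonic data: a simple common-cause model and a slightly more complex
# separate-cause model both fit well, so the common-cause edge is slight.
cc_simple, sc_simple = aic(-100.0, 2), aic(-99.0, 4)

# Wiggly data: matching the common-cause model's fit without a common cause
# takes many extra parameters, so the AIC difference is substantial.
cc_complex, sc_complex = aic(-100.0, 3), aic(-99.0, 20)

print(sc_simple - cc_simple, sc_complex - cc_complex)  # 2.0 vs 32.0
```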




The shift from the task of explaining a single trait value in a single species

(§3.1–§3.5) to that of explaining a correlation that exists across species

(§3.6) renders the problem of testing selection against drift more tractable.

In the former case, you need to know the location of the optimal trait

value towards which selection, if it occurs, will push the lineage. In the

latter, all that is needed is information about the slope of the optimality

line. Instead of needing to know what the optimal fur length is for the

polar bear lineage, it suffices to know that, if selection acts on fur length,

bears in cold climates have a longer optimal fur length than bears in warm.

A great deal of work in population genetics attempts to get by with

even less. Geneticists often test selection against drift by comparing DNA

sequences drawn from different species; a number of statistical tests have

been constructed for doing this (see Page and Holmes 1998, Kreitman

2000, and Nielsen 2005 for reviews). Scientists carry out these tests with

little or no information about the roles that different parts of these

sequences play in the construction of an organism’s phenotype. If these

tests are sound, they require no assumptions concerning what the optimal

sequence configuration would be; in fact, they don’t even require assumptions concerning how the optimum in one species is related to the optimum

in another. If selection can be tested without this type of information, what

does the hypothesis of natural selection predict about what we observe?

The parallel question about what drift predicts is easier to answer. The

predictions are probabilistic, not deductive. They take the following

form: If the process is one of pure drift, then the probability of this or that

observable result is such-and-such. If the observations turn out to deviate


32 Hoover (2003) proposes a patch for Reichenbach’s principle that introduces considerations concerning stationarity and cointegration, but his proposal is still too strong; it isn’t true that a data set that satisfies his requirements must be due to the two variables’ being causally connected. In addition, Hoover’s reformulation makes no recommendations concerning some data sets that in fact do favor a common cause over a separate cause model.



from what the drift hypothesis leads you to expect, should you reject it?

If you should, and if selection and drift are the only two alternatives,

selection has been ‘‘tested’’ by standing idly on the sidelines and witnessing the refutation of its one and only rival. If probabilistic modus

tollens (§1.4) made sense, this would be fine. But it does not. For selection

and drift to be tested against each other, both must make predictions. This

is harder to achieve for selection than it is for drift, since drift is a null

hypothesis (predicting that there should be zero difference between

various quantities; see below) whereas selection is a composite hypothesis

(predicting a difference but leaving open what its magnitude should be).

A central prediction of the neutral theory of molecular evolution is that

there should be a molecular clock (Kimura 1983). In a diploid population

containing N individuals, there are 2N nucleotides at a given site. If each

of those nucleotides has a probability μ of mutating in a given amount of

time (e.g., a year), and a mutated nucleotide has a probability u of evolving

from mutation frequency to fixation (i.e., 100 percent representation in

the population), then the expected rate of substitution (i.e., the origination

and fixation of new mutations) at that site in that population will be

k = 2Nμu.

This covers all the mutations that might occur, regardless of whether they

are advantageous, neutral, or deleterious. Because μ and u are probabilities, k isn’t the de facto rate of substitution; rather, it is a probabilistic

quantity – an expected value (§1.4). If the 2N nucleotides found at a site

at a given time are equal in fitness, the initial probability that each has of

eventually reaching fixation is

u = 1/(2N).

I say that this is the ‘‘initial’’ probability since the probability of fixation

itself evolves.33 If we substitute 1/2N for u in the first equation, we obtain

one of the most fundamental propositions of the neutral theory:

(Neutrality) k = μ.


33 This equality pertains to the 2N token nucleotides present at the start of the process; some of those

tokens may be of the same type. It follows from the above equality that the initial probability that a

type of nucleotide found at time t will eventually reach fixation is its frequency at time t. This

point applies to phenotypic drift models as well as genetic ones; see the squashing of the bell curve

depicted in Figure 3.4.



The expected rate of substitution at a site is given by the mutation rate if

the site is evolving by drift. Notice that the population size N has cancelled out.
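The cancellation is worth verifying numerically. The sketch below just recomputes k = 2Nμu with u = 1/(2N) for several population sizes (μ is an arbitrary illustrative value):

```python
# Under neutrality u = 1/(2N), so k = 2N * mu * u = mu, whatever N is.
mu = 1e-8  # illustrative per-site mutation probability per year

for N in (100, 10_000, 1_000_000):
    u = 1.0 / (2 * N)   # initial fixation probability of a neutral mutant
    k = 2 * N * mu * u  # expected substitution rate
    assert abs(k - mu) < 1e-20
print("k = mu for every N: the population size cancels out")
```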

What will happen if a mutation is advantageous? Its probability of

fixation (u) depends on the selection coefficient (s), on the effective

population size Ne, and on the population’s census size N:

u = 2sNe/N.34



Substituting this value for u into the first equation displayed above, we obtain the expected rate of evolution at a site that experiences positive selection:

(Selection) k = 4Nesμ.

If the probability of mutation per unit time at each site remains constant

through time (though different sites may have different mutation probabilities), the neutral theory predicts that the expected overall rate of

evolution in the lineage does not change. This is the clock hypothesis. It

doesn’t mean that the actual rate never changes; there can be fluctuations

around the mean (expected) value. The selection hypothesis is more

complicated. If each site’s value for Nesμ holds constant through time,

(Selection) also entails the clock hypothesis. But there is every reason to

expect this quantity to fluctuate. After all, Ne is a quantity that reflects the

breeding structure as well as the census size of the population (Crow and

Kimura 1970) whereas s, the selection coefficient, reflects the ecological

relationship that obtains between a nucleotide in an organism and the

environment. With both these quantities subject to fluctuation, it would

be a miracle if their product remained unchanged. This is why (Selection)

is taken to predict that there is no molecular clock.35
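The contrast can be dramatized with invented numbers. In the sketch below the neutral expected rate stays fixed at μ, while the selected rate 4Nesμ is recomputed as Ne and s fluctuate; the ranges of fluctuation are hypothetical, and only the algebra comes from the text:

```python
import random

# Hypothetical fluctuations in Ne and s; only the formulas come from the text.
random.seed(2)
mu = 1e-8

neutral_rates = [mu] * 10  # (Neutrality): k = mu, constant through time
selected_rates = []
for _ in range(10):
    Ne = random.uniform(5_000, 50_000)  # breeding structure drifts around
    s = random.uniform(1e-4, 1e-3)      # ecological circumstances shift too
    selected_rates.append(4 * Ne * s * mu)  # (Selection): k = 4*Ne*s*mu

print(min(selected_rates), max(selected_rates))
```

Because the product Nes ranges over two orders of magnitude here, the selected rates wander widely while the neutral rates do not; that asymmetry is what grounds the prediction that there is no clock under selection.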

If we could trace a single lineage through time, taking molecular

snapshots on several occasions, it would be easy to test (Neutrality)

against (Selection). Although this procedure can be carried out on

populations of rapidly reproducing organisms, it isn’t feasible with respect

to lineages at longer time scales. It is here that the fact of common

ancestry comes to the rescue, just as it did in §3.6. We do not need a time



34 This useful approximation is strictly correct only for small s and large N.

35 The simple selection model described here does not predict a clock, but more complicated selection models sometimes do. Discussion of these would take us too far afield.



Figure 3.23 Given the phylogeny, the neutral theory entails that the expected difference between 1 and 3 equals the expected difference between 2 and 3 (figure from Page and Holmes 1998: 255).

machine that allows us to travel into the past so that we can observe earlier

states of a lineage we now see in the present; rather, we can look at three or

more tips in a phylogenetic tree and perform a relative rates test. Figure 3.23

provides an independently justified phylogeny of human beings, Old

World monkeys, and New World monkeys. We can gather sequence

data from these three taxa and observe how many differences there are

between each pair. The neutral hypothesis predicts that (d13 − d23) = 0

(or, more precisely, it entails that the expected value of this difference is

zero); here dij is the number of differences between i and j. We don’t

need to know how many changes occurred in each lineage; it suffices to

know how many changes separate one extant group from another.

Looking at the present tells you what must have occurred in the past,

given the fact of common ancestry.
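A relative rates test is easy to state in code. The sequences below are made up and far too short to be significant; they only show what d13 − d23 is computed from:

```python
# Toy relative rates test: 1 = humans, 2 = Old World monkeys,
# 3 = New World monkeys (the outgroup). Sequences are invented.
def d(a, b):
    """Count the sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))

seq1 = "ACGTACGTAC"
seq2 = "ACGTACGAAC"
seq3 = "ACGAACTAAC"

d13, d23 = d(seq1, seq3), d(seq2, seq3)
print(d13, d23, d13 - d23)  # neutrality: the expected value of d13 - d23 is zero
```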

Li et al. (1987) carried out this test and discovered that d13 is significantly greater than d23; they didn’t look at whole genomes but at a

sample of synonymous sites, introns, flanking regions, and a pseudo-gene,

totaling about 9,700 base pairs. With respect to these parts of the genome,

human beings diverged more slowly than Old World monkeys from their

most recent common ancestor (see also Li 1993). This result was taken to

favor (Selection) over (Neutrality). In view of the negative comments I

made about Neyman–Pearson hypothesis testing in Chapter 1, I want to

examine the logic behind this analysis more carefully. Using relative rates

to test drift against selection resembles using two sample means to test

whether two fields of corn have the same mean height. The null



hypothesis says that they are the same; the alternative to the null says that

they differ, but it does not say by how much. The Neyman–Pearson

theory conceives of the testing problem in terms of acceptance and

rejection and requires that one stipulate an arbitrary level for a, the

probability of a Type-1 error. I suggested in Chapter 1 that it makes more

sense to place this problem in a model-selection framework. In the

relative-rates test, the drift hypothesis has no adjustable parameters

whereas the selection hypothesis has one. The question is not which

hypothesis to reject but which can be expected to be more predictively

accurate. The AIC score of the selection hypothesis is found by determining the maximum likelihood value of a single parameter θ, the expected value of (d13 − d23), taking the log-likelihood when θ is assigned

its maximum likelihood value and subtracting the penalty for complexity.

The question is whether the selection hypothesis’ better fit to data suffices

to compensate for its greater complexity.

Bayesians and likelihoodists come at the problem differently. Their

framework obliges them to compute the average likelihood of the selection

hypothesis, which, as noted, is composite. If selection acted on these

different parts of the genome, how much should we expect d13 and d23

to differ? We would need to answer this question without looking at

the data. And since different types of selection predict different values for

(d13 À d23), our answer would have to average over these different possibilities. It isn’t impossible that empirical information should one day

provide a real answer to this question. However, at present, there is no

objective basis for producing an answer. I suggest that the model-selection

approach is more defensible than both Bayesianism and Neyman–Pearson

hypothesis testing as a tool for structuring the relative rate test.

The role played by the fact of common ancestry in facilitating tests of

process hypotheses can be seen in another context. If we could look at a

large number of replicate populations that all begin in the same state, we

could see if the variation among the end states of those lineages is closer to

the predictions of neutrality or selection. But why think that each of these

lineages begins in the same state? The answer is simple: If they share a

common ancestor, they must have. The neutral theory predicts that the tips

of a tree should vary according to a Poisson distribution. A number of

mammalian proteins (e.g., Hemoglobin α and β, Cytochrome c, Myoglobin)

were found to be ‘‘over dispersed’’ (Kimura 1983; Gillespie 1986), and

this was taken to be evidence of selection. Once again, neutrality is a null

hypothesis, and selection is composite.



Like the relative rate test, the McDonald–Kreitman test also relies on

an independently justified phylogeny, and there is no optimality line in

sight. McDonald and Kreitman (1991) examined sequences from three

species of Drosophila that all play a role in constructing the protein alcohol dehydrogenase (Adh). Fruit flies often eat fruit that is fermented,

and they need to break down the alcohol (human beings have the same

problem and solve it by way of a different version of the same protein). So

it seems obvious that the protein is adaptive. What is less obvious is

whether variations in the gene sequences that code for the protein are

adaptive or neutral. Perhaps different species find it useful to have different versions of the protein, and selection has caused these species to

diverge from each other. And different local populations that belong to

the same species may also have encountered different environments that

select for different sequences. Alternatively, the variation may be neutral.

The McDonald–Kreitman test compares the synonymous and nonsynonymous differences that are found both in different populations of

the same species and in different species. A substitution in a codon is said

to be synonymous when it does not affect the amino acid that results. For

example, the codons UUU and UUC both produce the amino acid

phenylalanine, while CUU, CUC, CUA and CUG all produce leucine. It

might seem obvious that synonymous substitutions must be caused by

neutral evolution, since they do not affect which amino acids and proteins

are constructed downstream. This would suggest that the hypothesis that

nonsynonymous substitutions evolve neutrally can be tested by seeing if

the rates of synonymous and nonsynonymous substitutions are the same.

However, it is possible that synonymous substitutions might not be

neutral, owing, for example, to differences in secondary structure having to do with the stability of the molecules (Page and

Holmes 1998: 243). What seems safer is the inference that the ratio of the

rates of synonymous to nonsynonymous substitutions should be a constant if there is neutral evolution. Both should depend just on the

mutation rate, as discussed above. This ratio should have the same value

regardless of whether the sequences compared come from two populations of the same species or from different species.
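The synonymous/nonsynonymous distinction can be made mechanical with a fragment of the genetic code (just the codons mentioned above):

```python
# Classifying a codon substitution as synonymous or nonsynonymous, using
# only the codons named in the text (a tiny fragment of the genetic code).
CODON_TABLE = {
    "UUU": "Phe", "UUC": "Phe",
    "CUU": "Leu", "CUC": "Leu", "CUA": "Leu", "CUG": "Leu",
}

def synonymous(before, after):
    """A substitution is synonymous when the amino acid is unchanged."""
    return CODON_TABLE[before] == CODON_TABLE[after]

print(synonymous("UUU", "UUC"))  # Phe -> Phe: synonymous
print(synonymous("UUC", "CUC"))  # Phe -> Leu: nonsynonymous
```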

Figure 3.24 describes the four kinds of observations that McDonald

and Kreitman assembled. They counted the number of nonsynonymous

and synonymous differences that separate different populations of the

same species; these are called polymorphisms. They also counted the

number of synonymous and nonsynonymous fixed differences that separate

pairs of species; these are sites that are monomorphic within each species


Figure 3.24 The number of nonsynonymous and synonymous differences that exist within and between three Drosophila species (D. melanogaster, D. simulans, D. yakuba) at the Adh locus. The within-species differences are called polymorphisms; the between-species differences are called fixed. Data from McDonald and Kreitman (1991); figure from Page and Holmes (1998: 267).

but vary between them. Here are the numbers of differences that

McDonald and Kreitman found in the four categories:




                Fixed differences   Polymorphisms
Nonsynonymous          7                  2
Synonymous            17                 42




If these sequences evolved neutrally, the ratio of synonymous to nonsynonymous substitutions for the first column should be about the same

as the ratio for the second. But they aren’t close: 17/7 ≈ 2.4 for the fixed differences, while 42/2 = 21 for the polymorphisms.







There is an excess of nonsynonymous fixed differences (or a deficiency of

polymorphic nonsynonymous substitutions). McDonald and Kreitman

took this departure from the prediction of neutrality to be evidence for

selection: Selection had reduced the within-species variation and amplified the between-species variation at nonsynonymous sites. They note that

a population bottleneck could also explain the data but argue that the

known history of Drosophila makes this alternative implausible.
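The comparison at the heart of the test is simple arithmetic on the four counts. The sketch below uses the Adh counts reported by McDonald and Kreitman (1991):

```python
# McDonald-Kreitman counts for Adh (McDonald and Kreitman 1991).
fixed = {"nonsynonymous": 7, "synonymous": 17}
polymorphic = {"nonsynonymous": 2, "synonymous": 42}

# Under neutrality the two synonymous : nonsynonymous ratios should agree.
ratio_fixed = fixed["synonymous"] / fixed["nonsynonymous"]             # ~2.4
ratio_poly = polymorphic["synonymous"] / polymorphic["nonsynonymous"]  # 21.0
print(ratio_fixed, ratio_poly)
```

The large gap between the two ratios is the observed excess of nonsynonymous fixed differences that McDonald and Kreitman took as evidence for selection.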

The same epistemological questions arise in connection with the

McDonald–Kreitman test that I raised about the relative rates test. The

inference should not be thought of as an instance of probabilistic modus
