So what we need to do is find some way of capturing these relative abundances

in our measure of how much variation there is.

So one commonly used metric for measuring the amount of molecular variation in DNA

sequences, when you have sequences from many individuals, is referred to as pi.

Okay? Well,

pi, very simply put, is the average number of pairwise

mismatches among all the sequences you have.

This is analogous to the 2pq idea that we

used back when we were originally studying Hardy-Weinberg.

It's kind of like the average number of predicted heterozygotes.

So let's look at these three example sequences I have here.

Between the first and second sequences, how many mismatches are there?

Well we can look very simply, there is one here, one here, one here.

I made it very easy by making them red.

Well we have three mismatches between the first and second sequence.

What about between the second and third?

Well, those same three positions also differ between the second

and third sequence.

So there you go, we have another three.

What about between the first and third sequence?

How many mismatches are there?

Well, the only variable sites are the red ones.

And, here they're the same.

Here, they're the same.

And, there they're the same.

So, in fact, between the first and third sequence, there are 0 mismatches.

Now, pi is just the average number of pairwise mismatches.

So, we have, in this case, 3 numbers for mismatches, 3, 3, and 0.

And we had three pairwise combinations, so we just add these up and

divide by the number of pairwise combinations,

which in this case was three, we got three different comparisons.

3 + 3 + 0 = 6, 6 / 3 = 2.

Now it's not very useful just to use this two by itself because it could

be two in 10 bases, two in 100 bases, or two in 1,000 bases, so typically when people

talk about pi, they divide by the number of sites being investigated.

In this particular case we looked across a stretch of 20 bases.

So pi per site would be two divided by the number of bases,

which in this case was 20.

So in this case, pi would be 0.1, or simply put, 10%.
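
As a rough sketch of the calculation just described, the Python below counts mismatches for every pair of sequences, averages them to get pi, and then divides by the number of sites. The three 20-base sequences are made up for illustration (they are not the ones on the slide); they were chosen so that the first and third are identical and the second differs from both at three sites. The function names are hypothetical as well.

```python
from itertools import combinations

def count_mismatches(seq_a, seq_b):
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))

def nucleotide_diversity(sequences):
    """Return (pi, pi per site) for a list of aligned sequences."""
    # One mismatch count per pairwise comparison: (1,2), (1,3), (2,3), ...
    counts = [count_mismatches(a, b) for a, b in combinations(sequences, 2)]
    pi = sum(counts) / len(counts)       # average number of pairwise mismatches
    return pi, pi / len(sequences[0])    # second value is pi per site

# Hypothetical example data: 3 sequences, 20 bases each.
example = [
    "ACGTACGTACGTACGTACGT",
    "ACTTACGAACGTACCTACGT",  # differs from the other two at 3 sites
    "ACGTACGTACGTACGTACGT",
]

pi, pi_per_site = nucleotide_diversity(example)
print(pi, pi_per_site)  # 2.0 0.1
```

With these made-up sequences the three comparisons give 3, 0, and 3 mismatches, so the result matches the worked numbers: pi of 2, and 0.1 per site.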

Now pi is going to be higher the more differences you have among

sequences, so this is kind of interesting in that if you have a very rare variant,

it's not actually going to raise pi all that much within a population.

Whereas with common variation, say half the individuals have C and

half the individuals have T, you have a lot more opportunities for

pairwise mismatches.

So pi per site is greater when there are more bases differing among individuals.

And pi is not as much affected by very rare variants.
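
To put rough numbers on the rare-versus-common point, here is a small sketch for a single site with two bases. If k of the n sequences carry one base, then k × (n − k) of the pairs mismatch at that site, out of n × (n − 1) / 2 pairs in total. The sample size of 10 and the function name are assumptions made for illustration.

```python
def site_pi(copies_of_variant, n_sequences):
    """Average pairwise mismatches at one two-base site.

    copies_of_variant: how many of the sequences carry the variant base.
    """
    differing_pairs = copies_of_variant * (n_sequences - copies_of_variant)
    all_pairs = n_sequences * (n_sequences - 1) / 2
    return differing_pairs / all_pairs

# Hypothetical sample of 10 sequences.
rare = site_pi(1, 10)    # one sequence carries the variant: 9 / 45
common = site_pi(5, 10)  # half C, half T: 25 / 45

print(rare, common)
```

In this sample, the half-and-half site contributes 25 mismatching pairs and the singleton only 9, which is the sense in which rare variants barely move pi. The same quantity can be written as 2pq × n / (n − 1), which is where the link to the 2pq idea comes from.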

Let's apply this and see what's happening in terms of studying neutral sequences and

spread of variation.

So again, the question is,

how do patterns of recombination affect variation in neutral sequences?

Now, although these particular sequences, or

the variants that are found at these sequences, may be neutral,

there may be an occasional beneficial mutation that arises.

So what happens when you have an occasional beneficial mutation?

How does that affect the neutral variation everywhere else?

So sequences are not intrinsically neutral, it just happens to be

the variation that's present at those sequences that is neutral.

So let's assume a case where we have no recombination whatsoever,

and this variation here at base one is neutral.

In this case now we have three Gs, and about six As.

At base two, we again have these As and Ts, at base three we have Cs and Ts but

we're assuming all this variation is neutral.

None of this is affecting fitness.

Now, look at base four here, and let's say that it's possible for

there to be an adaptive mutation at base four.

And let's say here it is.

So here this G that has arisen by mutation is adaptive.

Well if there's no recombination across this entire stretch,

what's going to happen?

Well, since this G here is adaptive, it's going to spread.

And since there's no recombination,

it's basically glued to this whole stretch of sequence.

So as this G spreads, so will this C, so will this A, so will this A.
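
The no-recombination argument above can be sketched as a toy simulation. Everything here is an assumption made for illustration: the nine four-base sequences (chosen to have three Gs and six As at base one, with the adaptive G at base four sitting on an A, A, C background), the selection advantage s, the ancestral T at base four, and the simple deterministic frequency update. It is not a full population genetic model. The code tracks the expected number of mismatches at the three neutral bases between two randomly drawn sequences as the adaptive sequence rises in frequency.

```python
from itertools import combinations

NEUTRAL_SITES = 3  # bases one to three are neutral; base four is under selection

def expected_neutral_pi(haplotypes, freqs):
    """Expected mismatches at the neutral bases between two random draws."""
    total = 0.0
    for (h1, f1), (h2, f2) in combinations(zip(haplotypes, freqs), 2):
        d = sum(a != b for a, b in zip(h1[:NEUTRAL_SITES], h2[:NEUTRAL_SITES]))
        total += 2 * f1 * f2 * d
    return total

def sweep(haplotypes, adaptive_index, s=0.5, generations=40):
    """Deterministic spread of one favoured sequence with no recombination."""
    n = len(haplotypes)
    freqs = [1 / n] * n
    history = [expected_neutral_pi(haplotypes, freqs)]
    for _ in range(generations):
        # The adaptive sequence has relative fitness 1 + s, all others 1.
        weights = [f * (1 + s if i == adaptive_index else 1.0)
                   for i, f in enumerate(freqs)]
        mean_fitness = sum(weights)
        freqs = [w / mean_fitness for w in weights]
        history.append(expected_neutral_pi(haplotypes, freqs))
    return freqs, history

# Hypothetical population: the last sequence carries the adaptive G at base four.
population = ["GATT", "GTCT", "GATT", "AACT", "ATTT",
              "AACT", "ATCT", "AATT", "AACG"]

final_freqs, pi_history = sweep(population, adaptive_index=8)
print(round(final_freqs[8], 4), pi_history[0], pi_history[-1])
```

Because nothing can separate the G from the bases next to it, the neutral variation at bases one to three drains away as the G approaches fixation, even though those bases have no effect on fitness themselves.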

Right, so let's watch what happens over time.

The G starts to spread.