0:00

Hello and welcome back to Introduction to Genetics and Evolution.

In the previous set of videos, we talked about testing whether populations fit the assumptions of the Hardy-Weinberg equilibrium. At the very end of the last set of videos, we looked at one particular deviation from Hardy-Weinberg equilibrium: the Wahlund effect. That is what happens when you sample two populations simultaneously that differ in some of their allele frequencies. As a result, you see an under-representation of heterozygotes. Let's recap that for a moment, and then we'll move into how we can leverage the Wahlund effect to quantify differences in allele frequency between populations.

0:45

Now, what we did last time was calculate Hardy-Weinberg expectations and test whether they fit the data for the MN blood type. I showed you data from the Navajo and the Aborigines. Here's the data from the Navajo. We first have our raw genotype counts, and from those we get the total number of individuals. Using these two together, we calculate the observed genotype frequencies. From the observed genotype frequencies, we calculate the observed allele frequencies. Then, from these observed allele frequencies, we get the expected, or predicted, Hardy-Weinberg genotype frequencies. These are down here.
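The procedure just described (counts, then genotype frequencies, then allele frequencies, then Hardy-Weinberg expectations) can be sketched in a few lines of Python. This is a minimal illustration assuming a two-allele locus; the function name `hardy_weinberg_table` and the counts 300, 500 and 200 are hypothetical and are not the Navajo data from the slide.

```python
def hardy_weinberg_table(n_mm, n_mn, n_nn):
    """Observed genotype frequencies, allele frequencies, and Hardy-Weinberg
    expected genotype frequencies for a two-allele (M/N) locus."""
    total = n_mm + n_mn + n_nn                          # total individuals
    observed = (n_mm / total, n_mn / total, n_nn / total)  # MM, MN, NN frequencies
    p = observed[0] + observed[1] / 2                   # frequency of M
    q = 1 - p                                           # frequency of N
    expected = (p * p, 2 * p * q, q * q)                # Hardy-Weinberg p^2, 2pq, q^2
    return observed, (p, q), expected


# Hypothetical counts, for illustration only (not the Navajo data).
observed, alleles, expected = hardy_weinberg_table(300, 500, 200)
print(observed)
print(alleles)
print(expected)
```

Comparing `observed` with `expected` is the Hardy-Weinberg check; the allele frequencies are always derived from the observed genotypes first.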

1:36

Now, what happens when we sample across two populations simultaneously? If we sample the Navajo and the Aborigines together, we follow the same overall procedure. But in this case, the predicted genotype frequencies do not match the observed genotype frequencies very well. In particular, we see a deficiency of heterozygotes in the observed relative to the predicted: the predicted heterozygote frequency of 0.488 is much greater than the observed 0.246. I attributed this to the Wahlund effect. That is what happens when you have populations that differ in allele frequencies: even if each population individually is at Hardy-Weinberg equilibrium, when you put them together you see a deficit of heterozygotes.
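A small numerical sketch of the Wahlund effect, assuming two equally sized populations that are each exactly at Hardy-Weinberg. The function name `pooled_heterozygosity` and the allele frequencies 0.1 and 0.9 are hypothetical, chosen only to make the deficit easy to see.

```python
def pooled_heterozygosity(p1, p2):
    """Two equal-sized populations, each at Hardy-Weinberg with allele
    frequencies p1 and p2, sampled together as one pool."""
    # Heterozygotes actually present in the pool: the average of 2 p_i q_i.
    observed = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Naive Hardy-Weinberg prediction from the pooled allele frequency.
    p_bar = (p1 + p2) / 2
    predicted = 2 * p_bar * (1 - p_bar)
    return observed, predicted


# Hypothetical allele frequencies for illustration.
obs_het, pred_het = pooled_heterozygosity(0.1, 0.9)
print(f"observed {obs_het:.2f} vs predicted {pred_het:.2f}")  # observed 0.18 vs predicted 0.50
```

With identical allele frequencies, for example `pooled_heterozygosity(0.3, 0.3)`, the observed and predicted values coincide, so the deficit appears only when the pooled populations differ.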

2:21

Now, how is it that populations differ? I like to look at this in two ways. One possibility is that they have different allele and genotype frequencies. That is what we have been emphasizing so far, and we'll come back to it in just a moment.

But first, I want to entertain another possibility. There may be alleles at some genes that are found in some populations but not in others. For example, you may have some alleles that are found only in East Asians, and you will never see those particular alleles in other populations. The genes will be there, but the particular alleles will not. This is especially likely under two conditions. One is if it's a very recent mutation that arose in, for example, an East Asian population and hasn't spread to others. The other is if the populations are in complete isolation; then that difference can persist for quite some time. So let's talk briefly about these population-specific alleles, and then we'll come back to frequency differences.

3:14

Differences can arise by mutation and spread. Let's say that in an ancestral population, everybody was aa, bb. Over time, suppose the population splits; this diagram indicates the split. In one population, the one on the right, there is a mutation from b to B. This B allele can spread, and eventually everybody there may be aa, BB. In the other population, the one on the left, there is a mutation from a to A. The A allele can spread, and eventually that population can be AA, bb. So in this case we have a split between populations, and differences can then arise and potentially spread. This is especially likely under complete isolation. We'll come back to this when we talk about speciation later on.

4:02

Groups within a species are different, yet they are related. They may have some of these unique alleles that are not found in other groups, but it's probably more common that they differ in genotype or allele frequencies instead. This figure shows approximate relationships among various human ethnic groups, from one calculation. For example, Melanesians and Papuans are more closely related to each other than either is to populations from the Mediterranean or from Siberia, and so on.

4:33

The easiest thing to quantify is when everybody in one population differs from everybody in another population at some gene. This is referred to as a fixed difference. For example, in Population 1 everybody is AA, and in Population 2 everybody is aa; we're looking specifically at the A gene in this case. Everybody in one population differs from everybody in the other population, so we refer to this as a fixed difference. If you go back to the ancestor, presumably the populations had the same alleles. But something arose in at least one population and spread, making it completely different from the ancestor and from the other population. Now, this does happen, but it's not very common within a species, and it's generally not true among modern human ethnic groups.

5:18

Instead, as I said, it's more common to have differences in the frequencies of alleles and genotypes. For example, A may have a frequency of 0.7 in East Asians but 0.5 in Indian populations; those numbers are just an illustration. Let me emphasize what we're doing here: we're measuring differences between populations, not individuals. We cannot say that just because somebody carries A, they must be of Middle Eastern descent or something like that. It's based on the overall population: the relative abundance of particular alleles in one population as a whole differs from that in other populations as a whole. This poses a challenge. How do we quantify these slight differences in frequency between populations?

6:03

The deviation from Hardy-Weinberg associated with the Wahlund effect, which I've already introduced, actually allows you to quantify allele frequency differences between populations. Let's assume the two populations are each at Hardy-Weinberg; that is an important assumption, and we'll come back to it later. If you sample each population by itself, you'll see it at Hardy-Weinberg, similar to what we saw in the Aborigine and Navajo example I showed you earlier. But if you sample both together, you see a deviation from Hardy-Weinberg, which, as you saw, is the Wahlund effect. The size of this deviation when you sample the populations simultaneously quantifies the difference in their allele frequencies. If the deviation is small, the allele frequencies must be similar; if the allele frequencies are very different, you'll see a very large Wahlund effect.
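For the simplest case, the size of this deviation can be worked out directly. The derivation below is a sketch that assumes two equally sized populations, each at Hardy-Weinberg, with frequencies $p_1$ and $p_2$ of allele A; $H_{\text{obs}}$ is the heterozygote frequency in the pooled sample and $H_{\text{pred}}$ is the Hardy-Weinberg prediction computed from the pooled allele frequency $\bar p$.

```latex
\begin{aligned}
\bar p &= \tfrac{1}{2}(p_1 + p_2) \\
H_{\text{obs}} &= \tfrac{1}{2}\left[2p_1(1-p_1) + 2p_2(1-p_2)\right] = 2\bar p - (p_1^2 + p_2^2) \\
H_{\text{pred}} &= 2\bar p(1-\bar p) = 2\bar p - 2\bar p^{\,2} \\
H_{\text{pred}} - H_{\text{obs}} &= (p_1^2 + p_2^2) - 2\bar p^{\,2} = \tfrac{1}{2}(p_1 - p_2)^2
\end{aligned}
```

So under these assumptions the heterozygote deficit is zero when $p_1 = p_2$ and grows with the square of the gap between the two frequencies; for $p_1 = 0.1$ and $p_2 = 0.9$ it is $0.32$.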

6:55

The measure we'll use is called F_ST, that is, F with the subscript ST. This measure ranges from 0 to 1. If it is 0, there are no allele frequency differences between the populations you're studying. If it's between 0 and 1, the allele frequencies differ somewhat. If it's 1, you have a fixed difference, such that, for example, everyone in this population is AA and everyone in that population is aa. The individuals within a population don't all have to carry the same genotype, though. Maybe the individuals here are a1a1, a1a2, or a2a2, while over in the other population they're all a3a3, a3a4, or a4a4. Basically, a fixed difference just means there is no overlap in alleles between this population and that population.
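The "no overlap in alleles" condition can be checked mechanically. A minimal sketch, assuming each genotype is given as a pair of allele labels; the function name `is_fixed_difference` is hypothetical, and the labels a1 to a4 simply mirror the verbal example.

```python
def is_fixed_difference(pop1_genotypes, pop2_genotypes):
    """Return True if the two populations share no alleles at this gene.

    Each argument is a list of genotypes; each genotype is a tuple of two
    allele labels (labels are illustrative, e.g. "a1").
    """
    alleles1 = {allele for genotype in pop1_genotypes for allele in genotype}
    alleles2 = {allele for genotype in pop2_genotypes for allele in genotype}
    return alleles1.isdisjoint(alleles2)  # no allele appears in both


pop1 = [("a1", "a1"), ("a1", "a2"), ("a2", "a2")]
pop2 = [("a3", "a3"), ("a3", "a4"), ("a4", "a4")]
print(is_fixed_difference(pop1, pop2))                   # True
print(is_fixed_difference(pop1, pop2 + [("a2", "a3")]))  # False
```

A single shared allele, such as one a2a3 individual in the second population, is enough to break the fixed difference.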

The measure itself is very simple. F_ST can be estimated as the Hardy-Weinberg predicted heterozygote frequency, 2pq (what you would expect if the pooled sample were at Hardy-Weinberg), minus the observed heterozygote frequency, divided by the Hardy-Weinberg predicted 2pq:

$F_{ST} = \dfrac{2pq - H_{\text{obs}}}{2pq}$

where $H_{\text{obs}}$ is the observed frequency of heterozygotes. In other words, it's (predicted - observed) / predicted, a fairly common form for this kind of measure. Let me show you an application, and you'll see that it's very straightforward.

8:03

Let's take the example of a fixed difference I mentioned earlier: in Population 1 everybody is AA, and in Population 2 everybody is aa. Here are the totals. Now imagine we're sampling these two populations together. Note that we have to assume we're sampling similar numbers from each population to calculate F_ST correctly. Here's our pooled sample: 100 AA, 0 Aa, and 100 aa. The total genotype count is 200, so the observed genotype frequencies are 0.5, 0, and 0.5. Half the individuals in this pooled sample are AA, half are aa, and none are Aa. The allele frequencies are 0.5 and 0.5. All straightforward so far.

9:05

The Hardy-Weinberg predicted heterozygote frequency is 0.5 and the observed is 0, so F_ST = (0.5 - 0) / 0.5 = 1. As I mentioned, an F_ST of 1 indicates a fixed difference. Let's try one that's not quite so extreme.
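The whole fixed-difference calculation, from pooling the two samples through to F_ST, can be sketched as follows. This assumes a two-allele locus, genotype counts ordered (AA, Aa, aa), and similar sample sizes from each population as stated above; the function names `pool` and `f_st_from_pooled_counts` are hypothetical.

```python
def pool(*populations):
    """Sum genotype counts (AA, Aa, aa) across populations sampled together."""
    return tuple(sum(counts) for counts in zip(*populations))


def f_st_from_pooled_counts(n_AA, n_Aa, n_aa):
    """(Hardy-Weinberg predicted 2pq - observed heterozygotes) / predicted 2pq."""
    total = n_AA + n_Aa + n_aa
    observed_het = n_Aa / total              # observed heterozygote frequency
    p = (2 * n_AA + n_Aa) / (2 * total)      # frequency of allele A
    predicted_het = 2 * p * (1 - p)          # Hardy-Weinberg predicted 2pq
    if predicted_het == 0:
        raise ValueError("locus is monomorphic in the pooled sample; F_ST undefined")
    return (predicted_het - observed_het) / predicted_het


population_1 = (100, 0, 0)  # everybody AA
population_2 = (0, 0, 100)  # everybody aa
pooled = pool(population_1, population_2)
print(pooled)                            # (100, 0, 100)
print(f_st_from_pooled_counts(*pooled))  # 1.0
```

The guard matters because a gene with only one allele in the pooled sample has a predicted 2pq of 0, and the ratio is then undefined.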

9:21

Here's a pair of populations, each with 1,000 individuals, so our total population size is 2,000. This population is at Hardy-Weinberg, and this one is also at Hardy-Weinberg, but they differ in allele frequencies. Here's our pooled sample; I just summed the counts together: 250 + 90 is 340, 250 + 490 is 740, and so on. Again, we calculate the total, then the genotype frequencies, then the allele frequencies. Our total is 2,000. The genotype frequencies are each of those counts divided by 2,000. The allele frequencies come out very nicely to 0.6 and 0.4. So what is our Hardy-Weinberg predicted 2pq? It would be 2 × 0.6 × 0.4. Since 0.6 × 0.4 is 0.24, our predicted value should be 0.48, if my mental arithmetic is correct.

10:11

That is what we see. By contrast, the observed heterozygote frequency is 0.46, so we have a slight difference. It's not a huge difference in this case, but it is noticeable nonetheless. Now we get a much smaller F_ST than before: the Hardy-Weinberg predicted 0.48, minus the observed 0.46, divided by 0.48. Again, it's always (predicted - observed) / predicted. In this case, F_ST comes out to 0.042, reflecting small differences in allele frequencies. In the previous example we had a very large difference in allele frequencies, so you can see that F_ST ranges from 0 to 1, with 0 meaning no difference and 1 meaning complete difference.
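The second example can be checked with exact fractions, which shows that 0.042 is 1/24 rounded. This sketch uses Python's standard `fractions` module. It takes the pooled homozygote counts of 740 and 340 from the text, treats the 740 as AA (the choice does not affect F_ST, since 2pq is symmetric in p and q), and infers the heterozygote count of 920 from the total of 2,000; the function name `f_st_exact` is hypothetical.

```python
from fractions import Fraction


def f_st_exact(n_AA, n_Aa, n_aa):
    """F_ST = (predicted 2pq - observed heterozygotes) / predicted 2pq, in exact arithmetic."""
    total = n_AA + n_Aa + n_aa
    observed_het = Fraction(n_Aa, total)        # observed heterozygote frequency
    p = Fraction(2 * n_AA + n_Aa, 2 * total)    # frequency of allele A
    predicted_het = 2 * p * (1 - p)             # Hardy-Weinberg predicted 2pq
    return (predicted_het - observed_het) / predicted_het


result = f_st_exact(740, 920, 340)  # observed 23/50 = 0.46, p = 3/5, 2pq = 12/25 = 0.48
print(result)                       # 1/24
print(round(float(result), 3))      # 0.042
```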

Let me give you one to try. Here are the pooled data from the Aborigines and the Navajo. The numbers here aren't perfect because the sample sizes weren't exactly equal, but let's pretend for a moment that they are. Go ahead and calculate F_ST between the Aborigines and the Navajo.

11:15

I hope that wasn't too difficult. Let's run through it. Just to remind you, F_ST is the Hardy-Weinberg predicted heterozygote frequency minus the observed heterozygote frequency, divided by the predicted. So let's put in the numbers. Here are our allele frequencies, and here is our expected, or Hardy-Weinberg predicted, heterozygote frequency. We do see a deviation in this case: our predicted is 0.488, and our observed is 0.246. So F_ST would be (0.488 - 0.246) / 0.488, which comes out to approximately 0.5 (about 0.496). So in this case we see a fairly large F_ST. This is what you'd expect, given that there has been no historical gene flow between the Aborigines and the Navajo.
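A quick check of the exercise answer, as plain arithmetic on the two pooled heterozygote frequencies read off the slide. This is only a sketch; it inherits the caveat above that the two sample sizes were not exactly equal.

```python
predicted_het = 0.488  # Hardy-Weinberg predicted 2pq for the pooled Aborigine + Navajo sample
observed_het = 0.246   # observed heterozygote frequency in the same pooled sample

f_st = (predicted_het - observed_het) / predicted_het
print(round(f_st, 3))  # 0.496
```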

12:08

To recap, F_ST is larger when you're comparing populations that differ more in allele frequencies. The Aborigines and the Navajo have very different allele frequencies, and that's why we saw a fairly large F_ST there, 0.5. If the frequencies were identical, F_ST would be 0. If there were a fixed difference, F_ST would be 1.

Now let me give you a few values, so you get an idea of what we actually see in human populations. Obviously, Aborigine versus Navajo is an extreme example. The following values are from a 2010 study that used a little over a million SNPs. Comparing African Americans with Europeans, F_ST was about 0.11, which is noticeable but not tremendous. African Americans versus Chinese gave an F_ST of 0.15, and Europeans versus Chinese about 0.11. And if you look among European populations, for example comparing the Spanish, the Italians, and the Germans, F_ST is typically, though not always, less than 0.01, which is very low on this scale.

Let me show you a big table of F_ST measures. I'm not going to walk through all of these, but you can see that some values are fairly high; over here, a couple are above 0.2, such as Papuan versus Red Sea. Several of these are high and several are low, but you see a range of values. Now, what is F_ST in words?

13:29

We can define F_ST in terms of heterozygosity: it is the proportional reduction in heterozygosity of randomly chosen alleles within populations (that's the observed aspect), relative to what would be expected if the entire population were interbreeding completely at random. Again, it's measuring this difference in allele frequencies.
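This verbal definition matches a common way of writing F_ST, (H_T - H_S) / H_T, where H_S is the average heterozygosity within populations and H_T is the heterozygosity expected if all populations together mated at random (2 times the pooled p times the pooled q). When each population is at Hardy-Weinberg and the populations are equally sized, the pooled observed heterozygosity equals H_S, so this agrees with the predicted-minus-observed formula. The sketch assumes a two-allele locus and equally sized populations; the function name is hypothetical, and the frequencies 0.5 and 0.7 are illustrative (they happen to give H_S = 0.46 and H_T = 0.48, the same numbers as the second worked example).

```python
def f_st_from_allele_freqs(freqs):
    """F_ST = (H_T - H_S) / H_T for a two-allele locus.

    freqs: frequency of allele A in each population; populations are assumed
    equally sized and each at Hardy-Weinberg.
    """
    k = len(freqs)
    # H_S: mean within-population heterozygosity, 2 * p_i * q_i
    h_s = sum(2 * p * (1 - p) for p in freqs) / k
    # H_T: heterozygosity expected under random mating across all populations
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        raise ValueError("no variation across populations; F_ST undefined")
    return (h_t - h_s) / h_t


print(round(f_st_from_allele_freqs([0.5, 0.7]), 4))  # 0.0417
print(f_st_from_allele_freqs([1.0, 0.0]))            # 1.0
print(f_st_from_allele_freqs([0.3, 0.3]))            # 0.0
```

The three calls reproduce the range described in the lecture: a small value for modest frequency differences, 1 for a fixed difference, and 0 for identical frequencies.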

So why don't we see higher F_ST among human populations? We know there's a lot of non-random mating out there, so why isn't it bigger? There are a couple of reasons. First, some of the assumptions behind F_ST are violated in humans. It is supposed to be applied to genes experiencing little or no natural selection, and you'll recall I mentioned earlier that F_ST should be applied when each of the individual populations is known to be at Hardy-Weinberg. Some of the SNPs studied in the previous example may not be neutral; in fact, some of that variation may be under some sort of selection. F_ST is also sensitive to differences in population size among groups and to historical changes in population size. But probably the biggest reason we don't see higher F_ST values is that there actually is a fair amount of gene flow, or something closer to random mating, among populations. That will be the subject of the next video.

Thank you.