So in fact both of these things are the case.
Some DNA substitutions are more likely than others.
So for example, we can divide all possible DNA substitutions
into two categories called transitions and transversions.
And to understand what these are, we first have to understand that the bases A and
G, adenine and guanine, both belong to a category called purines,
and the bases C and T, cytosine and thymine, both belong to a category called pyrimidines.
And for substitutions that change a purine to another purine or
that change a pyrimidine to another pyrimidine,
those kinds of substitutions are called transitions.
And then all other kinds of substitutions are called transversions. And so
if you just enumerate the possibilities, you'll see that there are twice as
many kinds of transversions as there are kinds of transitions.
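
To make that enumeration concrete, here is a minimal sketch that classifies every ordered substitution between two different bases and counts each kind. The function name `substitution_kind` and the choice to count A-to-G and G-to-A as separate substitutions are assumptions made for illustration; they are not from the lecture.

```python
from itertools import permutations

# Base categories as described above.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_kind(ref, alt):
    """Classify a single-base substitution (hypothetical helper).

    Returns "transition" if both bases are purines or both are
    pyrimidines, and "transversion" otherwise.
    """
    if ref == alt:
        raise ValueError("a substitution needs two different bases")
    pair = {ref, alt}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


# Enumerate all 12 ordered pairs of distinct bases and tally each kind.
counts = {"transition": 0, "transversion": 0}
for ref, alt in permutations("ACGT", 2):
    counts[substitution_kind(ref, alt)] += 1

print(counts)  # {'transition': 4, 'transversion': 8}
```

Counting unordered pairs instead would give two transitions and four transversions, so the two-to-one ratio is the same either way.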
So you might think that transversions are going to be something like twice as
frequent as transitions.
But it turns out that, in reality,
if you look at the substitutions that differentiate, say, the genomes of two
unrelated humans, transitions are actually about twice as frequent as transversions.
So it's the other way around from what you would expect.
So if transitions are so much more frequent than transversions,
it seems like in our penalty scheme we might want
to penalize the transversions more than we penalize the transitions.
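
One way such a penalty scheme might look is sketched below. The specific values (2 for a transition, 4 for a transversion) and the names `substitution_penalty` and `mismatch_cost` are assumptions chosen for illustration only. The one property taken from the discussion above is that a transversion costs more than a transition.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Hypothetical penalty values: only their ordering matters here.
TRANSITION_PENALTY = 2
TRANSVERSION_PENALTY = 4


def substitution_penalty(x, y):
    """Penalty for aligning base x against base y (illustrative sketch)."""
    if x == y:
        return 0  # a match costs nothing
    pair = {x, y}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return TRANSITION_PENALTY  # the more frequent kind, penalized less
    return TRANSVERSION_PENALTY  # the less frequent kind, penalized more


def mismatch_cost(s, t):
    """Sum the penalties position by position for two equal-length strings."""
    if len(s) != len(t):
        raise ValueError("strings must have the same length")
    return sum(substitution_penalty(a, b) for a, b in zip(s, t))


# One transition (A vs G) and one transversion (A vs C): 2 + 4 = 6.
print(mismatch_cost("GATTACA", "GGTTCCA"))  # 6
```

This sketch covers substitutions only. It says nothing about how insertions or deletions would be penalized.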