By Lemma 7.16, the mutual information between the X sequence and the Y sequence is upper bounded by the summation over i from 1 to n of the mutual information between X_i and Y_i, which is at most n times C. By Fano's inequality, the conditional entropy of W given W hat, that is, the first term above, is at most 1 plus P_e times log M, where P_e, the average probability of error, is the probability that W is not equal to W hat, and M is the size of the message set.
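The two bounds just stated can be written compactly; here X^n and Y^n denote the input and output sequences (notation assumed from earlier in the lecture):

```latex
I(X^n; Y^n) \le \sum_{i=1}^{n} I(X_i; Y_i) \le nC,
\qquad
H(W \mid \hat{W}) \le 1 + P_e \log M .
```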

Then, from step 2, we have log M less than or equal to H(W|W hat) plus nC, where by Fano's inequality, H(W|W hat) is at most 1 plus P_e times log M.
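In symbols, chaining the two inequalities just quoted gives:

```latex
\log M \le H(W \mid \hat{W}) + nC \le 1 + P_e \log M + nC .
```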

Now P_e, the average probability of error, as we have seen, is less than or equal to lambda_max, and by our assumption on the code, lambda_max is less than epsilon. Substituting into the bound above, log M is less than 1 plus epsilon times log M plus nC. Moving the term epsilon times log M to the left hand side, we have 1 minus epsilon, times log M, less than 1 plus nC.
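As a sketch of the algebra in this step, substituting P_e \le \lambda_{\max} < \epsilon and collecting the log M terms gives:

```latex
\log M < 1 + \epsilon \log M + nC
\;\Longrightarrow\;
(1 - \epsilon) \log M < 1 + nC .
```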

And dividing both sides by 1 minus epsilon, we have log M less than 1 plus nC, divided by 1 minus epsilon.
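That is, assuming epsilon is less than 1 (as in the setup of the theorem), so that dividing by 1 minus epsilon preserves the inequality:

```latex
\log M < \frac{1 + nC}{1 - \epsilon} .
```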

Upon dividing by n on both sides, we have 1 over n times log M less than 1 over n plus C, all divided by 1 minus epsilon. On the other hand, by our assumption on the code, the rate satisfies R minus epsilon less than 1 over n times log M. Combining the two inequalities, we obtain R minus epsilon less than 1 over n plus C, divided by 1 minus epsilon.
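The chain of inequalities at this point, written out:

```latex
R - \epsilon \;<\; \frac{1}{n}\log M \;<\; \frac{\tfrac{1}{n} + C}{1 - \epsilon} .
```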
To complete the proof, we first let n go to infinity, so that 1 over n goes to zero, and then let epsilon go to zero, to conclude that R is less than or equal to C.
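As a quick numerical sanity check of this limiting argument, the sketch below (using a hypothetical capacity C = 1, not a value from the lecture) evaluates the upper bound (1/n + C)/(1 - epsilon) and shows it approaching C as n grows and epsilon shrinks:

```python
# Sanity check of the converse bound R - eps < (1/n + C) / (1 - eps).
# C = 1.0 is a hypothetical channel capacity chosen for illustration.

def rate_bound(n: int, eps: float, C: float = 1.0) -> float:
    """Upper bound on R - eps derived at the end of the converse proof."""
    return (1.0 / n + C) / (1.0 - eps)

# As n grows, the bound decreases toward C / (1 - eps) ...
for n in (10, 1000, 100000):
    print(n, rate_bound(n, eps=0.01))

# ... and as eps also shrinks to 0, the bound tends to C, giving R <= C.
assert abs(rate_bound(10**9, 1e-9) - 1.0) < 1e-6
```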