which is a measure of the spread of the distribution,
plus a constant, namely one half log 2 pi e.
In particular, as sigma, the standard deviation, tends
to 0, the differential entropy tends to minus infinity.
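A minimal numerical sketch of this behavior, assuming SciPy is available and using natural logarithms (so entropies are in nats):

```python
# Differential entropy of N(mu, sigma^2): h(X) = log(sigma) + (1/2) log(2*pi*e).
# As sigma -> 0, h(X) -> -infinity.
import numpy as np
from scipy.stats import norm

half_log_2pie = 0.5 * np.log(2 * np.pi * np.e)

for sigma in [1.0, 0.1, 0.01, 0.001]:
    closed_form = np.log(sigma) + half_log_2pie
    # SciPy computes the differential entropy directly; it matches the formula
    assert np.isclose(norm(scale=sigma).entropy(), closed_form)
    print(f"sigma = {sigma:>6}: h(X) = {closed_form:.4f} nats")
```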
The next two theorems are the vector
generalizations of theorems 10.43 and 10.44, respectively.
Let X be a vector of n
continuous random variables, with correlation matrix K tilde.
Then the differential entropy of X is upper bounded by one half log 2 pi e
to the power n times the determinant of the correlation matrix K tilde,
with equality if and only if X is a Gaussian
vector with mean 0 and covariance matrix K tilde.
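As a hedged illustration of the bound (the uniform example below is an illustrative choice, not from the lecture): a vector that is uniform on the cube [-a, a]^n has zero mean and correlation matrix a squared over 3 times the identity, and its differential entropy falls strictly below the Gaussian bound, since it is not Gaussian.

```python
# Check h(X) <= (1/2) log((2*pi*e)^n det(K~)) for X uniform on [-a, a]^n
# with independent coordinates (illustrative, non-Gaussian example).
import numpy as np

n, a = 3, 2.0
K_tilde = (a**2 / 3) * np.eye(n)       # correlation matrix E[X X^T]

h_uniform = n * np.log(2 * a)          # entropy of the uniform vector, in nats
bound = 0.5 * np.log((2 * np.pi * np.e)**n * np.linalg.det(K_tilde))

print(h_uniform, "<=", bound)          # strict inequality: X is not Gaussian
assert h_uniform < bound
```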
Theorem 10.46 says that, for a random vector X with mean mu and
covariance matrix K, the differential entropy is upper bounded by one half log 2 pi e
to the power n times the determinant of K, with equality if and
only if X is a Gaussian vector with mean mu and covariance matrix K.
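The equality case can be checked numerically. The sketch below (assuming SciPy; the particular mu and K are illustrative) confirms that a Gaussian vector attains the bound exactly, and that the mean plays no role, since differential entropy is translation invariant.

```python
# For X ~ N(mu, K), h(X) = (1/2) log((2*pi*e)^n det(K)), for any mean mu.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([3.0, -1.0])             # arbitrary mean; does not affect h(X)
K = np.array([[2.0, 0.5],
              [0.5, 1.0]])

n = len(mu)
bound = 0.5 * np.log((2 * np.pi * np.e)**n * np.linalg.det(K))
assert np.isclose(multivariate_normal(mean=mu, cov=K).entropy(), bound)
```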
We now prove theorem 10.45.
Define the function r_{ij}(x) to be x_i times x_j, and let
the (i,j)-th element of the matrix K tilde be k tilde ij.
Then the constraints on the pdf of the random vector X,
namely the requirement that the correlation matrix is equal to K
tilde, are equivalent to setting the integral of r_{ij}(x)
f(x)dx over the support of f to k tilde ij.
This is because r_{ij}(x) is equal to x_i times x_j, so this
integral is equal to the expectation of X_i times X_j,
that is, the correlation between X_i and X_j, for all i and j between 1 and n.
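A quick Monte Carlo sketch of this constraint (the sampling distribution is an illustrative choice): averaging x_i times x_j over a large sample drawn from f recovers the (i,j)-th entry of K tilde.

```python
# Empirical check: the integral of r_ij(x) f(x) dx is E[X_i X_j],
# so a sample average of x_i * x_j should approximate k~_ij.
import numpy as np

rng = np.random.default_rng(0)
K_tilde = np.array([[1.0, 0.3],
                    [0.3, 1.0]])
X = rng.multivariate_normal(np.zeros(2), K_tilde, size=200_000)

empirical = X.T @ X / len(X)           # empirical E[X_i X_j] for all i, j
print(np.round(empirical, 3))          # close to K_tilde
```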
Now by theorem 10.41, the joint pdf that
maximizes the differential entropy has the form
f star of x equals e to the power minus lambda_0
minus the summation over all i and j of lambda_{ij} times x_i times x_j,
where x_i times x_j is r_{ij}(x).
Here, the summation over all i and j of lambda_{ij} times x_i
times x_j can be written as x transpose L times x,
where L is an n by n matrix with the (i,j)-th element equal to lambda_{ij}.
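This identity is easy to verify numerically; the sketch below uses an arbitrary illustrative matrix L.

```python
# sum_{i,j} lambda_ij * x_i * x_j equals the quadratic form x^T L x.
import numpy as np

rng = np.random.default_rng(1)
n = 4
L = rng.standard_normal((n, n))        # L[i, j] plays the role of lambda_ij
x = rng.standard_normal(n)

double_sum = sum(L[i, j] * x[i] * x[j] for i in range(n) for j in range(n))
assert np.isclose(double_sum, x @ L @ x)
```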
Thus, f star is the joint pdf
of a multivariate Gaussian distribution with 0 mean.
To see this, we only need to compare the form of f
star of x with the pdf of a Gaussian distribution with mean 0.
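Concretely, matching the two forms identifies L with one half of K inverse and e to the minus lambda_0 with the Gaussian normalizing constant. The sketch below (assuming SciPy; the K is illustrative) verifies this identification pointwise.

```python
# The N(0, K) density equals e^{-lambda_0 - x^T L x} with
# L = K^{-1} / 2 and lambda_0 = (1/2) log((2*pi)^n det(K)).
import numpy as np
from scipy.stats import multivariate_normal

K = np.array([[2.0, 0.5],
              [0.5, 1.0]])
n = K.shape[0]

L = 0.5 * np.linalg.inv(K)
lambda_0 = 0.5 * np.log((2 * np.pi)**n * np.linalg.det(K))

x = np.array([0.7, -1.2])              # an arbitrary test point
f_star = np.exp(-lambda_0 - x @ L @ x)
assert np.isclose(f_star, multivariate_normal(mean=np.zeros(n), cov=K).pdf(x))
```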
Then for all i and j between 1 and n, the covariance between X_i and X_j is
equal to the expectation of X_i times X_j, minus
the expectation of X_i times the expectation of X_j,
where the expectation of X_i and the expectation of X_j are both equal to 0.
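So the covariance between X_i and X_j reduces to the expectation of X_i times X_j, which is exactly the correlation; hence for a zero-mean vector the covariance matrix and the correlation matrix coincide. A Monte Carlo sketch of this fact (the sample setup is illustrative):

```python
# For zero-mean X, Cov(X_i, X_j) = E[X_i X_j] - E[X_i] E[X_j] = E[X_i X_j],
# so the sample covariance and the sample correlation matrix agree.
import numpy as np

rng = np.random.default_rng(2)
K_tilde = np.array([[1.0, 0.4],
                    [0.4, 2.0]])
X = rng.multivariate_normal(np.zeros(2), K_tilde, size=200_000)

correlation = X.T @ X / len(X)         # empirical E[X_i X_j]
covariance = np.cov(X, rowvar=False)   # subtracts the (near-zero) sample mean
assert np.allclose(correlation, covariance, atol=1e-2)
```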