Gibbs' inequality explained

In information theory, Gibbs' inequality is a statement about the information entropy of a discrete probability distribution. Several other bounds on the entropy of probability distributions are derived from Gibbs' inequality, including Fano's inequality. It was first presented by J. Willard Gibbs in the 19th century.

Gibbs' inequality

Suppose that $P=\{p_1,\ldots,p_n\}$ and $Q=\{q_1,\ldots,q_n\}$ are discrete probability distributions. Then

$$-\sum_{i=1}^n p_i\log p_i\leq-\sum_{i=1}^n p_i\log q_i$$

with equality if and only if $p_i=q_i$ for $i=1,\ldots,n$.[1] Put in words, the information entropy of a distribution $P$ is less than or equal to its cross entropy with any other distribution $Q$.

The difference between the two quantities is the Kullback–Leibler divergence or relative entropy, so the inequality can also be written:[2]

$$D_{\mathrm{KL}}(P\|Q)\equiv\sum_{i=1}^n p_i\log\frac{p_i}{q_i}\geq0.$$

Note that the use of base-2 logarithms is optional, and allows one to refer to the quantity on each side of the inequality as an "average surprisal" measured in bits.
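As a quick numerical illustration, the following minimal sketch evaluates both sides of the inequality in bits. The function names and the two example distributions are hypothetical choices made for this sketch, and all probabilities are assumed to be strictly positive so that every logarithm is defined.

```python
import math

def entropy_bits(p):
    """Entropy -sum_i p_i log2 p_i (assumes every p_i > 0)."""
    return -sum(pi * math.log2(pi) for pi in p)

def cross_entropy_bits(p, q):
    """Cross entropy -sum_i p_i log2 q_i (assumes every q_i > 0)."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q))

# Two illustrative distributions on three outcomes.
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

h = entropy_bits(p)              # 1.5 bits
ce = cross_entropy_bits(p, q)    # 1.75 bits
kl = ce - h                      # 0.25 bits, the Kullback-Leibler divergence

print(h, ce, kl)
```

The gap between the two values is the divergence, and it vanishes when `q` is replaced by `p`.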

Proof

For simplicity, we prove the statement using the natural logarithm, denoted by $\ln$, since

$$\log_b a=\frac{\ln a}{\ln b},$$

so the particular logarithm base $b>1$ that we choose only scales the relationship by the factor $\frac{1}{\ln b}$.

Let $I$ denote the set of all $i$ for which $p_i$ is non-zero. Then, since $\ln x\leq x-1$ for all $x>0$, with equality if and only if $x=1$, we have:

$$-\sum_{i\in I}p_i\ln\frac{q_i}{p_i}\geq-\sum_{i\in I}p_i\left(\frac{q_i}{p_i}-1\right)=-\sum_{i\in I}q_i+\sum_{i\in I}p_i=-\sum_{i\in I}q_i+1\geq0$$

The last inequality is a consequence of the $p_i$ and $q_i$ being part of a probability distribution. Specifically, the $p_i$ with $i\in I$ sum to 1, because every excluded $p_i$ is zero. Some non-zero $q_i$, however, may have been excluded, since the choice of indices is conditioned upon the $p_i$ being non-zero. Therefore, the sum of the $q_i$ over $I$ may be less than 1.
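The chain of inequalities can be traced numerically. The sketch below uses a hypothetical pair of distributions in which $Q$ places half of its mass on an outcome that $P$ excludes, so the sum of the $q_i$ over the index set is strictly less than 1; the variable names are illustrative only.

```python
import math

# P has zero mass on the third outcome; Q puts half its mass there.
p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]

# The index set I: positions where p_i is non-zero.
support = [i for i, pi in enumerate(p) if pi > 0]

# Left-hand side: -sum_{i in I} p_i ln(q_i / p_i).
lhs = -sum(p[i] * math.log(q[i] / p[i]) for i in support)

# After applying ln x <= x - 1: -sum_{i in I} p_i (q_i / p_i - 1).
middle = -sum(p[i] * (q[i] / p[i] - 1.0) for i in support)

# Mass of Q that falls on the support of P (here 0.5, not 1).
q_mass_on_support = sum(q[i] for i in support)

# lhs is about 0.693 (= ln 2), and middle equals 1 - q_mass_on_support = 0.5.
print(lhs, middle, 1.0 - q_mass_on_support)
```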

So far, over the index set $I$, we have:

$$-\sum_{i\in I}p_i\ln\frac{q_i}{p_i}\geq0,$$

or equivalently

$$-\sum_{i\in I}p_i\ln q_i\geq-\sum_{i\in I}p_i\ln p_i.$$

Both sums can be extended to all $i=1,\ldots,n$, i.e. including $p_i=0$, by recalling that the expression $p\ln p$ tends to 0 as $p$ tends to 0, and $(-\ln q)$ tends to $\infty$ as $q$ tends to 0. We arrive at

$$-\sum_{i=1}^n p_i\ln q_i\geq-\sum_{i=1}^n p_i\ln p_i$$
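In code, these limits become explicit conventions. The following is a minimal sketch, with hypothetical function names, that treats a term with $p_i=0$ as 0 and returns $+\infty$ when $q_i=0$ while $p_i>0$, so the inequality can be checked over all indices.

```python
import math

def cross_entropy_nats(p, q):
    """-sum_i p_i ln q_i, with the limiting conventions for zero entries."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue          # the term vanishes, matching p ln p -> 0 as p -> 0
        if qi == 0.0:
            return math.inf   # -ln q -> infinity as q -> 0
        total -= pi * math.log(qi)
    return total

def entropy_nats(p):
    """-sum_i p_i ln p_i, computed as the cross entropy of p with itself."""
    return cross_entropy_nats(p, p)

p = [0.5, 0.5, 0.0]
q_ok = [0.25, 0.25, 0.5]    # positive wherever p is positive
q_bad = [1.0, 0.0, 0.0]     # zero where p is positive

print(entropy_nats(p))               # about 0.693 (= ln 2)
print(cross_entropy_nats(p, q_ok))   # about 1.386 (= ln 4)
print(cross_entropy_nats(p, q_bad))  # inf
```

In both cases the cross entropy is at least the entropy, with the second case satisfying the inequality trivially.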

For equality to hold, we require

  1. $\frac{q_i}{p_i}=1$ for all $i\in I$, so that the equality $\ln\frac{q_i}{p_i}=\frac{q_i}{p_i}-1$ holds, and
  2. $\sum_{i\in I}q_i=1$, which means $q_i=0$ if $i\notin I$, that is, $q_i=0$ if $p_i=0$.

This can happen if and only if $p_i=q_i$ for $i=1,\ldots,n$.

Alternative proofs

The result can alternatively be proved using Jensen's inequality, the log sum inequality, or the fact that the Kullback–Leibler divergence is a form of Bregman divergence.

Proof by Jensen's inequality

Because $\log$ is a concave function, we have that:

$$\sum_i p_i\log\frac{q_i}{p_i}\leq\log\sum_i p_i\frac{q_i}{p_i}=\log\sum_i q_i=0$$

where the first inequality is due to Jensen's inequality, and $q$ being a probability distribution implies the last equality.

Furthermore, since $\log$ is strictly concave, by the equality condition of Jensen's inequality we get equality when

$$\frac{q_1}{p_1}=\frac{q_2}{p_2}=\cdots=\frac{q_n}{p_n}$$

and $\sum_i q_i=1$.

Suppose that this ratio is $\sigma$, then we have that

$$1=\sum_i q_i=\sum_i\sigma p_i=\sigma$$

where we use the fact that $p,q$ are probability distributions. Therefore, the equality happens when $p=q$.
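The Jensen step can be spot-checked on random inputs. The sketch below is only an illustration, not part of the proof: the helper names, the fixed seed, and the restriction to strictly positive distributions on five outcomes are all assumptions made for this example.

```python
import math
import random

def random_distribution(n, rng):
    """Draw a strictly positive probability vector of length n."""
    weights = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def jensen_sides(p, q):
    """Return (sum_i p_i ln(q_i/p_i), ln sum_i p_i (q_i/p_i))."""
    mean_of_logs = sum(pi * math.log(qi / pi) for pi, qi in zip(p, q))
    log_of_mean = math.log(sum(pi * (qi / pi) for pi, qi in zip(p, q)))
    return mean_of_logs, log_of_mean

rng = random.Random(0)  # fixed seed so the run is repeatable
worst_gap = math.inf
for _ in range(1000):
    p = random_distribution(5, rng)
    q = random_distribution(5, rng)
    mean_of_logs, log_of_mean = jensen_sides(p, q)
    # Jensen's inequality says this gap is non-negative.
    worst_gap = min(worst_gap, log_of_mean - mean_of_logs)

print(worst_gap)
```

The smallest observed gap should be non-negative up to floating-point rounding, and the gap is zero whenever `q` equals `p`.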

Proof by Bregman divergence

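Alternatively, it can be proved by noting that

$$q-p-p\ln\frac{q}{p}\geq0$$

for all $p,q>0$, with equality holding iff $p=q$. Then, summing over the states, we have

$$\sum_i\left(q_i-p_i-p_i\ln\frac{q_i}{p_i}\right)\geq0$$

with equality holding iff $p=q$. Since $\sum_i q_i=\sum_i p_i=1$, the left-hand side reduces to $-\sum_i p_i\ln\frac{q_i}{p_i}$, which is the Kullback–Leibler divergence.

The pointwise inequality holds because the KL divergence is the Bregman divergence generated by the function $t\mapsto t\ln t$.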
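The link to Bregman divergences can be checked directly. The sketch below takes $f(t)=t\ln t$ as the convex generator, which is an assumption of this example, and compares the scalar Bregman divergence $f(p)-f(q)-f'(q)(p-q)$ with the pointwise expression $q-p-p\ln\frac{q}{p}$. The function names and sample points are hypothetical.

```python
import math

def bregman(f, f_prime, p, q):
    """Scalar Bregman divergence f(p) - f(q) - f'(q) * (p - q)."""
    return f(p) - f(q) - f_prime(q) * (p - q)

def f(t):
    """Generator assumed in this sketch: t ln t, convex for t > 0."""
    return t * math.log(t)

def f_prime(t):
    """Derivative of t ln t."""
    return math.log(t) + 1.0

def pointwise_term(p, q):
    """The expression q - p - p ln(q / p) from the text."""
    return q - p - p * math.log(q / p)

pairs = [(0.2, 0.7), (0.5, 0.5), (0.9, 0.1)]
for p, q in pairs:
    # The two columns agree up to rounding and are never negative.
    print(p, q, bregman(f, f_prime, p, q), pointwise_term(p, q))
```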

Corollary

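The entropy of $P$ is bounded by:[1]

$$H(p_1,\ldots,p_n)\leq\log n.$$

The proof is immediate: set $q_i=1/n$ for all $i$, so that the right-hand side of Gibbs' inequality becomes $-\sum_{i=1}^n p_i\log\frac{1}{n}=\log n$.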
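A small numerical check of the bound, using natural logarithms: the sketch below compares the entropy of a uniform and of a skewed distribution on four outcomes with $\ln 4$. Both distributions and the function name are illustrative assumptions.

```python
import math

def entropy_nats(p):
    """Entropy -sum_i p_i ln p_i, skipping terms with p_i == 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

n = 4
uniform = [1.0 / n] * n
skewed = [0.7, 0.1, 0.1, 0.1]

bound = math.log(n)           # ln 4, about 1.386
h_uniform = entropy_nats(uniform)
h_skewed = entropy_nats(skewed)

# The uniform distribution attains the bound; the skewed one stays below it.
print(bound, h_uniform, h_skewed)
```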

Notes and References

  1. Pierre Bremaud. An Introduction to Probabilistic Modeling. Springer Science & Business Media, 6 December 2012. ISBN 978-1-4612-1046-7.
  2. David J. C. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press, 25 September 2003. ISBN 978-0-521-64298-9.