Data processing inequality explained

The data processing inequality is an information theoretic concept that states that the information content of a signal cannot be increased via a local physical operation. This can be expressed concisely as 'post-processing cannot increase information'.

Statement

XYZ

, implying that the conditional distribution of

Z

depends only on

Y

and is conditionally independent of

X

. Specifically, we have such a Markov chain if the joint probability mass function can be written as

p(x,y,z)=p(x)p(y|x)p(z|y)=p(y)p(x|y)p(z|y)

In this setting, no processing of

Y

, deterministic or random, can increase the information that

Y

contains about

X

. Using the mutual information, this can be written as :

I(X;Y)\geqslantI(X;Z),

with the equality

I(X;Y)=I(X;Z)

if and only if

I(X;Y\midZ)=0

. That is,

Z

and

Y

contain the same information about

X

, and

XZY

also forms a Markov chain.[1]

Proof

One can apply the chain rule for mutual information to obtain two different decompositions of

I(X;Y,Z)

:

I(X;Z)+I(X;Y\midZ)=I(X;Y,Z)=I(X;Y)+I(X;Z\midY)

By the relationship

XYZ

, we know that

X

and

Z

are conditionally independent, given

Y

, which means the conditional mutual information,

I(X;Z\midY)=0

. The data processing inequality then follows from the non-negativity of

I(X;Y\midZ)\ge0

.

See also

External links

Notes and References

  1. Book: Elements of information theory . Cover . Thomas . 2012 . John Wiley & Sons.