Data processing inequality explained
The data processing inequality is an information-theoretic concept stating that the information content of a signal cannot be increased via a local physical operation. This can be expressed concisely as 'post-processing cannot increase information'.
Statement
Let X → Y → Z be a Markov chain, implying that the conditional distribution of Z depends only on Y and is conditionally independent of X. Specifically, we have such a Markov chain if the joint probability mass function can be written as
p(x,y,z) = p(x) p(y|x) p(z|y) = p(y) p(x|y) p(z|y)
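As a concrete check of this factorization, the following sketch (a hypothetical example with arbitrary binary distributions, not taken from the source) builds the joint pmf of a Markov chain X → Y → Z and verifies that Z is conditionally independent of X given Y:

```python
import numpy as np

# Hypothetical binary Markov chain X -> Y -> Z; all numbers are arbitrary choices.
p_x = np.array([0.4, 0.6])                    # marginal p(x)
p_y_x = np.array([[0.9, 0.1], [0.2, 0.8]])    # channel p(y|x), rows indexed by x
p_z_y = np.array([[0.7, 0.3], [0.1, 0.9]])    # channel p(z|y), rows indexed by y

# Joint pmf via the factorization p(x,y,z) = p(x) p(y|x) p(z|y)
p_xyz = p_x[:, None, None] * p_y_x[:, :, None] * p_z_y[None, :, :]
assert np.isclose(p_xyz.sum(), 1.0)           # a valid joint distribution

# Conditional independence: p(z|x,y) equals p(z|y) for every value of x
p_xy = p_xyz.sum(axis=2)                      # marginal p(x,y)
p_z_given_xy = p_xyz / p_xy[:, :, None]
for x in range(2):
    assert np.allclose(p_z_given_xy[x], p_z_y)
```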
In this setting, no processing of Y, deterministic or random, can increase the information that Y contains about X. Using the mutual information, this can be written as:
I(X;Y) ≥ I(X;Z)
with the equality I(X;Y) = I(X;Z) if and only if I(X;Y|Z) = 0. That is, Z and Y contain the same information about X, and X → Z → Y also forms a Markov chain.[1]
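The inequality can be verified numerically. The sketch below (a hypothetical binary chain with arbitrary parameters, assumed for illustration only) computes I(X;Y) and I(X;Z) from the joint pmf and checks that processing Y into Z does not increase information about X:

```python
import numpy as np

def mutual_information(p_ab):
    """I(A;B) in nats from a joint pmf array p_ab[a, b]."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float((p_ab[mask] * np.log(p_ab[mask] / (p_a * p_b)[mask])).sum())

# Hypothetical binary chain X -> Y -> Z (illustrative numbers).
p_x = np.array([0.5, 0.5])
p_y_x = np.array([[0.8, 0.2], [0.3, 0.7]])    # p(y|x)
p_z_y = np.array([[0.9, 0.1], [0.4, 0.6]])    # p(z|y)

# Joint pmf p(x,y,z) = p(x) p(y|x) p(z|y)
p_xyz = p_x[:, None, None] * p_y_x[:, :, None] * p_z_y[None, :, :]

I_xy = mutual_information(p_xyz.sum(axis=2))  # I(X;Y), marginalizing out Z
I_xz = mutual_information(p_xyz.sum(axis=1))  # I(X;Z), marginalizing out Y
assert I_xy >= I_xz                           # the data processing inequality
```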
Proof
One can apply the chain rule for mutual information to obtain two different decompositions of I(X;Y,Z):
I(X;Z) + I(X;Y|Z) = I(X;Y,Z) = I(X;Y) + I(X;Z|Y)
By the relationship X → Y → Z, we know that X and Z are conditionally independent, given Y, which means the conditional mutual information I(X;Z|Y) = 0. The data processing inequality then follows from the non-negativity of I(X;Y|Z), giving I(X;Z) ≤ I(X;Y).
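Both decompositions, and the vanishing of I(X;Z|Y) under the Markov property, can be checked numerically. The following sketch assumes a hypothetical binary chain with arbitrary parameters (not from the source):

```python
import numpy as np

def mi(p_ab):
    """Mutual information I(A;B) in nats from a joint pmf p_ab[a, b]."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    m = p_ab > 0
    return float((p_ab[m] * np.log(p_ab[m] / (p_a * p_b)[m])).sum())

def cmi(p_abc):
    """Conditional mutual information I(A;B|C) from a joint pmf p_abc[a, b, c]."""
    total = 0.0
    for c in range(p_abc.shape[2]):
        p_c = p_abc[:, :, c].sum()
        if p_c > 0:
            total += p_c * mi(p_abc[:, :, c] / p_c)  # p(c) * I(A;B | C=c)
    return total

# Hypothetical binary chain X -> Y -> Z (illustrative numbers).
p_x = np.array([0.5, 0.5])
p_y_x = np.array([[0.8, 0.2], [0.3, 0.7]])    # p(y|x)
p_z_y = np.array([[0.9, 0.1], [0.4, 0.6]])    # p(z|y)
p_xyz = p_x[:, None, None] * p_y_x[:, :, None] * p_z_y[None, :, :]

# I(X;Y,Z): treat the pair (Y,Z) as a single variable
I_x_yz = mi(p_xyz.reshape(2, 4))

# Both chain-rule decompositions agree with I(X;Y,Z):
left = mi(p_xyz.sum(axis=1)) + cmi(p_xyz)                            # I(X;Z) + I(X;Y|Z)
right = mi(p_xyz.sum(axis=2)) + cmi(np.transpose(p_xyz, (0, 2, 1)))  # I(X;Y) + I(X;Z|Y)
assert np.isclose(left, I_x_yz) and np.isclose(right, I_x_yz)

# The Markov property makes I(X;Z|Y) vanish, which yields the inequality
assert np.isclose(cmi(np.transpose(p_xyz, (0, 2, 1))), 0.0)
```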
External links
- http://www.scholarpedia.org/article/Mutual_information
Notes and References
- [1] Cover, Thomas (2012). Elements of Information Theory. John Wiley & Sons.