Suppose the sender transmits the $i$th input symbol $x_i$ with probability $p(x_i)$; this probability is called the prior probability of $x_i$.
“Prior probability” is often written in Latin as “a priori probability”.
Suppose the receiver receives $y_j$ as the $j$th output symbol. The probability of this output conditioned on the $i$th input being $x_i$ is the likelihood of $x_i$: $p(y_j \mid x_i)$.
However, in most cases we are more interested in the posterior probability of $x_i$: $p(x_i \mid y_j)$, i.e., the probability that the $i$th input is $x_i$ given that the $j$th output is $y_j$.
“Posterior probability” is often written in Latin as “a posteriori probability”.
The posterior probability helps us determine the amount of information that can be inferred about the input when the output takes a certain value.
The information gain, or uncertainty loss, about input $x_i$ upon receiving output $y_j$ is the mutual information of $x_i$ and $y_j$, denoted by $I(x_i; y_j)$ [MC12, pp. 126-127].
$I(x_i; y_j)$ is thus the uncertainty in $x_i$ before receiving $y_j$ minus the uncertainty in $x_i$ after receiving $y_j$.
The uncertainty in $x_i$ before receiving $y_j$, measured in number of bits, is $\log_2 \frac{1}{p(x_i)}$.
The uncertainty in $x_i$ after receiving $y_j$, measured in number of bits, is $\log_2 \frac{1}{p(x_i \mid y_j)}$.
Thus,
\[
I(x_i; y_j) = \log_2 \frac{1}{p(x_i)} - \log_2 \frac{1}{p(x_i \mid y_j)} = \log_2 \frac{p(x_i \mid y_j)}{p(x_i)}.
\]
By Bayes’ Theorem, $p(x_i \mid y_j)\, p(y_j) = p(y_j \mid x_i)\, p(x_i)$, so
\[
I(x_i; y_j) = \log_2 \frac{p(y_j \mid x_i)}{p(y_j)} = I(y_j; x_i),
\]
i.e., $y_j$ provides as much information about $x_i$ as $x_i$ does about $y_j$.
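This symmetry can be checked numerically. The following is a minimal sketch, with a hypothetical joint probability $p(x_i, y_j)$ and marginals chosen only for illustration:

```python
import math

# Hypothetical numbers for one input/output pair (not from the text):
p_xy = 0.4           # joint probability p(x_i, y_j)
p_x, p_y = 0.5, 0.6  # marginals p(x_i) and p(y_j)

# I(x_i; y_j) = log2( p(x_i | y_j) / p(x_i) ), with p(x_i | y_j) = p_xy / p_y
i_x_from_y = math.log2((p_xy / p_y) / p_x)
# I(y_j; x_i) = log2( p(y_j | x_i) / p(y_j) ), with p(y_j | x_i) = p_xy / p_x
i_y_from_x = math.log2((p_xy / p_x) / p_y)

# Both expressions reduce to log2( p(x_i, y_j) / (p(x_i) p(y_j)) ),
# so they must agree.
assert abs(i_x_from_y - i_y_from_x) < 1e-12
print(i_x_from_y)  # log2(4/3) ≈ 0.415 bits
```

Both forms collapse to $\log_2 \frac{p(x_i, y_j)}{p(x_i)\,p(y_j)}$, which makes the symmetry evident.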
If the events $x_i$ and $y_j$ are independent, what is $I(x_i; y_j)$?
Extending the result above from $x_i$ to $X$ and from $y_j$ to $Y$, we define the system/average mutual information of $X$ and $Y$, denoted by $I(X; Y)$, as the information gain or uncertainty loss about random variable $X$ by observing random variable $Y$ [MC12, Definition 6.7]:
\[
I(X; Y) = \sum_i \sum_j p(x_i, y_j) \log_2 \frac{p(x_i \mid y_j)}{p(x_i)}.
\]
It is trivial to show that 1️⃣ $I(X; Y) = I(Y; X)$, 2️⃣ $I(X; Y) \ge 0$, 3️⃣ $I(X; Y) = 0$ iff $X$ and $Y$ are independent.
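The average mutual information is straightforward to compute from a joint distribution. A minimal sketch, where the joint distribution of a binary symmetric channel (uniform input, crossover probability 0.1) is an assumed example, not taken from the text:

```python
import math

def mutual_information(joint):
    """Average mutual information I(X; Y) in bits, given the joint
    distribution as a dict {(x, y): p(x, y)}."""
    # Marginals p(x) and p(y) obtained by summing out the other variable.
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    # Sum of p(x, y) * log2( p(x, y) / (p(x) p(y)) ); zero-probability
    # pairs contribute nothing and are skipped.
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Binary symmetric channel, uniform input, crossover 0.1 (hypothetical):
bsc = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
print(mutual_information(bsc))  # ≈ 0.531 bits

# Independent X and Y: the joint factors as p(x) p(y), so I(X; Y) = 0,
# matching property 3️⃣ above.
indep = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
print(mutual_information(indep))  # 0.0
```

Swapping the roles of $x$ and $y$ in the dictionary keys leaves the result unchanged, which is property 1️⃣.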