Claude Shannon defined perfect secrecy to mean an encryption scheme where the ciphertext did not leak any information whatsoever about the corresponding plaintext. Or, equivalently, an encryption scheme that remains secure even in the presence of an adversary with unlimited computational power. This notion is formalised with probabilities, but to understand that, one must first understand how (and why) do probabilities get mixedin with crypto in the first place.
The basic principle, first put forth by Kerckhoffs, is that the safety of the cryptosystem rests solely on the key. That is, you have a plaintext message, you choose a key at “random” (more on this later), and obtain a ciphertext message; and the only way of obtaining the original plaintext from the ciphertext is to decrypt it with the same key.^{1}
First, in any “real world” scenario, the plaintexts are not all equally likely: some will be more likely than others (e.g. considering the set of all strings over the Latin alphabet, the plaintexts that correspond to (for instance) valid English will be more likely than those that are just a random sequence of characters). This means that there exists a probability distribution over the set of all possible plaintexts, which very likely is not the uniform distribution. Shannon calls this the a priori probability of a plaintext, and one must assume that this plaintext probability distribution is known to the cryptanalyst (remember we assume that the only thing the cryptanalyst does not know is the key—cf. above).
Now formally, a cryptosystem consists of three algorithms, . The first generates a random key, the second is the encryption algorithm that takes a plaintext and a key and outputs a ciphertext, and the third is the decryption algorithm that takes a ciphertext and a key and outputs a plaintext. A cryptosystem is consistent if for all and , with probability . The algorithm also implicitly defines a probability distribution over the set of possible keys, , and furthermore, one can always assume that that distribution is the uniform one.^{2} The probability distributions over and , together with the encryption algorithm , specify (again implicitly) the probability distribution over the set of possible ciphertexts, . The ensuing discussion assumes that the encryption algorithm is fixed.
Which brings us back to perfect secrecy. Shannon defined a cryptosystem as perfectly secret if, for every plaintext, the a priori probability was equal to the a posteriori probability; which is the probability that that plaintext was the one that originated the observed ciphertext. Mathematically we have the following
Definition 1 (Perfect secrecy). An encryption scheme is perfectly secret if for all and , and for all probability distributions over , we have ^{3}. End.
An elementary application of Bayes’ theorem shows that the previous definition is equivalent to having perfect secrecy if and only if (again for all , and probability distributions over ).
This is a rigorous definition, but when I first learned the concept, then read through the facts that stem from the definition, I felt some unease, due to the fact that for all the talk about probabilities, their space was never defined. So let’s do that now. This set is composed of all the triples , where , and . There are such tuples. The probability of is , because when the encryption algorithm is fixed, is fully determined by the plaintext and the key. Note that , because and , and multiplying both yields the initial summation.
This may seem as a pointless display of pedantry, but its value becomes obvious when one tries to understand (and calculate) probabilities like , where is a fixed ciphertext. (A remark about notation: values that are assumed fixed are not subscripted.) Intuitively, one could surmise it should be something like the summation of the probabilities for all possible plaintexts, each multiplied by the probability of choosing a key that encrypts that plaintext into . The formalism of the previous paragraph allows us to verify this conjecture. Indeed, to calculate , just select all tuples where , and sum their probability. We obtain
To see how this is equivalent to our intuitive guess, consider what happens if for a given , there are two different keys (say and ) that encrypt it to . Then we would have:
So we conclude that each is multiplied by the total probability of selecting a key that encrypts it to —just as conjectured.
Conditional probabilities
Both forms of Definition 1 are based on conditional probabilities. Let’s see what insight our formalism can provide on those events.
We can rewrite the second summation in Equation 1 differently, by noting that each of the terms is just the probability that both and occur:
It is implicit that this is done for all keys that encrypt into . This makes sense because the plaintexts form a partition of the probability space: (which is the same as saying that the sum of the probabilities for all plaintexts is ). On the other hand, for a given , that is encrypted into by keys and ,
This is because is the sum of the probabilities of the tuples of the form , and in our example there are two such tuples, and summing their probabilities yields . Dividing by we get . This of course holds for more than two keys. But this means this sum is also equal to , and this in turn allows us to, yet again, rewrite the probability like so:
As far as I can tell, there is no description of that is similar to Equation 2, because this depends on the probability distribution of the plaintext. The best we can write for is the following, which is not simple at all…
Back to perfect secrecy (again)
Despite all the talk about perfectly secret ciphers, the truth is that so far all the equalities shown are valid for any symmetric encryption scheme (not just for perfectly secret ones). The next two however, are only true if the cipher has perfect secrecy. First using conditional probabilities we prove another equivalent condition to (both forms of) Definition 1.
Theorem 1. A cipher has perfect secrecy, iff for any distribution over , any and any , we have . End.
Proof. () If the cipher has perfect secrecy, then $P(em_1)=P(e)=P(em_2)m_1m_2e$$.
() If , for any , and , then
$P(em)P(em_i)m_i$$, because it is always the same by hypothesis. QED.
Next we prove another necessary and sufficient condition for perfect secrecy. In the case of such a cipher, has always the same value, for all (viz. ). And because we can always assume that the keys are generated according to the uniform distribution, this means that, for a fixed , and for any , the number of keys that encrypt to is always the same. The next result shows the converse is also true.
Theorem 2. A cipher has perfect secrecy if and only if, having fixed a ciphertext, for any plaintext, the number of keys that encrypt it to that fixed ciphertext is the same (but note this number can vary for different ciphertexts). More formally, let be a fixed ciphertext as before, and let be the set of keys that encrypt to . Then a cipher has perfect secrecy iff has the same value, for all .^{4} End.
Note that the analog property when having fixed a plaintext is false: there can be more keys that encrypt that plaintext to one ciphertext than to another ciphertext, but the cipher can still be perfectly secret. We’ll see an example of this shortly.
Proof. () We have argued this direction informally, based on the property that, for perfectly secret ciphers, . But we can also use . From (3), if the cipher is perfectly secret:
is constant in the numerator summation, so it can be put outside it, and cancelled with the of the right hand side. We thus obtain
Remember the denominator equals . What this means is that for a given (fixed) ciphertext , and for all plaintext messages , the number of keys that encrypt to must have probabilities that always sum to the same value. Given the assumption of a uniform key distribution, this is the same as saying the number of keys must always be the same.
() If for any ciphertext , the number of keys that decrypt it to any plaintext is the same, then we immediately have that for any two different plaintext messages, and , it must be the case that . Theorem 1 now yields the conclusion that the cipher is perfectly secure. QED.
The question
After that lengthy introduction, we now finally come to the question that actually annoyed me enough to try to visualise the probability space in the way I’ve just described. That question is the following: does perfect secrecy imply the ciphertext distribution is uniform? I could not either prove it or refute it, but it turns out the answer is no. Here’s the counterexample: we have two bits of plaintext, , four bits of key material , and three bits of ciphertext :
This encryption algorithm has perfect secrecy, because for any given ciphertext, there is the same number of keys that decrypt it to any plaintext (cf. Theorem 2). This is straightforward (if somewhat laborious) to see.
Consider the ciphertext , and an arbitrary plaintext . What are the keys that would encrypt the said plaintext into the said ciphertext? Given that must be zero, we have that . The same reasoning yields , because . Finally, given that and , it must be the case that . This yields the following three possible keys for encrypting into : , and . We denote this set as .
Reasoning similarly, we can write the following table, listing the keys that encrypt an arbitrary plaintext into the ciphertext in the left column. An overline () denotes the complement of the bit .
Ciphertext  Keys 

$$(0, 0, 0)$$  $$(b_0, b_1, \{(0, 0), (0, 1), (1, 0)\})$$ 
$$(0, 0, 1)$$  $$(b_0, b_1, 1, 1)$$ 
$$(0, 1, 0)$$  $$(b_0, \overline{b_1}, \{(0, 0), (0, 1), (1, 0)\})$$ 
$$(0, 1, 1)$$  $$(\overline{b_0}, \overline{b_1}, 1, 1)$$ 
$$(1, 0, 0)$$  $$(b_0, b_1, \{(0, 0), (0, 1), (1, 0)\})$$ 
$$(1, 0, 1)$$  $$(\overline{b_0}, b_1, \{(0, 0), (0, 1), (1, 0)\})$$ 
$$(1, 1, 0)$$  $$(b_0, \overline{b_1}, 1, 1)$$ 
$$(1, 1, 1)$$  $$(\overline{b_0}, \overline{b_1}, \{(0, 0), (0, 1), (1, 0)\})$$ 
Thus we can see that for any fixed ciphertext, there is the same number of keys that cause it to decrypt to any plaintext; thus the scheme has perfect secrecy. However, the ciphertext distribution is not always uniform: if both plaintext and keys are assumed uniform, then (for example) the ciphertext will be more likely to appear than , because there are more keys that encrypt an arbitrary plaintext to it. In other words, although the cipher is perfectly secret for all plaintext distributions, there is at least one (viz. the uniform distribution) for which the ciphertext distribution will not be uniform.
Also recall the remark made after stating Theorem 2: in this cipher for a given plaintext, there are more keys that encrypt it to some ciphertexts rather than others—indeed that is the cause of the nonuniformity of the ciphertexts—but it does not prevent perfect secrecy, as this example illustrates.
The converse question
And what about the converse? I.e. if the ciphertext is uniformly distributed, then is the cipher perfectly secret? As already mentioned, the ciphertext distribution is implicitly specified by the encryption algorithm, and key and plaintext distributions. Given that we can assume the key to be uniformly distributed, if the ciphertext is also uniformly distributed we have (notice that the first summation is just another notation for writing (1)):
Remember that is the set of keys that encrypt to , and is the summation of the probabilities of those keys. If the ciphertext is uniformly distributed regardless of the plaintext distribution, then we must have . This is only possible if, having fixed an , as the same value for all —we denote that common value by (notice the subscript is gone). The uniformity of the key distribution now means for a fixed , has the same value, for all . Theorem 2 now yields that the cipher is perfectly secret. Thus we can now state
Theorem 3. If a cipher has an uniform ciphertext distribution, regardless of the plaintext distribution, then it is a perfectly secret cipher. End.
The converse is false, as the above example cipher shows. An example of an encryption scheme where the ciphertext distribution is always uniform, is the One Time Pad.
Notice that in this case—uniform ciphertext distribution—as mentioned above, fixing , the number of keys that encrypt to is the same, for all . But in addition to that, because has the same value for all , so does . This means that for a fixed plaintext , we have has the same value, for all . Or in words, for all plaintexts, the number of keys that encrypt any particular plaintext to any particular ciphertext is the same.

This assumes that given one specific plaintext and ciphertext, there is only key that encrypts the former into the latter (and viceversa for decryption). This need not be the case (even for perfect secrecy!), as we shall see. But of course, for any secure cryptosystem, even if they do exist, it should be infeasible to calculate one such key. ↩

Consider a cryptosystem , in which is a really complicated algorithm, that outputs the key according to a very nonuniform distribution. This algorithm usually takes as input a “random tape”, which outputs symbols of the same alphabet, according to a uniform distribution. Then we can use this random tape as a new key generation algorithm , and incorporate the operations of in and . Thus we obtain a derived cryptosystem , with a uniformly generated key, but with the same ciphertext distribution—because the operations are still the same, they are just “in a different place”.
More concretely, consider a specific tuple , with probability . Let be the (uniformly random) key which used as its random tape to produce key . This means that , where the last term is the probability that produced key when its random tape produced . Then, under the new cryptosystem, what is the probability of ? Remember that the previous has been incorporated into , which means that the probability is that of being selected, times the probability of being selected, times the probability of internally transforming into . I.e. ; but this is just the original value. In particular this means that for any and , the value of does not change (cf. footnote #4).
As an ending remark, note that if the only source of randomness for is the random tape, then , i.e. is always transformed in . ↩

To be rigorous, we would have to define random variables and , and say that , where and . But such a level of rigour is not needed here. ↩

To better illustrate what the assumption of uniformly generated keys means in this context, suppose that (for some perfectly secure cryptosystem), for a given ciphertext , there exists a plaintext , for which there are two keys that encrypt it to , and a plaintext for which there are three such keys. Furthermore, suppose that the key distribution is such that . By the method of footnote #2, if we now produce an equivalent cryptosystem with uniform keys, the ciphertext distribution does not change, so neither do or ; but they can no longer be equal, because as there are more keys for than , we must have . This shows the original cipher could not have been perfectly secure. ↩