On (the algorithm for) the ANF of binary boolean functions

October 27, 2018 • Tags: mathematics, english

\(\newcommand\funcdecl[3]{#1\, \colon #2 \rightarrow #3}\) \(\newcommand\covered[2]{#1 \preccurlyeq #2}\) \(\newcommand\finitefield[1]{\mathbb{#1}}\)

Given any binary boolean function \(\funcdecl{\varphi}{\mathbb{F}_2^n}{\mathbb{F}_2}\), there exist \(a_u \in \mathbb{F}_2\) such that:

\[\begin{equation} \varphi(x) = \sum_{u\in \mathbb{F}_2^n}a_u\left(\prod_{i=1}^n x_i^{u_i}\right) \label{eq:anf1}\tag{1} \end{equation}\]

This representation is called the algebraic normal form (ANF) of \(\varphi\), and the bulk of this text is about the way we compute these coefficients.1 But first, we need to show that this representation always exists. One way to do so, is by reasoning as follows: define a function \(\funcdecl{\delta}{\mathbb{F}_2^n}{\mathbb{F}_2}\), such that \(\delta(x)=1\) if \(x=0\), and \(\delta(x)=0\) if \(x\neq 0\).2 We can now define \(\varphi\) as:

\[\begin{equation} \varphi(x) = \sum_{z\in \mathbb{F}_2^n} \varphi(z)\delta(z+x) \label{eq:phi_kronecker1}\tag{2} \end{equation}\]

This works because \(\delta(z+x)\) is \(0\) except when \(z=x\), in which case it is \(1\). Equation \(\ref{eq:phi_kronecker1}\) might seem a circular definition, specially if one is thinking of \(\varphi\) as being defined by a “formula”. However, the most general definition of a function from, say, set \(X\) to set \(Y\) is as a subset — that is, a relation — of \(X \times Y\), subject to the caveat that, for every \(x\in X\), there must exist exactly one pair \((x, y)\) in that relation (in the case of function \(\varphi\), we would say that for all \(x\), there must exist exactly one \(y\) such that \(y=\varphi(x)\)). Hence equation \(\ref{eq:phi_kronecker1}\) is just an analytical description of that subset.

Now, we can replace the \(\delta\) function with \(\prod_{i=1}^n (1+x_i+z_i)\), which is \(1\) if and only if \(x=z\), just like \(\delta\). Equation \(\ref{eq:phi_kronecker1}\) now becomes:

\[\begin{equation} \varphi(x) = \sum_{z\in \mathbb{F}_2^n} \varphi(z)\prod_{i=1}^n (1+x_i+z_i) \label{eq:phi_kronecker2}\tag{3} \end{equation}\]

When \(\varphi\) is a concrete function, both \(\varphi(z)\) and the \(z_i\)’s will be concrete \(0\)’s and \(1\)’s — the only variables will be the \(x_i\)’s. Furthermore, the \(x_i\)’s belong to \(\mathbb{F}_2\), and hence we have that \(x_i^2 = x_i\). Thus when developing (\(\ref{eq:phi_kronecker2}\)) for a concrete function, we will end up with an equation of the form of (\(\ref{eq:anf1}\)). This shows that the ANF always exists.

The ANF is also unique, which we can show through a cardinality argument. Consider the set of all the functions \(\funcdecl{\varphi}{\mathbb{F}_2^n}{\mathbb{F}_2}\); this set has \(2^{2^n}\) elements; denote it set \(\mathcal{BBF}\).3 Now consider the set \(\mathcal{ANF}\) consisting of all the expressions of the form of the right hand side of equation \(\ref{eq:anf1}\). Its elements are in one-to-one correspondence with the set of subgroups of the set of all expressions of the form \(\prod_{i=1}^n x_i^{u_i}\). There are \(2^n\) of these expressions (one for each \(u\in \mathbb{F}_2^n\)), and hence the number of elements of \(\mathcal{ANF}\) is \(\sum_{j=0}^{n} \binom{2^n}{j}\), which, by a well-known identity relating the binomial coefficients,4 equals \(2^{2^n}\). As every element of \(\mathcal{ANF}\) corresponds exactly to one binary boolean function — i.e. an element of \(\mathcal{BBF}\) — and as both sets have the same cardinality, we conclude that the ANF is unique.5

We usually abbreviate \(\prod_{i=1}^n x_i^{u_i}\) as \(x^u\), so (\(\ref{eq:anf1}\)) becomes \(\varphi(x) = \sum_{u\in \mathbb{F}_2^n}a_u x^u\). But we can simplify it even further. For any \(x\in \mathbb{F}_2^n\), the set \(\{i\in \{1..n\}\ \vert\ x_i\neq 0\}\) is called the support of \(x\), denoted \(supp(x)\). Note that the cardinality of the support of \(x\) equals its Hamming weight, \(hw(x)\).6 Going back to \(\prod_{i=1}^n x_i^{u_i}\), it equals \(1\) if and only if for all \(i\), \(u_i = 1 \rightarrow x_i = 1\) (otherwise it is \(0\)) — but this is the same as requiring that \(supp(u)\subseteq supp(x)\). Let us denote this latter condition as \(u \preccurlyeq x\) (we thus say that \(u\) is covered by \(x\), or equivalently, that \(x\) covers \(u\)). This allows us to rewrite (\(\ref{eq:anf1}\)) as:

\[\begin{equation} \varphi(x) = \sum_{u\preccurlyeq x}a_u \label{eq:anf2}\tag{4} \end{equation}\]

But how to compute the different \(a_u\)? We are aided by the fact that the “converse” of (\(\ref{eq:anf2}\)) also holds:

Theorem 1. For all \(u\in \mathbb{F}_2\), the following holds:

\[\begin{equation} a_u = \sum_{x\preccurlyeq u}\varphi(x) \label{eq:anf3}\tag{5} \end{equation}\]

\[\tag*{$\square$}\]

Proof. Replace \(a_u\) in (\(\ref{eq:anf2}\)) with the right hand side of (\(\ref{eq:anf3}\)) to obtain:

\[\begin{equation} \sum_{u \in \finitefield{F}_2^n \mid \covered{u}{x}} \left(\sum_{y \in \finitefield{F}_2^n \mid \covered{y}{u}} \varphi(y)\right) \label{eq:anf4}\tag{6} \end{equation}\]

We can look at the nested summations as defining a set of pairs, over which the function iterates. Denote that set as \(S\); in this case \(S = \{(u, y) \mid u\preccurlyeq x \land y\preccurlyeq u\}\). From the transitivity of \(\preccurlyeq\) we immediately get that both \(y\preccurlyeq x\) and \(y\preccurlyeq u\preccurlyeq x\) hold. Let \(S' = \{(y, u) \mid y\preccurlyeq x \land y\preccurlyeq u\preccurlyeq x\}\). We have just shown that \((u, y)\in S \rightarrow (y, u) \in S'\). The converse is straightforward, indeed from \(y\preccurlyeq u\preccurlyeq x\) we get both \(u\preccurlyeq x\) and \(y\preccurlyeq u\), meaning that \((y, u) \in S'\rightarrow (u, y)\in S\). But \(S'\) describes precisely the pairs iterated over by the following summation:

\[\begin{equation} \sum_{y \in \finitefield{F}_2^n \mid \covered{y}{x}} \left(\sum_{u \in \finitefield{F}_2^n \mid \covered{y}{u} \preccurlyeq x} \varphi(y)\right) \label{eq:anf5}\tag{7} \end{equation}\]

The equivalence between \(S\) and \(S'\) means that the summations in equation \(\ref{eq:anf4}\) include pair \((u, y)\) if and only if the summations in equation \(\ref{eq:anf5}\) include \((y, u)\) — which implies their equality. The latter can be rearranged as:

\[\begin{equation} \sum_{y \in \finitefield{F}_2^n \mid \covered{y}{x}} \varphi(y) \left(\sum_{u \in \finitefield{F}_2^n \mid \covered{y}{u} \preccurlyeq x} 1\right) = \varphi(x) \tag{8} \end{equation}\]

The last equality follows because of the set \(\{u \in \finitefield{F}_2^n \mid \covered{y}{u} \preccurlyeq x\}\): if \(x \neq y\), then it is either empty (if \(y \not\preccurlyeq x\)), or (if \(y \preccurlyeq x\)) it is nonempty, and has \(2^{hw(x)-hw(y)}\) elements — but in both cases, the inner summation evaluates to zero (remember we are summing in \(\finitefield{F}_2\)). If \(y=x\), the set has just one element, and the result follows. \(\blacksquare\)

The ANF/truth table algorithm

The duality shown in equations \(\ref{eq:anf2}\) and \(\ref{eq:anf3}\) yields a simple algorithm to compute the ANF coefficients (\(a_u\)’s) from \(\varphi\)’s truth table, or vice-versa (i.e. it can also be used to compute the value of the function, given the \(a_u\)’s). I will focus on the former. Let us take as an example the function defined by \((x_1 \rightarrow x_2) \rightarrow x_3\). Here is its truth table:

\[ \begin{array}{c|c|c|c} x_1 & x_2 & x_3 & (x_1 \rightarrow x_2) \rightarrow x_3 \\ \hline 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ \end{array} \]

The algorithm goes like this: start in the column that corresponds to the “most significant bit”; \(x_1\) in the table above. Where it has the value \(0\), leave the value of the function unchanged; where it has the value \(1\), add to the value of the function at the current entry the value of the function at the entry obtained by setting \(x_1\) to \(0\). This is illustrated in the table below, where instead of replacing the values of the function, they are shown in an auxiliary column:

\[ \begin{array}{c|c|c|c|c} x_1 & x_2 & x_3 & (x_1 \rightarrow x_2) \rightarrow x_3 & \textrm{aux_1} \\ \hline 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ \end{array} \]

And now you just do the same thing for \(x_2\), and then for \(x_3\) (if there were more columns, you would just continue until reaching the one corresponding to the “least significant bit”). Again to illustrate, here’s the result, with two more auxiliary columns (aux_2 for \(x_2\) and aux_3 for \(x_3\)). Note that, as we are not replacing the function values, aux_2 is computed using the values of aux_1, and aux_3 using the values of aux_2.

\[ \begin{array}{c|c|c|c|c|c|c} x_1 & x_2 & x_3 & (x_1 \rightarrow x_2) \rightarrow x_3 & \textrm{aux_1} & \textrm{aux_2} & \textrm{aux_3} \\ \hline 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 & 1 & 1 \\ 1 & 1 & 1 & 1 & 0 & 0 & 1 \\ \end{array} \]

Now, the values in column aux_3 are the \(a_u\)’s that correspond to \(x^u\); for example the coefficient for \(x_1x_2\) (i.e. \(u=110\)) is \(1\), which means the monomial shows up in the final expression. Doing this for all the values in the table, yields that final expression:

\[\begin{equation} (x_1 \rightarrow x_2) \rightarrow x_3 \equiv x_1+x_3 + x_1x_2 + x_1x_3 + x_1x_2x_3 \tag{9} \end{equation}\]

And as mentioned above, if we start with a table listing of the \(a_u\)’s, this algorithm yields the original function:

\[ \begin{array}{c|c|c|c|c|c|c} u_1 & u_2 & u_3 & \textrm{aux_3 } (\text{i.e. } a_u) & \textrm{aux_4} & \textrm{aux_5} & \textrm{aux_6} \text{ i.e. } (x_1 \rightarrow x_2) \rightarrow x_3\\ \hline 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ \end{array} \]

Doing this for a few different functions, it is very easy to get a “feeling” that this works. Of course the high level reason why this works is obvious: because by the time the algorithm get to the aux_3 column, it has already, for each row (where the variables’ columns are seen as an \(u\) value) summed all the function values for all the \(x\)’s that that \(u\) value covers. But how to prove this? This was the reason lead me to write about this in the first place: the algorithm is dead simple, so the proof should be simple too, right? Well… kind of. It is simple but — to me at least — not at all straightforward.

To see the lower level reason why this works, we borrow a bit of notation from Python: indexing starts from \(0\), and \(x[m:n+1]\) means the substring of \(x\) that starts at index \(m\) and ends at index \(n\), including this last element (but not the one at position \(n+1\)) — this is how it works on Python. The element of \(x\) at index \(m\) is denoted \(x[m]\). Also, \(x[m:]\) is the strings that goes from (and includes) position \(m\), to the end of string \(x\). Lastly, if \(x\) and \(y\) are strings, then \(x == y\) means the strings are equal, and \(x + y\) denotes concatenation. A string that contains just the element \(a\) is denoted \([a]\) (this is also a “Python-ism”).

Also, to explain why the algorithm works (we will only look at the case going from the function to the \(a_u\)’s), it is a lot easier to forget the ANF and just think of it as a transversal problem: for a given \(x\), which elements does the algorithm touch? We will show it touches (once and only once) precisely those elements \(u\) such that \(u\preccurlyeq x\).

Indeed, if we are dealing with strings of length \(1\), it is obvious that the algorithm works. The case of length \(2\) can be easily checked by hand, and furthermore, the truth table for length two essentially “doubles” that of the case of length \(1\) — and this provides the motivation for an inductive reasoning. Suppose that for a given string \(x\) of length \(n\) (i.e. when working in \(\mathbb{F}_2^n\)), the algorithm works as expected: i.e., it passes by every string \(u\) of the same length, such that \(u\preccurlyeq x\) holds, exactly once. Consider now the strings of length \(n+1\), where the “extra bit” is added to the left: then, this set can be constructed taking the set of strings of length \(n\), and prepending to it a column of all \(0\)’s, and then doing the same thing, but now with all \(1\)’s column. To take a concrete example, in the truth tables above, observe that the set of tuples of \(x_2\) and \(x_3\) are repeated, once for \(x_1=0\) and again for \(x_1=1\). So we expand \(x\) by prepending an extra \(0\) or \(1\) to it, and run the algorithm again, but ignoring this new column — i.e. we start in column \(1\) (which was column \(0\) in the length-\(n\) case). It should be clear that from the induction hypothesis (and the fact that the remaining \(n\) columns are basically the same as in the case of strings of length \(n\)), that the algorithm will go once over all \(u\in \mathbb{F}_2^{n+1}\) such that \(u[0] == x[0]\) and \(u[1:]\preccurlyeq x[1:]\).

Observation: if we do this (start the algorithm on the second column, working in \(\mathbb{F}_2^{n+1}\)) for two values \(x\) and \(x'\), that only differ on the first element, the resulting set of strings, except for the first element, will be the same. That is to say — assuming, without loss of generality, that \(x[0] == 0\) and \(x'[0] == 1\) — the set output when the input is \(x\) will only contain strings that begin with \(0\). And if we change this first bit to \(1\), we obtain precisely the set of strings output on input \(x'\) (and vice-versa). This is because (loosely speaking) the truth-table for \(\mathbb{F}_2^{n+1}\) minus the first column, is basically the same as that of \(\mathbb{F}_2^n\), repeated twice, as explained above.

If, going back to our inductive reasoning, we now start the algorithm normally (i.e. on the first column, again in \(\mathbb{F}_2^{n+1}\)), and \(x[0] == 0\), then we still obtain all the strings that \(x\) covers (because the algorithm essentially ignores zeroes, meaning it reduces to the case of \(\mathbb{F}_2^n\)). Let \(\mathcal{S}\) be this set; i.e. the set of all values \(u\) such that \(u[0] == 0\) and \(u\preccurlyeq x\). By the induction hypothesis, \(\mathcal{S}\) is always part of the algorithm’s output. If we do the same, but \(x[0] == 1\), then the algorithm adds \([0] + x[1:]\), and moves on to column \(1\) — and the final set of strings will be \(\mathcal{S}\), plus (by the above observation) all the strings in \(\mathcal{S}\) with the first element changed to \(1\).

And we are done: it is straightforward to see that all the strings thus produced are indeed covered by \(x\); and conversely, if there existed a string \(u\) such that \(u\preccurlyeq x\) but that was not part of the algorithm’s output, then in particular the string \([0]+u[1:]\) could not be in \(\mathcal{S}\), which by the induction hypothesis would mean that \(u[1:]\not\preccurlyeq x[1:]\), implying \(u\not\preccurlyeq x\) — which is contradictory.7 Hence we conclude the algorithm indeed outputs all the strings covered by \(x\).

November 22, 2018: added the table showing how to go from the \(a_u\)’s to the function (the inverse case to what is explained).


  1. \(\mathbb{F}_2^n\) denotes the vector space of dimension \(n\) over the Galois field with two elements. It will be clear from context the field in which the summations take place (i.e. \(\mathbb{F}_2\) or \(\mathbb{Z}\)). Also, this formalism implies the convention — widely accepted within the “discrete domains” of mathematics, but by no means universal outside of it — that \(0^0 = 1\). If the reader finds himself mystified by this, checking out the relevant Wikipedia page is very recommended.

  2. Such a function is called the Kronecker delta.

  3. There two choices — \(0\) or \(1\) — for each element of the domain, and each element of the domain must have exactly one image. Hence we have two choices for the first element of the domain, and for each of those, another two choices for the second element, … which yields a total number of functions of \(2^{2^n}\). More generally, for a function \(\funcdecl{\psi}{X}{Y}\), with \(X\) and \(Y\) finite, the total number of functions is \(|Y|^{|X|}\).

  4. Wikipedia to the rescue.

  5. Actually, this argument also shows the ANF always exists, because the correspondence between \(\mathcal{BBF}\) and \(\mathcal{ANF}\) is a bijection.

  6. https://en.wikipedia.org/wiki/Hamming_weight.

  7. If \(([0]+u[1:])\in \mathcal{S}\), and \(u[0] == 0\), then \(u\) is part of the algorithm’s output. If \(([0]+u[1:])\in \mathcal{S}\), and \(u[0] == 1\) then the algorithm necessarily passed through \([1]+u[1:]\) (i.e. \(u\)), otherwise it could not have reached \([0]+u[1:]\). This again means that \(u\) is part of the output. Both scenarios contradict the assumption that \(u\) is not a part of the algorithm’s output.