Here is a way to represent a permutation with a fairly economical number of bits.

The number of permutations on \(N\) objects is \(N!,\) which is \(\prod_{1 \leq i \leq N}i.\) Therefore, to specify one of these permutations requires at least \(\left\lceil \sum_{1 \leq i \leq N} \log_2 i \right\rceil\) bits. I found a way to represent such a permutation with \(\sum_{1 \leq i \leq N} \left\lceil \log_2 i \right\rceil\) bits. This has probably been discovered before (even several times). But I still enjoyed coming up with it.

Sufficiency

Let \(B(m) = \sum_{1 \leq i \leq m} \left\lceil \log_2 i \right\rceil.\) It is easiest to represent a permutation on no elements, of course. This requires no bits of data—the permutation must send the element to itself. All the information is contained, so to speak, in the type of the permutation, and there is only one such permutation. And in fact, \(B(0) = 0.\)

So suppose instead that \(\sigma\) is a permutation on \(N+1\) elements. To fix ideas, let us assume the elements are the numbers \(0, \cdots, N.\) Then the permutation \((N\ \sigma(N)) \circ \sigma\) fixes \(N.\) We may therefore regard \(next(\sigma) = (N\ \sigma(N)) \circ \sigma\) as a permutation on \(N\) elements. By induction, we may represent \(next(\sigma)\) with \(B(N)\) bits. Since there are \(N+1\) possibilities for \(\sigma(N)\) (for \(N\) itself is a possibility), we may represent \(\sigma(N)\) with \(\left\lceil \log_2 (N+1) \right\rceil\) bits. Now, \((N\ \sigma(N)) \circ next(\sigma) = \sigma,\) so \(\sigma(N)\) and \(next(\sigma)\) together determine \(\sigma.\) Therefore we may represent \(\sigma\) with

\[\left\lceil \log_2 (N+1) \right\rceil + B(N) = B(N+1)\]

bits, completing the induction.

How to act

By the above argument, we may write

\[\begin{align*}\sigma &= (N\quad \sigma(N)) \circ (N-1\quad next(\sigma)(N-1)) \circ \cdots \\ &= \cdots (N-1\quad next(\sigma)(N-1)) \circ (N\quad \sigma(N)).\end{align*}\]

Hence the initial cycle is the one on the “inside.” It is easier to shift bits right than to shift them left. So we will put the initial cycle’s bits in the least significant position possible. Thus in general, cycles’ bits will be recorded in the least significant position possible, after prior cycles’ bits have been so recorded.

To give an example before formalizing this in code, consider the number 100 given in binary. Now, we want this to represent a permutation \(\sigma\) on four elements. So we’ll write it as 00100 instead. The least significant bit is 0, indicating the cycle \((1\ 0),\) instead of the identity cycle \((1\ 1).\) The next bits need to represent one of the three possible elements \(a \in \{0,1,2\}\) in the next cycle \((2\ a).\) This requires two bits, which here are 10, which is 2 itself. Thus this represents the cycle \((2\ 2).\) Finally we require two bits still to represent the four possible elements \(a \in \{0,1,2,3\}\) in the final cycle \((3\ a).\) These bits are 00, so this cycle is \((3\ 0).\) All told then we have \(\sigma = (1\ 0) (3\ 0) = (0\ 1\ 3).\)

With that example given, let’s now derive more formally how to act on an element \(x\) via a permutation \(\sigma\) given via this representation. The overall structure should be something like

initialize variables
while (there is another transposition)
    apply it to x
    update variables
return x

How do we tell if there is another tranposition, and if so, what it is? Considering the previous example, we see that it might be useful to have a bitmask variable to grab a collection of bits; and a variable indicating what element the current transposition considers its initial element in the cycle. Now, the question is whether all that is enough information to tell when to exit the loop. If we allow ourselves to refer to the number \(N,\) say as a constant, then yes we can. If \(f \geq N,\) we can stop, since \(\sigma\) only acts on \(\{0, \cdots, N-1\}.\) So now we know the algorithm looks like

initialize variables
while (f < N)
    get next tranposition
    apply it to x
    update variables
return x

Clearly the next transposition is \((f\ y)\) where \(y\) is the number indicated by the next collection of bits. One convenient way to write this would be y ← mask & bits, where mask is a mask with the appropriate number of bits; bits are the remaining bits that we haven’t dealt with from the representation of \(\sigma;\) and & is bitwise-and, not boolean-and. So we can rewrite the loop body as follows:

initialize variables
while (f < N)
    y ← mask & bits
    if x = f
        x ← y
    else if x = y
        x ← f
    update variables
return x

This has a branch inside a loop, which is not great. Identifying booleans with natural numbers in the usual way, viz. false = 0 and true = 1, we can rewrite the branch as follows.

initialize variables
while (f < N)
    y ← mask & bits
    x ← x + (x = f)⋅(y-x) + (x = y)⋅(f-x)
    update variables
return x

It remains to determine how to update the variables f and mask. Of course f is easy; just increment it. Now, mask is supposed to be \(\left\lceil\log_2 f\right\rceil\) least-significant 1 bits, and 0 bits everywhere else. That is, it is supposed to be the number \(2^{\left\lceil\log_2 f\right\rceil + 1} - 1.\) But putting that in the update would evaluate a discrete logarithm. We can do better than that. First, we know f is representable in \(\left\lceil\log_2 f\right\rceil\) bits. The maximum number so representable happens to be mask itself. If f+1 > mask then the next f will require an additional bit. So instead of evaluating a discrete logarithm, to update the variables we can just write

f ← f+1
if f > mask
    mask ← 2⋅mask + 1

Again, this conditional assignment occurs in a loop. We may rewrite it as follows.

b ← f > mask
mask ← mask + b⋅(mask + 1)

This assumes that the next interesting bits are already least significant. So we must also update the bits variable by shifting it right the appropriate number of bits. We could divide bits by mask+1, but division is best avoided. Instead we keep track of the number of bits in the mask, nbits. We increment it along with mask when needed. Thus updating the variables looks like this:

bits ← bits >> nbits
f ← f+1
b ← f > mask
mask ← mask + b⋅(mask + 1)
nbits ← nbits + b

It remains to initialize the variables. Clearly we begin with f = 0.¹ There are no other elements yet that we need to distinguish from f, so nbits = 0 and mask = 0 likewise.

Therefore, in conclusion, the following Hoare triple should hold. (Here we use Dĳkstra’s guarded command language.)

\[\begin{align*} \{&N, X: \mathbb{N},\quad \sigma: S_N &\mathbf{constants} \\ &f, nbits, mask, bits, x, y : \mathbb{N} &\mathbf{variables} \\ &bits\mbox{ represents }\sigma &\} \end{align*}\]

f,nbits,mask,x ← 0,0,0,X;
do f < N → 
      y ← mask & bits
    ; x ← x + (x=f)⋅(y-x) + (x=y)⋅(f-x)
    ; bits ← bits >> nbits
    ; f ← f+1
    ; b ← f > mask
    ; mask, nbits ← mask + b⋅(mask+1), nbits + b
od;

\[\{ x = \sigma(X)\}\]

In Python we could write this as follows.

def act(N,sigma,x):
    f,nbits,mask = 0,0,0
    while f < N:
        y = mask & sigma
        x += (x==f)*(y-x) + (x==y)*(f-x)
        sigma >>= nbits
        f += 1
        b = f > mask
        mask += b*(mask+1)
        nbits += b
    return x

Some comments

The above method is redundant, and doesn’t check that all the tranpositions have the form \((n\ m)\) with \(m \leq n.\) One could write a method to verify the form of the tranposition before using it. The above code is faster for skipping this verification.

Also, this method is as data-space-efficient as possible for \(N \leq 4.\) Since I like three-dimensional computational topology, this is all I need. However, for \(N\leq 4,\) the number of permutations is at most 24. So a table lookup in this case might be more appropriate than the above function!

The biggest permutations you can fit in one 64-bit word are those on nineteen elements. Remarkably, these fit exactly in 64 bits with the above representation. Likewise, one can fit permutations on five elements in a single byte.

Permutations on 32 elements need 124 bits with this representation, and no more elements are possible with 128 bits allowed. This is a (space) improvement over the “sheep-and-goats” representation in Warren’s excellent Hacker’s Delight, which uses 160 bits to represent permutations on 32 elements.

Footnote

The reader may be concerned about the base case \(\sigma: \emptyset \to \emptyset.\) It is true that the code above will necessarily fail in this case. But it also succeeds in this case, since to run the code, you first need an element to run it on. Assuming we have an element from the empty set lets us conclude the code terminates correctly, even though it also fails. ↩