popcount

popcount is a function that returns the number of 1-bit in a bit vector. It is implemented in x86-64 instruction set and the software implementation is trivial (but not necessarily fastest).

The intuitive way is:

unsigned popcount(unsigned v)
{
    unsigned count = 0;
    while (v) {
        count = count + (v % 2);
        v >> 1;
    };
    return count;
}

However, the intuitive and fastest way is to look up a table. For example, if v is assumed to be only 4-bits long, the table is:

unsigned popcount(unsigned v)
{
    assert(v<16);
    unsigned count[] = {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};
    return count[v];
};

Since I am asked for what x = x & (x - 1) means some years ago, this simply says to reset the last 1-bit in x to zero. So we can reduce the first code segment into one with fewer loops:

unsigned popcount(unsigned v)
{
    unsigned count = 0;
    while (v) {
        count++;
        v &= v-1;
    };
    return count;
}

Similar to table method, we can also do divide and conquer. Assuming 64-bit vector,

unsigned popcount(unsigned v)
{
    // popcount = sum of all 1-bits in v
    unsigned odd1  = v & 0x5555555555555555;
    unsigned even1 = v & 0xAAAAAAAAAAAAAAAA;
    even1 >>= 1;
    unsigned sum1 = odd1 + even1;

    // popcount = sum of every 2-bit segments in sum1, e.g. 0110 => 01 (=1) + 10 (=2) = 11 (=3)
    unsigned odd2  = sum1 & 0x3333333333333333;
    unsigned even2 = sum1 & 0xCCCCCCCCCCCCCCCC;
    even2 >>= 2;
    unsigned sum2 = odd2 + even2;

    // popcount = sum of every 4-bit segments in sum2, but each 4-bit values is at most 4
    unsigned odd3  = sum2 & 0x0F0F0F0F0F0F0F0F;
    unsigned even3 = sum2 & 0xF0F0F0F0F0F0F0F0;
    even3 >>= 4;
    unsigned sum3 = odd3 + even3;

    // popcount = sum of every 8-bit segments in sum3, but each 8-bit values is at most 8
    unsigned odd4  = sum3 & 0x00FF00FF00FF00FF;
    unsigned even4 = sum3 & 0xFF00FF00FF00FF00;
    even4 >>= 8;
    unsigned sum4 = odd4 + even4;

    // popcount = sum of every 16-bit segments in sum4, but each 16-bit values is at most 16
    unsigned odd5  = sum4 & 0x0000FFFF0000FFFF;
    unsigned even5 = sum4 & 0xFFFF0000FFFF0000;
    even5 >>= 16;
    unsigned sum5 = odd5 + even5;

    // popcount = sum of every 32-bit segments in sum5, but each 32-bit values is at most 32
    unsigned odd6  = sum5 & 0x00000000FFFFFFFF;
    unsigned even6 = sum5 & 0xFFFFFFFF00000000;
    even6 >>= 32;
    unsigned sum6 = odd6 + even6;

    // Now sum6 is at most 64, which is the number of 1-bits
    return sum6;
}

However, there is a much faster approach by MIT Hacker’s Memo: (for 32-bit vector)

unsigned popcount(unsigned v)
{
    unsigned u;
    u = v - ((v>>1) & 033333333333) - ((v>>2) & 011111111111);
    return ((u + (u>>3)) & 030707070707) % 63;
 }

The idea is as follows: If we write a 32-bit integer as n\(=b_{31}2^{31}+b_{30}2^{30}+...+b_12+b_0\), we also notice that \(b2^k-b2^{k-1}\equiv b2^{k-1}\), then we have n-(n>>1)\(=(b_{31}+b_{30})2^{30}+(b_{30}+b_{29}2^{29}+...+(b_1+b_0)\), so for

n - (n>>1) - (n>>2) - (n>>3) - ... - (n>>31)

it is equals to exactly \(b_{31}+b_{30}+...+b_1+b_0\), which is the popcount. Now consider only three bits v, the popcount will be v-(v>>1)-(v>>2). When v is 32-bit, the shift operation is on 3-bit unit, thus the AND-mask in the above segment. Next, the 32-bit popcount is the addition of every 3-bits in u, firstly, make it into every-6-bit counts, the number is put into 6-bit slots:

(u + (u>>3)) & 030707070707

which you can view as \(c_5 (2^6)^{5} + ... + c_1 2^6 + c_0\), by remainder theorem, the polynomial in \(2^6\) divide by \(2^6-1\) (=63), the remainder is the polynomial with all \(2^6\) substitute by 1, which is exactly the sum \(c_5+c_4+...+c_1+c_0\).

Now move on to 64-bit. The modulus-63 operation is more then enough for 32-bit, but at least modulus-65 is required for 64-bit, or otherwise we can never have 63 or 64 bits set in our data. By the same virtue, the code will be:

unsigned popcount(unsigned v)
{
    unsigned u;
    u = v - ((v>>1) & 0x7777777777777777) - ((v>>2) & 0x3333333333333333) - ((v>>3) & 0x1111111111111111);
    return ((u + (u>>4)) & 0x0F0F0F0F0F0F0F0F) % 0xFF; /* Modulus-255 */
 }

Final note:
The popcount is also known as the Hamming weight, see http://en.wikipedia.org/wiki/Hamming_weight. The MIT Hacker Memo mentioned above is available at http://home.pipeline.com/~hbaker1/hakmem/hakmem.html