popcount
is a function that returns the number of 1-bit in a bit vector. It is
implemented in x86-64 instruction set and the software implementation is trivial
(but not necessarily fastest).
The intuitive way is:
unsigned popcount(unsigned v)
{
unsigned count = 0;
while (v) {
count = count + (v % 2);
v >> 1;
};
return count;
}
However, the intuitive and fastest way is to look up a table. For example, if
v
is assumed to be only 4-bits long, the table is:
unsigned popcount(unsigned v)
{
assert(v<16);
unsigned count[] = {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};
return count[v];
};
Since I am asked for what x = x & (x - 1)
means some years ago, this simply
says to reset the last 1-bit in x
to zero. So we can reduce the first code
segment into one with fewer loops:
unsigned popcount(unsigned v)
{
unsigned count = 0;
while (v) {
count++;
v &= v-1;
};
return count;
}
Similar to table method, we can also do divide and conquer. Assuming 64-bit vector,
unsigned popcount(unsigned v)
{
// popcount = sum of all 1-bits in v
unsigned odd1 = v & 0x5555555555555555;
unsigned even1 = v & 0xAAAAAAAAAAAAAAAA;
even1 >>= 1;
unsigned sum1 = odd1 + even1;
// popcount = sum of every 2-bit segments in sum1, e.g. 0110 => 01 (=1) + 10 (=2) = 11 (=3)
unsigned odd2 = sum1 & 0x3333333333333333;
unsigned even2 = sum1 & 0xCCCCCCCCCCCCCCCC;
even2 >>= 2;
unsigned sum2 = odd2 + even2;
// popcount = sum of every 4-bit segments in sum2, but each 4-bit values is at most 4
unsigned odd3 = sum2 & 0x0F0F0F0F0F0F0F0F;
unsigned even3 = sum2 & 0xF0F0F0F0F0F0F0F0;
even3 >>= 4;
unsigned sum3 = odd3 + even3;
// popcount = sum of every 8-bit segments in sum3, but each 8-bit values is at most 8
unsigned odd4 = sum3 & 0x00FF00FF00FF00FF;
unsigned even4 = sum3 & 0xFF00FF00FF00FF00;
even4 >>= 8;
unsigned sum4 = odd4 + even4;
// popcount = sum of every 16-bit segments in sum4, but each 16-bit values is at most 16
unsigned odd5 = sum4 & 0x0000FFFF0000FFFF;
unsigned even5 = sum4 & 0xFFFF0000FFFF0000;
even5 >>= 16;
unsigned sum5 = odd5 + even5;
// popcount = sum of every 32-bit segments in sum5, but each 32-bit values is at most 32
unsigned odd6 = sum5 & 0x00000000FFFFFFFF;
unsigned even6 = sum5 & 0xFFFFFFFF00000000;
even6 >>= 32;
unsigned sum6 = odd6 + even6;
// Now sum6 is at most 64, which is the number of 1-bits
return sum6;
}
However, there is a much faster approach by MIT Hacker’s Memo: (for 32-bit vector)
unsigned popcount(unsigned v)
{
unsigned u;
u = v - ((v>>1) & 033333333333) - ((v>>2) & 011111111111);
return ((u + (u>>3)) & 030707070707) % 63;
}
The idea is as follows: If we write a 32-bit integer as n
\(=b_{31}2^{31}+b_{30}2^{30}+...+b_12+b_0\),
we also notice that \(b2^k-b2^{k-1}\equiv b2^{k-1}\),
then we have n-(n>>1)
\(=(b_{31}+b_{30})2^{30}+(b_{30}+b_{29}2^{29}+...+(b_1+b_0)\), so for
n - (n>>1) - (n>>2) - (n>>3) - ... - (n>>31)
it is equals to exactly \(b_{31}+b_{30}+...+b_1+b_0\), which is the popcount. Now
consider only three bits v
, the popcount will be v-(v>>1)-(v>>2)
. When v
is 32-bit, the shift operation is on 3-bit unit, thus the AND-mask in the above
segment. Next, the 32-bit popcount is the addition of every 3-bits in u
,
firstly, make it into every-6-bit counts, the number is put into 6-bit slots:
(u + (u>>3)) & 030707070707
which you can view as \(c_5 (2^6)^{5} + ... + c_1 2^6 + c_0\), by remainder theorem, the polynomial in \(2^6\) divide by \(2^6-1\) (=63), the remainder is the polynomial with all \(2^6\) substitute by 1, which is exactly the sum \(c_5+c_4+...+c_1+c_0\).
Now move on to 64-bit. The modulus-63 operation is more then enough for 32-bit, but at least modulus-65 is required for 64-bit, or otherwise we can never have 63 or 64 bits set in our data. By the same virtue, the code will be:
unsigned popcount(unsigned v)
{
unsigned u;
u = v - ((v>>1) & 0x7777777777777777) - ((v>>2) & 0x3333333333333333) - ((v>>3) & 0x1111111111111111);
return ((u + (u>>4)) & 0x0F0F0F0F0F0F0F0F) % 0xFF; /* Modulus-255 */
}
Final note:
The popcount is also known as the Hamming weight, see
http://en.wikipedia.org/wiki/Hamming_weight. The MIT Hacker Memo mentioned above
is available at http://home.pipeline.com/~hbaker1/hakmem/hakmem.html