`popcount`

is a function that returns the number of 1-bit in a bit vector. It is
implemented in x86-64 instruction set and the software implementation is trivial
(but not necessarily fastest).

The intuitive way is:

```
unsigned popcount(unsigned v)
{
unsigned count = 0;
while (v) {
count = count + (v % 2);
v >> 1;
};
return count;
}
```

However, the intuitive and fastest way is to look up a table. For example, if
`v`

is assumed to be only 4-bits long, the table is:

```
unsigned popcount(unsigned v)
{
assert(v<16);
unsigned count[] = {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};
return count[v];
};
```

Since I am asked for what `x = x & (x - 1)`

means some years ago, this simply
says to reset the last 1-bit in `x`

to zero. So we can reduce the first code
segment into one with fewer loops:

```
unsigned popcount(unsigned v)
{
unsigned count = 0;
while (v) {
count++;
v &= v-1;
};
return count;
}
```

Similar to table method, we can also do divide and conquer. Assuming 64-bit vector,

```
unsigned popcount(unsigned v)
{
// popcount = sum of all 1-bits in v
unsigned odd1 = v & 0x5555555555555555;
unsigned even1 = v & 0xAAAAAAAAAAAAAAAA;
even1 >>= 1;
unsigned sum1 = odd1 + even1;
// popcount = sum of every 2-bit segments in sum1, e.g. 0110 => 01 (=1) + 10 (=2) = 11 (=3)
unsigned odd2 = sum1 & 0x3333333333333333;
unsigned even2 = sum1 & 0xCCCCCCCCCCCCCCCC;
even2 >>= 2;
unsigned sum2 = odd2 + even2;
// popcount = sum of every 4-bit segments in sum2, but each 4-bit values is at most 4
unsigned odd3 = sum2 & 0x0F0F0F0F0F0F0F0F;
unsigned even3 = sum2 & 0xF0F0F0F0F0F0F0F0;
even3 >>= 4;
unsigned sum3 = odd3 + even3;
// popcount = sum of every 8-bit segments in sum3, but each 8-bit values is at most 8
unsigned odd4 = sum3 & 0x00FF00FF00FF00FF;
unsigned even4 = sum3 & 0xFF00FF00FF00FF00;
even4 >>= 8;
unsigned sum4 = odd4 + even4;
// popcount = sum of every 16-bit segments in sum4, but each 16-bit values is at most 16
unsigned odd5 = sum4 & 0x0000FFFF0000FFFF;
unsigned even5 = sum4 & 0xFFFF0000FFFF0000;
even5 >>= 16;
unsigned sum5 = odd5 + even5;
// popcount = sum of every 32-bit segments in sum5, but each 32-bit values is at most 32
unsigned odd6 = sum5 & 0x00000000FFFFFFFF;
unsigned even6 = sum5 & 0xFFFFFFFF00000000;
even6 >>= 32;
unsigned sum6 = odd6 + even6;
// Now sum6 is at most 64, which is the number of 1-bits
return sum6;
}
```

However, there is a much faster approach by MIT Hacker’s Memo: (for 32-bit vector)

```
unsigned popcount(unsigned v)
{
unsigned u;
u = v - ((v>>1) & 033333333333) - ((v>>2) & 011111111111);
return ((u + (u>>3)) & 030707070707) % 63;
}
```

The idea is as follows: If we write a 32-bit integer as `n`

,
we also notice that ,
then we have `n-(n>>1)`

, so for

```
n - (n>>1) - (n>>2) - (n>>3) - ... - (n>>31)
```

it is equals to exactly , which is the popcount. Now
consider only three bits `v`

, the popcount will be `v-(v>>1)-(v>>2)`

. When `v`

is 32-bit, the shift operation is on 3-bit unit, thus the AND-mask in the above
segment. Next, the 32-bit popcount is the addition of every 3-bits in `u`

,
firstly, make it into every-6-bit counts, the number is put into 6-bit slots:

```
(u + (u>>3)) & 030707070707
```

which you can view as , by remainder theorem, the polynomial in divide by (=63), the remainder is the polynomial with all substitute by 1, which is exactly the sum .

Now move on to 64-bit. The modulus-63 operation is more then enough for 32-bit, but at least modulus-65 is required for 64-bit, or otherwise we can never have 63 or 64 bits set in our data. By the same virtue, the code will be:

```
unsigned popcount(unsigned v)
{
unsigned u;
u = v - ((v>>1) & 0x7777777777777777) - ((v>>2) & 0x3333333333333333) - ((v>>3) & 0x1111111111111111);
return ((u + (u>>4)) & 0x0F0F0F0F0F0F0F0F) % 0xFF; /* Modulus-255 */
}
```

Final note:

The popcount is also known as the Hamming weight, see
http://en.wikipedia.org/wiki/Hamming_weight. The MIT Hacker Memo mentioned above
is available at http://home.pipeline.com/~hbaker1/hakmem/hakmem.html