Population Count Op · Issue #36380 · pytorch/pytorch · GitHub

Open
tom-bird opened this issue Apr 10, 2020 · 5 comments
Labels
actionable · feature (A request for a proper, new feature.) · module: python frontend (For issues relating to PyTorch's Python frontend) · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@tom-bird commented Apr 10, 2020

🚀 Feature

It would be great to have the bitwise operation 'population count', which counts the number of 1 bits in each element, exposed.

Not sure whether this already exists in the torch backend? It is exposed in TensorFlow.

Motivation

This is a key operation for implementing binarised neural nets, for example XNOR-Net. Binarised linear and convolutional layers can be computed quickly using XNOR and population counts. XNOR is already expressible via the PyTorch API (e.g. bitwise_xor followed by bitwise_not).
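To illustrate the motivation, here is a minimal pure-Python sketch (not part of the proposal; the function name `bin_dot` and the ±1 packing convention are my own assumptions): a binarised dot product reduces to an XNOR followed by a population count.

```python
def bin_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two {-1, +1} vectors of length n, each packed
    into an int (bit = 1 means +1, bit = 0 means -1), via XNOR + popcount."""
    mask = (1 << n) - 1
    # XNOR marks positions where the two vectors agree; popcount counts them.
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")
    # Each match contributes +1, each mismatch -1.
    return 2 * matches - n

# identical vectors (+1, -1, +1): dot product is n = 3
print(bin_dot(0b101, 0b101, 3))
# fully opposite vectors: dot product is -n = -3
print(bin_dot(0b101, 0b010, 3))
```

On GPU-sized tensors the same idea applies word-wise: XNOR via bitwise ops, then a popcount per 64-bit word, which is what this issue asks to expose natively.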

cc @albanD

@Adam-Vandervorst

Any update or workaround here?

@Exferro commented Mar 13, 2024

I will express interest too, since NumPy is about to add np.bitwise_count in its upcoming major release (NumPy 2.0). I benchmarked it a bit, and here's what I got:

  1. My custom implementation of popcount64c, taken from the Wikipedia page on Hamming weight (population count), takes ~800 ms to popcount an array a = np.random.randint(20000, size=(10000, 10000)).
  2. The new np.bitwise_count, which maps to the underlying CPU instruction, takes ~50 ms, a 16x speedup.
  3. For context, np.bitwise_and(b, b) takes ~75 ms, so bitwise_and gives a rough upper bound on the time a popcount should need.
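For reference, the popcount64c bit-twiddling variant mentioned above can be sketched in plain Python (a minimal sketch; the name `popcount64` is my own, and the masking to 64 bits is an assumption to mimic fixed-width integers):

```python
def popcount64(x: int) -> int:
    """SWAR population count for a 64-bit value (Wikipedia's popcount64c)."""
    x &= 0xFFFFFFFFFFFFFFFF                                          # clamp to 64 bits
    x = x - ((x >> 1) & 0x5555555555555555)                          # 2-bit field sums
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)   # 4-bit field sums
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F                          # 8-bit field sums
    # Multiply-and-shift gathers all byte sums into the top byte.
    return ((x * 0x0101010101010101) & 0xFFFFFFFFFFFFFFFF) >> 56

print(popcount64(0b1011))       # → 3
print(popcount64(2**64 - 1))    # → 64
```

Vectorised over a NumPy int64 array, the same five operations give the ~800 ms figure above; a single hardware POPCNT instruction replaces all of them, hence the speedup.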

Moving to GPU, I see the following:

  1. My custom implementation of popcount64c takes ~30 ms.
  2. torch.bitwise_and(b, b) takes ~2.5 ms.
  3. I would therefore expect a native PyTorch popcount, which maps to a single hardware instruction, to also take a few milliseconds, again a ~10x speedup.

The code I work on daily is quite specific, and I use popcount a lot there.
I would really appreciate a native popcount in PyTorch :) Thank you!

@albanD albanD added the module: python frontend For issues relating to PyTorch's Python frontend label Apr 15, 2024
@albanD (Collaborator) commented Apr 15, 2024

Quick update: I think the fact that NumPy added it is a good signal that we want it as well, for best compatibility.
We would be happy to accept a PR that adds this new unary op!

@Felix-Petersen

I am strongly in support of including popcount in torch. However, I want to remark that if we follow the NumPy variant, it would be important to also support uint64 etc. (#58734), since numpy.bitwise_count "computes the number of 1-bits in the absolute value of x". This matters if one wants an actual popcount over all bits of the datatype: an all-ones int64 (i.e. the value -1) yields bitwise_count = 1, while an all-ones uint64 yields bitwise_count = 64.
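The absolute-value semantics can be illustrated without NumPy (a plain-Python sketch of the behaviour described in the numpy.bitwise_count docs, using Python ints to stand in for the fixed-width types):

```python
# An all-ones int64 is the value -1; bitwise_count uses |x|, and |−1| = 1
# has exactly one bit set.
print(bin(abs(-1)).count("1"))      # → 1

# An all-ones uint64 is the value 2**64 - 1, i.e. all 64 bits set.
print(bin(2**64 - 1).count("1"))    # → 64
```

So a signed-only popcount cannot express "count every bit of the word" for negative values, which is why uint64 support matters here.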

@lapp0 commented Sep 4, 2024

Throwing this in here in case it helps anyone: Torch bit packing and packed-tensor bit counting. I went through a few iterations to arrive at a performant solution. I'm in support of this feature's inclusion natively, but here's a working solution in the meantime.

(Requires you to flatten the tensor before bit packing.)

import torch

def _pack_bit_tensor(bool_tensor):
    """Pack a 1-D boolean tensor into an int64 tensor, 64 bits per element."""
    assert bool_tensor.dim() == 1
    # Pad with zeros so the length is a multiple of 64.
    padding = (64 - bool_tensor.shape[0] % 64) % 64
    if padding > 0:
        pad = torch.zeros(padding, dtype=torch.bool, device=bool_tensor.device)
        bool_tensor = torch.cat([bool_tensor, pad])

    bit_groups = bool_tensor.view(-1, 64)
    shifts = torch.arange(64, device=bool_tensor.device, dtype=torch.int64)

    # Place each bit at its position and sum each group into one int64 word.
    packed_tensor = torch.sum(bit_groups.to(torch.int64) << shifts, dim=-1)
    return packed_tensor


def _bit_tensor_sum(packed_tensor):
    """Count the total number of 1-bits in a packed int64 tensor (SWAR Hamming weight)."""
    count = packed_tensor
    count = count - ((count >> 1) & 0x5555555555555555)                          # 2-bit field sums
    count = (count & 0x3333333333333333) + ((count >> 2) & 0x3333333333333333)   # 4-bit field sums
    count = (count + (count >> 4)) & 0x0F0F0F0F0F0F0F0F                          # 8-bit field sums
    count = (count * 0x0101010101010101) >> 56                                   # total per 64-bit word
    return torch.sum(count).item()
