popcount | popcount algorithms in C and Rust

by BartMassey | Language: C | Version: Current | License: Non-SPDX

kandi X-RAY | popcount Summary

popcount is a C library. It has no reported bugs or vulnerabilities, and it has low support. However, popcount has a Non-SPDX license. You can download it from GitHub.

Copyright 2007 Bart Massey 2021-03-05. Here are some implementations of bit population count in C and Rust, with benchmarks. The writeup in this README is based on work from 2007 and later, as far as I can tell today. It consists of a status section followed by a chronological log that includes benchmark results.

Support

popcount has a low active ecosystem.

It has 12 stars and 1 fork. There are 3 watchers for this library.

It had no major release in the last 6 months.

There are 0 open issues and 1 closed issue. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of popcount is current.

Quality

popcount has no bugs reported.

Security

popcount has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

popcount has a Non-SPDX License.

A Non-SPDX license may be an open-source license that is not SPDX-compliant, or it may not be an open-source license at all, so you need to review it closely before use.

Reuse

popcount releases are not available. You will need to build from source code and install.

Installation instructions, examples and code snippets are available.

Community Discussions

Trending Discussions on popcount

Set 8th bit if all lower 7 bits are set without branching

How to use C++ 20 in g++

Is using C++20's std::popcount with vector optimization is equivalent to popcnt intristic?

Avoid unnecessary mov ecx, ecx instruction in bzhi(y, tzcnt(x))

64 bit popcount using bit tricks in Python

How to close app page if body with two Scaffold

Correct Algorithm for Game of two stacks on HackerRank

Count bits set upto a given position for a 32 bit type

Trying to debug mystery sinewave using AKMIDISampler in Audiokit

I want to pack the bits based on arbitrary mask

QUESTION

Set 8th bit if all lower 7 bits are set without branching

Asked 2021-May-26 at 05:02

I am trying to set the highest bit in a byte value only when all lower 7 bits are set without introducing branching.

for example, given the following inputs:

...

ANSWER

Answered 2021-May-24 at 09:48

How about:

Source https://stackoverflow.com/questions/67669858
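
The answer's snippet is not reproduced above. As one branch-free possibility (a sketch, not necessarily what the answer proposed): adding 1 to the low seven bits carries into bit 7 exactly when all seven are set, and that carry can then be ORed back into the byte. The function name set_bit7_if_low7_set is hypothetical.

```c
#include <stdint.h>

/* Set bit 7 of x only when bits 0..6 are all set; no branches.
 * (x & 0x7F) + 1 equals 0x80 exactly when the low seven bits are all 1,
 * and is at most 0x7F otherwise, so masking with 0x80 isolates the carry. */
uint8_t set_bit7_if_low7_set(uint8_t x)
{
    return (uint8_t)(x | (((x & 0x7Fu) + 1u) & 0x80u));
}
```

For example, 0x7F maps to 0xFF, while 0x7E is returned unchanged.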

QUESTION

How to use C++ 20 in g++

Asked 2021-Apr-06 at 19:53

I am trying to access std::popcount, but it seems like it's only there in C++ 20.

When I try compiling with g++ -std=c++20 main.cpp, it says g++: error: unrecognized command line option '-std=c++20'; did you mean '-std=c++03'

How do I tell g++ to use c++ 20?

I am using Ubuntu 18.04

...

ANSWER

Answered 2021-Apr-06 at 19:53

I would try updating gcc. C++ 20 was introduced in gcc version 8 which is pretty new.

Source https://stackoverflow.com/questions/66975491

QUESTION

Is using C++20's std::popcount with vector optimization is equivalent to popcnt intristic?

Asked 2021-Jan-05 at 19:23

C++20 introduces many new functions such as std::popcount. I get the same functionality using an Intel intrinsic.

I compiled both options - can be seen in Compiler Explorer code:

Using Intel's AVX2 intrinsic
Using std::popcount and GCC compiler flag "-mavx2"

It looks like the generated assembly code is the same, apart from the type checks used in std's template.

In terms of OS agnostic code and having the same optimizations - Is it right to assume that using std::popcount and the apt compiler vector optimization flags is better than directly using intrinsics?

Thanks.

...

ANSWER

Answered 2021-Jan-05 at 19:23

Technically, no (but practically, yes). The C++ standard only specifies the behavior of popcount, not the implementation (refer to [bit.count]).

Implementors are allowed to do whatever they want to achieve this behavior, including using the popcnt intrinsic, but they could also write a while loop:

Source https://stackoverflow.com/questions/65580986
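
The loop the answer alludes to is not reproduced above. A portable loop-based population count could be sketched in C as follows. The name popcount_loop is hypothetical, and this is one common formulation (clearing the lowest set bit on each iteration), not necessarily the one the answer showed.

```c
#include <stdint.h>

/* Portable population count: x & (x - 1) clears the lowest set bit,
 * so the loop body runs once per set bit in x. */
unsigned popcount_loop(uint64_t x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
}
```

A compiler is free to implement a library popcount this way or with a single hardware instruction; the observable result is the same.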

QUESTION

Avoid unnecessary mov ecx, ecx instruction in bzhi(y, tzcnt(x))

Asked 2020-May-24 at 23:54

I have a bit position (it's never zero), calculated by using tzcnt and I would like to zero high bits starting from that position. This is code in C++ and disassembly (I'm using MSVC):

...

ANSWER

Answered 2020-May-24 at 23:54

This is an MSVC missed optimization. GCC/clang can use bzhi directly on the output of tzcnt for your source. All compilers have missed optimizations in some cases, but GCC and clang tend to have fewer cases than MSVC.

(And GCC is careful to break the output dependency of tzcnt when tuning for Haswell to avoid the risk of creating a loop-carried dependency chain through that false dependency. Unfortunately GCC still does this with -march=skylake which doesn't have a false dep for tzcnt, only popcnt and a "true" dep for bsr/bsf.)

Intel documents the 2nd input to _bzhi_u64 as unsigned __int32 index. (You're making this explicit with a static_cast to uint32_t for some reason, but removing the explicit cast doesn't help). IDK how MSVC defines the intrinsic or handles it internally.

IDK why MSVC wants to do this; I wonder if it's zero-extension to 64-bit inside the internal logic of MSVC's _bzhi_u64 intrinsic that takes a 32-bit C input but uses a 64-bit asm register. (tzcnt's output value-range is 0..64 so this zero-extension is a no-op in this case)

Masked popcnt: shift yyy instead of masking it

As in What is the efficient way to count set bits at a position or lower?, it can be more efficient to just shift out the bits you don't want, instead of zeroing them in-place. (Although bzhi avoids the cost of creating a mask so this is just break-even, modulo differences in which execution ports bzhi vs. shrx can run on.) popcnt doesn't care where the bits are.

Source https://stackoverflow.com/questions/61986529
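
To illustrate the shift-instead-of-mask idea in C, here is a minimal sketch. It assumes GCC or Clang (for __builtin_popcountll) and assumes the goal is to count the set bits among the low n bits, with n between 0 and 64. The name popcount_low_bits is hypothetical.

```c
#include <stdint.h>

/* Count the set bits among the low n bits of x (precondition: n <= 64).
 * Shifting left by (64 - n) pushes the unwanted high bits out of the
 * word; popcount does not care where the surviving bits end up. */
unsigned popcount_low_bits(uint64_t x, unsigned n)
{
    if (n == 0)
        return 0;  /* a shift by 64 would be undefined behavior in C */
    return (unsigned)__builtin_popcountll(x << (64 - n));
}
```

The n == 0 check is only there to keep the C well defined; hand-written assembly for a known nonzero position would not need it.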

QUESTION

64 bit popcount using bit tricks in Python

Asked 2020-Jan-28 at 18:25

I would like to have a bit trick implementation of 64 bit popcount in Python. I tried to copy this code as follows:

...

ANSWER

Answered 2020-Jan-28 at 18:25

To make the solution obvious, I'm adding it here, but the credit goes to @TimPeters and @Heap-Overflow

Source https://stackoverflow.com/questions/59939945
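
The answer's code is not reproduced above. For reference, the widely used 64-bit bit-trick population count looks like this in C (a sketch; the name popcount64_swar is hypothetical). In C the final multiplication wraps modulo 2^64 automatically. Python integers are unbounded, so a direct port would need to mask the product to 64 bits before shifting; whether that was the exact problem in the question is an assumption.

```c
#include <stdint.h>

/* Bit-trick (SWAR) population count for a 64-bit word. */
unsigned popcount64_swar(uint64_t x)
{
    /* Sum adjacent bits into 2-bit fields. */
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    /* Sum adjacent 2-bit fields into 4-bit fields. */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    /* Sum adjacent 4-bit fields into per-byte counts. */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    /* The multiply accumulates all byte counts into the top byte;
     * unsigned 64-bit overflow discards everything above bit 63. */
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);
}
```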

QUESTION

How to close app page if body with two Scaffold

Asked 2019-Dec-18 at 12:07

I was using the line Navigator.pop(context); to close the page, but in this case it shows a black screen. I tried calling Navigator.pop(context); twice, but the black page is still there. What should I do here? My page code:

...

ANSWER

Answered 2019-Dec-18 at 12:07

Your project should declare MaterialApp only once, normally in the main.dart file.

Remove the MaterialApp widget from the build() method.

Source https://stackoverflow.com/questions/59391553

QUESTION

Correct Algorithm for Game of two stacks on HackerRank

Asked 2019-Jul-15 at 11:16

I just attempted a stack based problem on HackerRank

https://www.hackerrank.com/challenges/game-of-two-stacks

Alexa has two stacks of non-negative integers, stack A and stack B where index 0 denotes the top of the stack. Alexa challenges Nick to play the following game:

In each move, Nick can remove one integer from the top of either stack A or B stack.

Nick keeps a running sum of the integers he removes from the two stacks.

Nick is disqualified from the game if, at any point, his running sum becomes greater than some integer X given at the beginning of the game.

Nick's final score is the total number of integers he has removed from the two stacks.

Find the maximum possible score Nick can achieve (i.e., the maximum number of integers he can remove without being disqualified) during each game and print it on a new line.

For each of the games, print an integer on a new line denoting the maximum possible score Nick can achieve without being disqualified.

...

ANSWER

Answered 2017-May-20 at 09:32

OK, I will try to explain an algorithm that can solve this problem in O(n). You need to try coding it yourself.

I will explain it with a simple example, and you can adapt it.

Source https://stackoverflow.com/questions/44083755

QUESTION

Count bits set upto a given position for a 32 bit type

Asked 2019-May-12 at 14:42

I am working with an algorithm that performs many popcount/sideways addition up to a given index for a 32 bit type. I am looking to minimize the operations required to perform what I have currently implemented as this:

...

ANSWER

Answered 2019-May-12 at 14:42

Thanks everyone for the suggestions, I decided to pit all the methods I had come across head to head as I couldn't find any similar tests.

N.B. The population counts shown are for indexes up to argv[1], not a popcount of argv[1] - 8x 32-bit arrays make up 256 bits. The code used to produce these results can be seen here.

On my Ryzen 1700, for my usage, the fastest population count was (often) the one on page 180 of the Software Optimization Guide for AMD64 Processors. This (often) remains true for larger population counts too.

Source https://stackoverflow.com/questions/54991020
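
The question's code is not reproduced above. As a plain baseline for the operation being benchmarked, counting the set bits below a given index in a 32-bit word can be sketched in C by masking and then counting. This assumes GCC or Clang (for __builtin_popcount) and assumes "up to a given position" means bits 0 through index - 1. The name popcount_below is hypothetical, and this is not claimed to be one of the benchmarked variants.

```c
#include <stdint.h>

/* Count the set bits of x in positions 0 .. index-1. */
unsigned popcount_below(uint32_t x, unsigned index)
{
    /* 1u << 32 would be undefined, so treat index >= 32 as "all bits". */
    uint32_t mask = (index >= 32) ? UINT32_MAX
                                  : ((UINT32_C(1) << index) - 1u);
    return (unsigned)__builtin_popcount(x & mask);
}
```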

QUESTION

Trying to debug mystery sinewave using AKMIDISampler in Audiokit

Asked 2019-May-02 at 08:36

Following on from a previous issue, I stopped using AKSampler and moved to the functionality in AKMIDISampler. I got my loops working again (with help from this Google Groups post), but I have a sinewave playing (which happens when the MIDISampler can't find its source file).

It's not an issue with the source files I'm targeting because they all play OK. The sinewave is coming from somewhere else in the process, but I can't see where...

Please help 8•)

(Simplified and edited code to show only relevant details - please get in touch for any clarification)

...

ANSWER

Answered 2017-Nov-16 at 18:01

This is a bit of a guess, but a very common issue is that the audio files are not in a location the sampler expects. Try putting the audio files in a Samples/ folder like in these examples:

http://audiokit.io/playgrounds/Playback/Sequencer/ http://audiokit.io/playgrounds/Playback/Sampler/

Alternatively, I think a Sounds or "Sampler Instruments" folder works as well, as in the Sampler Demo:

https://github.com/AudioKit/AudioKit/tree/master/Examples/iOS/SamplerDemo/SamplerDemo/Sounds

Source https://stackoverflow.com/questions/47160168

QUESTION

I want to pack the bits based on arbitrary mask

Asked 2019-Apr-24 at 12:19

Let's say that data is 1011 1001 and the mask is 0111 0110, then you have:

...

ANSWER

Answered 2018-Jan-28 at 21:28

If you're targeting x86, most compilers will have an intrinsic for the pdep (parallel bit deposit) instruction, which directly performs the operation you want, in hardware, at a rate of 1 per cycle (3 cycles latency)¹, on Intel hardware that supports it. For example, gcc offers it as the _pdep_u32 and _pdep_u64 intrinsic functions.

Unfortunately, on AMD Ryzen (the only AMD hardware that supports BMI2) this operation is very slow: one per 18 cycles. You might want to have a separate code-path to support non-Intel platforms if they are important to you.

If you aren't on x86, you can find general purpose implementations of these options here - the specific operation you want is expand_right - and this other section will probably be of great interest in that it specifically covers the simple case where you are dealing with word-sized elements.

In practice, if you are really dealing with 8-bit data and mask values, you might just use a precomputed lookup table - either a big 8 bit x 8 bit = 65k one which covers all {data, mask} combinations and which gives you the answer directly, or a 256-entry one which covers all mask values and gives you some coefficients for a simple bit-shifting calculation or a multiplication-based code.

FWIW, I'm not sure how you can do it easily with 5 rotate instructions, because it seems that the naive solution needs 1 rotate instruction for each bit, whether set or not (so for a word size of 8 bits, 7 or 8 rotate² instructions).

¹ Of course, the performance in principle depends on the hardware, but on all the mainstream Intel CPUs that implement it, it's 1 cycle throughput, 3 cycles latency (not sure about AMD).

² Only 7 rotates because the "rotate of 0" operation for the lowest order bit can evidently be omitted.

Source https://stackoverflow.com/questions/41617369
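
For platforms without the instruction, the parallel bit deposit operation that the answer names can be emulated with a simple loop. The following is a minimal, unoptimized C sketch of its semantics: successive low-order bits of src are placed at the positions of the set bits of mask, from least to most significant. The name pdep32_soft is hypothetical, and this is not the general-purpose implementation the answer links to.

```c
#include <stdint.h>

/* Software emulation of 32-bit parallel bit deposit.
 * Bit k of src is written to the position of the k-th lowest set bit
 * of mask; all other result bits are zero. */
uint32_t pdep32_soft(uint32_t src, uint32_t mask)
{
    uint32_t result = 0;
    for (uint32_t bit = 1; mask != 0; bit <<= 1) {
        uint32_t lowest = mask & (0u - mask);  /* isolate lowest set bit */
        if (src & bit)
            result |= lowest;
        mask &= mask - 1;                      /* clear that mask bit */
    }
    return result;
}
```

The loop runs once per set bit in the mask, so it is far slower than the hardware instruction on Intel and is mainly useful as a reference or fallback.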

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install popcount

You can build the C code with both GCC and Clang and the Rust code with Cargo by typing make, but you should first examine the Makefile and set what you need to tune. Depending on your environment, you might just want to build specific binaries. Once you've built popcount_gcc or whichever, run it with a number of iterations to bench: 100000 is good to start. It will spit out some explanatory numbers.

Support

For new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check for existing answers and ask on the Stack Overflow community page.
