core2 | Alloc support
kandi X-RAY | core2 Summary
Ever wanted a Cursor or the Error trait in no_std? Well, now you can have it. A 'fork' of Rust's std modules for no_std environments, with the added benefit of optionally taking advantage of alloc. The goal of this crate is to provide a stable interface for building I/O and error-trait functionality in no_std environments. The current code corresponds to the most recent stable API of Rust 1.47.0. It is also a goal to achieve a true alloc-less experience, with opt-in alloc support. This crate works on stable with some limitations in functionality, and on nightly without limitations by adding the relevant feature flag. The crate uses std by default; disable default features to get no_std mode.
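For a quick feel of what that enables, here is a minimal sketch of using a Cursor over a stack buffer in a no_std crate (assuming the crate mirrors the std::io Cursor/Read/Write surface it forks; this is not an excerpt from the crate's docs):

#![no_std]

use core2::io::{Cursor, Read, Write};

// Write into a fixed stack buffer through a Cursor, then read it back.
// No alloc is involved: the Cursor wraps a plain &mut [u8].
fn roundtrip() -> core2::io::Result<()> {
    let mut buf = [0u8; 8];
    let mut cursor = Cursor::new(&mut buf[..]);
    cursor.write_all(b"abcd")?;   // Write works without std
    cursor.set_position(0);
    let mut out = [0u8; 4];
    cursor.read_exact(&mut out)?; // and so does Read
    assert_eq!(&out, b"abcd");
    Ok(())
}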
Community Discussions
Trending Discussions on core2
QUESTION
I'm trying to make sure GCC vectorizes my loops. It turns out that by using -march=znver1 (or -march=native) GCC skips some loops even though they can be vectorized. Why does this happen? In this code, the second loop, which multiplies each element by a scalar, is not vectorised:
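(The loops in question were of roughly this shape; a hypothetical reconstruction, sketched in Rust for illustration, whereas the original question used C compiled with GCC:)

// First loop: store a constant to every element (memset-like); this one
// does get vectorized. Second loop: multiply each element by a scalar;
// this is the one reported as skipped at -march=znver1.
fn kernel(arr: &mut [u64]) {
    for v in arr.iter_mut() {
        *v = 1;
    }
    for v in arr.iter_mut() {
        *v = v.wrapping_mul(5);
    }
}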
...ANSWER
Answered 2022-Apr-10 at 02:47

The default -mtune=generic has -mprefer-vector-width=256, and -mavx2 doesn't change that. znver1 implies -mprefer-vector-width=128, because that's all the native width of the HW. An instruction using 32-byte YMM vectors decodes to at least 2 uops, more if it's a lane-crossing shuffle. For simple vertical SIMD like this, 32-byte vectors would be ok; the pipeline handles 2-uop instructions efficiently. (And I think Zen 1 is 6 uops wide but only 5 instructions wide, so max front-end throughput isn't available using only 1-uop instructions.) But when vectorization would require shuffling, e.g. with arrays of different element widths, GCC code-gen can get messier with 256-bit or wider.
And vmovdqa ymm0, ymm1 mov-elimination only works on the low 128-bit half on Zen1. Also, normally using 256-bit vectors would imply one should use vzeroupper afterwards, to avoid performance problems on other CPUs (but not Zen1).
I don't know how Zen1 handles misaligned 32-byte loads/stores where each 16-byte half is aligned but in separate cache lines. If that performs well, GCC might want to consider increasing the znver1 -mprefer-vector-width to 256. But wider vectors mean more cleanup code if the size isn't known to be a multiple of the vector width.
Ideally GCC would be able to detect easy cases like this and use 256-bit vectors there. (Pure vertical, no mixing of element widths, constant size that's a multiple of 32 bytes.) At least on CPUs where that's fine: znver1, but not bdver2 for example, where 256-bit stores are always slow due to a CPU design bug.
You can see the result of this choice in the way it vectorizes your first loop, the memset-like loop, with a vmovdqu [rdx], xmm0. https://godbolt.org/z/E5Tq7Gfzc
So given that GCC has decided to only use 128-bit vectors, which can only hold two uint64_t elements, it (rightly or wrongly) decides it wouldn't be worth using vpsllq / vpaddd to implement qword *5 as (v<<2) + v, vs. doing it with integer in one LEA instruction.

Almost certainly wrongly in this case, since it still requires a separate load and store for every element or pair of elements. (And loop overhead, since GCC's default is not to unroll except with PGO, -fprofile-use. SIMD is like loop unrolling, especially on a CPU that handles 256-bit vectors as 2 separate uops.)
I'm not sure exactly what GCC means by "not vectorized: unsupported data-type". x86 doesn't have a SIMD uint64_t multiply instruction until AVX-512, so perhaps GCC assigns it a cost based on the general case of having to emulate it with multiple 32x32 => 64-bit pmuludq instructions and a bunch of shuffles. And it's only after it gets over that hump that it realizes it's actually quite cheap for a constant like 5 with only 2 set bits?
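(For reference, the strength reduction at issue: 5 = 0b101 has only two set bits, so v*5 folds into a shift and an add. A tiny sketch, in Rust for consistency with the other snippets here, though the discussion itself is about GCC's x86 output:)

// v*5 == (v << 2) + v, with wrapping semantics matching a wrapping multiply.
fn times5(v: u64) -> u64 {
    (v << 2).wrapping_add(v)
}

fn main() {
    assert_eq!(times5(7), 35);
    assert_eq!(times5(u64::MAX), u64::MAX.wrapping_mul(5));
}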
That would explain GCC's decision-making process here, but I'm not sure it's exactly the right explanation. Still, those are the kinds of factors at work in a complex piece of machinery like a compiler. A skilled human can easily make smarter choices, but compilers just do sequences of optimization passes that don't always consider the big picture and all the details at the same time.
-mprefer-vector-width=256 doesn't help: not vectorizing uint64_t *= 5 seems to be a GCC9 regression.
(The benchmarks in the question confirm that an actual Zen1 CPU gets a nearly 2x speedup, as expected from doing 2x uint64 in 6 uops vs. 1x in 5 uops with scalar. Or 4x uint64_t in 10 uops with 256-bit vectors, including two 128-bit stores which will be the throughput bottleneck along with the front-end.)
Even with -march=znver1 -O3 -mprefer-vector-width=256, we don't get the *= 5 loop vectorized with GCC 9, 10, 11, or current trunk. As you say, we do with -march=znver2. https://godbolt.org/z/dMTh7Wxcq
We do get vectorization with those options for uint32_t (even leaving the vector width at 128-bit). Scalar would cost 4 operations per vector uop (not instruction), regardless of 128 or 256-bit vectorization on Zen1, so this doesn't tell us whether *= is what makes the cost-model decide not to vectorize, or just the 2 vs. 4 elements per 128-bit internal uop.
With uint64_t, changing to arr[i] += arr[i]<<2; still doesn't vectorize, but arr[i] <<= 1; does (https://godbolt.org/z/6PMn93Y5G). Even arr[i] <<= 2; and arr[i] += 123 in the same loop vectorize, to the same instructions that GCC thinks aren't worth it for vectorizing *= 5, just with different operands, constant instead of the original vector again. (Scalar could still use one LEA.) So clearly the cost-model isn't looking as far as final x86 asm machine instructions, but I don't know why arr[i] += arr[i] would be considered more expensive than arr[i] <<= 1; which is exactly the same thing.
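(Collected as loops, the variants from this paragraph look like this; a sketch in Rust, while the originals were C array loops, with the vectorization outcomes as reported above for GCC:)

fn variants(arr: &mut [u64]) {
    // arr[i] += arr[i] << 2: reported as still not vectorized
    for v in arr.iter_mut() {
        *v = v.wrapping_add(*v << 2);
    }
    // arr[i] <<= 1: reported as vectorized
    for v in arr.iter_mut() {
        *v <<= 1;
    }
    // arr[i] <<= 2 and arr[i] += 123 in the same loop: reported as vectorized
    for v in arr.iter_mut() {
        *v = (*v << 2).wrapping_add(123);
    }
}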
GCC8 does vectorize your loop, even with 128-bit vector width: https://godbolt.org/z/5o6qjc7f6
QUESTION
I'm trying to wrap my head around the issue of memory barriers right now. I've been reading and watching videos about the subject, and I want to make sure I understand it correctly, as well as ask a question or two.
Let me start by making sure I understand the problem accurately. Let's take the following classic example as the basis for the discussion: suppose we have 2 threads running on 2 different cores (this is pseudo-code!). We start with int f = 0; int x = 0; and then run those threads:
ANSWER
Answered 2022-Mar-28 at 09:06

From my point of view you missed the most important thing! Since the compiler does not see that changing x or f has any side effect, it is free to optimize all of that away. The loop with the condition f==0 will likewise compile to "nothing": the compiler only sees a constant f=0 assigned beforehand, so it can assume f==0 is always true and remove the loop.

To prevent all of that, you have to tell the compiler that something happens which is not visible from the given flow of code. That can be a call to some semaphore/mutex/... or other IPC functionality, or the use of atomic variables.

If you compile your code as written, I assume you get more or less "nothing": neither part has any visible effect, and the compiler, not seeing that the variables are used from two thread contexts, optimizes everything away.
If we implement the code as in the following example, we see that it fails and prints 0 on my system.
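(The example itself is cut off above. As a hedged sketch of the fix the answer describes, written in Rust, whose atomics use the same C++11 memory model: making the flag atomic with release/acquire ordering tells both the compiler and the CPU that the stores are observable from another thread, so they can be neither optimized away nor reordered.)

use std::sync::atomic::{AtomicBool, AtomicI32, Ordering};
use std::thread;

static X: AtomicI32 = AtomicI32::new(0);
static F: AtomicBool = AtomicBool::new(false);

fn main() {
    let writer = thread::spawn(|| {
        X.store(42, Ordering::Relaxed);
        F.store(true, Ordering::Release); // publish: everything before this store...
    });
    let reader = thread::spawn(|| {
        while !F.load(Ordering::Acquire) {} // ...is visible once this load sees true
        assert_eq!(X.load(Ordering::Relaxed), 42);
    });
    writer.join().unwrap();
    reader.join().unwrap();
}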
QUESTION
I have a large list data with 300 named elements and a data frame data6 with values. The list data looks like below:
ANSWER
Answered 2022-Feb-03 at 16:22

We may use lapply/Map to loop over the columns of 'data', apply the function and then cbind the list elements.
QUESTION
I want to test my site in a lot of threads. But when I try to do that, I see one problem: all actions happen in the last opened window, so the first window is just stuck in the background.
...ANSWER
Answered 2022-Jan-21 at 20:25

This is because you are using a static field for the Firefox driver. Static means one per all instances, so remove static here.
QUESTION
I'm trying to find the correct value for the KMACHINE setting, defined as "The machine as known by the kernel." When I manually configure the kernel (outside of Yocto) I do not enter a machine type. I do set ARCH=arm, choose a "system type" config option like CONFIG_ARCH_LPC32XX=y, or load a defconfig like lpc32xx_defconfig, but I don't know if any of those is what KMACHINE is supposed to be.
As an example, the Yocto documentation gives intel-core2-32, which does not appear anywhere in the Linux 5.15 sources.
ANSWER
Answered 2022-Jan-18 at 07:53

KMACHINE is used to select Yocto-specific metadata for building the kernel, and is not passed to the kernel build system. By default, it is set to ${MACHINE} in kernel-yocto.bbclass, and it can be overridden if a machine does not need its own metadata selection and can instead use existing metadata.
There's a better description under LINUX_KERNEL_TYPE in the manual (paraphrased): The KMACHINE and LINUX_KERNEL_TYPE variables define the search arguments used by Yocto's kernel tools to find the appropriate description within Yocto's kernel metadata with which to build out the kernel sources and configuration.
This kernel metadata is maintained by the Yocto Project, in the yocto-kernel-cache repository. It is optional, and is only used if the selected kernel recipe is a "linux-yocto" style recipe (i.e. it inherits linux-yocto.inc).
If you're using an out-of-kernel-tree defconfig to configure your kernel, it's unlikely you'll need Yocto's kernel metadata, and you therefore don't need to override KMACHINE.
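(As a concrete illustration, an override could look like the following; a hypothetical snippet, with the variable names taken from this answer and the value reusing the documentation's intel-core2-32 example:)

# hypothetical machine configuration
KMACHINE = "intel-core2-32"
LINUX_KERNEL_TYPE = "standard"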
QUESTION
#include <iostream>
#include <string>
#include <chrono>
#include <cstdlib>
#include <pthread.h>
#include <unistd.h>
using namespace std;
static inline void stick_this_thread_to_core(int core_id);
static inline void* incrementLoop(void* arg);
struct BenchmarkData {
long long iteration_count;
int core_id;
};
pthread_barrier_t g_barrier;
int main(int argc, char** argv)
{
if(argc != 3) {
cout << "Usage: ./a.out " << endl;
return EXIT_FAILURE;
}
cout << "================================================ STARTING ================================================" << endl;
int core1 = std::stoi(argv[1]);
int core2 = std::stoi(argv[2]);
pthread_barrier_init(&g_barrier, nullptr, 2);
const long long iteration_count = 100'000'000'000;
BenchmarkData benchmark_data1{iteration_count, core1};
BenchmarkData benchmark_data2{iteration_count, core2};
pthread_t worker1, worker2;
pthread_create(&worker1, nullptr, incrementLoop, static_cast<void*>(&benchmark_data1));
cout << "Created worker1" << endl;
pthread_create(&worker2, nullptr, incrementLoop, static_cast<void*>(&benchmark_data2));
cout << "Created worker2" << endl;
pthread_join(worker1, nullptr);
cout << "Joined worker1" << endl;
pthread_join(worker2, nullptr);
cout << "Joined worker2" << endl;
return EXIT_SUCCESS;
}
static inline void stick_this_thread_to_core(int core_id) {
int num_cores = sysconf(_SC_NPROCESSORS_ONLN);
if (core_id < 0 || core_id >= num_cores) {
cerr << "Core " << core_id << " is out of assignable range.\n";
return;
}
cpu_set_t cpuset;
CPU_ZERO(&cpuset);
CPU_SET(core_id, &cpuset);
pthread_t current_thread = pthread_self();
int res = pthread_setaffinity_np(current_thread, sizeof(cpu_set_t), &cpuset);
if(res == 0) {
cout << "Thread bound to core " << core_id << " successfully." << endl;
} else {
cerr << "Error in binding this thread to core " << core_id << '\n';
}
}
static inline void* incrementLoop(void* arg)
{
BenchmarkData* arg_ = static_cast<BenchmarkData*>(arg);
int core_id = arg_->core_id;
long long iteration_count = arg_->iteration_count;
stick_this_thread_to_core(core_id);
cout << "Thread bound to core " << core_id << " will now wait for the barrier." << endl;
pthread_barrier_wait(&g_barrier);
cout << "Thread bound to core " << core_id << " is done waiting for the barrier." << endl;
long long data = 0;
long long i;
cout << "Thread bound to core " << core_id << " will now increment private data " << iteration_count / 1'000'000'000.0 << " billion times." << endl;
std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();
for(i = 0; i < iteration_count; ++i) {
++data;
__asm__ volatile("": : :"memory");
}
std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
unsigned long long elapsed_time = std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count();
cout << "Elapsed time: " << elapsed_time << " ms, core: " << core_id << ", iteration_count: " << iteration_count << ", data value: " << data << ", i: " << i << endl;
return nullptr;
}
...ANSWER
Answered 2022-Jan-13 at 08:40

It turns out that cores 0, 16, 17 were running at much higher frequency on my Skylake server.
QUESTION
Yocto 3.4 failed at the glibc 2.34 compilation. I think the error is happening at the linking stage:
...ANSWER
Answered 2021-Dec-09 at 08:55

I tried with the definition in dso_handle.h as below:
QUESTION
I'm trying to build the Alexa Auto SDK (https://github.com/alexa/alexa-auto-sdk/blob/3.2/builder/README.md) and I'm using an Apple Silicon M1. I installed Docker successfully, but sadly, running
./builder/build.sh android -t androidx86-64 --android-api 28
I now run into
...ANSWER
Answered 2021-Nov-20 at 08:33

I don't know if this will solve your problem, but I was facing a similar issue building the auto-sdk for Android on Intel-based macOS machines. We were able to solve the problem by increasing Docker's default RAM allocation (set to 2 GB): https://docs.docker.com/desktop/mac/ After increasing it to 6 GB it worked perfectly.
QUESTION
I am trying to build openblas.bb in a Yocto project but it fails. The machine is "qemux86-64".
ANSWER
Answered 2021-Nov-02 at 15:25

I changed to the newest version of openblas:
QUESTION
I'm on a Mac M1 machine. I'm using RStudio in Anaconda and I wanted to update the R packages with the update button. However, I got the same error for many of the packages when I tried to update. Here is one example:
...ANSWER
Answered 2021-Sep-15 at 22:16

It is simpler to avoid using install.packages() when using an R environment managed by Conda, especially when the package involves compilation. Instead, prefer using Conda for installation. In this particular case, use
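(The specific command is cut off above. As an assumption about its shape: Conda packages R libraries under an r- prefix, so the pattern would be along these lines.)

# hypothetical; substitute the actual package name
conda install -c conda-forge r-somepackage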
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install core2
Rust is installed and managed by the rustup tool. Rust has a 6-week rapid release process and supports a great number of platforms, so there are many builds of Rust available at any time. Please refer to rust-lang.org for more information.
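To pull in core2 itself, the dependency goes in Cargo.toml. A sketch (check crates.io for the current version rather than trusting the number here):

[dependencies]
core2 = { version = "0.4", default-features = false }  # default features off for no_std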