intrinsic | Provide Golang native SIMD intrinsics on x86/amd64 platform
kandi X-RAY | intrinsic Summary
Provide Golang native SIMD intrinsics on x86/amd64 platform.
Top functions reviewed by kandi - BETA
- makeInst creates a list of instances.
- ParseInst parses a struct into an Inst struct.
- Generate a table.
- readDesc reads the dest file.
- FrameSize returns the size of the frame in bytes.
- set returns a map of strings.
- Feature parses a feature string.
- PMOVOVSXWDm64 byte array
- PMOVOVZXZXBWm64 byte array
- PMOVOVZXBQM16byte byte array
Community Discussions
Trending Discussions on intrinsic
QUESTION
We received a crash on Firebase for a Kotlin method:
...ANSWER
Answered 2022-Apr-12 at 11:53
Shouldn't the exception be thrown way before getting to the constructor call for DeliveryMethod?
Within Kotlin, it's not possible for a non-null parameter to be accidentally given a null value at runtime (because the code wouldn't have compiled in the first place). However, this can happen if the value is passed from Java. This is why the Kotlin compiler tries to protect you from Java's null unsafety by generating null checks at the beginning of some methods (via the intrinsic checkNotNullParameter you're seeing fail here).
However, there is no point in doing that in private or suspend methods, since they can (usually) only be called from Kotlin, and it would add overhead that might not be acceptable in performance-sensitive code. That is why these checks are only generated for non-suspend public/protected/internal methods: their goal is to prevent misuse from Java.
This is why, if you manage to call addSingleDMInAd with a null argument, it doesn't fail with this error. That said, it would be interesting to see how you're getting the null here, because usually the checks at the public API surface are enough. Is some reflection or an unsafe cast involved here?
EDIT: with the addition of the calling code, this clears up the problem. You're calling a method that takes a List from Java, with a list that contains nulls. Unfortunately, Kotlin only checks the parameters themselves (in this case, that the list itself is not null); it doesn't iterate your list to check for nulls inside. This is why it didn't fail at the public API surface in this case.
Also, the way your model is set up is quite strange. The lateinit seems to be lying, because depending on which constructor is used, the properties may actually not be set at all. It would be safer to mark them as nullable to account for users of the class that don't set these properties. Doing this, you won't even need all the secondary constructors, and you can just use default values:
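(The original snippet is not reproduced on this page; the following is only a minimal sketch of that suggestion, with the property names invented for illustration.)

    // Hypothetical shape of the model: nullable properties with defaults
    // replace lateinit vars and the extra secondary constructors.
    data class DeliveryMethod(
        val id: String? = null,
        val name: String? = null,
        val price: Double? = null,
    )

Callers that previously relied on the secondary constructors can now simply omit the arguments they don't have.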
QUESTION
For the code
...ANSWER
Answered 2022-Apr-10 at 20:50
This is forbidden by this restriction in Fortran 2018, 15.4.3.4.2 "Defined operations", paragraph 1:
If OPERATOR is specified in a generic specification, all of the procedures specified in the generic interface shall be functions that may be referenced as defined operations (10.1.6, 15.5). In the case of functions of two arguments, infix binary operator notation is implied. In the case of functions of one argument, prefix operator notation is implied. OPERATOR shall not be specified for functions with no arguments or for functions with more than two arguments. The dummy arguments shall be nonoptional dummy data objects and shall have the INTENT (IN) or VALUE attribute. The function result shall not have assumed character length. If the operator is an intrinsic-operator (R608), the number of dummy arguments shall be consistent with the intrinsic uses of that operator, and the types, kind type parameters, or ranks of the dummy arguments shall differ from those required for the intrinsic operation (10.1.5).
QUESTION
Consider the following examples for calculating the sum of an i32 array:
Example 1: Simple for loop
...ANSWER
Answered 2022-Apr-09 at 09:13
It appears you forgot to tell rustc that it was allowed to use AVX2 instructions everywhere, so it couldn't inline those functions. Instead, you get a total disaster where only the wrapper functions are compiled as AVX2-using functions, or something like that.
It works fine for me with -O -C target-cpu=skylake-avx512 (https://godbolt.org/z/csY5or43T), so it can inline even the AVX512VL load you used, _mm256_load_epi32 [1], and then optimize it into a memory source operand for vpaddd ymm0, ymm0, ymmword ptr [rdi + 4*rax] (AVX2) inside a tight loop.
In GCC/clang, you get an error like "inlining failed in call to always_inline foobar" in this case, instead of working-but-slow asm (see this for details). This is something Rust should probably sort out before this is ready for prime time: either be like MSVC and actually inline the instruction into a function using the intrinsic, or refuse to compile like GCC/clang.
Footnote 1: See "How to emulate _mm256_loadu_epi32 with gcc or clang?" if you didn't mean to use AVX512.
With -O -C target-cpu=skylake (just AVX2), it inlines everything else, including vpaddd ymm, but still calls out to a function that copies 32 bytes from memory to memory with AVX vmovaps. It requires AVX512VL to inline the intrinsic, but later in the optimization process it realizes that with no masking it's just a 256-bit load, which it should do without a bloated AVX-512 instruction. It's kind of dumb that Intel even provided a no-masking version of _mm256_mask[z]_loadu_epi32 that requires AVX-512, or dumb that gcc/clang/rustc consider it an AVX512 intrinsic.
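As a concrete illustration of the inlining point (not code from the question), here is a sketch of a sum kernel that carries its own target-feature attribute instead of relying on a global -C target-cpu flag; the function name and the use of a plain AVX2 unaligned load are my own choices.

    // Sketch: marking the kernel itself as AVX2 lets rustc inline the intrinsics
    // here even without -C target-cpu, but callers must verify the feature at
    // runtime (e.g. with is_x86_feature_detected!("avx2")) before calling it.
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;

    #[cfg(target_arch = "x86_64")]
    #[target_feature(enable = "avx2")]
    unsafe fn sum_avx2(v: &[i32]) -> i32 {
        let mut acc = _mm256_setzero_si256();
        let chunks = v.chunks_exact(8);
        let tail = chunks.remainder();
        for c in chunks {
            // Plain unaligned AVX load; avoids the AVX-512VL-only _mm256_load_epi32.
            let x = _mm256_loadu_si256(c.as_ptr() as *const __m256i);
            acc = _mm256_add_epi32(acc, x);
        }
        let mut lanes = [0i32; 8];
        _mm256_storeu_si256(lanes.as_mut_ptr() as *mut __m256i, acc);
        lanes.iter().sum::<i32>() + tail.iter().sum::<i32>()
    }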
QUESTION
The arithmetic mean of two unsigned integers is defined as:
...ANSWER
Answered 2022-Mar-08 at 10:54
The following method avoids overflow and should result in fairly efficient assembly (example) without depending on non-standard features:
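(The answer's exact code is not shown on this page; one well-known formulation of the same idea is sketched below, with the function name mine.)

    #include <stdint.h>

    /* a + b == 2*(a & b) + (a ^ b), so the carry bits (a & b) are added whole
     * while the non-carry bits (a ^ b) are halved first; no intermediate value
     * exceeds UINT32_MAX, so overflow is impossible. */
    uint32_t mean_u32(uint32_t a, uint32_t b)
    {
        return (a & b) + ((a ^ b) >> 1);   /* rounds toward zero */
    }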
QUESTION
I am new to NFTs. I am trying to create a test NFT, and when I try to deploy it I get the error "insufficient funds for intrinsic transaction cost", even though my account has a 1 ETH balance. I have attached my whole code below; can anyone please help me resolve this issue?
MyNFT.sol
ANSWER
Answered 2022-Feb-24 at 22:28
That error is clear: you do not have sufficient funds. This is how you are getting the account information:
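(The configuration quoted by the answer is not included on this page. As a stand-in, here is a small hardhat/ethers sketch, with the file and names hypothetical, that prints which account will pay the deployment's intrinsic gas cost and its balance on the selected network.)

    // scripts/check-balance.ts (hypothetical helper, not the asker's code)
    import { ethers } from "hardhat";

    async function main() {
      // The first configured signer is the account that pays for the deployment.
      const [deployer] = await ethers.getSigners();
      const balance = await ethers.provider.getBalance(deployer.address);
      console.log(`deployer: ${deployer.address}`);
      console.log(`balance:  ${ethers.utils.formatEther(balance)} ETH`);
    }

    main().catch((err) => {
      console.error(err);
      process.exitCode = 1;
    });

If the balance printed for the network you actually deploy to (a testnet rather than a local node, for example) is 0, the error is expected even though a different account holds 1 ETH.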
QUESTION
I am developing some concurrent algorithms which deal with Reference objects. I am using Java 17.
The thing is, I don't know the memory semantics of operations like get, clear, or refersTo. They aren't documented in the Javadoc.
Looking into the source code of OpenJDK, the referent has no modifier such as volatile (while the next pointer for reference queues is volatile). Also, the get implementation is trivial, but it is an intrinsic candidate; clear and refersTo are native. So I don't know what they really do.
When the GC clears a reference, I have to assume that all threads will see it cleared, or otherwise they would see a reference to an object (in the process of being) garbage collected, but that's just an informal guess.
Is there any guarantee about the memory semantics of all these operations?
If there isn't, is there a way to obtain the same guarantees as a volatile access by invoking, for instance, a fence operation before and/or after calling one of these operations?
...ANSWER
Answered 2022-Feb-28 at 17:38
When you invoke clear() on a reference object, it will only clear this particular Reference object, without any impact on the rest of your application and with no special memory semantics. It's exactly what you have seen in the code: an assignment of null to a field which has no volatile modifier.
Mind the documentation of clear():
This method is invoked only by Java code; when the garbage collector clears references it does so directly, without invoking this method.
So this is not related to the event of the GC clearing a reference. Your assumption "that all threads will see it cleared" when the GC clears a reference is correct. The documentation of WeakReference states:
Suppose that the garbage collector determines at a certain point in time that an object is weakly reachable. At that time it will atomically clear all weak references to that object and all weak references to any other weakly-reachable objects from which that object is reachable through a chain of strong and soft references.
So at this point, not only will all threads agree that a weak reference has been cleared, they will also agree that all weak references to the same object have been cleared. Similar statements can be found at SoftReference and PhantomReference.
The Java Language Specification, §12.6.2 "Interaction with the Memory Model", refers to points where such an atomic clear may happen as reachability decision points. It specifies interactions between these points and other program actions, in terms of "comes-before di" and "comes-after di" relationships, the most important ones being:
If r is a read that sees a write w and r comes-before di, then w must come-before di.
If x and y are synchronization actions on the same variable or monitor such that so(x, y) (§17.4.4) and y comes-before di, then x must come-before di.
So, the GC action will be inserted into the synchronization order and even a racy read could not subvert it, but it's important to keep in mind that the exact location of the reachability decision point is not known to the application. It's obviously somewhere between the last point where get() returned a non-null reference or refersTo(null) returned false, and the first point where get() returned null or refersTo(null) returned true.
For practical applications, the fact that once the reference reports the object to be garbage collected you can be sure it won't reappear anywhere¹ is enough. Just keep the reference object private, to be sure that no one has invoked clear() on it.
¹ Leaving things like "finalizer resurrection" aside.
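To make that practical takeaway concrete, here is a minimal sketch (not from the answer; the class and method names are invented):

    import java.lang.ref.WeakReference;

    // Once the collector has atomically cleared the reference, every thread that
    // polls it will agree: get() stays null and refersTo(null) stays true.
    final class WeakHolder<T> {
        private final WeakReference<T> ref;   // kept private, so nothing else can call clear()

        WeakHolder(T value) {
            this.ref = new WeakReference<>(value);
        }

        boolean isCollected() {
            // refersTo (Java 16+) tests the referent without creating a new strong reference.
            return ref.refersTo(null);
        }

        T getOrNull() {
            return ref.get();   // null once the GC has cleared the reference
        }
    }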
QUESTION
I am learning about Fortran/C++ interoperability. In this case I was trying to write a 'wrapper' function (f_mult_wrapper) to interface between my 'pure' Fortran function (f_mult) and C++. The function is defined in my C code as
...ANSWER
Answered 2022-Feb-18 at 22:32
Function results are simply not function arguments/parameters. They are passed differently, and the exact mechanism depends on the ABI (calling conventions) and their type.
In some ABIs, results are passed on the stack. In other ABIs, they are passed using registers. That concerns simple types that can actually fit into registers. More complex objects may be passed using pointers (on the stack or in registers).
The by value/by reference distinction distinguishes, whether the value of the argument is passed on the stack/in the register directly, or indirectly using a pointer. It does not concern function return values.
There are simpler functions that can be C-interoperable and other Fortran functions that cannot be interoperable, e.g. functions returning arrays. Such Fortran-specific functions are implemented in a compiler-specific way. Often, a hidden argument is being passed. Such a hidden argument may contain a pointer and may be passed using a register or using the stack. The details are again dependent on the specific ABI.
For the calling conventions of the most common x86 architectures, see https://en.wikipedia.org/wiki/X86_calling_conventions; there are several different variations for 32-bit and for 64-bit.
QUESTION
The Intel documentation for _mm256_extractf32x4_ps and _mm256_extractf128_ps reads very similarly. I could only spot two differences:
- _mm256_extractf128_ps takes a const int as parameter, _mm256_extractf32x4_ps takes an int. This should not make any difference.
- _mm256_extractf128_ps requires AVX flags, while _mm256_extractf32x4_ps requires AVX512F + AVX512VL, making the former seemingly more portable across CPUs.
What justifies the existence of _mm256_extractf32x4_ps?
ANSWER
Answered 2022-Jan-28 at 02:24
Right, the int arg has to become an immediate in both cases, so it needs to be a compile-time constant after constant propagation.
And yeah, there's no reason to use the no-masking version of the C intrinsic for the AVX-512VL version in C; it only really makes sense to have _mm256_mask_extractf32x4_ps and _mm256_maskz_extractf32x4_ps.
In asm you might want the AVX-512 version because an EVEX encoding is necessary to access ymm16..31, and only VEXTRACTF32X4 has an EVEX encoding. But this is, IMO, something your C compiler should be able to take care of for you, whichever intrinsic you write.
If your compiler optimizes intrinsics at all, it will know you're compiling with AVX-512 enabled and will use whatever shuffle lets it work with the registers it picked during register allocation. (For example, clang has a very aggressive shuffle optimizer, often using different instructions or turning shuffles into cheaper blends when possible, or sometimes defeating efforts to write smarter code than the shuffle optimizer comes up with.)
But some compilers (notably MSVC) don't optimize intrinsics, not even doing constant propagation through them. I think Intel ICC is also like this. (I haven't looked at ICX, their newer clang/LLVM-based compiler.) This model makes it possible to use AVX-512 intrinsics without telling the compiler that it can use AVX-512 instructions on its own. In that case, compiling _mm256_extractf128_ps to VEXTRACTF32X4 to allow usage of ymm16..31 might be a problem (especially if there weren't other AVX-512VL instructions in the same block, or that will definitely execute if this one did).
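For reference, a minimal sketch of the two intrinsics side by side (the wrapper names are mine); both return the upper 128-bit lane of a 256-bit vector:

    #include <immintrin.h>

    /* AVX: VEX-encoded vextractf128, available on any AVX CPU. */
    __m128 upper_lane_avx(__m256 v)    { return _mm256_extractf128_ps(v, 1); }

    #if defined(__AVX512VL__)
    /* AVX-512VL: EVEX-encoded vextractf32x4, needed e.g. when v lives in ymm16..31. */
    __m128 upper_lane_evex(__m256 v)   { return _mm256_extractf32x4_ps(v, 1); }
    #endif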
QUESTION
In std::hint there's a spin_loop function with the following definition in its documentation:
Emits a machine instruction to signal the processor that it is running in a busy-wait spin-loop ("spin lock").
Upon receiving the spin-loop signal the processor can optimize its behavior by, for example, saving power or switching hyper-threads.
Depending on the target architecture, this compiles to either:
- _mm_pause, a.k.a. the pause intrinsic, on x86
- the yield instruction on 32-bit Arm
- ISB SY on 64-bit Arm (aarch64)
That last one has got my head spinning a little bit (😉). I thought that ISB is a lengthy operation, which would mean that, if used within a spin lock, the thread lags a bit in trying to detect whether the lock is open again, but otherwise there's hardly any profit to it.
What are the advantages of using ISB SY instead of a NOP in a spin loop on aarch64?
ANSWER
Answered 2022-Jan-23 at 14:13
I had to dig into the Rust repository history to get to this answer. The yield was replaced with isb in c064b6560b7c:
On arm64 we have seen on several databases that ISB (instruction synchronization barrier) is better to use than yield in a spin loop. The yield instruction is a nop. The isb instruction puts the processor to sleep for some short time. isb is a good equivalent to the pause instruction on x86.
[...]
So essentially, it uses the time it takes for an ISB to complete to pause the processor, so that it wastes less power.
Peter Cordes explained it nicely in one of his comments:
ISB SY doesn't stall for long, just saves a bit of power vs. spamming loads in a tight loop.
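For context, here is a minimal sketch of where spin_loop() sits in a spin lock's acquire path (lock layout simplified; not taken from the question or answer):

    use std::sync::atomic::{AtomicBool, Ordering};

    // Each failed attempt hints the CPU; on aarch64 the hint currently lowers to ISB SY.
    fn acquire(lock: &AtomicBool) {
        while lock
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
    }

    fn release(lock: &AtomicBool) {
        lock.store(false, Ordering::Release);
    }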
QUESTION
private fun getReferralId() {
Firebase.dynamicLinks
.getDynamicLink(intent)
.addOnSuccessListener(this) { pendingDynamicLinkData ->
pendingDynamicLinkData?.link?.getQueryParameter(
DEEP_LINK_QUERY_PARAM_REFERRAL_ID
)?.let { refId ->
viewModel.saveReferralId(refId)
}
}
}
...ANSWER
Answered 2021-Dec-17 at 17:18
It's a bug in the library due to a Play Services update. To fix it, you should explicitly declare that the pendingDynamicLinkData is nullable, like this:
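(The answer's snippet is not reproduced on this page; the fix it describes amounts to annotating the lambda parameter as nullable, roughly as sketched below on the question's own function.)

    private fun getReferralId() {
        Firebase.dynamicLinks
            .getDynamicLink(intent)
            // Declaring the parameter type as nullable keeps the generated
            // checkNotNullParameter from failing when Play Services passes null.
            .addOnSuccessListener(this) { pendingDynamicLinkData: PendingDynamicLinkData? ->
                pendingDynamicLinkData?.link?.getQueryParameter(
                    DEEP_LINK_QUERY_PARAM_REFERRAL_ID
                )?.let { refId ->
                    viewModel.saveReferralId(refId)
                }
            }
    }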
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported