intrin | Compatibility intrin.h header for GCC | Compiler library

by yuikns | C | Version: Current | License: No License

kandi X-RAY | intrin Summary

intrin is a C library typically used in utility and compiler applications. It has no reported bugs or vulnerabilities, and it has low support. You can download it from GitHub.

Compatibility header for GCC

Support

intrin has a low active ecosystem.

It has 4 stars and 1 fork. There is 1 watcher for this library.

It had no major release in the last 6 months.

intrin has no issues reported. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of intrin is current.

Quality

intrin has 0 bugs and 0 code smells.

Security

intrin has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

intrin code analysis shows 0 unresolved vulnerabilities.

There are 0 security hotspots that need review.

License

intrin does not have a standard license declared.

Check the repository for any license declaration and review the terms closely.

Without a license, all rights are reserved, and you cannot use the library in your applications.

Reuse

intrin releases are not available. You will need to build from source code and install.

intrin Key Features

No Key Features are available at this moment for intrin.

intrin Examples and Code Snippets

No Code Snippets are available at this moment for intrin.

Community Discussions

Trending Discussions on intrin

Code duplication issue without define macro

What calling convention should I use to make things portable?

HeapAlloc hooking with miniHook, deadlock on Windows 10, works on Windows 7

Achieving the effect of 'volatile' without the added MOV* instructions?

Installing a compiler for Qt4.8.7 on Windows10

Code alignment dramatically affects performance

Compiling dependencies in node.js

Are there compatibility issues with clang-cl and arch:avx2?

rdtsc returns no results

QUESTION

Code duplication issue without define macro

Asked 2022-Apr-16 at 12:55

I'm trying to imitate CUDA/OpenCL workflow using vectorized functions like this:

...

ANSWER

Answered 2022-Apr-16 at 12:55

You could use a generic lambda (C++14) to achieve something like this. Note that this requires you to change the type of Kernel::kernel and change the creation of the kernel a bit to allow for automatic type deduction:

Kernel

Source https://stackoverflow.com/questions/71893243

QUESTION

What calling convention should I use to make things portable?

Asked 2022-Mar-11 at 03:23

I am writing a C interface for the CPU's cpuid instruction. I'm just doing this as a kind of exercise: I don't want to use compiler-dependent headers such as cpuid.h for GCC or intrin.h for MSVC. Also, I'm aware that using C inline assembly would be a better choice, since it avoids thinking about calling conventions (see this implementation): I'd just have to think about the different compilers' syntaxes. However, I'd like to start practicing a bit with integrating assembly and C.

Given that I now have to write a different assembly implementation for each major assembler (I was thinking of GAS, MASM and NASM) and for each of them both for x86-64 and x86, how should I handle the fact that different machines and C compilers may use different calling conventions?

...

ANSWER

Answered 2022-Mar-11 at 03:23

If you really want to write, as just an exercise, an assembly function that "conforms" to all the common calling conventions for x86_64 (I know only the Windows one and the System V one), without relying on attributes or compiler flags to force the calling convention, let's take a look at what's common.

The Windows GPR passing order is rcx, rdx, r8, r9. The System V passing order is rdi, rsi, rdx, rcx, r8, r9. In both cases, rax holds the return value if it fits and is a piece of POD. Technically speaking, you can get away with a "polyglot" called function if it (0) saves the union of what each ABI considers non-volatile, and (1) returns something that can fit in a single register, and (2) takes no more than 2 GPR arguments, because overlap would happen past that. To be absolutely generic, you could make it take a single pointer to some structure that would hold whatever arbitrary return data you want.

So now our arguments will come through either rcx and rdx or rdi and rsi. How do you tell which will contain the arguments? I'm actually not sure of a good way. Maybe what you could do instead is have a wrapper that puts the arguments in the right spot, and have your actual function take "padding" arguments, so that your arguments always land in rcx and rdx. You could technically expand to r8 and r9 this way.
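The wrapper idea in the last paragraph can be sketched in C as follows. This is only an illustration, not code from the answer: poly_impl and CALL_POLY are hypothetical names, and poly_impl is written in C here as a stand-in for the hand-written assembly routine, which in practice would be declared extern and would read its operands from rcx and rdx directly. The register comments only apply to x86-64 targets.

```c
#include <stdint.h>

/*
 * Goal: the two real operands always arrive in rcx (a) and rdx (b),
 * whichever x86-64 calling convention the C compiler uses.
 *
 * Windows x64 passes arguments 1..4 in rcx, rdx, r8, r9.
 * System V   passes arguments 1..4 in rdi, rsi, rdx, rcx.
 */
#if defined(_WIN64)

/* Windows: the first two arguments are already in rcx and rdx. */
static uint64_t poly_impl(uint64_t a_rcx, uint64_t b_rdx)
{
    return a_rcx - b_rdx;   /* placeholder body for the assembly routine */
}
#define CALL_POLY(a, b) poly_impl((a), (b))

#else

/* System V: two padding arguments occupy rdi and rsi, so the third
 * argument lands in rdx and the fourth in rcx. */
static uint64_t poly_impl(uint64_t pad_rdi, uint64_t pad_rsi,
                          uint64_t b_rdx, uint64_t a_rcx)
{
    (void)pad_rdi;          /* padding only, intentionally unused */
    (void)pad_rsi;
    return a_rcx - b_rdx;   /* placeholder body for the assembly routine */
}
#define CALL_POLY(a, b) poly_impl(0, 0, (b), (a))

#endif
```

Callers write CALL_POLY(a, b) on both platforms. Because the stand-in is a static C function, the compiler may inline it and the registers never come into play; the layout only matters once poly_impl is a separately assembled routine.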

Source https://stackoverflow.com/questions/71361589

QUESTION

HeapAlloc hooking with miniHook, deadlock on Windows 10, works on Windows 7

Asked 2021-Nov-08 at 17:10

I have a most peculiar bug... I'm hooking HeapAlloc to log all calls and get the names of the DLLs calling the API. The code works on Windows 7, but doesn't work on Windows 10. I use miniHook for hooking. Everything is compiled with Visual Studio 2019, v142.

...

ANSWER

Answered 2021-Nov-08 at 17:10

So, answering my own question. I found the issue. GetModuleHandleExA increments the module's reference count. Turns out, if you increment the reference count too much, there is a deadlock. I have no idea why... Adding the flag GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT fixes the issue.

Source https://stackoverflow.com/questions/69838353

QUESTION

Achieving the effect of 'volatile' without the added MOV* instructions?

Asked 2021-Sep-10 at 12:12

I have a piece of code that must run under all circumstances, as it modifies things outside of its own scope. Let's define that piece of code as:

...

ANSWER

Answered 2021-Sep-10 at 11:27

volatile is directly relevant to optimizers because reads and writes of volatile variables are observable behavior. That means the reads and writes cannot be removed.

Similarly, optimizers cannot remove writes of variables that are observable by other means - whether you write the variable to std::cout, file or socket. And the burden of proof is on the compiler - the write can only be eliminated if the write is provably dead.

In the example above, for instance, mymap.begin()->first is written to std::cout. That is observable behavior, so even in absence of volatile the behavior must be kept. But the exact details do not matter. An optimizer may spot that only the ->first member is observed in this particular example. Hence, v (the ->second value) is not observed, and can legally be optimized out.

But if you copy mymap.begin()->second to a volatile float sink, then that write to sink is observable behavior, and the compiler must make sure the right value is written. That pretty much means that your v calculation inside the loop needs to be preserved, even though v itself is not volatile.

The compiler could do loop unrolls that affect how v is read and written, because the individual v updates are no longer observable. Only the value that's eventually written to volatile float sink counts.
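As a rough illustration of the sink idea, here is a minimal C sketch. The answer discusses C++ code that is not reproduced on this page, so the names sink and accumulate_to_sink are assumptions made for this example. Only the final store to the volatile object is observable; the loop that computes v is free to be unrolled or vectorized, but its result has to be correct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sink: every write to a volatile object is observable
 * behavior, so the compiler must emit the store with the right value. */
static volatile float sink;

/* Sums the inputs in an ordinary (non-volatile) local and publishes the
 * result with a single volatile write. */
static void accumulate_to_sink(const float *values, size_t count)
{
    assert(values != NULL || count == 0);

    float v = 0.0f;              /* not volatile: intermediate updates are unobserved */
    for (size_t i = 0; i < count; ++i)
        v += values[i];

    sink = v;                    /* the one observable write */
}
```

Compared with declaring v itself volatile, this avoids a load and store on every iteration while still forcing the computation to be kept.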

Source https://stackoverflow.com/questions/69130349

QUESTION

Installing a compiler for Qt4.8.7 on Windows10

Asked 2021-Aug-08 at 11:28

I am trying to install Qt4.8.7 for Windows 10 and I am having some issues with installing the corresponding compiler.

I got the Qt4.8.7 installer from this link: https://download.qt.io/archive/qt/4.8/4.8.7/ and I have tried working with the MSVC2010 and the mingw versions. For the MSVC2010 version, I followed this guide https://wiki.qt.io/How_to_setup_MSVC2010 (with a lot of dead links) and installed the compiler alongside the MSVC service pack 1 and Windows SDK 7.1. I have not been able to find an installer for Visual Studio 2010 or the VS service pack 1. Qt studio recognises the version of qt I have installed alongside the corresponding MSVC2010 x86 compiler but when I compile I get this error for a missing header: "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\include\intrin.h:26: error: C1083: Cannot open include file: 'ammintrin.h': No such file or directory".

For the mingw version, I have not been able to find the correct version "mingw482" and other versions I have tried do not seem to be compatible. I have tried mingw installer programs as well as using the QT online installer to try and find the correct version but I haven't had much luck when compiling.

Has anyone got qt4.8.7 running on windows recently? If so, could you please point me in the right direction for installing the correct compiler?

Many thanks.

...

ANSWER

Answered 2021-Aug-08 at 11:28

Here is a short description for getting it to work with Visual Studio 2008 and the newest Qt Creator 4.13.

You will need:

Visual Studio 2008 Express for the build tools (there are no standalone build tools as far as I'm aware)
Qt 4.8.7 precompiled for VS2008, from this link to the Qt archives. At the time of writing, the version you need is called "qt-opensource-windows-x86-vs2008-4.8.7.exe"
Any Windows debugger that provides cdb.exe

Steps (all absolute paths are standard installation paths):

Install VS2008
Install Qt 4.8.7
Open Qt Creator, go to Tools->Options...->Kits->Tab Compilers and search for "Microsoft Visual C++ Compiler 9.0". It probably won't be there, so you will need to add it by hand by locating the vcvarsall.bat of this compiler. You will find it in C:/Program Files (x86)/Microsoft Visual Studio 9.0/VC/vcvarsall.bat. Repeat for C, C++, x86 and x64. Press save.
Open the Qt-Versions tab and look for the Qt 4.8.7 version. Again, it will probably not be there, so add it by hand by selecting the qmake.exe from C:/Qt/4.8.7/bin/qmake.exe. Press save.
Open the Kits tab and add a new kit. Select your Qt 4.8.7 version and the MS compilers for C and C++, your favorite debugger, and enter the Qt-makespec win32-msvc2008. Press save again.

Now you should be able to compile your project from Qt Creator and Qt-colored-commandline. For integration of MSVC 9.0 into Visual Studio 2015 and newer you will also need to install Visual Studio 2012 Express. In that order:

VS2008
VS2012 (Here MS programmed in some magic so newer VS can see older build tools)
VS201x

It could work in any other order but don't rely on it. Also it could just flat out not work and you will waste a week of your life to fix it; but then it will work.

Haven't tested it but I could imagine the same workflow will work for VS2010.

Source https://stackoverflow.com/questions/68668084

QUESTION

Code alignment dramatically affects performance

Asked 2021-May-09 at 19:06

Today I found sample code that slowed down by 50% after some unrelated code was added. After debugging, I figured out that the problem was the loop alignment. Depending on where the loop code is placed, the execution time differs, e.g.:

Address           Time [us]
00007FF780A01270  980
00007FF7750B1280  1500
00007FF7750B1290  986
00007FF7750B12A0  1500

I didn't expect code alignment to have such a big impact, and I thought my compiler was smart enough to align the code correctly.

What exactly causes such a big difference in execution time? (I suppose some processor architecture details.)

I compiled the test program in Release mode with Visual Studio 2019 and ran it on Windows 10. I checked the program on two processors: an i7-8700k (the results above) and an Intel i5-3570k, where the problem does not exist and the execution time is always about 1250us. I have also tried compiling the program with clang, but with clang the result is always ~1500us (on the i7-8700k).

My test program:

...

ANSWER

Answered 2021-May-07 at 22:18

"I thought my compiler is smart enough to align the code correctly."

As you said, the compiler is always aligning things to a multiple of 16 bytes. This probably does account for the direct effects of alignment. But there are limits to the "smartness" of the compiler.

Besides alignment, code placement has indirect performance effects as well, because of cache associativity. If there is too much contention for the few cache lines that can map to this address, performance will suffer. Moving to an address with less contention makes the problem go away.

The compiler may be smart enough to handle cache contention effects as well, but only IF you turn on profile-guided optimization. The interactions are far too complex to predict in a reasonable amount of work; it is much easier to watch for cache conflicts by actually running the program and that's what PGO does.

Source https://stackoverflow.com/questions/67442222

QUESTION

Compiling dependencies in node.js

Asked 2021-Feb-03 at 02:24

I have been trying to use a project I cloned from GitHub. When I try to compile better-sqlite3@7.1.2, here are the logs:

...

ANSWER

Answered 2021-Feb-03 at 02:24

That's a native module. If it doesn't provide a prebuilt binary, you will need to compile it from source.

It uses node-gyp to build. For that to work, you need to install all of the node-gyp requirements; see https://github.com/nodejs/node-gyp#on-unix

Source https://stackoverflow.com/questions/66020327

QUESTION

Are there compatibility issues with clang-cl and arch:avx2?

Asked 2020-Jun-21 at 04:28

I'm using Windows 10, Visual Studio 2019, Platform: x64 and have the following test script in a single-file Visual Studio Solution:

...

ANSWER

Answered 2020-Jun-21 at 04:28

Getting 4 instead of 59 sounds like clang implemented _BitScanReverse64 as 63 - lzcnt. Actual bsr is slow on AMD, so yes, there are reasons why a compiler would want to compile a BSR intrinsic to a different instruction.

But then you ran the executable on a computer that doesn't actually support BMI so lzcnt decoded as rep bsr = bsr, giving the leading-zero count instead of the bit-index of the highest set bit.
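The relationship between the two results can be shown with a small C sketch. It assumes GCC or Clang, whose __builtin_clzll builtin counts leading zeros of a 64-bit value; the helper names are made up for this example and are not part of intrin.h.

```c
#include <assert.h>
#include <stdint.h>

/* Number of leading zero bits, i.e. what lzcnt computes for a non-zero input.
 * __builtin_clzll is a GCC/Clang builtin and is undefined for x == 0. */
static unsigned leading_zeros64(uint64_t x)
{
    assert(x != 0);
    return (unsigned)__builtin_clzll(x);
}

/* Index of the highest set bit, i.e. what bsr (and _BitScanReverse64)
 * reports for a non-zero input: 63 minus the leading-zero count. */
static unsigned highest_set_bit64(uint64_t x)
{
    return 63u - leading_zeros64(x);
}
```

For a value whose highest set bit is bit 59, the leading-zero count is 4 and the bit index is 59, so 63 - lzcnt is correct when lzcnt really runs as lzcnt. If the CPU instead executes the instruction as bsr, the hardware already produces 59, and the compiler-generated 63 - 59 yields the 4 seen in the question.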

AFAIK, all CPUs that have AVX2 also have BMI. If your CPU doesn't have that, you shouldn't expect executables built with /arch:AVX2 to run correctly on your CPU. And in this case the failure mode wasn't an illegal instruction, it was lzcnt running as bsr.

MSVC doesn't generally optimize intrinsics, apparently including this case, so it just uses bsr directly.

Update: i7-3930K is SandyBridge-E. It doesn't have AVX2, so that explains your results.

clang-cl doesn't error when you tell it to build an AVX2 executable on a non-AVX2 computer. The use-case for that would be compiling on one machine to create an executable to run on different machines.

It also doesn't add CPUID-checking code to your executable for you. If you want that, write it yourself. This is C++, it doesn't hold your hand.
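A hand-written check could look roughly like the sketch below. It assumes GCC or Clang targeting x86, where the __builtin_cpu_init and __builtin_cpu_supports builtins are available; on any other compiler or architecture the function conservatively reports no support. The name cpu_has_avx2 is an assumption made for this example. With MSVC, the __cpuidex intrinsic from intrin.h would be the starting point instead.

```c
/* Returns 1 if the running CPU reports AVX2 support, 0 otherwise.
 * Only GCC/Clang builds for x86 perform a real check. */
static int cpu_has_avx2(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                          /* populate the CPU feature data */
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return 0;                                      /* unknown platform: assume no AVX2 */
#endif
}
```

A program would call this once at startup and either dispatch to a non-AVX2 code path or exit with a clear message, rather than letting instructions be misdecoded on an older CPU.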

target CPU options

MSVC-style /arch options are much more limited than normal GCC/clang style. There aren't any for different levels of SSE like SSE4.1; it jumps straight to AVX.

Also, /arch:AVX2 apparently implies BMI1/2, even though those are different instruction-sets with different CPUID feature bits. In kernel code for example you might want integer BMI instructions but not SIMD instructions that touch XMM/YMM registers.

clang -O3 -mavx2 would not also enable -mbmi. You normally would want that, but if you failed to also enable BMI then clang would have been stuck using bsr. (Which is actually better for Intel CPUs than 63-lzcnt). I think MSVC's /arch:AVX2 is something like -march=haswell, if it also enables FMA instructions.

And nothing in MSVC has any support for making binaries optimized to run on the computer you build them on. That makes sense, it's designed for a closed-source binary-distribution model of software development.

But GCC and clang have -march=native to enable all the instruction sets your computer supports. And also importantly, set tuning options appropriate for your computer. e.g. don't worry about making code that would be slow on an AMD CPU, or on older Intel, just make asm that's good for your CPU.

TL;DR: CPU selection options in clang-cl are very coarse, lumping non-SIMD extensions in with some level of AVX. That's why /arch:AVX2 enabled the integer BMI extensions, while clang -mavx2 would not have.

Source https://stackoverflow.com/questions/62493597

QUESTION

rdtsc returns no results

Asked 2020-May-18 at 22:37

I am attempting to use rdtsc for a timer, but the eax and edx registers either remain empty or form a number very different from the one given by the __rdtsc function from MS's intrin.h header.

Here's the assembly code:

...

ANSWER

Answered 2020-May-17 at 23:36

According to Wikipedia:

The instruction RDTSC returns the TSC in EDX:EAX. In x86-64 mode, RDTSC also clears the higher 32 bits of RAX and RDX.

So, on x86, your code can simply be:
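The answer's own snippet is not reproduced on this page. As a stand-in, the following C sketch shows how the EDX:EAX pair described above combines into one 64-bit counter value. It assumes GCC or Clang inline assembly on x86; read_tsc and combine_edx_eax are names chosen for this example, and on other targets read_tsc simply returns 0.

```c
#include <stdint.h>

/* RDTSC returns the high 32 bits in EDX and the low 32 bits in EAX. */
static uint64_t combine_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Reads the time-stamp counter on x86 with GCC/Clang; returns 0 elsewhere. */
static uint64_t read_tsc(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    uint32_t eax, edx;
    __asm__ volatile ("rdtsc" : "=a"(eax), "=d"(edx));  /* outputs in EAX and EDX */
    return combine_edx_eax(edx, eax);
#else
    return 0;   /* no rdtsc available in this build */
#endif
}
```

Forgetting the shift, or reading only one of the two registers, is a common way to end up with a value that looks nothing like what __rdtsc returns.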

Source https://stackoverflow.com/questions/61860126

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install intrin

You can download it from GitHub.

Support

For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, check the community page on Stack Overflow and ask there.
