intrin | Compatibility intrin.h header for GCC | Compiler library
kandi X-RAY | intrin Summary
Community Discussions
Trending Discussions on intrin
QUESTION
I'm trying to imitate CUDA/OpenCL workflow using vectorized functions like this:
...ANSWER
Answered 2022-Apr-16 at 12:55
You could use a generic lambda (C++14) to achieve something like this. Note that this requires you to change the type of Kernel::kernel and change the creation of the kernel a bit to allow for automatic type deduction:
Kernel
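Since the original code is not shown here, the following is only a minimal sketch of the idea, assuming a class template that stores the lambda's deduced type. Vec4, make_kernel, make_doubler and demo_generic_kernel are hypothetical names introduced for illustration; they are not taken from the question or the answer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical 4-lane vector type standing in for a real SIMD wrapper.
struct Vec4 {
    std::array<float, 4> lane;
};

inline Vec4 operator*(const Vec4& a, float s) {
    Vec4 r{};
    for (std::size_t i = 0; i < 4; ++i) r.lane[i] = a.lane[i] * s;
    return r;
}

// The kernel's type is a template parameter rather than a fixed
// function signature, so a generic lambda can be stored as-is.
template <class F>
struct Kernel {
    F kernel;

    // Forward the call; the lambda's `auto` parameter is deduced per call.
    template <class T>
    auto operator()(const T& x) const { return kernel(x); }
};

// Factory function so the lambda type is deduced where the kernel is created.
template <class F>
Kernel<F> make_kernel(F f) { return Kernel<F>{std::move(f)}; }

// One generic lambda serves both the scalar and the vector element type.
inline auto make_doubler() {
    return make_kernel([](auto x) { return x * 2.0f; });
}

// Usage: the same kernel object is applied to a float and to a Vec4.
inline void demo_generic_kernel() {
    auto k = make_doubler();
    assert(k(1.5f) == 3.0f);
    Vec4 v{{1.0f, 2.0f, 3.0f, 4.0f}};
    Vec4 r = k(v);
    assert(r.lane[0] == 2.0f && r.lane[3] == 8.0f);
}
```

The factory function is needed because C++14 cannot deduce class template arguments from a constructor call.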
QUESTION
I am writing a C interface for the CPU's cpuid instruction. I'm just doing this as kind of an exercise: I don't want to use compiler-dependent headers such as cpuid.h for GCC or intrin.h for MSVC. Also, I'm aware that using C inline assembly would be a better choice, since it avoids thinking about calling conventions (see this implementation): I'd just have to think about different compilers' syntaxes. However, I'd like to start practicing a bit with integrating assembly and C.
Given that I now have to write a different assembly implementation for each major assembler (I was thinking of GAS, MASM and NASM) and for each of them both for x86-64 and x86, how should I handle the fact that different machines and C compilers may use different calling conventions?
...ANSWER
Answered 2022-Mar-11 at 03:23
If you really want to write, as just an exercise, an assembly function that "conforms" to all the common calling conventions for x86_64 (I know only the Windows one and the System V one), without relying on attributes or compiler flags to force the calling convention, let's take a look at what's common.
The Windows GPR passing order is rcx, rdx, r8, r9. The System V passing order is rdi, rsi, rdx, rcx, r8, r9. In both cases, rax holds the return value if it fits and is a piece of POD. Technically speaking, you can get away with a "polyglot" called function if it (0) saves the union of what each ABI considers non-volatile, and (1) returns something that can fit in a single register, and (2) takes no more than 2 GPR arguments, because overlap would happen past that. To be absolutely generic, you could make it take a single pointer to some structure that would hold whatever arbitrary return data you want.
So now our arguments will come through either rcx and rdx or rdi and rsi. How do you tell which will contain the arguments? I'm actually not sure of a good way. Maybe what you could do instead is have a wrapper that puts the arguments in the right spot, and have your actual function take "padding" arguments, so that your arguments always land in rcx and rdx. You could technically expand to r8 and r9 this way.
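The "padding arguments" wrapper could look roughly like the sketch below. It assumes the assembly routine reads its two operands from rcx and rdx. asm_body, call_asm and demo_padding_wrapper are hypothetical names, and asm_body is written in C++ here only as a stand-in so that the example compiles; in a real project it would be declared extern "C" and defined in a separate assembly file.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_WIN64)
// Microsoft x64: the first two integer arguments already arrive in rcx, rdx.
// Stand-in body; the real routine would be hand-written assembly.
extern "C" inline std::uint64_t asm_body(std::uint64_t in_rcx,
                                         std::uint64_t in_rdx) {
    return in_rcx - in_rdx;
}

inline std::uint64_t call_asm(std::uint64_t a, std::uint64_t b) {
    return asm_body(a, b);
}
#else
// System V x86-64: the order is rdi, rsi, rdx, rcx. Two padding arguments
// occupy rdi and rsi so the real operands land in rdx (3rd) and rcx (4th).
// Stand-in body; the real routine would be hand-written assembly.
extern "C" inline std::uint64_t asm_body(std::uint64_t pad_rdi,
                                         std::uint64_t pad_rsi,
                                         std::uint64_t in_rdx,
                                         std::uint64_t in_rcx) {
    (void)pad_rdi;
    (void)pad_rsi;
    return in_rcx - in_rdx;
}

inline std::uint64_t call_asm(std::uint64_t a, std::uint64_t b) {
    // Note the swapped order: b goes to rdx, a goes to rcx.
    return asm_body(0, 0, b, a);
}
#endif

// Usage: callers see one signature regardless of the platform ABI.
inline void demo_padding_wrapper() {
    assert(call_asm(10, 3) == 7);
}
```

The subtraction is there only to make the operand order visible; a real cpuid wrapper would more likely take a leaf number and a pointer to an output structure.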
QUESTION
I have a most peculiar bug... I'm hooking HeapAlloc to log all calls and get the names of the DLLs calling the API. The code works on Windows 7, but doesn't work on Windows 10. I use MinHook for hooking. Everything is compiled with Visual Studio 2019, v142.
ANSWER
Answered 2021-Nov-08 at 17:10
So, answering my own question. I found the issue.
GetModuleHandleExA increments the module's reference count. Turns out, if you increment the reference count too much, there is a deadlock. I have no idea why... Adding the flag GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT fixes the issue.
QUESTION
I have a piece of code that must run under all circumstances, as it modifies things outside of its own scope. Let's define that piece of code as:
...ANSWER
Answered 2021-Sep-10 at 11:27
volatile is directly relevant to optimizers because reads and writes of volatile variables are observable behavior. That means the reads and writes cannot be removed.
Similarly, optimizers cannot remove writes of variables that are observable by other means - whether you write the variable to std::cout, file or socket. And the burden of proof is on the compiler - the write can only be eliminated if the write is provably dead.
In the example above, for instance, mymap.begin()->first is written to std::cout. That is observable behavior, so even in absence of volatile the behavior must be kept. But the exact details do not matter. An optimizer may spot that only the ->first member is observed in this particular example. Hence, v (the ->second value) is not observed, and can legally be optimized out.
But if you copy mymap.begin()->second to a volatile float sink, then that write to sink is observable behavior, and the compiler must make sure the right value is written. That pretty much means that your v calculation inside the loop needs to be preserved, even though v itself is not volatile.
The compiler could do loop unrolls that affect how v is read and written, because the individual v updates are no longer observable. Only the value that's eventually written to volatile float sink counts.
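A minimal sketch of the volatile sink pattern described above. The names mymap, v and sink follow the answer's wording; the loop body and the functions run_workload and demo_volatile_sink are hypothetical stand-ins, since the question's code is not shown here.

```cpp
#include <cassert>
#include <map>

// Writes to a volatile object are observable behavior, so the compiler
// must emit this store with the correct final value.
volatile float sink;

// Hypothetical workload; the loop is a stand-in for the elided code.
inline float run_workload(int iterations) {
    std::map<int, float> mymap;
    mymap[0] = 0.0f;

    float v = 0.0f;                    // v itself is not volatile
    for (int i = 0; i < iterations; ++i) {
        v += 0.5f;                     // may be unrolled or otherwise transformed
    }
    mymap.begin()->second = v;

    sink = mymap.begin()->second;      // observable write: value must be right
    return sink;                       // volatile read of the stored value
}

// Usage: four iterations of +0.5f give exactly 2.0f.
inline void demo_volatile_sink() {
    assert(run_workload(4) == 2.0f);
}
```

The compiler stays free to restructure the loop, as long as the value that reaches sink is the one the abstract machine would compute.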
QUESTION
I am trying to install Qt4.8.7 for Windows 10 and I am having some issues with installing the corresponding compiler.
I got the Qt4.8.7 installer from this link: https://download.qt.io/archive/qt/4.8/4.8.7/ and I have tried working with the MSVC2010 and the mingw versions. For the MSVC2010 version, I followed this guide https://wiki.qt.io/How_to_setup_MSVC2010 (with a lot of dead links) and installed the compiler alongside the MSVC service pack 1 and Windows SDK 7.1. I have not been able to find an installer for Visual Studio 2010 or the VS service pack 1. Qt studio recognises the version of qt I have installed alongside the corresponding MSVC2010 x86 compiler but when I compile I get this error for a missing header: "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\include\intrin.h:26: error: C1083: Cannot open include file: 'ammintrin.h': No such file or directory".
For the mingw version, I have not been able to find the correct version "mingw482" and other versions I have tried do not seem to be compatible. I have tried mingw installer programs as well as using the QT online installer to try and find the correct version but I haven't had much luck when compiling.
Has anyone got qt4.8.7 running on windows recently? If so, could you please point me in the right direction for installing the correct compiler?
Many thanks.
...ANSWER
Answered 2021-Aug-08 at 11:28
Here is a short description for getting it to work with Visual Studio 2008 and the newest Qt Creator 4.13.
You will need:
- Visual Studio 2008 Express for the build tools; there are no standalone build tools as far as I'm aware
- Qt 4.8.7 precompiled for VS2008 from this link to the Qt archives; at the time of writing, the version you need is called "qt-opensource-windows-x86-vs2008-4.8.7.exe"
- Any Windows debugger (cdb.exe)
Steps (all absolute paths are standard installation paths):
- Install VS2008
- Install Qt 4.8.7
- Open Qt Creator, go to Tools->Options...->Kits->Tab Compilers and search for "Microsoft Visual C++ Compiler 9.0". It probably won't be there, so you will need to add it by hand by looking for the vcvarsall.bat of this compiler. You will find it in C:/Program Files (x86)/Microsoft Visual Studio 9.0/VC/vcvarsall.bat. Repeat for C, C++, x86 and x64. Press save
- Open the Qt-Versions tab and look for the Qt 4.8.7 version. It will probably not be there either, so add it by hand by selecting qmake.exe from C:/Qt/4.8.7/bin/qmake.exe. Press save
- Open the Kits tab and add a new kit. Select your Qt 4.8.7 version and the MS compilers for C and C++, your favorite debugger and input the Qt-makespec win32-msvc2008. Press save again
Now you should be able to compile your project from Qt Creator and Qt-colored-commandline. For integration of MSVC 9.0 into Visual Studio 2015 and newer you will also need to install Visual Studio 2012 Express. In that order:
- VS2008
- VS2012 (Here MS programmed in some magic so newer VS can see older build tools)
- VS201x
It could work in any other order but don't rely on it. Also it could just flat out not work and you will waste a week of your life to fix it; but then it will work.
Haven't tested it but I could imagine the same workflow will work for VS2010.
QUESTION
Today I found sample code that slowed down by 50% after I added some unrelated code. After debugging, I figured out that the problem was loop alignment. Depending on where the loop code is placed, the execution time differs, e.g.:
Address            Time [us]
00007FF780A01270   980
00007FF7750B1280   1500
00007FF7750B1290   986
00007FF7750B12A0   1500
I didn't previously expect that code alignment could have such a big impact. And I thought my compiler is smart enough to align the code correctly.
What exactly causes such a big difference in execution time? (I suppose some processor architecture details.)
I compiled the test program in Release mode with Visual Studio 2019 and ran it on Windows 10. I checked the program on two processors: an i7-8700K (the results above) and an Intel i5-3570K, where the problem does not exist and the execution time is always about 1250 us. I also tried compiling the program with clang, but with clang the result is always ~1500 us (on the i7-8700K).
My test program:
...ANSWER
Answered 2021-May-07 at 22:18
"I thought my compiler is smart enough to align the code correctly."
As you said, the compiler is always aligning things to a multiple of 16 bytes. This probably does account for the direct effects of alignment. But there are limits to the "smartness" of the compiler.
Besides alignment, code placement has indirect performance effects as well, because of cache associativity. If there is too much contention for the few cache lines that can map to this address, performance will suffer. Moving to an address with less contention makes the problem go away.
The compiler may be smart enough to handle cache contention effects as well, but only IF you turn on profile-guided optimization. The interactions are far too complex to predict in a reasonable amount of work; it is much easier to watch for cache conflicts by actually running the program and that's what PGO does.
QUESTION
So, I have been trying to use this project I cloned from GitHub. When I try to compile better-sqlite3@7.1.2, here are the logs:
...ANSWER
Answered 2021-Feb-03 at 02:24
That's a native module. If it doesn't provide a prebuilt binary, then you will need to compile it from source.
It's using node-gyp to build. In order for that to work, you need to install all node-gyp requirements => See here: https://github.com/nodejs/node-gyp#on-unix
QUESTION
I'm using Windows 10, Visual Studio 2019, Platform: x64 and have the following test script in a single-file Visual Studio Solution:
...ANSWER
Answered 2020-Jun-21 at 04:28
Getting 4 instead of 59 sounds like clang implemented _BitScanReverse64 as 63 - lzcnt. Actual bsr is slow on AMD, so yes, there are reasons why a compiler would want to compile a BSR intrinsic to a different instruction.
But then you ran the executable on a computer that doesn't actually support BMI, so lzcnt decoded as rep bsr = bsr, giving the leading-zero count instead of the bit-index of the highest set bit.
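The arithmetic behind "4 instead of 59" can be reproduced with a portable emulation. This is only an illustrative sketch: emulate_bsr, emulate_lzcnt, bit_scan_reverse_via_lzcnt and demo_bsr_vs_lzcnt are hypothetical helpers that model the two instructions in plain C++, and the input with bit 59 as its highest set bit is an assumption inferred from the numbers in the answer.

```cpp
#include <cassert>
#include <cstdint>

// Models BSR: index of the highest set bit (x must be non-zero).
inline unsigned emulate_bsr(std::uint64_t x) {
    unsigned index = 0;
    while (x >>= 1) ++index;
    return index;
}

// Models LZCNT: number of leading zero bits in a 64-bit value.
inline unsigned emulate_lzcnt(std::uint64_t x) {
    if (x == 0) return 64;
    return 63u - emulate_bsr(x);
}

// The compiler emits "63 - lzcnt". On a CPU without LZCNT support, the
// same encoding executes as BSR, so the subtraction is applied to the
// wrong quantity.
inline unsigned bit_scan_reverse_via_lzcnt(std::uint64_t x, bool cpu_has_lzcnt) {
    unsigned raw = cpu_has_lzcnt ? emulate_lzcnt(x) : emulate_bsr(x);
    return 63u - raw;
}

// Usage: a value whose highest set bit is bit 59.
inline void demo_bsr_vs_lzcnt() {
    const std::uint64_t x = std::uint64_t{1} << 59;
    assert(bit_scan_reverse_via_lzcnt(x, true) == 59);   // intended result
    assert(bit_scan_reverse_via_lzcnt(x, false) == 4);   // lzcnt ran as bsr
}
```

In other words, the wrong answer is exactly the leading-zero count of the input, which matches the symptom described above.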
AFAIK, all CPUs that have AVX2 also have BMI. If your CPU doesn't have that, you shouldn't expect executables built with /arch:AVX2 to run correctly on your CPU. And in this case the failure mode wasn't an illegal instruction, it was lzcnt running as bsr.
MSVC doesn't generally optimize intrinsics, apparently including this case, so it just uses bsr directly.
Update: i7-3930K is SandyBridge-E. It doesn't have AVX2, so that explains your results.
clang-cl doesn't error when you tell it to build an AVX2 executable on a non-AVX2 computer. The use-case for that would be compiling on one machine to create an executable to run on different machines.
It also doesn't add CPUID-checking code to your executable for you. If you want that, write it yourself. This is C++, it doesn't hold your hand.
Target CPU options
MSVC-style /arch options are much more limited than normal GCC/clang style. There aren't any for different levels of SSE like SSE4.1; it jumps straight to AVX.
Also, /arch:AVX2 apparently implies BMI1/2, even though those are different instruction sets with different CPUID feature bits. In kernel code, for example, you might want integer BMI instructions but not SIMD instructions that touch XMM/YMM registers.
clang -O3 -mavx2 would not also enable -mbmi. You normally would want that, but if you failed to also enable BMI then clang would have been stuck using bsr. (Which is actually better for Intel CPUs than 63-lzcnt.) I think MSVC's /arch:AVX2 is something like -march=haswell, if it also enables FMA instructions.
And nothing in MSVC has any support for making binaries optimized to run on the computer you build them on. That makes sense, it's designed for a closed-source binary-distribution model of software development.
But GCC and clang have -march=native to enable all the instruction sets your computer supports. And also importantly, set tuning options appropriate for your computer, e.g. don't worry about making code that would be slow on an AMD CPU, or on older Intel, just make asm that's good for your CPU.
TL:DR: CPU selection options in clang-cl are very coarse, lumping non-SIMD extensions in with some level of AVX. That's why /arch:AVX2 enabled the integer BMI extensions, while clang -mavx2 would not have.
QUESTION
I am attempting to use rdtsc for a timer, but both the eax and edx registers either remain empty or form a number very different from the one given by the __rdtsc function from MS's intrin.h header.
Here's the assembly code:
...ANSWER
Answered 2020-May-17 at 23:36
According to Wikipedia:
The instruction RDTSC returns the TSC in EDX:EAX. In x86-64 mode, RDTSC also clears the higher 32 bits of RAX and RDX.
So, on x86, your code can simply be:
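A minimal sketch of the EDX:EAX convention quoted above, not the answer's original snippet. combine_tsc, read_tsc and demo_combine_tsc are hypothetical names; the inline-assembly path assumes a GCC or clang compiler targeting x86 and is compiled out elsewhere.

```cpp
#include <cassert>
#include <cstdint>

// RDTSC returns the counter split across two 32-bit halves:
// EDX holds the high half, EAX the low half.
inline std::uint64_t combine_tsc(std::uint32_t edx_high, std::uint32_t eax_low) {
    return (static_cast<std::uint64_t>(edx_high) << 32) | eax_low;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
// Reads the time-stamp counter with GNU-style inline assembly.
// "=a" binds the output to EAX and "=d" binds it to EDX.
inline std::uint64_t read_tsc() {
    std::uint32_t lo = 0;
    std::uint32_t hi = 0;
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return combine_tsc(hi, lo);
}
#endif

// Usage: the high half is shifted up by 32 bits before the halves are merged.
inline void demo_combine_tsc() {
    assert(combine_tsc(0x1u, 0x2u) == 0x100000002ULL);
}
```

Comparing the combined value against __rdtsc is only meaningful when both reads happen close together, because the counter keeps advancing between them.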
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported