tinymembench | Simple benchmark for memory throughput and latency | Performance Testing library
Community Discussions
Trending Discussions on tinymembench
QUESTION
I'm trying to build tinymembench with clang and am hitting errors in the assembly code. I can fix the easy ones (remove the .func/.endfunc's), but the 'ambiguous operand size for instruction' errors on add and sub surpass my minimal assembly skills. I posted an issue on the repo, but it's possible it's no longer being maintained.
Using this source file (removes the .func/.endfunc) I get errors like:
$ clang-8 x86-sse2.S
/tmp/x86-sse2-dbaa71.s:86:9: error: ambiguous operand size for instruction 'add'
add SRC, 64
^~~
/tmp/x86-sse2-dbaa71.s:87:9: error: ambiguous operand size for instruction 'add'
add DST, 64
^~~
/tmp/x86-sse2-dbaa71.s:88:9: error: ambiguous operand size for instruction 'sub'
sub SIZE, 64
^~~~
...
I looked at this answer which looks similar, but I wasn't able to translate it into an answer for these instructions.
ANSWER
Answered 2019-May-02 at 03:23
"I can fix the easy ones (remove .func/.endfunc's)"
The .func macro includes a .set SRC, rsi, which defines registers according to the appropriate calling convention (x86-64 System V, Windows x64, or 32-bit with stack args).
Removing it leaves just an undefined SRC symbol, which of course is treated as a memory operand. (And add mem, imm doesn't have either operand implying an operand-size, so it's ambiguous.)
Your "fix" introduced this bug.
Use clang -no-integrated-as to use the system assembler instead of clang's built-in assembler. As expected, that builds https://github.com/letrout/tinymembench/blob/master/x86-sse2.S just fine on my Linux desktop with clang 7.0.1. (And system assembler = GNU Binutils as 2.31.1)
QUESTION
I would like to use enhanced REP MOVSB (ERMSB) to get a high bandwidth for a custom memcpy.
ERMSB was introduced with the Ivy Bridge microarchitecture. See the section "Enhanced REP MOVSB and STOSB operation (ERMSB)" in the Intel optimization manual if you don't know what ERMSB is.
The only way I know to do this directly is with inline assembly. I got the following function from https://groups.google.com/forum/#!topic/gnu.gcc.help/-Bmlm_EG_fE
ANSWER
Answered 2017-Apr-11 at 10:57
There are far more efficient ways to move data. These days, the compiler generates architecture-specific code for memcpy, optimized for the memory alignment of the data and other factors. This allows better use of non-temporal cache instructions and of XMM and other registers in the x86 world.
Hard-coding rep movsb prevents this use of intrinsics.
Therefore, for something like a memcpy, unless you are writing something that will be tied to a very specific piece of hardware, and unless you are going to take the time to write a highly optimized memcpy function in assembly (or using C-level intrinsics), you are far better off allowing the compiler to figure it out for you.
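To make the trade-off concrete, here is a minimal sketch of what hard-coding rep movsb looks like next to a plain memcpy call. It assumes GCC or Clang extended inline assembly on x86 and falls back to memcpy elsewhere. The names copy_rep_movsb and copies_agree are hypothetical and used only for illustration; this is not the function from the linked post, and it says nothing about which variant is faster on a given CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes with a hard-coded "rep movsb".
 * Assumption: GCC/Clang extended asm on x86 or x86-64, with the
 * direction flag clear (as the usual calling conventions require). */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    void *d = dst;
    /* rep movsb copies (R|E)CX bytes from [(R|E)SI] to [(R|E)DI] and
     * updates all three registers, so each is an in/out operand. */
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(src), "+c"(n)
                     :
                     : "memory");
    return dst;
#else
    /* Not x86: let the library implementation handle the copy. */
    return memcpy(dst, src, n);
#endif
}

/* Hypothetical self-check: returns 1 if the hand-coded copy and
 * memcpy produce identical buffers for an n-byte pattern. */
static int copies_agree(size_t n)
{
    static unsigned char src[4096], a[4096], b[4096];
    assert(n <= sizeof src);

    for (size_t i = 0; i < n; i++)
        src[i] = (unsigned char)(i * 31u + 7u);
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);

    copy_rep_movsb(a, src, n);  /* fixed instruction sequence */
    memcpy(b, src, n);          /* compiler/library picks the strategy */

    return memcmp(a, b, sizeof a) == 0;
}
```

Both calls produce the same bytes. The difference is that the memcpy call leaves the compiler and C library free to pick a strategy based on size and alignment, while the inline-assembly version pins the copy to one instruction sequence.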
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported