lc0 | The rewritten engine, originally for TensorFlow. Now all other backends have been ported here. | Game Engine library
kandi X-RAY | lc0 Summary
Lc0 is a UCI-compliant chess engine designed to play chess via neural network, specifically those of the LeelaChessZero project.
Community Discussions
Trending Discussions on lc0
QUESTION
I was running some tests to compare C to Java and ran into something interesting. Running my exactly identical benchmark code with optimization level 1 (-O1) in a function called by main, rather than in main itself, resulted in roughly double performance. I'm printing out the size of test_t to verify beyond any doubt that the code is being compiled to x64.
I sent the executables to my friend who's running an i7-7700HQ and got similar results. I'm running an i7-6700.
Here's the slower code:
...
ANSWER
Answered 2021-Jun-07 at 22:21
The slow version:
Note that the sub rax, 1 \ jne pair goes right across the boundary at ..80 (which is a 32-byte boundary). This is one of the cases mentioned in Intel's document regarding this issue, where it is illustrated with a diagram.
So this op/branch pair is affected by the fix for the JCC erratum (which would cause it to not be cached in the µop cache). I'm not sure if that is the reason, there are other things at play too, but it's a thing.
In the fast version, the branch is not "touching" a 32byte boundary, so it is not affected.
There may be other effects that apply. Because the loop still crosses a 32-byte boundary in the slow case, it is spread across 2 chunks in the µop cache; even without the fix for the JCC erratum, that may cause it to run at 2 cycles per iteration if the loop cannot execute from the Loop Stream Detector (which is disabled on some processors by another fix for another erratum, SKL150). See e.g. this answer about loop performance.
To address the various comments saying they cannot reproduce this, yes there are various ways that could happen:
- Whichever effect was responsible for the slowdown, it is likely caused by the exact placement of the op/branch pair across a 32byte boundary, which happened by pure accident. Compiling from source is unlikely to reproduce the same circumstances, unless you use the same compiler with the same setup as was used by the original poster.
- Even using the same binary, regardless of which of the effects is responsible, the weird effect would only happen on particular processors.
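As a rough illustration of the boundary condition discussed above, the following sketch checks whether an instruction, or a fused op/branch pair, touches a 32-byte boundary. It assumes the erratum fix applies to jumps that either cross a 32-byte boundary or end exactly on one; the function name and the example addresses are hypothetical, and the 6-byte length assumes the usual 4-byte encoding of sub rax, 1 followed by a 2-byte short jne.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if the bytes [start, start + len)
 * cross a 32-byte boundary or end exactly on one. */
bool touches_32byte_boundary(uint64_t start, unsigned len)
{
    assert(len > 0);
    uint64_t last = start + len - 1;            /* address of the final byte */
    bool crosses = (start / 32) != (last / 32); /* first and last byte in different blocks */
    bool ends_on = ((last + 1) % 32) == 0;      /* the next instruction starts a new block */
    return crosses || ends_on;
}
```

With these assumptions, a 6-byte pair starting at offset 0x7c spans 0x7c to 0x81 and is flagged, while the same pair starting at 0x60 is not. This only models the address arithmetic; whether a given processor actually slows down depends on its microcode and the other effects mentioned above.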
QUESTION
My aim here is to implement a simple baremetal program for ARM, compile it manually and analyze it in GDB.
A simple example main.c that shows my problem is:
ANSWER
Answered 2021-May-26 at 22:08
Answer is:
2) The compilation process / usage of the toolchain is wrong.
You may have several problems, an important one being that the use of the -kernel option requires the start address of your program to be 0x00010000.
And you don't have a startup file, nor a linker script.
The following example should work fine, and is just adapted from a seminal article from Francesco Balducci on his blog.
startup.s:
QUESTION
So both GCC and Clang are smart enough to optimize printf("%s\n", "foo") to puts("foo") (GCC, Clang). That's good and all.
But when I run this function through Compiler Explorer:
...
ANSWER
Answered 2021-May-02 at 09:25
Some very specific situations are optimized, like the one you showed, but the optimization is very superficial: if you add something to your format string, even a space, the compiler immediately discards the puts and goes back to printf.
I guess that there would be nothing to stop a broader optimization. My speculation is that, since the performance gains are not that great, adding more special cases was deemed not worth it.
In my speculation, the lack of an fputs optimization would fall into that "not worth it" category.
This old gcc printf optimization document sheds some light on these optimizations; I doubt that it would be much different today.
Specifically:
2.3 %s\n
A printf call with the format string %s\n [line 4679-4687] is converted to a puts() call.
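The cases described in this answer can be put side by side in a small sketch. The function names are hypothetical, and whether a given compiler version performs the rewrite is an assumption that is best checked in the generated assembly.

```c
#include <assert.h>
#include <stdio.h>

/* Candidate for the rewrite: the format is exactly "%s\n" and the
 * return value of printf is not used. */
void print_line_printf(const char *s)
{
    assert(s != NULL);
    printf("%s\n", s);
}

/* What the call above is equivalent to. */
void print_line_puts(const char *s)
{
    assert(s != NULL);
    puts(s);
}

/* Per the answer, one extra space in the format is enough to keep printf. */
void print_line_padded(const char *s)
{
    assert(s != NULL);
    printf("%s \n", s);
}
```

Compiling these with optimization enabled and comparing the call targets in the assembly output is one way to see which of them end up calling puts.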
QUESTION
The function inventory takes an array of device pointers and calls evaluate to find out what the variation is. The inventory function then returns a pointer to the device that has the highest variation.
...
ANSWER
Answered 2021-Apr-12 at 03:24
The bug is movl %eax, 36(%rdi) at line 38 of calibrate.s. This is apparently supposed to write to the avg member of the relevant Device, but it's a 32-bit store and Device::avg is a 16-bit short. So it should be movw %ax, 36(%rdi).
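A minimal sketch of why the store width matters: the struct below is a hypothetical stand-in with only two adjacent 16-bit members (the real Device layout is not shown here), and the two helpers emulate a 32-bit and a 16-bit store with memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for Device: avg followed directly by another member. */
typedef struct {
    int16_t avg;
    int16_t next_field;
} Sample;

_Static_assert(sizeof(Sample) == 4, "sketch assumes two packed 16-bit members");

/* Emulates the 32-bit store (movl): writes 4 bytes starting at avg,
 * so it also overwrites next_field. */
void store_32(Sample *s, uint32_t value)
{
    assert(s != NULL);
    memcpy((unsigned char *)s + offsetof(Sample, avg), &value, sizeof value);
}

/* Emulates the 16-bit store (movw): writes only the 2 bytes of avg. */
void store_16(Sample *s, uint16_t value)
{
    assert(s != NULL);
    memcpy(&s->avg, &value, sizeof value);
}
```

Starting from a Sample whose next_field holds 1234, store_32 with the value 100 leaves next_field changed, while store_16 leaves it intact. The same kind of clobbering of whatever follows avg is what the 32-bit store in calibrate.s would cause.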
Hopefully this will provide some information about what gdb can do and how to use it effectively.
I set a breakpoint at the second printf in inventory, at which point I did x/s $rdx to see what string %rdx points to:
QUESTION
I have the following code:
...
ANSWER
Answered 2021-Apr-03 at 14:55
The output is different for the simple reason that the compiler does not 'think' like a human but follows the standard.
QUESTION
Having this simple c:
...
ANSWER
Answered 2021-Mar-19 at 12:41
The movq instruction copies 8 bytes, so the data of the entire struct foo is copied here:
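As a minimal sketch of why one 8-byte move can be enough, assume a hypothetical struct foo with two 32-bit members (the definition from the question is not shown here). The helper copies the struct through a single 64-bit value, which is roughly what one movq does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout; the struct foo from the question is not shown. */
struct foo {
    int32_t a;
    int32_t b;
};

_Static_assert(sizeof(struct foo) == 8, "two 32-bit members fit in 8 bytes");

/* Copies the whole struct through one 64-bit value. */
struct foo copy_via_u64(const struct foo *src)
{
    assert(src != NULL);
    uint64_t bits;
    struct foo dst;
    memcpy(&bits, src, sizeof bits);  /* load all 8 bytes at once */
    memcpy(&dst, &bits, sizeof dst);  /* store all 8 bytes at once */
    return dst;
}
```

Both members arrive intact in the copy, because the struct is exactly as wide as the value used to carry it.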
QUESTION
This is a strange behaviour:
...
ANSWER
Answered 2021-Mar-03 at 03:52
The value that's printed has nothing at all to do with the value you pass in the printf call, since integer and floating point arguments are passed in separate areas (at least, this is a common convention, and I assume that you're operating on a machine where it's true). When you ask printf for %d without passing any integer argument, you get whatever happened to be previously sitting in the space reserved for the first integer argument, which could be anything. It might be deterministic between different runs of the same compiled program (as a result of some C runtime initialization leaving a predictable value in a register, for instance) or it might be dependent on the execution environment or the phase of the moon. You really don't know, and to be honest, this isn't a case where it's worthwhile to figure out exactly how that value got there. It's junk, and that's that.
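One way to avoid the junk value is to make the argument type match the conversion specifier. The sketch below assumes the original code passed a floating point value to %d; the helper names are hypothetical, and snprintf is used only so that the result can be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Converts explicitly, so that %d really receives an int argument. */
int format_truncated(char *buf, size_t size, double x)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%d", (int)x);
}

/* Uses a floating point conversion for a double argument. */
int format_double(char *buf, size_t size, double x)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%.2f", x);
}
```

Either the value is converted to the type the specifier expects, or the specifier is changed to fit the value. Compiling with warnings enabled (for example -Wall with GCC or Clang) usually reports a mismatch between the two.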
QUESTION
Easy question, but couldn't seem to find the answer with a duckduckgo or by searching SO (here).
I am aware that in C, the standard states that uninitialized arrays of ints result in undefined behaviour. (Or at least most compilers behave this way.)
ANSWER
Answered 2021-Mar-02 at 11:40
In your example
QUESTION
I'm trying to understand how GNU interprets several things so my first example is very simple: declaration of an integer and printing it. If no optimization is invoked, the assembly code reads:
...
ANSWER
Answered 2021-Jan-15 at 18:28
For whatever reason -O3 doesn't turn on the -fomit-frame-pointer option when compiling with GCC for ARM64 targets (including GNU Fortran). You'll need to enable this option explicitly for the compiler to optimize away the use of the frame pointer in non-leaf functions:
QUESTION
I wrote a C code like this:
...
ANSWER
Answered 2020-Nov-19 at 10:48
-O2 turns on many options in addition to -O1, for example -falign-functions -falign-jumps -falign-labels -falign-loops. Each of them seemed to have a negative performance impact on top of -O1. I have an i7-8550U and GCC 9.3.0-17ubuntu1~20.04.
I believe the branch prediction failures make this hard on the processor.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported