Protector | fully automatic moderation bot that uses machine learning | Bot library
kandi X-RAY | Protector Summary
Click here for an invite link. You may also join our Discord server if you would like to test out the bot before adding it to your server.
Top functions reviewed by kandi - BETA
- Handle incoming messages
- Handle a message
- Predict the prediction
- Send a log message
- Clean message
- Filter out characters that are printable
- Returns a time string
- Load configuration
- Handle edited messages
Community Discussions
Trending Discussions on Protector
QUESTION
With gcc, is it possible to compile with -fstack-protector but omit it for a specific function?
For example, say I have two functions.
...
ANSWER
Answered 2022-Apr-17 at 22:14
You'd have to test if it works (inspect the generated code at Godbolt), but it looks like you can do, for example:
__attribute__ ((no_stack_protector)) void foo () { ... }
no_sanitize looks like an intriguing option; I wonder who uses that.
QUESTION
I'm trying to make sure gcc vectorizes my loops. It turns out that by using -march=znver1 (or -march=native), gcc skips some loops even though they can be vectorized. Why does this happen?
In this code, the second loop, which multiplies each element by a scalar, is not vectorized:
...
ANSWER
Answered 2022-Apr-10 at 02:47
The default -mtune=generic has -mprefer-vector-width=256, and -mavx2 doesn't change that.
znver1 implies -mprefer-vector-width=128, because that's the full native width of the HW. An instruction using 32-byte YMM vectors decodes to at least 2 uops, more if it's a lane-crossing shuffle. For simple vertical SIMD like this, 32-byte vectors would be OK; the pipeline handles 2-uop instructions efficiently. (And I think Zen1 is 6 uops wide but only 5 instructions wide, so max front-end throughput isn't available using only 1-uop instructions.) But when vectorization would require shuffling, e.g. with arrays of different element widths, GCC code-gen can get messier with 256-bit or wider.
And vmovdqa ymm0, ymm1 mov-elimination only works on the low 128-bit half on Zen1. Also, normally using 256-bit vectors would imply one should use vzeroupper afterwards, to avoid performance problems on other CPUs (but not Zen1).
I don't know how Zen1 handles misaligned 32-byte loads/stores where each 16-byte half is aligned but in separate cache lines. If that performs well, GCC might want to consider increasing the znver1 -mprefer-vector-width to 256. But wider vectors mean more cleanup code if the size isn't known to be a multiple of the vector width.
Ideally GCC would be able to detect easy cases like this and use 256-bit vectors there (pure vertical, no mixing of element widths, constant size that's a multiple of 32 bytes), at least on CPUs where that's fine: znver1, but not bdver2, for example, where 256-bit stores are always slow due to a CPU design bug.
You can see the result of this choice in the way it vectorizes your first loop, the memset-like loop, with a vmovdqu [rdx], xmm0. https://godbolt.org/z/E5Tq7Gfzc
So given that GCC has decided to only use 128-bit vectors, which can only hold two uint64_t elements, it (rightly or wrongly) decides it wouldn't be worth using vpsllq / vpaddq to implement qword *5 as (v<<2) + v, vs. doing it with integer in one LEA instruction.
Almost certainly wrongly in this case, since it still requires a separate load and store for every element or pair of elements. (And loop overhead, since GCC's default is not to unroll except with PGO, -fprofile-use. SIMD is like loop unrolling, especially on a CPU that handles 256-bit vectors as 2 separate uops.)
I'm not sure exactly what GCC means by "not vectorized: unsupported data-type". x86 doesn't have a SIMD uint64_t multiply instruction until AVX-512, so perhaps GCC assigns it a cost based on the general case of having to emulate it with multiple 32x32 => 64-bit pmuludq instructions and a bunch of shuffles. And it's only after it gets over that hump that it realizes that it's actually quite cheap for a constant like 5 with only 2 set bits?
That would explain GCC's decision-making process here, but I'm not sure it's exactly the right explanation. Still, these kinds of factors are what happen in a complex piece of machinery like a compiler. A skilled human can easily make smarter choices, but compilers just do sequences of optimization passes that don't always consider the big picture and all the details at the same time.
-mprefer-vector-width=256 doesn't help:
Not vectorizing uint64_t *= 5 seems to be a GCC9 regression
(The benchmarks in the question confirm that an actual Zen1 CPU gets a nearly 2x speedup, as expected from doing 2x uint64 in 6 uops vs. 1x in 5 uops with scalar. Or 4x uint64_t in 10 uops with 256-bit vectors, including two 128-bit stores which will be the throughput bottleneck along with the front-end.)
Even with -march=znver1 -O3 -mprefer-vector-width=256, we don't get the *= 5 loop vectorized with GCC9, 10, or 11, or current trunk. As you say, we do with -march=znver2. https://godbolt.org/z/dMTh7Wxcq
We do get vectorization with those options for uint32_t (even leaving the vector width at 128-bit). Scalar would cost 4 operations per vector uop (not instruction), regardless of 128 or 256-bit vectorization on Zen1, so this doesn't tell us whether *= is what makes the cost-model decide not to vectorize, or just the 2 vs. 4 elements per 128-bit internal uop.
With uint64_t, changing to arr[i] += arr[i]<<2; still doesn't vectorize, but arr[i] <<= 1; does (https://godbolt.org/z/6PMn93Y5G). Even arr[i] <<= 2; and arr[i] += 123 in the same loop vectorize, to the same instructions that GCC thinks aren't worth it for vectorizing *= 5, just with different operands: a constant instead of the original vector again. (Scalar could still use one LEA.) So clearly the cost-model isn't looking as far as final x86 asm machine instructions, but I don't know why arr[i] += arr[i] would be considered more expensive than arr[i] <<= 1; which is exactly the same thing.
GCC8 does vectorize your loop, even with 128-bit vector width: https://godbolt.org/z/5o6qjc7f6
QUESTION
I'm trying to wrap my head around Rust's aliasing rules in the following situation:
Let's assume we have a memory allocation in C. We pass a pointer to this allocation to Rust. The Rust function does something with the allocation, and then calls back to C code (without any parameters), where another Rust function is called with the same allocation as parameter. For now, let's assume that only the first Rust function gets a mutable reference.
The call stack would look like:
...
ANSWER
Answered 2022-Jan-08 at 20:11
Is the above code invoking undefined behaviour?
Yes, you've broken Rust's pointer aliasing rules. Relying on the Stacked Borrows rules is a bit dubious since, as you've hinted, I don't think it has been officially adopted as Rust's memory access model (even if it was just a formalization of the current semantics). However, something that is a practical and concrete ruling is LLVM's noalias attribute, which the Rust compiler uses on &mut arguments.
This indicates that memory locations accessed via pointer values based on the argument or return value are not also accessed, during the execution of the function, via pointer values not based on the argument or return value. ...
So since you access ints[1] in another_rust_function from a pointer that is not based on the ints in first_rust_function during the execution of that function, it is a violation. In light of this undefined behavior, I believe the compiler is well within its rights to make the code print "Second value: 4".
Would the behaviour be well-defined if called_from_rust() would have the pointer as parameter and pass it forward: void called_from_rust(const int * i) { another_rust_function(i); }?
Yes, that will make it well defined. You can see that because the Rust borrow checker can then see that the value can be used in called_from_rust() and will prevent improper use of ints around that call.
What if both Rust functions were using &mut [c_int;3]?
If you used the fix above, where the second borrow is based on the first, then there's no problem. If you didn't though, then it is way worse.
QUESTION
I'm trying to add rotation metadata to the video recorded from an RTSP stream. All works fine until I try to run recording with the segment format. My command looks like this:
...
ANSWER
Answered 2022-Feb-11 at 10:03
I found out it has been resolved in
and it works fine in ffmpeg 5.0. You can also apply this patch to 4.4.
QUESTION
I am trying to run the training of stylegan2-pytorch on a remote system. The remote system has gcc (9.3.0) installed on it. I'm using conda env that has the following installed (cudatoolkit=10.2, torch=1.5.0+, and ninja=1.8.2, gcc_linux-64=7.5.0). I encounter the following error:
...
ANSWER
Answered 2021-Dec-12 at 16:12
Just to share, not sure it will help you. However, it shows that in standard conditions it is possible to use the conda gcc as described in the documentation instead of the system gcc.
QUESTION
I am utilizing yocto (dunfell) to cross-compile a project for multiple different architectures. Specifically, the targets I have are a 64-bit RaspberryPi4 (aarch64) and a 32-bit Orange Pi (armhf). My project that I am cross-compiling compiles and runs without issue when building for the raspi target; the runtime linker is properly set and things run without issue. However, whenever I build for the Orange Pi target, the program appears to compile without issue, but when I try to execute it on the platform, I get a "File not found" error.
This appears to be because the interpreter (runtime linker) is set to /usr/lib/ld.so, which is not actually on the system. See below:
ANSWER
Answered 2021-Dec-03 at 13:53
After a few days of debugging, I figured out the problem. If anyone with more knowledge than I on linking would like to chime in to add things, please do. Ultimately, this was resolved by using gcc as the linker as opposed to using ld (the ones provided by yocto's cross compiler, i.e. aarch64-poky-linux-gcc).
In order to do this, I modified my recipe to pass in LD=${CC} LDFLAGS=${LDFLAGS} to my Makefile. Now, it builds and executes properly for both the RPi and OrangePi targets.
I believe this is mainly the case because the LDFLAGS provided by yocto actually can't be parsed by ld: they are written as gcc driver options (e.g. -Wl,--as-needed), which gcc forwards to ld but ld itself does not accept. From my research, it looks like ld is typically invoked by gcc. However, the flags still need to get to the compiler. So, originally, the LDFLAGS that needed to be passed into linking weren't being passed in at all, because I just assumed there was an error with doing it that way. So, be sure you're passing the LDFLAGS that yocto gives you into gcc.
QUESTION
I've had a bit of a look around Stackoverflow and the wider Internet and identified that the most common causes for this error are conflation of declaration (int var = 1;) and definition (int var;), and including .c files from .h files.
My small project I just split from one file into several is not doing any of these things. I'm very confused.
I made a copy of the project and deleted all the code in the copy (which was fun) until I reached here:
main.c ...
ANSWER
Answered 2021-Nov-10 at 21:14
Yes, there was a change in behaviour.
In C you are supposed to only define a global variable in one translation unit; other translation units that want to access the variable should declare it as "extern".
In your code, a.h is included in both a.c and main.c so the variable is defined twice. To fix this you should change the "int test" in a.h to "extern int test", then add "int test" to a.c to define the variable exactly once.
In C, a definition of a global variable that does not initialise the variable is considered "tentative". You can have multiple tentative definitions of a variable in the same compilation unit. Multiple tentative definitions in different compilation units are not allowed in standard C, but were historically allowed by C compilers on Unix systems.
Older versions of gcc would allow multiple tentative definitions (but not multiple non-tentative definitions) of a global variable in different compilation units by default. gcc-10 does not. You can restore the old behavior with the command-line option "-fcommon", but this is discouraged.
QUESTION
I use the following command to figure out the pass sequence of clang's -O3:
$ opt -enable-new-pm=0 -O3 -debug-pass=Arguments input.ll
and I get a very long optimization sequence.
Is that sequence the same for all code, or can -O3 change the order depending on the source code?
Also, I found that if I use the -O0 flag to generate the IR file, the attributes may be like this:
ANSWER
Answered 2021-Nov-04 at 10:44
Yes, the optimization sequence is the same for all inputs. But note that opt's -O3 may not be the same as clang's -O3.
As for disabling the frame pointer, you can remove it
- with -fomit-frame-pointer when generating LLVM IR with clang:
QUESTION
Short and sweet:
I'm writing an Rcpp package that uses zlib and sqlite.
In the following Makevars.win file, I set compiler flags and try to set some targets.
ANSWER
Answered 2021-Oct-30 at 00:21
There are a lot of things going on there that we need to decompose.
First off, you managed to have SHLIB use your enumerated list of object files. Good! I recently had to do the same, and I used an OBJECTS list. I think you may get lucky if you stick the -fstack-protector into PKG_LIBS, because the PKG_* variables are there for you to expand on the defaults used (in the hidden Makefile controlled by R), whereas ... LDFLAGS may just get ignored.
Otherwise, I would recommend sampling among the 4000+ CRAN packages with compiled code. Some will set similar things; searching the 'CRAN' "org" at GitHub is crude but better than nuttin'. Good luck!
Edit: You could look at my (more complicated still) Makevars.win for RInside. I just grep'ed among all the repos I have here, and I don't have a current example of anybody setting -fSOMETHING on Windows.
Edit 2: I do actually have a better example for you. Each and every RcppArmadillo package uses
QUESTION
Installing the readline extension on php:7.3-fpm-alpine fails for these Alpine versions:
- php:7.3-fpm-alpine3.14 / php:7.3-fpm-alpine
- php:7.3-fpm-alpine3.13
This is the minimum Dockerfile:
...
ANSWER
Answered 2021-Oct-04 at 08:06
Readline is a default extension
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install Protector
Create a channel called protector-log; this is where deletes and warnings will be logged.
????
Profit!!