FFP | Fast FIX Parser

by maxim2266 | C | Version: Current | License: BSD-2-Clause

kandi X-RAY | FFP Summary

FFP is a C library with no reported bugs or vulnerabilities, a permissive license, and low support activity. You can download it from GitHub.

Fast FIX Parser (FFP) is a library for parsing Financial Information eXchange (FIX) messages. It takes input bytes as they arrive from, for example, a socket, and converts them into a representation of FIX messages which can be further analysed for semantic checks, converted into “business” structures, etc. It also provides a way to specify which tags are allowed for a particular message and verifies this specification at runtime.
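For context, a FIX message on the wire is a flat sequence of tag=value fields separated by the SOH delimiter byte (0x01). The sketch below is plain C and deliberately independent of FFP (it does not use FFP's actual API); it only illustrates what splitting a complete, well-formed message into fields involves:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define SOH '\x01'

    /* Split a complete FIX message into tag=value fields and print them.
       Illustration only: a real parser such as FFP also handles partial
       input across socket reads, checksums, and per-message tag specs. */
    static void dump_fields(const char *msg, size_t len)
    {
        const char *p = msg, *end = msg + len;

        while (p < end) {
            const char *eq  = memchr(p, '=', (size_t)(end - p));
            const char *soh = memchr(p, SOH, (size_t)(end - p));

            if (!eq || !soh || eq > soh)   /* malformed field */
                break;

            long tag = strtol(p, NULL, 10);
            printf("tag %ld = \"%.*s\"\n", tag, (int)(soh - eq - 1), eq + 1);
            p = soh + 1;                   /* advance to the next field */
        }
    }

    int main(void)
    {
        /* adjacent literals keep \x01 from swallowing the next hex digit */
        const char msg[] = "8=FIX.4.2\x01" "9=12\x01" "35=A\x01" "10=003\x01";
        dump_fields(msg, sizeof msg - 1);
        return 0;
    }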

Support

FFP has a low-activity ecosystem.
It has 31 stars, 9 forks, and 5 watchers.
              It had no major release in the last 6 months.
              FFP has no issues reported. There are no pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of FFP is current.

Quality

              FFP has 0 bugs and 0 code smells.

Security

              FFP has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              FFP code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

License

              FFP is licensed under the BSD-2-Clause License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

              FFP releases are not available. You will need to build from source code and install.


            FFP Key Features

            No Key Features are available at this moment for FFP.

            FFP Examples and Code Snippets

            No Code Snippets are available at this moment for FFP.

            Community Discussions

            QUESTION

            Why does gcc -march=znver1 restrict uint64_t vectorization?
            Asked 2022-Apr-10 at 02:47

I'm trying to make sure gcc vectorizes my loops. It turns out that with -march=znver1 (or -march=native) gcc skips some loops even though they can be vectorized. Why does this happen?

In this code, the second loop, which multiplies each element by a scalar, is not vectorised:

            ...

            ANSWER

            Answered 2022-Apr-10 at 02:47

            The default -mtune=generic has -mprefer-vector-width=256, and -mavx2 doesn't change that.

znver1 implies -mprefer-vector-width=128, because that's the native width of the HW. An instruction using 32-byte YMM vectors decodes to at least 2 uops, more if it's a lane-crossing shuffle. For simple vertical SIMD like this, 32-byte vectors would be ok; the pipeline handles 2-uop instructions efficiently. (And I think Zen1 is 6 uops wide but only 5 instructions wide, so max front-end throughput isn't available using only 1-uop instructions.) But when vectorization would require shuffling, e.g. with arrays of different element widths, GCC code-gen can get messier with 256-bit or wider.

            And vmovdqa ymm0, ymm1 mov-elimination only works on the low 128-bit half on Zen1. Also, normally using 256-bit vectors would imply one should use vzeroupper afterwards, to avoid performance problems on other CPUs (but not Zen1).

            I don't know how Zen1 handles misaligned 32-byte loads/stores where each 16-byte half is aligned but in separate cache lines. If that performs well, GCC might want to consider increasing the znver1 -mprefer-vector-width to 256. But wider vectors means more cleanup code if the size isn't known to be a multiple of the vector width.

Ideally GCC would be able to detect easy cases like this and use 256-bit vectors there. (Pure vertical, no mixing of element widths, constant size that's a multiple of 32 bytes.) At least on CPUs where that's fine: znver1, but not bdver2, for example, where 256-bit stores are always slow due to a CPU design bug.

            You can see the result of this choice in the way it vectorizes your first loop, the memset-like loop, with a vmovdqu [rdx], xmm0. https://godbolt.org/z/E5Tq7Gfzc

So given that GCC has decided to only use 128-bit vectors, which can only hold two uint64_t elements, it (rightly or wrongly) decides it wouldn't be worth using vpsllq / vpaddd to implement qword *5 as (v<<2) + v, vs. doing it with scalar integer code in one LEA instruction.

            Almost certainly wrongly in this case, since it still requires a separate load and store for every element or pair of elements. (And loop overhead since GCC's default is not to unroll except with PGO, -fprofile-use. SIMD is like loop unrolling, especially on a CPU that handles 256-bit vectors as 2 separate uops.)
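As an illustration of that shift-and-add strategy (my own sketch using SSE2 intrinsics, not code from the question), qword * 5 with 128-bit vectors looks like this:

    #include <emmintrin.h>   /* SSE2 intrinsics */
    #include <stdint.h>
    #include <stdio.h>

    /* v * 5 computed as (v << 2) + v on two uint64_t lanes at once. */
    static __m128i mul5_u64(__m128i v)
    {
        return _mm_add_epi64(_mm_slli_epi64(v, 2), v);
    }

    int main(void)
    {
        uint64_t in[2] = { 7, 1000000000 }, out[2];
        _mm_storeu_si128((__m128i *)out,
                         mul5_u64(_mm_loadu_si128((const __m128i *)in)));
        printf("%llu %llu\n",
               (unsigned long long)out[0], (unsigned long long)out[1]);
        return 0;
    }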

            I'm not sure exactly what GCC means by "not vectorized: unsupported data-type". x86 doesn't have a SIMD uint64_t multiply instruction until AVX-512, so perhaps GCC assigns it a cost based on the general case of having to emulate it with multiple 32x32 => 64-bit pmuludq instructions and a bunch of shuffles. And it's only after it gets over that hump that it realizes that it's actually quite cheap for a constant like 5 with only 2 set bits?

            That would explain GCC's decision-making process here, but I'm not sure it's exactly the right explanation. Still, these kinds of factors are what happen in a complex piece of machinery like a compiler. A skilled human can easily make smarter choices, but compilers just do sequences of optimization passes that don't always consider the big picture and all the details at the same time.

            -mprefer-vector-width=256 doesn't help: Not vectorizing uint64_t *= 5 seems to be a GCC9 regression

            (The benchmarks in the question confirm that an actual Zen1 CPU gets a nearly 2x speedup, as expected from doing 2x uint64 in 6 uops vs. 1x in 5 uops with scalar. Or 4x uint64_t in 10 uops with 256-bit vectors, including two 128-bit stores which will be the throughput bottleneck along with the front-end.)

            Even with -march=znver1 -O3 -mprefer-vector-width=256, we don't get the *= 5 loop vectorized with GCC9, 10, or 11, or current trunk. As you say, we do with -march=znver2. https://godbolt.org/z/dMTh7Wxcq

            We do get vectorization with those options for uint32_t (even leaving the vector width at 128-bit). Scalar would cost 4 operations per vector uop (not instruction), regardless of 128 or 256-bit vectorization on Zen1, so this doesn't tell us whether *= is what makes the cost-model decide not to vectorize, or just the 2 vs. 4 elements per 128-bit internal uop.

            With uint64_t, changing to arr[i] += arr[i]<<2; still doesn't vectorize, but arr[i] <<= 1; does. (https://godbolt.org/z/6PMn93Y5G). Even arr[i] <<= 2; and arr[i] += 123 in the same loop vectorize, to the same instructions that GCC thinks aren't worth it for vectorizing *= 5, just different operands, constant instead of the original vector again. (Scalar could still use one LEA). So clearly the cost-model isn't looking as far as final x86 asm machine instructions, but I don't know why arr[i] += arr[i] would be considered more expensive than arr[i] <<= 1; which is exactly the same thing.
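To summarize the experiments above as code (a sketch assuming a plain uint64_t array; the question's original source is elided):

    #include <stddef.h>
    #include <stdint.h>

    /* Compile each with gcc -O3 -march=znver1 (e.g. on godbolt.org) and
       inspect the asm. Observed behaviour per the discussion above: */
    void mul5(uint64_t *arr, size_t n)
    {
        for (size_t i = 0; i < n; i++) arr[i] *= 5;           /* not vectorized */
    }

    void shl2_add(uint64_t *arr, size_t n)
    {
        for (size_t i = 0; i < n; i++) arr[i] += arr[i] << 2; /* not vectorized */
    }

    void shl1(uint64_t *arr, size_t n)
    {
        for (size_t i = 0; i < n; i++) arr[i] <<= 1;          /* vectorizes */
    }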

            GCC8 does vectorize your loop, even with 128-bit vector width: https://godbolt.org/z/5o6qjc7f6

            Source https://stackoverflow.com/questions/71811588

            QUESTION

            Why don't non-strict floating-point models change the value 1 of __STDC_IEC_559__?
            Asked 2022-Jan-11 at 13:33

            Sample code (t0.c):

            ...

            ANSWER

            Answered 2021-Nov-26 at 11:01

-fno-rounding-math is the gcc default. It enables optimisations which assume the default round-to-nearest IEEE arithmetic. -frounding-math turns these optimisations off. The switch itself does not change the rounding behaviour, and thus does not change whether the macro is defined.
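A quick sketch for checking this yourself (compile with and without -frounding-math and observe that the output does not change):

    #include <stdio.h>

    int main(void)
    {
    #ifdef __STDC_IEC_559__
        printf("__STDC_IEC_559__ = %d\n", (int)__STDC_IEC_559__);
    #else
        puts("__STDC_IEC_559__ is not defined");
    #endif
        return 0;
    }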

            Source https://stackoverflow.com/questions/70115688

            QUESTION

            Is it considered normal that f = NAN may cause raising floating-point exceptions?
            Asked 2021-Nov-11 at 19:57

            C2x (as well as previous):

            The macro NAN is defined if and only if the implementation supports quiet NaNs for the float type. It expands to a constant expression of type float representing a quiet NaN.

            Sample code (t0a.c)

            ...

            ANSWER

            Answered 2021-Nov-11 at 19:57

            For the record, this ...

            The macro NAN is defined if and only if the implementation supports quiet NaNs for the float type. It expands to a constant expression of type float representing a quiet NaN.

            ... is C17 7.12/5, and it probably has the same or similar numbering in C2x.

Updated

An earlier version of this answer suggested that the combination of

• Microsoft's C standard library and runtime environment with
• the MSVC and Clang compilers and
• the options you are specifying to those

was causing a signaling NaN to be generated and used as an arithmetic operand, probably indicating that these combinations fail to fully conform to the C language specification in this area. That is not so: the fact that, when used with MSVC or Clang on Windows, your test program causes the FE_INVALID FP exception to be raised has nothing to do with whether the resulting NaN is a quiet or a signaling one. The misconception there is that the FE_INVALID flag would be raised only as a consequence of generating or operating on a signaling NaN. That is not the case.

For one thing, IEEE-754 does not define any case in which a signaling NaN is generated. All defined operations that produce a NaN produce a quiet NaN, including operations in which one operand is a signaling NaN (so MSVC and Clang on Windows almost certainly do produce a quiet NaN as the value of the NAN macro). Most operations with at least one signaling NaN as an operand do, by default, cause the FE_INVALID flag to be raised, but that is not the usual reason for that flag to be raised.

            Rather, under default exception handling, the FE_INVALID flag is raised simply because of a request to compute an operation with no defined result, such as infinity times 0. The result will be a quiet NaN. Note that this does not include operations with at least one NaN operand, which do have a defined result: a quiet NaN in many cases, unordered / false for comparisons, and other results in a few cases.
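For example, a sketch of that "no defined result" case (the volatile qualifiers only discourage compile-time evaluation; strictly, touching the FP environment also calls for #pragma STDC FENV_ACCESS ON, which not every compiler supports):

    #include <fenv.h>
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        feclearexcept(FE_ALL_EXCEPT);

        volatile double zero = 0.0;
        volatile double r = zero * INFINITY;   /* invalid operation: 0 * inf */

        printf("r = %f, FE_INVALID raised: %d\n",
               r, fetestexcept(FE_INVALID) != 0);
        return 0;
    }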

With that for context, it is important to recognize that just because NAN expands to a constant expression (in a conforming C implementation) does not mean that the value of that expression is computed at compile time. Indeed, given the specifications for MSVC's and Clang's strict fp modes, I would expect those modes to disable most, if not all, compile-time computation of FP expressions (or at minimum to propagate FP status flags as if the computations were performed at run time).

Thus, raising FE_INVALID is not necessarily an effect of the assignment in f = NAN. If (as in Microsoft's C standard library) NAN expands to an expression involving arithmetic operations, then the exception should be raised as a result of evaluating that expression, notwithstanding that the resulting NaN is quiet. At least in implementations that claim full conformance with IEC 60559 by defining the __STDC_IEC_559__ feature-test macro.

Therefore, although I will not dispute that people may wonder "how writing to memory may cause raising floating-point exceptions?", no convincing evidence has been presented to suggest that such causation has been observed.

            Nevertheless, the value represented by a particular appearance of NAN in an expression that is evaluated has some kind of physical manifestation. It is plausible for that to be in an FPU register, and storing a signaling NaN from an FPU register to memory indeed could cause a FP exception to be raised on some architectures.
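A minimal sketch of the kind of test program under discussion; per the above, whether FE_INVALID ends up raised depends on how the implementation's NAN macro is defined and when it is evaluated:

    #include <fenv.h>
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        feclearexcept(FE_ALL_EXCEPT);

        volatile float f = NAN;   /* NAN may expand to an arithmetic expression */
        (void)f;

        printf("FE_INVALID raised: %d\n", fetestexcept(FE_INVALID) != 0);
        return 0;
    }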

            Source https://stackoverflow.com/questions/69929589

            QUESTION

            clang on Windows / msvc: why under FE_UPWARD printf("%.1f\n", 0.0) prints 0.1 instead of 0.0?
            Asked 2021-Oct-18 at 18:15

            Sample code (t928.c):

            ...

            ANSWER

            Answered 2021-Oct-18 at 18:15

            QUESTION

            QNAN passed into C standard library functions (ex. llrintf): not clear whether FP exceptions are raised or not
            Asked 2021-Mar-31 at 18:30

The macro NAN from math.h is a quiet NaN:

            ISO/IEC 9899:2011 (E) (emphasis added):

            The macro

            NAN

            is defined if and only if the implementation supports quiet NaNs for the float type. It expands to a constant expression of type float representing a quiet NaN.

Quiet NaNs usually do not lead to FP exceptions being raised. Examples:

            1. ISO/IEC 9899:2011 (E) (emphasis added):

            5.2.4.2.2 Characteristics of floating types

            3    A quiet NaN propagates through almost every arithmetic operation without raising a floating-point exception; a signaling NaN generally raises a floating-point exception when occurring as an arithmetic operand.

2. IEEE 754-2008 (emphasis added):

5.11 Details of comparison predicates

Programs that explicitly take account of the possibility of quiet NaN operands may use the unordered-quiet predicates in Table 5.3, which do not signal such an invalid operation exception.

            However:

1. llrintf() is neither an arithmetic operation nor one of the unordered-quiet predicates in Table 5.3. Hence, 5.2.4.2.2/3 and 5.11 are not applicable.
2. 7.12.9.5 The lrint and llrint functions (for example) says nothing about whether FP exceptions are raised when the input is a quiet NaN.

Preliminary conclusion: given the general practice that "quiet NaN does not lead to raising FP exceptions", the lrint and llrint functions should not raise FP exceptions if the input is a quiet NaN.

            Practice:

            Code (t125.c):

            ...

            ANSWER

            Answered 2021-Mar-30 at 16:13

The OP was under the impression that the term conversion covers only implicit conversions and explicit conversions (cast operations) (see C11 6.3 Conversions), i.e. that a function call is not a conversion.

            However, C11 F.3 Operators and functions (emphasis added) explicitly states that:

The lrint and llrint functions in <math.h> provide the IEC 60559 conversions.

            Hence, yes, C11 F.4 Floating to integer conversion (emphasis added) answers the question:

            F.4 Floating to integer conversion

            1    ... Otherwise, if the floating value is infinite or NaN or if the integral part of the floating value exceeds the range of the integer type, then the ‘‘invalid’’ floating-point exception is raised and the resulting value is unspecified. ...
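A sketch of testing that paragraph in practice (the original t125.c is elided above; link with -lm on most Unix systems):

    #include <fenv.h>
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        feclearexcept(FE_ALL_EXCEPT);

        /* F.4: "invalid" is raised for a NaN input; the result is unspecified */
        long long r = llrintf(NAN);

        printf("llrintf(NAN) = %lld, FE_INVALID raised: %d\n",
               r, fetestexcept(FE_INVALID) != 0);
        return 0;
    }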

            Source https://stackoverflow.com/questions/66816612

            QUESTION

            floating point rounding giving different results for 80-bit register and 64-bit double: ill-formed code or gcc/clang bug?
            Asked 2020-Dec-04 at 21:39

            The code given below shows different results, depending on -O or -fno-inline flags. Same (strange) results for g++ 10.1 and 10.2 and clang++ 10 on x86. Is this because the code is ill-formed or is this a genuine bug?

The "invalid" flag in the Nakshatra constructor should be set whenever its nakshatra (double) field is >= 27.0. But, when initialized via Nakshatra(Nirayana_Longitude{360.0}), the flag is not set, even though the value after scaling becomes exactly 27.0. I assume the reason is that the argument of 360.0 after scaling becomes 26.9999999999999990008 (raw 0x4003d7fffffffffffdc0) in an 80-bit internal register, which is < 27.0, but becomes 27.0 when stored as a 64-bit double. Still, this behaviour looks weird: the same nakshatra seems to be simultaneously < 27.0 and >= 27.0. Is it the way it's supposed to be?

            Is it the expected behaviour because my code contains UB or otherwise ill-formed? Or is it a compiler bug?

            Minimal code to reproduce (two .cpp files + one header, could not reproduce with less):

            main.cpp:

            ...

            ANSWER

            Answered 2020-Dec-04 at 21:39

Without -ffloat-store, GCC targeting x87 does violate the standard: it keeps values un-rounded even across statements. (-mfpmath=387 is the default for -m32.) Assignment like double x = y; is supposed to round to an actual double in ISO C++, and probably also when passing a function argument.

            So I think your code is safe for ISO C++ rules, even with the FLT_EVAL_METHOD == 2 that GCC claims to be doing. (https://en.cppreference.com/w/cpp/types/climits/FLT_EVAL_METHOD)

            See also https://randomascii.wordpress.com/2012/03/21/intermediate-floating-point-precision/ for more about the real-world issues, with actual compilers for x86.

            https://gcc.gnu.org/wiki/x87note doesn't really mention the difference between when GCC rounds vs. when ISO C++ requires rounding, just describes GCC's actual behaviour.
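A minimal C sketch of the effect (an illustration under stated assumptions, not the question's code: it assumes an x87 target such as gcc -m32 -mfpmath=387 without -ffloat-store, and whether it reproduces depends on optimization and inlining):

    #include <stdio.h>

    /* The double closest to 27.0/360.0 is slightly below 0.075, so
       360.0 * that constant is just under 27.0 in an 80-bit x87 register,
       yet rounds to exactly 27.0 once stored as a 64-bit double. */
    static double scale(double deg)
    {
        return deg * (27.0 / 360.0);
    }

    int main(void)
    {
        double n = scale(360.0);   /* ISO C++ says assignment rounds */
        printf("stored value >= 27.0: %d\n", n >= 27.0);
        printf("direct value >= 27.0: %d\n", scale(360.0) >= 27.0);
        return 0;
    }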

            Source https://stackoverflow.com/questions/65150649

            QUESTION

            darknet make error : gcc: command not found, Makefile: recipe for target failed
            Asked 2020-Nov-30 at 18:34

My intention is to prepare sound wave files and arrange the training and testing process via sound.c. I ran into an error while compiling darknet. Need your help!

make: gcc: command not found
Makefile:175: recipe for target 'obj/sound.o' failed
make: *** [obj/sound.o] Error 127

Ubuntu 18.04 LTS, CUDA 11.1

@wbcalex-desktop:~$ sudo apt install gcc
[sudo] password for wbcalex:
Reading package lists... Done
Building dependency tree
Reading state information... Done
gcc is already the newest version (4:7.4.0-1ubuntu2.3).
The following package was automatically installed and is no longer required: linux-hwe-5.4-headers-5.4.0-47
Use 'sudo apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

desktop:~$ cd darknet_tmp
desktop:~/darknet_tmp$ make
gcc -Iinclude/ -I3rdparty/stb/include -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wfatal-errors -Wno-unused-result -Wno-unknown-pragmas -fPIC -Ofast -DGPU -DCUDNN -I/usr/local/cudnn/include -fPIC -c ./src/sound.c -o obj/sound.o
make: gcc: Command not found
Makefile:175: recipe for target 'obj/sound.o' failed
make: *** [obj/sound.o] Error 127

            ...

            ANSWER

            Answered 2020-Nov-30 at 18:34

The PATH variable isn't updated correctly, because $PATH is not Makefile syntax: make reads $PATH as the variable $P followed by the literal text ATH. The fix is to write the reference in make's own form, $(PATH).

            Source https://stackoverflow.com/questions/65076842

            QUESTION

            Build a list with a while loop using DataFrame
            Asked 2020-Nov-30 at 16:39

            Say we're Amazon, and we receive a customer order in the form of this nested dict down here 👇

            ...

            ANSWER

            Answered 2020-Nov-30 at 16:39

            Not sure why you use pandas - you can do this using simple python with no imports whatsoever:

            Source https://stackoverflow.com/questions/65076953

            QUESTION

            Writing 'Image.Image.paste()' to new file
            Asked 2020-Nov-16 at 19:12

I need to paste the first picture onto the second and save the new picture as a new file. How can I do it? My non-working code is at the bottom.

            ...

            ANSWER

            Answered 2020-Nov-16 at 19:12

I didn't see anything wrong, but I think you can save the image without using the open method. Also, you can paste onto your mat image and save it as another image with save.

            Source https://stackoverflow.com/questions/64864028

            QUESTION

            How to write fast c++ lazy evaluation code in Fastor or Xtensor?
            Asked 2020-Oct-11 at 10:40

I am a newbie in C++, and have heard that libraries like Eigen, Blaze, Fastor and Xtensor, with lazy evaluation and SIMD, are fast for vectorized operations.

I measured the time elapsed doing some basic numeric operations with the following function:

            (Fastor)

            ...

            ANSWER

            Answered 2020-Oct-11 at 10:40

            The reason the Numpy implementation is much faster is that it does not compute the same thing as the two others.

Indeed, the python version does not read z in the expression np.sin(x) * np.cos(x). As a result, the Numba JIT is clever enough to execute the loop only once, justifying a factor of 100 between Fastor and Numba. You can check that by replacing range(100) with range(10000000000) and observing the same timings.

            Finally, XTensor is faster than Fastor in this benchmark as it seems to use its own fast SIMD implementation of exp/sin/cos while Fastor seems to use a scalar implementation from libm justifying the factor of 2 between XTensor and Fastor.

            Answer to the update:

            Fastor/Xtensor performs really bad in exp, sin, cos, which was surprising.

No. We cannot conclude that from the benchmark. What you are comparing is the ability of compilers to optimize your code. In this case, Numba does better than plain C++ compilers, as it deals with high-level SIMD-aware code, while C++ compilers have to deal with the huge low-level template-based code coming from the Fastor/Xtensor libraries. Theoretically, I think it should be possible for a C++ compiler to apply the same kind of high-level optimization as Numba, but it is just harder. Moreover, note that Numpy tends to create/allocate temporary arrays while Fastor/Xtensor should not.

In practice, Numba is faster because u is a constant, and so are exp(u), sin(u) and cos(u). Thus, Numba precomputes the expression (it is computed only once) and still performs the sum in the loop. The following code gives the same timing:

            Source https://stackoverflow.com/questions/64293139

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install FFP

            You can download it from GitHub.

            Support

For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on Stack Overflow.
            CLONE
          • HTTPS

            https://github.com/maxim2266/FFP.git

          • CLI

            gh repo clone maxim2266/FFP

• SSH

            git@github.com:maxim2266/FFP.git
