xp | Please see github

by kidoman Go Version: Current License: No License

kandi X-RAY | xp Summary

xp is a Go library. xp has no bugs, it has no vulnerabilities and it has low support. You can download it from GitHub.

xp is a tool created to make practising extreme programming easier.

Support

xp has a low active ecosystem.

It has 16 stars and 5 forks. There are no watchers for this library.

It had no major release in the last 6 months.

There are 3 open issues and 1 has been closed. On average, issues are closed in 2 days. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of xp is current.

Quality

xp has 0 bugs and 0 code smells.

Security

xp has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

xp code analysis shows 0 unresolved vulnerabilities.

There are 0 security hotspots that need review.

License

xp does not have a standard license declared.

Check the repository for any license declaration and review the terms closely.

Without a license, all rights are reserved, and you cannot use the library in your applications.

Reuse

xp releases are not available. You will need to build from source code and install.

Installation instructions, examples and code snippets are available.

It has 837 lines of code, 29 functions and 3 files.

It has high code complexity. Code complexity directly impacts maintainability of the code.

xp Key Features

No Key Features are available at this moment for xp.

xp Examples and Code Snippets

No Code Snippets are available at this moment for xp.

Community Discussions

Trending Discussions on xp

Why does gcc -march=znver1 restrict uint64_t vectorization?

Genera: Unlocking a Package

Easy way of managing the recycling of C++ STL vectors of POD types

What is the correct intrinsic sequence to do PSRLDQ to an XMM register while keeping the YMM part unchanged?

Parse multiple element XML values into a R dataframe

Cannot run Carlini and Wagner Attack using foolbox on a tensorflow Model

Remove duplicates based on few columns and keep numeric value if any and keep NA if there is no numeric in R

how to populate the upper part of a semicircle?

How to compile C++ app for Windows XP in MSVS?

Snowflake SQL Query taking too much time to run when trying to apply multiple joins

QUESTION

Why does gcc -march=znver1 restrict uint64_t vectorization?

Asked 2022-Apr-10 at 02:47

I'm trying to make sure gcc vectorizes my loops. It turns out, that by using -march=znver1 (or -march=native) gcc skips some loops even though they can be vectorized. Why does this happen?

In this code, the second loop, which multiplies each element by a scalar is not vectorised:

...

ANSWER

Answered 2022-Apr-10 at 02:47

The default -mtune=generic has -mprefer-vector-width=256, and -mavx2 doesn't change that.

znver1 implies -mprefer-vector-width=128, because that's the native width of the hardware. An instruction using 32-byte YMM vectors decodes to at least 2 uops, more if it's a lane-crossing shuffle. For simple vertical SIMD like this, 32-byte vectors would be ok; the pipeline handles 2-uop instructions efficiently. (And I think it is 6 uops wide but only 5 instructions wide, so max front-end throughput isn't available using only 1-uop instructions.) But when vectorization would require shuffling, e.g. with arrays of different element widths, GCC code-gen can get messier with 256-bit or wider.

And vmovdqa ymm0, ymm1 mov-elimination only works on the low 128-bit half on Zen1. Also, normally using 256-bit vectors would imply one should use vzeroupper afterwards, to avoid performance problems on other CPUs (but not Zen1).

I don't know how Zen1 handles misaligned 32-byte loads/stores where each 16-byte half is aligned but in separate cache lines. If that performs well, GCC might want to consider increasing the znver1 -mprefer-vector-width to 256. But wider vectors means more cleanup code if the size isn't known to be a multiple of the vector width.

Ideally GCC would be able to detect easy cases like this and use 256-bit vectors there. (Pure vertical, no mixing of element widths, constant size that's a multiple of 32 bytes.) At least on CPUs where that's fine: znver1, but not bdver2 for example where 256-bit stores are always slow due to a CPU design bug.

You can see the result of this choice in the way it vectorizes your first loop, the memset-like loop, with a vmovdqu [rdx], xmm0. https://godbolt.org/z/E5Tq7Gfzc

So given that GCC has decided to only use 128-bit vectors, which can only hold two uint64_t elements, it (rightly or wrongly) decides it wouldn't be worth using vpsllq / vpaddq to implement qword *5 as (v<<2) + v, vs. doing it with integer in one LEA instruction.
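To make the identity concrete, here is a minimal Python sketch (not GCC output) that checks multiplying by 5 against the shift-and-add form under 64-bit wraparound. The helper names and the `MASK64` constant are assumptions introduced for illustration.

```python
# Emulate uint64_t arithmetic: Python ints are unbounded, so mask to 64 bits.
MASK64 = (1 << 64) - 1

def times5_mul(v: int) -> int:
    """Reference: plain multiply by 5, wrapped to 64 bits."""
    return (v * 5) & MASK64

def times5_shift_add(v: int) -> int:
    """Same value as (v << 2) + v, which is what a shift plus an add computes."""
    return ((v << 2) + v) & MASK64

# A few values, including ones that overflow 64 bits when multiplied by 5.
samples = [0, 1, 7, 2**63, MASK64, 0x0123456789ABCDEF]
assert all(times5_mul(v) == times5_shift_add(v) for v in samples)
```

The equality holds for every input because 5 has only two set bits (4 + 1), so the product is one shifted copy plus the original.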

Almost certainly wrongly in this case, since it still requires a separate load and store for every element or pair of elements. (And loop overhead since GCC's default is not to unroll except with PGO, -fprofile-use. SIMD is like loop unrolling, especially on a CPU that handles 256-bit vectors as 2 separate uops.)

I'm not sure exactly what GCC means by "not vectorized: unsupported data-type". x86 doesn't have a SIMD uint64_t multiply instruction until AVX-512, so perhaps GCC assigns it a cost based on the general case of having to emulate it with multiple 32x32 => 64-bit pmuludq instructions and a bunch of shuffles. And it's only after it gets over that hump that it realizes that it's actually quite cheap for a constant like 5 with only 2 set bits?
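For a sense of what the general-case emulation involves, the following Python sketch builds a wrapping 64-bit multiply from three widening 32x32 => 64-bit multiplies, the primitive that pmuludq provides per lane. It only models the arithmetic; the function names are hypothetical and it says nothing about how GCC's cost model actually works.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def mul32x32(a: int, b: int) -> int:
    """Widening multiply of the low 32 bits of each operand (64-bit result)."""
    return (a & MASK32) * (b & MASK32)

def mul64_emulated(a: int, b: int) -> int:
    """Low 64 bits of a * b using only 32x32 => 64-bit multiplies."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    low = mul32x32(a_lo, b_lo)
    # The two cross terms land in the upper half; a_hi * b_hi overflows entirely.
    cross = mul32x32(a_lo, b_hi) + mul32x32(a_hi, b_lo)
    return (low + (cross << 32)) & MASK64

for a, b in [(5, 7), (MASK64, 5), (0x0123456789ABCDEF, 0xFEDCBA9876543210)]:
    assert mul64_emulated(a, b) == (a * b) & MASK64
```

In SIMD code the high halves have to be moved into the low 32 bits of each lane first, which is where the extra shuffles or shifts come from.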

That would explain GCC's decision-making process here, but I'm not sure it's exactly the right explanation. Still, these kinds of factors are what happen in a complex piece of machinery like a compiler. A skilled human can easily make smarter choices, but compilers just do sequences of optimization passes that don't always consider the big picture and all the details at the same time.

-mprefer-vector-width=256 doesn't help: Not vectorizing uint64_t *= 5 seems to be a GCC9 regression

(The benchmarks in the question confirm that an actual Zen1 CPU gets a nearly 2x speedup, as expected from doing 2x uint64 in 6 uops vs. 1x in 5 uops with scalar. Or 4x uint64_t in 10 uops with 256-bit vectors, including two 128-bit stores which will be the throughput bottleneck along with the front-end.)

Even with -march=znver1 -O3 -mprefer-vector-width=256, we don't get the *= 5 loop vectorized with GCC9, 10, or 11, or current trunk. As you say, we do with -march=znver2. https://godbolt.org/z/dMTh7Wxcq

We do get vectorization with those options for uint32_t (even leaving the vector width at 128-bit). Scalar would cost 4 operations per vector uop (not instruction), regardless of 128 or 256-bit vectorization on Zen1, so this doesn't tell us whether *= is what makes the cost-model decide not to vectorize, or just the 2 vs. 4 elements per 128-bit internal uop.

With uint64_t, changing to arr[i] += arr[i]<<2; still doesn't vectorize, but arr[i] <<= 1; does. (https://godbolt.org/z/6PMn93Y5G). Even arr[i] <<= 2; and arr[i] += 123 in the same loop vectorize, to the same instructions that GCC thinks aren't worth it for vectorizing *= 5, just different operands, constant instead of the original vector again. (Scalar could still use one LEA). So clearly the cost-model isn't looking as far as final x86 asm machine instructions, but I don't know why arr[i] += arr[i] would be considered more expensive than arr[i] <<= 1; which is exactly the same thing.

GCC8 does vectorize your loop, even with 128-bit vector width: https://godbolt.org/z/5o6qjc7f6

Source https://stackoverflow.com/questions/71811588

QUESTION

Genera: Unlocking a Package

Asked 2022-Feb-22 at 10:55

There isn't a Genera topic on stackoverflow, but I thought I'd take the chance that one of the (probably) 5 people in the world using it might be here; no harm in trying.

I've run into the situation where a few of the systems I'm working with use pretty printing, which is not implemented on Genera. I've managed to work around the problem in my own system by using the predecessor of pretty printing, XP. Looking at the code in xp-code.lisp and comparing it to that in CCL, it's clear where CCL got its pretty printing functions from.

One solution, now proving inadequate, is to have a top-level eval that does an (xp::install :package my-package) and resume from the redefinition warnings. The problem is that when one of the third-party systems is compiled, they too complain about pretty printing features that are not implemented, so I'd have to install XP in each of these other packages that want pretty printing.

What really needs to happen is for XP to be installed in the common-lisp package, because all of these other systems are going to :use :cl and expect to have a fully functional pretty printing system.

That's not so easy though; the CL package is locked and each XP symbol requires multiple confirms, and a type 'yes', to get it into the CL package. The documentation for External-only Packages and Locking suggests that:

To set up an external-only package, it can be temporarily unlocked and then the desired set of symbols interned in it

but nowhere does it say how to unlock a package, and the Document Examiner isn't turning up much.

I also have to stop and wonder if I'm barking up the wrong tree. XP was written with Genera in mind, and there are conditionalisations in the code for the platform. It shouldn't be so hard to install using the install function; and I wonder if I'm missing something obvious.

Does anyone out there know how to unlock the CL package, or the proper way to install XP in Genera? The included instructions for XP appear to be out of date.

...

ANSWER

Answered 2022-Feb-22 at 10:55

I figured it out:

Source https://stackoverflow.com/questions/69265032

QUESTION

Easy way of managing the recycling of C++ STL vectors of POD types

Asked 2022-Jan-26 at 06:29

My application consists of calling dozens of functions millions of times. In each of those functions, one or a few temporary std::vector containers of POD (plain old data) types are initialized, used, and then destructed. By profiling my code, I find the allocations and deallocations lead to a huge overhead.

A lazy solution is to rewrite all the functions as functors containing those temporary buffer containers as class members. However this would blow up the memory consumption as the functions are many and the buffer sizes are not trivial.

A better way is to analyze the code, gather all the buffers, premeditate how to maximally reuse them, and feed a minimal set of shared buffer containers to the functions as arguments. But this can be too much work.

I want to solve this problem once for all my future development during which temporary POD buffers become necessary, without having to have much premeditation. My idea is to implement a container port, and take the reference to it as an argument for every function that may need temporary buffers. Inside those functions, one should be able to fetch containers of any POD type from the port, and the port should also auto-recall the containers before the functions return.

...

ANSWER

Answered 2022-Jan-20 at 17:21

Let me frame this by saying I don't think there's an "authoritative" answer to this question. That said, you've provided enough constraints that a suggested path is at least worthwhile. Let's review the requirements:

Solution must use std::vector. This is in my opinion the most unfortunate requirement for reasons I won't get into here.
Solution must be standards compliant and not resort to rule violations, like the strict aliasing rule.
Solution must either reduce the number of allocations performed, or reduce the overhead of allocations to the point of being negligible.

In my opinion this is definitely a job for a custom allocator. There are a couple of off-the-shelf options that come close to doing what you want, for example the Boost Pool Allocators. The one you're most interested in is boost::pool_allocator. This allocator will create a singleton "pool" for each distinct object size (note: not object type), which grows as needed, but never shrinks until you explicitly purge it.

The main difference between this and your solution is that you'll have distinct pools of memory for objects of different sizes, which means it will use more memory than your posted solution, but in my opinion this is a reasonable trade-off. To be maximally efficient, you could simply start a batch of operations by creating vectors of each needed type with an appropriate size. All subsequent vector operations which use these allocators will do trivial O(1) allocations and deallocations. Roughly in pseudo-code:
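The per-size pooling idea described above can be sketched in Python. This is not Boost's API and not the answer's original code; `SizePool` and its methods are hypothetical names used only to show one free list per buffer size that grows on demand and shrinks only on an explicit purge.

```python
from collections import defaultdict

class SizePool:
    """Hypothetical free-list pool: one list of reusable buffers per byte size."""

    def __init__(self) -> None:
        self._free = defaultdict(list)   # size in bytes -> idle buffers
        self.fresh_allocations = 0       # how often we really had to allocate

    def acquire(self, nbytes: int) -> bytearray:
        bucket = self._free[nbytes]
        if bucket:
            return bucket.pop()          # O(1) reuse, no new allocation
        self.fresh_allocations += 1
        return bytearray(nbytes)

    def release(self, buf: bytearray) -> None:
        self._free[len(buf)].append(buf) # keep it for the next caller

    def purge(self) -> None:
        self._free.clear()               # the only point where memory shrinks

pool = SizePool()
for _ in range(1000):
    a = pool.acquire(4096)
    b = pool.acquire(256)
    pool.release(a)
    pool.release(b)
```

After the loop, `pool.fresh_allocations` is 2: one real allocation per distinct size, with every later request served from the pool.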

Source https://stackoverflow.com/questions/70765195

QUESTION

What is the correct intrinsic sequence to do PSRLDQ to an XMM register while keeping the YMM part unchanged?

Asked 2022-Jan-24 at 06:11

Assuming xmm0 is the first argument, this is the kind of code I want to produce.

...

ANSWER

Answered 2022-Jan-23 at 18:07

I was a fool. clang gave me an answer, and why didn't I notice it?

Source https://stackoverflow.com/questions/70823724

QUESTION

Parse multiple element XML values into a R dataframe

Asked 2022-Jan-23 at 23:09

I have an XML like:

...

ANSWER

Answered 2022-Jan-23 at 23:09

Here is a solution, I think, using xml2 and a quick, verbose tidyverse approach.

It is not always easy to write compact code when importing nested XML data.

Source https://stackoverflow.com/questions/70825880

QUESTION

Cannot run Carlini and Wagner Attack using foolbox on a tensorflow Model

Asked 2022-Jan-13 at 12:24

I am using the latest version of foolbox (3.3.1), and my code simply loads a ResNet-50 CNN, adds some layers for a transfer learning application, and loads the weights as follows.

...

ANSWER

Answered 2021-Nov-23 at 12:13

I think you might have mixed up the parameters of the L2CarliniWagnerAttack. Here is a simplified working example with dummy data:

Source https://stackoverflow.com/questions/70078251

QUESTION

Remove duplicates based on few columns and keep numeric value if any and keep NA if there is no numeric in R

Asked 2022-Jan-09 at 12:52

I have a dataframe below and I want to remove duplicates based on columns country and year, and keep the non NA values for the columns 3 to the last column. If all rows within (country, year) are NA, the value for the row should be an NA as well.

...

ANSWER

Answered 2022-Jan-09 at 12:50

A possible solution, using all(is.na(.x)) to detect when all elements inside the grouped column are NA:
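The grouping rule from the question can also be sketched in plain Python, with `None` standing in for NA. This is not the answer's R code; the function name and the choice to keep the first non-missing value per column are assumptions made for illustration.

```python
def collapse_duplicates(rows, key_cols, value_cols):
    """Merge rows sharing the same key; each value column keeps its first
    non-None entry, or stays None if every row in the group is None."""
    merged = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key not in merged:
            merged[key] = {c: row[c] for c in key_cols}
            merged[key].update({c: None for c in value_cols})
        target = merged[key]
        for c in value_cols:
            if target[c] is None and row[c] is not None:
                target[c] = row[c]
    return list(merged.values())

rows = [
    {"country": "A", "year": 2000, "gdp": None, "pop": 5},
    {"country": "A", "year": 2000, "gdp": 3.2, "pop": None},
    {"country": "B", "year": 2001, "gdp": None, "pop": None},
    {"country": "B", "year": 2001, "gdp": None, "pop": None},
]
result = collapse_duplicates(rows, ["country", "year"], ["gdp", "pop"])
```

Here the two rows for country A merge into one row with both values filled, while country B keeps `None` in both columns because no row in that group has a value.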

Source https://stackoverflow.com/questions/70641340

QUESTION

how to populate the upper part of a semicircle?

Asked 2021-Dec-18 at 17:52

So I'm trying to make an interactive button that would be placed at the bottom of my screen; when clicked, a semicircle is created around it. It has buttons inside, so it's kind of a navigation menu. I'm now struggling with the math behind it. Right now, the buttons are distributed all around the circle, but I want them to be placed only in the upper part of my semicircle. This is the code I have so far:

...

ANSWER

Answered 2021-Dec-18 at 17:52

Both the top and left calculations need adjusting so there isn't so much movement:
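The underlying math can be sketched independently of the original code, which is not shown here. The Python below spreads the angle over half a turn (0 to pi) instead of a full turn, and subtracts the sine term because screen y grows downward. The function name and parameters are assumptions for illustration.

```python
import math

def upper_semicircle_positions(n, cx, cy, radius):
    """Return n (left, top) points evenly spaced over the upper half of a
    circle centred at (cx, cy), in screen coordinates (y grows downward)."""
    if n == 1:
        return [(cx, cy - radius)]                # single item: straight up
    points = []
    for i in range(n):
        theta = math.pi * i / (n - 1)             # 0 .. pi, not 0 .. 2*pi
        left = cx + radius * math.cos(theta)
        top = cy - radius * math.sin(theta)       # minus: up on screen
        points.append((left, top))
    return points

positions = upper_semicircle_positions(5, cx=100, cy=200, radius=50)
```

Every returned `top` is at or above `cy`, so no button lands in the lower half.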

Source https://stackoverflow.com/questions/70403917

QUESTION

How to compile C++ app for Windows XP in MSVS?

Asked 2021-Dec-10 at 06:48

As I read this article, it is enough to download most recent MSVS 2022 and then install toolset C++ Windows XP Support for VS 2017 (v141) tools [Deprecated].

After that in Visual Studio inside project properties I set this toolset. According to linked article it is enough to compile C++ app with XP support.

But after my .exe file is created if I run it on XP 64-bit SP2 then it shows error that CompareStringEx function is not found in KERNEL32.DLL.

Hence it appears that it is not enough to use this toolset. Something else is needed.

In some other places I see that one also needs to add the define /D_USING_V110_SDK71_ when compiling and the option /SUBSYSTEM:CONSOLE,5.01 when linking. In my project properties I also tried to add these two options, but CompareStringEx is still in the import table of the final application.

As suggested by @BenVoigt, I did defines /DWINVER=0x0502 /D_WIN32_WINNT=0x0502. Also set C++ standard to /std:c++14 (I would set C++11 but this MSVS version allows to set only C++14 at minimum). Still some non-XP symbols remain in final EXE like InitializeSRWLock that is possibly used by C++11's std::mutex in my code.

Does anyone know everything that is needed in order to compile a fully XP-compatible application?

Update. I managed to build working XP application by doing things above plus setting C++ CRT runtime to Multi Threaded DLL, i.e. using dynamic DLL linkage of CRT. Also as suggested by @ChuckWalbourn, I downloaded older version of msvcp140.dll.

But it is very important for my project to have statically linked runtime (C++ CRT), i.e. use Multi Threaded value for Runtime field in project properties. Only if it is REALLY not possible only then I will use DLL CRT. Until then solution about how to link CRT statically are welcome, of course to produce XP-compatible EXE.

...

ANSWER

Answered 2021-Dec-10 at 06:48

TL;DR For Windows XP VC++ REDIST support, install https://aka.ms/vs/15/release/VC_redist.x86.exe on your Windows XP system

-or-

if you are doing "side-by-side application local deployment", then use the DLLs from C:\Program Files\Microsoft Visual Studio\2022\\VC\Redist\MSVC\14.16.27012\x86\Microsoft.VC141.CRT.

If you want the latest bug fixes to the CRT, you can also download the REDIST for VS 2019 (16.7) per the link on Microsoft Docs.

For Windows XP targeting, you use the v141_xp Platform Toolset installed by Visual Studio (VS 2017, VS 2019, or VS 2022) which is the latest VS 2017 (v141) C++ compiler using an included Windows 7.1A SDK.

Make sure you have installed (for VS 2022) the following individual components since you are using MFC:

Microsoft.VisualStudio.Component.WinXP: C++ Windows XP Support for VS 2017 (v141) tools [Deprecated]
Microsoft.VisualStudio.Component.VC.v141.x86.x64: MSVC v141 - VS 2017 C++ x64/x86 build tools (v14.16)
Microsoft.VisualStudio.Component.VC.v141.MFC: C++ MFC for v141 build tools (x86 & x64)

If you are doing DirectX development, be sure to read this blog post as well for various implications of using the Windows 7.1A SDK.

For deployment to Windows XP, you can install the latest VS 2017 Visual C++ REDIST or use VS 2019 Visual C++ up to VS 2019 (16.7). After that the REDIST DLLs themselves are not compatible with Windows XP.

On your development system with VS 2022 installed, you are going to have a newer set of Visual C++ REDIST files which are binary compatible with your v141_xp Platform Toolset built EXE, but those VC++ REDIST DLLs are not compatible with Windows XP.

IOW: If you look at a dumpbin /imports of the 14.30 (v143 version), 14.29 (v142 latest version), and/or 14.16 (v141 latest version ) copies of msvcp140.dll you will see different imports. The msvcp140.dll sitting in your C:\windows\SysWOW64 folder is going to be the 14.30 version.

Source https://stackoverflow.com/questions/70263892

QUESTION

Snowflake SQL Query taking too much time to run when trying to apply multiple joins

Asked 2021-Nov-26 at 23:40

I am trying to work with a SQL query on Snowflake where I join multiple tables, but my query is taking forever to run. I am not sure whether it's my query or maybe I have taken the wrong approach.

I have these below tables in snowflake -

1)RR_SUMM, 2) YY_TEXTENTR, 3) KK_SUBEVT, 4) LG_NBETR, 5) XX_RPOPO, 6) VV_KJIU, 7) LL_JJHHHIP, 8) UU_GHGGHJ,
9) QQ_BHBHGGG, 10) TT_HJHHSY

So RR_SUMM is my primary table

and each table consists of a common column labelled as "_ID"

My Goal is to join all the other 9 tables with primary table RR_SUMM using _ID column

as I am trying to extract and combine some of the fields from each table with the primary table.

I am following the approach of applying left outer join to combine all the other tables with primary table RR_SUMM

But my approach is taking forever to run as most of the tables are of around 25 GB in size.

SQL query Which I have written in SNOWFLAKE is below-

...

ANSWER

Answered 2021-Nov-26 at 21:12

At one level GROUPING by all columns is the same as DISTINCT.

But given you are rolling it all up, to only get one of each, you can push the DISTINCTs lower into the queries, and the joins should then not have duplicate values
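Both claims can be checked on a toy dataset. The sketch below uses Python's built-in sqlite3 rather than Snowflake, and the two small tables with columns `a` and `b` are hypothetical stand-ins for the tables in the question; it only demonstrates the equivalence of the result sets, not Snowflake's performance behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rr_summ     (_id INTEGER, a TEXT);
CREATE TABLE yy_textentr (_id INTEGER, b TEXT);
INSERT INTO rr_summ     VALUES (1,'x'), (1,'x'), (2,'y');
INSERT INTO yy_textentr VALUES (1,'p'), (1,'p'), (1,'q'), (2,'r');
""")

# GROUP BY every selected column ...
grouped = conn.execute("""
    SELECT r._id, r.a, t.b
    FROM rr_summ r LEFT JOIN yy_textentr t ON r._id = t._id
    GROUP BY r._id, r.a, t.b ORDER BY 1, 2, 3
""").fetchall()

# ... returns the same rows as SELECT DISTINCT.
distinct = conn.execute("""
    SELECT DISTINCT r._id, r.a, t.b
    FROM rr_summ r LEFT JOIN yy_textentr t ON r._id = t._id
    ORDER BY 1, 2, 3
""").fetchall()

# Deduplicating each input first means the join never multiplies duplicates.
pushed_down = conn.execute("""
    SELECT r._id, r.a, t.b
    FROM (SELECT DISTINCT _id, a FROM rr_summ) r
    LEFT JOIN (SELECT DISTINCT _id, b FROM yy_textentr) t ON r._id = t._id
    ORDER BY 1, 2, 3
""").fetchall()

assert grouped == distinct == pushed_down
```

With the DISTINCT pushed into the subqueries, the join works on fewer rows and no final deduplication step is needed for these columns.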

Source https://stackoverflow.com/questions/70118655

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install xp

The simplest way to install xp in your dev environment is:

...

brew will be added as an option at a later date.

Support

For any new features, suggestions and bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.
