HIPIFY | HIPIFY: Convert CUDA to Portable C++ Code | GPU library
kandi X-RAY | HIPIFY Summary
hipify-clang is a clang-based tool for translating CUDA sources into HIP sources. It parses the CUDA source into an abstract syntax tree, which is traversed by transformation matchers; after all matchers have been applied, the HIP output source is produced.
Trending Discussions on HIPIFY
QUESTION
There is AMD HIP C++, which is very similar to CUDA C++. AMD also created HIPIFY to convert CUDA C++ to HIP C++ (portable C++ code), which can be executed on both nVidia and AMD GPUs: https://github.com/GPUOpen-ProfessionalCompute-Tools/HIP
- There are requirements to use `shfl` operations on nVidia GPUs: https://github.com/GPUOpen-ProfessionalCompute-Tools/HIP/tree/master/samples/2_Cookbook/4_shfl#requirement-for-nvidia

requirement for nvidia
Please make sure you have a device with compute capability 3.0 or higher in order to use warp `shfl` operations, and add the `-gencode arch=compute_30,code=sm_30` nvcc flag in the Makefile while using this application.
- It is also noted that HIP supports `shfl` for the 64-wide wavefront (warp) size on AMD: https://github.com/GPUOpen-ProfessionalCompute-Tools/HIP/blob/master/docs/markdown/hip_faq.md#why-use-hip-rather-than-supporting-cuda-directly

In addition, HIP defines portable mechanisms to query architectural features, and supports a larger 64-bit wavesize which expands the return type for cross-lane functions like ballot and shuffle from 32-bit ints to 64-bit ints.
But which AMD GPUs support the `shfl` functions, or does every AMD GPU support `shfl` because on AMD GPUs it is implemented using local memory, without a hardware register-to-register instruction?
nVidia GPUs require compute capability (CUDA CC) 3.0 or higher, but what are the requirements for using `shfl` operations on AMD GPUs with HIP C++?
ANSWER
Answered 2017-Mar-06 at 14:35

Yes, there are new instructions in the GCN3 GPU architecture, such as `ds_bpermute` and `ds_permute`, which can provide functionality like `__shfl()` and even more.

These `ds_bpermute` and `ds_permute` instructions use only the routing hardware of local memory (LDS, 8.6 TB/s) but don't actually read or write local memory itself, which accelerates data exchange between threads: 8.6 TB/s < speed < 51.6 TB/s: http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/
They use LDS hardware to route data between the 64 lanes of a wavefront, but they don’t actually write to an LDS location.
- There are also Data Parallel Primitives (DPP), which are especially powerful because an operation can read the registers of neighboring workitems directly. That is, DPP can access a neighboring thread (workitem) at full speed, ~51.6 TB/s: http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/

Now, most of the vector instructions can do cross-lane reading at full throughput.
For example, the `wave_shr` instruction (wavefront shift right) can be used for a scan algorithm.
More about GCN3: https://github.com/olvaffe/gpu-docs/raw/master/amd-open-gpu-docs/AMD_GCN3_Instruction_Set_Architecture.pdf
New Instructions
- “SDWA” – Sub Dword Addressing allows access to bytes and words of VGPRs in VALU instructions.
- “DPP” – Data Parallel Processing allows VALU instructions to access data from neighboring lanes.
- DS_PERMUTE_RTN_B32, DS_BPERMUTE_RTN_B32.
...
DS_PERMUTE_B32 Forward permute. Does not write any LDS memory.
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported