gdrcopy | fast GPU memory copy library based on NVIDIA GPUDirect RDMA | GPU library
kandi X-RAY | gdrcopy Summary
While GPUDirect RDMA is meant for direct access to GPU memory from third-party devices, the same APIs can be used to create perfectly valid CPU mappings of GPU memory. The advantage of a CPU-driven copy is its very small overhead, which can be useful when low latencies are required.
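To make the CPU-mapping idea concrete, here is a minimal sketch of how the user-space API declared in gdrapi.h is typically used for a low-latency, CPU-driven write into GPU memory. It is a sketch only: error checking and page-alignment handling are omitted, and the calls (gdr_open, gdr_pin_buffer, gdr_map, gdr_copy_to_mapping) should be checked against the installed header.

// Minimal sketch: CPU-driven copy into GPU memory through a gdrcopy mapping.
// Assumes the gdrdrv kernel module is loaded; error handling omitted.
#include <cuda_runtime.h>
#include <gdrapi.h>

int main(void)
{
    const size_t size = GPU_PAGE_SIZE;          // keep the buffer one GPU page for simplicity
    void *d_buf;
    cudaMalloc(&d_buf, size);                   // GPU buffer to be mapped into the CPU address space

    gdr_t g = gdr_open();                       // open the gdrdrv device
    gdr_mh_t mh;
    gdr_pin_buffer(g, (unsigned long)d_buf, size, 0, 0, &mh);   // pin the GPU pages

    void *map_ptr;
    gdr_map(g, mh, &map_ptr, size);             // create the CPU mapping of the GPU buffer

    char host_data[64] = "hello from the CPU";
    gdr_copy_to_mapping(mh, map_ptr, host_data, sizeof(host_data));  // small, low-latency CPU-driven write

    gdr_unmap(g, mh, map_ptr, size);
    gdr_unpin_buffer(g, mh);
    gdr_close(g);
    cudaFree(d_buf);
    return 0;
}

A real application would round the pinned address down to GPU_PAGE_SIZE alignment and adjust offsets accordingly, and would link against libgdrapi and the CUDA runtime.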
Community Discussions
Trending Discussions on gdrcopy
QUESTION
I'm using OpenMPI and I need to enable CUDA-aware MPI. Together with MPI I'm using OpenACC with the hpc_sdk software.
Following https://www.open-mpi.org/faq/?category=buildcuda, I downloaded and installed UCX (not gdrcopy, which I haven't managed to install) with
./contrib/configure-release --with-cuda=/opt/nvidia/hpc_sdk/Linux_x86_64/20.7/cuda/11.0 CC=pgcc CXX=pgc++ --disable-fortran
and it prints:
...

ANSWER
Answered 2020-Oct-09 at 20:15
This was an issue in the 20.7 release when adding UCX support. You can lower the optimization level to -O1 to work around the problem, or update your NV HPC compiler version to 20.9, where we've resolved the issue.
https://developer.nvidia.com/nvidia-hpc-sdk-version-209-downloads
QUESTION
I'm trying to get an MPI-CUDA program working with MVAPICH and CUDA 8. I previously ran the program successfully with OpenMPI, but I want to test whether I get better performance with MVAPICH. Unfortunately, with MVAPICH the program gets stuck in MPI_Isend if a CUDA kernel is running at the same time.
I downloaded MVAPICH2-2.2 and built it from source with the configuration flags
--enable-cuda --disable-mcast
to enable MPI calls on CUDA memory. mcast was disabled because I could not compile MVAPICH2 without that flag.
I used the following flags before running the application:
...

ANSWER
Answered 2017-Mar-20 at 16:11
I got back to this problem and used gdb to debug the code.
Apparently, the problem is the eager protocol of MVAPICH2, implemented in src/mpid/ch3/channels/mrail/src/gen2/ibv_send.c. The eager protocol uses a synchronous cudaMemcpy, which blocks until the kernel execution finishes.
The program posted in the question runs fine when MV2_IBA_EAGER_THRESHOLD 1 is passed to mpirun. This prevents MPI from using the eager protocol and makes it fall back to the rendezvous protocol instead.
Patching the MVAPICH2 source code solves the problem as well. I changed the synchronous cudaMemcpy calls to cudaMemcpyAsync in the files
- src/mpid/ch3/channels/mrail/src/gen2/ibv_send.c
- src/mpid/ch3/channels/mrail/src/gen2/ibv_recv.c
- src/mpid/ch3/src/ch3u_request.c
The change in the third file is only needed for MPI_Isend/MPI_Irecv. Other MPI functions might need some additional code changes.
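The distinction between the blocking and non-blocking copy paths can be illustrated with a small standalone CUDA program. This is not the MVAPICH2 code, only a sketch of the behavior described above: a synchronous cudaMemcpy issued while a kernel is running on the default stream makes the host wait for the kernel, while a cudaMemcpyAsync on a separate non-blocking stream does not.

// Illustration (not MVAPICH2 source): synchronous vs. asynchronous host-to-device copies
// while a kernel is busy on the default stream.
#include <cuda_runtime.h>

__global__ void busy_kernel(float *d, int n)
{
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        d[i] += 1.0f;
}

int main(void)
{
    const int n = 1 << 20;
    float *d_work, *d_dst, *h_src;
    cudaMalloc(&d_work, n * sizeof(float));
    cudaMalloc(&d_dst, n * sizeof(float));
    cudaMallocHost(&h_src, n * sizeof(float));        // pinned host memory, required for true async copies

    cudaStream_t copy_stream;
    cudaStreamCreateWithFlags(&copy_stream, cudaStreamNonBlocking);

    busy_kernel<<<1, 256>>>(d_work, n);               // kernel running on the default stream

    // Blocking path: serializes with the default stream, so the host waits for the kernel, then the copy.
    // cudaMemcpy(d_dst, h_src, n * sizeof(float), cudaMemcpyHostToDevice);

    // Non-blocking path (the general idea behind the cudaMemcpyAsync patch described above):
    cudaMemcpyAsync(d_dst, h_src, n * sizeof(float), cudaMemcpyHostToDevice, copy_stream);
    cudaStreamSynchronize(copy_stream);               // wait only for the copy, not for the default stream

    cudaStreamDestroy(copy_stream);
    cudaFreeHost(h_src);
    cudaFree(d_dst);
    cudaFree(d_work);
    return 0;
}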
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported