nccl | Optimized primitives for collective multi-GPU communication
kandi X-RAY | nccl Summary
NCCL (pronounced "Nickel") is a stand-alone library of standard communication routines for GPUs, implementing all-reduce, all-gather, reduce, broadcast, and reduce-scatter, as well as any send/receive based communication pattern. It has been optimized to achieve high bandwidth on platforms using PCIe, NVLink, and NVSwitch, as well as networking using InfiniBand Verbs or TCP/IP sockets. NCCL supports an arbitrary number of GPUs installed in a single node or across multiple nodes, and can be used in either single- or multi-process (e.g., MPI) applications. For more information on NCCL usage, please refer to the NCCL documentation.
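As a quick illustration of the kind of routine NCCL provides, here is a minimal sketch (not taken from the nccl repository itself; it assumes a machine with two CUDA GPUs and PyTorch installed) that performs an all-reduce through PyTorch's NCCL backend:

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    os.environ['MASTER_ADDR'] = '127.0.0.1'
    os.environ['MASTER_PORT'] = '29500'
    # 'nccl' selects NCCL as the communication backend
    dist.init_process_group('nccl', rank=rank, world_size=world_size)
    t = torch.ones(4, device=f'cuda:{rank}') * (rank + 1)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)  # NCCL all-reduce across GPUs
    print(rank, t)  # every rank now holds [3., 3., 3., 3.]
    dist.destroy_process_group()

if __name__ == '__main__':
    mp.spawn(worker, args=(2,), nprocs=2)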
nccl Examples and Code Snippets
These appear to be signature excerpts from TensorFlow's collective ops (tensorflow/python/ops/collective_ops.py), which are backed by NCCL on GPUs:

def broadcast_send(t,
                   shape,
                   dtype,
                   group_size,
                   group_key,
                   instance_key,
                   communication_hint='auto',
                   timeout=0):
    ...

def all_reduce_v2(t,
                  group_size,
                  group_key,
                  instance_key,
                  merge_op='Add',
                  final_op='Id',
                  communication_hint='auto',
                  timeout=0):
    ...

def all_reduce(t,
               group_size,
               group_key,
               instance_key,
               merge_op='Add',
               final_op='Id',
               subdiv_offsets=(0,),
               communication_hint='auto',
               timeout=0):
    ...
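For context, a minimal sketch of how all_reduce might be invoked (assuming two visible GPUs; collective_ops is a non-public TensorFlow module, so treat this as illustrative rather than canonical):

import tensorflow as tf
from tensorflow.python.ops import collective_ops

@tf.function
def two_gpu_sum():
    results = []
    for i in range(2):
        with tf.device(f'/gpu:{i}'):
            t = tf.constant([1.0, 2.0]) * (i + 1)
            results.append(collective_ops.all_reduce(
                t,
                group_size=2,    # two participating devices
                group_key=1,     # identifies the group of devices
                instance_key=1,  # identifies this particular reduction
                merge_op='Add',  # sum the per-device tensors
                final_op='Id'))  # return the sum unchanged
    return results  # each entry should be [3.0, 6.0]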
Community Discussions
Trending Discussions on nccl
QUESTION
When I use nvprof on an NCCL problem with --metrics all, the profiling results always come back looking like
...ANSWER
Answered 2021-May-28 at 15:37 That behavior is expected. The events and metrics that are gathered by default pertain to CUDA device code activity. To see something that might be instructive, try profiling with the --print-gpu-trace switch (and remove --metrics all).
The documented "metrics" don't apply to the operations (data copying) that NCCL is doing. They apply to CUDA kernels (i.e. CUDA device code activity).
nvprof does seem to have metrics that can be collected for NVLink activity. To see these, on an applicable system (e.g. one that has NVLink), run a command such as:
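Presumably something along these lines (an assumption, since the exact command is not preserved): nvprof's --query-metrics switch lists the metrics available on the machine, which can then be filtered for the NVLink ones:

nvprof --query-metrics | grep -i nvlink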
QUESTION
I've been trying to run RAPIDS on Google Colab pro, and have successfully installed the cuml and cudf packages, however I am unable to run even the example scripts.
TL;DR: Any time I try to run a cuml fit function on Google Colab, I get the following error. It happens when using the demo examples, both during installation and then with cuml itself, and across a range of cuml examples (I first hit it trying to run UMAP).
...ANSWER
Answered 2021-May-06 at 17:13 Colab retains cupy==7.4.0 despite conda installing cupy==8.6.0 during the RAPIDS install. It is a custom install. I just had success pip installing cupy-cuda110==8.6.0 BEFORE installing RAPIDS, with:
!pip install cupy-cuda110==8.6.0
I'll be updating the script soon so that you won't have to do it manually, but want to test a few more things out. Thanks again for letting us know!
EDIT: script updated.
QUESTION
The following error(s) and solution concern deploying a stack through YAML in Portainer, but they can equally be applied to Docker in general.
Environment:
...ANSWER
Answered 2021-Apr-13 at 05:55 It seems that by default, the size of the shared memory is limited to 64 MB. The solution to this error, as shown in this issue, is therefore to increase the size of the shared memory.
Hence, the first idea that comes to mind would be simply defining something like shm_size: 9gb in the YAML file of the stack. However, this might not work, as shown e.g. in this issue.
Therefore, in the end, I had to use the following workaround (also described here, but poorly documented):
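A sketch of that workaround (the service name and image below are hypothetical, and this is an assumption about the elided snippet): mount a tmpfs of the desired size over /dev/shm in the stack's YAML, since shm_size can be ignored when deploying a stack:

services:
  my-service:             # hypothetical service name
    image: my-image       # hypothetical image
    volumes:
      - type: tmpfs
        target: /dev/shm  # replaces the default 64 MB shared memory
        tmpfs:
          size: 9663676416  # ~9 GB, specified in bytes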
QUESTION
I'm trying to launch a training job on Google AI Platform with a custom container. As I want to use GPUs for the training, the base image I've used for my container is:
...ANSWER
Answered 2021-Mar-11 at 01:05The suggested way to build the most reliable container is to use the officially maintained 'Deep Learning Containers'. I would suggest pulling 'gcr.io/deeplearning-platform-release/tf2-gpu.2-4'. This should already have CUDA, CUDNN, GPU Drivers, and TF 2.4 installed & tested. You'll just need to add your code into it.
- https://cloud.google.com/ai-platform/deep-learning-containers/docs/choosing-container
- https://console.cloud.google.com/gcr/images/deeplearning-platform-release?project=deeplearning-platform-release
- https://cloud.google.com/ai-platform/deep-learning-containers/docs/getting-started-local#create_your_container
QUESTION
I am trying to get a free port during DDP initialization in PyTorch. However, my code gets stuck. The following snippet reproduces the problem:
...ANSWER
Answered 2021-Feb-25 at 00:32 The answer is derived from here. In detail: 1. since each process generates its own free port, the ports end up different across processes; 2. instead, we can get a free port once at the beginning and pass it to all processes.
The corrected snippet:
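A sketch of that approach (names are hypothetical; the point is that the port is chosen once in the parent process and shared by every rank):

import socket
import torch.distributed as dist
import torch.multiprocessing as mp

def find_free_port():
    # Bind to port 0 so the OS assigns a free port, then report it.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(('', 0))
        return s.getsockname()[1]

def worker(rank, world_size, port):
    dist.init_process_group(
        backend='nccl',
        init_method=f'tcp://127.0.0.1:{port}',  # same port for all ranks
        rank=rank,
        world_size=world_size)
    # ... training code ...
    dist.destroy_process_group()

if __name__ == '__main__':
    world_size = 2
    port = find_free_port()  # chosen once, before spawning
    mp.spawn(worker, args=(world_size, port), nprocs=world_size)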
QUESTION
ENVIRONMENT
- followed guide - https://github.com/rapidsai-community/notebooks-contrib/blob/branch-0.14/intermediate_notebooks/E2E/synthetic_3D/rapids_ml_workflow_demo.ipynb
conda create -n rapids-0.16 -c rapidsai -c nvidia -c conda-forge -c defaults rapids=0.16 python=3.7 cudatoolkit=10.2
- AWS EC2: Deep Learning AMI (Ubuntu 18.04) Version 36.0 - ami-063585f0e06d22308: MXNet-1.7.0, TensorFlow-2.3.1, 2.1.0 & 1.15.3, PyTorch-1.4.0 & 1.7.0, Neuron, & others. NVIDIA CUDA, cuDNN, NCCL, Intel MKL-DNN, Docker, NVIDIA-Docker & EFA support. For fully managed experience, check: https://aws.amazon.com/sagemaker
- AWS EC2 instance - g4dn.4xlarge - 16GB VRAM, 64 GB RAM
CODE
- I am just trying to build a training set and a test set for the model
- 1st data package -
train_data = xgboost.DMatrix(data=X_train, label=y_train)
Up to this point, running just this line and doing training and anything else with it does not give an error message. - 2nd data package -
test_data = xgboost.DMatrix(data=X_test, label=y_test)
This comes a couple of cells further down; the two lines are not executed together.
Side Note
- The byte counts in the error are NOT 30 GB or 15 GB:
- 1,539,047,424 bytes ≈ 1.5 GB,
- 3,091,258,960 bytes ≈ 3 GB,
- 3,015,442,432 bytes ≈ 3 GB,
- 3,091,258,960 bytes ≈ 3 GB.
- The GPU has 16 GB VRAM, so I don't think that this answers the question.
ERROR
...ANSWER
Answered 2020-Nov-17 at 19:17 As per this part of your error,
QUESTION
ANSWER
Answered 2020-Oct-29 at 16:28 The problem is a library incompatibility. This Docker container solved my problem:
https://github.com/Kaggle/docker-python/commit/a6ba32e0bb017a30e079cf8bccab613cd4243a5f
QUESTION
I have attached the error message because I have no idea where to start with it. I have tried updating setuptools and purging and reinstalling pip.
I am running Linux Mint 19.3 Cinnamon 4.4.8.
If anyone has experienced this problem or has any suggestions for solutions, answers are much appreciated.
...ANSWER
Answered 2020-Mar-27 at 16:44For the Python.h error, you probably need to install python3-dev (Debian/Ubuntu/Mint) or python3-devel (Fedora/CentOS/RHEL) using your operating system's package manager like apt or dnf.
For the other missing .h's, you can usually google for:
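For example, searching for the missing header name together with your distribution's name (say, "ffi.h ubuntu package") will usually point to the -dev package that provides it.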
QUESTION
I cannot install apex for distributed and FP16 training of a BERT model. I have tried to install it by cloning apex from GitHub and installing the package using pip.
I cloned apex using the following command:
git clone https://github.com/NVIDIA/apex.git
then moved into the apex directory with cd apex and tried to install the package using the following pip command:
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext"
The full code is:
...ANSWER
Answered 2019-Dec-05 at 14:36 This worked for me:
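One likely fix, judging from apex's README (an assumption, since the answer's snippet is truncated): the pip command above is missing its install target. The README runs the command from inside the cloned directory with an explicit path:

pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./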
QUESTION
It seems that from TensorFlow 1.13 onward, there is no API such as tf.contrib.nccl.all_sum. However, the NVIDIA official GitHub repository https://github.com/tkarras/progressive_growing_of_gans uses this old API to reduce-sum across different GPU devices, as follows.
...ANSWER
Answered 2020-Feb-29 at 10:16 I think the equivalent API is nccl_ops.all_sum. I have demonstrated this API with the following code.
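A sketch of that demonstration (reconstructed, since the original code is truncated; it assumes two visible GPUs and TF1-style graph execution):

import tensorflow.compat.v1 as tf
from tensorflow.python.ops import nccl_ops

tf.disable_eager_execution()

towers = []
for i in range(2):
    with tf.device('/gpu:%d' % i):
        towers.append(tf.constant([1.0, 2.0]) * (i + 1))

# all_sum takes one tensor per device and returns a list of tensors,
# each holding the element-wise sum across devices.
summed = nccl_ops.all_sum(towers)

with tf.Session() as sess:
    print(sess.run(summed))  # each entry: [3.0, 6.0]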
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install nccl
To install NCCL on the system, create a package then install it as root.
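For reference, the packaging targets in the nccl repository's makefile look like the following (the exact target names may vary between versions):

make -j src.build        # build the library itself
make pkg.debian.build    # build .deb packages (Debian/Ubuntu)
make pkg.redhat.build    # build .rpm packages (RedHat/CentOS)
make pkg.txz.build       # build an OS-agnostic tar.xz package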