nccl | Optimized primitives for collective multi-GPU communication | TCP library

 by NVIDIA | C++ | Version: v1.3.4-1 | License: Non-SPDX

kandi X-RAY | nccl Summary

nccl is a C++ library typically used in Networking and TCP applications. nccl has no reported bugs or vulnerabilities, and it has medium support. However, nccl carries a Non-SPDX license. You can download it from GitHub.

NCCL (pronounced "Nickel") is a stand-alone library of standard communication routines for GPUs, implementing all-reduce, all-gather, reduce, broadcast, reduce-scatter, as well as any send/receive based communication pattern. It has been optimized to achieve high bandwidth on platforms using PCIe, NVLink, NVSwitch, as well as networking using InfiniBand Verbs or TCP/IP sockets. NCCL supports an arbitrary number of GPUs installed in a single node or across multiple nodes, and can be used in either single- or multi-process (e.g., MPI) applications. For more information on NCCL usage, please refer to the NCCL documentation.
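NCCL's high bandwidth comes from ring-style algorithms for its collectives. As a rough illustration of the idea — a pure-Python sketch, not NCCL code — here is a ring all-reduce: the vector on each of n "ranks" is split into n chunks, and a reduce-scatter pass followed by an all-gather pass each circulates chunks around the ring for n - 1 steps.

```python
# Pure-Python sketch of a ring all-reduce (illustration only, not NCCL code).
# Each of n "ranks" owns a vector; afterwards every rank holds the
# element-wise sum of all vectors.

def ring_all_reduce(buffers):
    n = len(buffers)                      # number of "ranks" in the ring
    length = len(buffers[0])
    assert length % n == 0, "sketch assumes the vector splits evenly"
    chunk = length // n

    def span(c):                          # index range of chunk c
        return range(c * chunk, (c + 1) * chunk)

    # Reduce-scatter: at each step, rank r adds its partial chunk into the
    # next rank's buffer; after n - 1 steps, rank r holds the complete sum
    # for chunk (r + 1) % n.
    for step in range(n - 1):
        for r in range(n):
            dst = (r + 1) % n
            c = (r - step) % n
            for i in span(c):
                buffers[dst][i] += buffers[r][i]

    # All-gather: circulate each completed chunk around the ring so that
    # every rank ends up with every summed chunk.
    for step in range(n - 1):
        for r in range(n):
            dst = (r + 1) % n
            c = (r + 1 - step) % n
            for i in span(c):
                buffers[dst][i] = buffers[r][i]

# Four ranks, eight elements each: every rank ends with the element-wise sum.
bufs = [[float(r * 8 + i) for i in range(8)] for r in range(4)]
ring_all_reduce(bufs)
```

Real NCCL runs these steps in parallel across GPUs and overlaps communication with reduction; the sequential loops above only mirror the data movement.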

            kandi-support Support

              nccl has a medium-active ecosystem.
              It has 2218 stars, 592 forks, and 135 watchers.
              It has had no major release in the last 12 months.
              There are 252 open issues and 490 closed issues. On average, issues are closed in 14 days. There are 36 open pull requests and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of nccl is v1.3.4-1.

            kandi-Quality Quality

              nccl has no bugs reported.

            kandi-Security Security

              nccl has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

            kandi-License License

              nccl has a Non-SPDX License.
              A Non-SPDX license can be an open-source license that is not SPDX-compliant, or a non-open-source license; review it closely before use.

            kandi-Reuse Reuse

              nccl releases are available to install and integrate.
              Installation instructions, examples and code snippets are available.


            nccl Key Features

            No Key Features are available at this moment for nccl.

            nccl Examples and Code Snippets

            Broadcast tensor
            Python | 62 lines of code | License: Non-SPDX (Apache License 2.0)
            def broadcast_send(t,
                               shape,
                               dtype,
                               group_size,
                               group_key,
                               instance_key,
                               communication_hint='auto',
                               timeout=0):
              """  
            All-reduce v2
            Python | 62 lines of code | License: Non-SPDX (Apache License 2.0)
            def all_reduce_v2(t,
                              group_size,
                              group_key,
                              instance_key,
                              merge_op='Add',
                              final_op='Id',
                              communication_hint='auto',
                              timeout=  
            Performs a group-reduce operation
            Python | 50 lines of code | License: Non-SPDX (Apache License 2.0)
            def all_reduce(t,
                           group_size,
                           group_key,
                           instance_key,
                           merge_op='Add',
                           final_op='Id',
                           subdiv_offsets=(0,),
                           communication_hint='auto',
                         

            Community Discussions

            QUESTION

            NVProf for NCCL program
            Asked 2021-May-28 at 15:37

            When I want to use NVProf on an NCCL program with --metrics all, the profiling results always come back like

            ...

            ANSWER

            Answered 2021-May-28 at 15:37

            That behavior is expected.

            The events and metrics that are gathered by default pertain to CUDA device code activity. To see something that might be instructive, try profiling with the --print-gpu-trace switch (and remove --metrics all).

            The documented "metrics" don't apply to the operations (data copying) that NCCL is doing. They apply to CUDA kernels (i.e. CUDA device code activity).

            nvprof does seem to have metrics that can be collected for NVLink activity. To see these, on a system that is applicable (e.g. has NVLink), run a command such as:
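The command itself was truncated in this capture. One plausible reconstruction — assuming nvprof from the CUDA toolkit is on the PATH, and noting that available metric names vary by GPU, so verify with the query first:

```shell
# List the NVLink-related metrics this nvprof/GPU combination supports:
nvprof --query-metrics | grep -i nvlink

# Then collect one of them while running the program (binary name is
# illustrative):
nvprof -m nvlink_total_data_transmitted ./my_nccl_app
```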

            Source https://stackoverflow.com/questions/67710465

            QUESTION

            CUML fit functions throwing cp.full TypeError
            Asked 2021-May-06 at 17:13

            I've been trying to run RAPIDS on Google Colab pro, and have successfully installed the cuml and cudf packages, however I am unable to run even the example scripts.

            TLDR;

            Anytime I try to run the fit function for cuml on Google Colab I get the following error. I get this when using the demo examples both for installation and then for cuml. This happens for a range of cuml examples (I first hit this trying to run UMAP).

            ...

            ANSWER

            Answered 2021-May-06 at 17:13

            Colab retains cupy==7.4.0 despite conda installing cupy==8.6.0 during the RAPIDS install; it is a custom install. I just had success pip installing cupy-cuda110==8.6.0 BEFORE installing RAPIDS, with:

            !pip install cupy-cuda110==8.6.0

            I'll be updating the script soon so that you won't have to do it manually, but want to test a few more things out. Thanks again for letting us know!

            EDIT: script updated.

            Source https://stackoverflow.com/questions/67368715

            QUESTION

            Docker shared memory size out of bounds or unhandled system error, NCCL version 2.7.8
            Asked 2021-Apr-13 at 05:55

            The following errors and solution apply to deploying a stack through YAML in Portainer, but they can equally be applied to Docker in general.

            Environment:

            ...

            ANSWER

            Answered 2021-Apr-13 at 05:55

            It seems that by default the size of the shared memory is limited to 64 MB. The solution to this error, as shown in this issue, is therefore to increase the size of the shared memory.

            Hence, the first idea that comes to mind would be to simply define something like shm_size: 9gb in the YAML file of the stack. However, this might not work, as shown e.g. in this issue.

            Therefore, in the end, I had to use the following workaround (also described here, but poorly documented):
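The workaround itself was truncated in this capture. The approach commonly cited for stack/swarm deployments where shm_size is ignored is to mount a tmpfs at /dev/shm; the service and image names below are illustrative, and the size is given in bytes:

```yaml
services:
  trainer:
    image: my-training-image     # illustrative
    volumes:
      - type: tmpfs
        target: /dev/shm
        tmpfs:
          size: 9663676416       # ~9 GB
```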

            Source https://stackoverflow.com/questions/67056737

            QUESTION

            Could not load dynamic library libcuda.so.1 error on Google AI Platform with custom container
            Asked 2021-Mar-11 at 01:46

            I'm trying to launch a training job on Google AI Platform with a custom container. As I want to use GPUs for the training, the base image I've used for my container is:

            ...

            ANSWER

            Answered 2021-Mar-11 at 01:05

            The suggested way to build the most reliable container is to use the officially maintained 'Deep Learning Containers'. I would suggest pulling 'gcr.io/deeplearning-platform-release/tf2-gpu.2-4'. This should already have CUDA, CUDNN, GPU Drivers, and TF 2.4 installed & tested. You'll just need to add your code into it.

            Source https://stackoverflow.com/questions/66550195

            QUESTION

            Pytorch DDP get stuck in getting free port
            Asked 2021-Feb-25 at 00:32

            I am trying to get a free port in the DDP initialization of PyTorch. However, my code gets stuck. The following snippet reproduces the problem:

            ...

            ANSWER

            Answered 2021-Feb-25 at 00:32

            The answer is derived from here. In detail: 1. Since each process generates its own free port, the ports end up different; 2. Instead, get a free port once at the beginning and pass it to all processes.

            The corrected snippet:
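The author's corrected snippet did not survive this capture. As a sketch of the idea (the helper name and env-var wiring here are illustrative, not the author's code): bind to port 0 once in the parent so the OS assigns a free port, then share that single port with every spawned process.

```python
import socket

def find_free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

# Pick the port once, before spawning; each worker then uses the same value,
# e.g. via the MASTER_PORT environment variable that torch.distributed reads:
#   os.environ["MASTER_PORT"] = str(port)   # set before mp.spawn
port = find_free_port()
```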

            Source https://stackoverflow.com/questions/66348957

            QUESTION

            Memory allocation error on worker 0: std::bad_alloc: CUDA error
            Asked 2020-Nov-17 at 22:25

            ENVIRONMENT

            CODE

            • I am just trying to give a training and a test set to the model.
            • 1st data package — train_data = xgboost.DMatrix(data=X_train, label=y_train). Running just this, plus training and anything else with it, does not give an error message.
            • 2nd data package — test_data = xgboost.DMatrix(data=X_test, label=y_test), a couple of cells down the line; they are not executed together.

            Side Note

            • The VRAM sizes in the error are NOT 30 GB or 15 GB:
              • 1 539 047 424 ≈ 1.5 GB,
              • 3 091 258 960 ≈ 3 GB,
              • 3 015 442 432 ≈ 3 GB,
              • 3 091 258 960 ≈ 3 GB.
              • The GPU has 16 GB of VRAM, so I don't think this answers the question.

            ERROR

            ...

            ANSWER

            Answered 2020-Nov-17 at 19:17

            as per this part of your error,

            Source https://stackoverflow.com/questions/64879009

            QUESTION

            XGBoostError: [10:10:03] /workspace/src/tree/updater_gpu_hist.cu:1407: Exception in gpu_hist: NCCL failure
            Asked 2020-Oct-29 at 16:28

            PROJECT

            MY CODE

            ...

            ANSWER

            Answered 2020-Oct-29 at 16:28

            The problem is library incompatibility. This Docker container solved my problem:

            https://github.com/Kaggle/docker-python/commit/a6ba32e0bb017a30e079cf8bccab613cd4243a5f

            Source https://stackoverflow.com/questions/64589547

            QUESTION

            I cannot install cupy with pip
            Asked 2020-Mar-27 at 16:44

            I have attached the error message because I have no idea where to start with it. I have tried updating setuptools and purging and reinstalling pip.

            I am running Linux Mint 19.3 Cinnamon 4.4.8.

            If anyone has experienced this problem or has any suggestions for solutions, answers are much appreciated.

            ...

            ANSWER

            Answered 2020-Mar-27 at 16:44

            For the Python.h error, you probably need to install python3-dev (Debian/Ubuntu/Mint) or python3-devel (Fedora/CentOS/RHEL) using your operating system's package manager like apt or dnf.
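Concretely, using the package names from the answer (one of the following, depending on your distribution):

```shell
# Debian / Ubuntu / Mint
sudo apt install python3-dev

# Fedora / CentOS / RHEL
sudo dnf install python3-devel
```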

            For the other missing .h's, you can usually google for:

            Source https://stackoverflow.com/questions/60889532

            QUESTION

            ImportError: Please install apex from https://www.github.com/nvidia/apex to use distributed and fp16 training
            Asked 2020-Mar-06 at 16:11

            I cannot install apex for distributed and fp16 training of a BERT model. I have tried to install it by cloning apex from GitHub and installing the package using pip.

            I tried to install apex by cloning it from GitHub using the following command:

            git clone https://github.com/NVIDIA/apex.git

            then cd apex to go to the apex directory, and tried to install the package using the following pip command:

            pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext"

            full code is:

            ...

            ANSWER

            Answered 2019-Dec-05 at 14:36

            QUESTION

            Tensorflow what is the tf.contrib.nccl.allsum in new version?
            Asked 2020-Feb-29 at 10:16

            It seems that as of TensorFlow 1.13 there is no API such as tf.contrib.nccl.allsum. However, the official NVIDIA GitHub repository https://github.com/tkarras/progressive_growing_of_gans uses this old API to reduce-sum across different GPU devices, as follows.

            ...

            ANSWER

            Answered 2020-Feb-29 at 10:16

            I think the equivalent API is nccl_ops.all_sum. I have demonstrated this API with the following code.

            Source https://stackoverflow.com/questions/60453533

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install nccl

            Note: the official and tested builds of NCCL can be downloaded from: https://developer.nvidia.com/nccl. You can skip the following build steps if you choose to use the official builds.
            To install NCCL on the system, create a package then install it as root.
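As a sketch only: the make targets below are taken from recent NVIDIA/nccl sources and may differ for the v1.3.x tree this page describes, so check the repository's README for your version.

```shell
git clone https://github.com/NVIDIA/nccl.git
cd nccl
make -j src.build CUDA_HOME=/usr/local/cuda   # build the library
make pkg.debian.build                         # create .deb packages
sudo dpkg -i build/pkg/deb/*.deb              # install as root (path may differ)
```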

            Support

            For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.
            Find more information at:



            Consider Popular TCP Libraries

            masscan

            by robertdavidgraham

            wait-for-it

            by vishnubob

            gnet

            by panjf2000

            Quasar

            by quasar

            mumble

            by mumble-voip

            Try Top Libraries by NVIDIA

            DeepLearningExamples

            by NVIDIA | Jupyter Notebook

            FastPhotoStyle

            by NVIDIA | Python

            vid2vid

            by NVIDIA | Python

            TensorRT

            by NVIDIA | C++