cuda-gdb | This directory contains various GNU compilers, assemblers | GPU library

by NVIDIA | C | Version: cuda-toolkit-12.1-release | License: GPL-2.0

kandi X-RAY | cuda-gdb Summary


cuda-gdb is a C library typically used in hardware and GPU applications. cuda-gdb has no reported bugs or vulnerabilities, carries a Strong Copyleft license, and has low support. You can download it from GitHub.

CUDA GDB

Support

cuda-gdb has a low-activity ecosystem.
It has 134 stars, 57 forks, and 28 watchers.
It had no major release in the last 12 months.
There are 5 open issues and 5 closed issues. On average, issues are closed in 1,581 days. There are no pull requests.
It has a neutral sentiment in the developer community.
The latest version of cuda-gdb is cuda-toolkit-12.1-release.

Quality

              cuda-gdb has no bugs reported.

Security

              cuda-gdb has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

              cuda-gdb is licensed under the GPL-2.0 License. This license is Strong Copyleft.
              Strong Copyleft licenses enforce sharing, and you can use them when creating open source projects.

Reuse

              cuda-gdb releases are available to install and integrate.


            cuda-gdb Key Features

            No Key Features are available at this moment for cuda-gdb.

            cuda-gdb Examples and Code Snippets

            No Code Snippets are available at this moment for cuda-gdb.

            Community Discussions

            QUESTION

            Why does this CUDA reduction fail if I use 31 blocks?
            Asked 2020-Oct-03 at 01:20

            The following CUDA code takes a list of labels (0, 1, 2, 3, ...) and finds the sums of the weights of these labels.

            To accelerate the calculation, I use shared memory so that each thread maintains its own running sum. At the end of the calculation, I perform a CUB block-wide reduction and then an atomic add to the global memory.
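
For context, a minimal sketch of the pattern being described follows. It is not the asker's MWE: the kernel name, the sizes, and the choice to keep the per-thread running sums in local arrays are illustrative assumptions.

    // Illustrative sketch (hypothetical names, not the question's MWE):
    // per-thread partial sums, a CUB block-wide reduction, then one
    // atomicAdd per block into global memory.
    #include <cub/cub.cuh>

    constexpr int BLOCK_SIZE = 128;
    constexpr int NUM_LABELS = 4;

    __global__ void label_sums(const int *labels, const float *weights,
                               float *sums, int n) {
      float local[NUM_LABELS] = {0.f};  // per-thread running sums
      for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
           i += blockDim.x * gridDim.x)  // grid-stride loop over the input
        local[labels[i]] += weights[i];

      typedef cub::BlockReduce<float, BLOCK_SIZE> BlockReduce;
      __shared__ typename BlockReduce::TempStorage temp;
      for (int lbl = 0; lbl < NUM_LABELS; ++lbl) {
        float block_sum = BlockReduce(temp).Sum(local[lbl]);
        if (threadIdx.x == 0) atomicAdd(&sums[lbl], block_sum);
        __syncthreads();  // temp storage is reused on the next iteration
      }
    }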

            The CPU and GPU agree on the results if I use fewer than 30 blocks, but disagree if I use more than this. Why is this and how can I fix it?

            Checking error codes in the code doesn't yield anything and cuda-gdb and cuda-memcheck do not show any uncaught errors or memory issues.

I'm using NVCC v10.1.243 and running on an NVIDIA Quadro P2000.

            MWE ...

            ANSWER

            Answered 2020-Oct-03 at 01:20

            When I run your code on a Tesla V100, all the results are failures except the first test.

            You have a problem here:

            Source https://stackoverflow.com/questions/64179024

            QUESTION

Program hit cudaErrorIllegalAddress without cuda-memcheck error when running program with a large dataset
            Asked 2020-Sep-03 at 02:22

I'm new to CUDA and was working on a small project to learn how to use it when I came across the following error when executing my program:

            ...

            ANSWER

            Answered 2020-Sep-03 at 02:22

Your in-kernel malloc operations are exceeding the device heap size.

Any time you are having trouble with CUDA code that uses in-kernel malloc or new, it's good practice (at least as a diagnostic) to check the returned pointer value for NULL before attempting to use (i.e. dereference) it.

When I do that in your code, right after the malloc operations in aligner::kmdist, the asserts are hit, indicating NULL return values. That is the indication that you have exceeded the device heap. You can increase the device heap size.

When I increase the device heap size to 1 GB, this particular issue disappears, and at that point cuda-memcheck may start reporting other errors (I don't know; your application may have other defects, but the proximal issue here is exceeding the device heap).
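
For illustration, here is a minimal sketch of both points, the NULL check and the heap limit; the kernel and names are hypothetical, not the asker's aligner::kmdist code.

    // Sketch: NULL-check an in-kernel malloc, and raise the device heap
    // limit from the host before any kernel runs. Names are illustrative.
    #include <cassert>
    #include <cstdio>

    __global__ void worker() {
      int *buf = (int *)malloc(1024 * sizeof(int));
      assert(buf != NULL);  // trips if the device heap is exhausted
      // ... use buf ...
      free(buf);
    }

    int main() {
      // The default device heap is only 8 MB; enlarge it (here to 1 GB).
      cudaDeviceSetLimit(cudaLimitMallocHeapSize, 1ULL << 30);
      worker<<<1, 1>>>();
      cudaError_t err = cudaDeviceSynchronize();
      printf("status: %s\n", cudaGetErrorString(err));
      return 0;
    }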

            As an aside, I also recommend that you compile your code to match the architecture you are running on:

            Source https://stackoverflow.com/questions/63713831

            QUESTION

            GCC version 4.9 has no installation candidate
            Asked 2020-Jun-04 at 08:51

            I'm trying to install gcc version 4.9 on Ubuntu to replace the current version 7.5 (because Torch is not compatible with version 6 and above). However, even following precise instructions, I can't install it. I did:

            ...

            ANSWER

            Answered 2020-Jun-04 at 08:51

In the meantime, I figured it out myself. Note, however, that strangely G++ and GCC version 4.9 are still not available; you must go with 4.8. By combining multiple sources, I worked out a way to install G++ and GCC 4.8.5 on your machine and configure them as the default ones:

            Source https://stackoverflow.com/questions/62177887

            QUESTION

            CUDA Separable Compilation with CMake, invalid device function
            Asked 2020-Apr-26 at 12:22

I am developing a C++ application with CMake as the build system. Each component in the application builds into a static library, which the executable links to.

I am trying to link in some CUDA code that is built as a separate static library, also with CMake. When I attempt to invoke the global function entry point in the CUDA static library from the main application, everything seems to work fine: the cudaDeviceSynchronize that follows my global function invocation returns 0. However, the output of the kernel is not set and the call returns immediately.

            I ran cuda-gdb. Despite the code being compiled with -g and -G, I was not able to break within the device function called by the kernel. So, I ran cuda-memcheck. When the kernel is launched, this message appears: ========= Program hit cudaErrorInvalidDeviceFunction (error 8) due to "invalid device function" on CUDA API call to cudaLaunchKernel.

I looked this up, and the NVIDIA docs/forum posts I read suggested this is usually due to compiling for the wrong compute capability. However, I'm running Titan Vs, and the compute capability is correctly set to 7.0 when compiling.

I have set CUDA_SEPARABLE_COMPILATION on both the CUDA library and the component in the main application that the CUDA code links to, per https://devblogs.nvidia.com/building-cuda-applications-cmake/. I've also tried setting CUDA_RESOLVE_DEVICE_SYMBOLS.

            Here is the relevant portion of the cmake for the main application:

(kronmult_cuda is the component in the main application that links to the CUDA library ${KRONLIB}. Another component, kronmult, links to kronmult_cuda. Eventually, something that links to kronmult is linked into the main application.)

            ...

            ANSWER

            Answered 2020-Apr-26 at 12:22

            After the helpful hint from @talonmies, I suspected this was a device linking problem. I simplified my build process, included all CUDA files in one translation unit, and turned off SEPARABLE COMPILATION.

Still, I did not see a cmake_device_link.o in either my main application binary or the component that called into my CUDA library, and I still had the same error. Setting CUDA_RESOLVE_DEVICE_SYMBOLS had no effect.

Finally, I tried building the component that calls into my CUDA library as SHARED. I saw the device linking step when building the .so in my CMake output, and the program runs fine. I do not know why building SHARED fixes what I suspect was a device-linking problem; I will accept any answer that deciphers that.

            Source https://stackoverflow.com/questions/61435330

            QUESTION

            CUDA 10.1 installed but Tensorflow doesn't run simulation on GPU
            Asked 2020-Feb-13 at 20:46

CUDA 10.1 and the NVIDIA v440 drivers are installed on my Ubuntu 18.04 system. I don't understand why the nvidia-smi tool reports CUDA version 10.2 when the installed version is 10.1 (see further down).

            ...

            ANSWER

            Answered 2020-Feb-13 at 20:46

From the message Could not dlopen library 'libcudart.so.10.0', we can tell that your TensorFlow package was built against CUDA 10.0. You should either install CUDA 10.0 or build TensorFlow from source yourself (against CUDA 10.1 or 10.2).

            Source https://stackoverflow.com/questions/60213884

            QUESTION

            Wrong results using CUDA streams and memCpyAsync, become correct adding cudaDeviceSynchronize
            Asked 2019-Jun-19 at 09:35

I'm developing a CUDA matrix multiplication, but I made some modifications to observe how they affect performance.

I'm trying to observe the behavior (and I'm measuring the changes in GPU event times) of a simple matrix-multiplication kernel. I'm testing it under two specific conditions:

• I have a number of matrices (say matN) for each of A, B, and C; I transfer (H2D) one matrix for A and one for B at a time, multiply them, and transfer back (D2H) one C;

• I have matN matrices for each of A, B, and C, but I transfer more than one (say chunk) matrices at a time for A and for B, perform exactly chunk multiplications, and transfer back chunk result matrices C.

In the first case (chunk = 1) everything works as expected, but in the second case (chunk > 1) some of the resulting Cs are correct while others are wrong.

But if I put a cudaDeviceSynchronize() after the cudaMemcpyAsync, all the results are correct.

Here's the part of the code that does what I've just described:

            ...

            ANSWER

            Answered 2019-Jun-19 at 09:35

If you are using multiple streams, you may overwrite Ad and Bd before using them.

Example with iters = 2 and nStream = 2:
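
The original snippet is not preserved here; the sketch below is a reconstruction under stated assumptions (only Ad, Bd, and the stream/chunk idea come from the thread; the sizes and kernel are made up). It shows why reusing one device buffer pair across streams races.

    // Sketch of the overwrite hazard with multiple streams (illustrative).
    #include <cuda_runtime.h>

    #define N 256
    #define N_STREAM 2
    #define ITERS 2

    __global__ void multiply(const float *A, const float *B, float *C) {
      // ... matrix multiplication ...
    }

    void launch_all(const float *Ah, const float *Bh, float *Cd) {
      cudaStream_t s[N_STREAM];
      float *Ad, *Bd;
      cudaMalloc(&Ad, N * sizeof(float));
      cudaMalloc(&Bd, N * sizeof(float));
      for (int i = 0; i < N_STREAM; ++i) cudaStreamCreate(&s[i]);

      for (int it = 0; it < ITERS; ++it)
        for (int i = 0; i < N_STREAM; ++i) {
          // BUG: every stream copies into the SAME Ad/Bd buffers. A later
          // H2D copy can overwrite data a kernel in another stream is
          // still reading, so some Cs come out wrong.
          cudaMemcpyAsync(Ad, Ah, N * sizeof(float),
                          cudaMemcpyHostToDevice, s[i]);
          cudaMemcpyAsync(Bd, Bh, N * sizeof(float),
                          cudaMemcpyHostToDevice, s[i]);
          multiply<<<1, N, 0, s[i]>>>(Ad, Bd, Cd);
        }
      // Fix: give each stream its own Ad/Bd pair (or distinct offsets into
      // one large buffer). cudaDeviceSynchronize() after the copies also
      // "fixes" it, but only by serializing the streams.
    }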

            Source https://stackoverflow.com/questions/56654713

            QUESTION

            What does "RuntimeError: CUDA error: device-side assert triggered" in PyTorch mean?
            Asked 2019-Apr-21 at 15:08

I have seen a lot of posts about particular case-specific problems, but no fundamental motivating explanation. What does this error mean:

            ...

            ANSWER

            Answered 2019-Apr-21 at 15:08

            When a device-side error is detected while CUDA device code is running, that error is reported via the usual CUDA runtime API error reporting mechanism. The usual detected error in device code would be something like an illegal address (e.g. attempt to dereference an invalid pointer) but another type is a device-side assert. This type of error is generated whenever a C/C++ assert() occurs in device code, and the assert condition is false.

            Such an error occurs as a result of a specific kernel. Runtime error checking in CUDA is necessarily asynchronous, but there are probably at least 3 possible methods to start to debug this.

            1. Modify the source code to effectively convert asynchronous kernel launches to synchronous kernel launches, and do rigorous error-checking after each kernel launch. This will identify the specific kernel that has caused the error. At that point it may be sufficient simply to look at the various asserts in that kernel code, but you could also use step 2 or 3 below.

2. Run your code with cuda-memcheck. This is a tool something like "valgrind for device code". When you run your code with cuda-memcheck, it will tend to run much more slowly, but the runtime error reporting will be enhanced. It is also usually preferable to compile your code with -lineinfo. In that scenario, when a device-side assert is triggered, cuda-memcheck will report the source code line number where the assert is, and also the assert itself and the condition that was false. You can see here for a walkthrough of using it (albeit with an illegal address error instead of assert(); the process with assert() will be similar).

3. It should also be possible to use a debugger. If you use a debugger such as cuda-gdb (e.g. on Linux), the debugger will have backtrace reports that indicate which line the assert was on when it was hit.

Both cuda-memcheck and the debugger can be used if the CUDA code is launched from a Python script.

At this point you have discovered what the assert is and where in the source code it is. Why it is there cannot be answered generically; that will depend on the developer's intention, and if it is not commented or otherwise obvious, you will need some way to work it out. The question of "how to work backwards" is also a general debugging question, not specific to CUDA. You can use printf in CUDA kernel code, and also a debugger like cuda-gdb, to assist with this (for example, set a breakpoint prior to the assert and inspect machine state, e.g. variables, when the assert is about to be hit).
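
As a sketch of method 1; the macro and kernel below are hypothetical, not from the question:

    // Sketch: synchronous launches plus rigorous error checking, so the
    // kernel whose assert fired can be identified.
    #include <cassert>
    #include <cstdio>
    #include <cstdlib>

    #define CUDA_CHECK(call)                                           \
      do {                                                             \
        cudaError_t e = (call);                                        \
        if (e != cudaSuccess) {                                        \
          fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,           \
                  cudaGetErrorString(e));                              \
          exit(1);                                                     \
        }                                                              \
      } while (0)

    __global__ void kernel(const int *data, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) assert(data[i] >= 0);  // device-side assert
    }

    void launch(const int *d_data, int n) {
      kernel<<<(n + 255) / 256, 256>>>(d_data, n);
      CUDA_CHECK(cudaGetLastError());      // catches launch errors
      CUDA_CHECK(cudaDeviceSynchronize()); // a tripped assert surfaces
                                           // here as cudaErrorAssert
    }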

            Source https://stackoverflow.com/questions/55780923

            QUESTION

            nvcc fatal : Value 'sm_20' is not defined for option 'gpu-architecture'
            Asked 2019-Feb-20 at 09:01

I've looked at many pages and could not follow what they were saying, either because they were unclear or because my knowledge is simply not sufficient.

            I am trying to run:

            luarocks install https://raw.githubusercontent.com/qassemoquab/stnbhwd/master/stnbhwd-scm-1.rockspec

            So that I may run DenseCap over some images using GPU Acceleration. When I run it, I get this error:

            ...

            ANSWER

            Answered 2017-Dec-05 at 23:35

Try changing the code architecture (such as sm_20) to a higher version in the CMakeLists.txt of the stnbhwd package you are trying to install.

            From:

            Source https://stackoverflow.com/questions/47663033

            QUESTION

            Using CUDA-gdb with NVRTC
            Asked 2019-Feb-16 at 18:31

            I have an application which generates CUDA C++ source code, compiles it into PTX at runtime using NVRTC, and then creates CUDA modules from it using the CUDA driver API.

If I debug this application using cuda-gdb, it displays the kernel (where an error occurred) in the backtrace, but does not show the line number.

            I export the generated source code into a file, and give the directory to cuda-gdb using the --directory option. I also tried passing its file name to nvrtcCreateProgram() (name argument). I use the compile options --device-debug and --generate-line-info with NVRTC.

            Is there a way to let cuda-gdb know the location of the generated source code file, and display the line number information in its backtrace?

            ...

            ANSWER

            Answered 2019-Feb-16 at 18:31

I was able to do kernel source-level debugging on an NVRTC-generated kernel with cuda-gdb as follows:

            • start with vectorAdd_nvrtc sample code
            • modify the compileFileToPTX routine (provided by nvrtc_helper.h) to add the --device-debug switch during the compile-cu-to-ptx step.
            • modify the loadPTX routine (provided by nvrtc_helper.h) to add the CU_JIT_GENERATE_DEBUG_INFO option (set to 1) for the cuModuleLoadDataEx load/JIT PTX-to-binary step.
• compile the main function (vectorAdd.cpp) with the -g option.

            Here is a complete test case/session. I'm only showing the vectorAdd.cpp file from the project because that is the only file I modified. Other project file(s) are identical to what is in the sample project:
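
The full session is not reproduced here, but a minimal sketch of the two NVRTC/driver-API changes described above looks like this (error checking omitted; the helper name is hypothetical, and a current CUDA context is assumed):

    // Sketch: compile with device debug info, then JIT with debug info
    // so cuda-gdb can map kernels back to source lines.
    #include <nvrtc.h>
    #include <cuda.h>

    CUmodule compileWithDebugInfo(const char *src) {
      // 1. CUDA C++ -> PTX with --device-debug (the -G equivalent).
      nvrtcProgram prog;
      nvrtcCreateProgram(&prog, src, "kernel.cu", 0, NULL, NULL);
      const char *opts[] = {"--device-debug"};
      nvrtcCompileProgram(prog, 1, opts);

      size_t ptxSize;
      nvrtcGetPTXSize(prog, &ptxSize);
      char *ptx = new char[ptxSize];
      nvrtcGetPTX(prog, ptx);
      nvrtcDestroyProgram(&prog);

      // 2. JIT PTX -> binary with CU_JIT_GENERATE_DEBUG_INFO set to 1.
      CUjit_option jitOpts[] = {CU_JIT_GENERATE_DEBUG_INFO};
      void *jitVals[] = {(void *)(size_t)1};
      CUmodule module;
      cuModuleLoadDataEx(&module, ptx, 1, jitOpts, jitVals);
      delete[] ptx;
      return module;
    }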

            Source https://stackoverflow.com/questions/54670436

            QUESTION

            cudaMemcpy returns success but does not copy anything
            Asked 2018-Oct-12 at 06:55

Below are the things I have checked with cuda-gdb:

            1. the contents of src are correct
            2. cudaMalloc, malloc, and file I/O are successful
            3. cudaMemcpy returns cudaSuccess
            4. the problematic cudaMemcpy is called and throws no errors or exceptions
            5. destination is allocated (cudaMalloc) successfully

Below are the relevant parts of the code: wavenet_server.cc mallocs the source, copies data from a file into it, and calls make_wavenet. wavenet_infer.cu calls the constructor of MyWaveNet and then calls setEmbeddings.

            wavenet_server.cc:

            ...

            ANSWER

            Answered 2018-Oct-12 at 06:55

It turns out that cudaMemcpy was not the issue. When examining device global memory using cuda-gdb, you cannot do x/10fw float_array; it will give incorrect values. To view the values, try this instead: p ((@global float*) float_array)[0]@10
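
As a debugger-independent cross-check, a copy can also be verified by round-tripping the data back to the host; a minimal sketch (names are hypothetical):

    // Sketch: confirm an H2D copy by copying it back and comparing on the
    // host, independent of any debugger display quirks.
    #include <cstdio>
    #include <cstring>
    #include <cuda_runtime.h>

    bool verifyCopy(const float *host_src, size_t count) {
      float *dev = NULL;
      float *check = new float[count];
      cudaMalloc(&dev, count * sizeof(float));
      cudaMemcpy(dev, host_src, count * sizeof(float),
                 cudaMemcpyHostToDevice);
      cudaMemcpy(check, dev, count * sizeof(float),
                 cudaMemcpyDeviceToHost);
      bool ok = memcmp(host_src, check, count * sizeof(float)) == 0;
      printf("round-trip %s\n", ok ? "matches" : "differs");
      cudaFree(dev);
      delete[] check;
      return ok;
    }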

            Source https://stackoverflow.com/questions/52771152

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install cuda-gdb

            You can download it from GitHub.

            Support

For new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community pages.
