cuda-gdb | This directory contains various GNU compilers, assemblers | GPU library
kandi X-RAY | cuda-gdb Summary
CUDA GDB
Community Discussions
Trending Discussions on cuda-gdb
QUESTION
The following CUDA code takes a list of labels (0, 1, 2, 3, ...) and finds the sums of the weights of these labels.
To accelerate the calculation, I use shared memory so that each thread maintains its own running sum. At the end of the calculation, I perform a CUB block-wide reduction and then an atomic add to the global memory.
The CPU and GPU agree on the results if I use fewer than 30 blocks, but disagree if I use more than this. Why is this and how can I fix it?
Checking error codes in the code doesn't yield anything and cuda-gdb and cuda-memcheck do not show any uncaught errors or memory issues.
I'm using NVCC v10.1.243 and running on an NVIDIA Quadro P2000.
MWE ...ANSWER
Answered 2020-Oct-03 at 01:20When I run your code on a Tesla V100, all the results are failures except the first test.
You have a problem here:
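As a general aid for this kind of CPU/GPU mismatch, a host-side reference implementation of the label-weight sum makes it easy to compare results after every kernel change. A minimal sketch (names are hypothetical, not taken from the question's MWE):

```cpp
#include <cstddef>
#include <vector>

// Host reference: accumulate each item's weight into the bin for its label.
std::vector<double> label_weight_sums(const std::vector<int>& labels,
                                      const std::vector<double>& weights,
                                      int num_labels) {
    std::vector<double> sums(num_labels, 0.0);
    for (std::size_t i = 0; i < labels.size(); ++i)
        sums[labels[i]] += weights[i];
    return sums;
}
```

Comparing against such a reference quickly localizes block-count-dependent bugs like the one described.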
QUESTION
I'm new to CUDA and was working on a small project to learn how to use it when I came across the following error when executing my program:
...ANSWER
Answered 2020-Sep-03 at 02:22 Your in-kernel malloc operations are exceeding the device heap size.
Any time you are having trouble with a CUDA code that uses in-kernel malloc or new, it's good practice (at least as a diagnostic) to check the returned pointer value for NULL before attempting to use (i.e. dereference) it.
When I do that in your code right after the malloc operations in aligner::kmdist, I get the asserts being hit, indicating NULL return values. This is the indication that you have exceeded the device heap. You can increase the device heap size.
When I increase the device heap size to 1GB, this particular issue disappears, and at that point cuda-memcheck may start reporting other errors (I don't know; your application may have other defects, but the proximal issue here is exceeding the device heap).
As an aside, I also recommend that you compile your code to match the architecture you are running on:
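Both suggestions can be sketched together; the sizes, kernel, and the sm_61 architecture (assuming a Quadro P2000, compute capability 6.1) are illustrative:

```cuda
// Compile with, e.g.: nvcc -arch=sm_61 heap_demo.cu
#include <cassert>
#include <cstdlib>

__global__ void worker() {
    // Diagnostic: in-kernel malloc returns NULL once the device heap is exhausted.
    int *p = static_cast<int *>(malloc(1024 * sizeof(int)));
    assert(p != nullptr);   // fires when the heap limit is too small
    free(p);
}

int main() {
    // Raise the device heap before launching any kernel that uses in-kernel malloc/new.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 1024ULL * 1024 * 1024); // 1 GB
    worker<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```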
QUESTION
I'm trying to install gcc version 4.9 on Ubuntu to replace the current version 7.5 (because Torch is not compatible with version 6 and above). However, even following precise instructions, I can't install it. I did:
...ANSWER
Answered 2020-Jun-04 at 08:51 In the meantime, I figured it out myself. Strangely, G++ and GCC version 4.9 are still not available, so you must go with 4.8. By combining multiple sources, I put together a way to install G++ and GCC 4.8.5 on your machine and configure them as the default ones:
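The kind of sequence described above looks roughly like this (Ubuntu package and alternatives names; a sketch, not verified against any particular release):

```shell
sudo apt-get update
sudo apt-get install -y gcc-4.8 g++-4.8
# Register 4.8 with the alternatives system and make it the default
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 50
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 50
sudo update-alternatives --set gcc /usr/bin/gcc-4.8
sudo update-alternatives --set g++ /usr/bin/g++-4.8
gcc --version   # should now report 4.8.x
```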
QUESTION
I am developing a C++ application with cmake as the build system. Each component in the application builds into a static library, which the executable links to.
I am trying to link in some cuda code that is built as a separate static library, also with cmake. When I attempt to invoke the global function entry point in the cuda static library from the main application, everything seems to work fine - the cudaDeviceSynchronize that follows my global function invocation returns 0. However, the output of the kernel is not set and the call returns immediately.
I ran cuda-gdb. Despite the code being compiled with -g and -G, I was not able to break within the device function called by the kernel. So, I ran cuda-memcheck. When the kernel is launched, this message appears:
========= Program hit cudaErrorInvalidDeviceFunction (error 8) due to "invalid device function" on CUDA API call to cudaLaunchKernel.
I looked this up, and the NVIDIA docs/forum posts I read suggested this is usually due to compiling for the wrong compute capability. However, I'm running Titan V's, and the CC is correctly set to 7.0 when compiling.
I have set CUDA_SEPARABLE_COMPILATION on both the cuda library and the component in the main application that links to the cuda code, per https://devblogs.nvidia.com/building-cuda-applications-cmake/. I've also tried setting CUDA_RESOLVE_DEVICE_SYMBOLS.
Here is the relevant portion of the cmake for the main application:
(kronmult_cuda is the component in the main application that links to the cuda library ${KRONLIB}. Another component, kronmult, links to kronmult_cuda. Eventually, something that links to kronmult is linked into the main application.)
ANSWER
Answered 2020-Apr-26 at 12:22 After the helpful hint from @talonmies, I suspected this was a device linking problem. I simplified my build process, included all CUDA files in one translation unit, and turned off SEPARABLE_COMPILATION.
Still, I did not see a cmake_device_link.o in either my main application binary or the component that called into my cuda library, and I still had the same error. I tried setting CUDA_RESOLVE_DEVICE_SYMBOLS, to no effect.
Finally, I tried building the component that calls into my cuda library as SHARED. I saw the device linking step when building the .so in my cmake output, and the program runs fine. I do not know why building SHARED fixes what I suspect was a device linking problem - will accept any answer that deciphers that.
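For reference, the CMake properties discussed in this thread fit together roughly like this (target names are hypothetical, modeled on the question; a sketch, not the asker's actual build):

```cmake
# Hypothetical CUDA static library containing the kernels
add_library(kronlib STATIC kernels.cu)
set_target_properties(kronlib PROPERTIES
    CUDA_SEPARABLE_COMPILATION ON    # device code may span translation units
    CUDA_RESOLVE_DEVICE_SYMBOLS ON)  # force a device-link step for this archive

# Component that calls into the CUDA library
add_library(kronmult_cuda STATIC caller.cpp)
target_link_libraries(kronmult_cuda PRIVATE kronlib)
```

Putting CUDA_RESOLVE_DEVICE_SYMBOLS on the library that owns the device code is often what makes the device-link step actually run for static archives.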
QUESTION
CUDA 10.1 and the NVIDIA drivers v440 are installed on my Ubuntu 18.04 system. I don't understand why the nvidia-smi tool reports CUDA version 10.2 when the installed version is 10.1 (see further down).
ANSWER
Answered 2020-Feb-13 at 20:46 From Could not dlopen library 'libcudart.so.10.0' we can tell that your tensorflow package is built against CUDA 10.0. You should install CUDA 10.0, or build tensorflow from source (against CUDA 10.1 or 10.2) yourself. (Note that nvidia-smi reports the highest CUDA version the installed driver supports, not the installed toolkit version, so a 10.2-capable driver alongside a 10.1 toolkit is expected.)
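The two version numbers in play can be checked side by side; nvidia-smi shows the driver's maximum supported CUDA version, while nvcc shows the installed toolkit:

```shell
nvidia-smi | grep "CUDA Version"   # driver's highest supported CUDA version (e.g. 10.2)
nvcc --version                     # installed CUDA toolkit version (e.g. 10.1)
```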
QUESTION
I'm developing a CUDA matrix multiplication, but I made some modifications to observe how they affect performance.
I'm trying to observe the behavior (and I'm measuring the changes in GPU event times) of a simple matrix multiplication kernel, testing it under two different conditions:
- I have a number of matrices (say matN) for each of A, B and C. I transfer (H2D) one matrix for A and one for B at a time, multiply them, then transfer back (D2H) one C.
- I have matN matrices for each of A, B and C, but I transfer more than one at a time (say chunk) for A and for B, perform exactly chunk multiplications, and transfer back chunk result matrices C.
In the first case (chunk = 1) everything works as expected, but in the second case (chunk > 1) some of the Cs I get are correct, while others are wrong.
But if I put a cudaDeviceSynchronize() after the cudaMemcpyAsync, all the results I get are correct.
Here's the part of the code doing what I've just described above:
...ANSWER
Answered 2019-Jun-19 at 09:35 If you are using multiple streams, you may be overwriting Ad and Bd before using them.
Example with iters = 2 and nStream = 2:
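A hedged reconstruction of the hazard (illustrative only, not the answer's original snippet; buffer and loop names follow the question):

```cuda
// Single shared pair of device buffers across multiple streams: a race.
for (int i = 0; i < iters; ++i) {
    cudaStream_t s = streams[i % nStream];
    // Iteration 1 (stream 1) can start overwriting Ad/Bd while
    // iteration 0's kernel (stream 0) is still reading them.
    cudaMemcpyAsync(Ad, A[i], bytes, cudaMemcpyHostToDevice, s);
    cudaMemcpyAsync(Bd, B[i], bytes, cudaMemcpyHostToDevice, s);
    matmul<<<grid, block, 0, s>>>(Ad, Bd, Cd);
    cudaMemcpyAsync(C[i], Cd, bytes, cudaMemcpyDeviceToHost, s);
}
// Fix: give each in-flight iteration its own buffers,
// e.g. Ad[i % nStream], Bd[i % nStream], Cd[i % nStream],
// so concurrent streams never share working memory.
```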
QUESTION
I have seen a lot of posts about particular, case-specific problems, but no fundamental motivating explanation. What does this error:
...ANSWER
Answered 2019-Apr-21 at 15:08 When a device-side error is detected while CUDA device code is running, that error is reported via the usual CUDA runtime API error reporting mechanism. The usual detected error in device code would be something like an illegal address (e.g. an attempt to dereference an invalid pointer), but another type is a device-side assert. This type of error is generated whenever a C/C++ assert() occurs in device code and the assert condition is false.
Such an error occurs as a result of a specific kernel. Runtime error checking in CUDA is necessarily asynchronous, but there are at least 3 possible methods to start debugging this:
1. Modify the source code to effectively convert asynchronous kernel launches to synchronous kernel launches, and do rigorous error-checking after each kernel launch. This will identify the specific kernel that has caused the error. At that point it may be sufficient simply to look at the various asserts in that kernel code, but you could also use step 2 or 3 below.
2. Run your code with cuda-memcheck. This is a tool something like "valgrind for device code". When you run your code with cuda-memcheck, it will tend to run much more slowly, but the runtime error reporting will be enhanced. It is also usually preferable to compile your code with -lineinfo. In that scenario, when a device-side assert is triggered, cuda-memcheck will report the source code line number where the assert is, as well as the assert itself and the condition that was false. You can see here for a walkthrough of using it (albeit with an illegal address error instead of assert(), but the process with assert() will be similar).
3. It should also be possible to use a debugger. If you use a debugger such as cuda-gdb (e.g. on linux), then the debugger will have back-trace reports that indicate which line the assert was on when it was hit.
Both cuda-memcheck and the debugger can be used if the CUDA code is launched from a python script.
At this point you have discovered what the assert is and where in the source code it is. Why it is there cannot be answered generically; this will depend on the developer's intention, and if it is not commented or otherwise obvious, you will need some way to intuit it. The question of "how to work backwards" is also a general debugging question, not specific to CUDA. You can use printf in CUDA kernel code, and also a debugger like cuda-gdb, to assist with this (for example, set a breakpoint prior to the assert and inspect machine state - e.g. variables - when the assert is about to be hit).
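Step 1 above is commonly done with a checking macro after every launch. A minimal sketch (the kernel is a placeholder):

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Report and exit on any CUDA runtime error.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "CUDA error '%s' at %s:%d\n",           \
                    cudaGetErrorString(err_), __FILE__, __LINE__);  \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

__global__ void demo(const int *v) {
    assert(v[threadIdx.x] >= 0);   // device-side assert
}

int main() {
    int *d_v;
    CUDA_CHECK(cudaMalloc(&d_v, 32 * sizeof(int)));
    CUDA_CHECK(cudaMemset(d_v, 0, 32 * sizeof(int)));
    demo<<<1, 32>>>(d_v);
    CUDA_CHECK(cudaGetLastError());      // catches launch-configuration errors
    CUDA_CHECK(cudaDeviceSynchronize()); // makes the launch synchronous, surfacing device-side asserts
    CUDA_CHECK(cudaFree(d_v));
    return 0;
}
```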
QUESTION
I've looked at many pages and either could not follow what they were saying because they were unclear and/or my knowledge is just not sufficient enough.
I am trying to run:
luarocks install https://raw.githubusercontent.com/qassemoquab/stnbhwd/master/stnbhwd-scm-1.rockspec
So that I may run DenseCap over some images using GPU Acceleration. When I run it, I get this error:
...ANSWER
Answered 2017-Dec-05 at 23:35 Try changing the code architecture (such as sm_20) to some higher version in the CMakeLists.txt of the stnbhwd package you are trying to install.
From:
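The edit being described typically looks like this in a FindCUDA-based CMakeLists.txt (a sketch; the actual flag line in stnbhwd may differ):

```cmake
# Old: architecture too old for recent toolkits, e.g.
#   LIST(APPEND CUDA_NVCC_FLAGS "-arch=sm_20")
# New: target the GPU's actual compute capability, e.g. 6.1
LIST(APPEND CUDA_NVCC_FLAGS "-arch=sm_61")
```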
QUESTION
I have an application which generates CUDA C++ source code, compiles it into PTX at runtime using NVRTC, and then creates CUDA modules from it using the CUDA driver API.
If I debug this application using cuda-gdb, it displays the kernel (where an error occurred) in the backtrace, but does not show the line number.
I export the generated source code into a file and give the directory to cuda-gdb using the --directory option. I also tried passing its file name to nvrtcCreateProgram() (the name argument). I use the compile options --device-debug and --generate-line-info with NVRTC.
Is there a way to let cuda-gdb know the location of the generated source code file and display the line number information in its backtrace?
ANSWER
Answered 2019-Feb-16 at 18:31 I was able to do kernel source-level debugging on an nvrtc-generated kernel with cuda-gdb as follows:
- start with the vectorAdd_nvrtc sample code
- modify the compileFileToPTX routine (provided by nvrtc_helper.h) to add the --device-debug switch during the compile-cu-to-ptx step
- modify the loadPTX routine (provided by nvrtc_helper.h) to add the CU_JIT_GENERATE_DEBUG_INFO option (set to 1) for the cuModuleLoadDataEx load/JIT PTX-to-binary step
- compile the main function (vectorAdd.cpp) with the -g option
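The two code changes above can be sketched with the raw NVRTC and driver-API calls (error checking omitted; variable names are illustrative, and src is assumed to hold the generated kernel source):

```cpp
#include <cuda.h>
#include <nvrtc.h>
#include <vector>

// 1. Compile the generated source to PTX with full device debug info.
nvrtcProgram prog;
nvrtcCreateProgram(&prog, src, "generated.cu", 0, nullptr, nullptr);
const char *opts[] = {"--device-debug"};
nvrtcCompileProgram(prog, 1, opts);
size_t ptxSize;
nvrtcGetPTXSize(prog, &ptxSize);
std::vector<char> ptx(ptxSize);
nvrtcGetPTX(prog, ptx.data());

// 2. JIT the PTX, asking the driver to keep debug info for cuda-gdb.
CUmodule module;
CUjit_option jitOpt = CU_JIT_GENERATE_DEBUG_INFO;
void *jitVal = reinterpret_cast<void *>(1);
cuModuleLoadDataEx(&module, ptx.data(), 1, &jitOpt, &jitVal);
```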
Here is a complete test case/session. I'm only showing the vectorAdd.cpp file from the project because that is the only file I modified. Other project file(s) are identical to what is in the sample project:
QUESTION
Below are the things I have checked with cuda-gdb:
- the contents of src are correct
- cudaMalloc, malloc, and file I/O are successful
- cudaMemcpy returns cudaSuccess
- the problematic cudaMemcpy is called and throws no errors or exceptions
- destination is allocated (cudaMalloc) successfully
Below are the relevant parts of the code: wavenet_server.cc mallocs the source, copies data from a file into it, and calls make_wavenet. wavenet_infer.cu calls the constructor of MyWaveNet and calls setEmbeddings.
wavenet_server.cc:
...ANSWER
Answered 2018-Oct-12 at 06:55 Turns out that cudaMemcpy was not the issue. When examining device global memory using cuda-gdb, one cannot do x/10fw float_array - it will give incorrect values. To view the values, try this instead: p ((@global float*) float_array)[0]@10
Community Discussions, Code Snippets contain sources that include Stack Exchange Network