cuda-samples | cuda programs for senior project | GPU library
kandi X-RAY | cuda-samples Summary
Run make in a given directory to build. Run make run to run the built sample. Examples with OpenGL output use GLUT and Apple's -framework compiler flag; if you are not on a Mac, you will need to adjust this. Copyright (c) 2010 Patrick Rogers.
Trending Discussions on cuda-samples
QUESTION
I came across an example of a simple convolution of two signals using cuFFT.
https://github.com/NVIDIA/cuda-samples/blob/master/Samples/simpleCUFFT/simpleCUFFT.cu
It performs zero-padding of both signals so that their sizes match. But when I printed the padded output, it showed that the padding was done in the middle of the signal, which I don't understand, because it is usually done at the start or end. See the padData function.
Why is the padding done in the middle?
If the two signals are of the same length, is padding required?
I am not really familiar with signal processing techniques. I would appreciate it if someone could explain this to me.
...ANSWER
Answered 2021-Mar-21 at 17:49
If you want the phase of the result of a complex FFT to stay the same, then any zero padding needs to be circularly symmetric around the beginning of the input. If you just pad at the beginning or end, the phase result would likely change.
If you want the complex IFFT of a spectrum to produce a strictly real result, then any zero padding has to maintain conjugate symmetry (which means the padding needs to be centered or symmetric, not just all at the start or end).
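The symmetry argument can be sketched with NumPy (a hypothetical toy kernel, not the actual padData output): a kernel that is circularly symmetric around index 0 has a real, zero-phase spectrum, and only padding in the middle preserves that property.

```python
import numpy as np

# A filter kernel that is circularly symmetric around index 0
# (h[k] == h[N-k]) has a purely real, zero-phase spectrum.
h = np.array([3.0, 1.0, 1.0])
assert np.allclose(np.fft.fft(h).imag, 0)

# Padding with zeros in the middle keeps the circular symmetry,
# so the spectrum stays real and the phase stays zero.
h_mid = np.array([3.0, 1.0, 0.0, 0.0, 1.0])
print(np.max(np.abs(np.fft.fft(h_mid).imag)))  # ~0

# Padding at the end breaks the symmetry and introduces phase.
h_end = np.array([3.0, 1.0, 1.0, 0.0, 0.0])
print(np.max(np.abs(np.fft.fft(h_end).imag)))  # clearly nonzero
```

The same reasoning, run in reverse, is why conjugate-symmetric (centered) padding of a spectrum keeps the inverse FFT strictly real.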
QUESTION
I'm trying to understand how this sample code from CUDA SDK 8.0 works:
...ANSWER
Answered 2021-Mar-03 at 20:01
Here is how to understand the values assigned to the first variables of the CUDA kernel and the overall computation performed (the original answer illustrated this with a drawing):

Matrices are stored using row-major ordering. The CUDA code assumes the matrix sizes are divisible by BLOCK_SIZE.

The matrices A, B and C are virtually split into blocks according to the kernel's CUDA grid. All blocks of C can be computed in parallel. For a given block of C, the main loop walks through the corresponding blocks of A and B in lockstep. Each block is computed in parallel using BLOCK_SIZE * BLOCK_SIZE threads.

bx and by are the block-based position of the current block within the CUDA grid. tx and ty are the cell-based position, within the currently computed block, of the cell computed by the current thread.
Here is a detailed analysis for the aBegin variable:

aBegin refers to the memory location of the first cell of the first computed block of the matrix A. It is set to wA * BLOCK_SIZE * by because each block contains BLOCK_SIZE * BLOCK_SIZE cells, there are wA / BLOCK_SIZE blocks per block-row, and there are by block-rows above the current computed block of A. Thus, (BLOCK_SIZE * BLOCK_SIZE) * (wA / BLOCK_SIZE) * by = BLOCK_SIZE * wA * by.

The same logic applies for bBegin: it is set to BLOCK_SIZE * bx because there are bx blocks of size BLOCK_SIZE in memory before the first cell of the first computed block of the matrix B.
a is incremented by aStep = BLOCK_SIZE in the loop, so that the next computed block of A is the one to the right of the current one. b is incremented by bStep = BLOCK_SIZE * wB in the same loop, so that the next computed block of B is the one below the current one.
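The index arithmetic above can be checked with a small Python sketch that mirrors the kernel's variables (aBegin, aEnd, aStep, bBegin, bStep) on flat row-major arrays; the sizes are hypothetical and each (bx, by) iteration plays the role of one CUDA block:

```python
import numpy as np

BLOCK_SIZE = 2
hA, wA, wB = 4, 6, 4  # illustrative sizes, all divisible by BLOCK_SIZE
A = np.arange(hA * wA, dtype=float).reshape(hA, wA)
B = np.arange(wA * wB, dtype=float).reshape(wA, wB)
C = np.zeros((hA, wB))

for by in range(hA // BLOCK_SIZE):
    for bx in range(wB // BLOCK_SIZE):
        aBegin = wA * BLOCK_SIZE * by   # flat index of A's first block for this row
        aEnd = aBegin + wA - 1          # last flat index in that block-row of A
        aStep = BLOCK_SIZE              # next block of A is to the right
        bBegin = BLOCK_SIZE * bx        # flat index of B's first block for this column
        bStep = BLOCK_SIZE * wB         # next block of B is below
        Csub = np.zeros((BLOCK_SIZE, BLOCK_SIZE))
        a, b = aBegin, bBegin
        while a <= aEnd:
            # Recover the 2D block positions from the flat offsets.
            ar, ac = divmod(a, wA)
            br, bc = divmod(b, wB)
            As = A[ar:ar + BLOCK_SIZE, ac:ac + BLOCK_SIZE]
            Bs = B[br:br + BLOCK_SIZE, bc:bc + BLOCK_SIZE]
            Csub += As @ Bs             # the role of the kernel's inner loop
            a += aStep
            b += bStep
        C[by * BLOCK_SIZE:(by + 1) * BLOCK_SIZE,
          bx * BLOCK_SIZE:(bx + 1) * BLOCK_SIZE] = Csub

assert np.allclose(C, A @ B)
print("blocked result matches A @ B")
```

The assertion confirms that walking A to the right and B downward in lockstep, one BLOCK_SIZE block at a time, reproduces the full matrix product.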
QUESTION
I am using the code given here to find out the TFLOPS of mixed-precision ops on an NVIDIA Tesla T4. Its theoretical value is given as 65 TFLOPS; however, the code produces a value of 10 TFLOPS. Is there any explanation that can justify this?
...ANSWER
Answered 2021-Jan-16 at 15:25
This might be more of an extended comment, but hear me out...
As pointed out in the comments, the CUDA samples are not meant as performance-measuring tools. The second benchmark you provided does not actually use tensor cores, but just normal instructions executed on FP32 or FP64 cores.
QUESTION
Here is a demo.cu aiming to printf from the GPU device:
ANSWER
Answered 2020-Aug-31 at 17:34
The CUDA compiler must compile for a GPU target (i.e. a device architecture). If you don't specify a target architecture on the compile command line, historically, CUDA has chosen a very flexible default architecture specification that can run on all GPUs that the CUDA version supports.

That isn't always the case, however, and it's not the case with CUDA 11. CUDA 11 compiles for a default architecture of sm_52 (i.e. as if you had specified -arch=sm_52 on the command line), but CUDA 11 supports architectures down to sm_35.

Therefore, if you don't specify the target architecture on the compile command line with CUDA 11 and attempt to run on a GPU whose architecture predates sm_52, any CUDA kernels you have written definitely won't work.
It's good practice, any time you are having trouble with CUDA code, to use proper CUDA error checking; if you had done that here, you would have gotten a runtime error indication that would have immediately identified the issue (at least for someone familiar with CUDA errors).

The solution in these cases is to specify a compilation target that includes the GPU you intend to run on (this is usually good practice anyway). If you do that and the architecture you specify is deprecated, the nvcc compiler will issue a warning letting you know that a future CUDA version may not support that GPU. The warning does not mean anything you are doing is wrong or needs to be changed; it only means a future CUDA version may drop support for that GPU.

If you want to suppress that warning, you can pass the -Wno-deprecated-gpu-targets switch on the compile command line.
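As a concrete sketch (assuming the file is demo.cu and, purely for illustration, a pre-sm_52 target such as sm_35):

```shell
# Compile explicitly for the GPU you intend to run on (sm_35 chosen as an example).
nvcc -arch=sm_35 demo.cu -o demo

# The same compile with the deprecated-target warning suppressed.
nvcc -arch=sm_35 -Wno-deprecated-gpu-targets demo.cu -o demo
```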
QUESTION
I'm trying to build an image for the NVIDIA Jetson Nano board using Yocto (zeus branch); here is my configuration:
...ANSWER
Answered 2020-Mar-05 at 13:58
The gcc recipe is located in
QUESTION
CUDA 10.1 and the NVIDIA drivers v440 are installed on my Ubuntu 18.04 system. I don't understand why the nvidia-smi tool reports CUDA version 10.2 when the installed version is 10.1 (see further down).
ANSWER
Answered 2020-Feb-13 at 20:46
From the error Could not dlopen library 'libcudart.so.10.0' we can tell that your TensorFlow package was built against CUDA 10.0. You should install CUDA 10.0, or build TensorFlow from source (against CUDA 10.1 or 10.2) yourself.
QUESTION
I am running a custom Yocto image on the NVIDIA Jetson Nano that has docker-ce (v19.03.2) included. I am able to run Docker without problems.

The problem comes when I want to use Docker for vision testing. I need access to host-side CUDA and TensorRT, which is accessed through the NVIDIA Container Runtime on top of docker-ce. I have installed the NVIDIA Container Runtime (v0.9.0 beta) manually (extracted the necessary .deb packages and copy-pasted them into the rootfs) to test on my build, and it seems to be working fine.

When I run docker info I can see that the nvidia runtime is available, and it doesn't complain when I run a container with docker run -it --runtime=nvidia image.

If I run the deviceQuery test OUTSIDE Docker, I get the following:
...ANSWER
Answered 2020-Jan-23 at 14:14
The .csv files that are included in the rootfs by the NVIDIA SDK Manager contain the specific lib/dir/sym entries that are needed to pass GPU access through to the container. The files listed in the .csv files are merged into the container, which grants access to them. Which files are needed depends on what is needed in the container.

It is of course very important that the actual paths to the files listed in the .csv files exist on the host, otherwise the merge will fail. These paths are not correct on a default Yocto setup, since they were written for the default NVIDIA SDK Manager rootfs layout, and thus need to be corrected.

Once corrected, access to GPU acceleration in the container should be possible and can be confirmed by running the deviceQuery test.
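For orientation, entries in those .csv files take a "type, host path" form; the fragment below is illustrative only, and the actual paths vary with the L4T release:

```
lib, /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1.1
sym, /usr/lib/aarch64-linux-gnu/tegra/libcuda.so
dir, /usr/local/cuda-10.0
```

Each line names an entry type (lib, sym, dir, or dev) and the host path to merge into the container; on a custom Yocto rootfs, these paths are exactly what needs correcting.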
QUESTION
I have an Intel Xeon machine with an NVIDIA GeForce GTX 1080 configured and CentOS 7 as the operating system. I have installed NVIDIA driver 410.93 and cuda-toolkit 10.0. After compiling the cuda-samples, I tried to run ./deviceQuery, but it throws the following:
...ANSWER
Answered 2019-Feb-27 at 05:16
I debugged it. The problem was a version mismatch between the NVIDIA driver (410.93) and CUDA (driver 410.48 came with the CUDA run file). I removed all the drivers with autoremove, reinstalled from the beginning, and deleted all the link files in /var/lib/dkms/nvidia/*. Now it works fine, and nvidia-uvm is also loaded:
lsmod | grep nvidia
QUESTION
I have some trouble setting up my system properly. My system consists of:
- Intel NUC7i7BNH
- ASUS ROG with an NVIDIA 1080 Ti
My GPU is correctly detected by lspci:
...ANSWER
Answered 2018-Sep-24 at 15:35
After trying out different versions of the NVIDIA driver, I came to the conclusion that there is a problem with version 384. I deleted all nvidia-driver-related packages via sudo apt-get purge nvidia and installed version 390:
QUESTION
I have installed the cuda-samples-8-0 package on Ubuntu 17.04, and I would like to build the samples with gcc-5, which I have installed alongside gcc-6 on my machine. gcc-6 is not supported by CUDA 8 and gives numerous compilation errors.
How do I get make to use gcc-5?
I have tried sudo make CXX=g++-5 CC=gcc-5 from the /usr/local/cuda-8.0/samples directory, which still leads to gcc-6 being used.
ANSWER
Answered 2017-Jul-15 at 22:39
I just got this to work by symlinking /usr/bin/gcc-5 to /usr/local/cuda-8.0/bin/gcc and /usr/bin/g++-5 to /usr/local/cuda-8.0/bin/g++:
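The snippet with the actual commands did not survive in this copy; based on the description above, the symlinks would be created roughly like this (assuming the default CUDA 8.0 install path):

```shell
sudo ln -s /usr/bin/gcc-5 /usr/local/cuda-8.0/bin/gcc
sudo ln -s /usr/bin/g++-5 /usr/local/cuda-8.0/bin/g++
```

This works because the CUDA bin directory typically precedes the system compiler on the search path; an alternative worth trying is pointing nvcc at the host compiler explicitly with its -ccbin option.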
Community Discussions, Code Snippets contain sources that include Stack Exchange Network