CUDAfy.NET | NET access to work with Visual Studio | GPU library
kandi X-RAY | CUDAfy.NET Summary
CUDAfy.NET allows easy development of high-performance GPGPU applications entirely from the Microsoft .NET Framework. It is developed in C#. Modern graphics cards can offer a massive speed increase over CPUs for intensive numeric operations unrelated to graphics. Many operations on large data sets, such as matrix operations, can see a speed-up of 100x or more. CUDAfy allows .NET developers to easily create complex applications that split processing cleanly between host and GPU. There are no separate CUDA .cu files or complex set-up procedures to launch GPU device functions. It follows the CUDA programming model, so knowledge gained from tutorials or books on CUDA transfers easily to CUDAfy, only in a clean .NET fashion.
Community Discussions
Trending Discussions on CUDAfy.NET
QUESTION
I can't believe that, after all the research and reading I've done, I am still not 100% clear on how to do this, so I must ask. I am trying to get something like the following to run on a GPU card, and I am using CUDAfy.NET to generate the CUDA C equivalent. I want this to run as fast as possible.
If I have a function (simplified) such as:
...
ANSWER
Answered 2017-Jun-26 at 22:55
It depends on the size of lgeHeight and lgeWidth. If their product is less than the number of threads available on the card, then when you launch the kernel you can assume that each thread handles exactly one (x, y) pair.
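The one-thread-per-pair idea can be sketched on the host in plain C++. This is only an illustration of the index arithmetic, not the asker's kernel (which is elided above): the helper names threadToXY and fitsOnePairPerThread are hypothetical, and a row-major ordering of the lgeWidth x lgeHeight domain is assumed.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: map a flat global thread index to one (x, y) pair,
// assuming the lgeWidth x lgeHeight domain is walked in row-major order.
std::pair<int, int> threadToXY(int globalIdx, int lgeWidth, int lgeHeight) {
    assert(lgeWidth > 0 && lgeHeight > 0);
    assert(globalIdx >= 0 && globalIdx < lgeWidth * lgeHeight);
    int y = globalIdx / lgeWidth;  // row
    int x = globalIdx % lgeWidth;  // column within the row
    return {x, y};
}

// Hypothetical helper: true when every (x, y) pair can have its own thread,
// i.e. the domain size does not exceed the number of threads available.
bool fitsOnePairPerThread(int lgeWidth, int lgeHeight, long long availableThreads) {
    return static_cast<long long>(lgeWidth) * lgeHeight <= availableThreads;
}
```

Inside a real kernel the flat index would come from the thread and block indices rather than being passed in. When the domain is larger than the number of threads, each thread would have to process more than one pair.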
QUESTION
Trying to get my head around CUDA. After not grasping similar Stack Overflow questions, I decided to test out an example (I'm using CUDAfy.NET for C#, but the underlying CUDA should be parsable).
I want to do the following: send a 4x4x4 matrix to the kernel and get a 4x4x4 matrix out according to this logic:
...
ANSWER
Answered 2017-May-17 at 12:08
How many threads am I starting?
You are starting 1 thread per block, hence 16 in total, since the Z parameter is not used. For better performance, I would also recommend using multiple threads per block (at least 128, and in any case a multiple of 32).
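The thread-count arithmetic in this answer can be checked with a small host-side C++ sketch. The function names are hypothetical and nothing here calls CUDAfy.NET or CUDA; the only outside fact assumed is that 32 is the warp size on NVIDIA GPUs, which is why block sizes are usually chosen as multiples of 32.

```cpp
#include <cassert>

// Total threads in a launch: number of blocks times threads per block.
int totalThreads(int blocks, int threadsPerBlock) {
    assert(blocks > 0 && threadsPerBlock > 0);
    return blocks * threadsPerBlock;
}

// Round a requested block size up to the next multiple of 32.
int roundUpToWarp(int threads) {
    assert(threads > 0);
    return ((threads + 31) / 32) * 32;
}

// Blocks needed to cover totalElements with one thread per element
// (ceiling division, so the last block may be only partly used).
int blocksNeeded(int totalElements, int threadsPerBlock) {
    assert(totalElements > 0 && threadsPerBlock > 0);
    return (totalElements + threadsPerBlock - 1) / threadsPerBlock;
}
```

With 16 blocks of 1 thread each, totalThreads gives the 16 threads mentioned above. A 4x4x4 problem has 64 elements, so a single block of 128 threads would already cover it, provided the kernel skips indices beyond the data.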
How do you go about 'indexing' my example problem in 3 dimensions (Starting 4x4x4 threads and getting the variables for flat3DArray[x * sizeY * sizeZ + y * sizeZ + z])?
The second parameter of the gpu.Launch method is for threads. x, y and z could hence be threadIdx.x, threadIdx.y and threadIdx.z respectively. But you may also want to use many blocks, in which case threadIdx.x + blockDim.x * blockIdx.x could be a good pick.
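The two formulas discussed here, the global index threadIdx.x + blockDim.x * blockIdx.x and the flattened index x * sizeY * sizeZ + y * sizeZ + z from the question, can be sketched as ordinary C++ functions. This is a host-side illustration under stated assumptions: the function names are hypothetical, and the parameters stand in for the values a kernel would read from its built-in thread and block variables.

```cpp
#include <cassert>

// Global index along one dimension when several blocks are launched:
// threadIdx + blockDim * blockIdx.
int globalIndex(int threadIdx, int blockDim, int blockIdx) {
    assert(blockDim > 0 && threadIdx >= 0 && threadIdx < blockDim && blockIdx >= 0);
    return threadIdx + blockDim * blockIdx;
}

// Flattened position of element (x, y, z) in a 3D array stored as one
// contiguous buffer, matching flat3DArray[x * sizeY * sizeZ + y * sizeZ + z].
int flatIndex3D(int x, int y, int z, int sizeY, int sizeZ) {
    assert(x >= 0 && y >= 0 && y < sizeY && z >= 0 && z < sizeZ);
    return x * sizeY * sizeZ + y * sizeZ + z;
}
```

For the 4x4x4 case, flatIndex3D maps (0, 0, 0) to 0 and (3, 3, 3) to 63, so the 64 elements fill the buffer without gaps. In a kernel, each of x, y and z would be obtained with the globalIndex pattern applied to its own dimension.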
The link you provided explains why your Z dimension is not relevant. CUDAfy.NET exposes a launch function that in turn calls the CUDA runtime C API. When passing parameters from .NET to the native environment, it seems that CUDAfy.NET simply ignores the Z argument, leaving it at one (most probably because early versions of CUDA did not support a Z value other than one). This behaviour is not inherent to CUDA, which now supports Z values other than one; your parameter is simply ignored by the CUDAfy.NET implementation.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported