compute-shader | platform interface to GPU compute functionality | GPU library

by pcwalton | Rust | Version: Current | License: Non-SPDX


kandi X-RAY | compute-shader Summary

compute-shader is a Rust library typically used in hardware and GPU applications. kandi's analysis reports no bugs and no vulnerabilities, and its support is low. However, compute-shader has a Non-SPDX License. You can download it from GitHub.

A simple cross-platform interface to a subset of GPU compute functionality in Rust. Supports OpenCL 1.2+ and OpenGL 4.3+. See examples/matrix-multiply.rs and examples/generate-cave.rs for examples of use.

Support

compute-shader has a low-activity ecosystem.

It has 36 stars, 2 forks, and 1 watcher.

It had no major release in the last 6 months.

compute-shader has no issues reported. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of compute-shader is listed as "current".

Quality

compute-shader has 0 bugs and 0 code smells.

Security

compute-shader has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

compute-shader code analysis shows 0 unresolved vulnerabilities.

There are 0 security hotspots that need review.

License

compute-shader has a Non-SPDX License.

A Non-SPDX license may be an open-source license that is not SPDX-compliant, or it may not be an open-source license at all, so review it closely before use.

Reuse

compute-shader releases are not available; you will need to build and install it from source.

compute-shader Key Features

No Key Features are available at this moment for compute-shader.

compute-shader Examples and Code Snippets

No Code Snippets are available at this moment for compute-shader.

Community Discussions

Trending Discussions on compute-shader

GLSL compute shader flickering blocks/squares artifact

Is vkCmdPipelineBarrier needed after vkCmdCopyBuffer when updating a VertexBuffer / IndexBuffer?

How can I properly create an array texture in OpenGL (Go)?

How to execute parallel compute shaders across multiple compute queues in Vulkan?

'ComputeShader does not contain a definition for SetMatrix'

QUESTION

GLSL compute shader flickering blocks/squares artifact

Asked 2021-Mar-30 at 05:39

I'm trying to write a bare minimum GPU raycaster using compute shaders in OpenGL. I'm confident the raycasting itself is functional, as I've gotten clean outlines of bounding boxes via a ray-box intersection algorithm.

However, when attempting ray-triangle intersection, I get strange artifacts. My shader is programmed to simply test for a ray-triangle intersection, and color the pixel white if an intersection was found and black otherwise. Instead of the expected behavior, when the triangle should be visible onscreen, the screen is instead filled with black and white squares/blocks/tiles which flicker randomly like TV static. The squares are at most 8x8 pixels (the size of my compute shader blocks), although there are dots as small as single pixels as well. The white blocks generally lie in the expected area of my triangle, although sometimes they are spread out across the bottom of the screen as well.
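
The 8x8 size of the blocks matches the shader's workgroup size, which hints that whatever is going wrong varies per workgroup rather than per pixel. Below is a minimal Rust sketch of how pixels map onto 8x8 workgroups, assuming the 8x8 local size the question states (its actual layout line is not shown). dispatch_size and workgroup_of are hypothetical helper names, not part of any API.

```rust
/// Local workgroup size, assumed from the question to be 8x8
/// (what a GLSL `layout(local_size_x = 8, local_size_y = 8) in;` would declare).
const LOCAL_X: u32 = 8;
const LOCAL_Y: u32 = 8;

/// Number of workgroups to dispatch so every pixel gets an invocation.
/// Rounds up so partial tiles at the right and bottom edges are still covered.
fn dispatch_size(width: u32, height: u32) -> (u32, u32) {
    ((width + LOCAL_X - 1) / LOCAL_X, (height + LOCAL_Y - 1) / LOCAL_Y)
}

/// The workgroup (8x8 tile) a pixel falls in: this equals gl_WorkGroupID.xy
/// for the invocation whose gl_GlobalInvocationID.xy is (px, py).
fn workgroup_of(px: u32, py: u32) -> (u32, u32) {
    (px / LOCAL_X, py / LOCAL_Y)
}
```

For a 1920x1080 image this gives 240x135 workgroups, and every pixel in the same 8x8 tile shares one workgroup ID, which is why per-workgroup garbage shows up as square blocks.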

Here is a video of the artifact. In my full shader the camera can be rotated around and the shape appears more triangle-like, but the flickering artifact is the key issue and still appears in this video which I generated from the following minimal version of my shader code:

...

ANSWER

Answered 2021-Mar-30 at 05:39

I've fixed the issue, and it was (unsurprisingly) simply a stupid mistake on my own part.

Observe the following lines from my code snippet:

...

This leaves my v2 vertex uninitialized.

The moral of this story is that if you have a similar issue to the one I described above, and you swear up and down that you've initialized all your variables and it must be a driver bug or someone else's fault... quadruple-check your variables, you probably forgot to initialize one.
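
In Rust the same mistake is caught at compile time: reading a let binding that was never assigned is error E0381. Below is a minimal sketch of a ray-triangle test using the Moller-Trumbore algorithm. It is a standard technique, not a reconstruction of the question's shader, which is not shown. All three vertices are explicit parameters, so none can be left unset. Vec3 and ray_triangle are illustrative names.

```rust
/// Minimal 3-component vector with just the operations the test needs.
#[derive(Clone, Copy, Debug)]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

impl Vec3 {
    fn new(x: f32, y: f32, z: f32) -> Self {
        Vec3 { x, y, z }
    }
    fn sub(self, o: Vec3) -> Vec3 {
        Vec3::new(self.x - o.x, self.y - o.y, self.z - o.z)
    }
    fn dot(self, o: Vec3) -> f32 {
        self.x * o.x + self.y * o.y + self.z * o.z
    }
    fn cross(self, o: Vec3) -> Vec3 {
        Vec3::new(
            self.y * o.z - self.z * o.y,
            self.z * o.x - self.x * o.z,
            self.x * o.y - self.y * o.x,
        )
    }
}

/// Moller-Trumbore ray/triangle intersection.
/// Returns the ray parameter t of the hit (origin + t * dir), or None on a miss.
fn ray_triangle(origin: Vec3, dir: Vec3, v0: Vec3, v1: Vec3, v2: Vec3) -> Option<f32> {
    const EPS: f32 = 1e-7;
    let e1 = v1.sub(v0);
    let e2 = v2.sub(v0);
    let p = dir.cross(e2);
    let det = e1.dot(p);
    if det.abs() < EPS {
        return None; // ray is parallel to the triangle's plane
    }
    let inv_det = 1.0 / det;
    let s = origin.sub(v0);
    let u = s.dot(p) * inv_det; // first barycentric coordinate
    if !(0.0..=1.0).contains(&u) {
        return None;
    }
    let q = s.cross(e1);
    let v = dir.dot(q) * inv_det; // second barycentric coordinate
    if v < 0.0 || u + v > 1.0 {
        return None;
    }
    let t = e2.dot(q) * inv_det;
    if t > EPS { Some(t) } else { None } // ignore hits behind the origin
}
```

For a ray from (0.25, 0.25, -1) along +z against the triangle (0,0,0), (1,0,0), (0,1,0), ray_triangle returns Some(1.0); moving the origin to (2, 2, -1) returns None.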

Source https://stackoverflow.com/questions/66861415

QUESTION

Is vkCmdPipelineBarrier needed after vkCmdCopyBuffer when updating a VertexBuffer / IndexBuffer?

Asked 2021-Feb-22 at 18:02

Consider this pseudo code:

...

ANSWER

Answered 2021-Feb-22 at 18:02

"I thought all commands pushed to a queue start sequentially but finish in an unspecified order, meaning they are executed in parallel."

Your use of the word "are" is the source of your confusion. They "can be" executed in parallel, but that doesn't mean they will be. The fact that a thing works on a Vulkan implementation (or even all implementations) is not enough evidence to say that your code is correct. This is why validation layers are so important.

You do need some kind of synchronization between the transfer and the point when the data is read. I would use an external dependency between the transfer operation and the subpass reading the data (specifically, the vertex input stage of that subpass). That dependency also needs a memory dependency covering the copied memory, so that the transfer writes are made available for reading vertex data.

Indeed, if you haven't set any external dependencies for an attachment, your render pass will generate one automatically. And this is probably what makes your program "work". The implementation probably issues a full barrier between the render pass and any previous commands (even though the automatic external dependency doesn't include the vertex input stage). And the vertex input caches were probably cleared at some point, which prevents you from accidentally seeing old data.

However, the implicit external dependency, as actually specified, does not cover this use case. So if your code "works", it does so only by accident, and you still need an explicit dependency.
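
The explicit dependency described above can be sketched as plain data. This is a minimal sketch under stated assumptions: it mirrors the fields of Vulkan's VkSubpassDependency in an ordinary Rust struct, with flag values copied from the Vulkan headers, so it compiles without any Vulkan binding. In a real program these values would go into the binding's own type (for example ash's vk::SubpassDependency) and then into the render pass create info. copy_to_vertex_input_dependency is a hypothetical helper name. The same stage and access masks would apply to a vkCmdPipelineBarrier recorded after vkCmdCopyBuffer, outside the render pass.

```rust
// Flag values as defined in the Vulkan headers (vulkan_core.h).
const VK_SUBPASS_EXTERNAL: u32 = !0; // ~0U
const VK_PIPELINE_STAGE_VERTEX_INPUT_BIT: u32 = 0x0000_0004;
const VK_PIPELINE_STAGE_TRANSFER_BIT: u32 = 0x0000_1000;
const VK_ACCESS_INDEX_READ_BIT: u32 = 0x0000_0002;
const VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT: u32 = 0x0000_0004;
const VK_ACCESS_TRANSFER_WRITE_BIT: u32 = 0x0000_1000;

/// Plain-Rust mirror of the fields of VkSubpassDependency (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SubpassDependency {
    src_subpass: u32,
    dst_subpass: u32,
    src_stage_mask: u32,
    dst_stage_mask: u32,
    src_access_mask: u32,
    dst_access_mask: u32,
    dependency_flags: u32,
}

/// External dependency: transfer writes recorded before the render pass
/// (e.g. vkCmdCopyBuffer into a vertex/index buffer) must complete and be
/// visible before subpass `dst` reads vertex and index data at vertex input.
fn copy_to_vertex_input_dependency(dst: u32) -> SubpassDependency {
    SubpassDependency {
        src_subpass: VK_SUBPASS_EXTERNAL,
        dst_subpass: dst,
        src_stage_mask: VK_PIPELINE_STAGE_TRANSFER_BIT,
        dst_stage_mask: VK_PIPELINE_STAGE_VERTEX_INPUT_BIT,
        src_access_mask: VK_ACCESS_TRANSFER_WRITE_BIT,
        dst_access_mask: VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT | VK_ACCESS_INDEX_READ_BIT,
        dependency_flags: 0,
    }
}
```

The execution half of the dependency is the stage-mask pair (transfer before vertex input); the memory half is the access-mask pair (transfer writes made visible to vertex-attribute and index reads).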

Your initial code (where you wrote data to host-visible memory that the GPU read directly) worked because submitting a batch implicitly synchronizes host writes globally, for all uses, so long as those host writes happen before the submit operation.

Source https://stackoverflow.com/questions/66320547

QUESTION

How can I properly create an array texture in OpenGL (Go)?

Asked 2020-Dec-21 at 22:41

I have a total of two textures. The first is used as a framebuffer to work with inside a compute shader, and is later blitted using BlitFramebuffer(...). The second is supposed to be an OpenGL array texture, which is used to look up textures and copy them onto the framebuffer. It's created in the following way:

...

ANSWER

Answered 2020-Dec-21 at 22:41

vec4 c = texture(texAtlas, vec3(iCoords.x%16, iCoords.y%16, 7))

Source https://stackoverflow.com/questions/65392626

QUESTION

How to execute parallel compute shaders across multiple compute queues in Vulkan?

Asked 2020-Oct-17 at 17:47

Update: This has been solved, you can find further details here: https://stackoverflow.com/a/64405505/1889253

A similar question was asked previously, but that question initially focused on using multiple command buffers and triggering the submits from different threads to achieve parallel execution of shaders. Most of the answers suggest that the solution is to use multiple queues instead, and using multiple queues also seems to be the consensus across various blog posts and Khronos forum answers. I have tried those suggestions, running shader executions across multiple queues, but have not been able to observe parallel execution, so I want to ask what I may be doing wrong. As suggested, this question includes runnable code for multiple compute shaders submitted to multiple queues, which will hopefully be useful to others looking to do the same (once this is resolved).

The current implementation is in this pull request / branch, however I will cover the main Vulkan specific points, to ensure only Vulkan knowledge is required to answer this question. It's also worth mentioning that the current use-case is specifically for compute queues and compute shaders, not graphics or transfer queues (although insights/experience achieving parallelism across those would still be very useful, and would most probably also lead to the answer).

More specifically, I have the following:

Multiple queues are first "fetched": my device is an NVIDIA 1650, which supports 16 graphics+compute queues in queue family index 0 and 8 compute queues in queue family index 2.
evalAsync performs the submission (which contains the recorded shader commands). Note that a fence is created, which we can use later. Also, the submit doesn't have any waitStageMasks (PipelineStageFlags).
evalAwait waits on that fence: calling evalAwait blocks until the submission has finished.

A couple of points that are not visible in the examples above but are important:

All evalAsync calls run on the same application, instance and device.
Each evalAsync executes with its own separate command buffer and buffers, and in a separate queue.
In case memory barriers might be involved: we tried removing all memory barriers completely (for example, this one that runs before shader execution), but this made no difference to performance.

The test used in the benchmark can be found here; the only key things to understand are:

This is the shader we use for testing; as you can see, we just add many atomicAdd steps to increase the processing time.
Currently the test has a small buffer size and a high number of shader loop iterations, but we also tested with a large buffer size (e.g. 100,000 instead of 10) and fewer iterations (1,000 instead of 100,000,000).

When running the test, we first run a set of "synchronous" shader executions on the same queue (the number is variable, but we've tested with 6 to 16, 16 being the maximum number of queues). Then we run them asynchronously: we submit all of them and then call evalAwait until they have finished. Comparing the resulting times, both approaches take the same amount of time, even though the asynchronous runs use different compute queues.

My questions are:

Am I missing something when fetching the queues?
Are there further parameters in the Vulkan setup that need to be configured to ensure asynchronous execution?
Are there any restrictions I may not be aware of, such as operating-system processes only being able to submit GPU workloads to the GPU synchronously?
Would multithreading be required for parallel execution to work properly when dealing with multiple queue submissions?

Furthermore, I have found several useful resources across various Reddit posts and Khronos Group forums that provide in-depth conceptual and theoretical overviews of the topic, but I haven't come across end-to-end code examples that show parallel execution of shaders. If you can share any practical examples with functioning parallel execution of shaders, that would be very helpful.

If there are further details or questions that can help provide further context please let me know, happy to answer them and/or provide more detail.

For completeness, my tests were using:

Vulkan SDK 1.2
Windows 10
NVIDIA 1650

Other relevant links that have been shared in similar posts:

Similar discussion with suggested link to example but which seems to have disappeared...
Post on Leveraging asynchronous queues for concurrent execution (unfortunately no example code)
(Relatively old, 5 years) Post that suggests NVIDIA cards can't do parallel execution of shaders, but doesn't seem to have a conclusive answer
Nvidia presentation on Vulkan Multithreading with multiple queue execution (hence my question above on threads)

...

ANSWER

Answered 2020-Oct-16 at 22:18

You are getting "asynchronous execution". You just don't expect it to behave the way it behaves.

On a CPU, if you have one thread active, then you're using one CPU core (or hyper-thread). All of that core's execution and computation capabilities are given to your thread alone (ignoring pre-emption). But at the same time, if there are other cores, your one thread cannot use any of the computational resources of those cores. Not unless you create another thread.

GPUs don't work that way. A queue is not like a CPU thread. It does not specifically relate to a particular quantity of computational resources. A queue is merely the interface through which commands get executed; the underlying hardware decides how to farm out commands to the various compute resources provided by the GPU as a whole.

What generally happens when you execute a command is that the hardware attempts to fully saturate the available shader execution units using your command. If there are more shader units available than the number of invocations your operation requires, then some resources are available immediately for the next command. But if not, then the entire GPU's compute resources will be dedicated to executing the first operation; the second one must wait for resources to become available before it can start.

It doesn't matter how many compute queues you shove work into; they're all going to try to use as many compute resources as possible. So they will largely execute in some particular order.

Queue priority systems exist, but these mainly help determine the order of execution for commands. That is, if a high-priority queue has some commands that need to be executed, then they will take priority the next time compute resources become available for a new command.

So submitting 3 dispatch batches on 3 separate queues is not going to complete faster than submitting 1 batch on one queue containing 3 dispatch operations.
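
The arithmetic behind this can be shown with a deliberately crude toy model, which is an assumption and not how any particular GPU schedules work: the device has a fixed number of shader units, each invocation occupies one unit for one time step, and idle units may take work from any queue. Under that model no arrangement of queues can beat the lower bound below, and when every dispatch is large enough to fill the device on its own, that bound equals the time of running the dispatches strictly one after another. The function names are illustrative.

```rust
/// Lower bound on time steps for a set of dispatches, regardless of how many
/// queues they are spread across: every invocation needs one unit-step, and at
/// most `units` invocations complete per step. (Toy model, see lead-in.)
fn lower_bound_steps(dispatches: &[u64], units: u64) -> u64 {
    let total: u64 = dispatches.iter().sum();
    (total + units - 1) / units // ceiling division
}

/// Time steps if each dispatch runs alone and the next may not start until the
/// previous one has fully finished (no overlap at all).
fn fully_serialized_steps(dispatches: &[u64], units: u64) -> u64 {
    dispatches.iter().map(|&n| (n + units - 1) / units).sum()
}
```

For three dispatches of 1,048,576 invocations on 1,024 units, both functions give 3,072 steps, so spreading them over queues has nothing to gain. For three 100-invocation dispatches the bound drops to 1 step versus 3 serialized, which is the "more shader units available than invocations" case the answer describes.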

The main reason multiple queues (of the same family) exist is to be able to submit work from multiple threads without having them do inter-thread synchronization (and to provide some possible prioritization of submissions).
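
A minimal Rust sketch of that point, with a hypothetical Queue type standing in for a real queue handle (no Vulkan calls are made). Vulkan requires host access to a queue to be externally synchronized, which the sketch models by requiring &mut for submit: with one queue per thread, each thread has exclusive access and needs no lock, while a single shared queue forces every submission through a Mutex. It uses std::thread::scope, available since Rust 1.63.

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for a queue handle; it just records batch IDs.
#[derive(Default, Debug)]
struct Queue {
    submitted: Vec<u32>,
}

impl Queue {
    /// Needs exclusive access, modelling "externally synchronized" submission.
    fn submit(&mut self, batch: u32) {
        self.submitted.push(batch);
    }
}

/// One queue per thread: each thread owns a `&mut Queue`, so no lock is needed.
fn submit_per_thread(queues: &mut [Queue], batches_per_thread: u32) {
    thread::scope(|s| {
        for (i, q) in queues.iter_mut().enumerate() {
            s.spawn(move || {
                for b in 0..batches_per_thread {
                    q.submit(i as u32 * 1000 + b);
                }
            });
        }
    });
}

/// One shared queue: every thread must take the lock before each submission.
fn submit_shared(queue: &Mutex<Queue>, threads: u32, batches_per_thread: u32) {
    thread::scope(|s| {
        for i in 0..threads {
            s.spawn(move || {
                for b in 0..batches_per_thread {
                    queue.lock().unwrap().submit(i * 1000 + b);
                }
            });
        }
    });
}
```

Both functions end with every batch submitted; the difference is only in the host-side synchronization, which matches the answer's point that extra queues buy lock-free submission, not extra GPU throughput.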

Source https://stackoverflow.com/questions/64384786

QUESTION

'ComputeShader does not contain a definition for SetMatrix'

Asked 2020-Feb-26 at 20:04

Currently I am trying to follow this tutorial on ray tracing using compute shaders. I am trying to set the matrices for the compute shader using ComputeShader.SetMatrix(string name, Matrix4x4 val), or in my case specifically, RayTracingShader.SetMatrix("_CameraToWorld", _camera.cameraToWorldMatrix);.

However, I get the error 'ComputeShader does not contain a definition for SetMatrix', even though this Unity page clearly indicates that it should be possible. I am also unable to find anyone on Google with similar problems. I have just now updated Unity to version 5.5.4p4 Personal. Before that it didn't work either.

Any help fixing this would be greatly appreciated.

...

ANSWER

Answered 2020-Feb-26 at 20:03

Unity 5.5 does not define ComputeShader.SetMatrix.

You need to update to a version of Unity where ComputeShader.SetMatrix exists.

It appears to have first arrived in Unity 2017.3.

Source https://stackoverflow.com/questions/60421583

Community Discussions and Code Snippets contain content sourced from the Stack Exchange Network.

Vulnerabilities

No vulnerabilities reported

Install compute-shader

You can download it from GitHub.
Rust is installed and managed by the rustup tool. Rust has a 6-week rapid release process and supports a great number of platforms, so there are many builds of Rust available at any time. Please refer to rust-lang.org for more information.

Support

To request new features, make suggestions, or report bugs, create an issue on GitHub. If you have questions, check for existing answers and ask on the Stack Overflow community page.
