compute-shader | platform interface to GPU compute functionality | GPU library
kandi X-RAY | compute-shader Summary
A simple cross-platform interface to a subset of GPU compute functionality in Rust. Supports OpenCL 1.2+ and OpenGL 4.3+. See examples/matrix-multiply.rs and examples/generate-cave.rs for examples of use.
Community Discussions
Trending Discussions on compute-shader
QUESTION
I'm trying to write a bare minimum GPU raycaster using compute shaders in OpenGL. I'm confident the raycasting itself is functional, as I've gotten clean outlines of bounding boxes via a ray-box intersection algorithm.
However, when attempting ray-triangle intersection, I get strange artifacts. My shader is programmed to simply test for a ray-triangle intersection, and color the pixel white if an intersection was found and black otherwise. Instead of the expected behavior, when the triangle should be visible onscreen, the screen is instead filled with black and white squares/blocks/tiles which flicker randomly like TV static. The squares are at most 8x8 pixels (the size of my compute shader blocks), although there are dots as small as single pixels as well. The white blocks generally lie in the expected area of my triangle, although sometimes they are spread out across the bottom of the screen as well.
Here is a video of the artifact. In my full shader the camera can be rotated around and the shape appears more triangle-like, but the flickering artifact is the key issue and still appears in this video which I generated from the following minimal version of my shader code:
...
ANSWER
Answered 2021-Mar-30 at 05:39
I've fixed the issue, and it was (unsurprisingly) simply a stupid mistake on my own part.
Observe the following lines from my code snippet:
Which leaves my v2 vertex quite uninitialized.
The moral of this story is that if you have a similar issue to the one I described above, and you swear up and down that you've initialized all your variables and it must be a driver bug or someone else's fault... quadruple-check your variables, you probably forgot to initialize one.
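The shader code from the question is not reproduced here, so the following is only a CPU-side sketch in Rust of one common ray-triangle test (Möller–Trumbore), not the poster's actual algorithm. The names `Vec3` and `ray_hits_triangle` are hypothetical. It illustrates that the test depends on all three vertices: in Rust, reading an uninitialized local is rejected at compile time, whereas in GLSL it yields undefined values, which is consistent with the flickering described above.

```rust
// Hypothetical CPU-side sketch; not the shader from the question.
#[derive(Clone, Copy, Debug)]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

impl Vec3 {
    fn new(x: f32, y: f32, z: f32) -> Self {
        Vec3 { x, y, z }
    }
    fn sub(self, o: Vec3) -> Vec3 {
        Vec3::new(self.x - o.x, self.y - o.y, self.z - o.z)
    }
    fn dot(self, o: Vec3) -> f32 {
        self.x * o.x + self.y * o.y + self.z * o.z
    }
    fn cross(self, o: Vec3) -> Vec3 {
        Vec3::new(
            self.y * o.z - self.z * o.y,
            self.z * o.x - self.x * o.z,
            self.x * o.y - self.y * o.x,
        )
    }
}

/// Möller–Trumbore test. Returns the distance along the ray to the hit
/// point, or None if the ray misses the triangle (v0, v1, v2).
fn ray_hits_triangle(origin: Vec3, dir: Vec3, v0: Vec3, v1: Vec3, v2: Vec3) -> Option<f32> {
    const EPS: f32 = 1e-6;
    let e1 = v1.sub(v0);
    let e2 = v2.sub(v0); // v2 feeds every later term; garbage here means garbage hits
    let p = dir.cross(e2);
    let det = e1.dot(p);
    if det.abs() < EPS {
        return None; // ray is parallel to the triangle plane
    }
    let inv = 1.0 / det;
    let t_vec = origin.sub(v0);
    let u = t_vec.dot(p) * inv; // first barycentric coordinate
    if u < 0.0 || u > 1.0 {
        return None;
    }
    let q = t_vec.cross(e1);
    let v = dir.dot(q) * inv; // second barycentric coordinate
    if v < 0.0 || u + v > 1.0 {
        return None;
    }
    let t = e2.dot(q) * inv; // distance along the ray
    if t > EPS {
        Some(t)
    } else {
        None // intersection lies behind the ray origin
    }
}
```

Calling `ray_hits_triangle` with a ray that starts one unit in front of a triangle lying in the z = 0 plane and points straight at it returns `Some` distance of about 1; moving the ray origin outside the triangle's footprint returns `None`.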
QUESTION
Consider this pseudo code:
...
ANSWER
Answered 2021-Feb-22 at 18:02
"I thought all commands pushed to a queue start sequentially but finish in an unspecified order, meaning they are executed in parallel."
Your use of the word "are" is the source of your confusion. They "can be" executed in parallel, but that doesn't mean they will be. The fact that a thing works on a Vulkan implementation (or even all implementations) is not enough evidence to say that your code is correct. This is why validation layers are so important.
You do need some kind of synchronization between the transfer and the point when the data is read. I would use an external dependency between the subpass reading the data (specifically the vertex input stage of that subpass) and the transfer operation. And it needs a memory dependency that covers that memory too, for the purpose of reading vertex data.
Indeed, if you haven't set any external dependencies for an attachment, your render pass will generate one automatically. And this is probably what makes your program "work". The implementation probably issues a full barrier between the render pass and any previous commands (even though the automatic external dependency doesn't include the vertex input stage). And the vertex input caches were probably cleared at some point, which prevents you from accidentally seeing old data.
However, the actual text of the implicit external dependency does not actually cover this use case. So if your code is "working", it is only by accident. So you still need an explicit dependency.
Your initial code (where you wrote data to host-visible memory that the GPU read directly) worked because submitting a batch implicitly synchronizes host writes globally, for all uses, so long as those host writes happen before the submit operation.
QUESTION
I have a total of two textures. The first is used as a framebuffer to work with inside a compute shader, which is later blitted using BlitFramebuffer(...). The second is supposed to be an OpenGL array texture, which is used to look up textures and copy them onto the framebuffer. It's created in the following way:
ANSWER
Answered 2020-Dec-21 at 22:41
vec4 c = texture(texAtlas, vec3(iCoords.x%16, iCoords.y%16, 7))
QUESTION
Update: This has been solved, you can find further details here: https://stackoverflow.com/a/64405505/1889253
A similar question was asked previously, but that question was initially focused around using multiple command buffers, and triggering the submit across different threads to achieve parallel execution of shaders. Most of the answers suggest that the solution is to use multiple queues instead. The use of multiple queues also seems to be the consensus across various blog posts and Khronos forum answers. I have attempted those suggestions running shader executions across multiple queues but without being able to see parallel execution, so I wanted to ask what I may be doing wrong. As suggested, this question includes the runnable code of multiple compute shaders being submitted to multiple queues, which hopefully can be useful for other people looking to do the same (once this is resolved).
The current implementation is in this pull request / branch, however I will cover the main Vulkan specific points, to ensure only Vulkan knowledge is required to answer this question. It's also worth mentioning that the current use-case is specifically for compute queues and compute shaders, not graphics or transfer queues (although insights/experience achieving parallelism across those would still be very useful, and would most probably also lead to the answer).
More specifically, I have the following:
- Multiple queues are first "fetched" - my device is an NVIDIA 1650, and supports 16 graphics+compute queues in queue family index 0, and 8 compute queues in queue family index 2
- evalAsync performs the submission (which contains recorded shader commands) - You should notice that a fence is created which we'll be able to use. Also the submit doesn't have any waitStageMasks (PipelineStageFlags).
- evalAwait allows us to wait for the fence - When calling the evalAwait, we are able to wait for the submission to finish through the created fence
A couple of points that are not visible in the examples above but are important:
- All evalAsync run on the same application, instance and device
- Each evalAsync executes with its own separate commandBuffer and buffers, and in a separate queue
- If you are wondering whether memory barriers could have something to do with it, we have tried removing all memoryBarriers completely (this one, for example, which runs before shader execution), but this has not made any difference to performance
The test that is used in the benchmark can be found here, however the only key things to understand are:
- This is the shader that we use for testing, as you can see, we just add a bunch of atomicAdd steps to increase the amount of processing time
- Currently the test has a small buffer size and a high number of shader loop iterations, but we also tested with a large buffer size (i.e. 100,000 instead of 10) and fewer iterations (1,000 instead of 100,000,000).
When running the test, we first run a set of "synchronous" shader executions on the same queue (the number is variable, but we've tested with 6-16, the latter being the max number of queues). Then we run these in an asynchronous manner, where we submit all of them and then call evalAwait until they are finished. When comparing the resulting times from both approaches, they take the same amount of time even though they run across different compute queues.
My questions are:
- Am I currently missing something when fetching the queues?
- Are there further parameters in the vulkan setup that need to be configured to ensure asynchronous execution?
- Are there any restrictions I may not be aware of, such as operating system processes only being able to submit workloads to the GPU synchronously?
- Would multithreading be required in order for parallel execution to work properly when dealing with multiple queue submissions?
Furthermore, I have found several useful resources online across various reddit posts and Khronos Group forums that provide very in-depth conceptual and theoretical overviews of the topic, but I haven't come across end-to-end code examples that show parallel execution of shaders. If there are any practical examples out there that you can share, which have functioning parallel execution of shaders, that would be very helpful.
If there are further details or questions that can help provide further context please let me know, happy to answer them and/or provide more detail.
For completeness, my tests were using:
- Vulkan SDK 1.2
- Windows 10
- NVIDIA 1650
Other relevant links that have been shared in similar posts:
- Similar discussion with suggested link to example but which seems to have disappeared...
- Post on Leveraging asynchronous queues for concurrent execution (unfortunately no example code)
- (Relatively old - 5 years) Post that suggests NVIDIA cards can't do parallel execution of shaders, but doesn't seem to have a conclusive answer
- Nvidia presentation on Vulkan Multithreading with multiple queue execution (hence my question above on threads)
ANSWER
Answered 2020-Oct-16 at 22:18
You are getting "asynchronous execution". You just don't expect it to behave the way it behaves.
On a CPU, if you have one thread active, then you're using one CPU core (or hyper-thread). All of that core's execution and computation capabilities are given to your thread alone (ignoring pre-emption). But at the same time, if there are other cores, your one thread cannot use any of the computational resources of those cores. Not unless you create another thread.
GPUs don't work that way. A queue is not like a CPU thread. It does not specifically relate to a particular quantity of computational resources. A queue is merely the interface through which commands get executed; the underlying hardware decides how to farm out commands to the various compute resources provided by the GPU as a whole.
What generally happens when you execute a command is that the hardware attempts to fully saturate the available shader execution units using your command. If there are more shader units available than the number of invocations your operation requires, then some resources are available immediately for the next command. But if not, then the entire GPU's compute resources will be dedicated to executing the first operation; the second one must wait for resources to become available before it can start.
It doesn't matter how many compute queues you shove work into; they're all going to try to use as many compute resources as possible. So they will largely execute in some particular order.
Queue priority systems exist, but these mainly help determine the order of execution for commands. That is, if a high-priority queue has some commands that need to be executed, then they will take priority the next time compute resources become available for a new command.
So submitting 3 dispatch batches on 3 separate queues is not going to complete faster than submitting 1 batch on one queue containing 3 dispatch operations.
The main reason multiple queues (of the same family) exist is to be able to submit work from multiple threads without having them do inter-thread synchronization (and to provide some possible prioritization of submissions).
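The arithmetic behind this answer can be illustrated with a toy model in Rust. This is only a sketch under loud assumptions: the GPU is reduced to a fixed number of execution units that each run one invocation per time step, dispatches are independent, and idle units are always handed to the next pending work. The function `steps_to_finish` and the numbers used are hypothetical and do not describe any real driver or hardware scheduler.

```rust
use std::collections::VecDeque;

/// Toy model (an assumption, not real hardware behaviour): `units` execution
/// units each run one invocation per time step. Each queue is a list of
/// dispatches given as invocation counts, processed in order. Units left
/// idle by one queue are offered to the next queue in index order.
/// Returns the number of time steps until all work is done.
fn steps_to_finish(queues: &[Vec<u64>], units: u64) -> u64 {
    assert!(units > 0, "the modelled GPU needs at least one execution unit");
    // Remaining invocations per dispatch; empty dispatches are dropped.
    let mut pending: Vec<VecDeque<u64>> = queues
        .iter()
        .map(|q| q.iter().copied().filter(|&n| n > 0).collect())
        .collect();
    let mut steps = 0;
    while pending.iter().any(|q| !q.is_empty()) {
        let mut free = units; // units available in this time step
        for queue in pending.iter_mut() {
            while free > 0 {
                let done = match queue.front_mut() {
                    Some(remaining) => {
                        let used = free.min(*remaining);
                        *remaining -= used;
                        free -= used;
                        *remaining == 0
                    }
                    None => break, // this queue has no more work
                };
                if done {
                    queue.pop_front();
                }
            }
        }
        steps += 1;
    }
    steps
}
```

With 1,000 units and three dispatches of 10,000 invocations each, the model needs 30 steps whether the dispatches sit in one queue or are spread over three, because each dispatch alone saturates the units. The same holds when the dispatches are small enough to fit side by side: the queue layout changes nothing in this model, only the total amount of work does.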
QUESTION
Currently I am trying to follow this tutorial on ray tracing using compute shaders. I am trying to set the matrices for the compute shader using ComputeShader.SetMatrix(string name, Matrix4x4 val), or in my case specifically, RayTracingShader.SetMatrix("_CameraToWorld", _camera.cameraToWorldMatrix).
However, I get the error 'ComputeShader does not contain a definition for SetMatrix', even though this Unity page clearly indicates that it should be possible. I am also unable to find anyone on Google with similar problems. I have just now updated Unity to version 5.5.4p4 Personal. Before that it didn't work either.
Any help fixing this would be greatly appreciated.
...
ANSWER
Answered 2020-Feb-26 at 20:03
Unity 5.5 does not define ComputeShader.SetMatrix.
You need to update to a version of Unity where ComputeShader.SetMatrix exists.
It seems like it first arrived in 2017.3.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install compute-shader
Rust is installed and managed by the rustup tool. Rust has a 6-week rapid release process and supports a great number of platforms, so there are many builds of Rust available at any time. Please refer to rust-lang.org for more information.