CL-basic | simple prototype for a basic OpenCL host | GPU library
kandi X-RAY | CL-basic Summary
A very simple prototype for a basic OpenCL host and kernel code
Community Discussions
Trending Discussions on CL-basic
QUESTION
I am trying to translate single-threaded serial code for the MJPEG decoder into OpenCL code which I want to execute on the GPU (NVIDIA Tesla k20c).
After translating several major functions into kernels, the execution time of the code has gone from about 18 ms per frame to an abysmal 400 ms per frame.
I am using a standard method of opening a file, reading it, and using buffer and NDRange commands to execute code on the GPU and read the results back on the CPU. I feel that transferring the MJPEG file (which is of the data type FILE *) to the GPU's memory would considerably cut down the communication overhead while the code is processed.
I referred to this link but the suggestions are only applicable to CUDA. This source and NVIDIA's OpenCL guide explain the utility of pinned memory but their usage of pinned memory is confined to kernel parameters and buffer commands.
I want to transfer the entire MJPEG file (size is about 2.8 MB) to the GPU's memory but I am struggling to find resources which do it.
Can I do this safely? If this can be done, how can I read the file to perform the various steps of MJPEG decoding?
EDIT:
The details of my GPU are as follows:
...

ANSWER
Answered 2017-Sep-05 at 21:02

There's nothing stopping you from copying the literal data of the image into a buffer in host memory and then copying it to the GPU:
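The answer's code is not preserved in this snippet. A minimal sketch of the idea in OpenCL host C, assuming context and queue are an already-created cl_context and cl_command_queue (error handling abbreviated, function name illustrative):

#include <CL/cl.h>
#include <stdio.h>
#include <stdlib.h>

/* Read the raw MJPEG bytes into host memory, then copy them into a
   device buffer that the decoding kernels can parse. */
cl_mem upload_file(cl_context context, cl_command_queue queue,
                   const char *path, size_t *out_size)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;

    fseek(f, 0, SEEK_END);
    size_t size = (size_t)ftell(f);
    fseek(f, 0, SEEK_SET);

    unsigned char *host = malloc(size);
    fread(host, 1, size, f);
    fclose(f);

    cl_int err;
    cl_mem buf = clCreateBuffer(context, CL_MEM_READ_ONLY, size, NULL, &err);
    if (err == CL_SUCCESS)
        /* Blocking write: returns once the bytes are resident on the device. */
        err = clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0, size, host,
                                   0, NULL, NULL);
    free(host);

    if (err != CL_SUCCESS) {
        if (buf) clReleaseMemObject(buf);
        return NULL;
    }
    *out_size = size;
    return buf;
}

The resulting buffer can then be set as a __global uchar * kernel argument with clSetKernelArg.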
QUESTION
As is known, there are the WARP (in CUDA) and the WaveFront (in OpenCL): http://courses.cs.washington.edu/courses/cse471/13sp/lectures/GPUsStudents.pdf
4.1. SIMT Architecture
...
A warp executes one common instruction at a time, so full efficiency is realized when all 32 threads of a warp agree on their execution path. If threads of a warp diverge via a data-dependent conditional branch, the warp serially executes each branch path taken, disabling threads that are not on that path, and when all paths complete, the threads converge back to the same execution path. Branch divergence occurs only within a warp; different warps execute independently regardless of whether they are executing common or disjoint code paths.
The SIMT architecture is akin to SIMD (Single Instruction, Multiple Data) vector organizations in that a single instruction controls multiple processing elements. A key difference is that SIMD vector organizations expose the SIMD width to the software, whereas SIMT instructions specify the execution and branching behavior of a single thread.
- WaveFront in OpenCL: https://sites.google.com/site/csc8820/opencl-basics/opencl-terms-explained#TOC-Wavefront
During runtime, the first wavefront is sent to the compute unit to run, then the second wavefront is sent to the compute unit, and so on. Work items within one wavefront are executed in parallel and in lock steps. But different wavefronts are executed sequentially.
I.e., we know that:
threads in a WARP (CUDA) are SIMT threads, which always execute the same instruction at the same time and always stay synchronized - i.e. threads of a WARP are the same as lanes of SIMD (on a CPU)
threads in a WaveFront (OpenCL) are threads which always execute in parallel, but not necessarily all of them perform the exact same instruction, and not necessarily all of them are synchronized
But is there any guarantee that all of the threads in a WaveFront are always synchronized, like threads in a WARP or lanes in SIMD?
Conclusion:
- WaveFront threads (work-items) are always synchronized - lock step: "wavefront executes a number of work-items in lock step relative to each other."
- A WaveFront is mapped onto a SIMD block: "all work-items in the wavefront go to both paths of flow control"
- I.e. each WaveFront thread (work-item) is mapped to a SIMD lane
Chapter 1 OpenCL Architecture and AMD Accelerated Parallel Processing
1.1 Terminology
...
Wavefronts and work-groups are two concepts relating to compute kernels that provide data-parallel granularity. A wavefront executes a number of work-items in lock step relative to each other. Sixteen work-items are executed in parallel across the vector unit, and the whole wavefront is covered over four clock cycles. It is the lowest level that flow control can affect. This means that if two work-items inside of a wavefront go divergent paths of flow control, all work-items in the wavefront go to both paths of flow control.
This is true for: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_OpenCL_Programming_Optimization_Guide2.pdf
- (page-45) Chapter 2 OpenCL Performance and Optimization for GCN Devices
- (page-81) Chapter 3 OpenCL Performance and Optimization for Evergreen and Northern Islands Devices
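For reference, the data-dependent branch that both guides describe looks like this in an OpenCL kernel (a minimal illustrative sketch, not code from the question): on lock-step hardware the whole wavefront/warp walks through both paths, masking off the work-items that did not take the current one.

__kernel void divergent(__global const float *in, __global float *out)
{
    size_t gid = get_global_id(0);
    if (in[gid] >= 0.0f)
        out[gid] = in[gid] * 2.0f;   /* path A */
    else
        out[gid] = -in[gid];         /* path B: divergence if only some lanes take it */
}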
ANSWER
Answered 2017-Feb-15 at 22:41

First, you can query some values:
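The answer's code is not preserved in this snippet. A minimal sketch of one such query, assuming kernel and device are an already-built cl_kernel and its cl_device_id: the portable value to inspect is the preferred work-group size multiple, which typically equals the warp size (32) on NVIDIA and the wavefront size (64) on AMD GCN; vendor extensions (cl_nv_device_attribute_query, cl_amd_device_attribute_query) expose the width more directly.

#include <CL/cl.h>
#include <stdio.h>

void print_simd_width(cl_kernel kernel, cl_device_id device)
{
    size_t multiple = 0;
    /* Standard OpenCL 1.1+ query; on most GPUs this reports the warp/wavefront size. */
    cl_int err = clGetKernelWorkGroupInfo(
        kernel, device, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
        sizeof(multiple), &multiple, NULL);
    if (err == CL_SUCCESS)
        printf("preferred work-group size multiple: %zu\n", multiple);
}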
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported