pocl | pocl - Portable Computing Language | GPU library

by pocl | C | Version: v3.1 | License: MIT


kandi X-RAY | pocl Summary

pocl is a C library typically used in hardware and GPU applications. kandi's analysis reports no bugs and no vulnerabilities. pocl has a permissive license and medium support. You can download it from GitHub.

pocl is being developed toward an efficient implementation of the OpenCL standard that can be easily adapted to new targets. Refer to the INSTALL file in the source directory for building and installing pocl. More documentation is available at … The main web page is at …

Support

pocl has a moderately active ecosystem.

It has 748 stars, 222 forks, and 74 watchers.

It had no major release in the last 12 months.

There are 65 open issues and 556 closed ones. On average, issues are closed in 429 days. There are 10 open pull requests and 0 closed pull requests.

It has a neutral sentiment in the developer community.

The latest version of pocl is v3.1.

Quality

pocl has 0 bugs and 0 code smells.

Security

pocl has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

pocl code analysis shows 0 unresolved vulnerabilities.

There are 0 security hotspots that need review.

License

pocl is licensed under the MIT License, which is a permissive license.

Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

pocl releases are available to install and integrate.

It has 562 lines of code, 11 functions and 13 files.

It has high code complexity, which directly affects the maintainability of the code.

pocl Key Features

No Key Features are available at this moment for pocl.

pocl Examples and Code Snippets

No Code Snippets are available at this moment for pocl.

Community Discussions

Trending Discussions on pocl

Do 64bit atomic operations work in openCL on AMD cards?

How to optimize this OpenCL kernel?

Error in R CMD SHLIB compiling Fortran code

OpenCL development under Ubuntu

Optimization of PyOpenCL program / FFT

QUESTION

Do 64bit atomic operations work in openCL on AMD cards?

Asked 2021-Apr-22 at 11:41

The implementation of emulated atomics in OpenCL following the STREAM blog works nicely for 32-bit atomic add, on CPUs as well as on NVIDIA and AMD GPUs.

The 64-bit equivalent, based on the cl_khr_int64_base_atomics extension, seems to run properly on CPUs (pocl and Intel) as well as with the NVIDIA OpenCL drivers.

I fail to make the 64-bit version work on AMD GPU cards, though: neither on amdgpu-pro nor on ROCm (3.5.0), running on a Radeon VII and a Radeon Instinct MI50, respectively.

The implementation goes as follows:

...

ANSWER

Answered 2021-Apr-22 at 11:41

For 64-bit, the function is called atom_cmpxchg and not atomic_cmpxchg.

Source https://stackoverflow.com/questions/67211566
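The emulation technique the question refers to implements an atomic add as a compare-and-swap (CAS) loop on the value's integer bit pattern. Below is a minimal host-side sketch of the 64-bit variant. It uses C11 stdatomic.h rather than OpenCL C, so it runs without a GPU. The names atomic_add_double, bits_of, and double_of are hypothetical. In an OpenCL kernel, the CAS step would instead be atom_cmpxchg on a __global long pointer, the 64-bit name the answer points to. That function returns the old value instead of updating an "expected" variable in place, so the kernel loop retries until the returned value equals the one it read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* The bit reinterpretation below requires double and uint64_t to have the same size. */
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits wide");

/* Copy the bits of a double into a 64-bit integer, and back, without aliasing issues. */
static uint64_t bits_of(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double double_of(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

/* Emulated atomic add on a double whose bits live in a 64-bit atomic integer.
   Returns the value that was stored before the addition. */
static double atomic_add_double(_Atomic uint64_t *target, double operand)
{
    uint64_t expected = atomic_load(target);
    uint64_t desired;
    do {
        desired = bits_of(double_of(expected) + operand);
        /* On failure, expected is overwritten with the current contents and we retry. */
    } while (!atomic_compare_exchange_weak(target, &expected, desired));
    return double_of(expected); /* unchanged on success, so this is the old value */
}
```

Using the weak CAS inside a retry loop is the usual idiom: a spurious failure only costs one more iteration.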

QUESTION

How to optimize this OpenCL kernel?

Asked 2020-Nov-07 at 03:22

I'm working on a project and I've got some problems with this OpenCL kernel :-(

...

ANSWER

Answered 2020-Nov-07 at 03:22

None of the operations inside the loop have side effects. You only read from those __global pointers and calculate some temporary values that, in the end, get accumulated into aa through that final aa += .... In other words, the sole purpose of that loop is to calculate the value of aa.

Therefore, if you remove aa from the last line (outside the loop), all the operations inside the loop become completely useless. You end up with a loop that does nothing except read some values and update local variables that are discarded when the function returns. When the code is compiled with optimizations enabled (which I assume you are doing, otherwise your question wouldn't make much sense), the compiler is very likely to remove the entire loop. Hence, the code without that final aa runs a lot faster.

Here's a GCC example (adapted by removing the CUDA annotations). It shows that even the lowest optimization level (-O1) removes the entire body of the loop, leaving only the comparisons and the increment of i. With -O2, the whole loop is removed.

Source https://stackoverflow.com/questions/64723746
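The effect described in this answer can be reproduced in plain C, without OpenCL. The sketch below is not the asker's kernel, which is elided above, and the function names and the benchmark_sink variable are hypothetical. It contrasts three variants of the same loop. In the first, the result is returned. In the second, the result is discarded, so an optimizing compiler may delete the loop. In the third, the result is stored to a volatile object, a common benchmarking trick that keeps the work observable.

```c
#include <assert.h>
#include <stddef.h>

/* Every store to a volatile object counts as an observable side effect. */
volatile double benchmark_sink;

/* The accumulated value is returned, so the loop's work is observable and must be kept. */
double accumulate_kept(const double *a, const double *b, size_t n)
{
    double aa = 0.0;
    for (size_t i = 0; i < n; ++i)
        aa += a[i] * b[i];
    return aa;
}

/* Same loop, but aa is thrown away. Reading through plain (non-volatile) pointers and
   updating a local variable have no observable effect, so the compiler may drop the loop. */
void accumulate_discarded(const double *a, const double *b, size_t n)
{
    double aa = 0.0;
    for (size_t i = 0; i < n; ++i)
        aa += a[i] * b[i];
    (void)aa; /* only silences the unused-variable warning */
}

/* Storing the result to a volatile object forces the compiler to compute it. */
void accumulate_observed(const double *a, const double *b, size_t n)
{
    double aa = 0.0;
    for (size_t i = 0; i < n; ++i)
        aa += a[i] * b[i];
    benchmark_sink = aa;
}
```

Comparing the assembly for accumulate_discarded at -O0 and -O2 (for example with gcc -S) should show the loop disappearing, while the other two variants keep it.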

QUESTION

Error in R CMD SHLIB compiling Fortran code

Asked 2020-Oct-01 at 12:37

I'm trying to compile a Fortran subroutine on a remote machine. When I run:

R CMD SHLIB -fPIC vintp2p_afterburner_wind.f

I get the following error:

...

ANSWER

Answered 2020-Oct-01 at 12:37

In case someone finds it useful, the code compiles with:

gfortran -fPIC -shared -ffree-form vintp2p_afterburner_wind.f -o vintp2p_afterburner_wind.so

Source https://stackoverflow.com/questions/64154853

QUESTION

OpenCL development under Ubuntu

Asked 2020-Sep-28 at 20:57

I want to develop an OpenCL based application with host code in C, using Ubuntu.

But the development packages overwhelm me:

...

ANSWER

Answered 2020-Sep-28 at 20:57

You don't need any of them. See this answer.

Source https://stackoverflow.com/questions/63942556

QUESTION

Optimization of PyOpenCL program / FFT

Asked 2020-Jan-17 at 01:42

General Overview of Program: The majority of the code here creates the FrameProcessor object. This object is initialized with some data shape, generally 2048xN, and can then be called to process the data using a series of kernels (proc_frame). For each vector of length 2048 the program will:

Apply a Hanning window (elementwise multiplication of each 2048-element vector by a 2048-point window)
Do a linear interpolation to remap values (to map to linear-in-wavenumber space from the non-linear spectrometer bins the signal is derived from; not too important a detail, but I figured it would be good to include in case it was unclear)
Apply an FFT
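The first two per-vector steps are simple enough to sketch in plain C. This is not the asker's PyOpenCL code, which is not reproduced here. The function names apply_window and remap_linear are hypothetical. The window coefficients (for example, a Hanning window) are assumed to be precomputed. The target grid is assumed to be m evenly spaced points between the first and last input positions.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1: multiply one vector elementwise by precomputed window coefficients. */
void apply_window(double *v, const double *w, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        v[i] *= w[i];
}

/* Step 2: resample y, known at strictly increasing positions x[0..n-1],
   onto m evenly spaced points spanning [x[0], x[n-1]] by linear interpolation. */
void remap_linear(const double *x, const double *y, size_t n,
                  double *out, size_t m)
{
    assert(n >= 2 && m >= 2);
    const double x0 = x[0], x1 = x[n - 1];
    size_t j = 0; /* left end of the current segment [x[j], x[j + 1]] */
    for (size_t k = 0; k < m; ++k) {
        const double xq = x0 + (x1 - x0) * (double)k / (double)(m - 1);
        /* Targets increase monotonically, so the segment index only moves forward. */
        while (j + 2 < n && x[j + 1] < xq)
            ++j;
        const double t = (xq - x[j]) / (x[j + 1] - x[j]);
        out[k] = y[j] + t * (y[j + 1] - y[j]);
    }
}
```

Because both the input positions and the target grid increase, the segment search only moves forward. The whole remap therefore takes time linear in n + m, with no per-point binary search.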

Problem: I want to go faster! The code below does not perform poorly, but for this project I need it to be as fast as it can possibly be. However, I am unsure how I might improve this code further, so I'm looking for suggestions on relevant reading, alternative libraries I should use, changes to code structure, etc.

Current Performance: On my rig with a GeForce RTX 2080 the benchmarks I get (with n=60, which seems to give best performance) are:

...

ANSWER

Answered 2020-Jan-17 at 01:42

Copying my reply from the Reikna group for reference.

Create a reikna Thread object from whatever pyopencl queue you want it to use (probably the one associated with the arrays you want to pass to the FFT).

Create an FFT computation based on this Thread.

Pass your pyopencl arrays to it without any conversion. (You can create a reikna array based on the buffer of a pyopencl array by passing it as the base_data keyword, but if the FFT is all you need, that is not necessary.)

Reikna threads are wrappers on top of pyopencl context + queue, and reikna arrays are subclasses of pyopencl arrays, so the interop should be pretty simple.

Applying this (in a quick and dirty way, feel free to improve), I get: https://gist.github.com/fjarri/f781d3695b7c6678856110cced95be40 . Basically, the changes are:

creating a Thread out of the existing queue (self.thr = self.api.Thread(self.queue))
using the PyOpenCL buffer in FFT without copying it to CPU.

The results I get:

Source https://stackoverflow.com/questions/59777482

Community Discussions and Code Snippets include content from the Stack Exchange Network.

Vulnerabilities

No vulnerabilities reported

Install pocl

You can download it from GitHub.

Support

For new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask them on the Stack Overflow community page.

Find more information at: