MIOpenGEMM | OpenCL general matrix multiplication API | GPU library
kandi X-RAY | MIOpenGEMM Summary
An OpenCL general matrix multiplication (GEMM) API and kernel generator. More information is available on the wiki.
MIOpenGEMM Key Features
MIOpenGEMM Examples and Code Snippets
mkdir build; cd build;
cmake ..
cmake -DOPENCL_LIBRARIES=<opencl-library-path> -DOPENCL_INCLUDE_DIRS=<opencl-headers-path> ..
cmake -DCMAKE_INSTALL_PREFIX=<install-prefix> ..
Community Discussions
Trending Discussions on MIOpenGEMM
QUESTION
I have an OpenCL kernel that multiplies 2 matrices (GEMM) with M=4096, N=4096 and K=16 (i.e. matrices of 4096 x 16 floats).
I run it on Polaris 560, 16CU GPU.
Code: https://github.com/artyom-beilis/oclblas/blob/master/gemm/gemm.cl
I noticed a very strange performance drop for this size: matrix multiplication with this size achieves ~8-10 GFlops, while if I change N to 4095 or 4097 I get around 130-150 GFlops. I noticed similar behaviour with other GEMM libraries like clBlas or MIOpenGEMM: there is a significant performance drop for this particular size of 4096x16, and changing N by 1 boosts the performance several times.
The workload is split into work-groups of 256 threads. Each work-group handles 128x16 and 128x16 matrix tiles (an 8x8 block per thread).
I tried changing the matrix tiling to 96x96 with 6x6 blocks instead of 128x128 with 8x8 - same result.
I tested the same code with ROCm 3.7 OpenCL, Clover OpenCL and even with the Windows OpenCL driver - same behaviour.
There is no such issue on an NVIDIA GTX 960, which has the same number of GPU cores (threads) and the same memory type/size.
I suspect that this is somehow cache/collision related, but I don't understand how it happens, so I don't know how to work around it.
ANSWER
Answered 2021-Jul-09 at 20:28

Finally I found that the clBlas library (originally developed for AMD) special-cases lda % 1024 == 0 and ldb % 1024 == 0, probably because of the cache.
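The cache connection can be illustrated with a toy model. The sketch below is a rough illustration, not AMD's actual cache geometry: the line size, the set count, and the `distinct_sets` helper are all assumptions. It counts how many distinct cache sets are touched when walking down a column of a row-major matrix with leading dimensions of 4095, 4096 and 4097 floats:

```python
# Toy model of a set-associative cache: count the distinct cache sets touched
# when stepping through memory with a fixed row stride (the leading dimension).
# The cache parameters below are assumptions chosen for illustration only.

LINE_BYTES = 64    # assumed cache-line size
NUM_SETS = 1024    # assumed number of sets (e.g. a 256 KB 4-way cache)
FLOAT_BYTES = 4

def distinct_sets(stride_floats, rows=128):
    """Number of distinct cache sets hit by `rows` column accesses."""
    stride_bytes = stride_floats * FLOAT_BYTES
    return len({(i * stride_bytes // LINE_BYTES) % NUM_SETS
                for i in range(rows)})

for n in (4095, 4096, 4097):
    print(n, distinct_sets(n))
```

With a power-of-two stride every access lands in a handful of sets (only 4 here for 4096), so the lines keep evicting each other; nudging the leading dimension by one spreads the accesses across many more sets, which matches the observed speedup at N=4095/4097.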
I found that a better way was to rearrange the blocks in z-curve order instead of queuing several kernels.

https://github.com/artyom-beilis/oclblas/blob/master/gemm/gemm.cl#L109

To handle the cases M != N or M != 1<<n, I just increased the number of work groups over M/N to the nearest 1<<n; groups that have no work exit at the beginning, adding very little overhead.

The z-order rearrangement improved performance about 4x.
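As a rough sketch of the idea (the function name and the 16-bit grid limit are my assumptions, not code from the linked kernel), a linear work-group id can be de-interleaved into z-curve tile coordinates like this:

```python
# Sketch: map a linear work-group id to (tile_x, tile_y) in z-order (Morton
# order), so consecutive work-groups touch nearby tiles of both matrices.
# The grid is padded up to a power of two; out-of-range groups would exit early.

def z_order_tile(group_id):
    """De-interleave the bits of group_id into (x, y) tile coordinates."""
    x = y = 0
    for bit in range(16):  # supports tile grids up to 2**16 x 2**16
        x |= ((group_id >> (2 * bit)) & 1) << bit       # even bits -> x
        y |= ((group_id >> (2 * bit + 1)) & 1) << bit   # odd bits  -> y
    return x, y

# The first few ids trace the familiar Z curve over a 4x4 grid:
print([z_order_tile(i) for i in range(8)])
# → [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (3, 0), (2, 1), (3, 1)]
```

Because neighbouring ids share most of their tile coordinates, the tiles they load overlap in cache far more often than with plain row-major group ordering.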
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install MIOpenGEMM
All examples can be built together, or individually by name.
HTML and PDF documentation can also be built. This creates a local searchable web site inside the ./MIOpenGEMM/doc/html folder and a PDF document inside the ./MIOpenGEMM/doc/pdf folder. Documentation is generated using Doxygen, which should be installed separately. HTML and PDFs are generated using Sphinx and Breathe, with the ReadTheDocs theme.
Support