MIOpenGEMM | OpenCL general matrix multiplication API | GPU library

by ROCmSoftwarePlatform | C++ | Version: rocm-5.5.0 | License: MIT

kandi X-RAY | MIOpenGEMM Summary

MIOpenGEMM is a C++ library typically used in hardware and GPU applications. It has no reported bugs or vulnerabilities, a permissive license, and low support activity. You can download it from GitHub.

An OpenCL general matrix multiplication (GEMM) API and kernel generator. More information is available on the wiki.

            Support

              MIOpenGEMM has a low-activity ecosystem.
              It has 54 stars, 14 forks and 30 watchers.
              It had no major release in the last 12 months.
              There are 7 open issues and 16 closed issues. On average, issues are closed in 48 days. There are 4 open pull requests and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of MIOpenGEMM is rocm-5.5.0.

            Quality

              MIOpenGEMM has 0 bugs and 0 code smells.

            Security

              MIOpenGEMM has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              MIOpenGEMM code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            License

              MIOpenGEMM is licensed under the MIT License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              MIOpenGEMM releases are available to install and integrate.
              Installation instructions, examples and code snippets are available.
              It has 105 lines of code, 1 function and 3 files.
              It has low code complexity. Code complexity directly impacts maintainability of the code.

            MIOpenGEMM Examples and Code Snippets

            MIOpenGEMM: Configure with CMake
            mkdir build; cd build;
            
            # Default configuration
            cmake ..
            
            # If OpenCL is not found automatically, point CMake at it
            cmake -DOPENCL_LIBRARIES=<opencl-library-path> -DOPENCL_INCLUDE_DIRS=<opencl-headers-path> ..
            
            # To choose the installation directory
            cmake -DCMAKE_INSTALL_PREFIX=<install-path> ..
              
            MIOpenGEMM: Run the tests
            # Build and run the small-geometry tests
            make smallgeometrytests
            ./tests/smallgeometrytests
            
            # Or build and run the tests via the 'check' target
            make check
              
            MIOpenGEMM: Use the library
            #include 
            
            template 
            MIOpenGEMM::GemmStatus xgemm(...)
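The `xgemm(...)` signature above is elided, so the sketch below does not show the MIOpenGEMM API. As an illustration of what a GEMM call computes, here is a plain CPU reference of C = alpha * A * B + beta * C for column-major, non-transposed operands with leading dimensions `lda`, `ldb`, `ldc`. The function name `reference_gemm` and the template parameter name `TFloat` are illustrative assumptions; a routine like this is the kind of check one might use to validate GPU results on small sizes.

```cpp
#include <cstddef>

// Hypothetical CPU reference (not part of the MIOpenGEMM API).
// Computes C = alpha * A * B + beta * C for column-major, non-transposed
// operands: A is m x k (lda >= m), B is k x n (ldb >= k), C is m x n (ldc >= m).
template <typename TFloat>
void reference_gemm(std::size_t m, std::size_t n, std::size_t k, TFloat alpha,
                    const TFloat* a, std::size_t lda,
                    const TFloat* b, std::size_t ldb,
                    TFloat beta, TFloat* c, std::size_t ldc) {
  for (std::size_t j = 0; j < n; ++j) {      // column of C
    for (std::size_t i = 0; i < m; ++i) {    // row of C
      TFloat acc = 0;
      for (std::size_t p = 0; p < k; ++p) {  // dot product over K
        acc += a[i + p * lda] * b[p + j * ldb];
      }
      // BLAS convention: when beta == 0, C is not read (it may be uninitialized).
      c[i + j * ldc] = (beta == TFloat(0)) ? alpha * acc
                                           : alpha * acc + beta * c[i + j * ldc];
    }
  }
}
```

For A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]] stored column-major, calling `reference_gemm<float>(2, 2, 2, 1.0f, a, 2, b, 2, 0.0f, c, 2)` fills `c` with {19, 43, 22, 50}.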
              

            Community Discussions

            QUESTION

            Performance drop in matrix multiplication for certain sizes on AMD Polaris
            Asked 2021-Jul-09 at 20:28

            I have OpenCL code that multiplies two matrices (GEMM) with M=4096, N=4096 and K=16 (i.e., 4096 x 16 float matrices).

            I run it on a Polaris 560 GPU with 16 CUs.

            Code: https://github.com/artyom-beilis/oclblas/blob/master/gemm/gemm.cl

            I noticed very strange performance drops for this size: matrix multiplication at this size achieves only ~8-10 GFLOPS, while if I change N to 4095 or 4097 I get around 130-150 GFLOPS. I noticed similar behaviour with other GEMM libraries like clBLAS or MIOpenGEMM: performance drops significantly for this particular size of 4096x16, and changing N by 1 boosts performance several times.
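To put the reported throughputs in perspective, the usual convention counts a GEMM of size M x N x K as 2 * M * N * K floating-point operations (one multiply and one add per term). The helper names below are illustrative, not from any library.

```cpp
#include <cstdint>

// Conventional FLOP count for an M x N x K GEMM: M*N*K multiply-adds,
// each counted as two floating-point operations.
constexpr std::uint64_t gemm_flops(std::uint64_t m, std::uint64_t n, std::uint64_t k) {
  return 2 * m * n * k;
}

// Kernel time in milliseconds implied by a given throughput in GFLOP/s.
constexpr double implied_time_ms(std::uint64_t flops, double gflops_per_s) {
  return static_cast<double>(flops) / (gflops_per_s * 1e9) * 1e3;
}
```

For 4096 x 4096 x 16 this gives 536,870,912 FLOPs, so ~10 GFLOPS corresponds to roughly 54 ms per call and ~150 GFLOPS to roughly 3.6 ms.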

            The workload is split into work-groups of 256 threads. Each work-group handles 128x16 and 128x16 matrix tiles (an 8x8 block per thread).

            I tried changing the matrix tiling to 96x96 with 6x6 blocks instead of 128x128 with 8x8 blocks: same result.
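The tile sizes follow from the work-group shape. A minimal sketch, assuming the 256 threads are arranged as 16 x 16 and each thread computes a square micro-tile: 16 * 8 = 128 and 16 * 6 = 96, and the number of work-groups along each dimension is the matrix size divided by the tile edge, rounded up. `TileConfig` and `groups_along` are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical tiling description: threads_per_dim x threads_per_dim threads
// per work-group, each computing a block x block micro-tile of C.
struct TileConfig {
  std::size_t threads_per_dim;  // 16 -> 16 x 16 = 256 threads
  std::size_t block;            // per-thread micro-tile edge, e.g. 8 or 6
  constexpr std::size_t tile() const { return threads_per_dim * block; }
};

// Number of work-groups needed along one dimension of size n (rounded up).
constexpr std::size_t groups_along(std::size_t n, const TileConfig& cfg) {
  return (n + cfg.tile() - 1) / cfg.tile();
}
```

With 128 x 128 tiles, a 4096 x 4096 output needs 32 x 32 = 1024 work-groups; with 96 x 96 tiles it needs 43 per dimension, the last one partially filled.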

            I tested the same code with ROCm 3.7 OpenCL, Clover OpenCL and even the Windows OpenCL driver: same behavior.

            There is no such issue with an NVIDIA GTX 960, which has the same number of GPU cores (threads) and the same memory type/size.

            I suspect that this is somehow cache/collision related, but I don't understand how it happens, so I don't know how to work around it.

            ...

            ANSWER

            Answered 2021-Jul-09 at 20:28

            Finally, I found that the clBLAS library (originally developed for AMD) handles the special case of lda % 1024 == 0, ldb % 1024 == 0, probably due to cache effects.

            https://github.com/clMathLibraries/clBLAS/blob/master/src/library/blas/specialCases/GemmSpecialCases.cpp#L228
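The condition itself is simple to check. A minimal sketch with hypothetical helper names: a leading dimension of 4096 floats is a multiple of 1024 elements and gives a 16 KiB stride, so every row or column starts at the same offset within each 4 KiB of memory, while 4095 or 4097 breaks that alignment. This is consistent with the question's observation that changing N by 1 restores performance; one plausible explanation (not confirmed in the thread) is that such strides make many concurrent accesses land on the same cache sets or memory channels.

```cpp
#include <cstddef>

// Hypothetical helper: true when a leading dimension (in elements) is an
// exact multiple of 1024, the case the clBLAS special-case code handles.
constexpr bool ld_is_multiple_of_1024(std::size_t ld) {
  return ld % 1024 == 0;
}

// Stride in bytes between consecutive rows/columns for a given element size.
constexpr std::size_t stride_bytes(std::size_t ld, std::size_t elem_size) {
  return ld * elem_size;
}
```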

            I found that a better way was to rearrange the blocks in Z-curve order instead of queuing several kernels.

            https://github.com/artyom-beilis/oclblas/blob/master/gemm/gemm.cl#L109
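One common way to implement Z-order (Morton) traversal is to de-interleave the bits of the linear work-group id: even bits give the tile's x index and odd bits give its y index. The sketch below is generic C++ (the same bit operations would work in an OpenCL C kernel); it is not taken from gemm.cl, and the names `compact_bits`, `TileCoord` and `z_order_tile` are hypothetical. Consecutive ids then cover small square neighbourhoods of output tiles, so work-groups running at the same time tend to share rows of A and columns of B.

```cpp
#include <cstdint>

// Gather the even-indexed bits of v into the low 16 bits
// (standard Morton-code bit de-interleaving).
constexpr std::uint32_t compact_bits(std::uint32_t v) {
  v &= 0x55555555u;
  v = (v | (v >> 1)) & 0x33333333u;
  v = (v | (v >> 2)) & 0x0F0F0F0Fu;
  v = (v | (v >> 4)) & 0x00FF00FFu;
  v = (v | (v >> 8)) & 0x0000FFFFu;
  return v;
}

struct TileCoord {
  std::uint32_t x;  // tile index along one dimension of C
  std::uint32_t y;  // tile index along the other dimension
};

// Map a linear work-group id to its tile in Z-order.
constexpr TileCoord z_order_tile(std::uint32_t group_id) {
  return {compact_bits(group_id), compact_bits(group_id >> 1)};
}
```

On a 2^k x 2^k grid of tiles, ids 0 to 4^k - 1 map one-to-one onto all tiles; for example ids 0, 1, 2, 3 visit (0,0), (1,0), (0,1), (1,1).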

            To handle cases where M != N or M is not a power of two (M != 1<<n), I simply increased the number of work-groups along M/N to the nearest power of two; groups that have no work exit at the beginning, which does not add much overhead.
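The padding step can be sketched as follows, assuming the grid is padded to a square power-of-two number of tiles per side so that Z-order indices cover it completely. All names are hypothetical and this is not the code from the linked repository.

```cpp
#include <cstdint>

// Smallest power of two >= n, for 1 <= n <= 2^31 (e.g. 32 -> 32, 33 -> 64).
constexpr std::uint32_t next_pow2(std::uint32_t n) {
  std::uint32_t p = 1;
  while (p < n) p <<= 1;
  return p;
}

// Tiles per side of the padded square grid that covers both tile counts.
constexpr std::uint32_t padded_grid_dim(std::uint32_t tiles_m, std::uint32_t tiles_n) {
  return next_pow2(tiles_m > tiles_n ? tiles_m : tiles_n);
}

// A work-group whose tile falls outside the real matrix has nothing to do
// and would return immediately at the start of the kernel.
constexpr bool tile_has_work(std::uint32_t tile_x, std::uint32_t tile_y,
                             std::uint32_t tiles_m, std::uint32_t tiles_n) {
  return tile_x < tiles_m && tile_y < tiles_n;
}
```

For example, 33 tiles per dimension pad to a 64 x 64 grid, and every work-group whose tile index is 33 or more in either dimension exits without doing any work.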

            Z-order improved performance by a factor of 4.

            Source https://stackoverflow.com/questions/68149241

            Community Discussions and Code Snippets contain content sourced from the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install MIOpenGEMM

            The library can be built from the build directory and installed by using the 'install' target.
            All examples can be built at once or individually by name.
            HTML and PDF documentation can also be built. This builds a local searchable web site inside the ./MIOpenGEMM/doc/html folder and a PDF document inside the ./MIOpenGEMM/doc/pdf folder. Documentation is generated using Doxygen, which should be installed separately. HTML and PDFs are generated using Sphinx and Breathe, with the ReadTheDocs theme.

            Clone
          • HTTPS

            https://github.com/ROCmSoftwarePlatform/MIOpenGEMM.git

          • CLI

            gh repo clone ROCmSoftwarePlatform/MIOpenGEMM

          • SSH

            git@github.com:ROCmSoftwarePlatform/MIOpenGEMM.git

            Consider Popular GPU Libraries

            taichi

            by taichi-dev

            gpu.js

            by gpujs

            hashcat

            by hashcat

            cupy

            by cupy

            EASTL

            by electronicarts

            Try Top Libraries by ROCmSoftwarePlatform

            tensorflow-upstream

            by ROCmSoftwarePlatform (C++)

            rocBLAS

            by ROCmSoftwarePlatform (C++)

            Tensile

            by ROCmSoftwarePlatform (Python)

            rccl

            by ROCmSoftwarePlatform (C++)

            composable_kernel

            by ROCmSoftwarePlatform (C++)