fatbin | Compress an executable and its resources

by remeh | Go | Version: Current | License: MIT

kandi X-RAY | fatbin Summary

fatbin is a Go library with no reported bugs or vulnerabilities, a permissive license, and low support activity. You can download it from GitHub.

Instead of shipping a ZIP containing resources (images, sounds, etc.) and an executable, fatbin lets you pack everything into a single executable file. It's my entry for the GopherGala 2016.

            Support

              fatbin has a low-activity ecosystem.
              It has 16 stars, 3 forks, and 7 watchers.
              It had no major release in the last 6 months.
              fatbin has no reported issues and no pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of fatbin is current.

            Quality

              fatbin has no bugs reported.

            Security

              fatbin has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

            License

              fatbin is licensed under the MIT License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              fatbin releases are not available. You will need to build from source code and install.
              Installation instructions are not available. Examples and code snippets are available.

            Top functions reviewed by kandi - BETA

            kandi has reviewed fatbin and identified the functions below as its top functions. This is intended to give you an instant insight into the functionality fatbin implements, and to help you decide if it suits your requirements.
            • readFiles reads the contents of the bufio.Reader into dstDir.
            • BuildFatbin builds a fatbin binary for the given executable.
            • parseDirectory parses the given directory.
            • Parse parses a fatbin file.
            • RunFatbin runs the fatbin.
            • extractData extracts the data part from a file.
            • parseFlags parses the command-line flags.
            • main is the entry point.
            • writeFile writes a file to dst.
            • Extracts data from a file.

            fatbin Key Features

            No Key Features are available at this moment for fatbin.

            fatbin Examples and Code Snippets

            No Code Snippets are available at this moment for fatbin.

            Community Discussions

            QUESTION

            Does the CUDA JIT compiler perform device link-time optimization?
            Asked 2021-May-17 at 09:07

            Before device link-time optimization (DLTO) was introduced in CUDA 11.2, it was relatively easy to ensure forward compatibility without worrying too much about differences in performance. You would typically just create a fatbinary containing PTX for the lowest possible arch and SASS for the specific architectures you would normally target. For any future GPU architectures, the JIT compiler would then assemble the PTX into SASS optimized for that specific GPU arch.

            Now, however, with DLTO, it is less clear to me how to ensure forward compatibility and maintain performance on those future architectures.

            Let’s say I compile/link an application using nvcc with the following options:

            Compile

            ...

            ANSWER

            Answered 2021-May-17 at 09:07

            According to an NVIDIA employee on the CUDA forums, the answer is "not yet":

            Good question. We are working on support for JIT LTO, but in 11.2 it is not supported. So in the example you give at JIT time it will JIT each individual PTX to cubin and then do a cubin link. This is the same as we have always done for JIT linking. But we should have more support for JIT LTO in future releases.
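
            Until JIT LTO lands, the pre-DLTO recipe described in the question remains the safe way to stay forward compatible. A minimal sketch (the kernel and the architecture numbers are placeholders, not taken from the question):

                // saxpy.cu -- placeholder kernel, only here to illustrate the build.
                __global__ void saxpy(int n, float a, const float *x, float *y) {
                    int i = blockIdx.x * blockDim.x + threadIdx.x;
                    if (i < n) y[i] = a * x[i] + y[i];
                }

                // SASS for each concrete target, plus PTX for the lowest arch so
                // any future GPU can JIT it:
                //   nvcc saxpy.cu -o saxpy \
                //     -gencode arch=compute_60,code=sm_60 \
                //     -gencode arch=compute_70,code=sm_70 \
                //     -gencode arch=compute_60,code=compute_60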

            Source https://stackoverflow.com/questions/67466664

            QUESTION

            Struggling with CUDA, Clang and LLVM IR, and getting: CUDA failure: 'Invalid device function'
            Asked 2021-Apr-17 at 16:49

            I am trying to optimize a CUDA code with LLVM passes on a PowerPC system (RHEL 7.6 with no root access) equipped with V100 GPUs, CUDA 10.1, and LLVM 11 (built from source). I also tested clang, lli, and opt on a simple C++ program, and everything works just fine.

            After days of searching, reading, and trial and error, I managed to compile a simple CUDA source file. The code is the famous axpy:

            ...

            ANSWER

            Answered 2021-Apr-17 at 16:29

            The problem was not related to PowerPC architecture. I needed to pass the fatbin file to the host-side compilation command with -Xclang -fcuda-include-gpubinary -Xclang axpy.fatbin to replicate the whole compilation behavior.

            Here is the corrected Makefile:
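
            (The Makefile itself is elided here; see the source link. As a rough sketch of the idea -- using the axpy name from the question, but with assumed clang flags and paths for everything except -fcuda-include-gpubinary:)

                // axpy.cu -- the device binary is produced first, then the host
                // pass is pointed at it explicitly.
                //
                // 1. Device side: lower axpy.cu to axpy.fatbin via clang's
                //    --cuda-device-only path plus ptxas/fatbinary (steps elided).
                // 2. Host side: compile the host code and embed the fatbin:
                //      clang++ -x cuda --cuda-host-only axpy.cu \
                //        -Xclang -fcuda-include-gpubinary -Xclang axpy.fatbin \
                //        -o axpy -lcudart
                __global__ void axpy(float a, float *x, float *y) {
                    int i = blockIdx.x * blockDim.x + threadIdx.x;
                    y[i] = a * x[i] + y[i];
                }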

            Source https://stackoverflow.com/questions/67070926

            QUESTION

            Using cuModuleLoad to get current Module from ELF binary (from argv[0])
            Asked 2020-Nov-14 at 04:42

            Situation: I am trying to use cuModuleLoad to load the current binary's (ELF) embedded cubin (and PTX), but it keeps erroring out with error code 200. My question is: if the cubin is embedded into the final binary, why can't I use cuModuleLoad to dynamically load the binary itself? It works when I compile a separate fatbinary or load a separate PTX module, but of course not when I try to load the final binary (a.out). I have a few reasons for wanting to load the current executable that I will omit so as not to go off topic. I am also looking for a workaround that maintains a single file without using utility tools (or system calls).

            In Linux:

            ...

            ANSWER

            Answered 2020-Nov-14 at 04:42

            Found a solution. In a nutshell:

            1. fopen( argv[0] )
            2. mmap ( file )
            3. Read the ELF headers and find the ".nv_fatbin" section
            4. Parse the ".nv_fatbin", aligning to the byte sequence "50 ed 55 ba 01 00 10 00"
            5. Find the cubin containing the global function you want to retrieve with cuModuleGetFunction
            6. Call cuModuleLoadFatBinary with the base address of .nv_fatbin plus the specific cubin offset
            7. Get the function using cuModuleGetFunction
            8. Finally, call cuLaunchKernel

            See sloppy code below for reference:
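
            (The author's full code is at the source link below; what follows is a condensed sketch of the recipe, not the original. It assumes a 64-bit Linux ELF, the CUDA driver API, and a hypothetical no-argument kernel named "kernel"; error handling is omitted.)

                #include <cuda.h>
                #include <elf.h>
                #include <fcntl.h>
                #include <string.h>
                #include <sys/mman.h>
                #include <sys/stat.h>

                int main(int argc, char **argv) {
                    /* Steps 1-2: map our own binary into memory. */
                    int fd = open(argv[0], O_RDONLY);
                    struct stat st;
                    fstat(fd, &st);
                    char *img = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

                    /* Step 3: walk the section headers to find ".nv_fatbin". */
                    Elf64_Ehdr *eh = (Elf64_Ehdr *)img;
                    Elf64_Shdr *sh = (Elf64_Shdr *)(img + eh->e_shoff);
                    const char *names = img + sh[eh->e_shstrndx].sh_offset;
                    char *fat = NULL;
                    size_t len = 0;
                    for (int i = 0; i < eh->e_shnum; i++) {
                        if (strcmp(names + sh[i].sh_name, ".nv_fatbin") == 0) {
                            fat = img + sh[i].sh_offset;
                            len = sh[i].sh_size;
                        }
                    }

                    /* Steps 4-5: align to the fatbin magic. Real code must pick the
                       entry whose cubin contains the wanted function; this sketch
                       just takes the first one. */
                    const unsigned char magic[8] = {0x50, 0xed, 0x55, 0xba, 0x01, 0x00, 0x10, 0x00};
                    char *entry = NULL;
                    for (size_t off = 0; entry == NULL && off + 8 <= len; off += 8)
                        if (memcmp(fat + off, magic, 8) == 0)
                            entry = fat + off;

                    /* Steps 6-8: load the image, fetch the function, launch it. */
                    CUdevice dev; CUcontext ctx; CUmodule mod; CUfunction fn;
                    cuInit(0);
                    cuDeviceGet(&dev, 0);
                    cuCtxCreate(&ctx, 0, dev);
                    cuModuleLoadFatBinary(&mod, entry);
                    cuModuleGetFunction(&fn, mod, "kernel"); /* hypothetical name */
                    cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, NULL);
                    cuCtxSynchronize();
                    return 0;
                }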

            Source https://stackoverflow.com/questions/64815293

            QUESTION

            CUDA compilation with relocatable code: "Could not find fatbin in ..."
            Asked 2020-Jul-08 at 15:02

            As part of a larger CMake project, I am adding a CUDA library. The rest of the project is C++, compiled with clang.

            To test that the library works correctly, I'm creating a small executable and linking the CUDA library to it:

            ...

            ANSWER

            Answered 2020-Jul-08 at 15:02

            I couldn't reproduce this issue in a fresh, tiny CMake project, so I eventually figured out that some flag from my larger project wasn't playing along.

            It turns out that Thin LTO, which was enabled in CMAKE_CXX_FLAGS, was causing this issue. I disabled it for this particular target with:

            Source https://stackoverflow.com/questions/62797804

            QUESTION

            CMake + CUDA "invalid device function" even with correct SM version
            Asked 2019-Sep-16 at 08:21

            I keep getting an "invalid device function" on my kernel launch. Google turns up a plethora of instances of this; however, all of them seem to relate to a mismatch of the SASS/PTX code embedded in the binary.

            The way I understand how it works is:

            • SASS code can only be interpreted by a GPU with the exact same SM version [2]
            • PTX code is forward-compatible, i.e. any newer GPU will be able to run the code (however, the driver needs to JIT it) [2]
            • I need to specify what I want to target by passing suitable -arch/-gencode options to nvcc: -gencode arch=compute_30,code=sm_30 will create SASS targeting SM 3.0, and -gencode arch=compute_60,code=compute_60 will create PTX code [1]
            • To use CUDA with static and shared libraries, I need to compile for position-independent code and enable separable compilation

            What I did now is:

            • Confirmed that I have SM 6.1 for my Titan Xp [5]
            • Forced nvcc to generate compatible code [3]

              ...

            ANSWER

            Answered 2019-Sep-16 at 08:19

            Ultimately, as expected, this was due to a build system setup problem.

            TLDR version:
            I managed to fix it by changing the library with my CUDA code from STATIC to SHARED.

            To fix it, I first used the automatic architecture detection from the FindCUDA CMake module (which seems to have picked SM 6.1, so I was at least right there).
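
            Independent of the build system, it is worth double-checking at runtime which compute capability the driver actually reports, so it can be compared against the embedded SASS/PTX. A minimal sketch using the standard runtime API:

                // check_sm.cu -- print the compute capability of device 0.
                #include <cstdio>
                #include <cuda_runtime.h>

                int main() {
                    cudaDeviceProp prop;
                    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
                        std::fprintf(stderr, "no usable CUDA device\n");
                        return 1;
                    }
                    std::printf("%s: SM %d.%d\n", prop.name, prop.major, prop.minor);
                    return 0;
                }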

            Source https://stackoverflow.com/questions/57915085

            QUESTION

            Understanding cuobjdump output
            Asked 2019-Sep-09 at 10:13

            I have already read about virtual architectures and code generation for nvcc, but I still have some questions.

            I have a CUDA-compiled executable whose cuobjdump output is:

            ...

            ANSWER

            Answered 2019-Sep-09 at 10:13

            1. What does code version mean? The documentation doesn't say.

            It means the version of the fatbin element being printed -- ELF version 1.7 and PTX version 5.0, respectively (see the PTX ISA documentation for PTX versions).

            2. Would such an executable be compatible on a system with an sm_30 (Kepler) device?

            Yes. The presence of the PTX (version 5.0) means the code can be JIT-compiled by the driver to assembler for a compute capability 3.0 device (again, see the documentation).
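
            To inspect which elements (and versions) a given binary embeds, the toolkit's cuobjdump can be pointed at the executable directly; for example (a.out is a placeholder name):

                cuobjdump a.out          # list the fatbin elements with arch and version
                cuobjdump -ptx a.out     # dump the embedded PTX
                cuobjdump -sass a.out    # dump the embedded SASS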

            Source https://stackoverflow.com/questions/57851566

            QUESTION

            Unresolved extern function error with template default parameter in CUDA9.2 and above
            Asked 2019-Feb-02 at 00:26

            I am working with some C++/CUDA code that makes significant use of templates for both classes and functions. We have mostly been using CUDA 9.0 and 9.1, where everything compiles and runs fine. However, compilation fails on newer versions of CUDA (specifically 9.2 and 10).

            After further investigation, it seems that trying to compile exactly the same code with CUDA version 9.2.88 and above will fail, whereas with CUDA version 8 through 9.1.85 the code compiles and runs correctly.

            A minimal example of the problematic code can be written as follows:

            ...

            ANSWER

            Answered 2019-Feb-02 at 00:26

            This is a bug in CUDA 9.2 and 10.0 and a fix is being worked on. Thanks for pointing it out.

            One possible workaround, as you've already pointed out, would be to revert to CUDA 9.1.

            Another possible workaround is to repeat the offending template instantiation in the body of the function (e.g. in a discarded statement). This has no impact on performance, it just forces the compiler to emit code for that function:
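
            A hypothetical illustration of that workaround (the original minimal example is elided above, and this snippet does not reproduce the bug itself -- it only shows where the extra statement goes; all names are invented):

                template <typename T = float>
                __device__ T twice(T x) { return x + x; }

                template <typename T>
                __global__ void kernel(T *out) {
                    // Workaround: repeat the offending instantiation in a statement
                    // that never executes; the compiler still emits device code for
                    // it, and dead-code elimination removes any runtime cost.
                    if (false) { (void)twice<T>(T(0)); }
                    out[threadIdx.x] = twice(out[threadIdx.x]);
                }

                template __global__ void kernel<float>(float *);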

            Source https://stackoverflow.com/questions/54353255

            QUESTION

            Parallel Compilation of multiple CUDA architectures on same .cu file
            Asked 2018-Jun-30 at 14:40

            I want my compiled CUDA code to work on any Nvidia GPU, so I compile each .cu file with the options:

            ...

            ANSWER

            Answered 2018-Jun-29 at 07:44

            The toolchain doesn't support this, and you shouldn't expect to be able to do by hand what nvcc does, either.

            However, you can certainly script some sort of process to:

            1. Execute parallel compilation of the code to multiple cubin files, one for each target architecture
            2. Perform a device link pass to combine the cubins to a single elf payload
            3. Link the final executable with the resulting object file emitted by the device link phase

            You will probably need to enable separate device code compilation and you might also need to refactor your code slightly as a result. Caveat Emptor and all that.
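
            A rough shape of such a script (the commands are assumptions, not part of the answer, and the combining step in particular varies by toolkit version):

                # 1. In parallel, one cubin per target architecture:
                nvcc -cubin -arch=sm_60 kernel.cu -o kernel_60.cubin &
                nvcc -cubin -arch=sm_70 kernel.cu -o kernel_70.cubin &
                wait
                # 2. Device-link / combine the per-arch cubins into a single payload
                #    (fatbinary or nvlink step; exact invocation is version dependent).
                # 3. Link the final executable against the object emitted by step 2.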

            Source https://stackoverflow.com/questions/51092717

            QUESTION

            Torch installation failure: "No space left on device"
            Asked 2018-Mar-02 at 10:59

            I cannot reinstall the most recent Torch. Cloning a fresh repo and attempting to install via install.sh, which performs a series of make calls, results in:

            ...

            ANSWER

            Answered 2017-May-30 at 19:18

            It depends on what /tmp is.

            Sometimes, as an optimization, /tmp is mounted on a ramdisk. You can check this with mount or by looking at /etc/fstab.

            If this is not the case, then make sure the disk partition where /tmp lives has enough space, or delete other unused temporary files.
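
            Two quick checks along those lines (standard Linux commands; /tmp is assumed to be the build's temp directory):

                mount | grep /tmp    # is /tmp a ramdisk such as tmpfs?
                df -h /tmp           # how much space is left on its filesystem?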

            BleachBit, packaged in many distros, can help you free up space.

            Source https://stackoverflow.com/questions/44270171

            QUESTION

            Initializing cuda global variable
            Asked 2017-Dec-26 at 08:49
                __constant__ const unsigned int *ff = (const unsigned int[]){90, 50, 100};

                int main()
                {
                }
            ...

            ANSWER

            Answered 2017-Dec-26 at 08:49

            The compiler is telling you exactly what the error is. When you do this:
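
            The quoted compiler error is elided here, but the underlying problem is that __constant__ (like __device__ and __shared__) variables do not support dynamic initialization, and initializing a pointer from a compound literal is exactly that. Assuming a plain constant array is what was wanted, a sketch of a working alternative is to declare the __constant__ array directly:

                // Declare the __constant__ memory as an array with a static
                // initializer instead of a pointer to a compound literal:
                __constant__ unsigned int ff[] = {90, 50, 100};

                int main()
                {
                }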

            Source https://stackoverflow.com/questions/47945178

            Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install fatbin

            You can download it from GitHub.

            Support

            For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.
            CLONE
          • HTTPS

            https://github.com/remeh/fatbin.git

          • CLI

            gh repo clone remeh/fatbin

          • SSH

            git@github.com:remeh/fatbin.git
