fatbin | Compress executable and its resources
kandi X-RAY | fatbin Summary
Instead of shipping a ZIP containing resources (images, sounds, etc.) and an executable, fatbin lets you compress everything into a single executable file. It's my entry to the GopherGala 2016.
Top functions reviewed by kandi - BETA
- readFiles reads the contents of the bufio.Reader into dstDir.
- BuildFatbin builds a fatbin binary for the given executable.
- parseDirectory parses the given directory.
- Parse a fatbin file.
- RunFatbin runs the fatbin.
- extractData extracts the data part from a file.
- parseFlags parses the command-line flags.
- main is the main function.
- writeFile writes a file to dst.
- Extracts data from a file.
Community Discussions
Trending Discussions on fatbin
QUESTION
Before device link-time optimization (DLTO) was introduced in CUDA 11.2, it was relatively easy to ensure forward compatibility without worrying too much about differences in performance. You would typically just create a fatbinary containing PTX for the lowest possible arch and SASS for the specific architectures you would normally target. For any future GPU architectures, the JIT compiler would then assemble the PTX into SASS optimized for that specific GPU arch.
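A build along those lines might look like this (the architectures are illustrative, not from the original question):

# SASS for the concrete targets, plus PTX for the lowest arch so that
# future GPUs can JIT-compile a compatible version at load time.
nvcc -gencode arch=compute_52,code=sm_52 \
     -gencode arch=compute_70,code=sm_70 \
     -gencode arch=compute_52,code=compute_52 \
     -o app app.cu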
Now, however, with DLTO, it is less clear to me how to ensure forward compatibility and maintain performance on those future architectures.
Let’s say I compile/link an application using nvcc with the following options:
Compile
...ANSWER
Answered 2021-May-17 at 09:07

According to an NVIDIA employee on the CUDA forums, the answer is "not yet":
Good question. We are working on support for JIT LTO, but in 11.2 it is not supported. So in the example you give at JIT time it will JIT each individual PTX to cubin and then do a cubin link. This is the same as we have always done for JIT linking. But we should have more support for JIT LTO in future releases.
QUESTION
I am trying to optimize a CUDA code with LLVM passes on a PowerPC system (RHEL 7.6 with no root access) equipped with V100 GPUs, CUDA 10.1, and LLVM 11 (built from source). Also, I tested clang, lli, and opt on a simple C++ code, and everything works just fine.
After days of searching, reading, and trials-and-errors, I managed to compile a simple CUDA source. The code is the famous axpy:
...ANSWER
Answered 2021-Apr-17 at 16:29

The problem was not related to the PowerPC architecture. I needed to pass the fatbin file to the host-side compilation command with -Xclang -fcuda-include-gpubinary -Xclang axpy.fatbin to replicate the whole compilation behavior.
Here is the corrected Makefile:
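A minimal sketch of such a Makefile, assuming the axpy example, a V100 (sm_70), and CUDA installed at /usr/local/cuda-10.1 (recipe lines must be tab-indented):

CUDA_PATH ?= /usr/local/cuda-10.1
GPU_ARCH  ?= sm_70

# Link the host object against the CUDA runtime.
axpy: axpy.o
	clang++ -o $@ $< -L$(CUDA_PATH)/lib64 -lcudart

# Host-side compilation, embedding the device fatbin produced below.
axpy.o: axpy.cu axpy.fatbin
	clang++ -x cuda --cuda-host-only --cuda-path=$(CUDA_PATH) \
	  -Xclang -fcuda-include-gpubinary -Xclang axpy.fatbin \
	  -c axpy.cu -o $@

# Device-side compilation to a fatbin (the stage where LLVM passes run).
axpy.fatbin: axpy.cu
	clang++ -x cuda --cuda-device-only --cuda-path=$(CUDA_PATH) \
	  --cuda-gpu-arch=$(GPU_ARCH) -O2 axpy.cu -o $@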
QUESTION
Situation: I am trying to use cuModuleLoad to load the current binary's (ELF) embedded cubin (and PTX), but it keeps erroring out with error code 200. My question is: if the cubin is embedded in the final binary, why can't I use cuModuleLoad to dynamically load one's own executable? It works when I compile a separate fatbinary, but not when I load a separate PTX module, and of course not when I try to load the final binary (a.out). I have a few reasons for wanting to load the current executable, which I will forgo so as not to go off topic. I am also looking for a workaround that maintains a single file without using utility tools (or system calls).
In Linux:
...ANSWER
Answered 2020-Nov-14 at 04:42

Found a solution. In a nutshell:
- fopen( argv[0] )
- mmap ( file )
- Read the ELF headers and find the ".nv_fatbin" section
- Scan the ".nv_fatbin" section for the fatbin header byte sequence "50 ed 55 ba 01 00 10 00"
- Find the cubin containing the __global__ function you want to retrieve with cuModuleGetFunction
- Call cuModuleLoadFatBinary with the base address of the .nv_fatbin section plus the specific cubin's offset.
- Get the function using cuModuleGetFunction
- Finally call cuLaunchKernel
See sloppy code below for reference:
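A minimal C sketch of those steps, assuming a 64-bit little-endian ELF and a __global__ function registered under the (mangled) name "mykernel":

#include <string.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <elf.h>
#include <cuda.h>

/* Fatbin header magic: 50 ed 55 ba 01 00 10 00 */
static const unsigned char FATBIN_MAGIC[8] =
    {0x50, 0xed, 0x55, 0xba, 0x01, 0x00, 0x10, 0x00};

int main(int argc, char **argv)
{
    /* Steps 1-2: map our own executable into memory. */
    int fd = open(argv[0], O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    unsigned char *base = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    /* Step 3: walk the section headers looking for ".nv_fatbin". */
    Elf64_Ehdr *eh = (Elf64_Ehdr *)base;
    Elf64_Shdr *sh = (Elf64_Shdr *)(base + eh->e_shoff);
    const char *names = (const char *)(base + sh[eh->e_shstrndx].sh_offset);
    unsigned char *fatbin = NULL;
    size_t size = 0;
    for (int i = 0; i < eh->e_shnum; i++) {
        if (strcmp(names + sh[i].sh_name, ".nv_fatbin") == 0) {
            fatbin = base + sh[i].sh_offset;
            size = sh[i].sh_size;
            break;
        }
    }

    cuInit(0);
    CUdevice dev;
    CUcontext ctx;
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    /* Steps 4-6: scan for fatbin headers; try each until the module
     * containing the wanted __global__ function loads. */
    CUmodule mod;
    CUfunction fn = NULL;
    for (size_t off = 0; fatbin && off + 8 <= size; off++) {
        if (memcmp(fatbin + off, FATBIN_MAGIC, 8) != 0)
            continue;
        if (cuModuleLoadFatBinary(&mod, fatbin + off) != CUDA_SUCCESS)
            continue;
        if (cuModuleGetFunction(&fn, mod, "mykernel") == CUDA_SUCCESS)
            break;
    }

    /* Step 7: launch the kernel (1 block, 1 thread, no arguments). */
    if (fn)
        cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, NULL);
    cuCtxSynchronize();
    return 0;
}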
QUESTION
As part of a larger CMake project, I am adding a CUDA library. The rest of the project is C++, compiled with clang.
To test that the library works correctly, I'm creating a small executable and linking the CUDA library to it:
...ANSWER
Answered 2020-Jul-08 at 15:02

I couldn't reproduce this issue in a fresh, tiny CMake project, so I eventually figured out that some flag from my larger project wasn't playing along. It turns out that ThinLTO, which was enabled in CMAKE_CXX_FLAGS, was causing the issue.

I disabled it for this particular target with:
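Something along these lines, assuming a hypothetical target name test_cuda (target_link_options needs CMake 3.13+):

# Override the -flto=thin inherited from CMAKE_CXX_FLAGS for this target only.
target_compile_options(test_cuda PRIVATE -fno-lto)
target_link_options(test_cuda PRIVATE -fno-lto)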
QUESTION
I keep getting an "invalid device function" error on my kernel launch. Google turns up a plethora of instances of this; however, all of them seem to be related to a mismatch between the SASS/PTX code embedded in the binary and the target GPU.
The way I understand how it works is:
- SASS code can only be interpreted by a GPU with the exact same SM version.
- PTX code is forward-compatible, i.e. any newer GPU will be able to run the code (however, the driver needs to JIT it first).
- I need to specify what I want to target by passing suitable -gencode options to nvcc: -gencode arch=compute_30,code=sm_30 will create SASS targeting SM 3.0, and -gencode arch=compute_60,code=compute_60 will create PTX code.
- To use CUDA with static and shared libraries, I need to compile for position-independent code and enable separable compilation.
What I did now is:
ANSWER
Answered 2019-Sep-16 at 08:19

Ultimately, as expected, this was due to a build-system setup problem.

TL;DR version: I managed to fix it by changing the library containing my CUDA code from STATIC to SHARED.

To fix it, I first used the automatic architecture detection from the FindCUDA CMake module (which seems to have produced SM 6.1, so I was at least right there).
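A sketch of that setup with hypothetical names, using the FindCUDA module's architecture detection and a SHARED library:

find_package(CUDA REQUIRED)
# Detect the GPUs present and emit matching -gencode flags.
cuda_select_nvcc_arch_flags(ARCH_FLAGS Auto)
list(APPEND CUDA_NVCC_FLAGS ${ARCH_FLAGS})
# Building the CUDA code as SHARED rather than STATIC was the actual fix.
cuda_add_library(gpu_kernels SHARED kernels.cu)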
QUESTION
I already read about virtual architecture and code generation for nvcc but I still have some questions.
I have a CUDA-compiled executable whose cuobjdump output is
ANSWER
Answered 2019-Sep-09 at 10:13

- What does code version mean? Documentation doesn't say that.

It means the version of the fatbin element being printed -- ELF version 1.7 and PTX version 5.0, respectively (see the PTX ISA documentation for the PTX version history).

- Would such an executable be compatible on a system with a sm_30 (Kepler) device?

Yes. The presence of the PTX (version 5.0) means the code can be JIT-compiled by the driver to assembler to run on a compute capability 3.0 device.
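For reference, the embedded SASS and PTX images of a binary can be listed like this (binary name hypothetical):

cuobjdump --list-elf ./app   # one line per embedded SASS (ELF) image
cuobjdump --list-ptx ./app   # one line per embedded PTX image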
QUESTION
I am working with some C++/CUDA code that makes significant use of templates for both classes and functions. We have mostly been using CUDA 9.0 and 9.1, where everything compiles and runs fine. However, compilation fails on newer versions of CUDA (specifically 9.2 and 10).
After further investigation, it seems that trying to compile exactly the same code with CUDA version 9.2.88 and above will fail, whereas with CUDA version 8 through 9.1.85 the code compiles and runs correctly.
A minimal example of the problematic code can be written as follows:
...ANSWER
Answered 2019-Feb-02 at 00:26

This is a bug in CUDA 9.2 and 10.0, and a fix is being worked on. Thanks for pointing it out.

One possible workaround, as you've already pointed out, would be to revert to CUDA 9.1.

Another possible workaround is to repeat the offending template instantiation in the body of the function (e.g. in a discarded statement). This has no impact on performance; it just forces the compiler to emit code for that function:
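The answer's snippet was truncated; a sketch of the pattern, with placeholder names:

// Placeholder device template standing in for the offending instantiation.
template <typename T>
__device__ T twice(T x) { return x + x; }

template <typename T>
__global__ void kernel(T *out, const T *in)
{
    // Discarded statement: never executed, but it repeats the template
    // instantiation and forces the compiler to emit code for twice<T>.
    if (false) { (void)twice<T>(T{}); }
    out[threadIdx.x] = twice<T>(in[threadIdx.x]);
}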
QUESTION
I want my compiled CUDA code to work on any Nvidia GPU, so I compile each .cu file with the options:
...ANSWER
Answered 2018-Jun-29 at 07:44

The toolchain doesn't support this, and you shouldn't expect to be able to do it by hand any more than nvcc does.

However, you can certainly script some sort of process to:
- Execute parallel compilation of the code to multiple cubin files, one for each target architecture
- Perform a device link pass to combine the cubins into a single ELF payload
- Link the final executable with the resulting object file emitted by the device link phase
You will probably need to enable separate device code compilation and you might also need to refactor your code slightly as a result. Caveat Emptor and all that.
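A rough sketch of such a script (architectures and file names are illustrative):

# 1. Compile the device code to one cubin per target arch, in parallel.
nvcc -rdc=true -arch=sm_52 -cubin kernel.cu -o kernel_sm52.cubin &
nvcc -rdc=true -arch=sm_70 -cubin kernel.cu -o kernel_sm70.cubin &
wait
# 2. Device-link the cubins into a single object.
nvcc -dlink kernel_sm52.cubin kernel_sm70.cubin -o device_link.o
# 3. Link the final executable (host objects assumed built separately).
g++ main.o device_link.o -L/usr/local/cuda/lib64 -lcudart -lcudadevrt -o app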
QUESTION
Cannot reinstall the most recent Torch. Cloning a fresh repo and attempting to install via install.sh, which performs a series of make calls, results in:
ANSWER
Answered 2017-May-30 at 19:18

It depends on what tmp is.

Sometimes, as an optimization, tmp is mounted in a ramdisk. You can take a look at that using mount or in /etc/fstab.

If this is not the case, then make sure the disk partition where /tmp is has enough space, or delete other unused temporary files.
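The two checks look like this on most systems:

mount | grep ' /tmp '   # is /tmp a tmpfs ramdisk?
df -h /tmp              # how much free space does it have?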
BleachBit, packaged in many distros, can help you free up space.
QUESTION
// Fails to compile: this tries to initialize a __constant__ device
// pointer with the address of a host-side compound literal, which is a
// dynamic initialization that CUDA does not support.
__constant__ const unsigned int *ff = (const unsigned int[]){90, 50, 100};

int main()
{
}
...ANSWER
Answered 2017-Dec-26 at 08:49

The compiler is telling you exactly what the error is. When you do this:
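A sketch of the usual fix, assuming the goal is simply constant data on the device: declare ff as a __constant__ array rather than a pointer to a host-side compound literal (the kernel is a hypothetical illustration):

// The data itself lives in constant memory; no dynamic initialization.
__constant__ unsigned int ff[3] = {90, 50, 100};

__global__ void kernel(unsigned int *out)
{
    out[threadIdx.x] = ff[threadIdx.x % 3];
}

int main()
{
}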
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported