face-alignment | 7000FPS face alignment | Graphics library

by memoiry | C++ | Version: Current | License: No License

kandi X-RAY | face-alignment Summary

face-alignment is a C++ library typically used in User Interface, Graphics, and PyTorch applications. It has no reported bugs or vulnerabilities, and it has low support. You can download it from GitHub.

7000+FPS face alignment

Support

face-alignment has a low active ecosystem.

It has 22 stars and 13 forks. There are 4 watchers for this library.

It had no major release in the last 6 months.

There is 1 open issue, and none have been closed. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of face-alignment is current.

Quality

face-alignment has no bugs reported.

Security

face-alignment has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

face-alignment does not have a standard license declared.

Check the repository for any license declaration and review the terms closely.

Without a license, all rights are reserved, and you cannot use the library in your applications.

Reuse

face-alignment releases are not available. You will need to build from source code and install.

Installation instructions are not available. Examples and code snippets are available.

face-alignment Key Features

No Key Features are available at this moment for face-alignment.

face-alignment Examples and Code Snippets

No Code Snippets are available at this moment for face-alignment.

Community Discussions

Trending Discussions on face-alignment

Why does Torch use ~700mb of GPU memory when predicting with a 1.5mb network

QUESTION

Asked 2019-Apr-13 at 14:19

I am very new to Torch/CUDA, and I'm trying to test the small binary network (~1.5 MB) from https://github.com/1adrianb/binary-face-alignment, but I keep running into 'out of memory' issues.

I am using a relatively weak GPU (NVIDIA Quadro K600) with ~900 MB of graphics memory on Ubuntu 16.04 with CUDA 10.0 and cuDNN 5.1. I don't really care about performance, but I thought I would at least be able to run a small network for prediction, one image at a time (especially one that is supposedly aimed at those "with Limited Resources").

I managed to run the code in headless mode and measured the memory consumption at around 700 MB, which would explain why it fails immediately when I have an X server running, which takes around 250 MB of GPU memory.

I also added some logs to see how far along main.lua I get, and it's the call output:copy(model:forward(img)) on the very first image that runs out of memory.

For reference, here's the main.lua code up until the crash:

...

ANSWER

Answered 2019-Apr-11 at 20:18

What usually consumes most of the memory are the activation maps (and gradients, when training). I am not familiar with this particular model and implementation, but I would say that you are using a "fake" binary network; by fake I mean they still use floating-point numbers to represent the binary values since most users are going to use their code on GPUs that do not fully support real binary operations. The authors even write in Section 5:

Performance. In theory, by replacing all floating-point multiplications with bitwise XOR and making use of the SWAR (Single instruction, multiple data within a register) [5], [6], the number of operations can be reduced up to 32x when compared against the multiplication-based convolution. However, in our tests, we observed speedups of up to 3.5x, when compared against cuBLAS, for matrix multiplications, a result being in accordance with those reported in [6]. We note that we did not conduct experiments on CPUs. However, given the fact that we used the same method for binarization as in [5], similar improvements in terms of speed, of the order of 58x, are to be expected: as the real-valued network takes 0.67 seconds to do a forward pass on a i7-3820 using a single core, a speedup close to x58 will allow the system to run in real-time. In terms of memory compression, by removing the biases, which have minimum impact (or no impact at all) on performance, and by grouping and storing every 32 weights in one variable, we can achieve a compression rate of 39x when compared against the single precision counterpart of Torch.
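To make the quoted passage concrete, here is a minimal Python sketch of the general idea of packing 32 binary weights into one word and computing a dot product with XOR and a bit count. It is an illustration under stated assumptions, not the authors' implementation: the helper names `pack_signs` and `binary_dot` are hypothetical, and the encoding (bit 1 for +1, bit 0 for -1) is a choice made only for this example.

```python
import random

WORD_BITS = 32  # values packed per word, mirroring "every 32 weights in one variable"


def pack_signs(values):
    """Pack a list of +1/-1 values into 32-bit words (bit 1 = +1, bit 0 = -1).

    Hypothetical helper for illustration only.
    """
    words = []
    for start in range(0, len(values), WORD_BITS):
        word = 0
        for offset, v in enumerate(values[start:start + WORD_BITS]):
            if v > 0:
                word |= 1 << offset
        words.append(word)
    return words


def binary_dot(a_words, b_words, length):
    """Dot product of two packed +1/-1 vectors using XOR and a bit count.

    Positions where the bits differ contribute -1, matching positions +1.
    Padding bits in the last word are 0 in both operands, so they never
    count as mismatches.
    """
    mismatches = 0
    for a, b in zip(a_words, b_words):
        mismatches += bin(a ^ b).count("1")
    return length - 2 * mismatches


# Compare against the ordinary multiply-and-add dot product.
random.seed(0)
n = 100
a = [random.choice((-1, 1)) for _ in range(n)]
b = [random.choice((-1, 1)) for _ in range(n)]
assert binary_dot(pack_signs(a), pack_signs(b), n) == sum(x * y for x, y in zip(a, b))

# Storage for the weights alone: 4 bytes per float32 vs. 4 bytes per 32 packed weights.
float_bytes = 4 * n
packed_bytes = 4 * len(pack_signs(a))
print(float_bytes, packed_bytes)
```

Packing alone gives at most a 32x reduction in weight storage; the 39x figure in the quote also accounts for removing the biases. The saving applies only when the weights are actually stored in packed form, which is the point the answer makes about "fake" binary networks.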

In this context, a small model (w.r.t. number of parameters or model size in MiB) does not necessarily mean low memory footprint. It is likely that all this memory is being used to store the activation maps in single- or double-precision.
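As a rough illustration of why parameter size and runtime memory can diverge, the following Python sketch compares the bytes needed for convolution weights with the bytes needed for the output activation maps. The layer shapes are hypothetical and are not taken from the binary-face-alignment model; the function name `conv_layer_memory` is also an assumption made for this example. It assumes a batch size of 1, single-precision values, and that every layer's output stays allocated.

```python
def conv_layer_memory(in_ch, out_ch, kernel, out_h, out_w, bytes_per_value=4):
    """Return (parameter_bytes, activation_bytes) for one conv layer, batch size 1.

    Biases are ignored; activations are the layer's output feature maps.
    """
    param_bytes = in_ch * out_ch * kernel * kernel * bytes_per_value
    activation_bytes = out_ch * out_h * out_w * bytes_per_value
    return param_bytes, activation_bytes


# Hypothetical layer shapes: (in_ch, out_ch, kernel, out_h, out_w).
layers = [
    (3, 64, 7, 128, 128),
    (64, 128, 3, 64, 64),
    (128, 256, 3, 64, 64),
]

total_params = 0
total_activations = 0
for shape in layers:
    p, a = conv_layer_memory(*shape)
    total_params += p
    total_activations += a

MIB = 1024 * 1024
print(f"parameters:  {total_params / MIB:.2f} MiB")
print(f"activations: {total_activations / MIB:.2f} MiB")
```

Even in this tiny made-up stack, the activation maps need several times more memory than the weights, and that gap grows with input resolution and network depth. Compressing the weights does nothing to shrink activations that are still stored as floating-point values.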

Source https://stackoverflow.com/questions/55636577

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

Vulnerabilities

No vulnerabilities reported

Install face-alignment

You can download it from GitHub.

Support

For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, search for them or ask them on the Stack Overflow community page.