cnmem | simple memory manager for CUDA | GPU library
kandi X-RAY | cnmem Summary
Simple library to help the Deep Learning frameworks manage CUDA memory. CNMeM is not intended to be a general purpose memory management library. It was designed as a simple tool for applications which work on a limited number of large memory buffers. CNMeM is mostly developed on Ubuntu Linux. It should support other operating systems as well. If you encounter an issue with the library on other operating systems, please submit a bug (or a fix).
Community Discussions
Trending Discussions on cnmem
QUESTION
I am trying to run the CIFAR-10 CNN code on my machine's GPU, but I am facing the following issue:
Dimension (-1) must be in the range [0, 2), where 2 is the number of dimensions in the input. for 'metrics/acc/ArgMax' (op: 'ArgMax') with input shapes: [?,?], [].
Here is my code:
ANSWER
Answered 2017-Oct-15 at 16:50

My issue was solved after I reinstalled Anaconda, TensorFlow, and Keras.
QUESTION
Hi, I am new to Python and I need some help. I am trying to run a file on Windows 10 with Python 2.7.
...ANSWER
Answered 2017-Oct-07 at 13:30

On Windows, paths are written with a backslash \ instead of the forward slash / used on Linux/Unix. Try it like below if the file is one folder back:
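A minimal sketch of what that looks like in Python (the file name data.txt and the folder layout are hypothetical; the answer's original snippet is not preserved above):

    import os

    # Backslashes must be escaped in ordinary string literals...
    path = "..\\data\\data.txt"

    # ...or use a raw string, which leaves backslashes untouched...
    path = r"..\data\data.txt"

    # ...or build the path portably, which works on Windows and Linux alike.
    path = os.path.join("..", "data", "data.txt")

    with open(path) as f:
        print(f.read())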
QUESTION
I've compared processing times with Theano (CPU), Theano (GPU), and scikit-learn (CPU) using Python, but I got a strange result. Look at the graph that I plotted.

Processing Time Comparison:

You can see that scikit-learn is faster than Theano (GPU). The program whose elapsed time I measured computes a Euclidean distance matrix from a matrix with n * 40 elements.

Here is the relevant part of the code.
...ANSWER
Answered 2017-Sep-07 at 02:07

My initial candidates for why the CPU side wins would be a mix of:

- highly efficient use of the available CPU cores' L1/L2 cache sizes, which sit within the fastest [ns]-scale access distances
- smart numpy vectorised execution that is friendly to CPU cache lines; the dataset is so small that it can remain entirely non-evicted from cache (to see the DDRx-memory cost effects on the observed performance, test by scaling the dataset under review well above the L2/L3 cache sizes; details are in the URL below)
- numpy might enjoy even better timing if the .astype() conversions are avoided (test it); see the sketch after this list
- auto-generated GPU kernels do not have much chance to reach the ultimate levels of global-memory latency masking, compared to manually tweaked kernel designs tailor-fit to the respective GPU silicon architecture and the latencies observed in vivo
- data structures larger than just a few KB keep paying GPU-SM/GDDR-MEM access distances of roughly large hundreds of [ns], nearly [us], versus the small tens of [ns] for CPU L1/L2/L3/DDRx; see the timing details in https://stackoverflow.com/a/33065382
- the task cannot enjoy much of the GPU/SMX power, due to its obviously low reuse of data points and a dataset size beyond the GPU/SM silicon limits, which causes (and must cause) GPU/SM register-capacity spillovers in any kind of GPU-kernel design attempt or tweaking
- the global task does not have a minimum reasonable amount of asynchronous, isolated (non-communicating islands), mathematically dense yet SMX-local GPU-kernel processing steps (there is not much to compute, so the add-on overheads and expensive SMX/GDDR memory costs cannot be amortised)

GPUs can exhibit their best performance when sufficiently densely convoluted re-processing operations take place, as in large-scale/high-resolution image processing: [m,n,o] convolution-kernel matrices so small that all m*n*o constant values can reside locally inside an available set of SMX SM-registers, with the GPU-kernel launches optimally tweaked by the 3D tblock/grid processing-layout geometries, so that global-memory access latencies are masked as well as possible, keeping all GPU threads within the hardware warp-aligned SMx WarpScheduler round-robin thread-scheduling capabilities (the first swap from round-robin into greedy warp-schedule mode loses the whole battle in the case of divergent execution paths in the GPU-kernel code).
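As referenced in the list above, a minimal sketch of a vectorised numpy Euclidean-distance computation that keeps one dtype throughout, with no intermediate .astype() conversions (the matrix name X and the row count n are hypothetical; the asker's actual code is not reproduced here):

    import numpy as np

    n = 1000
    X = np.random.rand(n, 40)            # hypothetical n x 40 input, float64 throughout

    # Pairwise Euclidean distances via ||a-b||^2 = ||a||^2 + ||b||^2 - 2*a.b,
    # computed with a single matrix product so the work stays vectorised
    # and cache-friendly.
    sq = np.einsum('ij,ij->i', X, X)     # row-wise squared norms
    d2 = sq[:, None] + sq[None, :] - 2.0 * X.dot(X.T)
    np.maximum(d2, 0.0, out=d2)          # clamp tiny negatives from rounding
    D = np.sqrt(d2)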
QUESTION
So I finally managed to get Theano up and running on the GPU using this guide (the test code runs fine, telling me it used the GPU, yay!). I then wanted to try it out and followed this guide for training a CNN on digit recognition.

The problem is: I get errors from the way Lasagne calls Theano (I guess there is a version mismatch here):
...ANSWER
Answered 2017-Apr-25 at 13:39

Try to reinstall Theano and Lasagne like this:
QUESTION
Problem
I have always used Theano normally, with CUDA, cuDNN, and CNMeM. I have an XTITAN. Actually I ran my code on the university server.

I'm trying to install libgpuarray, but tests #10 and #11 fail.

What should I do?
Extra-Information
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
nvidia-smi
...ANSWER
Answered 2017-Mar-18 at 00:35

According to the libgpuarray GitHub issues:
The last two tests require nccl and will fail if it's not present. If you're not trying to use nccl, you can ignore those failures.
—
https://github.com/Theano/libgpuarray/issues/383#issuecomment-287491789
QUESTION
I have a working installation of Keras & Theano on Windows (set up by following this tutorial). Now I've tried to switch the backend to TensorFlow, which worked quite well.

The only issue I have is that TensorFlow does not detect my GPU, which Theano, in contrast, does:
...ANSWER
Answered 2017-Feb-26 at 22:08

Installing both tensorflow and tensorflow-gpu on the same machine might cause issues at the moment. Install either tensorflow (for CPU only) or tensorflow-gpu (for GPU only) for version 1.0.
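A quick way to check which devices TensorFlow actually sees (a minimal sketch against the TF 1.x API; the exact device names printed depend on your installation):

    from tensorflow.python.client import device_lib

    # A working GPU install should list a '/gpu:0' (or '/device:GPU:0')
    # entry in addition to the CPU.
    for device in device_lib.list_local_devices():
        print(device.name, device.device_type)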
QUESTION
The CNMeM library is a "simple library to help the Deep Learning frameworks manage CUDA memory."
CNMeM has been reported to give some interesting speed improvements, and is supported by Theano, Torch, and Caffe. However, TensorFlow preallocates GPU memory when starting a session, unlike Theano, Torch, and Caffe.
Does using CNMeM when running a TensorFlow-based program help (e.g., reduce the running time)?
...ANSWER
Answered 2017-Feb-22 at 19:26

No. TensorFlow has its own GPU memory management. Indeed, by default it takes the whole GPU memory upfront, regardless of the size of your problem.
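If the upfront grab is the concern, TensorFlow's own session options can rein it in without any external allocator (a minimal sketch using the TF 1.x API; the 0.4 fraction is an arbitrary example):

    import tensorflow as tf

    config = tf.ConfigProto()
    # Let the GPU allocation grow on demand instead of grabbing it all upfront...
    config.gpu_options.allow_growth = True
    # ...or, alternatively, cap it at a fixed fraction of the GPU memory:
    # config.gpu_options.per_process_gpu_memory_fraction = 0.4

    sess = tf.Session(config=config)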
QUESTION
autoencoder_layers.py GitHub code
...ANSWER
Answered 2017-Feb-21 at 04:40

Comment out the line from keras.backend.theano_backend import _on_gpu and define _on_gpu yourself as:
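The answer's snippet is not preserved above; a minimal sketch of a replacement matching what older Keras Theano backends checked (assuming a Theano version that exposes both config.device and the gpuarray config.contexts attribute):

    import theano

    def _on_gpu():
        # True when Theano is configured for a GPU device, either via the
        # old 'gpu*' device flag or the newer gpuarray backend contexts.
        return theano.config.device[:3] == 'gpu' or theano.config.contexts != ''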
QUESTION
I've used the 'njobs' parameter to get multi-sample results, and it's far from what I expected.

I've changed the '.theanorc' file to set the 'floatX' and 'cnmem' values, etc.

I've monitored GPU usage with the command 'nvidia-smi', and the GPU is well utilised.

But the sampling speed is still slow, even slower than on the CPU.
Is that normal?
ANSWER
Answered 2017-Feb-02 at 10:52

This sounds like a problem of convergence or model construction, not related to njobs or parallelism. Without the model or traces there is not a lot that can be said here.

GPU support is still experimental and we've seen speed-ups for some models and slow-downs for others. ADVI seems to be easier to run on the GPU, though. You can also check that all your model types and input data are float32.
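A quick way to follow that last float32 suggestion (a minimal sketch; the array name data and its shape are hypothetical):

    import numpy as np
    import theano

    # The GPU path wants float32 end to end; confirm the Theano default...
    print(theano.config.floatX)        # should print 'float32'

    # ...and cast any input arrays explicitly before building the model.
    data = np.random.rand(1000, 10)    # hypothetical input
    data = data.astype(np.float32)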
Community discussions and code snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported