Explore all GPU open source software, libraries, packages, source code, cloud functions and APIs.

Popular New Releases in GPU

taichi

v1.0.0

gpu.js

Maintenance release

hashcat

hashcat v6.2.4

cupy

v10.3.1

EASTL

3.17.06

Popular Libraries in GPU

taichi

by taichi-dev · C++

18670 stars · Apache-2.0

Productive & portable high-performance programming in Python.

gpu.js

by gpujs · JavaScript

13614 stars · MIT

GPU Accelerated JavaScript

hashcat

by hashcat · C

10426 stars

World's fastest and most advanced password recovery utility

cupy

by cupy · Python

5918 stars · MIT

NumPy & SciPy for GPU

EASTL

by electronicarts · C++

5812 stars · BSD-3-Clause

EASTL stands for Electronic Arts Standard Template Library. It is an extensive and robust implementation that has an emphasis on high performance.

ethminer

by ethereum-mining · C++

5395 stars · GPL-3.0

Ethereum miner with OpenCL, CUDA and stratum support

polars

by pola-rs · Rust

5341 stars · MIT

Fast multi-threaded DataFrame library in Rust | Python | Node.js

gfx

by gfx-rs · Rust

4976 stars · NOASSERTION

[maintenance mode] A low-overhead Vulkan-like GPU API for Rust.

Halide

by halide · C++

4890 stars · NOASSERTION

a language for fast, portable data-parallel computation

Trending New libraries in GPU

polars

by pola-rs · Rust

5341 stars · MIT

Fast multi-threaded DataFrame library in Rust | Python | Node.js

rust-gpu

by EmbarkStudios · Rust

4419 stars · NOASSERTION

🐉 Making Rust a first-class language and ecosystem for GPU shaders 🚧

vgpu_unlock

by DualCoder · C

2587 stars · MIT

Unlock vGPU functionality for consumer grade GPUs.

libcudacxx

by NVIDIA · C++

1953 stars · NOASSERTION

The C++ Standard Library for your entire system.

tiny-cuda-nn

by NVlabs · C++

1093 stars · NOASSERTION

Lightning fast C++/CUDA neural network framework

ZLUDA

by vosen · C++

977 stars · NOASSERTION

CUDA on Intel GPUs

nvitop

by XuehaiPan · Python

784 stars · GPL-3.0

An interactive NVIDIA-GPU process viewer, the one-stop solution for GPU process management.

VkFFT

by DTolm · C++

745 stars · MIT

Vulkan/CUDA/HIP/OpenCL Fast Fourier Transform library

gpu

by AsahiLinux · C

662 stars

Dissecting the M1's GPU for 3D acceleration

Top Authors in GPU

1. NVIDIA · 31 Libraries · 14035 stars
2. intel · 14 Libraries · 2886 stars
3. ROCmSoftwarePlatform · 11 Libraries · 621 stars
4. komrad36 · 10 Libraries · 110 stars
5. LLNL · 10 Libraries · 555 stars
6. rapidsai · 9 Libraries · 6610 stars
7. Azure · 9 Libraries · 80 stars
8. arrayfire · 8 Libraries · 5138 stars
9. ROCm-Developer-Tools · 8 Libraries · 3064 stars
10. GPUOpen-LibrariesAndSDKs · 8 Libraries · 1603 stars


Trending Kits in GPU

Java CPU libraries are among the most powerful open source tools available today. They extend the core Java language with an incredible range of features and functions, including class loading, garbage collection, threading, synchronization, security, I/O management, exception handling, and more. This flexibility makes them a strong choice for developers building their own custom applications and programs, and they let you implement, adapt, and apply complex algorithms for data analysis. A profiler is a tool developers use to see which part of a program or website is consuming resources such as memory or CPU time. Popular open source libraries for Java CPU include: oshi - Native Operating System and Hardware Information; AnotherMonitor - monitor the memory usage of Android devices; react-native-threads - create new JS processes for CPU-intensive work; r2cloud - decode satellite signals on a Raspberry Pi or any other 64-bit Intel machine.

JavaScript is one of the most widely used programming languages in the world. It was originally developed to provide dynamic interactivity on websites, and the JS engine has since attracted a significant number of developers from around the world; it became an industry standard for web development and backend programming. In 2020, JavaScript celebrated its 25th birthday, and it continues to be at the forefront of programming languages. When it comes to CPU libraries, there are many JavaScript libraries you can use in your projects, from profiling to managing CPU-intensive tasks. Popular JavaScript CPU open source libraries that many developers depend on include: scalene - a high-precision CPU, GPU, and memory profiler; ua-parser-js - UAParser.js, detect browser, engine, OS, CPU, and device type/model from user agent data; chillout - reduce CPU usage with a non-blocking async loop.

Java GPU open source libraries are a vital part of the Java ecosystem and a key component of many of the world's most popular websites. These projects enable high-performance Java applications on a variety of hardware and operating system architectures, covering use cases like gaming, AI, ML, and crypto mining. As GPU programming has become an active research area, many libraries have been proposed to speed up the development of scientific applications. We've done the research, and the 8 best Java GPU open source libraries are listed in this kit. They include PixelFlow - a Processing/Java library for high-performance GPU computing; CNNdroid - an open source library for GPU-accelerated execution; aparapi - the new official Aparapi, a framework for executing native Java on the GPU.

A GPU (Graphics Processing Unit) is a programmable logic chip that executes many operations in parallel, especially graphics computations. GPUs are essential for running heavy applications and are widely used across modern technology. In web development, the GPU can be used to accelerate heavy computations. JavaScript is undoubtedly one of the most popular programming languages in the world: it is used to create dynamic web pages, build mobile apps and games, and even run servers thanks to Node.js, and now it can also be used for GPU computing. NPM, the default package manager for JavaScript, is used by millions of developers to build and manage software. GPUs have long been used in desktop and mobile applications like Adobe Premiere, Photoshop, and After Effects, and of course in games. Popular open-source libraries for JavaScript GPU among developers include: gpu.js - GPU-accelerated JavaScript; scalene - a high-precision CPU, GPU, and memory profiler; pai - resource scheduling and cluster management for AI.

Python is the most popular programming language in the world. Its success lies in its versatility, allowing developers to create everything from simple APIs to complex applications, and its flexibility has made it a preferred language for machine learning and deep learning; the data science and machine learning community has developed many open source libraries for it. GPUs are highly specialized chips designed to perform matrix multiplication at blazing speed. Although they were initially intended for rendering computer graphics on screens, they have proved very useful for machine learning as well. Python has a number of libraries that make it easy to leverage GPUs for both training and inference: some focus on raw performance by exposing CUDA primitives, while others provide higher-level abstractions that let you build complex architectures without worrying about implementation details. Some of the most popular open-source libraries for Python GPU among developers are: JAX - composable transformations of Python and NumPy programs; kitty - a cross-platform, fast, feature-rich, GPU-based terminal; ImageAI - a Python library built to empower developers to build applications.
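
As a taste of how little changes when moving NumPy-style code to the GPU, here is a minimal sketch using CuPy (listed above under Popular Libraries; it assumes a CUDA-capable machine with the cupy package installed):

import cupy as cp

x = cp.random.rand(1024, 1024)   # array allocated on the GPU
y = x @ x.T                      # the matrix multiply runs on the GPU
total = cp.asnumpy(y.sum())      # copy the result back to the host as a NumPy value
print(total)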

Go (also known as Golang) is a programming language created by Google in 2009. It is designed for high performance and scalability, with strict requirements on program correctness, and it combines elements of other languages such as Java, C++, and Python. Unlike those languages, Go makes the build step nearly invisible: a single command such as go build or go run compiles your program directly to a native binary, with no separate interpreter or manual compilation pipeline. GPU computing is a technology that has been around for a while but only recently gained popularity due to the rise of deep learning and artificial intelligence. Developers tend to use some of the following open source libraries for Go GPU: aresdb - a GPU-powered real-time analytics storage and query engine; gapid - Graphics API Debugger; gpu-operator - the NVIDIA GPU Operator, which creates, configures, and manages GPUs in Kubernetes.

C++ is a powerful programming language, widely used in many fields, especially in embedded systems. It is a statically typed, compiled, general-purpose language, often considered intermediate-level because it combines both high-level and low-level features. These traits make C++ a popular choice in the software industry and allow developers to create efficient applications across many domains. GPU libraries are widely used to accelerate matrix calculations, image processing, and machine learning; GPUs are used not only in gaming and entertainment but also in modern science, and the number of computations they can perform is significant. A few of the most popular open source libraries for C++ GPU are: TensorRT - a C++ library for high-performance inference on NVIDIA GPUs and deep learning accelerators; ArrayFire - a general-purpose GPU library; compute - a C++ GPU computing library for OpenCL.

The next evolution of scientific computing will involve hardware accelerators, mainly FPGAs and GPUs, which offer far more parallel execution units than their CPU counterparts. With the ever-expanding capability of personal computing devices, it is now possible to achieve high-performance computing on the graphics processing unit (GPU) with little knowledge of parallel programming. Computing on the GPU has been shown to yield faster computation times than traditional CPU (central processing unit) programming by taking advantage of the high thread counts and large register files available on modern GPUs. Some of the most widely used open source libraries for C# GPU among developers include: ComputeSharp - a .NET 5 library to run C# code on the GPU; GPU-particles - a GPU particle system for Unity; Marching-Cubes-On-The-GPU - an implementation of the marching cubes algorithm on the GPU in Unity.

Trending Discussions on GPU

module 'numpy.distutils.__config__' has no attribute 'blas_opt_info'

How could I speed up my written python code: spheres contact detection (collision) using spatial searching

Unknown OpenCV exception while using EasyOcr

WebSocket not working when trying to send generated answer by keras

Does it make sense to use Conda + Poetry?

Azure Auto ML JobConfigurationMaxSizeExceeded error when using a cluster

Win10 Electron Error: Passthrough is not supported, GL is disabled, ANGLE is

How to make mediapipe pose estimation faster (python)

Why does nvidia-smi return "GPU access blocked by the operating system" in WSL2 under Windows 10 21H2

How to run Pytorch on Macbook pro (M1) GPU?

QUESTION

module 'numpy.distutils.__config__' has no attribute 'blas_opt_info'

Asked 2022-Mar-17 at 10:50

I'm trying to study the neural-network-and-deep-learning book (http://neuralnetworksanddeeplearning.com/chap1.html), using the updated version for Python 3 by MichalDanielDobrzanski (https://github.com/MichalDanielDobrzanski/DeepLearningPython). When I run it in my command console it gives the error below. I've tried uninstalling and reinstalling setuptools, theano, and numpy, but none of that has worked so far. Any help is very appreciated!

Here's the full error log:

WARNING (theano.configdefaults): g++ not available, if using conda: `conda install m2w64-toolchain`
C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\configdefaults.py:560: UserWarning: DeprecationWarning: there is no c++ compiler.This is deprecated and with Theano 0.11 a c++ compiler will be mandatory
  warnings.warn("DeprecationWarning: there is no c++ compiler."
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
Traceback (most recent call last):
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\configparser.py", line 168, in fetch_val_for_key
    return theano_cfg.get(section, option)
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\configparser.py", line 781, in get
    d = self._unify_values(section, vars)
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\configparser.py", line 1149, in _unify_values
    raise NoSectionError(section) from None
configparser.NoSectionError: No section: 'blas'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\configparser.py", line 327, in __get__
    val_str = fetch_val_for_key(self.fullname,
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\configparser.py", line 172, in fetch_val_for_key
    raise KeyError(key)
KeyError: 'blas.ldflags'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\ASUS\Documents\GitHub\Neural-network-and-deep-learning-but-for-python-3\test.py", line 156, in <module>
    import network3
  File "C:\Users\ASUS\Documents\GitHub\Neural-network-and-deep-learning-but-for-python-3\network3.py", line 37, in <module>
    import theano
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\__init__.py", line 124, in <module>
    from theano.scan_module import (scan, map, reduce, foldl, foldr, clone,
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\scan_module\__init__.py", line 41, in <module>
    from theano.scan_module import scan_opt
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\scan_module\scan_opt.py", line 60, in <module>
    from theano import tensor, scalar
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\tensor\__init__.py", line 17, in <module>
    from theano.tensor import blas
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\tensor\blas.py", line 155, in <module>
    from theano.tensor.blas_headers import blas_header_text
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\tensor\blas_headers.py", line 987, in <module>
    if not config.blas.ldflags:
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\configparser.py", line 332, in __get__
    val_str = self.default()
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\configdefaults.py", line 1284, in default_blas_ldflags
    blas_info = np.distutils.__config__.blas_opt_info
AttributeError: module 'numpy.distutils.__config__' has no attribute 'blas_opt_info'

ANSWER

Answered 2022-Feb-17 at 14:12

I had the same issue and solved it by downgrading numpy to version 1.20.3:

pip3 install --upgrade numpy==1.20.3
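
A quick way to confirm the downgrade took effect before re-running the code (my addition, not part of the original answer):

import numpy

# Theano's default_blas_ldflags reads np.distutils.__config__.blas_opt_info,
# which is present again once NumPy is back on 1.20.x.
print(numpy.__version__)   # expect 1.20.3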

Source https://stackoverflow.com/questions/70839312

QUESTION

How could I speed up my written python code: spheres contact detection (collision) using spatial searching

Asked 2022-Mar-13 at 15:43

I am working on a spatial search case for spheres in which I want to find connected spheres. To that end, I search around each sphere for spheres whose centers are within a maximum-sphere-diameter distance of the searching sphere's center. At first I tried scipy's related methods, but the scipy approach takes longer than the equivalent numpy approach: with scipy I first determined the number of K-nearest spheres and then found them with cKDTree.query, which led to more time consumption. It is slower than the numpy method even when the first step is omitted in favor of a constant value (and omitting the first step is not really an option here), which runs contrary to my expectations about scipy's spatial-search speed. So I tried replacing some numpy lines with list loops to speed things up using numba's prange. Numba runs the code a little faster, but I believe this code can be optimized for better performance, perhaps by vectorization, by using other alternative numpy modules, or by using numba in another way. I have iterated over all spheres to prevent probable memory leaks and …, where the number of spheres is high.

import numpy as np
import numba as nb
from scipy.spatial import cKDTree, distance

# ---------------------------- input data ----------------------------
""" For testing by prepared files:
radii = np.load('a.npy')     # shape: (n-spheres, )     must be loaded by np.load('a.npy') or np.loadtxt('radii_large.csv')
poss = np.load('b.npy')      # shape: (n-spheres, 3)    must be loaded by np.load('b.npy') or np.loadtxt('pos_large.csv', delimiter=',')
"""

rnd = np.random.RandomState(70)
data_volume = 200000

radii = rnd.uniform(0.0005, 0.122, data_volume)
dia_max = 2 * radii.max()

x = rnd.uniform(-1.02, 1.02, (data_volume, 1))
y = rnd.uniform(-3.52, 3.52, (data_volume, 1))
z = rnd.uniform(-1.02, -0.575, (data_volume, 1))
poss = np.hstack((x, y, z))
# --------------------------------------------------------------------

# @nb.jit('float64[:,::1](float64[:,::1], float64[::1])', forceobj=True, parallel=True)
def ends_gap(poss, dia_max):
    particle_corsp_overlaps = np.array([], dtype=np.float64)
    ends_ind = np.empty([1, 2], dtype=np.int64)
    """ using list looping """
    # particle_corsp_overlaps = []
    # ends_ind = []

    # for particle_idx in nb.prange(len(poss)):  # by list looping
    for particle_idx in range(len(poss)):
        unshared_idx = np.delete(np.arange(len(poss)), particle_idx)                                                    # <--- relatively high time consumer
        poss_without = poss[unshared_idx]

        """ # SCIPY method ---------------------------------------------------------------------------------------------
        nears_i_ind = cKDTree(poss_without).query_ball_point(poss[particle_idx], r=dia_max)         # <--- high time consumer
        if len(nears_i_ind) > 0:
            dist_i, dist_i_ind = cKDTree(poss_without[nears_i_ind]).query(poss[particle_idx], k=len(nears_i_ind))       # <--- high time consumer
            if not isinstance(dist_i, float):
                dist_i[dist_i_ind] = dist_i.copy()
        """  # NUMPY method --------------------------------------------------------------------------------------------
        lx_limit_idx = poss_without[:, 0] <= poss[particle_idx][0] + dia_max
        ux_limit_idx = poss_without[:, 0] >= poss[particle_idx][0] - dia_max
        ly_limit_idx = poss_without[:, 1] <= poss[particle_idx][1] + dia_max
        uy_limit_idx = poss_without[:, 1] >= poss[particle_idx][1] - dia_max
        lz_limit_idx = poss_without[:, 2] <= poss[particle_idx][2] + dia_max
        uz_limit_idx = poss_without[:, 2] >= poss[particle_idx][2] - dia_max

        nears_i_ind = np.where(lx_limit_idx & ux_limit_idx & ly_limit_idx & uy_limit_idx & lz_limit_idx & uz_limit_idx)[0]
        if len(nears_i_ind) > 0:
            dist_i = distance.cdist(poss_without[nears_i_ind], poss[particle_idx][None, :]).squeeze()                   # <--- relatively high time consumer
        # """  # -------------------------------------------------------------------------------------------------------
            contact_check = dist_i - (radii[unshared_idx][nears_i_ind] + radii[particle_idx])
            connected = contact_check[contact_check <= 0]

            particle_corsp_overlaps = np.concatenate((particle_corsp_overlaps, connected))
            """ using list looping """
            # if len(connected) > 0:
            #    for value_ in connected:
            #        particle_corsp_overlaps.append(value_)

            contacts_ind = np.where([contact_check <= 0])[1]
            contacts_sec_ind = np.array(nears_i_ind)[contacts_ind]
            sphere_olps_ind = np.where((poss[:, None] == poss_without[contacts_sec_ind][None, :]).all(axis=2))[0]       # <--- high time consumer

            ends_ind_mod_temp = np.array([np.repeat(particle_idx, len(sphere_olps_ind)), sphere_olps_ind], dtype=np.int64).T
            if particle_idx > 0:
                ends_ind = np.concatenate((ends_ind, ends_ind_mod_temp))
            else:
                ends_ind[0, 0], ends_ind[0, 1] = ends_ind_mod_temp[0, 0], ends_ind_mod_temp[0, 1]
            """ using list looping """
            # for contacted_idx in sphere_olps_ind:
            #    ends_ind.append([particle_idx, contacted_idx])

    # ends_ind_org = np.array(ends_ind)  # using lists
    ends_ind_org = ends_ind
    ends_ind, ends_ind_idx = np.unique(np.sort(ends_ind_org), axis=0, return_index=True)                                # <--- relatively high time consumer
    gap = np.array(particle_corsp_overlaps)[ends_ind_idx]
    return gap, ends_ind, ends_ind_idx, ends_ind_org

In one of my tests on 23,000 spheres, the scipy, numpy, and numba-aided methods finished the loop in about 400, 200, and 180 seconds respectively, using Colab TPU; for 500,000 spheres it takes 3.5 hours. These execution times are not at all satisfying for my project, where the number of spheres may be up to 1,000,000 in a medium data volume. I will call this code many times in my main program and am looking for ways to run it in milliseconds (as fast as it can possibly go). Is that possible? I would appreciate it if anyone would speed up this code as needed.

Notes:

  • This code must be executable with Python 3.7+, on CPU and GPU.
  • This code must be applicable for data sizes of at least 300,000 spheres.
  • Any numpy, scipy, and … equivalent modules that replace my hand-written ones and make the code significantly faster will be upvoted.

I would appreciate any recommendations or explanations about:

  1. Which method could be fastest for this problem?
  2. Why is scipy not faster than the other methods in this case, and where could it help with this kind of problem?
  3. Choosing between iterative methods and matrix-form methods is confusing for me. Iterative methods use less memory and can be tuned with numba and …, but I think they are not comparable with matrix methods (which depend on memory limits) like numpy and … for huge numbers of spheres. Perhaps I could drop the iteration in favor of pure numpy, but I strongly suspect the huge matrix operations could not be handled, due to memory limits.

Prepared sample test data:

Poss data: 23000, 500000
Radii data: 23000, 500000
Line-by-line speed test logs for the two test cases: scipy method and numpy method time consumption.

ANSWER

Answered 2022-Feb-14 at 10:23

Have you tried FLANN?

This code doesn't solve your problem completely. It simply finds the nearest 50 neighbors to each point in your 500,000-point dataset:

import numpy as np
from pyflann import FLANN

p = np.loadtxt("pos_large.csv", delimiter=",")
flann = FLANN()
flann.build_index(pts=p)
idx, dist = flann.nn_index(qpts=p, num_neighbors=50)

The last line takes less than a second on my laptop, without any tuning or parallelization.
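
Staying within SciPy, a single tree can also enumerate every pair of centers within dia_max in one call via cKDTree.query_pairs. A rough sketch against the question's poss and radii arrays (my suggestion, not part of the original answer, and untimed; output_type="ndarray" needs SciPy 1.6+):

import numpy as np
from scipy.spatial import cKDTree

# One tree over all centers; query_pairs returns every (i, j) pair whose
# centers lie closer than dia_max, so no per-sphere loop is needed.
tree = cKDTree(poss)
pairs = tree.query_pairs(r=dia_max, output_type="ndarray")   # shape (m, 2)
dists = np.linalg.norm(poss[pairs[:, 0]] - poss[pairs[:, 1]], axis=1)
gaps = dists - (radii[pairs[:, 0]] + radii[pairs[:, 1]])
ends_ind = pairs[gaps <= 0]   # index pairs of touching/overlapping spheres
overlaps = gaps[gaps <= 0]    # the corresponding (non-positive) gap values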

Source https://stackoverflow.com/questions/71104627

QUESTION

Unknown OpenCV exception while using EasyOcr

Asked 2022-Feb-22 at 09:04

Code:

import easyocr

reader = easyocr.Reader(['en'])
result = reader.readtext('R.png')

Output:

CUDA not available - defaulting to CPU. Note: This module is much faster with a GPU.

cv2.error: Unknown C++ exception from OpenCV code

I would truly appreciate any support!

ANSWER

Answered 2022-Jan-09 at 10:19

The new version of OpenCV has some issues. Uninstall the newer version of OpenCV and install the older one using:

pip install opencv-python==4.5.4.60
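
To verify that the pinned build is the one Python actually imports (my addition, not part of the original answer):

import cv2

print(cv2.__version__)   # expect 4.5.4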

Source https://stackoverflow.com/questions/70573780

QUESTION

WebSocket not working when trying to send generated answer by keras

Asked 2022-Feb-17 at 12:52

I am implementing a simple chatbot using keras and WebSockets. I now have a model that can make a prediction about the user input and send the corresponding answer.

When I do it through command line it works fine, however when I try to send the answer through my WebSocket, the WebSocket doesn't even start anymore.

Here is my working WebSocket code:

@sock.route('/api')
def echo(sock):
    while True:
        # get user input from browser
        user_input = sock.receive()
        # print user input on console
        print(user_input)
        # read answer from console
        response = input()
        # send response to browser
        sock.send(response)

Here is my code to communicate with the keras model on the command line:

while True:
    question = input("")
    ints = predict(question)
    answer = response(ints, json_data)
    print(answer)

The methods used are these:

def predict(sentence):
    bag_of_words = convert_sentence_in_bag_of_words(sentence)
    # pass bag as list and get index 0
    prediction = model.predict(np.array([bag_of_words]))[0]
    ERROR_THRESHOLD = 0.25
    accepted_results = [[tag, probability] for tag, probability in enumerate(prediction) if probability > ERROR_THRESHOLD]

    accepted_results.sort(key=lambda x: x[1], reverse=True)

    output = []
    for accepted_result in accepted_results:
        output.append({'intent': classes[accepted_result[0]], 'probability': str(accepted_result[1])})
        print(output)
    return output


def response(intents, json):
    tag = intents[0]['intent']
    intents_as_list = json['intents']
    for i in intents_as_list:
        if i['tag'] == tag:
            res = random.choice(i['responses'])
            break
    return res

So when I start the WebSocket with the working code I get this output:

 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
 * Restarting with stat
 * Serving Flask app 'server' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on

But as soon as I have anything of my model in the server.py class I get this output:

 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
 * Restarting with stat
 * Serving Flask app 'server' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on
2022-02-13 11:31:38.887640: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-02-13 11:31:38.887734: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Metal device set to: Apple M1

systemMemory: 16.00 GB
maxCacheSize: 5.33 GB

It is enough for me to just have an import at the top, like from chatty import response, predict, even though they are unused.

ANSWER

Answered 2022-Feb-16 at 19:53

There is no problem with your websocket route. Could you please share how you are triggering this route? WebSocket is a different protocol, and I suspect you are using an HTTP client to test the websocket. For example, in Postman:

[Screenshot: Postman new-request screen]

HTTP requests are different from websocket requests, so you should use an appropriate client to test the websocket.
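
For a quick check from Python, here is a minimal client sketch (my addition, not part of the original answer; it assumes the third-party websockets package and the /api route from the question):

import asyncio
import websockets  # third-party client: pip install websockets

async def main():
    # Talk to the route with a real WebSocket client, not an HTTP request.
    async with websockets.connect("ws://127.0.0.1:5000/api") as ws:
        await ws.send("hello")     # what the browser would send
        print(await ws.recv())     # the chatbot's reply

asyncio.run(main())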

Source https://stackoverflow.com/questions/71099818

QUESTION

Does it make sense to use Conda + Poetry?

Asked 2022-Feb-14 at 10:04

Does it make sense to use Conda + Poetry for a Machine Learning project? Allow me to share my (novice) understanding and please correct or enlighten me:

As far as I understand, Conda and Poetry have different purposes but are largely redundant:

  • Conda is primarily an environment manager (in fact, not necessarily for Python), but it can also manage packages and dependencies.
  • Poetry is primarily a Python package manager (say, an upgrade of pip), but it can also create and manage Python environments (say, an upgrade of Pyenv).

My idea is to use both and compartmentalize their roles: let Conda be the environment manager and Poetry the package manager. My reasoning is that (it sounds like) Conda is best for managing environments and can be used for compiling and installing non-python packages, especially CUDA drivers (for GPU capability), while Poetry is more powerful than Conda as a Python package manager.

I've managed to make this work fairly easily by using Poetry within a Conda environment. The trick is to not use Poetry to manage the Python environment: I'm not using commands like poetry shell or poetry run, only poetry init, poetry install etc (after activating the Conda environment).

For full disclosure, my environment.yml file (for Conda) looks like this:

name: N

channels:
  - defaults
  - conda-forge

dependencies:
  - python=3.9
  - cudatoolkit
  - cudnn

and my pyproject.toml file (where the [tool.poetry] tables live) looks like this:

[tool.poetry]
name = "N"
authors = ["B"]

[tool.poetry.dependencies]
python = "3.9"
torch = "^1.10.1"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

To be honest, one of the reasons I proceeded this way is that I was struggling to install CUDA (for GPU support) without Conda.

Does this project design look reasonable to you?

ANSWER

Answered 2022-Feb-14 at 10:04

As I wrote in the comment, I've been using a very similar Conda + Poetry setup in a data science project for the last year, for reasons similar to yours, and it's been working fine. The great majority of my dependencies are specified in pyproject.toml, but when there's something that's unavailable in PyPI, I add it to environment.yml.

Some additional tips:

  1. Add Poetry, possibly with a version number (if needed), as a dependency in environment.yml, so that you get Poetry installed when you run conda env create, along with Python and other non-PyPI dependencies.
  2. Consider adding conda-lock, which gives you lock files for Conda dependencies, just like you have poetry.lock for Poetry dependencies.

Source https://stackoverflow.com/questions/70851048

QUESTION

Azure Auto ML JobConfigurationMaxSizeExceeded error when using a cluster

Asked 2022-Jan-03 at 10:09

I am running into the following error when I try to run Automated ML through the studio on a GPU compute cluster:

[Screenshot: Azure ML error message]

Error: AzureMLCompute job failed. JobConfigurationMaxSizeExceeded: The specified job configuration exceeds the max allowed size of 32768 characters. Please reduce the size of the job's command line arguments and environment settings

The attempted run is on a registered tabular dataset in the filestore and is a simple regression case. Strangely, it works just fine with the CPU compute instance I use for my other pipelines; I was able to run it a few times that way and wanted to upgrade to a cluster, only to be hit by this error. I found online that it could be a matter of setting AZUREML_COMPUTE_USE_COMMON_RUNTIME: false, but I am not sure where to put this when running from the web studio.

ANSWER

Answered 2021-Dec-13 at 17:58

This is a known bug, and I am following up with the product group to see if there is any update. For the workaround you mentioned, you need to go to the node failing with the JobConfigurationMaxSizeExceeded exception and manually set AZUREML_COMPUTE_USE_COMMON_RUNTIME: false in its Environment JSON field.

The node is shown in the screenshot below.

[Screenshot: failing pipeline node]
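
If you submit runs through the Python SDK rather than the studio, the same variable can be set on the run configuration. A hedged sketch assuming the azureml-core SDK (train.py is a placeholder script name; the original workaround is applied through the studio's Environment JSON field instead):

from azureml.core import ScriptRunConfig
from azureml.core.runconfig import RunConfiguration

# Disable the common runtime on the run configuration before submitting.
run_config = RunConfiguration()
run_config.environment_variables["AZUREML_COMPUTE_USE_COMMON_RUNTIME"] = "false"

src = ScriptRunConfig(source_directory=".", script="train.py", run_config=run_config)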

Source https://stackoverflow.com/questions/70279636

QUESTION

Win10 Electron Error: Passthrough is not supported, GL is disabled, ANGLE is

Asked 2022-Jan-03 at 01:54

I have an electron repo (https://github.com/MartinBarker/RenderTune) which used to work fine on Windows 10 when run from the command prompt. After a couple of months I came back on a fresh Windows 10 machine with an Nvidia GPU, and the electron app prints an error in the window on startup:

Uncaught TypeError: Cannot read properties of undefined (reading 'getCurrentWindow')

Running ffmpeg shell commands results in an error as well, and this message is printed in the command prompt terminal:

[14880:1207/145651.085:ERROR:gpu_init.cc(457)] Passthrough is not supported, GL is disabled, ANGLE is

I checked my other Windows laptops running the exact same code from the master branch of my repo, and it works perfectly fine when running locally.

It seems like this might be a recent issue; I have found it discussed in various forums: https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1944468

https://www.reddit.com/r/electronjs/comments/qdauhu/passthrough_is_not_supported_gl_is_disabled_angle/

I tried upgrading my global electron npm package to a more recent version, electron@16.0.4, but the errors still appear.

ANSWER

Answered 2022-Jan-03 at 01:54

You can try disabling hardware acceleration using app.disableHardwareAcceleration() (See the docs). I don't think this is a fix though, it just makes the message go away for me.


Example Usage

main.js

import { app, BrowserWindow } from 'electron'
import isDev from 'electron-is-dev'

app.disableHardwareAcceleration()

let win = null

async function createWindow() {
  win = new BrowserWindow({
    title: 'My Window'
  })

  const winURL = isDev
    ? 'http://localhost:9080'
    : `file://${__dirname}/index.html`
  win.loadURL(winURL) // load the dev server in development, the built index.html otherwise

  win.on('ready-to-show', async () => {
    win.show()
    win.maximize()
  })
}

app.whenReady().then(createWindow)

app.on('window-all-closed', () => {
  win = null
  if (process.platform !== 'darwin') {
    app.quit()
  }
})

Source https://stackoverflow.com/questions/70267992

QUESTION

How to make mediapipe pose estimation faster (python)

Asked 2021-Dec-20 at 16:11

I'm making a pose estimation script for my game. However, it's running at 20-30 fps and not using the whole CPU, even though there is no fps limit. It's not using the whole GPU either. Can someone help me?

Here is resource usage while playing a dance video: https://imgur.com/a/6yI2TWg

Here is my code:

import cv2
import mediapipe as mp
import time

inFile = '/dev/video0'

capture = cv2.VideoCapture(inFile)
FramesVideo = int(capture.get(cv2.CAP_PROP_FRAME_COUNT)) # Number of frames inside video
FrameCount = 0 # Currently playing frame
prevTime = 0

# some objects for mediapipe
mpPose = mp.solutions.pose
mpDraw = mp.solutions.drawing_utils
pose = mpPose.Pose()

while True:
    FrameCount += 1
    #read image and convert to rgb
    success, img = capture.read()
    imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    #process image
    results = pose.process(imgRGB)

    if results.pose_landmarks:
        mpDraw.draw_landmarks(img, results.pose_landmarks, mpPose.POSE_CONNECTIONS)
        #get landmark positions
        landmarks = []
        for id, lm in enumerate(results.pose_landmarks.landmark):
            h, w, c = img.shape
            cx, cy = int(lm.x * w), int(lm.y * h)
            cv2.putText(img, str(id), (cx,cy), cv2.FONT_HERSHEY_PLAIN, 1, (255,0,0), 1)
            landmarks.append((cx,cy))

    # calculate and print fps
    frameTime = time.time()
    fps = 1/(frameTime-prevTime)
    prevTime = frameTime
    cv2.putText(img, str(int(fps)), (30,50), cv2.FONT_HERSHEY_PLAIN, 3, (255,0,0), 3)

    #show image
    cv2.imshow('Video', img)
    cv2.waitKey(1)
    if FrameCount == FramesVideo-1:
        capture.release()
        cv2.destroyAllWindows()
        break

ANSWER

Answered 2021-Dec-20 at 16:11

Set the model_complexity of mp.Pose to 0.

As the documentation states:

MODEL_COMPLEXITY Complexity of the pose landmark model: 0, 1 or 2. Landmark accuracy as well as inference latency generally go up with the model complexity. Default to 1.

This is the best solution I've found.
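
Applied to the question's code, that is a one-line change (a sketch; mpPose is the mp.solutions.pose alias from the question):

# Trade some landmark accuracy for much lower inference latency.
pose = mpPose.Pose(model_complexity=0)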

Source https://stackoverflow.com/questions/68745309

QUESTION

Why does nvidia-smi return "GPU access blocked by the operating system" in WSL2 under Windows 10 21H2

Asked 2021-Nov-18 at 19:20
Installing CUDA on WSL2

I've installed Windows 10 21H2 on both my desktop (AMD 5950X system with RTX3080) and my laptop (Dell XPS 9560 with i7-7700HQ and GTX1050) following the instructions on https://docs.nvidia.com/cuda/wsl-user-guide/index.html:

  1. Install CUDA-capable driver in Windows
  2. Update WSL2 kernel in PowerShell: wsl --update
  3. Install CUDA toolkit in Ubuntu 20.04 in WSL2 (Note that you don't install a CUDA driver in WSL2, the instructions explicitly tell that the CUDA driver should not be installed.):
$ wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
$ sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ wget https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda-repo-wsl-ubuntu-11-4-local_11.4.0-1_amd64.deb
$ sudo dpkg -i cuda-repo-wsl-ubuntu-11-4-local_11.4.0-1_amd64.deb
$ sudo apt-key add /var/cuda-repo-wsl-ubuntu-11-4-local/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get -y install cuda
The Error

On my desktop nvidia-smi and CUDA samples are working fine in WSL2. But on my laptop running nvidia-smi in WSL2 returns:

$ nvidia-smi
Failed to initialize NVML: GPU access blocked by the operating system
Failed to properly shut down NVML: GPU access blocked by the operating system

I'm aware my laptop has NVIDIA Optimus with both an Intel IGP and an NVIDIA GTX1050, but CUDA works fine in Windows, just not in WSL2. I also could not find any information saying that CUDA is not supposed to work in WSL2 on Optimus systems.

What I've tried

I've tried the following mitigations, but the error remains:

  • reinstalling the Windows CUDA driver again and rebooting
  • Making the GTX1050 the preferred GPU in global settings in the NVIDIA control panel
  • Making the GTX1050 the default physx processor
  • Following the same steps for a fresh Ubuntu 18.04 in WSL2
The question

Is this a CUDA WSL2 bug? Or does CUDA simply not work with Optimus? Or how can I fix or further debug this?

More details

I've compared running nvidia-smi.exe in Windows powershell between my desktop and laptop, and they both return the same software versions:

PS C:\WINDOWS\system32> nvidia-smi
Wed Nov 17 21:46:50 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.06       Driver Version: 510.06       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ... WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   44C    P8    N/A /  N/A |     75MiB /  4096MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Even more details

Running the full nvidia-smi.exe -q in Windows PowerShell on my laptop returns the following information about its GPU:

PS C:\WINDOWS\system32> nvidia-smi -q

==============NVSMI LOG==============

Timestamp                                 : Wed Nov 17 21:48:19 2021
Driver Version                            : 510.06
CUDA Version                              : 11.6

Attached GPUs                             : 1
GPU 00000000:01:00.0
    Product Name                          : NVIDIA GeForce GTX 1050
    Product Brand                         : GeForce
    Product Architecture                  : Pascal
    Display Mode                          : Disabled
    Display Active                        : Disabled
    Persistence Mode                      : N/A
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : WDDM
        Pending                           : WDDM
    Serial Number                         : N/A
    GPU UUID                              : GPU-7645072f-7516-5488-316d-6277d101f64e
    Minor Number                          : N/A
    VBIOS Version                         : 86.07.3e.00.1c
    MultiGPU Board                        : No
    Board ID                              : 0x100
    GPU Part Number                       : N/A
    Module ID                             : 0
    Inforom Version
        Image Version                     : N/A
        OEM Object                        : N/A
        ECC Object                        : N/A
        Power Management Object           : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GSP Firmware Version                  : N/A
    GPU Virtualization Mode
        Virtualization Mode               : None
        Host VGPU Mode                    : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x01
        Device                            : 0x00
        Domain                            : 0x0000
        Device Id                         : 0x1C8D10DE
        Bus Id                            : 00000000:01:00.0
        Sub System Id                     : 0x07BE1028
        GPU Link Info
            PCIe Generation
                Max                       : 3
                Current                   : 3
            Link Width
                Max                       : 16x
                Current                   : 16x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 0 KB/s
        Rx Throughput                     : 0 KB/s
    Fan Speed                             : N/A
    Performance State                     : P8
    Clocks Throttle Reasons
        Idle                              : Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    FB Memory Usage
        Total                             : 4096 MiB
        Used                              : 75 MiB
        Free                              : 4021 MiB
    BAR1 Memory Usage
        Total                             : 256 MiB
        Used                              : 2 MiB
        Free                              : 254 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 0 %
        Memory                            : 0 %
        Encoder                           : 0 %
        Decoder                           : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    Ecc Mode
        Current                           : N/A
        Pending                           : N/A
    ECC Errors
        Volatile
            Single Bit
                Device Memory             : N/A
                Register File             : N/A
                L1 Cache                  : N/A
                L2 Cache                  : N/A
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : N/A
            Double Bit
                Device Memory             : N/A
                Register File             : N/A
                L1 Cache                  : N/A
                L2 Cache                  : N/A
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : N/A
        Aggregate
            Single Bit
                Device Memory             : N/A
                Register File             : N/A
                L1 Cache                  : N/A
                L2 Cache                  : N/A
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : N/A
            Double Bit
                Device Memory             : N/A
                Register File             : N/A
                L1 Cache                  : N/A
                L2 Cache                  : N/A
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : N/A
    Retired Pages
        Single Bit ECC                    : N/A
        Double Bit ECC                    : N/A
        Pending Page Blacklist            : N/A
    Remapped Rows                         : N/A
    Temperature
        GPU Current Temp                  : 40 C
        GPU Shutdown Temp                 : 102 C
        GPU Slowdown Temp                 : 97 C
        GPU Max Operating Temp            : 78 C
        GPU Target Temperature            : N/A
        Memory Current Temp               : N/A
        Memory Max Operating Temp         : N/A
    Power Readings
        Power Management                  : N/A
        Power Draw                        : N/A
        Power Limit                       : N/A
        Default Power Limit               : N/A
        Enforced Power Limit              : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A
    Clocks
        Graphics                          : 0 MHz
        SM                                : 0 MHz
        Memory                            : 405 MHz
        Video                             : 0 MHz
    Applications Clocks
        Graphics                          : N/A
        Memory                            : N/A
    Default Applications Clocks
        Graphics                          : N/A
        Memory                            : N/A
    Max Clocks
        Graphics                          : 1911 MHz
        SM                                : 1911 MHz
        Memory                            : 3504 MHz
        Video                             : 1708 MHz
    Max Customer Boost Clocks
        Graphics                          : N/A
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Voltage
        Graphics                          : N/A
    Processes                             : None
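For completeness (an editorial sketch, not part of the original question): nvidia-smi can emit the same -q report as XML via nvidia-smi -q -x, which is easier to inspect programmatically than the text dump above. The element names below follow the XML layout of recent driver releases and may vary between versions:

import subprocess
import xml.etree.ElementTree as ET

# Sketch: pull a few key fields out of the XML form of the -q report.
# Tag names reflect recent nvidia-smi releases and may differ by version.
def smi_report():
    xml_text = subprocess.run(
        ["nvidia-smi", "-q", "-x"],
        capture_output=True, text=True, check=True,
    ).stdout
    root = ET.fromstring(xml_text)
    gpu = root.find("gpu")
    return {
        "driver_version": root.findtext("driver_version"),
        "cuda_version": root.findtext("cuda_version"),
        "product_name": gpu.findtext("product_name"),
        "driver_model": gpu.findtext("driver_model/current_dm"),
        "fb_memory_total": gpu.findtext("fb_memory_usage/total"),
    }

if __name__ == "__main__":
    for key, value in smi_report().items():
        print(f"{key}: {value}")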

ANSWER

Answered 2021-Nov-18 at 19:20

It turns out that the Windows 10 Update Assistant incorrectly reported that it had upgraded my laptop to 21H2. Checking the Windows version by running winver shows that the OS is still on 21H1, and CUDA in WSL2 does not work on Windows 10 releases earlier than 21H2.

After successfully installing 21H2, I can confirm that CUDA works with WSL2, even on laptops with Optimus NVIDIA cards.
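For anyone hitting the same mismatch, here is a small editorial sketch (not from the original answer) that checks the Windows build number directly; Windows 10 21H2 corresponds to build 19044, while 21H1 is build 19043:

import platform

# Sketch: Windows 10 21H2 is build 19044 (21H1 is 19043), and Windows 11
# builds are 22000+, so a simple build-number comparison catches the case
# where the Update Assistant claims 21H2 but winver still shows 21H1.
def build_supports_wsl2_cuda():
    version = platform.version()  # e.g. '10.0.19044' on Windows
    parts = version.split(".")
    try:
        build = int(parts[2])
    except (IndexError, ValueError):
        return False
    return build >= 19044

if __name__ == "__main__":
    print("Build supports CUDA in WSL2:", build_supports_wsl2_cuda())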

Source https://stackoverflow.com/questions/70011494

QUESTION

How to run PyTorch on the MacBook Pro (M1) GPU?

Asked 2021-Nov-18 at 03:08

I tried to train a model using PyTorch on my MacBook Pro, which has the new-generation Apple M1 chip. However, PyTorch couldn't recognize my GPU.

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs

Does anyone know any solution?

I have updated all the libraries to the latest versions.

ANSWER

Answered 2021-Nov-18 at 03:08

It looks like PyTorch support for the M1 GPU is in the works, but is not yet complete.

From @soumith on GitHub:

So, here's an update. We plan to get the M1 GPU supported. @albanD, @ezyang and a few core-devs have been looking into it. I can't confirm/deny the involvement of any other folks right now.

So, what we have so far is that we had a prototype that was just about okay. We took the wrong approach (more graph-matching-ish), and the user-experience wasn't great -- some operations were really fast, some were really slow, there wasn't a smooth experience overall. One had to guess-work which of their workflows would be fast.

So, we're completely re-writing it using a new approach, which I think is a lot closer to your good ole PyTorch, but it is going to take some time. I don't think we're going to hit a public alpha in the next ~4 months.

We will open up development of this backend as soon as we can.

That post: https://github.com/pytorch/pytorch/issues/47702#issuecomment-965625139

TL;DR: a public alpha is at least ~4 months out.
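Editorial note, written after the fact: MPS support did eventually ship, starting with PyTorch 1.12, as torch.backends.mps. A minimal device-selection sketch for those later releases (the getattr guard keeps it safe on older builds without the MPS backend):

import torch

# Sketch for PyTorch 1.12+ (not available at the time of this answer):
# prefer the Apple-silicon MPS backend when present, else CUDA, else CPU.
# The getattr guard avoids AttributeError on PyTorch builds without MPS.
def pick_device():
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device()
model = torch.nn.Linear(8, 2).to(device)  # toy model, just to show .to(device)
print("Training on:", device)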

Source https://stackoverflow.com/questions/68820453

Community Discussions include sources from the Stack Exchange Network

Tutorials and Learning Resources in GPU

Tutorials and Learning Resources are not available at the moment for GPU
