Popular New Releases in GPU
taichi: v1.0.0
gpu.js: Maintenance release
hashcat: hashcat v6.2.4
cupy: v10.3.1
EASTL: 3.17.06
Popular Libraries in GPU
taichi by taichi-dev (C++), 18670, Apache-2.0
Productive & portable high-performance programming in Python.
gpu.js by gpujs (JavaScript), 13614, MIT
GPU Accelerated JavaScript
hashcat by hashcat (C), 10426
World's fastest and most advanced password recovery utility
cupy by cupy (Python), 5918, MIT
NumPy & SciPy for GPU
EASTL by electronicarts (C++), 5812, BSD-3-Clause
EASTL stands for Electronic Arts Standard Template Library. It is an extensive and robust implementation with an emphasis on high performance.
ethminer by ethereum-mining (C++), 5395, GPL-3.0
Ethereum miner with OpenCL, CUDA and stratum support
polars by pola-rs (Rust), 5341, MIT
Fast multi-threaded DataFrame library in Rust | Python | Node.js
gfx by gfx-rs (Rust), 4976, NOASSERTION
[maintenance mode] A low-overhead Vulkan-like GPU API for Rust.
Halide by halide (C++), 4890, NOASSERTION
A language for fast, portable data-parallel computation
Trending New Libraries in GPU
polars by pola-rs (Rust), 5341, MIT
Fast multi-threaded DataFrame library in Rust | Python | Node.js
rust-gpu by EmbarkStudios (Rust), 4419, NOASSERTION
🐉 Making Rust a first-class language and ecosystem for GPU shaders 🚧
vgpu_unlock by DualCoder (C), 2587, MIT
Unlock vGPU functionality for consumer grade GPUs.
libcudacxx by NVIDIA (C++), 1953, NOASSERTION
The C++ Standard Library for your entire system.
tiny-cuda-nn by NVlabs (C++), 1093, NOASSERTION
Lightning fast C++/CUDA neural network framework
ZLUDA by vosen (C++), 977, NOASSERTION
CUDA on Intel GPUs
nvitop by XuehaiPan (Python), 784, GPL-3.0
An interactive NVIDIA-GPU process viewer, the one-stop solution for GPU process management.
VkFFT by DTolm (C++), 745, MIT
Vulkan/CUDA/HIP/OpenCL Fast Fourier Transform library
gpu by AsahiLinux (C), 662
Dissecting the M1's GPU for 3D acceleration
Top Authors in GPU
1. 31 Libraries, 14035
2. 14 Libraries, 2886
3. 11 Libraries, 621
4. 10 Libraries, 110
5. 10 Libraries, 555
6. 9 Libraries, 6610
7. 9 Libraries, 80
8. 8 Libraries, 5138
9. 8 Libraries, 3064
10. 8 Libraries, 1603
Trending Kits in GPU
Java CPU libraries are among the most capable open source tools available for working with the CPU from Java. The core Java platform already provides an extensive range of runtime features, including class loading, garbage collection, threading, synchronization, security, I/O management and exception handling, and CPU-oriented libraries build on this to expose hardware information and to help you implement, adapt and tune algorithms for data analysis. A profiler is a tool developers use to see which part of their program or website is consuming resources such as memory or CPU time. Popular open source libraries for Java CPU work include: oshi - Native Operating System and Hardware Information; AnotherMonitor - monitors the memory usage of Android devices; react-native-threads - Create new JS processes for CPU intensive work; r2cloud - Decode satellite signals on a Raspberry Pi or any other 64-bit Intel machine.
JavaScript is one of the most widely used programming languages in the world. It was originally developed to provide dynamic interactivity on websites, and the JS engine has since attracted a significant number of developers from around the world; it has become an industry standard for web development and backend programming. In 2020, JavaScript celebrated its 25th birthday, and it continues to be at the forefront of programming languages. When it comes to CPU-related work, there are many JavaScript libraries you can use in your projects, from profiling to keeping CPU-intensive tasks from blocking the event loop. Open source JavaScript CPU libraries that many developers rely on include: scalene - a high precision CPU, GPU, and memory profiler; ua-parser-js - UAParser.js detects browser, engine, OS, CPU, and device type/model from User-Agent data; chillout - reduce CPU usage with a non-blocking async loop.
Java GPU open source libraries are a vital part of the Java ecosystem and a key component of many of the world's most popular websites. These projects enable high-performance Java applications on a variety of hardware and operating system architectures and can be used for use cases like gaming, AI, ML, and crypto mining. As GPU programming has become an active research area, many libraries have been proposed to speed up the development of scientific applications. We've done the research, and the 8 best Java GPU open source libraries are listed in this kit. They include: PixelFlow - a Processing/Java library for high-performance GPU computing; CNNdroid - an open source library for GPU-accelerated execution; aparapi - the new official Aparapi, a framework for executing native Java code on the GPU.
A GPU (Graphics Processing Unit) is a programmable logic chip that executes many operations in parallel, especially graphics computations. GPUs are essential for running heavy applications and are widely used across modern technology. In web development, the GPU can be used to accelerate heavy computations. JavaScript is undoubtedly one of the most popular programming languages in the world: it is used to create dynamic web pages, build mobile apps and games, and even run servers thanks to Node.js, and now it can also be used for GPU computing. npm is the default package manager for JavaScript and is used by millions of developers to build and manage software. GPUs have been used for quite some time in desktop and mobile applications such as Adobe Premiere, Photoshop, After Effects, and games. Popular open-source libraries for JavaScript GPU work include: gpu.js - GPU Accelerated JavaScript; scalene - a high precision CPU, GPU, and memory profiler; pai - resource scheduling and cluster management for AI.
Python is the most popular programming language in the world. Its success lies in its versatility, allowing developers to create everything from simple APIs to complex applications, and its flexibility has made it a preferred language for machine learning and deep learning. The data science and machine learning community has developed many open source libraries for Python. GPUs are highly specialized chips designed to perform matrix operations at blazing speed; although they were initially intended for rendering computer graphics, they have proved very useful for machine learning as well. Python has a number of libraries that make it easy to leverage GPUs for both training and inference. Some focus on raw performance by wrapping CUDA primitives, while others provide higher-level, NumPy-like abstractions that let you build complex architectures quickly without worrying about implementation details. Some of the most popular open-source libraries for Python GPU work are: Jax - composable transformations of Python and NumPy programs; kitty - a cross-platform, fast, feature-rich, GPU-based terminal; ImageAI - a Python library built to empower developers to build applications.
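As a small, hedged illustration of the NumPy-style abstraction these libraries offer, here is a minimal CuPy sketch (CuPy appears in the library list above); it assumes a CUDA-capable GPU and an installed cupy package, and the matrix sizes are arbitrary:
import cupy as cp  # NumPy-like API that executes on the GPU

# Allocate a matrix directly in GPU memory and multiply it on the device.
a = cp.random.rand(2048, 2048, dtype=cp.float32)
b = a @ a  # dispatched to the GPU (cuBLAS) rather than the CPU

# Copy the result back to host memory as a plain NumPy array when needed.
b_host = cp.asnumpy(b)
print(type(b_host), b_host.shape)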
Go (also known as Golang) is a programming language created at Google in 2009. It is designed for high performance and scalability with strict requirements on program correctness, and it combines elements of other languages such as Java, C++ and Python. Unlike typical Java or C++ workflows with a separate, explicit build step, Go's toolchain makes compilation almost invisible: go build produces a native binary with a single command, and go run compiles and executes a program in one step, with no interpreter or virtual machine involved. GPU computing has been around for a while, but it has recently gained popularity with the rise of deep learning and artificial intelligence. Open source libraries that developers tend to use for Go GPU work include: aresdb - a GPU-powered real-time analytics storage and query engine; gapid - Graphics API Debugger; gpu-operator - the NVIDIA GPU Operator for managing GPUs in Kubernetes.
C++ is a powerful programming language that is widely used in many fields, especially embedded systems. It is a statically typed, compiled, general-purpose language, often considered intermediate-level because it combines both high-level and low-level features. These properties make C++ a popular choice in the software industry and let developers create efficient applications across many domains. GPU libraries are widely used to accelerate matrix calculations, image processing and machine learning; GPUs are used not only in gaming and entertainment but also in modern science, and the amount of computation they can perform is significant. A few of the most popular open source libraries for C++ GPU work are: TensorRT - a C++ library for high-performance inference on NVIDIA GPUs and deep learning accelerators; ArrayFire - a general purpose GPU library; compute (Boost.Compute) - a C++ GPU computing library for OpenCL.
The next evolution of scientific computing will involve hardware accelerators, mainly FPGAs and GPUs, which offer far more parallel cores than their CPU counterparts. With the ever-expanding functionality of personal computing devices, it is now possible to achieve high-performance computing on the graphics processing unit (GPU) with little knowledge of parallel programming. Computing on the GPU has been shown to yield faster computation times than traditional CPU (central processing unit) programming by taking advantage of the high thread counts and large register files available on modern GPUs. Some of the most widely used open source libraries for C# GPU work include: ComputeSharp - a .NET 5 library to run C# code on the GPU; GPU-particles - a GPU particle system for Unity; Marching-Cubes-On-The-GPU - an implementation of the marching cubes algorithm on the GPU in Unity.
Trending Discussions on GPU
module 'numpy.distutils.__config__' has no attribute 'blas_opt_info'
How could I speed up my written python code: spheres contact detection (collision) using spatial searching
Unknown OpenCV exception while using EasyOcr
WebSocket not working when trying to send generated answer by keras
Does it make sense to use Conda + Poetry?
Azure Auto ML JobConfigurationMaxSizeExceeded error when using a cluster
Win10 Electron Error: Passthrough is not supported, GL is disabled, ANGLE is
How to make mediapipe pose estimation faster (python)
Why does nvidia-smi return "GPU access blocked by the operating system" in WSL2 under Windows 10 21H2
How to run Pytorch on Macbook pro (M1) GPU?
QUESTION
module 'numpy.distutils.__config__' has no attribute 'blas_opt_info'
Asked 2022-Mar-17 at 10:50. I'm trying to study Neural Networks and Deep Learning (http://neuralnetworksanddeeplearning.com/chap1.html), using the updated version for Python 3 by MichalDanielDobrzanski (https://github.com/MichalDanielDobrzanski/DeepLearningPython). I tried to run it in my command console and it gives the error below. I've tried uninstalling and reinstalling setuptools, theano, and numpy, but none of that has worked so far. Any help is very appreciated!
Here's the full error log:
1WARNING (theano.configdefaults): g++ not available, if using conda: `conda install m2w64-toolchain`
2C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\configdefaults.py:560: UserWarning: DeprecationWarning: there is no c++ compiler.This is deprecated and with Theano 0.11 a c++ compiler will be mandatory
3 warnings.warn("DeprecationWarning: there is no c++ compiler."
4WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
5Traceback (most recent call last):
6 File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\configparser.py", line 168, in fetch_val_for_key
7 return theano_cfg.get(section, option)
8 File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\configparser.py", line 781, in get
9 d = self._unify_values(section, vars)
10 File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\configparser.py", line 1149, in _unify_values
11 raise NoSectionError(section) from None
12configparser.NoSectionError: No section: 'blas'
13
14During handling of the above exception, another exception occurred:
15
16Traceback (most recent call last):
17 File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\configparser.py", line 327, in __get__
18 val_str = fetch_val_for_key(self.fullname,
19 File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\configparser.py", line 172, in fetch_val_for_key
20 raise KeyError(key)
21KeyError: 'blas.ldflags'
22
23During handling of the above exception, another exception occurred:
24
25Traceback (most recent call last):
26 File "C:\Users\ASUS\Documents\GitHub\Neural-network-and-deep-learning-but-for-python-3\test.py", line 156, in <module>
27 import network3
28 File "C:\Users\ASUS\Documents\GitHub\Neural-network-and-deep-learning-but-for-python-3\network3.py", line 37, in <module>
29 import theano
30 File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\__init__.py", line 124, in <module>
31 from theano.scan_module import (scan, map, reduce, foldl, foldr, clone,
32 File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\scan_module\__init__.py", line 41, in <module>
33 from theano.scan_module import scan_opt
34 File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\scan_module\scan_opt.py", line 60, in <module>
35 from theano import tensor, scalar
36 File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\tensor\__init__.py", line 17, in <module>
37 from theano.tensor import blas
38 File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\tensor\blas.py", line 155, in <module>
39 from theano.tensor.blas_headers import blas_header_text
40 File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\tensor\blas_headers.py", line 987, in <module>
41 if not config.blas.ldflags:
42 File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\configparser.py", line 332, in __get__
43 val_str = self.default()
44 File "C:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\theano\configdefaults.py", line 1284, in default_blas_ldflags
45 blas_info = np.distutils.__config__.blas_opt_info
46AttributeError: module 'numpy.distutils.__config__' has no attribute 'blas_opt_info'
47
ANSWER
Answered 2022-Feb-17 at 14:12. I had the same issue and solved it by downgrading numpy to version 1.20.3 (newer NumPy releases removed the numpy.distutils blas_opt_info attribute that this older Theano version still reads):
pip3 install --upgrade numpy==1.20.3
QUESTION
How could I speed up my written python code: spheres contact detection (collision) using spatial searching
Asked 2022-Mar-13 at 15:43. I am working on a spatial search problem for spheres in which I want to find connected (touching) spheres. To do this, I search around each sphere for other spheres whose centers lie within one maximum sphere diameter of the searching sphere's center. At first I tried SciPy-based methods, but the SciPy approach takes longer than the equivalent NumPy approach: for SciPy, I first determine the number of K-nearest spheres and then find them with cKDTree.query, which adds to the runtime. It remains slower than the NumPy approach even if I skip the first step and use a constant value (which is not really acceptable in this case). This is contrary to my expectations about SciPy's spatial search speed. So I tried replacing some NumPy lines with list loops to speed things up using numba prange. Numba runs the code a little faster, but I believe this code can still be optimized, perhaps by vectorization, by using other NumPy modules, or by using Numba differently. I iterate over all spheres to avoid excessive memory use when the number of spheres is high.
1import numpy as np
2import numba as nb
3from scipy.spatial import cKDTree, distance
4
5# ---------------------------- input data ----------------------------
6""" For testing by prepared files:
7radii = np.load('a.npy') # shape: (n-spheres, ) must be loaded by np.load('a.npy') or np.loadtxt('radii_large.csv')
8poss = np.load('b.npy') # shape: (n-spheres, 3) must be loaded by np.load('b.npy') or np.loadtxt('pos_large.csv', delimiter=',')
9"""
10
11rnd = np.random.RandomState(70)
12data_volume = 200000
13
14radii = rnd.uniform(0.0005, 0.122, data_volume)
15dia_max = 2 * radii.max()
16
17x = rnd.uniform(-1.02, 1.02, (data_volume, 1))
18y = rnd.uniform(-3.52, 3.52, (data_volume, 1))
19z = rnd.uniform(-1.02, -0.575, (data_volume, 1))
20poss = np.hstack((x, y, z))
21# --------------------------------------------------------------------
22
23# @nb.jit('float64[:,::1](float64[:,::1], float64[::1])', forceobj=True, parallel=True)
24def ends_gap(poss, dia_max):
25 particle_corsp_overlaps = np.array([], dtype=np.float64)
26 ends_ind = np.empty([1, 2], dtype=np.int64)
27 """ using list looping """
28 # particle_corsp_overlaps = []
29 # ends_ind = []
30
31 # for particle_idx in nb.prange(len(poss)): # by list looping
32 for particle_idx in range(len(poss)):
33 unshared_idx = np.delete(np.arange(len(poss)), particle_idx) # <--- relatively high time consumer
34 poss_without = poss[unshared_idx]
35
36 """ # SCIPY method ---------------------------------------------------------------------------------------------
37 nears_i_ind = cKDTree(poss_without).query_ball_point(poss[particle_idx], r=dia_max) # <--- high time consumer
38 if len(nears_i_ind) > 0:
39 dist_i, dist_i_ind = cKDTree(poss_without[nears_i_ind]).query(poss[particle_idx], k=len(nears_i_ind)) # <--- high time consumer
40 if not isinstance(dist_i, float):
41 dist_i[dist_i_ind] = dist_i.copy()
42 """ # NUMPY method --------------------------------------------------------------------------------------------
43 lx_limit_idx = poss_without[:, 0] <= poss[particle_idx][0] + dia_max
44 ux_limit_idx = poss_without[:, 0] >= poss[particle_idx][0] - dia_max
45 ly_limit_idx = poss_without[:, 1] <= poss[particle_idx][1] + dia_max
46 uy_limit_idx = poss_without[:, 1] >= poss[particle_idx][1] - dia_max
47 lz_limit_idx = poss_without[:, 2] <= poss[particle_idx][2] + dia_max
48 uz_limit_idx = poss_without[:, 2] >= poss[particle_idx][2] - dia_max
49
50 nears_i_ind = np.where(lx_limit_idx & ux_limit_idx & ly_limit_idx & uy_limit_idx & lz_limit_idx & uz_limit_idx)[0]
51 if len(nears_i_ind) > 0:
52 dist_i = distance.cdist(poss_without[nears_i_ind], poss[particle_idx][None, :]).squeeze() # <--- relatively high time consumer
53 # """ # -------------------------------------------------------------------------------------------------------
54 contact_check = dist_i - (radii[unshared_idx][nears_i_ind] + radii[particle_idx])
55 connected = contact_check[contact_check <= 0]
56
57 particle_corsp_overlaps = np.concatenate((particle_corsp_overlaps, connected))
58 """ using list looping """
59 # if len(connected) > 0:
60 # for value_ in connected:
61 # particle_corsp_overlaps.append(value_)
62
63 contacts_ind = np.where([contact_check <= 0])[1]
64 contacts_sec_ind = np.array(nears_i_ind)[contacts_ind]
65 sphere_olps_ind = np.where((poss[:, None] == poss_without[contacts_sec_ind][None, :]).all(axis=2))[0] # <--- high time consumer
66
67 ends_ind_mod_temp = np.array([np.repeat(particle_idx, len(sphere_olps_ind)), sphere_olps_ind], dtype=np.int64).T
68 if particle_idx > 0:
69 ends_ind = np.concatenate((ends_ind, ends_ind_mod_temp))
70 else:
71 ends_ind[0, 0], ends_ind[0, 1] = ends_ind_mod_temp[0, 0], ends_ind_mod_temp[0, 1]
72 """ using list looping """
73 # for contacted_idx in sphere_olps_ind:
74 # ends_ind.append([particle_idx, contacted_idx])
75
76 # ends_ind_org = np.array(ends_ind) # using lists
77 ends_ind_org = ends_ind
78 ends_ind, ends_ind_idx = np.unique(np.sort(ends_ind_org), axis=0, return_index=True) # <--- relatively high time consumer
79 gap = np.array(particle_corsp_overlaps)[ends_ind_idx]
80 return gap, ends_ind, ends_ind_idx, ends_ind_org
81
In one of my tests on 23,000 spheres, the SciPy, NumPy, and Numba-aided methods finished the loop in about 400, 200, and 180 seconds respectively using a Colab TPU runtime; for 500,000 spheres it takes 3.5 hours. These execution times are not at all satisfying for my project, where the number of spheres may reach 1,000,000 in a medium data volume. I will call this code many times in my main program and am looking for ways to make it run in milliseconds (or as fast as possible). Is that possible? I would appreciate anyone speeding up the code as needed.
Notes:
- This code must be executable with Python 3.7+, on CPU and GPU.
- This code must be applicable to data sizes of at least 300,000 spheres.
- Any numpy, scipy, or similar equivalent modules used in place of my hand-written code that make it significantly faster will be upvoted.
I would appreciate any recommendations or explanations about:
- Which method could be fastest for this problem?
- Why is SciPy not faster than the other methods in this case, and where could it be helpful for this kind of problem?
- Choosing between iterative methods and matrix-form methods is confusing to me. Iterative methods use less memory and can be tuned with Numba, but I think they cannot compete with matrix methods (which are bounded by memory limits) such as pure NumPy for huge numbers of spheres. Perhaps I could drop the loop entirely with NumPy, but I strongly suspect the resulting matrix operations would be too large for memory.
Prepared sample test data:
Poss data: 23000, 500000
Radii data: 23000, 500000
Line-by-line speed test logs for the two test cases: SciPy method and NumPy method time consumption.
ANSWER
Answered 2022-Feb-14 at 10:23. Have you tried FLANN?
This code doesn't solve your problem completely. It simply finds the nearest 50 neighbors to each point in your 500000 point dataset:
import numpy as np
from pyflann import FLANN

p = np.loadtxt("pos_large.csv", delimiter=",")
flann = FLANN()
flann.build_index(pts=p)
idx, dist = flann.nn_index(qpts=p, num_neighbors=50)
The last line takes less than a second on my laptop, without any tuning or parallelization.
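For completeness, the same contact test can also be phrased with a single SciPy KD-tree pair query instead of a per-sphere loop. The sketch below is not the asker's code: the function name and return convention are illustrative, and it assumes the poss and radii arrays defined in the question.
import numpy as np
from scipy.spatial import cKDTree

def contacts_kdtree(poss, radii):
    # All index pairs (i, j) with i < j whose centers lie within the largest
    # possible contact distance; one tree query replaces the per-sphere loop.
    dia_max = 2 * radii.max()
    pair_set = cKDTree(poss).query_pairs(r=dia_max)
    if not pair_set:
        return np.empty(0), np.empty((0, 2), dtype=np.int64)
    pairs = np.array(sorted(pair_set), dtype=np.int64)

    # Keep only pairs whose gap (center distance minus the sum of radii) is <= 0.
    dist = np.linalg.norm(poss[pairs[:, 0]] - poss[pairs[:, 1]], axis=1)
    gap = dist - (radii[pairs[:, 0]] + radii[pairs[:, 1]])
    mask = gap <= 0
    return gap[mask], pairs[mask]
Because query_pairs prunes by distance inside the tree, the work grows with the number of nearby pairs rather than with the square of the number of spheres.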
QUESTION
Unknown OpenCV exception while using EasyOcr
Asked 2022-Feb-22 at 09:04. Code:
import easyocr

reader = easyocr.Reader(['en'])
result = reader.readtext('R.png')
Output:
CUDA not available - defaulting to CPU. Note: This module is much faster with a GPU.

cv2.error: Unknown C++ exception from OpenCV code
I would truly appreciate any support!
ANSWER
Answered 2022-Jan-09 at 10:19. The new version of OpenCV has some issues. Uninstall the newer version of OpenCV and install an older one using:
pip install opencv-python==4.5.4.60
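As a side note, if the "CUDA not available" message is unwanted on a CPU-only machine, the reader can be constructed explicitly in CPU mode; this is only a sketch using easyocr's gpu argument and does not by itself address the OpenCV exception:
import easyocr

# gpu=False selects CPU inference explicitly instead of falling back with a warning.
reader = easyocr.Reader(['en'], gpu=False)
result = reader.readtext('R.png')
print(result)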
QUESTION
WebSocket not working when trying to send generated answer by keras
Asked 2022-Feb-17 at 12:52. I am implementing a simple chatbot using Keras and WebSockets. I now have a model that can make a prediction about the user input and send the corresponding answer.
When I do it through the command line it works fine; however, when I try to send the answer through my WebSocket, the WebSocket doesn't even start anymore.
Here is my working WebSocket code:
@sock.route('/api')
def echo(sock):
    while True:
        # get user input from browser
        user_input = sock.receive()
        # print user input on console
        print(user_input)
        # read answer from console
        response = input()
        # send response to browser
        sock.send(response)
Here is my code to communicate with the keras model on command line:
while True:
    question = input("")
    ints = predict(question)
    answer = response(ints, json_data)
    print(answer)
These are the methods used:
def predict(sentence):
    bag_of_words = convert_sentence_in_bag_of_words(sentence)
    # pass bag as list and get index 0
    prediction = model.predict(np.array([bag_of_words]))[0]
    ERROR_THRESHOLD = 0.25
    accepted_results = [[tag, probability] for tag, probability in enumerate(prediction) if probability > ERROR_THRESHOLD]

    accepted_results.sort(key=lambda x: x[1], reverse=True)

    output = []
    for accepted_result in accepted_results:
        output.append({'intent': classes[accepted_result[0]], 'probability': str(accepted_result[1])})
    print(output)
    return output


def response(intents, json):
    tag = intents[0]['intent']
    intents_as_list = json['intents']
    for i in intents_as_list:
        if i['tag'] == tag:
            res = random.choice(i['responses'])
            break
    return res
So when I start the WebSocket with the working code I get this output:
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
 * Restarting with stat
 * Serving Flask app 'server' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on
But as soon as I reference anything from my model in the server.py file, I get this output:
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
 * Restarting with stat
 * Serving Flask app 'server' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on
2022-02-13 11:31:38.887640: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-02-13 11:31:38.887734: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Metal device set to: Apple M1

systemMemory: 16.00 GB
maxCacheSize: 5.33 GB
It is enough to just have an import at the top like this: from chatty import response, predict - even though they are unused.
ANSWER
Answered 2022-Feb-16 at 19:53. There is no problem with your WebSocket route. Could you please share how you are triggering this route? WebSocket is a different protocol, and I suspect you are using an HTTP client to test the WebSocket, for example in Postman. HTTP requests are different from WebSocket requests, so you should use an appropriate WebSocket client to test the route.
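One hedged way to exercise the route from Python is the websocket-client package; the sketch below assumes the Flask development server address shown in the question's output:
# pip install websocket-client
from websocket import create_connection

# Connect to the /api route using the ws:// scheme (not http://).
ws = create_connection("ws://127.0.0.1:5000/api")
ws.send("hello bot")   # plays the role of the browser sending user input
print(ws.recv())       # prints whatever sock.send(response) returns
ws.close()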
QUESTION
Does it make sense to use Conda + Poetry?
Asked 2022-Feb-14 at 10:04. Does it make sense to use Conda + Poetry for a machine learning project? Allow me to share my (novice) understanding; please correct or enlighten me:
As far as I understand, Conda and Poetry have different purposes but are largely redundant:
- Conda is primarily an environment manager (in fact not necessarily for Python), but it can also manage packages and dependencies.
- Poetry is primarily a Python package manager (say, an upgrade of pip), but it can also create and manage Python environments (say, an upgrade of Pyenv).
My idea is to use both and compartmentalize their roles: let Conda be the environment manager and Poetry the package manager. My reasoning is that (it sounds like) Conda is best for managing environments and can be used for compiling and installing non-python packages, especially CUDA drivers (for GPU capability), while Poetry is more powerful than Conda as a Python package manager.
I've managed to make this work fairly easily by using Poetry within a Conda environment. The trick is to not use Poetry to manage the Python environment: I'm not using commands like poetry shell or poetry run, only poetry init, poetry install, etc. (after activating the Conda environment).
For full disclosure, my environment.yml file (for Conda) looks like this:
name: N

channels:
  - defaults
  - conda-forge

dependencies:
  - python=3.9
  - cudatoolkit
  - cudnn
and my pyproject.toml file looks like this:
[tool.poetry]
name = "N"
authors = ["B"]

[tool.poetry.dependencies]
python = "3.9"
torch = "^1.10.1"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
To be honest, one of the reasons I proceeded this way is that I was struggling to install CUDA (for GPU support) without Conda.
Does this project design look reasonable to you?
ANSWER
Answered 2022-Feb-14 at 10:04. As I wrote in the comment, I've been using a very similar Conda + Poetry setup in a data science project for the last year, for reasons similar to yours, and it's been working fine. The great majority of my dependencies are specified in pyproject.toml, but when there's something that's unavailable on PyPI, I add it to environment.yml.
Some additional tips:
- Add Poetry, possibly with a version number (if needed), as a dependency in environment.yml, so that you get Poetry installed when you run conda env create, along with Python and other non-PyPI dependencies.
- Consider adding conda-lock, which gives you lock files for Conda dependencies, just like you have poetry.lock for Poetry dependencies.
QUESTION
Azure Auto ML JobConfigurationMaxSizeExceeded error when using a cluster
Asked 2022-Jan-03 at 10:09. I am running into the following error when I try to run Automated ML through the studio on a GPU compute cluster:
Error: AzureMLCompute job failed. JobConfigurationMaxSizeExceeded: The specified job configuration exceeds the max allowed size of 32768 characters. Please reduce the size of the job's command line arguments and environment settings
The attempted run is on a registered tabular dataset in the filestore and is a simple regression case. Strangely, it works just fine with the CPU compute instance I use for my other pipelines. I was able to run it a few times with that and wanted to upgrade to a cluster, only to be hit by this error. I found online that it could be a case of needing the following setting: AZUREML_COMPUTE_USE_COMMON_RUNTIME:false; but I am not sure where to put this when just running from the web studio.
ANSWER
Answered 2021-Dec-13 at 17:58. This is a known bug; I am following up with the product group to see if there is any update on it. For the workaround you mentioned, you need to go to the node failing with the JobConfigurationMaxSizeExceeded exception and manually set AZUREML_COMPUTE_USE_COMMON_RUNTIME:false in its Environment JSON field.
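For anyone submitting from the v1 azureml-core Python SDK rather than the studio, the same environment variable can be attached to the run's environment. The sketch below is an assumption-laden illustration: the workspace config, script name, and cluster name are placeholders, and it uses a plain ScriptRunConfig rather than an AutoML job.
from azureml.core import Workspace, Environment, ScriptRunConfig, Experiment

ws = Workspace.from_config()  # assumes a local config.json for the workspace

env = Environment(name="common-runtime-workaround")
# Mirrors the AZUREML_COMPUTE_USE_COMMON_RUNTIME:false setting from the Environment JSON field.
env.environment_variables["AZUREML_COMPUTE_USE_COMMON_RUNTIME"] = "false"

src = ScriptRunConfig(
    source_directory=".",
    script="train.py",             # placeholder training script
    compute_target="gpu-cluster",  # placeholder compute cluster name
    environment=env,
)
run = Experiment(ws, "regression-experiment").submit(src)
print(run.get_portal_url())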
QUESTION
Win10 Electron Error: Passthrough is not supported, GL is disabled, ANGLE is
Asked 2022-Jan-03 at 01:54. I have an Electron repo (https://github.com/MartinBarker/RenderTune) which used to work fine on Windows 10 when run from the command prompt. After a couple of months I came back on a fresh Windows 10 machine with an Nvidia GPU, and the Electron app prints an error in the window when starting up:
Uncaught TypeError: Cannot read properties of undefined (reading 'getCurrentWindow')
Running ffmpeg shell commands results in an error as well, and in the command prompt terminal this message is outputted:
[14880:1207/145651.085:ERROR:gpu_init.cc(457)] Passthrough is not supported, GL is disabled, ANGLE is
I checked on my other Windows laptop machines running the same exact code from the master branch of my repo, and it works perfectly fine when running locally.
It seems like this might be a recent issue? I have found it discussed in various forums: https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1944468
I tried upgrading my global electron npm package to a more recent version: electron@16.0.4 , but the errors still appear.
ANSWER
Answered 2022-Jan-03 at 01:54. You can try disabling hardware acceleration using app.disableHardwareAcceleration() (see the docs). I don't think this is a fix though; it just makes the message go away for me.
Example Usage
main.js
import { app, BrowserWindow } from 'electron'
import isDev from 'electron-is-dev'

app.disableHardwareAcceleration()

let win = null

async function createWindow() {
  win = new BrowserWindow({
    title: 'My Window'
  })

  const winURL = isDev
    ? 'http://localhost:9080'
    : `file://${__dirname}/index.html`
  win.loadURL(winURL)

  win.on('ready-to-show', async () => {
    win.show()
    win.maximize()
  })
}

app.whenReady().then(createWindow)

app.on('window-all-closed', () => {
  win = null
  if (process.platform !== 'darwin') {
    app.quit()
  }
})
QUESTION
How to make mediapipe pose estimation faster (python)
Asked 2021-Dec-20 at 16:11. I'm making a pose estimation script for my game. However, it runs at 20-30 fps and does not use the whole CPU even though there is no fps limit. It's not using the whole GPU either. Can someone help me?
Here is resource usage while playing a dance video: https://imgur.com/a/6yI2TWg
Here is my code:
1import cv2
2import mediapipe as mp
3import time
4
5inFile = '/dev/video0'
6
7capture = cv2.VideoCapture(inFile)
8FramesVideo = int(capture.get(cv2.CAP_PROP_FRAME_COUNT)) # Number of frames inside video
9FrameCount = 0 # Currently playing frame
10prevTime = 0
11
12# some objects for mediapipe
13mpPose = mp.solutions.pose
14mpDraw = mp.solutions.drawing_utils
15pose = mpPose.Pose()
16
17while True:
18 FrameCount += 1
19 #read image and convert to rgb
20 success, img = capture.read()
21 imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
22
23 #process image
24 results = pose.process(imgRGB)
25
26 if results.pose_landmarks:
27 mpDraw.draw_landmarks(img, results.pose_landmarks, mpPose.POSE_CONNECTIONS)
28 #get landmark positions
29 landmarks = []
30 for id, lm in enumerate(results.pose_landmarks.landmark):
31 h, w, c = img.shape
32 cx, cy = int(lm.x * w), int(lm.y * h)
33 cv2.putText(img, str(id), (cx,cy), cv2.FONT_HERSHEY_PLAIN, 1, (255,0,0), 1)
34 landmarks.append((cx,cy))
35
36 # calculate and print fps
37 frameTime = time.time()
38 fps = 1/(frameTime-prevTime)
39 prevTime = frameTime
40 cv2.putText(img, str(int(fps)), (30,50), cv2.FONT_HERSHEY_PLAIN, 3, (255,0,0), 3)
41
42 #show image
43 cv2.imshow('Video', img)
44 cv2.waitKey(1)
45 if FrameCount == FramesVideo-1:
46 capture.release()
47 cv2.destroyAllWindows()
48 break
49
ANSWER
Answered 2021-Dec-20 at 16:11. Set the model_complexity of mp.Pose to 0.
MODEL_COMPLEXITY Complexity of the pose landmark model: 0, 1 or 2. Landmark accuracy as well as inference latency generally go up with the model complexity. Default to 1.
This is the best solution I've found, also use this.
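In the asker's script this amounts to one change when constructing the pose object; a minimal sketch, keeping the rest of the loop unchanged:
import mediapipe as mp

mpPose = mp.solutions.pose
# model_complexity=0 picks the lightest pose landmark model, trading some
# landmark accuracy for noticeably lower per-frame inference latency.
pose = mpPose.Pose(model_complexity=0)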
QUESTION
Why does nvidia-smi return "GPU access blocked by the operating system" in WSL2 under Windows 10 21H2
Asked 2021-Nov-18 at 19:20. I've installed Windows 10 21H2 on both my desktop (AMD 5950X system with RTX 3080) and my laptop (Dell XPS 9560 with i7-7700HQ and GTX 1050), following the instructions on https://docs.nvidia.com/cuda/wsl-user-guide/index.html:
- Install CUDA-capable driver in Windows
- Update WSL2 kernel in PowerShell:
wsl --update
- Install CUDA toolkit in Ubuntu 20.04 in WSL2 (note that you don't install a CUDA driver in WSL2; the instructions explicitly state that the CUDA driver should not be installed):
$ wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
$ sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ wget https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda-repo-wsl-ubuntu-11-4-local_11.4.0-1_amd64.deb
$ sudo dpkg -i cuda-repo-wsl-ubuntu-11-4-local_11.4.0-1_amd64.deb
$ sudo apt-key add /var/cuda-repo-wsl-ubuntu-11-4-local/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get -y install cuda
On my desktop, nvidia-smi and the CUDA samples work fine in WSL2. But on my laptop, running nvidia-smi in WSL2 returns:
$ nvidia-smi
Failed to initialize NVML: GPU access blocked by the operating system
Failed to properly shut down NVML: GPU access blocked by the operating system
I'm aware my laptop has NVIDIA Optimus with both Intel IGP and NVIDIA GTX1050, but CUDA is working fine in Windows. Only not in WSL2. But I also could not find any information that CUDA is not supposed to work in WSL2 for Optimus systems.
What I've tried
I've tried the following mitigations, but the error remains:
- reinstalling the Windows CUDA driver again and rebooting
- Making the GTX1050 the preferred GPU in global settings in the NVIDIA control panel
- Making the GTX1050 the default physx processor
- Following the same steps for a fresh Ubuntu 18.04 in WSL2
Is this a CUDA WSL2 bug? Or does CUDA simply not work with Optimus? Or how can I fix or further debug this?
More details
I've compared running nvidia-smi.exe in Windows PowerShell between my desktop and laptop, and they both return the same software versions:
PS C:\WINDOWS\system32> nvidia-smi
Wed Nov 17 21:46:50 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.06 Driver Version: 510.06 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... WDDM | 00000000:01:00.0 Off | N/A |
| N/A 44C P8 N/A / N/A | 75MiB / 4096MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
The full nvidia-smi.exe -q output on my laptop in Windows PowerShell returns the following information about my laptop's GPU:
PS C:\WINDOWS\system32> nvidia-smi -q
33
34==============NVSMI LOG==============
35
36Timestamp : Wed Nov 17 21:48:19 2021
37Driver Version : 510.06
38CUDA Version : 11.6
39
40Attached GPUs : 1
41GPU 00000000:01:00.0
42 Product Name : NVIDIA GeForce GTX 1050
43 Product Brand : GeForce
44 Product Architecture : Pascal
45 Display Mode : Disabled
46 Display Active : Disabled
47 Persistence Mode : N/A
48 MIG Mode
49 Current : N/A
50 Pending : N/A
51 Accounting Mode : Disabled
52 Accounting Mode Buffer Size : 4000
53 Driver Model
54 Current : WDDM
55 Pending : WDDM
56 Serial Number : N/A
57 GPU UUID : GPU-7645072f-7516-5488-316d-6277d101f64e
58 Minor Number : N/A
59 VBIOS Version : 86.07.3e.00.1c
60 MultiGPU Board : No
61 Board ID : 0x100
62 GPU Part Number : N/A
63 Module ID : 0
64 Inforom Version
65 Image Version : N/A
66 OEM Object : N/A
67 ECC Object : N/A
68 Power Management Object : N/A
69 GPU Operation Mode
70 Current : N/A
71 Pending : N/A
72 GSP Firmware Version : N/A
73 GPU Virtualization Mode
74 Virtualization Mode : None
75 Host VGPU Mode : N/A
76 IBMNPU
77 Relaxed Ordering Mode : N/A
78 PCI
79 Bus : 0x01
80 Device : 0x00
81 Domain : 0x0000
82 Device Id : 0x1C8D10DE
83 Bus Id : 00000000:01:00.0
84 Sub System Id : 0x07BE1028
85 GPU Link Info
86 PCIe Generation
87 Max : 3
88 Current : 3
89 Link Width
90 Max : 16x
91 Current : 16x
92 Bridge Chip
93 Type : N/A
94 Firmware : N/A
95 Replays Since Reset : 0
96 Replay Number Rollovers : 0
97 Tx Throughput : 0 KB/s
98 Rx Throughput : 0 KB/s
99 Fan Speed : N/A
100 Performance State : P8
101 Clocks Throttle Reasons
102 Idle : Active
103 Applications Clocks Setting : Not Active
104 SW Power Cap : Not Active
105 HW Slowdown : Not Active
106 HW Thermal Slowdown : Not Active
107 HW Power Brake Slowdown : Not Active
108 Sync Boost : Not Active
109 SW Thermal Slowdown : Not Active
110 Display Clock Setting : Not Active
111 FB Memory Usage
112 Total : 4096 MiB
113 Used : 75 MiB
114 Free : 4021 MiB
115 BAR1 Memory Usage
116 Total : 256 MiB
117 Used : 2 MiB
118 Free : 254 MiB
119 Compute Mode : Default
120 Utilization
121 Gpu : 0 %
122 Memory : 0 %
123 Encoder : 0 %
124 Decoder : 0 %
125 Encoder Stats
126 Active Sessions : 0
127 Average FPS : 0
128 Average Latency : 0
129 FBC Stats
130 Active Sessions : 0
131 Average FPS : 0
132 Average Latency : 0
133 Ecc Mode
134 Current : N/A
135 Pending : N/A
136 ECC Errors
137 Volatile
138 Single Bit
139 Device Memory : N/A
140 Register File : N/A
141 L1 Cache : N/A
142 L2 Cache : N/A
143 Texture Memory : N/A
144 Texture Shared : N/A
145 CBU : N/A
146 Total : N/A
147 Double Bit
148 Device Memory : N/A
149 Register File : N/A
150 L1 Cache : N/A
151 L2 Cache : N/A
152 Texture Memory : N/A
153 Texture Shared : N/A
154 CBU : N/A
155 Total : N/A
156 Aggregate
157 Single Bit
158 Device Memory : N/A
159 Register File : N/A
160 L1 Cache : N/A
161 L2 Cache : N/A
162 Texture Memory : N/A
163 Texture Shared : N/A
164 CBU : N/A
165 Total : N/A
166 Double Bit
167 Device Memory : N/A
168 Register File : N/A
169 L1 Cache : N/A
170 L2 Cache : N/A
171 Texture Memory : N/A
172 Texture Shared : N/A
173 CBU : N/A
174 Total : N/A
175 Retired Pages
176 Single Bit ECC : N/A
177 Double Bit ECC : N/A
178 Pending Page Blacklist : N/A
179 Remapped Rows : N/A
180 Temperature
181 GPU Current Temp : 40 C
182 GPU Shutdown Temp : 102 C
183 GPU Slowdown Temp : 97 C
184 GPU Max Operating Temp : 78 C
185 GPU Target Temperature : N/A
186 Memory Current Temp : N/A
187 Memory Max Operating Temp : N/A
188 Power Readings
189 Power Management : N/A
190 Power Draw : N/A
191 Power Limit : N/A
192 Default Power Limit : N/A
193 Enforced Power Limit : N/A
194 Min Power Limit : N/A
195 Max Power Limit : N/A
196 Clocks
197 Graphics : 0 MHz
198 SM : 0 MHz
199 Memory : 405 MHz
200 Video : 0 MHz
201 Applications Clocks
202 Graphics : N/A
203 Memory : N/A
204 Default Applications Clocks
205 Graphics : N/A
206 Memory : N/A
207 Max Clocks
208 Graphics : 1911 MHz
209 SM : 1911 MHz
210 Memory : 3504 MHz
211 Video : 1708 MHz
212 Max Customer Boost Clocks
213 Graphics : N/A
214 Clock Policy
215 Auto Boost : N/A
216 Auto Boost Default : N/A
217 Voltage
218 Graphics : N/A
219 Processes : None
220
ANSWER
Answered 2021-Nov-18 at 19:20. It turns out that the Windows 10 Update Assistant incorrectly reported that it had upgraded my OS to 21H2 on my laptop. Checking the Windows version by running winver reports that my OS is still 21H1.
Of course CUDA in WSL2 will not work in Windows 10 without 21H2.
After successfully installing 21H2 I can confirm CUDA works with WSL2 even for laptops with Optimus NVIDIA cards.
QUESTION
How to run Pytorch on Macbook pro (M1) GPU?
Asked 2021-Nov-18 at 03:08. I tried to train a model using PyTorch on my MacBook Pro. It has the new-generation Apple M1 chip. However, PyTorch couldn't recognize my GPU:
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
Does anyone know any solution?
I have updated all the libraries to the latest versions.
ANSWER
Answered 2021-Nov-18 at 03:08. It looks like PyTorch support for the M1 GPU is in the works, but it is not yet complete.
From @soumith on GitHub:
So, here's an update. We plan to get the M1 GPU supported. @albanD, @ezyang and a few core-devs have been looking into it. I can't confirm/deny the involvement of any other folks right now.
So, what we have so far is that we had a prototype that was just about okay. We took the wrong approach (more graph-matching-ish), and the user-experience wasn't great -- some operations were really fast, some were really slow, there wasn't a smooth experience overall. One had to guess-work which of their workflows would be fast.
So, we're completely re-writing it using a new approach, which I think is a lot closer to your good ole PyTorch, but it is going to take some time. I don't think we're going to hit a public alpha in the next ~4 months.
We will open up development of this backend as soon as we can.
That post: https://github.com/pytorch/pytorch/issues/47702#issuecomment-965625139
TL;DR: a public beta is at least 4 months out.
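As a forward-looking note beyond the timeframe of this answer, later PyTorch releases (1.12 and up) expose the M1 GPU through the "mps" backend; a hedged check looks like this, assuming a macOS build of PyTorch with MPS support:
import torch

# Falls back to CPU when the Metal Performance Shaders backend is unavailable.
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

x = torch.randn(1024, 1024, device=device)
y = x @ x  # runs on the M1 GPU when the mps device is selected
print(device, y.shape)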
Community Discussions contain sources that include Stack Exchange Network
Tutorials and Learning Resources in GPU
Tutorials and Learning Resources are not available at this moment for GPU