Popular New Releases in Numpy
pytorch: PyTorch 1.11, TorchData, and functorch are now available
numpy
jax: JAX release v0.3.6
datasets: 2.1.0
mlcourse.ai: Self-paced mlcourse.ai
Popular Libraries in Numpy
by jackfrued (Python), 114192 stars: Python - 100 Days from Novice to Master (Python - 100天从新手到大师)
by pytorch (C++), 55457 stars, NOASSERTION: Tensors and Dynamic neural networks in Python with strong GPU acceleration
by jakevdp (Jupyter Notebook), 32215 stars, NOASSERTION: Python Data Science Handbook: full text in Jupyter Notebooks
by donnemartin (Python), 21519 stars, NOASSERTION: Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
by numpy (Python), 20101 stars, BSD-3-Clause: The fundamental package for scientific computing with Python.
by google (Python), 17239 stars, Apache-2.0: Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
by fchollet (Jupyter Notebook), 13453 stars, MIT: Jupyter notebooks for the code samples of the book "Deep Learning with Python"
by huggingface (Python), 13088 stars, Apache-2.0: 🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
by dask (Python), 9771 stars, BSD-3-Clause: Parallel computing with task scheduling
Trending New libraries in Numpy
by huggingface (Python), 13088 stars, Apache-2.0: 🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
by pola-rs (Rust), 5341 stars, MIT: Fast multi-threaded DataFrame library in Rust | Python | Node.js
by iswbm (Python), 1913 stars: Python Black Magic Handbook (Python 黑魔法手册)
by deepmind (Python), 1855 stars, Apache-2.0: JAX-based neural network library
by google (Python), 1747 stars, Apache-2.0: Generates LaTeX math description from Python functions.
by MicrosoftDocs (Jupyter Notebook), 930 stars, MIT: Exercise notebooks for Machine Learning modules on Microsoft Learn
by zongyi-li (Python), 554 stars, MIT: Use Fourier transform to learn operators in differential equations.
by MilesCranmer (Python), 549 stars, Apache-2.0: High-Performance Symbolic Regression in Python
by firmai (Python), 476 stars: PandaPy has the speed of NumPy and the usability of Pandas, 10x to 50x faster (by @firmai)
Top Authors in Numpy
1. 17 Libraries, 376
2. 16 Libraries, 124
3. 13 Libraries, 2405
4. 11 Libraries, 3095
5. 10 Libraries, 20810
6. 9 Libraries, 1105
7. 8 Libraries, 306
8. 8 Libraries, 1046
9. 7 Libraries, 7715
10. 7 Libraries, 27163
Trending Kits in Numpy
No Trending Kits are available at this moment for Numpy
Trending Discussions on Numpy
Why is `np.sum(range(N))` very slow?
Installing scipy and scikit-learn on apple m1
How could I speed up my written python code: spheres contact detection (collision) using spatial searching
Error while downloading the requirements using pip install (setup command: use_2to3 is invalid.)
TypeError: load() missing 1 required positional argument: 'Loader' in Google Colab
How do I calculate square root in Python?
Using a pip requirements file in a conda yml file throws AttributeError: 'FileNotFoundError'
Problem with memory allocation in Julia code
Efficient summation in Python
NumPy 1.21.2 may not yet support Python 3.10
QUESTION
Why is `np.sum(range(N))` very slow?
Asked 2022-Mar-29 at 14:31
I saw a video about the speed of loops in Python, where it was explained that doing sum(range(N)) is much faster than manually looping through range and adding the variables together, since the former runs in C thanks to built-in functions while in the latter the summation is done in (slow) Python. I was curious what happens when adding numpy to the mix. As I expected, np.sum(np.arange(N)) is the fastest, but sum(np.arange(N)) and np.sum(range(N)) are even slower than the naive for loop. Why is this?
Here's the script I used to test, with some comments about the supposed cause of the slowdown where I know it (taken mostly from the video), and the results I got on my machine (Python 3.10.0, NumPy 1.21.2):
updated script:
import numpy as np
from timeit import timeit

N = 10_000_000
repetition = 10

def sum0(N = N):
    s = 0
    i = 0
    while i < N: # condition is checked in python
        s += i
        i += 1 # both additions are done in python
    return s

def sum1(N = N):
    s = 0
    for i in range(N): # increment in C
        s += i # addition in python
    return s

def sum2(N = N):
    return sum(range(N)) # everything in C

def sum3(N = N):
    return sum(list(range(N)))

def sum4(N = N):
    return np.sum(range(N)) # very slow np.array conversion

def sum5(N = N):
    # much faster np.array conversion
    return np.sum(np.fromiter(range(N),dtype = int))

def sum5v2_(N = N):
    # much faster np.array conversion
    return np.sum(np.fromiter(range(N),dtype = np.int_))

def sum6(N = N):
    # possibly slow conversion to Py_long from np.int
    return sum(np.arange(N))

def sum7(N = N):
    # list returns a list of np.int-s
    return sum(list(np.arange(N)))

def sum7v2(N = N):
    # tolist conversion to python int seems faster than the implicit conversion
    # in sum(list()) (tolist returns a list of python int-s)
    return sum(np.arange(N).tolist())

def sum8(N = N):
    return np.sum(np.arange(N)) # everything in numpy (fortran libblas?)

def sum9(N = N):
    return np.arange(N).sum() # remove dispatch overhead

def array_basic(N = N):
    return np.array(range(N))

def array_dtype(N = N):
    return np.array(range(N),dtype = np.int_)

def array_iter(N = N):
    # np.sum's source code mentions to use fromiter to convert from generators
    return np.fromiter(range(N),dtype = np.int_)

print(f"while loop: {timeit(sum0, number = repetition)}")
print(f"for loop: {timeit(sum1, number = repetition)}")
print(f"sum_range: {timeit(sum2, number = repetition)}")
print(f"sum_rangelist: {timeit(sum3, number = repetition)}")
print(f"npsum_range: {timeit(sum4, number = repetition)}")
print(f"npsum_iterrange: {timeit(sum5, number = repetition)}")
print(f"npsum_iterrangev2: {timeit(sum5v2_, number = repetition)}")  # fixed: originally timed sum5 twice
print(f"sum_arange: {timeit(sum6, number = repetition)}")
print(f"sum_list_arange: {timeit(sum7, number = repetition)}")
print(f"sum_arange_tolist: {timeit(sum7v2, number = repetition)}")
print(f"npsum_arange: {timeit(sum8, number = repetition)}")
print(f"nparangenpsum: {timeit(sum9, number = repetition)}")
print(f"array_basic: {timeit(array_basic, number = repetition)}")
print(f"array_dtype: {timeit(array_dtype, number = repetition)}")
print(f"array_iter: {timeit(array_iter, number = repetition)}")

print(f"npsumarangeREP: {timeit(lambda : sum8(N/1000), number = 100000*repetition)}")
print(f"npsumarangeREP: {timeit(lambda : sum9(N/1000), number = 100000*repetition)}")

# Example output:
#
# while loop: 11.493371912998555
# for loop: 7.385945574002108
# sum_range: 2.4605720699983067
# sum_rangelist: 4.509678105998319
# npsum_range: 11.85120212900074
# npsum_iterrange: 4.464334709002287
# npsum_iterrangev2: 4.498494338993623
# sum_arange: 9.537815956995473
# sum_list_arange: 13.290120724996086
# sum_arange_tolist: 5.231948580003518
# npsum_arange: 0.241889145996538
# nparangenpsum: 0.21876695199898677
# array_basic: 11.736577274998126
# array_dtype: 8.71628468400013
# array_iter: 4.303306431000237
# npsumarangeREP: 21.240833958996518
# npsumarangeREP: 16.690092379001726
ANSWER
Answered 2021-Oct-16 at 17:42
From the CPython source code for sum: sum initially seems to attempt a fast path that assumes all inputs are the same type. If that fails, it will just iterate:
/* Fast addition by keeping temporary sums in C instead of new Python objects.
   Assumes all inputs are the same type. If the assumption fails, default
   to the more general routine.
*/
I'm not entirely certain what is happening under the hood, but it is likely the repeated creation/conversion of C types to Python objects that is causing these slow-downs. It's worth noting that both sum and range are implemented in C.
This next bit is not really an answer to the question, but I wondered if we could speed up sum for Python ranges, as range is quite a smart object.
To do this I've used functools.singledispatch to override the built-in sum function specifically for the range type, then implemented a small function to calculate the sum of an arithmetic progression.
from functools import singledispatch

def sum_range(range_, /, start=0):
    """Overloaded `sum` for range, compute arithmetic sum"""
    n = len(range_)
    if not n:
        return start
    return int(start + (n * (range_[0] + range_[-1]) / 2))

sum = singledispatch(sum)
sum.register(range, sum_range)

def test():
    """
    >>> sum(range(0, 100))
    4950
    >>> sum(range(0, 10, 2))
    20
    >>> sum(range(0, 9, 2))
    20
    >>> sum(range(0, -10, -1))
    -45
    >>> sum(range(-10, 10))
    -10
    >>> sum(range(-1, -100, -2))
    -2500
    >>> sum(range(0, 10, 100))
    0
    >>> sum(range(0, 0))
    0
    >>> sum(range(0, 100), 50)
    5000
    >>> sum(range(0, 0), 10)
    10
    """

if __name__ == "__main__":
    import doctest
    doctest.testmod()
I'm not sure if this is complete, but it's definitely faster than looping.
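As a side note (my addition, not part of the original answer): the override works because sum(range(n)) is just the arithmetic series n(n-1)/2, so a quick sanity check of the closed form against the untouched built-in looks like this, using builtins.sum in case sum has been rebound by the snippet above:
import builtins

n = 10_000_000
# Closed form of 0 + 1 + ... + (n-1); should match the looping builtin exactly.
assert n * (n - 1) // 2 == builtins.sum(range(n))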
QUESTION
Installing scipy and scikit-learn on apple m1
Asked 2022-Mar-22 at 06:21
The installation on the M1 chip of the following packages works fine for me: Numpy 1.21.1, pandas 1.3.0, torch 1.9.0, and a few others. They also seem to work properly while testing them. However, when I try to install scipy or scikit-learn via pip, this error appears:
ERROR: Failed building wheel for numpy
Failed to build numpy
ERROR: Could not build wheels for numpy which use PEP 517 and cannot be installed directly
Why should Numpy be built again when I already have the latest version from pip installed?
Every previous installation was done using python3.9 -m pip install ... on macOS 11.3.1 with the Apple M1 chip.
Maybe somebody knows how to deal with this error, or whether it's just a matter of time.
ANSWER
Answered 2021-Aug-02 at 14:33
Please see this note from scikit-learn about installing on Apple Silicon M1 hardware:
The recently introduced macos/arm64 platform (sometimes also known as macos/aarch64) requires the open source community to upgrade the build configuration and automation to properly support it. At the time of writing (January 2021), the only way to get a working installation of scikit-learn on this hardware is to install scikit-learn and its dependencies from the conda-forge distribution, for instance using the miniforge installers:
https://github.com/conda-forge/miniforge
The following issue tracks progress on making it possible to install scikit-learn from PyPI with pip:
QUESTION
How could I speed up my written python code: spheres contact detection (collision) using spatial searching
Asked 2022-Mar-13 at 15:43
I am working on a spatial search case for spheres in which I want to find connected spheres. To that end, I search around each sphere for spheres whose centers lie within a distance of one maximum sphere diameter from the searching sphere's center. At first I tried scipy-related methods, but the scipy method takes longer than the equivalent numpy method: for scipy, I first determined the number of K-nearest spheres and then found them with cKDTree.query, which leads to more time consumption. It is slower than the numpy method even when the first step is omitted in favor of a constant value (though omitting the first step is not good in this case). This is contrary to my expectations about scipy's spatial-search speed. So I tried using some list loops instead of some numpy lines, for speed-up via numba prange. Numba runs the code a little faster, but I believe this code can be optimized for better performance, perhaps by vectorization, by using other alternative numpy modules, or by using numba in another way. I have iterated over all spheres to prevent probable memory leaks when the number of spheres is high.
import numpy as np
import numba as nb
from scipy.spatial import cKDTree, distance

# ---------------------------- input data ----------------------------
""" For testing by prepared files:
radii = np.load('a.npy')  # shape: (n-spheres, ) must be loaded by np.load('a.npy') or np.loadtxt('radii_large.csv')
poss = np.load('b.npy')   # shape: (n-spheres, 3) must be loaded by np.load('b.npy') or np.loadtxt('pos_large.csv', delimiter=',')
"""

rnd = np.random.RandomState(70)
data_volume = 200000

radii = rnd.uniform(0.0005, 0.122, data_volume)
dia_max = 2 * radii.max()

x = rnd.uniform(-1.02, 1.02, (data_volume, 1))
y = rnd.uniform(-3.52, 3.52, (data_volume, 1))
z = rnd.uniform(-1.02, -0.575, (data_volume, 1))
poss = np.hstack((x, y, z))
# --------------------------------------------------------------------

# @nb.jit('float64[:,::1](float64[:,::1], float64[::1])', forceobj=True, parallel=True)
def ends_gap(poss, dia_max):
    particle_corsp_overlaps = np.array([], dtype=np.float64)
    ends_ind = np.empty([1, 2], dtype=np.int64)
    """ using list looping """
    # particle_corsp_overlaps = []
    # ends_ind = []

    # for particle_idx in nb.prange(len(poss)):  # by list looping
    for particle_idx in range(len(poss)):
        unshared_idx = np.delete(np.arange(len(poss)), particle_idx)  # <--- relatively high time consumer
        poss_without = poss[unshared_idx]

        """ # SCIPY method ---------------------------------------------------------------------------------------------
        nears_i_ind = cKDTree(poss_without).query_ball_point(poss[particle_idx], r=dia_max)  # <--- high time consumer
        if len(nears_i_ind) > 0:
            dist_i, dist_i_ind = cKDTree(poss_without[nears_i_ind]).query(poss[particle_idx], k=len(nears_i_ind))  # <--- high time consumer
            if not isinstance(dist_i, float):
                dist_i[dist_i_ind] = dist_i.copy()
        """ # NUMPY method --------------------------------------------------------------------------------------------
        lx_limit_idx = poss_without[:, 0] <= poss[particle_idx][0] + dia_max
        ux_limit_idx = poss_without[:, 0] >= poss[particle_idx][0] - dia_max
        ly_limit_idx = poss_without[:, 1] <= poss[particle_idx][1] + dia_max
        uy_limit_idx = poss_without[:, 1] >= poss[particle_idx][1] - dia_max
        lz_limit_idx = poss_without[:, 2] <= poss[particle_idx][2] + dia_max
        uz_limit_idx = poss_without[:, 2] >= poss[particle_idx][2] - dia_max

        nears_i_ind = np.where(lx_limit_idx & ux_limit_idx & ly_limit_idx & uy_limit_idx & lz_limit_idx & uz_limit_idx)[0]
        if len(nears_i_ind) > 0:
            dist_i = distance.cdist(poss_without[nears_i_ind], poss[particle_idx][None, :]).squeeze()  # <--- relatively high time consumer
        # """ # -------------------------------------------------------------------------------------------------------
        contact_check = dist_i - (radii[unshared_idx][nears_i_ind] + radii[particle_idx])
        connected = contact_check[contact_check <= 0]

        particle_corsp_overlaps = np.concatenate((particle_corsp_overlaps, connected))
        """ using list looping """
        # if len(connected) > 0:
        #     for value_ in connected:
        #         particle_corsp_overlaps.append(value_)

        contacts_ind = np.where([contact_check <= 0])[1]
        contacts_sec_ind = np.array(nears_i_ind)[contacts_ind]
        sphere_olps_ind = np.where((poss[:, None] == poss_without[contacts_sec_ind][None, :]).all(axis=2))[0]  # <--- high time consumer

        ends_ind_mod_temp = np.array([np.repeat(particle_idx, len(sphere_olps_ind)), sphere_olps_ind], dtype=np.int64).T
        if particle_idx > 0:
            ends_ind = np.concatenate((ends_ind, ends_ind_mod_temp))
        else:
            ends_ind[0, 0], ends_ind[0, 1] = ends_ind_mod_temp[0, 0], ends_ind_mod_temp[0, 1]
        """ using list looping """
        # for contacted_idx in sphere_olps_ind:
        #     ends_ind.append([particle_idx, contacted_idx])

    # ends_ind_org = np.array(ends_ind)  # using lists
    ends_ind_org = ends_ind
    ends_ind, ends_ind_idx = np.unique(np.sort(ends_ind_org), axis=0, return_index=True)  # <--- relatively high time consumer
    gap = np.array(particle_corsp_overlaps)[ends_ind_idx]
    return gap, ends_ind, ends_ind_idx, ends_ind_org
In one of my tests on 23,000 spheres, the scipy, numpy, and numba-aided methods finished the loop in about 400, 200, and 180 seconds respectively, using Colab TPU; for 500,000 spheres it takes 3.5 hours. These execution times are not satisfying at all for my project, where the number of spheres may be up to 1,000,000 in a medium data volume. I will call this code many times in my main code and am seeking ways to make it run in milliseconds (as fast as possible). Is that possible? I would appreciate it if anyone could speed up the code as needed.
Notes:
- This code must be executable with Python 3.7+, on CPU and GPU.
- This code must be applicable for data sizes of at least 300,000 spheres.
- All numpy, scipy, and similar equivalent modules that, used instead of my hand-written modules, make my code significantly faster will be upvoted.
I would appreciate any recommendations or explanations about:
- Which method could be faster in this subject?
- Why is scipy not faster than the other methods in this case, and where could it be helpful in this subject?
- Choosing between iterator methods and matrix-form methods is confusing for me. Iterating methods use less memory and can be used and tuned with numba, but I think they are not as useful or comparable as matrix methods (which depend on memory limits) like numpy for huge sphere counts. For this case, perhaps I could drop the iteration in favor of numpy, but I strongly suspect it cannot be handled, due to huge matrix-size operations and memory leaks.
Prepared sample test data:
Poss data: 23000, 500000
Radii data: 23000, 500000
Line-by-line speed test logs: for the two test cases, scipy-method and numpy-method time consumption.
ANSWER
Answered 2022-Feb-14 at 10:23
Have you tried FLANN? This code doesn't solve your problem completely. It simply finds the nearest 50 neighbors of each point in your 500000-point dataset:
import numpy as np  # (added) needed for np.loadtxt below
from pyflann import FLANN

p = np.loadtxt("pos_large.csv", delimiter=",")
flann = FLANN()
flann.build_index(pts=p)
idx, dist = flann.nn_index(qpts=p, num_neighbors=50)
The last line takes less than a second in my laptop without any tuning or parallelization.
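For readers without FLANN installed, a similar candidate search can be sketched with SciPy's KD-tree alone. This is an illustrative alternative, not the asker's code: the function name ends_gap_kdtree is made up here, and it assumes poss (n, 3) and radii (n,) as defined in the question (query_pairs and its output_type='ndarray' option are standard cKDTree API, but verify against your SciPy version):
import numpy as np
from scipy.spatial import cKDTree

def ends_gap_kdtree(poss, radii, dia_max):
    tree = cKDTree(poss)
    # All index pairs (i < j) whose centers lie within dia_max of each other.
    pairs = tree.query_pairs(r=dia_max, output_type='ndarray')
    d = np.linalg.norm(poss[pairs[:, 0]] - poss[pairs[:, 1]], axis=1)
    gap = d - (radii[pairs[:, 0]] + radii[pairs[:, 1]])
    touching = gap <= 0.0  # non-positive gap means the spheres are in contact
    return gap[touching], pairs[touching]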
QUESTION
Error while downloading the requirements using pip install (setup command: use_2to3 is invalid.)
Asked 2022-Mar-05 at 07:13
Versions: pip 21.2.4, Python 3.6.
The command:
pip install -r requirments.txt
The content of my requirements.txt:
mongoengine==0.19.1
numpy==1.16.2
pylint
pandas==1.1.5
fawkes
The command fails with this error:
ERROR: Command errored out with exit status 1:
 command: /Users/*/Desktop/ml/*/venv/bin/python -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/kn/0y92g7x55qs7c42tln4gwhtm0000gp/T/pip-install-soh30mel/mongoengine_89e68f8427244f1bb3215b22f77a619c/setup.py'"'"'; __file__='"'"'/private/var/folders/kn/0y92g7x55qs7c42tln4gwhtm0000gp/T/pip-install-soh30mel/mongoengine_89e68f8427244f1bb3215b22f77a619c/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/kn/0y92g7x55qs7c42tln4gwhtm0000gp/T/pip-pip-egg-info-97994d6e
 cwd: /private/var/folders/kn/0y92g7x55qs7c42tln4gwhtm0000gp/T/pip-install-soh30mel/mongoengine_89e68f8427244f1bb3215b22f77a619c/
 Complete output (1 lines):
 error in mongoengine setup command: use_2to3 is invalid.
 ----------------------------------------
WARNING: Discarding https://*/pypi/packages/mongoengine-0.19.1.tar.gz#md5=68e613009f6466239158821a102ac084 (from https://*/pypi/simple/mongoengine/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement mongoengine==0.19.1 (from versions: 0.15.0, 0.19.1)
ERROR: No matching distribution found for mongoengine==0.19.1
ANSWER
Answered 2021-Nov-19 at 13:30
It looks like setuptools>=58 breaks support for use_2to3. So you should pin setuptools to setuptools<58, or avoid using packages that set use_2to3 in their setup parameters.
(I was having the same problem with pip==19.3.1.)
QUESTION
TypeError: load() missing 1 required positional argument: 'Loader' in Google Colab
Asked 2022-Mar-04 at 11:01
I am trying to do a regular import in Google Colab.
This import worked up until now.
If I try:
import plotly.express as px
or
import plotly.express as px
import pingouin as pg
I get an error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-19-86e89bd44552> in <module>()
----> 1 import plotly.express as px

9 frames
/usr/local/lib/python3.7/dist-packages/plotly/express/__init__.py in <module>()
     13 )
     14
---> 15 from ._imshow import imshow
     16 from ._chart_types import (  # noqa: F401
     17     scatter,

/usr/local/lib/python3.7/dist-packages/plotly/express/_imshow.py in <module>()
      9
     10 try:
---> 11     import xarray
     12
     13     xarray_imported = True

/usr/local/lib/python3.7/dist-packages/xarray/__init__.py in <module>()
      1 import pkg_resources
      2
----> 3 from . import testing, tutorial, ufuncs
      4 from .backends.api import (
      5     load_dataarray,

/usr/local/lib/python3.7/dist-packages/xarray/tutorial.py in <module>()
     11 import numpy as np
     12
---> 13 from .backends.api import open_dataset as _open_dataset
     14 from .backends.rasterio_ import open_rasterio as _open_rasterio
     15 from .core.dataarray import DataArray

/usr/local/lib/python3.7/dist-packages/xarray/backends/__init__.py in <module>()
      4 formats. They should not be used directly, but rather through Dataset objects.
      5
----> 6 from .cfgrib_ import CfGribDataStore
      7 from .common import AbstractDataStore, BackendArray, BackendEntrypoint
      8 from .file_manager import CachingFileManager, DummyFileManager, FileManager

/usr/local/lib/python3.7/dist-packages/xarray/backends/cfgrib_.py in <module>()
     14     _normalize_path,
     15 )
---> 16 from .locks import SerializableLock, ensure_lock
     17 from .store import StoreBackendEntrypoint
     18

/usr/local/lib/python3.7/dist-packages/xarray/backends/locks.py in <module>()
     11
     12 try:
---> 13     from dask.distributed import Lock as DistributedLock
     14 except ImportError:
     15     DistributedLock = None

/usr/local/lib/python3.7/dist-packages/dask/distributed.py in <module>()
      1 # flake8: noqa
      2 try:
----> 3     from distributed import *
      4 except ImportError:
      5     msg = (

/usr/local/lib/python3.7/dist-packages/distributed/__init__.py in <module>()
      1 from __future__ import print_function, division, absolute_import
      2
----> 3 from . import config
      4 from dask.config import config
      5 from .actor import Actor, ActorFuture

/usr/local/lib/python3.7/dist-packages/distributed/config.py in <module>()
     18
     19 with open(fn) as f:
---> 20     defaults = yaml.load(f)
     21
     22 dask.config.update_defaults(defaults)

TypeError: load() missing 1 required positional argument: 'Loader'
I think it might be a problem with Google Colab or some basic utility package that has been updated, but I cannot find a way to solve it.
ANSWER
Answered 2021-Oct-15 at 21:11
Found the problem.
I was installing pandas_profiling, and this package updated pyyaml to version 6.0, which is not compatible with the current way Google Colab imports packages. So just reverting back to pyyaml version 5.4.1 solved the problem.
For more information, check the available versions of pyyaml here.
See this issue and the formal answers on GitHub.
##################################################################
To revert back to pyyaml version 5.4.1 in your code, add the next line at the end of your package installations:
!pip install pyyaml==5.4.1
It is important to put it at the end of the installations; some of the installations will change the pyyaml version.
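For context on the root cause (my note, not part of the original answer): PyYAML 6.0 made the Loader argument to yaml.load mandatory, which is exactly what the traceback above shows failing inside distributed/config.py. Code you control can be made version-proof like this (config.yml is a placeholder filename):
import yaml

with open("config.yml") as f:
    # Works on both PyYAML 5.x and 6.x: pass the loader explicitly,
    defaults = yaml.load(f, Loader=yaml.SafeLoader)
    # or equivalently use the convenience wrapper: yaml.safe_load(f)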
QUESTION
How do I calculate square root in Python?
Asked 2022-Feb-17 at 03:40
I need to calculate the square root of some numbers, for example √9 = 3 and √2 = 1.4142. How can I do it in Python?
The inputs will probably be all positive integers, and relatively small (say less than a billion), but just in case they're not, is there anything that might break?
Related
- Integer square root in python
- Is there a short-hand for nth root of x in Python?
- Difference between **(1/2), math.sqrt and cmath.sqrt?
- Why is math.sqrt() incorrect for large numbers?
- Python sqrt limit for very large numbers?
- Which is faster in Python: x**.5 or math.sqrt(x)?
- Why does Python give the "wrong" answer for square root? (specific to Python 2)
- calculating n-th roots using Python 3's decimal module
- How can I take the square root of -1 using python? (focused on NumPy)
- Arbitrary precision of square roots
Note: This is an attempt at a canonical question after a discussion on Meta about an existing question with the same title.
ANSWER
Answered 2022-Feb-04 at 19:44
math.sqrt()
The math module from the standard library has a sqrt function to calculate the square root of a number. It takes any type that can be converted to float (which includes int) as an argument and returns a float.
>>> import math
>>> math.sqrt(9)
3.0
The power operator (**) or the built-in pow() function can also be used to calculate a square root. Mathematically speaking, the square root of a equals a to the power of 1/2.
The power operator requires numeric types and matches the conversion rules for binary arithmetic operators, so in this case it will return either a float or a complex number.
>>> 9 ** (1/2)
3.0
>>> 9 ** .5  # Same thing
3.0
>>> 2 ** .5
1.4142135623730951
(Note: in Python 2, 1/2 is truncated to 0, so you have to force floating-point arithmetic with 1.0/2 or similar. See Why does Python give the "wrong" answer for square root?)
This method can be generalized to the nth root, though fractions that can't be exactly represented as a float (like 1/3 or any denominator that's not a power of 2) may cause some inaccuracy:
>>> 8 ** (1/3)
2.0
>>> 125 ** (1/3)
4.999999999999999
Exponentiation works with negative numbers and complex numbers, though the results have some slight inaccuracy:
>>> (-25) ** .5  # Should be 5j
(3.061616997868383e-16+5j)
>>> 8j ** .5  # Should be 2+2j
(2.0000000000000004+2j)
Note the parentheses on -25! Otherwise it's parsed as -(25**.5), because exponentiation binds more tightly than unary negation.
Meanwhile, math is only built for floats, so for x<0, math.sqrt(x) will raise ValueError: math domain error, and for complex x it'll raise TypeError: can't convert complex to float. Instead, you can use cmath.sqrt(x), which is more accurate than exponentiation (and will likely be faster too):
>>> import cmath
>>> cmath.sqrt(-25)
5j
>>> cmath.sqrt(8j)
(2+2j)
Both options involve an implicit conversion to float, so floating-point precision is a factor. For example:
>>> n = 10**30
>>> square = n**2
>>> x = square**.5
>>> x == n
False
>>> x - n  # how far off are they?
0.0
>>> int(x) - n  # how far off is the float from the int?
19884624838656
Very large numbers might not even fit in a float, and you'll get OverflowError: int too large to convert to float. See Python sqrt limit for very large numbers?
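(A note beyond the original answer: if you only need the integer floor of the root, the standard library's math.isqrt, available since Python 3.8, works on arbitrarily large ints with no float conversion at all:)
>>> import math
>>> n = 10**30
>>> math.isqrt(n * n) == n  # exact, no OverflowError
True
>>> math.isqrt(2)           # floor of the true root
1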
Let's look at Decimal for example: exponentiation fails unless the exponent is also a Decimal:
>>> import decimal
>>> decimal.Decimal('9') ** .5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for ** or pow(): 'decimal.Decimal' and 'float'
>>> decimal.Decimal('9') ** decimal.Decimal('.5')
Decimal('3.000000000000000000000000000')
Meanwhile, math and cmath will silently convert their arguments to float and complex respectively, which could mean loss of precision. decimal also has its own .sqrt(). See also calculating n-th roots using Python 3's decimal module.
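A short sketch of that Decimal square root, with precision controlled by the context (the 50-digit setting below is just an example value):
>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 50
>>> Decimal(2).sqrt()
Decimal('1.4142135623730950488016887242096980785696718753769')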
QUESTION
Using a pip requirements file in a conda yml file throws AttributeError: 'FileNotFoundError'
Asked 2022-Jan-23 at 13:29
I have a requirements.txt like:
numpy
and an environment.yml containing:
# run via: conda env create --file environment.yml
---
name: test
dependencies:
  - python>=3
  - pip
  - pip:
    - -r file:requirements.txt
When I then run conda env create --file environment.yml, I get:
Pip subprocess output:
Pip subprocess error: ERROR: Exception:
<... error traceback in pip >
AttributeError: 'FileNotFoundError' object has no attribute 'read'
failed
CondaEnvException: Pip failed
It is also strange how pip is called, as reported just before the error occurs:
['$HOME/.conda/envs/test/bin/python', '-m', 'pip', 'install', '-U', '-r', '$HOME/test/condaenv.8d3003nm.requirements.txt']
(I replaced my home path with $HOME.)
Note the weird expansion of the requirements.txt path.
Any ideas?
ANSWER
Answered 2022-Jan-23 at 13:29
A recent change in the Pip code has made its behavior stricter with respect to file: URI syntax. As pointed out by a PyPA member and Pip developer, the syntax file:requirements.txt is not a valid URI according to the RFC 8089 specification.
Instead, one must either drop the file: scheme altogether:
name: test
dependencies:
  - python>=3
  - pip
  - pip:
    - -r requirements.txt
or provide a valid URI, which means using an absolute path (or a local file server):
name: test
dependencies:
  - python>=3
  - pip
  - pip:
    - -r file:/full/path/to/requirements.txt
    # - -r file:///full/path/to/requirements.txt  # alternate syntax
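If you do want the URI form, note that Python can generate a valid RFC 8089 file URI for you; a small sketch (the printed path is illustrative, it depends on where requirements.txt actually lives):
>>> from pathlib import Path
>>> Path("requirements.txt").resolve().as_uri()
'file:///home/user/project/requirements.txt'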
QUESTION
Problem with memory allocation in Julia code
Asked 2022-Jan-19 at 09:34
I used a function in Python/Numpy to solve a problem in combinatorial game theory.
import numpy as np
from time import time

def problem(c):
    start = time()
    N = np.array([0, 0])
    U = np.arange(c)

    for _ in U:
        bits = np.bitwise_xor(N[:-1], N[-2::-1])
        N = np.append(N, np.setdiff1d(U, bits).min())

    return len(*np.where(N==0)), time()-start

problem(10000)
Then I wrote it in Julia because I thought it'd be faster due to Julia using just-in-time compilation.
function problem(c)
    N = [0]
    U = Vector(0:c)

    for _ in U
        elems = N[1:length(N)-1]
        bits = elems .⊻ reverse(elems)
        push!(N, minimum(setdiff(U, bits)))
    end

    return sum(N .== 0)
end

@time problem(10000)
But the second version was much slower. For c = 10000, the Python version takes 2.5 s on a Core i5 processor and the Julia version takes 4.5 s. Since Numpy operations are implemented in C, I'm wondering whether Python is indeed faster or whether I'm writing a function with wasted time complexity.
The implementation in Julia allocates a lot of memory. How can the number of allocations be reduced to improve its performance?
ANSWER
Answered 2022-Jan-19 at 09:34
The original code can be re-written in the following way:
function problem2(c)
    N = zeros(Int, c+2)
    notseen = falses(c+1)

    for lN in 1:c+1
        notseen .= true
        @inbounds for i in 1:lN-1
            b = N[i] ⊻ N[lN-i]
            b <= c && (notseen[b+1] = false)
        end
        idx = findfirst(notseen)
        isnothing(idx) || (N[lN+1] = idx-1)
    end
    return count(==(0), N)
end
First check if the functions produce the same results:
julia> problem(10000), problem2(10000)
(1475, 1475)
(I have also checked that the generated N vector is identical.)
Now let us benchmark both functions:
julia> using BenchmarkTools

julia> @btime problem(10000)
  4.938 s (163884 allocations: 3.25 GiB)
1475

julia> @btime problem2(10000)
  76.275 ms (4 allocations: 79.59 KiB)
1475
So it turns out to be over 60x faster. What I do to improve the performance is avoid allocations; in Julia that is easy and efficient. If any part of the code is not clear, please comment. Note that I concentrated on showing how to improve the performance of the Julia code rather than on replicating the Python code, since (as was commented under the original post) cross-language performance comparisons are very tricky; I think it is better to concentrate in this discussion on how to make the Julia code fast. (A plain-Python port of the same idea is sketched after the edits below.)
EDIT: Indeed, changing to Vector{Bool} and removing the condition on the relation between b and c (which mathematically holds for these values of c) gives better speed:
julia> function problem3(c)
           N = zeros(Int, c+2)
           notseen = Vector{Bool}(undef, c+1)

           for lN in 1:c+1
               notseen .= true
               @inbounds for i in 1:lN-1
                   b = N[i] ⊻ N[lN-i]
                   notseen[b+1] = false
               end
               idx = findfirst(notseen)
               isnothing(idx) || (N[lN+1] = idx-1)
           end
           return count(==(0), N)
       end
problem3 (generic function with 1 method)

julia> @btime problem3(10000)
  20.714 ms (3 allocations: 88.17 KiB)
1475
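For completeness, the same allocation-light algorithm ports straight back to plain Python. This sketch is my own translation of problem2 (not from the answer), so treat it as unverified:
def problem2_py(c):
    """Python port of problem2: same recurrence, plain lists instead of arrays."""
    N = [0] * (c + 2)
    for ln in range(1, c + 2):        # ln mirrors Julia's 1-based lN
        seen = bytearray(c + 1)       # cheap flag buffer for the mex computation
        for i in range(ln - 1):
            b = N[i] ^ N[ln - 2 - i]  # XOR of symmetric elements of the prefix
            if b <= c:
                seen[b] = 1
        # mex: the smallest value in 0..c not produced by any XOR above
        nxt = next((v for v in range(c + 1) if not seen[v]), None)
        if nxt is not None:
            N[ln] = nxt
    return N.count(0)

print(problem2_py(10000))             # expected to print 1475, matching Julia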
QUESTION
Efficient summation in Python
Asked 2022-Jan-16 at 12:49
I am trying to efficiently compute a summation of a summation in Python. WolframAlpha is able to compute it up to a high n value: sum of sum.
I have two approaches: a for-loop method and an np.sum method. I thought the np.sum approach would be faster. However, they agree until a large n, after which the np.sum approach suffers from overflow errors and gives the wrong result.
I am trying to find the fastest way to compute this sum.
import numpy as np
import time

def summation(start,end,func):
    sum=0
    for i in range(start,end+1):
        sum+=func(i)
    return sum

def x(y):
    return y

def x2(y):
    return y**2

def mysum(y):
    return x2(y)*summation(0, y, x)

n=100

# method #1
start=time.time()
summation(0,n,mysum)
print('Slow method:',time.time()-start)

# method #2
start=time.time()
w=np.arange(0,n+1)
(w**2*np.cumsum(w)).sum()
print('Fast method:',time.time()-start)
ANSWER
Answered 2022-Jan-16 at 12:49
(The fastest methods, 3 and 4, are at the end.)
In the fast NumPy method you need to specify dtype=np.object so that NumPy does not convert the Python int to its own dtypes (np.int64 or others). It will now give you correct results (checked up to N=100000):
# method #2
start=time.time()
w=np.arange(0, n+1, dtype=np.object)
result2 = (w**2*np.cumsum(w)).sum()
print('Fast method:', time.time()-start)
Your fast solution is significantly faster than the slow one; yes, for large N's, but already at N=100 it is about 8 times faster:
start=time.time()
for i in range(100):
    result1 = summation(0, n, mysum)
print('Slow method:', time.time()-start)

# method #2
start=time.time()
for i in range(100):
    w=np.arange(0, n+1, dtype=np.object)
    result2 = (w**2*np.cumsum(w)).sum()
print('Fast method:', time.time()-start)
Slow method: 0.06906533241271973
Fast method: 0.008007287979125977
EDIT: An even faster method (by KellyBundy, the Pumpkin) is to use pure Python. It turns out NumPy has no advantage here, because it has no vectorized code for np.object arrays.
# method #3
import itertools
start=time.time()
for i in range(100):
    result3 = sum(x*x * ysum for x, ysum in enumerate(itertools.accumulate(range(n+1))))
print('Faster, pure python:', (time.time()-start))
Faster, pure python: 0.0009944438934326172
EDIT 2: Forss noticed that the fast NumPy method can be optimized by using x*x instead of x**2. For N > 200 it is faster than the pure Python method; for N < 200 it is slower (the exact boundary may depend on the machine; on mine it was 200, so it's best to check yourself):
# method #4
start=time.time()
for i in range(100):
    w = np.arange(0, n+1, dtype=np.object)
    result2 = (w*w*np.cumsum(w)).sum()
print('Fast method x*x:', time.time()-start)
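One more observation (mine, not from the answer, so double-check the algebra): since the inner sum is the triangular number x(x+1)/2, the whole double sum collapses to (sum of x**4 + sum of x**3)/2, and Faulhaber's formulas give an O(1) exact answer in integer arithmetic:
def closed_form(n):
    # sum_{x=0}^{n} x**2 * x*(x+1)/2  ==  (S4 + S3) / 2
    s4 = n * (n + 1) * (2 * n + 1) * (3 * n * n + 3 * n - 1) // 30  # sum of x**4
    s3 = (n * (n + 1) // 2) ** 2                                    # sum of x**3
    return (s4 + s3) // 2

# closed_form(2) == 13 == 1*(0+1) + 4*(0+1+2), and closed_form(100)
# should equal summation(0, 100, mysum) from the question.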
QUESTION
NumPy 1.21.2 may not yet support Python 3.10
Asked 2021-Nov-27 at 20:37
Python 3.10 is released, and when I try to install NumPy it gives me this: NumPy 1.21.2 may not yet support Python 3.10. What should I do?
ANSWER
Answered 2021-Oct-06 at 12:26
If on Windows, numpy has not yet released a precompiled wheel for Python 3.10. However, you can try the unofficial wheels available at https://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy . Specifically, look for numpy‑1.21.2+mkl‑cp310‑cp310‑win_amd64.whl or numpy‑1.21.2+mkl‑cp310‑cp310‑win32.whl, depending on your system architecture.
After downloading the file, go to the download directory and run pip install "<filename>.whl".
(I have personally installed numpy‑1.21.2+mkl‑cp310‑cp310‑win_amd64.whl and it worked for me.)
Community Discussions contain sources that include Stack Exchange Network
Tutorials and Learning Resources in Numpy
Tutorials and Learning Resources are not available at this moment for Numpy