Support
Quality
Security
License
Reuse
kandi has reviewed numpy and discovered the below as its top functions. This is intended to give you an instant insight into numpy implemented functionality, and help decide if they suit your requirements.
Get all kandi verified functions for this library.
Get all kandi verified functions for this library.
Website: https://www.numpy.org
Documentation: https://numpy.org/doc
Mailing list: https://mail.python.org/mailman/listinfo/numpy-discussion
Source code: https://github.com/numpy/numpy
Contributing: https://www.numpy.org/devdocs/dev/index.html
Bug reports: https://github.com/numpy/numpy/issues
Report a security vulnerability: https://tidelift.com/docs/security
a powerful N-dimensional array object
sophisticated (broadcasting) functions
tools for integrating C/C++ and Fortran code
useful linear algebra, Fourier transform, and random number capabilities
See all related Code Snippets
QUESTION
Why is `np.sum(range(N))` very slow?
Asked 2022-Mar-29 at 14:31I saw a video about speed of loops in python, where it was explained that doing sum(range(N))
is much faster than manually looping through range
and adding the variables together, since the former runs in C due to built-in functions being used, while in the latter the summation is done in (slow) python. I was curious what happens when adding numpy
to the mix. As I expected np.sum(np.arange(N))
is the fastest, but sum(np.arange(N))
and np.sum(range(N))
are even slower than doing the naive for loop.
Why is this?
Here's the script I used to test, some comments about the supposed cause of slowing done where I know (taken mostly from the video) and the results I got on my machine (python 3.10.0, numpy 1.21.2):
updated script:
import numpy as np
from timeit import timeit
N = 10_000_000
repetition = 10
def sum0(N = N):
s = 0
i = 0
while i < N: # condition is checked in python
s += i
i += 1 # both additions are done in python
return s
def sum1(N = N):
s = 0
for i in range(N): # increment in C
s += i # addition in python
return s
def sum2(N = N):
return sum(range(N)) # everything in C
def sum3(N = N):
return sum(list(range(N)))
def sum4(N = N):
return np.sum(range(N)) # very slow np.array conversion
def sum5(N = N):
# much faster np.array conversion
return np.sum(np.fromiter(range(N),dtype = int))
def sum5v2_(N = N):
# much faster np.array conversion
return np.sum(np.fromiter(range(N),dtype = np.int_))
def sum6(N = N):
# possibly slow conversion to Py_long from np.int
return sum(np.arange(N))
def sum7(N = N):
# list returns a list of np.int-s
return sum(list(np.arange(N)))
def sum7v2(N = N):
# tolist conversion to python int seems faster than the implicit conversion
# in sum(list()) (tolist returns a list of python int-s)
return sum(np.arange(N).tolist())
def sum8(N = N):
return np.sum(np.arange(N)) # everything in numpy (fortran libblas?)
def sum9(N = N):
return np.arange(N).sum() # remove dispatch overhead
def array_basic(N = N):
return np.array(range(N))
def array_dtype(N = N):
return np.array(range(N),dtype = np.int_)
def array_iter(N = N):
# np.sum's source code mentions to use fromiter to convert from generators
return np.fromiter(range(N),dtype = np.int_)
print(f"while loop: {timeit(sum0, number = repetition)}")
print(f"for loop: {timeit(sum1, number = repetition)}")
print(f"sum_range: {timeit(sum2, number = repetition)}")
print(f"sum_rangelist: {timeit(sum3, number = repetition)}")
print(f"npsum_range: {timeit(sum4, number = repetition)}")
print(f"npsum_iterrange: {timeit(sum5, number = repetition)}")
print(f"npsum_iterrangev2: {timeit(sum5, number = repetition)}")
print(f"sum_arange: {timeit(sum6, number = repetition)}")
print(f"sum_list_arange: {timeit(sum7, number = repetition)}")
print(f"sum_arange_tolist: {timeit(sum7v2, number = repetition)}")
print(f"npsum_arange: {timeit(sum8, number = repetition)}")
print(f"nparangenpsum: {timeit(sum9, number = repetition)}")
print(f"array_basic: {timeit(array_basic, number = repetition)}")
print(f"array_dtype: {timeit(array_dtype, number = repetition)}")
print(f"array_iter: {timeit(array_iter, number = repetition)}")
print(f"npsumarangeREP: {timeit(lambda : sum8(N/1000), number = 100000*repetition)}")
print(f"npsumarangeREP: {timeit(lambda : sum9(N/1000), number = 100000*repetition)}")
# Example output:
#
# while loop: 11.493371912998555
# for loop: 7.385945574002108
# sum_range: 2.4605720699983067
# sum_rangelist: 4.509678105998319
# npsum_range: 11.85120212900074
# npsum_iterrange: 4.464334709002287
# npsum_iterrangev2: 4.498494338993623
# sum_arange: 9.537815956995473
# sum_list_arange: 13.290120724996086
# sum_arange_tolist: 5.231948580003518
# npsum_arange: 0.241889145996538
# nparangenpsum: 0.21876695199898677
# array_basic: 11.736577274998126
# array_dtype: 8.71628468400013
# array_iter: 4.303306431000237
# npsumarangeREP: 21.240833958996518
# npsumarangeREP: 16.690092379001726
ANSWER
Answered 2021-Oct-16 at 17:42From the cpython source code for sum
sum initially seems to attempt a fast path that assumes all inputs are the same type. If that fails it will just iterate:
/* Fast addition by keeping temporary sums in C instead of new Python objects.
Assumes all inputs are the same type. If the assumption fails, default
to the more general routine.
*/
I'm not entirely certain what is happening under the hood, but it is likely the repeated creation/conversion of C types to Python objects that is causing these slow-downs. It's worth noting that both sum
and range
are implemented in C.
This next bit is not really an answer to the question, but I wondered if we could speed up sum
for python range
s as range
is quite a smart object.
To do this I've used functools.singledispatch
to override the built-in sum
function specifically for the range
type; then implemented a small function to calculate the sum of an arithmetic progression.
from functools import singledispatch
def sum_range(range_, /, start=0):
"""Overloaded `sum` for range, compute arithmetic sum"""
n = len(range_)
if not n:
return start
return int(start + (n * (range_[0] + range_[-1]) / 2))
sum = singledispatch(sum)
sum.register(range, sum_range)
def test():
"""
>>> sum(range(0, 100))
4950
>>> sum(range(0, 10, 2))
20
>>> sum(range(0, 9, 2))
20
>>> sum(range(0, -10, -1))
-45
>>> sum(range(-10, 10))
-10
>>> sum(range(-1, -100, -2))
-2500
>>> sum(range(0, 10, 100))
0
>>> sum(range(0, 0))
0
>>> sum(range(0, 100), 50)
5000
>>> sum(range(0, 0), 10)
10
"""
if __name__ == "__main__":
import doctest
doctest.testmod()
I'm not sure if this is complete, but it's definitely faster than looping.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
No vulnerabilities reported
Save this library and start creating your kit
See Similar Libraries in
Save this library and start creating your kit
Open Weaver – Develop Applications Faster with Open Source