TimeIt | a simple C# library for measuring your application methods
kandi X-RAY | TimeIt Summary
A simple C# library for measuring your application methods. The library exposes only two methods: Start() and Invoke(). The only difference is that Start() requires creating an instance of the TimeIt class, while Invoke() can be called directly.
Community Discussions
Trending Discussions on TimeIt
QUESTION
Here are two measurements:
...ANSWER
Answered 2022-Mar-30 at 11:57
Combining my comment and the comment by @khelwood:
TL;DR:
Analysing the bytecode for the two comparisons reveals that the 'time' and 'time' strings are assigned to the same object. Therefore, an up-front identity check (at C level) is the reason for the increased comparison speed.
The reason for the same object assignment is that, as an implementation detail, CPython interns strings which contain only 'name characters' (i.e. alpha and underscore characters). This enables the object's identity check.
Bytecode:
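The bytecode listing from the original answer is not reproduced above. As a sketch of how to check this (the exact expressions timed in the question are an assumption; only the mechanism described in the answer is illustrated):

import dis
import timeit

def compare_literals():
    # Both literals consist only of "name characters", so CPython interns them
    # at compile time and both operands refer to the same object.
    return 'time' == 'time'

dis.dis(compare_literals)   # both operands load the same co_consts entry

a = 'time'
b = ''.join(['ti', 'me'])   # equal value, but a distinct, non-interned object
print(a == b, a is b)       # True False

# The identity fast path at C level makes the first comparison cheaper.
print(timeit.timeit("x == 'time'", globals={'x': a}))
print(timeit.timeit("x == y", globals={'x': a, 'y': b}))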
QUESTION
I saw a video about the speed of loops in Python, where it was explained that doing sum(range(N)) is much faster than manually looping through range and adding the variables together, since the former runs in C due to built-in functions being used, while in the latter the summation is done in (slow) Python. I was curious what happens when adding numpy to the mix. As I expected, np.sum(np.arange(N)) is the fastest, but sum(np.arange(N)) and np.sum(range(N)) are even slower than doing the naive for loop. Why is this?
Here's the script I used to test, some comments about the supposed cause of the slowdown where I know it (taken mostly from the video), and the results I got on my machine (Python 3.10.0, numpy 1.21.2):
updated script:
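The script itself is not included above; a minimal sketch of the kind of comparison described (N and the repeat count are assumptions, and timings will differ between machines) could look like this:

import timeit
import numpy as np

N = 1_000_000

def naive_loop():
    total = 0
    for i in range(N):
        total += i
    return total

cases = {
    "naive for loop": naive_loop,
    "sum(range(N))": lambda: sum(range(N)),
    "np.sum(np.arange(N))": lambda: np.sum(np.arange(N)),
    "sum(np.arange(N))": lambda: sum(np.arange(N)),   # iterates numpy scalars in Python
    "np.sum(range(N))": lambda: np.sum(range(N)),     # converts the range to an array first
}

for name, fn in cases.items():
    print(f"{name:22s} {timeit.timeit(fn, number=10):.3f} s")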
...ANSWER
Answered 2021-Oct-16 at 17:42
From the cpython source code for sum: sum initially seems to attempt a fast path that assumes all inputs are the same type. If that fails, it will just iterate:
QUESTION
I am analyzing large (between 0.5 and 20 GB) binary files, which contain information about particle collisions from a simulation. The number of collisions and the number of incoming and outgoing particles can vary, so the files consist of variable-length records. For analysis I use Python and numpy. After switching from Python 2 to Python 3 I noticed a dramatic decrease in the performance of my scripts and traced it down to the numpy.fromfile function.
Simplified code to reproduce the problem. This code, iotest.py:
- Generates a file of a similar structure to what I have in my studies
- Reads it using numpy.fromfile
- Reads it using numpy.frombuffer
- Compares timing of both
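The original iotest.py is not included above. A minimal sketch along the same lines (the record layout, a 2-byte count followed by that many float64 values, is an assumption) might be:

import numpy as np

# Generate a file of many small variable-length records:
# a uint16 count followed by that many float64 values.
rng = np.random.default_rng(0)
with open("iotest.bin", "wb") as f:
    for _ in range(100_000):
        n = int(rng.integers(1, 10))
        np.array([n], dtype=np.uint16).tofile(f)
        rng.random(n).tofile(f)

def read_binary_npfromfile(path):
    out = []
    with open(path, "rb") as f:
        while True:
            header = np.fromfile(f, dtype=np.uint16, count=1)
            if header.size == 0:
                break
            out.append(np.fromfile(f, dtype=np.float64, count=int(header[0])))
    return out

def read_binary_npfrombuffer(path):
    out = []
    with open(path, "rb") as f:
        while True:
            header = f.read(2)
            if not header:
                break
            n = int(np.frombuffer(header, dtype=np.uint16)[0])
            out.append(np.frombuffer(f.read(8 * n), dtype=np.float64))
    return out

# Timing both readers (e.g. with time.perf_counter) shows the per-call
# overhead of np.fromfile dominating on Python 3.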
ANSWER
Answered 2022-Mar-16 at 23:52
TL;DR: np.fromfile and np.frombuffer are not optimized to read many small buffers. You can load the whole file into a big buffer and then decode it very efficiently using Numba.
The main issue is that the benchmark measures overheads. Indeed, it performs a lot of system/C calls that are very inefficient. For example, on the 24 MiB file, the while loop calls np.fromfile and np.frombuffer 601_214 times. The timings on my machine are 10.5 s for read_binary_npfromfile and 1.2 s for read_binary_npfrombuffer. This means 17.4 us and 2.0 us per call, respectively, for the two functions. Such timings per call are relatively reasonable considering Numpy is not designed to operate efficiently on very small arrays (it needs to perform many checks, call some functions, wrap/unwrap CPython types, allocate some objects, etc.). The overhead of these functions can change from one version to another, and unless it becomes huge, this is not a bug. The addition of new features to Numpy and CPython often impacts overheads, and this appears to be the case here (e.g. the buffering interface). The point is that this is not really a problem, because there is a different approach that is much, much faster (as it does not pay these huge overheads).
The main solution for a fast implementation is to read the whole file once into a big byte buffer and then decode it using np.view. That being said, this is a bit tricky because of data alignment and because nearly all Numpy functions need to be kept out of the while loop due to their overhead. Here is an example:
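The Numba decoding example from the answer is not reproduced above. As a rough, non-Numba illustration of the single-big-read idea only (the answer additionally JIT-compiles the per-record loop with Numba, which this plain-Python sketch does not do), one could write:

import numpy as np

def read_whole_then_decode(path):
    # Read the file once into one big buffer, then decode records from it.
    with open(path, "rb") as f:
        data = f.read()

    out = []
    pos = 0
    end = len(data)
    while pos < end:
        # 2-byte record length followed by n float64 values (layout assumed above)
        n = int(np.frombuffer(data, dtype=np.uint16, count=1, offset=pos)[0])
        pos += 2
        out.append(np.frombuffer(data, dtype=np.float64, count=n, offset=pos))
        pos += 8 * n
    return out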
QUESTION
ANSWER
Answered 2022-Feb-23 at 23:47
It is going to be quite hard to get numpy to go as fast as the filtered Python iterator, because numpy processes whole structures that will inevitably be larger than the result of filtering sets.
Here is the best I could come up with to process the product of arrays in such a way that the result is filtered on unique combinations of distinct values:
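The answer's code is not shown above. A small sketch of the general shape being discussed (the array contents and the exact filtering criterion from the question are assumptions), comparing a filtered Python iterator with a NumPy version that filters the full product down to combinations of distinct values:

import numpy as np
from itertools import product

a = np.arange(8)
b = np.arange(8)
c = np.arange(8)

# Filtered Python iterator: rejected combinations are never materialised.
iter_result = [t for t in product(a, b, c) if len(set(t)) == 3]

# NumPy: build the whole product, then mask down to rows with distinct values.
grid = np.array(np.meshgrid(a, b, c, indexing="ij")).reshape(3, -1).T
mask = ((grid[:, 0] != grid[:, 1])
        & (grid[:, 0] != grid[:, 2])
        & (grid[:, 1] != grid[:, 2]))
np_result = grid[mask]

assert len(iter_result) == len(np_result)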
QUESTION
When passing a numpy.ndarray of uint8 to numpy.logical_and, it runs significantly faster if I apply numpy.view(bool) to its inputs.
ANSWER
Answered 2022-Feb-22 at 20:23
This is a performance issue in the current Numpy implementation. I can also reproduce this problem on Windows (using an Intel Skylake Xeon processor with Numpy 1.20.3). np.logical_and(a, b) executes very inefficient scalar assembly code based on slow conditional jumps, while np.logical_and(a.view(bool), b.view(bool)) executes relatively fast SIMD instructions.
Currently, Numpy uses a specific implementation for bool types. The general-purpose implementation can be significantly slower if the compiler used to build Numpy failed to automatically vectorize the code, which is apparently the case on Windows (and explains why this is not the case on other platforms, since the compiler is likely not exactly the same). The Numpy code can be improved for non-bool types. Note that the vectorization of Numpy is ongoing work, and we plan to optimize this soon.
Here is the assembly code executed by np.logical_and(a, b):
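The assembly listing itself is omitted above. A minimal benchmark sketch that reproduces the reported effect (array sizes are an assumption; the outcome depends heavily on the NumPy build and platform):

import numpy as np
import timeit

a = np.random.randint(0, 2, size=10_000_000, dtype=np.uint8)
b = np.random.randint(0, 2, size=10_000_000, dtype=np.uint8)

t_uint8 = timeit.timeit(lambda: np.logical_and(a, b), number=50)
t_bool = timeit.timeit(lambda: np.logical_and(a.view(bool), b.view(bool)), number=50)

print(f"uint8 inputs: {t_uint8:.3f} s")
print(f"bool views  : {t_bool:.3f} s")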
QUESTION
I tried to replace a character a by b in a given large string. I did an experiment: first I replaced it in the whole string, then I replaced it only at its beginning.
ANSWER
Answered 2022-Jan-31 at 23:38
The functions provided in the Python re module do not optimize based on anchors. In particular, functions that try to apply a regex at every position - .search, .sub, .findall etc. - will do so even when the regex can only possibly match at the beginning. I.e., even without multi-line mode specified, such that ^ can only match at the beginning of the string, the call is not re-routed internally. Thus:
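The demonstration that followed in the original answer is not shown above; a small sketch of the kind of comparison being made (string size, pattern, and repeat counts are assumptions):

import re
import timeit

s = "a" * 1_000_000

t_replace_all = timeit.timeit(lambda: s.replace("a", "b"), number=10)
t_re_anchor = timeit.timeit(lambda: re.sub(r"^a", "b", s), number=10)  # still scans every position
t_slice = timeit.timeit(lambda: ("b" + s[1:]) if s.startswith("a") else s, number=10)

print(f"str.replace over the whole string: {t_replace_all:.3f} s")
print(f"re.sub with a ^ anchor           : {t_re_anchor:.3f} s")
print(f"startswith + slicing             : {t_slice:.3f} s")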
QUESTION
I have created different bins for each column and grouped the DataFrame based on these.
...ANSWER
Answered 2021-Dec-22 at 16:39
Because your bins are the same for your 3 columns, use codes from the cat accessor:
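The answer's snippet is not included above. A minimal sketch of the idea (column names, bin edges, and the aggregation are assumptions):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.default_rng(0).random((100, 3)) * 100,
                  columns=["a", "b", "c"])
bins = [0, 25, 50, 75, 100]

# pd.cut returns a Categorical; .cat.codes gives the integer bin index.
# Since all three columns share the same bins, grouping on the codes is cheap.
codes = df.apply(lambda s: pd.cut(s, bins=bins).cat.codes)

grouped = df.groupby([codes["a"], codes["b"], codes["c"]]).size()
print(grouped.head())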
QUESTION
I'm trying to understand the performance differences I am seeing by using various numba implementations of an algorithm. In particular, I would expect func1d from below to be the fastest implementation since it is the only algorithm that is not copying data; however, from my timings func1b appears to be fastest.
ANSWER
Answered 2021-Dec-21 at 04:01
Here, copying of data doesn't play a big role: the bottleneck is how fast the tanh function is evaluated. There are many algorithms: some of them are faster, some of them are slower, some are more precise, some less.
Different numpy distributions use different implementations of the tanh function; e.g. it could be the one from mkl/vml or the one from the gnu-math-library.
Depending on the numba version, either the mkl/svml implementation or the gnu-math-library is used.
The easiest way to look inside is to use a profiler, for example perf.
For the numpy-version on my machine I get:
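The profiler output is not reproduced above. As a sketch of how one might collect it (the benchmark body is an assumption; the perf commands are standard Linux perf usage):

import numpy as np

# bench_tanh.py: a tiny workload whose runtime is dominated by the tanh evaluation
x = np.random.default_rng(0).random(10_000_000)
for _ in range(50):
    y = np.tanh(x)

# Then, on Linux:
#   perf record -g python bench_tanh.py
#   perf report
# The report shows which tanh implementation (e.g. mkl/vml or libm) is being hit.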
QUESTION
I'm trying to speed up a piece of code convolving a 1D array (filter) over each column of a 2D array. Somehow, when I run it with numba's njit, I get a 7x slowdown. My thoughts:
- Maybe column indexing is slowing it down, but switching to row indexing didn't affect performance
- Maybe slice indexing the results of the convolution is slow, but removing it didn't change anything
- I've checked that numba understands all the types properly
(tested on Windows 10, python 3.9.4 from conda, numpy 1.12.2, numba 0.53.1)
Can anyone tell me why this code is slow?
...ANSWER
Answered 2021-Dec-11 at 04:14
The problem comes from the Numba implementation of np.convolve. This is a known issue. It turns out that the current Numba implementation is much slower than the Numpy one (version <= 0.54.1, tested on Windows).
On one hand, the Numpy implementation calls correlate, which itself performs a dot product that should be implemented by the fast BLAS library available on your system. On the other hand, the Numba implementation calls _get_inner_prod, which uses np.dot, which should also use the same BLAS library (assuming a BLAS is detected, which should be the case)...
That being said, there are multiple issues related to the dot product:
First of all, if the internal variable _HAVE_BLAS of numba/np/arraymath.py is manually disabled, Numba uses a fallback implementation of the dot product that is supposed to be significantly slower. However, it turns out that using the fallback dot-product implementation in np.convolve results in a 5 times faster execution than with the BLAS wrapper on my machine! Additionally using the parameter fastmath=True in the njit Numba decorator results in an overall 8.7 times faster execution! Here is the testing code:
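The testing code itself is not reproduced above. A minimal sketch of the kind of benchmark described (array shapes and the per-signal loop are assumptions; fastmath=True is the flag mentioned in the answer, and the Numba version may well come out slower, which is the point of the question):

import numpy as np
import timeit
from numba import njit

filt = np.random.rand(31)
data = np.random.rand(50, 10_000)   # one signal per row; convolve each row with filt

def conv_numpy(data, filt):
    out = np.empty((data.shape[0], data.shape[1] + filt.size - 1))
    for i in range(data.shape[0]):
        out[i, :] = np.convolve(data[i, :], filt)
    return out

@njit(fastmath=True)
def conv_numba(data, filt):
    out = np.empty((data.shape[0], data.shape[1] + filt.size - 1))
    for i in range(data.shape[0]):
        out[i, :] = np.convolve(data[i, :], filt)
    return out

conv_numba(data, filt)  # trigger compilation outside the timing

print("numpy:", timeit.timeit(lambda: conv_numpy(data, filt), number=20))
print("numba:", timeit.timeit(lambda: conv_numba(data, filt), number=20))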
QUESTION
I have to do a large number of operations (additions) on relatively small integers, and I started considering which datatype would give the best performance on a 64 bit machine.
I was convinced that adding together 4 uint16 would take the same time as one uint64, since the ALU could make 4 uint16 additions using only 1 uint64 adder. (Carry propagation means this doesn't work that easily for a single 64-bit adder, but this is how integer SIMD instructions work.)
Apparently this is not the case:
...ANSWER
Answered 2021-Nov-29 at 00:22
TL;DR: I made an experimental analysis on Numpy 1.21.1. Experimental results show that np.sum does NOT (really) make use of SIMD instructions: no SIMD instructions are used for integers, and scalar SIMD instructions are used for floating-point numbers! Moreover, Numpy converts smaller integer types to 64-bit values by default so as to avoid overflows!
Note that this may not reflect all Numpy versions, since there is ongoing work to provide SIMD support for commonly used functions (the not-yet-released version Numpy 1.22.0rc1 continues this long-standing work). Moreover, the compiler or the processor used may significantly impact the results. The following experiments were done using a Numpy retrieved from pip on Debian Linux with an i5-9600KF processor.
Under the hood of np.sum
For floating-point numbers, Numpy uses a pairwise algorithm which is known to be quite numerically stable while being relatively fast. This can be seen in the code, but also simply by using a profiler: TYPE_pairwise_sum is the C function called to compute the sum at runtime (where TYPE is DOUBLE or FLOAT).
For integers, Numpy uses a classical naive reduction. The C function called is ULONG_add_avx2 on AVX2-compatible machines. It also, surprisingly, converts items to 64-bit ones if the type is not np.int64.
Here is the hot part of the assembly code executed by the DOUBLE_pairwise_sum function:
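The assembly itself is not reproduced above. A minimal benchmark sketch of the original comparison (array size and repeat count are assumptions; equal element counts are used for each dtype):

import numpy as np
import timeit

n = 10_000_000
arrays = {
    "uint16": np.ones(n, dtype=np.uint16),
    "uint64": np.ones(n, dtype=np.uint64),
    "float64": np.ones(n, dtype=np.float64),
}

# uint16 is not ~4x faster than uint64, partly because the values are
# widened to 64 bits during the reduction to avoid overflow.
for name, arr in arrays.items():
    t = timeit.timeit(lambda arr=arr: np.sum(arr), number=100)
    print(f"np.sum on {name:8s}: {t:.3f} s")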
Community Discussions, Code Snippets contain sources that include Stack Exchange Network