timeit | Timing macros for Rust modelled after Python's timeit

 by gustavla · Rust · Version: Current · License: MIT

kandi X-RAY | timeit Summary

timeit is a Rust library. timeit has no bugs, no vulnerabilities, a Permissive License, and low support. You can download it from GitHub.

This crate provides macros that make it easy to benchmark blocks of code. It is inspired and named after timeit from Python.
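For reference, the Python timeit that this crate is modelled after is used roughly like this (an illustrative sketch; the statement and loop count are arbitrary examples, not part of this crate):

from timeit import timeit

# Python's timeit: run a snippet many times and report the elapsed seconds.
elapsed = timeit("sum(range(100))", number=10_000)
print(f"{elapsed:.4f} s for 10000 loops")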

            kandi-support Support

              timeit has a low active ecosystem.
              It has 15 star(s) with 1 fork(s). There is 1 watcher for this library.
              It had no major release in the last 6 months.
              There are 0 open issues and 1 has been closed. There is 1 open pull request and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of timeit is current.

            kandi-Quality Quality

              timeit has 0 bugs and 0 code smells.

            kandi-Security Security

              timeit has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              timeit code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            kandi-License License

              timeit is licensed under the MIT License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            kandi-Reuse Reuse

              timeit releases are not available. You will need to build from source code and install.
              Installation instructions are not available. Examples and code snippets are available.


            timeit Key Features

            No Key Features are available at this moment for timeit.

            timeit Examples and Code Snippets

            No Code Snippets are available at this moment for timeit.

            Community Discussions

            QUESTION

            Why is it faster to compare strings that match than strings that do not?
            Asked 2022-Mar-30 at 11:58

            Here are two measurements:

            ...

            ANSWER

            Answered 2022-Mar-30 at 11:57

            Combining my comment and the comment by @khelwood:

            TL;DR:
            Analysing the bytecode for the two comparisons reveals that the 'time' and 'time' strings are bound to the same object. An up-front identity check (at C level) is therefore the reason the matching comparison is faster.

            The reason for the same-object assignment is that, as an implementation detail, CPython interns strings that contain only 'name characters' (i.e. letters, digits, and underscores). This is what makes the identity check possible.
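            A quick way to see the interning at work (a sketch; exact timings vary by machine and CPython version):

            import timeit

            x = 'time'
            y = 'time'      # identifier-like literal: interned, same object as x
            print(x is y)   # True under CPython's interning of name-like literals

            # Equal, interned strings can short-circuit on the identity check...
            print(timeit.timeit("x == y", setup="x = 'time'; y = 'time'"))
            # ...while unequal strings of the same length must be compared by content.
            print(timeit.timeit("x == y", setup="x = 'time'; y = 'tome'"))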

            Bytecode:

            Source https://stackoverflow.com/questions/71644405

            QUESTION

            Why is `np.sum(range(N))` very slow?
            Asked 2022-Mar-29 at 14:31

            I saw a video about the speed of loops in Python, which explained that doing sum(range(N)) is much faster than manually looping through the range and adding the variables together, since the former runs in C thanks to built-in functions, while in the latter the summation is done in (slow) Python. I was curious what happens when Numpy is added to the mix. As I expected, np.sum(np.arange(N)) is the fastest, but sum(np.arange(N)) and np.sum(range(N)) are even slower than the naive for loop.

            Why is this?

            Here's the script I used to test, with some comments about the supposed causes of the slowdown where I know them (taken mostly from the video), and the results I got on my machine (Python 3.10.0, Numpy 1.21.2):

            updated script:

            ...

            ANSWER

            Answered 2021-Oct-16 at 17:42

            From the CPython source code for sum: sum initially attempts a fast path that assumes all inputs are of the same type. If that fails, it falls back to plain iteration; the relevant CPython snippet is at the source link below.
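            The CPython source itself is elided here, but the effect is easy to see from Python (a sketch; absolute numbers depend on machine and versions):

            import timeit

            import numpy as np

            N = 1_000_000
            print(timeit.timeit(lambda: sum(range(N)), number=10))         # C fast path over Python ints
            print(timeit.timeit(lambda: np.sum(range(N)), number=10))      # must build an array from range first
            print(timeit.timeit(lambda: sum(np.arange(N)), number=10))     # iterates, boxing each np.int64
            print(timeit.timeit(lambda: np.sum(np.arange(N)), number=10))  # vectorized C loop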

            Source https://stackoverflow.com/questions/69584027

            QUESTION

            Dramatic drop in numpy fromfile performance when switching from python 2 to python 3
            Asked 2022-Mar-16 at 23:53
            Background

            I am analyzing large (between 0.5 and 20 GB) binary files, which contain information about particle collisions from a simulation. The number of collisions and the numbers of incoming and outgoing particles can vary, so the files consist of variable-length records. For analysis I use Python and Numpy. After switching from Python 2 to Python 3 I noticed a dramatic decrease in the performance of my scripts and traced it down to the numpy.fromfile function.

            Simplified code to reproduce the problem

            This code, iotest.py

            1. Generates a file of a similar structure to what I have in my studies
            2. Reads it using numpy.fromfile
            3. Reads it using numpy.frombuffer
            4. Compares timing of both
            ...

            ANSWER

            Answered 2022-Mar-16 at 23:52

            TL;DR: np.fromfile and np.frombuffer are not optimized to read many small buffers. You can load the whole file in a big buffer and then decode it very efficiently using Numba.

            Analysis

            The main issue is that the benchmark measures overheads. Indeed, it performs a lot of system/C calls that are very inefficient. For example, on the 24 MiB file, the while loop calls np.fromfile and np.frombuffer 601_214 times. The timings on my machine are 10.5 s for read_binary_npfromfile and 1.2 s for read_binary_npfrombuffer, i.e. 17.4 us and 2.0 us per call, respectively. Such per-call timings are relatively reasonable considering Numpy is not designed to operate efficiently on very small arrays (it needs to perform many checks, call some functions, wrap/unwrap CPython types, allocate some objects, etc.). The overhead of these functions can change from one version to another, and unless it becomes huge this is not a bug. The addition of new features to Numpy and CPython often impacts overheads, and that appears to be the case here (e.g. the buffering interface). The point is that this is not really a problem, because there is a different approach that is much, much faster (as it does not pay these huge overheads).

            Faster Numpy code

            The main solution for a fast implementation is to read the whole file once into a big byte buffer and then decode it using np.view. That being said, this is a bit tricky because of data alignment and the fact that nearly all Numpy functions need to be kept out of the while loop due to their overhead. A full example is at the source link below.
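            Not the answer's own code (that is behind the link), but a minimal sketch of the one-big-read idea; the file name and record layout below are placeholders, and real variable-length records need the alignment/offset bookkeeping mentioned above:

            import numpy as np

            # Placeholder layout: a 4-byte count followed by three float64 values.
            record_dtype = np.dtype([("n", "<i4"), ("payload", "<f8", (3,))])

            with open("collisions.bin", "rb") as f:  # hypothetical file name
                buf = f.read()                       # one system call for the whole file

            # Decode in memory: no per-record np.fromfile/np.frombuffer calls.
            records = np.frombuffer(buf, dtype=record_dtype)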

            Source https://stackoverflow.com/questions/71411907

            QUESTION

            Why is numpy cartesian product slower than pure python version?
            Asked 2022-Feb-25 at 01:58
            Input ...

            ANSWER

            Answered 2022-Feb-23 at 23:47

            It is going to be quite hard to get Numpy to match the filtered Python iterator, because Numpy processes whole structures that will inevitably be larger than the filtered result.

            Here is the best I could come up with to process the product of arrays so that the result is filtered on unique combinations of distinct values; the code is at the source link below.
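            For contrast, a sketch of the filtered pure-Python iterator the Numpy code is being compared against (the input values are placeholders):

            from itertools import product

            values = range(10)  # placeholder input

            # The lazy version filters while it generates, so it never
            # materializes the full cartesian product the way a Numpy
            # whole-structure approach must.
            combos = [c for c in product(values, repeat=3) if len(set(c)) == 3]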

            Source https://stackoverflow.com/questions/71244250

            QUESTION

            Why does numpy.view(bool) make numpy.logical_and significantly faster?
            Asked 2022-Feb-22 at 20:23

            When passing a numpy.ndarray of uint8 to numpy.logical_and, it runs significantly faster if I apply numpy.view(bool) to its inputs.

            ...

            ANSWER

            Answered 2022-Feb-22 at 20:23

            This is a performance issue in the current Numpy implementation. I can also reproduce the problem on Windows (using an Intel Skylake Xeon processor with Numpy 1.20.3). np.logical_and(a, b) executes very inefficient scalar assembly code based on slow conditional jumps, while np.logical_and(a.view(bool), b.view(bool)) executes relatively fast SIMD instructions.

            Currently, Numpy uses a specific implementation for bool types. The general-purpose implementation can be significantly slower if the compiler used to build Numpy failed to vectorize the code automatically, which is apparently the case on Windows (and explains why other platforms are not affected, since the compiler is likely not exactly the same). The Numpy code can be improved for non-bool types. Note that vectorization of Numpy is ongoing work, and we plan to optimize this soon.
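            A sketch of the two call styles from the question (timings will differ across platforms and Numpy builds, as explained above):

            import numpy as np

            a = np.random.randint(0, 2, size=10_000_000, dtype=np.uint8)
            b = np.random.randint(0, 2, size=10_000_000, dtype=np.uint8)

            slow = np.logical_and(a, b)                        # general uint8 inner loop
            fast = np.logical_and(a.view(bool), b.view(bool))  # bool-specialized, SIMD-friendly loop
            assert (slow == fast).all()                        # same result either way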

            Deeper analysis

            Here is the assembly code executed by np.logical_and(a, b):

            Source https://stackoverflow.com/questions/71225872

            QUESTION

            Replacing whole string is faster than replacing only its first character
            Asked 2022-Jan-31 at 23:38

            I tried to replace the character 'a' with 'b' in a given large string. I ran an experiment - first I replaced it in the whole string, then I replaced it only at the beginning.

            ...

            ANSWER

            Answered 2022-Jan-31 at 23:38

            The functions provided by Python's re module do not optimize based on anchors. In particular, functions that try to apply a regex at every position (.search, .sub, .findall, etc.) will do so even when the regex can only possibly match at the beginning. That is, even without multi-line mode, where ^ can only match at the start of the string, the call is not re-routed internally to a match-at-start check. The demonstration is at the source link below.
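            The question's benchmark is elided above; a hedged sketch of the effect (data and sizes are made up):

            import re
            import timeit

            s = "a" + "x" * 1_000_000

            # '^a' can only match at position 0, yet .sub still attempts the
            # pattern at every position in the string before giving up.
            print(timeit.timeit(lambda: re.sub("^a", "b", s), number=100))

            # Manual prefix replacement touches only the first character
            # (plus the unavoidable copy of the string).
            print(timeit.timeit(lambda: "b" + s[1:] if s[0] == "a" else s, number=100))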

            Source https://stackoverflow.com/questions/70927513

            QUESTION

            How to speed up the agg of pandas groupby bins?
            Asked 2021-Dec-23 at 10:16

            I have created different bins for each column and grouped the DataFrame based on these.

            ...

            ANSWER

            Answered 2021-Dec-22 at 16:39

            Because your bins are the same for your 3 columns, use the codes from the cat accessor; the full snippet is at the source link below.
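            A hedged sketch of that suggestion (the DataFrame and bins here are placeholders for the question's elided setup):

            import numpy as np
            import pandas as pd

            df = pd.DataFrame(np.random.rand(100_000, 3), columns=["a", "b", "c"])
            bins = np.linspace(0, 1, 11)  # the same bin edges for all three columns

            # pd.cut returns a Categorical; .cat.codes exposes each row's bin as
            # a plain integer column, which groups much faster than Interval labels.
            codes = df.apply(lambda col: pd.cut(col, bins).cat.codes)
            out = df.groupby([codes["a"], codes["b"], codes["c"]]).mean()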

            Source https://stackoverflow.com/questions/70452146

            QUESTION

            Understanding Numba Performance Differences
            Asked 2021-Dec-21 at 04:01

            I'm trying to understand the performance differences I am seeing when using various Numba implementations of an algorithm. In particular, I would expect func1d below to be the fastest implementation, since it is the only algorithm that does not copy data; however, from my timings, func1b appears to be the fastest.

            ...

            ANSWER

            Answered 2021-Dec-21 at 04:01

            Here, copying of data doesn't play a big role: the bottleneck is how fast the tanh function is evaluated. There are many algorithms for it: some are faster, some are slower; some are more precise, some less so.

            Different Numpy distributions use different implementations of the tanh function; it could be, for example, the one from mkl/vml or the one from the GNU math library.

            Depending on the Numba version, either the mkl/svml implementation or the GNU math library is used.
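            One way to compare the two tanh paths directly (a sketch; which SVML/libm actually gets used depends on the installation, as noted above):

            import numpy as np
            from numba import njit

            x = np.random.rand(1_000_000)

            @njit
            def tanh_numba(x):
                out = np.empty_like(x)
                for i in range(x.size):
                    out[i] = np.tanh(x[i])  # resolved to SVML or libm at compile time
                return out

            tanh_numba(x)     # first call triggers JIT compilation (exclude from timing)
            ref = np.tanh(x)  # Numpy's own vectorized loop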

            The easiest way to look inside is to use a profiler, for example perf.

            For the Numpy version on my machine I get:

            Source https://stackoverflow.com/questions/70426958

            QUESTION

            Numba np.convolve really slow
            Asked 2021-Dec-11 at 04:43

            I'm trying to speed up a piece of code that convolves a 1D array (the filter) over each column of a 2D array. Somehow, when I run it with Numba's njit, I get a 7x slowdown. My thoughts:

            • Maybe column indexing is slowing it down, but switching to row indexing didn't affect performance
            • Maybe slice indexing the results of the convolution is slow, but removing it didn't change anything
            • I've checked that numba understands all the types properly

            (tested on Windows 10, python 3.9.4 from conda, numpy 1.12.2, numba 0.53.1)

            Can anyone tell me why this code is slow?

            ...

            ANSWER

            Answered 2021-Dec-11 at 04:14

            The problem comes from the Numba implementation of np.convolve. This is a known issue. It turns out that the current Numba implementation is much slower than Numpy's (versions <= 0.54.1 tested, on Windows).
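            A minimal reproduction sketch of the comparison (the answer's own testing code is behind the source link):

            import numpy as np
            from numba import njit

            a = np.random.rand(100_000)
            f = np.random.rand(200)

            @njit  # fastmath=True gives a further speedup, as discussed below
            def conv_numba(a, f):
                return np.convolve(a, f)

            conv_numba(a, f)         # warm-up call: JIT compile before timing
            ref = np.convolve(a, f)  # Numpy reference implementation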

            Under the hood

            On one hand, the Numpy implementation calls correlate, which itself performs a dot product that should be implemented by the fast BLAS library available on your system. On the other hand, the Numba implementation calls _get_inner_prod, which uses np.dot and should also use the same BLAS library (assuming a BLAS is detected, which should be the case)...

            That being said, there are multiple issues related to the dot product:

            First of all, if the internal variable _HAVE_BLAS in numba/np/arraymath.py is manually disabled, Numba uses a fallback implementation of the dot product that is supposed to be significantly slower. However, it turns out that using this fallback dot product in np.convolve results in execution 5 times faster than with the BLAS wrapper on my machine! Additionally passing fastmath=True to the njit decorator results in an overall 8.7 times faster execution! The testing code is at the source link below.

            Source https://stackoverflow.com/questions/70311592

            QUESTION

            No speedup when summing uint16 vs uint64 arrays with NumPy?
            Asked 2021-Nov-29 at 00:22

            I have to do a large number of operations (additions) on relatively small integers, and I started considering which datatype would give the best performance on a 64 bit machine.

            I was convinced that adding together four uint16 values would take the same time as one uint64 addition, since the ALU could perform four uint16 additions using only one uint64 adder. (Carry propagation means this doesn't work that easily for a single 64-bit adder, but this is how integer SIMD instructions work.)

            Apparently this is not the case:

            ...

            ANSWER

            Answered 2021-Nov-29 at 00:22

            TL;DR: I made an experimental analysis on Numpy 1.21.1. The results show that np.sum does NOT (really) make use of SIMD instructions: no SIMD instructions are used for integers, and only scalar SIMD instructions are used for floating-point numbers! Moreover, Numpy converts smaller integer types to 64-bit values by default, to avoid overflows!

            Note that this may not reflect all Numpy versions, since there is ongoing work to provide SIMD support for commonly used functions (the not-yet-released version 1.22.0rc1 continues this long-standing work). Moreover, the compiler or processor used may significantly impact the results. The following experiments were done using a Numpy retrieved from pip, on Debian Linux with an i5-9600KF processor.

            Under the hood of np.sum

            For floating-point numbers, Numpy uses a pairwise algorithm which is known to be quite numerically stable while being relatively fast. This can be seen in the code, but also simply using a profiler: TYPE_pairwise_sum is the C function called to compute the sum at runtime (where TYPE is DOUBLE or FLOAT).

            For integers, Numpy uses a classical naive reduction. The C function called is ULONG_add_avx2 on AVX2-compatible machines. Surprisingly, it also converts items to 64-bit ones if the type is not np.int64.
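            A sketch of the measurement (numbers are machine-dependent; the point is that the smaller dtype does not help):

            import timeit

            import numpy as np

            a16 = np.ones(10_000_000, dtype=np.uint16)
            a64 = np.ones(10_000_000, dtype=np.uint64)

            # np.sum upcasts small integer inputs to a 64-bit accumulator by
            # default, so the uint16 sum does the same 64-bit adds plus conversions.
            print(a16.sum().dtype)  # typically uint64 on 64-bit Linux
            print(timeit.timeit(lambda: a16.sum(), number=10))
            print(timeit.timeit(lambda: a64.sum(), number=10))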

            The hot part of the assembly code executed by the DOUBLE_pairwise_sum function is at the source link below.

            Source https://stackoverflow.com/questions/70134026

            Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install timeit

            You can download it from GitHub.
            Rust is installed and managed by the rustup tool. Rust has a 6-week rapid release process and supports a great number of platforms, so there are many builds of Rust available at any time. Please refer to rust-lang.org for more information.

            Support

            For any new features, suggestions, or bugs, create an issue on GitHub. If you have any questions, check and ask them on the Stack Overflow community page.
            CLONE
          • HTTPS: https://github.com/gustavla/timeit.git
          • CLI: gh repo clone gustavla/timeit
          • SSH: git@github.com:gustavla/timeit.git
