timeit | Timing macros for Rust modelled after Python's timeit

 by gustavla · Rust · Version: Current · License: MIT

kandi X-RAY | timeit Summary

timeit is a Rust library. timeit has no bugs, no vulnerabilities, a Permissive License, and low support. You can download it from GitHub.

This crate provides macros that make it easy to benchmark blocks of code. It is inspired and named after timeit from Python.
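For reference, the Python timeit that this crate is modelled after is used roughly like this (an illustrative sketch; the statement and loop count are arbitrary examples, not part of this crate):

from timeit import timeit

# Python's timeit: run a snippet many times and report the elapsed seconds.
elapsed = timeit("sum(range(100))", number=10_000)
print(f"{elapsed:.4f} s for 10000 loops")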

            kandi-support Support

              timeit has a low active ecosystem.
              It has 15 star(s) with 1 fork(s). There is 1 watcher for this library.
              It had no major release in the last 6 months.
              There are 0 open issues and 1 has been closed. There is 1 open pull request and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of timeit is current.

            kandi-Quality Quality

              timeit has 0 bugs and 0 code smells.

            kandi-Security Security

              timeit has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              timeit code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            kandi-License License

              timeit is licensed under the MIT License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            kandi-Reuse Reuse

              timeit releases are not available. You will need to build from source code and install.
              Installation instructions are not available. Examples and code snippets are available.


            timeit Key Features

            No Key Features are available at this moment for timeit.

            timeit Examples and Code Snippets

            No Code Snippets are available at this moment for timeit.

            Community Discussions

            QUESTION

            Why is it faster to compare strings that match than strings that do not?
            Asked 2022-Mar-30 at 11:58

            Here are two measurements:

            ...

            ANSWER

            Answered 2022-Mar-30 at 11:57

            Combining my comment and the comment by @khelwood:

            TL;DR:
            Analysing the bytecode for the two comparisons reveals that the 'time' and 'time' strings are bound to the same object. An up-front identity check (at C level) is therefore the reason the matching comparison is faster.

            The reason for the same-object assignment is that, as an implementation detail, CPython interns strings that contain only 'name characters' (i.e. letters, digits, and underscores). This is what makes the identity check possible.
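            A quick way to see the interning at work (a sketch; exact timings vary by machine and CPython version):

            import timeit

            x = 'time'
            y = 'time'      # identifier-like literal: interned, same object as x
            print(x is y)   # True under CPython's interning of name-like literals

            # Equal, interned strings can short-circuit on the identity check...
            print(timeit.timeit("x == y", setup="x = 'time'; y = 'time'"))
            # ...while unequal strings of the same length must be compared by content.
            print(timeit.timeit("x == y", setup="x = 'time'; y = 'tome'"))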

            Bytecode:

            Source https://stackoverflow.com/questions/71644405

            QUESTION

            Why is `np.sum(range(N))` very slow?
            Asked 2022-Mar-29 at 14:31

            I saw a video about the speed of loops in Python, which explained that doing sum(range(N)) is much faster than manually looping through the range and adding the variables together, since the former runs in C thanks to built-in functions, while in the latter the summation is done in (slow) Python. I was curious what happens when Numpy is added to the mix. As I expected, np.sum(np.arange(N)) is the fastest, but sum(np.arange(N)) and np.sum(range(N)) are even slower than the naive for loop.

            Why is this?

            Here's the script I used to test, with some comments about the supposed causes of the slowdown where I know them (taken mostly from the video), and the results I got on my machine (Python 3.10.0, Numpy 1.21.2):

            updated script:

            ...

            ANSWER

            Answered 2021-Oct-16 at 17:42

            From the CPython source code for sum: sum initially attempts a fast path that assumes all inputs are of the same type. If that fails, it falls back to plain iteration; the relevant CPython snippet is at the source link below.
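            The CPython source itself is elided here, but the effect is easy to see from Python (a sketch; absolute numbers depend on machine and versions):

            import timeit

            import numpy as np

            N = 1_000_000
            print(timeit.timeit(lambda: sum(range(N)), number=10))         # C fast path over Python ints
            print(timeit.timeit(lambda: np.sum(range(N)), number=10))      # must build an array from range first
            print(timeit.timeit(lambda: sum(np.arange(N)), number=10))     # iterates, boxing each np.int64
            print(timeit.timeit(lambda: np.sum(np.arange(N)), number=10))  # vectorized C loop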

            Source https://stackoverflow.com/questions/69584027

            QUESTION

            Dramatic drop in numpy fromfile performance when switching from python 2 to python 3
            Asked 2022-Mar-16 at 23:53
            Background

            I am analyzing large (between 0.5 and 20 GB) binary files, which contain information about particle collisions from a simulation. The number of collisions and the numbers of incoming and outgoing particles can vary, so the files consist of variable-length records. For analysis I use Python and Numpy. After switching from Python 2 to Python 3 I noticed a dramatic decrease in the performance of my scripts and traced it down to the numpy.fromfile function.

            Simplified code to reproduce the problem

            This code, iotest.py

            1. Generates a file of a similar structure to what I have in my studies
            2. Reads it using numpy.fromfile
            3. Reads it using numpy.frombuffer
            4. Compares timing of both
            ...

            ANSWER

            Answered 2022-Mar-16 at 23:52

            TL;DR: np.fromfile and np.frombuffer are not optimized to read many small buffers. You can load the whole file in a big buffer and then decode it very efficiently using Numba.

            Analysis

            The main issue is that the benchmark measures overheads. Indeed, it performs a lot of system/C calls that are very inefficient. For example, on the 24 MiB file, the while loop calls np.fromfile and np.frombuffer 601_214 times. The timings on my machine are 10.5 s for read_binary_npfromfile and 1.2 s for read_binary_npfrombuffer, i.e. 17.4 us and 2.0 us per call, respectively. Such per-call timings are relatively reasonable considering Numpy is not designed to operate efficiently on very small arrays (it needs to perform many checks, call some functions, wrap/unwrap CPython types, allocate some objects, etc.). The overhead of these functions can change from one version to another, and unless it becomes huge this is not a bug. The addition of new features to Numpy and CPython often impacts overheads, and that appears to be the case here (e.g. the buffering interface). The point is that this is not really a problem, because there is a different approach that is much, much faster (as it does not pay these huge overheads).

            Faster Numpy code

            The main solution for a fast implementation is to read the whole file once into a big byte buffer and then decode it using np.view. That being said, this is a bit tricky because of data alignment and the fact that nearly all Numpy functions need to be kept out of the while loop due to their overhead. A full example is at the source link below.
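            Not the answer's own code (that is behind the link), but a minimal sketch of the one-big-read idea; the file name and record layout below are placeholders, and real variable-length records need the alignment/offset bookkeeping mentioned above:

            import numpy as np

            # Placeholder layout: a 4-byte count followed by three float64 values.
            record_dtype = np.dtype([("n", "<i4"), ("payload", "<f8", (3,))])

            with open("collisions.bin", "rb") as f:  # hypothetical file name
                buf = f.read()                       # one system call for the whole file

            # Decode in memory: no per-record np.fromfile/np.frombuffer calls.
            records = np.frombuffer(buf, dtype=record_dtype)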

            Source https://stackoverflow.com/questions/71411907

            QUESTION

            Why is numpy cartesian product slower than pure python version?
            Asked 2022-Feb-25 at 01:58
            Input ...

            ANSWER

            Answered 2022-Feb-23 at 23:47

            It is going to be quite hard to get Numpy to match the filtered Python iterator, because Numpy processes whole structures that will inevitably be larger than the filtered result.

            Here is the best I could come up with to process the product of arrays so that the result is filtered on unique combinations of distinct values; the code is at the source link below.
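            For contrast, a sketch of the filtered pure-Python iterator the Numpy code is being compared against (the input values are placeholders):

            from itertools import product

            values = range(10)  # placeholder input

            # The lazy version filters while it generates, so it never
            # materializes the full cartesian product the way a Numpy
            # whole-structure approach must.
            combos = [c for c in product(values, repeat=3) if len(set(c)) == 3]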

            Source https://stackoverflow.com/questions/71244250

            QUESTION

            Why does numpy.view(bool) make numpy.logical_and significantly faster?
            Asked 2022-Feb-22 at 20:23

            When passing a numpy.ndarray of uint8 to numpy.logical_and, it runs significantly faster if I apply numpy.view(bool) to its inputs.

            ...

            ANSWER

            Answered 2022-Feb-22 at 20:23

            This is a performance issue in the current Numpy implementation. I can also reproduce the problem on Windows (using an Intel Skylake Xeon processor with Numpy 1.20.3). np.logical_and(a, b) executes very inefficient scalar assembly code based on slow conditional jumps, while np.logical_and(a.view(bool), b.view(bool)) executes relatively fast SIMD instructions.

            Currently, Numpy uses a specific implementation for bool types. The general-purpose implementation can be significantly slower if the compiler used to build Numpy failed to vectorize the code automatically, which is apparently the case on Windows (and explains why other platforms are not affected, since the compiler is likely not exactly the same). The Numpy code can be improved for non-bool types. Note that vectorization of Numpy is ongoing work, and we plan to optimize this soon.
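            A sketch of the two call styles from the question (timings will differ across platforms and Numpy builds, as explained above):

            import numpy as np

            a = np.random.randint(0, 2, size=10_000_000, dtype=np.uint8)
            b = np.random.randint(0, 2, size=10_000_000, dtype=np.uint8)

            slow = np.logical_and(a, b)                        # general uint8 inner loop
            fast = np.logical_and(a.view(bool), b.view(bool))  # bool-specialized, SIMD-friendly loop
            assert (slow == fast).all()                        # same result either way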

            Deeper analysis

            Here is the assembly code executed by np.logical_and(a, b):

            Source https://stackoverflow.com/questions/71225872

            QUESTION

            Replacing whole string is faster than replacing only its first character
            Asked 2022-Jan-31 at 23:38

            I tried to replace the character 'a' with 'b' in a given large string. I ran an experiment - first I replaced it in the whole string, then I replaced it only at the beginning.

            ...

            ANSWER

            Answered 2022-Jan-31 at 23:38

            The functions provided by Python's re module do not optimize based on anchors. In particular, functions that try to apply a regex at every position (.search, .sub, .findall, etc.) will do so even when the regex can only possibly match at the beginning. That is, even without multi-line mode, where ^ can only match at the start of the string, the call is not re-routed internally to a match-at-start check. The demonstration is at the source link below.
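            The question's benchmark is elided above; a hedged sketch of the effect (data and sizes are made up):

            import re
            import timeit

            s = "a" + "x" * 1_000_000

            # '^a' can only match at position 0, yet .sub still attempts the
            # pattern at every position in the string before giving up.
            print(timeit.timeit(lambda: re.sub("^a", "b", s), number=100))

            # Manual prefix replacement touches only the first character
            # (plus the unavoidable copy of the string).
            print(timeit.timeit(lambda: "b" + s[1:] if s[0] == "a" else s, number=100))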

            Source https://stackoverflow.com/questions/70927513

            QUESTION

            How to speed up the agg of pandas groupby bins?
            Asked 2021-Dec-23 at 10:16

            I have created different bins for each column and grouped the DataFrame based on these.

            ...

            ANSWER

            Answered 2021-Dec-22 at 16:39

            Because your bins are the same for your 3 columns, use the codes from the cat accessor; the full snippet is at the source link below.
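            A hedged sketch of that suggestion (the DataFrame and bins here are placeholders for the question's elided setup):

            import numpy as np
            import pandas as pd

            df = pd.DataFrame(np.random.rand(100_000, 3), columns=["a", "b", "c"])
            bins = np.linspace(0, 1, 11)  # the same bin edges for all three columns

            # pd.cut returns a Categorical; .cat.codes exposes each row's bin as
            # a plain integer column, which groups much faster than Interval labels.
            codes = df.apply(lambda col: pd.cut(col, bins).cat.codes)
            out = df.groupby([codes["a"], codes["b"], codes["c"]]).mean()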

            Source https://stackoverflow.com/questions/70452146

            QUESTION

            Understanding Numba Performance Differences
            Asked 2021-Dec-21 at 04:01

            I'm trying to understand the performance differences I am seeing when using various Numba implementations of an algorithm. In particular, I would expect func1d below to be the fastest implementation, since it is the only algorithm that does not copy data; however, from my timings, func1b appears to be the fastest.

            ...

            ANSWER

            Answered 2021-Dec-21 at 04:01

            Here, copying of data doesn't play a big role: the bottleneck is how fast the tanh function is evaluated. There are many algorithms for it: some are faster, some are slower; some are more precise, some less so.

            Different Numpy distributions use different implementations of the tanh function; it could be, for example, the one from mkl/vml or the one from the GNU math library.

            Depending on the Numba version, either the mkl/svml implementation or the GNU math library is used.
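            One way to compare the two tanh paths directly (a sketch; which SVML/libm actually gets used depends on the installation, as noted above):

            import numpy as np
            from numba import njit

            x = np.random.rand(1_000_000)

            @njit
            def tanh_numba(x):
                out = np.empty_like(x)
                for i in range(x.size):
                    out[i] = np.tanh(x[i])  # resolved to SVML or libm at compile time
                return out

            tanh_numba(x)     # first call triggers JIT compilation (exclude from timing)
            ref = np.tanh(x)  # Numpy's own vectorized loop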

            The easiest way to look inside is to use a profiler, for example perf.

            For the Numpy version on my machine I get:

            Source https://stackoverflow.com/questions/70426958

            QUESTION

            Numba np.convolve really slow
            Asked 2021-Dec-11 at 04:43

            I'm trying to speed up a piece of code that convolves a 1D array (the filter) over each column of a 2D array. Somehow, when I run it with Numba's njit, I get a 7x slowdown. My thoughts:

            • Maybe column indexing is slowing it down, but switching to row indexing didn't affect performance
            • Maybe slice indexing the results of the convolution is slow, but removing it didn't change anything
            • I've checked that numba understands all the types properly

            (tested on Windows 10, python 3.9.4 from conda, numpy 1.12.2, numba 0.53.1)

            Can anyone tell me why this code is slow?

            ...

            ANSWER

            Answered 2021-Dec-11 at 04:14

            The problem comes from the Numba implementation of np.convolve. This is a known issue. It turns out that the current Numba implementation is much slower than Numpy's (versions <= 0.54.1 tested, on Windows).
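            A minimal reproduction sketch of the comparison (the answer's own testing code is behind the source link):

            import numpy as np
            from numba import njit

            a = np.random.rand(100_000)
            f = np.random.rand(200)

            @njit  # fastmath=True gives a further speedup, as discussed below
            def conv_numba(a, f):
                return np.convolve(a, f)

            conv_numba(a, f)         # warm-up call: JIT compile before timing
            ref = np.convolve(a, f)  # Numpy reference implementation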

            Under the hood

            On one hand, the Numpy implementation calls correlate, which itself performs a dot product that should be implemented by the fast BLAS library available on your system. On the other hand, the Numba implementation calls _get_inner_prod, which uses np.dot and should also use the same BLAS library (assuming a BLAS is detected, which should be the case)...

            That being said, there are multiple issues related to the dot product:

            First of all, if the internal variable _HAVE_BLAS in numba/np/arraymath.py is manually disabled, Numba uses a fallback implementation of the dot product that is supposed to be significantly slower. However, it turns out that using this fallback dot product in np.convolve results in execution 5 times faster than with the BLAS wrapper on my machine! Additionally passing fastmath=True to the njit decorator results in an overall 8.7 times faster execution! The testing code is at the source link below.

            Source https://stackoverflow.com/questions/70311592

            QUESTION

            No speedup when summing uint16 vs uint64 arrays with NumPy?
            Asked 2021-Nov-29 at 00:22

            I have to do a large number of operations (additions) on relatively small integers, and I started considering which datatype would give the best performance on a 64 bit machine.

            I was convinced that adding together four uint16 values would take the same time as one uint64 addition, since the ALU could perform four uint16 additions using only one uint64 adder. (Carry propagation means this doesn't work that easily for a single 64-bit adder, but this is how integer SIMD instructions work.)

            Apparently this is not the case:

            ...

            ANSWER

            Answered 2021-Nov-29 at 00:22

            TL;DR: I made an experimental analysis on Numpy 1.21.1. The results show that np.sum does NOT (really) make use of SIMD instructions: no SIMD instructions are used for integers, and only scalar SIMD instructions are used for floating-point numbers! Moreover, Numpy converts smaller integer types to 64-bit values by default, to avoid overflows!

            Note that this may not reflect all Numpy versions, since there is ongoing work to provide SIMD support for commonly used functions (the not-yet-released version 1.22.0rc1 continues this long-standing work). Moreover, the compiler or processor used may significantly impact the results. The following experiments were done using a Numpy retrieved from pip, on Debian Linux with an i5-9600KF processor.

            Under the hood of np.sum

            For floating-point numbers, Numpy uses a pairwise algorithm which is known to be quite numerically stable while being relatively fast. This can be seen in the code, but also simply using a profiler: TYPE_pairwise_sum is the C function called to compute the sum at runtime (where TYPE is DOUBLE or FLOAT).

            For integers, Numpy uses a classical naive reduction. The C function called is ULONG_add_avx2 on AVX2-compatible machines. Surprisingly, it also converts items to 64-bit ones if the type is not np.int64.
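            A sketch of the measurement (numbers are machine-dependent; the point is that the smaller dtype does not help):

            import timeit

            import numpy as np

            a16 = np.ones(10_000_000, dtype=np.uint16)
            a64 = np.ones(10_000_000, dtype=np.uint64)

            # np.sum upcasts small integer inputs to a 64-bit accumulator by
            # default, so the uint16 sum does the same 64-bit adds plus conversions.
            print(a16.sum().dtype)  # typically uint64 on 64-bit Linux
            print(timeit.timeit(lambda: a16.sum(), number=10))
            print(timeit.timeit(lambda: a64.sum(), number=10))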

            The hot part of the assembly code executed by the DOUBLE_pairwise_sum function is at the source link below.

            Source https://stackoverflow.com/questions/70134026

            Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install timeit

            You can download it from GitHub.
            Rust is installed and managed by the rustup tool. Rust has a 6-week rapid release process and supports a great number of platforms, so there are many builds of Rust available at any time. Please refer to rust-lang.org for more information.

            Support

            For any new features, suggestions, or bugs, create an issue on GitHub. If you have any questions, check and ask them on the Stack Overflow community page.
            CLONE
          • HTTPS: https://github.com/gustavla/timeit.git
          • CLI: gh repo clone gustavla/timeit
          • SSH: git@github.com:gustavla/timeit.git
