perfplot | Performance analysis for Python snippets | Data Manipulation library

by nschloe | Python | Version: 0.10.2 | License: GPL-3.0

kandi X-RAY | perfplot Summary

perfplot is a Python library typically used in Utilities, Data Manipulation, and NumPy applications. perfplot has no reported bugs or vulnerabilities, a Strong Copyleft license (GPL-3.0), and high support. However, no build file is available for perfplot. You can install it with 'pip install perfplot' or download it from GitHub or PyPI.

perfplot extends Python's timeit by testing snippets with input parameters (e.g., the size of an array) and plotting the results. For example, a short script can compare different NumPy array concatenation methods; in the resulting plot, stack and vstack are clearly the best options for large arrays. By default, perfplot also asserts that the outputs of all snippets are equal. If your plot takes a while to generate, you can also use perfplot's live-plotting variant with the same arguments; it plots the updates live.

            kandi-support Support

              perfplot has a highly active ecosystem.
              It has 1159 star(s) with 61 fork(s). There are 18 watchers for this library.
              It had no major release in the last 12 months.
              There are 10 open issues and 34 have been closed. On average, issues are closed in 31 days. There is 1 open pull request and there are 0 closed pull requests.
              It has a positive sentiment in the developer community.
              The latest version of perfplot is 0.10.2

            kandi-Quality Quality

              perfplot has 0 bugs and 0 code smells.

            kandi-Security Security

              perfplot has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              perfplot code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            kandi-License License

              perfplot is licensed under the GPL-3.0 License. This license is Strong Copyleft.
              Strong Copyleft licenses enforce sharing, and you can use them when creating open source projects.

            kandi-Reuse Reuse

              perfplot releases are available to install and integrate.
              Deployable package is available in PyPI.
              perfplot has no build file. You will need to create the build yourself to build the component from source.
              Installation instructions, examples and code snippets are available.
              perfplot saves you 160 person hours of effort in developing the same functionality from scratch.
              It has 587 lines of code, 36 functions and 7 files.
              It has high code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed perfplot and identified the functions below as its top functions. This is intended to give you instant insight into the functionality perfplot implements and to help you decide whether it suits your requirements.
            • Wrapper around bench save
            • Plot the spectrum
            • Benchmark a set of kernels
            • Convert time_s
            • Save the plot
            • Plot a time series
            • Displays a benchmark
            • Show the plot

            perfplot Key Features

            No Key Features are available at this moment for perfplot.

            perfplot Examples and Code Snippets

            What is the complexity of str() function in Python3?
            Python | Lines of Code: 9 | License: Strong Copyleft (CC BY-SA 4.0)
            import perfplot
            
            perfplot.show(
                setup=lambda n: 10**n,
                kernels=[str],
                n_range=range(1, 1001, 10),
                xlabel='number of digits',
            )
            
            Filter out everything before a condition is met, keep all elements after
            Python | Lines of Code: 73 | License: Strong Copyleft (CC BY-SA 4.0)
            out = next((p[i:] for i, item in enumerate(p) if item > 18), [])
            
            [20, 13, 29, 3, 39]
            
            import perfplot
            import numpy as np
            import pandas as pd
            import random
            from itertools import dropwhile
            
            Most computationally efficient way to count consecutive repeating values
            Python | Lines of Code: 40 | License: Strong Copyleft (CC BY-SA 4.0)
            out = [ar.sum() for ar in np.split(a2, np.where(np.diff(a2.astype(int), prepend=0)==1)[0])[1:]]
            
            idx = np.where(np.diff(a2.astype(int), prepend=0)==1)[0]
            out = [len(a2[i:j][a2[i:j]]) for i,j in zip(idx, idx[1:])] + 
            Split an array according to cluster labels
            Python | Lines of Code: 54 | License: Strong Copyleft (CC BY-SA 4.0)
            def get_clusters(X, y):
                s = np.argsort(y)
                return np.split(X[s], np.unique(y[s], return_index=True)[1][1:])
            
            import numpy as np
            from typing import List
            
            def get_clusters(X: np.ndarray, y: np.ndarray) -> Li
            Group by float values by range
            Python | Lines of Code: 76 | License: Strong Copyleft (CC BY-SA 4.0)
            from sklearn.cluster import AgglomerativeClustering
            
            def quantize(df, tolerance=0.005):
                # df: DataFrame with only the column(s) to quantize
                model = AgglomerativeClustering(distance_threshold=2 * tolerance, linkage='complete',
                 
            How to add measured input values as x-axis labels in generated chart?
            Python | Lines of Code: 26 | License: Strong Copyleft (CC BY-SA 4.0)
            import numpy as np
            import perfplot
            import matplotlib.pyplot as plt
            import matplotlib.ticker as mt
            
            n_range = [16, 512, 16384, 524288, 16777216]
            
            perfplot.plot(
                setup=lambda n: np.random.rand(n),
                kernels=[
                    lambda a: np.c_[a,
            Groupby Roll up or Roll Down for any kind of aggregates
            Python | Lines of Code: 169 | License: Strong Copyleft (CC BY-SA 4.0)
            # think map-reduce: first map, then reduce (arbitrary number of times), then map to result
            
            myfuncs = {
                'sum': [sum, sum],
                'prod': ['prod', 'prod'],
                'count': ['count', sum],
                'set': [set, lambda g: set.union(*g)],
                'list'
            Best way to get split from dataframe column values
            Python | Lines of Code: 42 | License: Strong Copyleft (CC BY-SA 4.0)
            import perfplot
            import pandas as pd
            import numpy as np
            
            def list_comp(s):
                return [x.split() for x in s]
                # If you want an equality check
                #return pd.Series([x.split() for x in s], index=s.index)
            
            def series_apply(s):
                return s
            Unique combination of two columns with mixed values
            Python | Lines of Code: 37 | License: Strong Copyleft (CC BY-SA 4.0)
            import numpy as np
            
            df['key'] = np.sort(df.to_numpy(), axis=1).sum(1)
            
            #  Col1 Col2 key
            #0    a    b  ab
            #1    c    d  cd
            #2    b    a  ab
            #3    e    f  ef
            
            import perfplot
            import pandas as pd
            import numpy as np
            fro
            How to apply multiple condition on rows without changing the result from the previous condition?
            Python | Lines of Code: 71 | License: Strong Copyleft (CC BY-SA 4.0)
            splits = df['Cond'].str.rsplit("_", expand=True)
            df['New'] = np.select(
                [df['Samp'].eq('Org') | df['Samp'].eq('Sea'), df['Samp'].eq('Paid')],
                [splits[0], splits[2]]
            )
            
               0  1  2
            0  A  B  C
            1  A  B  C
            2  A  

            Community Discussions

            QUESTION

            Numpy create array from list of integers specifying the array elements
            Asked 2022-Feb-15 at 18:09

            Say we have the list a = [3,4,2,1], from a we want to obtain the following array: b = [0,0,0,1,1,1,1,2,2,3].

            I have managed to do it like this:

            ...

            ANSWER

            Answered 2022-Feb-15 at 17:38

            Try np.repeat: np.repeat([0,1,2,3], [3,4,2,1])
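As a short worked example of the suggestion above: `np.repeat` takes the values to repeat and a per-value repeat count, so the indices of `a` serve as the values and `a` itself as the counts. Using `np.arange(len(a))` in place of the literal `[0,1,2,3]` is a small generalisation added here.

```python
import numpy as np

a = [3, 4, 2, 1]

# Repeat index i exactly a[i] times.
b = np.repeat(np.arange(len(a)), a)

print(b.tolist())  # [0, 0, 0, 1, 1, 1, 1, 2, 2, 3]
```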

            Source https://stackoverflow.com/questions/71130904

            QUESTION

            What is a fast way to read a matrix from a CSV file to NumPy if the size is known in advance?
            Asked 2022-Feb-04 at 01:42

            I was tired of waiting while loading a simple distance matrix from a csv file using numpy.genfromtxt. Following another SO question, I performed a perfplot test, while including some additional methods. The results (source code at the end):

            The result for the largest input size shows that the best method is read_csv, which is this:

            ...

            ANSWER

            Answered 2022-Feb-04 at 01:42

            Parsing CSV files correctly while supporting several data types (e.g., floating-point numbers, integers, strings) and possibly ill-formed input files is clearly not easy, and doing so efficiently is actually pretty hard. Moreover, decoding UTF-8 strings is also much slower than reading ASCII strings directly. These are the reasons why most CSV libraries are pretty slow. Not to mention that wrapping a library in Python can introduce significant overhead for the input types (especially strings).

            Fortunately, if you need to read a CSV file containing a square matrix of integers that is assumed to be correctly formed, then you can write much faster, specialized code dedicated to your needs (code that does not care about floating-point numbers, strings, UTF-8, header decoding, error handling, etc.).

            That being said, any call to a basic CPython function tends to introduce a huge overhead. Even a simple call to open+read is relatively slow (binary mode is significantly faster than text mode, but unfortunately still not that fast). The trick is to use NumPy to load the whole binary file into RAM with np.fromfile. This function is extremely fast: it just reads the whole file at once, puts its binary content in a raw memory buffer, and returns a view on it. When the file is in the operating system cache or on a high-throughput NVMe SSD, it can load the file at a speed of several GiB/s.

            Once the file is loaded, you can decode it with Numba (or Cython) so that the decoding is nearly as fast as native code. Note that Numba does not support strings/bytes well or efficiently. Fortunately, np.fromfile produces a contiguous byte array, and Numba can process it very quickly. You can determine the size of the matrix by just reading the first line and counting the number of commas. Then you can fill the matrix very efficiently by decoding integers on the fly, packing them into a flattened matrix, and treating end-of-line characters as regular separators. Note that \r and \n can both appear in the file since the file is read in binary mode.

            Here is the resulting implementation:
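The answer's original code is not reproduced on this page. What follows is only a minimal sketch of the approach it describes, under the assumptions that the file holds a square matrix of non-negative integers and is well formed; the function name `read_int_matrix` is hypothetical. The loops are written in plain Python so the sketch runs without Numba; the answer's speed claims depend on compiling such loops (for example with Numba), which is not done here.

```python
import os
import tempfile

import numpy as np


def read_int_matrix(path):
    """Decode a well-formed square CSV of non-negative integers from raw bytes."""
    data = np.fromfile(path, dtype=np.uint8)  # whole file as one contiguous byte array

    # Matrix size: number of commas on the first line, plus one.
    n = 1
    for byte in data:
        if byte == ord(","):
            n += 1
        elif byte == ord("\n") or byte == ord("\r"):
            break

    out = np.zeros(n * n, dtype=np.int64)  # flattened result
    idx = 0            # next free slot in `out`
    value = 0          # integer currently being decoded
    in_number = False  # True while digits of one integer are being read
    for byte in data:
        if ord("0") <= byte <= ord("9"):
            value = value * 10 + (int(byte) - ord("0"))
            in_number = True
        elif in_number:
            # ',', '\r' and '\n' are all treated as plain separators.
            out[idx] = value
            idx += 1
            value = 0
            in_number = False
    if in_number:  # last integer when the file has no trailing newline
        out[idx] = value
    return out.reshape(n, n)


# Usage: round-trip a small matrix through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "matrix.csv")
    with open(csv_path, "wb") as f:
        f.write(b"1,20,3\r\n4,5,60\r\n7,8,9")
    matrix = read_int_matrix(csv_path)
```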

            Source https://stackoverflow.com/questions/70972526

            QUESTION

            How to add measured input values as x-axis labels in generated chart?
            Asked 2021-Oct-06 at 20:59

            I'm using perfplot to make only a few measurements. I would like to see measured input values as x-axis labels similarly to generated y-axis labels.

            Currently I see 10^2 10^3 10^4 10^5 10^6 10^7 as x-axis labels.

            I want to have 16 512 16384 524288 16777216 as x-axis labels.

            perfplot uses matplotlib internally, so I think it should be possible to achieve.

            Example code:

            ...

            ANSWER

            Answered 2021-Oct-06 at 20:59

            You can use plot instead of show to get access to the current axes object after perfplot has finished and then set ticks as needed:
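The answer's code is not shown on this page. A minimal sketch of the tick-setting step is given here, assuming the plot sits on a log-scaled matplotlib axes; `label_ticks_with_inputs` is a hypothetical helper, and a plain `loglog` curve stands in for the axes that `perfplot.plot(...)` would draw on (after that call, `plt.gca()` would supply `ax`).

```python
import matplotlib.ticker as mt
from matplotlib.figure import Figure


def label_ticks_with_inputs(ax, n_range):
    """Use the measured input sizes themselves as x-axis tick labels."""
    ax.set_xticks(n_range)  # one major tick per measured input
    ax.xaxis.set_major_formatter(mt.StrMethodFormatter("{x:.0f}"))  # 16384, not 10^4
    ax.xaxis.set_minor_locator(mt.NullLocator())  # hide log-scale minor ticks


n_range = [16, 512, 16384, 524288, 16777216]

fig = Figure()
ax = fig.subplots()
ax.loglog(n_range, [1e-8 * n for n in n_range])  # stand-in timing curve
label_ticks_with_inputs(ax, n_range)
```

With perfplot, the helper would be called between `perfplot.plot(...)` and `plt.show()`.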

            Source https://stackoverflow.com/questions/69470896

            QUESTION

            Precomputing strided access pattern to array gives worse performance?
            Asked 2021-Jun-03 at 15:36

            I have written a C extension for the numpy library which is used for computing a specific type of bincount. For lack of a better name, let's call it fast_compiled and place the method signature in numpy/core/src/multiarray/multiarraymodule.c inside array_module_methods:

            ...

            ANSWER

            Answered 2021-Jun-01 at 14:18

            fast_compiled is faster than fast_compiled_strides because it works on contiguous data known at compile time, enabling compilers to use SIMD instructions (e.g., typically SSE on x86-like platforms or NEON on ARM ones). It should also be faster because less data has to be retrieved from the L1 cache (the strided version needs more fetches due to the indirection).

            Indeed, dans[j] += weights[k] can be vectorized by loading m items of dans and m items of weights, adding the m items using one instruction, and storing the m items back in dans. This solution is efficient and cache friendly.

            dans[strides[i]] += weights[i] cannot be efficiently vectorized on most mainstream hardware. The processor needs to perform a costly gather from the memory hierarchy due to the indirection, then do the sum, and then perform a scatter store, which is also expensive. Even if strides contained contiguous indices, these instructions are generally much more expensive than loading a contiguous block of data from memory. Moreover, compilers often fail to vectorize the code or simply find that using SIMD instructions is not worthwhile in that case. As a result, the generated code is likely less efficient scalar code.

            Actually, the performance difference between the two codes should be bigger on modern processors with good compilation flags. I suspect you only use SSE on an x86 processor here, so the speed-up is theoretically close to 2, since 2 double-precision floating-point numbers can be computed at once. However, using AVX/AVX-2 would theoretically lead to a speed-up close to 4 (as 4 numbers can be computed at once). Very recent Intel processors can even compute 8 double-precision floating-point numbers at once. Note that computing single-precision floating-point numbers can also result in a theoretical 2x speed-up. The same applies to other architectures like ARM with the NEON and SVE instruction sets, or POWER. Since future processors will likely use wider SIMD registers (because of their efficiency), it is very important to write SIMD-friendly code.

            Source https://stackoverflow.com/questions/67787501

            QUESTION

            How to prevent perfplot (matplotlib) graph labels from being truncated?
            Asked 2020-Dec-26 at 13:31

            I'm working with the perfplot library (which you can pip-install) which benchmarks functions and plots their performance.

            When observing the plotted graphs, the labels are truncated. How can I prevent this?

            Here's a simple MCVE:

            ...

            ANSWER

            Answered 2020-Dec-26 at 13:31

            perfplot seems to use matplotlib for the display. According to its GitHub page, you can separate calculation and plotting, which gives you the possibility to inject automatic layout (basically plt.tight_layout()) via rcParams for this graph.

            You can add the following before your script:
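The answer's snippet is not reproduced on this page. One way to get the effect described, assuming the intent is matplotlib's `figure.autolayout` setting (which makes matplotlib apply a tight layout to each figure automatically), would be:

```python
import matplotlib

# Assumption: enabling this rcParam before any figure is created is what the
# answer means by injecting tight_layout() through rcParams.
matplotlib.rcParams["figure.autolayout"] = True
```

Because this is a global setting, it affects every figure created afterwards in the same session, not only the perfplot graph.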

            Source https://stackoverflow.com/questions/65456241

            QUESTION

            Perfplot raised a "TypeError: bench() got an unexpected keyword argument 'logx'". How to fix?
            Asked 2020-May-10 at 12:14

            After a search on SO for numpy array mixed dtype filling, I found perfplot, a nice little performance tester for numpy array filling. When the posted code answer from Nico Schlömer was run, I saw a dip in the performance chart. So I changed perfplot.show(..snippet..) to perfplot.bench(..snippet..) as suggested here and got the following error:

            ...

            ANSWER

            Answered 2020-Jan-15 at 23:56

            After a dive into perfplot's main.py, I figured out that bench() has no logx and logy **kwargs available.

            My solution:

            Source https://stackoverflow.com/questions/59761149

            QUESTION

            Create dict from a string or list
            Asked 2020-Feb-12 at 01:40
            Background

            I want to generate a hash table for a given string or a given list. The hash table treats each element as a key and its number of occurrences as the value. For instance:

            ...

            ANSWER

            Answered 2020-Feb-11 at 13:03

            The best way would be to use the built-in Counter; otherwise, you may use defaultdict, which is quite similar to your second attempt.
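Both options from the answer can be sketched as follows. The function names are illustrative, and since the asker's own attempts are not shown on this page, the `defaultdict` version is only a guess at what "similar to your second attempt" looks like.

```python
from collections import Counter, defaultdict


def count_with_counter(items):
    """Count occurrences with the standard-library Counter."""
    return dict(Counter(items))


def count_with_defaultdict(items):
    """Count occurrences manually; missing keys start at 0."""
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return dict(counts)


print(count_with_counter("hello"))             # {'h': 1, 'e': 1, 'l': 2, 'o': 1}
print(count_with_defaultdict([1, 2, 2, 3, 3, 3]))  # {1: 1, 2: 2, 3: 3}
```

Both work on any iterable of hashable elements, so the same code handles strings and lists.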

            Source https://stackoverflow.com/questions/60169387

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install perfplot

            perfplot is available from the Python Package Index, so simply run pip install perfplot.

            Support

            For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, search for existing answers or ask on the Stack Overflow community page.
            Install
          • PyPI

            pip install perfplot

          • CLONE
          • HTTPS

            https://github.com/nschloe/perfplot.git

          • CLI

            gh repo clone nschloe/perfplot

          • sshUrl

            git@github.com:nschloe/perfplot.git
