cudf | cuDF - GPU DataFrame Library | GPU library

 by   rapidsai C++ Version: 23.10.0 License: Apache-2.0

kandi X-RAY | cudf Summary

cudf is a C++ library typically used in Hardware, GPU, NumPy, pandas, and Spark applications. It has no reported bugs or vulnerabilities, a permissive license, and medium support. You can download it from GitHub.

Built on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data. cuDF provides a pandas-like API that will be familiar to data engineers and data scientists, so they can use it to easily accelerate their workflows without going into the details of CUDA programming.

            Support

              cudf has a medium active ecosystem.
              It has 5565 star(s) with 701 fork(s). There are 138 watchers for this library.
              It had no major release in the last 12 months.
              There are 770 open issues and 4675 have been closed. On average issues are closed in 139 days. There are 82 open pull requests and 0 closed requests.
              It has a neutral sentiment in the developer community.
              The latest version of cudf is 23.10.0

            Quality

              cudf has 0 bugs and 0 code smells.

            Security

              cudf has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              cudf code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            License

              cudf is licensed under the Apache-2.0 License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              cudf releases are available to install and integrate.
              Installation instructions, examples and code snippets are available.
              It has 130902 lines of code, 8588 functions and 428 files.
              It has high code complexity. Code complexity directly impacts maintainability of the code.


            cudf Key Features

            No Key Features are available at this moment for cudf.

            cudf Examples and Code Snippets

            How do I install cudf using pip?
            Lines of Code: 6 | License: Strong Copyleft (CC BY-SA 4.0)
            # CUDA 9.2
            conda install -c nvidia -c rapidsai -c numba -c conda-forge -c defaults cudf
            
            # CUDA 10.0
            conda install -c nvidia/label/cuda10.0 -c rapidsai/label/cuda10.0 -c numba -c conda-forge -c defaults cudf
            
            cuDF - groupby UDF to support datetime
            Lines of Code: 32 | License: Strong Copyleft (CC BY-SA 4.0)
            import cudf
            a = cudf.DataFrame({"col1": [1, 1, 1, 2, 2, 2], "col2": [1, 2, 1, 1, 2, 1], "dt": [10000000, 2000000, 3000000, 100000, 2000000, 40000000]}) 
            a['dt'] = a['dt'].astype('datetime64[ns]')
            print(a)
            a['dt'] = a['dt'].astype('datetime
            How to do a matrix dot product between two DataFrame in the GPU with rapids.ai
            Lines of Code: 15 | License: Strong Copyleft (CC BY-SA 4.0)
            import cudf
            
            a = cudf.DataFrame([[0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.4]])
            b = cudf.DataFrame([[0.1, 0.2], [0.1, 0.2]])
            res = cudf.DataFrame.from_gpu_matrix(
                a.values.T.dot(b.values)
            )
            
            print(res)
                0   1
            0   0.02    0.04
            1   0.04
            Expected a bytes object, got a 'int' object error with cudf
            Lines of Code: 16 | License: Strong Copyleft (CC BY-SA 4.0)
            import cudf
            df1 = cudf.DataFrame({'a': [0, 1, 2, 3],'b': [0.1, 0.2, 0.3, 0.4]})
            df2 = cudf.DataFrame({'a': [4, 5, 6, 7],'b': [0.1, 0.2, None, 0.3]}) #your new elements
            df3= df1.a.append(df2.a)
            df3
            
            0    0
            1    1
            2  
            How do you determine memory stats while using rapids.ai?
            Lines of Code: 17 | License: Strong Copyleft (CC BY-SA 4.0)
            import cudf
            x = cudf.DataFrame({'x': [1, 2, 3]})
            x_usage = x.memory_usage(deep=True)
            print(x_usage)
            
            x        24
            Index     0
            dtype: int64
            
            import pynvml

            pynvml.nvmlInit()
            handle = pynvml.nv
            Convert cuDF data frame column to 1 or 0 for “true”/“false” values
            Lines of Code: 16 | License: Strong Copyleft (CC BY-SA 4.0)
            import cudf
            
            df = cudf.DataFrame({'a':[0,1,2,3,4]})
            df['new'] = df['a'] >= 3
            df['new'] = df['new'].astype('int') # could use int8, int32, or int64
            
            # could also do (df['a'] >= 3).astype('int')
            df
            
                a   new
            0   0   0
            1   1   0
            2   
            Replace values in Column C where value in Column A is x
            Lines of Code: 24 | License: Strong Copyleft (CC BY-SA 4.0)
            import cudf
            df = cudf.DataFrame({'basement_flag': [1, 1, 1, 0],
                                 'basementsqft': [400,750,500,0],
                                 'fireplace_count': [2, None, None, 1], #<-- added a None to illustrate the targeted nature of the
            Query to return record value in column instead of row?
            Lines of Code: 40 | License: Strong Copyleft (CC BY-SA 4.0)
            SELECT c.ClientID,
               c.LastName,
               c.FirstName,
               c.MiddleName,
               CASE
                   WHEN cudf.UserDefinedFieldFormatULink = '93fb3820-38aa-4655-8aad-a8dce8aede' THEN
                       cudf.UDF_ReportValue AS 'DA Status'
                   WHEN cudf.UserDefined
            Query to return record value in column instead of row?
            Lines of Code: 33 | License: Strong Copyleft (CC BY-SA 4.0)
            SELECT
            c.ClientID
            , c.LastName
            , c.FirstName
            , c.MiddleName
            , CASE WHEN cudf.UserDefinedFieldFormatULink = '93fb3820-38aa-4655-8aad-a8dce8aede' THEN cudf.UDF_ReportValue --AS 'DA Status'
                 WHEN cudf.UserDefinedFieldFormatULink = '2144a7

            Community Discussions

            QUESTION

            How to install cuDF on google colab with GPU Tesla K80?
            Asked 2022-Mar-10 at 22:05

            I have been trying to install cuDF on Google Colab for hours. One of the requirements is a Tesla T4 GPU, but Google Colab gives me a Tesla K80 every time, so I cannot install cuDF. I used this snippet of code to check which type of GPU I get each time:

            ...

            ANSWER

            Answered 2022-Mar-10 at 22:05

            The K80 uses the Kepler GPU architecture, which is not supported by RAPIDS. Colab itself can no longer run the latest versions of RAPIDS. You can try SageMaker Studio Lab for your Try it Now experience: https://github.com/rapidsai-community/rapids-smsl
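
The architecture point above can be turned into a quick compatibility check. The following is only a sketch: the compute capabilities for the K80 (3.7) and T4 (7.5) are the values NVIDIA publishes, but the minimum of 6.0 (Pascal) is an assumption for illustration, and the helper name `meets_minimum` is hypothetical. The actual minimum depends on the RAPIDS release you install.

```python
# Assumed minimum compute capability, for illustration only. Check the
# RAPIDS release notes for the value that applies to your version.
ASSUMED_MIN_COMPUTE_CAPABILITY = (6, 0)  # Pascal

# Compute capabilities of the two GPUs mentioned in the question.
KNOWN_GPUS = {
    "Tesla K80": (3, 7),  # Kepler
    "Tesla T4": (7, 5),   # Turing
}


def meets_minimum(capability, minimum=ASSUMED_MIN_COMPUTE_CAPABILITY):
    """Return True if a (major, minor) capability is at least the minimum."""
    return tuple(capability) >= tuple(minimum)


for name, cc in KNOWN_GPUS.items():
    status = "ok" if meets_minimum(cc) else "too old"
    print(f"{name}: compute capability {cc[0]}.{cc[1]} -> {status}")
```

Tuples compare element by element, so `(7, 5) >= (6, 0)` behaves as a version comparison without any string parsing.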

            Source https://stackoverflow.com/questions/71294926

            QUESTION

            Is there a way of using the entire memory of my GPU for CUML calculations?
            Asked 2022-Jan-13 at 23:57

            I am new to the RAPIDS AI world and I decided to try CUML and CUDF out for the first time. I am running Ubuntu 18.04 on WSL 2. My main OS is Windows 11. I have 64 GB of RAM and a laptop RTX 3060 6 GB GPU.

            At the time I am writing this post, I am running a TSNE fitting calculation over a CUDF dataframe composed of approximately 26 thousand values, stored in 7 columns (all the values are numerical or binary, since the categorical ones have been one-hot encoded). While classifiers like LogisticRegression or SVM were really fast, TSNE seems to be taking a while to output results (it's been more than an hour now, and it is still going on even if the Dataframe is not so big). The task manager is telling me that 100% of GPU is being used for the calculations even if, by running "nvidia-smi" on the windows powershell, the command returns that only 1.94 GB out of a total of 6 GB are currently in use. This seems odd to me since I read papers on RAPIDS AI's TSNE algorithm being 20x faster than the standard scikit-learn one.

            I wonder if there is a way of increasing the percentage of dedicated GPU memory to perform faster computations, or if it is just an issue related to WSL 2 (probably it limits the GPU usage to just 2 GB).

            Any suggestion or thoughts? Many thanks

            ...

            ANSWER

            Answered 2022-Jan-13 at 23:57

            The task manager is telling me that 100% of GPU is being used for the calculations

            I'm not sure the Windows Task Manager can tell you the GPU throughput that is being achieved for computations.

            "nvidia-smi" on the windows powershell, the command returns that only 1.94 GB out of a total of 6 GB are currently in use

            Memory utilisation is a different calculation than GPU throughput. Any GPU application will only use as much memory as is requested, and there is no correlation between higher memory usage and higher throughput, unless the application specifically mentions a way that it can achieve higher throughput by using more memory (for example, a different algorithm for the same computation may use more memory).
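
To look at memory use and GPU utilisation as two separate numbers, one option is to ask `nvidia-smi` for both in CSV form and parse the result. This is a sketch rather than part of the original answer: the helper names are hypothetical, the sample line is illustrative rather than measured, and fields that the driver reports as `[N/A]` would need extra handling.

```python
import subprocess

# Fields requested from nvidia-smi: memory in MiB, utilisation in percent.
QUERY = "memory.used,memory.total,utilization.gpu"


def parse_gpu_line(line):
    """Parse one CSV line such as '1986, 6144, 100' into a dict."""
    used, total, util = (int(field.strip()) for field in line.split(","))
    return {
        "memory_used_mib": used,
        "memory_total_mib": total,
        "utilization_pct": util,
    }


def query_gpus():
    """Run nvidia-smi and return one dict per GPU (needs an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]


# Parse a sample line so the example runs without a GPU.
sample = parse_gpu_line("1986, 6144, 100")
print(sample)
```

A reading like the sample (about a third of memory in use at 100% utilisation) is consistent with the answer's point that the two quantities are independent.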

            TSNE seems to be taking a while to output results (it's been more than an hour now, and it is still going on even if the Dataframe is not so big).

            This definitely seems odd, and not the expected behavior for a small dataset. What version of cuML are you using, and what is your method argument for the fit task? Could you also open an issue at www.github.com/rapidsai/cuml/issues with a way to access your dataset so the issue can be reproduced?

            Source https://stackoverflow.com/questions/70699806

            QUESTION

            Why do I get a CUDA memory error when using RAPIDS in WSL?
            Asked 2021-Nov-23 at 00:25

            I installed WSL 2 (5.10.60.1-microsoft-standard-WSL2) under Windows 21H2 (19044.1348) and am using NVIDIA driver 510.06 with a Pascal GPU (1070). I use the default Ubuntu version in WSL (20.04.3 LTS). I tried both the Docker and Anaconda versions. I can run the Jupyter Notebook and import the libraries. I can also create a cudf DataFrame, but writing to it or doing anything else gives a memory error.

            ...

            ANSWER

            Answered 2021-Nov-23 at 00:25

            Sadly, RAPIDS on WSL2 only runs on Pascal GPUs with RAPIDS 21.08, but not 21.10 or later. Please try 21.08. It was still experimental with those versions, so YMMV.

            Source https://stackoverflow.com/questions/70049128

            QUESTION

            what is the most efficient way to do `diff` for a `cudf`
            Asked 2021-Oct-09 at 18:44

            The rapids.ai cudf type is somewhat compatible with pandas, but here is a strange incompatibility. cudf.Series has a .diff() method, but a cudf.DataFrame does not appear to. This is super-annoying (consider, for example, a data frame of stock prices, with columns corresponding to instruments). There are, of course, kludgy ways to get around this (converting to a pandas data frame and back comes to mind), but I wonder what the canonical way is. Any advice?

            ...

            ANSWER

            Answered 2021-Oct-09 at 18:44

            cuDF Python covers a large segment of the pandas API, but there are some gaps (as you've run into here).

            Today, the easiest way to run diff on every column and return a dataframe would be the following:
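
The answer's own code is not preserved in this excerpt. As a stand-in, here is a minimal sketch of one column-by-column approach. It assumes only what the question states (that each column's Series has a `.diff()` method) plus `copy()`, `columns`, and column assignment on the frame. The function name `diff_every_column` is hypothetical, and newer cuDF releases may offer a DataFrame-level `diff` that makes this unnecessary.

```python
def diff_every_column(df, periods=1):
    """Return a new frame whose columns are the per-column Series.diff().

    `df` is expected to behave like a cudf.DataFrame (or pandas.DataFrame):
    it must provide copy(), .columns, and column get/set by name.
    """
    out = df.copy()  # leave the caller's frame untouched
    for name in df.columns:
        out[name] = df[name].diff(periods)
    return out


# Usage on a machine with cuDF installed (not executed here):
#   import cudf
#   prices = cudf.DataFrame({"AAA": [10.0, 10.5, 10.2],
#                            "BBB": [20.0, 19.5, 19.9]})
#   print(diff_every_column(prices))
```

Each column is processed on the GPU by its own `diff` call, so no round trip through a host pandas frame is needed.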

            Source https://stackoverflow.com/questions/69509360

            QUESTION

            how to use tqdm progress bar in dask_cudf and cudf
            Asked 2021-Jul-31 at 18:57

            I can use tqdm progress bar in pandas for example:

            ...

            ANSWER

            Answered 2021-Jul-31 at 18:57

            Until progress_apply is available, you would have to implement an equivalent yourself (e.g. using apply_chunks). Just a sketch of the code:
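
The answer's sketch is not preserved in this excerpt. The following shows a different, simpler technique than `apply_chunks`: a manual loop over row ranges that reports progress after each chunk. All names here are hypothetical. In practice `process_chunk` might operate on a slice such as `df.iloc[start:stop]`, and `report` could be replaced by a progress bar's update call.

```python
def chunk_bounds(n_rows, chunk_size):
    """Yield (start, stop) row ranges that together cover range(n_rows)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)


def apply_with_progress(n_rows, chunk_size, process_chunk, report=print):
    """Call process_chunk(start, stop) per chunk and report rows completed."""
    results = []
    for start, stop in chunk_bounds(n_rows, chunk_size):
        results.append(process_chunk(start, stop))
        report(f"{stop}/{n_rows} rows done")
    return results


# Stand-in work: sum the row numbers in each chunk of a 10-row table.
totals = apply_with_progress(10, 4, lambda a, b: sum(range(a, b)))
print(totals)
```

Larger chunks mean fewer progress updates but less per-chunk overhead, which matters on a GPU where each call has a fixed launch cost.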

            Source https://stackoverflow.com/questions/68603875

            QUESTION

            cuPy error : Implicit conversion to a host NumPy array via __array__ is not allowed,
            Asked 2021-Jul-27 at 13:28

            Getting this error while converting array to cuPy array: TypeError: Implicit conversion to a host NumPy array via __array__ is not allowed, To explicitly construct a GPU array, consider using cupy.asarray(...) To explicitly construct a host array, consider using .to_array()

            ...

            ANSWER

            Answered 2021-Jul-27 at 13:28

            Converting a cuDF series into a CuPy array with cupy.asarray(series) requires CuPy-compatible data types. You may want to double check your Series is int, float, or bool typed, rather than string, decimal, list, or struct typed.
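
The dtype check suggested above can be written as a small guard that runs before the conversion. This is a heuristic sketch, not cuDF's own validation: it assumes the Series dtype exposes a NumPy-style `kind` code, and the function name `looks_cupy_convertible` is hypothetical.

```python
from types import SimpleNamespace

# NumPy dtype kind codes: b=bool, i=signed int, u=unsigned int, f=float.
CONVERTIBLE_KINDS = frozenset("biuf")


def looks_cupy_convertible(series):
    """Heuristic: True if series.dtype.kind is bool, integer, or float."""
    # Assumption: dtypes without a `kind` attribute are treated as objects.
    kind = getattr(series.dtype, "kind", "O")
    return kind in CONVERTIBLE_KINDS


# Stand-ins that expose only dtype.kind, so the example runs without a GPU.
float_like = SimpleNamespace(dtype=SimpleNamespace(kind="f"))
string_like = SimpleNamespace(dtype=SimpleNamespace(kind="O"))
print(looks_cupy_convertible(float_like), looks_cupy_convertible(string_like))
```

With a real Series, the guard would sit in front of `cupy.asarray(series)` so that an unsupported column is reported by name instead of failing inside the conversion.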

            Source https://stackoverflow.com/questions/68544183

            QUESTION

            searching index with cudf dataframe doesn't work with numpy
            Asked 2021-Jul-08 at 02:31

            I just loaded the CSV file with cudf (rapidsai) to reduce the time it takes. An issue comes up when I try to search the index with a condition where df['X'] = A.

            here is my code example:

            ...

            ANSWER

            Answered 2021-Jul-08 at 02:31

            cuDF is trying to dispatch from numpy.where to cupy.where via NumPy's __array_function__ protocol. For one reason or another, cuDF is not able to successfully run the dispatched function in this case.

            In general, the recommendation would be to explicitly use CuPy rather than numpy here.

            Source https://stackoverflow.com/questions/68294729

            QUESTION

            cuDF: an alternative of Pandas Groupby + Shift?
            Asked 2021-May-28 at 00:10

            I have a DF on which I want to use Groupby + Shift. I can do this in pandas, but I cannot do it in cuDF because it is not implemented yet: see Issue #7183. The feature request was made long ago, so it seems like they will not implement this in the near future. Is there any alternative way?

            ...

            ANSWER

            Answered 2021-May-28 at 00:10

            UPDATE: RAPIDS just finished merging groupby.shift() into cudf.
            Please try it out in the 21.06 nightlies!

            Previous Post: This is currently planned to be implemented in 0.20.

            Source https://stackoverflow.com/questions/66863973

            QUESTION

            How to create unique ID column in DASK_CUDF
            Asked 2021-May-19 at 13:17

            How can I create a unique ID column in a dask_cudf dataframe across all the partitions? So far I am using the following technique, but if I increase the data to more than 10 crore (100 million) rows it gives me a memory error.

            ...

            ANSWER

            Answered 2021-May-19 at 08:43

            The reason why you are running into memory error is this step:

            Source https://stackoverflow.com/questions/67599701

            QUESTION

            CUML fit functions throwing cp.full TypeError
            Asked 2021-May-06 at 17:13

            I've been trying to run RAPIDS on Google Colab pro, and have successfully installed the cuml and cudf packages, however I am unable to run even the example scripts.

            TL;DR

            Anytime I try to run the fit function for cuml on Google Colab I get the following error. I get this when using the demo examples both for installation and then for cuml. This happens for a range of cuml examples (I first hit this trying to run UMAP).

            ...

            ANSWER

            Answered 2021-May-06 at 17:13

            Colab retains cupy==7.4.0 despite conda installing cupy==8.6.0 during the RAPIDS install. It is a custom install. I just had success pip installing cupy-cuda110==8.6.0 BEFORE installing RAPIDS, with:

            !pip install cupy-cuda110==8.6.0

            I'll be updating the script soon so that you won't have to do it manually, but want to test a few more things out. Thanks again for letting us know!

            EDIT: script updated.
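
To confirm which CuPy build the runtime actually sees before and after such a fix, the installed distribution versions can be queried with the standard library. This is a sketch: the helper name `installed_version` is hypothetical, and the two distribution names checked are the ones mentioned in the answer above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names taken from the answer; either, both, or neither
# may be present in a given environment.
for name in ("cupy", "cupy-cuda110"):
    print(name, installed_version(name))
```

This reports the versions registered with the Python environment, which can differ from a module already imported in a running notebook until the runtime is restarted.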

            Source https://stackoverflow.com/questions/67368715

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install cudf

            Please see the Demo Docker Repository, choosing a tag based on the NVIDIA CUDA version you’re running. This provides a ready to run Docker container with example notebooks and data, showcasing how you can utilize cuDF.

            Support

            Please see our guide for contributing to cuDF.
            Find more information at:

            CLONE
          • HTTPS

            https://github.com/rapidsai/cudf.git

          • CLI

            gh repo clone rapidsai/cudf

          • SSH

            git@github.com:rapidsai/cudf.git


            Explore Related Topics

            Consider Popular GPU Libraries

            taichi

            by taichi-dev

            gpu.js

            by gpujs

            hashcat

            by hashcat

            cupy

            by cupy

            EASTL

            by electronicarts

            Try Top Libraries by rapidsai

            cuml

            by rapidsai (C++)

            cugraph

            by rapidsai (Python)

            cusignal

            by rapidsai (Python)

            notebooks

            by rapidsai (Shell)

            jupyterlab-nvdashboard

            by rapidsai (Python)