cudf | cuDF - GPU DataFrame Library | GPU library

by rapidsai C++ Version: 23.10.0 License: Apache-2.0

kandi X-RAY | cudf Summary

cudf is a C++ library typically used in hardware, GPU, NumPy, pandas, and Spark applications. kandi's analysis reports no bugs and no vulnerabilities; the library has a permissive license and medium support. You can download it from GitHub.

Built on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data. cuDF provides a pandas-like API that will be familiar to data engineers and data scientists, so they can use it to accelerate their workflows without going into the details of CUDA programming.

Support

cudf has a moderately active ecosystem.

It has 5,565 stars and 701 forks. There are 138 watchers for this library.

It had no major release in the last 12 months.

There are 770 open issues and 4675 have been closed. On average issues are closed in 139 days. There are 82 open pull requests and 0 closed requests.

It has a neutral sentiment in the developer community.

The latest version of cudf is 23.10.0

Quality

cudf has 0 bugs and 0 code smells.

Security

cudf has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

cudf code analysis shows 0 unresolved vulnerabilities.

There are 0 security hotspots that need review.

License

cudf is licensed under the Apache-2.0 License. This license is Permissive.

Permissive licenses have the fewest restrictions, and you can use them in most projects.

Reuse

cudf releases are available to install and integrate.

Installation instructions, examples and code snippets are available.

It has 130902 lines of code, 8588 functions and 428 files.

It has high code complexity. Code complexity directly impacts maintainability of the code.

cudf Key Features

No Key Features are available at this moment for cudf.

cudf Examples and Code Snippets

How do I install cudf using pip?

Lines of Code : 6

License : Strong Copyleft (CC BY-SA 4.0)

# CUDA 9.2
conda install -c nvidia -c rapidsai -c numba -c conda-forge -c defaults cudf

# CUDA 10.0
conda install -c nvidia/label/cuda10.0 -c rapidsai/label/cuda10.0 -c numba -c conda-forge -c defaults cudf

cuDF - groupby UDF to support datetime

Lines of Code : 32

License : Strong Copyleft (CC BY-SA 4.0)

import cudf
a = cudf.DataFrame({"col1": [1, 1, 1, 2, 2, 2], "col2": [1, 2, 1, 1, 2, 1], "dt": [10000000, 2000000, 3000000, 100000, 2000000, 40000000]}) 
a['dt'] = a['dt'].astype('datetime64[ns]')
print(a)
a['dt'] = a['dt'].astype('datetime

How to do a matrix dot product between two DataFrame in the GPU with rapids.ai

Lines of Code : 15

License : Strong Copyleft (CC BY-SA 4.0)

import cudf

a = cudf.DataFrame([[0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.4]])
b = cudf.DataFrame([[0.1, 0.2], [0.1, 0.2]])
res = cudf.DataFrame.from_gpu_matrix(
    a.values.T.dot(b.values)
)

print(res)
    0   1
0   0.02    0.04
1   0.04

Expected a bytes object, got a 'int' object error with cudf

Lines of Code : 16

License : Strong Copyleft (CC BY-SA 4.0)

import cudf
df1 = cudf.DataFrame({'a': [0, 1, 2, 3],'b': [0.1, 0.2, 0.3, 0.4]})
df2 = cudf.DataFrame({'a': [4, 5, 6, 7],'b': [0.1, 0.2, None, 0.3]}) #your new elements
df3= df1.a.append(df2.a)
df3

0    0
1    1
2

How do you determine memory stats while using rapids.ai?

Lines of Code : 17

License : Strong Copyleft (CC BY-SA 4.0)

import cudf
x = cudf.DataFrame({'x': [1, 2, 3]})
x_usage = x.memory_usage(deep=True)
print(x_usage)

x        24
Index     0
dtype: int64

import pynvml

pynvml.nvmlInit()
handle = pynvml.nv

Convert cuDF data frame column to 1 or 0 for “true”/“false” values

Lines of Code : 16

License : Strong Copyleft (CC BY-SA 4.0)

import cudf

df = cudf.DataFrame({'a':[0,1,2,3,4]})
df['new'] = df['a'] >= 3
df['new'] = df['new'].astype('int') # could use int8, int32, or int64

# could also do (df['a'] >= 3).astype('int')
df

    a   new
0   0   0
1   1   0
2

Replace values in Column C where value in Column A is x

Lines of Code : 24

License : Strong Copyleft (CC BY-SA 4.0)

import cudf
df = cudf.DataFrame({'basement_flag': [1, 1, 1, 0],
                     'basementsqft': [400,750,500,0],
                     'fireplace_count': [2, None, None, 1], #<-- added a None to illustrate the targeted nature of the

Query to return record value in column instead of row?

Lines of Code : 40

License : Strong Copyleft (CC BY-SA 4.0)

SELECT c.ClientID,
   c.LastName,
   c.FirstName,
   c.MiddleName,
   CASE
       WHEN cudf.UserDefinedFieldFormatULink = '93fb3820-38aa-4655-8aad-a8dce8aede' THEN
           cudf.UDF_ReportValue AS 'DA Status'
       WHEN cudf.UserDefined

Query to return record value in column instead of row?

Lines of Code : 33

License : Strong Copyleft (CC BY-SA 4.0)

SELECT
c.ClientID
, c.LastName
, c.FirstName
, c.MiddleName
, CASE WHEN cudf.UserDefinedFieldFormatULink = '93fb3820-38aa-4655-8aad-a8dce8aede' THEN cudf.UDF_ReportValue --AS 'DA Status'
     WHEN cudf.UserDefinedFieldFormatULink = '2144a7

Community Discussions

Trending Discussions on cudf

How to install cuDF on google colab with GPU Tesla K80?

Is there a way of using the entire memory of my GPU for CUML calculations?

Why do I get a CUDA memory error when using RAPIDS in WSL?

what is the most efficient way to do `diff` for a `cudf`

how to use tqdm progress bar in dask_cudf and cudf

cuPy error : Implicit conversion to a host NumPy array via __array__ is not allowed,

searching index with cudf dataframe doesn't work with numpy

cuDF: an alternative of Pandas Groupby + Shift?

How to create unique ID column in DASK_CUDF

CUML fit functions throwing cp.full TypeError

QUESTION

How to install cuDF on google colab with GPU Tesla K80?

Asked 2022-Mar-10 at 22:05

I have been trying to install cuDF on Google Colab for hours. One of the requirements is a Tesla T4 GPU, but Google Colab gives me a Tesla K80 every time, so I cannot install cuDF. I tried this snippet of code to check what type of GPU I get each time:

...

ANSWER

Answered 2022-Mar-10 at 22:05

The K80 uses the Kepler GPU architecture, which is not supported by RAPIDS. Colab itself can no longer run the latest versions of RAPIDS. You can try SageMaker Studio Lab for your Try it Now experience: https://github.com/rapidsai-community/rapids-smsl
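
To check a given GPU against that requirement, a minimal sketch follows. It assumes a minimum compute capability of 6.0 (Pascal), which is what the RAPIDS documentation listed around the time of this answer; newer releases may require more, so treat MIN_COMPUTE_CAPABILITY as an assumption to verify. The lookup table and the meets_minimum helper are illustrative names, not part of any RAPIDS API.

```python
# Assumed minimum compute capability (Pascal); verify against the RAPIDS
# release you plan to install, since newer releases may raise it.
MIN_COMPUTE_CAPABILITY = (6, 0)

# Compute capabilities of GPUs mentioned on this page.
KNOWN_GPUS = {
    "Tesla K80": (3, 7),         # Kepler
    "GeForce GTX 1070": (6, 1),  # Pascal
    "Tesla T4": (7, 5),          # Turing
}


def meets_minimum(gpu_name, minimum=MIN_COMPUTE_CAPABILITY):
    """Return True if the named GPU's compute capability is at least `minimum`."""
    return KNOWN_GPUS[gpu_name] >= minimum


for name in KNOWN_GPUS:
    status = "meets minimum" if meets_minimum(name) else "below minimum"
    print(f"{name}: {status}")
```

Meeting the compute-capability floor is necessary but not sufficient; platform constraints (such as the WSL 2 limitation discussed further down this page) can still apply.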

Source https://stackoverflow.com/questions/71294926

QUESTION

Is there a way of using the entire memory of my GPU for CUML calculations?

Asked 2022-Jan-13 at 23:57

I am new to the RAPIDS AI world and I decided to try cuML and cuDF out for the first time. I am running Ubuntu 18.04 on WSL 2. My main OS is Windows 11. I have 64 GB of RAM and a laptop RTX 3060 GPU with 6 GB of memory.

At the time of writing this post, I am running a TSNE fitting calculation over a cuDF dataframe composed of approximately 26 thousand values, stored in 7 columns (all the values are numerical or binary, since the categorical ones have been one-hot encoded). While classifiers like LogisticRegression or SVM were really fast, TSNE seems to be taking a while to output results (it's been more than an hour now, and it is still going on even though the dataframe is not so big). The task manager is telling me that 100% of GPU is being used for the calculations even if, by running "nvidia-smi" on the windows powershell, the command returns that only 1.94 GB out of a total of 6 GB are currently in use. This seems odd to me since I read papers on RAPIDS AI's TSNE algorithm being 20x faster than the standard scikit-learn one.

I wonder if there is a way of increasing the percentage of dedicated GPU memory to perform faster computations, or if it is just an issue related to WSL 2 (perhaps it limits GPU usage to just 2 GB).

Any suggestions or thoughts? Many thanks

...

ANSWER

Answered 2022-Jan-13 at 23:57

The task manager is telling me that 100% of GPU is being used for the calculations

I'm not sure the Windows Task Manager can tell you the GPU throughput that is being achieved for computations.

"nvidia-smi" on the windows powershell, the command returns that only 1.94 GB out of a total of 6 GB are currently in use

Memory utilisation is a different calculation than GPU throughput. Any GPU application will only use as much memory as is requested, and there is no correlation between higher memory usage and higher throughput, unless the application specifically mentions a way that it can achieve higher throughput by using more memory (for example, a different algorithm for the same computation may use more memory).

TSNE seems to be taking a while to output results (it's been more than an hour now, and it is still going on even though the dataframe is not so big).

This definitely seems odd, and not the expected behavior for a small dataset. What version of cuML are you using, and what is your method argument for the fit task? Could you also open an issue at www.github.com/rapidsai/cuml/issues with a way to access your dataset so the issue can be reproduced?

Source https://stackoverflow.com/questions/70699806

QUESTION

Why do I get a CUDA memory error when using RAPIDS in WSL?

Asked 2021-Nov-23 at 00:25

I installed WSL 2 (5.10.60.1-microsoft-standard-WSL2) under Windows 21H2 (19044.1348) and am using NVIDIA driver 510.06 with a Pascal GPU (1070). I use the default Ubuntu version in WSL (20.04.3 LTS). I tried both the Docker and Anaconda versions. I can run the Jupyter Notebook and import the libraries. I can also create a cuDF DataFrame, but writing to it or doing anything else gives a memory error.

...

ANSWER

Answered 2021-Nov-23 at 00:25

Sadly, RAPIDS on WSL2 only runs on Pascal GPUs with RAPIDS 21.08, but not 21.10 or later. Please try 21.08. It was still experimental with those versions, so YMMV.

Source https://stackoverflow.com/questions/70049128

QUESTION

what is the most efficient way to do `diff` for a `cudf`

Asked 2021-Oct-09 at 18:44

The rapids.ai cudf type is somewhat compatible with pandas, but here is a strange incompatibility. cudf.Series has a .diff() method, but a cudf.DataFrame does not appear to. This is super-annoying (consider, for example, a data frame of stock prices, with columns corresponding to instruments). There are, of course, kludgy ways to get around this (converting to a pandas data frame and back comes to mind), but I wonder what the canonical way is. Any advice?

...

ANSWER

Answered 2021-Oct-09 at 18:44

cuDF Python covers a large segment of the pandas API, but there are some gaps (as you've run into here).

Today, the easiest way to run diff on every column and return a dataframe would be the following:
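
The snippet from the original answer is not reproduced on this page. As a stand-in sketch (not necessarily the answer's code), one option is to call Series.diff() on each column and rebuild a frame from the results. diff_all_columns is a hypothetical helper name; newer cuDF releases may also provide DataFrame.diff directly, so check your installed version first.

```python
def diff_all_columns(df, periods=1):
    """Return a frame of the same type whose columns are each column's diff."""
    # type(df) is cudf.DataFrame when df is a cuDF frame, so the result
    # is built from GPU Series and stays on the GPU.
    return type(df)({name: df[name].diff(periods) for name in df.columns})


def demo():
    """Example call; needs cuDF and a supported GPU, so it is not run on import."""
    import cudf

    prices = cudf.DataFrame({"AAA": [10.0, 10.5, 10.2], "BBB": [20.0, 19.0, 21.5]})
    print(diff_all_columns(prices))
```

Because the helper only relies on the column-access and Series.diff parts of the pandas-style API, it avoids the round trip through a pandas data frame that the question mentions.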

Source https://stackoverflow.com/questions/69509360

QUESTION

how to use tqdm progress bar in dask_cudf and cudf

Asked 2021-Jul-31 at 18:57

I can use a tqdm progress bar in pandas, for example:

...

ANSWER

Answered 2021-Jul-31 at 18:57

Until progress_apply is available, you would have to implement an equivalent yourself (e.g. using apply_chunks). Just a sketch of the code:
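
The answer's own sketch is not reproduced on this page. The following is a different, simpler stand-in: instead of apply_chunks, it slices the frame into row ranges with iloc and advances a progress bar once per slice. chunk_bounds and apply_with_progress are hypothetical helper names, and the default chunk size is an arbitrary assumption.

```python
def chunk_bounds(n_rows, chunk_size):
    """Yield (start, stop) row ranges that cover n_rows in order."""
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)


def apply_with_progress(df, func, chunk_size=100_000, progress=None):
    """Call func on successive row slices of df, one progress tick per slice.

    `progress` is any callable that wraps an iterable (tqdm.tqdm fits);
    when it is None, no bar is shown.
    """
    bounds = list(chunk_bounds(len(df), chunk_size))
    iterator = progress(bounds) if progress is not None else bounds
    return [func(df.iloc[start:stop]) for start, stop in iterator]


def demo():
    """Example call; needs cuDF, tqdm, and a supported GPU, so it is not run on import."""
    import cudf
    from tqdm import tqdm

    gdf = cudf.DataFrame({"x": list(range(10))})
    parts = apply_with_progress(gdf, lambda part: part["x"] * 2, chunk_size=4, progress=tqdm)
    print(cudf.concat(parts))
```

The bar only updates between chunks, so the chunk size trades progress granularity against per-call overhead on the GPU.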

Source https://stackoverflow.com/questions/68603875

QUESTION

cuPy error : Implicit conversion to a host NumPy array via __array__ is not allowed,

Asked 2021-Jul-27 at 13:28

Getting this error while converting an array to a CuPy array: TypeError: Implicit conversion to a host NumPy array via __array__ is not allowed, To explicitly construct a GPU array, consider using cupy.asarray(...) To explicitly construct a host array, consider using .to_array()

...

ANSWER

Answered 2021-Jul-27 at 13:28

Converting a cuDF series into a CuPy array with cupy.asarray(series) requires CuPy-compatible data types. You may want to double check your Series is int, float, or bool typed, rather than string, decimal, list, or struct typed.
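
A small guard can make that dtype check explicit before the conversion. This is a sketch under assumptions: to_cupy is a hypothetical helper name, and it relies on the Series dtype exposing a NumPy-style kind code (treated as "O" when absent).

```python
def to_cupy(series):
    """Convert a bool/int/float cuDF Series to a CuPy array, failing early otherwise."""
    # NumPy-style kind codes: b=bool, i=signed int, u=unsigned int, f=float.
    kind = getattr(series.dtype, "kind", "O")
    if kind not in "biuf":
        raise TypeError(
            f"dtype {series.dtype} has no CuPy equivalent; cast or encode the column first"
        )
    import cupy  # imported lazily so the dtype check itself needs no GPU

    return cupy.asarray(series)
```

Calling to_cupy on a string, decimal, list, or struct column then raises a TypeError that names the offending dtype instead of the less direct implicit-conversion message.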

Source https://stackoverflow.com/questions/68544183

QUESTION

searching index with cudf dataframe doesn't work with numpy

Asked 2021-Jul-08 at 02:31

I just loaded the CSV file with cudf (rapidsai) to reduce the time it takes. An issue comes up when I try to search the index with a condition where df['X'] = A.

Here is my code example:

...

ANSWER

Answered 2021-Jul-08 at 02:31

cuDF is trying to dispatch from numpy.where to cupy.where via the array function protocol. For one reason or another, cuDF is not able to successfully run the dispatched function in this case.

In general, the recommendation would be to explicitly use CuPy rather than numpy here.
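
As an illustration of that recommendation, the sketch below takes the array module as a parameter so that passing cupy keeps the whole lookup on the GPU. positions_where_equal and the xp parameter name are assumptions for this example, not code from the original answer.

```python
def positions_where_equal(column, target, xp):
    """Return the positions where `column == target`, using array module `xp`."""
    values = xp.asarray(column)  # with xp=cupy this is a device array
    # where() returns a tuple with one index array per axis; take axis 0.
    return xp.where(values == target)[0]


def demo():
    """Example call; needs cuDF, CuPy, and a supported GPU, so it is not run on import."""
    import cupy as cp
    import cudf

    df = cudf.DataFrame({"X": [3, 7, 3, 9]})
    print(positions_where_equal(df["X"], 3, cp))
```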

Source https://stackoverflow.com/questions/68294729

QUESTION

cuDF: an alternative of Pandas Groupby + Shift?

Asked 2021-May-28 at 00:10

I have a DF on which I want to use Groupby + Shift. I can do this in pandas, but I cannot do it in cuDF because it is not implemented yet: see Issue #7183. The feature request was made long ago, so it seems like they will not implement this in the near future. Is there any alternative way?

...

ANSWER

Answered 2021-May-28 at 00:10

UPDATE: RAPIDS just finished merging groupby.shift() into cudf.
Please try it out in the 21.06 nightlies!

Previous Post: This is currently planned to be implemented in 0.20.

Source https://stackoverflow.com/questions/66863973

QUESTION

How to create unique ID column in DASK_CUDF

Asked 2021-May-19 at 13:17

How can I create a unique ID column in a dask_cudf dataframe across all the partitions? So far I am using the following technique, but if I increase the data to more than 10 crore (100 million) rows, it gives me a memory error.

...

ANSWER

Answered 2021-May-19 at 08:43

The reason you are running into a memory error is this step:

Source https://stackoverflow.com/questions/67599701

QUESTION

CUML fit functions throwing cp.full TypeError

Asked 2021-May-06 at 17:13

I've been trying to run RAPIDS on Google Colab pro, and have successfully installed the cuml and cudf packages, however I am unable to run even the example scripts.

TLDR;

Anytime I try to run the fit function for cuml on Google Colab I get the following error. I get this when using the demo examples both for installation and then for cuml. This happens for a range of cuml examples (I first hit this trying to run UMAP).

...

ANSWER

Answered 2021-May-06 at 17:13

Colab retains cupy==7.4.0 despite conda installing cupy==8.6.0 during the RAPIDS install. It is a custom install. I just had success pip installing cupy-cuda110==8.6.0 BEFORE installing RAPIDS, with:

!pip install cupy-cuda110==8.6.0

I'll be updating the script soon so that you won't have to do it manually, but want to test a few more things out. Thanks again for letting us know!

EDIT: script updated.

Source https://stackoverflow.com/questions/67368715

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install cudf

Please see the Demo Docker Repository, choosing a tag based on the NVIDIA CUDA version you’re running. This provides a ready-to-run Docker container with example notebooks and data, showcasing how you can use cuDF.