blazingsql | GPU accelerated SQL engine | GPU library
kandi X-RAY | blazingsql Summary
BlazingSQL is a lightweight, GPU accelerated SQL engine built on the RAPIDS.ai ecosystem. RAPIDS is based on the Apache Arrow columnar memory format, and cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data. BlazingSQL provides a SQL interface for cuDF, with various features to support large-scale data science workflows and enterprise datasets. Try the 5-minute Welcome Notebook to start using BlazingSQL and RAPIDS.
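A minimal usage sketch of the BlazingContext workflow this describes; the table name, columns, and data below are illustrative:
from blazingsql import BlazingContext
import cudf

bc = BlazingContext()

# build a small cuDF DataFrame with illustrative data
df = cudf.DataFrame({"id": [1, 2, 3], "value": [10.0, 20.0, 30.0]})

# register the DataFrame as a SQL table and query it; bc.sql returns a cuDF DataFrame
bc.create_table("my_table", df)
result = bc.sql("SELECT id, SUM(value) AS total FROM my_table GROUP BY id")
print(result)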
Community Discussions
Trending Discussions on blazingsql
QUESTION
I've been trying to run RAPIDS on Google Colab Pro and have successfully installed the cuml and cudf packages; however, I am unable to run even the example scripts.
TL;DR: Anytime I try to run the fit function for cuml on Google Colab I get the following error. I get this when using the demo examples, both for installation and then for cuml. This happens for a range of cuml examples (I first hit this trying to run UMAP).
ANSWER
Answered 2021-May-06 at 17:13
Colab retains cupy==7.4.0 despite conda installing cupy==8.6.0 during the RAPIDS install. It is a custom install. I just had success pip installing cupy-cuda110==8.6.0 BEFORE installing RAPIDS, with:
!pip install cupy-cuda110==8.6.0
I'll be updating the script soon so that you won't have to do it manually, but I want to test a few more things out. Thanks again for letting us know!
EDIT: script updated.
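A hedged sketch of that workaround as Colab notebook cells; the RAPIDS install itself is only referenced in a comment, since the exact script varies:
# Colab cell: pin the CUDA 11.0 build of CuPy BEFORE installing RAPIDS
!pip install cupy-cuda110==8.6.0

# ...run the usual RAPIDS conda/script install here...

# afterwards, confirm which CuPy is actually importable
import cupy
print(cupy.__version__)  # expect 8.6.0 rather than Colab's preinstalled 7.4.0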
QUESTION
I was trying to pip install blazingsql; however, I kept getting the following error:
ANSWER
Answered 2020-Sep-08 at 19:29
There is no pip package for BlazingSQL or the rest of the RAPIDS ecosystem.
I just tried the exact same command:
conda install -c blazingsql/label/cuda10.2 -c blazingsql -c rapidsai -c nvidia -c conda-forge -c defaults blazingsql python=3.7
in a newly created conda environment and had no issues. There is currently no prescribed way of doing this outside of a conda environment, though people have figured out workarounds for different architectures like PowerPC.
Can you submit an issue on GitHub (https://github.com/blazingdb/blazingsql/issues) and provide more information about the OS, driver version, and your CUDA version?
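A hedged sketch of the conda route described above; the environment name bsql is illustrative, the install commands appear as comments, and the Python lines are just a sanity check:
# conda create -n bsql python=3.7
# conda activate bsql
# conda install -c blazingsql/label/cuda10.2 -c blazingsql -c rapidsai -c nvidia -c conda-forge -c defaults blazingsql python=3.7

# once the environment is active, verify the import works
from blazingsql import BlazingContext
bc = BlazingContext()  # should initialize without errors if the install succeeded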
QUESTION
I'm trying to understand if BlazingSQL is a competitor or complementary to dask.
I have some medium-sized data (10-50GB) saved as parquet files on Azure blob storage.
IIUC I can query, join, aggregate, and group by with BlazingSQL using SQL syntax, but I can also read the data into cuDF using dask_cudf and do all the same operations using python/dataframe syntax.
So, it seems to me that they're direct competitors?
Is it correct that (one of) the benefits of using dask is that it can operate on partitions so can operate on datasets larger than GPU memory whereas BlazingSQL is limited to what can fit on the GPU?
Why would one choose to use BlazingSQL rather than dask?
Edit:
The docs talk about dask_cudf, but the actual repo is archived, saying that dask support is now in cudf itself. It would be good to know how to leverage dask to operate on larger-than-GPU-memory datasets with cudf.
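To make the question concrete, here is a hedged dask_cudf sketch of reading partitioned parquet from Azure Blob Storage; the abfs:// path, account name, and credentials are illustrative, and the adlfs filesystem package is assumed to be installed:
import dask_cudf

# illustrative container path and credentials; adlfs provides the abfs:// protocol
ddf = dask_cudf.read_parquet(
    "abfs://mycontainer/data/*.parquet",
    storage_options={"account_name": "myaccount", "account_key": "<key>"},
)

# python/dataframe syntax on a partitioned, larger-than-GPU-memory dataset
row_counts = ddf.groupby("some_column").size().compute()
print(row_counts)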
ANSWER
Answered 2020-Jan-18 at 04:54
Full disclosure: I'm a co-founder of BlazingSQL.
BlazingSQL and Dask are not competitive; in fact, you need Dask to use BlazingSQL in a distributed context. All distributed BlazingSQL results return dask_cudf result sets, so you can then continue operations on said results in python/dataframe syntax. To your point, you are correct on two counts:
- BlazingSQL is currently limited to GPU memory, plus some system memory by leveraging CUDA's Unified Virtual Memory. That will change soon; we are estimating around v0.13, which is scheduled for an early March release. Upon that release, memory will spill off and cache to system memory, local drives, or even our supported storage plugins such as AWS S3, Google Cloud Storage, and HDFS.
- You can totally write SQL operations as dask_cudf functions, but it is incumbent on the user to know all of those functions and optimize their usage of them. SQL has a variety of benefits in that it is more accessible (more people know it, and it's very easy to learn), and there is a great deal of research around optimizing SQL (cost-based optimizers, for example) for running queries at scale.
If you wish to make RAPIDS accessible to more users, SQL is a pretty easy onboarding process, and it's very easy to optimize for because of the reduced scope necessary to optimize SQL operations compared to Dask, which has many other considerations.
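A hedged sketch of the pattern the answer describes: run BlazingSQL on top of a Dask CUDA cluster so that bc.sql returns a dask_cudf result you can keep manipulating in dataframe syntax. The cluster setup, network interface, table name, and parquet path are illustrative:
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
from blazingsql import BlazingContext

cluster = LocalCUDACluster()
client = Client(cluster)

# passing a Dask client makes BlazingSQL run distributed and return dask_cudf results
bc = BlazingContext(dask_client=client, network_interface="lo")

bc.create_table("taxi", "/data/taxi/*.parquet")  # illustrative parquet path
result = bc.sql("SELECT passenger_count, COUNT(*) AS n FROM taxi GROUP BY passenger_count")

# result is a dask_cudf DataFrame; continue in python/dataframe syntax
print(result.compute())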
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install blazingsql
BlazingSQL can be installed with conda (Miniconda, or the full Anaconda distribution) from the blazingsql channel, where $CUDA_VERSION is 10.1, 10.2, or 11.0 and $PYTHON_VERSION is 3.7 or 3.8. For example, for CUDA 10.1 and Python 3.7:
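A hedged reconstruction of that command (the channel list and pins can vary by release, so treat it as a sketch rather than the canonical install line):
conda install -c blazingsql -c rapidsai -c nvidia -c conda-forge -c defaults blazingsql python=3.7 cudatoolkit=10.1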
To build from source, the build process will check out the BlazingSQL repository, then build and install into the conda environment; afterwards, $CONDA_PREFIX contains a folder for the blazingsql repository. NOTE: Run ./build.sh -h to see more build options. NOTE: You can perform static analysis with cppcheck using the command cppcheck --project=compile_commands.json in any of the cpp project build directories.
For nightly versions, only CUDA 11+ is supported; see https://github.com/rapidsai/cudf#cudagpu-requirements.
By disabling the storage plugins, you don't need to install mysql-connector-cpp=8.0.23, libpq=13, or sqlite=3 (nor any of their dependencies).
Currently only MySQL is supported, but PostgreSQL and SQLite will be ready in the next version!