blazingsql | GPU-accelerated SQL engine | GPU library

by BlazingDB | C++ | Version: v21.08.00 | License: Apache-2.0

kandi X-RAY | blazingsql Summary

blazingsql is a C++ library typically used in hardware, GPU, and Spark applications. It has no reported bugs or vulnerabilities, a permissive license, and medium support. You can download it from GitHub.

BlazingSQL is a lightweight, GPU-accelerated SQL engine built on top of the RAPIDS.ai ecosystem. RAPIDS is based on the Apache Arrow columnar memory format, and cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data. BlazingSQL is a SQL interface for cuDF, with various features to support large-scale data science workflows and enterprise datasets. Try our 5-minute Welcome Notebook to start using BlazingSQL and RAPIDS AI.

            Support

              blazingsql has a moderately active ecosystem.
              It has 1808 stars and 172 forks. There are 55 watchers for this library.
              It had no major release in the last 12 months.
              There are 129 open issues and 586 closed issues. On average, issues are closed in 42 days. There are 17 open pull requests and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of blazingsql is v21.08.00.

            Quality

              blazingsql has no bugs reported.

            Security

              blazingsql has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

            License

              blazingsql is licensed under the Apache-2.0 License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              blazingsql releases are available to install and integrate.
              Installation instructions, examples and code snippets are available.

            Community Discussions

            QUESTION

            CUML fit functions throwing cp.full TypeError
            Asked 2021-May-06 at 17:13

            I've been trying to run RAPIDS on Google Colab Pro and have successfully installed the cuml and cudf packages; however, I am unable to run even the example scripts.

            TL;DR:

            Any time I call a cuml fit function on Google Colab, I get the following error. It happens with the demo examples for both the installation and cuml itself, and across a range of cuml examples (I first hit it trying to run UMAP).

            ...

            ANSWER

            Answered 2021-May-06 at 17:13

            Colab keeps its own custom install of cupy==7.4.0, even though conda installs cupy==8.6.0 during the RAPIDS install. I just had success pip installing cupy-cuda110==8.6.0 BEFORE installing RAPIDS, with:

            !pip install cupy-cuda110==8.6.0

            I'll be updating the script soon so that you won't have to do this manually, but I want to test a few more things first. Thanks again for letting us know!

            EDIT: script updated.
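The workaround above can be checked from inside the notebook. The sketch below is a hypothetical helper, not part of the RAPIDS Colab install script: it reads installed package metadata with Python's standard importlib.metadata and reports whether the cupy the runtime sees is older than 8.6.0. The distribution names cupy-cuda110 and cupy and the 8.6.0 threshold come from the answer; the helper names are made up for illustration.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def find_cupy_version(candidates=("cupy-cuda110", "cupy")):
    """Return (name, version) for the first cupy distribution found, else (None, None)."""
    for name in candidates:
        version = installed_version(name)
        if version is not None:
            return name, version
    return None, None


def parse_version(text):
    """Turn a plain numeric release such as '8.6.0' into (8, 6, 0).

    Simplification: pre-release or post-release tags such as '9.0.0rc1'
    are not handled by this hypothetical helper.
    """
    return tuple(int(part) for part in text.split("."))


def needs_cupy_fix(installed, required="8.6.0"):
    """True if cupy is missing or older than the required release."""
    if installed is None:
        return True
    return parse_version(installed) < parse_version(required)


if __name__ == "__main__":
    name, version = find_cupy_version()
    if needs_cupy_fix(version):
        print(f"cupy needs attention (found {name} {version}); "
              "try: pip install cupy-cuda110==8.6.0 before installing RAPIDS")
    else:
        print(f"{name} {version} looks fine")
```

Reading package metadata rather than importing cupy keeps the check usable even when the cupy import itself is what fails.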

            Source https://stackoverflow.com/questions/67368715

            QUESTION

            ERROR: No matching distribution found for blazingsql
            Asked 2020-Sep-08 at 19:29

            I was trying to pip install blazingsql. However, I keep getting the following error:

            ...

            ANSWER

            Answered 2020-Sep-08 at 19:29

            There is no pip package for BlazingSQL or the rest of the RAPIDS ecosystem.

            I just tried the exact same command, conda install -c blazingsql/label/cuda10.2 -c blazingsql -c rapidsai -c nvidia -c conda-forge -c defaults blazingsql python=3.7, in a newly created conda environment and had no issues. There is currently no prescribed way to install it outside a conda environment, though people have found workarounds for other architectures such as PowerPC.

            Can you submit an issue on GitHub (https://github.com/blazingdb/blazingsql/issues) with more information about your OS, driver version, and CUDA version?

            Source https://stackoverflow.com/questions/63740193
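Because BlazingSQL is installed only through conda, it can help to generate the install command for a given CUDA and Python version instead of retyping it. This is a hypothetical helper sketch that only builds the command and does not run conda. The channel order is copied from the command in the answer, the version sets come from the install notes on this page, and the assumption that a blazingsql/label/cudaX.Y channel exists for every supported CUDA version (not only 10.2) is unverified.

```python
import shlex

# Supported versions as listed in this page's install notes.
SUPPORTED_CUDA = ("10.1", "10.2", "11.0")
SUPPORTED_PYTHON = ("3.7", "3.8")


def conda_install_args(cuda_version, python_version):
    """Build the conda install command from the answer as an argument list.

    Assumption: the "blazingsql/label/cuda<version>" channel naming seen for
    CUDA 10.2 also applies to the other supported CUDA versions.
    """
    if cuda_version not in SUPPORTED_CUDA:
        raise ValueError(f"unsupported CUDA version: {cuda_version!r}")
    if python_version not in SUPPORTED_PYTHON:
        raise ValueError(f"unsupported Python version: {python_version!r}")
    # Channel priority follows the order used in the answer's command.
    channels = [
        f"blazingsql/label/cuda{cuda_version}",
        "blazingsql",
        "rapidsai",
        "nvidia",
        "conda-forge",
        "defaults",
    ]
    args = ["conda", "install"]
    for channel in channels:
        args += ["-c", channel]
    args += ["blazingsql", f"python={python_version}"]
    return args


if __name__ == "__main__":
    # Reproduces the command quoted in the answer (CUDA 10.2, Python 3.7).
    print(shlex.join(conda_install_args("10.2", "3.7")))
```

Passing the list to subprocess.run(args, check=True) inside an activated conda environment would execute it; printing it first lets you compare it with the documented command.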

            QUESTION

            What is the relationship between BlazingSQL and dask?
            Asked 2020-Jan-18 at 04:54

            I'm trying to understand if BlazingSQL is a competitor or complementary to dask.

            I have some medium-sized data (10-50GB) saved as parquet files on Azure blob storage.

            IIUC, I can query, join, aggregate, and groupby with BlazingSQL using SQL syntax, but I can also read the data into cuDF using dask_cudf and do all the same operations using Python/dataframe syntax.

            So, it seems to me that they're direct competitors?

            Is it correct that (one of) the benefits of using dask is that it can operate on partitions so can operate on datasets larger than GPU memory whereas BlazingSQL is limited to what can fit on the GPU?

            Why would one choose to use BlazingSQL rather than dask?

            Edit:
            The docs talk about dask_cudf, but the actual repo is archived, saying that Dask support is now in cuDF itself. It would be good to know how to leverage Dask to operate on larger-than-GPU-memory datasets with cuDF.

            ...

            ANSWER

            Answered 2020-Jan-18 at 04:54

            Full disclosure: I'm a co-founder of BlazingSQL.

            BlazingSQL and Dask are not competitors; in fact, you need Dask to use BlazingSQL in a distributed context. All distributed BlazingSQL queries return dask_cudf result sets, so you can continue operating on those results in Python/dataframe syntax. To your point, you are correct on two counts:

            1. BlazingSQL is currently limited to GPU memory, plus some system memory through CUDA's Unified Virtual Memory. That will change soon: we are targeting around v0.13, which is scheduled for an early March release. From that release on, memory will spill over and cache to system memory, local drives, or even our supported storage plugins such as AWS S3, Google Cloud Storage, and HDFS.
            2. You can certainly write SQL operations as dask_cudf functions, but it is then up to the user to know all of those functions and to optimize how they use them. SQL has several benefits: it is more accessible (more people know it, and it is very easy to learn), and there is a great deal of research on optimizing SQL (cost-based optimizers, for example) for running queries at scale.

            If you want to make RAPIDS accessible to more users, SQL is a fairly easy onboarding path, and it is easier to optimize because SQL operations have a much narrower scope than Dask, which has many other considerations.

            Source https://stackoverflow.com/questions/59797206
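To make point 2 concrete, here is the same group-by aggregation written once as SQL and once as hand-written dataframe-style steps. This is a CPU-only sketch: Python's built-in sqlite3 stands in for BlazingSQL and a plain dictionary stands in for a cuDF or dask_cudf group-by, so none of it is BlazingSQL's API, and the table, column names, and rows are made up for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, amount) rows.
ROWS = [("east", 10.0), ("west", 5.0), ("east", 2.5), ("north", 7.0), ("west", 1.0)]


def total_by_region_sql(rows):
    """Declarative version: the SQL engine decides how to execute the group-by."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cur = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
        )
        return cur.fetchall()
    finally:
        conn.close()


def total_by_region_manual(rows):
    """Imperative version: the user spells out the grouping and ordering steps."""
    totals = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return sorted(totals.items())


if __name__ == "__main__":
    print(total_by_region_sql(ROWS))
    print(total_by_region_manual(ROWS))
```

Both functions return [('east', 12.5), ('north', 7.0), ('west', 6.0)]. The SQL version leaves the execution plan to the engine, while the manual version fixes it in user code, which mirrors the answer's point about who carries the optimization burden.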

            Community Discussions and Code Snippets include content from the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install blazingsql

            BlazingSQL can be installed with conda (Miniconda or the full Anaconda distribution) from the blazingsql channel.
            Here $CUDA_VERSION is 10.1, 10.2, or 11.0, and $PYTHON_VERSION is 3.7 or 3.8. For example, for CUDA 10.1 and Python 3.7:
            The build process checks out the BlazingSQL repository, then builds and installs it into the conda environment. NOTE: Run ./build.sh -h to see more build options. $CONDA_PREFIX now contains a folder for the blazingsql repository.
            For the nightly version, only CUDA 11+ is supported; see https://github.com/rapidsai/cudf#cudagpu-requirements.
            The build process checks out the BlazingSQL repository, then builds and installs it into the conda environment. NOTE: Run ./build.sh -h to see more build options. NOTE: You can run static analysis with cppcheck using the command cppcheck --project=compile_commands.json in any of the C++ project build directories. $CONDA_PREFIX now contains a folder for the blazingsql repository.
            If you disable the storage plugins, you don't need to install mysql-connector-cpp=8.0.23, libpq=13, or sqlite=3 (or any of their dependencies).
            Currently only MySQL is supported, but PostgreSQL and SQLite will be ready in the next version.

            Support

            You can find our full documentation at docs.blazingdb.com.