tpcds-kit | TPC-DS benchmark kit with some modifications/fixes | SQL Database library
kandi X-RAY | tpcds-kit Summary
TPC-DS benchmark kit with some modifications/fixes
tpcds-kit Key Features
tpcds-kit Examples and Code Snippets
Community Discussions
Trending Discussions on tpcds-kit
QUESTION
When I run queries on external Parquet tables in Snowflake, the queries are orders of magnitude slower than on the same tables copied into Snowflake, or than on any other cloud data warehouse I have tested against the same files.
Context:
I have tables belonging to the 10TB TPC-DS dataset in Parquet format on GCS and a Snowflake account in the same region (US Central). I have loaded those tables into Snowflake using create as select. I can run TPC-DS queries (here #28) on these internal tables with excellent performance. I was also able to query those files on GCS directly with data lake engines with excellent performance, as the files are "optimally" sized and internally sorted. However, when I query the same external tables on Snowflake, the query does not seem to finish in a reasonable time (>4 minutes and counting, as opposed to 30 seconds, on the same virtual warehouse). Looking at the query profile, it seems that the number of records read in the table scans keeps growing indefinitely, resulting in a proportional amount of spilling to disk.
The table happens to be partitioned, but it does not matter for the query of interest (which I tested with other engines).
What I would expect:
Assuming proper data "formatting", I would expect no major performance degradation compared to internal tables, as the setup is technically the same - data stored in columnar format in a cloud object store - and Snowflake advertises it as such. For example, I saw no performance degradation with BigQuery in the exact same experiment.
Other than double-checking my setup, I don't see many things to try...
This is what the "in progress" part of the plan looks like 4 minutes into execution on the external table. All other operators are at 0% progress. You can see that external bytes scanned = bytes spilled and that 26G(!) rows have been produced. And this is what it looked like on a finished execution against the internal table, which completed in ~20 seconds. You can see that the left-most table scan should produce 1.4G rows, but it had already produced 23G rows with the external table.
This is a sample of the DDL I used (I also tested without defining the partitioning column):
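The asker's actual DDL is truncated in this excerpt. As a rough illustration only, a Snowflake external table over Parquet files staged on GCS is typically declared along the following lines; the stage, table, and column names here are hypothetical, not the asker's:

    -- Illustrative sketch (not the asker's DDL): external table over Parquet
    -- on a GCS stage, with columns projected from the Parquet VALUE variant.
    -- Stage and column names are hypothetical.
    CREATE OR REPLACE EXTERNAL TABLE ext_store_sales (
        ss_sold_date_sk NUMBER      AS (value:ss_sold_date_sk::NUMBER),
        ss_item_sk      NUMBER      AS (value:ss_item_sk::NUMBER),
        ss_customer_sk  NUMBER      AS (value:ss_customer_sk::NUMBER),
        ss_quantity     NUMBER      AS (value:ss_quantity::NUMBER),
        ss_list_price   NUMBER(7,2) AS (value:ss_list_price::NUMBER(7,2))
    )
    LOCATION = @tpcds_gcs_stage/store_sales/
    FILE_FORMAT = (TYPE = PARQUET)
    AUTO_REFRESH = FALSE;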
...ANSWER
Answered 2022-Jan-18 at 12:20
Probably the Snowflake plan assumes it must read every Parquet file, because it cannot tell beforehand whether the files are sorted, or what the number of unique values, null counts, and minimum and maximum values are for each column.
This information is stored in optional fields of the Parquet format, but you'd need to read the Parquet metadata first to find out.
When Snowflake uses internal tables, it has full control over storage, and it has information about indexes (if any) and column statistics, so it knows how to optimize a query from both a logical and a physical perspective.
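For comparison, the internal tables mentioned in the question were created with create as select. A minimal sketch of that step (table names hypothetical, continuing the example above) looks like this:

    -- Minimal sketch: materialize the external table into a native Snowflake
    -- table, so Snowflake manages the storage and collects micro-partition
    -- metadata and column statistics it can use for pruning.
    CREATE OR REPLACE TABLE store_sales AS
    SELECT *
    FROM ext_store_sales;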
QUESTION
I'm trying to build the TPC-DS benchmark datasets by following this website:
https://xuechendi.github.io/2019/07/12/Prepare-TPCDS-For-Spark
When I run this:
...ANSWER
Answered 2020-Mar-29 at 08:29
Could not find dsdgen at /home/troberts/spark-sql-perf/tpcds-kit/tools/dsdgen or //home/troberts/spark-sql-perf/tpcds-kit/tools/dsdgen
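Only the error message survives in this extract; it typically means the dsdgen binary has not been compiled at the path spark-sql-perf was pointed to. A minimal sketch of building the kit's tools on Linux, assuming the gregrahn/tpcds-kit repository (which matches this page's description) and a Debian/Ubuntu toolchain:

    # Build prerequisites (Debian/Ubuntu; package names may differ elsewhere)
    sudo apt-get install -y gcc make flex bison byacc git

    # Clone the kit and build the data/query generators
    git clone https://github.com/gregrahn/tpcds-kit.git
    cd tpcds-kit/tools
    make OS=LINUX

    # Sanity check: dsdgen should now exist and generate a small dataset
    mkdir -p /tmp/tpcds-data
    ./dsdgen -SCALE 1 -DIR /tmp/tpcds-data

The resulting tools directory (containing dsdgen) is what the path configured in spark-sql-perf should point to.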
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install tpcds-kit
Support