amazon-redshift-utils | Amazon Redshift Utils contains utilities, scripts and views which are useful in a Redshift environment | AWS library
kandi X-RAY | amazon-redshift-utils Summary
Amazon Redshift Utils contains utilities, scripts and views which are useful in a Redshift environment
Support
Quality
Security
License
Reuse
Top functions reviewed by kandi - BETA
- Analyze table.
- Runs an analysis on a given cluster.
- Runs a vacuum on a given cluster.
- Write logs to output directory.
- Play a worker.
- Bundle event handler.
- Create a snapshot of the given configs.
- Validate config file.
- Start a replay.
- Get credentials for a given username.
amazon-redshift-utils Key Features
amazon-redshift-utils Examples and Code Snippets
Community Discussions
Trending Discussions on amazon-redshift-utils
QUESTION
I have identified the below script as being really useful for anyone running Amazon Redshift:
...ANSWER
Answered 2021-Jun-03 at 17:10
How about creating a new custom operator? It should accept all the CLI arguments, and then you can pass them through to the code from the existing script. Here is some rough draft of what I would do:
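The draft code itself is not reproduced on this page. A minimal sketch of what such an operator could look like is shown below; the operator name, constructor arguments, and script path are illustrative assumptions, not part of the original answer.

import subprocess

from airflow.models import BaseOperator


class RedshiftUtilScriptOperator(BaseOperator):
    """Run one of the amazon-redshift-utils scripts with CLI-style arguments.

    Hypothetical sketch: the class name and argument names are assumptions.
    """

    def __init__(self, script_path, script_args=None, **kwargs):
        super().__init__(**kwargs)
        self.script_path = script_path        # path to the utility script, e.g. the Analyze/Vacuum utility
        self.script_args = script_args or {}  # mapping of CLI flag -> value

    def execute(self, context):
        # Build the command line exactly as you would run the script by hand.
        cmd = ["python", self.script_path]
        for flag, value in self.script_args.items():
            cmd.extend([flag, str(value)])
        self.log.info("Running %s", " ".join(cmd))
        subprocess.run(cmd, check=True)

A DAG task would then pass the same flags the script accepts on the command line through script_args, leaving the existing script unmodified.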
QUESTION
I know many experts suggest this, and I even follow it as a best practice (I read it on the AWS Blog); there is a very detailed doc about this on GitHub, but I'm still confused by the term. It affects range-restricted scans, and I'm not able to understand the concept.
Can someone give me an example that clarifies why we shouldn't use compression on the sort key column?
...ANSWER
Answered 2020-May-01 at 23:00
So the reality is that simple, actionable answers are often not perfect, just the best rule of thumb. You say you have read the docs, so I won't go into detail. The assumption behind this recommendation is that the sort key is also the common where clause in many queries. This assumption is important for making sense of the recommendation, but it is generally true. I have lots of queries with "where date_col > getdate() - interval '1 year'", so you decide to make "date_col" the sort key of the table - very typical.
Now when you run this type of query, the Redshift leader node checks the where condition against the block metadata for the date_col column. Whichever blocks contain the desired dates "match". Now you are going to look at the data for other columns as well. To get the needed blocks for these columns, Redshift uses another piece of metadata for the date_col column - namely the range of row numbers in each matching block. These row number ranges are used to find the blocks for the other columns based on the metadata for those columns. I hope this makes sense - find the blocks that match the where clause, then find the corresponding blocks in the other columns. All of this is done to avoid reading blocks that aren't needed for the query.
Now for the example - suppose you have a table with 2 columns: 1) a sort key column that is an INT and 2) a large varchar. Both are compressed. The first column (INT) is in sort order and will be highly compressed. Let's say that this column fits in 1 block. The other column (large varchar) takes 10 blocks. We run our query with a where clause on the INT column; it matches the 1 block, but the row number range of that single block spans every row, so the cross reference to the other column results in getting all 10 blocks. No savings in disk read bandwidth. But if the INT column is not compressed it will take up more blocks - let's say 8 blocks. The same query will match only one of the 8 blocks of the INT column, and the row number cross reference to the varchar column may match only 3 of the 10 blocks for that column. Now we have reduced the data read from disk.
Hopefully that makes sense. You can see that there are a number of assumptions behind this recommendation which are true more often than not. Without these assumptions it is hard to figure out why they say this. Namely: that your sort key is your common where clause, that the compression of the sort key column will be much better than that of other columns, and that the data stored in the sort key is smaller than the data in other columns. And a few others, but those are less central.
Did this help?
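To make the block arithmetic in that answer concrete, here is a small illustrative calculation using the numbers above (an editorial sketch, not code from the original answer):

# Compressed sort key: the INT column fits in 1 block, so the matching block's
# row-number range spans every row and the cross reference pulls in all 10
# varchar blocks.
compressed_reads = 1 + 10      # 11 blocks read from disk

# Uncompressed sort key: the INT column spreads over 8 blocks, only 1 of which
# matches the where clause; its narrower row-number range maps to roughly 3 of
# the 10 varchar blocks.
uncompressed_reads = 1 + 3     # 4 blocks read from disk

print(compressed_reads, uncompressed_reads)   # 11 vs 4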
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install amazon-redshift-utils
You can use amazon-redshift-utils like any standard Python library. You will need a development environment consisting of a Python distribution (including header files), a compiler, pip, and git. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid making changes to the system Python installation.
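The page does not show the actual commands. As a rough sketch of the usual clone-and-set-up workflow, expressed as a small Python script (the repository URL is the project's standard GitHub location; directory names and the Linux/macOS venv path are assumptions):

import subprocess
import sys

# Standard GitHub location of the project; adjust paths to your environment.
REPO_URL = "https://github.com/awslabs/amazon-redshift-utils.git"

# Clone the repository and create an isolated virtual environment.
subprocess.run(["git", "clone", REPO_URL], check=True)
subprocess.run([sys.executable, "-m", "venv", "amazon-redshift-utils/.venv"], check=True)

# Keep pip, setuptools, and wheel up to date inside the virtual environment
# (path shown is the Linux/macOS layout; Windows uses .venv\Scripts\python.exe).
subprocess.run(["amazon-redshift-utils/.venv/bin/python", "-m", "pip",
                "install", "--upgrade", "pip", "setuptools", "wheel"], check=True)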
Support