amazon-redshift-utils | Amazon Redshift Utils contains utilities, scripts and views which are useful in a Redshift environment | AWS library
kandi X-RAY | amazon-redshift-utils Summary
Amazon Redshift Utils contains utilities, scripts and views which are useful in a Redshift environment
Support
Quality
Security
License
Reuse
Top functions reviewed by kandi - BETA
- Analyze table.
- Runs an analysis on a given cluster.
- Runs a vacuum on a given cluster.
- Write logs to output directory.
- Play a worker.
- Bundle event handler.
- Create a snapshot of the given configs.
- Validate config file.
- Start a replay.
- Get credentials for a given username.
amazon-redshift-utils Key Features
amazon-redshift-utils Examples and Code Snippets
Community Discussions
Trending Discussions on amazon-redshift-utils
QUESTION
I have identified the below script as being really useful for anyone running Amazon Redshift:
...ANSWER
Answered 2021-Jun-03 at 17:10
How about creating a new custom operator? It should accept all the CLI arguments, and then you can pass them through to the code from the existing script. Here is some rough draft of what I would do:
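The draft code itself is not reproduced on this page. A minimal sketch of what such an operator could look like is shown below; the operator name, constructor arguments, and script path are illustrative assumptions, not part of the original answer.

import subprocess

from airflow.models import BaseOperator


class RedshiftUtilScriptOperator(BaseOperator):
    """Run one of the amazon-redshift-utils scripts with CLI-style arguments.

    Hypothetical sketch: the class name and argument names are assumptions.
    """

    def __init__(self, script_path, script_args=None, **kwargs):
        super().__init__(**kwargs)
        self.script_path = script_path        # path to the utility script, e.g. the Analyze/Vacuum utility
        self.script_args = script_args or {}  # mapping of CLI flag -> value

    def execute(self, context):
        # Build the command line exactly as you would run the script by hand.
        cmd = ["python", self.script_path]
        for flag, value in self.script_args.items():
            cmd.extend([flag, str(value)])
        self.log.info("Running %s", " ".join(cmd))
        subprocess.run(cmd, check=True)

A DAG task would then pass the same flags the script accepts on the command line through script_args, leaving the existing script unmodified.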
QUESTION
I know many experts suggest this, and I even follow it as a best practice (I read it on the AWS Blog); there is a very detailed doc about this on GitHub, but I'm still confused by the term. It affects range-restricted scans, and I'm not able to understand the concept.
Can someone give me an example that clarifies why we shouldn't use compression on the sort key column?
...ANSWER
Answered 2020-May-01 at 23:00
So the reality is that simple, actionable answers are often not perfect, just the best rule of thumb. You say you have read the docs, so I won't go into detail. The assumption behind this recommendation is that the sort key is also the common where clause in many queries. This assumption is important for making sense of the recommendation, but it is generally true. I have lots of queries with "where date_col > getdate() - interval '1 year'", so you decide to make "date_col" the sort key of the table - very typical.
Now when you run this type of query, the Redshift leader node checks the where condition against the block metadata for the date_col column. Whichever blocks contain the desired dates "match". Now you are going to look at the data for other columns as well. To get the needed blocks for these columns, Redshift uses another piece of metadata for the date_col column - namely the range of row numbers in each matching block. These row number ranges are used to find the blocks for the other columns based on the metadata for those columns. I hope this makes sense - find the blocks that match the where clause, then find the corresponding blocks in the other columns. All of this is done to avoid reading blocks that aren't needed for the query.
Now for the example - suppose you have a table with 2 columns: 1) a sort key column that is an INT and 2) a large varchar. Both are compressed. The first column (INT) is in sort order and will be highly compressed. Let's say that this column fits in 1 block. The other column (large varchar) takes 10 blocks. We run our query with a where clause on the INT column; it matches the 1 block, but the row number range of that single block spans every row, so the cross reference to the other column results in getting all 10 blocks. No savings in disk read bandwidth. But if the INT column is not compressed it will take up more blocks - let's say 8 blocks. The same query will match only one of the 8 blocks of the INT column, and the row number cross reference to the varchar column may match only 3 of the 10 blocks for that column. Now we have reduced the data read from disk.
Hopefully that makes sense. You can see that there are a number of assumptions behind this recommendation which are true more often than not. Without these assumptions it is hard to figure out why they say this. Namely: that your sort key is your common where clause, that the compression of the sort key column will be much better than that of other columns, and that the data stored in the sort key is smaller than the data in other columns. And a few others, but those are less central.
Did this help?
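To make the block arithmetic in that answer concrete, here is a small illustrative calculation using the numbers above (an editorial sketch, not code from the original answer):

# Compressed sort key: the INT column fits in 1 block, so the matching block's
# row-number range spans every row and the cross reference pulls in all 10
# varchar blocks.
compressed_reads = 1 + 10      # 11 blocks read from disk

# Uncompressed sort key: the INT column spreads over 8 blocks, only 1 of which
# matches the where clause; its narrower row-number range maps to roughly 3 of
# the 10 varchar blocks.
uncompressed_reads = 1 + 3     # 4 blocks read from disk

print(compressed_reads, uncompressed_reads)   # 11 vs 4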
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install amazon-redshift-utils
You can use amazon-redshift-utils like any standard Python library. You will need a development environment consisting of a Python distribution (including header files), a compiler, pip, and git. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid making changes to the system Python installation.
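The page does not show the actual commands. As a rough sketch of the usual clone-and-set-up workflow, expressed as a small Python script (the repository URL is the project's standard GitHub location; directory names and the Linux/macOS venv path are assumptions):

import subprocess
import sys

# Standard GitHub location of the project; adjust paths to your environment.
REPO_URL = "https://github.com/awslabs/amazon-redshift-utils.git"

# Clone the repository and create an isolated virtual environment.
subprocess.run(["git", "clone", REPO_URL], check=True)
subprocess.run([sys.executable, "-m", "venv", "amazon-redshift-utils/.venv"], check=True)

# Keep pip, setuptools, and wheel up to date inside the virtual environment
# (path shown is the Linux/macOS layout; Windows uses .venv\Scripts\python.exe).
subprocess.run(["amazon-redshift-utils/.venv/bin/python", "-m", "pip",
                "install", "--upgrade", "pip", "setuptools", "wheel"], check=True)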
Support