Greenplum | Some Greenplum SQL commands for DBA | Database library
kandi X-RAY | Greenplum Summary
Some general administration tools for the Greenplum database.
Community Discussions
Trending Discussions on Greenplum
QUESTION
I've got some reports that I'm trying to loop through to fetch the connection IDs (user IDs), and then list those reports along with the usernames.
A report can have the following scenarios:
- No DataSources
- 1 DataSource (1 user ID)
- More than 1 DataSource (therefore more than 1 user ID)
The following script does the job; however, for some reason, reports with only 1 DataSource seem to fall through to the else branch ("No Connection IDs found!").
That shouldn't be the case, since there is at least 1 DataSource, so they should be going through the if ($DataSourceValue.DataModelDataSource.count -gt 0)
branch instead.
Below is the script, accompanied by the current output vs the expected output:
...ANSWER
Answered 2021-May-04 at 03:42
I don't mean this as an answer, but could you test your script by replacing this portion of code:
QUESTION
Firstly, thank you for any future help!
Onto my issue: I'm trying to get the size of all tables in my Greenplum database - simple. However, there are quite a few partitioned tables, and I want each table's total size, not the size of each child partition independently. So I'm using the following query to do this:
...ANSWER
Answered 2021-Feb-10 at 17:18
Try properly escaping the identifiers:
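The answer's exact query isn't reproduced on this page, but a minimal sketch of the idea - summing partition sizes per parent table with properly quoted identifiers - might look like the following. It relies on Greenplum's pg_partitions catalog view; the output formatting is an assumption, not the asker's actual query.

-- Total size of each parent table, including all of its partitions.
-- quote_ident() protects against mixed-case or otherwise unusual identifiers.
SELECT schemaname,
       tablename,
       pg_size_pretty(
           SUM(pg_total_relation_size(quote_ident(schemaname) || '.' || quote_ident(partitiontablename)))
       ) AS total_size
FROM pg_partitions
GROUP BY schemaname, tablename
ORDER BY schemaname, tablename;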
QUESTION
In Greenplum, I need to create an external table with a dynamic location parameter. For example:
...ANSWER
Answered 2021-Jan-07 at 17:19
You are missing a few things in your table definition. You forgot "external" and "table".
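For illustration, a minimal sketch of a complete definition (the table name, columns, and gpfdist location are assumptions, not the asker's actual setup); note the EXTERNAL and TABLE keywords the answer refers to. If the location genuinely has to be dynamic, one option is to build the DDL as a string and EXECUTE it from PL/pgSQL, as in the second snippet.

-- Static external table: EXTERNAL and TABLE must both be present.
CREATE EXTERNAL TABLE ext_sales (
    id     int,
    amount numeric
)
LOCATION ('gpfdist://etl-host:8081/sales_*.csv')
FORMAT 'CSV' (DELIMITER ',');

-- Dynamic location built at runtime (hypothetical path layout):
DO $$
BEGIN
    EXECUTE format(
        'CREATE EXTERNAL TABLE ext_sales_dynamic (id int, amount numeric)
         LOCATION (%L) FORMAT ''CSV'' (DELIMITER '','')',
        'gpfdist://etl-host:8081/' || to_char(current_date, 'YYYYMMDD') || '/sales_*.csv'
    );
END $$;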
QUESTION
I'm trying to implement an idempotent insert, and this query looks good for the task:
...ANSWER
Answered 2020-Dec-17 at 11:23
ctx.insertInto(TEST_TABLE)
   .select(
        select()
        .from(values(
            row(1, 2, 3, 4),
            row(3, 4, 5, 6),
            row(1, 2, 3, 4),
            row(3, 4, 5, 6)
        ).as("data", "a", "b", "c", "d"))
        .whereNotExists(
            selectOne()
            .from(TEST_TABLE)
            .where(TEST_TABLE.A.eq(field(name("data", "a"), TEST_TABLE.A.getDataType())))
            .and(TEST_TABLE.B.eq(field(name("data", "b"), TEST_TABLE.B.getDataType())))
            .and(TEST_TABLE.C.eq(field(name("data", "c"), TEST_TABLE.C.getDataType())))
        )
   )
   .execute();
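For readers who want the underlying SQL rather than the jOOQ DSL, a rough plain-SQL equivalent of the statement above might look like this. The table and column names mirror the example; this is a sketch of the technique, not the exact SQL jOOQ renders.

-- Insert only those candidate rows that do not already exist in the target,
-- matching on columns a, b and c as in the jOOQ predicate above.
INSERT INTO test_table (a, b, c, d)
SELECT data.a, data.b, data.c, data.d
FROM (VALUES (1, 2, 3, 4),
             (3, 4, 5, 6),
             (1, 2, 3, 4),
             (3, 4, 5, 6)) AS data (a, b, c, d)
WHERE NOT EXISTS (
    SELECT 1
    FROM test_table t
    WHERE t.a = data.a
      AND t.b = data.b
      AND t.c = data.c
);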
QUESTION
I have a requirement to stream from multiple Kafka topics [Avro-based] and write them into Greenplum with a small modification to the payload.
The Kafka topics are defined as a list in a configuration file, and each Kafka topic will have one target table.
I am looking for a single Spark Structured Streaming application, where an update to the configuration file is enough to start listening to new topics or stop listening to a topic.
I am looking for help as I am confused about using a single query vs multiple:
...ANSWER
Answered 2020-Nov-07 at 12:31
Apparently, you can use a regex pattern for consuming data from different Kafka topics.
Let's say you have topic names like "topic-ingestion1" and "topic-ingestion2" - then you can create a regex pattern for consuming data from all topics ending with "ingestion".
Once a new topic is created that matches your regex pattern, Spark will automatically start streaming data from the newly created topic.
Reference: [https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#consumer-caching]
You can use the "spark.kafka.consumer.cache.timeout" parameter to specify the consumer cache timeout.
From the spark documentation:
spark.kafka.consumer.cache.timeout - The minimum amount of time a consumer may sit idle in the pool before it is eligible for eviction by the evictor.
Let's say you have multiple sinks, where you read from Kafka and write into two different locations such as HDFS and HBase - then you can branch your application logic into two writeStreams.
If the sink (Greenplum) supports batch operations, then you can look at the foreachBatch() function from Spark Structured Streaming. It allows you to reuse the same batchDF for both operations.
QUESTION
Issue while connecting to Postgres/Greenplum.
...ANSWER
Answered 2020-Oct-30 at 10:21
PostgreSQL 12 and later allow GSSAPI-encrypted connections, and this parameter controls whether to enforce GSSAPI encryption or not.
If encryption is not configured on the server, it should be disabled on the client by setting the JDBC connection parameter:
gssEncMode=disable
QUESTION
I have written the following query; I am using Greenplum DB and DBeaver for implementation.
...ANSWER
Answered 2020-Oct-20 at 04:46
Please try it like this; it works for me:
QUESTION
In Postgres we have only one instance, but Greenplum is a combination of many Postgres instances sewn together. So does setting the shared buffer with "ALTER SYSTEM" set the value for the master or for all the segments? For example, does setting a value of 125MB with 8 segments make shared_buffers 125*8 = 1GB, or is 125MB shared across all the segments?
...ANSWER
Answered 2020-May-08 at 12:08
The shared_buffers configuration is a per-segment setting. Setting a value of 125MB for 8 segments will allocate 1GB in total across all 8 segments. Here is a snippet from the documentation:
Sets the amount of memory a Greenplum Database segment instance uses for shared memory buffers. This setting must be at least 128KB and at least 16KB times max_connections.
...
Each Greenplum Database segment instance calculates and attempts to allocate a certain amount of shared memory based on the segment configuration.
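To make the arithmetic concrete, here is a small hedged sketch. It assumes an 8-segment cluster and that the value is applied with ALTER SYSTEM as in the question; in practice the gpconfig utility is the more common way to change this cluster-wide.

-- shared_buffers is a per-segment-instance setting: with 8 segments at 125MB
-- each, the cluster allocates roughly 8 * 125MB = 1000MB (about 1GB) in total.
ALTER SYSTEM SET shared_buffers TO '125MB';

-- A restart is required for the new value to take effect; afterwards,
-- check the active value on the instance you are connected to:
SHOW shared_buffers;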
QUESTION
I am trying to use a modified Greenplum open-source version for development. The Greenplum version is Greenplum Database 6.0.0-beta.1 build dev (based on PostgreSQL 9.4.24).
I wanted to add pg_stat_statements extension to my database, and I did manage to install it on the database, following https://www.postgresql.org/docs/9.5/pgstatstatements.html. However, this extension doesn't work as expected. It only records non-plannable queries and utility queries. For all plannable queries which modify my tables, there is not a single record.
My question is, is pg_stat_statements compatible with Greenplum? Since I am not using the official release, I would like to make sure the original one can work with pg_stat_statements. If so, how can I use it to track all SQL queries in Greenplum? Thanks.
Below is an example of my select query not being recorded.
...ANSWER
Answered 2020-May-03 at 17:11
Here's what I got from @leskin-in in the Greenplum Slack:
I suppose this is currently not possible. When Greenplum executes "normal" queries, each Greenplum segment acts as an independent PostgreSQL instance which executes a plan created by the GPDB master. pg_stat_statements tracks resource usage for a single PostgreSQL instance; as a result, in GPDB it is able to track the resource consumption of each segment independently. There are several complications PostgreSQL's pg_stat_statements does not deal with. An example is that GPDB uses slices. On GPDB segments, these are independent parts of plan trees and are executed as independent queries. I suppose that when a query against pg_stat_statements is made on the GPDB master in the current version, the results retrieved are for the master only. As "normal" queries are executed for the most part by the segments, the results are inconsistent with the query's actual resource consumption. In open-source Greenplum 5 & 6 there is a Greenplum-specific utility, gpperfmon. It provides some of the pg_stat_statements features and is cluster-aware (it shows actual resource consumption for the cluster as a whole, as well as several cluster-specific metrics).
QUESTION
I wonder if Greenplum PXF can take advantage of HDFS short-circuit reads when we place PXF and the DataNode on the same host. We did a preliminary test; however, it seems that PXF does not leverage short-circuit reads. There is almost nothing on this after googling, so we are not sure if we missed something. We use Greenplum 6.4 (community version), PXF 5.11.2, and CDH 6.3.
Any references, suggestions or comments are very appreciated.
...ANSWER
Answered 2020-Apr-22 at 15:56
The old version of PXF bundled with HAWQ actually resided on the data nodes and utilized short-circuit reads. The current PXF has changed to reside on the Greenplum segment hosts and acts like an HDFS client. I think you can tweak the PXF source code and set up PXF on the DataNodes with short-circuit reads. However, while you would speed up the HDFS<->PXF communication, you would slow down the PXF<->Greenplum segment communication.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported