kandi has reviewed pubsub-to-bigquery and discovered the below as its top functions. This is intended to give you an instant insight into the functionality pubsub-to-bigquery implements, and to help you decide if it suits your requirements.
A highly configurable Google Cloud Dataflow pipeline that writes data from Pub/Sub into a Google BigQuery table
PubSubToBigQuery
java.exe -jar "C:\Jars\pubsub-to-bq.jar" --runner=BlockingDataflowPipelineRunner --params="<params><workingBucket>gs://your_bucket</workingBucket><maxNumWorkers>1</maxNumWorkers><diskSizeGb>250</diskSizeGb><machineType>n1-standard-1</machineType><keyFile>C:\KeyFiles\YourFile.json</keyFile><accountEmail>your_account@developer.gserviceaccount.com</accountEmail><projectId>your_project_id</projectId><pipelineName>your_pipeline_name</pipelineName><pubSubTopic>your_pub_topic</pubSubTopic><bqDataSet>your_destination_BQ_dataset</bqDataSet><bqTable>your_destination_BQ_table</bqTable><streaming>true</streaming><zone>us-west1-a</zone><schema>{"fields":[{"description":null,"fields":null,"mode":"REQUIRED","name":"Student_Name","type":"STRING","ETag":null}],"ETag":null}</schema></params>"
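The --runner=BlockingDataflowPipelineRunner flag above implies the legacy Cloud Dataflow SDK 1.x. As a rough orientation, a streaming Pub/Sub-to-BigQuery pipeline in that SDK looks like the sketch below. This is a minimal illustration, not the repository's actual code; the topic, table, and field names are taken from the example parameters above.

import java.util.Collections;

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.io.PubsubIO;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;

// Minimal sketch: read messages from Pub/Sub, convert each message to a
// TableRow, and stream the rows into BigQuery.
public class PubSubToBigQuerySketch {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
    options.setStreaming(true); // mirrors <streaming>true</streaming> in the params above

    // Schema matching the single REQUIRED STRING field from the example params.
    TableSchema schema = new TableSchema().setFields(Collections.singletonList(
        new TableFieldSchema().setName("Student_Name").setType("STRING").setMode("REQUIRED")));

    Pipeline p = Pipeline.create(options);
    p.apply(PubsubIO.Read.topic("projects/your_project_id/topics/your_pub_topic"))
     .apply(ParDo.of(new DoFn<String, TableRow>() {
       @Override
       public void processElement(ProcessContext c) {
         // Illustrative parsing: treat the whole message as the one schema field.
         c.output(new TableRow().set("Student_Name", c.element()));
       }
     }))
     .apply(BigQueryIO.Write
         .to("your_project_id:your_destination_BQ_dataset.your_destination_BQ_table")
         .withSchema(schema)
         .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
         .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
    p.run();
  }
}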
GCloud Dataflow recreate BigQuery table if it gets deleted during job run
public TableReference getOrCreateTable(BigQueryOptions options, String tableSpec)
    throws IOException {
  TableReference tableReference = parseTableSpec(tableSpec);
  if (!createdTables.contains(tableSpec)) {
    synchronized (createdTables) {
      // Another thread may have succeeded in creating the table in the meanwhile, so
      // check again. This check isn't needed for correctness, but we add it to prevent
      // every thread from attempting a create and overwhelming our BigQuery quota.
      if (!createdTables.contains(tableSpec)) {
        TableSchema tableSchema = JSON_FACTORY.fromString(jsonTableSchema, TableSchema.class);
        Bigquery client = Transport.newBigQueryClient(options).build();
        BigQueryTableInserter inserter = new BigQueryTableInserter(client);
        inserter.getOrCreateTable(tableReference, WriteDisposition.WRITE_APPEND,
            CreateDisposition.CREATE_IF_NEEDED, tableSchema);
        createdTables.add(tableSpec);
      }
    }
  }
  return tableReference;
}
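Note that getOrCreateTable caches every tableSpec it has created, so a table deleted mid-run is never recreated through this path alone. A minimal sketch of one workaround, reusing the snippet's fields (createdTables, parseTableSpec) and the SDK's BigQueryTableInserter; the insertOrRecreate helper itself is hypothetical, not from the repository:

// Hypothetical helper (assumes the same imports and fields as the snippet
// above): if a streaming insert fails, for example because the table was
// deleted during the job run, evict the cached tableSpec so that the next
// getOrCreateTable() call recreates the table.
private void insertOrRecreate(BigQueryOptions options, String tableSpec,
    List<TableRow> rows) throws IOException {
  TableReference tableReference = getOrCreateTable(options, tableSpec);
  Bigquery client = Transport.newBigQueryClient(options).build();
  BigQueryTableInserter inserter = new BigQueryTableInserter(client);
  try {
    inserter.insertAll(tableReference, rows);
  } catch (IOException e) {
    synchronized (createdTables) {
      createdTables.remove(tableSpec); // forget the table; recreate on retry
    }
    throw e;
  }
}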
QUESTION
Unable to drain/cancel a Dataflow job; it stays in the Pending state
Asked 2021-Feb-10 at 20:19
Some jobs remain in the Pending state and I can't cancel them. How do I cancel these jobs? The web console shows them stuck in Pending, and there are no instances in the Compute Engine tab.
Here is what I did: I created a streaming job from the simple Pub/Sub-subscription-to-BigQuery template, and I set machineType to e2-micro because it was just a test.
I also tried to drain and cancel the jobs with gcloud, but it doesn't work.
$ gcloud dataflow jobs drain --region asia-northeast1 JOBID
Failed to drain job [...]: (...): Workflow modification failed. Causes: (...):
Operation drain not allowed for JOBID.
Job is not yet ready for draining. Please retry in a few minutes.
Please ensure you have permission to access the job and the `--region` flag, asia-northeast1, matches the job's
region.
This is the jobs list:
$ gcloud dataflow jobs list --region asia-northeast1
JOB_ID NAME TYPE CREATION_TIME STATE REGION
JOBID1 pubsub-to-bigquery-udf4 Streaming 2021-02-09 04:24:23 Pending asia-northeast1
JOBID2 pubsub-to-bigquery-udf2 Streaming 2021-02-09 03:20:35 Pending asia-northeast1
...other jobs...
Please let me know how to stop/cancel/delete these streaming jobs.
Job IDs:
ANSWER
Answered 2021-Feb-10 at 12:47
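Drain is only available once a job is actually running, which is consistent with the "Job is not yet ready for draining" error above; for a job stuck in Pending, cancel is the applicable operation. A minimal sketch of the cancel invocation, using a placeholder for the job ID:

$ gcloud dataflow jobs cancel JOBID --region asia-northeast1

If a job still refuses to leave its state after cancelling, recent gcloud releases also expose a --force flag on cancel for jobs stuck mid-cancellation; whether it applies depends on your gcloud version.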