
pubsub-to-bigquery | highly configurable Google Cloud Dataflow pipeline | GCP library

by bomboradata | Java | Version: Current | License: Apache-2.0


kandi X-RAY | pubsub-to-bigquery Summary

pubsub-to-bigquery is a Java library typically used in Cloud and GCP applications. It has a build file available, a permissive license, and low support. However, pubsub-to-bigquery has 1 bug and 1 vulnerability. You can download it from GitHub.
A highly configurable Google Cloud Dataflow pipeline that writes data from Pub/Sub into a Google BigQuery table.

kandi Support

  • pubsub-to-bigquery has a low-activity ecosystem.
  • It has 64 star(s) with 6 fork(s). There are 7 watchers for this library.
  • It had no major release in the last 12 months.
  • There are 0 open issues and 1 has been closed. On average, issues are closed in 435 days. There are no pull requests.
  • It has a neutral sentiment in the developer community.
  • The latest version of pubsub-to-bigquery is current.

kandi Quality

  • pubsub-to-bigquery has 1 bugs (0 blocker, 0 critical, 1 major, 0 minor) and 17 code smells.

kandi Security

  • pubsub-to-bigquery has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
  • pubsub-to-bigquery code analysis shows 1 unresolved vulnerability (1 blocker, 0 critical, 0 major, 0 minor).
  • There are 0 security hotspots that need review.

kandi License

  • pubsub-to-bigquery is licensed under the Apache-2.0 License. This license is Permissive.
  • Permissive licenses have the least restrictions, and you can use them in most projects.

kandi Reuse

  • pubsub-to-bigquery releases are not available. You will need to build from source code and install.
  • Build file is available. You can build the component from source.
  • Installation instructions are not available. Examples and code snippets are available.
  • It has 415 lines of code, 5 functions and 3 files.
  • It has medium code complexity. Code complexity directly impacts maintainability of the code.
Top functions reviewed by kandi - BETA

kandi has reviewed pubsub-to-bigquery and identified the functions below as its top functions. This is intended to give you an instant insight into the functionality pubsub-to-bigquery implements and to help you decide whether it suits your requirements.

  • Parses the XML document from the provided parameters.
  • Entry point for testing.


      pubsub-to-bigquery Key Features

      A highly configurable Google Cloud Dataflow pipeline that writes data from Pub/Sub into a Google BigQuery table.

      PubSubToBigQuery

      java.exe -jar "C:\Jars\pubsub-to-bq.jar" --runner=BlockingDataflowPipelineRunner --params="<params><workingBucket>gs://your_bucket</workingBucket><maxNumWorkers>1</maxNumWorkers><diskSizeGb>250</diskSizeGb><machineType>n1-standard-1</machineType><keyFile>C:\KeyFiles\YourFile.json</keyFile><accountEmail>your_account@developer.gserviceaccount.com</accountEmail><projectId>your_project_id</projectId><pipelineName>your_pipeline_name</pipelineName><pubSubTopic>your_pub_topic</pubSubTopic><bqDataSet>your_destination_BQ_dataset</bqDataSet><bqTable>your_destination_BQ_table</bqTable><streaming>true</streaming><zone>us-west1-a</zone><schema>{"fields":[{"description":null,"fields":null,"mode":"REQUIRED","name":"Student_Name","type":"STRING","ETag":null}],"ETag":null}</schema></params>"
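      The <schema> element embeds the destination table's schema as BigQuery TableSchema JSON. As a rough illustration (this helper class is not part of the repository), such a string can be deserialized with the same Google API client classes that appear in the snippet further down:

      import com.google.api.client.json.jackson2.JacksonFactory;
      import com.google.api.services.bigquery.model.TableSchema;

      import java.io.IOException;

      public class SchemaJsonExample {

          // Deserialize a BigQuery schema given as JSON (like the <schema> element above)
          // into the generated TableSchema model class.
          static TableSchema parseSchema(String jsonSchema) throws IOException {
              return JacksonFactory.getDefaultInstance().fromString(jsonSchema, TableSchema.class);
          }

          public static void main(String[] args) throws IOException {
              String json = "{\"fields\":[{\"mode\":\"REQUIRED\",\"name\":\"Student_Name\",\"type\":\"STRING\"}]}";
              TableSchema schema = parseSchema(json);
              System.out.println(schema.getFields().get(0).getName()); // prints: Student_Name
          }
      }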
      

      GCloud Dataflow recreate BigQuery table if it gets deleted during job run

          public TableReference getOrCreateTable(BigQueryOptions options, String tableSpec)
              throws IOException {
            TableReference tableReference = parseTableSpec(tableSpec);
            if (!createdTables.contains(tableSpec)) {
              synchronized (createdTables) {
                // Another thread may have succeeded in creating the table in the meanwhile, so
                // check again. This check isn't needed for correctness, but we add it to prevent
                // every thread from attempting a create and overwhelming our BigQuery quota.
                if (!createdTables.contains(tableSpec)) {
                  TableSchema tableSchema = JSON_FACTORY.fromString(jsonTableSchema, TableSchema.class);
                  Bigquery client = Transport.newBigQueryClient(options).build();
                  BigQueryTableInserter inserter = new BigQueryTableInserter(client);
                  inserter.getOrCreateTable(tableReference, WriteDisposition.WRITE_APPEND,
                      CreateDisposition.CREATE_IF_NEEDED, tableSchema);
                  createdTables.add(tableSpec);
                }
              }
            }
            return tableReference;
          }
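
      The excerpt above shows a double-checked createdTables cache: each table spec is created at most once, using the CREATE_IF_NEEDED and WRITE_APPEND dispositions. For context, here is a minimal sketch of how a pipeline on the Dataflow 1.x SDK (the SDK implied by BlockingDataflowPipelineRunner above) would configure its BigQuery sink with those same dispositions; the class below is illustrative and not code from this repository:

      import com.google.api.services.bigquery.model.TableRow;
      import com.google.api.services.bigquery.model.TableSchema;
      import com.google.cloud.dataflow.sdk.io.BigQueryIO;
      import com.google.cloud.dataflow.sdk.values.PCollection;

      public class BigQueryWriteSketch {

          // Attach a BigQuery sink that creates the destination table if it does not
          // exist and appends rows to it.
          static void writeRows(PCollection<TableRow> rows, String tableSpec, TableSchema schema) {
              rows.apply(BigQueryIO.Write
                      .to(tableSpec) // "project:dataset.table"
                      .withSchema(schema)
                      .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                      .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
          }
      }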
      
      
      

      Community Discussions

      Trending Discussions on pubsub-to-bigquery
      • Unable to drain/cancel Dataflow job. It keeps pending state
      • GCloud Dataflow recreate BigQuery table if it gets deleted during job run

      QUESTION

      Unable to drain/cancel Dataflow job. It keeps pending state

      Asked 2021-Feb-10 at 20:19

      Some jobs remain in a pending state and I can't cancel them.

      How do I cancel these jobs?

      The web console shows the following:

      • "The graph is still being analyzed."
      • All logs are "No entries found matching current filter."
      • Job status: "Starting...". A cancel button has not appeared yet.

      There are no instances in the Compute Engine tab.

      Here is what I did: I created a streaming job from a simple template (Pub/Sub subscription to BigQuery). I set machineType to e2-micro because it was just a test.

      I also tried to drain and cancel the job with gcloud, but it doesn't work.

      $ gcloud dataflow jobs drain --region asia-northeast1 JOBID
      
      Failed to drain job [...]: (...): Workflow modification failed. Causes: (...): 
      Operation drain not allowed for JOBID. 
      Job is not yet ready for draining. Please retry in a few minutes. 
      Please ensure you have permission to access the job and the `--region` flag, asia-northeast1, matches the job's
      region.
      

      This is the jobs list:

      $ gcloud dataflow jobs list --region asia-northeast1
      JOB_ID  NAME                               TYPE       CREATION_TIME        STATE      REGION
      JOBID1  pubsub-to-bigquery-udf4            Streaming  2021-02-09 04:24:23  Pending    asia-northeast1
      JOBID2  pubsub-to-bigquery-udf2            Streaming  2021-02-09 03:20:35  Pending    asia-northeast1
      ...other jobs...
      

      Please let me know how to stop/cancel/delete these streaming jobs.

      Job IDs:

      • 2021-02-08_20_24_22-11667100055733179687

      WebUI: https://i.stack.imgur.com/B75OX.png

      https://i.stack.imgur.com/LzUGQ.png

      ANSWER

      Answered 2021-Feb-10 at 12:47

      In the GCP console's Dataflow UI, if you have running Dataflow jobs, you will see a "STOP" button, as shown in the screenshots referenced in the answer.

      Press the STOP button.

      When you successfully stop your job, you will see the status change accordingly. (I was too slow to stop the job on the first try, so I had to test it again.)
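
      For reference, the non-interactive counterpart of stopping a job from the UI is gcloud's cancel subcommand (as opposed to drain, which the error above says is not allowed until the job is ready). Shown here with the region and one of the job IDs from the question:

      gcloud dataflow jobs cancel --region asia-northeast1 2021-02-08_20_24_22-11667100055733179687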

      Source https://stackoverflow.com/questions/66116354

      Community Discussions and Code Snippets include sources from the Stack Exchange Network.

      Vulnerabilities

      No vulnerabilities reported

      Install pubsub-to-bigquery

      You can download it from GitHub.
      You can use pubsub-to-bigquery like any standard Java library. Include the jar files in your classpath. You can also use any IDE to run and debug the pubsub-to-bigquery component as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, please refer to maven.apache.org; for Gradle installation, please refer to gradle.org.
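
      Because no releases are published, a typical source build would look roughly like the lines below. This is only a sketch: the repository URL is inferred from the author name on this page, the Maven commands assume the build file is a Maven pom, and the jar name is a placeholder to replace with whatever the build actually produces.

      git clone https://github.com/bomboradata/pubsub-to-bigquery.git
      cd pubsub-to-bigquery
      mvn clean package
      java -jar target/<built-jar>.jar --runner=BlockingDataflowPipelineRunner --params="<params>...</params>"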

      Support

      For any new features, suggestions, or bugs, create an issue on GitHub. If you have any questions, check for and ask questions on Stack Overflow.
