flow-pipeline | A set of tools and examples to run a flow-pipeline (sFlow, NetFlow) | Pub Sub library
kandi X-RAY | flow-pipeline Summary
If you choose to visualize in Grafana, you will need the ClickHouse data source plugin. You can connect to the Grafana instance from the compose setup, which has the plugin installed.
Top functions reviewed by kandi - BETA
- Generate a new Kafka producer
- Flush consumer message
- Register the flow type.
flow-pipeline Key Features
flow-pipeline Examples and Code Snippets
Community Discussions
Trending Discussions on flow-pipeline
QUESTION
I've installed a Kubernetes cluster running Kubeflow Pipelines based on Tekton on top of KIND using the following instructions.
Now I'm getting the following error message from the Elyra pipelines editor. Running against an Argo-based KFP cluster works fine.
Does the kfp compiler somehow not support Tekton? Can someone please shed some light on this?
HTTP response body:
...ANSWER
Answered 2021-Jan-21 at 19:50
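As general background: pipelines aimed at a Tekton-backed Kubeflow Pipelines deployment are usually compiled with the kfp-tekton compiler rather than kfp's default Argo compiler. A minimal sketch, assuming the kfp and kfp-tekton packages are installed; the component and pipeline below are hypothetical placeholders:

# A minimal sketch, assuming the kfp and kfp-tekton packages; the component and
# pipeline below are hypothetical placeholders, not code from the question above.
import kfp
from kfp_tekton.compiler import TektonCompiler

ECHO_COMPONENT = """
name: echo
implementation:
  container:
    image: alpine:3.14
    command: [echo, "hello from a tekton-backed pipeline"]
"""

echo_op = kfp.components.load_component_from_text(ECHO_COMPONENT)


@kfp.dsl.pipeline(name="hello-pipeline", description="Hypothetical example pipeline")
def hello_pipeline():
    echo_op()


if __name__ == "__main__":
    # Emit a Tekton PipelineRun manifest instead of an Argo Workflow.
    TektonCompiler().compile(hello_pipeline, "hello_pipeline.yaml")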
QUESTION

I am using the Terraform resource google_dataflow_flex_template_job to deploy a Dataflow Flex Template job.
...ANSWER
Answered 2021-Jan-19 at 11:09

As mentioned in the existing answer, you need to extract the apache-beam modules inside your requirements.txt:
QUESTION
I am having serious issues running a Python Apache Beam pipeline on the GCP Dataflow runner, launched from CircleCI. I would really appreciate it if someone could give any hint on how to tackle this; I've tried it all, but nothing seems to work.
Basically, I'm running this Python Apache Beam pipeline, which runs in Dataflow and uses google-api-python-client-1.12.3. If I run the job on my machine (python3 main.py --runner dataflow --setup_file /path/to/my/file/setup.py), it works fine. If I run this same job from within CircleCI, the Dataflow job is created, but it fails with the message ImportError: No module named 'apiclient'.
Looking at this documentation, I think I should probably explicitly use a requirements.txt file. If I run that same pipeline from CircleCI but add the --requirements_file argument pointing to a requirements file containing a single line (google-api-python-client==1.12.3), the Dataflow job fails because the workers fail too. In the logs, there's first an info message ERROR: Could not find a version that satisfies the requirement wheel (from versions: none), which results in a later error message "Error syncing pod somePodIdHere (\"dataflow-myjob-harness-rl84_default(somePodIdHere)\"), skipping: failed to \"StartContainer\" for \"python\" with CrashLoopBackOff: \"back-off 40s restarting failed container=python pod=dataflow-myjob-harness-rl84_default(somePodIdHere)\". I found this thread, but the solution didn't seem to work in my case.
Any help would be really, really appreciated. Thanks a lot in advance!
...ANSWER
Answered 2020-Oct-13 at 23:57

This question looks very similar to yours. The solution seemed to be to explicitly include the dependencies of your requirements in your requirements.txt.
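As an illustration of wiring that requirements file into the pipeline launch, here is a minimal sketch using Beam's Python PipelineOptions. The project, region, and bucket names are hypothetical, and requirements.txt is assumed to pin google-api-python-client together with its transitive dependencies:

# A minimal sketch (hypothetical project/region/bucket) of launching the pipeline
# with an explicit requirements file that is staged to the Dataflow workers.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    flags=[],
    runner="DataflowRunner",
    project="my-gcp-project",            # hypothetical project id
    region="us-central1",
    temp_location="gs://my-bucket/tmp",  # hypothetical bucket
    # requirements.txt should list google-api-python-client==1.12.3 and, per the
    # answer above, its transitive dependencies explicitly.
    requirements_file="requirements.txt",
)

with beam.Pipeline(options=options) as pipeline:
    (pipeline
     | "Create" >> beam.Create(["hello"])
     | "Print" >> beam.Map(print))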
QUESTION
I can't create pipelines. I can't even load the samples / tutorials on the AI Platform Pipelines Dashboard because it doesn't seem to be able to proxy to whatever it needs to.
...ANSWER
Answered 2020-Sep-18 at 08:11

The Does not have minimum availability error is generic. There could be many issues that trigger it. You need to analyse it in more depth to find the actual problem. Here are some possible causes:
- Insufficient resources: check whether your Node has adequate resources (CPU/memory). If the Node is OK, check the Pod's status.
- Liveness probe and/or readiness probe failure: run kubectl describe pod to check whether they failed and why.
- Deployment misconfiguration: review your deployment YAML file to see if there are any errors or leftovers from previous configurations.
You can also try waiting a bit, as it sometimes takes a while to deploy everything, and/or try changing your region/zone.
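For a programmatic version of the same inspection that kubectl describe pod gives you, here is a minimal sketch using the official Kubernetes Python client; the kubeflow namespace is an assumption and may differ in your cluster:

# A minimal sketch, assuming the official kubernetes Python client is installed and
# that the Pods of interest live in a "kubeflow" namespace (an assumption).
from kubernetes import client, config

config.load_kube_config()  # uses your local kubeconfig, just like kubectl
v1 = client.CoreV1Api()

for pod in v1.list_namespaced_pod(namespace="kubeflow").items:
    print(pod.metadata.name, pod.status.phase)
    for cs in pod.status.container_statuses or []:
        # Surface restart counts and waiting reasons (e.g. CrashLoopBackOff).
        waiting = cs.state.waiting.reason if cs.state and cs.state.waiting else None
        print(f"  container={cs.name} ready={cs.ready} "
              f"restarts={cs.restart_count} waiting={waiting}")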
QUESTION
I want to use TPL Dataflow for my .NET Core application and followed the example from the docs.
Instead of having all the logic in one file, I would like to separate each TransformBlock and ActionBlock (I don't need the other ones yet) into their own files. A small TransformBlock example converting integers to strings:
...ANSWER
Answered 2020-Sep-15 at 08:41

As @Panagiotis explained, I think you have to put the OOP mindset aside a little. What you have with Dataflow are building blocks that you configure to execute what you need. I'll try to create a little example of what I mean by that:
QUESTION
I have a requirement to trigger a Cloud Dataflow pipeline from Cloud Functions, and the Cloud Function must be written in Java. The trigger for the Cloud Function is Google Cloud Storage's Finalize/Create event, i.e., when a file is uploaded to a GCS bucket, the Cloud Function must trigger the Cloud Dataflow pipeline.
When I create a Dataflow pipeline (batch) and execute it, it creates a Dataflow pipeline template and a Dataflow job.
But when I create a Cloud Function in Java and a file is uploaded, the status just says "ok", but it does not trigger the Dataflow pipeline.
Cloud function
...ANSWER
Answered 2020-Sep-06 at 15:20

// Configure the runtime environment for the templated Dataflow job.
RuntimeEnvironment runtimeEnvironment = new RuntimeEnvironment();
runtimeEnvironment.setBypassTempDirValidation(false);
runtimeEnvironment.setTempLocation("gs://karthiksfirstbucket/temp1");

// Describe the launch request: environment, job name, and template parameters.
LaunchTemplateParameters launchTemplateParameters = new LaunchTemplateParameters();
launchTemplateParameters.setEnvironment(runtimeEnvironment);
launchTemplateParameters.setJobName("newJob" + (new Date()).getTime());

Map<String, String> params = new HashMap<>();
params.put("inputFile", "gs://karthiksfirstbucket/sample.txt");
params.put("output", "gs://karthiksfirstbucket/count1");
launchTemplateParameters.setParameters(params);

// Kept from the original snippet: writes a debug marker to an existing writer.
writer.write("4");

// Launch the public Word_Count template with the parameters above.
Dataflow.Projects.Templates.Launch launch = dataflowService.projects().templates()
        .launch(projectId, launchTemplateParameters);
launch.setGcsPath("gs://dataflow-templates-us-central1/latest/Word_Count");
launch.execute();
QUESTION
I was trying to configure and deploy a Cloud Dataflow job that is truly single-threaded, to avoid concurrency issues while creating/updating entities in Datastore. I was under the assumption that using an n1-standard-1 machine ensures that the job runs on a single thread, on a single machine, but I have come to learn the hard way that this is not the case.
I have gone over the suggestions mentioned in an earlier question here: Can I force a step in my dataflow pipeline to be single-threaded (and on a single machine)?
But I wanted to avoid implementing a windowing approach for this, and wanted to know whether there is a simpler way to configure a job to ensure single-threaded behavior.
Any suggestions or insights would be greatly appreciated.
...ANSWER
Answered 2020-Mar-26 at 07:48

I have recently come to learn that single-threaded behavior is guaranteed by using a single n1-standard-1 worker and additionally passing the exec arg --numberOfWorkerHarnessThreads=1, as this restricts the number of JVM threads to 1 as well.
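The flag above is a Java Dataflow worker option. For comparison, a rough sketch of the corresponding worker sizing with the Python SDK is shown below; the project and bucket names are hypothetical, and the harness-thread flag itself is SDK-specific:

# A rough sketch (hypothetical project/bucket) of pinning a Dataflow job to a single
# n1-standard-1 worker from the Python SDK; --numberOfWorkerHarnessThreads=1, as used
# in the answer above, is a Java worker flag.
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    flags=[],
    runner="DataflowRunner",
    project="my-gcp-project",            # hypothetical
    temp_location="gs://my-bucket/tmp",  # hypothetical
    machine_type="n1-standard-1",
    num_workers=1,
    max_num_workers=1,                   # keep autoscaling from adding workers
)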
QUESTION
I have the below "Master" DAG. I want to call the associated DAGs as per the downstream section at the bottom.
However, what happens is that the first DAG gets called four times, and the other three run for a microsecond (not enough to actually do anything), and everything comes back green.
How do I get it to behave the way the downstream section intends?
...ANSWER
Answered 2020-Jan-08 at 15:09

A TriggerDagRun is more of a "fire and forget" operation, and from your comments it sounds like you might actually want a SubDagOperator, which will treat a secondary DAG as a task to be waited on before continuing.
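Here is a minimal sketch of that approach with SubDagOperator; the DAG ids, task ids, and schedule are hypothetical, and the imports assume Airflow 1.x:

# A minimal sketch (hypothetical DAG, task, and schedule names; Airflow 1.x imports)
# of replacing a fire-and-forget trigger with SubDagOperators that are waited on.
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator

DEFAULT_ARGS = {"start_date": datetime(2020, 1, 1)}


def build_subdag(parent_dag_id, task_id):
    # SubDagOperator expects the child's dag_id to be "<parent_dag_id>.<task_id>",
    # and the child's schedule should match the parent's.
    subdag = DAG(dag_id=f"{parent_dag_id}.{task_id}",
                 default_args=DEFAULT_ARGS,
                 schedule_interval="@daily")
    DummyOperator(task_id="do_work", dag=subdag)  # placeholder for the real tasks
    return subdag


with DAG("master_dag", default_args=DEFAULT_ARGS, schedule_interval="@daily") as dag:
    step_1 = SubDagOperator(task_id="step_1", subdag=build_subdag("master_dag", "step_1"))
    step_2 = SubDagOperator(task_id="step_2", subdag=build_subdag("master_dag", "step_2"))
    step_1 >> step_2  # step_2 starts only after step_1's sub-DAG has completed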
QUESTION
I have a Dataflow job that reads data from Pub/Sub and, based on the time and filename, writes the contents to GCS, where the folder path is based on YYYY/MM/DD. This allows files to be generated in folders based on date, and it uses Apache Beam's FileIO and Dynamic Destinations.
About two weeks ago, I noticed an unusual buildup of unacknowledged messages. Upon restarting the Dataflow job, the errors disappeared and new files were being written to GCS.
After a couple of days, writing stopped again, except this time there were errors claiming that processing was stuck. After some trusty SO research, I found out that this was likely caused by a deadlock issue in pre-2.9.0 Beam, because it used the Conscrypt library as the default security provider. So I upgraded from Beam 2.8 to Beam 2.11.
Once again, it worked, until it didn't. I looked more closely at the error and noticed that it had a problem with a SimpleDateFormat object, which isn't thread-safe. So I switched to java.time and DateTimeFormatter, which is thread-safe. It worked until it didn't. However, this time the error was slightly different and didn't point to anything in my code; the error is provided below.
...ANSWER
Answered 2019-Aug-12 at 17:29

The error 'Processing stuck ...' indicates that some particular operation took longer than 5m, not that the job is permanently stuck. However, since the step FileIO.Write/WriteFiles/WriteShardedBundlesToTempFiles/WriteShardsIntoTempFiles is the one that is stuck and the job gets cancelled/killed, I would suspect an issue while the job is writing temp files.
I found the BEAM-7689 issue, which is related to a second-granularity timestamp (yyyy-MM-dd_HH-mm-ss) that is used to write temporary files. This happens because several concurrent jobs can share the same temporary directory, and this can cause one of the jobs to delete it before the other job(s) finish.
According to the previous link, to mitigate the issue please upgrade to SDK 2.14, and let us know if the error is gone.
QUESTION
I need to get BUILD_USER_ID from a Jenkins pipeline, and I successfully implemented it using this tutorial: here.
It works when triggered manually by a user, but an error is returned when triggered by GitSCM.
groovy.lang.MissingPropertyException: No such property: BUILD_USER_ID for class: groovy.lang.Binding
Please help.
...ANSWER
Answered 2018-Dec-24 at 10:22

BUILD_USER_ID is set only if the build has a UserIdCause:
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install flow-pipeline