flow-pipeline | A set of tools and examples to run a flow-pipeline (sFlow, NetFlow) | Pub Sub library
kandi X-RAY | flow-pipeline Summary
If you choose to visualize in Grafana, you will need the ClickHouse data source plugin. You can connect to the Grafana instance from the compose setup, which has the plugin installed.
Top functions reviewed by kandi - BETA
- Generate a new Kafka producer
- Flush consumer message
- Register the flow type.
flow-pipeline Key Features
flow-pipeline Examples and Code Snippets
Community Discussions
Trending Discussions on flow-pipeline
QUESTION
I've installed a Kubernetes cluster running Kubeflow Pipelines based on Tekton on top of KIND using the following instructions.
Now I'm getting the following error message from the Elyra pipelines editor. Running against an Argo-based KFP cluster works fine.
Does the kfp compiler somehow not support Tekton? Can someone please shed some light on this?
HTTP response body:
...ANSWER
Answered 2021-Jan-21 at 19:50
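As general background: pipelines aimed at a Tekton-backed Kubeflow Pipelines deployment are usually compiled with the kfp-tekton compiler rather than kfp's default Argo compiler. A minimal sketch, assuming the kfp and kfp-tekton packages are installed; the component and pipeline below are hypothetical placeholders:

# A minimal sketch, assuming the kfp and kfp-tekton packages; the component and
# pipeline below are hypothetical placeholders, not code from the question above.
import kfp
from kfp_tekton.compiler import TektonCompiler

ECHO_COMPONENT = """
name: echo
implementation:
  container:
    image: alpine:3.14
    command: [echo, "hello from a tekton-backed pipeline"]
"""

echo_op = kfp.components.load_component_from_text(ECHO_COMPONENT)


@kfp.dsl.pipeline(name="hello-pipeline", description="Hypothetical example pipeline")
def hello_pipeline():
    echo_op()


if __name__ == "__main__":
    # Emit a Tekton PipelineRun manifest instead of an Argo Workflow.
    TektonCompiler().compile(hello_pipeline, "hello_pipeline.yaml")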
QUESTION

I am using the Terraform resource google_dataflow_flex_template_job to deploy a Dataflow Flex Template job.
...ANSWER
Answered 2021-Jan-19 at 11:09

As mentioned in the existing answer, you need to extract the apache-beam modules inside your requirements.txt:
QUESTION
I am having serious issues running a Python Apache Beam pipeline on the GCP Dataflow runner, launched from CircleCI. I would really appreciate it if someone could give any hint on how to tackle this; I've tried it all, but nothing seems to work.
Basically, I'm running this Python Apache Beam pipeline, which runs in Dataflow and uses google-api-python-client-1.12.3. If I run the job on my machine (python3 main.py --runner dataflow --setup_file /path/to/my/file/setup.py), it works fine. If I run this same job from within CircleCI, the Dataflow job is created, but it fails with the message ImportError: No module named 'apiclient'.
Looking at this documentation, I think I should probably explicitly use a requirements.txt file. If I run that same pipeline from CircleCI but add the --requirements_file argument pointing to a requirements file containing a single line (google-api-python-client==1.12.3), the Dataflow job fails because the workers fail too. In the logs, there's first an info message ERROR: Could not find a version that satisfies the requirement wheel (from versions: none), which results in a later error message "Error syncing pod somePodIdHere (\"dataflow-myjob-harness-rl84_default(somePodIdHere)\"), skipping: failed to \"StartContainer\" for \"python\" with CrashLoopBackOff: \"back-off 40s restarting failed container=python pod=dataflow-myjob-harness-rl84_default(somePodIdHere)\". I found this thread, but the solution didn't seem to work in my case.
Any help would be really, really appreciated. Thanks a lot in advance!
...ANSWER
Answered 2020-Oct-13 at 23:57

This question looks very similar to yours. The solution seemed to be to explicitly include the dependencies of your requirements in your requirements.txt.
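As an illustration of wiring that requirements file into the pipeline launch, here is a minimal sketch using Beam's Python PipelineOptions. The project, region, and bucket names are hypothetical, and requirements.txt is assumed to pin google-api-python-client together with its transitive dependencies:

# A minimal sketch (hypothetical project/region/bucket) of launching the pipeline
# with an explicit requirements file that is staged to the Dataflow workers.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    flags=[],
    runner="DataflowRunner",
    project="my-gcp-project",            # hypothetical project id
    region="us-central1",
    temp_location="gs://my-bucket/tmp",  # hypothetical bucket
    # requirements.txt should list google-api-python-client==1.12.3 and, per the
    # answer above, its transitive dependencies explicitly.
    requirements_file="requirements.txt",
)

with beam.Pipeline(options=options) as pipeline:
    (pipeline
     | "Create" >> beam.Create(["hello"])
     | "Print" >> beam.Map(print))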
QUESTION
I can't create pipelines. I can't even load the samples / tutorials on the AI Platform Pipelines Dashboard because it doesn't seem to be able to proxy to whatever it needs to.
...ANSWER
Answered 2020-Sep-18 at 08:11

The Does not have minimum availability error is generic. There could be many issues that trigger it. You need to analyse it in more depth to find the actual problem. Here are some possible causes:
- Insufficient resources: check whether your Node has adequate resources (CPU/memory). If the Node is OK, check the Pod's status.
- Liveness probe and/or readiness probe failure: run kubectl describe pod to check whether they failed and why.
- Deployment misconfiguration: review your deployment YAML file to see if there are any errors or leftovers from previous configurations.
You can also try waiting a bit, as it sometimes takes a while to deploy everything, and/or try changing your region/zone.
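For a programmatic version of the same inspection that kubectl describe pod gives you, here is a minimal sketch using the official Kubernetes Python client; the kubeflow namespace is an assumption and may differ in your cluster:

# A minimal sketch, assuming the official kubernetes Python client is installed and
# that the Pods of interest live in a "kubeflow" namespace (an assumption).
from kubernetes import client, config

config.load_kube_config()  # uses your local kubeconfig, just like kubectl
v1 = client.CoreV1Api()

for pod in v1.list_namespaced_pod(namespace="kubeflow").items:
    print(pod.metadata.name, pod.status.phase)
    for cs in pod.status.container_statuses or []:
        # Surface restart counts and waiting reasons (e.g. CrashLoopBackOff).
        waiting = cs.state.waiting.reason if cs.state and cs.state.waiting else None
        print(f"  container={cs.name} ready={cs.ready} "
              f"restarts={cs.restart_count} waiting={waiting}")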
QUESTION
I want to use TPL Dataflow for my .NET Core application and followed the example from the docs.
Instead of having all the logic in one file, I would like to separate each TransformBlock and ActionBlock (I don't need the other ones yet) into their own files. A small TransformBlock example converting integers to strings:
...ANSWER
Answered 2020-Sep-15 at 08:41

As @Panagiotis explained, I think you have to put the OOP mindset aside a little. What you have with Dataflow are building blocks that you configure to execute what you need. I'll try to create a little example of what I mean by that:
QUESTION
I have a requirement to trigger a Cloud Dataflow pipeline from Cloud Functions, and the Cloud Function must be written in Java. The trigger for the Cloud Function is Google Cloud Storage's Finalize/Create event, i.e., when a file is uploaded to a GCS bucket, the Cloud Function must trigger the Cloud Dataflow pipeline.
When I create a Dataflow pipeline (batch) and execute it, it creates a Dataflow pipeline template and a Dataflow job.
But when I create a Cloud Function in Java and a file is uploaded, the status just says "ok", but it does not trigger the Dataflow pipeline.
Cloud function
...ANSWER
Answered 2020-Sep-06 at 15:20

// Configure the runtime environment for the templated Dataflow job.
RuntimeEnvironment runtimeEnvironment = new RuntimeEnvironment();
runtimeEnvironment.setBypassTempDirValidation(false);
runtimeEnvironment.setTempLocation("gs://karthiksfirstbucket/temp1");

// Describe the launch request: environment, job name, and template parameters.
LaunchTemplateParameters launchTemplateParameters = new LaunchTemplateParameters();
launchTemplateParameters.setEnvironment(runtimeEnvironment);
launchTemplateParameters.setJobName("newJob" + (new Date()).getTime());

Map<String, String> params = new HashMap<>();
params.put("inputFile", "gs://karthiksfirstbucket/sample.txt");
params.put("output", "gs://karthiksfirstbucket/count1");
launchTemplateParameters.setParameters(params);

// Kept from the original snippet: writes a debug marker to an existing writer.
writer.write("4");

// Launch the public Word_Count template with the parameters above.
Dataflow.Projects.Templates.Launch launch = dataflowService.projects().templates()
        .launch(projectId, launchTemplateParameters);
launch.setGcsPath("gs://dataflow-templates-us-central1/latest/Word_Count");
launch.execute();
QUESTION
I was trying to configure and deploy a Cloud Dataflow job that is truly single-threaded, to avoid concurrency issues while creating/updating entities in Datastore. I was under the assumption that using an n1-standard-1 machine ensures that the job runs on a single thread, on a single machine, but I have come to learn the hard way that this is not the case.
I have gone over the suggestions mentioned in an earlier question here: Can I force a step in my dataflow pipeline to be single-threaded (and on a single machine)?
But I wanted to avoid implementing a windowing approach for this, and wanted to know whether there is a simpler way to configure a job to ensure single-threaded behavior.
Any suggestions or insights would be greatly appreciated.
...ANSWER
Answered 2020-Mar-26 at 07:48

I have recently come to learn that single-threaded behavior is guaranteed by using a single n1-standard-1 worker and additionally passing the exec arg --numberOfWorkerHarnessThreads=1, as this restricts the number of JVM threads to 1 as well.
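The flag above is a Java Dataflow worker option. For comparison, a rough sketch of the corresponding worker sizing with the Python SDK is shown below; the project and bucket names are hypothetical, and the harness-thread flag itself is SDK-specific:

# A rough sketch (hypothetical project/bucket) of pinning a Dataflow job to a single
# n1-standard-1 worker from the Python SDK; --numberOfWorkerHarnessThreads=1, as used
# in the answer above, is a Java worker flag.
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    flags=[],
    runner="DataflowRunner",
    project="my-gcp-project",            # hypothetical
    temp_location="gs://my-bucket/tmp",  # hypothetical
    machine_type="n1-standard-1",
    num_workers=1,
    max_num_workers=1,                   # keep autoscaling from adding workers
)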
QUESTION
I have the below "Master" DAG. I want to call the associated DAGs as per the downstream section at the bottom.
However, what happens is that the first DAG gets called four times, and the other three run for a microsecond (not enough to actually do anything), and everything comes back green.
How do I get it to behave the way the downstream section intends?
...ANSWER
Answered 2020-Jan-08 at 15:09

A TriggerDagRun is more of a "fire and forget" operation, and from your comments it sounds like you might actually want a SubDagOperator, which will treat a secondary DAG as a task to be waited on before continuing.
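Here is a minimal sketch of that approach with SubDagOperator; the DAG ids, task ids, and schedule are hypothetical, and the imports assume Airflow 1.x:

# A minimal sketch (hypothetical DAG, task, and schedule names; Airflow 1.x imports)
# of replacing a fire-and-forget trigger with SubDagOperators that are waited on.
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator

DEFAULT_ARGS = {"start_date": datetime(2020, 1, 1)}


def build_subdag(parent_dag_id, task_id):
    # SubDagOperator expects the child's dag_id to be "<parent_dag_id>.<task_id>",
    # and the child's schedule should match the parent's.
    subdag = DAG(dag_id=f"{parent_dag_id}.{task_id}",
                 default_args=DEFAULT_ARGS,
                 schedule_interval="@daily")
    DummyOperator(task_id="do_work", dag=subdag)  # placeholder for the real tasks
    return subdag


with DAG("master_dag", default_args=DEFAULT_ARGS, schedule_interval="@daily") as dag:
    step_1 = SubDagOperator(task_id="step_1", subdag=build_subdag("master_dag", "step_1"))
    step_2 = SubDagOperator(task_id="step_2", subdag=build_subdag("master_dag", "step_2"))
    step_1 >> step_2  # step_2 starts only after step_1's sub-DAG has completed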
QUESTION
I have a Dataflow job that reads data from Pub/Sub and, based on the time and filename, writes the contents to GCS, where the folder path is based on YYYY/MM/DD. This allows files to be generated in folders based on date, and it uses Apache Beam's FileIO and Dynamic Destinations.
About two weeks ago, I noticed an unusual buildup of unacknowledged messages. Upon restarting the Dataflow job, the errors disappeared and new files were being written to GCS.
After a couple of days, writing stopped again, except this time there were errors claiming that processing was stuck. After some trusty SO research, I found out that this was likely caused by a deadlock issue in pre-2.9.0 Beam, because it used the Conscrypt library as the default security provider. So I upgraded from Beam 2.8 to Beam 2.11.
Once again, it worked, until it didn't. I looked more closely at the error and noticed that it had a problem with a SimpleDateFormat object, which isn't thread-safe. So I switched to java.time and DateTimeFormatter, which is thread-safe. It worked until it didn't. However, this time the error was slightly different and didn't point to anything in my code; the error is provided below.
...ANSWER
Answered 2019-Aug-12 at 17:29

The error 'Processing stuck ...' indicates that some particular operation took longer than 5m, not that the job is permanently stuck. However, since the step FileIO.Write/WriteFiles/WriteShardedBundlesToTempFiles/WriteShardsIntoTempFiles is the one that is stuck and the job gets cancelled/killed, I would suspect an issue while the job is writing temp files.
I found the BEAM-7689 issue, which is related to a second-granularity timestamp (yyyy-MM-dd_HH-mm-ss) that is used to write temporary files. This happens because several concurrent jobs can share the same temporary directory, and this can cause one of the jobs to delete it before the other job(s) finish.
According to the previous link, to mitigate the issue please upgrade to SDK 2.14, and let us know if the error is gone.
QUESTION
I need to get BUILD_USER_ID from a Jenkins pipeline, and I successfully implemented it using this tutorial: here.
It works when triggered manually by a user, but an error is returned when triggered by GitSCM.
groovy.lang.MissingPropertyException: No such property: BUILD_USER_ID for class: groovy.lang.Binding
Please help.
...ANSWER
Answered 2018-Dec-24 at 10:22

BUILD_USER_ID is set only if the build has a UserIdCause:
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install flow-pipeline