spring-cloud-dataflow | Microservice-based Streaming and Batch data processing | Stream Processing library
kandi X-RAY | spring-cloud-dataflow Summary
Spring Cloud Data Flow is a microservices-based toolkit for building streaming and batch data processing pipelines in Cloud Foundry and Kubernetes. Data processing pipelines consist of Spring Boot apps, built using the Spring Cloud Stream or Spring Cloud Task microservice frameworks. This makes Spring Cloud Data Flow ideal for a range of data processing use cases, from import/export to event streaming and predictive analytics.
Top functions reviewed by kandi - BETA
- Launch a task.
- Delete child task executions.
- Inline the given sequence into the main sequence.
- Build the StreamAppDefinition.
- Create the app deployment requests for a stream deployment.
- Return a page of audit records.
- Retrieve the TaskExecution information for a given task.
- Process a set of links.
- Execute a task.
- Create a container configuration map.
Community Discussions
Trending Discussions on spring-cloud-dataflow
QUESTION
I've used the Bitnami Helm chart to install SCDF into a k8s cluster generated by kOps in AWS.
I'm trying to add my development SCDF stream apps into the installation using a file URI and cannot figure out where or how the shared Skipper & Server mount point is. exec'ing into either instance, there is no /home/cnb, and I'm not seeing anything common via mount. As best I can tell, the Bitnami installation uses the MariaDB instance for shared "storage".
Is there a recommended way of installing local/dev Stream apps into the cluster?
...ANSWER
Answered 2021-Aug-23 at 09:03
There are a couple of parameters under the deployer section that allow you to mount volumes (link):
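A minimal sketch of what that could look like in a values.yaml override, assuming the chart exposes deployer.volumes and deployer.volumeMounts as its parameter list suggests; the paths are placeholders for wherever your dev jars live:

```yaml
deployer:
  volumes:
    - name: dev-apps
      hostPath:
        path: /opt/dev-apps   # placeholder host directory with the dev app jars
  volumeMounts:
    - name: dev-apps
      mountPath: /opt/dev-apps  # apps can then be registered with file:///opt/dev-apps/... URIs
```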
QUESTION
What is the format for passing in additional arguments or environment variables to the Data Flow Server in SCDF running on Kubernetes? When running locally in Docker Compose, I can do something like below, but not sure what the equivalent is when deploying to Kubernetes using the helm chart.
...ANSWER
Answered 2021-Aug-06 at 16:56
The properties you are looking for might be under Kafka Chart Parameters -> externalKafka.brokers. So in your case I would try:
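A hedged values.yaml fragment along those lines; the broker address is a placeholder:

```yaml
externalKafka:
  brokers: my-kafka.default.svc.cluster.local:9092
```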
QUESTION
I have about 16 tasks configured in parallel.
My intention is to only have 3 tasks running at one time. I don't mind which tasks run first as long as the order of the sequential tasks is maintained (BBB is always run after AAA, DDD after CCC, etc.).
As per the documentation here - https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#_configuration_options - I tried setting --split-thread-core-pool-size=3, but it gave me this error:
Split thread core pool size 3 should be equal or greater than the depth of split flows 17. Try setting the composed task property splitThreadCorePoolSize
What do I do here?
...ANSWER
Answered 2021-Aug-04 at 20:54Spring Cloud Dataflow's Composed Task Runner uses Spring Batch under the covers. And the way Spring Batch deals with nested splits in flows is not quite optimal:
- https://github.com/spring-projects/spring-batch/issues/3857
- https://github.com/spring-cloud/spring-cloud-dataflow/issues/4279
That's why nested splits should be avoided if tight control on concurrency limits is required.
In your case that should be possible:
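As an illustration only (task names hypothetical, and not necessarily the exact restructuring the original answer proposed), a single flat split keeps each pair sequential while avoiding nesting:

```
<AAA && BBB || CCC && DDD || EEE && FFF>
```

Whether a pool size of 3 then satisfies the splitThreadCorePoolSize check depends on how Spring Batch counts the flows in that split, so treat this as a shape to experiment with rather than a guaranteed fix.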
QUESTION
I am trying to run/configure Spring Cloud Data Flow (SCDF) to schedule a task for a Spring Batch job.
I am running in a minikube that connects to a local PostgreSQL instance (localhost:5432). The minikube runs in VirtualBox, where I assigned a vnet through --cidr so minikube can connect to the local Postgres.
Here is the postgresql service yaml: https://github.com/msuzuki23/SpringCloudDataFlowDemo/blob/main/postgres-service.yaml
Here is the SCDF config yaml: https://github.com/msuzuki23/SpringCloudDataFlowDemo/blob/main/server-config.yaml
Here is the SCDF deployment yaml: https://github.com/msuzuki23/SpringCloudDataFlowDemo/blob/main/server-deployment.yaml
Here is the SCDF server-svc.yaml: https://github.com/msuzuki23/SpringCloudDataFlowDemo/blob/main/server-svc.yaml
To launch the SCDF server in minikube, I run the following kubectl commands:
...ANSWER
Answered 2021-Jul-08 at 16:31
I did a search on:
Caused by: org.postgresql.util.PSQLException: ERROR: relation "hibernate_sequence" does not exist Position: 17
And found this Stack Overflow answer:
Postgres error in batch insert : relation "hibernate_sequence" does not exist position 17
Went to Postgres and created the hibernate_sequence:
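For reference, the statement would look something like this (a sketch; pick a start value that clears any IDs already in use):

```sql
CREATE SEQUENCE hibernate_sequence START 1;
```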
QUESTION
TL;DR
Spring Cloud Data Flow does not allow multiple executions of the same task, even though the documentation says that this is the default behavior. How can we allow SCDF to run multiple instances of the same task at the same time using the Java DSL to launch tasks? To make things more interesting, launching the same task multiple times works fine when hitting the REST endpoints directly, using curl for example.
Background:
I have a Spring Cloud Data Flow task that I have pre-registered in the Spring Cloud Data Flow UI dashboard.
...ANSWER
Answered 2021-May-12 at 16:57
In this case it looks like you are trying to recreate the task definition. You should only need to create the task definition once. From this definition you can launch multiple times. For example:
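A sketch using the Task Java DSL described in the SCDF reference; the server address and names are placeholders, and the timestamp app is assumed to be registered:

```java
import java.net.URI;

import org.springframework.cloud.dataflow.rest.client.DataFlowTemplate;
import org.springframework.cloud.dataflow.rest.client.dsl.task.Task;

public class LaunchSameTaskTwice {
    public static void main(String[] args) {
        // Hypothetical server address; point this at your SCDF server.
        DataFlowTemplate dataFlow = new DataFlowTemplate(URI.create("http://localhost:9393"));

        // Create the definition once...
        Task task = Task.builder(dataFlow)
                .name("my-task")           // hypothetical definition name
                .definition("timestamp")   // assumes the timestamp app is registered
                .description("demo task")
                .build();

        // ...then launch it repeatedly; each launch returns its own execution id.
        long first = task.launch();
        long second = task.launch();
        System.out.printf("executions: %d, %d%n", first, second);
    }
}
```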
QUESTION
Background
I have the Spring Cloud Data Flow server running in Kubernetes as a pod. I am able to launch tasks from the SCDF server UI dashboard. I am looking to develop a more complicated, real-world task-pipeline use case.
Instead of using the SCDF UI dashboard, I want to launch a sequential list of tasks from a standard Java application. Consider the following task pipeline:
Task 1: Reads data from the database for the unique id received as task argument input and performs enrichments. The enriched record is written back to the database. Execution of one task instance is responsible for processing one unique id.
Task 2: Reads the enriched data written by step 1 for the unique id received as task argument input and generates reports. Execution of one task instance is responsible for generating reports for one unique id.
It should be clear from the above explanation that Task 1 and Task 2 are sequential steps. Assume that the input database contains 50k unique ids. I want to develop an orchestrator Java program that would launch Task 1 with a limit of 40 (i.e., only 40 pods can be running at any given time for Task 1; any requests to launch more pods for Task 1 should be put on wait). Only once all 50k unique ids have been processed through Task 1 instances should Task 2 pods be launched.
What I found so far
Going through the documentation, I found something known as the ComposedTaskRunner. However, the examples show commands triggered in a shell/cmd window. I want to do something similar, but instead of opening up a Data Flow shell program, I want to pass arguments to a Java program that can internally launch tasks. This allows me to easily integrate my application with legacy code that knows how to integrate with Java code (either by launching a Java program on demand that launches a set of tasks and waits for them to complete, or by calling a REST API).
Question
- How to programmatically launch tasks on demand with Spring Cloud Data Flow using Java instead of a Data Flow shell? (A REST API, or a simple Java program run on a standalone server, would be fine too.)
- How to programmatically build a sequential pipeline with an upper limit on the number of pods that can be launched per task, and with dependencies such that a task can only start once the previous task has finished processing all the inputs?
ANSWER
Answered 2021-May-10 at 18:15
Please review the Java DSL support for Tasks.
You'd be able to compose the choreography of the tasks with sequential/parallel execution with this fluent-style API [example: .definition("a: timestamp && b:timestamp")].
With this defined as Java code, you'd be able to build, launch, or schedule the launching of these directed graphs. We see many customers following this approach for E2E acceptance testing and deployment automation.
Furthermore, you can extend the programmatic task definitions for continuous deployments as well.
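As a sketch of the shape this takes (task and definition names are hypothetical), a sequential composed task in the same Java DSL might look like:

```java
import java.net.URI;

import org.springframework.cloud.dataflow.rest.client.DataFlowTemplate;
import org.springframework.cloud.dataflow.rest.client.dsl.task.Task;

public class SequentialPipeline {
    public static void main(String[] args) {
        // Hypothetical server address; point this at your SCDF server.
        DataFlowTemplate dataFlow = new DataFlowTemplate(URI.create("http://localhost:9393"));

        // Hypothetical composed definition: "report" only starts after "enrich"
        // completes, because && is the sequential operator in the composed-task DSL.
        Task pipeline = Task.builder(dataFlow)
                .name("enrich-then-report")
                .definition("enrich: task-one && report: task-two")
                .build();

        pipeline.launch();
    }
}
```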
QUESTION
The only way I have found is using a curl command as per the docs: https://docs.spring.io/spring-cloud-dataflow/docs/2.7.1/reference/htmlsingle/#resources-app-registry-post
This uses a curl command to hit the API, which I could wrap in a script, but I would like to set this up within the Helm charts so that these tasks and applications are created when the chart is deployed. Any ideas?
...ANSWER
Answered 2021-Mar-10 at 07:31
Please check Spring Cloud Data Flow, Helm Installation, Register prebuilt applications; it says:
Applications can be registered individually using the app register functionality or as a group using the app import functionality.
So, I guess you always need to start the app using the Helm chart and only later register the applications using app register or the REST endpoint.
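For reference, the registration endpoint from the linked docs is POST /apps/{type}/{name}; a hypothetical call (server address, app name, and coordinates are placeholders) would be:

```sh
curl -X POST http://scdf-server:9393/apps/task/my-task \
  -d "uri=docker://myrepo/my-task:1.0.0"
```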
QUESTION
Recently I updated my JDK from 8 (1.8_275) to 11 (openjdk version "11.0.9.1" 2020-11-04)
While I am trying to launch the SCDF local server using
...ANSWER
Answered 2020-Dec-22 at 16:34
You're using an ancient and deprecated version of SCDF; the 1.x line has reached EOL/EOGS as well. In particular, the version you're using is more than two years old.
Please upgrade to the 2.x version. The latest GA is 2.7.0.
Check out the getting-started guide and the release blog for more details.
QUESTION
I am currently trying to integrate Keycloak with Spring Cloud Data Flow 2.3.0, but the configuration shown in the documentation does not work for this version. I tried the same with Spring Cloud Data Flow 2.2.2 and the integration worked fine. This is the config I added in application.yaml for both versions:
...ANSWER
Answered 2020-Nov-01 at 14:26
The configuration changed as of version 2.3.0, which is not documented in the Data Flow documentation. I have added only the Keycloak-related configuration on GitHub: https://github.com/ChimbuChinnadurai/spring-cloud-dataflow-keycloak-integration
QUESTION
We are trying to turn on security for Spring Cloud Data Flow following the documentation (https://docs.spring.io/spring-cloud-dataflow/docs/current-SNAPSHOT/reference/htmlsingle/#configuration-security), but we have some knowledge gaps that we have not been able to fill.
According to point 9.2, it is possible to configure authentication with OAuth 2.0 and integrate it with SSO. We use Red Hat SSO, so we are trying to integrate the two, but we have not been able to make it work. Is this possible, or is there a limitation on which SSO can be used?
Following the documentation, we have set these properties:
- spring.security.oauth2.client.registration.uaa.client-id=xxxxxxx
- spring.security.oauth2.client.registration.uaa.client-secret=xxxxxx
- spring.security.oauth2.client.registration.uaa.redirect-uri='{baseUrl}/login/oauth2/code/{registrationId}'
- spring.security.oauth2.client.registration.uaa.authorization-grant-type=authorization_code
- spring.security.oauth2.client.registration.uaa.scope[0]=openid
- spring.security.oauth2.client.provider.uaa.jwk-set-uri=../openid-connect/certs
- spring.security.oauth2.client.provider.uaa.token-uri=../openid-connect/token
- spring.security.oauth2.client.provider.uaa.user-info-uri=../openid-connect/userinfo
- spring.security.oauth2.client.provider.uaa.user-name-attribute=user_name
- spring.security.oauth2.client.provider.uaa.authorization-uri=../openid-connect/auth
- spring.security.oauth2.resourceserver.opaquetoken.introspection-uri=../openid-connect/token/introspect
- spring.security.oauth2.resourceserver.opaquetoken.client-id=xxxxxxx
- spring.security.oauth2.resourceserver.opaquetoken.client-secret=xxxxxxx
So we have some considerations:
- The resourceserver.opaquetoken properties are needed for introspection of the token, so we are pretty sure they are necessary for when we receive a REST request that must carry the Authorization header
- If we are not using UAA, should the properties still be named uaa?
- When we try to access the UI, it redirects to the authorization-uri because authorization-grant-type=authorization_code, so it will log in via the SSO. Is that right?
- If we use the password grant type, it would directly request a username/password for login; where is that validated?
- The user-info URI is mandatory, but is it really used?
- What are the other URIs (jwk and token) used for?
- Why does the redirect URI have that format? Where do those variables point?
Finally, we have tested the configuration in an SCDF instance running in a Docker container, but it does "nothing":
...ANSWER
Answered 2020-Jul-21 at 08:01
These are all plain Spring Security OAuth settings, and the concepts are better documented there. We're in the process of adding better docs for Keycloak, but in the meantime my old dataflow-keycloak test might get you started.
In recent versions we added a better way to use plain JWT keys and documented it for Azure/AD. The plan is to add a similar section for Keycloak.
I believe just using issuer-uri and jwk-set-uri should give you a working setup (you still need to figure out scope-to-roles mappings), as Spring Security uses those to autoconfigure the OAuth settings. All the other settings are somewhat legacy, dating back to when we weren't fully on the Spring Security 5.3 line.
For RH SSO, I'm not sure if you're talking about some global shared instance or your private setup.
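A minimal sketch of that issuer-uri/jwk-set-uri setup in plain Spring Security properties; the registration name, hostnames, realm, and credentials below are placeholders:

```yaml
spring:
  security:
    oauth2:
      client:
        registration:
          keycloak:                       # hypothetical registration id
            client-id: dataflow
            client-secret: <secret>
            authorization-grant-type: authorization_code
            scope: openid
        provider:
          keycloak:
            issuer-uri: https://keycloak.example.com/auth/realms/myrealm
            jwk-set-uri: https://keycloak.example.com/auth/realms/myrealm/protocol/openid-connect/certs
```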
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install spring-cloud-dataflow
You can use spring-cloud-dataflow like any standard Java library. Please include the jar files in your classpath. You can also use any IDE, and you can run and debug the spring-cloud-dataflow component as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, please refer to maven.apache.org. For Gradle installation, please refer to gradle.org.
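As a sketch, a Maven dependency for embedding the Data Flow server via its documented starter (the version shown matches the 2.7.0 GA mentioned above; adjust to your release):

```xml
<dependency>
  <groupId>org.springframework.cloud</groupId>
  <artifactId>spring-cloud-starter-dataflow-server</artifactId>
  <version>2.7.0</version>
</dependency>
```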