flink-statefun | Apache Flink Stateful Functions | Serverless library
kandi X-RAY | flink-statefun Summary
Stateful Functions is an API that simplifies building distributed stateful applications with a runtime built for serverless architectures. It brings together the benefits of stateful stream processing - the processing of large datasets with low latency and bounded resource constraints - with a runtime for modeling stateful entities that supports location transparency, concurrency, scaling, and resiliency. It is designed to work with modern architectures, like cloud-native deployments and popular event-driven FaaS platforms such as AWS Lambda and Knative, and to provide out-of-the-box consistent state and messaging while preserving the serverless experience and elasticity of these platforms. Stateful Functions is developed under the umbrella of Apache Flink. This README is meant as a brief walkthrough of the core concepts and of how to set things up to get started with Stateful Functions. For fully detailed documentation, please visit the official docs.
Top functions reviewed by kandi - BETA
- Override this method to send incoming request
- Serialize protobuf message to byte buffers
- Zeroizes a message to a buffer
- Get the headers
- Deserialize bootstrap data
- Copy the contents of the data input to the output view
- Deserialize TaggedBootstrapData
- Creates a copy of the given bootstrap data
- Initialize the buffer
- Create the Reductions
- Starts the job
- Deserialize an AWS region
- Creates a stateful cluster instance
- Returns the properties at a given node as a map
- Entry point
- Drains all completed futures on the operator thread
- Copy object to targetClassLoader
- Gets the long properties at a given position
- Opens the Operator
- Deserialize the credentials
- Returns a hash code for this map
- Loads the services from the classpath
- Initialize state
- Called when a channel is created
- Returns the command line options
- Returns a string representation of the data type information
flink-statefun Key Features
flink-statefun Examples and Code Snippets
Community Discussions
Trending Discussions on flink-statefun
QUESTION
I understand that, in general, event time uses watermarks to make progress in time. In the case of Flink Statefun, which is based more on iteration, this may be a problem. So my question is: if I use delayed messages (https://nightlies.apache.org/flink/flink-statefun-docs-stable/docs/sdk/java/#sending-delayed-messages), does that mean we can only use the processing-time notion in Stateful Functions?
I would like to change to the event-time processing model but am not sure how it will work with Stateful Functions.
...ANSWER
Answered 2022-Feb-03 at 09:06
Stateful Functions (statefun) doesn't support watermarks or event-time processing. But you could implement your own triggering logic based on the timestamps in arriving events.
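For illustration, a minimal sketch of such hand-rolled triggering with the Java SDK (3.x) might look like this; the function name and the assumption that each message carries its event timestamp as a long are mine, not from the answer:

```java
import java.util.concurrent.CompletableFuture;
import org.apache.flink.statefun.sdk.java.Context;
import org.apache.flink.statefun.sdk.java.StatefulFunction;
import org.apache.flink.statefun.sdk.java.TypeName;
import org.apache.flink.statefun.sdk.java.ValueSpec;
import org.apache.flink.statefun.sdk.java.message.Message;

public class EventTimeTrackingFn implements StatefulFunction {

  static final TypeName TYPE = TypeName.typeNameFromString("com.example/event-time-tracking");

  // Highest event timestamp seen so far for this key: a crude per-key "watermark".
  static final ValueSpec<Long> MAX_TS = ValueSpec.named("maxTs").withLongType();

  @Override
  public CompletableFuture<Void> apply(Context context, Message message) {
    long eventTs = message.asLong(); // assumption: the payload is the event timestamp
    long maxTs = context.storage().get(MAX_TS).orElse(Long.MIN_VALUE);
    if (eventTs > maxTs) {
      // "Event time" has advanced: fire any logic that was waiting for time <= eventTs.
      context.storage().set(MAX_TS, eventTs);
    }
    // else: the event is late relative to what we have seen; handle or drop it explicitly.
    return context.done();
  }
}
```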
QUESTION
I have a working Flink job built on the Flink DataStream API. I want to REWRITE the entire job based on Flink Stateful Functions 3.1.
The functions of my current Flink job are:
- Read messages from Kafka
- Each message is a slice of a data packet, e.g. (s for slice):
- s-0, s-1 are for packet 0
- s-4, s-5, s-6 are for packet 1
- The job merges slices into several data packets and then sinks the packets to HBase
- Window functions are applied to deal with out-of-order slice arrival
Currently I already have a Flink Stateful Functions demo running on my k8s cluster. I want to rewrite my entire job on top of Stateful Functions and:
- Save data into MinIO instead of HBase
I have read the docs and got some ideas. My plans are:
- There's no need to deal with Kafka anymore; the Kafka Ingress (https://nightlies.apache.org/flink/flink-statefun-docs-release-3.0/docs/io-module/apache-kafka/) handles it
- Rewrite my job based on the Java SDK. Merging is straightforward, but how about window functions? Maybe I should use persisted state with TTL to mimic window-function behavior
- An egress for MinIO is not in the list of default Flink I/O Connectors, therefore I need to write a custom Flink I/O Connector for MinIO myself, according to https://nightlies.apache.org/flink/flink-statefun-docs-release-3.0/docs/io-module/flink-connectors/
- I want to avoid the Embedded module because it prevents scaling. Auto scaling is the key reason why I want to migrate to Flink Stateful Functions
I don't feel confident with my plan. Is there anything wrong with my understanding/plan?
Are there any best practices I should refer to?
Update: windows were used to assemble results:
- get a slice, inspect its metadata, and know it is the last one of the packet
- also know that the packet should contain 10 slices
- if there are already 10 slices, merge them
- if there are not enough slices yet, wait for some time (e.g. 10 minutes) and then either merge or record a packet error
I want to get rid of windows during the rewrite, but I don't know how.
...ANSWER
Answered 2022-Jan-10 at 19:11
Background: Use KeyedProcessFunctions Rather than Windows to Assemble Related Events
With the DataStream API, windows are not a good building block for assembling together related events. The problem is that windows begin and end at times that are aligned to the clock, rather than being aligned to the events. So even if two related events are only a few milliseconds apart they might be assigned to different windows.
In general, it's more straightforward to implement this sort of use case with keyed process functions, and use timers as needed to deal with missing or late events.
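As a rough sketch of that DataStream-side pattern (slice and packet types simplified to String, and the 10-minute timeout taken from the question's update):

```java
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class PacketAssemblingFn extends KeyedProcessFunction<String, String, String> {

  private transient ListState<String> slices;

  @Override
  public void open(Configuration parameters) {
    slices = getRuntimeContext()
        .getListState(new ListStateDescriptor<>("slices", String.class));
  }

  @Override
  public void processElement(String slice, Context ctx, Collector<String> out) throws Exception {
    slices.add(slice);
    // Arm a per-key timeout relative to this event rather than to window boundaries.
    ctx.timerService().registerProcessingTimeTimer(
        ctx.timerService().currentProcessingTime() + 10 * 60 * 1000L);
    // ... if all slices for this key are present: merge, emit via out.collect(...),
    // clear state, and delete the pending timer ...
  }

  @Override
  public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
    // Timeout fired before the packet completed: emit/record the partial packet.
    slices.clear();
  }
}
```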
Doing this with the Statefun API
You can use the same pattern mentioned above. The function id will play the same role as the key, and you can use a delayed message instead of a timer:
- as each slice arrives, add it to the packet that's being assembled
- if it is the first slice, send a delayed message that will act as a timeout
- when all the slices have arrived, merge them and send the packet
- if the delayed message arrives before the packet is complete, do whatever is appropriate (e.g., go ahead and send the partial packet)
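A compact sketch of this statefun-side pattern with the Java SDK (the 10-slice packet size comes from the question; the function name, String payloads, and the "timeout" marker message are illustrative assumptions):

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import org.apache.flink.statefun.sdk.java.Context;
import org.apache.flink.statefun.sdk.java.StatefulFunction;
import org.apache.flink.statefun.sdk.java.TypeName;
import org.apache.flink.statefun.sdk.java.ValueSpec;
import org.apache.flink.statefun.sdk.java.message.Message;
import org.apache.flink.statefun.sdk.java.message.MessageBuilder;

public class PacketAssemblerFn implements StatefulFunction {

  static final TypeName TYPE = TypeName.typeNameFromString("com.example/packet-assembler");
  static final int EXPECTED_SLICES = 10;

  // The function id acts as the packet id; we only count slices here and elide
  // buffering the slice payloads themselves (another ValueSpec would hold them).
  static final ValueSpec<Integer> SEEN = ValueSpec.named("seen").withIntType();

  @Override
  public CompletableFuture<Void> apply(Context context, Message message) {
    if (message.isUtf8String() && message.asUtf8String().equals("timeout")) {
      // The delayed message fired before the packet completed: flush the partial
      // packet (or record an error), then clear state. A real implementation
      // should also mark timeouts of already-completed packets as stale.
      context.storage().remove(SEEN);
      return context.done();
    }
    int seen = context.storage().get(SEEN).orElse(0) + 1;
    if (seen == 1) {
      // First slice: arm a timeout by sending ourselves a delayed message.
      context.sendAfter(
          Duration.ofMinutes(10),
          MessageBuilder.forAddress(context.self()).withValue("timeout").build());
    }
    if (seen == EXPECTED_SLICES) {
      // All slices arrived: merge them, send the packet onward, clear state.
      context.storage().remove(SEEN);
    } else {
      context.storage().set(SEEN, seen);
    }
    return context.done();
  }
}
```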
QUESTION
I have 2 questions related to the high availability of a StateFun application running on Kubernetes.
Here are details about my setup:
- Using StateFun v3.1.0
- Checkpoints are stored on HDFS (state.checkpoint-storage: filesystem)
- Checkpointing mode is EXACTLY_ONCE
- State backend is rocksdb and incremental checkpointing is enabled
1- I tried both Zookeeper and Kubernetes HA settings; the result is the same (the log below is from a Zookeeper HA env). When I kill the jobmanager pod, minikube starts another pod, and this new pod fails when it tries to load the last checkpoint:
...ANSWER
Answered 2021-Dec-15 at 16:51
In statefun <= 3.2, routers do not have manually specified UIDs. While Flink's internal UID generation is deterministic, the way statefun generates the underlying stream graph may not be in some cases. This is a bug. I've opened a PR to fix this in a backwards-compatible way [1].
QUESTION
Good day to all. I started working recently with Apache Flink Stateful Functions. We are using a Flink reporter to send metrics to InfluxDB (https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/metric_reporters/). Stateful Functions provides a "function" scope with several metrics out of the box (https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.2/deployment-and-operations/metrics.html), but it's not enough, and there is a need to add custom metrics and measurements. All the source code seems to be closed to extension, and I'm not able to find the proper way to do this. Please share your experience if you have managed to complete this task.
...ANSWER
Answered 2021-Nov-06 at 07:56
The ability to add user-defined metrics was added to the main branch recently for the embedded-functions SDK. See the JIRA issue.
With that change, you can do something like this:
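The answer's original snippet was not preserved on this page. As a purely hypothetical sketch of the shape such code takes with user-defined metrics in the embedded SDK (the metrics() accessor and counter name here are assumptions; check the linked JIRA issue and your SDK version for the real API):

```java
import org.apache.flink.statefun.sdk.Context;
import org.apache.flink.statefun.sdk.StatefulFunction;

public class InstrumentedFn implements StatefulFunction {

  @Override
  public void invoke(Context context, Object input) {
    // Hypothetical: obtain a user-defined, function-scoped counter and bump it.
    // The exact accessor and signature may differ in your statefun version.
    context.metrics().counter("messages-seen").inc();

    // ... normal message handling ...
  }
}
```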
QUESTION
When deploying Flink Stateful Functions, one needs to specify what the endpoints for the functions are, i.e. what URL does Flink need to hit in order to trigger the execution of a remote function.
The docs state:
...The URL template name may contain template parameters that are filled in based on the function’s specific type. For example, a message sent to message type com.example/greeter will be sent to http://bar.foo.com/greeter.
ANSWER
Answered 2021-Nov-03 at 23:06
The only template value supported at the moment is the function name, i.e. the last value after the last forward slash (/). You can place it wherever you would like in the template, as long as it resolves to a legal URL at the end.
For example, this is also a valid template:
http://{function.name}.prod.svc.example.com
Then, a message addressed to com.example/greeter (in your example, with my new template) would resolve to:
http://greeter.prod.svc.example.com
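For reference, such a template is configured in the endpoint definition of your module.yaml; a sketch in the statefun 3.1 format (the namespace comes from the example above):

```yaml
kind: io.statefun.endpoints.v2/http
spec:
  functions: com.example/*
  urlPathTemplate: http://{function.name}.prod.svc.example.com
```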
If you are missing any other template parameters, feel free to connect with the Flink community over the user mailing list/JIRA. I'm sure they would be happy to learn about new use cases ;-)
QUESTION
I am trying to dive into the new Stateful Functions approach, and I have already tried to create a savepoint manually (https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.1/deployment-and-operations/state-bootstrap.html#creating-a-savepoint).
It works like a charm, but I can't find a way to do it automatically. For example, I have a couple million keys and I need to write them all to the savepoint.
...ANSWER
Answered 2020-Jul-26 at 11:14
Is your question about how to replace the env.fromElements in the example with something that reads from a file, or another data source? Flink's DataSet API, which is what's used here, can read from any HadoopInputFormat. See DataSet Connectors for details.
There are easy-to-use shortcuts for common cases. If you just want to read data from a file using a TextInputFormat, that would look like this:
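The answer's code snippet was not preserved on this page; a minimal sketch of that shortcut with the DataSet API (the file path is a placeholder) might be:

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class BootstrapKeysJob {
  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    // Reads the file line by line, using a TextInputFormat under the hood.
    DataSet<String> keys = env.readTextFile("hdfs:///path/to/keys.txt");
    // ... map each key to bootstrap data and feed it to the savepoint writer ...
    keys.print();
  }
}
```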
QUESTION
According to this page, we have the ability to set TTL for state when using Flink Statefun v2.1.0.
We also have the ability to bootstrap state, according to this page.
The first question is: the bootstrap documentation does not mention state expiration at all. What is the correct way to bootstrap states that have TTL? Can someone point me to an example?
The second question is: what happens if I set some state to expire 1 day after writing and then bootstrap that state using 6 months' worth of data?
Is the whole bootstrapped state going to expire after literally 1 day?
If so, what can I do to expire 1 day's worth of data after each day passes?
...ANSWER
Answered 2020-Jul-24 at 20:54
Yes, if that data hasn't been modified since it was loaded, it will all be deleted after one day.
To expire one day's worth of data every day: after bootstrapping the state, you could send yourself a delayed message, set to be delivered one day later. When it arrives, delete the oldest data and send another delayed message.
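A sketch of that delayed-message loop with the embedded SDK of that era (the Cleanup marker type is an illustrative assumption, and something must send the first Cleanup message after bootstrapping):

```java
import java.time.Duration;
import org.apache.flink.statefun.sdk.Context;
import org.apache.flink.statefun.sdk.StatefulFunction;

public class ExpiringStateFn implements StatefulFunction {

  /** Marker message this function sends to itself once a day. */
  public static final class Cleanup {}

  @Override
  public void invoke(Context context, Object input) {
    if (input instanceof Cleanup) {
      // Delete the oldest day's worth of data from state here, then re-arm
      // the loop so the next day's data expires one day from now.
      context.sendAfter(Duration.ofDays(1), context.self(), new Cleanup());
      return;
    }
    // ... normal message handling ...
  }
}
```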
QUESTION
I'm trying to egress to Confluent Kafka from Flink Statefun. In the Confluent git repo, in order to check the schema and put data onto a Kafka topic, all we need to do is use the Kafka client's ProducerRecord object with an Avro object.
But in Statefun we need to override the "ProducerRecord serialize" method for the Kafka egress. This causes the following error.
...ANSWER
Answered 2020-Jun-24 at 15:40
Schema registry is not directly supported in this version of Stateful Functions, but a few workarounds are possible:
- Connect to the schema registry yourself from the KafkaEgressSerializer class. In your linked example that would need to be happening here.
- Provide your own instance of a FlinkKafkaProducer that is based on (see AvroDeserializationSchema).
- Manage the schemas outside of Stateful Functions, but serialize your Avro record to bytes. Make sure to remove the schema registry from the properties that are being passed to the KafkaProducer.
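A sketch of the first workaround, connecting to the schema registry from inside a KafkaEgressSerializer (the topic name, registry URL, and GenericRecord payload are assumptions):

```java
import io.confluent.kafka.serializers.KafkaAvroSerializer;
import java.util.Map;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.statefun.sdk.kafka.KafkaEgressSerializer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroWithRegistrySerializer implements KafkaEgressSerializer<GenericRecord> {

  private static final String TOPIC = "my-topic";

  // Lazily created because the serializer instance is serialized and shipped to the cluster.
  private transient KafkaAvroSerializer avroSerializer;

  @Override
  public ProducerRecord<byte[], byte[]> serialize(GenericRecord out) {
    if (avroSerializer == null) {
      avroSerializer = new KafkaAvroSerializer();
      // isKey = false: we serialize record values, registering schemas as we go.
      avroSerializer.configure(
          Map.of("schema.registry.url", "http://schema-registry:8081"), false);
    }
    byte[] value = avroSerializer.serialize(TOPIC, out);
    return new ProducerRecord<>(TOPIC, value);
  }
}
```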
QUESTION
I have a properly working embedded job and I want to deploy additional co-located jobs. These newly added jobs will receive messages from the old job and send them to a Kafka topic.
Code as below:
...ANSWER
Answered 2020-May-04 at 18:12
Responses inline; and FYI, nothing you are asking is specific to co-location. These properties hold for remote modules and for jobs that contain mixed workloads of co-located and remote functions.
Do I have to define ingress for every co-located job? If not how can I make this work?
Yes, every job (remote or colocated) requires at least one ingress. An ingress is a channel that consumes messages from the outside world into a statefun application. Think Kafka or Kinesis. Without an ingress, the job would never do anything because there would be no initial messages to begin the processing.
To each ingress, you will bind 1 or more routers, which take each message from the ingress and forward it to 0 or more functions based on their function types [1].
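For example, a minimal router with the embedded SDK might look like this (the String message type and greeter function type are illustrative):

```java
import org.apache.flink.statefun.sdk.FunctionType;
import org.apache.flink.statefun.sdk.io.Router;

public class GreetRouter implements Router<String> {

  private static final FunctionType GREETER = new FunctionType("com.example", "greeter");

  @Override
  public void route(String message, Downstream<String> downstream) {
    // Forward each ingress record to the greeter instance keyed by the message itself.
    downstream.forward(GREETER, message, message);
  }
}
```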
How can I get co-located jobs to communicate? Is it enough to use the same FunctionType?
Yes, functions simply message each other using their function types.
Are co-located functions communicating over ingress/egress?
No, messages are passed between functions using the Apache Flink runtime which contains a highly optimized network stack. Once a message is pulled from an ingress, it never interacts with that ingress again. If interested, you can read about how Flink's network stack works in some blog posts that the community wrote, but this is not necessary to successfully use statefun in production[2].
[1] https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/io-module/index.html#router
[2] https://flink.apache.org/2019/06/05/flink-network-stack.html
QUESTION
Flink Stateful Functions 2.0 has the ability to make asynchronous calls, for example to an external API: https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/sdk/java.html#completing-async-requests
Function execution is then paused until the call completes with Success, Failure, or Unknown. Unknown is:
The stateful function was restarted, possibly on a different machine, before the CompletableFuture was completed, therefore it is unknown what is the status of the asynchronous operation.
What happens when there is a second call with the same ID to the paused/waiting function?
- Does the callee then wait on the called function's processing of its async result so that this second call executes with a clean, non-shared post-async state?
- Or does the second call execute on a normal schedule, and thus on top of the state that was current as of the async call, and then when the async call completes it continues processing using the state that was updated while the async call was pending?
- Or maybe the call counts as a "restart" of the called function - in which case, what is the order of execution: the "restart" runs and then the async returns with "restart" to execute from the now-updated state, or is this order reversed?
- Or something else?
ANSWER
Answered 2020-Apr-19 at 21:20
Function execution does not pause while an async request is completing. The instance for that id will continue to process messages until the request completes. This means the state can change while the future is running.
Think of your future as an ad-hoc function that you message and that then messages you back when it has a result. Functions can spawn multiple asynchronous requests without issue. Whichever future completes first will be processed first by the function instance, not necessarily the order in which they were spawned.
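To make the mechanics concrete, here is a sketch of registering an async operation and handling its completion with the embedded SDK (the external-call helper and the String types are assumptions):

```java
import java.util.concurrent.CompletableFuture;
import org.apache.flink.statefun.sdk.AsyncOperationResult;
import org.apache.flink.statefun.sdk.Context;
import org.apache.flink.statefun.sdk.StatefulFunction;

public class EnrichingFn implements StatefulFunction {

  @Override
  @SuppressWarnings("unchecked")
  public void invoke(Context context, Object input) {
    if (input instanceof AsyncOperationResult) {
      AsyncOperationResult<String, String> result = (AsyncOperationResult<String, String>) input;
      if (result.successful()) {
        // Delivered like any other message: state may have changed since the
        // request was issued, so re-read it here before acting on result.value().
      } else if (result.unknown()) {
        // A restart happened mid-flight; decide whether to retry result.metadata().
      }
      return;
    }
    // A regular message: kick off the async call without pausing this function.
    String request = (String) input;
    context.registerAsyncOperation(request, callExternalApi(request));
  }

  // Stand-in for a real non-blocking client call (an assumption for this sketch).
  private CompletableFuture<String> callExternalApi(String request) {
    return CompletableFuture.completedFuture("response:" + request);
  }
}
```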
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install flink-statefun
You can use flink-statefun like any standard Java library. Please include the jar files in your classpath. You can also use any IDE, and you can run and debug the flink-statefun component as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, please refer to maven.apache.org. For Gradle installation, please refer to gradle.org.
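For example, with Maven you might declare the remote Java SDK as a dependency (the artifact and version shown are illustrative; pick the module and release you actually need):

```xml
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>statefun-sdk-java</artifactId>
  <version>3.1.0</version>
</dependency>
```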