cloudflow | Cloudflow enables users to quickly develop
kandi X-RAY | cloudflow Summary
kandi X-RAY | cloudflow Summary
Technologies like mobile, the Internet of Things (IoT), Big Data analytics, machine learning, and others are driving enterprises to modernize how they process large volumes of data. A rapidly growing percentage of that data is now arriving in the form of data streams. To extract value from that data as soon as it arrives, those streams require near-realtime processing. We use the term "Fast Data" to describe applications and systems that deal with such requirements. The Fast Data landscape has been rapidly evolving, with tools like Spark, Flink, and Kafka Streams emerging from the world of large-scale data processing while projects like Reactive Streams and Akka Streams have emerged from the world of application development and high-performance networking. The demand for availability, scalability, and resilience is forcing fast data architectures to become more like microservice architectures. Conversely, successful organizations building microservices find their data needs grow with their organization while their data sources are becoming more stream-like and more real-time. Hence, there is a unification happening between streaming data and microservice architectures. It can be quite hard to develop, deploy, and operate large-scale microservices-based systems that can take advantage of streaming data and seamlessly integrate with systems for analytics processing and machine learning. The individual technologies are well-documented, but combining them into fully integrated, unified systems is no easy task. Cloudflow aims to make this easier by integrating the most popular streaming frameworks into a single platform for creating and running distributed Fast Data applications on Kubernetes.
Support
Quality
Security
License
Reuse
Top functions reviewed by kandi - BETA
Currently covering the most popular Java, JavaScript and Python libraries. See a Sample of cloudflow
cloudflow Key Features
cloudflow Examples and Code Snippets
Community Discussions
Trending Discussions on cloudflow
QUESTION
Is there a standard in place for setting up a Scala project where the build.sbt
is contained in a subdirectory?
I've cloned https://github.com/lightbend/cloudflow and opened it in IntelliJ, here is the structure:
Can see core contains build.sbt
.
If I open the project core
in a new project window then IntelliJ will recognise the Scala project.
How to compile the Scala project core
while keeping the other folders available within the IntelliJ window?
ANSWER
Answered 2021-Apr-14 at 13:53EDIT:
If you do want to play around with the project, it should suffice to either import an SBT project and select core as the root. Intellij should also detect the build.sbt
if you open core as the root.
Here is the SBT Reference Manual
Traditionally, build.sbt
will be at the root of the project.
If you are looking to use their libraries, you should import them in your sbt file, you shouldn't clone the repo unless you intend to modify or fork their repo.
For importing libraries into your project take a look at the Maven Repository for Cloudflow, select the project(s), click on the version you want, and select the SBT tab. Just copy and paste those dependencies into your build.sbt
. Once you build the project with SBT, you should have all those packages available to you.
So in [ProjectRoot]/build.sbt
something along the lines of
QUESTION
I am using lightbend cloudflow to develop my application that consumes from external kafka topic.
The external kafka topic contains avro records and if i try to use kafka-avro-console-consumer with schema-regestry, then able to fetch message.
but in the same case cloudflow is unable to deserialize the message and throws exception.
...ANSWER
Answered 2021-Jan-10 at 14:59com.twitter.bijection.avro.BinaryAvroCodec
does not work with the Confluent Schema Registry format.
You'll need to adjust your Kafka client's deserializer settings to use the approriate KafkaAvroDeserializer
class from Confluent
QUESTION
We use Apache Flink job cluster
on Kubernetes that consists of one Job Manager
and two Task Managers
with two slots each. The cluster is deployed and configured using Lightbend Cloudflow
framework.
We also use RocksDB
state backend together with S3-compatible storage for the persistence. There are no any issues considering both savepoints
creation from CLI. Our job consists of a few keyed states (MapState
) and tends to be rather huge (we expect at least 150 Gb per each state). The Restart Strategy
for the job is set to the Failure Rate
. We use Apache Kafka
as a source and sink throughout our jobs.
We currently doing some tests (mostly PoC's) and there are a few questions lingering:
We did some synthetic tests and passed incorrect events to the job. That lead to the Exceptions
were thrown during the execution. Due to Failure Rate
strategy the following steps happen: The Corrupted message from Kafka is read via source -> The Operator tries to process the event and eventually throws an Exception
-> The Job restarts and reads THE SAME record from Kafka as at the step before -> The Operator fails -> The Failure Rate
finally exceeds the given value and the job eventually stops. What should I do next? If we try to restart the job seems that it will be restored with the latests Kafka consumer state and will read the corrupted message once again, leading us back to the previously mentioned behavior? Which are the right steps to bare with such issues? And does Flink utilize any kind of so-called Dead Letter Queues
?
The other question is about the checkpointing and restore mechanics. We are currently can't figure out which exceptions raised during a job execution are considered as critical and lead to the failure of the job following by automatic recovery from the latest checkpoint? As it described in the previous case, the ordinary Exception
raised inside the job leads to continious restarts that finally followed by the job termination. We are looking for a cases to reproduce when something is happened with our cluster (Job Manager
fails, Task Manager
fails or something) that leads to the automatic recovery from the latest checkpoint. Any suggestions are welcomed considering such scenario in Kubernetes cluster.
We had sank into the Flink official documentation but didn't find any related information or possibly perceived it in the wrong way. Great thanks!
...ANSWER
Answered 2020-Jun-26 at 08:00The approach that Flink's Kafka deserializer takes is that if the deserialize
method returns null, then the Flink Kafka consumer will silently skip the corrupted message. And if it throws an IOException
, the pipeline is restarted, which can lead to a fail/restart loop as you have noted.
This is described in the last paragraph of this section of the docs.
Past work and discussion on this topic can be found in https://issues.apache.org/jira/browse/FLINK-5583 and https://issues.apache.org/jira/browse/FLINK-3679, and in https://github.com/apache/flink/pull/3314.
A dead letter queue would be a nice improvement, but I'm not aware of any effort in that direction. (Right now, side outputs from process functions are the only way to implement a dead letter queue.)
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install cloudflow
Support
Reuse Trending Solutions
Find, review, and download reusable Libraries, Code Snippets, Cloud APIs from over 650 million Knowledge Items
Find more librariesStay Updated
Subscribe to our newsletter for trending solutions and developer bootcamps
Share this Page