cloudflow | Cloudflow enables users to quickly develop distributed streaming applications on Kubernetes

 by Lightbend · Scala · Version: v2.3.1 · License: Apache-2.0

kandi X-RAY | cloudflow Summary


cloudflow is a Scala library typically used in Big Data, Kafka, Spark, Hadoop applications. cloudflow has no bugs, it has no vulnerabilities, it has a Permissive License and it has low support. You can download it from GitHub.

Technologies like mobile, the Internet of Things (IoT), Big Data analytics, machine learning, and others are driving enterprises to modernize how they process large volumes of data. A rapidly growing percentage of that data is now arriving in the form of data streams. To extract value from that data as soon as it arrives, those streams require near-real-time processing. We use the term "Fast Data" to describe applications and systems that deal with such requirements.

The Fast Data landscape has been rapidly evolving, with tools like Spark, Flink, and Kafka Streams emerging from the world of large-scale data processing, while projects like Reactive Streams and Akka Streams have emerged from the world of application development and high-performance networking. The demand for availability, scalability, and resilience is forcing fast data architectures to become more like microservice architectures. Conversely, successful organizations building microservices find their data needs grow with their organization while their data sources are becoming more stream-like and more real-time. Hence, a unification is happening between streaming data and microservice architectures.

It can be quite hard to develop, deploy, and operate large-scale microservices-based systems that can take advantage of streaming data and seamlessly integrate with systems for analytics processing and machine learning. The individual technologies are well-documented, but combining them into fully integrated, unified systems is no easy task. Cloudflow aims to make this easier by integrating the most popular streaming frameworks into a single platform for creating and running distributed Fast Data applications on Kubernetes.
Support
    Quality
      Security
        License
          Reuse

            kandi-support Support

              cloudflow has a low active ecosystem.
              It has 312 stars and 91 forks. There are 17 watchers for this library.
              It had no major release in the last 12 months.
              There are 96 open issues and 357 have been closed. On average, issues are closed in 315 days. There are 23 open pull requests and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of cloudflow is v2.3.1.

            kandi-Quality Quality

              cloudflow has no bugs reported.

            kandi-Security Security

              cloudflow has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

            kandi-License License

              cloudflow is licensed under the Apache-2.0 License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            kandi-Reuse Reuse

              cloudflow releases are available to install and integrate.


            cloudflow Key Features

            No Key Features are available at this moment for cloudflow.

            cloudflow Examples and Code Snippets

            No Code Snippets are available at this moment for cloudflow.

            Community Discussions

            QUESTION

            Setting up Scala project
            Asked 2021-Apr-14 at 13:53

            Is there a standard in place for setting up a Scala project where the build.sbt is contained in a subdirectory?

            I've cloned https://github.com/lightbend/cloudflow and opened it in IntelliJ; here is the structure:

            You can see that core contains build.sbt.

            If I open the project core in a new project window, then IntelliJ will recognise the Scala project.

            How can I compile the Scala project core while keeping the other folders available within the IntelliJ window?

            ...

            ANSWER

            Answered 2021-Apr-14 at 13:53

            EDIT: If you do want to play around with the project, it should suffice to import it as an SBT project and select core as the root. IntelliJ should also detect the build.sbt if you open core as the root.

            Here is the SBT Reference Manual

            Traditionally, build.sbt will be at the root of the project.

            If you are looking to use their libraries, you should import them in your sbt file; you shouldn't clone the repo unless you intend to modify or fork it.

            For importing libraries into your project, take a look at the Maven Repository for Cloudflow: select the project(s), click on the version you want, and select the SBT tab. Just copy and paste those dependencies into your build.sbt. Once you build the project with SBT, you should have all those packages available to you.

            So in [ProjectRoot]/build.sbt, add something along the lines of:
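            The missing snippet can be sketched as follows. The group ID, artifact names, and version are assumptions based on the context above, not verified coordinates — copy the exact lines from the SBT tab of the Maven Repository entry:

```scala
// build.sbt — dependency sketch; the coordinates below are assumptions,
// verify them against the Maven Repository entry for Cloudflow.
libraryDependencies ++= Seq(
  "com.lightbend.cloudflow" %% "cloudflow-streamlets" % "2.3.1",
  "com.lightbend.cloudflow" %% "cloudflow-akka"       % "2.3.1"
)
```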

            Source https://stackoverflow.com/questions/67080517

            QUESTION

            Cloudflow is unable to read avro message from kafka
            Asked 2021-Jan-10 at 14:59

            I am using Lightbend Cloudflow to develop an application that consumes from an external Kafka topic.

            The external Kafka topic contains Avro records. If I use kafka-avro-console-consumer with the Schema Registry, I am able to fetch messages.

            But in the same case, Cloudflow is unable to deserialize the message and throws an exception.

            ...

            ANSWER

            Answered 2021-Jan-10 at 14:59

            com.twitter.bijection.avro.BinaryAvroCodec does not work with the Confluent Schema Registry format.

            You'll need to adjust your Kafka client's deserializer settings to use the appropriate KafkaAvroDeserializer class from Confluent.
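            As a sketch, the relevant consumer settings look like the following. The property keys and the deserializer class follow Confluent's kafka-avro-serializer conventions; the broker and registry URLs are placeholders for your environment:

```scala
import java.util.Properties

// Consumer-config sketch for Confluent-framed Avro records.
// URLs below are placeholders; class names follow Confluent conventions.
val props = new Properties()
props.put("bootstrap.servers", "broker:9092") // placeholder
props.put("key.deserializer",
  "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer",
  "io.confluent.kafka.serializers.KafkaAvroDeserializer") // instead of BinaryAvroCodec
props.put("schema.registry.url", "http://schema-registry:8081") // placeholder
```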

            Source https://stackoverflow.com/questions/65624078

            QUESTION

            Force Apache Flink to fail and restore its state from checkpoint
            Asked 2020-Jun-26 at 08:00

            We use Apache Flink job cluster on Kubernetes that consists of one Job Manager and two Task Managers with two slots each. The cluster is deployed and configured using Lightbend Cloudflow framework.

            We also use the RocksDB state backend together with S3-compatible storage for persistence. There are no issues with savepoint creation from the CLI. Our job consists of a few keyed states (MapState) and tends to be rather huge (we expect at least 150 GB per state). The Restart Strategy for the job is set to Failure Rate. We use Apache Kafka as a source and sink throughout our jobs.

            We are currently doing some tests (mostly PoCs), and a few questions linger:

            We did some synthetic tests and passed incorrect events to the job, which led to exceptions being thrown during execution. Due to the Failure Rate strategy, the following steps happen: the corrupted message is read from Kafka via the source -> the operator tries to process the event and eventually throws an exception -> the job restarts and reads THE SAME record from Kafka as before -> the operator fails again -> the Failure Rate finally exceeds the given value and the job eventually stops. What should I do next? If we try to restart the job, it seems it will be restored with the latest Kafka consumer state and will read the corrupted message once again, leading us back to the previously mentioned behavior. What are the right steps to deal with such issues? And does Flink offer any kind of so-called Dead Letter Queue?

            The other question is about the checkpointing and restore mechanics. We currently can't figure out which exceptions raised during job execution are considered critical and lead to job failure followed by automatic recovery from the latest checkpoint. As described in the previous case, an ordinary exception raised inside the job leads to continuous restarts that are finally followed by job termination. We are looking for ways to reproduce a scenario where something happens to our cluster (the Job Manager fails, a Task Manager fails, etc.) that leads to automatic recovery from the latest checkpoint. Any suggestions considering such a scenario in a Kubernetes cluster are welcome.

            We have dug into the official Flink documentation but didn't find any related information, or possibly interpreted it the wrong way. Many thanks!

            ...

            ANSWER

            Answered 2020-Jun-26 at 08:00

            The approach that Flink's Kafka deserializer takes is that if the deserialize method returns null, then the Flink Kafka consumer will silently skip the corrupted message. And if it throws an IOException, the pipeline is restarted, which can lead to a fail/restart loop as you have noted.

            This is described in the last paragraph of this section of the docs.

            Past work and discussion on this topic can be found in https://issues.apache.org/jira/browse/FLINK-5583 and https://issues.apache.org/jira/browse/FLINK-3679, and in https://github.com/apache/flink/pull/3314.

            A dead letter queue would be a nice improvement, but I'm not aware of any effort in that direction. (Right now, side outputs from process functions are the only way to implement a dead letter queue.)
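            The skip-on-null contract described above can be sketched without Flink on the classpath. The helper below is hypothetical — in real code the hook is your DeserializationSchema's deserialize method returning null — but it shows why corrupted records are dropped rather than endlessly retried:

```scala
import scala.util.Try

// Hypothetical sketch of the skip-on-null contract: the deserializer yields
// None for corrupted records, so the consuming loop drops them instead of
// failing and triggering a restart.
def deserialize(raw: Array[Byte]): Option[String] =
  Try(new String(raw, "UTF-8")).filter(_.nonEmpty).toOption

val records: Seq[Array[Byte]] =
  Seq("good-1".getBytes("UTF-8"), Array[Byte](), "good-2".getBytes("UTF-8"))

// The empty (corrupted) record is silently skipped, mirroring how the Flink
// Kafka consumer handles a null returned from deserialize().
val processed = records.flatMap(deserialize)
```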

            Source https://stackoverflow.com/questions/62467193

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install cloudflow

            You can download it from GitHub.

            Support

            For new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on the community page Stack Overflow.