kafka-cluster | A Kafka development cluster using Docker | Continuous Deployment library
kandi X-RAY | kafka-cluster Summary
A Kafka development cluster using Docker
Community Discussions
Trending Discussions on kafka-cluster
QUESTION
I have found this question that talks about the difference between a partition and a replica, and the answers seem to say that Kafka partitions are needed for scalability. But I don't get why they are "mandatory" in order to scale your infrastructure. I feel like you could simply add a new node and increase the replication factor of the topic?
ANSWER
Answered 2021-May-25 at 12:36
Consumer application side scalability
Within a consumer group, each partition is assigned to at most one consumer instance; partitions are not shared between instances of the same group. If your topic has only one partition and your consumer application runs multiple instances with the same consumer group id, the extra instances sit idle. So if you need to scale your consumer application to multiple instances, you need multiple partitions.
Kafka broker side scalability
If your topic is very busy and has multiple partitions, you can add another node and reassign partitions so that they are spread across the new brokers, sharing the broker traffic between them. If you have only one partition, no traffic can be shared, so the topic does not scale.
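To make the partition-count point concrete, here is a minimal sketch (not from the original answer) that creates a multi-partition topic with Kafka's Java AdminClient; the topic name, partition count, replication factor and bootstrap address are illustrative assumptions.

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateScalableTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Illustrative bootstrap address; point this at your own cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions allow up to 6 consumer instances in one group to consume
            // in parallel; replication factor 3 provides fault tolerance, not parallelism.
            NewTopic topic = new NewTopic("orders", 6, (short) 3);
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```

With a topic like this, adding consumer instances (up to the partition count) under the same group.id spreads the partitions across them automatically, which is the scalability the answer describes.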
QUESTION
I set up a Kafka Connect cluster in distributed mode and I want it to connect to multiple Kafka CLUSTERS, not just multiple brokers. Target brokers can be set with bootstrap.servers in connect-distributed.properties. So, at first, I set broker1 from kafka-cluster-A like below:
ANSWER
Answered 2021-May-07 at 08:08
As far as I know, you can only connect a Kafka Connect worker to one Kafka cluster.
If you have data on different clusters that you want to handle with Kafka Connect then run multiple Kafka Connect worker processes.
QUESTION
I have a project where we consume data from Kafka and publish to Mongo. The codebase does only one task at a time, which may be a Mongo-to-Kafka migration, a Kafka-to-Mongo migration, or something else.
We have to consume from different Kafka topics and publish to different Mongo collections; these are parallel streams of work.
The current design is one codebase that can consume from any topic and publish to any Mongo collection, configurable through environment variables. So we created one Kubernetes Pod with multiple containers inside it, each container having different environment variables.
My questions:
- Is it wise to use multiple containers in one pod? They are easy to distinguish, but since they are tightly coupled, I am guessing there is a high chance of failure and it is not really a proper microservice design.
- Should I create multiple deployments, one for each of these pipelines? That would be very difficult to maintain, as each will have different deployment configs.
- Is there any better way to address this?
Sample of step 1:
...
ANSWER
Answered 2021-Apr-18 at 12:22
A templating tool like Helm will let you fill in the environment-variable values from deploy-time settings. In Helm this would look like:
QUESTION
I would like to send data from Kafka to Elasticsearch using the fast-data-dev Docker image and the latest Elasticsearch and Kibana, but I got the following error:
...
ANSWER
Answered 2021-Mar-11 at 08:42
Once you use a Docker network, the address is no longer localhost; you need to use the service name as connection.url. Can you try connection.url=http://elasticsearch:9200, and maybe also without the http?
QUESTION
We are facing an issue with a set of Kafka consumers. Whenever there is activity on the kafka-cluster, such as rebooting the brokers (rolling restarts) or rebooting the VMs running the brokers, our Kafka consumers LeaveGroup after failing to heartbeat. The logs below repeat for exactly one minute and correspond to the commitSync call made in the application code as part of consuming messages from the topic.
ANSWER
Answered 2020-Nov-19 at 20:58
You mention doing maintenance activities on the Kafka brokers, i.e. restarting VMs (these should be controlled restarts of the Kafka service, not of the containers). If you want uninterrupted consumption even during maintenance, you must consider the following:
- Kafka brokers must be taken down for maintenance in a rolling-restart fashion; click here for details
- The above is best done one broker at a time, or as allowed by the ISR settings in the cluster configs
- The number of partitions and the replication factor must be greater than 1, so that a broker being down for maintenance does not leave the topic with offline partitions, which would result in producer/consumer failures and, in turn, data loss

A personal suggestion: take the controller down last in the rolling restart, to avoid multiple controller switches and repeated reloading of the cluster metadata.
When doing a rolling restart, each broker takes some time to come back up, i.e. the time needed to repopulate the partition metadata and for the under-replicated partition count to return to 0. This is very important so as not to pressure the controller with overlapping restarts, as many under-replicated partitions may cause offline/unavailable topic partitions depending on your config.
On top of the above, you can definitely tweak the following consumer configs (see the consumer sketch after this answer):
- heartbeat.interval.ms (must be lower than session.timeout.ms)
- session.timeout.ms
- max.poll.interval.ms

These can be tuned based on your connection latencies and the state of the Kafka cluster; you can read more about them in the Confluent docs.
It is also possible that, while cluster maintenance is taking place, the broker acting as partition leader takes significantly longer to respond than session.timeout.ms / max.poll.interval.ms allow, in which case the consumer stops retrying. So tweaking the consumer configs and keeping cluster operations sane is the key to a healthy, continuous Kafka integration.
Note: as a personal opinion, having done cluster upgrades/maintenance with over 1 Gbps of throughput, we do not face consumption issues (except for a spike in request-handler/network-handler latencies because of the rebalance). Keeping the above caveats in mind and executing carefully, updates are manageable but definitely time consuming, as they have to be executed serially.
More help can be found in the documentation on cluster maintenance and consumer behavior tweaks.
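As a rough illustration of the consumer settings mentioned above, the sketch below builds a KafkaConsumer with explicit heartbeat, session-timeout and poll-interval values. The concrete numbers, topic, group id and broker addresses are assumptions chosen only for the example, not recommendations from the answer.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TunedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092"); // assumed addresses
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "payments-consumer");                  // assumed group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // Example values only: the heartbeat must stay well below the session timeout,
        // and the session timeout should tolerate a broker bouncing during a rolling restart.
        props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "3000");
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "45000");
        // Upper bound on the time between poll() calls before the consumer is removed from the group.
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("payments")); // assumed topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
                consumer.commitSync(); // same commit pattern as in the question
            }
        }
    }
}
```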
QUESTION
I am trying out Brooklin for mirroring data between Kafka clusters, following the wiki https://github.com/linkedin/brooklin/wiki/mirroring-kafka-clusters
Unlike the wiki, I am trying to set up mirroring between 2 different clusters. I am able to start the Brooklin process and the Datastream, but I cannot manage to mirror messages. Brooklin is currently running on the source Kafka cluster, and I am trying to mirror the topic 'test'.
The server.properties for Brooklin is:
...ANSWER
Answered 2020-Oct-14 at 11:49
This was an issue on my end - I had a typo in the topic name configured for mirroring.
QUESTION
I am trying to deploy Zookeeper and Kafka on Kubernetes using the confluentinc Docker images. I based my solution on this question and this post. Zookeeper is running without errors in the log. I want to deploy 3 Kafka brokers using a StatefulSet. The problem with my yaml files is that I don't know how to configure the KAFKA_ADVERTISED_LISTENERS property for Kafka when using 3 brokers.
Here is the yaml file for Zookeeper:
ANSWER
Answered 2020-Oct-01 at 10:48
I have been reading this blog post "Kafka Listeners - Explained" and I was able to configure 3 Kafka brokers with the following configuration.
QUESTION
In one of our Spring Boot based services, we intend to connect to 2 different Kafka clusters simultaneously. Each cluster has its own set of bootstrap servers, topic configurations, etc. They are in no way related to each other, as was the case in this question.
I will have different types of messages to read from each cluster, on different topic names. There may or may not be multiple producers connecting to both clusters from this service, but we will surely have at least one consumer per cluster.
I would like to know how I can define properties in application.yml to cater to this setup, so that I can use 2 different KafkaProperties objects to create 4 factories (2 consumer, 2 producer). The rest, I believe, should be pretty straightforward, as I would only need to use the relevant factory to create a particular container/listener/KafkaTemplate as per the business requirements.
ANSWER
Answered 2020-Jul-27 at 14:24
You cannot; you need to disable Boot's auto configuration and configure the infrastructure beans for each cluster yourself.
Boot's auto configuration only supports one cluster.
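As a rough sketch of what that manual wiring can look like, the configuration class below defines a listener container factory per cluster and a KafkaTemplate for one of them. The bean names, group ids and bootstrap addresses are illustrative assumptions, not something prescribed by Boot or by the original answer, and in practice the addresses would be bound from application.yml rather than hard-coded.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;

@Configuration
public class DualClusterKafkaConfig {

    // Assumed addresses; in a real service these come from application.yml.
    private static final String CLUSTER_A = "cluster-a-broker:9092";
    private static final String CLUSTER_B = "cluster-b-broker:9092";

    private Map<String, Object> consumerProps(String bootstrap, String groupId) {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        return props;
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> clusterAListenerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(
                new DefaultKafkaConsumerFactory<>(consumerProps(CLUSTER_A, "service-cluster-a")));
        return factory;
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> clusterBListenerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(
                new DefaultKafkaConsumerFactory<>(consumerProps(CLUSTER_B, "service-cluster-b")));
        return factory;
    }

    @Bean
    public KafkaTemplate<String, String> clusterATemplate() {
        Map<String, Object> props = new HashMap<>();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, CLUSTER_A);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // The template for cluster B would be wired the same way with CLUSTER_B.
        return new KafkaTemplate<>(new DefaultKafkaProducerFactory<>(props));
    }
}
```

Each @KafkaListener can then pick its cluster through the containerFactory attribute (e.g. containerFactory = "clusterBListenerFactory"), and, per the answer, Boot's KafkaAutoConfiguration is disabled or simply not relied upon so these beans do not clash with the auto-configured ones.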
QUESTION
I have a Kafka cluster set up using three GCP VM instances. All three VMs have Zookeeper and Kafka brokers running on them.
I followed this guide for my setup: How to Setup Multi-Node Multi Broker Kafka Cluster in AWS
I have kept all three instances in different regions in order to achieve high availability in case of a regional failure of the cloud service (GCP or AWS; I understand it is highly unlikely).
I have a topic created with a replication factor of 3. Now, suppose one region goes entirely down and only two nodes are alive. What happens to a consumer that was reading from the VM in the failed region (and which was working previously)?
Once the services in that region come back up, would this consumer (having a unique client_id) maintain its state and read only new messages?
Also, what if I have 7 VMs divided (2, 2, 3) across 3 regions:
- Is it a good idea to keep the replication factor at 7 to achieve high availability?
- I am thinking of 7 VMs because if any one region goes down, we still have the majority of Kafka nodes running. Is it possible to run a Kafka cluster with the majority of nodes down? (e.g. 4 out of 7 nodes down)
ANSWER
Answered 2020-May-26 at 03:31
Kafka provides various settings to achieve high availability, which can be tuned and optimized based on your requirements.
1. min.insync.replicas: the minimum number of in-sync copies that must be alive for the cluster to keep accepting writes. E.g. suppose we have 3 broker nodes and one broker goes down: with min.insync.replicas = 2 or less, the cluster keeps serving requests; with min.insync.replicas = 3, it stops. Note that min.insync.replicas=1 is not advisable, because if that single copy is lost, the data is lost forever. min.insync.replicas is a balance between higher consistency (requiring writes to more than one broker) and higher availability (allowing writes when fewer brokers are available).
2. acks (acknowledgements): while publishing a message we can set how many replicas must commit it before the producer receives an acknowledgement. acks=0 means the message is acknowledged immediately, without waiting for any commit to the partition; acks=1 means success is acknowledged once the message is committed to the leader; acks=all means the message is acknowledged only after all in-sync replicas have committed it.
3. unclean.leader.election.enable: you can set unclean.leader.election.enable=true on your brokers, and in this case, if no replicas are in sync, one of the out-of-sync replicas will be elected leader. This can lead to data loss but favors availability. Of course, if some replicas are in sync, it will still elect one of them.
4. offsets.topic.replication.factor: should be greater than 1 so that the internal __consumer_offsets topic is also highly available. __consumer_offsets stores the committed offsets for each topic and consumer group, so if its replication factor is 1, consumers may fail when one broker goes down.
Consumers always read from the leader of a partition; they do not consume from followers, which exist only for redundancy and fail-over. A failed broker may host the leaders of multiple partitions; in that case, for each such partition, a follower on another broker is promoted to leader. If a promoted follower was not 100% in sync with the old leader, we might lose some data, which is where the acks setting you use while publishing messages comes into play.
There is a trade-off in choosing the number of partitions and replicas; the best numbers depend on your design and requirements. Note that a large number of replicas adds overhead, since the leader has to keep all follower partitions in sync (ISR) and more memory is occupied, whereas a very small number like 1 or 2 gives better performance but not high availability.
Broker controller: the controller is one of the brokers and has additional partition and replica management responsibilities; it stores some cluster metadata and reacts whenever partition metadata changes.
Any of the brokers can play the role of the controller, but in a healthy cluster there is exactly one controller. Another broker is elected controller if the current controller shuts down or loses its Zookeeper connection. If the number of controllers is 0 or more than 1, the cluster is not healthy.
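As a rough illustration of how these settings fit together, the sketch below creates a 3-way replicated topic with min.insync.replicas=2 through the AdminClient and then publishes with acks=all. The topic name, broker addresses and counts are assumptions for the example only, not values recommended by the answer.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class HighAvailabilitySetup {
    public static void main(String[] args) throws Exception {
        String bootstrap = "broker1:9092,broker2:9092,broker3:9092"; // assumed addresses

        Properties adminProps = new Properties();
        adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
        try (AdminClient admin = AdminClient.create(adminProps)) {
            // Replication factor 3 with min.insync.replicas=2 tolerates one broker
            // being down while still accepting acks=all writes.
            NewTopic topic = new NewTopic("events", 3, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(List.of(topic)).all().get();
        }

        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks=all: the write is acknowledged only after all in-sync replicas have it.
        producerProps.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("events", "key", "value")).get();
        }
    }
}
```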
QUESTION
I have a production Kafka cluster already deployed and running, with a topic "existing-topic". I am using the MongoDB source connector from Debezium.
All I want is to push the CDC events directly to the topic "existing-topic", so that my consumers which are already listening to that topic will process them.
I didn't find any resource on how to do this; however, it is mentioned that topics are created in the format below:
"If your mongodb.name parameter is A, database name is B and collection name is C, the data from database B and collection C will be loaded under the topic A.B.C"
Can I change the topic to "existing-topic" and push the events to it?
...
ANSWER
Answered 2020-May-17 at 15:20
According to the documentation,
The name of the Kafka topics always takes the form logicalName.databaseName.collectionName, where logicalName is the logical name of the connector as specified with the mongodb.name configuration property, databaseName is the name of the database where the operation occurred, and collectionName is the name of the MongoDB collection in which the affected document existed.
This means that if your connector's logical name is myConnector and your database myDatabase has two collections users and orders, the change events will be written to the topics myConnector.myDatabase.users and myConnector.myDatabase.orders.
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install kafka-cluster
You can install Docker from here.
With Docker and docker-compose installed, the simplest way to get the cluster up and running is to run the bootstrap command. The bootstrap command will launch a 2-node Kafka cluster with a single Zookeeper node and create a Kafka topic test with 2 partitions and a replication factor of 2. You can run the command by entering the following in your shell:
You should see a success message if the bootstrap command ran successfully. To check whether the Docker containers are running, run the following:
There should be three containers running: two named kafkacluster_kafka_<id> and one named kafkacluster_zookeeper_<id>.
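If you prefer to verify the bootstrapped test topic from code rather than the CLI, a minimal Java producer along the lines below could be used. It assumes one of the brokers is reachable on localhost:9092, which depends on the port mappings in the compose file.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class BootstrapSmokeTest {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed broker address; check the docker-compose port mappings for your setup.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The bootstrap command creates the topic "test" with 2 partitions.
            RecordMetadata meta = producer.send(new ProducerRecord<>("test", "hello")).get();
            System.out.printf("Wrote to partition %d at offset %d%n", meta.partition(), meta.offset());
        }
    }
}
```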