kafka-cluster | A Kafka development cluster using Docker | Continuous Deployment library
kandi X-RAY | kafka-cluster Summary
A Kafka development cluster using Docker
Community Discussions
Trending Discussions on kafka-cluster
QUESTION
I have found this question that talks about the difference between a partition and a replica, and the answers seem to say that Kafka partitions are needed for scalability. But I don't get why they are "mandatory" in order to scale your infrastructure. I feel like you could simply add a new node and increase the replication factor of the topic?
ANSWER
Answered 2021-May-25 at 12:36
Consumer application side scalability
Within a consumer group, each partition is assigned to at most one consumer instance; partitions are not shared between instances of the same group. If your topic has only one partition and your consumer application runs multiple instances with the same consumer group id, the extra instances sit idle. So if you need to scale your consumer application to multiple instances, you need multiple partitions.
Kafka broker side scalability
If your topic is very busy and has multiple partitions, you can add another node and reassign partitions so that they are spread across the new brokers, sharing the broker traffic between them. If you have only one partition, no traffic can be shared, so the topic does not scale.
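To make the partition-count point concrete, here is a minimal sketch (not from the original answer) that creates a multi-partition topic with Kafka's Java AdminClient; the topic name, partition count, replication factor and bootstrap address are illustrative assumptions.

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateScalableTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Illustrative bootstrap address; point this at your own cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions allow up to 6 consumer instances in one group to consume
            // in parallel; replication factor 3 provides fault tolerance, not parallelism.
            NewTopic topic = new NewTopic("orders", 6, (short) 3);
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```

With a topic like this, adding consumer instances (up to the partition count) under the same group.id spreads the partitions across them automatically, which is the scalability the answer describes.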
QUESTION
I set up a Kafka Connect cluster in distributed mode and I want it to connect to multiple Kafka CLUSTERS, not just multiple brokers. Target brokers can be set with bootstrap.servers in connect-distributed.properties. So, at first, I set broker1 from kafka-cluster-A like below:
ANSWER
Answered 2021-May-07 at 08:08
As far as I know, you can only connect a Kafka Connect worker to one Kafka cluster.
If you have data on different clusters that you want to handle with Kafka Connect then run multiple Kafka Connect worker processes.
QUESTION
I have a project where we consume data from Kafka and publish to Mongo. The codebase does only one task at a time, which may be a Mongo-to-Kafka migration, a Kafka-to-Mongo migration, or something else.
We have to consume from different Kafka topics and publish to different Mongo collections; these are parallel streams of work.
The current design is one codebase that can consume from any topic and publish to any Mongo collection, configurable through environment variables. So we created one Kubernetes Pod with multiple containers inside it, each container having different environment variables.
My questions:
- Is it wise to use multiple containers in one pod? They are easy to distinguish, but since they are tightly coupled, I am guessing there is a high chance of failure and it is not really a proper microservice design.
- Should I create multiple deployments, one for each of these pipelines? That would be very difficult to maintain, as each will have different deployment configs.
- Is there any better way to address this?
Sample of step 1:
...
ANSWER
Answered 2021-Apr-18 at 12:22
A templating tool like Helm will let you fill in the environment-variable values from deploy-time settings. In Helm this would look like:
QUESTION
I would like to send data from Kafka to Elasticsearch using the fast-data-dev Docker image and the latest Elasticsearch and Kibana, but I got the following error:
...
ANSWER
Answered 2021-Mar-11 at 08:42
Once you use a Docker network, the address is no longer localhost; you need to use the service name as connection.url. Can you try connection.url=http://elasticsearch:9200, and maybe also without the http?
QUESTION
We are facing an issue with a set of Kafka consumers. Whenever there is activity on the kafka-cluster, such as rebooting the brokers (rolling restarts) or rebooting the VMs running the brokers, our Kafka consumers LeaveGroup after failing to heartbeat. The logs below repeat for exactly one minute and correspond to the commitSync call made in the application code as part of consuming messages from the topic.
ANSWER
Answered 2020-Nov-19 at 20:58
You mention doing maintenance activities on the Kafka brokers, i.e. restarting VMs (these should be controlled restarts of the Kafka service, not of the containers). If you want uninterrupted consumption even during maintenance, you must consider the following:
- Kafka brokers must be taken down for maintenance in a rolling-restart fashion; click here for details
- The above is best done one broker at a time, or as allowed by the ISR settings in the cluster configs
- The number of partitions and the replication factor must be greater than 1, so that a broker being down for maintenance does not leave the topic with offline partitions, which would result in producer/consumer failures and, in turn, data loss

A personal suggestion: take the controller down last in the rolling restart, to avoid multiple controller switches and repeated reloading of the cluster metadata.
When doing a rolling restart, each broker takes some time to come back up, i.e. the time needed to repopulate the partition metadata and for the under-replicated partition count to return to 0. This is very important so as not to pressure the controller with overlapping restarts, as many under-replicated partitions may cause offline/unavailable topic partitions depending on your config.
On top of the above, you can definitely tweak the following consumer configs (see the consumer sketch after this answer):
- heartbeat.interval.ms (must be lower than session.timeout.ms)
- session.timeout.ms
- max.poll.interval.ms

These can be tuned based on your connection latencies and the state of the Kafka cluster; you can read more about them in the Confluent docs.
It is also possible that, while cluster maintenance is taking place, the broker acting as partition leader takes significantly longer to respond than session.timeout.ms / max.poll.interval.ms allow, in which case the consumer stops retrying. So tweaking the consumer configs and keeping cluster operations sane is the key to a healthy, continuous Kafka integration.
Note: as a personal opinion, having done cluster upgrades/maintenance with over 1 Gbps of throughput, we do not face consumption issues (except for a spike in request-handler/network-handler latencies because of the rebalance). Keeping the above caveats in mind and executing carefully, updates are manageable but definitely time consuming, as they have to be executed serially.
More help can be found in the documentation on cluster maintenance and consumer behavior tweaks.
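As a rough illustration of the consumer settings mentioned above, the sketch below builds a KafkaConsumer with explicit heartbeat, session-timeout and poll-interval values. The concrete numbers, topic, group id and broker addresses are assumptions chosen only for the example, not recommendations from the answer.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TunedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092"); // assumed addresses
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "payments-consumer");                  // assumed group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // Example values only: the heartbeat must stay well below the session timeout,
        // and the session timeout should tolerate a broker bouncing during a rolling restart.
        props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "3000");
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "45000");
        // Upper bound on the time between poll() calls before the consumer is removed from the group.
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("payments")); // assumed topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
                consumer.commitSync(); // same commit pattern as in the question
            }
        }
    }
}
```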
QUESTION
I am trying out Brooklin for mirroring data between Kafka clusters, following the wiki https://github.com/linkedin/brooklin/wiki/mirroring-kafka-clusters
Unlike the wiki, I am trying to set up mirroring between 2 different clusters. I am able to start the Brooklin process and the Datastream, but I cannot manage to mirror messages. Brooklin is currently running on the source Kafka cluster, and I am trying to mirror the topic 'test'.
The server.properties for Brooklin is:
...ANSWER
Answered 2020-Oct-14 at 11:49
This was an issue on my end - I had a typo in the topic name configured for mirroring.
QUESTION
I am trying to deploy Zookeeper and Kafka on Kubernetes using the confluentinc Docker images. I based my solution on this question and this post. Zookeeper is running without errors in the log. I want to deploy 3 Kafka brokers using a StatefulSet. The problem with my yaml files is that I don't know how to configure the KAFKA_ADVERTISED_LISTENERS property for Kafka when using 3 brokers.
Here is the yaml file for Zookeeper:
ANSWER
Answered 2020-Oct-01 at 10:48
I have been reading this blog post "Kafka Listeners - Explained" and I was able to configure 3 Kafka brokers with the following configuration.
QUESTION
In one of our Spring Boot based services, we intend to connect to 2 different Kafka clusters simultaneously. Each cluster has its own set of bootstrap servers, topic configurations, etc. They are in no way related to each other, as was the case in this question.
I will have different types of messages to read from each cluster, on different topic names. There may or may not be multiple producers connecting to both clusters from this service, but we will surely have at least one consumer per cluster.
I would like to know how I can define properties in application.yml to cater to this setup, so that I can use 2 different KafkaProperties objects to create 4 factories (2 consumer, 2 producer). The rest, I believe, should be pretty straightforward, as I would only need to use the relevant factory to create a particular container/listener/KafkaTemplate as per the business requirements.
ANSWER
Answered 2020-Jul-27 at 14:24
You cannot; you need to disable Boot's auto configuration and configure the infrastructure beans for each cluster yourself.
Boot's auto configuration only supports one cluster.
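As a rough sketch of what that manual wiring can look like, the configuration class below defines a listener container factory per cluster and a KafkaTemplate for one of them. The bean names, group ids and bootstrap addresses are illustrative assumptions, not something prescribed by Boot or by the original answer, and in practice the addresses would be bound from application.yml rather than hard-coded.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;

@Configuration
public class DualClusterKafkaConfig {

    // Assumed addresses; in a real service these come from application.yml.
    private static final String CLUSTER_A = "cluster-a-broker:9092";
    private static final String CLUSTER_B = "cluster-b-broker:9092";

    private Map<String, Object> consumerProps(String bootstrap, String groupId) {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        return props;
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> clusterAListenerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(
                new DefaultKafkaConsumerFactory<>(consumerProps(CLUSTER_A, "service-cluster-a")));
        return factory;
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> clusterBListenerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(
                new DefaultKafkaConsumerFactory<>(consumerProps(CLUSTER_B, "service-cluster-b")));
        return factory;
    }

    @Bean
    public KafkaTemplate<String, String> clusterATemplate() {
        Map<String, Object> props = new HashMap<>();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, CLUSTER_A);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // The template for cluster B would be wired the same way with CLUSTER_B.
        return new KafkaTemplate<>(new DefaultKafkaProducerFactory<>(props));
    }
}
```

Each @KafkaListener can then pick its cluster through the containerFactory attribute (e.g. containerFactory = "clusterBListenerFactory"), and, per the answer, Boot's KafkaAutoConfiguration is disabled or simply not relied upon so these beans do not clash with the auto-configured ones.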
QUESTION
I have a Kafka cluster set up using three GCP VM instances. All three VMs have Zookeeper and Kafka brokers running on them.
I followed this guide for my setup: How to Setup Multi-Node Multi Broker Kafka Cluster in AWS
I have kept all three instances in different regions in order to achieve high availability in case of a regional failure of the cloud service (GCP or AWS; I understand it is highly unlikely).
I have a topic created with a replication factor of 3. Now, suppose one region goes entirely down and only two nodes are alive. What happens to a consumer that was reading from the VM in the failed region (and which was working previously)?
Once the services in that region come back up, would this consumer (having a unique client_id) maintain its state and read only new messages?
Also, what if I have 7 VMs divided (2, 2, 3) across 3 regions:
- Is it a good idea to keep the replication factor at 7 to achieve high availability?
- I am thinking of 7 VMs because if any one region goes down, we still have the majority of Kafka nodes running. Is it possible to run a Kafka cluster with the majority of nodes down? (e.g. 4 out of 7 nodes down)
ANSWER
Answered 2020-May-26 at 03:31
Kafka provides various settings to achieve high availability, which can be tuned and optimized based on your requirements.
1. min.insync.replicas: the minimum number of in-sync copies that must be alive for the cluster to keep accepting writes. E.g. suppose we have 3 broker nodes and one broker goes down: with min.insync.replicas = 2 or less, the cluster keeps serving requests; with min.insync.replicas = 3, it stops. Note that min.insync.replicas=1 is not advisable, because if that single copy is lost, the data is lost forever. min.insync.replicas is a balance between higher consistency (requiring writes to more than one broker) and higher availability (allowing writes when fewer brokers are available).
2. acks (acknowledgements): while publishing a message we can set how many replicas must commit it before the producer receives an acknowledgement. acks=0 means the message is acknowledged immediately, without waiting for any commit to the partition; acks=1 means success is acknowledged once the message is committed to the leader; acks=all means the message is acknowledged only after all in-sync replicas have committed it.
3. unclean.leader.election.enable: you can set unclean.leader.election.enable=true on your brokers, and in this case, if no replicas are in sync, one of the out-of-sync replicas will be elected leader. This can lead to data loss but favors availability. Of course, if some replicas are in sync, it will still elect one of them.
4. offsets.topic.replication.factor: should be greater than 1 so that the internal __consumer_offsets topic is also highly available. __consumer_offsets stores the committed offsets for each topic and consumer group, so if its replication factor is 1, consumers may fail when one broker goes down.
Consumers always read from the leader of a partition; they do not consume from followers, which exist only for redundancy and fail-over. A failed broker may host the leaders of multiple partitions; in that case, for each such partition, a follower on another broker is promoted to leader. If a promoted follower was not 100% in sync with the old leader, we might lose some data, which is where the acks setting you use while publishing messages comes into play.
There is a trade-off in choosing the number of partitions and replicas; the best numbers depend on your design and requirements. Note that a large number of replicas adds overhead, since the leader has to keep all follower partitions in sync (ISR) and more memory is occupied, whereas a very small number like 1 or 2 gives better performance but not high availability.
Broker controller: the controller is one of the brokers and has additional partition and replica management responsibilities; it stores some cluster metadata and reacts whenever partition metadata changes.
Any of the brokers can play the role of the controller, but in a healthy cluster there is exactly one controller. Another broker is elected controller if the current controller shuts down or loses its Zookeeper connection. If the number of controllers is 0 or more than 1, the cluster is not healthy.
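As a rough illustration of how these settings fit together, the sketch below creates a 3-way replicated topic with min.insync.replicas=2 through the AdminClient and then publishes with acks=all. The topic name, broker addresses and counts are assumptions for the example only, not values recommended by the answer.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class HighAvailabilitySetup {
    public static void main(String[] args) throws Exception {
        String bootstrap = "broker1:9092,broker2:9092,broker3:9092"; // assumed addresses

        Properties adminProps = new Properties();
        adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
        try (AdminClient admin = AdminClient.create(adminProps)) {
            // Replication factor 3 with min.insync.replicas=2 tolerates one broker
            // being down while still accepting acks=all writes.
            NewTopic topic = new NewTopic("events", 3, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(List.of(topic)).all().get();
        }

        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks=all: the write is acknowledged only after all in-sync replicas have it.
        producerProps.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("events", "key", "value")).get();
        }
    }
}
```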
QUESTION
I have a production Kafka cluster already deployed and running, with a topic "existing-topic". I am using the MongoDB source connector from Debezium.
All I want is to push the CDC events directly to the topic "existing-topic", so that my consumers which are already listening to that topic will process them.
I didn't find any resource on how to do this; however, it is mentioned that topics are created in the format below:
"If your mongodb.name parameter is A, database name is B and collection name is C, the data from database B and collection C will be loaded under the topic A.B.C"
Can I change the topic to "existing-topic" and push the events to it?
...
ANSWER
Answered 2020-May-17 at 15:20
According to the documentation,
The name of the Kafka topics always takes the form logicalName.databaseName.collectionName, where logicalName is the logical name of the connector as specified with the mongodb.name configuration property, databaseName is the name of the database where the operation occurred, and collectionName is the name of the MongoDB collection in which the affected document existed.
This means that if your connector's logical name is myConnector and your database myDatabase has two collections users and orders, the change events will be written to the topics myConnector.myDatabase.users and myConnector.myDatabase.orders.
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install kafka-cluster
You can install Docker from here.
With Docker and docker-compose installed, the simplest way to get the cluster up and running is to run the bootstrap command. The bootstrap command will launch a 2-node Kafka cluster with a single Zookeeper node and create a Kafka topic test with 2 partitions and a replication factor of 2. You can run the command by entering the following in your shell:
You should see a success message if the bootstrap command ran successfully. To check whether the Docker containers are running, run the following:
There should be three containers running: two named kafkacluster_kafka_<id> and one named kafkacluster_zookeeper_<id>.
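If you prefer to verify the bootstrapped test topic from code rather than the CLI, a minimal Java producer along the lines below could be used. It assumes one of the brokers is reachable on localhost:9092, which depends on the port mappings in the compose file.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class BootstrapSmokeTest {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed broker address; check the docker-compose port mappings for your setup.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The bootstrap command creates the topic "test" with 2 partitions.
            RecordMetadata meta = producer.send(new ProducerRecord<>("test", "hello")).get();
            System.out.printf("Wrote to partition %d at offset %d%n", meta.partition(), meta.offset());
        }
    }
}
```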