kafka-mirror-maker | Krux Mirror Maker is a custom solution | Pub Sub library
kandi X-RAY | kafka-mirror-maker Summary
The Krux Mirror Maker is a custom solution for easily providing message replication between Kafka broker clusters. It uses the same configuration files as the Mirror Maker tool provided directly by Kafka, but provides two additional capabilities: Krux Standard Library configuration, stats, and HTTP status endpoints; and the ability to send messages over a pre-configured set of SSH tunnels to a remote Kafka cluster.
Top functions reviewed by kandi - BETA
- Handles a message
- Sends messages to the producer
- Command line parser
Community Discussions
Trending Discussions on kafka-mirror-maker
QUESTION
I want to replicate data from Kafka cluster A to Kafka cluster B using kafka-mirror-maker, and I need some special serialization logic for it. As I understand it, to create a custom serializer I have to implement the Serializer<> interface.
So, the problem is that kafka-mirror-maker does not use the custom serializer which I specify in producer.properties:
ANSWER
Answered 2019-Jul-19 at 04:34

You still have to deserialize the bytes, so I'm not sure I understand the purpose of overriding only the serializer.

If you want to manipulate the message, look at the MessageHandler interface and the --message.handler argument. In there, you would need to wrap both a deserializer and a serializer.

Example here of renaming the topic: https://github.com/gwenshap/kafka-examples/tree/master/MirrorMakerHandler
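For illustration, here is a minimal sketch of such a handler, modeled on the linked example. The interface details vary across Kafka versions; this assumes the MirrorMaker.MirrorMakerMessageHandler contract and BaseConsumerRecord type used by pre-2.4 MirrorMaker, and transform() is a hypothetical placeholder for your own deserialize/modify/re-serialize logic:

    import java.util.Collections;
    import java.util.List;

    import kafka.consumer.BaseConsumerRecord;
    import kafka.tools.MirrorMaker;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Rewrites each mirrored record's value before MirrorMaker's byte[]-based producer sends it.
    public class ReserializingHandler implements MirrorMaker.MirrorMakerMessageHandler {

        @Override
        public List<ProducerRecord<byte[], byte[]>> handle(BaseConsumerRecord record) {
            byte[] newValue = transform(record.value());
            return Collections.singletonList(
                    new ProducerRecord<>(record.topic(), record.key(), newValue));
        }

        // Hypothetical placeholder: deserialize, modify, and re-serialize here as needed.
        private byte[] transform(byte[] value) {
            return value;
        }
    }

The compiled class then goes on MirrorMaker's classpath and is enabled with --message.handler ReserializingHandler (fully qualified if it lives in a package).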
QUESTION
There are best practices out there which recommend running the Mirror Maker on the target cluster. https://community.hortonworks.com/articles/79891/kafka-mirror-maker-best-practices.html

I wonder why this recommendation exists, because ultimately all data must cross the border between the clusters, regardless of whether it is consumed at the target or produced at the source. One reason I can imagine is that the Mirror Maker supports multiple consumers but only one producer, so consuming data over the higher-latency link might be sped up by using multiple consumers.

If multi-threaded performance is the point, would it be useful to use several producers (one per consumer) to replicate the data (with a custom replication process)? Does anyone know why the Mirror Maker shares a single producer among all consumers?

My use case is the replication of data from several source clusters (~10) to a single target cluster. I would prefer to run the replication process on the source clusters to avoid having too many replication processes (one per source) on the target cluster.

Hints and suggestions on this topic are very welcome.
...ANSWER
Answered 2019-Jun-07 at 08:14

I also put the question in the Apache Kafka mailing list:
https://lists.apache.org/thread.html/06a3c3ec10e4c44695ad0536240450919843824fab206ae3f390a7b8@%3Cusers.kafka.apache.org%3E
I would like to quote some reasonable answers here:
Franz, you can run MM on or near either the source or target cluster, but it's more efficient near the target because this minimizes producer latency. If latency is high, producers will block waiting on ACKs for in-flight records, which reduces throughput.
I recommend running MM near the target cluster but not necessarily on the same machines, because often Kafka nodes are relatively expensive, with SSD arrays and huge IO bandwidth etc, which isn't necessary for MM.
Ryanne
and
Hi, Franz!
I guess one of the reasons could be additional safety in case of a network split.
There is also some probability of bugs, even with good software. So, if we place MM on the source cluster and the network splits, consumers could (theoretically) continue to read messages from the source cluster and commit them even without acks from the destination cluster (one possible bug). This way you would end up with lost messages on the producer side after the network is fixed.
On the other hand, if we place MM on the destination cluster and the network splits, nothing bad happens. MM will be unable to fetch data from the source cluster, so your data won't be corrupted even in case of bugs.
Tolya
QUESTION
I have a Swarm network configured. When I try to run Kafka MirrorMaker from a service inside a Docker stack, I get this error in the service logs:
...ANSWER
Answered 2019-Jan-10 at 17:06

After research, the solution was quite simple: I had to add a host mapping with extra_hosts in my docker-compose.yml file.
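For illustration, a minimal sketch of what such a mapping looks like; the service name, image, hostnames, and addresses below are hypothetical placeholders:

    version: "3.7"
    services:
      mirror-maker:
        image: my-mirror-maker-image      # placeholder image name
        extra_hosts:
          # hostname-to-IP entries added to the container's /etc/hosts
          - "kafka-broker-1:10.0.0.11"
          - "kafka-broker-2:10.0.0.12"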
QUESTION
The error described below occurs in MapR 6.0, but I fail to see any MapR specificity, so I guess it could happen in other environments.
I'm mirroring a MapR Stream to Azure Kafka-enabled Event Hub, combining these two documents:
Mirroring Topics from a MapR Cluster to an Apache Kafka Cluster
Use Apache Kafka MirrorMaker with Azure Event Hubs for Kafka Ecosystem | Microsoft Docs
There seems to be some activity happening in Azure when I run the mirror making command, but in the end it fails with this exception list:
...ANSWER
Answered 2018-Aug-26 at 23:10

I had to follow the directions from the following link, then use the generated cert file in the configuration for the producer.
https://docs.confluent.io/current/kafka/authentication_ssl.html
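In producer.properties terms, the settings produced by following that guide typically look like the standard Kafka SSL client configuration below; all paths and passwords are placeholders:

    # producer.properties (paths and passwords are placeholders)
    security.protocol=SSL
    ssl.truststore.location=/var/private/ssl/client.truststore.jks
    ssl.truststore.password=changeit
    # Only needed if the brokers require client authentication:
    ssl.keystore.location=/var/private/ssl/client.keystore.jks
    ssl.keystore.password=changeit
    ssl.key.password=changeit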
QUESTION
Basically, MM is replicating MORE than I need it to.
I have four environments, DEV01, DEV02, TST01, and TST02, that each have two servers running the same app, which generates JSON files. Logstash reads those files and pushes messages into two three-node Kafka clusters, KAF01 & KAF02. The DEV01 & TST01 boxes push to the KAF01 cluster, with corresponding DEV01 & TST01 topics, and the DEV02 & TST02 boxes push to the KAF02 cluster, with corresponding DEV02 & TST02 topics. Logstash runs on each of the Kafka nodes to then push the messages into the corresponding Elasticsearch clusters. This all works as expected.

I then added in MM to replicate messages between environments, i.e. DEV01<->DEV02, TST01<->TST02. I started the MM process for the DEV environments and everything worked fine. Then, on the same hosts, I started a second MM process for the TST environments, and everything seemed fine until I realized that I was seeing messages from TST in the DEV Elasticsearch and vice versa.
I have MM running on the first host in each Kafka cluster, i.e. kaf01-01 & kaf02-01. For the KAF01 cluster, kaf01-01 is set up to mirror both the dev01 & tst01 topics to the KAF02 cluster:
kafka-mirror-maker.sh kafka.tools.MirrorMaker --consumer.config dev01_mm_source.properties --num.streams 1 --producer.config dev01_mm_target.properties --whitelist="dev01"
For --consumer.config, the dev01_mm_source.properties file is configured with the KAF01 cluster nodes. For --producer.config, the dev01_mm_target.properties file is configured with the KAF02 cluster nodes.
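For concreteness, a sketch of what that pair of files might contain; the hostnames and ports are placeholders:

    # dev01_mm_source.properties -- the cluster MirrorMaker consumes FROM (KAF01)
    bootstrap.servers=kaf01-01:9092,kaf01-02:9092,kaf01-03:9092
    group.id=dev01-mirror-maker

    # dev01_mm_target.properties -- the cluster MirrorMaker produces TO (KAF02)
    bootstrap.servers=kaf02-01:9092,kaf02-02:9092,kaf02-03:9092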
kafka-mirror-maker.sh kafka.tools.MirrorMaker --consumer.config tst01_mm_source.properties --num.streams 1 --producer.config tst01_mm_target.properties --whitelist="tst01"
For --consumer.config, the tst01_mm_source.properties file is configured with the KAF01 cluster nodes. For --producer.config, the tst01_mm_target.properties file is configured with the KAF02 cluster nodes.
For the KAF02 cluster, kaf02-01 is set up to mirror both the dev02 & tst02 topics to the KAF01 cluster:
kafka-mirror-maker.sh kafka.tools.MirrorMaker --consumer.config dev02_mm_source.properties --num.streams 1 --producer.config dev02_mm_target.properties --whitelist="dev02"
For --consumer.config, the dev02_mm_source.properties file is configured with the KAF02 cluster nodes. For --producer.config, the dev02_mm_target.properties file is configured with the KAF01 cluster nodes.
kafka-mirror-maker.sh kafka.tools.MirrorMaker --consumer.config tst02_mm_source.properties --num.streams 1 --producer.config tst02_mm_target.properties --whitelist="tst02"
For --consumer.config, the tst02_mm_source.properties file is configured with the KAF02 cluster nodes. For --producer.config, the tst02_mm_target.properties file is configured with the KAF01 cluster nodes.
Do I have things mixed up? Do I have the --consumer.config and --producer.config files backwards? Is the regex I'm using for the --whitelist option incorrect? I'm not really using a regex, just a quoted string. I've triple-checked that Logstash on all of the app boxes is configured to push to the correct Kafka topic, and that Logstash on the Kafka boxes is configured to pull from the correct Kafka topic and then push to the correct Elasticsearch cluster.
Just started working with Kafka and MM today so I'm totally new to all of this and any/all help is greatly appreciated.
...ANSWER
Answered 2018-Jul-28 at 01:30

I have figured this out. I was trying to have a single instance of Logstash output to two different ES clusters, which it apparently cannot do, so it was mushing them together. MirrorMaker is working as expected. I've changed where Logstash runs: it now runs on each of the Elasticsearch nodes and pulls from the Kafka topics, which separates things out more, and everything is now working as expected.
QUESTION
I use Kafka 0.10.2.0 and I'm trying to use the Kafka mirror maker tool.
consumer.properties:
...ANSWER
Answered 2017-Apr-18 at 00:43

The whitelist is a Java regex. Try ".*" including the quotes instead of just *. You can also use just the topic name for testing, to eliminate the regex as the source of the problem.
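Applied to the MirrorMaker invocation, the suggestion looks like this; the .properties file names are placeholders:

    kafka-mirror-maker.sh --consumer.config consumer.properties \
        --producer.config producer.properties \
        --num.streams 1 --whitelist ".*"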
QUESTION
I am trying to use the MirrorMaker tool to replicate data from one primary cluster to a backup one, but I got the following error.
...ANSWER
Answered 2017-Feb-03 at 14:31

It seems that you are facing a problem with the timestamps; the official documentation says:
Currently, Kafka Streams does not handle invalid (i.e., negative) timestamps returned from the TimestampExtractor gracefully, but fails with an exception, because negative timestamps cannot get handled in a meaningful way for any time based operators like window aggregates or joins.
Negative timestamps can occur for several reasons:
- You consume a topic that is written by old Kafka producer clients (i.e., version 0.9 or earlier) which don't use the new message format; thus the metadata timestamp field defaults to -1 if the topic is configured with log.message.timestamp.type=CreateTime
- You consume a pre-0.10 topic after upgrading your Kafka cluster from 0.9 to 0.10: here all the data that was generated with 0.9 producers is not compatible with the 0.10 message format (and defaults to timestamp -1)
- You consume a topic that is being written to by a third-party producer client that allows embedding negative timestamps (KafkaProducer does check for negative timestamps and raises an exception in this case, preventing invalid timestamps in the first place)
- The user provides a custom timestamp extractor that extracts a timestamp from the payload data (i.e., the key-value pair), and this custom extractor might return negative timestamps.
Here you can read all the information.
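As an aside, when the -1 timestamps come from old producers, one common mitigation (assuming the brokers run 0.10+) is to have the broker stamp messages on arrival instead of trusting the producer-supplied timestamp:

    # server.properties (broker-level; the per-topic equivalent is message.timestamp.type)
    log.message.timestamp.type=LogAppendTime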
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities: No vulnerabilities reported
Install kafka-mirror-maker
You can use kafka-mirror-maker like any standard Java library. Include the jar files in your classpath. You can also use any IDE to run and debug the kafka-mirror-maker component as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, please refer to maven.apache.org; for Gradle installation, please refer to gradle.org.
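With Maven, the dependency declaration would follow the usual pattern; the coordinates below are hypothetical placeholders, so check the project's own POM for the real groupId, artifactId, and version:

    <dependency>
        <groupId>com.krux</groupId>                 <!-- hypothetical -->
        <artifactId>kafka-mirror-maker</artifactId> <!-- hypothetical -->
        <version>1.0.0</version>                    <!-- hypothetical -->
    </dependency>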