kafka-connect-hdfs | Kafka Connect HDFS connector

 by confluentinc | Java | Version: v10.2.2 | License: Non-SPDX

kandi X-RAY | kafka-connect-hdfs Summary

kafka-connect-hdfs is a Java library typically used in Big Data, Kafka, Spark, and Hadoop applications. kafka-connect-hdfs has no bugs, no reported vulnerabilities, and a build file available, but it has low support. However, kafka-connect-hdfs has a Non-SPDX license. You can download it from GitHub.

kafka-connect-hdfs is a Kafka Connector for copying data between Kafka and Hadoop HDFS.

            kandi-support Support

              kafka-connect-hdfs has a low-activity ecosystem.
              It has 452 stars, 394 forks, and 320 watchers.
              It has had no major release in the last 6 months.
              There are 120 open issues and 192 closed issues. On average, issues are closed in 484 days. There are 25 open pull requests and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of kafka-connect-hdfs is v10.2.2.

            kandi-Quality Quality

              kafka-connect-hdfs has 0 bugs and 0 code smells.

            kandi-Security Security

              kafka-connect-hdfs has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              kafka-connect-hdfs code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            kandi-License License

              kafka-connect-hdfs has a Non-SPDX License.
              A Non-SPDX license can be an open-source license that is not SPDX-compliant, or a non-open-source license; review it closely before use.

            kandi-Reuse Reuse

              kafka-connect-hdfs releases are not available. You will need to build from source code and install.
              Build file is available. You can build the component from source.
              kafka-connect-hdfs saves you 5653 person hours of effort in developing the same functionality from scratch.
              It has 12514 lines of code, 699 functions and 119 files.
              It has medium code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed kafka-connect-hdfs and discovered the below as its top functions. This is intended to give you an instant insight into kafka-connect-hdfs implemented functionality, and help decide if they suit your requirements.
            • Writes data to sink
            • Performs the recovery process
            • Commit file
            • Reads the offset of the file from HDFS
            • Start the sink
            • Find fileStatus with max offset
            • Synchronize all the topics in the hive table
            • Closes the writer
            • Initialize Hive services
            • Return Avro schema for a given path
            • Append WAL file
            • Returns a new Partitioner instance based on the configuration
            • This method returns a list of configs configured for this task
            • Determines whether this path belongs to the committed file
            • Get Avro schema from disk
            • Creates a list of task configurations
            • Create a RecordWriter
            • Returns the latest offset and file path from the WAL file
            • Return the schema for the given path
            • Initializes the connection
            • Create a record writer
            • Configure kerberos authentication
            • Create a ParquetWriter that wraps the Avro schema
            • Apply the WAL file to the storage
            • Polls a source record
            • Create a record writer

            kafka-connect-hdfs Key Features

            No Key Features are available at this moment for kafka-connect-hdfs.

            kafka-connect-hdfs Examples and Code Snippets

            No Code Snippets are available at this moment for kafka-connect-hdfs.

            Community Discussions

            QUESTION

            What happens when I deploy new kafka-connect cluster while file opened? (kafka-connect-hdfs)
            Asked 2021-Dec-22 at 20:41

             I'm using an HDFS Kafka Connect cluster in distributed mode.

             I set rotate.interval.ms to 1 hour and offset.flush.interval.ms to 1 minute.

             In my case, I expected the file to be committed when a new record arrived whose timestamp was an hour past the timestamp of the first record, and the offsets to be flushed every minute.

             However, I wondered what would happen if I restarted the cluster while the file was still open. I mean, what would happen in the case below?

             1. A file was opened, starting with a record with a '15:37' timestamp (offset 10).
             2. After 10 minutes, the kafka-connect cluster restarted.
             3. (I assume the file from step 1 will be discarded from memory and not committed to HDFS.)
             4. When the new worker starts, will the newly opened file start tracking records from offset 10?

            Does kafka-connect/kafka-connect-hdfs keep us from losing our uncommitted records?

             Based on the official documentation, I thought __consumer_offsets would help me in this case, but I'm not sure.

            Any documents or comments will be very helpful!

            ...

            ANSWER

            Answered 2021-Dec-22 at 20:41

            The consumer offsets topic is used for sink connectors, yes, and, if possible, the consumer will reset to the last non-committed offsets.

             I think the behavior might have changed some time ago, but the HDFS Connector used to use a write-ahead log (WAL) to temporarily preserve the data it was writing in a temporary HDFS location before the final file was created.
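             For reference, a minimal sketch of where the two settings from the question live; the values mirror the question (1 hour and 1 minute), and the file names are only illustrative:

                 # Worker config (e.g. connect-distributed.properties):
                 # how often the Connect framework flushes/commits offsets
                 offset.flush.interval.ms=60000

                 # Connector config (e.g. the JSON/properties for the HDFS sink):
                 # rotate.interval.ms closes and commits a file once this much time
                 # has passed, as the question describes, relative to the file's first record
                 rotate.interval.ms=3600000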

            Source https://stackoverflow.com/questions/70423504

            QUESTION

            Kafka Stream for Kafka to HDFS
            Asked 2021-Jun-03 at 01:27

             I have a Flink job that reads data from Kafka topics and writes it to HDFS. There are some problems with checkpoints; for example, after stopping the Flink job, some files stay in pending mode, and the checkpoints written to HDFS have problems too. I want to try Kafka Streams for the same Kafka-to-HDFS pipeline. I found this problem - https://github.com/confluentinc/kafka-connect-hdfs/issues/365 - could you please tell me how to resolve it? Could you also tell me where Kafka Streams keeps files for recovery?

            ...

            ANSWER

            Answered 2021-Jun-03 at 01:27

            Kafka Streams only interacts between topics of the same cluster, not with external systems.

             The Kafka Connect HDFS2 connector maintains offsets in an internal offsets topic. Older versions of it maintained offsets in the file names and used a write-ahead log to ensure file delivery.
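             For comparison with the Flink setup, a minimal sketch of an HDFS2 sink connector configuration for the same Kafka-to-HDFS pipeline; the topic name and namenode address are placeholders:

                 name=hdfs-sink
                 connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
                 tasks.max=1
                 # placeholder topic name
                 topics=my-topic
                 # placeholder HDFS namenode address
                 hdfs.url=hdfs://namenode:8020
                 # commit a file to HDFS every 1000 records
                 flush.size=1000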

            Source https://stackoverflow.com/questions/67807661

            QUESTION

            Could not transfer artifact io.confluent:kafka-connect-storage-common-parent:pom:6.0.0-SNAPSHOT from/to confluent (${confluent.maven.repo})
            Asked 2020-Aug-28 at 18:34

            I am trying Kafka connect for the first time and I want to connect SAP S/4 HANA to Hive. I have created the SAP S/4 source Kafka connector using this:

            https://github.com/SAP/kafka-connect-sap

             But I am not able to create an HDFS sink connector. The issue is related to the POM file.

             I have tried mvn clean package, but I got this error:

            ...

            ANSWER

            Answered 2020-Aug-28 at 18:34

             I suggest you download the existing Confluent Platform, which already includes HDFS Connect.

             Otherwise, check out a release version rather than only the master branch to build the project.
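             A sketch of that second option, using the v10.2.2 tag listed on this page as an example of a release version:

                 # clone the repo, switch to a released tag instead of master, then build
                 git clone https://github.com/confluentinc/kafka-connect-hdfs.git
                 cd kafka-connect-hdfs
                 git checkout v10.2.2
                 mvn clean package -DskipTests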

            Source https://stackoverflow.com/questions/63602134

            QUESTION

            Kafka to hdfs3 sink Missing required configuration "confluent.topic.bootstrap.servers" which has no default value
            Asked 2020-Jun-23 at 08:56
            Status

             My HDFS was installed via Ambari (HDP). I'm currently trying to load Kafka topics into an HDFS sink. Kafka and HDFS are installed on the same machine, x.x.x.x. I didn't change much from the default settings, except some ports according to my needs.

             Here is how I execute Kafka:

            ...

            ANSWER

            Answered 2020-Jun-23 at 08:23

            Here's the error:

            Missing required configuration "confluent.topic.bootstrap.servers" which has no default value.

             The problem is that you've taken the config for the HDFS Sink connector and changed the connector to a different one (HDFS 3 Sink), which has different configuration requirements.

            You can follow the quickstart for the HDFS 3 sink connector, or fix your existing configuration by adding
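             A sketch of that addition, using the property named in the error; the broker address is a placeholder (the question's machine is x.x.x.x), and on a single-broker setup the licensed-connector topic's replication factor usually needs lowering as well:

                 # properties added to the HDFS 3 sink connector config
                 confluent.topic.bootstrap.servers=x.x.x.x:9092
                 # only needed when the cluster has fewer than 3 brokers
                 confluent.topic.replication.factor=1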

            Source https://stackoverflow.com/questions/62526864

            QUESTION

             How can I change the Debezium default topic naming convention to make it fit the Confluent Hive table auto-generation strategy?
            Asked 2020-May-08 at 04:02

             I am building a data synchronizer, which captures data changes from a MySQL source and exports the data to Hive.

             I chose Kafka Connect to implement this. I use Debezium as the source connector and Confluent HDFS as the sink connector.

             But the problem is that Debezium's naming convention for Kafka topics is:

            serverName.databaseName.tableName

             In the Confluent HDFS sink properties, I have to configure the topics the same as Debezium generated:

            "topics": "serverName.databaseName.tableName"

             The Confluent HDFS sink connector will generate a path in HDFS like:

            /topics/serverName.databaseName.tableName/partition=0

             which will definitely cause problems in HDFS/Hive, since the path contains the . character. In fact, the external table auto-generated by the Confluent HDFS sink connector failed due to this path problem.

            ...

            ANSWER

            Answered 2020-May-08 at 04:02

             The HDFS connector will replace dots (and dashes) with underscores when creating Hive tables.

            HDFS itself doesn't care about dots in paths. The problem is that you cannot have a dot after the port, and you have /null in there somehow.

            hdfs://localhost:9000./null

             Is there any way that I can change the Debezium default naming convention for topics?

             The solution has nothing to do with Debezium. You can use RegexRouter, which is part of the base Apache Kafka Connect library, in a transforms config for your source or sink connector, depending on how early you want to "fix" the problem; see the sketch below.

             You could also write your own transform and put it in Connect's plugin.path.
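             A sketch of the RegexRouter approach, added to either the Debezium source or the HDFS sink connector config; the transform name is arbitrary, and the regex is just one illustrative way to capture the three dot-separated segments and rejoin them with underscores:

                 transforms=renameTopic
                 transforms.renameTopic.type=org.apache.kafka.connect.transforms.RegexRouter
                 # capture serverName, databaseName, tableName between the dots
                 transforms.renameTopic.regex=([^.]+)[.]([^.]+)[.]([^.]+)
                 # rejoin them with underscores
                 transforms.renameTopic.replacement=$1_$2_$3

             With this in place the sink sees serverName_databaseName_tableName instead of serverName.databaseName.tableName.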

            Source https://stackoverflow.com/questions/61663896

             Community Discussions and Code Snippets contain sources from the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install kafka-connect-hdfs

            You can download it from GitHub.
            You can use kafka-connect-hdfs like any standard Java library. Please include the jar files in your classpath. You can also use any IDE to run and debug the kafka-connect-hdfs component as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, please refer to maven.apache.org. For Gradle installation, please refer to gradle.org.
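            If you do consume it as a Maven dependency rather than building from source, a hedged sketch of the coordinates; the version should match the release you actually need, and Confluent artifacts are served from Confluent's own Maven repository rather than Maven Central:

                <!-- pom.xml fragment -->
                <repositories>
                  <repository>
                    <id>confluent</id>
                    <url>https://packages.confluent.io/maven/</url>
                  </repository>
                </repositories>

                <dependency>
                  <groupId>io.confluent</groupId>
                  <artifactId>kafka-connect-hdfs</artifactId>
                  <version>10.2.2</version>
                </dependency>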

            Support

            Source Code: https://github.com/confluentinc/kafka-connect-hdfs
            Issue Tracker: https://github.com/confluentinc/kafka-connect-hdfs/issues
            Learn how to work with the connector's source code by reading our Development and Contribution guidelines.

            CLONE
          • HTTPS

            https://github.com/confluentinc/kafka-connect-hdfs.git

          • CLI

            gh repo clone confluentinc/kafka-connect-hdfs

          • SSH

            git@github.com:confluentinc/kafka-connect-hdfs.git
