ABRiS | Avro SerDe for Apache Spark structured APIs
kandi X-RAY | ABRiS Summary
Pain-free Spark/Avro integration. Seamlessly integrate with the Confluent platform, including Schema Registry with all available naming strategies and schema evolution. Seamlessly convert your Avro records from anywhere (e.g. Kafka, Parquet, HDFS) into Spark Rows. Convert your DataFrames into Avro records without even specifying a schema. Go back and forth between Spark and Avro (since Spark 2.4).
Community Discussions
Trending Discussions on ABRiS
QUESTION
I have a Java program that works well when I run it locally within IntelliJ IDEA.
After compiling the program into a jar file, it also works well when I run it as java -cp jarFileName.jar com.pathToclass.ClassName inputArguments.
However, when I run it as
spark-submit --master local[4] --class com.pathToclass.ClassName jarFileName.jar inputArguments
I get the following error when the Java code reaches the read.textFile function.
The code is as follows:
...
ANSWER
Answered 2021-May-04 at 14:46
I have found a solution.
The "Multiple sources found for ..." error indicates that more than one package for reading text/csv files was found when the job was submitted with spark-submit.
So it is likely that multiple versions of the library used for reading text/csv files were found.
I assume the cause is as follows:
I compiled my Java code with Gradle on my Windows PC against a particular Hadoop/Spark version. I ran spark-submit --someConfiguration myjar.jar --some parameters locally on my Windows PC and on different Linux servers. The version specified in the build.gradle file may not be the same as the one installed on my Windows PC. Luckily, it matched the version on one of the Linux servers, but it differed from the version on another. That is why the spark-submit job succeeded on only one of the Linux servers and failed on the other one and on the Windows PC.
After realizing that the problem could be a version conflict, I re-installed the most recent versions on my PC and the Linux servers, and spark-submit now works well, without the "Multiple sources found for ..." error.
The versions I am currently using are as follows:
Hadoop: hadoop-3.2.2
Spark: spark-3.1.1-bin-hadoop3.2
java: openjdk version "1.8.0_282" (Java 8)
Flume: apache-flume-1.9.0-bin
Kafka: kafka_2.13-2.7.0
Scala: scala-2.12.13.deb
sbt: sbt-1.5.0.tgz
I am not sure whether my answer is indeed the correct one, as I am relatively new to Hadoop/Spark/Java. If someone knows the reason in detail, please post your answer.
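One way to check the version-conflict hypothesis above is to look at which jars on the classpath register Spark data sources. The Python sketch below is a diagnostic aid, not part of the original answer. It assumes the conflict comes from several jars shipping the Java service file META-INF/services/org.apache.spark.sql.sources.DataSourceRegister with the same class names (for example, a fat application jar that bundles spark-sql next to the cluster's own copy). The helper names registered_sources and find_duplicates are hypothetical, and the check only detects identical class names, not different classes that share a short name.

```python
import sys
import zipfile
from collections import defaultdict

# Service file through which Spark discovers data source implementations.
SERVICE_FILE = "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister"


def registered_sources(jar_path):
    """Return the data source class names a jar registers, or [] if none."""
    with zipfile.ZipFile(jar_path) as jar:
        if SERVICE_FILE not in jar.namelist():
            return []
        text = jar.read(SERVICE_FILE).decode("utf-8")
    lines = (line.strip() for line in text.splitlines())
    # Skip blank lines and comments, as the service-file format allows both.
    return [line for line in lines if line and not line.startswith("#")]


def find_duplicates(jar_paths):
    """Map each class name registered by more than one jar to those jars."""
    seen = defaultdict(list)
    for path in jar_paths:
        for class_name in registered_sources(path):
            seen[class_name].append(str(path))
    return {cls: jars for cls, jars in seen.items() if len(jars) > 1}


if __name__ == "__main__":
    # Hypothetical usage: pass the application jar and the Spark jars as arguments.
    for cls, jars in find_duplicates(sys.argv[1:]).items():
        print(f"{cls} is registered by: {', '.join(jars)}")
```

If the script reports duplicates, a common remedy is to avoid bundling the Spark libraries into the application jar so that only the versions installed with Spark are used at run time.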
QUESTION
I have been trying to read Kafka's Avro-serialized messages from Spark Structured Streaming (2.4.4) with Scala 2.11. For this purpose I have used spark-avro (dependency below). I generate the Kafka messages from Python using the confluent-kafka library. Spark streaming is able to consume the messages with the schema, but it doesn't read the values of the fields correctly. I have prepared a simple example to show the problem; the code is available here: https://github.com/anigmo97/SimpleExamples/tree/master/Spark_streaming_kafka_avro_scala
I create the records in Python; the schema of the records is:
...
ANSWER
Answered 2020-Mar-05 at 11:22
The problem was that I was producing the messages with the confluent_kafka library in Python and reading them in Spark Structured Streaming with the spark-avro library.
The confluent_kafka library uses Confluent's Avro format, whereas spark-avro reads the standard Avro format.
The difference is that, in order to use Schema Registry, Confluent Avro prepends a short header to each message: a magic byte followed by a four-byte schema ID that indicates which schema should be used.
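The framing difference described above can be illustrated with a short Python sketch. It assumes the Confluent wire format, in which each Kafka value starts with a zero magic byte followed by a four-byte big-endian schema ID, and the plain Avro body comes after that. The function name split_confluent_message and the sample payload are made up for illustration; this is not how ABRiS or spark-avro is implemented.

```python
import struct

MAGIC_BYTE = 0   # marker at the start of a Confluent-framed message
HEADER_SIZE = 5  # 1 magic byte + 4-byte schema ID


def split_confluent_message(payload: bytes):
    """Return (schema_id, avro_body) from a Confluent-framed Kafka value."""
    if len(payload) < HEADER_SIZE:
        raise ValueError("payload too short for Confluent framing")
    # ">bI" = big-endian signed byte followed by an unsigned 32-bit integer.
    magic, schema_id = struct.unpack(">bI", payload[:HEADER_SIZE])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, payload[HEADER_SIZE:]


# Made-up example: schema ID 42 followed by some Avro-encoded bytes.
framed = b"\x00" + (42).to_bytes(4, "big") + b"\x02\x06abc"
schema_id, body = split_confluent_message(framed)
print(schema_id, body)
```

A reader that expects standard Avro treats the header bytes as part of the record, which is consistent with the wrong field values described in the question.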
To read Confluent Avro from Spark Structured Streaming, I replaced the spark-avro library with ABRiS, which integrates both Avro and Confluent Avro with Spark: https://github.com/AbsaOSS/ABRiS
My dependencies changed like this:
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported