azure-event-hubs-spark | Enabling Continuous Data Processing with Apache Spark
kandi X-RAY | azure-event-hubs-spark Summary
kandi X-RAY | azure-event-hubs-spark Summary
This is the source code of the Azure Event Hubs Connector for Apache Spark. Azure Event Hubs is a highly scalable publish-subscribe service that can ingest millions of events per second and stream them into multiple applications. Spark Streaming and Structured Streaming are scalable and fault-tolerant stream processing engines that allow users to process huge amounts of data using complex algorithms expressed with high-level functions like map, reduce, join, and window. This data can then be pushed to filesystems, databases, or even back to Event Hubs. By making Event Hubs and Spark easier to use together, we hope this connector makes building scalable, fault-tolerant applications easier for our users.
Support
Quality
Security
License
Reuse
Top functions reviewed by kandi - BETA
Currently covering the most popular Java, JavaScript and Python libraries. See a Sample of azure-event-hubs-spark
azure-event-hubs-spark Key Features
azure-event-hubs-spark Examples and Code Snippets
Community Discussions
Trending Discussions on azure-event-hubs-spark
QUESTION
Using the steps documented in structured streaming pyspark, I'm unable to create a dataframe in pyspark from the Azure Event Hub I have set up in order to read the stream data.
Error message is: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.eventhubs.EventHubsSourceProvider could not be instantiated
I have installed the Maven libraries (com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.12 is unavailable) but none appear to work: com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.15 com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.6
As well as ehConf['eventhubs.connectionString'] = sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connectionString)
but the error message returned is:
java.lang.NoSuchMethodError: org.apache.spark.internal.Logging.$init$(Lorg/apache/spark/internal/Logging;)V
The connection string is correct as it is also used in a console application that writes to the Azure Event Hub and that works.
Can someone point me in the right direction, please. Code in use is as follows:
...ANSWER
Answered 2020-May-19 at 18:54To resolve the issue, I did the following:
- Uninstall azure event hub library versions
- Install com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.15 library version from Maven Central
- Restart cluster
- Validate by re-running code provided in the question
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install azure-event-hubs-spark
Java 1.8 SDK installed
Maven 3.x installed (or SBT version 1.x)
Support
Reuse Trending Solutions
Find, review, and download reusable Libraries, Code Snippets, Cloud APIs from over 650 million Knowledge Items
Find more librariesStay Updated
Subscribe to our newsletter for trending solutions and developer bootcamps
Share this Page