docker-cloudera-quickstart | Docker Cloudera Quick Start Image | Continuous Deployment library
kandi X-RAY | docker-cloudera-quickstart Summary
Docker Cloudera Quick Start Image for Cloudera Hadoop 5 (CDH5). It lets you run the Cloudera Quick Start image without the overhead of a virtual machine: just use the docker-cloudera-quickstart image. Based on Ubuntu 14.04 (Trusty LTS). Works with Cloudera CDH 5. Updated for CDH 5.3.2.
Community Discussions
Trending Discussions on docker-cloudera-quickstart
QUESTION
I ran into some issues while trying to consume messages from Kafka with a Spark Streaming application in a Kerberized Hadoop cluster. I tried both of the approaches listed here:
- receiver-based approach: KafkaUtils.createStream
- direct approach (no receivers): KafkaUtils.createDirectStream
The receiver-based approach (KafkaUtils.createStream) throws two different exceptions, depending on whether I run in local mode (--master local[*]) or in YARN mode (--master yarn --deploy-mode client):
- an unexpected kafka.common.BrokerEndPointNotAvailableException in a Spark local application
- a ZooKeeper timeout in a Spark on YARN application. I once managed to make this work (connecting to ZooKeeper successfully), but no messages were received
In both modes (local or YARN), the direct approach (KafkaUtils.createDirectStream) returns an unexplained EOFException (see details below).
My final goal is to launch a Spark Streaming job on YARN, so I will leave the Spark local job aside.
Here is my test environment :
- Cloudera CDH 5.7.0
- Spark 1.6.0
- Kafka 0.10.1.0
I'm working on a single-node cluster (hostname = quickstart.cloudera) for testing purposes. For those interested in reproducing the tests, I'm working on a custom Docker container based on cloudera/quickstart (Git repo).
Below is the sample code I used in a spark-shell. Of course, this code works when Kerberos is not enabled: messages produced by kafka-console-producer are received by the Spark application.
ANSWER
Answered 2018-Jan-08 at 10:11
It is not supported with Spark 1.6, as stated in the Cloudera docs:
Spark Streaming cannot consume from secure Kafka till it starts using Kafka 0.9 Consumer API
Spark Streaming in 1.6 uses the old consumer API, which does not support consuming from a secured Kafka cluster.
You can use Spark 2.1, which supports secure Kafka: https://blog.cloudera.com/blog/2017/05/reading-data-securely-from-apache-kafka-to-apache-spark/
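As a rough illustration of what the answer points to, the sketch below builds the consumer settings that the new Kafka consumer API (0.9 and later) typically needs to talk to a Kerberized broker. The object name `SecureKafkaParams`, the broker address, the group id, the `SASL_PLAINTEXT` protocol and the `kafka` service name are all assumptions for illustration; they do not come from the original thread and must match the actual cluster configuration.

```scala
// Minimal sketch, not the asker's code: consumer settings for a Kerberized Kafka broker.
// Every value below is an assumption and must be adapted to the target cluster.
object SecureKafkaParams {
  def build(brokers: String, groupId: String): Map[String, Object] = Map[String, Object](
    // Brokers are contacted directly; the new consumer API does not go through ZooKeeper.
    "bootstrap.servers" -> brokers,
    "group.id" -> groupId,
    // Deserializers are given as class names so this block has no Kafka dependency.
    "key.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
    "value.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
    // Kerberos (SASL/GSSAPI) without TLS; SASL_SSL would be used if the brokers enable TLS.
    "security.protocol" -> "SASL_PLAINTEXT",
    // Assumed to match the service principal name the brokers run under.
    "sasl.kerberos.service.name" -> "kafka",
    "enable.auto.commit" -> java.lang.Boolean.FALSE
  )
}
```

With the spark-streaming-kafka-0-10 integration on the classpath, a map like `SecureKafkaParams.build("quickstart.cloudera:9092", "spark-secure-test")` could be passed to `ConsumerStrategies.Subscribe[String, String](topics, kafkaParams)` and then to `KafkaUtils.createDirectStream`. The driver and executors would also need a JAAS configuration, usually supplied through `-Djava.security.auth.login.config=...`; the port 9092 and the group id here are placeholders.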
Community Discussions, Code Snippets contain sources that include Stack Exchange Network