docker-hadoop | Apache Hadoop docker image
kandi X-RAY | docker-hadoop Summary
Apache Hadoop docker image
Community Discussions
Trending Discussions on docker-hadoop
QUESTION
Hadoop was run on the local machine with docker-compose.yml, and I tried to upload a file to HDFS from the Web UI, but got the following results:
Symptoms: couldn't upload the file bar.txt
- folders can be created on the Web UI
- browser devtools show that the network request fails
I checked and found that the network call failed. Working from this reference, Open a file with webhdfs in docker container, I added port mappings to services.datanode.ports in docker-compose.yml, but the symptoms were the same.
ANSWER
Answered 2021-Aug-30 at 18:26
File uploads to WebHDFS require an HTTP redirect (first it creates the file handle in HDFS, then you upload the file to that location). Your host doesn't know the container service names, so you will see ERR_NAME_NOT_RESOLVED.
One possible solution is to edit your /etc/hosts file to point the namenode container ID at 127.0.0.1; however, the better way is simply to docker-compose exec into a container with an HDFS client and run hadoop fs -put commands.
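A minimal sketch of that workaround, assuming the container is named namenode as in the big-data-europe compose file:
# Copy the file into the namenode container, which ships with an HDFS client
docker cp bar.txt namenode:/tmp/bar.txt
# Create a target directory and upload the file from inside the container
docker-compose exec namenode hdfs dfs -mkdir -p /user/root
docker-compose exec namenode hdfs dfs -put /tmp/bar.txt /user/root/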
QUESTION
I am following this example:
I find the namenode as follows:
ANSWER
Answered 2021-Jul-15 at 11:38
Remove the $ at the beginning. That's what $: command not found means. It's easy to miss when copy-pasting code.
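For illustration, when documentation shows a command with a prompt, only the command itself should be pasted:
# Wrong: the leading "$" is the shell prompt, not part of the command
$ docker-compose up -d
# Right: paste only the command
docker-compose up -d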
QUESTION
I have a docker-compose with the big-data-europe Hadoop images and Flink 1.10 and 1.4, which I try to start in separate containers. I use this reference, YARN Setup, in which there is an example:
Example: Issue the following command to allocate 10 Task Managers, with 8 GB of memory and 32 processing slots each:
...
ANSWER
Answered 2020-Sep-07 at 08:31
As stated in the documentation for configuration parameters in YARN deployment mode, yarn.containers.vcores specifies the number of virtual cores (vcores) per YARN container. By default, the number of vcores is set to the number of slots per TaskManager, if set, or to 1 otherwise. For this parameter to take effect, your cluster must have CPU scheduling enabled.
In your case, you specify -s 32 (the taskmanager.numberOfTaskSlots parameter) without overriding the yarn.containers.vcores setting, so the app requests containers with 32 vcores. To be able to run with 32 slots per TM and only 8 cores, set yarn.containers.vcores to 8 in flink/conf/flink-conf.yaml.
Regarding the resources: yes, every task manager corresponds to one acquired YARN container, but each container has the number of vcores specified by yarn.containers.vcores (or, by default, the number of slots per container). Regarding the slot: it is more like a resource group, and each slot can have multiple tasks, each running in a separate thread. So a slot itself is not limited to only one thread. Please find more on the Task Slots and Resources docs page.
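A one-line sketch of the suggested fix, appending the setting to the Flink config file named in the answer:
# Cap each YARN container at 8 vcores while keeping 32 slots per TaskManager
echo "yarn.containers.vcores: 8" >> flink/conf/flink-conf.yaml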
QUESTION
I see a lot of echo statements in one of the entrypoint.sh scripts.
Where will these logs be stored?
I believe these will be logged automatically. Useful in debugging, to see which environment variables were ingested, etc.?
A sample entrypoint.sh file: https://github.com/big-data-europe/docker-hadoop/blob/master/base/entrypoint.sh
...
ANSWER
Answered 2020-Sep-03 at 17:52
If entrypoint.sh is the image's entrypoint, its output will be logged in the docker logs output and in the container's log files (usually at /var/lib/docker/containers/<container-id>/<container-id>-json.log).
That's usually done to expose the configuration the container is running with. In this case the container is only reporting what it's doing, as half of the echo lines are just setting up the Hadoop configuration files.
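For example, assuming the container is named namenode as in this compose file:
# Show everything the entrypoint echoed to stdout/stderr
docker logs namenode
# Or follow the output live while the container starts
docker logs -f namenode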
QUESTION
Hello guys, I'm trying to connect Kafka and HDFS with Kafka Connect, but I keep running into an issue that I can't get rid of.
I'm using this example: https://clubhouse.io/developer-how-to/how-to-set-up-a-hadoop-cluster-in-docker/
I start HDFS first with: docker-compose up -d
Then I launch Zookeeper, Kafka, and MySQL with images from the Debezium website: https://debezium.io/documentation/reference/1.0/tutorial.html
docker run -it --rm --name zookeeper --network docker-hadoop-master_default -p 2181:2181 -p 2888:2888 -p 3888:3888 debezium/zookeeper:1.0
docker run -it --rm --name kafka --network docker-hadoop-master_default -e ZOOKEEPER_CONNECT=zookeeper -p 9092:9092 --link zookeeper:zookeeper debezium/kafka:1.0
docker run -it --rm --name mysql --network docker-hadoop-master_default -p 3306:3306 -e MYSQL_ROOT_PASSWORD=debezium -e MYSQL_USER=mysqluser -e MYSQL_PASSWORD=mysqlpw debezium/example-mysql:1.0
I use this network on these runs because when I tried to change the network of the HDFS services in docker-compose.yml, the resource manager shut down and, no matter what, I couldn't find how to bring it back up and keep it stable. So I attached the zookeeper, kafka, and mysql containers directly to that network instead.
Then comes the trickiest part, Kafka Connect; I used the same network in this case too, which makes sense.
docker run -it --rm --name connect --network docker-hadoop-master_default -p 8083:8083 -e GROUP_ID=1 -e CONFIG_STORAGE_TOPIC=my_connect_configs -e OFFSET_STORAGE_TOPIC=my_connect_offsets -e STATUS_STORAGE_TOPIC=my_connect_statuses -e BOOTSTRAP_SERVERS="172.18.0.10:9092" -e CORE_CONF_fs_defaultFS=hdfs://172.18.0.2:9000 --link namenode:namenode --link zookeeper:zookeeper --link mysql:mysql debezium/connect:1.0
To link the source (MySQL) to Kafka I use the connector from the Debezium tutorial, the one below.
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{ "name": "inventory-connector", "config": { "connector.class": "io.debezium.connector.mysql.MySqlConnector", "tasks.max": "1", "database.hostname": "mysql", "database.port": "3306", "database.user": "debezium", "database.password": "dbz", "database.server.id": "184054", "database.server.name": "dbserver1", "database.whitelist": "inventory", "database.history.kafka.bootstrap.servers": "kafka:9092", "database.history.kafka.topic": "dbhistory.inventory" } }'
I tested whether Kafka receives any event from the source, and it works fine.
After setting this up, I moved on to installing the plugin, which I downloaded from the Confluent website and copied to my local Linux machine; then I installed the Confluent Hub client, and after that the plugin itself on my local machine. Then I created the user kafka and changed the ownership of all the content in the plugin directory to kafka:kafka.
After all this I used docker cp :/kafka/connect to copy the plugin into the Kafka Connect container.
Then I checked that it was there and restarted Kafka Connect to install it.
We can use this to check whether it is installed: curl -i -X GET -H "Accept:application/json" localhost:8083/connector-plugins
You should see something like this in the output: [{"class":"io.confluent.connect.hdfs.HdfsSinkConnector","type":"sink","version":"5.4.0"},…
After this step is where I believe my problem resides: curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{"name":"hdfs-sink","config":{"connector.class":"io.confluent.connect.hdfs.HdfsSinkConnector","tasks.max":1,"topics":"dbserver1,dbserver1.inventory.products,dbserver1.inventory.products_on_hand,dbserver1.inventory.customers,dbserver1.inventory.orders, dbserver1.inventory.geom,dbserver1.inventory.addresses","hdfs.url":"hdfs://172.18.0.2:9000","flush.size":3,"logs.dir":"logs","topics.dir":"kafka","format.class":"io.confluent.connect.hdfs.parquet.ParquetFormat","partitioner.class":"io.confluent.connect.hdfs.partitioner.DefaultPartitioner","partition.field.name":"day"}}'
I have no idea how to convince Kafka Connect that I want a specific IP address for the namenode; it just keeps throwing messages that it found a different IP when the expected one is hdfs://namenode:9000.
Also, even when adding -e CORE_CONF_fs_defaultFS=hdfs://172.18.0.2:9000 to the docker run, or setting it inside Kafka Connect, when I POST the curl for hdfs-sink it throws me the message below.
Log from Kafka Connect:
...
ANSWER
Answered 2020-Jan-21 at 19:26
By default, Docker Compose prefixes names with the directory where you ran the command plus an underscore, and an underscore is not allowed in a hostname. Hadoop prefers hostnames by default in the hdfs-site.xml config file.
"I have no idea how to convince Kafka Connect that I want a specific IP address for the namenode; it just keeps throwing messages that it found a different IP when the expected one is hdfs://namenode:9000."
Ideally, you wouldn't use an IP within Docker anyway; you would use the service name and exposed port.
For the HDFS Connector, you also need to (1) define the HADOOP_CONF_DIR env-var, (2) mount your XML configs as a volume so that remote clients such as Connect can interact with the Hadoop cluster, and (3) define hadoop.conf.dir in the connector properties.
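A hedged sketch of that advice applied to the failing hdfs-sink request above: the IP is replaced with the namenode service name, the topic list is trimmed to a single topic for brevity, and /etc/hadoop/conf is an assumed path where the Hadoop XML configs would be mounted:
# Assumes "namenode" resolves on the Connect container's network and the
# XML configs are mounted at /etc/hadoop/conf (hypothetical path)
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{"name":"hdfs-sink","config":{"connector.class":"io.confluent.connect.hdfs.HdfsSinkConnector","tasks.max":1,"topics":"dbserver1.inventory.customers","hdfs.url":"hdfs://namenode:9000","flush.size":3,"hadoop.conf.dir":"/etc/hadoop/conf"}}'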
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install docker-hadoop
Once the containers are running, the web UIs are available at:
Namenode: http://<dockerhadoop_IP_address>:9870/dfshealth.html#tab-overview
History server: http://<dockerhadoop_IP_address>:8188/applicationhistory
Datanode: http://<dockerhadoop_IP_address>:9864/
Nodemanager: http://<dockerhadoop_IP_address>:8042/node
Resource manager: http://<dockerhadoop_IP_address>:8088/
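A minimal startup sketch, assuming the big-data-europe/docker-hadoop repository referenced above:
# Clone the repo and bring up the cluster in the background
git clone https://github.com/big-data-europe/docker-hadoop.git
cd docker-hadoop
docker-compose up -d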