HadoopInternals | Diagrams describing Apache Hadoop internals

by ercoppa | HTML | Version: current | License: none declared

kandi X-RAY | HadoopInternals Summary

HadoopInternals is an HTML library typically used in Big Data and Spark applications. It has no reported bugs or vulnerabilities and has low support activity. You can download it from GitHub.

This project contains several diagrams describing [Apache Hadoop] internals (2.3.0 or later). Although these diagrams are not specified in any formal or unambiguous language (e.g., UML), they should be reasonably understandable and useful for anyone who wants to grasp the main ideas behind Hadoop. Unfortunately, not all internal details are covered by these diagrams. You are free to help :)

Ready? Go to the [project website].

Images linked in the wiki are dynamically generated (from LucidChart), but in the source directory you can find diagram snapshots in the following formats:

* PNG
* Visio (VDX)

A VDX file can be opened with one of the many Visio editors (e.g., the web-based editor [LucidChart], www.lucidchart.com, although only pro users can edit an imported file). These files are periodically synced with the ones shown in the wiki. On request, the LucidChart files can be shared via Google Drive so you can help with this project (in that case, a free LucidChart account is enough for editing).

            kandi-support Support

              HadoopInternals has a low active ecosystem.
              It has 429 stars, 200 forks, and 56 watchers.
              It had no major release in the last 6 months.
              HadoopInternals has no issues reported. There are no pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of HadoopInternals is current.

            kandi-Quality Quality

              HadoopInternals has no bugs reported.

            kandi-Security Security

              HadoopInternals has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

            kandi-License License

              HadoopInternals does not have a standard license declared.
              Check the repository for any license declaration and review the terms closely.
              Without a license, all rights are reserved, and you cannot use the library in your applications.

            kandi-Reuse Reuse

              HadoopInternals releases are not available. You will need to build from source code and install.


            HadoopInternals Key Features

            No Key Features are available at this moment for HadoopInternals.

            HadoopInternals Examples and Code Snippets

            No Code Snippets are available at this moment for HadoopInternals.

            Community Discussions

            QUESTION

            Determine cause of Hadoop error in code, as standard logs inconclusive: file splits, container memory, or block size
            Asked 2017-Oct-16 at 20:13

            For a while now I've been going through the log4j logs trying to determine why my Hadoop job is crashing.

            Essentially, what the job attempts to do is issue a command on the underlying machine and collect the output of that command. At the moment all of these steps take place in a map task (later I'll try to reduce over the sum of those individual outputs).

            The behaviour I'm experiencing is that, for a certain number of outputs written to the BufferedReader (say 28 of them), everything works fine and the job finishes almost immediately. However, when I increase that number to 29, the map task hangs at 67% completion; it is attempted three times, always stopping at 67%, and is finally killed for lack of progress.
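(An aside, not from the question itself: the "terminates itself for lack of progress" behaviour is Hadoop's task timeout at work. A task that neither reads input, writes output, nor reports status for mapreduce.task.timeout milliseconds (600000 ms, i.e. 10 minutes, by default) is killed. If the external command legitimately takes longer, the timeout can be raised; a sketch of the mapred-site.xml fragment, with an assumed 20-minute value:)

```xml
<!-- mapred-site.xml: raise the no-progress timeout for long-running tasks.
     Value is in milliseconds; 1200000 ms = 20 min is an assumed example value.
     Setting 0 disables the timeout entirely. -->
<property>
  <name>mapreduce.task.timeout</name>
  <value>1200000</value>
</property>
```

Reporting progress from the mapper while the external command runs (e.g. periodically calling context.progress()) is usually a better fix than raising the timeout.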

            From the NameNode where the job is issued we can see the following output:

            ...

            ANSWER

            Answered 2017-Oct-11 at 02:34

            To understand where exactly the mapper is stuck, jstack can be used to get a thread dump.

            jstack ships with the JDK, and you can use it on the stuck mapper process as follows.

            Step 0: find the hostname on which your map task is running and make a note of the task_id.

            Step 1: log in to the node and run

                ps aux | grep task_id

            to identify the process id (pid) and the username of the process whose command line starts with /usr/java/jdk/bin/java.

            Step 2: su to the process owner's username.

            Step 3: export JAVA_HOME and add its bin directory to PATH, for example:

                export JAVA_HOME=/usr/java/jdk1.7.0_67
                export PATH=$JAVA_HOME/bin:$PATH

            Step 4: replace PID with the pid you obtained in step 1:

                export PID=PID
                for i in $(seq 1 10); do
                  echo "Jstack Iteration $i"
                  jstack $PID > /tmp/hungtask-hostname-${PID}.jstack.$i
                  sleep 5s
                done
                tar zcvf hungtask.tar.gz /tmp/hungtask-hostname-${PID}.jstack.*

            The resulting hungtask.tar.gz will contain ten thread dumps of the process, taken at 5-second intervals. You may need to run the script at the point when the task goes into the hung state.

            After this, if you can upload hungtask.tar.gz to this thread, I can take a look and share my observations.

            Also, to understand whether the process is undergoing frequent GC, you can try the command below:

                jstat -gc -t PID STEP

            where PID is the process id of the Java process to monitor and STEP is the sampling interval.

            You can paste the output into the website http://nix-on.blogspot.in/2015/01/java-jstat-how-to-visualize-garbage.html to understand whether it is undergoing excessive GC.
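(As a rough illustration, not from the original answer: a single jstat -gc -t sample already lets you estimate the fraction of wall-clock time spent in GC. In the Java 8 column layout, the first column, Timestamp, is seconds since JVM start, and the last column, GCT, is cumulative GC seconds. The helper below is a sketch under that assumption, with the two values read off the jstat output by hand:)

```shell
# Sketch: estimate GC overhead from one `jstat -gc -t <pid>` sample.
# Assumes you have read the Timestamp column (seconds of JVM uptime)
# and the GCT column (cumulative GC seconds) off the jstat output.
gc_overhead() {
  # $1 = Timestamp (s), $2 = GCT (s)
  awk -v ts="$1" -v gct="$2" 'BEGIN { printf "%.1f%%\n", 100 * gct / ts }'
}

gc_overhead 120 18   # 18 s of GC in 120 s of uptime -> prints 15.0%
```

A figure that stays above a few percent across samples usually points at excessive GC.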

            Source https://stackoverflow.com/questions/46653756

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install HadoopInternals

            You can download it from GitHub.

            Support

            For any new features, suggestions, or bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.
            CLONE
          • HTTPS

            https://github.com/ercoppa/HadoopInternals.git

          • CLI

            gh repo clone ercoppa/HadoopInternals

          • sshUrl

            git@github.com:ercoppa/HadoopInternals.git
