HPC | A collection of various resources, examples, and executables for the general NREL HPC user community | GPU library

by NREL | Jupyter Notebook | Version: Current | License: Non-SPDX

kandi X-RAY | HPC Summary

HPC is a Jupyter Notebook library typically used in Hardware, GPU applications. HPC has no reported bugs or vulnerabilities, and it has low support. However, HPC has a Non-SPDX license. You can download it from GitHub.

This repository serves as a collection of walkthroughs, utilities, and other resources to improve the quality of life of NREL HPC users, both novice and veteran. We are here to help: if you need help with a specific issue or would like to see a topic covered, please open an issue. If you have materials that could be useful for the NREL community, please see our contributing guidelines and open a pull request.

Support

HPC has a low-activity ecosystem.
It has 73 stars, 54 forks, and 14 watchers.
It has had no major release in the last 6 months.
There are 79 open issues and 114 closed issues; on average, issues are closed in 40 days. There are 11 open pull requests and 0 closed pull requests.
It has a neutral sentiment in the developer community.
The latest version of HPC is current.

Quality

              HPC has no bugs reported.

Security

              HPC has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

              HPC has a Non-SPDX License.
A Non-SPDX license may be an open-source license that is not SPDX-compliant, or a non-open-source license; you need to review it closely before use.

Reuse

              HPC releases are not available. You will need to build from source code and install.


            HPC Key Features

            No Key Features are available at this moment for HPC.

            HPC Examples and Code Snippets

            No Code Snippets are available at this moment for HPC.

            Community Discussions

            QUESTION

How can I generate the Makefile with ./configure?
            Asked 2021-Jun-08 at 12:40

I am trying to install EZTrace, a tool that aims to automatically generate execution traces from HPC applications. I downloaded the installation folder from https://eztrace.gitlab.io/eztrace/index.html. After extracting it, I found a README file:

            ...

            ANSWER

            Answered 2021-Jun-08 at 12:40
• Don't run autoheader; the project is not set up to use it.
• The automake warning is a warning, not an error.

Usually, the simplest way to bootstrap an autotools project is to run autoreconf -fiv. That creates a configure script, which you then run to generate the Makefile, as sketched below.
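A minimal sketch of that bootstrap sequence, assuming a standard autotools layout (run from the top of the extracted source tree):

# regenerate the build system (-f force, -i install missing files, -v verbose)
autoreconf -fiv
# generate the Makefile from configure.ac and Makefile.am
./configure
# build the project
make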

            Source https://stackoverflow.com/questions/67885170

            QUESTION

            How can I run this simple for loop in parallel bash?
            Asked 2021-Jun-03 at 18:55

I am trying to run an Rscript multiple times with different parameters, and I am using a bash script to do so (I got an error while trying to run it in parallel in R with things like foreach and doParallel, but that is not the problem here).

My script, which I intended to call with sbatch script.sh (on an HPC), looks as follows:

            ...

            ANSWER

            Answered 2021-Jun-03 at 18:55

For your script to work, you need to:

1. either use variable names a, b, c, etc., or $dist, $rlen, $trans, $meta, $init, but not both;
2. end the script with wait, otherwise Slurm will think your script has finished as soon as the loop has launched the background jobs.

            So:
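The original loop was elided from the question, so this is only a hedged sketch of the corrected pattern (the script name and parameter values here are hypothetical): consistent variable names, each Rscript launched in the background with &, and a final wait.

#!/bin/bash
#SBATCH --ntasks=4

# hypothetical parameter list standing in for the elided one
for dist in gamma weibull lognormal normal; do
    # launch each run in the background so the runs execute in parallel
    Rscript script.R "$dist" &
done
# block until every background job has finished, so Slurm keeps the allocation alive
wait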

            Source https://stackoverflow.com/questions/67820987

            QUESTION

How do I create a batch script that submits several jobs and allocates each of these jobs to a separate node?
            Asked 2021-Jun-03 at 11:17

I am new to HPC, and to Slurm especially, and I have run into some trouble.

I was given access to an HPC cluster with 32 CPUs on each node. To do the required calculations I wrote 12 Python multiprocessing scripts, each of which uses exactly 32 CPUs. Instead of starting each script manually in interactive mode (which is also an option, by the way, but it takes a lot of time), I decided to write a batch script to start all 12 scripts automatically.

//SCRIPT//

#!/bin/bash
#SBATCH --job-name=job_name
#SBATCH --partition=partition
#SBATCH --nodes=1
#SBATCH --time=47:59:59
#SBATCH --export=NONE
#SBATCH --array=1-12

module switch env env/system-gcc
module load python/3.8.5

source /home/user/env/bin/activate

python3.8 $HOME/Script_directory/Script$SLURM_ARRAY_TASK_ID.py

exit

//UNSCRIPT//

But as far as I understand, this script would start all of the jobs from the array on the same node, and the underlying Python scripts might then fight over the available CPUs and slow down.

How should I modify my batch file in order to start each task from the array on a separate node?

            Thanks in advance!

            ...

            ANSWER

            Answered 2021-Jun-03 at 11:17

This script will start 12 independent jobs, possibly on 12 distinct nodes at the same time, or all 12 in sequence on the same node, or any other combination, depending on the load of the cluster.

            Each job will run the corresponding Script$SLURM_ARRAY_TASK_ID.py script. There will be no competition for resources.

            Note that if nodes are shared in the cluster, you would add the --exclusive parameter to request whole nodes with their 32 CPUs.
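As a hedged illustration, the only change to the submission script above would be one extra directive; everything else stays the same:

# request whole nodes so each array task gets all 32 CPUs to itself
#SBATCH --exclusive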

            Source https://stackoverflow.com/questions/67807263

            QUESTION

How can I set up RStudio and sparklyr on an auto-scaling cluster managed by Slurm?
            Asked 2021-May-24 at 11:07

I have an AWS HPC auto-scaling cluster managed by Slurm. I can submit jobs using sbatch, but I want to use sparklyr on this cluster so that Slurm increases the cluster size based on the workload of the sparklyr code in the R script. Is this possible?

            ...

            ANSWER

            Answered 2021-May-24 at 11:07

Hi Amir, is there a reason you are using Slurm here? sparklyr has better integration with Apache Spark, and it would be advisable to run it over a Spark cluster. You can follow this blog post for the steps to set this up with Amazon EMR, a service for running Spark clusters on AWS: https://aws.amazon.com/blogs/big-data/running-sparklyr-rstudios-r-interface-to-spark-on-amazon-emr/

            Source https://stackoverflow.com/questions/67629543

            QUESTION

            How do I use dask distributed?
            Asked 2021-May-20 at 16:19

I am trying to use Dask by looking at the code examples and documentation, and I have trouble understanding how it works. As suggested in the documentation, I am trying to use the distributed scheduler (I also plan to deploy my code on an HPC).

            The first simple thing I tried was this:

            ...

            ANSWER

            Answered 2021-May-20 at 16:19

            But, how do I make sure that only one worker executes this function? (In MPI, I can do this by using rank == 0; I did not find anything similar to MPI_Comm_rank() which can tell me the worker number or id in Dask).

            This is effectively what the if __name__ == '__main__' block is checking. That condition is true when your script is run directly; it's not true when it's imported as a module by the workers. Any code that you put outside of this block is run by every worker when it starts up; this should be limited to function definitions and necessary global setup. All of the code that actually does work needs to be in the if __name__ == '__main__' block, or inside functions which are only called from inside that block.

            Source https://stackoverflow.com/questions/67624157

            QUESTION

            AArch64 SVE/2 - Left pack elements from list
            Asked 2021-May-13 at 17:36

I'm trying to implement a SIMD algorithm with AArch64 SVE (or SVE2) that takes a list of elements and selects only the ones that meet a certain condition. This operation is often called left packing (SSE/AVX/AVX-512) or stream compaction (CUDA).

            Is it possible to vectorize this operation with SVE?

            Equivalent SQL and scalar code could look like this:

            ...

            ANSWER

            Answered 2021-May-13 at 13:05

            Use svcompact to compact the active elements, then a normal linear store to store the result.

            Source https://stackoverflow.com/questions/67518513

            QUESTION

            How to tell python to wait for shell (.sh) script
            Asked 2021-May-11 at 18:55

I am calling a shell (.sh) script from my Python code, and I want to tell Python to wait for the script to end before continuing with the rest of the code. For the record, the script submits some calculations to an HPC cluster, and these take approximately 40-50 minutes. I could probably use sleep() and force Python to wait for those 40-50 minutes, but firstly I do not always know how long to wait, and secondly I was hoping for a more efficient way of doing this. The script is called using os.system("bsub < test.sh").

Is there any way to tell Python to wait until the script is finished and then continue with the rest of the code? Thanks in advance.

            ...

            ANSWER

            Answered 2021-May-11 at 18:55

I think @Barmar identified the problem in the comments:

When you run bsub, it submits the job and immediately returns, rather than waiting for completion.

You should do one of the following:

• add the -K argument to bsub so that it waits for the job to complete (see the sketch below)
• skip bsub and run the script directly
• write some independent marker at the end of your script (perhaps a file) and have the Python script check for it in a loop (maybe every 1-5 s, so it doesn't flood that resource)
• rewrite the script in pure Python and incorporate it directly into your logic
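A hedged sketch of the first option, reusing the test.sh from the question (LSF's bsub -K submits the job and waits for it to complete; in the Python code the call would accordingly become os.system("bsub -K < test.sh")):

# submit the job and block until it completes (-K: submit and wait)
bsub -K < test.sh
# this line runs only after the cluster job has finished
echo "calculation finished, resuming"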

            Source https://stackoverflow.com/questions/67492599

            QUESTION

            C++ call to LAPACKE run on a single thread while NumPy uses all threads
            Asked 2021-Apr-19 at 20:27

I wrote C++ code whose bottleneck is the diagonalization of a possibly large symmetric matrix. The code uses OpenMP, CBLAS, and the LAPACKE C interfaces.

However, the call to dsyev runs on a single thread, both on my local machine and on an HPC cluster (as seen with htop or equivalent tools). It takes about 800 seconds to diagonalize a 12000x12000 matrix, while NumPy's eigh function takes about 250 seconds. Of course, in both cases $OMP_NUM_THREADS is set to the number of threads.

Here is an example of C++ code that calls LAPACK, which is basically what I do in my program (I am reading the matrix from a binary format).

            ...

            ANSWER

            Answered 2021-Apr-19 at 17:28

From the provided information, it seems your C++ code is linked with OpenBLAS, while your Python implementation uses the Intel MKL.

OpenBLAS is a free, open-source library that implements basic linear algebra functions (called BLAS: matrix multiplication, dot products, etc.), but it barely supports advanced linear algebra functions (called LAPACK: eigenvalues, QR decomposition, etc.). Consequently, while the BLAS functions of OpenBLAS are well optimized and run in parallel, the LAPACK functions are not yet well optimized and mostly run sequentially.

The Intel MKL is a non-free, closed-source library implementing both BLAS and LAPACK functions. Intel claims high performance for both (at least on Intel processors). The implementations are well optimized and most run in parallel.

As a result, if you want your C++ code to be at least as fast as your Python code, you need to link against the MKL rather than OpenBLAS.
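A hedged sketch of what that change can look like at link time (the source file name is hypothetical, mkl_rt is the MKL single-dynamic-library interface, and a real build may need extra include/library paths depending on the installation):

# linked against OpenBLAS: dsyev mostly runs on a single thread
g++ -O2 -fopenmp diag.cpp -o diag -lopenblas
# linked against the Intel MKL: the LAPACK routines can run in parallel
g++ -O2 -fopenmp diag.cpp -o diag -lmkl_rt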

            Source https://stackoverflow.com/questions/67165201

            QUESTION

            How to save/record SLURM script's config parameters to the output file?
            Asked 2021-Apr-13 at 06:54

I'm new to HPC, and to SLURM in particular. Here is an example of the code that I use to run my Python script:

            ...

            ANSWER

            Answered 2021-Apr-13 at 06:54

            You add the following two lines at the end of your submission script:
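A hedged guess at those two lines, assuming the standard scontrol subcommands (both print to stdout, which Slurm redirects into the job's output file):

# print the job's configuration parameters (partition, time limit, node list, etc.)
scontrol show job $SLURM_JOB_ID
# print the submission script itself into the output file
scontrol write batch_script $SLURM_JOB_ID -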

            Source https://stackoverflow.com/questions/67051162

            QUESTION

            register usage on nvidia with hipSYCL / llvm
            Asked 2021-Apr-09 at 15:23

I am looking at the performance of a SYCL port of some HPC code, which I am running on a GV100 card via hipSYCL.

            Running the code through a profiler tells me that very high register usage is the likely limiting factor for performance.

            Is there any way of influencing register usage of the gpu code that hipSYCL / clang generates, something akin to nvcc's -maxregcount option?

            ...

            ANSWER

            Answered 2021-Apr-09 at 15:23

hipSYCL invokes the clang CUDA toolchain. As far as I know, clang CUDA and the LLVM NVPTX backend do not have a direct analogue of -maxregcount, but the LLVM NVPTX backend option --nvptx-sched4reg may help. It tells the optimizer to schedule for minimum register pressure instead of just following the source.

If you use accessors, you can also try SYCL 2020 USM pointers instead. In hipSYCL[1], accessors will always use more registers because they need to store the valid access range and offset as well.

            [1] and also any other SYCL implementation that relies heavily on library-only semantics
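A hedged sketch of forwarding that option to the backend (syclcc is hipSYCL's compiler driver and -mllvm passes a flag through to LLVM; the target string and file names are assumptions, and the exact flag spelling may vary by LLVM version):

# ask the NVPTX backend to schedule for minimum register pressure
syclcc -O3 --hipsycl-targets=cuda:sm_70 -mllvm -nvptx-sched4reg kernel.cpp -o kernel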

            Source https://stackoverflow.com/questions/66977229

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install HPC

            You can download it from GitHub.

            Support

            Please see our contribution guidelines for a rundown on how we'd like the contents of this repository to be formatted.
Clone

HTTPS: https://github.com/NREL/HPC.git
GitHub CLI: gh repo clone NREL/HPC
SSH: git@github.com:NREL/HPC.git
