benchmarks | Scripts for running various benchmarks on Isambard | DevOps library

by UoB-HPC Shell Version: CCPE-CUG-2018 License: No License


kandi X-RAY | benchmarks Summary

benchmarks is a Shell library typically used in DevOps applications. It has no reported bugs or vulnerabilities, and it has low support. You can download it from GitHub.

This repository contains scripts for running various benchmarks in a reproducible manner. This is primarily for benchmarking ThunderX2 in Isambard, and other systems that we typically compare against.


Support

benchmarks has a low active ecosystem.

It has 24 star(s) with 4 fork(s). There are 13 watchers for this library.

It had no major release in the last 12 months.

There are 3 open issues and 0 have been closed. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of benchmarks is CCPE-CUG-2018

Quality

benchmarks has no bugs reported.

Security

benchmarks has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

benchmarks does not have a standard license declared.

Check the repository for any license declaration and review the terms closely.

Without a license, all rights are reserved, and you cannot use the library in your applications.

Reuse

benchmarks releases are available to install and integrate.

Installation instructions are not available. Examples and code snippets are available.


benchmarks Key Features

No Key Features are available at this moment for benchmarks.

benchmarks Examples and Code Snippets

UoB HPC Benchmarks: Usage

Shell


mkdir benchmarks && cd benchmarks

# Example for CloverLeaf on TX2
$BENCH/cloverleaf/tx2-isambard/benchmark.sh build

# Example for CloverLeaf on TX2, running on 64 nodes (assuming you have previously run 'build')
$BENCH/cloverleaf/tx2-isamba

UoB HPC Benchmarks: Usage (Using custom settings)

Shell


# Example for GROMACS on TX2
$BENCH/gromacs/tx2-isambard/benchmark.sh

$BENCH/gromacs/tx2-isambard/benchmark.sh build arm-19.0 armpl-19.0
$BENCH/gromacs/tx2-isambard/benchmark.sh run scale-64 arm-19.0 armpl-19.0

Community Discussions

Trending Discussions on benchmarks

How to improve divide-and-conquer runtimes?

Polybase External Tables vs. OPENROWSET serverless sql pool architecture

BenchmarkDotNet Unable to find Tests when it faces weird solution structure

Why does text appear in browsers but not appear in image viewers?

PGBouncer IDLE Connections not Closing on Postgres

Create a new slice given a previous one, without a given value

Writing a vector sum function with SIMD (System.Numerics) and making it faster than a for loop

Why does JMH report such strange times for a simple Quicksort --obviously disproportionate to N * log(N)?

To use Task.WhenAll, or not to use Task.WhenAll

Vectorized hashing/ranking of integer combinations of fixed size via operations on 32-bit integers in MATLAB

QUESTION

How to improve divide-and-conquer runtimes?

Asked 2021-Jun-15 at 17:36

When a divide-and-conquer recursive function doesn't yield runtimes low enough, which other improvements could be done?

Let's say, for example, this power function taken from here:

...

ANSWER

Answered 2021-Jun-15 at 17:36

The primary optimization you should use here is common subexpression elimination. Consider your first piece of code:

Source https://stackoverflow.com/questions/67987701
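The question's power function is elided above. Assuming it has the usual naive form, which recomputes the half-power in two separate recursive calls, a minimal Bash sketch of the common-subexpression fix looks like this. `power` is a hypothetical name, and Bash's 64-bit integer arithmetic limits it to results below 2^63.

```bash
#!/usr/bin/env bash
# power BASE EXP: integer exponentiation by divide and conquer (hypothetical helper).
# The half-power is computed once and reused (common subexpression
# elimination), so only O(log EXP) recursive calls are made instead of O(EXP).
power() {
  local base=$1 exp=$2 half
  if (( exp == 0 )); then
    echo 1
    return
  fi
  half=$(power "$base" $(( exp / 2 )))   # single recursive call, reused below
  if (( exp % 2 == 0 )); then
    echo $(( half * half ))
  else
    echo $(( half * half * base ))
  fi
}

power 2 10   # prints 1024
```

Calling the recursion twice per level gives T(n) = 2T(n/2) + O(1), which is O(n); reusing `half` gives T(n) = T(n/2) + O(1), which is O(log n).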

QUESTION

Polybase External Tables vs. OPENROWSET serverless sql pool architecture

Asked 2021-Jun-09 at 09:33

I am searching for performance benchmarks for querying Parquet files in ADLS with the standard dedicated SQL pool using external tables with PolyBase vs. serverless SQL pool with OPENROWSET views. From my base queries on a 1.5 billion record table, it does appear that OPENROWSET in serverless SQL pool is around 30% faster (by query time) for the same query, but what is the architecture that powers that? Are there any readily available performance benchmarks?

...

ANSWER

Answered 2021-Jun-09 at 09:33

The architecture behind Azure Synapse SQL serverless pools, and how it achieves such strong performance, is described in this paper; the system is called "Polaris".

http://www.vldb.org/pvldb/vol13/p3204-saborit.pdf

Performance benchmarks have been published on multiple blogs. Be aware that this can only be a snapshot in time as those features are being improved constantly.

Source https://stackoverflow.com/questions/67896757

QUESTION

BenchmarkDotNet Unable to find Tests when it faces weird solution structure

Asked 2021-Jun-03 at 00:42

I have a problem with BenchmarkDotNet that I am struggling to solve.

Here's my project structure:

...

ANSWER

Answered 2021-Jun-03 at 00:42

The short answer is that you cannot run benchmarks with the structure you created, and this is intentional.

BenchmarkDotNet requires the solution to have the following structure (which is also generally good practice):

Source https://stackoverflow.com/questions/67766289

QUESTION

Why does text appear in browsers but not appear in image viewers?

Asked 2021-May-28 at 08:39

I am trying to render a chart but have run into a problem: the elements appear in browsers (Chrome, Firefox) but not in traditional image viewers (Eye of GNOME, GIMP, Inkscape).

Code

At first, I thought that it was because image viewers are incapable of rendering fonts, until I came across an asciinema thumbnail, which Eye of GNOME displays perfectly:

Question: Why does this happen and how to fix this?

...

ANSWER

Answered 2021-May-28 at 07:32

The reason is in nested SVGs:

Source https://stackoverflow.com/questions/67733561

QUESTION

PGBouncer IDLE Connections not Closing on Postgres

Asked 2021-May-27 at 16:31

We have a setup where we are running 6 PgBouncer processes, and our performance benchmarks degrade linearly with time. The longer PgBouncer has been running, the longer the connections to Postgres exist, which results in slower response times for the benchmark. We have a multi-tenant, schema-separated database with 2000+ relations. We are currently configured for transaction-mode pooling. Over time, we see the memory footprint of each Postgres process keep climbing, and again, this results in poorer performance.

We have tried to be more aggressive in cleaning up idle connections with the following settings:

...

ANSWER

Answered 2021-May-27 at 16:31

The issue is resolved.

The application was extremely chatty and even with server_idle_timeout set as low as 5 seconds, the connections were not getting recycled on the Postgres side.

The issue we had was that server_lifetime had accidentally been commented out when we thought it was active. Once we changed that, we could clearly see that Postgres connections were getting recycled every 2 minutes (based on our settings).

The increase in per-connection memory over time, especially for long-lived connections, counted only private memory, not shared memory. What we observed was that the longer a connection was alive, the more memory it consumed. We tried setting things like DISCARD ALL for reset_query, and it had no impact on memory consumption. Based on my research online, we were not the only ones to face this challenge with pooling connections.

Thanks for the comments and the help. Our solution in the end was to leverage server_lifetime in pgBouncer to control the number of long-lived connections on Postgres.

-Mayan

Source https://stackoverflow.com/questions/67664415
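Since the root cause was a `server_lifetime` line that was commented out while the team believed it was active, a quick check of the config file can catch this class of mistake. The following is a minimal Bash sketch; `setting_status` is a hypothetical helper, and it assumes the INI-style pgbouncer.ini convention that lines beginning with `;` or `#` are comments. It does not check which section a key appears in.

```bash
#!/usr/bin/env bash
# setting_status FILE KEY (hypothetical helper)
# Prints "active: <line>", "commented: <line>", or "missing" for KEY in an
# INI-style file such as pgbouncer.ini, assuming ';' and '#' start comments.
# Returns 0, 1, or 2 respectively. KEY is used as a regex, so keep it plain.
setting_status() {
  local file=$1 key=$2 line
  # An uncommented "key = value" line (last match wins in the report).
  line=$(grep -E "^[[:space:]]*${key}[[:space:]]*=" "$file" | tail -n 1)
  if [[ -n $line ]]; then
    echo "active: $line"
    return 0
  fi
  # The same key, but behind a ';' or '#' comment marker.
  line=$(grep -E "^[[:space:]]*[;#][[:space:]]*${key}[[:space:]]*=" "$file" | tail -n 1)
  if [[ -n $line ]]; then
    echo "commented: $line"
    return 1
  fi
  echo "missing"
  return 2
}
```

For example, running it against your pgbouncer.ini with the key `server_lifetime` should report `active:` before you rely on server connections being recycled.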

QUESTION

Create a new slice given a previous one, without a given value

Asked 2021-May-23 at 13:50

I have a slice of strings. What I need to accomplish is to remove one value from the slice, without knowing the index. I thought this would be the easiest way to do it:

...

ANSWER

Answered 2021-May-23 at 13:49

Allocate a big return slice in one step (estimated by the input slice), and don't use append() but assign to individual elements:

Source https://stackoverflow.com/questions/67660392

QUESTION

Writing a vector sum function with SIMD (System.Numerics) and making it faster than a for loop

Asked 2021-May-21 at 18:27

I wrote a function to add up all the elements of a double[] array using SIMD (System.Numerics.Vector) and the performance is worse than the naïve method.

On my computer Vector.Count is 4 which means I could create an accumulator of 4 values and run through the array adding up the elements by groups.

For example a 10 element array, with a 4 element accumulator and 2 remaining elements I would get

...
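Bash has no SIMD, but the lane layout the question describes (a 4-wide accumulator over full chunks, then a scalar tail) can be sketched with ordinary integer arithmetic. This only illustrates the indexing under that assumption; `lane_sum` is a hypothetical name, and it sums integers because Bash arithmetic has no floating-point type.

```bash
#!/usr/bin/env bash
# lane_sum VALUES... (hypothetical helper): sums integers using 4 independent
# lane accumulators, mimicking the layout of a Vector.Count == 4 loop,
# followed by a scalar loop over the leftover elements.
lane_sum() {
  local -a xs=("$@") acc=(0 0 0 0)
  local n=${#xs[@]} width=4 i lane total=0
  local main=$(( n - n % width ))   # elements covered by full 4-wide chunks
  for (( i = 0; i < main; i += width )); do
    for (( lane = 0; lane < width; lane++ )); do
      acc[lane]=$(( acc[lane] + xs[i + lane] ))
    done
  done
  # Horizontal reduction: add the lanes of the accumulator together.
  for (( lane = 0; lane < width; lane++ )); do
    total=$(( total + acc[lane] ))
  done
  # Scalar tail for the n % 4 remaining elements.
  for (( i = main; i < n; i++ )); do
    total=$(( total + xs[i] ))
  done
  echo "$total"
}
```

For a 10-element input, the chunk loop covers indices 0 to 7 (two 4-wide chunks) and the tail loop handles indices 8 and 9, matching the "2 remaining elements" in the question.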

ANSWER

Answered 2021-May-19 at 18:28

I would suggest you take a look at this article exploring SIMD performance in .NET.

The overall algorithm for summing with regular vectorization looks identical. One difference is that the multiplication can be avoided when slicing the array:

Source https://stackoverflow.com/questions/67605744

QUESTION

Why does JMH report such strange times for a simple Quicksort --obviously disproportionate to N * log(N)?

Asked 2021-May-19 at 10:44

With the intent of studying a sort algorithm of my own, I decided to compare its performance with the classical quicksort, and to my great surprise I discovered that the time taken by my implementation of quicksort is far from proportional to N log(N). I thoroughly tried to find an error in my quicksort, but without success. It is a simple version of the sort algorithm working with arrays of Integer of different sizes, filled with random numbers, and I have no idea where the error could sneak in. I have even counted all the comparisons and swaps executed by my code, and their number was fairly proportional to N log(N). I am completely confused and can't understand what I am observing. Here are the benchmark results for sorting arrays of 1,000, 2,000, 4,000, 8,000 and 16,000 random values (measured with JMH):

...

ANSWER

Answered 2021-May-18 at 21:03

Three points work together against your implementation:

1. Quicksort has a worst-case complexity of O(n^2).
2. Picking the leftmost element as the pivot gives worst-case behavior on already sorted arrays (https://en.wikipedia.org/wiki/Quicksort#Choice_of_pivot):

   "In the very early versions of quicksort, the leftmost element of the partition would often be chosen as the pivot element. Unfortunately, this causes worst-case behavior on already sorted arrays."

3. Your algorithm sorts the arrays in place, meaning that after the first pass the "random" array is sorted. (To calculate average times, JMH makes several passes over the data.)
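The last two points interact: once the first pass has sorted the array in place, every later pass hands a leftmost-pivot quicksort its worst case. The Bash sketch below counts comparisons to show this. It is a hypothetical stand-in for the question's elided Java code, uses a Lomuto-style partition with the leftmost element as the pivot, and uses a hypothetical `sort_fresh_copy` helper that restores a saved copy of the input before sorting.

```bash
#!/usr/bin/env bash
# Leftmost-pivot quicksort on the global array A, counting comparisons in
# the global COMPARISONS (hypothetical stand-in for the question's code).
declare -a A ORIGINAL
COMPARISONS=0

quicksort() {
  local lo=$1 hi=$2 pivot i j tmp
  if (( lo >= hi )); then return; fi
  pivot=${A[lo]}
  i=$lo
  for (( j = lo + 1; j <= hi; j++ )); do
    COMPARISONS=$(( COMPARISONS + 1 ))
    if (( A[j] < pivot )); then
      i=$(( i + 1 ))
      tmp=${A[i]}; A[i]=${A[j]}; A[j]=$tmp
    fi
  done
  # Move the pivot into its final position, then recurse on both sides.
  tmp=${A[lo]}; A[lo]=${A[i]}; A[i]=$tmp
  quicksort "$lo" $(( i - 1 ))
  quicksort $(( i + 1 )) "$hi"
}

# Restore A from ORIGINAL, then sort it (COMPARISONS holds the count).
sort_fresh_copy() {
  A=("${ORIGINAL[@]}")
  COMPARISONS=0
  quicksort 0 $(( ${#A[@]} - 1 ))
}

n=200
for (( k = 0; k < n; k++ )); do ORIGINAL[k]=$RANDOM; done

sort_fresh_copy
echo "fresh random copy:   $COMPARISONS comparisons"
COMPARISONS=0
quicksort 0 $(( n - 1 ))   # A is already sorted here: worst case
echo "re-sorting sorted A: $COMPARISONS comparisons"   # n(n-1)/2 = 19900
sort_fresh_copy
echo "fresh copy again:    $COMPARISONS comparisons"
```

On an already sorted array no element is ever smaller than the leftmost pivot, so each partition peels off only the pivot and the total is exactly n(n-1)/2 comparisons. Restoring the input before every measured sort keeps each pass on random data.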

To fix this, you could change your benchmark methods. For example, you could change sortArray01000() to

Source https://stackoverflow.com/questions/67571268

QUESTION

To use Task.WhenAll, or not to use Task.WhenAll

Asked 2021-May-14 at 11:58

I am reviewing some code and trying to come up with a technical reason why you should or should not use Task.WhenAll(Tasks[]) for essentially making HTTP calls in parallel. The HTTP calls go to a different microservice, and I guess one of the calls may or may not take some time to execute (I am not really interested in that). I'm using BenchmarkDotNet to get an idea of whether any more memory is consumed, or whether execution time is wildly different. Here is an oversimplified example of the benchmarks:

...

ANSWER

Answered 2021-May-12 at 15:51

My question really is, is there a technical reason why you should or not use Task.WhenAll()?

The behavior is just slightly different in the case of exceptions when both calls fail. If they're awaited one at a time, the second failure will never be observed; the exception from the first failure is propagated immediately. If using Task.WhenAll, both failures are observed; one exception is propagated after both tasks fail.

Is it just a preference?

It's mostly just preference. I tend to prefer WhenAll because the code is more explicit, but I don't have a problem with awaiting one at a time.

Source https://stackoverflow.com/questions/67506865

QUESTION

Vectorized hashing/ranking of integer combinations of fixed size via operations on 32-bit integers in MATLAB

Asked 2021-May-14 at 06:09

I have huge dynamically created tables/matrices in MATLAB of varying first dimension, whose rows represent (sorted) combinations of integers in the range 1-50 of order 6.

I would like to assign to each combination a unique value (hash, ranking), so that I can check if the same combinations appear in different tables. Different combinations are not allowed to have same value assigned, i.e. no collisions. I have to make a lot of such comparisons between a lot of such tables. So, for performance reasons, I would like to accomplish this by vectorization of uint32 operations to make it suitable for GPU acceleration in MATLAB.

Things I have thought of so far:

Lexicographic ranking: no idea how to vectorize the standard fast recursive algorithms well, and the only option seems to be to parfor it through the rows, which is slower than other options. IIRC, the direct explicit formula, though vectorizable, requires computation of binomials, which in turn requires log Gamma function in order to avoid huge factorials + double type to avoid collision if I am not mistaken, i.e. is slower because it's 'very numerical'.
Cantor pairing function: one can successively apply Cantor's pairing, which is nice because it's a polynomial expression, but it produces huge numbers well beyond uint32 and is definitely slower than other options.
Base 51 (no pun intended) integers: sends a combination/row vector (x_1,...,x_6) to x_1 + x_2 * 51 + ... + x_6 * 51^5. This is the fastest I currently have. It's easily vectorizable, but unfortunately still requires uint64 or double for rank-6 combinations of 50 elements, which is slower than uint32 or single type operations would be.
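A quick bound shows why the base-51 encoding does not fit in 32 bits: for any combination whose largest element is x_6 = 50, the top term alone already exceeds the uint32 maximum.

$$
x_6 \cdot 51^{5} = 50 \cdot 345\,025\,251 = 17\,251\,262\,550 > 2^{32} - 1 = 4\,294\,967\,295
$$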

So, I guess, I am looking for a 'clever' injective function on these combinations that computes within the uint32 range and is also well vectorizable (in MATLAB).

Any help would be much appreciated!

EDIT: Here is a routine that benchmarks both ranking and searching in uint32, single, and double. I have used MATLAB's gputimeit to produce accurate results.

...

ANSWER

Answered 2021-May-10 at 12:41

You've almost got enough bits for your last idea, so you just need to squeeze a few bits out due to the ordering to get it over the bar. Since the whole sequence is sorted, every pair is also ordered. So use a 50-by-50 look-up table to map the sorted (1st,2nd), (3rd,4th), (5th,6th) pairs into numbers from 0-1274.

Or if you don't want a table, there are fairly simple explicit functions for mapping a pair (i,j) with j>=i to a linear index. Look up upper- or lower-triangular matrix indexing for details on those. (It'll be something along the lines of n*(n+1)/2 - (n-i)*(n-i-1)/2 + j with some +/-1's thrown in depending on base-0 or base-1 indexing, and n=50 in your case, but I'm sure I'll get it wrong writing it off-the-cuff.)

Anyway, once you've got three numbers 0-1274, the base-1275 idea will fit in uint32.
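A minimal Bash sketch of this scheme follows. It uses one assumed form of the triangular-index formula (row-major upper triangle including the diagonal, 1-based inputs, 0-based output), idx(i, j) = (i-1)n - (i-1)(i-2)/2 + (j-i), rather than the answer's off-the-cuff version; `pair_index` and `rank6` are hypothetical names, and Bash's 64-bit arithmetic is used only to confirm that results stay within the uint32 range.

```bash
#!/usr/bin/env bash
# pair_index I J N (hypothetical helper): maps a pair 1 <= I <= J <= N to
# 0 .. N(N+1)/2 - 1 by row-major indexing of the upper triangle, diagonal included.
pair_index() {
  local i=$1 j=$2 n=$3
  echo $(( (i - 1) * n - (i - 1) * (i - 2) / 2 + (j - i) ))
}

# rank6 X1 ... X6 (hypothetical helper): sorted combination of values in 1..50
# -> base-1275 code built from the (1st,2nd), (3rd,4th), (5th,6th) pair indices.
rank6() {
  local n=50 base=1275 a b c
  a=$(pair_index "$1" "$2" "$n")
  b=$(pair_index "$3" "$4" "$n")
  c=$(pair_index "$5" "$6" "$n")
  echo $(( (a * base + b) * base + c ))
}

rank6 1 2 3 4 5 6   # prints 1753320
```

Each pair index lies in 0 to 1274, so the largest possible code is 1275^3 - 1 = 2,072,671,874, which is below 2^32 - 1 = 4,294,967,295. Because `pair_index` is a bijection onto 0 to 1274 and base-1275 digits are unique, distinct sorted combinations get distinct codes.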

Source https://stackoverflow.com/questions/67455774

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install benchmarks

You can download it from GitHub.

Support

For any new features, suggestions, or bugs, create an issue on GitHub. If you have any questions, search for and ask them on the community page, Stack Overflow.
