parallel-programming | MolSSI education lesson for parallel programming
kandi X-RAY | parallel-programming Summary
This is the repo for the MolSSI education lesson on parallel programming. To participate in the lesson, see the MolSSI education website.
Top functions reviewed by kandi - BETA
- Calculate the energy of the particle
- Computes the Lennard-Jones potential for two particles
- Compute the distance between two particles
- Check for required files
- Check for blank lines
- Split the text into metadata
- Check the fileset
- Add a message
- Parse command line arguments
- Require a condition
- Performs validation
- Read all markdown files
- Check if condition is met
- Create a checker
- Print the messages in pretty format
- Check for missing files
- Check metadata
- Check source rmmd files
- Check for missing labels
- Calculates the total pair energy of particles
- Check configuration files
- Calculate tail correction
- Read references
- Return the URL for a git repository
- Return whether a move is accepted
- Calculate displacement based on acceptance rate
parallel-programming Key Features
parallel-programming Examples and Code Snippets
import java.util.Arrays;
import java.util.logging.Logger;

class Bank {
    private int[] accounts;  // balance of each account
    Logger logger;

    public Bank(int accountNum, int baseAmount, Logger logger) {
        this.logger = logger;
        accounts = new int[accountNum];
        Arrays.fill(accounts, baseAmount);  // every account starts at the base amount
    }
}
Community Discussions
Trending Discussions on parallel-programming
QUESTION
In C#, I am using the Task Parallel Library (TPL) to download an image, process the image, and save the analysis results. A simplified version of the code reads as follows.
...ANSWER
Answered 2022-Jan-31 at 13:57

Is this the expected behavior of TPL?
Yes. It doesn't root all the objects (they are available for garbage collection and finalization), but it does not dispose them, either.
and is there any option to set so the resources are released at the end of each iteration?
No.
how can I make sure Dispose is automatically called when the last block/action has executed on an input?
To dispose objects, your code should call Dispose. This is fairly easily done by modifying ProcessImage or wrapping it in a delegate.
If ProcessImage is synchronous, the wrapper can dispose the image once processing finishes.
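A minimal sketch of that wrap-and-dispose idea, in Python rather than C# (the FakeImage class and process_image function are stand-ins, not the asker's code):

from concurrent.futures import ThreadPoolExecutor

class FakeImage:
    """Stand-in for a disposable image object."""
    def __init__(self, name):
        self.name = name
        self.closed = False
    def close(self):
        self.closed = True

def process_image(image):
    return len(image.name)  # placeholder for the real analysis

def process_and_dispose(image):
    # Wrap processing so the resource is always released afterwards,
    # mirroring the "wrap it in a delegate" suggestion above.
    try:
        return process_image(image)
    finally:
        image.close()

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_and_dispose,
                            (FakeImage(n) for n in ["a.png", "b.png"])))

The try/finally guarantees the disposal step runs even when processing throws, which is exactly what the pipeline itself will not do for you.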
QUESTION
I am looking to parallelise processing of a task which is dependent on an object (State), which is not thread-safe and whose construction is time-expensive.
For this reason, I was looking into partition-local variables, but either I am doing it wrong or I am looking for something else. This more or less represents my current implementation:
...ANSWER
Answered 2020-Oct-30 at 15:08

You have a couple of options. The simplest is to create a single State object, and synchronize the access to it by using a lock:
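A minimal Python sketch of that "single object plus lock" option (the State class here is a toy stand-in for the expensive, non-thread-safe object):

import threading
from concurrent.futures import ThreadPoolExecutor

class State:
    def __init__(self):
        self.total = 0          # pretend this setup is time-expensive
    def process(self, item):
        self.total += item      # not safe under concurrent mutation

state = State()                 # built once, shared by every worker
lock = threading.Lock()

def work(item):
    with lock:                  # serialize all access to the shared State
        state.process(item)

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(work, range(100)))
print(state.total)              # 4950

The lock trades parallelism for safety: if the workers spend most of their time inside the locked region, a per-partition State (one expensive construction per worker) scales better.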
QUESTION
I translated a DTW MATLAB function to Swift. The code looks as follows:
...ANSWER
Answered 2020-Oct-26 at 17:05

Your first problem is that you're creating an array of arrays. This is not an efficient data structure, and is not a "2-dimensional array" in the way most people mean (i.e., a matrix). It is an array made up of other arrays, all of which can have arbitrary sizes, and this can be very expensive to mutate. As a rule, if you want a matrix, you should back it with a flat array and use multiplication to find its offsets, particularly if you're mutating it. Instead of table[i][j] you would use table[i * width + j].
But in your case it's even easier, since there are exactly two rows, so you don't need a multi-dimensional array at all. You can just use two variables, and it'll be much more efficient. (In my tests, just making this change is about 30% faster than the original code.)
The other major thing that slows you down is contention: you read and write to the same array in the loop, and that gets in the way of various reordering and caching optimizations.
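The two-row idea translates to any language; a rough Python illustration (the abs() cost and the sample sequences are assumptions, not the asker's data):

def dtw(a, b):
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)   # row i-1 of the DP table
    curr = [inf] * (len(b) + 1)     # row i, overwritten on every pass
    for x in a:
        curr[0] = inf
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            curr[j] = cost + min(prev[j],      # insertion
                                 curr[j - 1],  # deletion
                                 prev[j - 1])  # match
        prev, curr = curr, prev     # swap the two rows instead of reallocating
    return prev[len(b)]

print(dtw([1, 2, 3], [2, 2, 4]))    # 2.0

Only two rows are ever alive, so the working set stays small and cache-friendly, which is the point of the flat-storage advice above.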
QUESTION
I have the following function:
...ANSWER
Answered 2020-Oct-03 at 14:59

Solution Using NumPy Vectorization
Issues
- The line if(index-i > 0): should be if(index-i >= 0):, otherwise we miss the difference of 1.
- Use 'Close' rather than 'Trade Close' (doesn't matter for performance, but avoids renaming the column after pulling data from the web).
Code
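A rough sketch of the kind of vectorization meant here, assuming the loop computed lagged differences of the 'Close' column (the lag and sample data are assumptions):

import numpy as np
import pandas as pd

df = pd.DataFrame({"Close": [10.0, 11.5, 11.0, 12.25, 12.0]})
close = df["Close"].to_numpy()
lag = 1

# Loop version: difference with the value `lag` steps back, guarding index-lag >= 0.
loop_diff = [close[k] - close[k - lag] if k - lag >= 0 else np.nan
             for k in range(len(close))]

# Vectorized version: one slice-and-subtract instead of a per-element loop.
vec_diff = np.full(len(close), np.nan)
vec_diff[lag:] = close[lag:] - close[:-lag]

print(np.allclose(loop_diff, vec_diff, equal_nan=True))  # True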
QUESTION
Inspired by .NET TPL, I'm trying to find a way to handle an error outside the Rx pipe. Specifically, on error, I wish the Observer pipe to stop and pass control back to the surrounding method. Something like:
...ANSWER
Answered 2020-Sep-30 at 17:26

Have you tried catching the error like so:
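The shape of the idea, sketched in plain Python rather than Rx (the stages and values are stand-ins): the error escapes the pipeline and is caught in the surrounding method, which regains control.

def source():
    yield from [1, 2, 0, 4]

def pipeline(items):
    for x in items:
        yield 10 / x               # raises ZeroDivisionError mid-stream

def run():
    results = []
    try:
        for value in pipeline(source()):
            results.append(value)
    except ZeroDivisionError as exc:
        # The pipe stops here and control returns to this method,
        # analogous to handling OnError outside the Rx pipe.
        print(f"pipeline failed after {len(results)} items: {exc}")
    return results

run()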
QUESTION
I have made an app that uses the Azure Functions durable fan-out strategy to make parallel inquiries and updates to a database by sending HTTP requests to our own internal API.
I found out that the fan-out strategy is EXTREMELY slow compared to using the TPL library for parallelism in a normal .NET Core web app. It's not just slower, it's about 20 times slower. It takes 10 minutes for 130 updates, while the .NET Core 3.1 app that I made for speed comparison, which does the exact same thing, does 130 updates in 0.5 minutes, and on a significantly lower plan.
I understand there is latency because of the durable framework infrastructure (communicating with the storage account and whatnot), but I don't see how that speed difference is normal. Each individual update happens in an ActivityTrigger function, and the orchestrator is the one that gathers all the necessary updates and puts them in a Task.WhenAll() call, just like the example from the Microsoft docs.
Am I doing something wrong here? Is this business scenario maybe not compatible with this technology? The code seems to work fine and the parallelism works; it's just a LOT slower than the .NET Core app. Another thing to mention: the moment the function opens a second instance (whether because it is on a Consumption plan and naturally opens a second instance under heavy load, or because it is on an App Service plan and I open an instance manually), it goes even slower, although the CPU load somehow balances across the two instances. I suspect this could be extra latency due to Azure queue communication between the two instances, but I'm not entirely sure.
One last detail is that the app also has a TimerTrigger that does a simple select in a database every minute (nothing even remotely CPU-intensive, but it might play a role in the performance).
I've tried the function app on a Premium plan, a Consumption plan, and an App Service plan, and it seems to top out at 130 updates in 10 minutes no matter how big the plan is.
...ANSWER
Answered 2020-Sep-29 at 20:29

Speaking generally, TPL will almost always be much faster than Durable Functions because all the coordination is done in-memory (assuming you don't completely exhaust system resources doing everything on one machine). So that part is often expected. Here are a few points worth knowing:
- Each fan-out to an activity function involves a set of queue transactions: one message for calling the activity function and one message for handing the result back to the orchestrator. When there are multiple VMs involved, then you also have to worry about queue polling delays.
- By default, the per-instance concurrency for activity functions is limited to 10 on a single-core VM. If your activity functions don't require much memory or CPU, then you'll want to crank up this value to increase per-instance concurrency (see the host.json sketch after this answer).
- If you're using the Azure Functions Consumption or Premium plans, it will take 15-30 seconds before new instances get added for your app. This matters mainly if your workload can be done faster by running on multiple machines. The amount of time a message spends waiting on a queue is what drives scale-out (1 second is considered too long).
You can find more details on this in the Durable Functions Performance and Scale documentation.
One last thing I will say is that the key value-add of Durable Functions is orchestrating work reliably in a distributed environment. However, if your workload isn't long-running, doesn't require strict durability/resilience, doesn't require scale-out to multiple VMs, and has strict latency requirements, then Durable Functions might not be the right tool. If you just need a single VM and want low latency, then a simple function that uses in-memory TPL may be a better choice.
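The activity concurrency mentioned in the second bullet is set in the Durable Functions section of host.json; a sketch (the value 32 is an arbitrary example, not a recommendation):

{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "maxConcurrentActivityFunctions": 32
    }
  }
}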
QUESTION
I want to use TPL Dataflow for my .NET Core application, and I followed the example from the docs.
Instead of having all the logic in one file, I would like to separate each TransformBlock and ActionBlock (I don't need the other ones yet) into its own file. A small TransformBlock example converting integers to strings:
ANSWER
Answered 2020-Sep-15 at 08:41

As @Panagiotis explained, I think you have to put aside the OOP mindset a little. What you have with Dataflow are building blocks that you configure to execute what you need. I'll try to create a little example of what I mean by that:
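A loose Python analogue of that configure-building-blocks style, with each stage produced by its own small factory function instead of a subclass (all names are assumptions):

from concurrent.futures import ThreadPoolExecutor

def make_int_to_string_stage():
    # Analogous to a TransformBlock<int, string>: one input, one output.
    return lambda x: str(x)

def make_print_stage():
    # Analogous to an ActionBlock<string>: consumes values, returns nothing.
    return lambda s: print(f"got {s}")

transform = make_int_to_string_stage()   # each factory can live in its own module
act = make_print_stage()

with ThreadPoolExecutor(max_workers=2) as pool:
    for s in pool.map(transform, range(5)):  # run the transform stage in parallel
        act(s)                               # feed its output to the action stage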
QUESTION
I am using multiprocessing in Python 3.7.
Some articles say that a good choice for the number of processes in a Pool is the number of CPU cores.
My AMD Ryzen CPU has 8 cores and can run 16 threads.
So, should the number of processes be 8 or 16?
...ANSWER
Answered 2020-May-26 at 09:39

Q: "So, should the number of processes be 8 or 16?"
If the workloads distributed to the herd of sub-processes are cache-reuse intensive (not memory-I/O bound), the SpaceDOMAIN constraints rule: the size of the cacheable data will play the cardinal role in deciding between 8 and 16.
Why?
Because the costs of memory-I/O are about a thousand times higher in the TimeDOMAIN, paying about 3xx - 4xx [ns] per memory-I/O, compared to 0.1 ~ 0.4 [ns] for in-cache data.
How to Make The Decision?
Make a small-scale test before deciding on the production-scale configuration.
If the to-be-distributed workloads depend on network-I/O, or on some other remarkable (locally non-singular) source of latency, the TimeDOMAIN may benefit from a latency-masking trick: running 16, 160 or merely 1600 threads (not processes in this case).
Why?
Because the costs of over-the-network I/O provide so, sooo, soooooo much waiting time (a few [ms] of network-I/O RTT latency is time enough for about 1E7 ~ 10.000.000 CPU-core uop-s, which is quite a lot of work, isn't it?). So smart interleaving, here just latency-masked thread-based concurrent processing, may fit, as threads waiting for the remote "answer" from over-the-network I/O need not fight for a GIL-lock: they have nothing to compute until they receive their expected I/O bytes back, have they?
How to Make The Decision?
Review the code to determine how many over-the-network I/O fetches and how many cache-footprint-sized reads are in the game (in 2020/Q2+, per-core caches have grown to a few [MB]-s). For those cases where these operations repeat many times, do not hesitate to spin up one thread per "slow" network-I/O target: the processing will benefit from the masking of the "long" waiting times, at the cost of cheap ("fast") and, given the "many" and "long" waiting times, rather sparse thread-switching, or even of the O/S-driven process scheduler mapping full sub-processes onto free CPU cores.
If the to-be-distributed workloads are some mix of the above cases, there is no other way than to experiment on the actual local / non-local hardware resources.
Why?
Because there is no rule of thumb for fine-tuning the mapping of workload processing onto actual CPU-core resources.
The known trap
Achieving a SlowDown instead of the (just wished-for) SpeedUp: one may easily find one has paid way more than one ever gets back.
In all cases, the overhead-strict, resources-aware, workload-atomicity-respecting revised Amdahl's Law identifies a point of diminishing returns, after which more workers (CPU cores) will not improve the wished-for Speedup. Many surprises of getting S << 1 are expressed in StackOverflow posts, so one may read as many posts on what not to do (learning by anti-patterns) as one may wish.
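A small-scale test of the kind suggested above could look like this (the toy CPU-bound task is a placeholder; substitute the real workload):

import time
from multiprocessing import Pool

def work(n):
    # Placeholder CPU-bound task; swap in the real per-item workload.
    s = 0
    for i in range(200_000):
        s += i * i % (n + 7)
    return s

if __name__ == "__main__":
    jobs = list(range(64))
    for procs in (4, 8, 16):
        t0 = time.perf_counter()
        with Pool(processes=procs) as pool:
            pool.map(work, jobs)
        print(f"{procs:>2} processes: {time.perf_counter() - t0:.2f} s")

Whether 16 beats 8 on an 8-core/16-thread Ryzen then shows up directly in the timings for the actual workload, which is exactly the experiment the answer recommends.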
QUESTION
I have an asynchronous stream of tasks that is generated by applying an async lambda to a stream of items:
...ANSWER
Answered 2020-May-04 at 16:45

Here is my implementation of the AwaitResults method. It is based on a SemaphoreSlim for controlling the concurrency level, and on a Channel<Task<TResult>> that is used as an async queue. The enumeration of the source IAsyncEnumerable<Task<TResult>> happens inside a fire-and-forget task (the feeder) that pushes the hot tasks to the channel. It also attaches a continuation to each task, where the semaphore is released.
The last part of the method is the yielding loop, where the tasks are dequeued from the channel one by one and then awaited sequentially. This way the results are yielded in the same order as the tasks in the source stream.
This implementation requires that each task is awaited twice, which means that it couldn't be used for a source of type IAsyncEnumerable<ValueTask<TResult>>, since a ValueTask can only be awaited once.
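A rough asyncio analogue of the same pattern: a semaphore bounds the concurrency, a queue acts as the channel, and awaiting the dequeued tasks in order preserves the source order (all names here are assumptions):

import asyncio

async def await_results(coros, concurrency):
    sem = asyncio.Semaphore(concurrency)
    queue = asyncio.Queue()

    async def run(coro):
        try:
            return await coro
        finally:
            sem.release()              # let the feeder start the next task

    async def feeder():
        for coro in coros:
            await sem.acquire()        # block while `concurrency` tasks are live
            await queue.put(asyncio.ensure_future(run(coro)))
        await queue.put(None)          # sentinel: the source is exhausted

    asyncio.ensure_future(feeder())    # fire-and-forget, like the answer's feeder
    while (task := await queue.get()) is not None:
        yield await task               # awaiting in order yields results in order

async def main():
    async def square(x):
        await asyncio.sleep(0.1)
        return x * x
    async for r in await_results((square(i) for i in range(5)), concurrency=2):
        print(r)

asyncio.run(main())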
QUESTION
In the Azure Service Bus documentation there is a comment saying:
// Note: Use the cancellationToken passed as necessary to determine if the queueClient has already been closed.
// If queueClient has already been closed, you can choose to not call CompleteAsync() or AbandonAsync() etc.
// to avoid unnecessary exceptions.
I have been trying to find more information on how to use the token "as necessary", but it is not obvious to me. I tried reading the section on Task Cancellation but came out none the wiser.
The token has a few properties, CanBeCanceled and IsCancellationRequested, that look interesting.
ANSWER
Answered 2020-Apr-07 at 15:48

The IsCancellationRequested property is the one that you are looking for, and the if statement that you've shared is how you should use it.
Also, you could make the same check before starting any long-running process, since the message would be reprocessed anyway, I suppose.
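The same check-before-working idea, sketched in Python with a threading.Event standing in for the cancellation token (all names here are assumptions):

import threading
import time

cancelled = threading.Event()    # plays the role of the CancellationToken

def handle_message(body):
    if cancelled.is_set():       # like checking IsCancellationRequested early
        return                   # skip the work; the message will be redelivered
    time.sleep(0.1)              # stand-in for the long-running processing
    if not cancelled.is_set():   # re-check before the final complete/abandon step
        print(f"completed: {body}")

worker = threading.Thread(target=handle_message, args=("order-42",))
worker.start()
worker.join()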
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install parallel-programming
You can use parallel-programming like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.
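A typical setup along those lines (the repository URL is left as a placeholder rather than guessed):

python -m venv venv                                  # isolated environment, per the advice above
source venv/bin/activate
python -m pip install --upgrade pip setuptools wheel
git clone <repository-url> parallel-programming
python -m pip install ./parallel-programming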