starmap | Starmap is a program for viewing the positions of the stars
kandi X-RAY | starmap Summary
The positions of the stars are calculated with the help of EDB databases and the pyephem library, according to the date and location of the request. After the star positions are calculated, they are converted to the format expected by the star-charts library using edb_converter. Once converted, an SVG file is created and displayed with the help of the star-charts library.
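The project's own code is not shown here, but a minimal sketch of the core idea, parsing an EDB record with pyephem and computing its position for a given date and observer location, might look like the following. The EDB line, coordinates, and date below are illustrative only:

```python
import ephem

# Illustrative EDB record for a fixed object: name, type, RA, Dec, magnitude, epoch
edb_line = "Sirius,f|S|A0,6:45:08.9,-16:42:58,-1.46,2000"

observer = ephem.Observer()
observer.lat, observer.lon = "41.0", "29.0"   # hypothetical location (degrees as strings)
observer.date = "2021/06/15 22:00:00"         # date of the request (UTC)

star = ephem.readdb(edb_line)                 # parse the EDB record
star.compute(observer)                        # position for this date and location
print(star.a_ra, star.a_dec, star.alt, star.az)
```

The resulting coordinates would then be handed to the chart-generation step described above.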
Community Discussions
Trending Discussions on starmap
QUESTION
I am trying to run a simple parallel program on a SLURM cluster (4x Raspberry Pi 3), but I have had no success. I have been reading about it, but I just cannot get it to work. The problem is as follows:
I have a Python program named remove_duplicates_in_scraped_data.py. This program is executed on a single node (node = 1x Raspberry Pi), and inside the program there is a multiprocessing loop section that looks something like:
...ANSWER
Answered 2021-Jun-15 at 06:17
Python's multiprocessing package is limited to shared-memory parallelization. It spawns new processes that all have access to the main memory of a single machine.
You cannot simply scale such software out onto multiple nodes, because the different machines do not have a shared memory they can all access.
To run your program on multiple nodes at once, you should have a look at MPI (Message Passing Interface). There is also a Python package for that (mpi4py).
Depending on your task, it may also be suitable to run the program four times (one job per node) and have each run work on a subset of the data. That is often the simpler approach, but it is not always possible.
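As an illustration only (not the original poster's code), a minimal mpi4py sketch of the scatter/gather pattern this answer points to might look like this; the worker function and data are hypothetical:

```python
# run with e.g.: srun -n 4 python mpi_dedup.py   (or mpirun -n 4 ...)
from mpi4py import MPI

def process_chunk(chunk):
    # hypothetical stand-in for the per-chunk work (e.g. deduplication)
    return sorted(set(chunk))

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    data = list(range(100))
    chunks = [data[i::size] for i in range(size)]   # one chunk per rank/node
else:
    chunks = None

chunk = comm.scatter(chunks, root=0)    # each rank receives its own chunk
result = process_chunk(chunk)
results = comm.gather(result, root=0)   # rank 0 collects all partial results

if rank == 0:
    print(sum(len(r) for r in results), "items processed across", size, "ranks")
```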
QUESTION
I have the following three for loops (two of them are nested) in Python. The API requests should be sent concurrently. How can I parallelize the execution?
...ANSWER
Answered 2021-Jun-13 at 18:03
Looking at the three instances of apiString and rewriting them to use the more succinct f-strings, they all appear to be of the form:
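The snippet that followed is not reproduced above; a minimal sketch in that spirit, with a hypothetical f-string apiString template and a thread pool to send the parameterized requests concurrently, might look like:

```python
from concurrent.futures import ThreadPoolExecutor
import requests

BASE = "https://api.example.com"   # hypothetical endpoint

def fetch(region, year, month):
    # hypothetical parameterized request built with an f-string
    api_string = f"{BASE}/data?region={region}&year={year}&month={month}"
    return requests.get(api_string, timeout=30).json()

# the three loops collapse into one list of parameter combinations
params = [(r, y, m) for r in ("eu", "us") for y in (2020, 2021) for m in range(1, 13)]

with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(lambda p: fetch(*p), params))
```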
QUESTION
I am using a multiprocessing.Pool of around 1500 processes to run tasks in parallel.
In every process I need to run a function every 5 minutes, i.e. every 5 minutes I need to read 1500 files simultaneously. For every process I am using time.sleep(300 - start_time) at the end of the function's execution. However, when trying to sleep, only 16 processes are executed because of the 16 cores in my PC; all the other processes are not working.
Below is the code:
...ANSWER
Answered 2021-Jun-04 at 17:02
I do not think a Pool, or the way you are doing this, is the best approach. If you have a Pool, it will have N workers that can run in parallel; by default N is the number of your cores. In your case the first thing a worker does is go to sleep, which blocks the worker in the pool while it does nothing. You can try increasing the number of pool workers, but you will most likely hit an OS limit of some kind if you try to launch 1500 workers.
My strong advice is to redesign your app so that the sleeping and waiting happen somewhere else, and tasks are only dispatched to a Pool worker when the nap time is over. I am not sure whether it is suitable in your case, but you could combine threads and a Pool: make the napping happen in a thread, dispatch a worker only when there is work to be done, and remove all sleeps from your pool workers.
This is just a dummy snippet to demonstrate the idea, but it runs.
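The original snippet is not included above; a minimal sketch of the same idea, sleeping in a scheduler thread rather than in the pool workers (the file paths and worker body are hypothetical), could look like:

```python
import threading
import time
from multiprocessing import Pool

def process_file(path):
    # hypothetical per-file work; a real version would open and parse `path`
    return path, hash(path)

def scheduler(pool, paths, interval=300, rounds=3):
    # The waiting happens here, in a thread; work is only dispatched
    # to the pool when the nap time is over.
    for _ in range(rounds):
        results = pool.map(process_file, paths)
        print(f"processed {len(results)} files")
        time.sleep(interval)

if __name__ == "__main__":
    paths = [f"data/file_{i}.txt" for i in range(1500)]   # hypothetical files
    with Pool() as pool:
        t = threading.Thread(target=scheduler, args=(pool, paths, 300))
        t.start()
        t.join()
```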
QUESTION
I get "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()" at the line "while count < max_iteration:" with the code I have written. I can't seem to find the error and am not sure why it appears. Can anyone help me with this? I much appreciate the help in advance.
...ANSWER
Answered 2021-May-14 at 09:43
You are passing the max_iteration value as an array and then comparing an int count with it. You might want to do something like the following; I'm not sure which dimension you want, but since both are the same:
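The answer's fix is not shown; a minimal illustration of the error and one way around it (assuming max_iteration was created as a NumPy array with equal entries) might be:

```python
import numpy as np

max_iteration = np.array([100, 100])   # an array, not a scalar
count = 0

# while count < max_iteration:   # ValueError: truth value of an array ... is ambiguous
while count < max_iteration[0]:   # compare against a single element (both are the same)
    count += 1

print(count)   # 100
```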
QUESTION
I'm trying to add multithreading to a very time-consuming program, and I've come across this SO answer: https://stackoverflow.com/a/28463266/3451339, which basically offers this solution for multiple arrays:
...ANSWER
Answered 2021-May-08 at 03:58
Python has the multiprocessing module, which can run multiple tasks in parallel, and inside each process you can have multiple threads or async IO code.
Here is a working example which uses 3 processes and multithreading.
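The example itself is not reproduced above; a minimal sketch of that combination, three processes each fanning out over a thread pool for I/O-bound work (the task function is hypothetical), might look like:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def io_task(item):
    # hypothetical stand-in for I/O-bound work such as an HTTP request
    return item * 2

def process_chunk(chunk):
    # inside each process, use threads for the I/O-bound part
    with ThreadPoolExecutor(max_workers=8) as threads:
        return list(threads.map(io_task, chunk))

if __name__ == "__main__":
    data = list(range(30))
    chunks = [data[i::3] for i in range(3)]            # split the work three ways
    with ProcessPoolExecutor(max_workers=3) as procs:  # 3 processes
        results = list(procs.map(process_chunk, chunks))
    print(results)
```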
QUESTION
I've encountered an error running an original script on an iMac (2011, macOS High Sierra, Intel Core i7 2600, Python 3.9.2), so I reproduced the behavior with the simple code below:
...ANSWER
Answered 2021-Mar-20 at 21:24
Let's look at your original approach, where power was undefined. First, a couple of comments. You are calling the worker function multiply_by, but I do not see where that is defined either; I do see, however, the function raise_to_power. The error you should have gotten is that multiply_by is not defined, so this is a bit puzzling. Second, I see that you are computing a chunksize rather than using the default calculation, which would compute a value roughly 1/4 the size you compute. Larger chunksizes mean fewer memory transfers (good) but could result in processors ending up idle if they do not process their tasks at the same rate (bad), which granted is not that likely in your case. I can see having your own function for calculating the chunksize, since the function used by the map method must convert its iterable argument to a list if necessary in order to get its length, and if the iterable is very large this could be very memory-inefficient. But you have already converted a range to a list, so you haven't taken advantage of the opportunity to save memory by doing your own calculation.
As Mark Satchell indicated, the simple solution for your case would be just to make power a global. But let's consider the general case. If your platform is Windows, or any platform that uses spawn to create new processes (I am guessing this might well be the case based on your use of if __name__ == '__main__': to govern the code that creates new processes), then any code at global scope will be executed for every new process created. This is not an issue for a statement like power = 10. But if power required far more complicated code to initialize its value, it would be inefficient to re-execute that code over and over again for each process in the pool. Or consider the case where power was a very large array: it would perhaps be too costly to create instances of this array in each sub-process's memory space, and what is required then is a single instance of the array in shared memory.
There is a mechanism for initializing global variables for each sub-process in a pool by using the initializer and initargs arguments when creating the Pool instance. I have also made an additional change to save memory by taking advantage of the fact that you are using your own chunksize calculation:
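The answer's revised code is not shown; a minimal sketch of the initializer/initargs mechanism it describes (the worker name raise_to_power and the value power = 10 come from the answer, the rest is illustrative) might be:

```python
from multiprocessing import Pool

def init_worker(the_power):
    # runs once in every pool process and publishes the value as a global
    global power
    power = the_power

def raise_to_power(x):
    return x ** power

if __name__ == "__main__":
    values = range(100_000)               # keep the iterable lazy instead of a list
    chunksize = len(values) // (4 * 8)    # illustrative do-it-yourself chunksize
    with Pool(initializer=init_worker, initargs=(10,)) as pool:
        results = pool.map(raise_to_power, values, chunksize=chunksize)
    print(results[:3])
```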
QUESTION
I have the below code that reads a number of ticker symbols from a CSV file into a dataframe.
Each ticker calls the Web API, returning a dataframe df which is then appended to the previous ones until complete.
The code works, but when a large number of tickers is used the code slows down tremendously.
I understand I can use multiprocessing and threads to speed up my code, but I don't know where to start and what would be the most suitable in my particular case.
What code should I use to get my data into a combined dataframe in the fastest possible manner?
...ANSWER
Answered 2021-Mar-18 at 21:01
The first optimization is to avoid concatenating your dataframe at each iteration.
You can try something like this:
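The answer's snippet is not included; a minimal sketch of the pattern it describes, collecting the per-ticker frames in a list and concatenating once at the end (fetch_ticker is a hypothetical stand-in for the Web API call), could be:

```python
import pandas as pd

def fetch_ticker(ticker):
    # hypothetical: call the Web API and return a one-ticker DataFrame
    return pd.DataFrame({"ticker": [ticker], "price": [0.0]})

tickers = ["AAPL", "MSFT", "GOOG"]           # would normally come from the CSV file

frames = [fetch_ticker(t) for t in tickers]  # no concatenation inside the loop
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Since the API calls are I/O-bound, the list comprehension above could later be replaced by a concurrent.futures.ThreadPoolExecutor map without changing the final pd.concat step.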
QUESTION
I have a function returning a tuple of two elements. The function is called with pool.starmap to generate a list of tuples, which are unpacked into two lists.
...ANSWER
Answered 2021-Mar-12 at 20:10
It is possible for another delayed function to unpack the tuple. In the example below, the delayed value of return_tuple(1) was not computed, but passed on as a delayed object:
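The answer's example is not reproduced; a minimal sketch of that pattern with dask.delayed (the body of return_tuple is illustrative) might be:

```python
from dask import delayed

@delayed
def return_tuple(x):
    # hypothetical function returning two related values
    return x + 1, x - 1

@delayed
def unpack(tup, index):
    # a second delayed function does the unpacking lazily
    return tup[index]

lazy = return_tuple(1)        # nothing is computed yet; this is a Delayed object
first = unpack(lazy, 0)
second = unpack(lazy, 1)

print(first.compute(), second.compute())   # 2 0
```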
QUESTION
I am using dask to handle data from the variations of many, many parameters, where I aim to build a final dask dataframe of 600 000 (the number of cases, or columns) from operations on dask arrays constructed from small arrays of shape less than 2000. Here, my final dataframe is computed for 6400 cases:
...ANSWER
Answered 2021-Mar-11 at 17:46
"600 000 (number of cases or columns)"
Judging by the names of the columns you provide in the sample, your workflow might benefit from a better reorganization of the data, which might also simplify the calculations.
"How can I use append at each step or assign to a dask array without loading memory, or is there a better approach?"
You are probably interested in the delayed API, but the question/problem is not sufficiently clear to provide further advice.
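Purely as an illustration of the delayed API this answer points to (not the poster's actual workflow), small per-case results can be built lazily and assembled into a dask dataframe without holding everything in memory; build_case below is hypothetical:

```python
import pandas as pd
import dask.dataframe as dd
from dask import delayed

@delayed
def build_case(case_id):
    # hypothetical: compute one small per-case result lazily
    return pd.DataFrame({"case": [case_id], "value": [case_id ** 2]})

parts = [build_case(i) for i in range(6400)]   # nothing is computed yet
ddf = dd.from_delayed(parts)                   # lazy dask dataframe over all cases
result = ddf.compute()                         # computation happens here
print(len(result))                             # 6400
```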
QUESTION
I am trying to extract specific information from a directory of HTML files that I earlier retrieved using the requests library of Python. Retrieval of the HTML files was already slow, since I built in a random wait timer, but now that I want to iterate over each retrieved HTML file it seems like my script is not very well optimized. This is a problem since I want to iterate over 42000 HTML files, each with more than 8000 lines, which would probably take a lot of time.
Since I have never run into problems this demanding for my computer, I do not know where to start learning to optimize my code. My question to you: should I approach this problem differently, possibly in a more time-efficient way? Your suggestions would be very much appreciated.
Here is the code I am using; I changed some sensitive information:
...ANSWER
Answered 2021-Mar-08 at 05:01
This question is maybe better suited for Code Review; see this post.
Issues that you could encounter:
- you have to deal with a lot of files
- your process might be limited by the number of open files you can have at one time
- you might be limited by the speed at which you can open and read the files
- you might be limited by the speed at which you can process those files
- maybe you can't keep the lists that you are constructing in memory
- the process can last a long time and might fail after having already done a lot of work
Ways you might mitigate these issues:
- you should close file handlers when you're done processing a file
- you might use multithreading, multiprocessing, or another way to ask the operating system to open and read multiple files at once, if you are limited by the speed at which you can open or read files in parallel
- you might use multiprocessing to parallelize the processing of the read files (don't use multithreading for this because of the GIL)
- you might want to write partial results to disk so you can limit your memory footprint
- you might want to write partial results to disk so you can pause and resume your process, for example if you encounter an error
- you might want to store the end result to disk so you only have to parse all these files once and can later run more queries on the dataframe
One format I recommend for storing partial results is CSV. It is easy to read and append to, and pandas already has support for reading from and writing to CSV.
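As an illustration of combining several of these points, multiprocessing plus CSV partial results (the file locations and parsing logic below are hypothetical), one possible sketch:

```python
import csv
import glob
from multiprocessing import Pool

def process_file(path):
    # hypothetical: parse one HTML file and return the fields of interest
    with open(path, encoding="utf-8") as fh:   # handler is closed when the block ends
        text = fh.read()
    return path, len(text)

if __name__ == "__main__":
    paths = glob.glob("html_dump/*.html")      # hypothetical directory of 42000 files
    with Pool() as pool, open("partial_results.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "length"])
        # imap_unordered yields results as they finish, so partial progress
        # is already on disk if the run fails part-way through
        for row in pool.imap_unordered(process_file, paths, chunksize=50):
            writer.writerow(row)
```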
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported