parallel | Ruby: parallel processing | Architecture library

by grosser · Ruby · Version: v1.22.1 · License: MIT

kandi X-RAY | parallel Summary

parallel is a Ruby library typically used in Architecture applications. parallel has no reported bugs and no reported vulnerabilities, it has a permissive license, and it has medium support. You can download it from GitHub.
Ruby: parallel processing made simple and fast

kandi-support: Support

parallel has a medium active ecosystem. It has 3,984 stars and 256 forks, and there are 77 watchers for this library. It had no major release in the last 6 months. There are 33 open issues and 146 have been closed; on average, issues are closed in 83 days. There are no pull requests. parallel has a neutral sentiment in the developer community. The latest version of parallel is v1.22.1.

kandi-Quality: Quality

parallel has 0 bugs and 10 code smells.

kandi-Security: Security

parallel has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported. Code analysis shows 0 unresolved vulnerabilities, and there are 0 security hotspots that need review.

kandi-License: License

parallel is licensed under the MIT License. This license is permissive. Permissive licenses have the fewest restrictions, and you can use them in most projects.

kandi-Reuse: Reuse

parallel releases are not available; you will need to build from source code and install. parallel saves you 724 person-hours of effort in developing the same functionality from scratch. It has 1,672 lines of code, 56 functions, and 66 files. It has medium code complexity; code complexity directly impacts the maintainability of the code.
Top functions reviewed by kandi - BETA

kandi has reviewed parallel and identified the functions below as its top functions. This is intended to give you an instant insight into the functionality parallel implements, and to help you decide whether it suits your requirements.
• Get the number of memory processors.
• Return the next item from the queue.
• Initialize the client.
• Sleep for the process.
• Close the write method.
• Pack the item.
• Get the number of processors.
• Unpack a new producer.
• Push an item to the queue.
Get all kandi verified functions for this library.
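Two entries in that list, the processor counters, correspond to helpers the gem exposes publicly. A quick sketch, with method names as documented in the gem's README:

require 'parallel'

Parallel.processor_count          # logical CPUs (hyper-threaded cores included)
Parallel.physical_processor_count # physical CPUs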

parallel Key Features

Ruby: parallel processing made simple and fast
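Note that the "Examples and Code Snippets" below were collected by kandi from unrelated repositories (RxJava in Java and TensorFlow in Python), not from this gem. For orientation, here is a minimal sketch of what the parallel gem itself looks like in use, following the API documented in its README:

require 'parallel'

# Runs the block in separate processes, one per CPU core by default.
squares = Parallel.map([1, 2, 3, 4]) { |n| n**2 }
# => [1, 4, 9, 16]

# The worker count and strategy are configurable: processes suit
# CPU-bound work; threads suit IO-bound work.
Parallel.map(1..10, in_processes: 4) { |n| n * 2 }
Parallel.each(%w[a b c], in_threads: 2) { |s| puts s.upcase }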

parallel Examples and Code Snippets

Parallel processing
Java (Maven) · Lines of Code: 13 · License: No License

Flowable.range(1, 10)
    .flatMap(v ->
        Flowable.just(v)
            .subscribeOn(Schedulers.computation())
            .map(w -> w * w))
    .blockingSubscribe(System.out::println);

Flowable.range(1, 10)
    .parallel()
    .runOn(Schedulers.computation())
    .map(v -> v * v)
    .sequential()
    .blockingSubscribe(System.out::println);

Generator for parallel walk
Python · Lines of Code: 66 · License: Non-SPDX (Apache License 2.0)

def parallel_walk(node, other):
  """Walks two ASTs in parallel.

  The two trees must have identical structure.

  Args:
    node: Union[ast.AST, Iterable[ast.AST]]
    other: Union[ast.AST, Iterable[ast.AST]]
  Yields:
    Tuple[ast.AST, ast.AST]
  Raises:
    ValueError: if the two trees don't have identical structure.
  """
  if isinstance(node, (list, tuple)):
    node_stack = list(node)
  else:
    node_stack = [node]

  if isinstance(other, (list, tuple)):
    other_stack = list(other)
  else:
    other_stack = [other]

  while node_stack and other_stack:
    assert len(node_stack) == len(other_stack)
    n = node_stack.pop()
    o = other_stack.pop()

    if ((not isinstance(n, (ast.AST, gast.AST, str)) and n is not None) or
        (not isinstance(o, (ast.AST, gast.AST, str)) and n is not None) or
        n.__class__.__name__ != o.__class__.__name__):
      raise ValueError('inconsistent nodes: {} ({}) and {} ({})'.format(
          n, n.__class__.__name__, o, o.__class__.__name__))

    yield n, o

    if isinstance(n, str):
      assert isinstance(o, str), 'The check above should have ensured this'
      continue
    if n is None:
      assert o is None, 'The check above should have ensured this'
      continue

    for f in n._fields:
      n_child = getattr(n, f, None)
      o_child = getattr(o, f, None)
      if f.startswith('__') or n_child is None or o_child is None:
        continue

      if isinstance(n_child, (list, tuple)):
        if (not isinstance(o_child, (list, tuple)) or
            len(n_child) != len(o_child)):
          raise ValueError(
              'inconsistent values for field {}: {} and {}'.format(
                  f, n_child, o_child))
        node_stack.extend(n_child)
        other_stack.extend(o_child)
      elif isinstance(n_child, (gast.AST, ast.AST)):
        node_stack.append(n_child)
        other_stack.append(o_child)
      elif n_child != o_child:
        raise ValueError(
            'inconsistent values for field {}: {} and {}'.format(
                f, n_child, o_child))
Create a parallel interleave dataset
Python · Lines of Code: 58 · License: Non-SPDX (Apache License 2.0)

def parallel_interleave(map_func,
                        cycle_length,
                        block_length=1,
                        sloppy=False,
                        buffer_output_elements=None,
                        prefetch_input_elements=None):
  """A parallel version of the `Dataset.interleave()` transformation.

  `parallel_interleave()` maps `map_func` across its input to produce nested
  datasets, and outputs their elements interleaved. Unlike
  `tf.data.Dataset.interleave`, it gets elements from `cycle_length` nested
  datasets in parallel, which increases the throughput, especially in the
  presence of stragglers. Furthermore, the `sloppy` argument can be used to
  improve performance, by relaxing the requirement that the outputs are
  produced in a deterministic order, and allowing the implementation to skip
  over nested datasets whose elements are not readily available when
  requested.

  Example usage:

  ```python
  # Preprocess 4 files concurrently.
  filenames = tf.data.Dataset.list_files("/path/to/data/train*.tfrecords")
  dataset = filenames.apply(
      tf.data.experimental.parallel_interleave(
          lambda filename: tf.data.TFRecordDataset(filename),
          cycle_length=4))
  ```

  WARNING: If `sloppy` is `True`, the order of produced elements is not
  deterministic.

  Args:
    map_func: A function mapping a nested structure of tensors to a
      `Dataset`.
    cycle_length: The number of input `Dataset`s to interleave from in
      parallel.
    block_length: The number of consecutive elements to pull from an input
      `Dataset` before advancing to the next input `Dataset`.
    sloppy: A boolean controlling whether determinism should be traded for
      performance by allowing elements to be produced out of order. If
      `sloppy` is `None`, the `tf.data.Options.deterministic` dataset option
      (`True` by default) is used to decide whether to enforce a
      deterministic order.
    buffer_output_elements: The number of elements each iterator being
      interleaved should buffer (similar to the `.prefetch()` transformation
      for each interleaved iterator).
    prefetch_input_elements: The number of input elements to transform to
      iterators before they are needed for interleaving.

  Returns:
    A `Dataset` transformation function, which can be passed to
    `tf.data.Dataset.apply`.
  """

  def _apply_fn(dataset):
    return readers.ParallelInterleaveDataset(dataset, map_func, cycle_length,
                                             block_length, sloppy,
                                             buffer_output_elements,
                                             prefetch_input_elements)

  return _apply_fn
Creates a parallel map and returns the result
Python · Lines of Code: 38 · License: Non-SPDX (Apache License 2.0)

def _benchmark_map_and_interleave(self, autotune, benchmark_id):
  k = 1024 * 1024
  a = (np.random.rand(1, 8 * k), np.random.rand(8 * k, 1))
  b = (np.random.rand(1, 4 * k), np.random.rand(4 * k, 1))
  c = (np.random.rand(1, 2 * k), np.random.rand(2 * k, 1))
  dataset_a = dataset_ops.Dataset.from_tensors(a).repeat()
  dataset_b = dataset_ops.Dataset.from_tensors(b).repeat()
  dataset_c = dataset_ops.Dataset.from_tensors(c).repeat()

  def f1(x, y):
    return math_ops.matmul(x, y)

  def f2(a, b):
    x, y = b
    return a, math_ops.matmul(x, y)

  dataset = dataset_a
  dataset = dataset.map(f1, num_parallel_calls=dataset_ops.AUTOTUNE)
  dataset = dataset_ops.Dataset.range(1).repeat().interleave(
      lambda _: dataset,
      num_parallel_calls=dataset_ops.AUTOTUNE,
      cycle_length=2)
  dataset = dataset_ops.Dataset.zip((dataset, dataset_b))
  dataset = dataset.map(f2, num_parallel_calls=dataset_ops.AUTOTUNE)
  dataset = dataset_ops.Dataset.range(1).repeat().interleave(
      lambda _: dataset,
      num_parallel_calls=dataset_ops.AUTOTUNE,
      cycle_length=2)
  dataset = dataset_ops.Dataset.zip((dataset, dataset_c))
  dataset = dataset.map(f2, num_parallel_calls=dataset_ops.AUTOTUNE)
  return self._run_benchmark(
      dataset=dataset,
      autotune=autotune,
      benchmark_iters=10000,
      benchmark_label="map_and_interleave",
      benchmark_id=benchmark_id)
Community Discussions

Trending Discussions on parallel

• Parallelization in Durable Function
• Parallelize histogram creation in c++ with futures: how to use a template function with future?
• Implement barrier with pthreads on C
• How to thread a generator
• Recommended way of measuring execution time in Tensorflow Federated
• SLURM and Python multiprocessing pool on a cluster
• what is the meaning of "map" from map function?
• How python multithreaded program can run on different Cores of CPU simultaneously despite of having GIL
• Play and task execution with multiple groups and servers with ansible
• Is there a metric to quantify the perspectiveness in two images?

QUESTION

Parallelization in Durable Function
Asked 2021-Jun-16 at 01:02

I'm trying to understand how parallelization works in Durable Functions. I have a durable function with the following code:

[FunctionName(nameof(OrchestratorFunction))]
public async Task RunOrchestrator(
    [OrchestrationTrigger] IDurableOrchestrationContext context,
    ILogger log)
{
    var jobs = await context.CallActivityAsync<List<Job>>(nameof(jobsReaderFunction), null);
    if (jobs != null && jobs.Count > 0)
    {
        var groupTasks = GetGroups(context, jobs);
        var groups = await Task.WhenAll(groupTasks);

        var emailTasks = SendEmails(context, groups);
        await Task.WhenAll(emailTasks);

        await context.CallActivityAsync(nameof(jobsProcessorFunction), jobs);
    }
}

public List<Task<Group>> GetGroups(IDurableOrchestrationContext context, List<Job> jobs)
{
    var groupTasks = new List<Task<Group>>();
    foreach (var job in jobs)
    {
        groupTasks.Add(context.CallActivityAsync<Group>(nameof(GroupNameReaderFunction), job));
    }
    return groupTasks;
}

public List<Task> SendEmails(IDurableOrchestrationContext context, Group[] groups)
{
    var emailTasks = new List<Task>();
    foreach (var group in groups)
    {
        emailTasks.Add(context.CallActivityAsync(nameof(EmailSenderFunction), group));
    }
    return emailTasks;
}
                                                                                                    

As you can see in the code, I have added 4 activity functions:

(1) to get all jobs:

var jobs = await context.CallActivityAsync<List<Job>>(nameof(jobsReaderFunction), null);

After (1) is complete, (2) to get groups for all jobs:

var groupTasks = GetGroups(context, jobs);
var groups = await Task.WhenAll(groupTasks);

After (2) is complete, (3) to send emails for all groups:

var emailTasks = SendEmails(context, groups);
await Task.WhenAll(emailTasks);

After (3) is complete, (4) to process all jobs:

await context.CallActivityAsync(nameof(jobsProcessorFunction), jobs);

If I understand correctly, parallelization occurs in (2) and (3): groups are fetched for every job at the same time, and emails are sent for every group at the same time. Is that correct?

What I wanted to do is to get all jobs and then have every job run in parallel:

Job 1 -> get groups for job 1, send emails for job 1, process job 1
Job 2 -> get groups for job 2, send emails for job 2, process job 2
...
Job n -> get groups for job n, send emails for job n, process job n

Job 1, Job 2, ..., Job n should run in parallel. How do I do that? Any guidance would be helpful.

UPDATE:

I updated my code as follows:

[FunctionName(nameof(OrchestratorFunction))]
public async Task RunOrchestrator(
    [OrchestrationTrigger] IDurableOrchestrationContext context,
    ILogger log)
{
    var jobs = await context.CallActivityAsync<List<Job>>(nameof(JobsReaderFunction), null);
    if (jobs != null && jobs.Count > 0)
    {
        var processingTasks = new List<Task>();
        foreach (var job in jobs)
        {
            Task processTask = context.CallSubOrchestratorAsync(nameof(SubOrchestratorFunction), job);
            processingTasks.Add(processTask);
        }
        await Task.WhenAll(processingTasks);
    }
}

[FunctionName(nameof(SubOrchestratorFunction))]
public async Task RunSubOrchestrator(
    [OrchestrationTrigger] IDurableOrchestrationContext context,
    ILogger log)
{
    var job = context.GetInput<Job>();
    var group = await context.CallActivityAsync<Group>(nameof(GroupReaderFunction), job);
    await context.CallActivityAsync(nameof(EmailSenderFunction), group);
    var canWriteToGroup = await context.CallActivityAsync<bool>(nameof(GroupVerifierFunction), job);
    await context.CallActivityAsync(nameof(JopStatusUpdaterFunction), new JopStatusUpdaterRequest { CanWriteToGroup = canWriteToGroup, Job = job });
    await context.CallActivityAsync(nameof(TopicMessageSenderFunction), job);
}
                                                                                                    

Does this code run all the jobs in parallel? In other words, will the time taken to execute 10 jobs and 10,000 jobs be the same? Please let me know.

ANSWER

Answered 2021-Jun-10 at 08:44

There are two possible approaches. The first is to use a sub-orchestrator for each job, so that each sub-orchestrator handles just one specific job. Here are the docs for this approach: https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-sub-orchestrations?tabs=csharp. The example in the docs looks similar to yours.

The other is to use ContinueWith, so that each job has its own "chain":

List<Task> tasks = new List<Task>();
List<Task> emailTasks = new List<Task>(); // collects the follow-up email tasks
foreach (var job in jobs)
{
    tasks.Add(context.CallActivityAsync<Group>(nameof(GroupNameReaderFunction), job)
        .ContinueWith(prevTask => emailTasks.Add(context.CallActivityAsync(nameof(EmailSenderFunction), prevTask.Result))));
}
await Task.WhenAll(tasks);
                                                                                                    

Source: https://stackoverflow.com/questions/67910695
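As an aside, this per-job fan-out is the same shape the Ruby parallel gem (the subject of this page) expresses in a few lines. A rough sketch, with hypothetical helpers standing in for the activity functions above:

require 'parallel'

# Hypothetical stand-ins for the group reader, email sender, and job
# processor activities discussed above.
def get_groups(job); ["group-for-#{job}"]; end
def send_email(group); puts "emailing #{group}"; end
def process_job(job); puts "processing #{job}"; end

jobs = %w[job1 job2 job3]

# Each job runs its entire chain inside its own worker process.
Parallel.each(jobs, in_processes: 3) do |job|
  get_groups(job).each { |group| send_email(group) }
  process_job(job)
end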

QUESTION

Parallelize histogram creation in c++ with futures: how to use a template function with future?
Asked 2021-Jun-16 at 00:46

Giving a bit of context: I'm using C++17, and I'm using a raw pointer (T* data) because this will interop with CUDA code. I'm trying to write a parallel version (on CPU) of a histogram creator. The sequential version:

template <typename T>
vector<uint> Histogram::SortDataToHist(T* data, size_t size)
{
    vector<uint> bars{};
    bars.resize(BarCount); // BarCount is the number of histogram bars

    for (int i = 0; i < size; ++i) // size is the count of elements (in data*) to sort
    {
        // given the value of data[i], GetBarIndex tells which bar it belongs to, for counting
        auto idx = GetBarIndex(data[i]);
        // counting
        bars[idx] += 1u;
    }
    return bars;
}
                                                                                                    

The parallel version splits the data array (read-only) across several threads, sorts each sub-array into a local histogram, and then merges (reduces) the local histograms into one final histogram. There is no need for a mutex.

template <typename T>
vector<uint> Histogram::SortDataToHistPar(T* data, size_t size, int threadsCount)
{
    vector<uint> bars{};
    bars.resize(BarCount);

    auto indexes = GetIndexes(size, threadsCount);
    vector<future<vector<uint>>> futures{};

    // loop to start threads
    for (int i = 0; i < indexes.size() - 1; i++)
    {
        int idxA = indexes[i];
        int idxB = indexes[i + 1];
        future<vector<uint>> future = async(LocalSortHist, data, idxA, idxB);
        // C2672 'async': no matching overloaded function found

        futures.push_back(future);
    }

    // loop to collect threads results
    for (int i = 0; i < threadsCount; ++i)
    {
        auto result = futures[i].get();
        for (int r = 0; r < BarCount; ++r)
            bars[r] += result[r];
    }
    return bars;
}
                                                                                                    

I could not find a way to pass LocalSortHist as an argument to async. As written, I get C2672 and C3867; with &Histogram::LocalSortHist (yes, a template function has no address...) it adds C2440 and C2893; and with async(LocalSortHist, ...) it cannot resolve LocalSortHist. How can I run LocalSortHist on several threads like this?

For reference, here is LocalSortHist. The range [idxA, idxB] is the local slice of the data array used for "local sorting", i.e. local histogram generation.

template <typename T>
vector<uint> Histogram::LocalSortHist(T* data, uint idxA, uint idxB)
{
    vector<uint> bars{};
    bars.resize(BarCount);
    for (uint i = idxA; i < idxB; ++i)
    {
        auto idx = GetBarIndex(data[i]);
        bars[idx] += 1u;
    }
    return bars;
}
                                                                                                    

And GetIndexes:

template <typename T>
vector<int> Histogram::GetIndexes(size_t size, int threadsCount)
{
    vector<int> pidx{};
    int w = size / threadsCount;
    int idx = 0;
    while (idx < size)
    {
        pidx.push_back(idx);
        idx += w;
    }
    if (idx != size - 1)
        pidx.push_back(size - 1);
    return pidx;
}
                                                                                                    

An AAA test method:

TEST_METHOD(SortDataToHistParSOTest)
{
    std::default_random_engine generator{};
    std::normal_distribution<float> distribution(15.0f, 5.0f);
    size_t sampleSize = 80000;
    size_t sampleSizeBytes = sampleSize * sizeof(float);
    float* samples = (float*)malloc(sampleSizeBytes);
    for (int i = 0; i < sampleSize; ++i)
    {
        float number = distribution(generator);
        samples[i] = number;
    }
    MinMax mm;
    mm.Min = 0.0f;
    mm.Max = 30.0f;
    Histogram sut(mm, 15);

    auto hist = sut.SortDataToHistPar(samples, sampleSize, 16);

    wstringstream s{};
    for (auto x : hist)
        s << x << L" ";
    Logger::WriteMessage(s.str().c_str());
}

ANSWER

Answered 2021-Jun-16 at 00:46

The issue you are having has nothing to do with templates. You cannot invoke std::async() on a member function without binding it to an instance. Wrapping the call in a lambda does the trick.

Here's an example:

#include <future>

class MyClass {
public:
  template <typename T>
  int foo(T arg) {
      return 12;
  }

  int bar() {
    auto fut = std::async([this](auto arg){ return foo(arg); }, "HI");

    return fut.get();
  }
};
                                                                                                    

Source: https://stackoverflow.com/questions/67994778
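For comparison, the chunk-local-histograms-then-merge strategy from this question takes only a few lines with the Ruby parallel gem this page covers. A minimal sketch; the bucketing rule below is a made-up example, not the asker's GetBarIndex:

require 'parallel'

BAR_COUNT = 15
WORKERS = 4

# Made-up bucketing rule: 15 bars over [0, 30).
def bar_index(value)
  (value / 2.0).floor.clamp(0, BAR_COUNT - 1)
end

samples = Array.new(80_000) { rand * 30.0 }
chunks = samples.each_slice((samples.size / WORKERS.to_f).ceil).to_a

# Each worker counts its own slice into a local histogram; the partial
# counts are then summed, so no mutex is needed.
partials = Parallel.map(chunks, in_processes: WORKERS) do |chunk|
  bars = Array.new(BAR_COUNT, 0)
  chunk.each { |v| bars[bar_index(v)] += 1 }
  bars
end

histogram = partials.reduce { |a, b| a.zip(b).map(&:sum) }
p histogram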

QUESTION

Implement barrier with pthreads on C
Asked 2021-Jun-15 at 18:32

I'm trying to parallelize a merge-sort algorithm. What I'm doing is dividing the input array among the threads, then merging the threads' results. The way I'm trying to merge the results is something like this:

thread 0          | thread 1 | thread 2     | thread 3

sort(A0)          | sort(A1) | sort(A2)     | sort(A3)
merge(A0,A1)      |          | merge(A2,A3) |
merge(A0A1, A2A3) |          |              |

So, at the end of my function sortManager I call the function mergeThreadResults, which should implement the above logic. In it, I iterate over pairs to merge the corresponding threads' results. Then, if needed, I merge the last items onto thread 0. It looks like this:

void mergeThreadResults(long myRank, int myLeft, int myRight, int size, int threads) {

    int nextThread;
    int iter = 2;
    while (iter <= threads) {
        nextThread = (myRank+1*iter) < threads ? (myRank+1*iter) : threads;
        int nextThreadRight = nextThread * ((float)size / (float)threads) - 1;

        printf("Merging threads %ld to %d\n", myRank, nextThread);

        if (myRank % iter != 0) {
            break;
        }

        merge(sortingArray, myLeft, myRight, nextThreadRight);
        sleep(3); // <- sleep

        myRight = nextThreadRight;
        iter = iter * 2;
    }

    if (myRank == 0 && nextThread < threads-1) {
        int nextThreadRight = threads * ((float)size / (float)threads) - 1;
        merge(sortingArray, myLeft, myRight, nextThreadRight);
    }

}
                                                                                                    

It appears to be working as intended. The problem is that I'm using a sleep function to synchronize the threads, which is far from the best approach. So I'm trying to implement a barrier with pthreads.
In it I try to calculate how many threads will reach the barrier in that cycle and pass that count as the breakpoint. When all the threads are at the same point I release the merge function and wait again in the next cycle. This is what I've tried:

                                                                                                            pthread_mutex_lock(&mutex);
                                                                                                            counter++;
                                                                                                            int breakpoint = threads % 2 == 0 ? threads/iter : threads/iter+1;
                                                                                                            if(counter >= breakpoint ) {
                                                                                                                counter = 0;
                                                                                                                pthread_cond_broadcast(&cond_var);
                                                                                                            } else {
                                                                                                                while (pthread_cond_wait(&cond_var, &mutex) != 0);
                                                                                                            }
                                                                                                            pthread_mutex_unlock(&mutex);
                                                                                                    

But it's not working as intended. Some merges trigger before the last cycle has fully ended, leaving me with a partially sorted array.

This is a minimal example of my code for testing:

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

#include <pthread.h>
#include <unistd.h>
                                                                                                    
                                                                                                    // Initialize global variables
                                                                                                    int sortingArray[20] = {5,-4,3,-1,-2,3,1,2,-2,-1,-2,-1,-2,-3,4,1234,534,123,87,123};
                                                                                                    int counter = 0;
                                                                                                    pthread_mutex_t mutex;
                                                                                                    pthread_cond_t cond_var;
                                                                                                    
                                                                                                    struct ThreadTask {
                                                                                                        long rank;
                                                                                                        int size;
                                                                                                        int threads;
                                                                                                    };
                                                                                                    
                                                                                                    void merge(int arr[], int left, int mid, int right) {
                                                                                                        /* Merge arrays */
                                                                                                    
                                                                                                        int i, j, k;
                                                                                                        int n1 = mid - left + 1;
                                                                                                        int n2 = right - mid;
                                                                                                    
    // Allocate temp arrays
                                                                                                        int *L = malloc((n1 + 2) * sizeof(int));
                                                                                                        int *R = malloc((n2 + 2) * sizeof(int));
                                                                                                        if (L == NULL || R == NULL) {
        fprintf(stderr, "Fatal: failed to allocate memory for temp arrays.\n");
                                                                                                            exit(EXIT_FAILURE);
                                                                                                        }
                                                                                                    
                                                                                                        // Populate temp arrays
                                                                                                        for (i = 1; i <= n1; i++) {
                                                                                                            L[i] = arr[left + i - 1];
                                                                                                        }
                                                                                                        for (j = 1; j <= n2; j++) {
                                                                                                            R[j] = arr[mid + j];
                                                                                                        }
                                                                                                    
                                                                                                        L[n1 + 1] = INT_MAX;
                                                                                                        R[n2 + 1] = INT_MAX;
                                                                                                        i = 1;
                                                                                                        j = 1;
                                                                                                    
                                                                                                        // Merge arrays
                                                                                                        for (k = left; k <= right; k++) {
                                                                                                            if (L[i] <= R[j]) {
                                                                                                                arr[k] = L[i];
                                                                                                                i++;
                                                                                                            } else {
                                                                                                                arr[k] = R[j];
                                                                                                                j++;
                                                                                                            }
                                                                                                        }
                                                                                                    
                                                                                                        free(L);
                                                                                                        free(R);
                                                                                                    }
                                                                                                    
                                                                                                    
                                                                                                    void mergeSort(int arr[], int left, int right) {
                                                                                                        /* Sort and then merge arrays */
                                                                                                    
                                                                                                        if (left < right) {
                                                                                                            int mid = left + (right - left) / 2;
                                                                                                    
                                                                                                            mergeSort(arr, left, mid);
                                                                                                            mergeSort(arr, mid + 1, right);
                                                                                                    
                                                                                                            merge(arr, left, mid, right);
                                                                                                        }
                                                                                                    }
                                                                                                    
                                                                                                    
                                                                                                    void mergeThreadResults(long myRank, int myLeft, int myRight, int size, int threads) {
                                                                                                    
                                                                                                        int nextThread;
                                                                                                        int iter = 2;
                                                                                                        while (iter <= threads) {
                                                                                                            int nextThread = (myRank+1*iter) < threads ? (myRank+1*iter) : threads;
                                                                                                            int nextThreadRight = nextThread * ((float)size / (float)threads) - 1;
                                                                                                    
                                                                                                            printf("Merging threads %ld to %d\n", myRank, nextThread);
                                                                                                            
                                                                                                            if (myRank % iter != 0) {
                                                                                                                break;
                                                                                                            }
                                                                                                    
                                                                                                            // barrier
                                                                                                            pthread_mutex_lock(&mutex);
                                                                                                            counter++;
                                                                                                            int breakpoint = threads % 2 == 0 ? threads/iter : threads/iter+1;
                                                                                                            if(counter >= breakpoint ) {
                                                                                                                counter = 0;
                                                                                                                pthread_cond_broadcast(&cond_var);
                                                                                                            } else {
                                                                                                                while (pthread_cond_wait(&cond_var, &mutex) != 0);
                                                                                                            }
                                                                                                            pthread_mutex_unlock(&mutex);
                                                                                                    
                                                                                                            merge(sortingArray, myLeft, myRight, nextThreadRight);
                                                                                                            sleep(2); // <- sleep
                                                                                                    
                                                                                                            myRight = nextThreadRight;
                                                                                                            iter = iter * 2;
                                                                                                        }
                                                                                                    
                                                                                                         if (myRank == 0 && nextThread < threads-1) {
                                                                                                            int nextThreadRight = threads * ((float)size / (float)threads) - 1;
                                                                                                            merge(sortingArray, myLeft, myRight, nextThreadRight);
                                                                                                         }
                                                                                                    
                                                                                                    }
                                                                                                    
                                                                                                    void *sortManager(void *threadInfo) {
                                                                                                        /* Manage mergeSort between threads */
                                                                                                    
                                                                                                        struct ThreadTask *currentTask = threadInfo;
                                                                                                    
                                                                                                        // Get task arguments
                                                                                                        long rank = currentTask->rank;
    int left = rank * ((float)currentTask->size / (float)currentTask->threads);
                                                                                                        int right = (rank + 1) * ((float)currentTask->size / (float)currentTask->threads) - 1;
                                                                                                        int mid = left + (right - left) / 2;
                                                                                                    
                                                                                                        // Execute merge for task division
                                                                                                        if (left < right) {
                                                                                                            mergeSort(sortingArray, left, mid);
                                                                                                            mergeSort(sortingArray, mid + 1, right);
                                                                                                            merge(sortingArray, left, mid, right);
                                                                                                        }
                                                                                                    
                                                                                                        // Merge thread results
                                                                                                        if (rank % 2 == 0)  {
                                                                                                            mergeThreadResults(rank, left, right, currentTask->size, currentTask->threads);
                                                                                                        }
                                                                                                    
                                                                                                        return 0;
                                                                                                    }
                                                                                                    
                                                                                                    
                                                                                                    struct ThreadTask *threadCreator(int size, int threads, pthread_t *thread_handles, struct ThreadTask *tasksHolder) {
                                                                                                        /* Create threads with each task info */
                                                                                                    
                                                                                                        struct ThreadTask *threadTask;
                                                                                                    
                                                                                                        for (long thread = 0; thread < threads; thread++){
                                                                                                            threadTask = &tasksHolder[thread];
                                                                                                            threadTask->rank = thread;
                                                                                                            threadTask->size = size;
                                                                                                            threadTask->threads = threads;
                                                                                                    
                                                                                                            pthread_create(&thread_handles[thread], NULL, sortManager, (void*) threadTask);
                                                                                                        }
                                                                                                    
                                                                                                        return tasksHolder;
                                                                                                    }
                                                                                                    
                                                                                                    
                                                                                                    void printArray(int arr[], int size) {
                                                                                                        /* Print array */
                                                                                                    
                                                                                                        for (int arrayIndex = 0; arrayIndex < size; arrayIndex++)
                                                                                                            printf("%d ", arr[arrayIndex]);
                                                                                                        printf("\n");
                                                                                                    }
                                                                                                    
                                                                                                    
                                                                                                    int main(int argc, char *argv[]) {
                                                                                                    
                                                                                                        // Initialize arguments
                                                                                                        int arraySize = 20;
                                                                                                        int totalThreads = 16;
                                                                                                    
                                                                                                        
                                                                                                        // Display input
                                                                                                        printf("\nInput array:\n");
                                                                                                        printArray(sortingArray, arraySize);
                                                                                                        
                                                                                                    
                                                                                                        // Initialize threads
                                                                                                        pthread_t *thread_handles;
                                                                                                        thread_handles = malloc(totalThreads * sizeof(pthread_t));
                                                                                                    
                                                                                                        // Create threads
                                                                                                        struct ThreadTask threadTasksHolder[totalThreads];
                                                                                                        *threadTasksHolder = *threadCreator(arraySize, totalThreads, thread_handles, threadTasksHolder);
                                                                                                        
                                                                                                        // Execute merge sort in each thread
                                                                                                        for (long thread = 0; thread < totalThreads; thread++) {
                                                                                                            pthread_join(thread_handles[thread], NULL);
                                                                                                        }
                                                                                                        free(thread_handles);
                                                                                                        
                                                                                                    
                                                                                                        // Display output
                                                                                                        printf("\nSorted array:\n");
                                                                                                        printArray(sortingArray, arraySize);
                                                                                                        
                                                                                                        return 0;
                                                                                                    }
                                                                                                    

                                                                                                    ANSWER

                                                                                                    Answered 2021-Jun-15 at 01:58

                                                                                                    I'm trying to parallelize a merge-sort algorithm. What I'm doing is dividing the input array for each thread, then merging the threads results.

                                                                                                    Ok, but yours is an unnecessarily difficult approach. At each step of the merge process, you want half of your threads to wait for the other half to finish, and the most natural way for one thread to wait for another to finish is to use pthread_join(). If you wanted all of your threads to continue with more work after synchronizing then that would be different, but in this case, those that are not responsible for any more merges have nothing at all left to do.
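To make the suggestion concrete, here is a minimal, self-contained sketch of that join-based structure (a sketch only: SortJob, parallel_sort, and the compact merge are illustrative names, not code from the question). Each call sorts the left half in the current thread, hands the right half to a helper thread, and pthread_join()s the helper before merging, so no barrier or sleep is needed:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct SortJob { int *arr; int left; int right; int depth; };

/* Merge the sorted runs arr[left..mid] and arr[mid+1..right]. */
static void merge(int *arr, int left, int mid, int right) {
    int n = right - left + 1;
    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) { perror("malloc"); exit(EXIT_FAILURE); }
    int i = left, j = mid + 1, k = 0;
    while (i <= mid && j <= right) tmp[k++] = (arr[i] <= arr[j]) ? arr[i++] : arr[j++];
    while (i <= mid) tmp[k++] = arr[i++];
    while (j <= right) tmp[k++] = arr[j++];
    for (k = 0; k < n; k++) arr[left + k] = tmp[k];
    free(tmp);
}

static void *parallel_sort(void *argp) {
    struct SortJob *job = argp;
    if (job->left >= job->right) return NULL;

    int mid = job->left + (job->right - job->left) / 2;
    struct SortJob lo = { job->arr, job->left, mid, job->depth - 1 };
    struct SortJob hi = { job->arr, mid + 1, job->right, job->depth - 1 };

    if (job->depth > 0) {
        pthread_t helper;
        if (pthread_create(&helper, NULL, parallel_sort, &hi) == 0) {
            parallel_sort(&lo);         /* sort the left half in this thread */
            pthread_join(helper, NULL); /* wait for the right half to finish */
            merge(job->arr, job->left, mid, job->right);
            return NULL;
        }
        /* Thread creation failed: fall through and sort sequentially. */
    }
    parallel_sort(&lo);
    parallel_sort(&hi);
    merge(job->arr, job->left, mid, job->right);
    return NULL;
}

int main(void) {
    int a[] = {5,-4,3,-1,-2,3,1,2,-2,-1,-2,-1,-2,-3,4,1234,534,123,87,123};
    int n = (int)(sizeof a / sizeof a[0]);
    struct SortJob root = { a, 0, n - 1, 2 };  /* depth 2 => up to 4 threads */
    parallel_sort(&root);
    for (int i = 0; i < n; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}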

                                                                                                    This is what I've tried:

                                                                                                            pthread_mutex_lock(&mutex);
                                                                                                            counter++;
                                                                                                            int breakpoint = threads % 2 == 0 ? threads/iter : threads/iter+1;
                                                                                                            if(counter >= breakpoint ) {
                                                                                                                counter = 0;
                                                                                                                pthread_cond_broadcast(&cond_var);
                                                                                                            } else {
                                                                                                                while (pthread_cond_wait(&cond_var, &mutex) != 0);
                                                                                                            }
                                                                                                            pthread_mutex_unlock(&mutex);
                                                                                                    

There are several problems with that, but the biggest is that a barrier is the wrong tool for the job. After a barrier is passed, all the threads that were blocked at it proceed. You want half of the threads to proceed, performing merges, but the others (should) have no more work to do. Your computation of breakpoint assumes that the second half will not return to the barrier, which indeed they should not do. If you insist on using a barrier, then the threads that have no merge to perform should terminate after passing through the barrier.

                                                                                                    Moreover, it is incorrect to start iter at 2. If you use a barrier approach then all the threads active at each iteration must reach the barrier before any proceed, but if iter starts at 2 then on the first iteration, only half of all the threads must reach the barrier before it is passed.
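Incidentally, if a barrier really is wanted, POSIX already provides one, so none of the counter-and-condition-variable machinery has to be hand-rolled. A minimal sketch (NTHREADS and worker are illustrative; the key constraint is that every thread counted at init must reach the wait):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS 4

static pthread_barrier_t barrier;

static void *worker(void *argp) {
    long rank = (long)argp;

    /* ... this cycle's sorting/merging work would go here ... */

    int rc = pthread_barrier_wait(&barrier);
    if (rc != 0 && rc != PTHREAD_BARRIER_SERIAL_THREAD) {
        fprintf(stderr, "pthread_barrier_wait failed\n");
        exit(EXIT_FAILURE);
    }
    /* All NTHREADS threads have now arrived. Threads with no merge left
       to perform should return here rather than wait at the barrier again. */
    printf("thread %ld passed the barrier\n", rank);
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    if (pthread_barrier_init(&barrier, NULL, NTHREADS) != 0) return 1;
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, worker, (void *)t);
    for (long t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}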

                                                                                                    Additionally, your CV use is non-idiomatic and susceptible to problems. None of the documented failure reasons for pthread_cond_wait() can be rescued by trying to wait again as you do, so you probably need to terminate the program on error, instead. Note also that pthread_mutex_lock(), pthread_mutex_unlock(), and pthread_cond_broadcast() all may fail, too.

                                                                                                    On the other hand, CVs are susceptible to (very rare) spurious wakeups, so on successful return from a wait you need to check the condition again before proceeding, and possibly wait again. Something more like this:

/* Assumes one more piece of shared state next to counter:
   static unsigned generation = 0;  // incremented each time the barrier opens */
if (pthread_mutex_lock(&mutex) != 0) {
    perror("pthread_mutex_lock");
    abort();
}
counter++;
int breakpoint = threads % 2 == 0 ? threads / iter : threads / iter + 1;
if (counter >= breakpoint) {
    counter = 0;
    generation++;  // open the barrier for this cycle
    if (pthread_cond_broadcast(&cond_var) != 0) {
        perror("pthread_cond_broadcast");
        abort();
    }
} else {
    unsigned my_generation = generation;
    do {
        if (pthread_cond_wait(&cond_var, &mutex) != 0) {
            perror("pthread_cond_wait");
            abort();
        }
    } while (generation == my_generation);  // re-check the predicate against spurious wakeups
}
if (pthread_mutex_unlock(&mutex) != 0) {
    perror("pthread_mutex_unlock");
    abort();
}

// some threads must terminate at this point
                                                                                                    

                                                                                                    Source https://stackoverflow.com/questions/67977544

                                                                                                    QUESTION

                                                                                                    How to thread a generator
                                                                                                    Asked 2021-Jun-15 at 16:02

I have a generator object that loads quite a big amount of data and hogs the I/O of the system. The data is too big to fit into memory all at once, hence the use of a generator. And I have a consumer that uses all of the CPU to process the data yielded by the generator. It does not consume many other resources. Is it possible to interleave these tasks using threads?

For example, I'd guess it is possible to run the simplified code below in 11 seconds: the ten one-second generation steps overlap with consumption, leaving only the final item's one-second processing uncovered.

                                                                                                    import time, threading
                                                                                                    lock = threading.Lock()
                                                                                                    def gen():
                                                                                                        for x in range(10):
                                                                                                            time.sleep(1)
                                                                                                            yield x
                                                                                                    def con(x):
                                                                                                        lock.acquire()
                                                                                                        time.sleep(1)
                                                                                                        lock.release()
                                                                                                        return x+1
                                                                                                    

However, the simplest application of threads does not run in that time. It does speed up, but I assume that is because of parallelism between the dispatcher, which does the generation, and the worker, not because of parallelism between workers.

                                                                                                    import joblib
                                                                                                    %time joblib.Parallel(n_jobs=2,backend='threading',pre_dispatch=2)((joblib.delayed(con)(x) for x in gen()))
                                                                                                    # CPU times: user 0 ns, sys: 0 ns, total: 0 ns
                                                                                                    # Wall time: 16 s
                                                                                                    

                                                                                                    ANSWER

                                                                                                    Answered 2021-Jun-15 at 16:02

Send your data to separate threads. I used concurrent.futures because I like the simple interface.

                                                                                                    This runs in about 11 seconds on my computer.

import concurrent.futures
from concurrent.futures import ThreadPoolExecutor
import threading
import time

lock = threading.Lock()
                                                                                                    
                                                                                                    def gen():
                                                                                                        for x in range(10):
                                                                                                            time.sleep(1)
                                                                                                            yield x
                                                                                                    
                                                                                                    def con(x):
                                                                                                        lock.acquire()
                                                                                                        time.sleep(1)
                                                                                                        lock.release()
                                                                                                        return f'{x+1}'
                                                                                                    
                                                                                                    if __name__ == "__main__":
                                                                                                    
                                                                                                        futures = []
                                                                                                        with ThreadPoolExecutor() as executor:
                                                                                                            t0 = time.time()
                                                                                                            for x in gen():
                                                                                                                futures.append(executor.submit(con,x))
                                                                                                        results = []
                                                                                                        for future in concurrent.futures.as_completed(futures):
                                                                                                            results.append(future.result())
                                                                                                        print(time.time() - t0)
                                                                                                        print('\n'.join(results))
                                                                                                    

                                                                                                    Using 100 generator iterations (def gen(): for x in range(100):) it took about 102 seconds.

                                                                                                    Your process may need to keep track of how much data has been sent to tasks that haven't finished to prevent swamping memory resources.
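One way to enforce such a bound (a sketch, not part of the original answer; it reuses the gen() and con() definitions above) is to feed the items through a bounded queue.Queue, so the generator blocks once maxsize items are in flight:

import queue
import threading

work = queue.Queue(maxsize=2)  # generator blocks once 2 items are pending
SENTINEL = object()            # marks the end of the stream

def producer():
    for x in gen():            # gen() as defined above
        work.put(x)            # blocks while the queue is full
    work.put(SENTINEL)

def consume_all():
    results = []
    while True:
        item = work.get()
        if item is SENTINEL:
            break
        results.append(con(item))  # con() as defined above
    return results

producer_thread = threading.Thread(target=producer)
producer_thread.start()
results = consume_all()
producer_thread.join()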

                                                                                                    Adding some diagnostic prints to con seems to show that there might be at least two chunks of data out there at a time.

                                                                                                    def con(x):
                                                                                                        print(f'{x} received payload at t0 + {time.time()-t0:3.3f}')
                                                                                                        lock.acquire()
                                                                                                        time.sleep(1)
                                                                                                        lock.release()
                                                                                                        print(f'{x} released lock at t0 + {time.time()-t0:3.3f}')
                                                                                                        return f'{x+1}'
                                                                                                    

                                                                                                    Source https://stackoverflow.com/questions/67958976

                                                                                                    QUESTION

                                                                                                    Recommended way of measuring execution time in Tensorflow Federated
                                                                                                    Asked 2021-Jun-15 at 13:49

I would like to know whether there is a recommended way of measuring execution time in Tensorflow Federated. To be more specific, if one would like to extract the execution time for each client in a certain round, e.g., for each client involved in a FedAvg round, by saving a time stamp before the local training starts and a time stamp just before sending back the updates, what is the best (or just correct) strategy to do this? Furthermore, since the clients' code runs in parallel, are such time stamps untruthful (especially considering the hypothesis that different clients may be using differently sized models for local training)?

To be very practical: is it appropriate to use tf.timestamp() at the beginning and at the end of @tf.function client_update(model, dataset, server_message, client_optimizer) -- this is probably a simplified signature -- and then subtract the two time stamps?

                                                                                                    I have the feeling that this is not the right way to do this given that clients run in parallel on the same machine.

Thanks to anyone who can help me with this.

                                                                                                    ANSWER

                                                                                                    Answered 2021-Jun-15 at 12:01

There are multiple potential places to measure execution time, so the first step might be defining very specifically what the intended measurement is.

1. Measuring the training time of each client as proposed is a great way to get a sense of the variability among clients. This could help identify whether rounds frequently have stragglers. Using tf.timestamp() at the beginning and end of the client_update function seems reasonable. The question correctly notes that this happens in parallel; summing all of these times would be akin to measuring CPU time.

2. Measuring the time it takes to complete all client training in a round would generally be the maximum of the values above. This might not be true when simulating FL in TFF, as TFF may decide to run some number of clients sequentially due to system resource constraints. In practice all of these clients would run in parallel.

3. Measuring the time it takes to complete a full round (the maximum time it takes to run a client, plus the time it takes for the server to update) could be done by moving the tf.timestamp calls to the outer training loop. This would mean wrapping the call to trainer.next() in the snippet on https://www.tensorflow.org/federated, as sketched below. This would be most similar to elapsed real time (wall-clock time).
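A minimal sketch of that third option, assuming the iterative process from the linked snippet (trainer, state, federated_train_data, and NUM_ROUNDS are placeholders, not guaranteed TFF API names):

import time

NUM_ROUNDS = 10  # hypothetical number of federated rounds

# trainer = <iterative process built as in the linked snippet>
# state = trainer.initialize()
round_times = []
for round_num in range(NUM_ROUNDS):
    t0 = time.monotonic()  # wall-clock start of the round
    state, metrics = trainer.next(state, federated_train_data)
    round_times.append(time.monotonic() - t0)  # elapsed real time per round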

                                                                                                    Source https://stackoverflow.com/questions/67982276

                                                                                                    QUESTION

                                                                                                    SLURM and Python multiprocessing pool on a cluster
                                                                                                    Asked 2021-Jun-15 at 13:42

                                                                                                    I am trying to run a simple parallel program on a SLURM cluster (4x raspberry Pi 3) but I have no success. I have been reading about it, but I just cannot get it to work. The problem is as follows:

                                                                                                    I have a Python program named remove_duplicates_in_scraped_data.py. This program is executed on a single node (node=1xraspberry pi) and inside the program there is a multiprocessing loop section that looks something like:

                                                                                                    pool = multiprocessing.Pool()
                                                                                                    input_iter= product(FeaturesArray_1, FeaturesArray_2, repeat=1)
                                                                                                    results = pool.starmap(refact_featureMatch, input_iter)
                                                                                                    

                                                                                                    The idea is that when it hits that part of the program it should distribute the calculations, one thread per element in the iterator and combine the results in the end. So, the program remove_duplicates_in_scraped_data.py runs once (not multiple times) and it spawns different threads during the pool calculation.

On a single machine (without using SLURM) it works just fine, and for the particular case of a Raspberry Pi, it spawns 4 threads, does the calculations, saves them in results and continues the program as a single thread.

                                                                                                    I would like to exploit all the 16 threads of the SLURM cluster but I cannot seem to get it to work. And I am confident that the cluster has been configured correctly, since it can run all the multiprocessing examples (e.g. calculate the digits of pi) using SLURM in all 16 threads of the cluster.

                                                                                                    Now, looking at the SLURM configuration with sinfo -N -l we have:

                                                                                                    NODELIST   NODES  PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
                                                                                                    node01         1 picluster*        idle    4    4:1:1      1        0      1   (null) none
                                                                                                    node02         1 picluster*        idle    4    4:1:1      1        0      1   (null) none
                                                                                                    node03         1 picluster*        idle    4    4:1:1      1        0      1   (null) none
                                                                                                    node04         1 picluster*        idle    4    4:1:1      1        0      1   (null) none
                                                                                                    

Each node reports 4 sockets, 1 core per socket and 1 thread per core; as far as SLURM is concerned, that is 4 CPUs.

I wish to exploit all 16 CPUs, and if I run my program as:

                                                                                                    srun -N 4 -n 16  python3 remove_duplicates_in_scraped_data.py
                                                                                                    

it will just run 4 copies of the main program on each node, resulting in 16 processes in total. But this is not what I want. I want a single instance of the program, which then spawns the 16 workers across the cluster. At least we know that with srun -N 4 -n 16 the cluster works.

                                                                                                    So, I tried instead changing the program as follows:

                                                                                                    
#!/usr/bin/python3

#SBATCH -p picluster
#SBATCH --nodes=4
#SBATCH --ntasks=16
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=4
#SBATCH --ntasks-per-socket=1
#SBATCH --sockets-per-node=4

import os
import sys

sys.path.append(os.getcwd())
                                                                                                    
                                                                                                    ...
                                                                                                    ...
                                                                                                    ...
                                                                                                    pool = multiprocessing.Pool()
                                                                                                    input_iter= product(FeaturesArray_1, FeaturesArray_2, repeat=1)
                                                                                                    results = pool.starmap(refact_featureMatch, input_iter)
                                                                                                    ...
                                                                                                    ...
                                                                                                    

                                                                                                    and executing it with

                                                                                                    sbatch remove_duplicates_in_scraped_data.py
                                                                                                    

The SLURM job is created successfully, and I can see that all nodes have been allocated on the cluster:

                                                                                                    PARTITION  AVAIL  TIMELIMIT  NODES  STATE NODELIST
                                                                                                    picluster*    up   infinite      4  alloc node[01-04]
                                                                                                    

The program starts running as a single process on node01, but when it hits the parallel part it only spawns 4 workers on node01 and nothing on the other nodes.

I tried different combinations of settings, and even tried to run it via a batch script:

                                                                                                    #!/bin/bash
                                                                                                    
                                                                                                    
                                                                                                    #SBATCH -p picluster
                                                                                                    #SBATCH --nodes=4
                                                                                                    #SBATCH --ntasks=16
                                                                                                    #SBATCH --cpus-per-task=1
                                                                                                    #SBATCH --ntasks-per-node=4
                                                                                                    #SBATCH --ntasks-per-socket=1
                                                                                                    #SBATCH --ntasks-per-core=1
                                                                                                    #SBATCH --sockets-per-node=4
                                                                                                    
                                                                                                    python3 remove_duplicates_in_scraped_data.py
                                                                                                    

but I just cannot get it to spawn workers on the other nodes.

Can you please help me? Is it even possible to do this, i.e. use Python's multiprocessing pool across different nodes of a cluster? If not, what other options do I have? The cluster also has Dask configured; would that work better?

                                                                                                    Please help as I am really stuck with this.

                                                                                                    Thanks

                                                                                                    ANSWER

                                                                                                    Answered 2021-Jun-15 at 06:17

Python's multiprocessing package is limited to shared-memory parallelization: it spawns new processes that all have access to the main memory of a single machine.

You cannot simply scale such software out onto multiple nodes, because the different machines do not have a shared memory that they can all access.

To run your program on multiple nodes at once, you should have a look at MPI (Message Passing Interface); the mpi4py package provides Python bindings.
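As a rough illustration (not the asker's actual code), here is a minimal mpi4py sketch of how the starmap workload from the question could be spread across MPI ranks; refact_featureMatch and the feature arrays are stand-in placeholders:

# mpi_feature_match.py -- a sketch; launch with e.g.:
#   srun -N 4 -n 16 python3 mpi_feature_match.py   (or mpirun -np 16 ...)
from itertools import product
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

def refact_featureMatch(a, b):
    # placeholder for the real matching function
    return (a, b)

if rank == 0:
    FeaturesArray_1 = list(range(100))   # placeholders for the real data
    FeaturesArray_2 = list(range(100))
    pairs = list(product(FeaturesArray_1, FeaturesArray_2))
    chunks = [pairs[i::size] for i in range(size)]   # one chunk per rank
else:
    chunks = None

local_pairs = comm.scatter(chunks, root=0)           # distribute the chunks
local_results = [refact_featureMatch(a, b) for a, b in local_pairs]
all_results = comm.gather(local_results, root=0)     # rank 0 collects everything

if rank == 0:
    results = [r for chunk in all_results for r in chunk]
    print(f"computed {len(results)} results on {size} CPUs")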

Depending on your task, it may also be suitable to run the program 4 times (one job per node) and have each run work on a subset of the data. This is often the simpler approach, but it is not always possible.
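A hedged sketch of that simpler approach, using a SLURM job array (submitted with something like sbatch --array=0-3 --nodes=1 wrapper.sh): each array job reads its index from the environment, processes its slice of the data with a local multiprocessing.Pool, and leaves the partial results to be merged afterwards. The function and data names are placeholders:

# subset_worker.py -- each array job processes every 4th pair
import os
from itertools import product
from multiprocessing import Pool

def refact_featureMatch(a, b):
    # placeholder for the real matching function
    return (a, b)

if __name__ == "__main__":
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    num_tasks = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", "1"))

    FeaturesArray_1 = list(range(100))   # placeholders for the real data
    FeaturesArray_2 = list(range(100))
    pairs = list(product(FeaturesArray_1, FeaturesArray_2))

    my_pairs = pairs[task_id::num_tasks]          # this job's subset
    with Pool() as pool:                          # 4 local workers per Pi
        results = pool.starmap(refact_featureMatch, my_pairs)

    # each job would write its own partial result file for later merging
    print(f"job {task_id}/{num_tasks}: {len(results)} results")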

                                                                                                    Source https://stackoverflow.com/questions/67975328

                                                                                                    QUESTION

What is the meaning of "map" in the map function?
                                                                                                    Asked 2021-Jun-15 at 11:06

I'm happy to use the map function in Python for parallelized calculations, such as below.

dask_dataframe.column.map(target_function)
                                                                                                    

But I don't understand why the name is "map". A map is a kind of drawing of a land surface.

Is it an initialism? If so, of what?

Could someone who knows the meaning please answer?

                                                                                                    ANSWER

                                                                                                    Answered 2021-Jun-14 at 20:26

                                                                                                    "Map" is also a synonym for "function" in the mathematical sense: something that sends an input to an output. You should be able to find it in any English dictionary. It can also be used as a verb for the process of transformation: "map each element to its square".

                                                                                                    The word "map" for a geographic drawing is related, in that it also "maps" each point of the real terrain to a point on the paper map, or vice versa.

                                                                                                    It is not an acronym.
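For instance, Python's built-in map does exactly that kind of element-wise transformation, sending each element of a sequence through a function:

>>> list(map(lambda x: x * x, [1, 2, 3, 4]))
[1, 4, 9, 16]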

                                                                                                    Source https://stackoverflow.com/questions/67976134

                                                                                                    QUESTION

How can a multithreaded Python program run on different CPU cores simultaneously despite the GIL?
                                                                                                    Asked 2021-Jun-15 at 08:23

In this video, the presenter shows how multithreading runs on physical (Intel or AMD) processor cores:

                                                                                                    https://youtu.be/ecKWiaHCEKs

                                                                                                    and

Is Python capable of running on multiple cores?

All these links basically say:
Python threads cannot take advantage of multiple physical cores. This is due to an internal implementation detail called the GIL (global interpreter lock), and if we want to utilize multiple physical cores of the CPU, we must use the truly parallel multiprocessing module.

But when I ran the code below on my laptop

                                                                                                    import threading
                                                                                                    import math
                                                                                                    
                                                                                                    def worker(argument):
                                                                                                        for i in range(200000):
                                                                                                            print(math.sqrt(i))
                                                                                                        return
                                                                                                    
                                                                                                    for i in range(3):
                                                                                                        t = threading.Thread(target=worker, args=[i])
                                                                                                        t.start()
                                                                                                    

                                                                                                    I got this result

                                                                                                    Questions:

1. Why did the code run on all of my physical CPU cores instead of using just one of the four? If that can happen, what is the point of the multiprocessing module?

2. The second time, I changed the above code to create only one thread, and that also used all 4/4 physical cores. Why is that?

                                                                                                    ANSWER

                                                                                                    Answered 2021-Jun-15 at 08:06

                                                                                                    https://docs.python.org/3/library/math.html

                                                                                                    The math module consists mostly of thin wrappers around the platform C math library functions.

While the Python interpreter itself can only execute one instruction at a time, a low-level C function called from Python does not have this limitation.
So it is not Python that is using multiple cores, but your system's well-optimized math library, which Python's math module wraps.

                                                                                                    That basically answers both your questions.

Regarding the usefulness of multiprocessing: it is still useful in those cases where you are trying to parallelize pure Python code, or code that does not call libraries that already use multiple cores. However, it comes with inter-process communication (IPC) overhead that may or may not be larger than the performance gain from using multiple cores, so tuning IPC is often crucial for multiprocessing in Python.
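A small sketch of that distinction (the workload size is arbitrary, chosen only for illustration): a pure-Python, CPU-bound loop gains nothing from threads because of the GIL, but scales across cores with multiprocessing.

from multiprocessing import Pool

def cpu_bound(n):
    # pure-Python arithmetic: the GIL serializes this across threads,
    # but separate processes each run on their own core
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    with Pool() as pool:   # defaults to one worker process per CPU
        results = pool.map(cpu_bound, [10_000_000] * 4)
    print(results)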

                                                                                                    Source https://stackoverflow.com/questions/67982013

                                                                                                    QUESTION

                                                                                                    Play and task execution with multiple groups and servers with ansible
                                                                                                    Asked 2021-Jun-14 at 21:08

We have an Ansible inventory with dozens of servers, grouped per microservice. So say we have several application groups in the inventory, each with its servers.

                                                                                                    Say:

                                                                                                    [group1]
                                                                                                    server1
                                                                                                    server2
                                                                                                    server3
                                                                                                    server20
                                                                                                    server27
                                                                                                    server38
                                                                                                    
                                                                                                    [group2]
                                                                                                    server4
                                                                                                    server5
                                                                                                    
                                                                                                    [group3]
                                                                                                    server7
                                                                                                    server8
                                                                                                    server9
                                                                                                    server6
                                                                                                    

                                                                                                    This inventory is being used for dozens of plays, so just changing it is not an option. I need to deal with this setup.

What I need to know is whether it is somehow possible to have a play run in parallel on one server from each group, without naming them explicitly in the plays. (Groups and servers can be added by others, and the play needs to be able to cope with that.)

So when the play starts, it may process server1, server4, and server7 in parallel. Processing on server2 may start when server1 is finished, processing on server5 may start when server4 is finished, and so on. You get what I mean, I guess. This means that, in the beginning, one server of every group is processed, but as time goes by the smaller groups finish while processing in the larger groups still takes place.

                                                                                                    Are there ways to achieve this?

Thanks

                                                                                                    ANSWER

                                                                                                    Answered 2021-Jun-08 at 15:26

There is already an answer on how to run playbooks on multiple hosts here: Ansible: deploy on multiple hosts at the same time.

Maybe you could start from there. However, if running only the first server of each group in parallel is what interests you, it will be more difficult, as it would require writing a custom script or something similar; a sketch of the idea follows.
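For illustration only, a hypothetical Python wrapper along those lines (the playbook name and the hard-coded group/host lists are made-up placeholders; a real script would parse them from the inventory): it runs the hosts of each group one after another with ansible-playbook --limit, while the groups themselves run in parallel threads.

import subprocess
from concurrent.futures import ThreadPoolExecutor

# placeholder data: in a real script, parse this from the inventory file
groups = {
    "group1": ["server1", "server2", "server3", "server20", "server27", "server38"],
    "group2": ["server4", "server5"],
    "group3": ["server7", "server8", "server9", "server6"],
}

def run_group(hosts):
    for host in hosts:   # within a group: one server after another
        subprocess.run(
            ["ansible-playbook", "play.yml", "--limit", host],
            check=True,
        )

# across groups: one worker thread per group, all running at once
with ThreadPoolExecutor(max_workers=len(groups)) as executor:
    list(executor.map(run_group, groups.values()))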

                                                                                                    Source https://stackoverflow.com/questions/67889454

                                                                                                    QUESTION

                                                                                                    Is there a metric to quantify the perspectiveness in two images?
                                                                                                    Asked 2021-Jun-14 at 16:59

I am writing a program in OpenCV where I want to adjust the camera position. I would like to know whether there is any metric in OpenCV to measure the amount of perspectiveness in two images, and how a homography could be used to quantify that degree of perspectiveness. The method that comes to my mind is to run edge detection and compare the lengths of parallel edges, but that method is prone to errors.

                                                                                                    ANSWER

                                                                                                    Answered 2021-Jun-14 at 16:59

                                                                                                    As a first solution I'd recommend maximizing the distance between the image of the line at infinity and the center of your picture.

Identify at least two pairs of lines that are parallel in the original image. Intersect the lines of each pair and connect the resulting points. It is best to do all of this in homogeneous coordinates, so you won't have to worry about lines that are still parallel in the transformed version. Compute the distance between the center of the image and that line, possibly taking the resolution of the image into account somehow to make the result invariant to resampling. The result will be infinite for an image obtained from a pure affine transformation, so the larger that value, the closer you are to the affine scenario.
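A hedged numpy sketch of that recipe (the point coordinates and image size are made up for illustration): in homogeneous coordinates, the line through two points and the intersection of two lines are both cross products.

import numpy as np

def line(p, q):
    # line through two points, in homogeneous coordinates
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def intersect(l1, l2):
    # intersection of two lines is also a cross product
    return np.cross(l1, l2)

# two pairs of lines that are parallel in the original scene
# (illustrative coordinates, converging slightly in the image)
v1 = intersect(line((0, 0), (100, 10)), line((0, 50), (100, 58)))
v2 = intersect(line((0, 0), (10, 100)), line((50, 0), (57, 100)))

horizon = np.cross(v1, v2)        # image of the line at infinity
a, b, c = horizon
cx, cy = 320, 240                 # image center, assuming a 640x480 image
# point-to-line distance; becomes infinite in the purely affine case
dist = abs(a * cx + b * cy + c) / np.hypot(a, b)
print(dist)                       # larger = closer to the affine scenario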

                                                                                                    Source https://stackoverflow.com/questions/67963004

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

                                                                                                    Vulnerabilities

                                                                                                    No vulnerabilities reported

                                                                                                    Install parallel

                                                                                                    You can download it from GitHub.
On a UNIX-like operating system, using your system's package manager is easiest; however, the packaged Ruby version may not be the newest one. There is also an installer for Windows. Managers help you switch between multiple Ruby versions on your system, while installers can be used to install a specific Ruby version or multiple versions. Please refer to ruby-lang.org for more information.

                                                                                                    Support

For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask them on the Stack Overflow community page.
CLONE

• HTTPS: https://github.com/grosser/parallel.git
• CLI: gh repo clone grosser/parallel
• SSH: git@github.com:grosser/parallel.git


Try Top Libraries by grosser

• parallel_tests (Ruby)
• pru (Ruby)
• smusher (Ruby)
• fast_gettext (Ruby)
• wwtd (Ruby)
