furrr | Apply Mapping Functions in Parallel using Futures | Development Tools library

 by DavisVaughan | R | Version: v0.2.3 | License: Non-SPDX

kandi X-RAY | furrr Summary

furrr is an R library typically used in Utilities and Development Tools applications. furrr has no bugs and no vulnerabilities, but it has low support. However, furrr has a Non-SPDX license. You can download it from GitHub.

The goal of furrr is to combine purrr’s family of mapping functions with future’s parallel processing capabilities. The result is near drop-in replacements for purrr functions: map() and map2_dbl(), for example, can be replaced with their furrr equivalents future_map() and future_map2_dbl() to map in parallel. The code draws heavily from the implementations of purrr and future.apply, and this package would not be possible without either of them.
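As a minimal sketch of the drop-in replacement described above, the following compares map_dbl() and map2_dbl() with their furrr equivalents. The two-worker multisession plan and the toy inputs are assumptions made for illustration; they are not taken from the furrr documentation.

```r
library(purrr)
library(furrr)

# Assumption: two background R sessions; adjust to your machine.
future::plan(future::multisession, workers = 2)

x <- 1:4
y <- 5:8

# Sequential purrr call and its parallel furrr counterpart.
seq_sq <- map_dbl(x, ~ .x^2)
par_sq <- future_map_dbl(x, ~ .x^2)

# The same swap for the two-input variant.
seq_sum <- map2_dbl(x, y, ~ .x + .y)
par_sum <- future_map2_dbl(x, y, ~ .x + .y)

# Both pairs should hold the same values.
identical(seq_sq, par_sq)
identical(seq_sum, par_sum)

# Return to sequential processing and shut down the workers.
future::plan(future::sequential)
```

Only the function name changes between the two versions; the parallel backend is chosen separately through plan().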

            kandi-support Support

              furrr has a low active ecosystem.
              It has 629 star(s) with 37 fork(s). There are 22 watchers for this library.
              It had no major release in the last 12 months.
              There are 5 open issues and 175 have been closed. On average, issues are closed in 25 days. There is 1 open pull request and 0 closed requests.
              It has a neutral sentiment in the developer community.
              The latest version of furrr is v0.2.3

            kandi-Quality Quality

              furrr has 0 bugs and 0 code smells.

            kandi-Security Security

              furrr has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              furrr code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            kandi-License License

              furrr has a Non-SPDX License.
              A Non-SPDX license can be an open-source license that is not SPDX-compliant, or a non-open-source license; review it closely before use.

            kandi-Reuse Reuse

              furrr releases are available to install and integrate.
              Installation instructions, examples and code snippets are available.

            Community Discussions

            QUESTION

            Error in future_map: argument ".f" is missing, with no default
            Asked 2022-Mar-19 at 04:55

            Requesting your help or expert opinion on a parallelization issue I am facing.

            I regularly run an Xgboost classifier model on a rather large dataset (dim(train_data) = 357,401 x 281, dims after recipe prep() are 147,304 x 1159 ) for a multiclass prediction. In base R the model runs in just over 4 hours using registerDoParallel(using all 24 cores of my server). I am now trying to run it in the Tidymodels environment, however, I am yet to find a robust parallelization option to tune the grid.

            I attempted the following parallelization options within tidymodels. All of them seem to work on a smaller subsample (e.g., 20% of the data), but options 1-4 fail when I run the entire dataset, mostly due to memory allocation issues.

            1. makePSOCKcluster(), library(doParallel)
            2. registerDoFuture(), library(doFuture)
            3. doMC::registerDoMC()
            4. plan(cluster, workers), doFuture, parallel
            5. registerDoParallel(), library(doParallel)
            6. future::plan(multisession), library(furrr)

            Option 5 (doParallel) has worked with 100% of the data in the tidymodels environment; however, it takes 4-6 hours to tune the grid. I would request your attention to option 6 (future/furrr), which appeared to be the most efficient of all the methods I tried. This method, however, worked only once (successful code included below; please note I have incorporated a racing method and a stopping grid into the tuning).

            ...

            ANSWER

            Answered 2022-Mar-19 at 04:55

            Apparently, in tidymodels code, the parallelization happens internally, and there is no need to use furrr/future to do manual parallel computation. Moreover, the above code may be syntactically incorrect. For a more detailed explanation, please see this post by mattwarkentin in the RStudio Community forum.

            Source https://stackoverflow.com/questions/71506192

            QUESTION

            How to parallelize future_pmap() across multiple slurm nodes
            Asked 2022-Feb-04 at 00:40

            I have access to a large computing cluster with many nodes each of which has >16 cores, running Slurm 20.11.3. I want to run a job in parallel using furrr::future_pmap(). I can parallelize across multiple cores on a single node but I have not been able to figure out the correct syntax to take advantage of cores on multiple nodes. See this related question.

            Here is a reproducible example where I made a function that sleeps for 5 seconds and returns the starting time, ending time, and the node name.

            ...

            ANSWER

            Answered 2022-Feb-04 at 00:40

            QUESTION

            Error while predicting a GAM model using tidymodels
            Asked 2022-Jan-12 at 23:47

            WHAT I WANT: I'm trying to fit a GAM model for classification using tidymodels on a given data.

            SO FAR: I'm able to fit a logit model.

            ...

            ANSWER

            Answered 2022-Jan-12 at 23:47

            This problem has been fixed in the developmental version of {parsnip} (>0.1.7). You can install it by running remotes::install_github("tidymodels/parsnip").

            Source https://stackoverflow.com/questions/70682454

            QUESTION

            R future multisession Error: values() is defunct in future (>= 1.20.0). Use value() instead
            Asked 2021-Dec-25 at 06:59

            I had a function successfully using future multisession, but with an update of future received the below error, which I have not figured out how to solve:

            Error: values() is defunct in future (>= 1.20.0). Use value() instead.

            Please see below example (where I am not using values(), but still get the error):

            ...

            ANSWER

            Answered 2021-Dec-25 at 06:59

            This was solved by updating the furrr package.

            Source https://stackoverflow.com/questions/70470555

            QUESTION

            no applicable method for 'prep' applied to an object of class
            Asked 2021-Nov-07 at 21:41

            I am building a custom recipes function and getting an error when I try to prep() the recipe. I get the following error:

            ...

            ANSWER

            Answered 2021-Nov-07 at 21:41

            @importFrom recipes prep bake had to be added to the .R file

            Source https://stackoverflow.com/questions/69852944
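A minimal sketch of what that fix can look like in a package source file, assuming roxygen2 generates the NAMESPACE. The step class step_example, the file name, and the method body are hypothetical; only the @importFrom recipes prep bake tag comes from the answer.

```r
# R/step_example.R (hypothetical file in a package that uses roxygen2)

# Importing the generics lets roxygen2 and R see prep() and bake()
# when the S3 methods below are registered.
#' @importFrom recipes prep bake
NULL

#' @export
prep.step_example <- function(x, training, info = NULL, ...) {
  # Hypothetical body: a real step would estimate its parameters
  # from `training` here and return the updated step.
  x
}
```

After adding the tag, the NAMESPACE has to be regenerated (for example with devtools::document()) before the method is found.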

            QUESTION

            Why is `furrr::future_map_int()` slower than `purrr::map_int()` when I use `dplyr::mutate()`?
            Asked 2021-Nov-02 at 22:59

            I have a tibble that includes a list-column with vectors inside. I want to create a new column that accounts for the length of each vector. Since this dataset is large (3M rows), I thought to shave off some processing time using the furrr package. However, it seems that purrr is faster than furrr. How come?

            To demonstrate the problem, I first simulate some data. Don't bother to understand the code in the simulation part as it's irrelevant to the question.

            data simulation function

            ...

            ANSWER

            Answered 2021-Nov-02 at 22:59

            As I argued in the comments to the original post, my suspicion is that the overhead comes from distributing the very large dataset to the workers.

            To substantiate my suspicion, I have used the same code used by the OP with a single modification: I have added a delay of 0.000001 and the results were: purrr --> 192.45 sec and furrr: 44.707 sec (8 workers). The time taken by furrr was only 1/4 of the one taken by purrr -- very far from 1/8!

            My code is below, as requested by the OP:

            Source https://stackoverflow.com/questions/69808082
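The answerer's code is not reproduced on this page. As an independent, minimal sketch of the same idea, the following adds an artificial per-element delay so that the computation, rather than the data transfer, dominates the run time. The helper slow_length(), the delay of 0.01 seconds, the data size, and the two-worker plan are all assumptions chosen for illustration.

```r
library(purrr)
library(furrr)

# Assumption: two background R sessions.
future::plan(future::multisession, workers = 2)

# Hypothetical helper: length() with an artificial delay per element.
slow_length <- function(v, delay = 0.01) {
  Sys.sleep(delay)
  length(v)
}

# Small simulated list of vectors with varying lengths.
vectors <- replicate(200, sample(10, size = sample(5, 1)), simplify = FALSE)

t_seq <- system.time(res_seq <- map_int(vectors, slow_length))
t_par <- system.time(res_par <- future_map_int(vectors, slow_length))

# Same results; compare the elapsed times to see the effect of the workers.
identical(res_seq, res_par)
t_seq[["elapsed"]]
t_par[["elapsed"]]

future::plan(future::sequential)
```

With the delay set to 0, the per-element work becomes trivial and the cost of sending data to the workers is likely to dominate, which is the situation described in the question.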

            QUESTION

            Making rollApply() skip n steps - R
            Asked 2021-Oct-21 at 18:48

            Below is my attempt at a minimal reproducible example. Briefly explained, I am using rollApply from the rowr package to calculate a function over a rolling window, and using data from two columns simultaneously. If possible, I would like to skip n steps between each time the function is calculated on a new window. I will try to make it clear what I mean in the example below.

            Here is the example data:

            ...

            ANSWER

            Answered 2021-Oct-21 at 18:48

            1) The rowr package was removed from CRAN but we can use rollapplyr (like rollapply but the r on the end means to default to right alignment) from zoo which has a by.column= argument to specify whether processing is performed column by column (TRUE) or all columns are passed at once (FALSE) and a by= argument which causes skipping.

            Source https://stackoverflow.com/questions/69666330
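A minimal sketch of the rollapplyr() call described in the answer. The two-column matrix, the window width of 3, the step of 2, and the function window_stat() are assumptions for illustration; the original example data is not shown on this page.

```r
library(zoo)

# Hypothetical two-column data.
m <- cbind(x = 1:10, y = seq(2, 20, by = 2))

# Hypothetical function that needs both columns of each window at once.
window_stat <- function(w) sum(w[, 1] * w[, 2])

# by.column = FALSE passes the whole window (all columns) to FUN;
# by = 2 evaluates FUN only at every second position.
res <- rollapplyr(m, width = 3, FUN = window_stat, by = 2, by.column = FALSE)
res
```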

            QUESTION

            Comparing R and Python Vectorization and Optimization
            Asked 2021-Oct-15 at 19:40

            In the R language, optimization can be achieved by using the purrr::map() or furrr::future_map() functions. However, I am not sure how optimization works for np.array() methods. Indeed, I would like to understand how Python and R scale out to parallel processing [1, 2] in terms of complexity and performance.

            Thus, the following questions arise:

            How does the optimization of np.array() in Python work compared to the purrr::map() and furrr::future_map() functions in the R language?

            By doing a simple tictoc test on purrr/furrr, I can observe that we have a big win from vectorization in both cases. Nonetheless, I can also notice that the results seem to show that the R language is just fundamentally faster.

            Python ...

            ANSWER

            Answered 2021-Oct-15 at 19:40

            I believe numpy wraps some of its "primitive" objects in wrapper classes which are, themselves, Python (eg. this one). When looking at the R mirror source, I conversely find an array class that's basically native code (aka C). That extra indirection layer alone could explain the difference in speed, I guess.

            Source https://stackoverflow.com/questions/69498290

            QUESTION

            Parallel GIS with `future`
            Asked 2021-Sep-28 at 13:20

            I noticed that when loading furrr after raster, I am warned that values is masked:

            ...

            ANSWER

            Answered 2021-Sep-28 at 13:20

            The workers aren't loading the sf package. Use .options = furrr_options(packages = "sf").

            Reproducing the issue:

            Source https://stackoverflow.com/questions/69349290
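A minimal sketch of the suggested fix, assuming the sf package is installed. The coordinates and the two-worker plan are hypothetical; only the .options = furrr_options(packages = "sf") argument comes from the answer.

```r
library(furrr)

# Assumption: two background R sessions.
future::plan(future::multisession, workers = 2)

# Hypothetical input: coordinate pairs to turn into sf point geometries.
coords <- list(c(0, 0), c(1, 1))

# packages = "sf" asks furrr to attach sf on each worker before mapping,
# so st_point() can be found there.
points <- future_map(
  coords,
  ~ st_point(.x),
  .options = furrr_options(packages = "sf")
)

future::plan(future::sequential)
```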

            QUESTION

            Is `Map()` when used in a `data.table` parallel? - R
            Asked 2021-Aug-29 at 10:08

            From the data.table package website, given that:

            "many common operations are internally parallelized to use multiple CPU threads"

            • I would like to know if that is the case when Map() is used within a data.table?

            The reason for asking is that, when comparing the same operation on a large dataset (cor.test(x, y) with x = .SD and y being a single column of the dataset), I have noticed that the version using Map() runs faster than the one using furrr::future_map2().

            ...

            ANSWER

            Answered 2021-Aug-29 at 10:08

            You can use this rather explorative approach and see whether the elapsed time shrinks when more threads are used. Note that on my machine the maximum number of usable threads is just one, so no difference is possible.

            Source https://stackoverflow.com/questions/68971670
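The answer's code is not reproduced on this page. A minimal sketch of such an exploration, assuming a simulated table and a simple Map() call in j (both hypothetical), could time the same operation under different data.table thread settings:

```r
library(data.table)

# Number of threads data.table is currently allowed to use.
max_threads <- getDTthreads()

# Hypothetical data: one grouping column and one numeric column.
dt <- data.table(g = sample(100, 1e6, replace = TRUE), v = rnorm(1e6))

# Time the same Map() call with one thread and with the maximum.
timings <- sapply(unique(c(1, max_threads)), function(n) {
  setDTthreads(n)
  system.time(
    dt[, Map(function(col) sum(col), .SD), by = g, .SDcols = "v"]
  )[["elapsed"]]
})

# Restore the original thread setting.
setDTthreads(max_threads)
timings
```

If the elapsed time does not shrink as the thread count grows, the operation is probably not benefiting from data.table's internal parallelism.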

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install furrr

            You can install the released version of furrr from CRAN.
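The install command itself did not survive on this page. Assuming the standard CRAN workflow, it would be:

```r
install.packages("furrr")
```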

            Support

            For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, check and ask on the Stack Overflow community page.
            CLONE
          • HTTPS

            https://github.com/DavisVaughan/furrr.git

          • CLI

            gh repo clone DavisVaughan/furrr

          • sshUrl

            git@github.com:DavisVaughan/furrr.git


            Explore Related Topics

            Consider Popular Development Tools Libraries

            FreeCAD

            by FreeCAD

            MailHog

            by mailhog

            front-end-handbook-2018

            by FrontendMasters

            front-end-handbook-2017

            by FrontendMasters

            tools

            by googlecodelabs

            Try Top Libraries by DavisVaughan

            slider

            by DavisVaughan (R)

            almanac

            by DavisVaughan (R)

            strapgod

            by DavisVaughan (R)

            ivs

            by DavisVaughan (R)

            cbuild

            by DavisVaughan (R)