furrr | Apply Mapping Functions in Parallel using Futures | Development Tools library
kandi X-RAY | furrr Summary
The goal of furrr is to combine purrr’s family of mapping functions with future’s parallel processing capabilities. The result is near drop-in replacements for purrr functions such as map() and map2_dbl(), which can be replaced with their furrr equivalents of future_map() and future_map2_dbl() to map in parallel. The code draws heavily from the implementations of purrr and future.apply, and this package would not be possible without either of them.
Community Discussions
Trending Discussions on furrr
QUESTION
Requesting your help or expert opinion on a parallelization issue I am facing.
I regularly run an XGBoost classifier model on a rather large dataset (dim(train_data) = 357,401 x 281; dims after recipe prep() are 147,304 x 1,159) for multiclass prediction. In base R the model runs in just over 4 hours using registerDoParallel() (using all 24 cores of my server). I am now trying to run it in the tidymodels environment; however, I have yet to find a robust parallelization option for tuning the grid.
I attempted the following parallelization options within tidymodels. All of them seem to work on a smaller subsample (e.g., 20% of the data), but options 1-4 fail when I run the entire dataset, mostly due to memory allocation issues.
1. makePSOCKcluster(), library(doParallel)
2. registerDoFuture(), library(doFuture)
3. doMC::registerDoMC()
4. plan(cluster, workers), doFuture, parallel
5. registerDoParallel(), library(doParallel)
6. future::plan(multisession), library(furrr)
Option 5 (doParallel) has worked with 100% of the data in the tidymodels environment; however, it takes 4-6 hours to tune the grid. I would direct your attention to option 6 (future/furrr), which appeared to be the most efficient of all the methods I tried. This method, however, worked only once (successful code included below; please note I have incorporated a racing method and stopping grid into the tuning).
...ANSWER
Answered 2022-Mar-19 at 04:55
Apparently, in tidymodels, the parallelization happens internally, and there is no need to use furrr/future for manual parallel computation. Moreover, the above code may be syntactically incorrect. For a more detailed explanation of why this is, please see this post by mattwarkentin on the RStudio Community forum.
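A minimal sketch of what "letting tidymodels parallelize internally" can look like: register a foreach backend once and call tune_grid() directly, with no furrr wrapper. Here `wf`, `folds`, and `grid` are hypothetical placeholders for a workflow, a resampling object, and a tuning grid, none of which are defined in this snippet.

```r
library(doParallel)
library(tune)

# Register a PSOCK cluster as the foreach backend.
cl <- makePSOCKcluster(parallel::detectCores() - 1)
registerDoParallel(cl)

# tune_grid() detects the registered backend and parallelizes over
# resamples on its own; wrapping it in furrr is unnecessary.
res <- tune_grid(wf, resamples = folds, grid = grid)

stopCluster(cl)
```
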
QUESTION
I have access to a large computing cluster with many nodes, each of which has >16 cores, running Slurm 20.11.3. I want to run a job in parallel using furrr::future_pmap(). I can parallelize across multiple cores on a single node, but I have not been able to figure out the correct syntax to take advantage of cores on multiple nodes. See this related question.
Here is a reproducible example where I made a function that sleeps for 5 seconds and returns the starting time, ending time, and the node name.
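A sketch of the kind of example described above, assuming made-up node names: the worker function sleeps for 5 seconds and reports where and when it ran, and plan(cluster, ...) is pointed at multiple hosts. In a real Slurm job the hostnames would come from the scheduler (e.g., the SLURM_JOB_NODELIST environment variable), not be hard-coded.

```r
library(furrr)

# Sleep for 5 seconds and report start time, end time, and node name.
slow_fn <- function(i) {
  start <- Sys.time()
  Sys.sleep(5)
  data.frame(i     = i,
             start = start,
             end   = Sys.time(),
             node  = Sys.info()[["nodename"]])
}

# Hypothetical hostnames; one worker is launched per entry.
nodes <- c("node1", "node1", "node2", "node2")
plan(cluster, workers = nodes)

results <- future_pmap(list(i = 1:4), slow_fn)
```
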
...ANSWER
Answered 2022-Feb-04 at 00:40
QUESTION
WHAT I WANT: I'm trying to fit a GAM model for classification using tidymodels on a given dataset.
SO FAR: I'm able to fit a logit model.
...ANSWER
Answered 2022-Jan-12 at 23:47
This problem has been fixed in the development version of {parsnip} (> 0.1.7). You can install it by running remotes::install_github("tidymodels/parsnip").
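With that development version installed, a GAM classification spec can be sketched roughly as follows (the column names are illustrative, not from the original question):

```r
library(parsnip)

# gen_additive_mod() is parsnip's GAM interface, backed by mgcv.
gam_spec <- gen_additive_mod() %>%
  set_engine("mgcv") %>%
  set_mode("classification")
```
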
QUESTION
I had a function that ran successfully using future's multisession plan, but after an update of future I received the below error, which I have not figured out how to solve:
Error: values() is defunct in future (>= 1.20.0). Use value() instead.
Please see below example (where I am not using values(), but still get the error):
...ANSWER
Answered 2021-Dec-25 at 06:59
This was solved by updating the furrr package.
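For context, a minimal illustration of the API change behind the error: in future (>= 1.20.0), values() is defunct and value() is used instead, including on lists of futures.

```r
library(future)
plan(multisession, workers = 2)

# Two simple futures.
fs <- lapply(1:2, function(i) future(i * 10))

# Use value(), not the defunct values(); it also accepts a list of futures.
out <- value(fs)
```
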
QUESTION
I am building a custom recipes function and getting an error when I try to prep() the recipe. I get the following error:
ANSWER
Answered 2021-Nov-07 at 21:41
@importFrom recipes prep bake had to be added to the .R file.
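Concretely, this is a roxygen2 directive placed in one of the package's .R files; roxygen2 then writes the corresponding importFrom() entry into NAMESPACE so the custom step's methods can find prep() and bake():

```r
# In an .R file of the package (the NULL anchors the roxygen block):

#' @importFrom recipes prep bake
NULL
```
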
QUESTION
I have a tibble that includes a list-column with vectors inside. I want to create a new column that accounts for the length of each vector. Since this dataset is large (3M rows), I thought to shave off some processing time using the furrr package. However, it seems that purrr is faster than furrr. How come?
To demonstrate the problem, I first simulate some data. Don't bother to understand the code in the simulation part as it's irrelevant to the question.
data simulation function
...ANSWER
Answered 2021-Nov-02 at 22:59
As I argued in the comments to the original post, my suspicion is that there is an overhead caused by distributing the very large dataset to the workers.
To substantiate my suspicion, I used the same code used by the OP with a single modification: I added a delay of 0.000001. The results were: purrr --> 192.45 sec and furrr --> 44.707 sec (8 workers). The time taken by furrr was only about 1/4 of that taken by purrr -- very far from 1/8!
My code is below, as requested by the OP:
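The OP's code is not reproduced on this page, but a scaled-down sketch of the comparison might look like the following (the tibble and its size are invented here). For an operation as cheap as taking lengths, the cost of shipping the data to the workers tends to dominate, which is the overhead the answer describes.

```r
library(purrr)
library(furrr)
library(tibble)

# Small stand-in for the 3M-row tibble: a list-column of short vectors.
df <- tibble(x = replicate(1e4, runif(sample(1:10, 1)), simplify = FALSE))

# Serial version: trivial work per element.
serial <- map_int(df$x, length)

# Parallel version: same result, but the list must reach the workers first.
plan(multisession, workers = 2)
parallel <- future_map_int(df$x, length)
```
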
QUESTION
Below is my attempt at a minimal reproducible example. Briefly explained, I am using rollApply from the rowr package to calculate a function over a rolling window, and using data from two columns simultaneously. If possible, I would like to skip n steps between each time the function is calculated on a new window. I will try to make it clear what I mean in the example below.
Here is the example data:
...ANSWER
Answered 2021-Oct-21 at 18:48
1) The rowr package was removed from CRAN, but we can use rollapplyr() (like rollapply(), but the r on the end means it defaults to right alignment) from zoo, which has a by.column= argument to specify whether processing is performed column by column (TRUE) or all columns are passed at once (FALSE), and a by= argument which causes skipping.
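A small sketch of those two arguments on made-up data (not the OP's example): by.column = FALSE hands each whole window, both columns at once, to the function, and by = 2 evaluates only every second window.

```r
library(zoo)

m <- cbind(a = 1:10, b = 11:20)

# Sum over both columns in a 3-row window, computed every 2nd window.
res <- rollapplyr(m, width = 3, by = 2,
                  FUN = function(w) sum(w), by.column = FALSE)
```
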
QUESTION
In the R language, optimization can be achieved by using the purrr::map() or furrr::future_map() functions. However, I am not sure how optimization works for np.array() methods. Indeed, I would like to understand how Python and R scale out to parallel processing [1, 2] in terms of complexity and performance.
Thus, the following questions arise:
How does the optimization of np.array() in Python work compared to the purrr::map() and furrr::future_map() functions in the R language?
By doing a simple tictoc test on purrr/furrr, I can observe that we get a big win from vectorization in both cases. Nonetheless, I also notice that the results seem to show that the R language is just fundamentally faster.
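The OP's exact benchmark is not shown on this page; a minimal R-side version of such a tictoc comparison, on invented data, could look like this. It contrasts an element-wise map with the vectorized call, which is where the "big win from vectorization" shows up.

```r
library(purrr)
library(tictoc)

x <- runif(1e6)

# Element-wise: one R function call per element.
tic("purrr::map_dbl")
r1 <- map_dbl(x, sqrt)
toc()

# Vectorized: a single call into compiled code.
tic("vectorized sqrt")
r2 <- sqrt(x)
toc()
```
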
ANSWER
Answered 2021-Oct-15 at 19:40
I believe numpy wraps some of its "primitive" objects in wrapper classes which are, themselves, Python (e.g., this one). When looking at the R mirror source, I conversely find an array class that is essentially native code (i.e., C). That extra indirection layer alone could explain the difference in speed, I guess.
QUESTION
I noticed that when loading furrr after raster, I am warned that values is masked:
ANSWER
Answered 2021-Sep-28 at 13:20
The workers aren't loading the sf package. Use .options = furrr_options(packages = "sf").
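In context, that option attaches the named packages on each worker before the mapped function runs. A sketch, where `shapes` and `f` stand in for an sf object and a function using sf verbs (neither is defined here):

```r
library(furrr)
plan(multisession)

# furrr_options(packages = ...) loads sf on every worker, so f() can
# call sf functions without "could not find function" errors.
res <- future_map(seq_len(nrow(shapes)), f,
                  .options = furrr_options(packages = "sf"))
```
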
QUESTION
From the data.table package website, given that "many common operations are internally parallelized to use multiple CPU threads", I would like to know if that is the case when Map() is used within a data.table.
The reason for asking is that I have noticed, comparing the same operation on a large dataset (cor.test(x, y) with x = .SD and y being a single column of the dataset), that the one using Map() performs quicker than when furrr::future_map2() is used.
ANSWER
Answered 2021-Aug-29 at 10:08
You can use this rather explorative approach and see whether the elapsed time shrinks when more threads are used. Note that on my machine the maximum number of usable threads is just one, so no difference is possible.
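One way to sketch such an explorative check, on invented data: vary the data.table thread count with setDTthreads() and time the same operation at each setting.

```r
library(data.table)

dt <- data.table(x = rnorm(1e6), y = rnorm(1e6))

# Time the same aggregation with 1 thread and with the current maximum.
for (n in unique(c(1L, getDTthreads()))) {
  setDTthreads(n)
  cat(n, "thread(s):",
      system.time(dt[, .(cor = cor(x, y))])[["elapsed"]], "sec\n")
}
```
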
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported