partools | Tools to aid coding in the R 'parallel' package | Machine Learning library
kandi X-RAY | partools Summary
Miscellaneous utilities for parallelizing large computations. Alternative to MapReduce. File splitting and distributed operations such as sort and aggregate. "Software Alchemy" method for parallelizing most statistical methods, presented in N. Matloff, Parallel Computation for Data Science, Chapman and Hall, 2015. Includes a debugging aid.
Community Discussions
Trending Discussions on partools
QUESTION
I am using the partools package to run linear regressions in parallel. I am doing this using the calm() function, which is a wrapper for the package's version of R's lm().
I'm using 20 cores on a 64 GB node.
I receive errors when I run the calm() function, and I've isolated the problem to a single variable: agelvl. Since partools must split a dataset into chunks (the number of chunks equaling the number of cores to be used), variables, from what I can tell, are stored as either character or integer. agelvl is stored as a character because of its named levels, so I wrap it in factor() in the function call.
Here's the code:
...
ANSWER
Answered 2018-Aug-06 at 19:53
According to the author of partools, this could be a scaling issue: even if no level of a categorical variable is missing from any one chunk, the error may still occur because the number of observations in a given level is low in both absolute and relative terms.
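One way to check whether sparse levels are a plausible cause is to count the observations per level within each chunk. The following is a minimal sketch in base R only; the simulated data, the band labels, and the contiguous equal-size split are all assumptions made for illustration, and partools may assign rows to chunks differently.

```r
# Hypothetical data: agelvl as 5-year bands stored as character,
# with one deliberately rare band ("65-69").
set.seed(1)
n <- 2000
bands <- c("20-24", "25-29", "30-34", "35-39", "65-69")
agelvl <- sample(bands, n, replace = TRUE,
                 prob = c(0.30, 0.30, 0.20, 0.19, 0.01))

# Assumption: rows are assigned to 20 contiguous chunks of equal size.
n_chunks <- 20
chunk_id <- rep(seq_len(n_chunks), each = n / n_chunks)

# Rows are chunks, columns are levels; each cell is an observation count.
# Fixing the levels up front keeps zero-count levels visible.
counts <- table(chunk = chunk_id, agelvl = factor(agelvl, levels = bands))

# Smallest per-chunk count for each level; zeros or very small values
# point to the levels most likely to cause trouble.
min_per_level <- apply(counts, 2, min)
print(min_per_level)
```

A level whose minimum is zero or close to zero is estimated from very few rows in at least one chunk, which is the situation the answer describes.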
Solutions
Decrease the number of chunks: assuming there is a point at which the error disappears, you can reduce the number of chunks until you reach it. However, this also reduces the number of cores you use, which means that (a) each chunk may become so large that you run into memory problems, (b) the parallel processes may run too slowly, or (c) both.
Alter the levels/variable structure: you can leave the desired number of chunks/cores as is and alter the levels so that each level has at least some critical number of observations. For agelvl, you could widen the intervals (10 years instead of 5) or, if possible, change age from a categorical variable to a continuous one. Keep in mind that such changes could alter the explanatory power of the model or cause the model to be incorrectly specified.
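The second option can be sketched in base R as follows. This assumes a numeric age column is available to re-bin (the question only shows the character agelvl), and the simulated ages, the break points, and the contiguous 20-chunk split are hypothetical.

```r
# Hypothetical numeric ages between 20 and 69.
set.seed(1)
age <- sample(20:69, 2000, replace = TRUE)

# 5-year bands versus 10-year bands, using left-closed intervals
# such as [20,25) and [20,30).
agelvl5  <- cut(age, breaks = seq(20, 70, by = 5),  right = FALSE)
agelvl10 <- cut(age, breaks = seq(20, 70, by = 10), right = FALSE)

# Assumption: rows are assigned to 20 contiguous chunks of equal size.
chunk_id <- rep(seq_len(20), each = length(age) / 20)

# Smallest chunk-by-level cell count for a given factor.
min_cell <- function(f) min(table(chunk_id, f))

print(min_cell(agelvl5))
print(min_cell(agelvl10))
```

Because each 10-year cell is the sum of two 5-year cells from the same chunk, the smallest cell count can only stay the same or grow after widening. Whether it grows enough to avoid the error depends on the data.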
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported