ffbase | Basic functionality for R package ff
kandi X-RAY | ffbase Summary
R is an excellent statistical tool. However, its important data objects are memory objects: all processing in R takes place in memory. ff is an R package for working with vectors that are bigger than memory, but it currently lacks some standard statistical methods. The intention of ffbase is to provide the basic statistical functions for ff objects, so that programming with ff becomes easier.
Community Discussions
Trending Discussions on ffbase
QUESTION
I'm trying to use OHDSI's version of the SelfControlledCaseSeries package, which utilizes the ff package to handle big data. But something is not working with the ffwhich function. Running the following example, provided in the ffwhich documentation:
ANSWER
Answered 2019-Sep-19 at 02:57
A similar error was reported on the package's GitHub. It appears to be an issue with the operating system (Windows 10?). @jwijffels provides the reason in the comments:
Haven't got windows 10 machine myself but the problem clearly comes from ff::chunk, namely from ff::chunk.ff_vector which is defined as follows
The relevant part is this: b <- BATCHBYTES%/%RECORDBYTES. This calculation apparently on your machine gives 23058430092136940 for reasons beyond my understanding (given that you report it works on Rgui but not on RStudio).
You could probably get around on this by changing option ffbatchbytes to something like this options(ffbatchbytes = 84882227) - which is the number I have on my oldskool windows 7
I was able to reproduce your error and correct it using the above suggestion:
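The answerer's own code is not reproduced here. As a minimal sketch of the suggested workaround, the option can be set with base R's options() before working with ff objects. The value 84882227 is the one quoted above and may not suit every machine; record_bytes = 8 is an illustrative assumption (one double per record) used only to show the division quoted from ff::chunk.

```r
# Remember the current setting (NULL if it has not been set in this session).
old <- getOption("ffbatchbytes")

# Value quoted in the answer above; treat it as a starting point, not a rule.
options(ffbatchbytes = 84882227)

# Mirror the quoted calculation b <- BATCHBYTES %/% RECORDBYTES.
# record_bytes is an assumed record size, chosen for illustration only.
batch_bytes <- getOption("ffbatchbytes")
record_bytes <- 8
b <- batch_bytes %/% record_bytes
print(b)

# To undo the change later: options(ffbatchbytes = old)
```

With a sane ffbatchbytes, b stays a modest chunk length rather than the implausibly large value reported in the question.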
QUESTION
I am trying to run a generalized linear model on a very large dataset (several million rows). R doesn't seem able to handle the analysis, however, as I keep getting memory allocation errors (unable to allocate vector of size...etc.).
The data fit in RAM, but seem to be too large to estimate complex models. As a solution, I'm exploring using the ff package to replace R's in-RAM storage mechanism with on-disk storage.
I have successfully (I think) off-loaded the data to my hard drive, but when I attempt to estimate the glm (via the biglm package) I get the following error:
...
ANSWER
Answered 2019-May-09 at 10:49
You are using the wrong family argument.
QUESTION
I am working with a large set of data that contains more than 2^31 observations; the actual number is close to 3.5 billion.
I am using the R package "biglm" to run a regression with approximately 70 predictors. I read in the data one million rows at a time and update the regression results. The data have been saved in the ffdf format (from the R package "ff") to load quickly and avoid using up all my RAM.
Here is the basic outline of the code I am using:
...
ANSWER
Answered 2017-Jul-04 at 06:28
I believe that I have found the source of the issue in the biglm code.
The number of observations (n) is stored as an integer, which has a max value of 2^31 - 1. The numeric type is not subject to this limit, and, as far as I can tell, can be used instead of integers to store n.
Here is a commit on GitHub showing how to fix this problem with one additional line of code that converts the integer n to a numeric. As the model is updated, the number of rows in the new batch is added to the old n, so the type of n remains numeric.
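The integer limit and the effect of the conversion can be illustrated in base R without biglm. This is only a sketch of the underlying behaviour, not the code from the commit; n_int, n and batch_rows are hypothetical names chosen for illustration.

```r
# Largest value an R integer can hold: 2^31 - 1.
int_max <- .Machine$integer.max
stopifnot(int_max == 2^31 - 1)

# Adding to an integer at the limit overflows to NA (R also emits a warning).
n_int <- suppressWarnings(int_max + 1L)
print(is.na(n_int))

# Converting the count to numeric first avoids the overflow.
n <- as.numeric(int_max)

# Hypothetical batch size, stored as an integer the way nrow() would return it.
batch_rows <- 1000000L

# numeric + integer yields numeric, so n stays numeric across updates.
n <- n + batch_rows
print(typeof(n))
print(n > int_max)
```

Because the sum of a numeric and an integer is numeric, a single conversion at the start is enough for the running count to stay numeric through every later update.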
I was able to reproduce the error described in this question and verify that my fix works with this code:
(WARNING: This consumes a large amount of memory; consider doing more iterations with a smaller array if you have tight memory constraints.)
QUESTION
I am having trouble doing the following operations on a larger dataset. I wonder if there is a built-in way to do it with either ff or ffdf.
Example: Modifying a character column in an ffdf object using substr and reassigning it as a different column:
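...
ANSWER
Answered 2017-Apr-06 at 11:49
# ffbase provides with() for ffdf objects; the expression is evaluated chunk by chunk
require(ffbase)
data(iris, package = "datasets")
x <- as.ffdf(iris)
# take the first four characters of Species and store them in a new column
x$spec <- with(x[c("Species")], substr(Species, 1, 4))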
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported