disk.frame | Fast Disk-Based Parallelized Data Manipulation Framework

by xiaodaigh | R | Version: v0.6 | License: Non-SPDX

kandi X-RAY | disk.frame Summary

disk.frame is an R library typically used in Big Data, Numpy, and Spark applications. disk.frame has no bugs and no vulnerabilities, but it has low support and a Non-SPDX License. You can download it from GitHub.

How do I manipulate tabular data that doesn’t fit into Random Access Memory (RAM)? In a nutshell, {disk.frame} makes use of two simple ideas: (1) split a larger-than-RAM dataset into chunks and store each chunk in a separate file inside a folder, and (2) provide a convenient API to manipulate these chunks. {disk.frame} performs a similar role to distributed systems such as Apache Spark, Python’s Dask, and Julia’s JuliaDB.jl for medium data, that is, datasets that are too large for RAM but not quite large enough to qualify as big data.
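
The chunk-per-file idea can be illustrated with a minimal base R sketch. This is a conceptual stand-in only, not disk.frame's implementation: the folder name, the `.rds` file format, and the number of chunks are all assumptions chosen for illustration.

```r
# Conceptual sketch in base R; NOT how disk.frame is implemented internally.
# Step 1: split a table into chunks and store each chunk as its own file.
chunk_dir <- file.path(tempdir(), "cars_chunks")  # hypothetical folder name
dir.create(chunk_dir, showWarnings = FALSE)

n_chunks <- 4                                      # arbitrary choice
chunk_id <- rep_len(seq_len(n_chunks), nrow(mtcars))
for (i in seq_len(n_chunks)) {
  # .rds is used here as a stand-in storage format
  saveRDS(mtcars[chunk_id == i, ], file.path(chunk_dir, paste0(i, ".rds")))
}

# Step 2: process one chunk at a time, so only one chunk is in RAM at once,
# then combine the small per-chunk results.
chunk_files <- list.files(chunk_dir, pattern = "\\.rds$", full.names = TRUE)
partial_sums <- vapply(chunk_files,
                       function(f) sum(readRDS(f)$mpg),
                       numeric(1))
total_mpg <- sum(partial_sums)
```

The combined result matches what `sum(mtcars$mpg)` would give on the in-memory table, which is the property a chunked framework has to preserve for each operation it supports.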
            kandi-support Support

              disk.frame has a low active ecosystem.
              It has 574 star(s) with 38 fork(s). There are 21 watchers for this library.
              It had no major release in the last 12 months.
              There are 89 open issues and 128 have been closed. On average issues are closed in 103 days. There are 3 open pull requests and 0 closed requests.
              It has a neutral sentiment in the developer community.
              The latest version of disk.frame is v0.6

            kandi-Quality Quality

              disk.frame has 0 bugs and 0 code smells.

            kandi-Security Security

              disk.frame has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              disk.frame code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            kandi-License License

              disk.frame has a Non-SPDX License.
              A Non-SPDX license may be an open-source license that is not SPDX-compliant, or it may be a non-open-source license. Review it closely before use.

            kandi-Reuse Reuse

              disk.frame releases are available to install and integrate.
              Installation instructions, examples and code snippets are available.
              It has 55320 lines of code, 0 functions and 126 files.
              It has low code complexity. Code complexity directly impacts maintainability of the code.

            Community Discussions

            QUESTION

            How to import the data in disk.frame folder into R environment
            Asked 2022-Mar-28 at 10:14

            There is a folder 'C:\tmp_flights.df' that was created by the disk.frame package. How do I import the data into the R environment again? Thanks!

            The code below created the disk.frame folder:

            ...

            ANSWER

            Answered 2022-Mar-28 at 10:14

            The function disk.frame reads in an existing disk.frame folder.
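
As a minimal sketch of that round trip, assuming the disk.frame package is installed: the folder path and the use of `mtcars` below are illustrative assumptions, not the questioner's data.

```r
# Sketch only; the block is skipped if disk.frame is not installed.
if (requireNamespace("disk.frame", quietly = TRUE)) {
  library(disk.frame)
  setup_disk.frame(workers = 1)  # one worker is enough for a demo

  # Hypothetical folder standing in for 'C:\tmp_flights.df'
  path <- file.path(tempdir(), "tmp_cars.df")
  as.disk.frame(mtcars, outdir = path, overwrite = TRUE)  # writes the chunks

  # Later (for example in a new R session): attach to the existing folder.
  cars_df <- disk.frame(path)

  # The data stays on disk until it is explicitly collected into RAM.
  cars_in_ram <- dplyr::collect(cars_df)
  print(nrow(cars_in_ram))
}
```

Calling `disk.frame(path)` only creates a reference to the folder, so it is cheap even for large data; `collect()` is the step that loads everything and should be used with care on data that does not fit in memory.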

            Source https://stackoverflow.com/questions/71645558

            QUESTION

            Problem with non-standard evaluation in disk.frame objects using data.table syntax
            Asked 2022-Feb-01 at 17:47
            Problem

            I'm currently trying to write a function that filters some rows of a disk.frame object using regular expressions. I, unfortunately, run into some issues with the evaluation of my search string in the filter function. My idea was to pass a regular expression as a string into a function argument (e.g. storm_name) and then pass that argument into my filtering call. I used the %like% function included in {data.table} for filtering rows.

            My problem is that the storm_name object gets evaluated inside the disk.frame. However, since the storm_name is only included in the function environment, but not in the disk.frame object, I get the following error:

            ...

            ANSWER

            Answered 2022-Jan-20 at 17:38

            While I don't know the exact cause of this, it has to do with environments, search path, etc. For instance, these work:

            Source https://stackoverflow.com/questions/70788596

            QUESTION

            How can I input a single additional parameter to disk.frame's inmapfn at readin?
            Asked 2022-Feb-01 at 11:38

            According to the article https://diskframe.com/articles/ingesting-data.html a good use case for inmapfn as part of csv_to_disk_frame(...) is date conversion. In my data, the name of the date column is only known at runtime, and I would like to pass that name to a function that converts the column at read-in time. One issue I am having is that it doesn't seem possible to pass any additional parameters to the inmapfn argument beyond the chunk itself. I can't hardcode the column name, because it isn't known until runtime.

            To clarify, the issue is that inmapfn seems to run in its own environment to prevent data races and other parallelisation issues. I know the variable won't be changed, so I am hoping there is some way to override this, as I can make sure that it is safe.

            I know the function I am calling works when called on an arbitrary dataframe.

            I have provided a reproducible example below.

            ...

            ANSWER

            Answered 2021-Oct-14 at 16:51

            You can experiment with different backend and chunk_reader arguments. For example, if you set the backend to readr, the user-defined inmapfn function will have access to previously defined variables. Furthermore, readr guesses column types and will automatically parse a column as Date if it recognizes the string format as a date (in your example data it wouldn't recognize the column as a date type, however).

            If you don't want to use the readr backend for performance reasons, then I would ask if your example correctly represents your actual scenario? I'm not seeing the need to pass in the date column as a variable in the example you provided.

            There is a working solution in the Just-in-time transformation section of the link you provided, and I'm not seeing any added complexities between that example and yours.

            If you really need to use the default backend and chunk_reader plan AND you really need to send the inmapfn function a previously defined variable, you can wrap the csv_to_disk.frame call in a wrapper function:
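
One way to sketch the wrapper idea is with a closure that captures the column name. The helper names `make_date_inmapfn` and `read_with_dates`, the date format, and the demo data are all hypothetical, and this variant has not been checked against the questioner's setup or every backend.

```r
# Hypothetical helper: builds an inmapfn that carries date_col with it.
make_date_inmapfn <- function(date_col, format = "%Y-%m-%d") {
  force(date_col)  # evaluate now so the closure holds the actual values
  force(format)
  function(chunk) {
    chunk[[date_col]] <- as.Date(chunk[[date_col]], format = format)
    chunk
  }
}

# The returned function behaves like any single-argument chunk function,
# shown here on an ordinary data.frame:
demo <- data.frame(when = c("2021-10-14", "2021-10-15"), x = 1:2,
                   stringsAsFactors = FALSE)
converted <- make_date_inmapfn("when")(demo)

# Hypothetical wrapper around the ingestion call (assumes disk.frame is
# installed; defined here but not called in this sketch).
read_with_dates <- function(infile, outdir, date_col) {
  disk.frame::csv_to_disk.frame(
    infile,
    outdir = outdir,
    inmapfn = make_date_inmapfn(date_col)
  )
}
```

Because the column name lives in the closure's own environment rather than in the global environment, it travels with the function when the function is shipped to a worker.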

            Source https://stackoverflow.com/questions/69532122

            QUESTION

            CSV to disk frame with multiple CSVs
            Asked 2020-Sep-20 at 06:09

            I'm getting this error when trying to import CSVs using this code:

            some.df = csv_to_disk.frame(list.files("some/path"))

            Error in split_every_nlines(name_in = normalizePath(file, mustWork = TRUE), : Expecting a single string value: [type=character; extent=3].

            I got a temporary solution with a for loop that iterated through each of the files and then I rbinded all the disk frames together.

            I pulled the code from the ingesting data doc

            ...

            ANSWER

            Answered 2020-Sep-20 at 06:09

            This seems to be an error triggered by the bigreadr package. I wonder if you have a way to reproduce the chunks.

            Or maybe try a different chunk reader,
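
A sketch of that suggestion follows, assuming disk.frame is installed and that `"data.table"` is an accepted value of the `chunk_reader` argument. The folder names and the two small CSV files are made up for the demo. Separately, `list.files()` returns bare file names unless `full.names = TRUE` is set, so the sketch passes full paths; whether that played a role in the original error is not known.

```r
# Create two small CSV files to ingest (illustrative data only).
csv_dir <- file.path(tempdir(), "some_csvs")  # hypothetical input folder
dir.create(csv_dir, showWarnings = FALSE)
write.csv(mtcars[1:16, ], file.path(csv_dir, "a.csv"), row.names = FALSE)
write.csv(mtcars[17:32, ], file.path(csv_dir, "b.csv"), row.names = FALSE)

# full.names = TRUE yields paths that are valid from any working directory.
csv_files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Skipped if disk.frame is not installed.
if (requireNamespace("disk.frame", quietly = TRUE)) {
  library(disk.frame)
  setup_disk.frame(workers = 1)

  some.df <- csv_to_disk.frame(
    csv_files,
    outdir = file.path(tempdir(), "some.df"),  # hypothetical output folder
    chunk_reader = "data.table",               # instead of the default reader
    overwrite = TRUE
  )
}
```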

            Source https://stackoverflow.com/questions/63960570

            QUESTION

            In format.default(nam.ob, width = max(ncn), justify = "left") : NAs introduced by coercion to integer range
            Asked 2020-Sep-19 at 01:32

            I have a disk frame that I've saved into a file. It's made up of ten chunks.

            I coded every one of the columns as a character because I intend on combining these individual disk frames into one large disk frame and setting the column types at that point.

            I wanted to pull the disk frame from its file with this code

            ...

            ANSWER

            Answered 2020-Sep-19 at 01:32

            In case someone gets the same error, it means that you have the wrong pathname.

            Source https://stackoverflow.com/questions/63964446

            QUESTION

            What's the best way to write a disk frame to CSV?
            Asked 2020-Sep-17 at 02:39

            I'm looking through the docs and I don't see a function for writing to CSV.

            It appears there's a function for writing the disk frame, but it's unclear what format it gets stored in

            write_disk.frame

            Write a data.frame/disk.frame to a disk.frame location. If df is a data.frame then using the as.disk.frame function is recommended for most cases

            Can I use fwrite or write_csv with a disk frame?

            ...

            ANSWER

            Answered 2020-Sep-17 at 02:39

            I see. I might add the write to csv functionality as I see this request quite often.

            The best way to keep track, though, is to submit an issue on GitHub: https://github.com/xiaodaigh/disk.frame/issues. I have done that this time; see https://github.com/xiaodaigh/disk.frame/issues/311

            If you want to write each chunk to a separate CSV just do
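
One possible way to do this, sketched under the assumption that disk.frame and data.table are installed, is to loop over the chunks and write each with `data.table::fwrite`. The folder names, file naming scheme, and demo data are illustrative, and this is not necessarily the code from the original answer.

```r
# Sketch only; the block is skipped if disk.frame is not installed.
if (requireNamespace("disk.frame", quietly = TRUE)) {
  library(disk.frame)
  setup_disk.frame(workers = 1)

  # Small demo disk.frame with two chunks (hypothetical folder name).
  df <- as.disk.frame(mtcars,
                      outdir = file.path(tempdir(), "cars_csv.df"),
                      nchunks = 2, overwrite = TRUE)

  out_dir <- file.path(tempdir(), "cars_csv_out")  # hypothetical output folder
  dir.create(out_dir, showWarnings = FALSE)

  # Load one chunk at a time and write it to its own CSV file.
  for (i in seq_len(nchunks(df))) {
    chunk <- get_chunk(df, i)
    data.table::fwrite(chunk, file.path(out_dir, paste0("chunk_", i, ".csv")))
  }
}
```

Only one chunk is held in memory at a time, so the loop stays within RAM even when the full dataset does not.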

            Source https://stackoverflow.com/questions/63840852

            QUESTION

            How do I read a disk frame that's already been saved?
            Asked 2020-Sep-11 at 19:37

            I saved a disk frame to its output directory and then restarted my R session.

            I'd like to read the existing disk frame instead of recreating it elsewhere.

            How might I be able to accomplish this? My folder is called outdir.df

            This is how I saved the disk frame

            ...

            ANSWER

            Answered 2020-Sep-11 at 19:37

            I think disk.frame's preferred method is to open a reference to the disk location, using

            Source https://stackoverflow.com/questions/63850235

            QUESTION

            How do I count unique entities with disk.frame in R?
            Asked 2020-Sep-09 at 00:59

            I'd like to convert a data frame to a disk frame and then count the first column. It's not counting the number of unique values of the column when I try it. It appears to be counting the number of workers.

            ...

            ANSWER

            Answered 2020-Sep-09 at 00:59

            {disk.frame} only supports some group-by functions. You can use dplyr::n_distinct
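
A minimal sketch of a grouped distinct count is shown below, assuming disk.frame and dplyr are installed and that the installed disk.frame version supports `n_distinct` inside `summarise`. The demo data, column choices, and folder name are illustrative; a base R reference computation is included for comparison.

```r
# Base R reference: number of distinct gear values per cyl group.
reference <- tapply(mtcars$gear, mtcars$cyl, function(x) length(unique(x)))

# Skipped if disk.frame is not installed.
if (requireNamespace("disk.frame", quietly = TRUE)) {
  library(disk.frame)
  library(dplyr)
  setup_disk.frame(workers = 1)

  cars_df <- as.disk.frame(mtcars,
                           outdir = file.path(tempdir(), "cars_distinct.df"),
                           nchunks = 2, overwrite = TRUE)

  # n_distinct must be combined across chunks, which is why only
  # functions the package knows how to combine can be used here.
  result <- cars_df %>%
    group_by(cyl) %>%
    summarise(n_gear = n_distinct(gear)) %>%
    collect()
  print(result)
}
```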

            Source https://stackoverflow.com/questions/63782007

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install disk.frame

            You can install the released version of {disk.frame} from CRAN with:
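
The standard CRAN install call would look like this, assuming the package is currently available on CRAN for your R version:

```r
# Installs disk.frame and its dependencies from the configured CRAN mirror.
install.packages("disk.frame")
```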

            Support

            Do you need help with machine learning and data science in R, Python, or Julia? I am available for Machine Learning/Data Science/R/Python/Julia consulting! Email me.