hickle | a HDF5-based python pickle replacement
kandi X-RAY | hickle Summary
Hickle is an HDF5-based clone of pickle, with a twist: instead of serializing to a pickle file, Hickle dumps to an HDF5 file (Hierarchical Data Format). It is designed as a "drop-in" replacement for pickle (for common data objects), but is really an amalgam of h5py and dill/pickle with extended functionality. In other words, hickle is a neat little way of dumping Python variables to HDF5 files that can be read in most programming languages, not just Python. Hickle is fast and allows transparent compression of your data (LZF/GZIP).
Top functions reviewed by kandi - BETA
- Dump a Python object to a file
- Dump a py_obj to HDF5
- Create an hdf5 file opener
- Test if f is an io-like object
- Load NDMasked Dataset
- Convert to array
- Load an ndarray Dataset
- Get the type and data of a node
- Create a set-like dataset from an object
- Create a dataset for a list-like object
- Recover a custom dataset
- Load the python dtype dataset
- Register a class
- Yields all reference types from the parent
- Load a sparse matrix data
- Register a list of classes
- Load an astropy quantity dataset
- Load a numpy dtype dataset
- Load an astropy constant dataset
- Load an astropy Angle dataset
- Load a numpy array from h_node
- Load an astropy SkyCoord dataset
- Load a string from a hickle file
- Helper function to create a dataset
- Load an astropy time dataset
- Load an astropy table from a h5 node
hickle Key Features
hickle Examples and Code Snippets
Community Discussions
Trending Discussions on hickle
QUESTION
I have a large data set.
The best I could achieve is to use numpy arrays, make a binary file out of them, and then compress it:
...ANSWER
Answered 2020-Apr-15 at 13:52
An array of 46800 x 4 x 18 8-byte floats takes up 26956800 bytes. That's 25.7 MiB or 27.0 MB. A compressed size of 22 MB is an 18% (or 14% if you really meant MiB) reduction, which is pretty good by most standards, especially for random binary data. You are unlikely to improve on that much. Using a smaller datatype such as float32, or perhaps trying to represent your data as rationals, may be useful.
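The arithmetic above can be checked directly (numpy assumed available):

```python
import numpy as np

arr = np.zeros((46800, 4, 18), dtype=np.float64)
print(arr.nbytes)              # 26956800 bytes
print(arr.nbytes / 2**20)      # ~25.7 MiB
print(1 - 22e6 / arr.nbytes)   # ~0.18, i.e. ~18% size reduction at 22 MB

# Halving the datatype width halves the raw size before any compression.
print(arr.astype(np.float32).nbytes)  # 13478400 bytes
```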
Since you mention that you want to store metadata, you can record a byte for the number of dimensions (numpy allows at most 32 dimensions) and N integers for the size in each dimension (either 32 or 64 bit). Say you use 64-bit integers: that makes for 25 bytes of metadata in your particular case, or roughly 10⁻⁴ % of the total array size.
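As a sketch of that header layout (the exact field widths are an assumption: one unsigned byte for the dimension count plus one little-endian 64-bit integer per dimension):

```python
import struct

shape = (46800, 4, 18)

# 'B' = unsigned byte for ndim, 'q' = signed 64-bit integer per dimension.
header = struct.pack("<B", len(shape)) + struct.pack(f"<{len(shape)}q", *shape)
print(len(header))  # 25 bytes of metadata for a 3-D array

# Reading the header back:
ndim = header[0]
dims = struct.unpack(f"<{ndim}q", header[1:1 + 8 * ndim])
print(dims)  # (46800, 4, 18)
```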
QUESTION
I have to save a significantly large Python data structure, consisting of lists and dictionaries, into a MySQL database, but I get a memory exception during the save operation.
I have already benchmarked the saving operation and also tried different ways of dumping the data, including binary format but all methods seemed to consume a lot of memory. Benchmarks below:
MAX MEMORY USAGE DURING JSON SAVE: 966.83 MB
SIZE AFTER DUMPING: json 81.03 MB, pickle 66.79 MB, msgpack 33.83 MB
COMPRESSION TIME: json 5.12 s, pickle 11.17 s, msgpack 0.27 s
DECOMPRESSION TIME: json 2.57 s, pickle 1.66 s, msgpack 0.52 s
COMPRESSION MAX MEMORY USAGE: json 840.84 MB, pickle 1373.30 MB, msgpack 732.67 MB
DECOMPRESSION MAX MEMORY USAGE: json 921.41 MB, pickle 1481.25 MB, msgpack 1006.12 MB
msgpack seems to be the most performant library, but its decompression also takes up a lot of memory. I also tried hickle, which is said to consume little memory, but the final size ended up being 800 MB.
Does anyone have a suggestion? Should I just increase the memory limit? Can MongoDB handle the save operation with less memory?
Find the stack trace below:
...ANSWER
Answered 2020-Apr-06 at 21:54
In essence, here is how I would do it to reduce memory consumption and improve performance:
- Load the JSON file (no way to stream it in Python AFAIK)
- Chunk the array of dictionaries into smaller chunks
- Convert each chunk into objects
- Call bulk_create
- Garbage-collect after every loop iteration
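The steps above can be sketched as follows (`bulk_create` is Django's ORM call; the `save_chunk` callback and `import_records` helper here are hypothetical stand-ins for it):

```python
import gc
import json

def chunks(seq, size):
    """Yield successive slices of `seq` of length `size`."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def import_records(path, save_chunk, chunk_size=1000):
    # json.load reads the whole file at once; the stdlib has no streaming parser.
    with open(path) as f:
        records = json.load(f)
    for chunk in chunks(records, chunk_size):
        objs = [dict(r) for r in chunk]  # stand-in for model-object conversion
        save_chunk(objs)                 # e.g. Model.objects.bulk_create(objs)
        del objs
        gc.collect()                     # free the chunk's memory each iteration
```

Keeping only one chunk's worth of model objects alive at a time is what bounds the peak memory; the explicit `gc.collect()` just makes the release prompt.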
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install hickle
- Make sure Python 3.5 or above is installed.
- Install h5py (official page: http://docs.h5py.org/en/latest/build.html).
- Install hdf5 (official page: http://www.hdfgroup.org/ftp/HDF5/current/src/unpacked/release_docs/INSTALL).
- Download hickle, either via terminal (git clone https://github.com/telegraphic/hickle.git) or manually (go to https://github.com/telegraphic/hickle and use the Download ZIP option on the right-hand side).
- cd into the downloaded hickle directory.
- Run the following command in the hickle directory: python setup.py install