hickle | a HDF5-based python pickle replacement
kandi X-RAY | hickle Summary
Hickle is an HDF5-based clone of pickle, with a twist: instead of serializing to a pickle file, Hickle dumps to an HDF5 file (Hierarchical Data Format). It is designed as a "drop-in" replacement for pickle (for common data objects), but is really an amalgam of h5py and dill/pickle with extended functionality. In other words, hickle is a neat little way of dumping Python variables to HDF5 files that can be read in most programming languages, not just Python. Hickle is fast and allows transparent compression of your data (LZF/GZIP).
Top functions reviewed by kandi - BETA
- Dump a Python object to a file
- Dump a py_obj to HDF5
- Create an hdf5 file opener
- Test if f is an io-like object
- Load NDMasked Dataset
- Convert to array
- Load an ndarray Dataset
- Get the type and data of a node
- Create a set-like dataset from an object
- Create a dataset for a list-like object
- Recover a custom dataset
- Load the python dtype dataset
- Register a class
- Yields all reference types from the parent
- Load a sparse matrix data
- Register a list of classes
- Load an astropy quantity dataset
- Load a numpy dtype dataset
- Load an astropy constant dataset
- Load an astropy Angle dataset
- Load a numpy array from h_node
- Load an astropy SkyCoord dataset
- Load a string from a hickle file
- Helper function to create a dataset
- Load an astropy time dataset
- Load an astropy table from a h5 node
hickle Key Features
hickle Examples and Code Snippets
Community Discussions
Trending Discussions on hickle
QUESTION
I have a large data set.
The best I could achieve is to use numpy arrays, make a binary file out of them, and then compress it:
...ANSWER
Answered 2020-Apr-15 at 13:52
An array of 46800 x 4 x 18 8-byte floats takes up 26956800 bytes. That's 25.7 MiB or 27.0 MB. A compressed size of 22 MB is an 18% (or 14% if you really meant MiB) reduction, which is pretty good by most standards, especially for random binary data. You are unlikely to improve on that much. Using a smaller datatype such as float32, or perhaps trying to represent your data as rationals, may be useful.
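The arithmetic above can be checked directly (numpy assumed available):

```python
import numpy as np

arr = np.zeros((46800, 4, 18), dtype=np.float64)
print(arr.nbytes)              # 26956800 bytes
print(arr.nbytes / 2**20)      # ~25.7 MiB
print(1 - 22e6 / arr.nbytes)   # ~0.18, i.e. ~18% size reduction at 22 MB

# Halving the datatype width halves the raw size before any compression.
print(arr.astype(np.float32).nbytes)  # 13478400 bytes
```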
Since you mention that you want to store metadata, you can record a byte for the number of dimensions (numpy allows at most 32 dimensions) and N integers for the size in each dimension (either 32 or 64 bit). Say you use 64-bit integers: that makes for 25 bytes of metadata in your particular case, or roughly 10⁻⁴ % of the total array size.
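As a sketch of that header layout (the exact field widths are an assumption: one unsigned byte for the dimension count plus one little-endian 64-bit integer per dimension):

```python
import struct

shape = (46800, 4, 18)

# 'B' = unsigned byte for ndim, 'q' = signed 64-bit integer per dimension.
header = struct.pack("<B", len(shape)) + struct.pack(f"<{len(shape)}q", *shape)
print(len(header))  # 25 bytes of metadata for a 3-D array

# Reading the header back:
ndim = header[0]
dims = struct.unpack(f"<{ndim}q", header[1:1 + 8 * ndim])
print(dims)  # (46800, 4, 18)
```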
QUESTION
I have to save a significantly large Python data structure, consisting of lists and dictionaries, into a MySQL database, but I get a memory exception during the save operation.
I have already benchmarked the saving operation and also tried different ways of dumping the data, including binary format but all methods seemed to consume a lot of memory. Benchmarks below:
MAX MEMORY USAGE DURING JSON SAVE: 966.83 MB
SIZE AFTER DUMPING: json 81.03 MB, pickle 66.79 MB, msgpack 33.83 MB
COMPRESSION TIME: json 5.12 s, pickle 11.17 s, msgpack 0.27 s
DECOMPRESSION TIME: json 2.57 s, pickle 1.66 s, msgpack 0.52 s
COMPRESSION MAX MEMORY USAGE: json 840.84 MB, pickle 1373.30 MB, msgpack 732.67 MB
DECOMPRESSION MAX MEMORY USAGE: json 921.41 MB, pickle 1481.25 MB, msgpack 1006.12 MB
msgpack seems to be the most performant library, but its decompression also takes up a lot of memory. I also tried hickle, which is said to consume little memory, but the final size ended up being 800 MB.
Does anyone have a suggestion? Should I just increase the memory limit? Can MongoDB handle the save operation with less memory?
Find the stack trace below:
...ANSWER
Answered 2020-Apr-06 at 21:54
In essence, here is how I would do it to reduce memory consumption and improve performance:
- Load the JSON file (no way to stream it in Python AFAIK)
- Chunk the array of dictionaries into smaller chunks
- Convert each chunk into objects
- Call bulk_create
- Garbage-collect after every loop iteration
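The steps above can be sketched as follows (`bulk_create` is Django's ORM call; the `save_chunk` callback and `import_records` helper here are hypothetical stand-ins for it):

```python
import gc
import json

def chunks(seq, size):
    """Yield successive slices of `seq` of length `size`."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def import_records(path, save_chunk, chunk_size=1000):
    # json.load reads the whole file at once; the stdlib has no streaming parser.
    with open(path) as f:
        records = json.load(f)
    for chunk in chunks(records, chunk_size):
        objs = [dict(r) for r in chunk]  # stand-in for model-object conversion
        save_chunk(objs)                 # e.g. Model.objects.bulk_create(objs)
        del objs
        gc.collect()                     # free the chunk's memory each iteration
```

Keeping only one chunk's worth of model objects alive at a time is what bounds the peak memory; the explicit `gc.collect()` just makes the release prompt.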
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install hickle
- Make sure Python 3.5 or above is installed.
- Install h5py (official page: http://docs.h5py.org/en/latest/build.html).
- Install hdf5 (official page: http://www.hdfgroup.org/ftp/HDF5/current/src/unpacked/release_docs/INSTALL).
- Download hickle, either via terminal (git clone https://github.com/telegraphic/hickle.git) or manually (go to https://github.com/telegraphic/hickle and use the Download ZIP option on the right-hand side).
- cd into the downloaded hickle directory.
- Run the following command in the hickle directory: python setup.py install