pystore | Fast data store for Pandas time-series data | Database library
kandi X-RAY | pystore Summary
Fast data store for Pandas time-series data
Top functions reviewed by kandi - BETA
- Wrapper for write
- Write data to an item
- List items in the datastore
- Read metadata
- Write metadata to path
- Convert datetime index to int64
- Convert to pandas DataFrame
- Append data to item
- Create a snapshot
- List all available snapshots
- Make a Path from arguments
- Return list of subdirectories
- Set the default path
- Return a Path instance
- Return the index of the given item
- Return the path for an item
- Delete a snapshot
- Delete all snapshots
- Delete an item
- Return a list of all store directories
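The entries above are internal helpers picked up by kandi's scan; in everyday use they are reached through pystore's store / collection / item API. A minimal usage sketch follows, based on the project's documented interface; the store, collection, and item names are illustrative, not anything mandated by the library.

import pandas as pd
import pystore

# "Set the default path": tell pystore where to keep its files on disk
pystore.set_path("~/pystore")

# Open (or create) a datastore and a collection inside it
store = pystore.store("mydatastore")
collection = store.collection("NASDAQ")

# "Write data to an item": store a DataFrame with a datetime index, plus metadata
df = pd.DataFrame(
    {"close": [101.2, 102.5]},
    index=pd.to_datetime(["2019-01-01", "2019-01-02"]),
)
collection.write("AAPL", df, metadata={"source": "example"}, overwrite=True)

# "List items in the datastore" and "Read metadata"
print(collection.list_items())
item = collection.item("AAPL")
print(item.metadata)

# "Convert to pandas DataFrame": item.data is a Dask dataframe under the hood
restored = item.to_pandas()

# "Append data to item": add newer rows to the existing item
new_row = pd.DataFrame({"close": [103.0]}, index=pd.to_datetime(["2019-01-03"]))
collection.append("AAPL", new_row)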
pystore Key Features
pystore Examples and Code Snippets
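Continuing the usage sketch shown under the function list above, the snapshot-related functions (create, list, read from, and delete a snapshot) can be exercised as below. This assumes the "AAPL" item from the previous sketch already exists, and the snapshot name is illustrative.

import pystore

pystore.set_path("~/pystore")
collection = pystore.store("mydatastore").collection("NASDAQ")

# "Create a snapshot": freeze the current state of the collection under a name
collection.create_snapshot("eod-2019-01-03")

# "List all available snapshots"
print(collection.list_snapshots())

# Read an item as it existed when the snapshot was taken
snap_df = collection.item("AAPL", snapshot="eod-2019-01-03").to_pandas()
print(snap_df.tail())

# "Delete a snapshot" / "Delete all snapshots" / "Delete an item"
collection.delete_snapshot("eod-2019-01-03")
# collection.delete_snapshots()
# collection.delete_item("AAPL")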
Community Discussions
Trending Discussions on pystore
QUESTION
I'm developing a Django app. Use-case scenario is this:
50 users, each of whom can store up to 300 time series, with each time series holding around 7,000 rows.
Each user can ask at any time to retrieve all of their 300 time series and, for each of them, to perform some advanced data analysis on the last N rows. The analysis cannot be done in SQL, only in Pandas, where it doesn't take much time... but retrieving 300,000 rows into separate dataframes does!
Users can also ask for the results of some analysis that can be performed in SQL (like aggregation + sum by date), and that is considerably faster (to the point where I wouldn't be writing this post if that were all of it).
After browsing and thinking it over, I've figured that storing time series in SQL is not a good solution (read here).
Ideal deploy architecture looks like this (each bucket is a separate server!):
Problem: time series in SQL are too slow to retrieve in a multi-user app.
Researched solutions (from this article):
Arctic: https://github.com/manahl/arctic
Here are some problems:
1) Although these solutions are massively faster at pulling a time series of millions of rows into a single dataframe, I might need to pull around 500,000 rows into 300 different dataframes. Would that still be as fast?
This is the current db structure I'm using:
...
ANSWER
Answered 2019-Sep-04 at 18:26
The article you refer to in your post is probably the best answer to your question. It is clearly good research, and a few good solutions are being proposed (don't forget to take a look at InfluxDB).
Regarding the decoupling of the storage solution from your instances, I don't see the problem:
- Arctic uses MongoDB as a backing store
- pyStore uses a file system as a backing store
- InfluxDB is a database server on its own
So as long as you decouple the backing store from your instances and share it among them, you'll have the same setup as for your PostgreSQL database: MongoDB or InfluxDB can run on a separate centralised instance, and the file storage for pyStore can be shared, e.g. using a shared mounted volume. The Python libraries that access these stores of course run on your Django instances, just as psycopg2 does.
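To make the shared-backing-store point concrete, here is a minimal sketch of how a Django project might point pystore at a volume mounted on every application instance. The mount path, store name, and helper function are assumptions for illustration, not part of the answer above.

import pystore

# Hypothetical mount point: the same network volume is mounted on every
# Django instance, so they all see one shared pystore backing store,
# just as they would all connect to one central PostgreSQL/MongoDB server.
SHARED_PYSTORE_PATH = "/mnt/shared/pystore"

def get_collection(name: str = "timeseries"):
    """Open the shared datastore; call this from Django views or background tasks."""
    pystore.set_path(SHARED_PYSTORE_PATH)
    return pystore.store("appdata").collection(name)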
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install pystore
You can use pystore like any standard Python library. You will need a development environment with a Python distribution (including header files), a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changes to the system.
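The package itself installs from PyPI under the name pystore (pip install pystore); the project's documentation has also noted a dependency on the Snappy compression library (python-snappy) for its Parquet storage. Once installed, a short Python check like the following confirms that the library imports and can create a store; the path and store name are illustrative.

import pystore

# Point pystore at a writable directory (any path works; this one is illustrative)
pystore.set_path("~/pystore")

# A fresh install reports no stores yet
print(pystore.list_stores())

# Creating a store also creates its directory on disk
store = pystore.store("smoketest")
print(pystore.list_stores())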