PyTables | A Python package to manage extremely large amounts of data

by PyTables · Python · Version: v3.8.0 · License: BSD-3-Clause

kandi X-RAY | PyTables Summary

PyTables is a Python library typically used in Analytics and Data Visualization applications. PyTables has no bugs or vulnerabilities reported, has a build file available, is released under a permissive license, and has medium support. You can install it using 'pip install PyTables' or download it from GitHub or PyPI.

A Python package to manage extremely large amounts of data

Support

PyTables has a medium-active ecosystem.
It has 1178 stars, 238 forks, and 59 watchers.
It had no major release in the last 12 months.
There are 148 open issues and 515 closed issues; on average, issues are closed in 459 days. There are 7 open pull requests and 0 closed pull requests.
It has a neutral sentiment in the developer community.
The latest version of PyTables is v3.8.0.

Quality

              PyTables has no bugs reported.

Security

              PyTables has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

              PyTables is licensed under the BSD-3-Clause License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

              PyTables releases are available to install and integrate.
              Deployable package is available in PyPI.
              Build file is available. You can build the component from source.


            PyTables Key Features

            No Key Features are available at this moment for PyTables.

            PyTables Examples and Code Snippets

            What's new in 1.2.0 (December 26, 2020)
Python · Lines of Code: 646 · License: Permissive (BSD-3-Clause)
            
            .. _whatsnew_120.duplicate_labels:
            
            Optionally disallow duplicate labels
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            
            :class:`Series` and :class:`DataFrame` can now be created with ``allows_duplicate_labels=False`` flag to
            control whether the index or colu  
            Version 0.15.0 (October 18, 2014)
Python · Lines of Code: 565 · License: Permissive (BSD-3-Clause)
            
            .. _whatsnew_0150.cat:
            
            Categoricals in Series/DataFrame
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            
            :class:`~pandas.Categorical` can now be included in ``Series`` and ``DataFrames`` and gained new
            methods to manipulate. Thanks to Jan Schulz for much of this   
            What's new in 0.24.0 (January 25, 2019)
Python · Lines of Code: 468 · License: Permissive (BSD-3-Clause)
            
            .. _whatsnew_0240.enhancements.intna:
            
            Optional integer NA support
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
            
            pandas has gained the ability to hold integer dtypes with missing values. This long requested feature is enabled through the use of :ref:`extension types  
            How to merge pandas dataframes with different column names
Python · Lines of Code: 20 · License: Strong Copyleft (CC BY-SA 4.0)
            df2.columns = df2.columns.str[0]
            df3.columns = df3.columns.str[0]
            out = pd.concat([df1, df2, df3])
            
            out = pd.concat([df1, df2.rename(columns=lambda x:x[0]), df3.rename(columns=lambda x:x[0])])
            
            How to merge pandas dataframes with different column names
Python · Lines of Code: 16 · License: Strong Copyleft (CC BY-SA 4.0)
            out = pd.DataFrame(np.concatenate([df1.values,df2.values,df3.values]),columns=df1.columns)
            Out[346]: 
                  A    B    C    D
            0    A0   B0   C0   D0
            1    A1   B1   C1   D1
            2    A2   B2   C2   D2
            3    A3   B3   C3   D3
            4    A4   B4   C4   D4
            How would I go about finding the most common substring in a file
Python · Lines of Code: 8 · License: Strong Copyleft (CC BY-SA 4.0)
>>> import collections
>>> data = "the quick brown fox jumps over the lazy dog"
>>> c = collections.Counter(data[i:i+2] for i in range(len(data)-2))
>>> max(c, key=c.get)
'th'
>>> c = collections.Counter(data[i:i+3] for i in r
            Pandas merge stop at first match like vlookup instead of duplicating
Python · Lines of Code: 7 · License: Strong Copyleft (CC BY-SA 4.0)
            merged = pd.merge(pos, mat_grp.drop_duplicates('matgrp'), how='left', on='matgrp')
            
                   PO matgrp      commodity
            0  123456   1001    foo - 10001
            1  654321   803A  spam - 100003
            2  971358   803B   eggs - 10003
            
            What could allow several entries into the database with identical composite keys?
Python · Lines of Code: 10 · License: Strong Copyleft (CC BY-SA 4.0)
            SELECT oid, relnamespace::regnamespace AS schema, relname, relkind
            FROM   pg_class
            WHERE  relname ILIKE 'contacts';
            
            REINDEX TABLE contacts;
            
            REINDEX INDEX contacts_pkey;  -- use actual name
            
            Populating sql table with foreign keys using django
Python · Lines of Code: 5 · License: Strong Copyleft (CC BY-SA 4.0)
            e = Event()
            ...
            e.nation = Nation.objects.get(name=nation_list[i])
            e.save()
            
            How to combine multiple rows into one column with pandas?
Python · Lines of Code: 2 · License: Strong Copyleft (CC BY-SA 4.0)
            df = df.assign(g = df.groupby('personal id').cumcount()).pivot('personal id','g','label')
            

            Community Discussions

            QUESTION

            Removing a table does not free disk space in pytables
            Asked 2021-May-20 at 16:52

            I have a table in pytables created as follows:

            ...

            ANSWER

            Answered 2021-May-20 at 16:48

Yes, that behavior is expected. Take a look at this answer for a more detailed example of the same behavior: How does HDF handle the space freed by deleted datasets without repacking. Note that the space will be reclaimed/reused if you add new datasets.

To reclaim the unused space in the file, you have to use a command-line utility. There are two choices, ptrepack and h5repack; both are used for a number of external file operations. To reduce file size after object deletion, create a new file from the old one as shown below:

            • ptrepack utility delivered with PyTables.
              • Reference here: PyTables ptrepack doc
              • Example: ptrepack file1.h5 file2.h5 (creates file2.h5 from file1.h5)
            • h5repack utility from The HDF Group.
              • Reference here: HDF5 h5repack doc
              • Example: h5repack [OPTIONS] file1.h5 file2.h5 (creates file2.h5 from file1.h5)

Both have options to use a different compression method when creating the new file, so they are also handy if you want to convert from compressed to uncompressed (or vice versa).
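
As a hedged illustration of that workflow (the file and node names here are made up, not taken from the original question):

import subprocess
import tables as tb

# Deleting a node frees the space inside the HDF5 file for reuse,
# but the file itself does not shrink.
with tb.open_file("data.h5", mode="a") as h5f:
    h5f.remove_node("/", "mytable")

# Rewrite the file with ptrepack to reclaim the space on disk.
subprocess.run(["ptrepack", "data.h5", "repacked.h5"], check=True)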

            Source https://stackoverflow.com/questions/67612684

            QUESTION

            How to keep hdf5 binary of a pandas dataframe in-memory?
            Asked 2021-May-18 at 12:59

            I would like to get the byte contents of a pandas dataframe exported as hdf5, ideally without actually saving the file (i.e., in-memory).

            On python>=3.6, < 3.9 (and pandas==1.2.4, pytables==3.6.1) the following used to work:

            ...

            ANSWER

            Answered 2021-May-18 at 12:59

            The fix was to do conda install -c conda-forge pytables instead of pip install pytables. I still don't understand the ultimate reason behind the error, though.
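
For reference, a minimal sketch of one way to get the HDF5 bytes without touching the disk, using PyTables' in-memory CORE driver (the DataFrame df and key name are assumed, and _handle is a private pandas attribute):

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

with pd.HDFStore(
    "in_memory.h5",
    mode="w",
    driver="H5FD_CORE",              # keep the HDF5 file in memory
    driver_core_backing_store=0,     # never write it to disk
) as store:
    store.put("df", df)
    hdf5_bytes = store._handle.get_file_image()  # raw bytes of the HDF5 file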

            Source https://stackoverflow.com/questions/67488733

            QUESTION

            Pandas to_hdf() TypeError: object of type 'int' has no len()
            Asked 2021-Apr-28 at 22:20

            I would like to store a pandas DataFrame such that when I later load it again, I only load certain columns of it and not the entire thing. Therefore, I am trying to store a pandas DataFrame in hdf format. The DataFrame contains a numpy array and I get the following error message.

            Any idea on how to get rid of the error or what format I could use instead?

            CODE:

            ...

            ANSWER

            Answered 2021-Apr-28 at 22:20

Pandas seems to have trouble serializing the numpy array in your dataframe, so I would suggest storing the numpy data in a separate *.h5 file.
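
A rough sketch of that workaround, with made-up column names: the array column is written to its own HDF5 file with PyTables, and the remaining columns go through to_hdf as usual.

import numpy as np
import pandas as pd
import tables as tb

df = pd.DataFrame({"id": [1, 2], "arr": [np.zeros(3), np.ones(3)]})

# Scalar columns are stored with pandas as usual.
df.drop(columns="arr").to_hdf("frame.h5", key="df", mode="w")

# The numpy data is stacked into one array and stored in a separate file.
with tb.open_file("arrays.h5", mode="w") as h5f:
    h5f.create_array("/", "arr", np.stack(df["arr"].to_numpy()))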

            Source https://stackoverflow.com/questions/67308374

            QUESTION

            Updating packages in conda
            Asked 2021-Apr-14 at 20:26

            I have a problem with updating packages in conda. The list of my installed packages is:

            ...

            ANSWER

            Answered 2021-Apr-14 at 20:26

Channel pypi means that the package was installed with pip. You may need to upgrade it with pip as well.

            Source https://stackoverflow.com/questions/67097308

            QUESTION

            Jupyter Notebook Cannot Connect to Kernel, Likely due to Zipline / AssertionError
            Asked 2021-Apr-12 at 04:17

All of my virtual environments work fine, except for one in which the Jupyter notebook won't connect to the kernel. This environment has Zipline in it, so I expect there is some dependency that is a problem there, even though I installed all packages with Conda.

I've read the question and answers here, and unfortunately downgrading tornado to 5.1.1 didn't work, nor do I get ValueErrors. I am, however, getting an AssertionError that appears related to the class NSProcessInfo.

            I'm on an M1 Mac. Log from terminal showing the error below, and my environment file is below that. Can someone help me get this kernel working? Thank you!

            ...

            ANSWER

            Answered 2021-Apr-04 at 18:14

            Figured it out.

            What works:

            Source https://stackoverflow.com/questions/66907180

            QUESTION

            "ImportError: No module named seaborn" in Azure ML
            Asked 2020-Oct-22 at 16:57

Created a new compute instance in Azure ML and trained a model without any issue. I wanted to draw a pairplot using seaborn, but I keep getting the error "ImportError: No module named seaborn".

I ran !conda list and I can see seaborn in the list.

            ...

            ANSWER

            Answered 2020-Sep-07 at 04:17

            I just did the following and wasn't able to reproduce your error:

            1. make a new compute instance
            2. open it up using JupyterLab
            3. open a new terminal
            4. conda activate azureml_py36
            5. conda install seaborn -y
            6. open a new notebook and run import seaborn as sns
            Spitballing
            1. Are you using the kernel, Python 3.6 - AzureML (i.e. the azureml_py36 conda env)?
            2. Have you tried restarting the kernel and/or creating a new compute instance?

            Source https://stackoverflow.com/questions/63770171

            QUESTION

            Effect of tilde on booleans — why ~True is -2 & ~False is -1 in Python?
            Asked 2020-Oct-14 at 18:02
            The problem

            I found that ~True is -2 & ~False is -1 using my Jupyter Notebook.

This source says that ~ inverts all the bits. Why isn't ~True simply False and ~False simply True?

            Reasoning attempts

            My attempt to explain these:

True is +1, and the bits of +1 are inverted. + is inverted to -. 1 in two-digit binary is 01, so the inverted bits are 10, i.e. 2. So the result is -2.

False is +0, + is inverted to -, 0 in two-digit binary is 00, and all the bits inverted gives 11, which is 3, but it should be 1.

            Sources

This answer paints a more complicated picture:

            A list full of Trues only contains 4- or 8-byte references to the one canonical True object.

            This source says:

            bool: Boolean (true/false) types. Supported precisions: 8 (default) bits.

            These don't support the simplistic (and apparently wrong) reasoning above.

            The question

            What is the proper explanation for ~True being -2 & ~False being -1 then?

            ...

            ANSWER

            Answered 2020-Aug-19 at 10:22

            First of all, I'd use the not operator to invert Boolean values (not True == False, and vice versa). Now if Booleans are stored as 8-bit integers, the following happens:

            True is 0000 0001. Hence ~True yields 1111 1110, which is -2 in two's-complement representation.

            False is 0000 0000. Hence ~False yields 1111 1111, which is -1.
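
A quick interpreter session illustrating the point:

>>> ~True        # True is the int 1; ~1 == -1 - 1
-2
>>> ~False       # False is the int 0; ~0 == -0 - 1
-1
>>> not True     # logical negation is what you usually want
False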

            Source https://stackoverflow.com/questions/63484690

            QUESTION

            Import rasterio failed. Reason: image not found
            Asked 2020-Sep-22 at 05:37

I'm going to use rasterio in Python. I downloaded rasterio via

            ...

            ANSWER

            Answered 2020-Sep-22 at 05:37

I've got some experience with rasterio, but I am not nearly a master with it. If I remember correctly, rasterio requires you to have installed the program GDAL (both binaries and Python utilities) and some other dependencies listed on the PyPI page. I don't use conda at the moment; I like to use the regular Python 3.8 installer with pip. Given what I'm seeing with your installation, I would uninstall rasterio and follow a different installation procedure.

            I follow the instructions listed here: https://rasterio.readthedocs.io/en/latest/installation.html
            This page also has separate instructions for those using Anaconda.

The GDAL installation is by far the most annoying, but once it's done, the hard part is over. The Python utilities for both rasterio and GDAL can be found here:
            https://www.lfd.uci.edu/~gohlke/pythonlibs/#gdal
            The second link is also provided on the PyPi page but I like to keep it bookmarked because there's a lot of good resources there!

            Source https://stackoverflow.com/questions/64002714

            QUESTION

            How to create a PyTables table to store a huge square matrix?
            Asked 2020-Aug-30 at 22:38

I'm trying to create a PyTables table to store a 200000 x 200000 matrix in it. I try this code:

            ...

            ANSWER

            Answered 2020-Aug-30 at 22:38

That's a big matrix (300 GB if all ints). Likely you will have to write it incrementally. (I don't have enough RAM on my system to do it all at once.)

Without seeing your data types, it's hard to give specific advice.
First question: do you really want to create a Table, or will an Array suffice? PyTables has both types. What's the difference?
An Array holds homogeneous data (like a NumPy ndarray) and can have any dimension. A Table is typically used to hold heterogeneous data (like a NumPy recarray) and is always 2d (really a 1d array of structured types). Tables also support complex queries with the PyTables API.

            The key when creating a Table is to either use the description= or obj= parameter to describe the structured types (and field names) for each row. I recently posted an answer that shows how to create a Table. Please review. You may find you don't want to create 200000 fields/columns to define the Table. See this answer: different data types for different columns of an array

            If you just want to save a matrix of 200000x200000 homogeneous entities, an array is easier. (Given the data size, you probably need to use an EArray, so you can write the data in increments.) I wrote a simple example that creates an EArray with 2000x200000 entities, then adds 3 more sets of data (each 2000 rows; total of 8000 rows).

• The shape=(0, ncols) parameter indicates the first axis can be extended, and creates ncols columns.
• The expectedrows=nrows parameter is important in large datasets to improve I/O performance.

            The resulting HDF5 file is 6GB. Repeat earr.append(arr) 99 times to get 200000 rows. Code below:
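
The original snippet was not carried over here, so the following is only a sketch of the approach described above (the sizes, dtype, and file name are assumptions):

import numpy as np
import tables as tb

ncols = 200_000          # matrix width
block_rows = 2_000       # rows written per append
nrows = 200_000          # total rows expected

with tb.open_file("big_matrix.h5", mode="w") as h5f:
    earr = h5f.create_earray(
        h5f.root, "matrix",
        atom=tb.Int32Atom(),
        shape=(0, ncols),        # first axis is extendable
        expectedrows=nrows,      # helps PyTables pick sensible chunking
    )
    arr = np.zeros((block_rows, ncols), dtype=np.int32)   # placeholder block
    for _ in range(4):           # 4 blocks of 2000 rows -> 8000 rows
        earr.append(arr)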

            Source https://stackoverflow.com/questions/63660279

            QUESTION

            How can I persistently store and efficiently access a very large 2D list in Python?
            Asked 2020-Aug-26 at 03:22

In Python, I'm reading in a very large 2D grid of data that consists of around 200,000,000 data points in total. Each data point is a tuple of 3 floats. Reading all of this data into a two-dimensional list frequently causes MemoryErrors. To get around this, I would like to be able to read this data into some sort of table on the hard drive that can be efficiently accessed when given a grid coordinate, e.g. harddrive_table.get(300, 42).

So far in my research, I've come across PyTables, which is an implementation of HDF5 and seems like overkill, and the built-in shelve library, which uses a dictionary-like method to access saved data, but the keys have to be strings, and the performance of converting hundreds of millions of grid coordinates to strings for storage could be too much of a performance hit for my use.

            Are there any libraries that allow me to store a 2D table of data on the hard drive with efficient access for a single data point?

This table of data is only needed while the program is running, so I don't care about its interoperability or how it stores the data on the hard drive, as it will be deleted after the program has run.

            ...

            ANSWER

            Answered 2020-Aug-26 at 03:22
            1. HDF5 isn't really overkill if it works. In addition to PyTables there's the somewhat simpler h5py.

            2. Numpy lets you mmap a file directly into a numpy array. The values will be stored in the disk file in the minimum-overhead way, with the numpy array shape providing the mapping between array indices and file offsets. mmap uses the same underlying OS mechanisms that power the disk cache to map a disk file into virtual memory, meaning that the whole thing can be loaded into RAM if memory permits, but parts can be flushed to disk (and reloaded later on demand) if it doesn't all fit at once.
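
A minimal sketch of option 2, assuming a grid of about 14000 x 14000 points (roughly 200 million) with 3 floats each; the file name and sizes are illustrative:

import numpy as np

rows, cols = 14_000, 14_000
table = np.memmap("grid.dat", dtype=np.float64, mode="w+", shape=(rows, cols, 3))

table[300, 42] = (1.5, 2.5, 3.5)     # write a single (x, y, z) point
point = table[300, 42]               # read it back without loading the whole file
table.flush()                        # push dirty pages to disk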

            Source https://stackoverflow.com/questions/63589215

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install PyTables

You can install it using 'pip install PyTables' or download it from GitHub or PyPI.
You can use PyTables like any standard Python library. You will need a development environment consisting of a Python distribution with header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changes to the system.
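
As a quick sanity check after installing, a minimal usage sketch (the file and array names are made up):

import numpy as np
import tables as tb

# Write a small array to an HDF5 file...
with tb.open_file("example.h5", mode="w") as h5f:
    h5f.create_array("/", "values", np.arange(10), title="demo array")

# ...and read it back.
with tb.open_file("example.h5", mode="r") as h5f:
    print(h5f.root.values[:])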

            Support

For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.
Find more information at:

            CLONE
          • HTTPS

            https://github.com/PyTables/PyTables.git

          • CLI

            gh repo clone PyTables/PyTables

          • sshUrl

            git@github.com:PyTables/PyTables.git
