ijson | efficient alternative to serde_json::Value | JSON Processing library

by Diggsey | Rust | Version: 0.1.3 | License: Apache-2.0

kandi X-RAY | ijson Summary

ijson is a Rust library typically used in Utilities and JSON Processing applications. It has no reported bugs or vulnerabilities, a permissive license, and low support activity. You can download it from GitHub.

This crate offers a replacement for serde-json's Value type, which is significantly more memory efficient. As a ballpark figure, it will typically use half as much memory as serde-json when deserializing a value and the memory footprint of cloning a value is more than 7x smaller.

Support

ijson has a low-activity ecosystem.
It has 98 stars, 5 forks, and 3 watchers.
It had no major release in the last 12 months.
There are 6 open issues and 8 closed issues; on average, issues are closed in 1 day. There are 2 open pull requests and 0 closed pull requests.
It has a neutral sentiment in the developer community.
The latest version of ijson is 0.1.3.

Quality

              ijson has 0 bugs and 0 code smells.

Security

              ijson has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              ijson code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

License

              ijson is licensed under the Apache-2.0 License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

              ijson releases are available to install and integrate.
              Installation instructions are not available. Examples and code snippets are available.


            ijson Key Features

            No Key Features are available at this moment for ijson.

            ijson Examples and Code Snippets

            No Code Snippets are available at this moment for ijson.

            Community Discussions

            QUESTION

Python ijson not working on multiple elements at once
            Asked 2021-Dec-04 at 13:21

I have thousands of very large JSON files that I need to process for specific elements. To avoid memory overload I am using a Python library called ijson, which works fine when I am processing only a single element from the JSON file, but when I try to process multiple elements at once it throws:

            IncompleteJSONError: parse error: premature EOF

            Partial JSON:

            ...

            ANSWER

            Answered 2021-Dec-04 at 12:58

I think this is happening because you've finished reading the IO stream from the file; you're already at the end when you ask for another query.

What you can do is reset the file cursor to position 0 before the second query:
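
(A minimal sketch of that idea; the prefixes "first_key.item" and "second_key.item" and the file name are placeholders, since the question's actual code is not shown above.)

```python
import ijson

with open("data.json", "rb") as f:
    # First pass: stream one set of elements.
    for item in ijson.items(f, "first_key.item"):
        print(item)

    # Rewind the file before the second pass; without this, ijson starts
    # reading at EOF and reports the "premature EOF" error.
    f.seek(0)

    # Second pass: stream a different set of elements.
    for item in ijson.items(f, "second_key.item"):
        print(item)
```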

            Source https://stackoverflow.com/questions/70225638

            QUESTION

            Python: Reading and Writing HUGE Json files
            Asked 2021-Oct-31 at 14:18

I am new to Python, so please excuse me if I am not asking the questions in a Pythonic way.

            My requirements are as follows:

1. I need to write Python code to implement this requirement.

2. I will be reading 60 JSON files as input. Each file is approximately 150 GB.

3. The sample structure for all 60 JSON files is shown below. Please note each file has only ONE JSON object, and the huge size of each file comes from the number and size of the "array_element" array contained in that one huge JSON object.

  { "string_1":"abc", "string_1":"abc", "string_1":"abc", "string_1":"abc", "string_1":"abc", "string_1":"abc", "array_element":[] }

4. The transformation logic is simple: I need to merge all the array_element arrays from all 60 files and write them into one HUGE JSON file, so the output file will be almost 150 GB × 60 in size.

            Questions for which I am requesting your help on:

1. For reading: I am planning to use the "ijson" module's ijson.items(file_object, "array_element"). Could you please tell me whether ijson.items will yield (that is, NOT load the entire file into memory) one item at a time from the "array_element" array in the JSON file? I don't think json.load is an option here because we cannot hold such a huge dictionary in memory.

2. For writing: I am planning to read each item using ijson.items, encode it with json.dumps, and then write it to the file using file_object.write, NOT json.dump, since I cannot hold such a huge dictionary in memory. Could you please let me know whether the f.flush() applied in the code shown below is needed? To my understanding, the internal buffer is flushed automatically when it is full, its size is constant, and it won't grow to the point of overloading memory; please confirm.

3. Is there a better approach than the ones mentioned above for incrementally reading and writing huge JSON files?

            Code snippet showing above described reading and writing logic:

            ...

            ANSWER

            Answered 2021-Oct-31 at 14:18

            The following program assumes that the input files have a format that is predictable enough to skip JSON parsing for the sake of performance.

            My assumptions, inferred from your description, are:

            • All files have the same encoding.
            • All files have a single position somewhere at the start where "array_element":[ can be found, after which the "interesting portion" of the file begins
            • All files have a single position somewhere at the end where ]} marks the end of the "interesting portion"
            • All "interesting portions" can be joined with commas and still be valid JSON

            When all of these points are true, concatenating a predefined header fragment, the respective file ranges, and a footer fragment would produce one large, valid JSON file.
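
(A rough sketch of that byte-splicing idea under the assumptions above; the marker strings, probe size, and input file names are illustrative, not taken from the question.)

```python
import glob

START_MARKER = b'"array_element":['
END_MARKER = b']}'
PROBE = 1 << 20  # search for the markers only in the first/last 1 MiB


def interesting_range(path):
    """Return (start, end) byte offsets of the array contents in `path`."""
    with open(path, "rb") as f:
        head = f.read(PROBE)
        start = head.index(START_MARKER) + len(START_MARKER)
        f.seek(0, 2)
        size = f.tell()
        tail_offset = max(0, size - PROBE)
        f.seek(tail_offset)
        end = tail_offset + f.read().rindex(END_MARKER)
    return start, end


def copy_range(path, out, start, end, bufsize=1 << 20):
    """Stream the bytes in [start, end) from `path` into `out`."""
    with open(path, "rb") as f:
        f.seek(start)
        remaining = end - start
        while remaining > 0:
            chunk = f.read(min(bufsize, remaining))
            if not chunk:
                break
            out.write(chunk)
            remaining -= len(chunk)


with open("merged.json", "wb") as out:
    out.write(b'{"array_element":[')               # header fragment
    first = True
    for path in sorted(glob.glob("input_*.json")):  # hypothetical input names
        start, end = interesting_range(path)
        if end <= start:
            continue
        if not first:
            out.write(b",")                         # join interesting portions
        copy_range(path, out, start, end)
        first = False
    out.write(b"]}")                                # footer fragment
```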

            Source https://stackoverflow.com/questions/69633676

            QUESTION

            ijson kvitems unexpected behaviour
            Asked 2021-Oct-28 at 09:02

            I'm using ijson to parse through large JSONs. I have this code, which should give me a dict of values corresponding to the relevant JSON fields:

            ...

            ANSWER

            Answered 2021-Oct-27 at 13:02
            About result collection

Beware of how you are collecting the results from kvitems. In all your examples above you are using generator expressions, which are themselves lazily evaluated, and this may lead to misunderstandings. You do not show, however, how you determined that your final dictionary has values for id but not for the other keys. I'm assuming it's because you iterate over the values under parse_records['id'] first. As you do so, that generator expression is evaluated and the underlying kvitems generator is exhausted. When you then iterate over the other generator expressions, the underlying kvitems generator that feeds them is already exhausted, so they yield nothing. If you were to iterate over the values for one of the other keys first, you would see values for that key and not for the others.

Generator expressions are great in themselves, but in this case they may end up adding confusion. If you want to avoid this situation you may want to collect those sequences into lists instead (e.g., using [... for k, v in kvitems ...] instead of (... for k, v in kvitems ...)).

            About kvitems

            As you point out, kvitems is a single-pass generator (or a single-pass asynchronous generator when fed with an asynchronous file-like object), so once you fully iterate over it, further iterations yield no values. This is why indeed in your original code you get values for id but not for the other keys that are collected on subsequent iterations over an already-iterated kvitems object.

            Trying to duplicate the kvitems object is also bogus: as you also found out, you are simply creating a list with the same object in all positions instead of copies of the original object.

Copying the kvitems object is simply not possible. The only way to get N "copies" is to actually construct N different objects; this means, however, that the input file will be read N times (and needs to be opened N times as well, since kvitems advances the given file until it has no more input). Possible, but not great.

The result of itertools.cycle is an infinite generator, which you then use as the basis for different generator expressions (again, lazily evaluated). You mention that this solution worked in ways "you don't understand", but you don't elaborate on what exactly happened. My expectation is that when trying to inspect the values for any of the keys, you run into an infinite loop because your generator expression is iterating over an infinite generator, or something similar.

You say that your final piece of code works as expected. This is the only bit that surprises me, especially if you really inspected (i.e., evaluated) all three generator expressions after you created them. It would be interesting if you could clarify whether that's the case; otherwise, if you created all three generator expressions but only evaluated one of them, then there are no surprises here (because of the "About result collection" explanation).

            How to tackle your problem

            It basically all boils down to doing a single iteration over kvitems. You could try for instance something like this:
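
(A sketch of that single-pass idea; the file name, the "records.item" prefix, and the key names are placeholders for whatever the real document uses.)

```python
from collections import defaultdict

import ijson

results = defaultdict(list)
with open("records.json", "rb") as f:
    # One pass over kvitems: bucket every (key, value) pair as it streams by.
    for key, value in ijson.kvitems(f, "records.item"):
        results[key].append(value)

ids = results["id"]      # fully evaluated lists, safe to iterate repeatedly
names = results["name"]  # hypothetical second key
```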

            Source https://stackoverflow.com/questions/69738173

            QUESTION

            Python ijson - parse error: trailing garbage // bz2.decompress()
            Asked 2021-Oct-17 at 14:46

            I have come across an error while parsing json with ijson.

Background: I have a series (approx. 1,000) of large files of Twitter data compressed in the '.bz2' format. I need to get elements from each file into a pd.DataFrame for further analysis. I have identified the keys I need to extract. I am cautious about posting Twitter data here.

            Attempt: I have managed to decompress the files using bz2.decompress with the following code:

            ...

            ANSWER

            Answered 2021-Oct-17 at 14:46

            To directly answer your two questions:

            • The decompression method is correct in the sense that it yields JSON data that you then feed to ijson. As you point out, ijson works both with str and bytes inputs (although the latter is preferred); if you were giving ijson some non-JSON input you wouldn't see an error showing JSON data in it.

            • This is a very common error that is described in ijson's FAQ. It basically means your JSON document has more than one top-level value, which is not standard JSON, but is supported by ijson by using the multiple_values option (see docs for details).

About the code as a whole: while it works correctly, it could be improved. The whole point of using ijson is that you can avoid loading the full JSON contents into memory, but the code you posted doesn't take advantage of this: it opens the bz2-compressed file, reads it as a whole, decompresses it as a whole, (unnecessarily) decodes it as a whole, and only then gives the decoded data as input to ijson. If your input file is small and the decompressed data is also small you won't see any impact, but if your files are big you'll definitely start noticing it.

A better approach is to stream the data through all the operations so that everything happens incrementally: decompression, skipping the unnecessary decoding, and JSON parsing. Something along the lines of:
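
(A sketch under those lines; the file name and tweet fields are placeholders, and multiple_values is the ijson option mentioned above.)

```python
import bz2

import ijson

# bz2.open decompresses lazily, and the binary file object is handed straight
# to ijson, so neither the compressed nor the decompressed data is ever held
# in memory in full.
with bz2.open("tweets.json.bz2", "rb") as f:   # hypothetical file name
    # prefix "" yields each top-level value; multiple_values=True allows more
    # than one top-level JSON document in the same stream.
    for tweet in ijson.items(f, "", multiple_values=True):
        print(tweet.get("id_str"), tweet.get("created_at"))  # illustrative keys
```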

            Source https://stackoverflow.com/questions/69603013

            QUESTION

JSON in Python: rename, delete
            Asked 2021-Sep-15 at 21:57

I work with big GeoJSON data (more than 1 GB) with this structure. Here is part of it.

            ...

            ANSWER

            Answered 2021-Sep-15 at 09:56

            This answer works if you are sure that the data is a GeoJSON and it is structured properly:

For reading GeoJSON data you can use the GeoPandas library:
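
(A minimal sketch, assuming the file really is well-formed GeoJSON; the column names and file paths are placeholders.)

```python
import geopandas as gpd

# GeoPandas parses the GeoJSON into a GeoDataFrame, after which renaming and
# deleting columns works like any pandas DataFrame.
gdf = gpd.read_file("data.geojson")
gdf = gdf.rename(columns={"old_name": "new_name"}).drop(columns=["unused"])
gdf.to_file("output.geojson", driver="GeoJSON")
```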

            Source https://stackoverflow.com/questions/69189813

            QUESTION

            Memory error while parsing huge JSON file
            Asked 2021-May-25 at 18:09

I'm trying to parse a huge 12 GB JSON file with almost 5 million lines (each one is an object) in Python and store it in a database. I'm using ijson and multiprocessing in order to run it faster. Here is the code:

            ...

            ANSWER

            Answered 2021-May-25 at 18:09

I've had to make quite a few extrapolations and assumptions, but it looks like:

            • you're using Django
            • you want to populate an SQL database with venue, paper and author data
            • you want to then do some analysis using Pandas

            Populating your SQL database can be done pretty neatly with something like the following.

            • I added the tqdm package so you get a progress indication.
            • This assumes there's a PaperAuthor model that links papers and authors.
            • Unlike the original code, this will not save duplicate Venues in the database.
            • You can see I replaced get_or_create and create with stubs to make this runnable without the database models (or indeed, without Django), just having the dataset you're using available.
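
(A sketch of that streaming-ingest shape with the ORM calls stubbed out, as described above; the field names, file name, and one-object-per-line layout are assumptions about the dataset, and inside Django the stubs would become Model.objects.get_or_create / create.)

```python
import ijson
from tqdm import tqdm


def get_or_create_venue(name):
    """Stub standing in for Venue.objects.get_or_create(...)."""
    return name


def create_paper(title, venue):
    """Stub standing in for Paper.objects.create(...)."""


seen_venues = {}

with open("papers.json", "rb") as f:
    # prefix "" + multiple_values=True streams one top-level object at a time,
    # so memory stays flat; tqdm just adds a progress indication.
    for record in tqdm(ijson.items(f, "", multiple_values=True)):
        venue_name = record.get("venue") or ""
        if venue_name not in seen_venues:      # avoid saving duplicate venues
            seen_venues[venue_name] = get_or_create_venue(venue_name)
        create_paper(record.get("title"), seen_venues[venue_name])
```
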

            On my machine, this consumes practically no memory, as the records are (or would be) dumped into the SQL database, not into an ever-growing, fragmenting dataframe in memory.

            The Pandas processing is left as an exercise for the reader ;-), but I'd imagine it'd involve pd.read_sql() to read this preprocessed data from the database.

            Source https://stackoverflow.com/questions/67692983

            QUESTION

            Python ijson - nested parsing
            Asked 2021-May-11 at 13:29

            I'm working with a web response of JSON that looks like this (simplified, and I can't change the format):

            ...

            ANSWER

            Answered 2021-May-11 at 13:29

            You need to use ijson's event interception mechanism. Basically go one level down in the parsing logic by using ijson.parse until you hit the big array, then switch to using ijson.items with the rest of the parse events. This uses a string literal, but should illustrate the point:
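
(A sketch of that pattern on a small string literal; the 'meta' and 'results' keys are placeholders for the real response structure.)

```python
import io

import ijson

data = io.BytesIO(b'{"meta": {"count": 2}, "results": [{"id": 1}, {"id": 2}]}')

events = ijson.parse(data)
for prefix, event, value in events:
    if prefix == "meta.count":
        print("count:", value)                # values outside the big array
    elif (prefix, event) == ("results", "start_array"):
        # Hand the remaining parse events to ijson.items to get whole objects
        # out of the big array without re-reading the input.
        for result in ijson.items(events, "results.item"):
            print("result:", result)
```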

            Source https://stackoverflow.com/questions/67467897

            QUESTION

How can I use ijson to extract a set of corresponding data from a JSON file?
            Asked 2021-May-02 at 14:59

            I have a json file just like this:

            ...

            ANSWER

            Answered 2021-May-02 at 14:59

I think if you need to keep track of CVE IDs and their corresponding CPEs you'll need to iterate over the whole CVE items and extract the bits of data you need, so you only do one pass through the file. It's not as memory-efficient as your original iteration, but if each item in CVE_Items is not too big then it's not a problem:
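
(A sketch of that one-pass extraction; the nested paths follow the common NVD feed layout but should be treated as assumptions about your particular file.)

```python
import ijson

cve_to_cpes = {}
with open("nvdcve.json", "rb") as f:          # hypothetical file name
    # One pass: each `item` is a complete entry from the CVE_Items array.
    for item in ijson.items(f, "CVE_Items.item"):
        cve_id = item["cve"]["CVE_data_meta"]["ID"]
        cpes = [
            match.get("cpe23Uri")
            for node in item.get("configurations", {}).get("nodes", [])
            for match in node.get("cpe_match", [])
        ]
        cve_to_cpes[cve_id] = cpes

print(len(cve_to_cpes), "CVE entries collected")
```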

            Source https://stackoverflow.com/questions/67355915

            QUESTION

Adding commas in between JSON objects while writing
            Asked 2021-Feb-17 at 13:47

I am parsing an extremely large JSON file using ijson and then writing the contents to a temp file. Afterwards, I overwrite the original file with the contents of the temp file.

            ...

            ANSWER

            Answered 2021-Feb-17 at 13:39

Have you tried json.dump(row, temp, indent=4)?

            Source https://stackoverflow.com/questions/66242837

            QUESTION

Load a large 3.7 GB JSON file into a dataframe and convert it to a CSV file using ijson
            Asked 2021-Feb-08 at 11:29

I have a large JSON data file of 3.7 GB. I am going to load the JSON file into a dataframe, delete the unused columns, then convert it to CSV and load it into SQL. RAM is 40 GB. My JSON file structure:

            ...

            ANSWER

            Answered 2021-Feb-07 at 10:26

            Your proposal is:

            • Step 1 read json file
            • Step 2 load to dataframe
            • Step 3 save file as a csv
            • Step 4 load csv to sql
            • Step 5 load data to django to search

            The problem with your second example is that you still use global lists (data_phone, data_name), which grow over time.

            Here's what you should try, for huge files:

            • Step 1 read json
              • line by line
              • do not save any data into a global list
              • write data directly into SQL
            • Step 2 Add indexes to your database
            • Step 3 use SQL from django

            You don't need to write anything to CSV. If you really want to, you could simply write the file line by line:
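
(A sketch of that line-by-line approach; the 'item' prefix assumes the top level is a JSON array, and the phone/name fields mirror the question's variable names. Writing straight to SQL instead of CSV follows the same loop.)

```python
import csv

import ijson

with open("big.json", "rb") as src, open("out.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    writer.writerow(["phone", "name"])        # header row
    # Each record is streamed and written immediately; nothing accumulates
    # in a global list, so memory stays flat.
    for record in ijson.items(src, "item"):
        writer.writerow([record.get("phone"), record.get("name")])
```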

            Source https://stackoverflow.com/questions/66079234

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install ijson

            You can download it from GitHub.
Rust is installed and managed by the rustup tool. Rust has a 6-week rapid release process and supports a great number of platforms, so there are many builds of Rust available at any time. Please refer to rust-lang.org for more information.

            Support

For any new features, suggestions, or bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.
CLONE

• HTTPS: https://github.com/Diggsey/ijson.git
• GitHub CLI: gh repo clone Diggsey/ijson
• SSH: git@github.com:Diggsey/ijson.git



Consider Popular JSON Processing Libraries

• json by nlohmann
• fastjson by alibaba
• jq by stedolan
• gson by google
• normalizr by paularmstrong

Try Top Libraries by Diggsey

• act-zero (Rust)
• query_interface (Rust)
• aoc2018 (Rust)
• spanr (Rust)
• lockless (Rust)