ijson | efficient alternative to serde_json::Value | JSON Processing library
kandi X-RAY | ijson Summary
This crate offers a replacement for serde-json's Value type, which is significantly more memory efficient. As a ballpark figure, it will typically use half as much memory as serde-json when deserializing a value and the memory footprint of cloning a value is more than 7x smaller.
ijson Key Features
ijson Examples and Code Snippets
Community Discussions
Trending Discussions on ijson
QUESTION
I have thousands of very large JSON files that I need to process on specific elements. To avoid memory overload I am using a Python library called ijson, which works fine when I am processing only a single element from the JSON file, but when I try to process multiple elements at once it throws:
IncompleteJSONError: parse error: premature EOF
Partial JSON:
...ANSWER
Answered 2021-Dec-04 at 12:58
I think this is happening because you've finished reading your IO stream from the file: you're already at the end, yet you're asking for another query.
What you can do is reset the cursor to position 0 before the second query:
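A minimal sketch of that idea; the file name and the two item prefixes below are placeholders, not from the original question:

import ijson

with open("data.json", "rb") as f:
    # First query consumes the stream up to EOF.
    first = list(ijson.items(f, "first_key.item"))

    # Rewind to the start of the file before the second query; without this,
    # ijson sees an empty stream and raises "parse error: premature EOF".
    f.seek(0)
    second = list(ijson.items(f, "second_key.item"))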
QUESTION
I am new to Python, so please excuse me if I am not asking the questions in a Pythonic way.
My requirements are as follows:
I need to write python code to implement this requirement.
I will be reading 60 JSON files as input. Each file is approximately 150 GB.
The sample structure for all 60 JSON files is shown below. Please note that each file will have only ONE JSON object, and the huge size of each file is due to the number and size of the "array_element" array contained in that one huge JSON object.
{ "string_1":"abc", "string_1":"abc", "string_1":"abc", "string_1":"abc", "string_1":"abc", "string_1":"abc", "array_element":[] }
The transformation logic is simple. I need to merge all the array_element entries from all 60 files and write them into one HUGE JSON file. That is, the output JSON file will be almost 150 GB x 60 in size.
Questions for which I am requesting your help on:
For reading: I am planning on using the "ijson" module's ijson.items(file_object, "array_element"). Could you please tell me whether ijson.items will "yield" (that is, NOT load the entire file into memory) one item at a time from the "array_element" array in the JSON file? I don't think json.load is an option here because we cannot hold such a huge dictionary in memory.
For writing: I am planning to read each item using ijson.items, "encode" it with json.dumps, and then write it to the file using file_object.write rather than json.dump, since I cannot have such a huge dictionary in memory. Could you please let me know whether the f.flush() applied in the code shown below is needed? To my understanding, the internal buffer will automatically be flushed when it is full, and its size is constant and won't dynamically grow to the point of overloading memory. Please let me know.
Is there a better approach than the ones mentioned above for incrementally reading and writing huge JSON files?
Code snippet showing above described reading and writing logic:
...ANSWER
Answered 2021-Oct-31 at 14:18
The following program assumes that the input files have a format that is predictable enough to skip JSON parsing for the sake of performance.
My assumptions, inferred from your description, are:
- All files have the same encoding.
- All files have a single position somewhere at the start where "array_element":[ can be found, after which the "interesting portion" of the file begins.
- All files have a single position somewhere at the end where ]} marks the end of the "interesting portion".
- All "interesting portions" can be joined with commas and still be valid JSON.
When all of these points are true, concatenating a predefined header fragment, the respective file ranges, and a footer fragment would produce one large, valid JSON file.
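A rough sketch of that concatenation approach, under the assumptions above; the file names, the chunk size and the assumption that "]}" is literally the last two bytes of each file are placeholders, so real files may need a more careful search for the end marker:

# Hypothetical input/output file names; adjust them to your real files.
INPUT_FILES = ["input_01.json", "input_02.json"]  # ... up to the 60 files
CHUNK = 1024 * 1024                 # copy 1 MiB at a time so memory stays flat
MARKER = b'"array_element":['       # start-of-array marker inside each input file
FOOTER = b"]}"                      # assumed to be the very last bytes of each file

with open("merged.json", "wb") as out:
    out.write(b'{"array_element":[')            # predefined header fragment
    for i, path in enumerate(INPUT_FILES):
        with open(path, "rb") as f:
            # Assumption: the marker appears within the first chunk of the file.
            start = f.read(CHUNK).index(MARKER) + len(MARKER)
            f.seek(0, 2)                        # jump to the end to measure the file
            end = f.tell() - len(FOOTER)        # interesting portion stops before "]}"
            if i > 0:
                out.write(b",")                 # join the portions with commas
            f.seek(start)
            remaining = end - start
            while remaining > 0:
                chunk = f.read(min(CHUNK, remaining))
                out.write(chunk)
                remaining -= len(chunk)
    out.write(FOOTER)                           # predefined footer fragment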
QUESTION
I'm using ijson to parse through large JSONs. I have this code, which should give me a dict of values corresponding to the relevant JSON fields:
ANSWER
Answered 2021-Oct-27 at 13:02
Beware of how you are collecting the results from kvitems. In all your examples above you are using generator expressions, which are themselves lazily evaluated, and this may lead to misunderstandings. You are not showing, however, how you find that your final dictionary has values for id but not for the other keys. I'm assuming it's only because you iterate over the values under parse_records['id'] first. As you do so, that generator expression is evaluated and the underlying kvitems generator is exhausted. When you then iterate over the values of the other generator expressions, the underlying kvitems generator that feeds them is already exhausted, so they yield nothing. However, if you were to iterate over the values for one of the other keys first, you would see values for that key and not for the others.
Generator expressions themselves are great, but in this case they might end up adding confusion. If you want to avoid this situation you may want to consolidate those sequences into lists instead (e.g., using [... for k, v in kvitems ...] instead of (... for k, v in kvitems ...)).
As you point out, kvitems is a single-pass generator (or a single-pass asynchronous generator when fed with an asynchronous file-like object), so once you fully iterate over it, further iterations yield no values. This is indeed why, in your original code, you get values for id but not for the other keys, which are collected on subsequent iterations over an already-iterated kvitems object.
Trying to duplicate the kvitems object is also bogus: as you also found out, you are simply creating a list with the same object in all positions instead of copies of the original object.
Trying to copy the kvitems object is simply not possible. The only option to get N "copies" is to actually construct N different objects; this means, however, that the input file will be read N times (and needs to be opened N times as well, as kvitems will advance the given file until it has no more input). Possible, but not great.
The result of itertools.cycle is an infinite generator, which you then use as the basis to construct different generator expressions (so, lazily evaluated). You mention that this solution worked in ways "you don't understand", but don't go into what exactly happened. My expectation is that when trying to inspect the values for any of the keys, you run into an infinite loop because your generator expression is iterating over an infinite generator, or something similar.
You say that your final piece of code works as expected. This is the only bit that surprises me, especially if you really, really inspected (i.e., evaluated) all three of the generator expressions after you created them. If you could clarify whether that's the case, it would be interesting; otherwise, if you created all three generator expressions but then only evaluated one or the other, there are no surprises here (because of the explanation about result collection above).
How to tackle your problem
It basically all boils down to doing a single iteration over kvitems. You could try, for instance, something like this:
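A minimal sketch of that single pass; the file name, the 'records.item' prefix and the key names are placeholders, not the original question's data:

import ijson

WANTED_KEYS = {"id", "name", "value"}
results = {key: [] for key in WANTED_KEYS}

with open("large.json", "rb") as f:
    # One pass over kvitems: every (key, value) pair is routed to its list,
    # so the generator is only ever iterated once.
    for key, value in ijson.kvitems(f, "records.item"):
        if key in WANTED_KEYS:
            results[key].append(value)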
QUESTION
I have come across an error while parsing json with ijson.
Background:
I have a series (approx. 1000) of large files of Twitter data that are compressed in the '.bz2' format. I need to get elements from the files into a pd.DataFrame for further analysis. I have identified the keys I need to get. I am cautious about putting Twitter data up.
Attempt:
I have managed to decompress the files using bz2.decompress with the following code:
ANSWER
Answered 2021-Oct-17 at 14:46
To directly answer your two questions:
- The decompression method is correct in the sense that it yields JSON data that you then feed to ijson. As you point out, ijson works with both str and bytes inputs (although the latter is preferred); if you were giving ijson some non-JSON input you wouldn't see an error showing JSON data in it.
- This is a very common error that is described in ijson's FAQ. It basically means your JSON document has more than one top-level value, which is not standard JSON, but is supported by ijson via the multiple_values option (see the docs for details).
About the code as a whole: while it works correctly, it could be improved on. The whole point of using ijson is that you can avoid loading the full JSON contents into memory, but the code you posted doesn't use this to its advantage: it first opens the bz2-compressed file, reads it as a whole, decompresses that as a whole, (unnecessarily) decodes that as a whole, and then gives the decoded data as input to ijson. If your input file is small, and the decompressed data is also small, you won't see any impact, but if your files are big then you'll definitely start noticing it.
A better approach is to stream the data through all the operations so that everything happens incrementally: decompression and JSON parsing, with no decoding needed at all. Something along the lines of:
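A minimal sketch of that streaming pipeline; the file name and the process() handler are placeholders:

import bz2
import ijson

def process(tweet):
    # Placeholder for whatever per-tweet handling you need.
    print(tweet.get("id"))

# bz2.open decompresses incrementally and ijson reads the raw bytes directly,
# so nothing is ever held in memory as a whole. multiple_values=True handles
# the multiple top-level JSON values described in the answer.
with bz2.open("tweets.json.bz2", "rb") as f:
    for tweet in ijson.items(f, "", multiple_values=True):
        process(tweet)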
QUESTION
I work with big GeoJSON data (more than 1 GB) with this structure. Here is part of it.
...ANSWER
Answered 2021-Sep-15 at 09:56
This answer works if you are sure that the data is GeoJSON and it is structured properly.
For reading GeoJSON data you can use the Geopandas library:
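For example, a minimal sketch assuming a hypothetical file name:

import geopandas as gpd

# Geopandas parses the GeoJSON into a GeoDataFrame with a geometry column.
gdf = gpd.read_file("data.geojson")
print(gdf.head())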
QUESTION
I'm trying to parse a huge 12 GB JSON file with almost 5 million lines (each one is an object) in Python and store it in a database. I'm using ijson and multiprocessing in order to run it faster. Here is the code:
...ANSWER
Answered 2021-May-25 at 18:09
I've had to make quite a few extrapolations and assumptions, but it looks like:
- you're using Django
- you want to populate an SQL database with venue, paper and author data
- you want to then do some analysis using Pandas
Populating your SQL database can be done pretty neatly with something like the following.
- I added the tqdm package so you get a progress indication.
- This assumes there's a PaperAuthor model that links papers and authors.
- Unlike the original code, this will not save duplicate Venues in the database.
- You can see I replaced get_or_create and create with stubs to make this runnable without the database models (or indeed, without Django), just having the dataset you're using available.
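The sketch below is a hedged reconstruction of that approach, not the answer's original code: the file name, the item prefix, the field names and the stubbed get_or_create/create helpers are all placeholders.

import ijson
from tqdm import tqdm

def get_or_create(model, **fields):
    # Stub standing in for Django's Model.objects.get_or_create().
    return fields, True

def create(model, **fields):
    # Stub standing in for Django's Model.objects.create().
    return fields

with open("papers.json", "rb") as f:
    # tqdm wraps the iterator to give a progress indication.
    for record in tqdm(ijson.items(f, "item")):
        venue, _ = get_or_create("Venue", name=record.get("venue"))
        paper = create("Paper", title=record.get("title"), venue=venue)
        for author_name in record.get("authors", []):
            author, _ = get_or_create("Author", name=author_name)
            create("PaperAuthor", paper=paper, author=author)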
On my machine, this consumes practically no memory, as the records are (or would be) dumped into the SQL database, not into an ever-growing, fragmenting dataframe in memory.
The Pandas processing is left as an exercise for the reader ;-), but I'd imagine it'd involve pd.read_sql() to read this preprocessed data from the database.
QUESTION
I'm working with a web response of JSON that looks like this (simplified, and I can't change the format):
...ANSWER
Answered 2021-May-11 at 13:29
You need to use ijson's event interception mechanism. Basically, go one level down in the parsing logic by using ijson.parse until you hit the big array, then switch to using ijson.items with the rest of the parse events. This uses a string literal, but should illustrate the point:
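A small sketch of that pattern; the JSON literal and the "rows" key are placeholders, not the original question's format:

import io
import ijson

data = io.BytesIO(b'{"meta": {"count": 2}, "rows": [{"a": 1}, {"a": 2}]}')

events = ijson.parse(data)
# Walk the low-level events until the big array starts...
for prefix, event, value in events:
    if prefix == "rows" and event == "start_array":
        break

# ...then hand the remaining events to ijson.items to build whole objects.
for row in ijson.items(events, "rows.item"):
    print(row)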
QUESTION
I have a json file just like this:
...ANSWER
Answered 2021-May-02 at 14:59
I think if you need to keep track of CVE IDs and their corresponding CPEs, you'll need to iterate over whole cve items and extract the bits of data you need (so you'll only do one pass through the file). Not as efficient memory-wise as your original iteration, but if each item in CVE_Items is not too big then it's not a problem:
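A minimal sketch of that one-pass iteration; the file name is a placeholder and the nested field names assume the NVD 1.1 feed layout, so adjust them to your actual structure:

import ijson

cve_to_cpes = {}

with open("nvdcve-1.1-2021.json", "rb") as f:
    # One pass: build each whole item under CVE_Items, then pick out the bits.
    for item in ijson.items(f, "CVE_Items.item"):
        cve_id = item["cve"]["CVE_data_meta"]["ID"]
        cpes = [
            match.get("cpe23Uri")
            for node in item.get("configurations", {}).get("nodes", [])
            for match in node.get("cpe_match", [])
        ]
        cve_to_cpes[cve_id] = cpes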
QUESTION
I am parsing an extremely large JSON file using ijson and then writing the contents to a temp file. Afterwards, I overwrite the original file with the contents of the temp file.
...ANSWER
Answered 2021-Feb-17 at 13:39
Have you tried json.dump(row, temp, indent=4)?
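A tiny sketch of how that suggestion could fit into the ijson loop; the file names and the item prefix are placeholders, and use_float avoids the Decimal values that json.dump cannot serialize by default:

import json
import ijson

with open("original.json", "rb") as src, open("temp.json", "w") as temp:
    temp.write("[\n")
    for i, row in enumerate(ijson.items(src, "item", use_float=True)):
        if i:
            temp.write(",\n")
        json.dump(row, temp, indent=4)   # pretty-print each item as suggested
    temp.write("\n]")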
QUESTION
I have a large JSON data file of 3.7 GB. I am going to load the JSON file into a dataframe, delete unused columns, then convert it to CSV and load it into SQL. My RAM is 40 GB. My JSON file structure:
...ANSWER
Answered 2021-Feb-07 at 10:26
Your proposal is:
- Step 1: read the JSON file
- Step 2: load into a dataframe
- Step 3: save the file as a CSV
- Step 4: load the CSV into SQL
- Step 5: load the data into Django to search
The problem with your second example is that you still use global lists (data_phone, data_name), which grow over time.
Here's what you should try, for huge files:
- Step 1: read the JSON
  - line by line
  - do not save any data into a global list
  - write data directly into SQL
- Step 2: add indexes to your database
- Step 3: use SQL from Django
You don't need to write anything to CSV. If you really want to, you could simply write the file line by line:
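A minimal sketch of that line-by-line CSV writing; the file names, the "item" prefix and the "phone"/"name" fields are placeholders:

import csv
import ijson

with open("big.json", "rb") as src, open("out.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    writer.writerow(["phone", "name"])
    # Each record goes straight to disk; no global list (data_phone, data_name)
    # ever grows in memory.
    for record in ijson.items(src, "item"):
        writer.writerow([record.get("phone"), record.get("name")])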
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install ijson
Rust is installed and managed by the rustup tool. Rust has a 6-week rapid release process and supports a great number of platforms, so there are many builds of Rust available at any time. Please refer to rust-lang.org for more information.