simdjson | Parsing gigabytes of JSON per second

by simdjson C++ Version: v3.1.8 License: Apache-2.0

kandi X-RAY | simdjson Summary

simdjson is a C++ library typically used in Big Data applications. simdjson has no reported bugs or vulnerabilities, a permissive license, and medium support. You can download it from GitHub or GitLab.

JSON is everywhere on the Internet. Servers spend a lot of time parsing it. We need a fresh approach. The simdjson library uses commonly available SIMD instructions and microparallel algorithms to parse JSON 4x faster than RapidJSON and 25x faster than JSON for Modern C++. This library is part of the [Awesome Modern C++] list.

Support

simdjson has a medium active ecosystem.

It has 16,984 stars and 916 forks. There are 238 watchers for this library.

There was 1 major release in the last 12 months.

There are 122 open issues and 631 have been closed. On average, issues are closed in 18 days. There are 3 open pull requests and 0 closed requests.

It has a neutral sentiment in the developer community.

The latest version of simdjson is v3.1.8.

Quality

simdjson has 0 bugs and 0 code smells.

Security

simdjson has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

simdjson code analysis shows 0 unresolved vulnerabilities.

There are 0 security hotspots that need review.

License

simdjson is licensed under the Apache-2.0 License. This license is Permissive.

Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

simdjson releases are available to install and integrate.

Installation instructions, examples and code snippets are available.

It has 555 lines of code, 22 functions and 6 files.

It has high code complexity. Code complexity directly impacts maintainability of the code.

simdjson Key Features

No Key Features are available at this moment for simdjson.

simdjson Examples and Code Snippets

No Code Snippets are available at this moment for simdjson.

Community Discussions

Trending Discussions on simdjson

How to import package in cmake from vcpkg?

Fastest way to store and retrieve a large stream of small unstructured messages

What is 'really_inline' exactly?

QUESTION

How to import package in cmake from vcpkg?

Asked 2021-Mar-01 at 09:01

When I run vcpkg install simdjson, it returns:

...

ANSWER

Answered 2021-Mar-01 at 09:01

Unfortunately, redis-plus-plus doesn't supply CMake config files. Someone should open an issue with upstream. It's honestly pretty unacceptable to not support find_package for your library. Thus, thanks to the authors' negligence, you will have to create an imported target for their library yourself. Here's an example CMakeLists.txt, step by step. We'll start with the standard boilerplate:

Source https://stackoverflow.com/questions/66417946

QUESTION

Fastest way to store and retrieve a large stream of small unstructured messages

Asked 2020-May-09 at 01:30

I am developing an IoT application that requires me to handle many small unstructured messages (meaning that their fields can change over time: some can appear and others can disappear). These messages typically have between 2 and 15 fields, whose values belong to basic data types (ints/longs, strings, booleans). These messages fit very well within the JSON data format (or msgpack).

It is critical that the messages get processed in their order of arrival (understand: they need to be processed by a single thread - there is no way to parallelize this part). I have my own logic for handling these messages in realtime (the throughput is relatively small, a few hundred thousand messages per second at most), but there is an increasing need for the engine to be able to simulate/replay previous periods by replaying a history of messages. Though it wasn't initially written for that purpose, my event processing engine (written in Go) could very well handle dozens (maybe in the low hundreds) of millions of messages per second if I was able to feed it with historical data at a sufficient speed.

This is exactly the problem. I have been storing many (hundreds of billions) of these messages over a long period of time (several years), for now in delimited msgpack format (https://github.com/msgpack/msgpack-python#streaming-unpacking). In this setting and others (see below), I was able to benchmark peak parsing speeds of ~2M messages/second (on a 2019 Macbook Pro, parsing only), which is far from saturating disk IO.

Even without talking about IO, doing the following:

...

ANSWER

Answered 2020-May-09 at 01:30

I assume that messages contain only a few named attributes of basic types (defined at runtime) and that these basic types are, for example, strings, integers and floating-point numbers.

For the implementation to be fast, it is better to:

avoid text parsing (slow because sequential and full of conditionals);
avoid checking if messages are ill-formed (not needed here as they should all be well-formed);
avoid allocations as much as possible;
work on message chunks.

Thus, we first need to design a simple and fast binary message protocol:

A binary message contains the number of its attributes (encoded on 1 byte) followed by the list of attributes. Each attribute contains a string prefixed by its size (encoded on 1 byte) followed by the type of the attribute (the index of the type in the std::variant, encoded on 1 byte) as well as the attribute value (a size-prefixed string, a 64-bit integer or a 64-bit floating-point number).

Each encoded message is a stream of bytes that can fit in a large buffer (allocated once and reused for multiple incoming messages).

Here is code to decode a message from a raw binary buffer:
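The answer's own listing is not reproduced on this page. What follows is only a minimal sketch of a decoder for the layout described above, not the answer's actual code. It assumes the variant is ordered as string, 64-bit integer, 64-bit float (so the type indices are 0, 1 and 2), that fixed-size values are stored in host byte order, and that the input is well-formed. The names `Value`, `Message` and `decode_message` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical types: the position of each alternative is the 1-byte type index.
using Value = std::variant<std::string_view, std::int64_t, double>;
using Attribute = std::pair<std::string_view, Value>;
using Message = std::vector<Attribute>;

// Reads a 1-byte length followed by that many characters (no copy is made).
static std::string_view read_short_string(const unsigned char*& p) {
    const std::size_t len = *p++;
    std::string_view s(reinterpret_cast<const char*>(p), len);
    p += len;
    return s;
}

// Reads a fixed-size value with memcpy so unaligned input is safe.
template <typename T>
static T read_raw(const unsigned char*& p) {
    T v;
    std::memcpy(&v, p, sizeof(T));
    p += sizeof(T);
    return v;
}

// Decodes one message starting at `p` and advances `p` past it.
// There are no bounds checks: the input is assumed to be well-formed.
Message decode_message(const unsigned char*& p) {
    const std::size_t count = *p++;          // number of attributes
    Message msg;
    msg.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        const std::string_view name = read_short_string(p);
        const unsigned char type = *p++;     // index into the variant
        assert(type <= 2 && "unknown type index (debug builds only)");
        switch (type) {
            case 0:  msg.emplace_back(name, Value(read_short_string(p))); break;
            case 1:  msg.emplace_back(name, Value(read_raw<std::int64_t>(p))); break;
            default: msg.emplace_back(name, Value(read_raw<double>(p))); break;
        }
    }
    return msg;
}
```

Names and string values are returned as `std::string_view` into the input buffer, so the only allocation is the vector of attributes. The views stay valid only while the buffer is alive and unmodified, which fits the reuse-one-large-buffer approach as long as a message is consumed before the buffer is overwritten.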

Source https://stackoverflow.com/questions/61609733

QUESTION

What is 'really_inline' exactly?

Asked 2020-Apr-29 at 06:29

While browsing GitHub I stumbled on the "really_inline" keyword and was wondering what exactly it is.

Obviously, if it lives up to its name, it makes 100% sure to embed the function and remove the caller/callee in the assembly output, but I want to know whether it is specific to a particular compiler and how this is done.

...

ANSWER

Answered 2020-Apr-29 at 06:29

In this case it's a macro: simdjson defines it conditionally, depending on whether the compiler is MSVC:

here (msvc):
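The linked definitions are not reproduced on this page. A common way to write such a macro is shown below, as a sketch only: the exact attributes simdjson uses may differ, and `add_one` is a hypothetical example function.

```cpp
// Sketch of a compiler-conditional "force inline" macro.
#if defined(_MSC_VER)
  // MSVC (and clang-cl): __forceinline is a compiler-specific keyword.
  #define really_inline __forceinline
#elif defined(__GNUC__) || defined(__clang__)
  // GCC and Clang: the always_inline attribute, combined with `inline`.
  #define really_inline inline __attribute__((always_inline))
#else
  // Unknown compiler: fall back to a plain inline hint.
  #define really_inline inline
#endif

// Hypothetical function using the macro in place of `inline`.
really_inline int add_one(int x) { return x + 1; }
```

So `really_inline` is not a language keyword. It expands to whatever the compiler in use offers for forcing inlining, and even those facilities are strong requests rather than absolute guarantees (recursive functions, for example, cannot always be inlined).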

Source https://stackoverflow.com/questions/61495480

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install simdjson

The simdjson library is easily consumable with a single .h and .cpp file.
Prerequisites: `g++` (version 7 or better) or `clang++` (version 6 or better), and a 64-bit system with a command-line shell (e.g., Linux, macOS, FreeBSD). We also support programming environments like Visual Studio and Xcode, but different steps are needed.
1. Pull [simdjson.h](singleheader/simdjson.h) and [simdjson.cpp](singleheader/simdjson.cpp) into a directory, along with the sample file [twitter.json](jsonexamples/twitter.json).
   ```
   wget https://raw.githubusercontent.com/simdjson/simdjson/master/singleheader/simdjson.h https://raw.githubusercontent.com/simdjson/simdjson/master/singleheader/simdjson.cpp https://raw.githubusercontent.com/simdjson/simdjson/master/jsonexamples/twitter.json
   ```
2. Create `quickstart.cpp`:

Support

For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, check and ask on the Stack Overflow community page.

Find more information at: