rocksdb | A library that provides an embeddable, persistent key-value store for fast storage | Database library
kandi X-RAY | rocksdb Summary
rocksdb Key Features
rocksdb Examples and Code Snippets
Faust supports Kafka with version >= 0.10.

.. _getting-help:

Getting Help
============

.. _slack-channel:

Slack
-----

For discussions about the usage, development, and future of Faust, please join the `fauststream`_ Slack.

* https://fauststream.slack.com
* Sign-up: https://join.slack.com/t/fauststream/shared_invite/enQtNDEzMTIyMTUyNzU2LTIyMjNjY2M2YzA2OWFhMDlmMzVkODk3YTBlYThlYmZiNTUwZDJlYWZiZTdkN2Q4ZGU4NWM4YWMyNTM5MGQ5OTg

Resources
=========

.. _bug-tracker:

Bug tracker
-----------

If you have any suggestions, bug reports, or annoyances please report them to our issue tracker at https://github.com/robinhood/faust/issues/

.. _license:

License
=======

This software is licensed under the `New BSD License`. See the ``LICENSE`` file in the top distribution directory for the full license text.

Contributing
============

Development of `Faust` happens at GitHub: https://github.com/robinhood/faust

You're highly encouraged to participate in the development of `Faust`. Be sure to also read the `Contributing to Faust`_ section in the documentation.

.. _`Contributing to Faust`: http://faust.readthedocs.io/en/latest/contributing.html

Code of Conduct
===============

Everyone interacting in the project's code bases, issue trackers, chat rooms, and mailing lists is expected to follow the Faust Code of Conduct.

As contributors and maintainers of these projects, and in the interest of fostering an open and welcoming community, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.

We are committed to making participation in these projects a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, religion, or nationality.

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery
* Personal attacks
* Trolling or insulting/derogatory comments
* Public or private harassment
* Publishing others' private information, such as physical or electronic addresses, without explicit permission
* Other unethical or unprofessional conduct.

Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct. By adopting this Code of Conduct, project maintainers commit themselves to fairly and consistently applying these principles to every aspect of managing this project. Project maintainers who do not follow or enforce the Code of Conduct may be permanently removed from the project team.

This code of conduct applies both within project spaces and in public spaces when an individual is representing the project or its community.

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by opening an issue or contacting one or more of the project maintainers. This Code of Conduct is adapted from the Contributor Covenant, version 1.2.0, available at http://contributor-covenant.org/version/1/2/0/.

.. _`introduction`: http://faust.readthedocs.io/en/latest/introduction.html
.. _`quickstart`: http://faust.readthedocs.io/en/latest/playbooks/quickstart.html
.. _`User Guide`: http://faust.readthedocs.io/en/latest/userguide/index.html

.. |build-status| image:: https://secure.travis-ci.org/robinhood/faust.png?branch=master
    :alt: Build status
    :target: https://travis-ci.org/robinhood/faust

.. |coverage| image:: https://codecov.io/github/robinhood/faust/coverage.svg?branch=master
    :target: https://codecov.io/github/robinhood/faust?branch=master

.. |license| image:: https://img.shields.io/pypi/l/faust.svg
    :alt: BSD License
    :target: https://opensource.org/licenses/BSD-3-Clause

.. |wheel| image:: https://img.shields.io/pypi/wheel/faust.svg
    :alt: faust can be installed via wheel
    :target: http://pypi.org/project/faust/

.. |pyversion| image:: https://img.shields.io/pypi/pyversions/faust.svg
    :alt: Supported Python versions.
    :target: http://pypi.org/project/faust/

.. |pyimp| image:: https://img.shields.io/pypi/implementation/faust.svg
    :alt: Supported Python implementations.
    :target: http://pypi.org/project/faust/
Trending Discussions on rocksdb
QUESTION
Is it okay to hold large state in RocksDB when using Kafka Streams? We are planning to use RocksDB as an event store to hold billions of events for an indefinite amount of time.
ANSWER
Answered 2022-Apr-03 at 20:15
The main limitation would be disk space, so sure, it can be done, but if the app crashes for any reason, you might be waiting for a while for the app to rebuild its state.
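If that rebuild window is a concern, standby replicas can keep a warm copy of the store on another instance. A minimal sketch of the relevant Kafka Streams settings (the application id and the standby-replica count are illustrative assumptions, not part of the answer above):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class EventStoreConfig {
    public static Properties config() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "eventstore-app"); // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // One standby replica per state store: failover restores from a
        // near-current local copy instead of replaying the whole changelog,
        // at the cost of extra disk and network.
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
        return props;
    }
}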
QUESTION
It's my first Kafka program.
From a kafka_2.13-3.1.0 instance, I created a Kafka topic poids_garmin_brut and filled it with this CSV:
kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic poids_garmin_brut
kafka-console-producer.sh --broker-list localhost:9092 --topic poids_garmin_brut < "Poids(1).csv"
Durée,Poids,Variation,IMC,Masse grasse,Masse musculaire squelettique,Masse osseuse,Masse hydrique,
" 14 Fév. 2022",
06:37,72.1 kg,0.3 kg,22.8,26.3 %,29.7 kg,3.5 kg,53.8 %,
" 13 Fév. 2022",
06:48,72.4 kg,0.2 kg,22.9,25.4 %,29.8 kg,3.6 kg,54.4 %,
" 12 Fév. 2022",
06:17,72.2 kg,0.0 kg,22.8,25.3 %,29.7 kg,3.6 kg,54.5 %,
[...]
And at any time now, before or after running the program I'll show, its content can be displayed by a kafka-console-consumer command:
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic poids_garmin_brut --from-beginning
Durée,Poids,Variation,IMC,Masse grasse,Masse musculaire squelettique,Masse osseuse,Masse hydrique,
" 14 Fév. 2022",
06:37,72.1 kg,0.3 kg,22.8,26.3 %,29.7 kg,3.5 kg,53.8 %,
" 13 Fév. 2022",
06:48,72.4 kg,0.2 kg,22.9,25.4 %,29.8 kg,3.6 kg,54.4 %,
" 12 Fév. 2022",
06:17,72.2 kg,0.0 kg,22.8,25.3 %,29.7 kg,3.6 kg,54.5 %,
" 11 Fév. 2022",
05:54,72.2 kg,0.1 kg,22.8,25.6 %,29.7 kg,3.5 kg,54.3 %,
" 10 Fév. 2022",
06:14,72.3 kg,0.0 kg,22.8,25.9 %,29.7 kg,3.5 kg,54.1 %,
" 9 Fév. 2022",
06:06,72.3 kg,0.5 kg,22.8,26.3 %,29.7 kg,3.5 kg,53.8 %,
" 8 Fév. 2022",
07:14,71.8 kg,0.7 kg,22.7,26.3 %,29.6 kg,3.5 kg,53.8 %,
Here is the Java program, based on the org.apache.kafka:kafka-streams:3.1.0 dependency, extracting this topic as a stream:
package extracteur.garmin;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.slf4j.*;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import java.util.Properties;
@SpringBootApplication
public class Kafka {
/** Logger. */
private static final Logger LOGGER = LoggerFactory.getLogger(Kafka.class);
public static void main(String[] args) {
LOGGER.info("L'extracteur de données Garmin démarre...");
/* The input CSV file data looks like this:
Durée,Poids,Variation,IMC,Masse grasse,Masse musculaire squelettique,Masse osseuse,Masse hydrique,
" 14 Fév. 2022",
06:37,72.1 kg,0.3 kg,22.8,26.3 %,29.7 kg,3.5 kg,53.8 %,
" 13 Fév. 2022",
06:48,72.4 kg,0.2 kg,22.9,25.4 %,29.8 kg,3.6 kg,54.4 %,
*/
// Create a stream with no key; the value is a string.
StreamsBuilder builder = new StreamsBuilder();
KStream<Void, String> stream = builder.stream("poids_garmin_brut");
// This is a Kafka foreach, not a Java lambda. It is lazy.
stream.foreach((key, value) -> {
LOGGER.info(value);
});
KafkaStreams streams = new KafkaStreams(builder.build(), config());
streams.start();
// Close the Kafka stream when the JVM stops, by having the following called:
streams.close();
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
}
/**
* Startup properties.
* @return configuration properties.
*/
private static Properties config() {
Properties config = new Properties();
config.put(StreamsConfig.APPLICATION_ID_CONFIG, "dev1");
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.Void().getClass());
config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
return config;
}
}
But, while the logs don't seem to report any error during execution, my program doesn't enter the stream.foreach, and therefore displays no content from that topic.
(In this log I removed the dev1-d1c8ce47-6fbf-41b7-b8aa-e3d094703088- part of the [dev1-d1c8ce47-6fbf-41b7-b8aa-e3d094703088-StreamThread-1] you should read inside, for SO message length and readability. And org.apache.kafka becomes o.a.k.)
/usr/lib/jvm/java-1.11.0-openjdk-amd64/bin/java -XX:TieredStopAtLevel=1 -noverify -Dspring.output.ansi.enabled=always -Dcom.sun.management.jmxremote -Dspring.jmx.enabled=true -Dspring.liveBeansView.mbeanDomain -Dspring.application.admin.enabled=true -javaagent:/opt/idea-IU-212.5284.40/lib/idea_rt.jar=41397:/opt/idea-IU-212.5284.40/bin -Dfile.encoding=UTF-8 -classpath /home/lebihan/dev/Java/garmin/target/classes:/home/lebihan/.m2/repository/org/slf4j/slf4j-api/1.7.33/slf4j-api-1.7.33.jar:/home/lebihan/.m2/repository/org/slf4j/log4j-over-slf4j/1.7.33/log4j-over-slf4j-1.7.33.jar:/home/lebihan/.m2/repository/ch/qos/logback/logback-classic/1.2.10/logback-classic-1.2.10.jar:/home/lebihan/.m2/repository/ch/qos/logback/logback-core/1.2.10/logback-core-1.2.10.jar:/home/lebihan/.m2/repository/org/springframework/boot/spring-boot-starter-web/2.6.3/spring-boot-starter-web-2.6.3.jar:/home/lebihan/.m2/repository/org/springframework/boot/spring-boot-starter/2.6.3/spring-boot-starter-2.6.3.jar:/home/lebihan/.m2/repository/org/springframework/boot/spring-boot/2.6.3/spring-boot-2.6.3.jar:/home/lebihan/.m2/repository/org/springframework/boot/spring-boot-autoconfigure/2.6.3/spring-boot-autoconfigure-2.6.3.jar:/home/lebihan/.m2/repository/org/springframework/boot/spring-boot-starter-logging/2.6.3/spring-boot-starter-logging-2.6.3.jar:/home/lebihan/.m2/repository/org/apache/logging/log4j/log4j-to-slf4j/2.17.1/log4j-to-slf4j-2.17.1.jar:/home/lebihan/.m2/repository/org/apache/logging/log4j/log4j-api/2.17.1/log4j-api-2.17.1.jar:/home/lebihan/.m2/repository/org/slf4j/jul-to-slf4j/1.7.33/jul-to-slf4j-1.7.33.jar:/home/lebihan/.m2/repository/jakarta/annotation/jakarta.annotation-api/1.3.5/jakarta.annotation-api-1.3.5.jar:/home/lebihan/.m2/repository/org/yaml/snakeyaml/1.29/snakeyaml-1.29.jar:/home/lebihan/.m2/repository/org/springframework/boot/spring-boot-starter-json/2.6.3/spring-boot-starter-json-2.6.3.jar:/home/lebihan/.m2/repository/com/fasterxml/jackson/datatype/jackson-datatype-jdk8/2.13.1/jackson-datatype-jdk8-2.13.1.jar:/home/lebihan/.m2/repository/com/fasterxml/jackson/datatype/jackson-datatype-jsr310/2.13.1/jackson-datatype-jsr310-2.13.1.jar:/home/lebihan/.m2/repository/com/fasterxml/jackson/module/jackson-module-parameter-names/2.13.1/jackson-module-parameter-names-2.13.1.jar:/home/lebihan/.m2/repository/org/springframework/boot/spring-boot-starter-tomcat/2.6.3/spring-boot-starter-tomcat-2.6.3.jar:/home/lebihan/.m2/repository/org/apache/tomcat/embed/tomcat-embed-core/9.0.56/tomcat-embed-core-9.0.56.jar:/home/lebihan/.m2/repository/org/apache/tomcat/embed/tomcat-embed-el/9.0.56/tomcat-embed-el-9.0.56.jar:/home/lebihan/.m2/repository/org/apache/tomcat/embed/tomcat-embed-websocket/9.0.56/tomcat-embed-websocket-9.0.56.jar:/home/lebihan/.m2/repository/org/springframework/spring-web/5.3.15/spring-web-5.3.15.jar:/home/lebihan/.m2/repository/org/springframework/spring-beans/5.3.15/spring-beans-5.3.15.jar:/home/lebihan/.m2/repository/org/springframework/spring-webmvc/5.3.15/spring-webmvc-5.3.15.jar:/home/lebihan/.m2/repository/org/springframework/spring-aop/5.3.15/spring-aop-5.3.15.jar:/home/lebihan/.m2/repository/org/springframework/spring-context/5.3.15/spring-context-5.3.15.jar:/home/lebihan/.m2/repository/org/springframework/spring-expression/5.3.15/spring-expression-5.3.15.jar:/home/lebihan/.m2/repository/org/springframework/spring-core/5.3.15/spring-core-5.3.15.jar:/home/lebihan/.m2/repository/org/springframework/spring-jcl/5.3.15/spring-jcl-5.3.15.jar:/home/lebihan/.m2/repository/org/apache/kafka/kafka-streams/3.1.0/kafka-streams-3.1.0.jar:/home/lebihan/.m2/repository/org/apache/kafka/kafka-clients/3.0.0/kafka-clients-3.0.0.jar:/home/lebihan/.m2/repository/com/github/luben/zstd-jni/1.5.0-2/zstd-jni-1.5.0-2.jar:/home/lebihan/.m2/repository/org/lz4/lz4-java/1.7.1/lz4-java-1.7.1.jar:/home/lebihan/.m2/repository/org/xerial/snappy/snappy-java/1.1.8.1/snappy-java-1.1.8.1.jar:/home/lebihan/.m2/repository/org/rocksdb/rocksdbjni/6.22.1.1/rocksdbjni-6.22.1.1.jar:/home/lebihan/.m2/repository/com/fasterxml/jackson/core/jackson-annotations/2.13.1/jackson-annotations-2.13.1.jar:/home/lebihan/.m2/repository/com/fasterxml/jackson/core/jackson-databind/2.13.1/jackson-databind-2.13.1.jar:/home/lebihan/.m2/repository/com/fasterxml/jackson/core/jackson-core/2.13.1/jackson-core-2.13.1.jar extracteur.garmin.Kafka
07:57:49.720 [main] INFO extracteur.garmin.Kafka - L'extracteur de données Garmin démarre...
07:57:49.747 [main] INFO o.a.k.streams.StreamsConfig - StreamsConfig values:
acceptable.recovery.lag = 10000
application.id = dev1
application.server =
bootstrap.servers = [localhost:9092]
buffered.records.per.partition = 1000
built.in.metrics.version = latest
cache.max.bytes.buffering = 10485760
client.id =
commit.interval.ms = 30000
connections.max.idle.ms = 540000
default.deserialization.exception.handler = class o.a.k.streams.errors.LogAndFailExceptionHandler
default.key.serde = class o.a.k.common.serialization.Serdes$VoidSerde
default.list.key.serde.inner = null
default.list.key.serde.type = null
default.list.value.serde.inner = null
default.list.value.serde.type = null
default.production.exception.handler = class o.a.k.streams.errors.DefaultProductionExceptionHandler
default.timestamp.extractor = class o.a.k.streams.processor.FailOnInvalidTimestamp
default.value.serde = class o.a.k.common.serialization.Serdes$StringSerde
max.task.idle.ms = 0
max.warmup.replicas = 2
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
num.standby.replicas = 0
num.stream.threads = 1
poll.ms = 100
probing.rebalance.interval.ms = 600000
processing.guarantee = at_least_once
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
replication.factor = -1
request.timeout.ms = 40000
retries = 0
retry.backoff.ms = 100
rocksdb.config.setter = null
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
state.cleanup.delay.ms = 600000
state.dir = /tmp/kafka-streams
task.timeout.ms = 300000
topology.optimization = none
upgrade.from = null
window.size.ms = null
windowed.inner.class.serde = null
windowstore.changelog.additional.retention.ms = 86400000
07:57:49.760 [main] INFO o.a.k.clients.admin.AdminClientConfig - AdminClientConfig values:
bootstrap.servers = [localhost:9092]
client.dns.lookup = use_all_dns_ips
client.id = admin
connections.max.idle.ms = 300000
default.api.timeout.ms = 60000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 2147483647
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
socket.connection.setup.timeout.max.ms = 30000
socket.connection.setup.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.certificate.chain = null
ssl.keystore.key = null
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.3
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.certificates = null
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
07:57:49.790 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka version: 3.0.0
07:57:49.790 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka commitId: 8cb0a5e9d3441962
07:57:49.790 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka startTimeMs: 1644908269788
07:57:49.793 [main] INFO o.a.k.streams.KafkaStreams - stream-client [dev1-d1c8ce47-6fbf-41b7-b8aa-e3d094703088] Kafka Streams version: 3.1.0
07:57:49.793 [main] INFO o.a.k.streams.KafkaStreams - stream-client [dev1-d1c8ce47-6fbf-41b7-b8aa-e3d094703088] Kafka Streams commit ID: 37edeed0777bacb3
07:57:49.800 [main] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] Creating restore consumer client
07:57:49.802 [main] INFO o.a.k.clients.consumer.ConsumerConfig - ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = none
bootstrap.servers = [localhost:9092]
check.crcs = true
client.dns.lookup = use_all_dns_ips
client.id = StreamThread-1-restore-consumer
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = null
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = false
internal.throw.on.fetch.stable.offset.unsupported = false
isolation.level = read_uncommitted
key.deserializer = class o.a.k.common.serialization.ByteArrayDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 1000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class o.a.k.clients.consumer.RangeAssignor, class o.a.k.clients.consumer.CooperativeStickyAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
session.timeout.ms = 45000
socket.connection.setup.timeout.max.ms = 30000
socket.connection.setup.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.certificate.chain = null
ssl.keystore.key = null
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.3
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.certificates = null
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class o.a.k.common.serialization.ByteArrayDeserializer
07:57:49.816 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka version: 3.0.0
07:57:49.816 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka commitId: 8cb0a5e9d3441962
07:57:49.816 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka startTimeMs: 1644908269816
07:57:49.818 [main] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] Creating thread producer client
07:57:49.820 [main] INFO o.a.k.clients.producer.ProducerConfig - ProducerConfig values:
acks = -1
batch.size = 16384
bootstrap.servers = [localhost:9092]
buffer.memory = 33554432
client.dns.lookup = use_all_dns_ips
client.id = StreamThread-1-producer
compression.type = none
connections.max.idle.ms = 540000
delivery.timeout.ms = 120000
enable.idempotence = true
interceptor.classes = []
key.serializer = class o.a.k.common.serialization.ByteArraySerializer
linger.ms = 100
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.max.age.ms = 300000
metadata.max.idle.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class o.a.k.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 2147483647
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
socket.connection.setup.timeout.max.ms = 30000
socket.connection.setup.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.certificate.chain = null
ssl.keystore.key = null
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.3
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.certificates = null
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = null
value.serializer = class o.a.k.common.serialization.ByteArraySerializer
07:57:49.828 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka version: 3.0.0
07:57:49.828 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka commitId: 8cb0a5e9d3441962
07:57:49.828 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka startTimeMs: 1644908269828
07:57:49.830 [main] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] Creating consumer client
07:57:49.831 [main] INFO o.a.k.clients.consumer.ConsumerConfig - ConsumerConfig values:
allow.auto.create.topics = false
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [localhost:9092]
check.crcs = true
client.dns.lookup = use_all_dns_ips
client.id = StreamThread-1-consumer
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = dev1
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = false
internal.throw.on.fetch.stable.offset.unsupported = false
isolation.level = read_uncommitted
key.deserializer = class o.a.k.common.serialization.ByteArrayDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 1000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [o.a.k.streams.processor.internals.StreamsPartitionAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
session.timeout.ms = 45000
socket.connection.setup.timeout.max.ms = 30000
socket.connection.setup.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.certificate.chain = null
ssl.keystore.key = null
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.3
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.certificates = null
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class o.a.k.common.serialization.ByteArrayDeserializer
replication.factor = -1
windowstore.changelog.additional.retention.ms = 86400000
07:57:49.836 [main] INFO o.a.k.streams.processor.internals.assignment.AssignorConfiguration - stream-thread [StreamThread-1-consumer] Cooperative rebalancing protocol is enabled now
07:57:49.840 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka version: 3.0.0
07:57:49.840 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka commitId: 8cb0a5e9d3441962
07:57:49.840 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka startTimeMs: 1644908269840
07:57:49.844 [main] INFO o.a.k.streams.KafkaStreams - stream-client [dev1-d1c8ce47-6fbf-41b7-b8aa-e3d094703088] State transition from CREATED to REBALANCING
07:57:49.845 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] Starting
07:57:49.845 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] State transition from CREATED to STARTING
07:57:49.845 [StreamThread-1] INFO o.a.k.clients.consumer.KafkaConsumer - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Subscribed to topic(s): poids_garmin_brut
07:57:49.845 [main] INFO o.a.k.streams.KafkaStreams - stream-client [dev1-d1c8ce47-6fbf-41b7-b8aa-e3d094703088] State transition from REBALANCING to PENDING_SHUTDOWN
07:57:49.846 [kafka-streams-close-thread] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] Informed to shut down
07:57:49.846 [kafka-streams-close-thread] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] State transition from STARTING to PENDING_SHUTDOWN
07:57:49.919 [kafka-producer-network-thread | StreamThread-1-producer] INFO o.a.k.clients.Metadata - [Producer clientId=StreamThread-1-producer] Cluster ID: QKJGs4glRAy7besZxXNCrg
07:57:49.920 [StreamThread-1] INFO o.a.k.clients.Metadata - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Cluster ID: QKJGs4glRAy7besZxXNCrg
07:57:49.921 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Discovered group coordinator debian:9092 (id: 2147483647 rack: null)
07:57:49.922 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] (Re-)joining group
07:57:49.929 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Request joining group due to: need to re-join with the given member-id
07:57:49.929 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] (Re-)joining group
07:57:49.930 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Successfully joined group with generation Generation{generationId=3, memberId='StreamThread-1-consumer-34c0df37-baeb-4582-bdfe-79ab9e2e410c', protocol='stream'}
07:57:49.936 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamsPartitionAssignor - stream-thread [StreamThread-1-consumer] All members participating in this rebalance:
d1c8ce47-6fbf-41b7-b8aa-e3d094703088: [StreamThread-1-consumer-34c0df37-baeb-4582-bdfe-79ab9e2e410c].
07:57:49.938 [StreamThread-1] INFO o.a.k.streams.processor.internals.assignment.HighAvailabilityTaskAssignor - Decided on assignment: {d1c8ce47-6fbf-41b7-b8aa-e3d094703088=[activeTasks: ([0_0]) standbyTasks: ([]) prevActiveTasks: ([]) prevStandbyTasks: ([]) changelogOffsetTotalsByTask: ([]) taskLagTotals: ([]) capacity: 1 assigned: 1]} with no followup probing rebalance.
07:57:49.938 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamsPartitionAssignor - stream-thread [StreamThread-1-consumer] Assigned tasks [0_0] including stateful [] to clients as:
d1c8ce47-6fbf-41b7-b8aa-e3d094703088=[activeTasks: ([0_0]) standbyTasks: ([])].
07:57:49.939 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamsPartitionAssignor - stream-thread [StreamThread-1-consumer] Client d1c8ce47-6fbf-41b7-b8aa-e3d094703088 per-consumer assignment:
prev owned active {}
prev owned standby {StreamThread-1-consumer-34c0df37-baeb-4582-bdfe-79ab9e2e410c=[]}
assigned active {StreamThread-1-consumer-34c0df37-baeb-4582-bdfe-79ab9e2e410c=[0_0]}
revoking active {}
assigned standby {}
07:57:49.939 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamsPartitionAssignor - stream-thread [StreamThread-1-consumer] Finished stable assignment of tasks, no followup rebalances required.
07:57:49.939 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Finished assignment for group at generation 3: {StreamThread-1-consumer-34c0df37-baeb-4582-bdfe-79ab9e2e410c=Assignment(partitions=[poids_garmin_brut-0], userDataSize=52)}
07:57:49.943 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Successfully synced group in generation Generation{generationId=3, memberId='StreamThread-1-consumer-34c0df37-baeb-4582-bdfe-79ab9e2e410c', protocol='stream'}
07:57:49.943 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Updating assignment with
Assigned partitions: [poids_garmin_brut-0]
Current owned partitions: []
Added partitions (assigned - owned): [poids_garmin_brut-0]
Revoked partitions (owned - assigned): []
07:57:49.943 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Notifying assignor about the new Assignment(partitions=[poids_garmin_brut-0], userDataSize=52)
07:57:49.944 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamsPartitionAssignor - stream-thread [StreamThread-1-consumer] No followup rebalance was requested, resetting the rebalance schedule.
07:57:49.944 [StreamThread-1] INFO o.a.k.streams.processor.internals.TaskManager - stream-thread [StreamThread-1] Handle new assignment with:
New active tasks: [0_0]
New standby tasks: []
Existing active tasks: []
Existing standby tasks: []
07:57:49.950 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Adding newly assigned partitions: poids_garmin_brut-0
07:57:49.953 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Found no committed offset for partition poids_garmin_brut-0
07:57:49.954 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] Shutting down
[...]
Process finished with exit code 0
What am I doing wrong?
I'm running my Kafka instance and its Java program locally, on the same PC.
I've tried the 3.1.0 and 2.8.1 versions of Kafka, and removed any traces of Spring from the Java program, without success.
I believe I'm facing a configuration problem.
ANSWER
Answered 2022-Feb-15 at 14:36
The following should work.
LOGGER.info("L'extracteur de données Garmin démarre...");
/* The input CSV file data looks like this:
Durée,Poids,Variation,IMC,Masse grasse,Masse musculaire squelettique,Masse osseuse,Masse hydrique,
" 14 Fév. 2022",
06:37,72.1 kg,0.3 kg,22.8,26.3 %,29.7 kg,3.5 kg,53.8 %,
" 13 Fév. 2022",
06:48,72.4 kg,0.2 kg,22.9,25.4 %,29.8 kg,3.6 kg,54.4 %,
*/
// Create a stream with no key; the value is a string.
StreamsBuilder builder = new StreamsBuilder();
builder.stream("poids_garmin_brut")
.foreach((k, v) -> {
LOGGER.info(v.toString());
});
KafkaStreams streams = new KafkaStreams(builder.build(), config());
streams.start();
// Close the Kafka stream when the JVM stops (via the shutdown hook below), not here:
//streams.close();
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
OUTPUT
2022-02-15 20:05:54 INFO ConsumerCoordinator:291 - [Consumer clientId=dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1-consumer, groupId=dev1] Adding newly assigned partitions: poids_garmin_brut-0
2022-02-15 20:05:54 INFO StreamThread:229 - stream-thread [dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1] State transition from STARTING to PARTITIONS_ASSIGNED
2022-02-15 20:05:54 INFO ConsumerCoordinator:844 - [Consumer clientId=dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1-consumer, groupId=dev1] Setting offset for partition poids_garmin_brut-0 to the committed offset FetchPosition{offset=21, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[LAPTOP-J1JBHQUR:9092 (id: 0 rack: null)], epoch=0}}
2022-02-15 20:05:54 INFO StreamTask:240 - stream-thread [dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1] task [0_0] Initialized
2022-02-15 20:05:54 INFO StreamTask:265 - stream-thread [dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1] task [0_0] Restored and ready to run
2022-02-15 20:05:54 INFO StreamThread:882 - stream-thread [dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1] Restoration took 30 ms for all tasks [0_0]
2022-02-15 20:05:54 INFO StreamThread:229 - stream-thread [dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1] State transition from PARTITIONS_ASSIGNED to RUNNING
2022-02-15 20:05:54 INFO KafkaStreams:332 - stream-client [dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b] State transition from REBALANCING to RUNNING
2022-02-15 20:05:54 INFO KafkaConsumer:2254 - [Consumer clientId=dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1-consumer, groupId=dev1] Requesting the log end offset for poids_garmin_brut-0 in order to compute lag
2022-02-15 20:06:03 INFO Main:33 - Test22
2022-02-15 20:06:06 INFO Main:33 - Test23
QUESTION
I have a job running on Flink 1.14.3 (Java 11) that uses rocksdb as the state backend. The problem is that the job requires an amount of memory pretty similar to the overall state size.
Indeed, to make it stable (and capable of taking snapshots), this is what I'm using:
- 4 TMs with 30 GB of RAM and 7 CPUs
- Everything is run on top of Kubernetes on AWS using nodes with 32 GB of RAM and locally attached SSD disks (M5ad instances for what it's worth)
I have these settings in place:
state.backend: rocksdb
state.backend.incremental: 'true'
state.backend.rocksdb.localdir: /opt/flink/rocksdb <-- SSD volume (see below)
state.backend.rocksdb.memory.managed: 'true'
state.backend.rocksdb.predefined-options: FLASH_SSD_OPTIMIZED
taskmanager.memory.managed.fraction: '0.9'
taskmanager.memory.framework.off-heap.size: 512mb
taskmanager.numberOfTaskSlots: '4' (parallelism: 16)
Also this:
- name: rocksdb-volume
volume:
emptyDir:
sizeLimit: 100Gi
name: rocksdb-volume
volumeMount:
mountPath: /opt/flink/rocksdb
Which provides plenty of disk space for each task manager. With those settings, the job runs smoothly and in particular there is a relatively big memory margin. The problem is that memory consumption slowly increases, and also that with less memory margin snapshots fail. I have tried reducing the number of taskmanagers, but I need 4. Same with the amount of RAM: I have tried giving e.g. 16 GB instead of 30 GB, but the problem is the same. Another setting that has worked for us is using 8 TMs each with 16 GB of RAM, but again, this leads to the same amount of memory overall as the current settings. Even with that amount of memory, I can see that memory keeps growing and will probably lead to a bad end...
Also, the latest snapshot took around 120 GB, so as you can see I am using an amount of RAM similar to the size of the total state, which defeats the whole purpose of using a disk-based state backend (rocksdb) plus local SSDs.
Is there an effective way of limiting the memory that rocksdb takes (to that available on the running pods)? Nothing I have found/tried out so far has worked. Theoretically, the images I am using have jemalloc in place for memory allocation, which should avoid memory fragmentation issues observed with malloc in the past.
UPDATE 1: Attached please find memory evolution with taskmanager.memory.managed.fraction equal to 0.25 and 0.1. Apparently, the job continues to require all the available memory in the long run.
UPDATE 2: Following David's suggestion I've tried lowering the value of taskmanager.memory.managed.fraction as well as the total amount of memory. In particular, it seems that the job can run smoothly with 8 GB per TM if the managed fraction is set to 0.2. However, if set to 0.9 the job fails to start (due to lack of memory) unless 30 GB per TM are given. The following screenshot displays the memory evolution with the managed fraction set to 0.2 and the TM memory set to 8 GB. At around 13.50h a significant amount of memory was freed as a result of taking a snapshot (which worked well). Overall the job looks pretty stable now...
ANSWER
Answered 2022-Feb-04 at 18:54
RocksDB is designed to use all of the memory you give it access to -- so if it can fit all of your state in memory, it will. And given that you've increased taskmanager.memory.managed.fraction from 0.4 to 0.9, it's not surprising that your overall memory usage approaches its limit over time.
If you give RocksDB rather less memory, it should cope. Have you tried that?
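Concretely, that means lowering the managed-memory fraction so RocksDB spills cold state to the local SSD instead of growing in RAM. A minimal flink-conf sketch along the lines of the asker's own settings (the 0.25 value is an illustrative assumption, not prescribed by the answer):

state.backend: rocksdb
state.backend.incremental: 'true'
# Keep RocksDB inside Flink's managed-memory budget...
state.backend.rocksdb.memory.managed: 'true'
# ...and make that budget smaller (was 0.9), so RocksDB spills cold
# state to the local SSD volume instead of holding it all in RAM.
taskmanager.memory.managed.fraction: '0.25'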
QUESTION
What is the difference between using RocksDB to store operator state checkpoints and using RocksDB as a cache (instead of a cache like Redis) in a Flink job? I have a requirement to store data processed from a Flink job in a cache for 24 hours and to perform some computations in the streaming job based on that data. The data has to be removed after 24 hours. Can RocksDB be used for this purpose?
ANSWER
Answered 2022-Jan-30 at 10:25
The role that RocksDB plays in Flink is not really a checkpoint store or a cache. A checkpoint store must be reliable, and capable of surviving failures; Flink does not rely on RocksDB to survive failures. During checkpointing Flink copies the state in RocksDB to a distributed file system. During recovery, a new RocksDB instance will be created from the latest checkpoint. Caches, on the other hand, are a nice-to-have storage layer that can transparently fall back to some ground truth storage in the case of a cache miss. This comes closer to describing how the RocksDB state backend fits into Flink, except that Flink's state backends are essential components, rather than nice-to-haves. If the state for a running job can't be found in RocksDB, it doesn't exist.
Setting that aside, yes, you can store data in RocksDB for 24 hours and then remove it (or have it removed). You can explicitly remove it by using a Timer with a KeyedProcessFunction, and then clear an entry when the Timer fires. Or you can use the State TTL mechanism to have Flink clear state for you automatically.
You don't have to use Flink with RocksDB. The fully in-memory heap-based state backend is a higher performance alternative that offers the same exactly-once fault-tolerance guarantees, but it doesn't spill to disk like RocksDB, so you are more limited in how much state can be managed.
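As a minimal sketch of the timer-based expiry the answer describes (the class name and the String key/value types here are illustrative assumptions, not from the answer):

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class ExpireAfter24h extends KeyedProcessFunction<String, String, String> {

    private transient ValueState<String> cached;

    @Override
    public void open(Configuration parameters) {
        cached = getRuntimeContext().getState(
                new ValueStateDescriptor<>("cached", String.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        if (cached.value() == null) {
            // First sighting of this key: schedule its removal in 24 hours.
            ctx.timerService().registerProcessingTimeTimer(
                    ctx.timerService().currentProcessingTime() + 24 * 60 * 60 * 1000L);
        }
        cached.update(value);
        out.collect(value);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
        cached.clear(); // drop the entry once it is 24 hours old
    }
}

The State TTL mechanism mentioned in the answer achieves the same effect declaratively, without managing timers by hand.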
QUESTION
I have a Flink (v1.13.3) application with an unbounded stream (using Kafka). One of my streams is very busy, and the busy value (which I can see in the UI) increases over time. When I have just started the Flink application,
sum by(task_name) (flink_taskmanager_job_task_busyTimeMsPerSecond{job="Flink", task_name="MyProcessFunction"})
returns 300-450 ms. After five or more hours the same query returns 5-7 seconds.
This function is quite simple, and it just uses rocksdb for the state backend:
public class MyObj implements Serializable
{
private Set<String> distinctValues;
public MyObj()
{
this.distinctValues = new HashSet<>();
}
public Set<String> getDistinctValues() {
return distinctValues;
}
public void setDistinctValues(Set<String> values) {
this.distinctValues = values;
}
}
public class MyProcessFunction extends KeyedProcessFunction
{
private transient ValueState<MyObj> state;
@Override
public void open(Configuration parameters)
{
ValueStateDescriptor<MyObj> stateDescriptor = new ValueStateDescriptor<>("MyObj",
TypeInformation.of(MyObj.class));
state = getRuntimeContext().getState(stateDescriptor);
}
@Override
public void processElement(KafkaRecord value, Context ctx, Collector out) throws Exception
{
MyObj stateValue = state.value();
if (stateValue == null)
{
stateValue = new MyObj();
ctx.timerService().registerProcessingTimeTimer(value.getTimestamp() + 10 * 60 * 1000L); // 10 minutes
}
stateValue.getDistinctValues().add(value.getValue());
if (stateValue.getDistinctValues().size() >= 20)
{
state.clear();
}
else
{
state.update(stateValue);
}
}
@Override
public void onTimer(long timestamp, OnTimerContext ctx, Collector out)
{
state.clear();
}
}
NOTE: Before implementing ValueState, I was just using ListState. But with ListState, flink_taskmanager_job_task_busyTimeMsPerSecond returns 25-30 seconds:
public class MyProcessFunction extends KeyedProcessFunction
{
private transient ListState<String> listState;
@Override
public void open(Configuration parameters)
{
ListStateDescriptor<String> listStateDescriptor = new ListStateDescriptor<>("myobj", TypeInformation.of(String.class));
listState = getRuntimeContext().getListState(listStateDescriptor);
}
@Override
public void processElement(KafkaRecord value, Context ctx, Collector out) throws Exception
{
List<String> values = IteratorUtils.toList(listState.get().iterator());
if (CollectionUtils.isEmpty(values))
{
ctx.timerService().registerProcessingTimeTimer(value.getTimestamp() + 10 * 60 * 1000L); // 10 minutes
}
if (!values.contains(value.getValue()))
{
values.add(value.getValue());
listState.update(values);
}
if (values.size() >= 20)
{
...
}
}
@Override
public void onTimer(long timestamp, OnTimerContext ctx, Collector out)
{
listState.clear();
}
}
ANSWER
Answered 2022-Jan-12 at 09:16
Some slowdown is to be expected once RocksDB reaches the point where the working state no longer fits in memory. However, in this case you should be able to dramatically improve performance by switching from ValueState to MapState.
Currently you are deserializing and reserializing the entire hashSet for every record. As these hashSets grow over time, performance degrades.
The RocksDB state backend has an optimized implementation of MapState. Each individual key/value entry in the map is stored as a separate RocksDB object, so you can look up, insert, and update entries without having to do serde on the rest of the map.
ListState is also optimized for RocksDB (it can be appended to without deserializing the list). In general it's best to avoid storing collections in ValueState when using RocksDB, and to use ListState or MapState instead wherever possible.
Since the heap-based state backend keeps its working state as objects on the heap, it doesn't have the same issues.
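A minimal sketch of that switch (the String key/record types stand in for the asker's KafkaRecord, and the Boolean map value is a placeholder, since only the map keys matter; the separate counter is our workaround for MapState having no size() method):

import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class DistinctValuesFunction extends KeyedProcessFunction<String, String, String> {

    private transient MapState<String, Boolean> distinctValues;
    private transient ValueState<Integer> count; // MapState has no size()

    @Override
    public void open(Configuration parameters) {
        distinctValues = getRuntimeContext().getMapState(
                new MapStateDescriptor<>("distinctValues", String.class, Boolean.class));
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Integer.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        if (!distinctValues.contains(value)) {
            // Point write: only this entry is (de)serialized, not the whole set.
            distinctValues.put(value, Boolean.TRUE);
            int n = count.value() == null ? 1 : count.value() + 1;
            if (n >= 20) {
                distinctValues.clear();
                count.clear();
            } else {
                count.update(n);
            }
        }
    }
}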
QUESTION
Say I have a simple Flink job with 2 keyed states, State1 and State2.
The job is configured with the rocksdb backend. Each of the states holds 10 GB of data.
If I update the code so that one of the states is no longer used (state descriptor deleted, and related code removed), for example State1 is deleted:
the next time Flink triggers a checkpoint, or I trigger a savepoint manually, will the checkpoint/savepoint still hold the data of State1 or not?
ANSWER
Answered 2021-Dec-28 at 09:39
If you are using RocksDB with incremental checkpoints, then state for the obsolete state descriptor will remain in checkpoints until it is compacted away (but it can be ignored). With any full snapshot, nothing of State1 will remain.
With RocksDB, expired state is eventually removed by a RocksDB compaction filter. Until then, if StateTtlConfig.StateVisibility.NeverReturnExpired is set, the state backend returns null in place of expired values.
QUESTION
I'm trying to run Python Faust from Docker.
Based on this documentation: https://faust.readthedocs.io/en/latest/userguide/installation.html
I created a simple Dockerfile:
FROM python:3
ADD ./app/app.py /
RUN pip3 install --upgrade pip
RUN pip install -U faust
RUN pip install "faust[rocksdb]"
RUN pip install "faust[rocksdb,uvloop,fast,redis]"
CMD ["python", "./app.py"]
When I build the Docker image, I receive an error at the 5th step (Step 5/7 : RUN pip install "faust[rocksdb]"):
---> Running in 1e42a5e50cbe
Requirement already satisfied: faust[rocksdb] in /usr/local/lib/python3.10/site-packages (1.10.4)
Requirement already satisfied: terminaltables<4.0,>=3.1 in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (3.1.10)
Requirement already satisfied: click<8.0,>=6.7 in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (7.1.2)
Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (1.7.2)
Requirement already satisfied: aiohttp-cors<2.0,>=0.7 in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (0.7.0)
Requirement already satisfied: mypy-extensions in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (0.4.3)
Requirement already satisfied: colorclass<3.0,>=2.2 in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (2.2.2)
Requirement already satisfied: opentracing<2.0.0,>=1.3.0 in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (1.3.0)
Requirement already satisfied: mode<4.4,>=4.3.2 in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (4.3.2)
Requirement already satisfied: venusian<2.0,>=1.1 in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (1.2.0)
Requirement already satisfied: aiohttp<4.0,>=3.5.2 in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (3.8.1)
Requirement already satisfied: robinhood-aiokafka<1.2,>=1.1.6 in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (1.1.6)
Requirement already satisfied: croniter>=0.3.16 in /usr/local/lib/python3.10/site-packages (from faust[rocksdb]) (1.1.0)
Collecting python-rocksdb>=0.6.7
  Downloading python-rocksdb-0.7.0.tar.gz (219 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/site-packages (from aiohttp<4.0,>=3.5.2->faust[rocksdb]) (1.2.0)
Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/site-packages (from aiohttp<4.0,>=3.5.2->faust[rocksdb]) (21.2.0)
Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/site-packages (from aiohttp<4.0,>=3.5.2->faust[rocksdb]) (1.2.0)
Requirement already satisfied: charset-normalizer<3.0,>=2.0 in /usr/local/lib/python3.10/site-packages (from aiohttp<4.0,>=3.5.2->faust[rocksdb]) (2.0.9)
Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/site-packages (from aiohttp<4.0,>=3.5.2->faust[rocksdb]) (5.2.0)
Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /usr/local/lib/python3.10/site-packages (from aiohttp<4.0,>=3.5.2->faust[rocksdb]) (4.0.2)
Requirement already satisfied: python-dateutil in /usr/local/lib/python3.10/site-packages (from croniter>=0.3.16->faust[rocksdb]) (2.8.2)
Requirement already satisfied: colorlog>=2.9.0 in /usr/local/lib/python3.10/site-packages (from mode<4.4,>=4.3.2->faust[rocksdb]) (6.6.0)
Requirement already satisfied: setuptools>=25 in /usr/local/lib/python3.10/site-packages (from python-rocksdb>=0.6.7->faust[rocksdb]) (57.5.0)
Requirement already satisfied: kafka-python<1.5,>=1.4.6 in /usr/local/lib/python3.10/site-packages (from robinhood-aiokafka<1.2,>=1.1.6->faust[rocksdb]) (1.4.7)
Requirement already satisfied: idna>=2.0 in /usr/local/lib/python3.10/site-packages (from yarl<2.0,>=1.0->faust[rocksdb]) (3.3)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/site-packages (from python-dateutil->croniter>=0.3.16->faust[rocksdb]) (1.16.0)
And an ERROR PART:
Building wheels for collected packages: python-rocksdb
  Building wheel for python-rocksdb (setup.py): started
  ERROR: Command errored out with exit status 1:
   command: /usr/local/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-b8y7g4hs/python-rocksdb_b1c08993fd134ac4bc59e6f5d18bcd91/setup.py'"'"'; __file__='"'"'/tmp/pip-install-b8y7g4hs/python-rocksdb_b1c08993fd134ac4bc59e6f5d18bcd91/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-9_o4ek6z
   cwd: /tmp/pip-install-b8y7g4hs/python-rocksdb_b1c08993fd134ac4bc59e6f5d18bcd91/
  Complete output (64 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.10
  creating build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/interfaces.py -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/errors.py -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/merge_operators.py -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/__init__.py -> build/lib.linux-x86_64-3.10/rocksdb
  creating build/lib.linux-x86_64-3.10/rocksdb/tests
  copying rocksdb/tests/test_memtable.py -> build/lib.linux-x86_64-3.10/rocksdb/tests
  copying rocksdb/tests/test_db.py -> build/lib.linux-x86_64-3.10/rocksdb/tests
  copying rocksdb/tests/__init__.py -> build/lib.linux-x86_64-3.10/rocksdb/tests
  copying rocksdb/tests/test_options.py -> build/lib.linux-x86_64-3.10/rocksdb/tests
  running egg_info
  writing python_rocksdb.egg-info/PKG-INFO
  writing dependency_links to python_rocksdb.egg-info/dependency_links.txt
  writing requirements to python_rocksdb.egg-info/requires.txt
  writing top-level names to python_rocksdb.egg-info/top_level.txt
  reading manifest file 'python_rocksdb.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  writing manifest file 'python_rocksdb.egg-info/SOURCES.txt'
  copying rocksdb/_rocksdb.cpp -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/_rocksdb.pyx -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/backup.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/cache.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/comparator.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/db.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/env.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/filter_policy.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/iterator.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/logger.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/memtablerep.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/merge_operator.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/options.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/slice.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/slice_transform.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/snapshot.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/status.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/std_memory.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/table_factory.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  copying rocksdb/universal_compaction.pxd -> build/lib.linux-x86_64-3.10/rocksdb
  creating build/lib.linux-x86_64-3.10/rocksdb/cpp
  copying rocksdb/cpp/comparator_wrapper.hpp -> build/lib.linux-x86_64-3.10/rocksdb/cpp
  copying rocksdb/cpp/filter_policy_wrapper.hpp -> build/lib.linux-x86_64-3.10/rocksdb/cpp
  copying rocksdb/cpp/memtable_factories.hpp -> build/lib.linux-x86_64-3.10/rocksdb/cpp
  copying rocksdb/cpp/merge_operator_wrapper.hpp -> build/lib.linux-x86_64-3.10/rocksdb/cpp
  copying rocksdb/cpp/slice_transform_wrapper.hpp -> build/lib.linux-x86_64-3.10/rocksdb/cpp
  copying rocksdb/cpp/utils.hpp -> build/lib.linux-x86_64-3.10/rocksdb/cpp
  copying rocksdb/cpp/write_batch_iter_helper.hpp -> build/lib.linux-x86_64-3.10/rocksdb/cpp
  running build_ext
  cythoning rocksdb/_rocksdb.pyx to rocksdb/_rocksdb.cpp
  /tmp/pip-install-b8y7g4hs/python-rocksdb_b1c08993fd134ac4bc59e6f5d18bcd91/.eggs/Cython-0.29.26-py3.10-linux-x86_64.egg/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-install-b8y7g4hs/python-rocksdb_b1c08993fd134ac4bc59e6f5d18bcd91/rocksdb/_rocksdb.pyx
    tree = Parsing.p_module(s, pxd, full_module_name)
  building 'rocksdb._rocksdb' extension
  creating build/temp.linux-x86_64-3.10
  creating build/temp.linux-x86_64-3.10/rocksdb
  gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.10 -c rocksdb/_rocksdb.cpp -o build/temp.linux-x86_64-3.10/rocksdb/_rocksdb.o -std=c++11 -O3 -Wall -Wextra -Wconversion -fno-strict-aliasing -fno-rtti
  rocksdb/_rocksdb.cpp:705:10: fatal error: rocksdb/slice.h: No such file or directory
    705 | #include "rocksdb/slice.h"
        |          ^~~~~~~~~~~~~~~~~
  compilation terminated.
  error: command '/usr/bin/gcc' failed with exit code 1
  ----------------------------------------
  Building wheel for python-rocksdb (setup.py): finished with status 'error'
ERROR: Failed building wheel for python-rocksdb
Can anyone help me to move on with this? I'd like to use Faust from Docker on Kubernetes.
ANSWER
Answered 2021-Dec-27 at 23:37
Read the error message, where it is clearly stated you are missing a header file:
fatal error: rocksdb/slice.h: No such file or directory
  705 | #include "rocksdb/slice.h"
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/bin/gcc' failed with exit code 1
Accordingly, you'll need to build and install RocksDB. This is separate from the installation of faust[rocksdb] with pip. That simply installs python-rocksdb, the Python interface to the underlying libraries.
There is even a (third-party) RocksDB Docker image based on Python 3.7 Slim. You could use that image directly, or borrow some of the steps from its Dockerfile; a sketch of the idea follows.
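As a rough, unofficial sketch of those steps (the base image, the RocksDB tag v6.11.4, and the package list are assumptions to adapt, not the contents of that third-party image):

# Sketch only: build the RocksDB shared library, then install the Python bindings.
FROM python:3.7-slim

# Toolchain plus the compression libraries RocksDB links against
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential git \
        libsnappy-dev zlib1g-dev libbz2-dev liblz4-dev libzstd-dev libgflags-dev \
    && rm -rf /var/lib/apt/lists/*

# Build and install RocksDB; headers land under /usr/local/include/rocksdb,
# which is exactly what the failing compile above could not find.
RUN git clone --depth 1 --branch v6.11.4 https://github.com/facebook/rocksdb.git /tmp/rocksdb \
    && make -C /tmp/rocksdb shared_lib \
    && make -C /tmp/rocksdb install-shared \
    && ldconfig \
    && rm -rf /tmp/rocksdb

# With the headers and shared library in place, the extension can compile
RUN pip install --no-cache-dir "faust[rocksdb]"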
QUESTION
Let's say I have a process function like this one (with the RocksDB state backend):
public class Test extends KeyedProcessFunction<...>
{
    private transient ValueState<Integer> ...;
    ...

    @Override
    public void open(Configuration parameters) throws Exception
    {
        StateTtlConfig ttlConfig = StateTtlConfig
            .newBuilder(Time.minutes(10))
            .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
            .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
            .cleanupInRocksdbCompactFilter(1000)
            .build();
        ValueStateDescriptor<Integer> testDescr = new ValueStateDescriptor<>(
            "test",
            TypeInformation.of(Integer.class)
        );
        testDescr.enableTimeToLive(ttlConfig);
        ...
    }
}
kafkaSource.keyBy(object -> object.getKey()).process(new Test())...;
Assume that this is an unbounded streaming application. Let's say I have seen the key "orange" only once (or just assume the process function is called once for the key "orange"), and assume that no event with the key "orange" will ever arrive again. In that case, will the state for the key "orange" stay in RocksDB forever?
ANSWER
Answered 2021-Dec-15 at 21:16
The state for the inactive key "orange" will be removed from RocksDB during the first RocksDB compaction that occurs after 10 minutes have elapsed since the state for that key was created (because the TTL configuration builder was configured with a 10-minute timeout). Until then the state will linger in RocksDB, but because you have configured StateVisibility.NeverReturnExpired, Flink will pretend it is not there should you try to access it.
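For reference, here is the question's TTL setup again, annotated; the one detail worth spelling out is the argument to cleanupInRocksdbCompactFilter (queryTimeAfterNumEntries), which controls how many state entries the RocksDB compaction filter processes before it re-reads the current timestamp, trading cleanup accuracy against compaction speed:

import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.time.Time;

// The same TTL configuration as above, with comments.
StateTtlConfig ttlConfig = StateTtlConfig
        .newBuilder(Time.minutes(10))                               // expire 10 minutes after creation/last write
        .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)  // writes reset the clock; reads do not
        .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
        .cleanupInRocksdbCompactFilter(1000)                        // drop expired entries during compaction
        .build();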
QUESTION
I have two questions related to the high availability of a StateFun application running on Kubernetes.
Here are the details of my setup:
- Using StateFun v3.1.0
- Checkpoints are stored on HDFS (state.checkpoint-storage: filesystem)
- Checkpointing mode is EXACTLY_ONCE
- State backend is rocksdb, and incremental checkpointing is enabled
1- I tried both ZooKeeper and Kubernetes HA settings; the result is the same (the log below is from a ZooKeeper HA environment). When I kill the jobmanager pod, minikube starts another pod, and this new pod fails when it tries to load the last checkpoint:
...
2021-12-11 14:25:26,426 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Initializing job myStatefunApp (00000000000000000000000000000000).
2021-12-11 14:25:26,443 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using restart back off time strategy FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=2147483647, backoffTimeMS=1000) for myStatefunApp (00000000000000000000000000000000).
2021-12-11 14:25:26,516 INFO org.apache.flink.runtime.util.ZooKeeperUtils [] - Initialized DefaultCompletedCheckpointStore in 'ZooKeeperStateHandleStore{namespace='statefun_zk_recovery/my-statefun-app/checkpoints/00000000000000000000000000000000'}' with /checkpoints/00000000000000000000000000000000.
2021-12-11 14:25:26,599 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Running initialization on master for job myStatefunApp (00000000000000000000000000000000).
2021-12-11 14:25:26,599 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Successfully ran initialization on master in 0 ms.
2021-12-11 14:25:26,617 INFO org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - Built 1 pipelined regions in 1 ms
2021-12-11 14:25:26,626 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using job/cluster config to configure application-defined state backend: EmbeddedRocksDBStateBackend{, localRocksDbDirectories=null, enableIncrementalCheckpointing=TRUE, numberOfTransferThreads=1, writeBatchSize=2097152}
2021-12-11 14:25:26,627 INFO org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend [] - Using predefined options: DEFAULT.
2021-12-11 14:25:26,627 INFO org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend [] - Using application-defined options factory: DefaultConfigurableOptionsFactory{configuredOptions={state.backend.rocksdb.thread.num=1}}.
2021-12-11 14:25:26,627 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using application-defined state backend: EmbeddedRocksDBStateBackend{, localRocksDbDirectories=null, enableIncrementalCheckpointing=TRUE, numberOfTransferThreads=1, writeBatchSize=2097152}
2021-12-11 14:25:26,631 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Checkpoint storage is set to 'filesystem': (checkpoints "hdfs://hdfs-namenode:8020/tmp/statefun_checkpoints/myStatefunApp")
2021-12-11 14:25:26,712 INFO org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Recovering checkpoints from ZooKeeperStateHandleStore{namespace='statefun_zk_recovery/my-statefun-app/checkpoints/00000000000000000000000000000000'}.
2021-12-11 14:25:26,724 INFO org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Found 1 checkpoints in ZooKeeperStateHandleStore{namespace='statefun_zk_recovery/my-statefun-app/checkpoints/00000000000000000000000000000000'}.
2021-12-11 14:25:26,725 INFO org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Trying to fetch 1 checkpoints from storage.
2021-12-11 14:25:26,725 INFO org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Trying to retrieve checkpoint 2.
2021-12-11 14:25:26,931 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Restoring job 00000000000000000000000000000000 from Checkpoint 2 @ 1639232587220 for 00000000000000000000000000000000 located at hdfs://hdfs-namenode:8020/tmp/statefun_checkpoints/myStatefunApp/00000000000000000000000000000000/chk-2.
2021-12-11 14:25:27,012 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error occurred in the cluster entrypoint.
org.apache.flink.util.FlinkException: JobMaster for job 00000000000000000000000000000000 failed.
at org.apache.flink.runtime.dispatcher.Dispatcher.jobMasterFailed(Dispatcher.java:873) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.dispatcher.Dispatcher.jobManagerRunnerFailed(Dispatcher.java:459) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.dispatcher.Dispatcher.handleJobManagerRunnerResult(Dispatcher.java:436) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$runJob$3(Dispatcher.java:415) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at java.util.concurrent.CompletableFuture.uniHandle(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture$UniHandle.tryFire(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture$Completion.run(Unknown Source) ~[?:?]
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.12-1.13.2.jar:1.13.2]
at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) [flink-dist_2.12-1.13.2.jar:1.13.2]
at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.12-1.13.2.jar:1.13.2]
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.12-1.13.2.jar:1.13.2]
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-dist_2.12-1.13.2.jar:1.13.2]
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.actor.Actor.aroundReceive(Actor.scala:517) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.actor.Actor.aroundReceive$(Actor.scala:515) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.12-1.13.2.jar:1.13.2]
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.12-1.13.2.jar:1.13.2]
Caused by: org.apache.flink.runtime.client.JobInitializationException: Could not start the JobMaster.
at org.apache.flink.runtime.jobmaster.DefaultJobMasterServiceProcess.lambda$new$0(DefaultJobMasterServiceProcess.java:97) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at java.lang.Thread.run(Unknown Source) ~[?:?]
Caused by: java.util.concurrent.CompletionException: java.lang.IllegalStateException: There is no operator for the state 2edd7b5dafb2c271440b25f6da5f4532
at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at java.lang.Thread.run(Unknown Source) ~[?:?]
Caused by: java.lang.IllegalStateException: There is no operator for the state 2edd7b5dafb2c271440b25f6da5f4532
at org.apache.flink.runtime.checkpoint.StateAssignmentOperation.checkStateMappingCompleteness(StateAssignmentOperation.java:712) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.checkpoint.StateAssignmentOperation.assignStates(StateAssignmentOperation.java:100) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreLatestCheckpointedStateInternal(CheckpointCoordinator.java:1562) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreInitialCheckpointIfPresent(CheckpointCoordinator.java:1476) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.scheduler.DefaultExecutionGraphFactory.createAndRestoreExecutionGraph(DefaultExecutionGraphFactory.java:134) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:342) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:190) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:122) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:132) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:110) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:340) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:317) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:107) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:95) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at java.lang.Thread.run(Unknown Source) ~[?:?]
2021-12-11 14:25:27,017 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Shutting StatefulFunctionsClusterEntryPoint down with application status UNKNOWN. Diagnostics Cluster entrypoint has been closed externally..
2021-12-11 14:25:27,021 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Shutting down rest endpoint.
2021-12-11 14:25:27,025 INFO org.apache.flink.runtime.blob.BlobServer [] - Stopped BLOB server at 0.0.0.0:6124
2021-12-11 14:25:27,034 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Removing cache directory /tmp/flink-web-6c2dafc9-bb7d-489a-9e2d-cf78e3f19b67/flink-web-ui
2021-12-11 14:25:27,035 INFO org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Stopping DefaultLeaderElectionService.
2021-12-11 14:25:27,035 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver [] - Closing ZooKeeperLeaderElectionDriver{leaderPath='/leader/rest_server_lock'}
2021-12-11 14:25:27,036 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Shut down complete.
2021-12-11 14:25:27,036 INFO org.apache.flink.runtime.entrypoint.component.DispatcherResourceManagerComponent [] - Closing components.
2021-12-11 14:25:27,037 INFO org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - Stopping DefaultLeaderRetrievalService.
2021-12-11 14:25:27,037 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver [] - Closing ZookeeperLeaderRetrievalDriver{retrievalPath='/leader/dispatcher_lock'}.
2021-12-11 14:25:27,037 INFO org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - Stopping DefaultLeaderRetrievalService.
2021-12-11 14:25:27,037 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver [] - Closing ZookeeperLeaderRetrievalDriver{retrievalPath='/leader/resource_manager_lock'}.
2021-12-11 14:25:27,038 INFO org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Stopping DefaultLeaderElectionService.
2021-12-11 14:25:27,038 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver [] - Closing ZooKeeperLeaderElectionDriver{leaderPath='/leader/dispatcher_lock'}
2021-12-11 14:25:27,039 INFO org.apache.flink.runtime.dispatcher.runner.JobDispatcherLeaderProcess [] - Stopping JobDispatcherLeaderProcess.
2021-12-11 14:25:27,040 INFO org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager [] - Closing the slot manager.
2021-12-11 14:25:27,040 INFO org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager [] - Suspending the slot manager.
2021-12-11 14:25:27,041 INFO org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Stopping DefaultLeaderElectionService.
2021-12-11 14:25:27,041 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver [] - Closing ZooKeeperLeaderElectionDriver{leaderPath='/leader/resource_manager_lock'}
I believe this is caused by not being able to specify IDs for Flink operators (as told here) when using StateFun. It was working fine in the beginning: the operators were assigned some random IDs and checkpointing went just fine. After the restart, the operators are assigned different random IDs, so when the jobmanager (the StateFun master in this case) tries to load the state "2edd7b5dafb2c271440b25f6da5f4532", it fails to find the operator it was originally assigned to.
Can someone confirm whether my reasoning is correct and/or give me directions for making my StateFun app work with high availability?
Another interesting thing to note: after several restarts of the jobmanager pod with the above exception, it sometimes gets past the "Restoring job 00000000000000000000000000000000 from Checkpoint ..." line somehow (?), logging "No master state to restore" (link), which leaves me unsure whether it really recovered or simply started over, discarding the state from the last successful checkpoint. What might be causing this? Is it really recovering from the checkpoint successfully?
2- For Kubernetes deployments, the StateFun deployment documentation (link) uses the Deployment resource type for the jobmanager component. On the other hand, the Flink deployment documentation (Standalone / Kubernetes section) (link) uses the Job type for the jobmanager in a highly available setup (the jobmanager-application-ha.yaml file).
Basically, since Kubernetes will restart the pod on failures, either Job or Deployment can be used. The thing is, when we try to stop the job with a savepoint and the Deployment type is used, Kubernetes restarts the pod regardless of successful savepoint creation and a success exit status (0).
Are we not supposed to stop StateFun apps with a savepoint when running on Kubernetes? I am aware of the related bug (link), but although cancel-with-savepoint seems to be deprecated, I can still perform one. Are we supposed to just delete the Deployment, as told in the "High availability data clean up" section (link)?
UPDATE for the first question: I turned on debug logging and captured a session with the exception followed by a successful startup. The following is from the unsuccessful one:
...
2021-12-11 21:55:14,001 DEBUG org.apache.flink.streaming.api.graph.StreamGraphHasherV2 [] - Generated hash '32d5ca33c915e65563a5c7f4d62703ad' for node 'router (my-ingress-1-in)-5' {id: 5, parallelism: 1, user function: }
2021-12-11 21:55:14,001 DEBUG org.apache.flink.streaming.api.graph.StreamGraphHasherV2 [] - Generated hash '33b86fe798648d648b237ddfc986200d' for node 'router (my-ingress-2-in)-4' {id: 4, parallelism: 1, user function: }
2021-12-11 21:55:14,001 DEBUG org.apache.flink.streaming.api.graph.StreamGraphHasherV2 [] - Generated hash 'bd4c3fa1570bbcf606f2dabddd61ed7f' for node 'router (my-ingress-3-in)-6' {id: 6, parallelism: 1, user function: }
and this is from the successful one:
2021-12-11 21:55:34,543 DEBUG org.apache.flink.streaming.api.graph.StreamGraphHasherV2 [] - Generated hash 'a1448ecf31ac98d2215c38bfd119abe0' for node 'router (my-ingress-3-in)-5' {id: 5, parallelism: 1, user function: }
2021-12-11 21:55:34,543 DEBUG org.apache.flink.streaming.api.graph.StreamGraphHasherV2 [] - Generated hash '05037ff96baea131d9cf1390846efd98' for node 'router (my-ingress-1-in)-4' {id: 4, parallelism: 1, user function: }
2021-12-11 21:55:34,543 DEBUG org.apache.flink.streaming.api.graph.StreamGraphHasherV2 [] - Generated hash '2edd7b5dafb2c271440b25f6da5f4532' for node 'router (my-ingress-2-in)-6' {id: 6, parallelism: 1, user function: }
It seems that the hashes are generated differently between the two runs.
ANSWER
Answered 2021-Dec-15 at 16:51
In StateFun <= 3.2, routers do not have manually specified UIDs. While Flink's internal UID generation is deterministic, the way StateFun generates the underlying stream graph may not be in some cases. This is a bug. I've opened a PR to fix this in a backwards-compatible way[1].
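For context, explicit UIDs are the general Flink mechanism at play here. A minimal DataStream sketch of pinning state to a stable operator ID (not something StateFun <= 3.2 exposes for its routers) would look roughly like this:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Sketch of explicit operator UIDs in the plain DataStream API. With a
// stable uid(), checkpointed state can be matched back to its operator
// on restore even if the surrounding job graph changes.
public class UidExample
{
    public static void main(String[] args) throws Exception
    {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "b", "c")
           .map(String::toUpperCase)
           .uid("upper-map")   // stable ID used to match state on restore
           .name("upper map")  // display name only; not used for state matching
           .print();

        env.execute("uid-example");
    }
}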
QUESTION
I am using Flink v1.13.2.
Many of the process functions use registerProcessingTimeTimer to clear state:
public class ProcessA ...
{
    @Override
    public void processElement(Object value, Context ctx, Collector<...> out) throws Exception
    {
        if (...)
        {
            ctx.timerService().registerProcessingTimeTimer(value.getTimestampMs() + 23232);
        }
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector out)
    {
        state.clear();
    }
}
And many of the process functions use StateTtlConfig:
public class ProcessB extends ...
{
    @Override
    public void open(Configuration parameters)
    {
        StateTtlConfig ttlConfig = StateTtlConfig
            .newBuilder(Time.minutes(15))
            .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
            .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
            .build();
        ValueStateDescriptor descriptor = ...
        descriptor.enableTimeToLive(ttlConfig);
    }

    @Override
    public void processElement(...) throws Exception
    {
    }
}
And I am using RocksDB for state management.
Questions:
- Where are the timers created by timerService stored? (In RocksDB or in task memory?)
- Where is the state time-to-live created by the StateTtlConfig stored?
- Is anything saved into memory when I use timerService or the state TTL?
- If I have millions of keys, which approach should I prefer?
- Can creating millions of keys lead to an out-of-memory exception when I use timerService?
- Can creating millions of keys lead to an out-of-memory exception when I use the state TTL?
ANSWER
Answered 2021-Dec-03 at 12:46
Where will the timers created by timerService be stored? (In RocksDB or in task memory?)
By default, in RocksDB. You also have the option of keeping your timers on the heap, but unless they are few in number this is a bad idea, because checkpointing heap-based timers blocks the main stream-processing thread, and they add stress to the garbage collector.
Where will the state time-to-live created by the StateTtlConfig be stored?
This adds a long (the last-modification timestamp) to each item of state, stored in the state backend, so in RocksDB.
Is anything saved into memory when I use timerService or the state TTL?
Not if you are using RocksDB for both state and timers.
If I have millions of keys, which approach should I prefer?
Keep your timers in RocksDB.
Can creating millions of keys lead to an out-of-memory exception when I use timerService or the state TTL?
It is always possible to have out-of-memory exceptions with RocksDB, irrespective of what you are storing in it; the native library is not always well behaved about living within the memory it has been allocated. But it shouldn't grow in an unbounded way, and these choices you make about timers and state TTL shouldn't make any difference.
Improvements were made in Flink 1.14 (by upgrading to a newer version of RocksDB), but some problems are still being seen. In the worst case you might need to set the actual process memory limit in the OS to something larger than what you tell Flink it can use.
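To make the "keep your timers in RocksDB" recommendation concrete, a sketch against the Flink 1.13-era API might look like the following (ROCKSDB is already the default priority-queue type for the RocksDB backend, so this mostly spells out the knob):

import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Sketch: keep both keyed state and timers in RocksDB.
public class RocksDbTimers
{
    public static void main(String[] args) throws Exception
    {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        EmbeddedRocksDBStateBackend backend = new EmbeddedRocksDBStateBackend(true); // incremental checkpoints
        // ROCKSDB stores timers in the state backend; HEAP would keep them on the JVM heap.
        backend.setPriorityQueueStateType(EmbeddedRocksDBStateBackend.PriorityQueueStateType.ROCKSDB);
        env.setStateBackend(backend);

        // The equivalent flink-conf.yaml setting:
        //   state.backend.rocksdb.timer-service.factory: ROCKSDB
    }
}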
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install rocksdb
Support