rocksdb | Optimize rocksdb compaction
kandi X-RAY | rocksdb Summary
This project aims to optimize RocksDB's storage engine for almost-sequential log data (ASLog). It consists of two main components: one optimizes RocksDB's compaction for ASLog, and the other optimizes a cleaning operation for data that carries a timestamp. You can follow the progress here (most of the documents are written in Korean).
Trending Discussions on rocksdb
QUESTION
I have read about EmbeddedRocksDBStateBackend in Flink 1.13, but it has size limitations, so I want to keep the configuration from my previous Flink version, 1.11. The problem is that the old way of configuring RocksDB (new RocksDBStateBackend("path", true);) is deprecated. I have tried the new configuration using EmbeddedRocksDBStateBackend (new EmbeddedRocksDBStateBackend(true)) and I get this error:
ANSWER
Answered 2021-Jun-04 at 07:09
In Flink 1.13 we reorganized the state backends because the old way had resulted in many misunderstandings about how things work. So these two concerns were decoupled:
- Where your working state is stored (the state backend). (In the case of RocksDB, it should be configured to use the fastest available local disk.)
- Where checkpoints are stored (the checkpoint storage). In most cases, this should be a distributed filesystem.
With the old API, the fact that two different filesystems are involved in the case of RocksDB was obscured by the way the checkpointing path was passed to the RocksDBStateBackend constructor. So that bit of configuration has been moved elsewhere (see below).
This table shows the relationships between the legacy state backends and the new ones (in combination with checkpoint storage):
Legacy State Backend  →  New State Backend + Checkpoint Storage
MemoryStateBackend    →  HashMapStateBackend + JobManagerCheckpointStorage
FsStateBackend        →  HashMapStateBackend + FileSystemCheckpointStorage
RocksDBStateBackend   →  EmbeddedRocksDBStateBackend + FileSystemCheckpointStorage
In your case you want to use the EmbeddedRocksDBStateBackend with FileSystemCheckpointStorage. The problem you are currently having is that you are using in-memory checkpoint storage (JobManagerCheckpointStorage) with RocksDB, which severely limits how much state can be checkpointed. You can fix this by specifying a checkpoint directory in flink-conf.yaml.
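A minimal flink-conf.yaml sketch of that fix might look like the following (the checkpoint directory URI is a placeholder assumption; any durable filesystem reachable from all nodes works):

```yaml
# Keep working state in RocksDB on fast local disk
state.backend: rocksdb
# Store checkpoints in a distributed filesystem (placeholder path)
state.checkpoints.dir: hdfs:///flink/checkpoints
```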
QUESTION
Getting this error while building docker images on Mac OS BigSur with M1 chip.
What I've tried: installed Docker for Apple Silicon (M1) from the Docker site.
It fails while trying to install RocksDB from Docker
...ANSWER
Answered 2021-May-31 at 17:35There are a couple of issues to address. The dockerfile as you have it will download a base golang ARM image, and try to use that to build. That's fine, as long as the required libs "know how" to build with an arm architecture. If they don't know how to build under arm (as seems to be the case here), you may want to try building under an AMD image of golang.
Intel / AMD containers will run under ARM docker on an M1. There are a few ways to build AMD containers on an M1. You can use buildkit, and then:
docker buildx build --platform linux/amd64 .
or, you can add the arch to the source image by modifying the Dockerfile to include something like:
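For instance, a sketch of the build stage pinned to an AMD64 base image (the golang version tag here is an assumption, not taken from the original Dockerfile):

```dockerfile
# Force the build stage to use an amd64 image even on an M1 host
FROM --platform=linux/amd64 golang:1.16-alpine AS build
```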
QUESTION
I'm querying whether a key exists in a RocksDB database using Python. The API (see bottom) implies it returns a two-element tuple. So I receive both tuple elements:
...ANSWER
Answered 2021-May-27 at 15:51
Your code
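The point of the (truncated) answer is unpacking the two-element tuple that key_may_exist returns. The snippet below is a pure-Python sketch: FakeDB is a hypothetical stand-in for a python-rocksdb handle, so the unpacking pattern can be shown without a live database.

```python
class FakeDB:
    """Hypothetical stand-in for a python-rocksdb DB handle."""

    def __init__(self, data):
        self._data = dict(data)

    def key_may_exist(self, key, fetch=False):
        # Mimics the (may_exist, value) two-element tuple of the real API;
        # the value is only populated when fetch=True and the key is present.
        if key in self._data:
            return (True, self._data[key] if fetch else None)
        return (False, None)


db = FakeDB({b"apple": b"red"})

# Receive both tuple elements at once
exists, value = db.key_may_exist(b"apple", fetch=True)
missing, _ = db.key_may_exist(b"pear")
```

With the real library the call site looks the same: bind both elements, then branch on the boolean before trusting the value.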
QUESTION
ANSWER
Answered 2021-May-18 at 17:05
There was an issue opened: https://issues.apache.org/jira/browse/FLINK-21028. The version I use (1.11.2) has this problem.
QUESTION
We have a kstreams app doing a kstream-kstable inner join. Both topics are high volume, with 256 partitions each. The kstreams app is deployed on 8 nodes with 8 GB heap each right now. We see that the heap memory keeps growing constantly and eventually an OOM happens. I am not able to get a heap dump, as the app runs in a container that gets killed when that happens. But I have tried a few things to gain confidence that it is related to the state stores / KTable-related stuff. Without the RocksDBConfigSetter below, the memory gets used up pretty quickly; with it, the growth is slowed to some extent. I need some guidance on how to proceed, thanks.
I added the 3 properties below:
...ANSWER
Answered 2021-May-17 at 09:30
You could try to limit the memory usage of RocksDB across all RocksDB instances on one node. To do so you must configure RocksDB to cache the index and filter blocks in the block cache, limit the memtable memory through a shared WriteBufferManager and count its memory against the block cache, and then pass the same Cache object to each instance. You can find more details and a sample configuration under
https://kafka.apache.org/28/documentation/streams/developer-guide/memory-mgmt.html#rocksdb
With such a setup you can specify a soft upper bound for the total off-heap memory used by all RocksDB state stores on one single instance (TOTAL_OFF_HEAP_MEMORY in the sample configuration) and then specify how much of that memory is used for writing to and reading from the state stores on one single node (TOTAL_MEMTABLE_MEMORY and INDEX_FILTER_BLOCK_RATIO in the sample configuration, respectively).
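To make the relationship between those settings concrete, here is a small back-of-the-envelope sketch. The constant names follow the Kafka Streams docs sample; the byte values are arbitrary assumptions, and the arithmetic is only an illustration of the accounting, not RocksDB's exact internal bookkeeping.

```python
MiB = 1024 * 1024

# One shared block cache for all RocksDB instances on the node
TOTAL_OFF_HEAP_MEMORY = 512 * MiB
# Memtable (write buffer) memory, counted against that same cache
TOTAL_MEMTABLE_MEMORY = 128 * MiB
# Fraction of the cache reserved for index and filter blocks
INDEX_FILTER_BLOCK_RATIO = 0.2

# Because memtables and index/filter blocks are charged to the block cache,
# the memory left over for caching data blocks is roughly:
data_block_budget = (TOTAL_OFF_HEAP_MEMORY
                     - TOTAL_MEMTABLE_MEMORY
                     - TOTAL_OFF_HEAP_MEMORY * INDEX_FILTER_BLOCK_RATIO)
```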
Since all values are app and workload specific you need to experiment with them and monitor the RocksDB state stores with the metrics provided by Kafka Streams.
Guidance how to handle RocksDB issues in Kafka Streams can be found under:
https://www.confluent.io/blog/how-to-tune-rocksdb-kafka-streams-state-stores-performance/
Especially for your case, the following section might be interesting:
QUESTION
I'm reading 2 Kafka topics in the same Flink job.
Stream1: messages coming from the first topic are saved to RocksDB, then unioned with Stream2.
Stream2: messages coming from the second topic are enriched with the state saved by Stream1, then unioned with Stream1.
Topic1 and Topic2 are different sources, but the output is basically the same for both. I just have to enrich the data coming from Topic2 with the data coming from Topic1.
Here is the flow:
...ANSWER
Answered 2021-Apr-17 at 17:47
It seems you should be able to achieve exactly what you want by using a KeyedCoProcessFunction. It would look more or less like this:
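As a pure-Python sketch of the pattern (this is not Flink code; EnrichJoin and its methods only mimic the shape of KeyedCoProcessFunction's processElement1/processElement2 callbacks, with a plain dict standing in for Flink keyed state):

```python
class EnrichJoin:
    """Toy stand-in for a KeyedCoProcessFunction-style enrichment join."""

    def __init__(self):
        self.state = {}  # per-key state; in Flink this would be a ValueState

    def process_element1(self, key, value):
        # Stream1: save the reference data for this key, then emit it as-is
        self.state[key] = value
        return value

    def process_element2(self, key, value):
        # Stream2: enrich the incoming event with whatever Stream1 saved
        return (value, self.state.get(key))


join = EnrichJoin()
join.process_element1("k1", {"rate": 1.2})
enriched = join.process_element2("k1", {"amount": 100})
```

In real Flink the two streams would be keyed on the same field and connected, and the state access would go through the keyed-state API rather than a dict.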
QUESTION
I want to try "Runtime Upgrade" and "Storage Migration". I followed the steps below, but the on_runtime_upgrade function did not work: I am unable to verify that on_runtime_upgrade is called, and the debug log I implemented is not output.
- Download substrate-node-template 3.0.
- Compile it.
- Run the node with the following command: target/release/node-template --dev -l runtime=debug
- Implement on_runtime_upgrade in pallet-template. The program I tried, substrate-node-template/pallets/template/lib.rs, is listed below.
- Compile it with the cargo build --release -p node-template-runtime command.
- Upload node_template_runtime.compact.wasm using the sudo and setCode calls.
- I checked the execution log of the node, but I couldn't find the log set in on_runtime_upgrade.
--lib.rs--
...ANSWER
Answered 2021-Apr-13 at 11:38
This code works, but debug::info! does not produce output. The value of Something would be 32.
QUESTION
I have an architecture question regarding the union of more than two streams in Apache Flink.
We have three, and sometimes more, streams that are a kind of code book with which we have to enrich the main stream. The code book streams are compacted Kafka topics. Code books are things that don't change often, e.g. currencies. The main stream is a fast event stream. Our goal is to enrich the main stream with the code books.
There are three possible ways to do it, as I see it:
- Make a union of all code books and then join it with the main stream, storing the enrichment data as managed, keyed state (so when compacted events from Kafka expire, I still have the code books saved in state). This is the only way I have tried so far. I deserialized the Kafka topic messages, which are in JSON, into POJOs, e.g. Currency, OrganizationUnit, and so on. I made one big wrapper class CodebookData with all the code books, e.g.:
ANSWER
Answered 2021-Apr-06 at 13:58
In many cases where you need to do several independent enrichment joins like this, a better pattern to follow is to use a fan-in / fan-out approach, and perform all of the joins in parallel.
Something like this, where after making sure each event on the main stream has a unique ID, you create 3 or more copies of each event:
Then you can key each copy by whatever is appropriate -- the currency, the organization unit, and so on (or customer, IP address, and merchant in the example I took this figure from) -- then connect it to the appropriate codebook stream, and compute each of the 2-way joins independently.
Then union together these parallel join result streams, keyBy the random nonce you added to each of the original events, and glue the results together.
Now in the case of three streams, this may be overly complex. In that case I might just do a series of three 2-way joins, one after another, using keyBy and connect each time. But at some point, as they get longer, pipelines built that way tend to run into performance / checkpointing problems.
There's an example implementing this fan-in/fan-out pattern in https://gist.github.com/alpinegizmo/5d5f24397a6db7d8fabc1b12a15eeca6.
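The fan-out / fan-in bookkeeping described above can be sketched in plain Python (no Flink; the field names _nonce and _join_key are invented for illustration):

```python
import uuid


def fan_out(event, join_keys=("currency", "org_unit")):
    """Make one copy of the event per codebook join, tagged with a unique nonce."""
    nonce = uuid.uuid4().hex
    return [dict(event, _nonce=nonce, _join_key=k) for k in join_keys]


def fan_in(partial_results):
    """Glue the parallel 2-way join results back together, keyed by the nonce."""
    merged = {}
    for r in partial_results:
        merged.setdefault(r["_nonce"], {}).update(r)
    return list(merged.values())


copies = fan_out({"id": 1, "currency": "EUR", "org_unit": "sales"})
# ...each copy would be keyed and joined with its codebook independently...
glued = fan_in(copies)
```

In Flink, fan_out corresponds to emitting the copies downstream, and fan_in to the union followed by keyBy on the nonce.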
QUESTION
I am experimenting with my new Flink cluster (3 different machines: 1 job manager, 2 task managers) using RocksDB as the state backend, but the checkpointing behaviour I am seeing is a little confusing.
More specifically, I have designed a simple WordCount example and my data source is netcat. When I submit my job, the job manager assigns it to a random task manager (there is no replication either). I provide some words and then I kill the currently running task manager. After a while, the job restarts on the other task manager and I can provide some new words. The confusing part is that the state from the first task manager is preserved even though I killed it.
To my understanding, RocksDB maintains its state in a local directory of the running task manager, so I expected that when the first task manager was killed, the entire state would be lost and word counting would start from the beginning. So does Flink somehow maintain the state in memory, or broadcast it through the JobManager?
Am I missing something?
...ANSWER
Answered 2021-Apr-02 at 08:05
The RocksDB state backend does keep its working state on each task manager's local disk, while checkpoints are normally stored in a distributed filesystem.
If you have checkpointing enabled, then the spare task manager is able to recover the state from the latest checkpoint and resume processing.
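A flink-conf.yaml sketch of such a setup (the interval and directory values are placeholder assumptions):

```yaml
state.backend: rocksdb
# Checkpoints go to a distributed filesystem, so any task manager can recover them
state.checkpoints.dir: hdfs:///flink/checkpoints
# Enable periodic checkpointing
execution.checkpointing.interval: 10s
```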
QUESTION
I'm trying to load a .csv file into my RocksDB database, but it fails and shows me this error:
Got error 10 'Operation aborted:Failed to acquire lock due to rocksdb_max_row_locks limit' from ROCKSDB
I've tried SET SESSION rocksdb_max_row_locks=1073741824; but I always get the same error. Can anyone help me?
...ANSWER
Answered 2021-Mar-27 at 13:07
This should do the trick (run it before starting the insert):
SET session rocksdb_bulk_load=1;
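Put together, a sketch of the full load sequence might look like this (the table and file names are placeholders; the answer only confirms the rocksdb_bulk_load setting itself):

```sql
-- Bypass row-lock accounting while bulk loading into a MyRocks table
SET SESSION rocksdb_bulk_load=1;
LOAD DATA INFILE '/tmp/data.csv' INTO TABLE my_table
  FIELDS TERMINATED BY ',';
-- Turn bulk load off again so normal transactional checks resume
SET SESSION rocksdb_bulk_load=0;
```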
Community Discussions, Code Snippets contain sources that include Stack Exchange Network