rocksdb | Optimize rocksdb compaction
kandi X-RAY | rocksdb Summary
This project aims to optimize RocksDB's storage engine for almost-sequential log data (ASLog). It consists of two main components: one optimizes RocksDB's compaction for ASLog, and the other optimizes a cleaning operation for data that carries a timestamp. You can follow the progress here (most of the documents are written in Korean).
Trending Discussions on rocksdb
QUESTION
I have read about EmbeddedRocksDBStateBackend in Flink 1.13, but it has size limitations, so I want to keep the configuration from my previous Flink version, 1.11. The problem is that the old way of configuring RocksDB (new RocksDBStateBackend("path", true);) is deprecated. I have tried the new configuration using EmbeddedRocksDBStateBackend (new EmbeddedRocksDBStateBackend(true)) and I get this error:
ANSWER
Answered 2021-Jun-04 at 07:09
In Flink 1.13 we reorganized the state backends because the old way had resulted in many misunderstandings about how things work. So these two concerns were decoupled:
- Where your working state is stored (the state backend). (In the case of RocksDB, it should be configured to use the fastest available local disk.)
- Where checkpoints are stored (the checkpoint storage). In most cases, this should be a distributed filesystem.
With the old API, the fact that two different filesystems are involved in the case of RocksDB was obscured by the way the checkpointing path was passed to the RocksDBStateBackend constructor. So that bit of configuration has been moved elsewhere (see below).
This table shows the relationships between the legacy state backends and the new ones (in combination with checkpoint storage):
Legacy State Backend  →  New State Backend + Checkpoint Storage
MemoryStateBackend    →  HashMapStateBackend + JobManagerCheckpointStorage
FsStateBackend        →  HashMapStateBackend + FileSystemCheckpointStorage
RocksDBStateBackend   →  EmbeddedRocksDBStateBackend + FileSystemCheckpointStorage
In your case you want to use the EmbeddedRocksDBStateBackend with FileSystemCheckpointStorage. The problem you are currently having is that you are using in-memory checkpoint storage (JobManagerCheckpointStorage) with RocksDB, which severely limits how much state can be checkpointed. You can fix this by specifying a checkpoint directory in flink-conf.yaml.
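A minimal flink-conf.yaml sketch of that fix might look like the following (the checkpoint directory URI is a placeholder assumption; any durable filesystem reachable from all nodes works):

```yaml
# Keep working state in RocksDB on fast local disk
state.backend: rocksdb
# Store checkpoints in a distributed filesystem (placeholder path)
state.checkpoints.dir: hdfs:///flink/checkpoints
```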
QUESTION
Getting this error while building docker images on Mac OS BigSur with M1 chip.
What I've tried: installed Docker for Apple Silicon (M1) from the Docker site.
It fails while trying to install RocksDB from Docker
...ANSWER
Answered 2021-May-31 at 17:35There are a couple of issues to address. The dockerfile as you have it will download a base golang ARM image, and try to use that to build. That's fine, as long as the required libs "know how" to build with an arm architecture. If they don't know how to build under arm (as seems to be the case here), you may want to try building under an AMD image of golang.
Intel / AMD containers will run under ARM docker on an M1. There are a few ways to build AMD containers on an M1. You can use buildkit, and then:
docker buildx build --platform linux/amd64 .
or, you can add the arch to the source image by modifying the Dockerfile to include something like:
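For instance, a sketch of the build stage pinned to an AMD64 base image (the golang version tag here is an assumption, not taken from the original Dockerfile):

```dockerfile
# Force the build stage to use an amd64 image even on an M1 host
FROM --platform=linux/amd64 golang:1.16-alpine AS build
```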
QUESTION
I'm querying whether a key exists in a RocksDB database using Python. The API (see bottom) implies it returns a two-element tuple. So I receive both tuple elements:
...ANSWER
Answered 2021-May-27 at 15:51
Your code
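The point of the (truncated) answer is unpacking the two-element tuple that key_may_exist returns. The snippet below is a pure-Python sketch: FakeDB is a hypothetical stand-in for a python-rocksdb handle, so the unpacking pattern can be shown without a live database.

```python
class FakeDB:
    """Hypothetical stand-in for a python-rocksdb DB handle."""

    def __init__(self, data):
        self._data = dict(data)

    def key_may_exist(self, key, fetch=False):
        # Mimics the (may_exist, value) two-element tuple of the real API;
        # the value is only populated when fetch=True and the key is present.
        if key in self._data:
            return (True, self._data[key] if fetch else None)
        return (False, None)


db = FakeDB({b"apple": b"red"})

# Receive both tuple elements at once
exists, value = db.key_may_exist(b"apple", fetch=True)
missing, _ = db.key_may_exist(b"pear")
```

With the real library the call site looks the same: bind both elements, then branch on the boolean before trusting the value.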
QUESTION
ANSWER
Answered 2021-May-18 at 17:05
There was an issue opened: https://issues.apache.org/jira/browse/FLINK-21028. The version I use (1.11.2) has this problem.
QUESTION
We have a kstreams app doing a kstream-kstable inner join. Both topics are high volume, with 256 partitions each. The kstreams app is deployed on 8 nodes with 8 GB heap each right now. We see that the heap memory keeps growing constantly and eventually an OOM happens. I am not able to get a heap dump, as the app runs in a container that gets killed when that happens. But I have tried a few things to gain confidence that it is related to the state stores / KTable-related stuff. Without the RocksDBConfigSetter below, the memory gets used up pretty quickly; with it, the growth is slowed to some extent. I need some guidance on how to proceed, thanks.
I added the 3 properties below:
...ANSWER
Answered 2021-May-17 at 09:30
You could try to limit the memory usage of RocksDB across all RocksDB instances on one node. To do so you must configure RocksDB to cache the index and filter blocks in the block cache, limit the memtable memory through a shared WriteBufferManager and count its memory against the block cache, and then pass the same Cache object to each instance. You can find more details and a sample configuration under
https://kafka.apache.org/28/documentation/streams/developer-guide/memory-mgmt.html#rocksdb
With such a setup you can specify a soft upper bound for the total off-heap memory used by all RocksDB state stores on one single instance (TOTAL_OFF_HEAP_MEMORY in the sample configuration) and then specify how much of that memory is used for writing to and reading from the state stores on one single node (TOTAL_MEMTABLE_MEMORY and INDEX_FILTER_BLOCK_RATIO in the sample configuration, respectively).
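To make the relationship between those settings concrete, here is a small back-of-the-envelope sketch. The constant names follow the Kafka Streams docs sample; the byte values are arbitrary assumptions, and the arithmetic is only an illustration of the accounting, not RocksDB's exact internal bookkeeping.

```python
MiB = 1024 * 1024

# One shared block cache for all RocksDB instances on the node
TOTAL_OFF_HEAP_MEMORY = 512 * MiB
# Memtable (write buffer) memory, counted against that same cache
TOTAL_MEMTABLE_MEMORY = 128 * MiB
# Fraction of the cache reserved for index and filter blocks
INDEX_FILTER_BLOCK_RATIO = 0.2

# Because memtables and index/filter blocks are charged to the block cache,
# the memory left over for caching data blocks is roughly:
data_block_budget = (TOTAL_OFF_HEAP_MEMORY
                     - TOTAL_MEMTABLE_MEMORY
                     - TOTAL_OFF_HEAP_MEMORY * INDEX_FILTER_BLOCK_RATIO)
```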
Since all values are app and workload specific you need to experiment with them and monitor the RocksDB state stores with the metrics provided by Kafka Streams.
Guidance how to handle RocksDB issues in Kafka Streams can be found under:
https://www.confluent.io/blog/how-to-tune-rocksdb-kafka-streams-state-stores-performance/
Especially for your case, the following section might be interesting:
QUESTION
I'm reading 2 Kafka topics in the same Flink job.
Stream1: messages coming from the first topic are saved to RocksDB, then unioned with Stream2.
Stream2: messages coming from the second topic are enriched with the state saved by Stream1, then unioned with Stream1.
Topic1 and Topic2 are different sources, but the output is basically the same for both. I just have to enrich the data coming from Topic2 with the data coming from Topic1.
Here is the flow:
...ANSWER
Answered 2021-Apr-17 at 17:47
It seems you should be able to achieve exactly what you want by using a KeyedCoProcessFunction. It would look more or less like this:
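As a pure-Python sketch of the pattern (this is not Flink code; EnrichJoin and its methods only mimic the shape of KeyedCoProcessFunction's processElement1/processElement2 callbacks, with a plain dict standing in for Flink keyed state):

```python
class EnrichJoin:
    """Toy stand-in for a KeyedCoProcessFunction-style enrichment join."""

    def __init__(self):
        self.state = {}  # per-key state; in Flink this would be a ValueState

    def process_element1(self, key, value):
        # Stream1: save the reference data for this key, then emit it as-is
        self.state[key] = value
        return value

    def process_element2(self, key, value):
        # Stream2: enrich the incoming event with whatever Stream1 saved
        return (value, self.state.get(key))


join = EnrichJoin()
join.process_element1("k1", {"rate": 1.2})
enriched = join.process_element2("k1", {"amount": 100})
```

In real Flink the two streams would be keyed on the same field and connected, and the state access would go through the keyed-state API rather than a dict.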
QUESTION
I want to try "Runtime Upgrade" and "Storage Migration". I followed the steps below, but the on_runtime_upgrade function did not work: I am unable to verify that on_runtime_upgrade is called, and the debug log I implemented is not output.
- Download substrate-node-template 3.0.
- Compile it.
- Run the node with the following command: target/release/node-template --dev -l runtime=debug
- Implement on_runtime_upgrade in pallet-template. The program I tried, substrate-node-template/pallets/template/lib.rs, is listed below.
- Compile it with the cargo build --release -p node-template-runtime command.
- Upload node_template_runtime.compact.wasm using the sudo and setCode calls.
- I checked the execution log of the node, but I couldn't find the log set in on_runtime_upgrade.
--lib.rs--
...ANSWER
Answered 2021-Apr-13 at 11:38
This code works, but debug::info! does not produce output. The value of Something would be 32.
QUESTION
I have an architecture question regarding the union of more than two streams in Apache Flink.
We have three, and sometimes more, streams that are a kind of code book with which we have to enrich the main stream. The code book streams are compacted Kafka topics. Code books are things that don't change often, e.g. currencies. The main stream is a fast event stream. Our goal is to enrich the main stream with the code books.
There are three possible ways to do it, as I see it:
- Make a union of all code books and then join it with the main stream, storing the enrichment data as managed, keyed state (so when compacted events from Kafka expire, I still have the code books saved in state). This is the only way I have tried so far. I deserialized the Kafka topic messages, which are in JSON, into POJOs, e.g. Currency, OrganizationUnit, and so on. I made one big wrapper class CodebookData with all the code books, e.g.:
ANSWER
Answered 2021-Apr-06 at 13:58
In many cases where you need to do several independent enrichment joins like this, a better pattern to follow is to use a fan-in / fan-out approach, and perform all of the joins in parallel.
Something like this, where after making sure each event on the main stream has a unique ID, you create 3 or more copies of each event:
Then you can key each copy by whatever is appropriate -- the currency, the organization unit, and so on (or customer, IP address, and merchant in the example I took this figure from) -- then connect it to the appropriate codebook stream, and compute each of the 2-way joins independently.
Then union together these parallel join result streams, keyBy the random nonce you added to each of the original events, and glue the results together.
Now in the case of three streams, this may be overly complex. In that case I might just do a series of three 2-way joins, one after another, using keyBy and connect each time. But at some point, as they get longer, pipelines built that way tend to run into performance / checkpointing problems.
There's an example implementing this fan-in/fan-out pattern in https://gist.github.com/alpinegizmo/5d5f24397a6db7d8fabc1b12a15eeca6.
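The fan-out / fan-in bookkeeping described above can be sketched in plain Python (no Flink; the field names _nonce and _join_key are invented for illustration):

```python
import uuid


def fan_out(event, join_keys=("currency", "org_unit")):
    """Make one copy of the event per codebook join, tagged with a unique nonce."""
    nonce = uuid.uuid4().hex
    return [dict(event, _nonce=nonce, _join_key=k) for k in join_keys]


def fan_in(partial_results):
    """Glue the parallel 2-way join results back together, keyed by the nonce."""
    merged = {}
    for r in partial_results:
        merged.setdefault(r["_nonce"], {}).update(r)
    return list(merged.values())


copies = fan_out({"id": 1, "currency": "EUR", "org_unit": "sales"})
# ...each copy would be keyed and joined with its codebook independently...
glued = fan_in(copies)
```

In Flink, fan_out corresponds to emitting the copies downstream, and fan_in to the union followed by keyBy on the nonce.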
QUESTION
I am experimenting with my new Flink cluster (3 different machines: 1 job manager, 2 task managers) using RocksDB as the state backend, but the checkpointing behaviour I am seeing is a little confusing.
More specifically, I have designed a simple WordCount example and my data source is netcat. When I submit my job, the job manager assigns it to a random task manager (there is no replication either). I provide some words and then I kill the currently running task manager. After a while, the job restarts on the other task manager and I can provide some new words. The confusing part is that the state from the first task manager is preserved even though I killed it.
To my understanding, RocksDB maintains its state in a local directory of the running task manager, so I expected that when the first task manager was killed, the entire state would be lost and word counting would start from the beginning. So does Flink somehow maintain the state in memory, or broadcast it through the JobManager?
Am I missing something?
...ANSWER
Answered 2021-Apr-02 at 08:05
The RocksDB state backend does keep its working state on each task manager's local disk, while checkpoints are normally stored in a distributed filesystem.
If you have checkpointing enabled, then the spare task manager is able to recover the state from the latest checkpoint and resume processing.
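A flink-conf.yaml sketch of such a setup (the interval and directory values are placeholder assumptions):

```yaml
state.backend: rocksdb
# Checkpoints go to a distributed filesystem, so any task manager can recover them
state.checkpoints.dir: hdfs:///flink/checkpoints
# Enable periodic checkpointing
execution.checkpointing.interval: 10s
```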
QUESTION
I'm trying to load a .csv file into my RocksDB database, but it fails and shows me this error:
Got error 10 'Operation aborted:Failed to acquire lock due to rocksdb_max_row_locks limit' from ROCKSDB
I've tried SET SESSION rocksdb_max_row_locks=1073741824; but I always get the same error. Can anyone help me?
...ANSWER
Answered 2021-Mar-27 at 13:07
This should do the trick (run it before starting the insert):
SET session rocksdb_bulk_load=1;
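Put together, a sketch of the full load sequence might look like this (the table and file names are placeholders; the answer only confirms the rocksdb_bulk_load setting itself):

```sql
-- Bypass row-lock accounting while bulk loading into a MyRocks table
SET SESSION rocksdb_bulk_load=1;
LOAD DATA INFILE '/tmp/data.csv' INTO TABLE my_table
  FIELDS TERMINATED BY ',';
-- Turn bulk load off again so normal transactional checks resume
SET SESSION rocksdb_bulk_load=0;
```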
Community Discussions, Code Snippets contain sources that include Stack Exchange Network