rdedup | Data deduplication engine, supporting optional compression | Continuous Backup library
kandi X-RAY | rdedup Summary
rdedup is a data deduplication engine and backup software. See the current project status and original use case description wiki pages. rdedup is generally similar to existing software like duplicacy, restic, attic, duplicity, zbackup, etc., with a skew towards asymmetric encryption and a synchronization-friendly data model. Thanks to Rust and a solid architecture, rdedup is also extremely performant and very reliable (no data-loss bugs ever reported). rdedup is written in Rust and provides both a command-line tool and a library API (rdedup-lib). The library can be used to embed the core engine into other applications, or to build custom frontends and tools.
Community Discussions
Trending Discussions on rdedup
QUESTION
I have a MyReader that implements Iterator and produces Buffers, where Buffer : Send. MyReader produces a lot of Buffers very quickly, but I have a CPU-intensive job to perform on each Buffer (.map(|buf| ...)) that is my bottleneck, after which I gather the results (ordered). I want to parallelize the CPU-intensive work, hopefully across N threads that would use work stealing to perform it as fast as the number of cores allows.
Edit: To be more precise. I am working on rdedup. MyStruct is Chunker, which reads io::Read (typically stdio), finds parts (chunks) of the data, and yields them. Then map() is supposed, for each chunk, to calculate its sha256 digest, compress, encrypt, save, and return the digest as the result of map(...). The digest of the saved data is used to build an index of the data. The order in which chunks are processed by map(...) does not matter, but the digest returned from each map(...) needs to be collected in the same order that the chunks were found. The actual save-to-file step is offloaded to yet another thread (the writer thread). See the actual code of the PR in question.
I hoped I could use rayon for this, but rayon expects an iterator that is already parallelizable, e.g. a Vec<...> or something like that. I have found no way to get a par_iter from MyReader; my reader is very single-threaded in nature.

There is simple_parallel, but its documentation says it's not recommended for general use. And I want to make sure everything will just work.

I could just take an spmc queue implementation and a custom thread_pool, but I was hoping for an existing solution that is optimized and tested.

There's also pipeliner, but it doesn't support an ordered map yet.
ANSWER
Answered 2017-Feb-27 at 09:38

In general, preserving order is a pretty tough requirement as far as parallelization goes.
You could try to hand-make it with a typical fan-out/fan-in setup:
- a single producer which tags inputs with a sequential monotonically increasing ID,
- a thread pool which consumes from this producer and then sends the result toward the final consumer,
- a consumer who buffers and reorders results so as to treat them in sequential order.
Or you could raise the level of abstraction. Of specific interest here: Future.

A Future represents the result of a computation which may or may not have happened yet. A consumer receiving an ordered list of Futures can simply wait on each one, and let buffering occur naturally in the queue.
For bonus points, if you use a fixed-size queue, you automatically get back-pressure on the producer.
And therefore I would recommend building something on top of CpuPool.
The setup is going to be:
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported