vcdiff | Heavily optimized .NET Core vcdiff library
kandi X-RAY | vcdiff Summary
This is a hard fork of VCDiff, originally written by Metric, created primarily for use in Snowflake. Large chunks have been rewritten and heavily optimized for speed, using vector intrinsics, the Memory and Span APIs, and a sprinkling of unsafe pointer access to eke out every bit of performance possible. Non-scientific preliminary testing shows a speedup of up to 30x to 50x compared to the original library when diffing a 2 MB file. Support for xdelta3 checksums has also been included. Testing was done with xdelta 3.1; support for xdelta 3.0 patch files has not been tested. Only patch files without external compression (-S none) are supported. Wherever possible, SSE3 or AVX2 extensions are used on supported systems. Speeds are comparable to, albeit slightly slower than, native xdelta3, depending on the chosen block size. A lot of work has gone into optimizing out the overhead of garbage collection and memory access through Memory, as well as parallelizing computational work with SIMD extensions.
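For readers unfamiliar with what a diff/patch encoder of this kind does, the following is a minimal Java sketch of the general idea: the target is described as a sequence of COPY instructions (reuse bytes from the source) and ADD instructions (literal new bytes). It is a toy under stated assumptions, not the VCDIFF wire format and not this library's API; the names `ToyDelta`, `Op`, `encode`, `decode`, and `MIN_MATCH` are hypothetical, and the brute-force match search stands in for the hashing and SIMD techniques a real encoder would use.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Toy delta encoder for illustration only (hypothetical names, not a real library API).
final class ToyDelta {
    // One instruction: COPY `length` bytes from the source at `offset`,
    // or ADD the literal bytes held in `data`.
    static final class Op {
        final boolean copy;
        final int offset;
        final int length;
        final byte[] data;

        Op(boolean copy, int offset, int length, byte[] data) {
            this.copy = copy;
            this.offset = offset;
            this.length = length;
            this.data = data;
        }
    }

    // Matches shorter than this are cheaper to store as literals (assumed threshold).
    static final int MIN_MATCH = 4;

    static List<Op> encode(byte[] source, byte[] target) {
        List<Op> ops = new ArrayList<>();
        ByteArrayOutputStream literal = new ByteArrayOutputStream();
        int t = 0;
        while (t < target.length) {
            int bestOff = -1;
            int bestLen = 0;
            // Brute-force search for the longest source match starting at target[t].
            for (int s = 0; s < source.length; s++) {
                int len = 0;
                while (s + len < source.length && t + len < target.length
                        && source[s + len] == target[t + len]) {
                    len++;
                }
                if (len > bestLen) {
                    bestLen = len;
                    bestOff = s;
                }
            }
            if (bestLen >= MIN_MATCH) {
                // Flush any pending literal bytes before emitting the COPY.
                if (literal.size() > 0) {
                    ops.add(new Op(false, 0, literal.size(), literal.toByteArray()));
                    literal.reset();
                }
                ops.add(new Op(true, bestOff, bestLen, null));
                t += bestLen;
            } else {
                literal.write(target[t++]);
            }
        }
        if (literal.size() > 0) {
            ops.add(new Op(false, 0, literal.size(), literal.toByteArray()));
        }
        return ops;
    }

    // Rebuilds the target from the source plus the instruction list.
    static byte[] decode(byte[] source, List<Op> ops) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Op op : ops) {
            if (op.copy) {
                out.write(source, op.offset, op.length);
            } else {
                out.write(op.data, 0, op.data.length);
            }
        }
        return out.toByteArray();
    }
}
```

When source and target are highly similar, most of the target is covered by a few COPY instructions, which is why the resulting patch can be much smaller than the target itself.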
Community Discussions
Trending Discussions on vcdiff
QUESTION
Using Spark-based big data analysis, I have to compare data sets (text files) that are very similar (>98%) but very large. After doing some research, I found that the most efficient way could be to use delta encoders. With these I can keep a reference text and store the others as delta increments. However, I use Scala, which does not have support for delta encoders, and I am not at all conversant with Java. But as Scala is interoperable with Java, I know that it is possible to get a Java library to work in Scala.
I found the promising implementations to be xdelta, vcdiff-java and bsdiff. With a bit more searching, I found the most interesting library, dez. The link also gives benchmarks in which it seems to perform very well, and the code is free to use and looks lightweight.
At this point, I am stuck on using this library in Scala (via sbt). I would appreciate any suggestions or references to get past this barrier, whether specific to this issue (delta encoders), to the library, or to working with Java APIs within Scala in general. Specifically, my questions are:
Is there a Scala library for delta encoders that I can directly use? (If not)
Is it possible that I place the class files/notzed.dez.jar in the project and let sbt provide the APIs in the Scala code?
I am kind of stuck in this quagmire and any way out would be greatly appreciated.
...ANSWER
Answered 2020-Oct-24 at 10:14
There are several details to take into account. There is no problem in using Java libraries directly in Scala, either as managed dependencies in sbt or as unmanaged dependencies (https://www.scala-sbt.org/1.x/docs/Library-Dependencies.html): "Dependencies in lib go on all the classpaths (for compile, test, run, and console)". You can create a fat jar with your code and dependencies with https://github.com/sbt/sbt-native-packager and distribute it with Spark Submit.
The point here is to use these libraries in Spark. To take advantage of Spark, you would need to split each file into blocks so that the algorithm can be distributed across the cluster for a single file. Or, if your files are compressed and each of them sits in one HDFS partition, you would need to adjust the size of the HDFS blocks, etc.
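The block-splitting idea can be sketched as follows. This is a minimal plain-Java illustration, assuming fixed-size blocks compared at the same index; the names `BlockSplit`, `split`, and `changedBlocks` are hypothetical and it uses no Spark API. In a cluster setting, each block pair would be handed to a separate task and the differing pairs passed to a delta encoder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative helper (hypothetical names): cut data into fixed-size blocks
// so that each block pair can be compared or diffed independently.
final class BlockSplit {
    // Splits `data` into consecutive blocks of at most `blockSize` bytes.
    static List<byte[]> split(byte[] data, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<byte[]> blocks = new ArrayList<>();
        for (int start = 0; start < data.length; start += blockSize) {
            int end = Math.min(start + blockSize, data.length);
            blocks.add(Arrays.copyOfRange(data, start, end));
        }
        return blocks;
    }

    // Returns the indices of blocks that differ between reference and target.
    // A block that exists on only one side counts as changed.
    static List<Integer> changedBlocks(byte[] reference, byte[] target, int blockSize) {
        List<byte[]> ref = split(reference, blockSize);
        List<byte[]> tgt = split(target, blockSize);
        List<Integer> changed = new ArrayList<>();
        int n = Math.max(ref.size(), tgt.size());
        for (int i = 0; i < n; i++) {
            boolean same = i < ref.size() && i < tgt.size()
                    && Arrays.equals(ref.get(i), tgt.get(i));
            if (!same) {
                changed.add(i);
            }
        }
        return changed;
    }
}
```

One caveat with this design: fixed offsets only line up when edits replace bytes in place. An insertion or deletion shifts everything after it, so every later block appears changed, which is one reason the block size and splitting strategy need care.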
You can also include the C modules in your project and call them via JNI, in the same way that deep learning frameworks call native linear algebra functions. So, in essence, there is a lot to discuss about how to implement these delta algorithms in Spark.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported