LZ77 | An LZ77 dictionary based compressor | Compression library
kandi X-RAY | LZ77 Summary
An LZ77 dictionary based compressor.
Community Discussions
Trending Discussions on LZ77
QUESTION
I am trying to understand the deflate algorithm, and I have read up on Huffman codes as well as LZ77 compression. I was toying around with the compressed sizes of different strings, and I stumbled across something I could not explain. The string "aaa", when compressed through both zlib and gzip, turns out to be the same size as "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" (36 "a"s).
Before reading about this I would have assumed the compressor does something like storing 36*a instead of each character individually, but I could not find anywhere in the specifications where that is mentioned.
Using a fixed Huffman code yielded the same result, so I assume the space saving lies in LZ77, but that only uses distance-length pairs. How would that allow a 3-character string to expand twelvefold without increasing in size?
Interrupting the string of "a"s with one or several "b"s in the middle drastically increases the size. If distance-length pairs are what's doing the job, why can the compressor not just skip over the "b"s when searching backwards? Or are Huffman codes being utilized, and have I misunderstood what "fixed Huffman codes" implies?
ANSWER
Answered 2021-Dec-12 at 20:55
The 36 "a"s are effectively run-length encoded by LZ77 by giving the first "a" as a literal, and then a match with a distance of one, and a length of 35. The length can be as much as 258 for deflate.
Look online for tutorials on LZ77, Huffman coding, and deflate. You can disassemble the resulting compressed data with infgen to get more insight into how the data is being represented.
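This effect is easy to reproduce with Python's zlib module (a quick sketch; the exact byte counts depend on the zlib version and compression level, but the two outputs come out the same size):

```python
import zlib

# "aaa" deflates to three literals plus end-of-block; "a" * 36 deflates to
# one literal plus a single length/distance match (distance 1, length 35).
# Both fit in the same handful of bytes.
c3 = zlib.compress(b"aaa")
c36 = zlib.compress(b"a" * 36)
print(len(c3), len(c36))

# Round-trip to confirm the 36-byte string really is stored in there.
assert zlib.decompress(c36) == b"a" * 36
```

The distance-1 match is what makes LZ77 act as a run-length encoder: each output byte of the match copies the byte produced immediately before it.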
QUESTION
I am looking for a method whereby I can extract the encoding dictionary made by the DEFLATE algorithm from a gzip archive.
I need the LZ77-generated pointers from the whole archive, which refer to patterns in the file, as well as the Huffman tree containing the aforementioned pointers.
Is there any solution in Python?
Does anyone know of https://github.com/madler/infgen/blob/master/infgen.c, which might provide the dictionary?
ANSWER
Answered 2021-Sep-14 at 21:03
The "dictionary" used for compression at any point in the input is nothing more than the 32K bytes of uncompressed data that precede that point.
Yes, infgen will disassemble a deflate stream, showing all of the LZ77 references and the derived Huffman codes in a readable form. You could run infgen from Python and interpret the output in Python.
infgen also has a -b option for a non-human-readable binary format that might be faster to process for what you want to do.
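A sketch of that approach, assuming an infgen binary built from the repository above is available on the PATH (the guard is there because it may not be):

```python
import shutil
import subprocess
import zlib

# Build a small raw deflate stream (wbits=-15 means no zlib wrapper)
# so we have something to disassemble.
co = zlib.compressobj(wbits=-15)
stream = co.compress(b"abcabcabcabc") + co.flush()

if shutil.which("infgen"):
    # infgen reads a deflate/zlib/gzip stream on stdin and prints a
    # human-readable disassembly (literals, match/distance pairs, codes).
    result = subprocess.run(["infgen"], input=stream, capture_output=True)
    print(result.stdout.decode())
else:
    print("infgen binary not found on PATH")
```

The printed disassembly can then be parsed line by line in Python to recover the LZ77 references for the whole archive.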
QUESTION
If the zlib-wrapped data to Inflate has, in its zlib header, a CINFO value (which determines the LZ77 window size) other than 7 (the maximum valid), is it okay to Inflate it with windowBits of 15 (usually the value of MAX_WBITS)?
I ask this question as everyone seems to do so and not care about CINFO.
Did I misunderstand something?
ANSWER
Answered 2021-Feb-16 at 22:46
Yes, that's okay. The windowBits needs to be greater than or equal to the window size that the data was compressed with. It is always ok to decompress with the maximum window size (15).
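A quick way to convince yourself, as a Python sketch using the zlib module (wbits=9 here is just an arbitrary sub-maximum window):

```python
import zlib

# Compress with a small window: wbits=9 puts CINFO = 9 - 8 = 1
# in the zlib header instead of the usual 7.
co = zlib.compressobj(wbits=9)
payload = b"hello world " * 200
data = co.compress(payload) + co.flush()
assert data[0] >> 4 == 1  # CINFO is the high nibble of the first header byte

# Decompress with the maximum windowBits of 15 (zlib.MAX_WBITS):
restored = zlib.decompress(data, wbits=zlib.MAX_WBITS)
assert restored == payload
```

A 15-bit (32K) window always covers the back-references of any smaller window, which is why decompressing with MAX_WBITS is safe regardless of CINFO.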
QUESTION
I do not know the site well, and I have only asked one question here. I cannot figure out how to fix a problem in my code. I use StringBuilder because of its benefits over the standard String.
I want to delete the first character of the string, but the character that appears in the last place ends up duplicated in the last two places.
For example: I have the string abcdef; when I delete the first instance, 'a', I get back the string bcdeff. I tried setting the length of the string to the original length minus one, but that gave no result. I also tried setting the string to a new String, and then assigning the string I had saved in a temp variable, but that did not help either.
ANSWER
Answered 2020-Aug-15 at 20:15
This seems to work:
QUESTION
I have a function in my code which decodes a file compressed using the LZ77 algorithm. But on a 15 MB input file, decompression takes about 3 minutes (too slow). What is the reason for the poor performance? On every step of the loop I read two or three bytes and get the length, offset and next character. If the offset is not zero, I also have to move "offset" bytes back in the output stream and read "length" bytes. Then I insert them at the end of the same stream before writing the next character there.
ANSWER
Answered 2020-Aug-17 at 12:13
You could try doing it on a std::stringstream in memory instead:
QUESTION
I have Huffman and LZ77 code, but I need a way to merge these algorithms to make deflate.
How can I do this?
I have to write it manually, without using libraries.
ANSWER
Answered 2020-Jul-16 at 20:39
LZ77 gives you a sequence of literals and length/distance pairs. There are many ways to apply Huffman coding to that. The first step would be to apply Huffman coding to the literals, as if there were no LZ77. Then just pass the length/distance pairs through as is, making sure you can tell whether the next thing is a literal or a length/distance pair.
After that you can also try to code the length/distance pairs. Deflate puts the literals and lengths in a single Huffman code, and the distances in a second Huffman code. Or you could code a count of literals that are followed by one length, then put the literals and length in different Huffman codes. Or ... many other ways.
In order to be able to decode, you also need to describe the Huffman codes you use at the start of the stream.
You can read the deflate description for how it does all that.
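The first step described above can be sketched in Python. Everything here is made up for illustration (the toy token stream, the flag-bit framing, and the 9/16-bit raw widths for length and distance); it is not deflate's actual format:

```python
import heapq
from collections import Counter

def huffman_codes(freqs):
    """Build a Huffman code {symbol: bitstring} from a frequency map."""
    heap = [[w, i, [s, ""]] for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: a single symbol gets a 1-bit code
        return {heap[0][2][0]: "0"}
    count = len(heap)  # unique tie-breaker so list comparison never recurses
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[2:]:
            pair[1] = "0" + pair[1]
        for pair in hi[2:]:
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0], count] + lo[2:] + hi[2:])
        count += 1
    return {s: code for s, code in heap[0][2:]}

# Pretend token stream from an LZ77 pass: ints are literal bytes,
# tuples are (length, distance) matches.
tokens = [97, 98, 99, (3, 3), 97, (4, 6)]

# Huffman-code only the literals, as if there were no LZ77.
codes = huffman_codes(Counter(t for t in tokens if isinstance(t, int)))

# Emit: flag bit 0 + Huffman code for a literal,
#       flag bit 1 + raw length (9 bits) and distance (16 bits) for a match.
bits = ""
for t in tokens:
    if isinstance(t, int):
        bits += "0" + codes[t]
    else:
        length, dist = t
        bits += "1" + format(length, "09b") + format(dist, "016b")
print(bits)
```

Deflate itself goes further, as the answer says: it merges literals and lengths into one Huffman alphabet and gives distances their own code, instead of passing matches through raw.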
QUESTION
According to DEFLATE spec:
- Compressed representation overview
A compressed data set consists of a series of blocks, corresponding to successive blocks of input data. The block sizes are arbitrary, except that non-compressible blocks are limited to 65,535 bytes.
Each block is compressed using a combination of the LZ77 algorithm and Huffman coding. The Huffman trees for each block are independent of those for previous or subsequent blocks; the LZ77 algorithm may use a reference to a duplicated string occurring in a previous block, up to 32K input bytes before.
Each block consists of two parts: a pair of Huffman code trees that describe the representation of the compressed data part, and a compressed data part. (The Huffman trees themselves are compressed using Huffman encoding.) The compressed data consists of a series of elements of two types: literal bytes (of strings that have not been detected as duplicated within the previous 32K input bytes), and pointers to duplicated strings, where a pointer is represented as a pair <length, backward distance>. The representation used in the "deflate" format limits distances to 32K bytes and lengths to 258 bytes, but does not limit the size of a block, except for uncompressible blocks, which are limited as noted above.
So pointers to duplicate strings only go back 32 KiB, but since block size is not limited, could the Huffman code tree store two duplicate strings more than 32 KiB apart as the same code? Then is the limiting factor the block size?
ANSWER
Answered 2020-Jul-10 at 06:41
The Huffman tree for distances contains codes 0 to 29 (see the distance code table in the deflate spec); the code 29, followed by 8191 in "plain" bits, means "distance 32768". That's a hard limit in the definition of deflate. The block size is not the limiting factor. In fact the block size is not stored anywhere: the block is a stream of unbounded length. If you want to stop the block, you send an end-of-block code.
QUESTION
I have code for the LZ77 compression algorithm. It works fine with small files. But if I want to compress files of 100 kB and bigger, it takes a lot of time.
I think it's all because of this part:
ANSWER
Answered 2020-May-30 at 16:18
I'm not an expert, but clever people have devised efficient search algorithms that may apply here. For example, check out the Knuth–Morris–Pratt algorithm: https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
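As an illustration only (the question's code is not shown, so this is a generic KMP sketch in Python rather than a drop-in fix), the idea is to precompute a failure table so the search never re-examines matched characters:

```python
def kmp_find(haystack, needle):
    """Return the index of the first occurrence of needle, or -1 (KMP)."""
    if not needle:
        return 0
    # fail[i] = length of the longest proper prefix of needle[:i+1]
    # that is also a suffix of it.
    fail = [0] * len(needle)
    k = 0
    for i in range(1, len(needle)):
        while k and needle[i] != needle[k]:
            k = fail[k - 1]
        if needle[i] == needle[k]:
            k += 1
        fail[i] = k
    # Scan the haystack; on a mismatch, fall back via the table
    # instead of restarting from the next position.
    k = 0
    for i, ch in enumerate(haystack):
        while k and ch != needle[k]:
            k = fail[k - 1]
        if ch == needle[k]:
            k += 1
        if k == len(needle):
            return i - k + 1
    return -1

print(kmp_find("abcabcabd", "abcabd"))  # → 3
```

In practice, LZ77 compressors usually avoid per-match string searches entirely by keeping a hash table of recent 3-byte sequences, which is what zlib's deflate does.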
QUESTION
I'm trying to decode a string which is encoded using LZX algorithm with a LZX window size of 2 megabytes (binary) and then converted to base64.
I'm receiving this string in response from Microsoft's Update API (GetUpdateData).
As per Microsoft's documentation for the LZX/LZ77 algorithm, the XmlUpdateBlobCompressed field is:
compressed using a LZX variant of the Lempel-Ziv compression algorithm. The LZX window size used for compressing this field is 2 megabytes.
I tried to decode/decompress the string back to its original XML with no success. I tried the lz_string library (NodeJS/Ruby) and some other libraries but had no success so far.
Here is a sample I'm trying to decode/decompress back to the original XML:
ANSWER
Answered 2020-Feb-12 at 14:19
As noted in a comment by Dave, the response actually contained a cab file.
I saved the cab file and extracted it using libmspack.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network