HuffMan- | huffman compression function | Compression library
kandi X-RAY | HuffMan- Summary
huffman compression function
HuffMan- Key Features
HuffMan- Examples and Code Snippets
public int compare(HuffmanNode x, HuffmanNode y) {
    // Order nodes by frequency. Integer.compare avoids the integer
    // overflow that the original subtraction (x.data - y.data) can hit
    // for extreme values.
    return Integer.compare(x.data, y.data);
}
Community Discussions
Trending Discussions on HuffMan-
QUESTION
Given a random integer, for example, 19357982357627685397198. How can I compress these numbers into a string of text that has fewer characters?
The string of text must only contain numbers or alphabetical characters, both uppercase and lowercase.
I've tried Base64 and Huffman coding, which are said to compress, but neither of them makes the string shorter when typed out on a keyboard.
I also tried to make an algorithm that divides the integer by each of 2, 3, ..., 10 and appends the divisor as the last digit of the result (0 in the case of division by 10), so that decoding would just multiply the number by its last digit. But that does not work: in some cases nothing divides evenly, the number stays the same, and decoding then multiplies it into a larger number than I started with.
I also tried splitting the integer into blocks of two digits from the left and assigning a letter to each (a=1, b=2, o=15), rolling back to a after z. That did not work either: when decoding, there is no way to know how many times the value rolled past z, so it comes out much smaller than it started.
I also tried some other common encoding schemes, for example Base32, Ascii85, the Bifid cipher, Baudot code, and some others I can not remember.
It seems like an unsolvable problem, but it shouldn't be: a digit has only 10 possible values, while a letter has 26 (52 counting both cases, 62 once you add digits). So five alphanumeric characters can store more information than a five-digit integer, which means a shorter representation must exist mathematically; I just can't find anyone who has done it.
...ANSWER
Answered 2021-Jun-01 at 21:47
You switch from base 10 to e.g. base 62 by repeatedly dividing by 62 and recording the remainder from each step, like this:
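A sketch of that division loop using BigInteger (the digit alphabet below is an arbitrary choice; any agreed-upon 62-character alphabet works, as long as encoder and decoder use the same one):

```java
import java.math.BigInteger;

public class Base62 {
    // 62 symbols: 0-9, A-Z, a-z. The ordering is a convention, not a requirement.
    static final String ALPHABET =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Repeatedly divide by 62, recording remainders; the remainders,
    // read last-to-first, are the base-62 digits.
    static String encode(BigInteger n) {
        if (n.signum() == 0) return "0";
        BigInteger base = BigInteger.valueOf(62);
        StringBuilder sb = new StringBuilder();
        while (n.signum() > 0) {
            BigInteger[] qr = n.divideAndRemainder(base);
            sb.append(ALPHABET.charAt(qr[1].intValue()));
            n = qr[0];
        }
        return sb.reverse().toString();
    }

    // Inverse: multiply the running value by 62 and add each digit's value.
    static BigInteger decode(String s) {
        BigInteger base = BigInteger.valueOf(62);
        BigInteger n = BigInteger.ZERO;
        for (char c : s.toCharArray()) {
            n = n.multiply(base).add(BigInteger.valueOf(ALPHABET.indexOf(c)));
        }
        return n;
    }
}
```

The 23-digit example from the question comes out roughly 13 characters long, since each base-62 character carries about log10(62) ≈ 1.79 decimal digits of information.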
QUESTION
I have a requirement where text files are sent from one location to another. Both locations are under our control. The nature of the content, and the words that can appear in it, are mostly the same. This means that if I keep the deflate dictionary at both locations once, there is no need to send it with each file.
I have been reading about this for the last week and experimenting with some available code, such as this & this.
However, I am still in the dark.
A few questions I still have:
- Can we generate and use a custom deflate dictionary from a preset list of words?
- Can we send the file without the deflate dictionary and use a local one?
- If not gzip, is there any other compression library that can be used for this purpose?
Some references I stumbled upon so far:
...ANSWER
Answered 2021-May-22 at 00:15
The zlib library supports dictionaries with the zlib (not gzip) format. See deflateSetDictionary() and inflateSetDictionary().
There is nothing special about the construction of a dictionary. It is simply 32K bytes of strings that you believe will occur often in the data you are compressing. You should put the most common strings at the end of the 32K.
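Java's built-in java.util.zip wraps the same zlib facility through Deflater.setDictionary() and Inflater.setDictionary(), so the "shared local dictionary, never sent over the wire" setup can be sketched like this (the dictionary contents here are a placeholder, not a recommendation):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class PresetDict {
    // Strings both endpoints already have; in a real setup this would be
    // built from words that actually occur often in the files.
    static final byte[] DICT =
        "some words we expect to occur often in the files".getBytes(StandardCharsets.UTF_8);

    static byte[] compress(byte[] input) {
        Deflater def = new Deflater();   // zlib format; gzip has no dictionary support
        def.setDictionary(DICT);         // must be set before compressing
        def.setInput(input);
        def.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!def.finished()) out.write(buf, 0, def.deflate(buf));
        def.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] data) throws Exception {
        Inflater inf = new Inflater();
        inf.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!inf.finished()) {
            int n = inf.inflate(buf);
            // inflate() returns 0 and needsDictionary() becomes true when the
            // stream was compressed with a preset dictionary: supply the local copy.
            if (n == 0 && inf.needsDictionary()) inf.setDictionary(DICT);
            else out.write(buf, 0, n);
        }
        inf.end();
        return out.toByteArray();
    }
}
```

Only the dictionary's Adler-32 checksum travels in the zlib header, so the receiver can verify it holds the same dictionary without the dictionary itself being transmitted.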
QUESTION
I am running the Java program shown here to generate canonical Huffman codes: https://www.geeksforgeeks.org/canonical-huffman-coding/
Although the code gives the correct canonical Huffman codes for the shown input, in other cases I find that the codes are not prefix codes and not correct. For example,
...ANSWER
Answered 2021-May-09 at 22:24
It is generating the codes correctly, but then printing them incorrectly. It is leaving off the leading zero bits of the codes that have them. They should have prepended the necessary zero bits after converting the number to a string of digits.
If you replace the line that prints the code with this:
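The replacement line itself is not reproduced above; as an illustration of the fix (names hypothetical, not the code from the linked article), a canonical code held as an integer value plus a bit length has to be padded back out to that length when printed:

```java
public class CodePrint {
    // A canonical Huffman code is really a (value, length) pair. Printing
    // just Integer.toBinaryString(value) drops leading zeros, so pad the
    // string back out to 'len' bits.
    static String toBits(int code, int len) {
        String bits = Integer.toBinaryString(code);
        return "0".repeat(len - bits.length()) + bits;
    }
}
```

For example, the 3-bit code with value 1 must print as 001, not 1; otherwise it collides with the 1-bit code 1 and the output no longer looks prefix-free.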
QUESTION
I've got a little MATLAB script which I am trying to understand. It doesn't do very much: it only reads text from a file and encodes and decodes it with the Huffman functions. But it throws an error while decoding:
"error: out of memory or dimension too large for Octave's index type
error: called from huffmandeco>dict2tree at line 95 column 19"
I don't know why, because I debugged it and don't see a large index anywhere.
I added the part which calculates p from the input text.
...ANSWER
Answered 2021-Apr-15 at 20:46
I haven't weeded through the code enough to know why yet, but huffmandict is not ignoring zero-probability symbols the way it claims to. Nor have I been able to find a bug report on Savannah, but again I haven't searched thoroughly.
A workaround is to limit the symbol list and their probabilities to only the symbols that actually occur. Using containers.Map would be ideal, but in Octave you can do that with a couple of the outputs from unique:
QUESTION
As the title states, I'm writing a function to compute Huffman codes for symbols in a tree, but I feel completely lost.
A branch looks like this:
...ANSWER
Answered 2020-Jun-21 at 20:59
Assuming that your Huffman tree is valid (meaning we can ignore :frequency), and that 0 means 'left' and 1 means 'right':
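A sketch of such a traversal in Java (the Node shape is hypothetical, standing in for the tree structure in the question): walking down from the root, the accumulated left/right path becomes the symbol's code.

```java
import java.util.HashMap;
import java.util.Map;

public class HuffCodes {
    // Hypothetical node shape: leaves carry a symbol, internal nodes two children.
    static class Node {
        Character sym;      // non-null only at leaves
        Node left, right;
        Node(char s) { sym = s; }
        Node(Node l, Node r) { left = l; right = r; }
    }

    // Going left appends '0', going right appends '1'; the path from the
    // root to each leaf is that symbol's code.
    static Map<Character, String> codes(Node root) {
        Map<Character, String> out = new HashMap<>();
        walk(root, "", out);
        return out;
    }

    private static void walk(Node n, String prefix, Map<Character, String> out) {
        if (n.sym != null) { out.put(n.sym, prefix); return; }
        walk(n.left, prefix + "0", out);
        walk(n.right, prefix + "1", out);
    }
}
```

Because symbols only ever sit at leaves, no emitted code can be a prefix of another.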
QUESTION
Is there a way of computing a prefix-free coding for a given dictionary of letters and their frequencies, similar to Huffman coding but computed dynamically? What would the optimization function look like?
The problem with building the tree only up to position i of the dictionary is that the lowest-frequency letters could change, and with them the structure of the whole tree.
...ANSWER
Answered 2019-Aug-05 at 03:22
Yes, there are several ways to generate prefix-free codes dynamically.
As you suggested, it would be conceptually simple to start with some default frequencies, track the frequencies of the letters seen so far, and, for every letter decoded, increment that letter's count and then rebuild the Huffman tree from all the counts, potentially changing the tree completely after each letter. That would require a lot of work per letter and be very slow. Yet there are a couple of adaptive Huffman coding algorithms that effectively do the same thing using clever techniques that do much less work, and so are faster.
Many other data compression algorithms also generate prefix-free codes dynamically much faster than any adaptive Huffman algorithm, at a small sacrifice of compression -- such as Polar codes or Engel coding or universal codes such as Elias delta coding.
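Of the universal codes mentioned, Elias delta is simple enough to sketch: it maps any positive integer to a self-delimiting, prefix-free bit string with no frequency model at all, which is what makes it usable dynamically.

```java
public class EliasDelta {
    // Elias gamma: floor(log2 n) zeros, then n in binary. The zero run
    // tells the decoder how many bits of the value follow.
    static String gamma(int n) {
        String bits = Integer.toBinaryString(n);
        return "0".repeat(bits.length() - 1) + bits;
    }

    // Elias delta: gamma-code the bit length of n, then emit the bits of n
    // without their leading 1 (it is implied by the encoded length).
    static String delta(int n) {
        String bits = Integer.toBinaryString(n);
        return gamma(bits.length()) + bits.substring(1);
    }
}
```

Small integers get short codes (1 encodes as a single bit), so a scheme that keeps frequent symbols mapped to small indices gets compression without ever rebuilding a tree.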
The arithmetic coding data compression algorithm is technically not a prefix-free code, but typically gives slightly better compression (but runs slower) than either static Huffman coding or adaptive Huffman coding. Arithmetic coding is generally implemented adaptively, tracking the frequencies of all the letters used so far. (Many arithmetic coding implementations track even more context -- if the previous letter was a "t", it remembers that the most-frequent letter in this context is "h" and exactly how frequent it was, etc., giving even better compression).
QUESTION
ANSWER
Answered 2019-Aug-03 at 22:27
We don't know for sure why Phil Katz chose 15, but it was likely to facilitate a fast implementation on a 16-bit processor.
No, zlib will not fail. It happens all the time. The zlib implementation applies the normal Huffman algorithm, after which if the longest code is longer than 15 bits, it proceeds to modify the codes to force them all to 15 bits or less.
Note that your example resulting in a 256-bit long code would require a set of 2^256 ≈ 10^77 symbols in order to arrive at those frequencies. I don't think you have enough memory for that.
In any case, zlib normally limits a deflate block to 16384 symbols. For that number, the maximum Huffman code length is 19. That comes from a Fibonacci sequence of probabilities, not your powers of two. (Left as an exercise for the reader.)
QUESTION
Is there a more elegant way to express the following code (e.g. without explicit for-loop)?
...ANSWER
Answered 2019-Apr-14 at 10:01
How about:
QUESTION
I have just read this:
This is where a really smart idea called Huffman coding comes in! The idea is that we represent our characters (like a, b, c, d, ….) with codes like
...
ANSWER
Answered 2019-Feb-12 at 09:28
Huffman coding works by laying the data out in a tree. If you have a binary tree, you can associate every leaf with a code by saying that the left child corresponds to a 0 bit and the right child to a 1. The path that leads from the root to a leaf corresponds to a code unambiguously.
This works for any tree, and the prefix property follows from the fact that a leaf is terminal. Hence you cannot reach a leaf (have a code) by passing through another leaf (by having another code as a prefix).
The basic idea of Huffman coding is that you can build trees in such a way that the depth of every node is correlated with the probability of appearance of its symbol (codes more likely to occur will be closer to the root).
There are several algorithms to build such a tree. For instance, assume you have a set of items you want to code, say a..f. You must know the probability of appearance of every item, from either a model of the source or an analysis of the actual values (for instance by analysing the file to be coded).
Then you can:
1. sort the items by probability
2. pick the two items with the lowest probability
3. remove these items, group them into a new compound node, and assign one item to the left child (code 0) and the other to the right child (code 1)
4. give the compound node the sum of the two individual probabilities and insert it into the sorted item list
5. go to step 2 while more than one item remains
For the previous tree, it may correspond to a set of probabilities
a (0.5) b (0.2) c (0.1) d (0.05) e (0.05) f (0.1)
Then you pick items with the lowest probability (d and e), group them in a compound node (de) and get the new list
a (0.5) b (0.2) c (0.1) (de) (0.1) f (0.1)
And the successive item lists can be
a (0.5) b (0.2) (c(de)) (0.2) f (0.1)
a (0.5) b (0.2) ((c(de))f) (0.3)
a (0.5) (b((c(de))f)) (0.5)
(a(b((c(de))f))) (1.0)
So the prefix property is ensured by construction.
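The merge loop in the steps above can be sketched directly. The weights below are the example probabilities scaled by 100; tie-breaking in the priority queue may produce a different tree shape than the hand-worked merges, but every Huffman tree over the same weights has the same, minimal, weighted path length (here 210, i.e. 2.1 bits per symbol on average):

```java
import java.util.PriorityQueue;

public class HuffBuild {
    static class Node {
        int weight;
        Node left, right;
        Node(int w) { weight = w; }
        Node(Node l, Node r) { weight = l.weight + r.weight; left = l; right = r; }
    }

    // Steps 2-5 of the answer: repeatedly pull the two lowest-weight items,
    // merge them into a compound node, and reinsert the result.
    static Node build(int[] weights) {
        PriorityQueue<Node> pq =
            new PriorityQueue<>((a, b) -> Integer.compare(a.weight, b.weight));
        for (int w : weights) pq.add(new Node(w));
        while (pq.size() > 1) pq.add(new Node(pq.poll(), pq.poll()));
        return pq.poll();
    }

    // Weighted path length: sum of weight * code length over all leaves.
    static int cost(Node n, int depth) {
        if (n.left == null) return n.weight * depth;
        return cost(n.left, depth + 1) + cost(n.right, depth + 1);
    }
}
```

With the answer's code lengths (a=1, b=2, f=3, c=4, d=e=5), the weighted path length is 50 + 40 + 30 + 40 + 25 + 25 = 210, matching what the queue-based build produces.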
QUESTION
I have successfully built my Huffman tree, and I have a method that traverses the tree and saves the Huffman code for each character as a String of 1s and 0s:
...ANSWER
Answered 2018-Jun-20 at 04:08
Don't use a String to store the binary result; use a java.util.BitSet.
It does exactly what you want, allowing you to set individual bits by index position.
When you are ready to extract the value in binary you can use toByteArray().
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install HuffMan-