huffman-coding | Compression and decompression program based on Huffman coding
kandi X-RAY | huffman-coding Summary
This project is to design compression and decompression programs based on Huffman coding. The idea of Huffman coding is to minimize the weighted expected length of the code by assigning shorter codes to frequently used characters and longer codes to seldom-used characters.
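In symbols, using notation assumed here rather than taken from the project (p_i is the relative frequency of character i and ℓ_i is the length in bits of its code), Huffman coding chooses code lengths that minimize the expected length below. Every prefix-free binary code must also satisfy the Kraft inequality on the right:

$$\bar{L} = \sum_i p_i \,\ell_i \qquad \text{subject to} \qquad \sum_i 2^{-\ell_i} \le 1$$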
Community Discussions
Trending Discussions on huffman-coding
QUESTION
Given a random integer, for example 19357982357627685397198, how can I compress this number into a string of text that has fewer characters?
The string of text must only contain numbers or alphabetical characters, both uppercase and lowercase.
I've tried Base64 and Huffman coding, which claim to compress, but neither of them makes the string shorter when typed on a keyboard.
I also tried to make an algorithm that divides the integer by the numbers 2, 3, ..., 10 and checks whether the last digit of the result equals the divisor (looking for 0 in the case of division by 10). When decoding, you would then just multiply the number by its last digit. But that does not work, because in some cases the number cannot be divided by anything and stays the same, and when it is decoded it gets multiplied into a larger number than you started with.
I also tried to divide the integer into blocks of 2 digits, starting from the left, and assign a letter to each block (a=1, b=2, o=15), wrapping back to a after z. This did not work because, when decoding, there is no way to know how many times a block wrapped past z, so the result is a much smaller number than the original.
I also tried some other common encoding and cipher schemes, for example Base32, Ascii85, the Bifid cipher, Baudot code, and some others I cannot remember.
It seems like an unsolvable problem. But because the input is an integer, each digit can take only 10 different values, while each letter of the alphabet can take 26. This means you can store more data in 5 letters than in a 5-digit integer. So mathematically it should be possible to store the number in fewer characters, but I just can't find anyone who has ever done it.
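The counting argument can be made exact. Assuming the 62-symbol alphabet the question allows (10 digits plus 26 uppercase and 26 lowercase letters), k characters can represent 62^k distinct values, so a non-negative integer n with d decimal digits needs

$$k = \left\lceil \log_{62}(n+1) \right\rceil \le \left\lceil d \log_{62} 10 \right\rceil, \qquad \log_{62} 10 \approx 0.558$$

characters. For the 23-digit example this bound gives 13 characters.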
ANSWER
Answered 2021-Jun-01 at 21:47
You switch from base 10 to, e.g., base 62 by repeatedly dividing by 62 and recording the remainder at each step, like this:
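A minimal Python sketch of that conversion (not the answerer's original snippet, which is not reproduced here). The digit order 0-9, A-Z, a-z is an assumption; any fixed order of the 62 allowed characters works as long as both sides agree:

```python
import string

# Assumed digit order: 0-9, then A-Z, then a-z (the 62 characters the question allows).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)  # 62

def to_base62(n: int) -> str:
    """Encode a non-negative integer by repeated division by 62."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, BASE)
        digits.append(ALPHABET[remainder])  # remainders arrive least-significant first
    return "".join(reversed(digits))

def from_base62(s: str) -> int:
    """Decode by running the division backwards: n = n * 62 + digit value."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n

number = 19357982357627685397198
encoded = to_base62(number)
print(len(str(number)), "->", len(encoded))  # 23 -> 13
assert from_base62(encoded) == number
```

This is not compression in the Huffman sense: it only uses a larger alphabet, so each character carries about 5.95 bits instead of about 3.32.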
QUESTION
I have a requirement where text files are sent from one location to another. Both locations are under our control. The nature of the content and the words that could appear in it are mostly the same. This means that if I keep the deflate dictionary in both locations, there is no need to send it with the file.
I have been reading about this for the last week and experimenting with some available code such as this & this.
However, I am still in the dark.
A few questions I still have:
- Can we generate and use a custom deflate dictionary from a preset list of words?
- Can we send the file without the deflate dictionary and use the local one?
- If not gzip, is there any other compression library that can be used for this purpose?
Some references I stumbled upon so far:
ANSWER
Answered 2021-May-22 at 00:15
The zlib library supports dictionaries with the zlib (not gzip) format. See deflateSetDictionary() and inflateSetDictionary().
There is nothing special about the construction of a dictionary. It is simply 32K bytes of strings that you believe will occur often in the data you are compressing. You should put the most common strings at the end of the 32K.
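A minimal sketch of that setup using Python's standard zlib module, whose zdict parameter wraps deflateSetDictionary()/inflateSetDictionary(). Both sides hold the same preset dictionary and only the compressed bytes travel. The dictionary contents and message are made-up placeholders, not anything from the question:

```python
import zlib

# Hypothetical preset dictionary kept at both sites: strings expected to
# recur in the files, with the most common ones placed last.
SHARED_DICT = b"customer_id=region=EU-WEST;currency=EUR;status=OK;"

def compress_with_dict(data: bytes) -> bytes:
    # The default wbits (15) selects the zlib format, not gzip.
    comp = zlib.compressobj(level=9, zdict=SHARED_DICT)
    return comp.compress(data) + comp.flush()

def decompress_with_dict(blob: bytes) -> bytes:
    # The receiver must supply the identical dictionary; it is never sent.
    decomp = zlib.decompressobj(zdict=SHARED_DICT)
    return decomp.decompress(blob) + decomp.flush()

message = b"status=OK;customer_id=1234;region=EU-WEST;currency=EUR;"
packed = compress_with_dict(message)
print(len(message), len(zlib.compress(message, 9)), len(packed))
assert decompress_with_dict(packed) == message
```

The gain is largest for short files, where plain deflate has no earlier data to back-reference.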
QUESTION
I am running the Java program shown here to generate canonical Huffman codes, https://www.geeksforgeeks.org/canonical-huffman-coding/
Although the code gives the correct canonical Huffman codes for the input shown there, for other inputs the codes it prints are not prefix codes and are not correct. For example:
ANSWER
Answered 2021-May-09 at 22:24
It is generating the codes correctly, but then printing them incorrectly. It is leaving off the leading zero bits of the codes that have them. They should have prepended the necessary zero bits after converting the number to a string of digits.
If you replace the line that prints the code with this:
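The fix belongs in the Java program's print statement, which is not reproduced here. As an illustration in Python instead (an assumption, not the GeeksforGeeks code), this sketch assigns canonical codes from code lengths and zero-pads each code to its full length when converting it to a string:

```python
from typing import Dict

def canonical_codes(lengths: Dict[str, int]) -> Dict[str, str]:
    """Assign canonical Huffman codes from {symbol: code length}."""
    # Canonical order: shorter codes first, ties broken by symbol.
    symbols = sorted(lengths, key=lambda s: (lengths[s], s))
    codes: Dict[str, str] = {}
    code = 0
    prev_len = lengths[symbols[0]]
    for sym in symbols:
        length = lengths[sym]
        code <<= length - prev_len  # append zero bits when the length increases
        prev_len = length
        # Zero-pad to the full code length; converting the integer to binary
        # without padding is exactly what drops the leading 0 bits.
        codes[sym] = format(code, f"0{length}b")
        code += 1
    return codes

print(canonical_codes({"a": 2, "b": 2, "c": 2, "d": 2}))
# {'a': '00', 'b': '01', 'c': '10', 'd': '11'}
```

Without the padding, a and b would print as "0" and "1", which are prefixes of "10" and "11", matching the symptom in the question.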
QUESTION
Is there a way to compute a prefix-free code for a given dictionary of letters and their frequencies, similar to Huffman coding but computed dynamically? What does the optimization function look like?
The problem with building the tree only up to position i of the dictionary is that the least frequent letters could change, and so would the whole tree's structure.
ANSWER
Answered 2019-Aug-05 at 03:22
Yes, there are several ways to generate prefix-free codes dynamically.
As you suggested, it would be conceptually simple to start with some default frequency, track the frequencies of the letters used so far, and for every letter decoded, increment that letter's count and then rebuild a Huffman tree from all the counts (potentially changing the tree completely after each letter). That would require a lot of work for each letter and be very slow, and yet there are a couple of adaptive Huffman coding algorithms that effectively do the same thing using clever algorithms that do much less work, and so are faster.
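The "conceptually simple" approach can be sketched in Python as follows. Assumptions not from the answer: every letter of a fixed alphabet starts with a count of 1, ties are broken deterministically so encoder and decoder build identical trees, and the decoder is told how many letters to read. It rebuilds the whole code for every letter, which is exactly the slow part that real adaptive Huffman algorithms avoid:

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Build a Huffman code {symbol: bitstring} from {symbol: count}."""
    if len(freqs) == 1:  # a lone symbol still needs a 1-bit code
        return {s: "0" for s in freqs}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(f, next(tiebreak), {s: ""}) for s, f in sorted(freqs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, left = heapq.heappop(heap)
        f1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f0 + f1, next(tiebreak), merged))
    return heap[0][2]

def adaptive_encode(text, alphabet):
    counts = {s: 1 for s in alphabet}  # assumed default count per letter
    bits = []
    for ch in text:
        bits.append(huffman_code(counts)[ch])  # code with the current tree...
        counts[ch] += 1                         # ...then update the counts
    return "".join(bits)

def adaptive_decode(bits, alphabet, n):
    counts = {s: 1 for s in alphabet}
    out, pos = [], 0
    for _ in range(n):
        table = {c: s for s, c in huffman_code(counts).items()}
        cur = ""
        while cur not in table:  # prefix-free, so the first match is the codeword
            cur += bits[pos]
            pos += 1
        out.append(table[cur])
        counts[table[cur]] += 1  # mirror the encoder's update
    return "".join(out)

bits = adaptive_encode("abracadabra", "abcdr")
print(len(bits), adaptive_decode(bits, "abcdr", 11))
```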
Many other data compression algorithms also generate prefix-free codes dynamically, much faster than any adaptive Huffman algorithm, at a small sacrifice in compression: for example Polar codes, Engel coding, or universal codes such as Elias delta coding.
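Of the universal codes mentioned, Elias delta is short enough to sketch. It needs no frequency table at all: each positive integer gets a fixed prefix-free codeword, with smaller numbers getting shorter ones. A minimal Python version, using bit strings for readability rather than packed bits:

```python
def elias_delta_encode(n: int) -> str:
    """Elias delta code of a positive integer n."""
    if n < 1:
        raise ValueError("Elias delta codes are defined for n >= 1")
    binary = format(n, "b")            # n in binary, leading 1 included
    length = len(binary)               # floor(log2 n) + 1
    length_bits = format(length, "b")
    gamma = "0" * (len(length_bits) - 1) + length_bits  # Elias gamma code of the length
    return gamma + binary[1:]          # n's leading 1 is implied, so drop it

def elias_delta_decode(bits: str, pos: int = 0):
    """Decode one codeword starting at bits[pos]; return (value, next_pos)."""
    zeros = 0
    while bits[pos] == "0":
        zeros += 1
        pos += 1
    length = int(bits[pos:pos + zeros + 1], 2)
    pos += zeros + 1
    value = int("1" + bits[pos:pos + length - 1], 2)
    return value, pos + length - 1

stream = "".join(elias_delta_encode(n) for n in [1, 2, 17, 1000])
pos, decoded = 0, []
while pos < len(stream):
    value, pos = elias_delta_decode(stream, pos)
    decoded.append(value)
print(decoded)  # [1, 2, 17, 1000]
```

Because the codewords are prefix-free, concatenated codes decode without separators.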
The arithmetic coding data compression algorithm is technically not a prefix-free code, but typically gives slightly better compression (but runs slower) than either static Huffman coding or adaptive Huffman coding. Arithmetic coding is generally implemented adaptively, tracking the frequencies of all the letters used so far. (Many arithmetic coding implementations track even more context -- if the previous letter was a "t", it remembers that the most-frequent letter in this context is "h" and exactly how frequent it was, etc., giving even better compression).
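A toy adaptive arithmetic coder makes the contrast concrete: the output is one binary fraction for the whole message rather than one codeword per letter. This sketch assumes exact rational arithmetic via Python's fractions module, a starting count of 1 per letter, no context tracking, and a decoder that is told the message length. Practical coders use fixed-precision integer arithmetic with renormalization instead:

```python
import math
from fractions import Fraction

def _intervals(counts):
    """Map each symbol to its [start, end) slice of [0, 1), proportional to its count."""
    total = sum(counts.values())
    slices, cum = {}, 0
    for sym in sorted(counts):
        slices[sym] = (Fraction(cum, total), Fraction(cum + counts[sym], total))
        cum += counts[sym]
    return slices

def arith_encode(text, alphabet):
    counts = {s: 1 for s in alphabet}  # assumption: every letter starts with count 1
    low, width = Fraction(0), Fraction(1)
    for ch in text:
        start, end = _intervals(counts)[ch]
        low, width = low + width * start, width * (end - start)  # narrow the interval
        counts[ch] += 1  # adaptive: the model learns from letters already coded
    # Output the shortest binary fraction m / 2**k inside [low, low + width).
    k = 0
    while True:
        m = math.ceil(low * 2**k)
        if Fraction(m, 2**k) < low + width:
            return format(m, "b").zfill(k) if k else ""
        k += 1

def arith_decode(bits, alphabet, n):
    counts = {s: 1 for s in alphabet}
    value = Fraction(int(bits, 2), 2 ** len(bits)) if bits else Fraction(0)
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(n):
        pos = (value - low) / width  # where the code value falls in the current interval
        for sym, (start, end) in _intervals(counts).items():
            if start <= pos < end:
                break
        out.append(sym)
        low, width = low + width * start, width * (end - start)
        counts[sym] += 1
    return "".join(out)

bits = arith_encode("abracadabra", "abcdr")
print(len(bits), arith_decode(bits, "abcdr", 11))
```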
QUESTION
Is there a more elegant way to express the following code (e.g. without explicit for-loop)?
ANSWER
Answered 2019-Apr-14 at 10:01
How about:
QUESTION
I have just read this:
This is where a really smart idea called Huffman coding comes in! The idea is that we represent our characters (like a, b, c, d, ….) with codes like
...
ANSWER
Answered 2019-Feb-12 at 09:28
Huffman coding works by laying out data in a tree. If you have a binary tree, you can associate every leaf with a code by saying that a left child corresponds to a 0 bit and a right child to a 1 bit. The path from the root to a leaf then corresponds to a code unambiguously.
This works for any tree, and the prefix property follows from the fact that a leaf is terminal. Hence, you cannot reach a leaf (have a code) by passing through another leaf (by having another code as a prefix).
The basic idea of Huffman coding is that you can build the tree in such a way that the depth of every leaf reflects its probability of appearance: codes that are more likely to occur are closer to the root.
There are several algorithms to build such a tree. For instance, assume you have a set of items you want to code, say a..f. You must know the probability of appearance of every item, either from a model of the source or from an analysis of the actual values (for instance by analysing the file to code).
Then you can:
1. Sort the items by probability.
2. Pick the two items with the lowest probability.
3. Remove these items, group them in a new compound node, and assign one item to the left child (code 0) and the other to the right child (code 1).
4. Set the probability of the compound node to the sum of the two individual probabilities, and insert this new node into the sorted item list.
5. Go back to step 2 while more than one item remains.
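These steps map directly onto a priority queue. A minimal Python sketch, assuming heapq stands in for the sorted list and an insertion counter breaks probability ties. Equal probabilities may therefore be grouped differently than in the hand-worked example that follows, but any Huffman tie-breaking yields the same optimal expected length:

```python
import heapq
from itertools import count

def build_huffman(probs):
    """Repeatedly merge the two least likely items into a compound node."""
    tiebreak = count()  # unique counter so the heap never compares tree nodes
    # A leaf is a bare symbol; a compound node is a (left, right) tuple.
    heap = [(p, next(tiebreak), sym) for sym, p in probs.items()]
    heapq.heapify(heap)                    # step 1: order by probability
    while len(heap) > 1:                   # step 5: repeat while > 1 item remains
        p0, _, left = heapq.heappop(heap)  # step 2: the two lowest probabilities
        p1, _, right = heapq.heappop(heap)
        # steps 3-4: compound node (left = 0, right = 1) with the summed probability
        heapq.heappush(heap, (p0 + p1, next(tiebreak), (left, right)))
    return heap[0][2]

def assign_codes(node, prefix=""):
    """Walk root-to-leaf: a left edge appends '0', a right edge appends '1'."""
    if isinstance(node, str):
        return {node: prefix or "0"}
    left, right = node
    codes = assign_codes(left, prefix + "0")
    codes.update(assign_codes(right, prefix + "1"))
    return codes

probs = {"a": 0.5, "b": 0.2, "c": 0.1, "d": 0.05, "e": 0.05, "f": 0.1}
codes = assign_codes(build_huffman(probs))
print(codes)
```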
The previous tree may, for instance, correspond to the following probabilities:
a (0.5) b (0.2) c (0.1) d (0.05) e (0.05) f (0.1)
Then you pick items with the lowest probability (d and e), group them in a compound node (de) and get the new list
a (0.5) b (0.2) c (0.1) (de) (0.1) f (0.1)
And the successive item lists can be
a (0.5) b (0.2) c(de) (0.2) f (0.1)
a (0.5) b (0.2) (c(de))f (0.3)
a (0.5) b((c(de))f) (0.5)
a(b((c(de))f)) (1.0)
So the prefix property is ensured by construction.
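The worked example can be checked mechanically. Reading codes off the final grouping a(b((c(de))f)) requires choosing which member of each group is the left child; this sketch assumes the first-written member gets 0. It checks that no code is a prefix of another and computes the expected length and the Kraft sum:

```python
from itertools import permutations

# Codes read off a(b((c(de))f)), assuming the first-written member of each
# group is the left (0) child.
codes = {"a": "0", "b": "10", "c": "1100", "d": "11010", "e": "11011", "f": "111"}
probs = {"a": 0.5, "b": 0.2, "c": 0.1, "d": 0.05, "e": 0.05, "f": 0.1}

def is_prefix_free(codewords):
    """True if no codeword is a prefix of a different codeword."""
    return not any(x.startswith(y) for x, y in permutations(codewords, 2))

expected_len = sum(probs[s] * len(c) for s, c in codes.items())
kraft_sum = sum(2 ** -len(c) for c in codes.values())
print(is_prefix_free(codes.values()), round(expected_len, 3), kraft_sum)  # True 2.1 1.0
```

A Kraft sum of exactly 1 means the tree is full: every internal node has two children, so no codeword could be shortened.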
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported