reed-solomon | Reliable Reed-Solomon erasure coding
kandi X-RAY | reed-solomon Summary
Fast, reliable Reed-Solomon erasure coding as a native addon for Node.js.
Community Discussions
Trending Discussions on reed-solomon
QUESTION
I have some questions related to the RAID storage structure. First, let me show a picture of a RAID system: https://i.stack.imgur.com/Whd6B.png. According to my understanding, each disk is partitioned into multiple strips. The encoding or decoding is done along a stripe, which is the collection of those strips. Let's say each stripe consists of k data strips and t parity strips. If I'm using a Reed-Solomon (RS) code constructed over GF(2^w), each strip of q bytes is divided into symbols of w bits each.
My questions:
- When we talk about RS encoding/decoding, do we treat each strip as an RS symbol, or each w-bit unit in the strip as an RS symbol, given that every w-bit unit in a strip is multiplied by the same w-bit element of GF(2^w)? Description:
- When I was working on a software implementation of a RAID system, what I learned from the research papers, especially from "A Performance Evaluation and Examination of Open-Source Erasure Coding Libraries For Storage", is that the t parity strips P = [p0, p1, ..., p_{t-1}] are computed as the matrix multiplication P = C x D, where D = [d0, d1, ..., d_{k-1}] and C is a t x k Cauchy matrix (I'm using Cauchy-matrix-based encoding as an example).
- When I was reading another paper that focuses on a hardware implementation of RS coding, "A Low Complexity Design of Reed Solomon Code Algorithm for Advanced RAID System", they seem to refer to each strip as an RS symbol. How is that possible? Let's say we are using GF(2^8); can a strip be only 8 bits in size? Or, in hardware implementations, do people simply use a higher-order finite field to construct the RAID system?
- I sometimes see people describe the RAID storage system in terms of drives, where "each drive is divided into multiple strips". So, what is the difference between the terms drive and disk? Are they used interchangeably?
ANSWER
Answered 2020-Oct-14 at 16:01
each strip
Each group of blocks can be considered to be a matrix: let r = the number of disks and c = the number of bytes per block; then the matrix is an r row by c column matrix. The Reed-Solomon-like encoding and decoding are performed on each column of the matrix, where each column of the matrix is treated as an independent array of r bytes.
open source erasure
Erasure encoding may be different from Raid encoding. Erasure encoding can be based on a modified Vandermonde matrix, the transpose of what Wikipedia calls the systematic encoding for original view Reed Solomon.
Erasure coding can also be based on a Cauchy matrix, as shown in the paper you read.
For Raid, the encoding matrix row corresponding to the P parity is all 1's (effectively XOR), the Q parity row is powers of 2 (in GF(2^8)), the R parity row is powers of 4, and so on. Unlike standard Reed Solomon, which generates parities (based on the remainder of a generator polynomial), Raid generates syndromes.
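As a concrete illustration (a minimal sketch of my own, not from the original answer), the P and Q rows can be computed one byte column at a time over GF(2^8); the reduction polynomial 0x11D is an assumption, chosen because it is commonly used for this style of syndrome:

# Sketch: compute the Raid P row (all 1's, i.e. plain XOR) and Q row
# (powers of 2) for one stripe, column by column over GF(2^8).
def gf_mul(a: int, b: int, poly: int = 0x11D) -> int:
    """Carry-less multiply of two bytes, reduced by the field polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return r

data_rows = [            # 4 data rows (one per disk), 4 bytes per row
    bytes([0x11, 0x22, 0x33, 0x44]),
    bytes([0x55, 0x66, 0x77, 0x88]),
    bytes([0x99, 0xAA, 0xBB, 0xCC]),
    bytes([0xDD, 0xEE, 0xFF, 0x01]),
]
c = len(data_rows[0])

p_row = bytearray(c)     # coefficients {1 1 1 1}: XOR of the column
q_row = bytearray(c)     # coefficients {1 2 4 8}: powers of 2, one per row
for col in range(c):
    p = q = 0
    coef = 1
    for row in data_rows:
        p ^= row[col]
        q ^= gf_mul(coef, row[col])
        coef = gf_mul(coef, 2)   # next power of 2 in GF(2^8)
    p_row[col], q_row[col] = p, q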
In all 3 cases, the encoding matrix is the identity matrix augmented by the rows to encode the parities, but the augmented rows are different for Vandermonde, Cauchy, and Raid.
drive or disk
In this case, they mean the same thing.
encoding
Let d = the number of data rows and p = the number of parity rows per stripe; then encoding can be implemented with a matrix multiply using the last p rows of the encoding matrix: a p by d encoding sub-matrix times the d by c matrix of data rows.
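Expressed in code, this is just a finite-field matrix product; a small sketch of my own (using GF(2^8) for illustration, assuming the third-party galois package for finite-field arrays, and an arbitrary Raid-style pair of parity rows):

# Sketch: parity rows = (p x d encoding sub-matrix) times (d x c data matrix).
import numpy as np
import galois

GF = galois.GF(2**8)
d, p, c = 4, 2, 6                          # 4 data rows, 2 parity rows, 6 bytes per row

parity_sub = GF([[1, 1, 1, 1],             # P row: all 1's
                 [1, 2, 4, 8]])            # Q row: powers of 2
data = GF(np.random.randint(0, 256, (d, c)))

parity = parity_sub @ data                 # p x c matrix of parity rows
print(parity)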
decoding
Open source erasure decoding is done in two steps. Let e = the number of erased data rows. The e erased data rows are regenerated by taking the d rows of the encoding matrix corresponding (same row index) to d non-erased rows, inverting that d by d sub-matrix, and then taking the e rows of the inverted matrix corresponding to the erased data rows; this gives an e by d matrix that is multiplied by the d by c matrix of non-erased rows to regenerate the erased data. Once the data is regenerated, any erased parity rows are regenerated by re-encoding the now regenerated data.
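A sketch of that two-step decode (again my own illustration, not the answer's code; it assumes the galois package and a made-up encoding matrix and erasure pattern):

# Sketch: rebuild erased data rows by inverting a d x d sub-matrix of the
# encoding matrix taken from d surviving rows, then re-encode the parity.
import numpy as np
import galois

GF = galois.GF(2**8)
d, p, c = 4, 2, 6

encode_matrix = GF.Zeros((d + p, d))       # identity on top, parity rows below
encode_matrix[:d, :] = GF.Identity(d)
encode_matrix[d:, :] = GF([[1, 1, 1, 1],
                           [1, 2, 4, 8]])

data = GF(np.random.randint(0, 256, (d, c)))
all_rows = encode_matrix @ data            # data rows followed by parity rows

erased = [1, 3]                            # pretend data rows 1 and 3 were lost
surviving = [i for i in range(d + p) if i not in erased][:d]

sub = encode_matrix[surviving, :]          # d x d sub-matrix of surviving rows
inv = np.linalg.inv(sub)                   # inverse over GF(2^8)

# The rows of the inverse matching the erased indices regenerate those rows.
regen = inv[erased, :] @ all_rows[surviving, :]
assert (regen == data[erased, :]).all()
# Any erased parity rows would now be rebuilt by re-encoding the data.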
Raid decoding is different. If there is only a single erasure among the data or P rows, then XOR is used to regenerate the erased row. If there is only a single erasure among the Q, R, ... rows, then the erased row is re-encoded. For multiple row erasures, GF() algebra is normally used to regenerate the erased data. This could also be done by creating an e by e matrix based on powers of 2 raised to the indexes of the erased rows, inverting that e by e matrix, multiplying the inverted e by e matrix by the first e of the p encoding rows, and then using that e by c matrix to regenerate the erased data.
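The single-erasure XOR case is simple enough to show directly (a tiny sketch of my own):

# Sketch: with P = XOR of the data rows, any single lost data row is the XOR
# of the surviving data rows and P.
rows = [
    bytes([1, 2, 3, 4]),       # d0
    bytes([5, 6, 7, 8]),       # d1 (pretend this row is lost)
    bytes([9, 10, 11, 12]),    # d2
]
p_row = bytes(a ^ b ^ c for a, b, c in zip(*rows))

rebuilt = bytes(a ^ c ^ q for a, c, q in zip(rows[0], rows[2], p_row))
assert rebuilt == rows[1]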
Note that neither of these decoding methods is a conventional Reed Solomon decoding method; conventional methods are intended to correct errors as well as erasures. The modified Vandermonde matrix is the same encoding as original view systematic encoding, but erasure decoding would be the same as described above.
If the encoding was based on what Wiki calls "BCH view", then the parity rows are actual parities (remainder from generator polynomial), and during decoding, e rows of syndromes are generated, and once the e rows of syndromes are generated, then a Raid like regeneration of erased data could be done. I'm not aware of an erasure code based on "BCH view" encoding.
https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Systematic_encoding_procedure
Update based on comments.
Raid (the standard version) does not use a Cauchy matrix, and it does not use a modified Vandermonde matrix either. Cauchy and modified Vandermonde matrices are mostly used by jerasure-type algorithms, not Raid. Raid encoding is based on powers of 2: {1 1 1 1 1 ...}, {1 2 4 8 16 ...}, {1 4 16 64 ...}.
Raid is byte oriented, not bit oriented. Each column of a "strip" is treated as an array of bytes.
Raid (and erasure) encoding and decoding is performed on the columns of a matrix, treating each column as an independent array of bytes.
As for the limitation of operating within GF(2^8), this limits Raid to 255 data drives and 255 parity (syndromes) drives. Jerasure is limited to a total of 255 (BCH view) or 256 (original view) drives for both data and parity. I'm not aware of any implementation that comes close to these limits. Raid 6 (2 parities, P and Q) specifies a max of 32 drives, but that is an arbitrary choice. A common jerasure implementation splits a file into 17 logical strips called shards and adds 3 shards for parity, where each shard is typically stored on a different Raid block of drives, or on a different node or server.
QUESTION
I have a question regarding a statement in the paper "Generalized Integrated Interleaved Codes". The paper mentions that erasure decoding of a Reed-Solomon (RS) code incurs no miscorrection, but error-only decoding of an RS code incurs a high miscorrection rate if the correction capability is too low.
From my understanding, the difference between erasure decoding and error-only decoding is that erasure decoding does not need to compute the error locations, whereas error-only decoding needs to know the error locations, which can be computed with the Berlekamp–Massey algorithm. Does the miscorrection in error-only decoding come from computing the wrong error locations? If yes, why is the miscorrection rate related to the correction capability of the RS code?
...ANSWER
Answered 2020-Oct-01 at 18:27
miscorrection for error-only decoding comes from computing the wrong error locations
Yes. For example, consider an RS code with 6 parities, which can correct 3 errors. Assume that 4 errors have occurred, and that a 3 error correction attempt created an additional 3 errors, for a total of 7 errors. It will produce a valid codeword, but the wrong codeword.
There are situations where the probability of miscorrection can be lowered. If the message is a shortened message, say 64 bytes of data and 6 parities for a total of 70 bytes, then if a 3 error case produces an invalid location, miscorrection can be avoided. In this case, the odds of 3 random locations being valid is (70/255)^3 ~= .02 (2%).
Another way to avoid miscorrection is to not use all of the parities for correction. With 6 parities, the correction could be limited to 2 errors, leaving 2 parities for detection purposes. Or use 7 parities, for 3 error correction, with 1 parity used for detection.
Follow up based on comments:
First note that there are 3 decoders that can be used for BCH view Reed Solomon: PGZ (Peterson Gorenstein Zierler) matrix decoder, BKM (Berlekamp Massey) discrepancy decoder, and Sugiyama's extended Euclid decoder. PGZ has greater time complexity O((n-k)^3) than BKM or Euclid, so most implementations use BKM or Euclid. You can read a bit more about these decoders here:
https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction
Getting back to 6 parities, 4 errors. All valid RS(n+6, n) codewords differ from each other by at least 7 elements. If 4 of the elements of a message are in error, there may be a valid codeword that differs by only 3 more elements from that message with 4 error elements, and in this case, all 3 decoders will find that the message differs from a valid codeword by 3 elements, and "correct" those 3 elements to produce a valid codeword, but in this case the wrong valid codeword, which will differ by 7 elements from the original codeword. With 5 elements in error, a 2 or 3 error miscorrection could occur, and with 6 or more elements in error, a 1 or 2 or 3 error miscorrection could occur.
Invalid location - consider a RS code based on GF(2^8), which allows for a message size up to 255 bytes. Valid locations for a 255 byte message are 0 to 254. If the message size is less than 255 bytes, for example 64 data + 6 parity = 70 bytes, then locations 0 to 69 are valid, while locations 70 to 254 are invalid. In what would otherwise be a case of miscorrection, if a calculated location is out of range, then the decoder has detected an uncorrectable message, rather than miscorrect it. Assume a garbled message and that the decoder generates 3 random locations in the range 0 to 254, the probability of all 3 being in the range 0 to 69 is (70/255)^3.
Another case where miscorrection is avoided is when the number of distinct roots of the error locator polynomial does not match the degree of the polynomial. Consider a 3 error case with generated error locator polynomial x^3 + a x^2 + b x + c. If there are more than 3 errors in the message, then the generated polynomial may have less than 3 distinct roots, such as a double root, or zero roots, or ... , in which case miscorrection is avoided and the message is detected as being uncorrectable.
QUESTION
I am working on an object storage project where I need to understand the Reed-Solomon error correction algorithm.
I have gone through this Doc as a starter and also some thesis papers:
1. content.sakai.rutgers.edu
2. theseus.fi
but I can't seem to understand the lower part, below the identity matrix (the red box), and where it comes from. How is this calculation done?
Can anyone please explain this?
...ANSWER
Answered 2020-May-23 at 14:08
The encoding matrix is a 6 x 4 Vandermonde matrix using the evaluation points {0 1 2 3 4 5}, modified so that the upper 4 x 4 portion of the matrix is the identity matrix. To create the matrix, a 6 x 4 Vandermonde matrix is generated (where matrix[r][c] = pow(r,c)), then multiplied by the inverse of its upper 4 x 4 portion to produce the encoding matrix. This is the equivalent of "systematic encoding" with Reed Solomon's "original view", as mentioned in the Wikipedia article you linked to above, which is different from Reed Solomon's "BCH view" that links 1. and 2. refer to. Wikipedia's example systematic encoding matrix is a transposed version of the encoding matrix used in the question.
https://en.wikipedia.org/wiki/Vandermonde_matrix
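A sketch of that construction over GF(2^8) (my own illustration, assuming the third-party galois package for finite-field matrix algebra; the 6 x 4 sizes match the example above):

# Sketch: build the 6 x 4 Vandermonde matrix with evaluation points 0..5,
# then multiply by the inverse of its upper 4 x 4 block so the top of the
# result becomes the identity matrix.
import numpy as np
import galois

GF = galois.GF(2**8)
n, k = 6, 4

V = GF.Zeros((n, k))
for r in range(n):                 # row r is [1, r, r^2, r^3] in GF(2^8)
    acc = GF(1)
    for c in range(k):
        V[r, c] = acc
        acc = acc * GF(r)

E = V @ np.linalg.inv(V[:k, :k])   # modified (systematic) encoding matrix
print(E)                           # upper 4 x 4 block is the identity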
The code to generate the encoding matrix is near the bottom of this github source file:
QUESTION
The Reed-Solomon algorithm adds additional data to the input, so that potential errors (of a particular size/quantity) in the damaged input can be corrected back to the original state. Correct? Does this algorithm also protect the added data that is not part of the input but is used by the algorithm? If not, what happens if an error occurs in that non-input part?
...ANSWER
Answered 2020-May-08 at 08:54
An important aspect is that Reed-Solomon (RS) codes are cyclic: the set of codewords is stable under cyclic shift.
A consequence is that no particular part of a code word is more protected or less protected.
An RS code has an error correction capability equal to t = (n-k)/2, where n is the code length (generally expressed in bytes) and k is the length of the information part.
If the total number of errors (in both parts) is at most t, the RS decoder will be able to correct the errors (more precisely, up to t erroneous bytes in the general case). If it is higher, the errors cannot be corrected (but they could be detected, which is another story).
The location of the errors, whether in the information part or the added part, has no influence on the error correction capability.
EDIT: the rule t = (n-k)/2 that I mentioned is valid for Reed-Solomon codes. It is not generally correct for BCH codes, where t <= (n-k)/2. However, with respect to your question, this does not change the answer: these families of codes have a given correction capacity, corresponding to the minimum distance between codewords, and the decoders can then correct t errors, whatever the position of the errors in the codeword.
QUESTION
I have combined data that requires a minimum of 35 bits.
Using a 4-state barcode, each bar represents 2 bits, so the above-mentioned information can be translated into 18 bars.
I would like to add some strong error correction to this barcode, so if it's somehow damaged, it can be corrected. One of such approach is Reed-Solomon error correction.
My goal is to add as strong error correction as possible, but on the other hand I have a size limitation on the barcode. If I understood the Reed-Solomon algorithm correctly, m∙k has to be at least the size of my message, i.e. 35 in my case.
Based on the Reed-Solomon Interactive Demo, I can go with (m, n, t, k) being (4, 15, 3, 9), which would allow me to code message up to 4∙9 = 36 bits. This would lead to code word of size 4∙15 = 60 bits, or 30 bars, but the error correction ratio t / n would be just 20.0%.
Next option is to go with (m, n, t, k) being (5, 31, 12, 7), which would allow me to code message up to 5∙7 = 35 bits. This would lead to code word of size 5∙31 = 155 bits, or 78 bars, and the error correction ratio t / n would be ~38.7%.
The first scenario requires use of barcode with 30 bars, which is nice, but 20.0% error correction is not as great as desired. The second scenario offers excellent error correction of 38.7%, but the barcode would have to have 78 bars, which is too many.
Is there some other approach or a different method, that would offer great error correction and a reasonable barcode length?
...ANSWER
Answered 2020-May-07 at 21:51
You could use a shortened code word such as (5, 19, 6, 7): roughly a 31.6% correction ratio, 95 bits, 48 bars. One advantage of a shortened code word is a reduced chance of mis-correction if it is allowed to correct the maximum of 6 errors: if any of the 6 calculated error locations is outside the range of valid locations, that is an indication that there are more than 6 errors. The probability of mis-correction is about (19/31)^6 = 5.3%.
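A quick sketch (my own, not part of the answer) for comparing the candidate (m, n, t, k) choices discussed above: total bits, bars at 2 bits per bar, correction ratio, and the rough in-range probability for a code shortened from length 2^m - 1:

def summarize(m, n, t, k, full_n=None):
    bits = m * n
    bars = (bits + 1) // 2                 # each bar carries 2 bits
    line = f"(m={m}, n={n}, t={t}, k={k}): {bits} bits, {bars} bars, {t / n:.1%} correction"
    if full_n is not None:                 # shortened from a full-length code word
        line += f", ~{(n / full_n) ** t:.1%} chance that {t} random locations all look valid"
    return line

print(summarize(4, 15, 3, 9))              # 60 bits, 30 bars, 20.0%
print(summarize(5, 31, 12, 7))             # 155 bits, 78 bars, 38.7%
print(summarize(5, 19, 6, 7, full_n=31))   # 95 bits, 48 bars, ~31.6%, ~5.3%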
QUESTION
CD-ROM data uses a third layer of error detection using Reed-Solomon and an EDC using a 32-bit CRC polynomial.
The ECMA-130 standard defines the EDC CRC polynomial as follows (page 16, 14.3):
P(x) = (x^16 + x^15 + x^2 + 1) * (x^16 + x^2 + x + 1)
and
The least significant bit of a data byte is used first.
Usually, translating the polynomial into its integer value form is pretty straightforward. Using modulo math, the expanded polynomial must be P(x) = x^32 + x^31 + x^18 + x^17 + x^16 + x^15 + x^4 + x^3 + x^2 + x + 1, thus the value being 0x8007801F.
The last sentence means that the polynomial is reversed (if I get it right).
But I haven't managed to get the right value so far. The Cdrtools source code uses 0x08001801 as the polynomial value. Can someone explain how they found that value?
...ANSWER
Answered 2020-May-06 at 08:13
Posting the answer:
First, I made a mistake in the modulo-2 algebra used to expand the polynomial. The non-modulo expanded form is:
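As a sanity check (my own sketch, not part of the original answer), the product can be expanded programmatically, first with ordinary integer coefficients and then reduced modulo 2:

# Sketch: expand (x^16 + x^15 + x^2 + 1) * (x^16 + x^2 + x + 1) with integer
# coefficients, then keep only the exponents whose coefficient is odd.
def poly_mul(a, b):
    """Multiply two polynomials given as {exponent: coefficient} dicts."""
    out = {}
    for ea, ca in a.items():
        for eb, cb in b.items():
            out[ea + eb] = out.get(ea + eb, 0) + ca * cb
    return out

p1 = {16: 1, 15: 1, 2: 1, 0: 1}
p2 = {16: 1, 2: 1, 1: 1, 0: 1}

prod = poly_mul(p1, p2)
print("integer coefficients:", dict(sorted(prod.items(), reverse=True)))

mod2 = sorted(e for e, c in prod.items() if c % 2)
print("exponents surviving mod 2:", mod2)
# -> [0, 1, 3, 4, 15, 16, 31, 32], i.e. x^32 + x^31 + x^16 + x^15 + x^4 + x^3 + x + 1.
# With the question's convention of dropping the x^32 term, that is 0x8001801B.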
QUESTION
I am trying to use the Schifra Reed-Solomon error correcting code library in a project. I have no background in how Reed-Solomon codes and Galois fields work. I am having trouble figuring out the ideal value of the generator_polynomial_index for a 16-bit symbol (field descriptor).
My code works for index 0 and many others. I have tried all values of the index. The code works for a lot of them (0-32724 & 32779-65485 to be precise) but not for the rest.
Questions:
- What is the most ideal value?
- What is changing if I switch to another value of index (which also works but is not ideal)?
field_descriptor = symbol size (bits/symbol)
code_length (total number of symbols: data symbols + error-correction code symbols) = 2^symbol_size - 1 (the library only supports this value of code length)
generator_polynomial_root_count = fec_length (redundancy, or the number of error-correction symbols)
Errors are measured in symbols and not bits, i.e. 1 incorrect bit in a particular symbol is counted as 1 error, but even if all 16 bits are incorrect, it still counts as 1 error (not 16).
The maximum number of errors and erasures which can be rectified should obey the following inequality: 2*num_errors + num_erasures < fec_length
Please correct me if I am mistaken anywhere
...ANSWER
Answered 2019-Sep-05 at 03:44
what is the ideal value of the generator_polynomial_index
There probably isn't an "ideal" value.
I had to look at the github code to determine that the generator polynomial index is the log of the first consecutive root of the generator polynomial.
Typically the index is 0 (first consecutive root == 1) or 1 (first consecutive root == Alpha (the field primitive)). Choosing index = 1 is used for a "narrow sense" code. It slightly simplifies the Forney Algorithm. Link to wiki article, where "c" represents the log of the first consecutive root (it lists the roots as a^c, a^(c+1), ...):
https://en.wikipedia.org/wiki/Forney_algorithm
Why use a narrow sense code:
For hardware, the number of unique coefficients can be reduced by using a self-reciprocal generator polynomial, where the first consecutive root is chosen so that the generator polynomial is of the form: 1 x^n + a x^(n-1) + b x^(n-2) + ... + b x^2 + a x + 1. For 32 roots in GF(2^16), the first consecutive root is alpha^((65536-32)/2) = alpha^32752, and the last consecutive root would be alpha^32783. Note this is only possible with a binary field GF(2^n), and not possible for non-binary fields such as GF(929) (929 is a prime number). The question shows a range for index that doesn't include 32752; if 32752 doesn't work with this library, it's due to some limitation in the library, and not with Reed Solomon error correction algorithms.
Other than these 3 cases, index = 0, 1, or self-reciprocal generator polynomial, I'm not aware of any reason to choose a different index. It's unlikely that the choice of index has any effect on trying to decode beyond the normal limits.
The maximum number of errors and erasures which can be rectified should obey the following inequality: 2*num_errors + num_erasures < fec_length
That should be 2*num_errors + num_erasures <= fec_length.
QUESTION
I am setting up a Reed Solomon library to correct AND detect errors as they come in. For simplicity, let's look at a Reed Solomon configuration where
...ANSWER
Answered 2019-Jun-24 at 16:22
With two parity symbols, the syndromes will only be unique for single symbol errors, which is why they can be used to correct single symbol errors. In the case of two symbol errors, the syndromes will be non-zero (one of them may be zero, but not both), but not always unique for all combinations of two error locations and two error values (which is why two symbol errors can't be corrected if there are only two parity symbols).
With two parity symbols, the Hamming distance is 3 symbols. Every valid (zero syndromes == exact multiple of the generator polynomial) codeword differs by at least 3 symbols from every other valid code word, so no 2 symbol error case will appear to be a valid (zero syndrome) codeword.
It is possible for a 3 or more error case combination of error locations and values to produce syndromes == 0. The simplest example of this is to take a valid codeword (zero error message), and xor the 3 symbol generator polynomial anywhere within the message, which will be another valid codeword (an exact multiple of the generator polynomial).
In addition, there is a maximum length codeword. For BCH type Reed Solomon code, which is what you are using, for GF(2^n), it's (2^n)-1 symbols. If a message contains 2^n or more symbols (including the parity symbols), then a two error case with the same error value at message[i] and message[i + 2^n - 1] will produce syndromes of zero. For the original view type Reed Solomon code, the maximum length codeword is 2^n (one more symbol than BCH type), but it's rarely used since decoding operates on the entire message, while BCH decoding operates on syndromes.
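The wrap-around at 2^n - 1 symbols can be checked directly (my own sketch, assuming the common reduction polynomial 0x11D for GF(2^8)):

# Sketch: every nonzero element of GF(2^8) satisfies x^255 = 1, so two equal
# error values placed 255 positions apart contribute identical terms to each
# syndrome and cancel (XOR) to zero.
def gf_mul(a: int, b: int, poly: int = 0x11D) -> int:
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return r

def gf_pow(x: int, e: int) -> int:
    r = 1
    for _ in range(e):
        r = gf_mul(r, x)
    return r

assert all(gf_pow(x, 255) == 1 for x in range(1, 256))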
Update - I forgot to mention that with two parity symbols, attempting to perform a one error correction on a two error message may end up causing a third error, which will result in a valid codeword (syndromes will be zero), but one that differs from the original codeword in three locations.
The probability of this happening is reduced if the codeword is shortened, as any calculated location that is not within the range of the shortened codeword would be a detected error. If there are n symbols (including the parity symbols), then the probability of a single error mis-correction based on the calculated location being within range is about n/255.
In this case, with a codeword size of 4 bytes, I found 3060 out of a possible 390150 two error cases that would create a valid code word by doing a single error correction that ends up creating a third error.
QUESTION
I'm trying to encode, send with some noise added, and decode an image in a Python app using a Reed-Solomon coder.
I have converted the image from PIL to a numpy array. Now I'm trying to encode this array and then decode it, but I have a problem: the code word is too long. Does anyone know how to solve this problem? Thank you in advance.
Error message: ValueError: Message length is max 223. Message was 226
...ANSWER
Answered 2019-May-03 at 16:06
The ReedSolomon package's github page clearly indicates that you cannot encode arrays that are larger than k (223 in your case). This means that you have to split your image first before encoding it. You can split it into chunks of 223 bytes and then work on the encoded chunks.
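A rough sketch of that chunking approach (my own illustration; rs_encode and rs_decode below are hypothetical placeholders for the encode/decode calls of whichever Reed-Solomon package is being used, e.g. an RS(255, 223) coder):

import numpy as np

K = 223  # maximum message length for an RS(255, 223) code

def rs_encode(chunk: bytes) -> bytes:
    # Placeholder: substitute the real encode call of your RS package here.
    return chunk

def rs_decode(chunk: bytes) -> bytes:
    # Placeholder: substitute the matching decode call here.
    return chunk

def chunked(data: bytes, size: int = K):
    for i in range(0, len(data), size):
        yield data[i:i + size]

img = np.random.randint(0, 256, (32, 32), dtype=np.uint8)   # stand-in image
flat = img.tobytes()

encoded = [rs_encode(chunk) for chunk in chunked(flat)]      # each chunk <= 223 bytes
# ... transmit / add noise / correct ...
decoded = b"".join(rs_decode(chunk) for chunk in encoded)
restored = np.frombuffer(decoded, dtype=np.uint8).reshape(img.shape)
assert (restored == img).all()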
QUESTION
As a QR code uses Reed-Solomon for error correction, am I right that, with certain levels of corruption, a QR code reader could theoretically return wrong results?
If yes, are there other levels of integrity checks (checksums etc.) that would prevent that?
...ANSWER
Answered 2019-Mar-06 at 22:59
You can do a web search for "QR Code ISO" to find a pdf version of the document. I found one here:
https://www.swisseduc.ch/informatik/theoretische_informatik/qr_codes/docs/qr_standard.pdf
There are multiple strengths of error correction in the standard, and to avoid mis-correction, in some cases, some of the "parity" bytes are only used for error detection, and not for error correction. This is shown in table 13 in the pdf file linked to above. The ones marked with a "b" are cases where some of the parity bytes are used only for error detection. For example, the very first entry in table 13 shows (26,19,2)b, which means 26 total bytes, 19 data bytes, and 2 byte correction, which means of the 26-19 = 7 parity bytes, 4 are used for correction (each corrected byte requires 2 parity bytes unless hardware can flag "erasures"), and 3 are used for detection only.
If the error correction calculates an invalid location (one that is outside the range of valid locations), that will be flagged as a detected error. If the number of unique calculated locations is less than the number of assumed errors used to calculate those locations (a duplicate or non-existent root), that will be flagged as a detected error. For higher levels of error correction, the odds of all the calculated locations being valid for bad data are so small that none of the parity bytes are used for error detection only. These cases don't have the "b" in their table 13 entries.
The choices made for the various levels of error correction result in a very small chance of a bad result, but it's always possible.
are there other levels of integrity checks (checksums etc.) that would prevent that?
A QR code reader could flag bytes where any of the bits were not clearly 0 or 1 (like a shade of grey on a black/white code) as potential "erasures", which would lower the odds of a bad result. I don't know if this is done.
When generating a QR-Code, a mask is chosen to even out the ratio of light and dark areas in a code, and after correction, if there is evidence that the wrong mask was chosen, that could be flagged as a detected error, but I'm not sure if the "best" mask is always chosen when a code is printed, so I don't know if a check for "best" mask is used.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported