Reed-Solomon | Reed Solomon BCH encoder and decoder | Messaging library
kandi X-RAY | Reed-Solomon Summary
This RS implementation was designed for embedded purposes, so all memory allocations are performed on the stack. If somebody wants to reimplement memory management with heap usage, pull requests are welcome.
Community Discussions
Trending Discussions on Reed-Solomon
QUESTION
I have some questions related to the RAID storage structure. First, let me show a picture of a RAID system: https://i.stack.imgur.com/Whd6B.png. According to my understanding, each disk is partitioned into multiple strips. The encoding or decoding is done along a stripe, which is a collection of strips. Let's say each stripe consists of k data strips and t parity strips. If I'm using a Reed-Solomon (RS) code constructed over GF(2^w), each strip with a size of q bytes will be divided into q/k symbols, and each symbol consists of w bits.
My questions:
- When we talk about RS en/decoding, do we treat each strip as an RS symbol, or each w-bit unit in the strip as an RS symbol, even though each w-bit unit in a strip is multiplied by the same w-bit element of GF(2^w)? Description:
- When I was working on the software implementation of a RAID system, the thing I learned from each research paper, especially from "A Performance Evaluation and Examination of Open-Source Erasure Coding Libraries For Storage", is that the t parity strips, P=[p0,p1,...,p_{t-1}], are computed as the matrix multiplication P=CxD, where D=[d0,d1,...,d_{k-1}] and C is a t x k Cauchy matrix (I'm using Cauchy-matrix-based encoding as an example).
- When I was reading another paper that focuses on the hardware implementation of RS coding, "A Low Complexity Design of Reed Solomon Code Algorithm for Advanced RAID System", they seem to refer to each strip as an RS symbol. How is that possible? Let's say we are using GF(2^8); can the size of a strip be only 8 bits? Or, in hardware implementations, do people simply use a higher-order finite field to construct the RAID system?
- I sometimes see people describe the RAID storage system in terms of drives, where "each drive is divided into multiple strips". So, what is the difference between the terms drive and disk? Are they used interchangeably?
ANSWER
Answered 2020-Oct-14 at 16:01
each strip
Each group of blocks can be considered to be a matrix: let r = the number of disks and c = the number of bytes per block; then the matrix is an r-row by c-column matrix. The Reed-Solomon-like encoding and decoding are performed on each column of the matrix, where each column is treated as an array of r bytes.
open source erasure
Erasure encoding may be different from Raid encoding. Erasure encoding can be based on a modified Vandermonde matrix, the transpose of what Wikipedia calls the systematic encoding for original-view Reed-Solomon.
Erasure encoding can also be based on a Cauchy matrix, as shown in the paper you read.
For Raid, the encoding matrix row corresponding to the P parity is all 1's (effectively XOR), the Q parity row is powers of 2 (in GF(2^8)), the R parity row is powers of 4, and so on. Unlike standard Reed-Solomon, which generates parities (based on the remainder of division by a generator polynomial), Raid generates syndromes.
In all 3 cases, the encoding matrix is the identity matrix augmented by the rows that encode the parities, but the augmented rows are different for Vandermonde, Cauchy, and Raid.
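As a concrete illustration of the P (all 1's) and Q (powers of 2) parity rows, here is a minimal sketch, not the code of any particular Raid implementation; it assumes GF(2^8) with the commonly used reducing polynomial 0x11D, and the function name `raid_pq` and the byte values are hypothetical:

```python
from functools import reduce

def xtime(b):
    # Multiply a byte by 2 in GF(2^8), reducing polynomial 0x11D
    b <<= 1
    if b & 0x100:
        b ^= 0x11D
    return b & 0xFF

def raid_pq(column):
    # column: one byte from each data drive at the same offset
    p = reduce(lambda a, b: a ^ b, column, 0)   # P row: all 1's, i.e. XOR
    q = 0
    for d in reversed(column):                  # Q row: powers of 2, via Horner's rule
        q = xtime(q) ^ d
    return p, q

col = [0x11, 0xA5, 0x3C, 0xF0]   # arbitrary example bytes, one per data drive
p, q = raid_pq(col)
```

A single erased data byte can then be regenerated as the XOR of P with the surviving data bytes, which matches the Raid single-erasure rule described later in this answer.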
drive or disk
In this case, they mean the same thing.
encoding
Let d = the number of data rows per strip and p = the number of parity rows per strip; then encoding can be implemented as a matrix multiply using the last p rows of the encoding matrix: a p by d encoding sub-matrix times the d by c matrix of data rows.
decoding
Open-source erasure decoding is done in two steps. Let e = the number of erased data rows. First, the e erased data rows are regenerated: take the rows of the encoding matrix corresponding (same row index) to d non-erased rows, invert that d by d sub-matrix, and take the e rows of the inverted matrix corresponding to the erased data rows; this yields an e by d matrix that is multiplied by the d by c matrix of non-erased rows to regenerate the data. Second, once the data is regenerated, any erased parity rows are regenerated by re-encoding the now-regenerated data.
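The two-step procedure above can be sketched end to end. This is illustrative only, not code from any particular library; it assumes GF(2^8) with reducing polynomial 0x11D, 4 data rows, and 2 Raid-style parity rows (all 1's and powers of 2), a layout chosen so that the surviving rows used below form an invertible sub-matrix:

```python
# GF(2^8) log/antilog tables, reducing polynomial 0x11D
GF_EXP = [0] * 512
GF_LOG = [0] * 256
x = 1
for i in range(255):
    GF_EXP[i] = x
    GF_LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 512):
    GF_EXP[i] = GF_EXP[i - 255]

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]

def gf_inv(a):
    return GF_EXP[255 - GF_LOG[a]]

def mat_mul(A, B):
    # Matrix product over GF(2^8); addition in the field is XOR
    C = [[0] * len(B[0]) for _ in range(len(A))]
    for i in range(len(A)):
        for k in range(len(B)):
            if A[i][k]:
                for j in range(len(B[0])):
                    C[i][j] ^= gf_mul(A[i][k], B[k][j])
    return C

def mat_inv(A):
    # Gauss-Jordan elimination over GF(2^8)
    n = len(A)
    M = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col])
        M[col], M[piv] = M[piv], M[col]
        scale = gf_inv(M[col][col])
        M[col] = [gf_mul(scale, v) for v in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [v ^ gf_mul(f, M[col][j]) for j, v in enumerate(M[r])]
    return [row[n:] for row in M]

# 4 data rows of c = 5 bytes each (arbitrary example values)
data = [[16 * r + c for c in range(5)] for r in range(4)]

# Encoding matrix: identity over the data, plus P (all 1's) and Q (powers of 2) rows
E = [[int(i == j) for j in range(4)] for i in range(4)]
E += [[1, 1, 1, 1], [1, 2, 4, 8]]

code = mat_mul(E, data)              # rows 0-3: data, row 4: P, row 5: Q

# Erase data rows 0 and 2; decode from the surviving rows 1, 3, P, Q
surviving = [1, 3, 4, 5]
inv = mat_inv([E[r] for r in surviving])
recovered = mat_mul(inv, [code[r] for r in surviving])
```

`recovered` equals the original 4 data rows; erased parity rows would then be rebuilt by re-encoding the regenerated data.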
Raid decoding is different. If there is only a single erasure in the data or P rows, XOR is used to regenerate the erased row. If there is only a single erasure in the Q, R, ... rows, the erased row is re-encoded. For multiple row erasures, GF() algebra is normally used to regenerate the erased data. This could also be done by creating an e by e matrix based on powers of 2 raised to the indexes of the erased rows, inverting that e by e matrix, multiplying the inverse by the first e of the p parity rows of the encoding matrix, and then using the result to regenerate the erased data.
Note that neither of these decoding methods is a conventional Reed-Solomon decoding method; conventional decoders are intended to correct errors as well as erasures. The modified Vandermonde matrix gives the same encoding as original-view systematic encoding, but its erasure decoding would be the same as described above.
If the encoding were based on what the Wiki calls "BCH view", then the parity rows would be actual parities (the remainder from a generator polynomial); during decoding, e rows of syndromes are generated, and once they are, a Raid-like regeneration of erased data could be done. I'm not aware of an erasure code based on "BCH view" encoding.
https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Systematic_encoding_procedure
Update based on comments.
Raid (standard version) does not use a Cauchy matrix. Raid does not use a modified Vandermonde matrix either. Cauchy and modified Vandermonde matrices are mostly used by jerasure-type algorithms, not Raid. Raid encoding is based on powers of 2: {1 1 1 1 1 ...}, {2 4 8 16 ...}, {4 16 64 ...}.
Raid is byte oriented, not bit oriented. Each column of a "strip" is treated as an array of bytes.
Raid (and erasure) encoding and decoding is performed on the columns of a matrix, treating each column as an independent array of bytes.
As for the limitation of operating within GF(2^8): this limits Raid to 255 data drives and 255 parity (syndrome) drives. Jerasure is limited to a total of 255 (BCH view) or 256 (original view) drives for both data and parity. I'm not aware of any implementation that comes close to these limits. Raid 6 (2 parities, P and Q) specifies a max of 32 drives, but that is an arbitrary choice. A common jerasure implementation splits a file into 17 logical strips called shards and adds 3 shards for parity, where each shard is typically stored on a different Raid block of drives, on a different node, or on a different server.
QUESTION
I have a question about a statement in this paper, "Generalized Integrated Interleaved Codes". The paper mentions that erasure decoding of a Reed-Solomon (RS) code incurs no miscorrection, but error-only decoding of an RS code incurs a high miscorrection rate if the correction capability is too low.
From my understanding, the difference between erasure decoding and error-only decoding is that erasure decoding does not need to compute the error locations, while error-only decoding has to determine the error locations, which can be computed by the Berlekamp–Massey algorithm. I wonder if the miscorrection in error-only decoding comes from computing the wrong error locations? If yes, why is the miscorrection rate related to the correction capability of the RS code?
ANSWER
Answered 2020-Oct-01 at 18:27
miscorrection for error-only decoding comes from computing the wrong error locations
Yes. For example, consider an RS code with 6 parities, which can correct 3 errors. Assume that 4 errors have occurred, and that a 3 error correction attempt created an additional 3 errors, for a total of 7 errors. It will produce a valid codeword, but the wrong codeword.
There are situations where the probability of miscorrection can be lowered. If the message is a shortened message, say 64 bytes of data and 6 parities for a total of 70 bytes, then miscorrection can be avoided whenever a 3-error case produces an invalid location. In this case, the odds of 3 random locations all being valid are (70/255)^3 ~= .02 (2%).
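The 2% figure is easy to reproduce with a one-line calculation (a numeric sanity check only; the variable name is hypothetical):

```python
# Chance that 3 independently random locations in 0..254 all fall
# inside the 70 valid positions of the shortened 70-byte message
p_all_valid = (70 / 255) ** 3   # probability the miscorrection goes undetected
```

Equivalently, at least one invalid location is produced, and the miscorrection avoided, with probability 1 - (70/255)^3, about 98%.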
Another way to avoid miscorrection is to not use all of the parities for correction. With 6 parities, the correction could be limited to 2 errors, leaving 2 parities for detection purposes. Or use 7 parities, for 3 error correction, with 1 parity used for detection.
Follow up based on comments:
First, note that there are 3 decoders that can be used for BCH-view Reed-Solomon: the PGZ (Peterson Gorenstein Zierler) matrix decoder, the BKM (Berlekamp Massey) discrepancy decoder, and Sugiyama's extended Euclid decoder. PGZ has greater time complexity, O((n-k)^3), than BKM or Euclid, so most implementations use BKM or Euclid. You can read a bit more about these decoders here:
https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction
Getting back to 6 parities and 4 errors: all valid RS(n+6, n) codewords differ from each other in at least 7 elements. If 4 elements of a message are in error, there may be a valid codeword that differs from that message in only 3 more elements. In that case, all 3 decoders will find that the message differs from a valid codeword in 3 elements and "correct" those 3 elements to produce a valid codeword, but the wrong valid codeword, one that differs from the original codeword in 7 elements. With 5 elements in error, a 2- or 3-error miscorrection could occur, and with 6 or more elements in error, a 1-, 2-, or 3-error miscorrection could occur.
Invalid location: consider an RS code based on GF(2^8), which allows for a message size of up to 255 bytes. Valid locations for a 255-byte message are 0 to 254. If the message size is less than 255 bytes, for example 64 data + 6 parity = 70 bytes, then locations 0 to 69 are valid, while locations 70 to 254 are invalid. In what would otherwise be a case of miscorrection, if a calculated location is out of range, the decoder has detected an uncorrectable message rather than miscorrecting it. Assume a garbled message and that the decoder generates 3 random locations in the range 0 to 254; the probability of all 3 being in the range 0 to 69 is (70/255)^3.
Another case where miscorrection is avoided is when the number of distinct roots of the error locator polynomial does not match the degree of the polynomial. Consider a 3-error case with generated error locator polynomial x^3 + a x^2 + b x + c. If there are more than 3 errors in the message, the generated polynomial may have fewer than 3 distinct roots (such as a double root, or no roots at all), in which case miscorrection is avoided and the message is detected as uncorrectable.
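The degree-versus-distinct-roots check can be sketched as follows. This is illustrative only; it assumes GF(2^8) with reducing polynomial 0x11D and builds a locator with a deliberate double root (a real decoder would obtain the locator from BKM or Euclid, not from known roots):

```python
# GF(2^8) log/antilog tables, reducing polynomial 0x11D
GF_EXP = [0] * 512
GF_LOG = [0] * 256
x = 1
for i in range(255):
    GF_EXP[i] = x
    GF_LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 512):
    GF_EXP[i] = GF_EXP[i - 255]

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]

def poly_from_roots(roots):
    # Expand the product of (x + r) factors; field addition is XOR
    p = [1]                          # p[i] is the coefficient of x^i
    for r in roots:
        q = [0] * (len(p) + 1)
        for i, c in enumerate(p):
            q[i] ^= gf_mul(c, r)     # the r * (c * x^i) term
            q[i + 1] ^= c            # the x * (c * x^i) term
        p = q
    return p

def poly_eval(p, v):
    # Horner evaluation over GF(2^8)
    acc = 0
    for c in reversed(p):
        acc = gf_mul(acc, v) ^ c
    return acc

# Locator with roots {3, 3, 5}: degree 3 but only 2 distinct roots
locator = poly_from_roots([3, 3, 5])
distinct_roots = [v for v in range(1, 256) if poly_eval(locator, v) == 0]
uncorrectable = len(distinct_roots) != len(locator) - 1
```

Here the brute-force root search (a Chien-search stand-in) finds only 2 distinct roots for a degree-3 polynomial, so the message is flagged as uncorrectable instead of being miscorrected.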
QUESTION
I am working on an object storage project where I need to understand the Reed-Solomon error correction algorithm.
I have gone through this doc as a starter and also some thesis papers:
1. content.sakai.rutgers.edu
2. theseus.fi
but I can't seem to understand where the lower part of the identity matrix (red box) comes from. How is this calculation done?
Can anyone please explain this?
ANSWER
Answered 2020-May-23 at 14:08
The encoding matrix is a 6 x 4 Vandermonde matrix using the evaluation points {0 1 2 3 4 5}, modified so that the upper 4 x 4 portion of the matrix is the identity matrix. To create it, a 6 x 4 Vandermonde matrix is generated (where matrix[r][c] = pow(r,c)), then multiplied by the inverse of its upper 4 x 4 portion to produce the encoding matrix. This is the equivalent of "systematic encoding" with Reed-Solomon's "original view", as mentioned in the Wikipedia article you linked to above, which is different from Reed-Solomon's "BCH view", which links 1. and 2. refer to. Wikipedia's example systematic encoding matrix is a transposed version of the encoding matrix used in the question.
https://en.wikipedia.org/wiki/Vandermonde_matrix
The code to generate the encoding matrix is near the bottom of this github source file:
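The same construction can be reproduced with a short sketch (an illustrative reimplementation, not the code in that file; it assumes the GF(2^8) field with reducing polynomial 0x11D):

```python
# GF(2^8) log/antilog tables, reducing polynomial 0x11D
GF_EXP = [0] * 512
GF_LOG = [0] * 256
x = 1
for i in range(255):
    GF_EXP[i] = x
    GF_LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 512):
    GF_EXP[i] = GF_EXP[i - 255]

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]

def gf_inv(a):
    return GF_EXP[255 - GF_LOG[a]]

def gf_pow(a, n):
    if n == 0:
        return 1
    if a == 0:
        return 0
    return GF_EXP[(GF_LOG[a] * n) % 255]

def mat_mul(A, B):
    # Matrix product over GF(2^8); field addition is XOR
    C = [[0] * len(B[0]) for _ in range(len(A))]
    for i in range(len(A)):
        for k in range(len(B)):
            if A[i][k]:
                for j in range(len(B[0])):
                    C[i][j] ^= gf_mul(A[i][k], B[k][j])
    return C

def mat_inv(A):
    # Gauss-Jordan elimination over GF(2^8)
    n = len(A)
    M = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col])
        M[col], M[piv] = M[piv], M[col]
        scale = gf_inv(M[col][col])
        M[col] = [gf_mul(scale, v) for v in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [v ^ gf_mul(f, M[col][j]) for j, v in enumerate(M[r])]
    return [row[n:] for row in M]

# 6 x 4 Vandermonde matrix on evaluation points {0 1 2 3 4 5}: matrix[r][c] = r^c
vand = [[gf_pow(r, c) for c in range(4)] for r in range(6)]

# Multiply by the inverse of the upper 4 x 4 portion: the top becomes the
# identity, and the bottom 2 rows become the parity rows (the "red box")
encode = mat_mul(vand, mat_inv([row[:] for row in vand[:4]]))
```

After the multiplication, the top 4 rows are exactly the identity, which is why the encoded output starts with an unmodified copy of the data.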
QUESTION
The Reed-Solomon algorithm adds additional data to the input, so that potential errors (of a particular size/quantity) in such damaged input can be corrected back to the original state. Correct? Does this algorithm also protect the added data that is not part of the input but is used by the algorithm? If not, what happens if an error occurs in that non-input data part?
ANSWER
Answered 2020-May-08 at 08:54
An important aspect is that Reed-Solomon (RS) codes are cyclic: the set of codewords is stable under cyclic shifts.
A consequence is that no particular part of a code word is more protected or less protected.
An RS code has an error-correction capability of t = (n-k)/2, where n is the code length (generally expressed in bytes) and k is the length of the information part.
If the total number of errors (across both parts) is at most t, the RS decoder will be able to correct them (more precisely, up to t erroneous bytes in the general case). If it is higher, the errors cannot be corrected (but they may be detected, which is another story).
The placement of the errors, whether in the information part or the added part, has no influence on the error-correction capability.
EDIT: the rule t = (n-k)/2 that I mentioned is valid for Reed-Solomon codes. This rule is not generally correct for BCH codes, where t <= (n-k)/2. However, with respect to your question, this does not change the answer: these families of codes have a given correction capacity, corresponding to the minimum distance between codewords, and the decoders can then correct t errors, whatever the positions of the errors in the codeword.
QUESTION
I have combined data that requires a minimum of 35 bits.
Using a 4-state barcode, each bar represents 2 bits, so the above mentioned information can be translated into 18 bars.
I would like to add some strong error correction to this barcode, so that if it's somehow damaged, it can be corrected. One such approach is Reed-Solomon error correction.
My goal is to add as strong error correction as possible, but on the other hand I have a size limitation on the barcode. If I understood the Reed-Solomon algorithm correctly, m∙k has to be at least the size of my message, i.e. 35 in my case.
Based on the Reed-Solomon Interactive Demo, I can go with (m, n, t, k) being (4, 15, 3, 9), which would allow me to code message up to 4∙9 = 36 bits. This would lead to code word of size 4∙15 = 60 bits, or 30 bars, but the error correction ratio t / n would be just 20.0%.
Next option is to go with (m, n, t, k) being (5, 31, 12, 7), which would allow me to code message up to 5∙7 = 35 bits. This would lead to code word of size 5∙31 = 155 bits, or 78 bars, and the error correction ratio t / n would be ~38.7%.
The first scenario requires use of barcode with 30 bars, which is nice, but 20.0% error correction is not as great as desired. The second scenario offers excellent error correction of 38.7%, but the barcode would have to have 78 bars, which is too many.
Is there some other approach or a different method, that would offer great error correction and a reasonable barcode length?
ANSWER
Answered 2020-May-07 at 21:51
You could use a shortened code word such as (5, 19, 6, 7): a ~31.6% correction ratio, 95 bits, 48 bars. One advantage of a shortened code word is a reduced chance of miscorrection if it is allowed to correct the maximum of 6 errors: if any of the 6 computed error locations is outside the range of valid locations, that is an indication that there are more than 6 errors. The probability of miscorrection is about (19/31)^6 = 5.3%.
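The trade-off between the three candidate codes can be tabulated with a short helper (a quick calculation sketch; the function name is hypothetical, and the (m, n, t, k) convention follows the demo cited in the question: m-bit symbols, n symbols per codeword, t correctable errors, k data symbols):

```python
import math

def barcode_params(m, n, t, k):
    # m-bit symbols, n symbols per codeword, t correctable symbol errors,
    # k data symbols; each 4-state bar carries 2 bits
    total_bits = m * n
    return {"data_bits": m * k,
            "total_bits": total_bits,
            "bars": math.ceil(total_bits / 2),
            "correction_ratio": t / n}

full_gf16 = barcode_params(4, 15, 3, 9)    # the (4, 15, 3, 9) option
full_gf32 = barcode_params(5, 31, 12, 7)   # the (5, 31, 12, 7) option
shortened = barcode_params(5, 19, 6, 7)    # the shortened code word

# Chance that all 6 computed locations of a miscorrection happen to be valid
p_mis = (19 / 31) ** 6
```

This reproduces the numbers above: 30 bars at 20%, 78 bars at ~38.7%, and the shortened compromise of 48 bars at ~31.6% with roughly a 5.3% residual miscorrection chance.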
QUESTION
CD-ROM data uses a 3rd layer of error detection using Reed-Solomon and an EDC using a 32-bit CRC polynomial.
The ECMA-130 standard defines the EDC CRC polynomial as follows (page 16, 14.3):
P(x) = (x^16 + x^15 + x^2 + 1) · (x^16 + x^2 + x + 1)
and
The least significant bit of a data byte is used first.
Usually, translating the polynomial into its integer value form is pretty straightforward. Using modulo math, the expanded polynomial must be P(x) = x^32 + x^31 + x^18 + x^17 + x^16 + x^15 + x^4 + x^3 + x^2 + x + 1, thus the value being 0x8007801F.
The last sentence means that the polynomial is reversed (if I get it right).
But I haven't managed to get the right value so far. The Cdrtools source code uses 0x08001801 as the polynomial value. Can someone explain how they found that value?
ANSWER
Answered 2020-May-06 at 08:13
Posting the answer:
First, I made a mistake in the modulo-2 algebra used to expand the polynomial. The non-modulo expanded form is x^32 + x^31 + 2x^18 + 2x^17 + 3x^16 + x^15 + x^4 + x^3 + 2x^2 + x + 1; reducing each coefficient modulo 2 leaves P(x) = x^32 + x^31 + x^16 + x^15 + x^4 + x^3 + x + 1, i.e. the value 0x8001801B.
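The mod-2 expansion of the two factors is easy to verify with a carry-less (GF(2) polynomial) multiplication, a small sketch independent of any CRC library:

```python
def clmul(a, b):
    # Carry-less (GF(2) polynomial) multiplication of two bit-vectors
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

P1 = 0x18005   # x^16 + x^15 + x^2 + 1
P2 = 0x10007   # x^16 + x^2 + x + 1
product = clmul(P1, P2)
# The set bits of `product` are the exponents of the mod-2 expanded polynomial
exponents = [i for i in range(product.bit_length()) if (product >> i) & 1]
```

The set bits come out as exponents {32, 31, 16, 15, 4, 3, 1, 0}, i.e. x^32 + x^31 + x^16 + x^15 + x^4 + x^3 + x + 1, whose low 32 bits give the value 0x8001801B.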
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.