143f6c078SMauro Carvalho Chehab========================== 243f6c078SMauro Carvalho ChehabNAND Error-correction Code 343f6c078SMauro Carvalho Chehab========================== 443f6c078SMauro Carvalho Chehab 543f6c078SMauro Carvalho ChehabIntroduction 643f6c078SMauro Carvalho Chehab============ 743f6c078SMauro Carvalho Chehab 8*131ce3edSMiquel RaynalHaving looked at the linux mtd/nand Hamming software ECC engine driver 943f6c078SMauro Carvalho ChehabI felt there was room for optimisation. I bashed the code for a few hours 1043f6c078SMauro Carvalho Chehabperforming tricks like table lookup removing superfluous code etc. 1143f6c078SMauro Carvalho ChehabAfter that the speed was increased by 35-40%. 1243f6c078SMauro Carvalho ChehabStill I was not too happy as I felt there was additional room for improvement. 1343f6c078SMauro Carvalho Chehab 1443f6c078SMauro Carvalho ChehabBad! I was hooked. 1543f6c078SMauro Carvalho ChehabI decided to annotate my steps in this file. Perhaps it is useful to someone 1643f6c078SMauro Carvalho Chehabor someone learns something from it. 1743f6c078SMauro Carvalho Chehab 1843f6c078SMauro Carvalho Chehab 1943f6c078SMauro Carvalho ChehabThe problem 2043f6c078SMauro Carvalho Chehab=========== 2143f6c078SMauro Carvalho Chehab 2243f6c078SMauro Carvalho ChehabNAND flash (at least SLC one) typically has sectors of 256 bytes. 2343f6c078SMauro Carvalho ChehabHowever NAND flash is not extremely reliable so some error detection 2443f6c078SMauro Carvalho Chehab(and sometimes correction) is needed. 2543f6c078SMauro Carvalho Chehab 2643f6c078SMauro Carvalho ChehabThis is done by means of a Hamming code. I'll try to explain it in 2743f6c078SMauro Carvalho Chehablaymans terms (and apologies to all the pro's in the field in case I do 2843f6c078SMauro Carvalho Chehabnot use the right terminology, my coding theory class was almost 30 2943f6c078SMauro Carvalho Chehabyears ago, and I must admit it was not one of my favourites). 3043f6c078SMauro Carvalho Chehab 3143f6c078SMauro Carvalho ChehabAs I said before the ecc calculation is performed on sectors of 256 3243f6c078SMauro Carvalho Chehabbytes. This is done by calculating several parity bits over the rows and 3343f6c078SMauro Carvalho Chehabcolumns. The parity used is even parity which means that the parity bit = 1 3443f6c078SMauro Carvalho Chehabif the data over which the parity is calculated is 1 and the parity bit = 0 3543f6c078SMauro Carvalho Chehabif the data over which the parity is calculated is 0. So the total 3643f6c078SMauro Carvalho Chehabnumber of bits over the data over which the parity is calculated + the 3743f6c078SMauro Carvalho Chehabparity bit is even. (see wikipedia if you can't follow this). 3843f6c078SMauro Carvalho ChehabParity is often calculated by means of an exclusive or operation, 3943f6c078SMauro Carvalho Chehabsometimes also referred to as xor. In C the operator for xor is ^ 4043f6c078SMauro Carvalho Chehab 4143f6c078SMauro Carvalho ChehabBack to ecc. 4243f6c078SMauro Carvalho ChehabLet's give a small figure: 4343f6c078SMauro Carvalho Chehab 4443f6c078SMauro Carvalho Chehab========= ==== ==== ==== ==== ==== ==== ==== ==== === === === === ==== 4543f6c078SMauro Carvalho Chehabbyte 0: bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0 rp0 rp2 rp4 ... rp14 4643f6c078SMauro Carvalho Chehabbyte 1: bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0 rp1 rp2 rp4 ... rp14 4743f6c078SMauro Carvalho Chehabbyte 2: bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0 rp0 rp3 rp4 ... rp14 4843f6c078SMauro Carvalho Chehabbyte 3: bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0 rp1 rp3 rp4 ... rp14 4943f6c078SMauro Carvalho Chehabbyte 4: bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0 rp0 rp2 rp5 ... rp14 5043f6c078SMauro Carvalho Chehab... 5143f6c078SMauro Carvalho Chehabbyte 254: bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0 rp0 rp3 rp5 ... rp15 5243f6c078SMauro Carvalho Chehabbyte 255: bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0 rp1 rp3 rp5 ... rp15 5343f6c078SMauro Carvalho Chehab cp1 cp0 cp1 cp0 cp1 cp0 cp1 cp0 5443f6c078SMauro Carvalho Chehab cp3 cp3 cp2 cp2 cp3 cp3 cp2 cp2 5543f6c078SMauro Carvalho Chehab cp5 cp5 cp5 cp5 cp4 cp4 cp4 cp4 5643f6c078SMauro Carvalho Chehab========= ==== ==== ==== ==== ==== ==== ==== ==== === === === === ==== 5743f6c078SMauro Carvalho Chehab 5843f6c078SMauro Carvalho ChehabThis figure represents a sector of 256 bytes. 5943f6c078SMauro Carvalho Chehabcp is my abbreviation for column parity, rp for row parity. 6043f6c078SMauro Carvalho Chehab 6143f6c078SMauro Carvalho ChehabLet's start to explain column parity. 6243f6c078SMauro Carvalho Chehab 6343f6c078SMauro Carvalho Chehab- cp0 is the parity that belongs to all bit0, bit2, bit4, bit6. 6443f6c078SMauro Carvalho Chehab 6543f6c078SMauro Carvalho Chehab so the sum of all bit0, bit2, bit4 and bit6 values + cp0 itself is even. 6643f6c078SMauro Carvalho Chehab 6743f6c078SMauro Carvalho ChehabSimilarly cp1 is the sum of all bit1, bit3, bit5 and bit7. 6843f6c078SMauro Carvalho Chehab 6943f6c078SMauro Carvalho Chehab- cp2 is the parity over bit0, bit1, bit4 and bit5 7043f6c078SMauro Carvalho Chehab- cp3 is the parity over bit2, bit3, bit6 and bit7. 7143f6c078SMauro Carvalho Chehab- cp4 is the parity over bit0, bit1, bit2 and bit3. 7243f6c078SMauro Carvalho Chehab- cp5 is the parity over bit4, bit5, bit6 and bit7. 7343f6c078SMauro Carvalho Chehab 7443f6c078SMauro Carvalho ChehabNote that each of cp0 .. cp5 is exactly one bit. 7543f6c078SMauro Carvalho Chehab 7643f6c078SMauro Carvalho ChehabRow parity actually works almost the same. 7743f6c078SMauro Carvalho Chehab 7843f6c078SMauro Carvalho Chehab- rp0 is the parity of all even bytes (0, 2, 4, 6, ... 252, 254) 7943f6c078SMauro Carvalho Chehab- rp1 is the parity of all odd bytes (1, 3, 5, 7, ..., 253, 255) 8043f6c078SMauro Carvalho Chehab- rp2 is the parity of all bytes 0, 1, 4, 5, 8, 9, ... 8143f6c078SMauro Carvalho Chehab (so handle two bytes, then skip 2 bytes). 8243f6c078SMauro Carvalho Chehab- rp3 is covers the half rp2 does not cover (bytes 2, 3, 6, 7, 10, 11, ...) 8343f6c078SMauro Carvalho Chehab- for rp4 the rule is cover 4 bytes, skip 4 bytes, cover 4 bytes, skip 4 etc. 8443f6c078SMauro Carvalho Chehab 8543f6c078SMauro Carvalho Chehab so rp4 calculates parity over bytes 0, 1, 2, 3, 8, 9, 10, 11, 16, ...) 8643f6c078SMauro Carvalho Chehab- and rp5 covers the other half, so bytes 4, 5, 6, 7, 12, 13, 14, 15, 20, .. 8743f6c078SMauro Carvalho Chehab 8843f6c078SMauro Carvalho ChehabThe story now becomes quite boring. I guess you get the idea. 8943f6c078SMauro Carvalho Chehab 9043f6c078SMauro Carvalho Chehab- rp6 covers 8 bytes then skips 8 etc 9143f6c078SMauro Carvalho Chehab- rp7 skips 8 bytes then covers 8 etc 9243f6c078SMauro Carvalho Chehab- rp8 covers 16 bytes then skips 16 etc 9343f6c078SMauro Carvalho Chehab- rp9 skips 16 bytes then covers 16 etc 9443f6c078SMauro Carvalho Chehab- rp10 covers 32 bytes then skips 32 etc 9543f6c078SMauro Carvalho Chehab- rp11 skips 32 bytes then covers 32 etc 9643f6c078SMauro Carvalho Chehab- rp12 covers 64 bytes then skips 64 etc 9743f6c078SMauro Carvalho Chehab- rp13 skips 64 bytes then covers 64 etc 9843f6c078SMauro Carvalho Chehab- rp14 covers 128 bytes then skips 128 9943f6c078SMauro Carvalho Chehab- rp15 skips 128 bytes then covers 128 10043f6c078SMauro Carvalho Chehab 10143f6c078SMauro Carvalho ChehabIn the end the parity bits are grouped together in three bytes as 10243f6c078SMauro Carvalho Chehabfollows: 10343f6c078SMauro Carvalho Chehab 10443f6c078SMauro Carvalho Chehab===== ===== ===== ===== ===== ===== ===== ===== ===== 10543f6c078SMauro Carvalho ChehabECC Bit 7 Bit 6 Bit 5 Bit 4 Bit 3 Bit 2 Bit 1 Bit 0 10643f6c078SMauro Carvalho Chehab===== ===== ===== ===== ===== ===== ===== ===== ===== 10743f6c078SMauro Carvalho ChehabECC 0 rp07 rp06 rp05 rp04 rp03 rp02 rp01 rp00 10843f6c078SMauro Carvalho ChehabECC 1 rp15 rp14 rp13 rp12 rp11 rp10 rp09 rp08 10943f6c078SMauro Carvalho ChehabECC 2 cp5 cp4 cp3 cp2 cp1 cp0 1 1 11043f6c078SMauro Carvalho Chehab===== ===== ===== ===== ===== ===== ===== ===== ===== 11143f6c078SMauro Carvalho Chehab 11243f6c078SMauro Carvalho ChehabI detected after writing this that ST application note AN1823 11343f6c078SMauro Carvalho Chehab(http://www.st.com/stonline/) gives a much 11443f6c078SMauro Carvalho Chehabnicer picture.(but they use line parity as term where I use row parity) 11543f6c078SMauro Carvalho ChehabOh well, I'm graphically challenged, so suffer with me for a moment :-) 11643f6c078SMauro Carvalho Chehab 11743f6c078SMauro Carvalho ChehabAnd I could not reuse the ST picture anyway for copyright reasons. 11843f6c078SMauro Carvalho Chehab 11943f6c078SMauro Carvalho Chehab 12043f6c078SMauro Carvalho ChehabAttempt 0 12143f6c078SMauro Carvalho Chehab========= 12243f6c078SMauro Carvalho Chehab 12343f6c078SMauro Carvalho ChehabImplementing the parity calculation is pretty simple. 12443f6c078SMauro Carvalho ChehabIn C pseudocode:: 12543f6c078SMauro Carvalho Chehab 12643f6c078SMauro Carvalho Chehab for (i = 0; i < 256; i++) 12743f6c078SMauro Carvalho Chehab { 12843f6c078SMauro Carvalho Chehab if (i & 0x01) 12943f6c078SMauro Carvalho Chehab rp1 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp1; 13043f6c078SMauro Carvalho Chehab else 13143f6c078SMauro Carvalho Chehab rp0 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp0; 13243f6c078SMauro Carvalho Chehab if (i & 0x02) 13343f6c078SMauro Carvalho Chehab rp3 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp3; 13443f6c078SMauro Carvalho Chehab else 13543f6c078SMauro Carvalho Chehab rp2 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp2; 13643f6c078SMauro Carvalho Chehab if (i & 0x04) 13743f6c078SMauro Carvalho Chehab rp5 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp5; 13843f6c078SMauro Carvalho Chehab else 13943f6c078SMauro Carvalho Chehab rp4 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp4; 14043f6c078SMauro Carvalho Chehab if (i & 0x08) 14143f6c078SMauro Carvalho Chehab rp7 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp7; 14243f6c078SMauro Carvalho Chehab else 14343f6c078SMauro Carvalho Chehab rp6 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp6; 14443f6c078SMauro Carvalho Chehab if (i & 0x10) 14543f6c078SMauro Carvalho Chehab rp9 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp9; 14643f6c078SMauro Carvalho Chehab else 14743f6c078SMauro Carvalho Chehab rp8 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp8; 14843f6c078SMauro Carvalho Chehab if (i & 0x20) 14943f6c078SMauro Carvalho Chehab rp11 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp11; 15043f6c078SMauro Carvalho Chehab else 15143f6c078SMauro Carvalho Chehab rp10 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp10; 15243f6c078SMauro Carvalho Chehab if (i & 0x40) 15343f6c078SMauro Carvalho Chehab rp13 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp13; 15443f6c078SMauro Carvalho Chehab else 15543f6c078SMauro Carvalho Chehab rp12 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp12; 15643f6c078SMauro Carvalho Chehab if (i & 0x80) 15743f6c078SMauro Carvalho Chehab rp15 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp15; 15843f6c078SMauro Carvalho Chehab else 15943f6c078SMauro Carvalho Chehab rp14 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp14; 16043f6c078SMauro Carvalho Chehab cp0 = bit6 ^ bit4 ^ bit2 ^ bit0 ^ cp0; 16143f6c078SMauro Carvalho Chehab cp1 = bit7 ^ bit5 ^ bit3 ^ bit1 ^ cp1; 16243f6c078SMauro Carvalho Chehab cp2 = bit5 ^ bit4 ^ bit1 ^ bit0 ^ cp2; 16343f6c078SMauro Carvalho Chehab cp3 = bit7 ^ bit6 ^ bit3 ^ bit2 ^ cp3 16443f6c078SMauro Carvalho Chehab cp4 = bit3 ^ bit2 ^ bit1 ^ bit0 ^ cp4 16543f6c078SMauro Carvalho Chehab cp5 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ cp5 16643f6c078SMauro Carvalho Chehab } 16743f6c078SMauro Carvalho Chehab 16843f6c078SMauro Carvalho Chehab 16943f6c078SMauro Carvalho ChehabAnalysis 0 17043f6c078SMauro Carvalho Chehab========== 17143f6c078SMauro Carvalho Chehab 17243f6c078SMauro Carvalho ChehabC does have bitwise operators but not really operators to do the above 17343f6c078SMauro Carvalho Chehabefficiently (and most hardware has no such instructions either). 17443f6c078SMauro Carvalho ChehabTherefore without implementing this it was clear that the code above was 17543f6c078SMauro Carvalho Chehabnot going to bring me a Nobel prize :-) 17643f6c078SMauro Carvalho Chehab 17743f6c078SMauro Carvalho ChehabFortunately the exclusive or operation is commutative, so we can combine 17843f6c078SMauro Carvalho Chehabthe values in any order. So instead of calculating all the bits 17943f6c078SMauro Carvalho Chehabindividually, let us try to rearrange things. 18043f6c078SMauro Carvalho ChehabFor the column parity this is easy. We can just xor the bytes and in the 18143f6c078SMauro Carvalho Chehabend filter out the relevant bits. This is pretty nice as it will bring 18243f6c078SMauro Carvalho Chehaball cp calculation out of the for loop. 18343f6c078SMauro Carvalho Chehab 18443f6c078SMauro Carvalho ChehabSimilarly we can first xor the bytes for the various rows. 18543f6c078SMauro Carvalho ChehabThis leads to: 18643f6c078SMauro Carvalho Chehab 18743f6c078SMauro Carvalho Chehab 18843f6c078SMauro Carvalho ChehabAttempt 1 18943f6c078SMauro Carvalho Chehab========= 19043f6c078SMauro Carvalho Chehab 19143f6c078SMauro Carvalho Chehab:: 19243f6c078SMauro Carvalho Chehab 19343f6c078SMauro Carvalho Chehab const char parity[256] = { 19443f6c078SMauro Carvalho Chehab 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 19543f6c078SMauro Carvalho Chehab 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 19643f6c078SMauro Carvalho Chehab 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 19743f6c078SMauro Carvalho Chehab 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 19843f6c078SMauro Carvalho Chehab 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 19943f6c078SMauro Carvalho Chehab 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 20043f6c078SMauro Carvalho Chehab 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 20143f6c078SMauro Carvalho Chehab 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 20243f6c078SMauro Carvalho Chehab 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 20343f6c078SMauro Carvalho Chehab 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 20443f6c078SMauro Carvalho Chehab 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 20543f6c078SMauro Carvalho Chehab 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 20643f6c078SMauro Carvalho Chehab 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 20743f6c078SMauro Carvalho Chehab 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 20843f6c078SMauro Carvalho Chehab 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 20943f6c078SMauro Carvalho Chehab 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0 21043f6c078SMauro Carvalho Chehab }; 21143f6c078SMauro Carvalho Chehab 21243f6c078SMauro Carvalho Chehab void ecc1(const unsigned char *buf, unsigned char *code) 21343f6c078SMauro Carvalho Chehab { 21443f6c078SMauro Carvalho Chehab int i; 21543f6c078SMauro Carvalho Chehab const unsigned char *bp = buf; 21643f6c078SMauro Carvalho Chehab unsigned char cur; 21743f6c078SMauro Carvalho Chehab unsigned char rp0, rp1, rp2, rp3, rp4, rp5, rp6, rp7; 21843f6c078SMauro Carvalho Chehab unsigned char rp8, rp9, rp10, rp11, rp12, rp13, rp14, rp15; 21943f6c078SMauro Carvalho Chehab unsigned char par; 22043f6c078SMauro Carvalho Chehab 22143f6c078SMauro Carvalho Chehab par = 0; 22243f6c078SMauro Carvalho Chehab rp0 = 0; rp1 = 0; rp2 = 0; rp3 = 0; 22343f6c078SMauro Carvalho Chehab rp4 = 0; rp5 = 0; rp6 = 0; rp7 = 0; 22443f6c078SMauro Carvalho Chehab rp8 = 0; rp9 = 0; rp10 = 0; rp11 = 0; 22543f6c078SMauro Carvalho Chehab rp12 = 0; rp13 = 0; rp14 = 0; rp15 = 0; 22643f6c078SMauro Carvalho Chehab 22743f6c078SMauro Carvalho Chehab for (i = 0; i < 256; i++) 22843f6c078SMauro Carvalho Chehab { 22943f6c078SMauro Carvalho Chehab cur = *bp++; 23043f6c078SMauro Carvalho Chehab par ^= cur; 23143f6c078SMauro Carvalho Chehab if (i & 0x01) rp1 ^= cur; else rp0 ^= cur; 23243f6c078SMauro Carvalho Chehab if (i & 0x02) rp3 ^= cur; else rp2 ^= cur; 23343f6c078SMauro Carvalho Chehab if (i & 0x04) rp5 ^= cur; else rp4 ^= cur; 23443f6c078SMauro Carvalho Chehab if (i & 0x08) rp7 ^= cur; else rp6 ^= cur; 23543f6c078SMauro Carvalho Chehab if (i & 0x10) rp9 ^= cur; else rp8 ^= cur; 23643f6c078SMauro Carvalho Chehab if (i & 0x20) rp11 ^= cur; else rp10 ^= cur; 23743f6c078SMauro Carvalho Chehab if (i & 0x40) rp13 ^= cur; else rp12 ^= cur; 23843f6c078SMauro Carvalho Chehab if (i & 0x80) rp15 ^= cur; else rp14 ^= cur; 23943f6c078SMauro Carvalho Chehab } 24043f6c078SMauro Carvalho Chehab code[0] = 24143f6c078SMauro Carvalho Chehab (parity[rp7] << 7) | 24243f6c078SMauro Carvalho Chehab (parity[rp6] << 6) | 24343f6c078SMauro Carvalho Chehab (parity[rp5] << 5) | 24443f6c078SMauro Carvalho Chehab (parity[rp4] << 4) | 24543f6c078SMauro Carvalho Chehab (parity[rp3] << 3) | 24643f6c078SMauro Carvalho Chehab (parity[rp2] << 2) | 24743f6c078SMauro Carvalho Chehab (parity[rp1] << 1) | 24843f6c078SMauro Carvalho Chehab (parity[rp0]); 24943f6c078SMauro Carvalho Chehab code[1] = 25043f6c078SMauro Carvalho Chehab (parity[rp15] << 7) | 25143f6c078SMauro Carvalho Chehab (parity[rp14] << 6) | 25243f6c078SMauro Carvalho Chehab (parity[rp13] << 5) | 25343f6c078SMauro Carvalho Chehab (parity[rp12] << 4) | 25443f6c078SMauro Carvalho Chehab (parity[rp11] << 3) | 25543f6c078SMauro Carvalho Chehab (parity[rp10] << 2) | 25643f6c078SMauro Carvalho Chehab (parity[rp9] << 1) | 25743f6c078SMauro Carvalho Chehab (parity[rp8]); 25843f6c078SMauro Carvalho Chehab code[2] = 25943f6c078SMauro Carvalho Chehab (parity[par & 0xf0] << 7) | 26043f6c078SMauro Carvalho Chehab (parity[par & 0x0f] << 6) | 26143f6c078SMauro Carvalho Chehab (parity[par & 0xcc] << 5) | 26243f6c078SMauro Carvalho Chehab (parity[par & 0x33] << 4) | 26343f6c078SMauro Carvalho Chehab (parity[par & 0xaa] << 3) | 26443f6c078SMauro Carvalho Chehab (parity[par & 0x55] << 2); 26543f6c078SMauro Carvalho Chehab code[0] = ~code[0]; 26643f6c078SMauro Carvalho Chehab code[1] = ~code[1]; 26743f6c078SMauro Carvalho Chehab code[2] = ~code[2]; 26843f6c078SMauro Carvalho Chehab } 26943f6c078SMauro Carvalho Chehab 27043f6c078SMauro Carvalho ChehabStill pretty straightforward. The last three invert statements are there to 27143f6c078SMauro Carvalho Chehabgive a checksum of 0xff 0xff 0xff for an empty flash. In an empty flash 27243f6c078SMauro Carvalho Chehaball data is 0xff, so the checksum then matches. 27343f6c078SMauro Carvalho Chehab 27443f6c078SMauro Carvalho ChehabI also introduced the parity lookup. I expected this to be the fastest 27543f6c078SMauro Carvalho Chehabway to calculate the parity, but I will investigate alternatives later 27643f6c078SMauro Carvalho Chehabon. 27743f6c078SMauro Carvalho Chehab 27843f6c078SMauro Carvalho Chehab 27943f6c078SMauro Carvalho ChehabAnalysis 1 28043f6c078SMauro Carvalho Chehab========== 28143f6c078SMauro Carvalho Chehab 28243f6c078SMauro Carvalho ChehabThe code works, but is not terribly efficient. On my system it took 28343f6c078SMauro Carvalho Chehabalmost 4 times as much time as the linux driver code. But hey, if it was 28443f6c078SMauro Carvalho Chehab*that* easy this would have been done long before. 28543f6c078SMauro Carvalho ChehabNo pain. no gain. 28643f6c078SMauro Carvalho Chehab 28743f6c078SMauro Carvalho ChehabFortunately there is plenty of room for improvement. 28843f6c078SMauro Carvalho Chehab 28943f6c078SMauro Carvalho ChehabIn step 1 we moved from bit-wise calculation to byte-wise calculation. 29043f6c078SMauro Carvalho ChehabHowever in C we can also use the unsigned long data type and virtually 29143f6c078SMauro Carvalho Chehabevery modern microprocessor supports 32 bit operations, so why not try 29243f6c078SMauro Carvalho Chehabto write our code in such a way that we process data in 32 bit chunks. 29343f6c078SMauro Carvalho Chehab 29443f6c078SMauro Carvalho ChehabOf course this means some modification as the row parity is byte by 29543f6c078SMauro Carvalho Chehabbyte. A quick analysis: 29643f6c078SMauro Carvalho Chehabfor the column parity we use the par variable. When extending to 32 bits 29743f6c078SMauro Carvalho Chehabwe can in the end easily calculate rp0 and rp1 from it. 29843f6c078SMauro Carvalho Chehab(because par now consists of 4 bytes, contributing to rp1, rp0, rp1, rp0 29943f6c078SMauro Carvalho Chehabrespectively, from MSB to LSB) 30043f6c078SMauro Carvalho Chehabalso rp2 and rp3 can be easily retrieved from par as rp3 covers the 30143f6c078SMauro Carvalho Chehabfirst two MSBs and rp2 covers the last two LSBs. 30243f6c078SMauro Carvalho Chehab 30343f6c078SMauro Carvalho ChehabNote that of course now the loop is executed only 64 times (256/4). 30443f6c078SMauro Carvalho ChehabAnd note that care must taken wrt byte ordering. The way bytes are 30543f6c078SMauro Carvalho Chehabordered in a long is machine dependent, and might affect us. 30643f6c078SMauro Carvalho ChehabAnyway, if there is an issue: this code is developed on x86 (to be 30743f6c078SMauro Carvalho Chehabprecise: a DELL PC with a D920 Intel CPU) 30843f6c078SMauro Carvalho Chehab 30943f6c078SMauro Carvalho ChehabAnd of course the performance might depend on alignment, but I expect 31043f6c078SMauro Carvalho Chehabthat the I/O buffers in the nand driver are aligned properly (and 31143f6c078SMauro Carvalho Chehabotherwise that should be fixed to get maximum performance). 31243f6c078SMauro Carvalho Chehab 31343f6c078SMauro Carvalho ChehabLet's give it a try... 31443f6c078SMauro Carvalho Chehab 31543f6c078SMauro Carvalho Chehab 31643f6c078SMauro Carvalho ChehabAttempt 2 31743f6c078SMauro Carvalho Chehab========= 31843f6c078SMauro Carvalho Chehab 31943f6c078SMauro Carvalho Chehab:: 32043f6c078SMauro Carvalho Chehab 32143f6c078SMauro Carvalho Chehab extern const char parity[256]; 32243f6c078SMauro Carvalho Chehab 32343f6c078SMauro Carvalho Chehab void ecc2(const unsigned char *buf, unsigned char *code) 32443f6c078SMauro Carvalho Chehab { 32543f6c078SMauro Carvalho Chehab int i; 32643f6c078SMauro Carvalho Chehab const unsigned long *bp = (unsigned long *)buf; 32743f6c078SMauro Carvalho Chehab unsigned long cur; 32843f6c078SMauro Carvalho Chehab unsigned long rp0, rp1, rp2, rp3, rp4, rp5, rp6, rp7; 32943f6c078SMauro Carvalho Chehab unsigned long rp8, rp9, rp10, rp11, rp12, rp13, rp14, rp15; 33043f6c078SMauro Carvalho Chehab unsigned long par; 33143f6c078SMauro Carvalho Chehab 33243f6c078SMauro Carvalho Chehab par = 0; 33343f6c078SMauro Carvalho Chehab rp0 = 0; rp1 = 0; rp2 = 0; rp3 = 0; 33443f6c078SMauro Carvalho Chehab rp4 = 0; rp5 = 0; rp6 = 0; rp7 = 0; 33543f6c078SMauro Carvalho Chehab rp8 = 0; rp9 = 0; rp10 = 0; rp11 = 0; 33643f6c078SMauro Carvalho Chehab rp12 = 0; rp13 = 0; rp14 = 0; rp15 = 0; 33743f6c078SMauro Carvalho Chehab 33843f6c078SMauro Carvalho Chehab for (i = 0; i < 64; i++) 33943f6c078SMauro Carvalho Chehab { 34043f6c078SMauro Carvalho Chehab cur = *bp++; 34143f6c078SMauro Carvalho Chehab par ^= cur; 34243f6c078SMauro Carvalho Chehab if (i & 0x01) rp5 ^= cur; else rp4 ^= cur; 34343f6c078SMauro Carvalho Chehab if (i & 0x02) rp7 ^= cur; else rp6 ^= cur; 34443f6c078SMauro Carvalho Chehab if (i & 0x04) rp9 ^= cur; else rp8 ^= cur; 34543f6c078SMauro Carvalho Chehab if (i & 0x08) rp11 ^= cur; else rp10 ^= cur; 34643f6c078SMauro Carvalho Chehab if (i & 0x10) rp13 ^= cur; else rp12 ^= cur; 34743f6c078SMauro Carvalho Chehab if (i & 0x20) rp15 ^= cur; else rp14 ^= cur; 34843f6c078SMauro Carvalho Chehab } 34943f6c078SMauro Carvalho Chehab /* 35043f6c078SMauro Carvalho Chehab we need to adapt the code generation for the fact that rp vars are now 35143f6c078SMauro Carvalho Chehab long; also the column parity calculation needs to be changed. 35243f6c078SMauro Carvalho Chehab we'll bring rp4 to 15 back to single byte entities by shifting and 35343f6c078SMauro Carvalho Chehab xoring 35443f6c078SMauro Carvalho Chehab */ 35543f6c078SMauro Carvalho Chehab rp4 ^= (rp4 >> 16); rp4 ^= (rp4 >> 8); rp4 &= 0xff; 35643f6c078SMauro Carvalho Chehab rp5 ^= (rp5 >> 16); rp5 ^= (rp5 >> 8); rp5 &= 0xff; 35743f6c078SMauro Carvalho Chehab rp6 ^= (rp6 >> 16); rp6 ^= (rp6 >> 8); rp6 &= 0xff; 35843f6c078SMauro Carvalho Chehab rp7 ^= (rp7 >> 16); rp7 ^= (rp7 >> 8); rp7 &= 0xff; 35943f6c078SMauro Carvalho Chehab rp8 ^= (rp8 >> 16); rp8 ^= (rp8 >> 8); rp8 &= 0xff; 36043f6c078SMauro Carvalho Chehab rp9 ^= (rp9 >> 16); rp9 ^= (rp9 >> 8); rp9 &= 0xff; 36143f6c078SMauro Carvalho Chehab rp10 ^= (rp10 >> 16); rp10 ^= (rp10 >> 8); rp10 &= 0xff; 36243f6c078SMauro Carvalho Chehab rp11 ^= (rp11 >> 16); rp11 ^= (rp11 >> 8); rp11 &= 0xff; 36343f6c078SMauro Carvalho Chehab rp12 ^= (rp12 >> 16); rp12 ^= (rp12 >> 8); rp12 &= 0xff; 36443f6c078SMauro Carvalho Chehab rp13 ^= (rp13 >> 16); rp13 ^= (rp13 >> 8); rp13 &= 0xff; 36543f6c078SMauro Carvalho Chehab rp14 ^= (rp14 >> 16); rp14 ^= (rp14 >> 8); rp14 &= 0xff; 36643f6c078SMauro Carvalho Chehab rp15 ^= (rp15 >> 16); rp15 ^= (rp15 >> 8); rp15 &= 0xff; 36743f6c078SMauro Carvalho Chehab rp3 = (par >> 16); rp3 ^= (rp3 >> 8); rp3 &= 0xff; 36843f6c078SMauro Carvalho Chehab rp2 = par & 0xffff; rp2 ^= (rp2 >> 8); rp2 &= 0xff; 36943f6c078SMauro Carvalho Chehab par ^= (par >> 16); 37043f6c078SMauro Carvalho Chehab rp1 = (par >> 8); rp1 &= 0xff; 37143f6c078SMauro Carvalho Chehab rp0 = (par & 0xff); 37243f6c078SMauro Carvalho Chehab par ^= (par >> 8); par &= 0xff; 37343f6c078SMauro Carvalho Chehab 37443f6c078SMauro Carvalho Chehab code[0] = 37543f6c078SMauro Carvalho Chehab (parity[rp7] << 7) | 37643f6c078SMauro Carvalho Chehab (parity[rp6] << 6) | 37743f6c078SMauro Carvalho Chehab (parity[rp5] << 5) | 37843f6c078SMauro Carvalho Chehab (parity[rp4] << 4) | 37943f6c078SMauro Carvalho Chehab (parity[rp3] << 3) | 38043f6c078SMauro Carvalho Chehab (parity[rp2] << 2) | 38143f6c078SMauro Carvalho Chehab (parity[rp1] << 1) | 38243f6c078SMauro Carvalho Chehab (parity[rp0]); 38343f6c078SMauro Carvalho Chehab code[1] = 38443f6c078SMauro Carvalho Chehab (parity[rp15] << 7) | 38543f6c078SMauro Carvalho Chehab (parity[rp14] << 6) | 38643f6c078SMauro Carvalho Chehab (parity[rp13] << 5) | 38743f6c078SMauro Carvalho Chehab (parity[rp12] << 4) | 38843f6c078SMauro Carvalho Chehab (parity[rp11] << 3) | 38943f6c078SMauro Carvalho Chehab (parity[rp10] << 2) | 39043f6c078SMauro Carvalho Chehab (parity[rp9] << 1) | 39143f6c078SMauro Carvalho Chehab (parity[rp8]); 39243f6c078SMauro Carvalho Chehab code[2] = 39343f6c078SMauro Carvalho Chehab (parity[par & 0xf0] << 7) | 39443f6c078SMauro Carvalho Chehab (parity[par & 0x0f] << 6) | 39543f6c078SMauro Carvalho Chehab (parity[par & 0xcc] << 5) | 39643f6c078SMauro Carvalho Chehab (parity[par & 0x33] << 4) | 39743f6c078SMauro Carvalho Chehab (parity[par & 0xaa] << 3) | 39843f6c078SMauro Carvalho Chehab (parity[par & 0x55] << 2); 39943f6c078SMauro Carvalho Chehab code[0] = ~code[0]; 40043f6c078SMauro Carvalho Chehab code[1] = ~code[1]; 40143f6c078SMauro Carvalho Chehab code[2] = ~code[2]; 40243f6c078SMauro Carvalho Chehab } 40343f6c078SMauro Carvalho Chehab 40443f6c078SMauro Carvalho ChehabThe parity array is not shown any more. Note also that for these 40543f6c078SMauro Carvalho Chehabexamples I kinda deviated from my regular programming style by allowing 40643f6c078SMauro Carvalho Chehabmultiple statements on a line, not using { } in then and else blocks 40743f6c078SMauro Carvalho Chehabwith only a single statement and by using operators like ^= 40843f6c078SMauro Carvalho Chehab 40943f6c078SMauro Carvalho Chehab 41043f6c078SMauro Carvalho ChehabAnalysis 2 41143f6c078SMauro Carvalho Chehab========== 41243f6c078SMauro Carvalho Chehab 41343f6c078SMauro Carvalho ChehabThe code (of course) works, and hurray: we are a little bit faster than 41443f6c078SMauro Carvalho Chehabthe linux driver code (about 15%). But wait, don't cheer too quickly. 41543f6c078SMauro Carvalho ChehabThere is more to be gained. 41643f6c078SMauro Carvalho ChehabIf we look at e.g. rp14 and rp15 we see that we either xor our data with 41743f6c078SMauro Carvalho Chehabrp14 or with rp15. However we also have par which goes over all data. 41843f6c078SMauro Carvalho ChehabThis means there is no need to calculate rp14 as it can be calculated from 41943f6c078SMauro Carvalho Chehabrp15 through rp14 = par ^ rp15, because par = rp14 ^ rp15; 42043f6c078SMauro Carvalho Chehab(or if desired we can avoid calculating rp15 and calculate it from 42143f6c078SMauro Carvalho Chehabrp14). That is why some places refer to inverse parity. 42243f6c078SMauro Carvalho ChehabOf course the same thing holds for rp4/5, rp6/7, rp8/9, rp10/11 and rp12/13. 42343f6c078SMauro Carvalho ChehabEffectively this means we can eliminate the else clause from the if 42443f6c078SMauro Carvalho Chehabstatements. Also we can optimise the calculation in the end a little bit 42543f6c078SMauro Carvalho Chehabby going from long to byte first. Actually we can even avoid the table 42643f6c078SMauro Carvalho Chehablookups 42743f6c078SMauro Carvalho Chehab 42843f6c078SMauro Carvalho ChehabAttempt 3 42943f6c078SMauro Carvalho Chehab========= 43043f6c078SMauro Carvalho Chehab 43143f6c078SMauro Carvalho ChehabOdd replaced:: 43243f6c078SMauro Carvalho Chehab 43343f6c078SMauro Carvalho Chehab if (i & 0x01) rp5 ^= cur; else rp4 ^= cur; 43443f6c078SMauro Carvalho Chehab if (i & 0x02) rp7 ^= cur; else rp6 ^= cur; 43543f6c078SMauro Carvalho Chehab if (i & 0x04) rp9 ^= cur; else rp8 ^= cur; 43643f6c078SMauro Carvalho Chehab if (i & 0x08) rp11 ^= cur; else rp10 ^= cur; 43743f6c078SMauro Carvalho Chehab if (i & 0x10) rp13 ^= cur; else rp12 ^= cur; 43843f6c078SMauro Carvalho Chehab if (i & 0x20) rp15 ^= cur; else rp14 ^= cur; 43943f6c078SMauro Carvalho Chehab 44043f6c078SMauro Carvalho Chehabwith:: 44143f6c078SMauro Carvalho Chehab 44243f6c078SMauro Carvalho Chehab if (i & 0x01) rp5 ^= cur; 44343f6c078SMauro Carvalho Chehab if (i & 0x02) rp7 ^= cur; 44443f6c078SMauro Carvalho Chehab if (i & 0x04) rp9 ^= cur; 44543f6c078SMauro Carvalho Chehab if (i & 0x08) rp11 ^= cur; 44643f6c078SMauro Carvalho Chehab if (i & 0x10) rp13 ^= cur; 44743f6c078SMauro Carvalho Chehab if (i & 0x20) rp15 ^= cur; 44843f6c078SMauro Carvalho Chehab 44943f6c078SMauro Carvalho Chehaband outside the loop added:: 45043f6c078SMauro Carvalho Chehab 45143f6c078SMauro Carvalho Chehab rp4 = par ^ rp5; 45243f6c078SMauro Carvalho Chehab rp6 = par ^ rp7; 45343f6c078SMauro Carvalho Chehab rp8 = par ^ rp9; 45443f6c078SMauro Carvalho Chehab rp10 = par ^ rp11; 45543f6c078SMauro Carvalho Chehab rp12 = par ^ rp13; 45643f6c078SMauro Carvalho Chehab rp14 = par ^ rp15; 45743f6c078SMauro Carvalho Chehab 45843f6c078SMauro Carvalho ChehabAnd after that the code takes about 30% more time, although the number of 45943f6c078SMauro Carvalho Chehabstatements is reduced. This is also reflected in the assembly code. 46043f6c078SMauro Carvalho Chehab 46143f6c078SMauro Carvalho Chehab 46243f6c078SMauro Carvalho ChehabAnalysis 3 46343f6c078SMauro Carvalho Chehab========== 46443f6c078SMauro Carvalho Chehab 46543f6c078SMauro Carvalho ChehabVery weird. Guess it has to do with caching or instruction parallellism 46643f6c078SMauro Carvalho Chehabor so. I also tried on an eeePC (Celeron, clocked at 900 Mhz). Interesting 46743f6c078SMauro Carvalho Chehabobservation was that this one is only 30% slower (according to time) 46843f6c078SMauro Carvalho Chehabexecuting the code as my 3Ghz D920 processor. 46943f6c078SMauro Carvalho Chehab 47043f6c078SMauro Carvalho ChehabWell, it was expected not to be easy so maybe instead move to a 47143f6c078SMauro Carvalho Chehabdifferent track: let's move back to the code from attempt2 and do some 47243f6c078SMauro Carvalho Chehabloop unrolling. This will eliminate a few if statements. I'll try 47343f6c078SMauro Carvalho Chehabdifferent amounts of unrolling to see what works best. 47443f6c078SMauro Carvalho Chehab 47543f6c078SMauro Carvalho Chehab 47643f6c078SMauro Carvalho ChehabAttempt 4 47743f6c078SMauro Carvalho Chehab========= 47843f6c078SMauro Carvalho Chehab 47943f6c078SMauro Carvalho ChehabUnrolled the loop 1, 2, 3 and 4 times. 48043f6c078SMauro Carvalho ChehabFor 4 the code starts with:: 48143f6c078SMauro Carvalho Chehab 48243f6c078SMauro Carvalho Chehab for (i = 0; i < 4; i++) 48343f6c078SMauro Carvalho Chehab { 48443f6c078SMauro Carvalho Chehab cur = *bp++; 48543f6c078SMauro Carvalho Chehab par ^= cur; 48643f6c078SMauro Carvalho Chehab rp4 ^= cur; 48743f6c078SMauro Carvalho Chehab rp6 ^= cur; 48843f6c078SMauro Carvalho Chehab rp8 ^= cur; 48943f6c078SMauro Carvalho Chehab rp10 ^= cur; 49043f6c078SMauro Carvalho Chehab if (i & 0x1) rp13 ^= cur; else rp12 ^= cur; 49143f6c078SMauro Carvalho Chehab if (i & 0x2) rp15 ^= cur; else rp14 ^= cur; 49243f6c078SMauro Carvalho Chehab cur = *bp++; 49343f6c078SMauro Carvalho Chehab par ^= cur; 49443f6c078SMauro Carvalho Chehab rp5 ^= cur; 49543f6c078SMauro Carvalho Chehab rp6 ^= cur; 49643f6c078SMauro Carvalho Chehab ... 49743f6c078SMauro Carvalho Chehab 49843f6c078SMauro Carvalho Chehab 49943f6c078SMauro Carvalho ChehabAnalysis 4 50043f6c078SMauro Carvalho Chehab========== 50143f6c078SMauro Carvalho Chehab 50243f6c078SMauro Carvalho ChehabUnrolling once gains about 15% 50343f6c078SMauro Carvalho Chehab 50443f6c078SMauro Carvalho ChehabUnrolling twice keeps the gain at about 15% 50543f6c078SMauro Carvalho Chehab 50643f6c078SMauro Carvalho ChehabUnrolling three times gives a gain of 30% compared to attempt 2. 50743f6c078SMauro Carvalho Chehab 50843f6c078SMauro Carvalho ChehabUnrolling four times gives a marginal improvement compared to unrolling 50943f6c078SMauro Carvalho Chehabthree times. 51043f6c078SMauro Carvalho Chehab 51143f6c078SMauro Carvalho ChehabI decided to proceed with a four time unrolled loop anyway. It was my gut 51243f6c078SMauro Carvalho Chehabfeeling that in the next steps I would obtain additional gain from it. 51343f6c078SMauro Carvalho Chehab 51443f6c078SMauro Carvalho ChehabThe next step was triggered by the fact that par contains the xor of all 51543f6c078SMauro Carvalho Chehabbytes and rp4 and rp5 each contain the xor of half of the bytes. 51643f6c078SMauro Carvalho ChehabSo in effect par = rp4 ^ rp5. But as xor is commutative we can also say 51743f6c078SMauro Carvalho Chehabthat rp5 = par ^ rp4. So no need to keep both rp4 and rp5 around. We can 51843f6c078SMauro Carvalho Chehabeliminate rp5 (or rp4, but I already foresaw another optimisation). 51943f6c078SMauro Carvalho ChehabThe same holds for rp6/7, rp8/9, rp10/11 rp12/13 and rp14/15. 52043f6c078SMauro Carvalho Chehab 52143f6c078SMauro Carvalho Chehab 52243f6c078SMauro Carvalho ChehabAttempt 5 52343f6c078SMauro Carvalho Chehab========= 52443f6c078SMauro Carvalho Chehab 52543f6c078SMauro Carvalho ChehabEffectively so all odd digit rp assignments in the loop were removed. 52643f6c078SMauro Carvalho ChehabThis included the else clause of the if statements. 52743f6c078SMauro Carvalho ChehabOf course after the loop we need to correct things by adding code like:: 52843f6c078SMauro Carvalho Chehab 52943f6c078SMauro Carvalho Chehab rp5 = par ^ rp4; 53043f6c078SMauro Carvalho Chehab 53143f6c078SMauro Carvalho ChehabAlso the initial assignments (rp5 = 0; etc) could be removed. 53243f6c078SMauro Carvalho ChehabAlong the line I also removed the initialisation of rp0/1/2/3. 53343f6c078SMauro Carvalho Chehab 53443f6c078SMauro Carvalho Chehab 53543f6c078SMauro Carvalho ChehabAnalysis 5 53643f6c078SMauro Carvalho Chehab========== 53743f6c078SMauro Carvalho Chehab 53843f6c078SMauro Carvalho ChehabMeasurements showed this was a good move. The run-time roughly halved 53943f6c078SMauro Carvalho Chehabcompared with attempt 4 with 4 times unrolled, and we only require 1/3rd 54043f6c078SMauro Carvalho Chehabof the processor time compared to the current code in the linux kernel. 54143f6c078SMauro Carvalho Chehab 54243f6c078SMauro Carvalho ChehabHowever, still I thought there was more. I didn't like all the if 54343f6c078SMauro Carvalho Chehabstatements. Why not keep a running parity and only keep the last if 54443f6c078SMauro Carvalho Chehabstatement. Time for yet another version! 54543f6c078SMauro Carvalho Chehab 54643f6c078SMauro Carvalho Chehab 54743f6c078SMauro Carvalho ChehabAttempt 6 54843f6c078SMauro Carvalho Chehab========= 54943f6c078SMauro Carvalho Chehab 55043f6c078SMauro Carvalho ChehabTHe code within the for loop was changed to:: 55143f6c078SMauro Carvalho Chehab 55243f6c078SMauro Carvalho Chehab for (i = 0; i < 4; i++) 55343f6c078SMauro Carvalho Chehab { 55443f6c078SMauro Carvalho Chehab cur = *bp++; tmppar = cur; rp4 ^= cur; 55543f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp6 ^= tmppar; 55643f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp4 ^= cur; 55743f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp8 ^= tmppar; 55843f6c078SMauro Carvalho Chehab 55943f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur; 56043f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp6 ^= cur; 56143f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp4 ^= cur; 56243f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp10 ^= tmppar; 56343f6c078SMauro Carvalho Chehab 56443f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur; rp8 ^= cur; 56543f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp6 ^= cur; rp8 ^= cur; 56643f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp8 ^= cur; 56743f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp8 ^= cur; 56843f6c078SMauro Carvalho Chehab 56943f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur; 57043f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp6 ^= cur; 57143f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp4 ^= cur; 57243f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; 57343f6c078SMauro Carvalho Chehab 57443f6c078SMauro Carvalho Chehab par ^= tmppar; 57543f6c078SMauro Carvalho Chehab if ((i & 0x1) == 0) rp12 ^= tmppar; 57643f6c078SMauro Carvalho Chehab if ((i & 0x2) == 0) rp14 ^= tmppar; 57743f6c078SMauro Carvalho Chehab } 57843f6c078SMauro Carvalho Chehab 57943f6c078SMauro Carvalho ChehabAs you can see tmppar is used to accumulate the parity within a for 58043f6c078SMauro Carvalho Chehabiteration. In the last 3 statements is added to par and, if needed, 58143f6c078SMauro Carvalho Chehabto rp12 and rp14. 58243f6c078SMauro Carvalho Chehab 58343f6c078SMauro Carvalho ChehabWhile making the changes I also found that I could exploit that tmppar 58443f6c078SMauro Carvalho Chehabcontains the running parity for this iteration. So instead of having: 58543f6c078SMauro Carvalho Chehabrp4 ^= cur; rp6 ^= cur; 58643f6c078SMauro Carvalho ChehabI removed the rp6 ^= cur; statement and did rp6 ^= tmppar; on next 58743f6c078SMauro Carvalho Chehabstatement. A similar change was done for rp8 and rp10 58843f6c078SMauro Carvalho Chehab 58943f6c078SMauro Carvalho Chehab 59043f6c078SMauro Carvalho ChehabAnalysis 6 59143f6c078SMauro Carvalho Chehab========== 59243f6c078SMauro Carvalho Chehab 59343f6c078SMauro Carvalho ChehabMeasuring this code again showed big gain. When executing the original 59443f6c078SMauro Carvalho Chehablinux code 1 million times, this took about 1 second on my system. 59543f6c078SMauro Carvalho Chehab(using time to measure the performance). After this iteration I was back 59643f6c078SMauro Carvalho Chehabto 0.075 sec. Actually I had to decide to start measuring over 10 59743f6c078SMauro Carvalho Chehabmillion iterations in order not to lose too much accuracy. This one 59843f6c078SMauro Carvalho Chehabdefinitely seemed to be the jackpot! 59943f6c078SMauro Carvalho Chehab 60043f6c078SMauro Carvalho ChehabThere is a little bit more room for improvement though. There are three 60143f6c078SMauro Carvalho Chehabplaces with statements:: 60243f6c078SMauro Carvalho Chehab 60343f6c078SMauro Carvalho Chehab rp4 ^= cur; rp6 ^= cur; 60443f6c078SMauro Carvalho Chehab 60543f6c078SMauro Carvalho ChehabIt seems more efficient to also maintain a variable rp4_6 in the while 60643f6c078SMauro Carvalho Chehabloop; This eliminates 3 statements per loop. Of course after the loop we 60743f6c078SMauro Carvalho Chehabneed to correct by adding:: 60843f6c078SMauro Carvalho Chehab 60943f6c078SMauro Carvalho Chehab rp4 ^= rp4_6; 61043f6c078SMauro Carvalho Chehab rp6 ^= rp4_6 61143f6c078SMauro Carvalho Chehab 61243f6c078SMauro Carvalho ChehabFurthermore there are 4 sequential assignments to rp8. This can be 61343f6c078SMauro Carvalho Chehabencoded slightly more efficiently by saving tmppar before those 4 lines 61443f6c078SMauro Carvalho Chehaband later do rp8 = rp8 ^ tmppar ^ notrp8; 61543f6c078SMauro Carvalho Chehab(where notrp8 is the value of rp8 before those 4 lines). 61643f6c078SMauro Carvalho ChehabAgain a use of the commutative property of xor. 61743f6c078SMauro Carvalho ChehabTime for a new test! 61843f6c078SMauro Carvalho Chehab 61943f6c078SMauro Carvalho Chehab 62043f6c078SMauro Carvalho ChehabAttempt 7 62143f6c078SMauro Carvalho Chehab========= 62243f6c078SMauro Carvalho Chehab 62343f6c078SMauro Carvalho ChehabThe new code now looks like:: 62443f6c078SMauro Carvalho Chehab 62543f6c078SMauro Carvalho Chehab for (i = 0; i < 4; i++) 62643f6c078SMauro Carvalho Chehab { 62743f6c078SMauro Carvalho Chehab cur = *bp++; tmppar = cur; rp4 ^= cur; 62843f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp6 ^= tmppar; 62943f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp4 ^= cur; 63043f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp8 ^= tmppar; 63143f6c078SMauro Carvalho Chehab 63243f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp4_6 ^= cur; 63343f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp6 ^= cur; 63443f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp4 ^= cur; 63543f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp10 ^= tmppar; 63643f6c078SMauro Carvalho Chehab 63743f6c078SMauro Carvalho Chehab notrp8 = tmppar; 63843f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp4_6 ^= cur; 63943f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp6 ^= cur; 64043f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp4 ^= cur; 64143f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; 64243f6c078SMauro Carvalho Chehab rp8 = rp8 ^ tmppar ^ notrp8; 64343f6c078SMauro Carvalho Chehab 64443f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp4_6 ^= cur; 64543f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp6 ^= cur; 64643f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; rp4 ^= cur; 64743f6c078SMauro Carvalho Chehab cur = *bp++; tmppar ^= cur; 64843f6c078SMauro Carvalho Chehab 64943f6c078SMauro Carvalho Chehab par ^= tmppar; 65043f6c078SMauro Carvalho Chehab if ((i & 0x1) == 0) rp12 ^= tmppar; 65143f6c078SMauro Carvalho Chehab if ((i & 0x2) == 0) rp14 ^= tmppar; 65243f6c078SMauro Carvalho Chehab } 65343f6c078SMauro Carvalho Chehab rp4 ^= rp4_6; 65443f6c078SMauro Carvalho Chehab rp6 ^= rp4_6; 65543f6c078SMauro Carvalho Chehab 65643f6c078SMauro Carvalho Chehab 65743f6c078SMauro Carvalho ChehabNot a big change, but every penny counts :-) 65843f6c078SMauro Carvalho Chehab 65943f6c078SMauro Carvalho Chehab 66043f6c078SMauro Carvalho ChehabAnalysis 7 66143f6c078SMauro Carvalho Chehab========== 66243f6c078SMauro Carvalho Chehab 66343f6c078SMauro Carvalho ChehabActually this made things worse. Not very much, but I don't want to move 66443f6c078SMauro Carvalho Chehabinto the wrong direction. Maybe something to investigate later. Could 66543f6c078SMauro Carvalho Chehabhave to do with caching again. 66643f6c078SMauro Carvalho Chehab 66743f6c078SMauro Carvalho ChehabGuess that is what there is to win within the loop. Maybe unrolling one 66843f6c078SMauro Carvalho Chehabmore time will help. I'll keep the optimisations from 7 for now. 66943f6c078SMauro Carvalho Chehab 67043f6c078SMauro Carvalho Chehab 67143f6c078SMauro Carvalho ChehabAttempt 8 67243f6c078SMauro Carvalho Chehab========= 67343f6c078SMauro Carvalho Chehab 67443f6c078SMauro Carvalho ChehabUnrolled the loop one more time. 67543f6c078SMauro Carvalho Chehab 67643f6c078SMauro Carvalho Chehab 67743f6c078SMauro Carvalho ChehabAnalysis 8 67843f6c078SMauro Carvalho Chehab========== 67943f6c078SMauro Carvalho Chehab 68043f6c078SMauro Carvalho ChehabThis makes things worse. Let's stick with attempt 6 and continue from there. 68143f6c078SMauro Carvalho ChehabAlthough it seems that the code within the loop cannot be optimised 68243f6c078SMauro Carvalho Chehabfurther there is still room to optimize the generation of the ecc codes. 68343f6c078SMauro Carvalho ChehabWe can simply calculate the total parity. If this is 0 then rp4 = rp5 68443f6c078SMauro Carvalho Chehabetc. If the parity is 1, then rp4 = !rp5; 68543f6c078SMauro Carvalho Chehab 68643f6c078SMauro Carvalho ChehabBut if rp4 = rp5 we do not need rp5 etc. We can just write the even bits 68743f6c078SMauro Carvalho Chehabin the result byte and then do something like:: 68843f6c078SMauro Carvalho Chehab 68943f6c078SMauro Carvalho Chehab code[0] |= (code[0] << 1); 69043f6c078SMauro Carvalho Chehab 69143f6c078SMauro Carvalho ChehabLets test this. 69243f6c078SMauro Carvalho Chehab 69343f6c078SMauro Carvalho Chehab 69443f6c078SMauro Carvalho ChehabAttempt 9 69543f6c078SMauro Carvalho Chehab========= 69643f6c078SMauro Carvalho Chehab 69743f6c078SMauro Carvalho ChehabChanged the code but again this slightly degrades performance. Tried all 69843f6c078SMauro Carvalho Chehabkind of other things, like having dedicated parity arrays to avoid the 69943f6c078SMauro Carvalho Chehabshift after parity[rp7] << 7; No gain. 70043f6c078SMauro Carvalho ChehabChange the lookup using the parity array by using shift operators (e.g. 70143f6c078SMauro Carvalho Chehabreplace parity[rp7] << 7 with:: 70243f6c078SMauro Carvalho Chehab 70343f6c078SMauro Carvalho Chehab rp7 ^= (rp7 << 4); 70443f6c078SMauro Carvalho Chehab rp7 ^= (rp7 << 2); 70543f6c078SMauro Carvalho Chehab rp7 ^= (rp7 << 1); 70643f6c078SMauro Carvalho Chehab rp7 &= 0x80; 70743f6c078SMauro Carvalho Chehab 70843f6c078SMauro Carvalho ChehabNo gain. 70943f6c078SMauro Carvalho Chehab 71043f6c078SMauro Carvalho ChehabThe only marginal change was inverting the parity bits, so we can remove 71143f6c078SMauro Carvalho Chehabthe last three invert statements. 71243f6c078SMauro Carvalho Chehab 71343f6c078SMauro Carvalho ChehabAh well, pity this does not deliver more. Then again 10 million 71443f6c078SMauro Carvalho Chehabiterations using the linux driver code takes between 13 and 13.5 71543f6c078SMauro Carvalho Chehabseconds, whereas my code now takes about 0.73 seconds for those 10 71643f6c078SMauro Carvalho Chehabmillion iterations. So basically I've improved the performance by a 71743f6c078SMauro Carvalho Chehabfactor 18 on my system. Not that bad. Of course on different hardware 71843f6c078SMauro Carvalho Chehabyou will get different results. No warranties! 71943f6c078SMauro Carvalho Chehab 72043f6c078SMauro Carvalho ChehabBut of course there is no such thing as a free lunch. The codesize almost 72143f6c078SMauro Carvalho Chehabtripled (from 562 bytes to 1434 bytes). Then again, it is not that much. 72243f6c078SMauro Carvalho Chehab 72343f6c078SMauro Carvalho Chehab 72443f6c078SMauro Carvalho ChehabCorrecting errors 72543f6c078SMauro Carvalho Chehab================= 72643f6c078SMauro Carvalho Chehab 72743f6c078SMauro Carvalho ChehabFor correcting errors I again used the ST application note as a starter, 72843f6c078SMauro Carvalho Chehabbut I also peeked at the existing code. 72943f6c078SMauro Carvalho Chehab 73043f6c078SMauro Carvalho ChehabThe algorithm itself is pretty straightforward. Just xor the given and 73143f6c078SMauro Carvalho Chehabthe calculated ecc. If all bytes are 0 there is no problem. If 11 bits 73243f6c078SMauro Carvalho Chehabare 1 we have one correctable bit error. If there is 1 bit 1, we have an 73343f6c078SMauro Carvalho Chehaberror in the given ecc code. 73443f6c078SMauro Carvalho Chehab 73543f6c078SMauro Carvalho ChehabIt proved to be fastest to do some table lookups. Performance gain 73643f6c078SMauro Carvalho Chehabintroduced by this is about a factor 2 on my system when a repair had to 73743f6c078SMauro Carvalho Chehabbe done, and 1% or so if no repair had to be done. 73843f6c078SMauro Carvalho Chehab 73943f6c078SMauro Carvalho ChehabCode size increased from 330 bytes to 686 bytes for this function. 74043f6c078SMauro Carvalho Chehab(gcc 4.2, -O3) 74143f6c078SMauro Carvalho Chehab 74243f6c078SMauro Carvalho Chehab 74343f6c078SMauro Carvalho ChehabConclusion 74443f6c078SMauro Carvalho Chehab========== 74543f6c078SMauro Carvalho Chehab 74643f6c078SMauro Carvalho ChehabThe gain when calculating the ecc is tremendous. Om my development hardware 74743f6c078SMauro Carvalho Chehaba speedup of a factor of 18 for ecc calculation was achieved. On a test on an 74843f6c078SMauro Carvalho Chehabembedded system with a MIPS core a factor 7 was obtained. 74943f6c078SMauro Carvalho Chehab 75043f6c078SMauro Carvalho ChehabOn a test with a Linksys NSLU2 (ARMv5TE processor) the speedup was a factor 75143f6c078SMauro Carvalho Chehab5 (big endian mode, gcc 4.1.2, -O3) 75243f6c078SMauro Carvalho Chehab 75343f6c078SMauro Carvalho ChehabFor correction not much gain could be obtained (as bitflips are rare). Then 75443f6c078SMauro Carvalho Chehabagain there are also much less cycles spent there. 75543f6c078SMauro Carvalho Chehab 75643f6c078SMauro Carvalho ChehabIt seems there is not much more gain possible in this, at least when 75743f6c078SMauro Carvalho Chehabprogrammed in C. Of course it might be possible to squeeze something more 75843f6c078SMauro Carvalho Chehabout of it with an assembler program, but due to pipeline behaviour etc 75943f6c078SMauro Carvalho Chehabthis is very tricky (at least for intel hw). 76043f6c078SMauro Carvalho Chehab 76143f6c078SMauro Carvalho ChehabAuthor: Frans Meulenbroeks 76243f6c078SMauro Carvalho Chehab 76343f6c078SMauro Carvalho ChehabCopyright (C) 2008 Koninklijke Philips Electronics NV. 764