143f6c078SMauro Carvalho Chehab==========================
243f6c078SMauro Carvalho ChehabNAND Error-correction Code
343f6c078SMauro Carvalho Chehab==========================
443f6c078SMauro Carvalho Chehab
543f6c078SMauro Carvalho ChehabIntroduction
643f6c078SMauro Carvalho Chehab============
743f6c078SMauro Carvalho Chehab
8*131ce3edSMiquel RaynalHaving looked at the linux mtd/nand Hamming software ECC engine driver
943f6c078SMauro Carvalho ChehabI felt there was room for optimisation. I bashed the code for a few hours
1043f6c078SMauro Carvalho Chehabperforming tricks like table lookup removing superfluous code etc.
1143f6c078SMauro Carvalho ChehabAfter that the speed was increased by 35-40%.
1243f6c078SMauro Carvalho ChehabStill I was not too happy as I felt there was additional room for improvement.
1343f6c078SMauro Carvalho Chehab
1443f6c078SMauro Carvalho ChehabBad! I was hooked.
1543f6c078SMauro Carvalho ChehabI decided to annotate my steps in this file. Perhaps it is useful to someone
1643f6c078SMauro Carvalho Chehabor someone learns something from it.
1743f6c078SMauro Carvalho Chehab
1843f6c078SMauro Carvalho Chehab
1943f6c078SMauro Carvalho ChehabThe problem
2043f6c078SMauro Carvalho Chehab===========
2143f6c078SMauro Carvalho Chehab
2243f6c078SMauro Carvalho ChehabNAND flash (at least SLC one) typically has sectors of 256 bytes.
2343f6c078SMauro Carvalho ChehabHowever NAND flash is not extremely reliable so some error detection
2443f6c078SMauro Carvalho Chehab(and sometimes correction) is needed.
2543f6c078SMauro Carvalho Chehab
2643f6c078SMauro Carvalho ChehabThis is done by means of a Hamming code. I'll try to explain it in
2743f6c078SMauro Carvalho Chehablaymans terms (and apologies to all the pro's in the field in case I do
2843f6c078SMauro Carvalho Chehabnot use the right terminology, my coding theory class was almost 30
2943f6c078SMauro Carvalho Chehabyears ago, and I must admit it was not one of my favourites).
3043f6c078SMauro Carvalho Chehab
3143f6c078SMauro Carvalho ChehabAs I said before the ecc calculation is performed on sectors of 256
3243f6c078SMauro Carvalho Chehabbytes. This is done by calculating several parity bits over the rows and
3343f6c078SMauro Carvalho Chehabcolumns. The parity used is even parity which means that the parity bit = 1
3443f6c078SMauro Carvalho Chehabif the data over which the parity is calculated is 1 and the parity bit = 0
3543f6c078SMauro Carvalho Chehabif the data over which the parity is calculated is 0. So the total
3643f6c078SMauro Carvalho Chehabnumber of bits over the data over which the parity is calculated + the
3743f6c078SMauro Carvalho Chehabparity bit is even. (see wikipedia if you can't follow this).
3843f6c078SMauro Carvalho ChehabParity is often calculated by means of an exclusive or operation,
3943f6c078SMauro Carvalho Chehabsometimes also referred to as xor. In C the operator for xor is ^
4043f6c078SMauro Carvalho Chehab
4143f6c078SMauro Carvalho ChehabBack to ecc.
4243f6c078SMauro Carvalho ChehabLet's give a small figure:
4343f6c078SMauro Carvalho Chehab
4443f6c078SMauro Carvalho Chehab=========  ==== ==== ==== ==== ==== ==== ==== ====   === === === === ====
4543f6c078SMauro Carvalho Chehabbyte   0:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp2 rp4 ... rp14
4643f6c078SMauro Carvalho Chehabbyte   1:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp1 rp2 rp4 ... rp14
4743f6c078SMauro Carvalho Chehabbyte   2:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp3 rp4 ... rp14
4843f6c078SMauro Carvalho Chehabbyte   3:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp1 rp3 rp4 ... rp14
4943f6c078SMauro Carvalho Chehabbyte   4:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp2 rp5 ... rp14
5043f6c078SMauro Carvalho Chehab...
5143f6c078SMauro Carvalho Chehabbyte 254:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp3 rp5 ... rp15
5243f6c078SMauro Carvalho Chehabbyte 255:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp1 rp3 rp5 ... rp15
5343f6c078SMauro Carvalho Chehab           cp1  cp0  cp1  cp0  cp1  cp0  cp1  cp0
5443f6c078SMauro Carvalho Chehab           cp3  cp3  cp2  cp2  cp3  cp3  cp2  cp2
5543f6c078SMauro Carvalho Chehab           cp5  cp5  cp5  cp5  cp4  cp4  cp4  cp4
5643f6c078SMauro Carvalho Chehab=========  ==== ==== ==== ==== ==== ==== ==== ====   === === === === ====
5743f6c078SMauro Carvalho Chehab
5843f6c078SMauro Carvalho ChehabThis figure represents a sector of 256 bytes.
5943f6c078SMauro Carvalho Chehabcp is my abbreviation for column parity, rp for row parity.
6043f6c078SMauro Carvalho Chehab
6143f6c078SMauro Carvalho ChehabLet's start to explain column parity.
6243f6c078SMauro Carvalho Chehab
6343f6c078SMauro Carvalho Chehab- cp0 is the parity that belongs to all bit0, bit2, bit4, bit6.
6443f6c078SMauro Carvalho Chehab
6543f6c078SMauro Carvalho Chehab  so the sum of all bit0, bit2, bit4 and bit6 values + cp0 itself is even.
6643f6c078SMauro Carvalho Chehab
6743f6c078SMauro Carvalho ChehabSimilarly cp1 is the sum of all bit1, bit3, bit5 and bit7.
6843f6c078SMauro Carvalho Chehab
6943f6c078SMauro Carvalho Chehab- cp2 is the parity over bit0, bit1, bit4 and bit5
7043f6c078SMauro Carvalho Chehab- cp3 is the parity over bit2, bit3, bit6 and bit7.
7143f6c078SMauro Carvalho Chehab- cp4 is the parity over bit0, bit1, bit2 and bit3.
7243f6c078SMauro Carvalho Chehab- cp5 is the parity over bit4, bit5, bit6 and bit7.
7343f6c078SMauro Carvalho Chehab
7443f6c078SMauro Carvalho ChehabNote that each of cp0 .. cp5 is exactly one bit.
7543f6c078SMauro Carvalho Chehab
7643f6c078SMauro Carvalho ChehabRow parity actually works almost the same.
7743f6c078SMauro Carvalho Chehab
7843f6c078SMauro Carvalho Chehab- rp0 is the parity of all even bytes (0, 2, 4, 6, ... 252, 254)
7943f6c078SMauro Carvalho Chehab- rp1 is the parity of all odd bytes (1, 3, 5, 7, ..., 253, 255)
8043f6c078SMauro Carvalho Chehab- rp2 is the parity of all bytes 0, 1, 4, 5, 8, 9, ...
8143f6c078SMauro Carvalho Chehab  (so handle two bytes, then skip 2 bytes).
8243f6c078SMauro Carvalho Chehab- rp3 is covers the half rp2 does not cover (bytes 2, 3, 6, 7, 10, 11, ...)
8343f6c078SMauro Carvalho Chehab- for rp4 the rule is cover 4 bytes, skip 4 bytes, cover 4 bytes, skip 4 etc.
8443f6c078SMauro Carvalho Chehab
8543f6c078SMauro Carvalho Chehab  so rp4 calculates parity over bytes 0, 1, 2, 3, 8, 9, 10, 11, 16, ...)
8643f6c078SMauro Carvalho Chehab- and rp5 covers the other half, so bytes 4, 5, 6, 7, 12, 13, 14, 15, 20, ..
8743f6c078SMauro Carvalho Chehab
8843f6c078SMauro Carvalho ChehabThe story now becomes quite boring. I guess you get the idea.
8943f6c078SMauro Carvalho Chehab
9043f6c078SMauro Carvalho Chehab- rp6 covers 8 bytes then skips 8 etc
9143f6c078SMauro Carvalho Chehab- rp7 skips 8 bytes then covers 8 etc
9243f6c078SMauro Carvalho Chehab- rp8 covers 16 bytes then skips 16 etc
9343f6c078SMauro Carvalho Chehab- rp9 skips 16 bytes then covers 16 etc
9443f6c078SMauro Carvalho Chehab- rp10 covers 32 bytes then skips 32 etc
9543f6c078SMauro Carvalho Chehab- rp11 skips 32 bytes then covers 32 etc
9643f6c078SMauro Carvalho Chehab- rp12 covers 64 bytes then skips 64 etc
9743f6c078SMauro Carvalho Chehab- rp13 skips 64 bytes then covers 64 etc
9843f6c078SMauro Carvalho Chehab- rp14 covers 128 bytes then skips 128
9943f6c078SMauro Carvalho Chehab- rp15 skips 128 bytes then covers 128
10043f6c078SMauro Carvalho Chehab
10143f6c078SMauro Carvalho ChehabIn the end the parity bits are grouped together in three bytes as
10243f6c078SMauro Carvalho Chehabfollows:
10343f6c078SMauro Carvalho Chehab
10443f6c078SMauro Carvalho Chehab=====  ===== ===== ===== ===== ===== ===== ===== =====
10543f6c078SMauro Carvalho ChehabECC    Bit 7 Bit 6 Bit 5 Bit 4 Bit 3 Bit 2 Bit 1 Bit 0
10643f6c078SMauro Carvalho Chehab=====  ===== ===== ===== ===== ===== ===== ===== =====
10743f6c078SMauro Carvalho ChehabECC 0   rp07  rp06  rp05  rp04  rp03  rp02  rp01  rp00
10843f6c078SMauro Carvalho ChehabECC 1   rp15  rp14  rp13  rp12  rp11  rp10  rp09  rp08
10943f6c078SMauro Carvalho ChehabECC 2   cp5   cp4   cp3   cp2   cp1   cp0      1     1
11043f6c078SMauro Carvalho Chehab=====  ===== ===== ===== ===== ===== ===== ===== =====
11143f6c078SMauro Carvalho Chehab
11243f6c078SMauro Carvalho ChehabI detected after writing this that ST application note AN1823
11343f6c078SMauro Carvalho Chehab(http://www.st.com/stonline/) gives a much
11443f6c078SMauro Carvalho Chehabnicer picture.(but they use line parity as term where I use row parity)
11543f6c078SMauro Carvalho ChehabOh well, I'm graphically challenged, so suffer with me for a moment :-)
11643f6c078SMauro Carvalho Chehab
11743f6c078SMauro Carvalho ChehabAnd I could not reuse the ST picture anyway for copyright reasons.
11843f6c078SMauro Carvalho Chehab
11943f6c078SMauro Carvalho Chehab
12043f6c078SMauro Carvalho ChehabAttempt 0
12143f6c078SMauro Carvalho Chehab=========
12243f6c078SMauro Carvalho Chehab
12343f6c078SMauro Carvalho ChehabImplementing the parity calculation is pretty simple.
12443f6c078SMauro Carvalho ChehabIn C pseudocode::
12543f6c078SMauro Carvalho Chehab
12643f6c078SMauro Carvalho Chehab  for (i = 0; i < 256; i++)
12743f6c078SMauro Carvalho Chehab  {
12843f6c078SMauro Carvalho Chehab    if (i & 0x01)
12943f6c078SMauro Carvalho Chehab       rp1 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp1;
13043f6c078SMauro Carvalho Chehab    else
13143f6c078SMauro Carvalho Chehab       rp0 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp0;
13243f6c078SMauro Carvalho Chehab    if (i & 0x02)
13343f6c078SMauro Carvalho Chehab       rp3 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp3;
13443f6c078SMauro Carvalho Chehab    else
13543f6c078SMauro Carvalho Chehab       rp2 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp2;
13643f6c078SMauro Carvalho Chehab    if (i & 0x04)
13743f6c078SMauro Carvalho Chehab      rp5 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp5;
13843f6c078SMauro Carvalho Chehab    else
13943f6c078SMauro Carvalho Chehab      rp4 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp4;
14043f6c078SMauro Carvalho Chehab    if (i & 0x08)
14143f6c078SMauro Carvalho Chehab      rp7 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp7;
14243f6c078SMauro Carvalho Chehab    else
14343f6c078SMauro Carvalho Chehab      rp6 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp6;
14443f6c078SMauro Carvalho Chehab    if (i & 0x10)
14543f6c078SMauro Carvalho Chehab      rp9 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp9;
14643f6c078SMauro Carvalho Chehab    else
14743f6c078SMauro Carvalho Chehab      rp8 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp8;
14843f6c078SMauro Carvalho Chehab    if (i & 0x20)
14943f6c078SMauro Carvalho Chehab      rp11 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp11;
15043f6c078SMauro Carvalho Chehab    else
15143f6c078SMauro Carvalho Chehab      rp10 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp10;
15243f6c078SMauro Carvalho Chehab    if (i & 0x40)
15343f6c078SMauro Carvalho Chehab      rp13 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp13;
15443f6c078SMauro Carvalho Chehab    else
15543f6c078SMauro Carvalho Chehab      rp12 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp12;
15643f6c078SMauro Carvalho Chehab    if (i & 0x80)
15743f6c078SMauro Carvalho Chehab      rp15 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp15;
15843f6c078SMauro Carvalho Chehab    else
15943f6c078SMauro Carvalho Chehab      rp14 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp14;
16043f6c078SMauro Carvalho Chehab    cp0 = bit6 ^ bit4 ^ bit2 ^ bit0 ^ cp0;
16143f6c078SMauro Carvalho Chehab    cp1 = bit7 ^ bit5 ^ bit3 ^ bit1 ^ cp1;
16243f6c078SMauro Carvalho Chehab    cp2 = bit5 ^ bit4 ^ bit1 ^ bit0 ^ cp2;
16343f6c078SMauro Carvalho Chehab    cp3 = bit7 ^ bit6 ^ bit3 ^ bit2 ^ cp3
16443f6c078SMauro Carvalho Chehab    cp4 = bit3 ^ bit2 ^ bit1 ^ bit0 ^ cp4
16543f6c078SMauro Carvalho Chehab    cp5 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ cp5
16643f6c078SMauro Carvalho Chehab  }
16743f6c078SMauro Carvalho Chehab
16843f6c078SMauro Carvalho Chehab
16943f6c078SMauro Carvalho ChehabAnalysis 0
17043f6c078SMauro Carvalho Chehab==========
17143f6c078SMauro Carvalho Chehab
17243f6c078SMauro Carvalho ChehabC does have bitwise operators but not really operators to do the above
17343f6c078SMauro Carvalho Chehabefficiently (and most hardware has no such instructions either).
17443f6c078SMauro Carvalho ChehabTherefore without implementing this it was clear that the code above was
17543f6c078SMauro Carvalho Chehabnot going to bring me a Nobel prize :-)
17643f6c078SMauro Carvalho Chehab
17743f6c078SMauro Carvalho ChehabFortunately the exclusive or operation is commutative, so we can combine
17843f6c078SMauro Carvalho Chehabthe values in any order. So instead of calculating all the bits
17943f6c078SMauro Carvalho Chehabindividually, let us try to rearrange things.
18043f6c078SMauro Carvalho ChehabFor the column parity this is easy. We can just xor the bytes and in the
18143f6c078SMauro Carvalho Chehabend filter out the relevant bits. This is pretty nice as it will bring
18243f6c078SMauro Carvalho Chehaball cp calculation out of the for loop.
18343f6c078SMauro Carvalho Chehab
18443f6c078SMauro Carvalho ChehabSimilarly we can first xor the bytes for the various rows.
18543f6c078SMauro Carvalho ChehabThis leads to:
18643f6c078SMauro Carvalho Chehab
18743f6c078SMauro Carvalho Chehab
18843f6c078SMauro Carvalho ChehabAttempt 1
18943f6c078SMauro Carvalho Chehab=========
19043f6c078SMauro Carvalho Chehab
19143f6c078SMauro Carvalho Chehab::
19243f6c078SMauro Carvalho Chehab
19343f6c078SMauro Carvalho Chehab  const char parity[256] = {
19443f6c078SMauro Carvalho Chehab      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
19543f6c078SMauro Carvalho Chehab      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
19643f6c078SMauro Carvalho Chehab      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
19743f6c078SMauro Carvalho Chehab      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
19843f6c078SMauro Carvalho Chehab      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
19943f6c078SMauro Carvalho Chehab      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
20043f6c078SMauro Carvalho Chehab      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
20143f6c078SMauro Carvalho Chehab      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
20243f6c078SMauro Carvalho Chehab      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
20343f6c078SMauro Carvalho Chehab      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
20443f6c078SMauro Carvalho Chehab      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
20543f6c078SMauro Carvalho Chehab      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
20643f6c078SMauro Carvalho Chehab      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
20743f6c078SMauro Carvalho Chehab      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
20843f6c078SMauro Carvalho Chehab      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
20943f6c078SMauro Carvalho Chehab      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0
21043f6c078SMauro Carvalho Chehab  };
21143f6c078SMauro Carvalho Chehab
21243f6c078SMauro Carvalho Chehab  void ecc1(const unsigned char *buf, unsigned char *code)
21343f6c078SMauro Carvalho Chehab  {
21443f6c078SMauro Carvalho Chehab      int i;
21543f6c078SMauro Carvalho Chehab      const unsigned char *bp = buf;
21643f6c078SMauro Carvalho Chehab      unsigned char cur;
21743f6c078SMauro Carvalho Chehab      unsigned char rp0, rp1, rp2, rp3, rp4, rp5, rp6, rp7;
21843f6c078SMauro Carvalho Chehab      unsigned char rp8, rp9, rp10, rp11, rp12, rp13, rp14, rp15;
21943f6c078SMauro Carvalho Chehab      unsigned char par;
22043f6c078SMauro Carvalho Chehab
22143f6c078SMauro Carvalho Chehab      par = 0;
22243f6c078SMauro Carvalho Chehab      rp0 = 0; rp1 = 0; rp2 = 0; rp3 = 0;
22343f6c078SMauro Carvalho Chehab      rp4 = 0; rp5 = 0; rp6 = 0; rp7 = 0;
22443f6c078SMauro Carvalho Chehab      rp8 = 0; rp9 = 0; rp10 = 0; rp11 = 0;
22543f6c078SMauro Carvalho Chehab      rp12 = 0; rp13 = 0; rp14 = 0; rp15 = 0;
22643f6c078SMauro Carvalho Chehab
22743f6c078SMauro Carvalho Chehab      for (i = 0; i < 256; i++)
22843f6c078SMauro Carvalho Chehab      {
22943f6c078SMauro Carvalho Chehab          cur = *bp++;
23043f6c078SMauro Carvalho Chehab          par ^= cur;
23143f6c078SMauro Carvalho Chehab          if (i & 0x01) rp1 ^= cur; else rp0 ^= cur;
23243f6c078SMauro Carvalho Chehab          if (i & 0x02) rp3 ^= cur; else rp2 ^= cur;
23343f6c078SMauro Carvalho Chehab          if (i & 0x04) rp5 ^= cur; else rp4 ^= cur;
23443f6c078SMauro Carvalho Chehab          if (i & 0x08) rp7 ^= cur; else rp6 ^= cur;
23543f6c078SMauro Carvalho Chehab          if (i & 0x10) rp9 ^= cur; else rp8 ^= cur;
23643f6c078SMauro Carvalho Chehab          if (i & 0x20) rp11 ^= cur; else rp10 ^= cur;
23743f6c078SMauro Carvalho Chehab          if (i & 0x40) rp13 ^= cur; else rp12 ^= cur;
23843f6c078SMauro Carvalho Chehab          if (i & 0x80) rp15 ^= cur; else rp14 ^= cur;
23943f6c078SMauro Carvalho Chehab      }
24043f6c078SMauro Carvalho Chehab      code[0] =
24143f6c078SMauro Carvalho Chehab          (parity[rp7] << 7) |
24243f6c078SMauro Carvalho Chehab          (parity[rp6] << 6) |
24343f6c078SMauro Carvalho Chehab          (parity[rp5] << 5) |
24443f6c078SMauro Carvalho Chehab          (parity[rp4] << 4) |
24543f6c078SMauro Carvalho Chehab          (parity[rp3] << 3) |
24643f6c078SMauro Carvalho Chehab          (parity[rp2] << 2) |
24743f6c078SMauro Carvalho Chehab          (parity[rp1] << 1) |
24843f6c078SMauro Carvalho Chehab          (parity[rp0]);
24943f6c078SMauro Carvalho Chehab      code[1] =
25043f6c078SMauro Carvalho Chehab          (parity[rp15] << 7) |
25143f6c078SMauro Carvalho Chehab          (parity[rp14] << 6) |
25243f6c078SMauro Carvalho Chehab          (parity[rp13] << 5) |
25343f6c078SMauro Carvalho Chehab          (parity[rp12] << 4) |
25443f6c078SMauro Carvalho Chehab          (parity[rp11] << 3) |
25543f6c078SMauro Carvalho Chehab          (parity[rp10] << 2) |
25643f6c078SMauro Carvalho Chehab          (parity[rp9]  << 1) |
25743f6c078SMauro Carvalho Chehab          (parity[rp8]);
25843f6c078SMauro Carvalho Chehab      code[2] =
25943f6c078SMauro Carvalho Chehab          (parity[par & 0xf0] << 7) |
26043f6c078SMauro Carvalho Chehab          (parity[par & 0x0f] << 6) |
26143f6c078SMauro Carvalho Chehab          (parity[par & 0xcc] << 5) |
26243f6c078SMauro Carvalho Chehab          (parity[par & 0x33] << 4) |
26343f6c078SMauro Carvalho Chehab          (parity[par & 0xaa] << 3) |
26443f6c078SMauro Carvalho Chehab          (parity[par & 0x55] << 2);
26543f6c078SMauro Carvalho Chehab      code[0] = ~code[0];
26643f6c078SMauro Carvalho Chehab      code[1] = ~code[1];
26743f6c078SMauro Carvalho Chehab      code[2] = ~code[2];
26843f6c078SMauro Carvalho Chehab  }
26943f6c078SMauro Carvalho Chehab
27043f6c078SMauro Carvalho ChehabStill pretty straightforward. The last three invert statements are there to
27143f6c078SMauro Carvalho Chehabgive a checksum of 0xff 0xff 0xff for an empty flash. In an empty flash
27243f6c078SMauro Carvalho Chehaball data is 0xff, so the checksum then matches.
27343f6c078SMauro Carvalho Chehab
27443f6c078SMauro Carvalho ChehabI also introduced the parity lookup. I expected this to be the fastest
27543f6c078SMauro Carvalho Chehabway to calculate the parity, but I will investigate alternatives later
27643f6c078SMauro Carvalho Chehabon.
27743f6c078SMauro Carvalho Chehab
27843f6c078SMauro Carvalho Chehab
27943f6c078SMauro Carvalho ChehabAnalysis 1
28043f6c078SMauro Carvalho Chehab==========
28143f6c078SMauro Carvalho Chehab
28243f6c078SMauro Carvalho ChehabThe code works, but is not terribly efficient. On my system it took
28343f6c078SMauro Carvalho Chehabalmost 4 times as much time as the linux driver code. But hey, if it was
28443f6c078SMauro Carvalho Chehab*that* easy this would have been done long before.
28543f6c078SMauro Carvalho ChehabNo pain. no gain.
28643f6c078SMauro Carvalho Chehab
28743f6c078SMauro Carvalho ChehabFortunately there is plenty of room for improvement.
28843f6c078SMauro Carvalho Chehab
28943f6c078SMauro Carvalho ChehabIn step 1 we moved from bit-wise calculation to byte-wise calculation.
29043f6c078SMauro Carvalho ChehabHowever in C we can also use the unsigned long data type and virtually
29143f6c078SMauro Carvalho Chehabevery modern microprocessor supports 32 bit operations, so why not try
29243f6c078SMauro Carvalho Chehabto write our code in such a way that we process data in 32 bit chunks.
29343f6c078SMauro Carvalho Chehab
29443f6c078SMauro Carvalho ChehabOf course this means some modification as the row parity is byte by
29543f6c078SMauro Carvalho Chehabbyte. A quick analysis:
29643f6c078SMauro Carvalho Chehabfor the column parity we use the par variable. When extending to 32 bits
29743f6c078SMauro Carvalho Chehabwe can in the end easily calculate rp0 and rp1 from it.
29843f6c078SMauro Carvalho Chehab(because par now consists of 4 bytes, contributing to rp1, rp0, rp1, rp0
29943f6c078SMauro Carvalho Chehabrespectively, from MSB to LSB)
30043f6c078SMauro Carvalho Chehabalso rp2 and rp3 can be easily retrieved from par as rp3 covers the
30143f6c078SMauro Carvalho Chehabfirst two MSBs and rp2 covers the last two LSBs.
30243f6c078SMauro Carvalho Chehab
30343f6c078SMauro Carvalho ChehabNote that of course now the loop is executed only 64 times (256/4).
30443f6c078SMauro Carvalho ChehabAnd note that care must taken wrt byte ordering. The way bytes are
30543f6c078SMauro Carvalho Chehabordered in a long is machine dependent, and might affect us.
30643f6c078SMauro Carvalho ChehabAnyway, if there is an issue: this code is developed on x86 (to be
30743f6c078SMauro Carvalho Chehabprecise: a DELL PC with a D920 Intel CPU)
30843f6c078SMauro Carvalho Chehab
30943f6c078SMauro Carvalho ChehabAnd of course the performance might depend on alignment, but I expect
31043f6c078SMauro Carvalho Chehabthat the I/O buffers in the nand driver are aligned properly (and
31143f6c078SMauro Carvalho Chehabotherwise that should be fixed to get maximum performance).
31243f6c078SMauro Carvalho Chehab
31343f6c078SMauro Carvalho ChehabLet's give it a try...
31443f6c078SMauro Carvalho Chehab
31543f6c078SMauro Carvalho Chehab
31643f6c078SMauro Carvalho ChehabAttempt 2
31743f6c078SMauro Carvalho Chehab=========
31843f6c078SMauro Carvalho Chehab
31943f6c078SMauro Carvalho Chehab::
32043f6c078SMauro Carvalho Chehab
32143f6c078SMauro Carvalho Chehab  extern const char parity[256];
32243f6c078SMauro Carvalho Chehab
32343f6c078SMauro Carvalho Chehab  void ecc2(const unsigned char *buf, unsigned char *code)
32443f6c078SMauro Carvalho Chehab  {
32543f6c078SMauro Carvalho Chehab      int i;
32643f6c078SMauro Carvalho Chehab      const unsigned long *bp = (unsigned long *)buf;
32743f6c078SMauro Carvalho Chehab      unsigned long cur;
32843f6c078SMauro Carvalho Chehab      unsigned long rp0, rp1, rp2, rp3, rp4, rp5, rp6, rp7;
32943f6c078SMauro Carvalho Chehab      unsigned long rp8, rp9, rp10, rp11, rp12, rp13, rp14, rp15;
33043f6c078SMauro Carvalho Chehab      unsigned long par;
33143f6c078SMauro Carvalho Chehab
33243f6c078SMauro Carvalho Chehab      par = 0;
33343f6c078SMauro Carvalho Chehab      rp0 = 0; rp1 = 0; rp2 = 0; rp3 = 0;
33443f6c078SMauro Carvalho Chehab      rp4 = 0; rp5 = 0; rp6 = 0; rp7 = 0;
33543f6c078SMauro Carvalho Chehab      rp8 = 0; rp9 = 0; rp10 = 0; rp11 = 0;
33643f6c078SMauro Carvalho Chehab      rp12 = 0; rp13 = 0; rp14 = 0; rp15 = 0;
33743f6c078SMauro Carvalho Chehab
33843f6c078SMauro Carvalho Chehab      for (i = 0; i < 64; i++)
33943f6c078SMauro Carvalho Chehab      {
34043f6c078SMauro Carvalho Chehab          cur = *bp++;
34143f6c078SMauro Carvalho Chehab          par ^= cur;
34243f6c078SMauro Carvalho Chehab          if (i & 0x01) rp5 ^= cur; else rp4 ^= cur;
34343f6c078SMauro Carvalho Chehab          if (i & 0x02) rp7 ^= cur; else rp6 ^= cur;
34443f6c078SMauro Carvalho Chehab          if (i & 0x04) rp9 ^= cur; else rp8 ^= cur;
34543f6c078SMauro Carvalho Chehab          if (i & 0x08) rp11 ^= cur; else rp10 ^= cur;
34643f6c078SMauro Carvalho Chehab          if (i & 0x10) rp13 ^= cur; else rp12 ^= cur;
34743f6c078SMauro Carvalho Chehab          if (i & 0x20) rp15 ^= cur; else rp14 ^= cur;
34843f6c078SMauro Carvalho Chehab      }
34943f6c078SMauro Carvalho Chehab      /*
35043f6c078SMauro Carvalho Chehab         we need to adapt the code generation for the fact that rp vars are now
35143f6c078SMauro Carvalho Chehab         long; also the column parity calculation needs to be changed.
35243f6c078SMauro Carvalho Chehab         we'll bring rp4 to 15 back to single byte entities by shifting and
35343f6c078SMauro Carvalho Chehab         xoring
35443f6c078SMauro Carvalho Chehab      */
35543f6c078SMauro Carvalho Chehab      rp4 ^= (rp4 >> 16); rp4 ^= (rp4 >> 8); rp4 &= 0xff;
35643f6c078SMauro Carvalho Chehab      rp5 ^= (rp5 >> 16); rp5 ^= (rp5 >> 8); rp5 &= 0xff;
35743f6c078SMauro Carvalho Chehab      rp6 ^= (rp6 >> 16); rp6 ^= (rp6 >> 8); rp6 &= 0xff;
35843f6c078SMauro Carvalho Chehab      rp7 ^= (rp7 >> 16); rp7 ^= (rp7 >> 8); rp7 &= 0xff;
35943f6c078SMauro Carvalho Chehab      rp8 ^= (rp8 >> 16); rp8 ^= (rp8 >> 8); rp8 &= 0xff;
36043f6c078SMauro Carvalho Chehab      rp9 ^= (rp9 >> 16); rp9 ^= (rp9 >> 8); rp9 &= 0xff;
36143f6c078SMauro Carvalho Chehab      rp10 ^= (rp10 >> 16); rp10 ^= (rp10 >> 8); rp10 &= 0xff;
36243f6c078SMauro Carvalho Chehab      rp11 ^= (rp11 >> 16); rp11 ^= (rp11 >> 8); rp11 &= 0xff;
36343f6c078SMauro Carvalho Chehab      rp12 ^= (rp12 >> 16); rp12 ^= (rp12 >> 8); rp12 &= 0xff;
36443f6c078SMauro Carvalho Chehab      rp13 ^= (rp13 >> 16); rp13 ^= (rp13 >> 8); rp13 &= 0xff;
36543f6c078SMauro Carvalho Chehab      rp14 ^= (rp14 >> 16); rp14 ^= (rp14 >> 8); rp14 &= 0xff;
36643f6c078SMauro Carvalho Chehab      rp15 ^= (rp15 >> 16); rp15 ^= (rp15 >> 8); rp15 &= 0xff;
36743f6c078SMauro Carvalho Chehab      rp3 = (par >> 16); rp3 ^= (rp3 >> 8); rp3 &= 0xff;
36843f6c078SMauro Carvalho Chehab      rp2 = par & 0xffff; rp2 ^= (rp2 >> 8); rp2 &= 0xff;
36943f6c078SMauro Carvalho Chehab      par ^= (par >> 16);
37043f6c078SMauro Carvalho Chehab      rp1 = (par >> 8); rp1 &= 0xff;
37143f6c078SMauro Carvalho Chehab      rp0 = (par & 0xff);
37243f6c078SMauro Carvalho Chehab      par ^= (par >> 8); par &= 0xff;
37343f6c078SMauro Carvalho Chehab
37443f6c078SMauro Carvalho Chehab      code[0] =
37543f6c078SMauro Carvalho Chehab          (parity[rp7] << 7) |
37643f6c078SMauro Carvalho Chehab          (parity[rp6] << 6) |
37743f6c078SMauro Carvalho Chehab          (parity[rp5] << 5) |
37843f6c078SMauro Carvalho Chehab          (parity[rp4] << 4) |
37943f6c078SMauro Carvalho Chehab          (parity[rp3] << 3) |
38043f6c078SMauro Carvalho Chehab          (parity[rp2] << 2) |
38143f6c078SMauro Carvalho Chehab          (parity[rp1] << 1) |
38243f6c078SMauro Carvalho Chehab          (parity[rp0]);
38343f6c078SMauro Carvalho Chehab      code[1] =
38443f6c078SMauro Carvalho Chehab          (parity[rp15] << 7) |
38543f6c078SMauro Carvalho Chehab          (parity[rp14] << 6) |
38643f6c078SMauro Carvalho Chehab          (parity[rp13] << 5) |
38743f6c078SMauro Carvalho Chehab          (parity[rp12] << 4) |
38843f6c078SMauro Carvalho Chehab          (parity[rp11] << 3) |
38943f6c078SMauro Carvalho Chehab          (parity[rp10] << 2) |
39043f6c078SMauro Carvalho Chehab          (parity[rp9]  << 1) |
39143f6c078SMauro Carvalho Chehab          (parity[rp8]);
39243f6c078SMauro Carvalho Chehab      code[2] =
39343f6c078SMauro Carvalho Chehab          (parity[par & 0xf0] << 7) |
39443f6c078SMauro Carvalho Chehab          (parity[par & 0x0f] << 6) |
39543f6c078SMauro Carvalho Chehab          (parity[par & 0xcc] << 5) |
39643f6c078SMauro Carvalho Chehab          (parity[par & 0x33] << 4) |
39743f6c078SMauro Carvalho Chehab          (parity[par & 0xaa] << 3) |
39843f6c078SMauro Carvalho Chehab          (parity[par & 0x55] << 2);
39943f6c078SMauro Carvalho Chehab      code[0] = ~code[0];
40043f6c078SMauro Carvalho Chehab      code[1] = ~code[1];
40143f6c078SMauro Carvalho Chehab      code[2] = ~code[2];
40243f6c078SMauro Carvalho Chehab  }
40343f6c078SMauro Carvalho Chehab
40443f6c078SMauro Carvalho ChehabThe parity array is not shown any more. Note also that for these
40543f6c078SMauro Carvalho Chehabexamples I kinda deviated from my regular programming style by allowing
40643f6c078SMauro Carvalho Chehabmultiple statements on a line, not using { } in then and else blocks
40743f6c078SMauro Carvalho Chehabwith only a single statement and by using operators like ^=
40843f6c078SMauro Carvalho Chehab
40943f6c078SMauro Carvalho Chehab
41043f6c078SMauro Carvalho ChehabAnalysis 2
41143f6c078SMauro Carvalho Chehab==========
41243f6c078SMauro Carvalho Chehab
41343f6c078SMauro Carvalho ChehabThe code (of course) works, and hurray: we are a little bit faster than
41443f6c078SMauro Carvalho Chehabthe linux driver code (about 15%). But wait, don't cheer too quickly.
41543f6c078SMauro Carvalho ChehabThere is more to be gained.
41643f6c078SMauro Carvalho ChehabIf we look at e.g. rp14 and rp15 we see that we either xor our data with
41743f6c078SMauro Carvalho Chehabrp14 or with rp15. However we also have par which goes over all data.
41843f6c078SMauro Carvalho ChehabThis means there is no need to calculate rp14 as it can be calculated from
41943f6c078SMauro Carvalho Chehabrp15 through rp14 = par ^ rp15, because par = rp14 ^ rp15;
42043f6c078SMauro Carvalho Chehab(or if desired we can avoid calculating rp15 and calculate it from
42143f6c078SMauro Carvalho Chehabrp14).  That is why some places refer to inverse parity.
42243f6c078SMauro Carvalho ChehabOf course the same thing holds for rp4/5, rp6/7, rp8/9, rp10/11 and rp12/13.
42343f6c078SMauro Carvalho ChehabEffectively this means we can eliminate the else clause from the if
42443f6c078SMauro Carvalho Chehabstatements. Also we can optimise the calculation in the end a little bit
42543f6c078SMauro Carvalho Chehabby going from long to byte first. Actually we can even avoid the table
42643f6c078SMauro Carvalho Chehablookups
42743f6c078SMauro Carvalho Chehab
42843f6c078SMauro Carvalho ChehabAttempt 3
42943f6c078SMauro Carvalho Chehab=========
43043f6c078SMauro Carvalho Chehab
43143f6c078SMauro Carvalho ChehabOdd replaced::
43243f6c078SMauro Carvalho Chehab
43343f6c078SMauro Carvalho Chehab          if (i & 0x01) rp5 ^= cur; else rp4 ^= cur;
43443f6c078SMauro Carvalho Chehab          if (i & 0x02) rp7 ^= cur; else rp6 ^= cur;
43543f6c078SMauro Carvalho Chehab          if (i & 0x04) rp9 ^= cur; else rp8 ^= cur;
43643f6c078SMauro Carvalho Chehab          if (i & 0x08) rp11 ^= cur; else rp10 ^= cur;
43743f6c078SMauro Carvalho Chehab          if (i & 0x10) rp13 ^= cur; else rp12 ^= cur;
43843f6c078SMauro Carvalho Chehab          if (i & 0x20) rp15 ^= cur; else rp14 ^= cur;
43943f6c078SMauro Carvalho Chehab
44043f6c078SMauro Carvalho Chehabwith::
44143f6c078SMauro Carvalho Chehab
44243f6c078SMauro Carvalho Chehab          if (i & 0x01) rp5 ^= cur;
44343f6c078SMauro Carvalho Chehab          if (i & 0x02) rp7 ^= cur;
44443f6c078SMauro Carvalho Chehab          if (i & 0x04) rp9 ^= cur;
44543f6c078SMauro Carvalho Chehab          if (i & 0x08) rp11 ^= cur;
44643f6c078SMauro Carvalho Chehab          if (i & 0x10) rp13 ^= cur;
44743f6c078SMauro Carvalho Chehab          if (i & 0x20) rp15 ^= cur;
44843f6c078SMauro Carvalho Chehab
44943f6c078SMauro Carvalho Chehaband outside the loop added::
45043f6c078SMauro Carvalho Chehab
45143f6c078SMauro Carvalho Chehab          rp4  = par ^ rp5;
45243f6c078SMauro Carvalho Chehab          rp6  = par ^ rp7;
45343f6c078SMauro Carvalho Chehab          rp8  = par ^ rp9;
45443f6c078SMauro Carvalho Chehab          rp10  = par ^ rp11;
45543f6c078SMauro Carvalho Chehab          rp12  = par ^ rp13;
45643f6c078SMauro Carvalho Chehab          rp14  = par ^ rp15;
45743f6c078SMauro Carvalho Chehab
45843f6c078SMauro Carvalho ChehabAnd after that the code takes about 30% more time, although the number of
45943f6c078SMauro Carvalho Chehabstatements is reduced. This is also reflected in the assembly code.
46043f6c078SMauro Carvalho Chehab
46143f6c078SMauro Carvalho Chehab
46243f6c078SMauro Carvalho ChehabAnalysis 3
46343f6c078SMauro Carvalho Chehab==========
46443f6c078SMauro Carvalho Chehab
46543f6c078SMauro Carvalho ChehabVery weird. Guess it has to do with caching or instruction parallellism
46643f6c078SMauro Carvalho Chehabor so. I also tried on an eeePC (Celeron, clocked at 900 Mhz). Interesting
46743f6c078SMauro Carvalho Chehabobservation was that this one is only 30% slower (according to time)
46843f6c078SMauro Carvalho Chehabexecuting the code as my 3Ghz D920 processor.
46943f6c078SMauro Carvalho Chehab
47043f6c078SMauro Carvalho ChehabWell, it was expected not to be easy so maybe instead move to a
47143f6c078SMauro Carvalho Chehabdifferent track: let's move back to the code from attempt2 and do some
47243f6c078SMauro Carvalho Chehabloop unrolling. This will eliminate a few if statements. I'll try
47343f6c078SMauro Carvalho Chehabdifferent amounts of unrolling to see what works best.
47443f6c078SMauro Carvalho Chehab
47543f6c078SMauro Carvalho Chehab
47643f6c078SMauro Carvalho ChehabAttempt 4
47743f6c078SMauro Carvalho Chehab=========
47843f6c078SMauro Carvalho Chehab
47943f6c078SMauro Carvalho ChehabUnrolled the loop 1, 2, 3 and 4 times.
48043f6c078SMauro Carvalho ChehabFor 4 the code starts with::
48143f6c078SMauro Carvalho Chehab
48243f6c078SMauro Carvalho Chehab    for (i = 0; i < 4; i++)
48343f6c078SMauro Carvalho Chehab    {
48443f6c078SMauro Carvalho Chehab        cur = *bp++;
48543f6c078SMauro Carvalho Chehab        par ^= cur;
48643f6c078SMauro Carvalho Chehab        rp4 ^= cur;
48743f6c078SMauro Carvalho Chehab        rp6 ^= cur;
48843f6c078SMauro Carvalho Chehab        rp8 ^= cur;
48943f6c078SMauro Carvalho Chehab        rp10 ^= cur;
49043f6c078SMauro Carvalho Chehab        if (i & 0x1) rp13 ^= cur; else rp12 ^= cur;
49143f6c078SMauro Carvalho Chehab        if (i & 0x2) rp15 ^= cur; else rp14 ^= cur;
49243f6c078SMauro Carvalho Chehab        cur = *bp++;
49343f6c078SMauro Carvalho Chehab        par ^= cur;
49443f6c078SMauro Carvalho Chehab        rp5 ^= cur;
49543f6c078SMauro Carvalho Chehab        rp6 ^= cur;
49643f6c078SMauro Carvalho Chehab        ...
49743f6c078SMauro Carvalho Chehab
49843f6c078SMauro Carvalho Chehab
49943f6c078SMauro Carvalho ChehabAnalysis 4
50043f6c078SMauro Carvalho Chehab==========
50143f6c078SMauro Carvalho Chehab
50243f6c078SMauro Carvalho ChehabUnrolling once gains about 15%
50343f6c078SMauro Carvalho Chehab
50443f6c078SMauro Carvalho ChehabUnrolling twice keeps the gain at about 15%
50543f6c078SMauro Carvalho Chehab
50643f6c078SMauro Carvalho ChehabUnrolling three times gives a gain of 30% compared to attempt 2.
50743f6c078SMauro Carvalho Chehab
50843f6c078SMauro Carvalho ChehabUnrolling four times gives a marginal improvement compared to unrolling
50943f6c078SMauro Carvalho Chehabthree times.
51043f6c078SMauro Carvalho Chehab
51143f6c078SMauro Carvalho ChehabI decided to proceed with a four time unrolled loop anyway. It was my gut
51243f6c078SMauro Carvalho Chehabfeeling that in the next steps I would obtain additional gain from it.
51343f6c078SMauro Carvalho Chehab
51443f6c078SMauro Carvalho ChehabThe next step was triggered by the fact that par contains the xor of all
51543f6c078SMauro Carvalho Chehabbytes and rp4 and rp5 each contain the xor of half of the bytes.
51643f6c078SMauro Carvalho ChehabSo in effect par = rp4 ^ rp5. But as xor is commutative we can also say
51743f6c078SMauro Carvalho Chehabthat rp5 = par ^ rp4. So no need to keep both rp4 and rp5 around. We can
51843f6c078SMauro Carvalho Chehabeliminate rp5 (or rp4, but I already foresaw another optimisation).
51943f6c078SMauro Carvalho ChehabThe same holds for rp6/7, rp8/9, rp10/11 rp12/13 and rp14/15.
52043f6c078SMauro Carvalho Chehab
52143f6c078SMauro Carvalho Chehab
52243f6c078SMauro Carvalho ChehabAttempt 5
52343f6c078SMauro Carvalho Chehab=========
52443f6c078SMauro Carvalho Chehab
52543f6c078SMauro Carvalho ChehabEffectively so all odd digit rp assignments in the loop were removed.
52643f6c078SMauro Carvalho ChehabThis included the else clause of the if statements.
52743f6c078SMauro Carvalho ChehabOf course after the loop we need to correct things by adding code like::
52843f6c078SMauro Carvalho Chehab
52943f6c078SMauro Carvalho Chehab    rp5 = par ^ rp4;
53043f6c078SMauro Carvalho Chehab
53143f6c078SMauro Carvalho ChehabAlso the initial assignments (rp5 = 0; etc) could be removed.
53243f6c078SMauro Carvalho ChehabAlong the line I also removed the initialisation of rp0/1/2/3.
53343f6c078SMauro Carvalho Chehab
53443f6c078SMauro Carvalho Chehab
53543f6c078SMauro Carvalho ChehabAnalysis 5
53643f6c078SMauro Carvalho Chehab==========
53743f6c078SMauro Carvalho Chehab
53843f6c078SMauro Carvalho ChehabMeasurements showed this was a good move. The run-time roughly halved
53943f6c078SMauro Carvalho Chehabcompared with attempt 4 with 4 times unrolled, and we only require 1/3rd
54043f6c078SMauro Carvalho Chehabof the processor time compared to the current code in the linux kernel.
54143f6c078SMauro Carvalho Chehab
54243f6c078SMauro Carvalho ChehabHowever, still I thought there was more. I didn't like all the if
54343f6c078SMauro Carvalho Chehabstatements. Why not keep a running parity and only keep the last if
54443f6c078SMauro Carvalho Chehabstatement. Time for yet another version!
54543f6c078SMauro Carvalho Chehab
54643f6c078SMauro Carvalho Chehab
54743f6c078SMauro Carvalho ChehabAttempt 6
54843f6c078SMauro Carvalho Chehab=========
54943f6c078SMauro Carvalho Chehab
55043f6c078SMauro Carvalho ChehabTHe code within the for loop was changed to::
55143f6c078SMauro Carvalho Chehab
55243f6c078SMauro Carvalho Chehab    for (i = 0; i < 4; i++)
55343f6c078SMauro Carvalho Chehab    {
55443f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar  = cur; rp4 ^= cur;
55543f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp6 ^= tmppar;
55643f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
55743f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp8 ^= tmppar;
55843f6c078SMauro Carvalho Chehab
55943f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur;
56043f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
56143f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
56243f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp10 ^= tmppar;
56343f6c078SMauro Carvalho Chehab
56443f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur; rp8 ^= cur;
56543f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp6 ^= cur; rp8 ^= cur;
56643f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp8 ^= cur;
56743f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp8 ^= cur;
56843f6c078SMauro Carvalho Chehab
56943f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur;
57043f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
57143f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
57243f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur;
57343f6c078SMauro Carvalho Chehab
57443f6c078SMauro Carvalho Chehab        par ^= tmppar;
57543f6c078SMauro Carvalho Chehab        if ((i & 0x1) == 0) rp12 ^= tmppar;
57643f6c078SMauro Carvalho Chehab        if ((i & 0x2) == 0) rp14 ^= tmppar;
57743f6c078SMauro Carvalho Chehab    }
57843f6c078SMauro Carvalho Chehab
57943f6c078SMauro Carvalho ChehabAs you can see tmppar is used to accumulate the parity within a for
58043f6c078SMauro Carvalho Chehabiteration. In the last 3 statements is added to par and, if needed,
58143f6c078SMauro Carvalho Chehabto rp12 and rp14.
58243f6c078SMauro Carvalho Chehab
58343f6c078SMauro Carvalho ChehabWhile making the changes I also found that I could exploit that tmppar
58443f6c078SMauro Carvalho Chehabcontains the running parity for this iteration. So instead of having:
58543f6c078SMauro Carvalho Chehabrp4 ^= cur; rp6 ^= cur;
58643f6c078SMauro Carvalho ChehabI removed the rp6 ^= cur; statement and did rp6 ^= tmppar; on next
58743f6c078SMauro Carvalho Chehabstatement. A similar change was done for rp8 and rp10
58843f6c078SMauro Carvalho Chehab
58943f6c078SMauro Carvalho Chehab
59043f6c078SMauro Carvalho ChehabAnalysis 6
59143f6c078SMauro Carvalho Chehab==========
59243f6c078SMauro Carvalho Chehab
59343f6c078SMauro Carvalho ChehabMeasuring this code again showed big gain. When executing the original
59443f6c078SMauro Carvalho Chehablinux code 1 million times, this took about 1 second on my system.
59543f6c078SMauro Carvalho Chehab(using time to measure the performance). After this iteration I was back
59643f6c078SMauro Carvalho Chehabto 0.075 sec. Actually I had to decide to start measuring over 10
59743f6c078SMauro Carvalho Chehabmillion iterations in order not to lose too much accuracy. This one
59843f6c078SMauro Carvalho Chehabdefinitely seemed to be the jackpot!
59943f6c078SMauro Carvalho Chehab
60043f6c078SMauro Carvalho ChehabThere is a little bit more room for improvement though. There are three
60143f6c078SMauro Carvalho Chehabplaces with statements::
60243f6c078SMauro Carvalho Chehab
60343f6c078SMauro Carvalho Chehab	rp4 ^= cur; rp6 ^= cur;
60443f6c078SMauro Carvalho Chehab
60543f6c078SMauro Carvalho ChehabIt seems more efficient to also maintain a variable rp4_6 in the while
60643f6c078SMauro Carvalho Chehabloop; This eliminates 3 statements per loop. Of course after the loop we
60743f6c078SMauro Carvalho Chehabneed to correct by adding::
60843f6c078SMauro Carvalho Chehab
60943f6c078SMauro Carvalho Chehab	rp4 ^= rp4_6;
61043f6c078SMauro Carvalho Chehab	rp6 ^= rp4_6
61143f6c078SMauro Carvalho Chehab
61243f6c078SMauro Carvalho ChehabFurthermore there are 4 sequential assignments to rp8. This can be
61343f6c078SMauro Carvalho Chehabencoded slightly more efficiently by saving tmppar before those 4 lines
61443f6c078SMauro Carvalho Chehaband later do rp8 = rp8 ^ tmppar ^ notrp8;
61543f6c078SMauro Carvalho Chehab(where notrp8 is the value of rp8 before those 4 lines).
61643f6c078SMauro Carvalho ChehabAgain a use of the commutative property of xor.
61743f6c078SMauro Carvalho ChehabTime for a new test!
61843f6c078SMauro Carvalho Chehab
61943f6c078SMauro Carvalho Chehab
62043f6c078SMauro Carvalho ChehabAttempt 7
62143f6c078SMauro Carvalho Chehab=========
62243f6c078SMauro Carvalho Chehab
62343f6c078SMauro Carvalho ChehabThe new code now looks like::
62443f6c078SMauro Carvalho Chehab
62543f6c078SMauro Carvalho Chehab    for (i = 0; i < 4; i++)
62643f6c078SMauro Carvalho Chehab    {
62743f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar  = cur; rp4 ^= cur;
62843f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp6 ^= tmppar;
62943f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
63043f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp8 ^= tmppar;
63143f6c078SMauro Carvalho Chehab
63243f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;
63343f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
63443f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
63543f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp10 ^= tmppar;
63643f6c078SMauro Carvalho Chehab
63743f6c078SMauro Carvalho Chehab        notrp8 = tmppar;
63843f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;
63943f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
64043f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
64143f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur;
64243f6c078SMauro Carvalho Chehab        rp8 = rp8 ^ tmppar ^ notrp8;
64343f6c078SMauro Carvalho Chehab
64443f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;
64543f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
64643f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
64743f6c078SMauro Carvalho Chehab        cur = *bp++; tmppar ^= cur;
64843f6c078SMauro Carvalho Chehab
64943f6c078SMauro Carvalho Chehab        par ^= tmppar;
65043f6c078SMauro Carvalho Chehab        if ((i & 0x1) == 0) rp12 ^= tmppar;
65143f6c078SMauro Carvalho Chehab        if ((i & 0x2) == 0) rp14 ^= tmppar;
65243f6c078SMauro Carvalho Chehab    }
65343f6c078SMauro Carvalho Chehab    rp4 ^= rp4_6;
65443f6c078SMauro Carvalho Chehab    rp6 ^= rp4_6;
65543f6c078SMauro Carvalho Chehab
65643f6c078SMauro Carvalho Chehab
65743f6c078SMauro Carvalho ChehabNot a big change, but every penny counts :-)
65843f6c078SMauro Carvalho Chehab
65943f6c078SMauro Carvalho Chehab
66043f6c078SMauro Carvalho ChehabAnalysis 7
66143f6c078SMauro Carvalho Chehab==========
66243f6c078SMauro Carvalho Chehab
66343f6c078SMauro Carvalho ChehabActually this made things worse. Not very much, but I don't want to move
66443f6c078SMauro Carvalho Chehabinto the wrong direction. Maybe something to investigate later. Could
66543f6c078SMauro Carvalho Chehabhave to do with caching again.
66643f6c078SMauro Carvalho Chehab
66743f6c078SMauro Carvalho ChehabGuess that is what there is to win within the loop. Maybe unrolling one
66843f6c078SMauro Carvalho Chehabmore time will help. I'll keep the optimisations from 7 for now.
66943f6c078SMauro Carvalho Chehab
67043f6c078SMauro Carvalho Chehab
67143f6c078SMauro Carvalho ChehabAttempt 8
67243f6c078SMauro Carvalho Chehab=========
67343f6c078SMauro Carvalho Chehab
67443f6c078SMauro Carvalho ChehabUnrolled the loop one more time.
67543f6c078SMauro Carvalho Chehab
67643f6c078SMauro Carvalho Chehab
67743f6c078SMauro Carvalho ChehabAnalysis 8
67843f6c078SMauro Carvalho Chehab==========
67943f6c078SMauro Carvalho Chehab
68043f6c078SMauro Carvalho ChehabThis makes things worse. Let's stick with attempt 6 and continue from there.
68143f6c078SMauro Carvalho ChehabAlthough it seems that the code within the loop cannot be optimised
68243f6c078SMauro Carvalho Chehabfurther there is still room to optimize the generation of the ecc codes.
68343f6c078SMauro Carvalho ChehabWe can simply calculate the total parity. If this is 0 then rp4 = rp5
68443f6c078SMauro Carvalho Chehabetc. If the parity is 1, then rp4 = !rp5;
68543f6c078SMauro Carvalho Chehab
68643f6c078SMauro Carvalho ChehabBut if rp4 = rp5 we do not need rp5 etc. We can just write the even bits
68743f6c078SMauro Carvalho Chehabin the result byte and then do something like::
68843f6c078SMauro Carvalho Chehab
68943f6c078SMauro Carvalho Chehab    code[0] |= (code[0] << 1);
69043f6c078SMauro Carvalho Chehab
69143f6c078SMauro Carvalho ChehabLets test this.
69243f6c078SMauro Carvalho Chehab
69343f6c078SMauro Carvalho Chehab
69443f6c078SMauro Carvalho ChehabAttempt 9
69543f6c078SMauro Carvalho Chehab=========
69643f6c078SMauro Carvalho Chehab
69743f6c078SMauro Carvalho ChehabChanged the code but again this slightly degrades performance. Tried all
69843f6c078SMauro Carvalho Chehabkind of other things, like having dedicated parity arrays to avoid the
69943f6c078SMauro Carvalho Chehabshift after parity[rp7] << 7; No gain.
70043f6c078SMauro Carvalho ChehabChange the lookup using the parity array by using shift operators (e.g.
70143f6c078SMauro Carvalho Chehabreplace parity[rp7] << 7 with::
70243f6c078SMauro Carvalho Chehab
70343f6c078SMauro Carvalho Chehab	rp7 ^= (rp7 << 4);
70443f6c078SMauro Carvalho Chehab	rp7 ^= (rp7 << 2);
70543f6c078SMauro Carvalho Chehab	rp7 ^= (rp7 << 1);
70643f6c078SMauro Carvalho Chehab	rp7 &= 0x80;
70743f6c078SMauro Carvalho Chehab
70843f6c078SMauro Carvalho ChehabNo gain.
70943f6c078SMauro Carvalho Chehab
71043f6c078SMauro Carvalho ChehabThe only marginal change was inverting the parity bits, so we can remove
71143f6c078SMauro Carvalho Chehabthe last three invert statements.
71243f6c078SMauro Carvalho Chehab
71343f6c078SMauro Carvalho ChehabAh well, pity this does not deliver more. Then again 10 million
71443f6c078SMauro Carvalho Chehabiterations using the linux driver code takes between 13 and 13.5
71543f6c078SMauro Carvalho Chehabseconds, whereas my code now takes about 0.73 seconds for those 10
71643f6c078SMauro Carvalho Chehabmillion iterations. So basically I've improved the performance by a
71743f6c078SMauro Carvalho Chehabfactor 18 on my system. Not that bad. Of course on different hardware
71843f6c078SMauro Carvalho Chehabyou will get different results. No warranties!
71943f6c078SMauro Carvalho Chehab
72043f6c078SMauro Carvalho ChehabBut of course there is no such thing as a free lunch. The codesize almost
72143f6c078SMauro Carvalho Chehabtripled (from 562 bytes to 1434 bytes). Then again, it is not that much.
72243f6c078SMauro Carvalho Chehab
72343f6c078SMauro Carvalho Chehab
72443f6c078SMauro Carvalho ChehabCorrecting errors
72543f6c078SMauro Carvalho Chehab=================
72643f6c078SMauro Carvalho Chehab
72743f6c078SMauro Carvalho ChehabFor correcting errors I again used the ST application note as a starter,
72843f6c078SMauro Carvalho Chehabbut I also peeked at the existing code.
72943f6c078SMauro Carvalho Chehab
73043f6c078SMauro Carvalho ChehabThe algorithm itself is pretty straightforward. Just xor the given and
73143f6c078SMauro Carvalho Chehabthe calculated ecc. If all bytes are 0 there is no problem. If 11 bits
73243f6c078SMauro Carvalho Chehabare 1 we have one correctable bit error. If there is 1 bit 1, we have an
73343f6c078SMauro Carvalho Chehaberror in the given ecc code.
73443f6c078SMauro Carvalho Chehab
73543f6c078SMauro Carvalho ChehabIt proved to be fastest to do some table lookups. Performance gain
73643f6c078SMauro Carvalho Chehabintroduced by this is about a factor 2 on my system when a repair had to
73743f6c078SMauro Carvalho Chehabbe done, and 1% or so if no repair had to be done.
73843f6c078SMauro Carvalho Chehab
73943f6c078SMauro Carvalho ChehabCode size increased from 330 bytes to 686 bytes for this function.
74043f6c078SMauro Carvalho Chehab(gcc 4.2, -O3)
74143f6c078SMauro Carvalho Chehab
74243f6c078SMauro Carvalho Chehab
74343f6c078SMauro Carvalho ChehabConclusion
74443f6c078SMauro Carvalho Chehab==========
74543f6c078SMauro Carvalho Chehab
74643f6c078SMauro Carvalho ChehabThe gain when calculating the ecc is tremendous. Om my development hardware
74743f6c078SMauro Carvalho Chehaba speedup of a factor of 18 for ecc calculation was achieved. On a test on an
74843f6c078SMauro Carvalho Chehabembedded system with a MIPS core a factor 7 was obtained.
74943f6c078SMauro Carvalho Chehab
75043f6c078SMauro Carvalho ChehabOn a test with a Linksys NSLU2 (ARMv5TE processor) the speedup was a factor
75143f6c078SMauro Carvalho Chehab5 (big endian mode, gcc 4.1.2, -O3)
75243f6c078SMauro Carvalho Chehab
75343f6c078SMauro Carvalho ChehabFor correction not much gain could be obtained (as bitflips are rare). Then
75443f6c078SMauro Carvalho Chehabagain there are also much less cycles spent there.
75543f6c078SMauro Carvalho Chehab
75643f6c078SMauro Carvalho ChehabIt seems there is not much more gain possible in this, at least when
75743f6c078SMauro Carvalho Chehabprogrammed in C. Of course it might be possible to squeeze something more
75843f6c078SMauro Carvalho Chehabout of it with an assembler program, but due to pipeline behaviour etc
75943f6c078SMauro Carvalho Chehabthis is very tricky (at least for intel hw).
76043f6c078SMauro Carvalho Chehab
76143f6c078SMauro Carvalho ChehabAuthor: Frans Meulenbroeks
76243f6c078SMauro Carvalho Chehab
76343f6c078SMauro Carvalho ChehabCopyright (C) 2008 Koninklijke Philips Electronics NV.
764