==================
Partial Parity Log
==================

Partial Parity Log (PPL) is a feature available for RAID5 arrays. The issue
addressed by PPL is that after a dirty shutdown, parity of a particular stripe
may become inconsistent with data on other member disks. If the array is also
in degraded state, there is no way to recalculate parity, because one of the
disks is missing. This can lead to silent data corruption when rebuilding the
array or using it as degraded - data calculated from parity for array blocks
that have not been touched by a write request during the unclean shutdown can
be incorrect. Such a condition is known as the RAID5 Write Hole. Because of
this, md by default does not allow starting a dirty degraded array.

Partial parity for a write operation is the XOR of stripe data chunks not
modified by this write. It is just enough data needed for recovering from the
write hole. XORing partial parity with the modified chunks produces parity for
the stripe, consistent with its state before the write operation, regardless of
which chunk writes have completed. If one of the unmodified data disks of
this stripe is missing, this updated parity can be used to recover its
contents.
PPL recovery is also performed when an array is started after an
unclean shutdown with all disks available, eliminating the need to resync
the array. Because of this, using a write-intent bitmap and PPL together is not
supported.

When handling a write request, PPL writes partial parity before new data and
parity are dispatched to disks. PPL is a distributed log - it is stored on
array member drives in the metadata area, on the parity drive of a particular
stripe. It does not require a dedicated journaling drive. Write performance is
reduced by up to 30%-40%, but it scales with the number of drives in the array,
and there is no journaling drive to become a bottleneck or a single point of
failure.

Unlike raid5-cache, the other solution in md for closing the write hole, PPL is
not a true journal. It does not protect from losing in-flight data, only from
silent data corruption. If a dirty disk of a stripe is lost, no PPL recovery is
performed for this stripe (parity is not updated). So it is possible to have
arbitrary data in the written part of a stripe if that disk is lost. In such a
case the behavior is the same as in plain raid5.

PPL is available for md version-1 metadata and external (specifically IMSM)
metadata arrays. It can be enabled using mdadm option --consistency-policy=ppl.

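The partial parity relation described earlier can be checked with a toy model.
The Python sketch below is only an illustration and not md's implementation:
the helper names (``xor_chunks``, ``partial_parity``, ``ppl_recover_parity``,
``rebuild_chunk``), the 4-byte chunks and the chunk contents are all
hypothetical. It logs partial parity before a write, simulates a crash after
every possible subset of the chunk writes has completed, and then rebuilds a
missing unmodified data chunk once with the stale parity and once with the
parity recomputed from the partial parity.

```python
from itertools import combinations


def xor_chunks(chunks, size):
    """XOR equally sized byte chunks; an empty list gives all zeros."""
    out = bytearray(size)
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)


def partial_parity(stripe, modified):
    """XOR of the data chunks that the write does NOT modify."""
    untouched = [c for i, c in enumerate(stripe) if i not in modified]
    return xor_chunks(untouched, len(stripe[0]))


def ppl_recover_parity(pp, on_disk, modified):
    """Recompute parity from the logged partial parity and whatever the
    modified chunks contain on disk after the crash."""
    return xor_chunks([pp] + [on_disk[i] for i in sorted(modified)], len(pp))


def rebuild_chunk(on_disk, parity, missing):
    """Plain RAID5 reconstruction of one missing data chunk."""
    others = [c for i, c in enumerate(on_disk) if i != missing]
    return xor_chunks(others + [parity], len(parity))


# Toy stripe: four data chunks of 4 bytes each (arbitrary, assumed values).
old = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
new = {0: b"wxyz", 1: b"1234"}   # the write modifies chunks 0 and 1
modified = set(new)
old_parity = xor_chunks(old, 4)  # parity on disk; never updated in this model
missing = 3                      # an unmodified data disk is lost

# Partial parity is logged before new data and parity are dispatched.
pp = partial_parity(old, modified)

# Crash with any subset of the chunk writes completed.
results = []
for n in range(len(modified) + 1):
    for done in combinations(sorted(modified), n):
        on_disk = [new[i] if i in done else c for i, c in enumerate(old)]
        stale = rebuild_chunk(on_disk, old_parity, missing)
        fixed_parity = ppl_recover_parity(pp, on_disk, modified)
        fixed = rebuild_chunk(on_disk, fixed_parity, missing)
        results.append((done, stale == old[missing], fixed == old[missing]))

for done, stale_ok, fixed_ok in results:
    print(done, stale_ok, fixed_ok)
```

In this model the stale parity reconstructs the lost chunk correctly only when
none of the chunk writes reached the disks, while the parity derived from the
partial parity reconstructs it in every case. The model deliberately leaves out
the case where a modified (dirty) disk is lost, for which no PPL recovery is
performed.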

There is a limitation of maximum 64 disks in the array for PPL. This keeps
data structures and implementation simple. RAID5 arrays with so many disks
are unlikely due to the high risk of multiple disk failures. Such a restriction
should not be a real-life limitation.