============
dm-integrity
============

The dm-integrity target emulates a block device that has additional
per-sector tags that can be used for storing integrity information.

A general problem with storing integrity tags with every sector is that
writing the sector and the integrity tag must be atomic - i.e. in case of
a crash, either both the sector and the integrity tag are written, or
neither of them is.

To guarantee write atomicity, the dm-integrity target uses a journal. It
writes sector data and integrity tags into the journal, commits the
journal and then copies the data and integrity tags to their respective
locations.

The dm-integrity target can be used with the dm-crypt target - in this
situation the dm-crypt target creates the integrity data and passes them
to the dm-integrity target via bio_integrity_payload attached to the bio.
In this mode, the dm-crypt and dm-integrity targets provide authenticated
disk encryption - if the attacker modifies the encrypted device, an I/O
error is returned instead of random data.

The dm-integrity target can also be used as a standalone target. In this
mode it calculates and verifies the integrity tag internally, and it can
be used to detect silent data corruption on the disk or in the I/O path.

There's an alternate mode of operation where dm-integrity uses a bitmap
instead of a journal. If a bit in the bitmap is 1, the corresponding
region's data and integrity tags are not synchronized - if the machine
crashes, the unsynchronized regions will be recalculated. The bitmap mode
is faster than the journal mode, because we don't have to write the data
twice, but it is also less reliable, because if data corruption happens
when the machine crashes, it may not be detected.
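As an illustration of the bitmap idea, the following Python sketch models
a dirty-region bitmap in user space. It is a toy model, not the kernel's
implementation: the class name, its methods, the region size of 8 sectors
used below, and the exact points at which bits are set and cleared are all
assumptions made for the example.

```python
from typing import List, Tuple


class DirtyRegionBitmap:
    """Toy user-space model of a dirty-region bitmap (hypothetical names)."""

    def __init__(self, sectors_per_bit: int) -> None:
        self.sectors_per_bit = sectors_per_bit
        self.dirty = set()  # indices of bits that are currently 1

    def mark_write(self, first_sector: int, n_sectors: int) -> None:
        # Assumption: a bit is set for every region that a write touches.
        first_bit = first_sector // self.sectors_per_bit
        last_bit = (first_sector + n_sectors - 1) // self.sectors_per_bit
        self.dirty.update(range(first_bit, last_bit + 1))

    def mark_synchronized(self, bit: int) -> None:
        # Assumption: the bit is cleared once data and tags agree on disk.
        self.dirty.discard(bit)

    def regions_to_recalculate(self) -> List[Tuple[int, int]]:
        # After a crash, tags are recalculated for each dirty region,
        # returned here as (first sector, one-past-last sector) pairs.
        return [(b * self.sectors_per_bit, (b + 1) * self.sectors_per_bit)
                for b in sorted(self.dirty)]


bitmap = DirtyRegionBitmap(sectors_per_bit=8)
bitmap.mark_write(first_sector=6, n_sectors=4)   # touches regions 0 and 1
bitmap.mark_synchronized(0)
print(bitmap.regions_to_recalculate())           # [(8, 16)]
```

A write that lands between setting the bit and synchronizing the region
leaves that region in the list, which is why such regions are
recalculated rather than verified after a crash.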

When loading the target for the first time, the kernel driver will format
the device, but only if the superblock contains zeroes. If the superblock
is neither valid nor zeroed, the dm-integrity target can't be loaded.

Accesses to the on-disk metadata area containing checksums (aka tags) are
buffered using dm-bufio. When an access to any given metadata area
occurs, each unique metadata area gets its own buffer(s). The buffer size
is capped at the size of the metadata area, but may be smaller, thereby
requiring multiple buffers to represent the full metadata area. A smaller
buffer size will produce a smaller resulting read/write operation to the
metadata area for small reads/writes. The metadata is still read even in
a full write to the data covered by a single buffer.

To use the target for the first time:

1. overwrite the superblock with zeroes
2. load the dm-integrity target with a size of one sector; the kernel
   driver will format the device
3. unload the dm-integrity target
4. read the "provided_data_sectors" value from the superblock
5. load the dm-integrity target with the target size
   "provided_data_sectors"
6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target
   with the size "provided_data_sectors"
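The steps above can be sketched as a small Python helper that only builds
the shell commands as strings and does not run them. The function names,
the device path and the mapping name are hypothetical. The sketch assumes
journaled mode with no additional arguments, and it assumes that the 4KiB
superblock starts right after the reserved sectors, as in the on-disk
layout described in this document.

```python
SUPERBLOCK_SECTORS = 8  # the superblock is 4KiB, i.e. eight 512-byte sectors


def format_commands(device, name, reserved_sectors, tag_size):
    """Steps 1-3: zero the superblock, load a one-sector target, unload it."""
    table = f"0 1 integrity {device} {reserved_sectors} {tag_size} J 0"
    return [
        f"dd if=/dev/zero of={device} bs=512 "
        f"seek={reserved_sectors} count={SUPERBLOCK_SECTORS}",
        f"dmsetup create {name} --table '{table}'",
        f"dmsetup remove {name}",
    ]


def activate_command(device, name, reserved_sectors, tag_size,
                     provided_data_sectors):
    """Step 5: load the target with the size read from the superblock."""
    table = (f"0 {provided_data_sectors} integrity {device} "
             f"{reserved_sectors} {tag_size} J 0")
    return f"dmsetup create {name} --table '{table}'"


# /dev/sdX and "integ" are placeholders; 2064384 stands for whatever
# value step 4 reads from the superblock.
for command in format_commands("/dev/sdX", "integ", 0, 4):
    print(command)
print(activate_command("/dev/sdX", "integ", 0, 4, 2064384))
```

Step 4 is deliberately left out of the helper: the value of
"provided_data_sectors" is only known after the driver has formatted the
device, so it is passed in as a parameter.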


Target arguments:

1. the underlying block device

2. the number of reserved sectors at the beginning of the device - the
   dm-integrity target won't read or write these sectors

3. the size of the integrity tag (if "-" is used, the size is taken from
   the internal-hash algorithm)

4. mode:

	D - direct writes (without journal)
		in this mode, journaling is
		not used and data sectors and integrity tags are written
		separately. In case of a crash, it is possible that the
		data and integrity tag don't match.
	J - journaled writes
		data and integrity tags are written to the
		journal and atomicity is guaranteed. In case of a crash,
		either both data and tag or none of them are written. The
		journaled mode halves write throughput because the
		data have to be written twice.
	B - bitmap mode - data and metadata are written without any
		synchronization, the driver maintains a bitmap of dirty
		regions where data and metadata don't match. This mode can
		only be used with internal hash.
	R - recovery mode - in this mode, the journal is not replayed,
		checksums are not checked and writes to the device are not
		allowed. This mode is useful for data recovery if the
		device cannot be activated in any of the other standard
		modes.

5. the number of additional arguments
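To make the argument order concrete, here is a minimal Python sketch that
assembles a device-mapper table line for this target. The function name
and its checks are hypothetical; the leading "start length integrity"
fields are the usual device-mapper table prefix, and the two consistency
checks only encode rules stated in this document.

```python
VALID_MODES = ("D", "J", "B", "R")


def integrity_table(length, device, reserved_sectors, tag_size, mode,
                    options=()):
    """Build a device-mapper table line for the integrity target."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}")
    has_internal_hash = any(o.startswith("internal_hash:") for o in options)
    if tag_size == "-" and not has_internal_hash:
        raise ValueError('tag size "-" needs an internal_hash argument')
    if mode == "B" and not has_internal_hash:
        raise ValueError("bitmap mode can only be used with internal hash")
    # The fifth target argument is the count of additional arguments.
    fields = [0, length, "integrity", device, reserved_sectors, tag_size,
              mode, len(options), *options]
    return " ".join(str(field) for field in fields)


# /dev/sdX is a placeholder device path.
print(integrity_table(2064384, "/dev/sdX", 0, "-", "J",
                      ["internal_hash:crc32c", "commit_time:5000"]))
```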

Additional arguments:

journal_sectors:number
	The size of the journal. This argument is used only when formatting
	the device. If the device is already formatted, the value from the
	superblock is used.

interleave_sectors:number (default 32768)
	The number of interleaved sectors. This value is rounded down to
	a power of two. If the device is already formatted, the value from
	the superblock is used.

meta_device:device
	Don't interleave the data and metadata on the device. Use a
	separate device for metadata.

buffer_sectors:number (default 128)
	The number of sectors in one metadata buffer. The value is rounded
	down to a power of two.

journal_watermark:number (default 50)
	The journal watermark in percent. When the size of the journal
	exceeds this watermark, the thread that flushes the journal will
	be started.

commit_time:number (default 10000)
	Commit time in milliseconds. When this time passes, the journal is
	written. The journal is also written immediately if a FLUSH
	request is received.

internal_hash:algorithm(:key)	(the key is optional)
	Use internal hash or crc.
	When this argument is used, the dm-integrity target won't accept
	integrity tags from the upper target, but it will automatically
	generate and verify the integrity tags.

	You can use a crc algorithm (such as crc32), then the integrity
	target will protect the data against accidental corruption.
	You can also use a hmac algorithm (for example
	"hmac(sha256):0123456789abcdef"), in this mode it will provide
	cryptographic authentication of the data without encryption.

	When this argument is not used, the integrity tags are accepted
	from an upper layer target, such as dm-crypt. The upper layer
	target should check the validity of the integrity tags.
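The difference between a crc and a keyed hmac can be illustrated in user
space with the Python standard library. This is only a sketch of the
concept: the helper names are hypothetical, and the input that the kernel
actually feeds to the hash, as well as the byte order in which a tag is
stored, are not reproduced here.

```python
import hashlib
import hmac
import zlib


def crc_tag(data: bytes) -> bytes:
    """4-byte CRC-32 tag: anyone can recompute it after changing the data."""
    return zlib.crc32(data).to_bytes(4, "little")


def hmac_tag(key: bytes, data: bytes) -> bytes:
    """32-byte HMAC-SHA256 tag: recomputing it requires the key."""
    return hmac.new(key, data, hashlib.sha256).digest()


sector = bytes(512)                        # one all-zero 512-byte sector
tampered = b"\x01" + sector[1:]            # the same sector with one byte changed
key = bytes.fromhex("0123456789abcdef")    # example key from the text above

# Both kinds of tag notice the modification ...
assert crc_tag(sector) != crc_tag(tampered)
assert not hmac.compare_digest(hmac_tag(key, sector), hmac_tag(key, tampered))
# ... but only the hmac depends on a secret that the attacker lacks.
assert hmac_tag(key, sector) != hmac_tag(b"another key", sector)
```

A crc therefore only guards against accidental corruption, because an
attacker who modifies a sector can write a matching crc next to it.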

recalculate
	Recalculate the integrity tags automatically. It is only valid
	when using internal hash.

journal_crypt:algorithm(:key)	(the key is optional)
	Encrypt the journal using the given algorithm to make sure that
	the attacker can't read the journal. You can use a block cipher
	here (such as "cbc(aes)") or a stream cipher (for example
	"chacha20" or "ctr(aes)").

	The journal contains the history of the last writes to the block
	device; an attacker reading the journal could see the last sector
	numbers that were written. From the sector numbers, the attacker
	can infer the size of files that were written. To protect against
	this situation, you can encrypt the journal.

journal_mac:algorithm(:key)	(the key is optional)
	Protect sector numbers in the journal from accidental or malicious
	modification. To protect against accidental modification, use a
	crc algorithm; to protect against malicious modification, use a
	hmac algorithm with a key.

	This option is not needed when using internal-hash because in this
	mode, the integrity of journal entries is checked when replaying
	the journal. Thus, a modified sector number would be detected at
	this stage.

block_size:number (default 512)
	The size of a data block in bytes. The larger the block size the
	less overhead there is for per-block integrity metadata.
	Supported values are 512, 1024, 2048 and 4096 bytes.
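The effect of block_size on metadata overhead is simple arithmetic, shown
here as a Python sketch. The function name is hypothetical and the 4-byte
tag size is just an example value; padding of the tag area and the journal
are ignored.

```python
SUPPORTED_BLOCK_SIZES = (512, 1024, 2048, 4096)


def tag_overhead(block_size: int, tag_size: int) -> float:
    """Fraction of extra space used by tags: one tag per data block."""
    if block_size not in SUPPORTED_BLOCK_SIZES:
        raise ValueError(f"unsupported block size {block_size}")
    return tag_size / block_size


for block_size in SUPPORTED_BLOCK_SIZES:
    print(f"{block_size:4d}-byte blocks: {tag_overhead(block_size, 4):.4%}")
```

With the same tag size, going from 512-byte to 4096-byte blocks cuts the
tag overhead by a factor of eight.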

sectors_per_bit:number
	In the bitmap mode, this parameter specifies the number of
	512-byte sectors that correspond to one bitmap bit.

bitmap_flush_interval:number
	The bitmap flush interval in milliseconds. The metadata buffers
	are synchronized when this interval expires.

allow_discards
	Allow block discard requests (a.k.a. TRIM) for the integrity device.
	Discards are only allowed to devices using internal hash.

fix_padding
	Use a smaller padding of the tag area that is more
	space-efficient. If this option is not present, large padding is
	used - that is for compatibility with older kernels.

fix_hmac
	Improve security of internal_hash and journal_mac:

	- the section number is mixed into the mac, so that an attacker
	  can't copy sectors from one journal section to another journal
	  section
	- the superblock is protected by journal_mac
	- a 16-byte salt stored in the superblock is mixed into the mac, so
	  that the attacker can't detect that two disks have the same hmac
	  key and can't move sectors from one disk to another

legacy_recalculate
	Allow recalculating of volumes with HMAC keys. This is disabled by
	default for security reasons - an attacker could modify the volume,
	set recalc_sector to zero, and the kernel would not detect the
	modification.

The journal mode (D/J), buffer_sectors, journal_watermark, commit_time and
allow_discards can be changed when reloading the target (load an inactive
table and swap the tables with suspend and resume). The other arguments
should not be changed when reloading the target because the layout of disk
data depends on them and the reloaded target would be non-functional.
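The reload rule can be expressed as a small check. The Python sketch below
compares two sets of arguments, given as plain dictionaries, and reports
the changes that the rule does not allow. The function name and the
dictionary representation are hypothetical; only the list of changeable
arguments comes from the paragraph above.

```python
RELOADABLE = {"buffer_sectors", "journal_watermark", "commit_time",
              "allow_discards"}


def unsafe_changes(old: dict, new: dict) -> list:
    """Return the names of changed arguments that a reload must not change."""
    unsafe = []
    for name in sorted(set(old) | set(new)):
        if old.get(name) == new.get(name) or name in RELOADABLE:
            continue
        if name == "mode" and {old.get(name), new.get(name)} <= {"D", "J"}:
            continue  # switching between D and J is allowed
        unsafe.append(name)
    return unsafe


old = {"mode": "J", "block_size": 512, "commit_time": 10000}
new = {"mode": "D", "block_size": 4096, "commit_time": 5000}
print(unsafe_changes(old, new))   # ['block_size']
```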

For example, on a device using the default interleave_sectors of 32768, a
block_size of 512, and an internal_hash of crc32c with a tag size of 4
bytes, it will take 128 KiB of tags to track a full data area, requiring
256 sectors of metadata per data area. With the default buffer_sectors of
128, that means there will be 2 buffers per metadata area, or 2 buffers
per 16 MiB of data.
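The arithmetic of this example can be reproduced with a short Python
sketch. The function name is hypothetical, and the calculation ignores any
padding of the tag area.

```python
SECTOR_SIZE = 512


def metadata_buffers(interleave_sectors=32768, block_size=512, tag_size=4,
                     buffer_sectors=128):
    """Return (tag bytes, metadata sectors, buffers) for one data area."""
    blocks = interleave_sectors * SECTOR_SIZE // block_size
    tag_bytes = blocks * tag_size                        # one tag per block
    metadata_sectors = -(-tag_bytes // SECTOR_SIZE)      # round up
    buffers = -(-metadata_sectors // buffer_sectors)     # round up
    return tag_bytes, metadata_sectors, buffers


tag_bytes, metadata_sectors, buffers = metadata_buffers()
print(tag_bytes // 1024, "KiB of tags,", metadata_sectors, "sectors,",
      buffers, "buffers")   # 128 KiB of tags, 256 sectors, 2 buffers
```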

Status line:

1. the number of integrity mismatches
2. provided data sectors - that is the number of sectors that the user
   could use
3. the current recalculating position (or '-' if we didn't recalculate)
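A status line with these three fields can be parsed as in the following
Python sketch. The function name and the dictionary keys are hypothetical.
The sketch expects only the target-specific part of the status; tools such
as dmsetup print the start sector, the length and the target name in front
of it, and those would have to be removed first.

```python
def parse_integrity_status(status: str) -> dict:
    """Parse the three target-specific status fields described above."""
    mismatches, provided, recalc = status.split()[:3]
    return {
        "mismatches": int(mismatches),
        "provided_data_sectors": int(provided),
        # '-' means that no recalculation is in progress
        "recalc_position": None if recalc == "-" else int(recalc),
    }


print(parse_integrity_status("0 2064384 -"))
```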


The layout of the formatted block device:

* reserved sectors
    (they are not used by this target, they can be used for
    storing LUKS metadata or for other purposes), the size of the reserved
    area is specified in the target arguments

* superblock (4KiB)
	* magic string - identifies that the device was formatted
	* version
	* log2(interleave sectors)
	* integrity tag size
	* the number of journal sections
	* provided data sectors - the number of sectors that this target
	  provides (i.e. the size of the device minus the size of all
	  metadata and padding). The user of this target should not send
	  bios that access data beyond the "provided data sectors" limit.
	* flags
	    SB_FLAG_HAVE_JOURNAL_MAC
		- a flag is set if journal_mac is used
	    SB_FLAG_RECALCULATING
		- recalculating is in progress
	    SB_FLAG_DIRTY_BITMAP
		- journal area contains the bitmap of dirty
		  blocks
	* log2(sectors per block)
	* a position where recalculating finished
* journal
	The journal is divided into sections, each section contains:

	* metadata area (4KiB), it contains journal entries

	  - every journal entry contains:

		* logical sector (specifies where the data and tag should
		  be written)
		* last 8 bytes of data
		* integrity tag (the size is specified in the superblock)

	  - every metadata sector ends with

		* mac (8 bytes), all the macs in 8 metadata sectors form a
		  64-byte value. It is used to store the hmac of sector
		  numbers in the journal section, to protect against the
		  possibility that the attacker tampers with sector
		  numbers in the journal.
		* commit id

	* data area (the size is variable; it depends on how many journal
	  entries fit into the metadata area)

	    - every sector in the data area contains:

		* data (504 bytes of data, the last 8 bytes are stored in
		  the journal entry)
		* commit id

	To test if the whole journal section was written correctly, every
	512-byte sector of the journal ends with an 8-byte commit id. If
	the commit id matches on all sectors in a journal section, then it
	is assumed that the section was written correctly. If the commit
	id doesn't match, the section was written partially and it should
	not be replayed.

* one or more runs of interleaved tags and data.
    Each run contains:

	* tag area - it contains integrity tags. There is one tag for each
	  sector in the data area. The size of this area is always 4KiB or
	  greater.
	* data area - it contains data sectors. The number of data sectors
	  in one run must be a power of two. log2 of this value is stored
	  in the superblock.
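The superblock fields listed above can be read from user space, for
example to obtain "provided data sectors" after the first format. The
Python sketch below is one way to do it. The exact field widths, the
little-endian byte order and the three skipped bytes are assumptions that
are not stated in this document and should be checked against the kernel
source before relying on them. The function and field names are
hypothetical.

```python
import struct

# Assumed layout: 8-byte magic, u8 version, u8 log2(interleave sectors),
# le16 tag size, le32 journal sections, le64 provided data sectors,
# le32 flags, u8 log2(sectors per block), 3 bytes skipped,
# le64 recalculating position.
SUPERBLOCK_FORMAT = "<8sBBHIQIB3xQ"
SUPERBLOCK_FIELDS = (
    "magic", "version", "log2_interleave_sectors", "integrity_tag_size",
    "journal_sections", "provided_data_sectors", "flags",
    "log2_sectors_per_block", "recalc_position",
)


def parse_superblock(raw: bytes) -> dict:
    """Decode the leading superblock fields from a raw 4KiB buffer."""
    values = struct.unpack_from(SUPERBLOCK_FORMAT, raw)
    superblock = dict(zip(SUPERBLOCK_FIELDS, values))
    # The superblock stores log2 of the number of data sectors per run.
    superblock["interleave_sectors"] = (
        1 << superblock["log2_interleave_sectors"])
    return superblock


def read_superblock(device: str, reserved_sectors: int) -> dict:
    """Read the superblock that follows the reserved sectors."""
    with open(device, "rb") as dev:
        dev.seek(reserved_sectors * 512)
        return parse_superblock(dev.read(4096))
```

A caller would pass the same number of reserved sectors that was given in
the target arguments, for example
``read_superblock("/dev/sdX", 0)["provided_data_sectors"]``, where
/dev/sdX is a placeholder.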