Revision tags: v4.10.3, v4.10.2, v4.10.1, v4.10 |
|
#
ef43aa38 |
| 04-Jan-2017 |
Milan Broz <gmazyland@gmail.com> |
dm crypt: add cryptographic data integrity protection (authenticated encryption)
Allow the use of per-sector metadata, provided by the dm-integrity module, for integrity protection and persistently
dm crypt: add cryptographic data integrity protection (authenticated encryption)
Allow the use of per-sector metadata, provided by the dm-integrity module, for integrity protection and persistently stored per-sector Initialization Vector (IV). The underlying device must support the "DM-DIF-EXT-TAG" dm-integrity profile.
The per-bio integrity metadata is allocated by dm-crypt for every bio.
Example of low-level mapping table for various types of use: DEV=/dev/sdb SIZE=417792
# Additional HMAC with CBC-ESSIV, key is concatenated encryption key + HMAC key SIZE_INT=389952 dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 32 J 0" dmsetup create y --table "0 $SIZE_INT crypt aes-cbc-essiv:sha256 \ 11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \ 00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff \ 0 /dev/mapper/x 0 1 integrity:32:hmac(sha256)"
# AEAD (Authenticated Encryption with Additional Data) - GCM with random IVs # GCM in kernel uses 96bits IV and we store 128bits auth tag (so 28 bytes metadata space) SIZE_INT=393024 dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 28 J 0" dmsetup create y --table "0 $SIZE_INT crypt aes-gcm-random \ 11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \ 0 /dev/mapper/x 0 1 integrity:28:aead"
# Random IV only for XTS mode (no integrity protection but provides atomic random sector change) SIZE_INT=401272 dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 16 J 0" dmsetup create y --table "0 $SIZE_INT crypt aes-xts-random \ 11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \ 0 /dev/mapper/x 0 1 integrity:16:none"
# Random IV with XTS + HMAC integrity protection SIZE_INT=377656 dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 48 J 0" dmsetup create y --table "0 $SIZE_INT crypt aes-xts-random \ 11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \ 00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff \ 0 /dev/mapper/x 0 1 integrity:48:hmac(sha256)"
Both AEAD and HMAC protection authenticates not only data but also sector metadata.
HMAC protection is implemented through autenc wrapper (so it is processed the same way as an authenticated mode).
In HMAC mode there are two keys (concatenated in dm-crypt mapping table). First is the encryption key and the second is the key for authentication (HMAC). (It is userspace decision if these keys are independent or somehow derived.)
The sector request for AEAD/HMAC authenticated encryption looks like this: |----- AAD -------|------ DATA -------|-- AUTH TAG --| | (authenticated) | (auth+encryption) | | | sector_LE | IV | sector in/out | tag in/out |
For writes, the integrity fields are calculated during AEAD encryption of every sector and stored in bio integrity fields and sent to underlying dm-integrity target for storage.
For reads, the integrity metadata is verified during AEAD decryption of every sector (they are filled in by dm-integrity, but the integrity fields are pre-allocated in dm-crypt).
There is also an experimental support in cryptsetup utility for more friendly configuration (part of LUKS2 format).
Because the integrity fields are not valid on initial creation, the device must be "formatted". This can be done by direct-io writes to the device (e.g. dd in direct-io mode). For now, there is available trivial tool to do this, see: https://github.com/mbroz/dm_int_tools
Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com> Signed-off-by: Vashek Matyas <matyas@fi.muni.cz> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
show more ...
|
#
0837e49a |
| 01-Mar-2017 |
David Howells <dhowells@redhat.com> |
KEYS: Differentiate uses of rcu_dereference_key() and user_key_payload()
rcu_dereference_key() and user_key_payload() are currently being used in two different, incompatible ways:
(1) As a wrapper
KEYS: Differentiate uses of rcu_dereference_key() and user_key_payload()
rcu_dereference_key() and user_key_payload() are currently being used in two different, incompatible ways:
(1) As a wrapper to rcu_dereference() - when only the RCU read lock used to protect the key.
(2) As a wrapper to rcu_dereference_protected() - when the key semaphor is used to protect the key and the may be being modified.
Fix this by splitting both of the key wrappers to produce:
(1) RCU accessors for keys when caller has the key semaphore locked:
dereference_key_locked() user_key_payload_locked()
(2) RCU accessors for keys when caller holds the RCU read lock:
dereference_key_rcu() user_key_payload_rcu()
This should fix following warning in the NFS idmapper
=============================== [ INFO: suspicious RCU usage. ] 4.10.0 #1 Tainted: G W ------------------------------- ./include/keys/user-type.h:53 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 0 1 lock held by mount.nfs/5987: #0: (rcu_read_lock){......}, at: [<d000000002527abc>] nfs_idmap_get_key+0x15c/0x420 [nfsv4] stack backtrace: CPU: 1 PID: 5987 Comm: mount.nfs Tainted: G W 4.10.0 #1 Call Trace: dump_stack+0xe8/0x154 (unreliable) lockdep_rcu_suspicious+0x140/0x190 nfs_idmap_get_key+0x380/0x420 [nfsv4] nfs_map_name_to_uid+0x2a0/0x3b0 [nfsv4] decode_getfattr_attrs+0xfac/0x16b0 [nfsv4] decode_getfattr_generic.constprop.106+0xbc/0x150 [nfsv4] nfs4_xdr_dec_lookup_root+0xac/0xb0 [nfsv4] rpcauth_unwrap_resp+0xe8/0x140 [sunrpc] call_decode+0x29c/0x910 [sunrpc] __rpc_execute+0x140/0x8f0 [sunrpc] rpc_run_task+0x170/0x200 [sunrpc] nfs4_call_sync_sequence+0x68/0xa0 [nfsv4] _nfs4_lookup_root.isra.44+0xd0/0xf0 [nfsv4] nfs4_lookup_root+0xe0/0x350 [nfsv4] nfs4_lookup_root_sec+0x70/0xa0 [nfsv4] nfs4_find_root_sec+0xc4/0x100 [nfsv4] nfs4_proc_get_rootfh+0x5c/0xf0 [nfsv4] nfs4_get_rootfh+0x6c/0x190 [nfsv4] nfs4_server_common_setup+0xc4/0x260 [nfsv4] nfs4_create_server+0x278/0x3c0 [nfsv4] nfs4_remote_mount+0x50/0xb0 [nfsv4] mount_fs+0x74/0x210 vfs_kern_mount+0x78/0x220 nfs_do_root_mount+0xb0/0x140 [nfsv4] nfs4_try_mount+0x60/0x100 [nfsv4] nfs_fs_mount+0x5ec/0xda0 [nfs] mount_fs+0x74/0x210 vfs_kern_mount+0x78/0x220 do_mount+0x254/0xf70 SyS_mount+0x94/0x100 system_call+0x38/0xe0
Reported-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
show more ...
|
#
f5b0cba8 |
| 31-Jan-2017 |
Ondrej Kozina <okozina@redhat.com> |
dm crypt: replace RCU read-side section with rwsem
The lockdep splat below hints at a bug in RCU usage in dm-crypt that was introduced with commit c538f6ec9f56 ("dm crypt: add ability to use keys fr
dm crypt: replace RCU read-side section with rwsem
The lockdep splat below hints at a bug in RCU usage in dm-crypt that was introduced with commit c538f6ec9f56 ("dm crypt: add ability to use keys from the kernel key retention service"). The kernel keyring function user_key_payload() is in fact a wrapper for rcu_dereference_protected() which must not be called with only rcu_read_lock() section mark.
Unfortunately the kernel keyring subsystem doesn't currently provide an interface that allows the use of an RCU read-side section. So for now we must drop RCU in favour of rwsem until a proper function is made available in the kernel keyring subsystem.
=============================== [ INFO: suspicious RCU usage. ] 4.10.0-rc5 #2 Not tainted ------------------------------- ./include/keys/user-type.h:53 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by cryptsetup/6464: #0: (&md->type_lock){+.+.+.}, at: [<ffffffffa02472a2>] dm_lock_md_type+0x12/0x20 [dm_mod] #1: (rcu_read_lock){......}, at: [<ffffffffa02822f8>] crypt_set_key+0x1d8/0x4b0 [dm_crypt] stack backtrace: CPU: 1 PID: 6464 Comm: cryptsetup Not tainted 4.10.0-rc5 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014 Call Trace: dump_stack+0x67/0x92 lockdep_rcu_suspicious+0xc5/0x100 crypt_set_key+0x351/0x4b0 [dm_crypt] ? crypt_set_key+0x1d8/0x4b0 [dm_crypt] crypt_ctr+0x341/0xa53 [dm_crypt] dm_table_add_target+0x147/0x330 [dm_mod] table_load+0x111/0x350 [dm_mod] ? retrieve_status+0x1c0/0x1c0 [dm_mod] ctl_ioctl+0x1f5/0x510 [dm_mod] dm_ctl_ioctl+0xe/0x20 [dm_mod] do_vfs_ioctl+0x8e/0x690 ? ____fput+0x9/0x10 ? task_work_run+0x7e/0xa0 ? trace_hardirqs_on_caller+0x122/0x1b0 SyS_ioctl+0x3c/0x70 entry_SYSCALL_64_fastpath+0x18/0xad RIP: 0033:0x7f392c9a4ec7 RSP: 002b:00007ffef6383378 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffef63830a0 RCX: 00007f392c9a4ec7 RDX: 000000000124fcc0 RSI: 00000000c138fd09 RDI: 0000000000000005 RBP: 00007ffef6383090 R08: 00000000ffffffff R09: 00000000012482b0 R10: 2a28205d34383336 R11: 0000000000000246 R12: 00007f392d803a08 R13: 00007ffef63831e0 R14: 0000000000000000 R15: 00007f392d803a0b
Fixes: c538f6ec9f56 ("dm crypt: add ability to use keys from the kernel key retention service") Reported-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Ondrej Kozina <okozina@redhat.com> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
show more ...
|
#
642fa448 |
| 03-Jan-2017 |
Davidlohr Bueso <dave@stgolabs.net> |
sched/core: Remove set_task_state()
This is a nasty interface and setting the state of a foreign task must not be done. As of the following commit:
be628be0956 ("bcache: Make gc wakeup sane, remo
sched/core: Remove set_task_state()
This is a nasty interface and setting the state of a foreign task must not be done. As of the following commit:
be628be0956 ("bcache: Make gc wakeup sane, remove set_task_state()")
... everyone in the kernel calls set_task_state() with current, allowing the helper to be removed.
However, as the comment indicates, it is still around for those archs where computing current is more expensive than using a pointer, at least in theory. An important arch that is affected is arm64, however this has been addressed now [1] and performance is up to par making no difference with either calls.
Of all the callers, if any, it's the locking bits that would care most about this -- ie: we end up passing a tsk pointer to a lot of the lock slowpath, and setting ->state on that. The following numbers are based on two tests: a custom ad-hoc microbenchmark that just measures latencies (for ~65 million calls) between get_task_state() vs get_current_state().
Secondly for a higher overview, an unlink microbenchmark was used, which pounds on a single file with open, close,unlink combos with increasing thread counts (up to 4x ncpus). While the workload is quite unrealistic, it does contend a lot on the inode mutex or now rwsem.
[1] https://lkml.kernel.org/r/1483468021-8237-1-git-send-email-mark.rutland@arm.com
== 1. x86-64 ==
Avg runtime set_task_state(): 601 msecs Avg runtime set_current_state(): 552 msecs
vanilla dirty Hmean unlink1-processes-2 36089.26 ( 0.00%) 38977.33 ( 8.00%) Hmean unlink1-processes-5 28555.01 ( 0.00%) 29832.55 ( 4.28%) Hmean unlink1-processes-8 37323.75 ( 0.00%) 44974.57 ( 20.50%) Hmean unlink1-processes-12 43571.88 ( 0.00%) 44283.01 ( 1.63%) Hmean unlink1-processes-21 34431.52 ( 0.00%) 38284.45 ( 11.19%) Hmean unlink1-processes-30 34813.26 ( 0.00%) 37975.17 ( 9.08%) Hmean unlink1-processes-48 37048.90 ( 0.00%) 39862.78 ( 7.59%) Hmean unlink1-processes-79 35630.01 ( 0.00%) 36855.30 ( 3.44%) Hmean unlink1-processes-110 36115.85 ( 0.00%) 39843.91 ( 10.32%) Hmean unlink1-processes-141 32546.96 ( 0.00%) 35418.52 ( 8.82%) Hmean unlink1-processes-172 34674.79 ( 0.00%) 36899.21 ( 6.42%) Hmean unlink1-processes-203 37303.11 ( 0.00%) 36393.04 ( -2.44%) Hmean unlink1-processes-224 35712.13 ( 0.00%) 36685.96 ( 2.73%)
== 2. ppc64le ==
Avg runtime set_task_state(): 938 msecs Avg runtime set_current_state: 940 msecs
vanilla dirty Hmean unlink1-processes-2 19269.19 ( 0.00%) 30704.50 ( 59.35%) Hmean unlink1-processes-5 20106.15 ( 0.00%) 21804.15 ( 8.45%) Hmean unlink1-processes-8 17496.97 ( 0.00%) 17243.28 ( -1.45%) Hmean unlink1-processes-12 14224.15 ( 0.00%) 17240.21 ( 21.20%) Hmean unlink1-processes-21 14155.66 ( 0.00%) 15681.23 ( 10.78%) Hmean unlink1-processes-30 14450.70 ( 0.00%) 15995.83 ( 10.69%) Hmean unlink1-processes-48 16945.57 ( 0.00%) 16370.42 ( -3.39%) Hmean unlink1-processes-79 15788.39 ( 0.00%) 14639.27 ( -7.28%) Hmean unlink1-processes-110 14268.48 ( 0.00%) 14377.40 ( 0.76%) Hmean unlink1-processes-141 14023.65 ( 0.00%) 16271.69 ( 16.03%) Hmean unlink1-processes-172 13417.62 ( 0.00%) 16067.55 ( 19.75%) Hmean unlink1-processes-203 15293.08 ( 0.00%) 15440.40 ( 0.96%) Hmean unlink1-processes-234 13719.32 ( 0.00%) 16190.74 ( 18.01%) Hmean unlink1-processes-265 16400.97 ( 0.00%) 16115.22 ( -1.74%) Hmean unlink1-processes-296 14388.60 ( 0.00%) 16216.13 ( 12.70%) Hmean unlink1-processes-320 15771.85 ( 0.00%) 15905.96 ( 0.85%)
x86-64 (known to be fast for get_current()/this_cpu_read_stable() caching) and ppc64 (with paca) show similar improvements in the unlink microbenches. The small delta for ppc64 (2ms), does not represent the gains on the unlink runs. In the case of x86, there was a decent amount of variation in the latency runs, but always within a 20 to 50ms increase), ppc was more constant.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: mark.rutland@arm.com Link: http://lkml.kernel.org/r/1483479794-14013-5-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
Revision tags: v4.9 |
|
#
027c431c |
| 01-Dec-2016 |
Ondrej Kozina <okozina@redhat.com> |
dm crypt: reject key strings containing whitespace chars
Unfortunately key_string may theoretically contain whitespace even after it's processed by dm_split_args(). The reason for this is DM core s
dm crypt: reject key strings containing whitespace chars
Unfortunately key_string may theoretically contain whitespace even after it's processed by dm_split_args(). The reason for this is DM core supports escaping of almost all chars including any whitespace.
If userspace passes a key to the kernel in format ":32:logon:my_prefix:my\ key" dm-crypt will look up key "my_prefix:my key" in kernel keyring service. So far everything's fine.
Unfortunately if userspace later calls DM_TABLE_STATUS ioctl, it will not receive back expected ":32:logon:my_prefix:my\ key" but the unescaped version instead. Also userpace (most notably cryptsetup) is not ready to parse single target argument containing (even escaped) whitespace chars and any whitespace is simply taken as delimiter of another argument.
This effect is mitigated by the fact libdevmapper curently performs double escaping of '\' char. Any user input in format "x\ x" is transformed into "x\\ x" before being passed to the kernel. Nonetheless dm-crypt may be used without libdevmapper. Therefore the near-term solution to this is to reject any key string containing whitespace.
Signed-off-by: Ondrej Kozina <okozina@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
show more ...
|
#
c538f6ec |
| 21-Nov-2016 |
Ondrej Kozina <okozina@redhat.com> |
dm crypt: add ability to use keys from the kernel key retention service
The kernel key service is a generic way to store keys for the use of other subsystems. Currently there is no way to use kernel
dm crypt: add ability to use keys from the kernel key retention service
The kernel key service is a generic way to store keys for the use of other subsystems. Currently there is no way to use kernel keys in dm-crypt. This patch aims to fix that. Instead of key userspace may pass a key description with preceding ':'. So message that constructs encryption mapping now looks like this:
<cipher> [<key>|:<key_string>] <iv_offset> <dev_path> <start> [<#opt_params> <opt_params>]
where <key_string> is in format: <key_size>:<key_type>:<key_description>
Currently we only support two elementary key types: 'user' and 'logon'. Keys may be loaded in dm-crypt either via <key_string> or using classical method and pass the key in hex representation directly.
dm-crypt device initialised with a key passed in hex representation may be replaced with key passed in key_string format and vice versa.
(Based on original work by Andrey Ryabinin)
Signed-off-by: Ondrej Kozina <okozina@redhat.com> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
show more ...
|
Revision tags: openbmc-4.4-20161121-1, v4.4.33, v4.4.32, v4.4.31, v4.4.30, v4.4.29, v4.4.28, v4.4.27, v4.7.10, openbmc-4.4-20161021-1, v4.7.9, v4.4.26, v4.7.8, v4.4.25, v4.4.24, v4.7.7, v4.8, v4.4.23, v4.7.6, v4.7.5, v4.4.22, v4.4.21, v4.7.4, v4.7.3, v4.4.20, v4.7.2, v4.4.19, openbmc-4.4-20160819-1, v4.7.1, v4.4.18, v4.4.17, openbmc-4.4-20160804-1, v4.4.16, v4.7, openbmc-4.4-20160722-1, openbmc-20160722-1, openbmc-20160713-1, v4.4.15, v4.6.4, v4.6.3, v4.4.14, v4.6.2, v4.4.13, openbmc-20160606-1, v4.6.1, v4.4.12, openbmc-20160521-1, v4.4.11, openbmc-20160518-1, v4.6, v4.4.10, openbmc-20160511-1, openbmc-20160505-1, v4.4.9, v4.4.8, v4.4.7, openbmc-20160329-2, openbmc-20160329-1, openbmc-20160321-1, v4.4.6, v4.5, v4.4.5, v4.4.4, v4.4.3, openbmc-20160222-1, v4.4.2, openbmc-20160212-1, openbmc-20160210-1, openbmc-20160202-2, openbmc-20160202-1, v4.4.1, openbmc-20160127-1, openbmc-20160120-1, v4.4, openbmc-20151217-1, openbmc-20151210-1, openbmc-20151202-1 |
|
#
1b1b58f5 |
| 29-Nov-2015 |
Julia Lawall <Julia.Lawall@lip6.fr> |
dm crypt: constify crypt_iv_operations structures
The crypt_iv_operations are never modified, so declare them as const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@
dm crypt: constify crypt_iv_operations structures
The crypt_iv_operations are never modified, so declare them as const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
show more ...
|
#
671ea6b4 |
| 25-Aug-2016 |
Mikulas Patocka <mpatocka@redhat.com> |
dm crypt: rename crypt_setkey_allcpus to crypt_setkey
In the past, dm-crypt used per-cpu crypto context. This has been removed in the kernel 3.15 and the crypto context is shared between all cpus. T
dm crypt: rename crypt_setkey_allcpus to crypt_setkey
In the past, dm-crypt used per-cpu crypto context. This has been removed in the kernel 3.15 and the crypto context is shared between all cpus. This patch renames the function crypt_setkey_allcpus to crypt_setkey, because there is really no activity that is done for all cpus.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
show more ...
|
#
265e9098 |
| 02-Nov-2016 |
Ondrej Kozina <okozina@redhat.com> |
dm crypt: mark key as invalid until properly loaded
In crypt_set_key(), if a failure occurs while replacing the old key (e.g. tfm->setkey() fails) the key must not have DM_CRYPT_KEY_VALID flag set.
dm crypt: mark key as invalid until properly loaded
In crypt_set_key(), if a failure occurs while replacing the old key (e.g. tfm->setkey() fails) the key must not have DM_CRYPT_KEY_VALID flag set. Otherwise, the crypto layer would have an invalid key that still has DM_CRYPT_KEY_VALID flag set.
Cc: stable@vger.kernel.org Signed-off-by: Ondrej Kozina <okozina@redhat.com> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
show more ...
|
#
0dae7fe5 |
| 29-Oct-2016 |
Ming Lei <tom.leiming@gmail.com> |
dm crypt: use bio_add_page()
Use bio_add_page(), the standard interface for adding a page to a bio, rather than open-coding the same.
It should be noted that the 'clone' bio that is allocated using
dm crypt: use bio_add_page()
Use bio_add_page(), the standard interface for adding a page to a bio, rather than open-coding the same.
It should be noted that the 'clone' bio that is allocated using bio_alloc_bioset(), in crypt_alloc_buffer(), does _not_ set the bio's BIO_CLONED flag. As such, bio_add_page()'s early return for true bio clones (those with BIO_CLONED set) isn't applicable.
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
show more ...
|
#
ef295ecf |
| 28-Oct-2016 |
Christoph Hellwig <hch@lst.de> |
block: better op and flags encoding
Now that we don't need the common flags to overflow outside the range of a 32-bit type we can encode them the same way for both the bio and request fields. This
block: better op and flags encoding
Now that we don't need the common flags to overflow outside the range of a 32-bit type we can encode them the same way for both the bio and request fields. This in addition allows us to place the operation first (and make some room for more ops while we're at it) and to stop having to shift around the operation values.
In addition this allows passing around only one value in the block layer instead of two (and eventuall also in the file systems, but we can do that later) and thus clean up a lot of code.
Last but not least this allows decreasing the size of the cmd_flags field in struct request to 32-bits. Various functions passing this value could also be updated, but I'd like to avoid the churn for now.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
#
f659b100 |
| 21-Sep-2016 |
Rabin Vincent <rabinv@axis.com> |
dm crypt: fix crash on exit
As the documentation for kthread_stop() says, "if threadfn() may call do_exit() itself, the caller must ensure task_struct can't go away". dm-crypt does not ensure this a
dm crypt: fix crash on exit
As the documentation for kthread_stop() says, "if threadfn() may call do_exit() itself, the caller must ensure task_struct can't go away". dm-crypt does not ensure this and therefore crashes when crypt_dtr() calls kthread_stop(). The crash is trivially reproducible by adding a delay before the call to kthread_stop() and just opening and closing a dm-crypt device.
general protection fault: 0000 [#1] PREEMPT SMP CPU: 0 PID: 533 Comm: cryptsetup Not tainted 4.8.0-rc7+ #7 task: ffff88003bd0df40 task.stack: ffff8800375b4000 RIP: 0010: kthread_stop+0x52/0x300 Call Trace: crypt_dtr+0x77/0x120 dm_table_destroy+0x6f/0x120 __dm_destroy+0x130/0x250 dm_destroy+0x13/0x20 dev_remove+0xe6/0x120 ? dev_suspend+0x250/0x250 ctl_ioctl+0x1fc/0x530 ? __lock_acquire+0x24f/0x1b10 dm_ctl_ioctl+0x13/0x20 do_vfs_ioctl+0x91/0x6a0 ? ____fput+0xe/0x10 ? entry_SYSCALL_64_fastpath+0x5/0xbd ? trace_hardirqs_on_caller+0x151/0x1e0 SyS_ioctl+0x41/0x70 entry_SYSCALL_64_fastpath+0x1f/0xbd
This problem was introduced by bcbd94ff481e ("dm crypt: fix a possible hang due to race condition on exit").
Looking at the description of that patch (excerpted below), it seems like the problem it addresses can be solved by just using set_current_state instead of __set_current_state, since we obviously need the memory barrier.
| dm crypt: fix a possible hang due to race condition on exit | | A kernel thread executes __set_current_state(TASK_INTERRUPTIBLE), | __add_wait_queue, spin_unlock_irq and then tests kthread_should_stop(). | It is possible that the processor reorders memory accesses so that | kthread_should_stop() is executed before __set_current_state(). If | such reordering happens, there is a possible race on thread | termination: [...]
So this patch just reverts the aforementioned patch and changes the __set_current_state(TASK_INTERRUPTIBLE) to set_current_state(...). This fixes the crash and should also fix the potential hang.
Fixes: bcbd94ff481e ("dm crypt: fix a possible hang due to race condition on exit") Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Rabin Vincent <rabinv@axis.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
show more ...
|
#
4382e33a |
| 14-Sep-2016 |
Bart Van Assche <bart.vanassche@sandisk.com> |
block, dm-crypt, btrfs: Introduce bio_flags()
Introduce the bio_flags() macro. Ensure that the second argument of bio_set_op_attrs() only contains flags and no operation. This patch does not change
block, dm-crypt, btrfs: Introduce bio_flags()
Introduce the bio_flags() macro. Ensure that the second argument of bio_set_op_attrs() only contains flags and no operation. This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Mike Christie <mchristi@redhat.com> Cc: Chris Mason <clm@fb.com> (maintainer:BTRFS FILE SYSTEM) Cc: Josef Bacik <jbacik@fb.com> (maintainer:BTRFS FILE SYSTEM) Cc: Mike Snitzer <snitzer@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Damien Le Moal <damien.lemoal@hgst.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
#
5d0be84e |
| 30-Aug-2016 |
Eric Biggers <ebiggers@google.com> |
dm crypt: fix free of bad values after tfm allocation failure
If crypt_alloc_tfms() had to allocate multiple tfms and it failed before the last allocation, then it would call crypt_free_tfms() and c
dm crypt: fix free of bad values after tfm allocation failure
If crypt_alloc_tfms() had to allocate multiple tfms and it failed before the last allocation, then it would call crypt_free_tfms() and could free pointers from uninitialized memory -- due to the crypt_free_tfms() check for non-zero cc->tfms[i]. Fix by allocating zeroed memory.
Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
show more ...
|
#
4e870e94 |
| 30-Aug-2016 |
Mikulas Patocka <mpatocka@redhat.com> |
dm crypt: fix error with too large bios
When dm-crypt processes writes, it allocates a new bio in crypt_alloc_buffer(). The bio is allocated from a bio set and it can have at most BIO_MAX_PAGES vec
dm crypt: fix error with too large bios
When dm-crypt processes writes, it allocates a new bio in crypt_alloc_buffer(). The bio is allocated from a bio set and it can have at most BIO_MAX_PAGES vector entries, however the incoming bio can be larger (e.g. if it was allocated by bcache). If the incoming bio is larger, bio_alloc_bioset() fails and an error is returned.
To avoid the error, we test for a too large bio in the function crypt_map() and use dm_accept_partial_bio() to split the bio. dm_accept_partial_bio() trims the current bio to the desired size and asks DM core to send another bio with the rest of the data.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # v3.16+
show more ...
|
#
0a83df6c |
| 15-Jul-2016 |
Mikulas Patocka <mpatocka@redhat.com> |
dm crypt: increase mempool reserve to better support swapping
Increase mempool size from 16 to 64 entries. This increase improves swap on dm-crypt performance.
When swapping to dm-crypt, all avail
dm crypt: increase mempool reserve to better support swapping
Increase mempool size from 16 to 64 entries. This increase improves swap on dm-crypt performance.
When swapping to dm-crypt, all available memory is temporarily exhausted and dm-crypt can only use the mempool reserve.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
show more ...
|
#
1eff9d32 |
| 05-Aug-2016 |
Jens Axboe <axboe@fb.com> |
block: rename bio bi_rw to bi_opf
Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower portion and the op code in the higher portions. This means that old code that relies on manually s
block: rename bio bi_rw to bi_opf
Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower portion and the op code in the higher portions. This means that old code that relies on manually setting bi_rw is most likely going to be broken. Instead of letting that brokeness linger, rename the member, to force old and out-of-tree code to break at compile time instead of at runtime.
No intended functional changes in this commit.
Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
#
350b5393 |
| 28-Jun-2016 |
Bart Van Assche <bart.vanassche@sandisk.com> |
dm crypt: Fix sparse complaints
Avoid that sparse complains about assigning a __le64 value to a u64 variable. Remove the (u64) casts since these are superfluous. This patch does not change the beh
dm crypt: Fix sparse complaints
Avoid that sparse complains about assigning a __le64 value to a u64 variable. Remove the (u64) casts since these are superfluous. This patch does not change the behavior of the source code.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
show more ...
|
#
28a8f0d3 |
| 05-Jun-2016 |
Mike Christie <mchristi@redhat.com> |
block, drivers, fs: rename REQ_FLUSH to REQ_PREFLUSH
To avoid confusion between REQ_OP_FLUSH, which is handled by request_fn drivers, and upper layers requesting the block layer perform a flush sequ
block, drivers, fs: rename REQ_FLUSH to REQ_PREFLUSH
To avoid confusion between REQ_OP_FLUSH, which is handled by request_fn drivers, and upper layers requesting the block layer perform a flush sequence along with possibly a WRITE, this patch renames REQ_FLUSH to REQ_PREFLUSH.
Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
#
e6047149 |
| 05-Jun-2016 |
Mike Christie <mchristi@redhat.com> |
dm: use bio op accessors
Separate the op from the rq_flag_bits and have dm set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Hannes Rein
dm: use bio op accessors
Separate the op from the rq_flag_bits and have dm set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
#
30187e1d |
| 31-Jan-2016 |
Mike Snitzer <snitzer@redhat.com> |
dm: rename target's per_bio_data_size to per_io_data_size
Request-based DM will also make use of per_bio_data_size.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
#
bbdb23b5 |
| 24-Jan-2016 |
Herbert Xu <herbert@gondor.apana.org.au> |
dm crypt: Use skcipher and ahash
This patch replaces uses of ablkcipher with skcipher, and the long obsolete hash interface with ahash.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
Revision tags: openbmc-20151123-1 |
|
#
bcbd94ff |
| 19-Nov-2015 |
Mikulas Patocka <mpatocka@redhat.com> |
dm crypt: fix a possible hang due to race condition on exit
A kernel thread executes __set_current_state(TASK_INTERRUPTIBLE), __add_wait_queue, spin_unlock_irq and then tests kthread_should_stop().
dm crypt: fix a possible hang due to race condition on exit
A kernel thread executes __set_current_state(TASK_INTERRUPTIBLE), __add_wait_queue, spin_unlock_irq and then tests kthread_should_stop(). It is possible that the processor reorders memory accesses so that kthread_should_stop() is executed before __set_current_state(). If such reordering happens, there is a possible race on thread termination:
CPU 0: calls kthread_should_stop() it tests KTHREAD_SHOULD_STOP bit, returns false CPU 1: calls kthread_stop(cc->write_thread) sets the KTHREAD_SHOULD_STOP bit calls wake_up_process on the kernel thread, that sets the thread state to TASK_RUNNING CPU 0: sets __set_current_state(TASK_INTERRUPTIBLE) spin_unlock_irq(&cc->write_thread_wait.lock) schedule() - and the process is stuck and never terminates, because the state is TASK_INTERRUPTIBLE and wake_up_process on CPU 1 already terminated
Fix this race condition by using a new flag DM_CRYPT_EXIT_THREAD to signal that the kernel thread should exit. The flag is set and tested while holding cc->write_thread_wait.lock, so there is no possibility of racy access to the flag.
Also, remove the unnecessary set_task_state(current, TASK_RUNNING) following the schedule() call. When the process was woken up, its state was already set to TASK_RUNNING. Other kernel code also doesn't set the state to TASK_RUNNING following schedule() (for example, do_wait_for_common in completion.c doesn't do it).
Fixes: dc2676210c42 ("dm crypt: offload writes to thread") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Mike Snitzer <snitzer@redhat.com>
show more ...
|
Revision tags: openbmc-20151118-1 |
|
#
d0164adc |
| 06-Nov-2015 |
Mel Gorman <mgorman@techsingularity.net> |
mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
__GFP_WAIT has been used to identify atomic context in callers that hold spinlocks or are in
mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
__GFP_WAIT has been used to identify atomic context in callers that hold spinlocks or are in interrupts. They are expected to be high priority and have access one of two watermarks lower than "min" which can be referred to as the "atomic reserve". __GFP_HIGH users get access to the first lower watermark and can be called the "high priority reserve".
Over time, callers had a requirement to not block when fallback options were available. Some have abused __GFP_WAIT leading to a situation where an optimisitic allocation with a fallback option can access atomic reserves.
This patch uses __GFP_ATOMIC to identify callers that are truely atomic, cannot sleep and have no alternative. High priority users continue to use __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify callers that want to wake kswapd for background reclaim. __GFP_WAIT is redefined as a caller that is willing to enter direct reclaim and wake kswapd for background reclaim.
This patch then converts a number of sites
o __GFP_ATOMIC is used by callers that are high priority and have memory pools for those requests. GFP_ATOMIC uses this flag.
o Callers that have a limited mempool to guarantee forward progress clear __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall into this category where kswapd will still be woken but atomic reserves are not used as there is a one-entry mempool to guarantee progress.
o Callers that are checking if they are non-blocking should use the helper gfpflags_allow_blocking() where possible. This is because checking for __GFP_WAIT as was done historically now can trigger false positives. Some exceptions like dm-crypt.c exist where the code intent is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to flag manipulations.
o Callers that built their own GFP flags instead of starting with GFP_KERNEL and friends now also need to specify __GFP_KSWAPD_RECLAIM.
The first key hazard to watch out for is callers that removed __GFP_WAIT and was depending on access to atomic reserves for inconspicuous reasons. In some cases it may be appropriate for them to use __GFP_HIGH.
The second key hazard is callers that assembled their own combination of GFP flags instead of starting with something like GFP_KERNEL. They may now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless if it's missed in most cases as other activity will wake kswapd.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: openbmc-20151104-1, v4.3, openbmc-20151102-1, openbmc-20151028-1 |
|
#
6f65985e |
| 13-Sep-2015 |
Julia Lawall <Julia.Lawall@lip6.fr> |
dm: drop NULL test before kmem_cache_destroy() and mempool_destroy()
Remove DM's unneeded NULL tests before calling these destroy functions, now that they check for NULL, thanks to these v4.3 commit
dm: drop NULL test before kmem_cache_destroy() and mempool_destroy()
Remove DM's unneeded NULL tests before calling these destroy functions, now that they check for NULL, thanks to these v4.3 commits: 3942d2991 ("mm/slab_common: allow NULL cache pointer in kmem_cache_destroy()") 4e3ca3e03 ("mm/mempool: allow NULL `pool' pointer in mempool_destroy()")
The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/)
// <smpl> @@ expression x; @@ -if (x != NULL) \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x); // </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
show more ...
|