51dc4111 | 18-Feb-2025 |
David Howells <dhowells@redhat.com> |
afs: Fix the server_list to unuse a displaced server rather than putting it
[ Upstream commit add117e48df4788a86a21bd0515833c0a6db1ad1 ]
When allocating and building an afs_server_list struct objec
afs: Fix the server_list to unuse a displaced server rather than putting it
[ Upstream commit add117e48df4788a86a21bd0515833c0a6db1ad1 ]
When allocating and building an afs_server_list struct object from a VLDB record, we look up each server address to get the server record for it - but a server may have more than one entry in the record and we discard the duplicate pointers. Currently, however, when we discard, we only put a server record, not unuse it - but the lookup got as an active-user count.
The active-user count on an afs_server_list object determines its lifetime whereas the refcount keeps the memory backing it around. Failing to reduce the active-user counter prevents the record from being cleaned up and can lead to multiple copied being seen - and pointing to deleted afs_cell objects and other such things.
Fix this by switching the incorrect 'put' to an 'unuse' instead.
Without this, occasionally, a dead server record can be seen in /proc/net/afs/servers and list corruption may be observed:
list_del corruption. prev->next should be ffff888102423e40, but was 0000000000000000. (prev=ffff88810140cd38)
Fixes: 977e5f8ed0ab ("afs: Split the usage count on struct afs_server") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250218192250.296870-5-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
a5e15707 | 16-Dec-2024 |
David Howells <dhowells@redhat.com> |
afs: Fix cleanup of immediately failed async calls
[ Upstream commit 9750be93b2be12b6d92323b97d7c055099d279e6 ]
If we manage to begin an async call, but fail to transmit any data on it due to a sig
afs: Fix cleanup of immediately failed async calls
[ Upstream commit 9750be93b2be12b6d92323b97d7c055099d279e6 ]
If we manage to begin an async call, but fail to transmit any data on it due to a signal, we then abort it which causes a race between the notification of call completion from rxrpc and our attempt to cancel the notification. The notification will be necessary, however, for async FetchData to terminate the netfs subrequest.
However, since we get a notification from rxrpc upon completion of a call (aborted or otherwise), we can just leave it to that.
This leads to calls not getting cleaned up, but appearing in /proc/net/rxrpc/calls as being aborted with code 6.
Fix this by making the "error_do_abort:" case of afs_make_call() abort the call and then abandon it to the notification handler.
Fixes: 34fa47612bfe ("afs: Fix race in async call refcounting") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-25-dhowells@redhat.com cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
7e8ea8e8 | 16-Dec-2024 |
David Howells <dhowells@redhat.com> |
afs: Fix directory format encoding struct
[ Upstream commit 07a10767853adcbdbf436dc91393b729b52c4e81 ]
The AFS directory format structure, union afs_xdr_dir_block::meta, has too many alloc counter
afs: Fix directory format encoding struct
[ Upstream commit 07a10767853adcbdbf436dc91393b729b52c4e81 ]
The AFS directory format structure, union afs_xdr_dir_block::meta, has too many alloc counter slots declared and so pushes the hash table along and over the data. This doesn't cause a problem at the moment because I'm currently ignoring the hash table and only using the correct number of alloc_ctrs in the code anyway. In future, however, I should start using the hash table to try and speed up afs_lookup().
Fix this by using the correct constant to declare the counter array.
Fixes: 4ea219a839bf ("afs: Split the directory content defs into a header") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-14-dhowells@redhat.com cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
08ade0fa | 30-Nov-2023 |
Oleg Nesterov <oleg@redhat.com> |
afs: fix the usage of read_seqbegin_or_lock() in afs_find_server*()
[ Upstream commit 1702e0654ca9a7bcd7c7619c8a5004db58945b71 ]
David Howells says:
(5) afs_find_server().
There could be a
afs: fix the usage of read_seqbegin_or_lock() in afs_find_server*()
[ Upstream commit 1702e0654ca9a7bcd7c7619c8a5004db58945b71 ]
David Howells says:
(5) afs_find_server().
There could be a lot of servers in the list and each server can have multiple addresses, so I think this would be better with an exclusive second pass.
The server list isn't likely to change all that often, but when it does change, there's a good chance several servers are going to be added/removed one after the other. Further, this is only going to be used for incoming cache management/callback requests from the server, which hopefully aren't going to happen too often - but it is remotely drivable.
(6) afs_find_server_by_uuid().
Similarly to (5), there could be a lot of servers to search through, but they are in a tree not a flat list, so it should be faster to process. Again, it's not likely to change that often and, again, when it does change it's likely to involve multiple changes. This can be driven remotely by an incoming cache management request but is mostly going to be driven by setting up or reconfiguring a volume's server list - something that also isn't likely to happen often.
Make the "seq" counter odd on the 2nd pass, otherwise read_seqbegin_or_lock() never takes the lock.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/20231130115614.GA21581@redhat.com/ Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
c3215484 | 21-Dec-2023 |
David Howells <dhowells@redhat.com> |
afs: Fix use-after-free due to get/remove race in volume tree
[ Upstream commit 9a6b294ab496650e9f270123730df37030911b55 ]
When an afs_volume struct is put, its refcount is reduced to 0 before the
afs: Fix use-after-free due to get/remove race in volume tree
[ Upstream commit 9a6b294ab496650e9f270123730df37030911b55 ]
When an afs_volume struct is put, its refcount is reduced to 0 before the cell->volume_lock is taken and the volume removed from the cell->volumes tree.
Unfortunately, this means that the lookup code can race and see a volume with a zero ref in the tree, resulting in a use-after-free:
refcount_t: addition on 0; use-after-free. WARNING: CPU: 3 PID: 130782 at lib/refcount.c:25 refcount_warn_saturate+0x7a/0xda ... RIP: 0010:refcount_warn_saturate+0x7a/0xda ... Call Trace: afs_get_volume+0x3d/0x55 afs_create_volume+0x126/0x1de afs_validate_fc+0xfe/0x130 afs_get_tree+0x20/0x2e5 vfs_get_tree+0x1d/0xc9 do_new_mount+0x13b/0x22e do_mount+0x5d/0x8a __do_sys_mount+0x100/0x12a do_syscall_64+0x3a/0x94 entry_SYSCALL_64_after_hwframe+0x62/0x6a
Fix this by:
(1) When putting, use a flag to indicate if the volume has been removed from the tree and skip the rb_erase if it has.
(2) When looking up, use a conditional ref increment and if it fails because the refcount is 0, replace the node in the tree and set the removal flag.
Fixes: 20325960f875 ("afs: Reorganise volume and server trees to be rooted on the cell") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
81fc8dce | 21-Dec-2023 |
David Howells <dhowells@redhat.com> |
afs: Fix overwriting of result of DNS query
[ Upstream commit a9e01ac8c5ff32669119c40dfdc9e80eb0b7d7aa ]
In afs_update_cell(), ret is the result of the DNS lookup and the errors are to be handled b
afs: Fix overwriting of result of DNS query
[ Upstream commit a9e01ac8c5ff32669119c40dfdc9e80eb0b7d7aa ]
In afs_update_cell(), ret is the result of the DNS lookup and the errors are to be handled by a switch - however, the value gets clobbered in between by setting it to -ENOMEM in case afs_alloc_vlserver_list() fails.
Fix this by moving the setting of -ENOMEM into the error handling for OOM failure. Further, only do it if we don't have an alternative error to return.
Found by Linux Verification Center (linuxtesting.org) with SVACE. Based on a patch from Anastasia Belova [1].
Fixes: d5c32c89b208 ("afs: Fix cell DNS lookup") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: Anastasia Belova <abelova@astralinux.ru> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: lvc-project@linuxtesting.org Link: https://lore.kernel.org/r/20231221085849.1463-1-abelova@astralinux.ru/ [1] Link: https://lore.kernel.org/r/1700862.1703168632@warthog.procyon.org.uk/ # v1 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
3c305aa9 | 11-Dec-2023 |
David Howells <dhowells@redhat.com> |
afs: Fix dynamic root lookup DNS check
[ Upstream commit 74cef6872ceaefb5b6c5c60641371ea28702d358 ]
In the afs dynamic root directory, the ->lookup() function does a DNS check on the cell being ask
afs: Fix dynamic root lookup DNS check
[ Upstream commit 74cef6872ceaefb5b6c5c60641371ea28702d358 ]
In the afs dynamic root directory, the ->lookup() function does a DNS check on the cell being asked for and if the DNS upcall reports an error it will report an error back to userspace (typically ENOENT).
However, if a failed DNS upcall returns a new-style result, it will return a valid result, with the status field set appropriately to indicate the type of failure - and in that case, dns_query() doesn't return an error and we let stat() complete with no error - which can cause confusion in userspace as subsequent calls that trigger d_automount then fail with ENOENT.
Fix this by checking the status result from a valid dns_query() and returning an error if it indicates a failure.
Fixes: bbb4c4323a4d ("dns: Allow the dns resolver to retrieve a server set") Reported-by: Markus Suvanto <markus.suvanto@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=216637 Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Markus Suvanto <markus.suvanto@gmail.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
ceca54f1 | 01-Nov-2023 |
David Howells <dhowells@redhat.com> |
afs: Fix file locking on R/O volumes to operate in local mode
[ Upstream commit b590eb41be766c5a63acc7e8896a042f7a4e8293 ]
AFS doesn't really do locking on R/O volumes as fileservers don't maintain
afs: Fix file locking on R/O volumes to operate in local mode
[ Upstream commit b590eb41be766c5a63acc7e8896a042f7a4e8293 ]
AFS doesn't really do locking on R/O volumes as fileservers don't maintain state with each other and thus a lock on a R/O volume file on one fileserver will not be be visible to someone looking at the same file on another fileserver.
Further, the server may return an error if you try it.
Fix this by doing what other AFS clients do and handle filelocking on R/O volume files entirely within the client and don't touch the server.
Fixes: 6c6c1d63c243 ("afs: Provide mount-time configurable byte-range file locking emulation") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
18350d28 | 08-Jun-2023 |
David Howells <dhowells@redhat.com> |
afs: Make error on cell lookup failure consistent with OpenAFS
[ Upstream commit 2a4ca1b4b77850544408595e2433f5d7811a9daa ]
When kafs tries to look up a cell in the DNS or the local config, it will
afs: Make error on cell lookup failure consistent with OpenAFS
[ Upstream commit 2a4ca1b4b77850544408595e2433f5d7811a9daa ]
When kafs tries to look up a cell in the DNS or the local config, it will translate a lookup failure into EDESTADDRREQ whereas OpenAFS translates it into ENOENT. Applications such as West expect the latter behaviour and fail if they see the former.
This can be seen by trying to mount an unknown cell:
# mount -t afs %example.com:cell.root /mnt mount: /mnt: mount(2) system call failed: Destination address required.
Fixes: 4d673da14533 ("afs: Support the AFS dynamic root") Reported-by: Markus Suvanto <markus.suvanto@gmail.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216637 Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|