H A D | trans.c | 3d372c4e Fri Jan 15 05:05:58 CST 2021 Johannes Berg <johannes.berg@intel.com> iwlwifi: pcie: reschedule in long-running memory reads
If we spin for a long time in memory reads that (for some reason in hardware) take a long time, then we'll eventually get messages such as
watchdog: BUG: soft lockup - CPU#2 stuck for 24s! [kworker/2:2:272]
This is because the reading really does take a very long time, and we don't schedule, so we're hogging the CPU with this task, at least if CONFIG_PREEMPT is not set, e.g. with CONFIG_PREEMPT_VOLUNTARY=y.
Previously I misinterpreted the situation and thought that this was only going to happen if we had interrupts disabled, and then fixed this (which is good anyway, however), but that didn't always help; looking at it again now I realized that the spin unlock will only reschedule if CONFIG_PREEMPT is used.
In order to avoid this issue, change the code to cond_resched() if we've been spinning for too long here.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Fixes: 04516706bb99 ("iwlwifi: pcie: limit memory read spin time") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20210115130253.217a9d6a6a12.If964cb582ab0aaa94e81c4ff3b279eaafda0fd3f@changeid a59a7b96 Fri Jan 15 05:05:58 CST 2021 Johannes Berg <johannes.berg@intel.com> iwlwifi: pcie: reschedule in long-running memory reads
[ Upstream commit 3d372c4edfd4dffb7dea71c6b096fb414782b776 ]
If we spin for a long time in memory reads that (for some reason in hardware) take a long time, then we'll eventually get messages such as
watchdog: BUG: soft lockup - CPU#2 stuck for 24s! [kworker/2:2:272]
This is because the reading really does take a very long time, and we don't schedule, so we're hogging the CPU with this task, at least if CONFIG_PREEMPT is not set, e.g. with CONFIG_PREEMPT_VOLUNTARY=y.
Previously I misinterpreted the situation and thought that this was only going to happen if we had interrupts disabled, and then fixed this (which is good anyway, however), but that didn't always help; looking at it again now I realized that the spin unlock will only reschedule if CONFIG_PREEMPT is used.
In order to avoid this issue, change the code to cond_resched() if we've been spinning for too long here.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Fixes: 04516706bb99 ("iwlwifi: pcie: limit memory read spin time") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20210115130253.217a9d6a6a12.If964cb582ab0aaa94e81c4ff3b279eaafda0fd3f@changeid Signed-off-by: Sasha Levin <sashal@kernel.org> 04516706 Thu Oct 22 08:51:03 CDT 2020 Johannes Berg <johannes.berg@intel.com> iwlwifi: pcie: limit memory read spin time
When we read device memory, we lock a spinlock, write the address we want to read from the device and then spin in a loop reading the data in 32-bit quantities from another register.
As the description makes clear, this is rather inefficient, incurring a PCIe bus transaction for every read. In a typical device today, we want to read 786k SMEM if it crashes, leading to 192k register reads. Occasionally, we've seen the whole loop take over 20 seconds and then triggering the soft lockup detector.
Clearly, it is unreasonable to spin here for such extended periods of time.
To fix this, break the loop down into an outer and an inner loop, and break out of the inner loop if more than half a second elapsed. To avoid too much overhead, check for that only every 128 reads, though there's no particular reason for that number. Then, unlock and relock to obtain NIC access again, reprogram the start address and continue.
This will keep (interrupt) latencies on the CPU down to a reasonable time.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20201022165103.45878a7e49aa.I3b9b9c5a10002915072312ce75b68ed5b3dc6e14@changeid a59a7b96 Fri Jan 15 05:05:58 CST 2021 Johannes Berg <johannes.berg@intel.com> iwlwifi: pcie: reschedule in long-running memory reads [ Upstream commit 3d372c4edfd4dffb7dea71c6b096fb414782b776 ] If we spin for a long time in memory reads that (for some reason in hardware) take a long time, then we'll eventually get messages such as watchdog: BUG: soft lockup - CPU#2 stuck for 24s! [kworker/2:2:272] This is because the reading really does take a very long time, and we don't schedule, so we're hogging the CPU with this task, at least if CONFIG_PREEMPT is not set, e.g. with CONFIG_PREEMPT_VOLUNTARY=y. Previously I misinterpreted the situation and thought that this was only going to happen if we had interrupts disabled, and then fixed this (which is good anyway, however), but that didn't always help; looking at it again now I realized that the spin unlock will only reschedule if CONFIG_PREEMPT is used. In order to avoid this issue, change the code to cond_resched() if we've been spinning for too long here. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Fixes: 04516706bb99 ("iwlwifi: pcie: limit memory read spin time") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20210115130253.217a9d6a6a12.If964cb582ab0aaa94e81c4ff3b279eaafda0fd3f@changeid Signed-off-by: Sasha Levin <sashal@kernel.org> 04516706 Thu Oct 22 08:51:03 CDT 2020 Johannes Berg <johannes.berg@intel.com> iwlwifi: pcie: limit memory read spin time When we read device memory, we lock a spinlock, write the address we want to read from the device and then spin in a loop reading the data in 32-bit quantities from another register. As the description makes clear, this is rather inefficient, incurring a PCIe bus transaction for every read. In a typical device today, we want to read 786k SMEM if it crashes, leading to 192k register reads. Occasionally, we've seen the whole loop take over 20 seconds and then triggering the soft lockup detector. Clearly, it is unreasonable to spin here for such extended periods of time. To fix this, break the loop down into an outer and an inner loop, and break out of the inner loop if more than half a second elapsed. To avoid too much overhead, check for that only every 128 reads, though there's no particular reason for that number. Then, unlock and relock to obtain NIC access again, reprogram the start address and continue. This will keep (interrupt) latencies on the CPU down to a reasonable time. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20201022165103.45878a7e49aa.I3b9b9c5a10002915072312ce75b68ed5b3dc6e14@changeid
|