75ecfab9 | 20-Jun-2024 |
Hui Li <lihui@loongson.cn> |
LoongArch: Trigger user-space watchpoints correctly
commit c8e57ab0995c5b443d3c81c8a36b588776dcd0c3 upstream.
In the current code, gdb can set a watchpoint successfully through the ptrace interface, but the watchpoint will not be triggered.
When debugging the following code with gdb:
lihui@bogon:~$ cat test.c
#include <stdio.h>

int a = 0;

int main()
{
	a = 1;
	printf("a = %d\n", a);
	return 0;
}
lihui@bogon:~$ gcc -g test.c -o test
lihui@bogon:~$ gdb test
...
(gdb) watch a
...
(gdb) r
...
a = 1
[Inferior 1 (process 4650) exited normally]
No watchpoints were triggered; the root causes are:
1. The kernel uses the perf_event and hw_breakpoint frameworks to control watchpoints, but the perf_event corresponding to a watchpoint is not enabled. So it needs to be enabled according to the PLV bit field of MWPnCFG3 or FWPnCFG3 in ptrace_hbp_set_ctrl(), and the privilege is set according to the monitored address in hw_breakpoint_control(). Furthermore, add a check in ptrace_hbp_set_addr() to ensure that kernel-space addresses cannot be monitored in user mode.
2. The global enable control for all watchpoints is the WE bit of CSR.CRMD, and hardware sets it to 0 when an exception is triggered. When the ERTN instruction is executed to return, hardware restores it from the PWE field of CSR.PRMD. So, before a thread containing watchpoints is scheduled, the PWE field of CSR.PRMD needs to be set to 1. Add this modification in hw_breakpoint_control(), as sketched after the manual references below.
All changes are in accordance with the LoongArch Reference Manual:
https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#control-and-status-registers-related-to-watchpoints
https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#basic-control-and-status-registers
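A minimal sketch of fix 2, assuming the upstream CSR_PRMD_PWE definition and pt_regs layout (this is not the literal patch; the real function also programs the MWPnCFG/FWPnCFG registers, and the control flow is simplified here):

#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
#include <asm/ptrace.h>

static int hw_breakpoint_control(struct perf_event *bp,
				 enum hw_breakpoint_ops ops)
{
	struct pt_regs *regs = task_pt_regs(bp->hw.target);

	switch (ops) {
	case HW_BREAKPOINT_INSTALL:
		/* ... program the watchpoint config registers ... */
		/* Fix 2: ERTN restores CRMD.WE from PRMD.PWE, arming watchpoints */
		regs->csr_prmd |= CSR_PRMD_PWE;
		break;
	case HW_BREAKPOINT_UNINSTALL:
		/* ... clear the watchpoint config registers ... */
		regs->csr_prmd &= ~CSR_PRMD_PWE;
		break;
	default:
		break;
	}

	return 0;
}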
With this patch:
lihui@bogon:~$ gdb test
...
(gdb) watch a
Hardware watchpoint 1: a
(gdb) r
...
Hardware watchpoint 1: a

Old value = 0
New value = 1
main () at test.c:6
6		printf("a = %d\n", a);
(gdb) c
Continuing.
a = 1
[Inferior 1 (process 775) exited normally]
Cc: stable@vger.kernel.org
Signed-off-by: Hui Li <lihui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ca6d6d87 | 03-Jun-2024 |
Jiaxun Yang <jiaxun.yang@flygoat.com> |
LoongArch: Override higher address bits in JUMP_VIRT_ADDR
commit 1098efd299ffe9c8af818425338c7f6c4f930a98 upstream.
In JUMP_VIRT_ADDR we are performing an OR operation on an address value taken directly from pcaddi.
This will only work if we are currently running from direct 1:1 mapping addresses or if the firmware's DMW is configured exactly the same as the kernel's. Still, we should not rely on such an assumption.
Fix this by overriding the higher bits of the address that comes from pcaddi, so we can get rid of the OR operation.
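A user-space C sketch of the failure mode (constants illustrative; DMW_PABITS = 48 matches the kernel's definition): OR-ing the kernel's DMW window bits into a PC that still carries the firmware's window bits can yield a bogus address, while a bstrins.d-style override of the high bits cannot.

#include <stdint.h>
#include <stdio.h>

#define DMW_PABITS	48				/* as in the kernel */
#define PA_MASK		((1ULL << DMW_PABITS) - 1)
#define CACHE_BASE	0x9000000000000000ULL		/* kernel's cached DMW window */

int main(void)
{
	/* Hypothetical PC while still running from the firmware's DMW window */
	uint64_t pc = 0xa000000000201000ULL;

	uint64_t by_or  = CACHE_BASE | pc;		/* old: stale high bits leak through */
	uint64_t by_ins = CACHE_BASE | (pc & PA_MASK);	/* new: bstrins.d-style override */

	printf("or:       %#llx (bogus window)\n", (unsigned long long)by_or);
	printf("override: %#llx\n", (unsigned long long)by_ins);
	return 0;
}

Here the OR result lands in a non-existent 0xb000... window, while the override yields the intended 0x9000... address.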
Cc: stable@vger.kernel.org
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
d7d7c6cd | 19-Mar-2024 |
Huacai Chen <chenhuacai@loongson.cn> |
LoongArch: Define the __io_aw() hook as mmiowb()
[ Upstream commit 9c68ece8b2a5c5ff9b2fcaea923dd73efeb174cd ]
Commit fb24ea52f78e0d595852e ("drivers: Remove explicit invocations of mmiowb()") removed all mmiowb() calls in drivers, but it says:
"NOTE: mmiowb() has only ever guaranteed ordering in conjunction with spin_unlock(). However, pairing each mmiowb() removal in this patch with the corresponding call to spin_unlock() is not at all trivial, so there is a small chance that this change may regress any drivers incorrectly relying on mmiowb() to order MMIO writes between CPUs using lock-free synchronisation."
The MMIO in radeon_ring_commit() is protected by a mutex rather than a spinlock, but in the mutex fastpath it behaves similarly to a spinlock. We could add mmiowb() calls back in the radeon driver, but the maintainer says he doesn't like such a workaround, and radeon is not the only example of mutex-protected MMIO.
So we should extend the mmiowb tracking system from spinlocks to mutexes, and maybe other locking primitives. This is not easy and is error-prone, so we solve it in the architecture code instead, by simply defining the __io_aw() hook as mmiowb(). And we no longer need to override queued_spin_unlock(), so the generic definition is used.
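The mechanism, sketched (simplified from asm-generic/io.h; not the literal patch): every MMIO write accessor ends with the __io_aw() "after write" hook, so one arch-level definition covers all drivers at once.

/* arch override: make every MMIO write accessor end with mmiowb() */
#define __io_aw()	mmiowb()

/* simplified shape of the generic accessor in asm-generic/io.h */
static inline void writel(u32 value, volatile void __iomem *addr)
{
	__io_bw();			/* barrier before the MMIO store */
	__raw_writel(value, addr);	/* the MMIO store itself */
	__io_aw();			/* now orders the store against a later unlock */
}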
Without this, we get errors like the following when running 'glxgears' on weakly ordered architectures such as LoongArch:
radeon 0000:04:00.0: ring 0 stalled for more than 10324msec
radeon 0000:04:00.0: ring 3 stalled for more than 10240msec
radeon 0000:04:00.0: GPU lockup (current fence id 0x000000000001f412 last fence id 0x000000000001f414 on ring 3)
radeon 0000:04:00.0: GPU lockup (current fence id 0x000000000000f940 last fence id 0x000000000000f941 on ring 0)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
Link: https://lore.kernel.org/dri-devel/29df7e26-d7a8-4f67-b988-44353c4270ac@amd.com/T/#t
Link: https://lore.kernel.org/linux-arch/20240301130532.3953167-1-chenhuacai@loongson.cn/T/#t
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Sasha Levin <sashal@kernel.org>
278be836 | 17-Oct-2023 |
Icenowy Zheng <uwu@icenowy.me> |
LoongArch: Disable WUC for pgprot_writecombine() like ioremap_wc()
Currently the code disables WUC only for ioremap_wc(), which is only used when mapping writecombine pages like ioremap() (mapped into the kernel space). But for VRAM mapped in TTM/GEM, the mapping is created with a crafted pgprot by the pgprot_writecombine() function, in which case WUC is not disabled now.
Disable WUC for pgprot_writecombine() (fallback to SUC) if needed, like ioremap_wc().
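A minimal sketch of that fix, assuming the upstream _CACHE_* cache-attribute macros; the global toggle is named wc_enabled here purely for illustration:

extern bool wc_enabled;		/* illustrative name: cleared when WUC must be avoided */

static inline pgprot_t pgprot_writecombine(pgprot_t _prot)
{
	unsigned long prot = pgprot_val(_prot);

	/* Replace the cache-attribute bits: WUC normally, SUC as fallback */
	prot = (prot & ~_CACHE_MASK) | (wc_enabled ? _CACHE_WUC : _CACHE_SUC);

	return __pgprot(prot);
}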
This improves the AMDGPU driver's stability (solves some misrendering) on Loongson-3A5000/3A6000 machines.
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
99e5a247 | 20-Sep-2023 |
Huacai Chen <chenhuacai@loongson.cn> |
LoongArch: Don't inline kasan_mem_to_shadow()/kasan_shadow_to_mem()
As Linus suggested, kasan_mem_to_shadow()/kasan_shadow_to_mem() are not performance-critical and are too big to inline. Inlining them is simply wrong, so just define them out-of-line.
If they really need to be inlined in the future, such as for the objtool/SMAP issue on x86, we should mark them __always_inline.
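The shape of the change, sketched with placeholder bodies (file locations and the elided mapping logic are assumptions, not the literal patch):

/* In the arch header: declarations only (previously large static inlines) */
void *kasan_mem_to_shadow(const void *addr);
const void *kasan_shadow_to_mem(const void *shadow_addr);

/* In an arch C file: out-of-line definitions */
void *kasan_mem_to_shadow(const void *addr)
{
	/* ... LoongArch's per-segment mapping logic, too big to inline ... */
	return NULL;	/* placeholder body for this sketch */
}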
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2a86f1b5 | 20-Sep-2023 |
Huacai Chen <chenhuacai@loongson.cn> |
kasan: Cleanup the __HAVE_ARCH_SHADOW_MAP usage
As Linus suggested, __HAVE_ARCH_XYZ is "stupid" and "having historical uses of it doesn't make it good". So migrate __HAVE_ARCH_SHADOW_MAP to separate macros named after the respective functions.
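The resulting idiom, sketched (this is the common kernel convention for per-function overrides; exact file locations omitted): the arch defines a macro named after each function it overrides, and the generic header falls back only when that macro is absent.

/* Arch header: override plus a self-named macro */
void *kasan_mem_to_shadow(const void *addr);
#define kasan_mem_to_shadow kasan_mem_to_shadow

/* Generic header: default used only if no arch override exists */
#ifndef kasan_mem_to_shadow
static inline void *kasan_mem_to_shadow(const void *addr)
{
	return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
		+ KASAN_SHADOW_OFFSET;
}
#endif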
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: WANG Xuerui <git@xen0n.name>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
3563b477 | 20-Sep-2023 |
Andy Shevchenko <andriy.shevchenko@linux.intel.com> |
LoongArch: Use _UL() and _ULL()
Use _UL() and _ULL() that are provided by const.h.
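For illustration, with hypothetical constants: _UL()/_ULL() expand via _AC(), which appends the UL/ULL suffix in C but leaves a bare constant for assembly, so one definition serves both .c and .S files.

#include <linux/const.h>

/* Hypothetical constants, usable from both C and assembly */
#define EXAMPLE_MASK	_ULL(0xf000000000000000)
#define EXAMPLE_BASE	_UL(0x9000000000000000)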
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
5aa4ac64 | 06-Sep-2023 |
Qing Zhang <zhangqing@loongson.cn> |
LoongArch: Add KASAN (Kernel Address Sanitizer) support
1/8 of the kernel address space is reserved for shadow memory. But for LoongArch, there are a lot of holes between different segments, and the valid address space (256T available) is insufficient to map all these segments to KASAN shadow memory with the common formula provided by the KASAN core, namely (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET.
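As a reference point, the common formula as a minimal user-space sketch (the KASAN_SHADOW_OFFSET value here is illustrative; each arch chooses its own):

#include <stdint.h>

#define KASAN_SHADOW_SCALE_SHIFT	3		/* one shadow byte covers 8 bytes */
#define KASAN_SHADOW_OFFSET		0x1000000000ULL	/* illustrative value */

static inline void *kasan_mem_to_shadow(const void *addr)
{
	return (void *)(((uintptr_t)addr >> KASAN_SHADOW_SCALE_SHIFT)
			+ KASAN_SHADOW_OFFSET);
}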
So LoongArch has an arch-specific mapping formula: different segments are mapped individually, and only limited lengths of these specific segments are mapped to shadow.
At the early boot stage the whole shadow region is populated with just one physical page (kasan_early_shadow_page). Later, this page is reused as a read-only zero shadow for memory that KASAN currently doesn't track. After mapping the physical memory, pages for shadow memory are allocated and mapped.
Functions like memset()/memcpy()/memmove() do a lot of memory accesses. If a bad pointer is passed to one of these functions, it is important that it be caught. The compiler's instrumentation cannot do this since these functions are written in assembly.
KASAN replaces the memory functions with manually instrumented variants. The original functions are declared as weak symbols so that the strong definitions in mm/kasan/kasan.c can replace them. The original functions also have aliases with a '__' prefix in their names, so the non-instrumented variants can be called if needed.
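A C-level sketch of that weak-alias scheme (the kernel does this for assembly routines; the byte-wise body here is only a placeholder):

#include <stddef.h>

/* Real, non-instrumented implementation (placeholder body) */
void *__memset(void *s, int c, size_t n)
{
	unsigned char *p = s;

	while (n--)
		*p++ = (unsigned char)c;
	return s;
}

/* Weak alias: a strong memset (e.g. KASAN's instrumented one)
 * overrides this at link time; __memset stays callable directly. */
void *memset(void *s, int c, size_t n)
	__attribute__((weak, alias("__memset")));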
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
9fbcc076 | 06-Sep-2023 |
Qing Zhang <zhangqing@loongson.cn> |
LoongArch: Simplify the processing of jumping new kernel for KASLR
The modified relocate_kernel() doesn't return the new kernel's entry point but the random_offset. In this way we share the start_kernel() processing with the normal kernel, which avoids calling 'jr a0' directly and allows some other operations (e.g., kasan_early_init) before start_kernel() when KASLR (CONFIG_RANDOMIZE_BASE) is enabled.
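A sketch of the described control flow (names simplified and hypothetical; the real code is split between head.S and C):

unsigned long relocate_kernel(void);	/* returns random_offset, not an entry PC */
void kasan_early_init(void);
void start_kernel(void);

static void kernel_entry_tail(void)
{
#ifdef CONFIG_RANDOMIZE_BASE
	/* The assembly caller adds random_offset to its own PC to continue
	 * inside the relocated image, instead of jumping via 'jr a0'. */
	unsigned long random_offset = relocate_kernel();
	(void)random_offset;
#endif
	kasan_early_init();	/* now possible before start_kernel() */
	start_kernel();		/* shared with the normal (non-KASLR) path */
}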
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>