#
021182e5 |
| 21-Jun-2016 |
Thomas Garnier <thgarnie@google.com> |
x86/mm: Enable KASLR for physical mapping memory regions
Add the physical mapping in the list of randomized memory regions.
The physical memory mapping holds most allocations from boot and heap all
x86/mm: Enable KASLR for physical mapping memory regions
Add the physical mapping in the list of randomized memory regions.
The physical memory mapping holds most allocations from boot and heap allocators. Knowing the base address and physical memory size, an attacker can deduce the PDE virtual address for the vDSO memory page. This attack was demonstrated at CanSecWest 2016, in the following presentation:
"Getting Physical: Extreme Abuse of Intel Based Paged Systems": https://github.com/n3k/CansecWest2016_Getting_Physical_Extreme_Abuse_of_Intel_Based_Paging_Systems/blob/master/Presentation/CanSec2016_Presentation.pdf
(See second part of the presentation).
The exploits used against Linux worked successfully against 4.6+ but fail with KASLR memory enabled:
https://github.com/n3k/CansecWest2016_Getting_Physical_Extreme_Abuse_of_Intel_Based_Paging_Systems/tree/master/Demos/Linux/exploits
Similar research was done at Google leading to this patch proposal.
Variants exists to overwrite /proc or /sys objects ACLs leading to elevation of privileges. These variants were tested against 4.6+.
The page offset used by the compressed kernel retains the static value since it is not yet randomized during this boot stage.
Signed-off-by: Thomas Garnier <thgarnie@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Alexander Popov <alpopov@ptsecurity.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Baoquan He <bhe@redhat.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lv Zheng <lv.zheng@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: linux-doc@vger.kernel.org Link: http://lkml.kernel.org/r/1466556426-32664-7-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
#
dbf984d8 |
| 25-Jun-2016 |
Borislav Petkov <bp@suse.de> |
x86/boot/64: Add forgotten end of function marker
Add secondary_startup_64()'s ENDPROC() marker.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kern
x86/boot/64: Add forgotten end of function marker
Add secondary_startup_64()'s ENDPROC() marker.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160625112457.16930-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
#
91ed140d |
| 31-Mar-2016 |
Borislav Petkov <bp@suse.de> |
x86/asm: Make sure verify_cpu() has a good stack
04633df0c43d ("x86/cpu: Call verify_cpu() after having entered long mode too") added the call to verify_cpu() for sanitizing CPU configuration.
The
x86/asm: Make sure verify_cpu() has a good stack
04633df0c43d ("x86/cpu: Call verify_cpu() after having entered long mode too") added the call to verify_cpu() for sanitizing CPU configuration.
The latter uses the stack minimally and it can happen that we land in startup_64() directly from a 64-bit bootloader. Then we want to use our own, known good stack.
Do that.
APs don't need this as the trampoline sets up a stack for them.
Reported-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1459434062-31055-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
#
0e861fbb |
| 02-Apr-2016 |
Andy Lutomirski <luto@kernel.org> |
x86/head: Move early exception panic code into early_fixup_exception()
This removes a bunch of assembly and adds some C code instead. It changes the actual printouts on both 32-bit and 64-bit kerne
x86/head: Move early exception panic code into early_fixup_exception()
This removes a bunch of assembly and adds some C code instead. It changes the actual printouts on both 32-bit and 64-bit kernels, but they still seem okay.
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: KVM list <kvm@vger.kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel <Xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/4085070316fc3ab29538d3fcfe282648d1d4ee2e.1459605520.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
#
0d0efc07 |
| 02-Apr-2016 |
Andy Lutomirski <luto@kernel.org> |
x86/head: Move the early NMI fixup into C
C is nicer than asm.
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Linus Torvalds <tor
x86/head: Move the early NMI fixup into C
C is nicer than asm.
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: KVM list <kvm@vger.kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel <Xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/dd068269f8d59fe44e9e43a50d0efd67da65c2b5.1459605520.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
#
7bbcdb1c |
| 02-Apr-2016 |
Andy Lutomirski <luto@kernel.org> |
x86/head: Pass a real pt_regs and trapnr to early_fixup_exception()
early_fixup_exception() is limited by the fact that it doesn't have a real struct pt_regs. Change both the 32-bit and 64-bit asm
x86/head: Pass a real pt_regs and trapnr to early_fixup_exception()
early_fixup_exception() is limited by the fact that it doesn't have a real struct pt_regs. Change both the 32-bit and 64-bit asm and the C code to pass and accept a real pt_regs.
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: KVM list <kvm@vger.kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel <Xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/e3fb680fcfd5e23e38237e8328b64a25cc121d37.1459605520.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
#
a4733143 |
| 26-Jan-2016 |
Alexander Kuleshov <kuleshovmail@gmail.com> |
x86/boot: Simplify kernel load address alignment check
We are using %rax as temporary register to check the kernel address alignment. We don't really have to since the TEST instruction does not clob
x86/boot: Simplify kernel load address alignment check
We are using %rax as temporary register to check the kernel address alignment. We don't really have to since the TEST instruction does not clobber the destination operand.
Suggested-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexander Popov <alpopov@ptsecurity.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1453531828-19291-1-git-send-email-kuleshovmail@gmail.com Link: http://lkml.kernel.org/r/1453842730-28463-11-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
#
14365449 |
| 26-Jan-2016 |
Alexander Kuleshov <kuleshovmail@gmail.com> |
x86/asm: Remove unused L3_PAGE_OFFSET
L3_PAGE_OFFSET was introduced in commit a6523748bd (paravirt/x86, 64-bit: move __PAGE_OFFSET to leave a space for hypervisor), but has no users.
Signed-off-by:
x86/asm: Remove unused L3_PAGE_OFFSET
L3_PAGE_OFFSET was introduced in commit a6523748bd (paravirt/x86, 64-bit: move __PAGE_OFFSET to leave a space for hypervisor), but has no users.
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Link: http://lkml.kernel.org/r/1453810881-30622-1-git-send-email-kuleshovmail@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
show more ...
|
Revision tags: v4.4, openbmc-20151217-1, openbmc-20151210-1, openbmc-20151202-1, openbmc-20151123-1, openbmc-20151118-1 |
|
#
04633df0 |
| 05-Nov-2015 |
Borislav Petkov <bp@suse.de> |
x86/cpu: Call verify_cpu() after having entered long mode too
When we get loaded by a 64-bit bootloader, kernel entry point is startup_64 in head_64.S. We don't trust any and all bootloaders because
x86/cpu: Call verify_cpu() after having entered long mode too
When we get loaded by a 64-bit bootloader, kernel entry point is startup_64 in head_64.S. We don't trust any and all bootloaders because some will fiddle with CPU configuration so we go ahead and massage each CPU into sanity again.
For example, some dell BIOSes have this XD disable feature which set IA32_MISC_ENABLE[34] and disable NX. This might be some dumb workaround for other OSes but Linux sure doesn't need it.
A similar thing is present in the Surface 3 firmware - see https://bugzilla.kernel.org/show_bug.cgi?id=106051 - which sets this bit only on the BSP:
# rdmsr -a 0x1a0 400850089 850089 850089 850089
I know, right?!
There's not even an off switch in there.
So fix all those cases by sanitizing the 64-bit entry point too. For that, make verify_cpu() callable in 64-bit mode also.
Requested-and-debugged-by: "H. Peter Anvin" <hpa@zytor.com> Reported-and-tested-by: Bastien Nocera <bugzilla@hadess.net> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1446739076-21303-1-git-send-email-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
show more ...
|
Revision tags: openbmc-20151104-1, v4.3, openbmc-20151102-1, openbmc-20151028-1, v4.3-rc1, v4.2, v4.2-rc8, v4.2-rc7, v4.2-rc6, v4.2-rc5, v4.2-rc4, v4.2-rc3, v4.2-rc2, v4.2-rc1 |
|
#
5d5aa3cf |
| 02-Jul-2015 |
Alexander Popov <alpopov@ptsecurity.com> |
x86/kasan: Fix KASAN shadow region page tables
Currently KASAN shadow region page tables created without respect of physical offset (phys_base). This causes kernel halt when phys_base is not zero.
x86/kasan: Fix KASAN shadow region page tables
Currently KASAN shadow region page tables created without respect of physical offset (phys_base). This causes kernel halt when phys_base is not zero.
So let's initialize KASAN shadow region page tables in kasan_early_init() using __pa_nodebug() which considers phys_base.
This patch also separates x86_64_start_kernel() from KASAN low level details by moving kasan_map_early_shadow(init_level4_pgt) into kasan_early_init().
Remove the comment before clear_bss() which stopped bringing much profit to the code readability. Otherwise describing all the new order dependencies would be too verbose.
Signed-off-by: Alexander Popov <alpopov@ptsecurity.com> Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: <stable@vger.kernel.org> # 4.0+ Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1435828178-10975-3-git-send-email-a.ryabinin@samsung.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
Revision tags: v4.1, v4.1-rc8, v4.1-rc7, v4.1-rc6, v4.1-rc5 |
|
#
425be567 |
| 22-May-2015 |
Andy Lutomirski <luto@kernel.org> |
x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers
The early_idt_handlers asm code generates an array of entry points spaced nine bytes apart. It's not really clear from that co
x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers
The early_idt_handlers asm code generates an array of entry points spaced nine bytes apart. It's not really clear from that code or from the places that reference it what's going on, and the code only works in the first place because GAS never generates two-byte JMP instructions when jumping to global labels.
Clean up the code to generate the correct array stride (member size) explicitly. This should be considerably more robust against screw-ups, as GAS will warn if a .fill directive has a negative count. Using '. =' to advance would have been even more robust (it would generate an actual error if it tried to move backwards), but it would pad with nulls, confusing anyone who tries to disassemble the code. The new scheme should be much clearer to future readers.
While we're at it, improve the comments and rename the array and common code.
Binutils may start relaxing jumps to non-weak labels. If so, this change will fix our build, and we may need to backport this change.
Before, on x86_64:
0000000000000000 <early_idt_handlers>: 0: 6a 00 pushq $0x0 2: 6a 00 pushq $0x0 4: e9 00 00 00 00 jmpq 9 <early_idt_handlers+0x9> 5: R_X86_64_PC32 early_idt_handler-0x4 ... 48: 66 90 xchg %ax,%ax 4a: 6a 08 pushq $0x8 4c: e9 00 00 00 00 jmpq 51 <early_idt_handlers+0x51> 4d: R_X86_64_PC32 early_idt_handler-0x4 ... 117: 6a 00 pushq $0x0 119: 6a 1f pushq $0x1f 11b: e9 00 00 00 00 jmpq 120 <early_idt_handler> 11c: R_X86_64_PC32 early_idt_handler-0x4
After:
0000000000000000 <early_idt_handler_array>: 0: 6a 00 pushq $0x0 2: 6a 00 pushq $0x0 4: e9 14 01 00 00 jmpq 11d <early_idt_handler_common> ... 48: 6a 08 pushq $0x8 4a: e9 d1 00 00 00 jmpq 120 <early_idt_handler_common> 4f: cc int3 50: cc int3 ... 117: 6a 00 pushq $0x0 119: 6a 1f pushq $0x1f 11b: eb 03 jmp 120 <early_idt_handler_common> 11d: cc int3 11e: cc int3 11f: cc int3
Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Binutils <binutils@sourceware.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
#
cdeb6048 |
| 22-May-2015 |
Andy Lutomirski <luto@kernel.org> |
x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers
The early_idt_handlers asm code generates an array of entry points spaced nine bytes apart. It's not really clear from that co
x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers
The early_idt_handlers asm code generates an array of entry points spaced nine bytes apart. It's not really clear from that code or from the places that reference it what's going on, and the code only works in the first place because GAS never generates two-byte JMP instructions when jumping to global labels.
Clean up the code to generate the correct array stride (member size) explicitly. This should be considerably more robust against screw-ups, as GAS will warn if a .fill directive has a negative count. Using '. =' to advance would have been even more robust (it would generate an actual error if it tried to move backwards), but it would pad with nulls, confusing anyone who tries to disassemble the code. The new scheme should be much clearer to future readers.
While we're at it, improve the comments and rename the array and common code.
Binutils may start relaxing jumps to non-weak labels. If so, this change will fix our build, and we may need to backport this change.
Before, on x86_64:
0000000000000000 <early_idt_handlers>: 0: 6a 00 pushq $0x0 2: 6a 00 pushq $0x0 4: e9 00 00 00 00 jmpq 9 <early_idt_handlers+0x9> 5: R_X86_64_PC32 early_idt_handler-0x4 ... 48: 66 90 xchg %ax,%ax 4a: 6a 08 pushq $0x8 4c: e9 00 00 00 00 jmpq 51 <early_idt_handlers+0x51> 4d: R_X86_64_PC32 early_idt_handler-0x4 ... 117: 6a 00 pushq $0x0 119: 6a 1f pushq $0x1f 11b: e9 00 00 00 00 jmpq 120 <early_idt_handler> 11c: R_X86_64_PC32 early_idt_handler-0x4
After:
0000000000000000 <early_idt_handler_array>: 0: 6a 00 pushq $0x0 2: 6a 00 pushq $0x0 4: e9 14 01 00 00 jmpq 11d <early_idt_handler_common> ... 48: 6a 08 pushq $0x8 4a: e9 d1 00 00 00 jmpq 120 <early_idt_handler_common> 4f: cc int3 50: cc int3 ... 117: 6a 00 pushq $0x0 119: 6a 1f pushq $0x1f 11b: eb 03 jmp 120 <early_idt_handler_common> 11d: cc int3 11e: cc int3 11f: cc int3
Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Binutils <binutils@sourceware.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
Revision tags: v4.1-rc4 |
|
#
e839004b |
| 16-May-2015 |
Borislav Petkov <bp@suse.de> |
x86/asm/head*.S: Change global labels to local
Make the disassembly look less confusing:
-- head_64.o.before.asm ++ head_64.o.after.asm 0000000000000120 <early_idt_handler>: 120: fc
x86/asm/head*.S: Change global labels to local
Make the disassembly look less confusing:
-- head_64.o.before.asm ++ head_64.o.after.asm 0000000000000120 <early_idt_handler>: 120: fc cld 121: 83 3c 24 02 cmpl $0x2,(%rsp) - 125: 0f 84 9d 00 00 00 je 1c8 <is_nmi> + 125: 0f 84 9d 00 00 00 je 1c8 <early_idt_handler+0xa8> 12b: 83 3d 00 00 00 00 02 cmpl $0x2,0x0(%rip) # 132 <early_idt_handler+0x12> 132: 74 7e je 1b2 <early_idt_handler+0x92> 134: ff 05 00 00 00 00 incl 0x0(%rip) # 13a <early_idt_handler+0x1a> @@ -1198,9 +1198,7 @@ Disassembly of section .init.text: 1bf: 5a pop %rdx 1c0: 59 pop %rcx 1c1: 58 pop %rax - 1c2: ff 0d 00 00 00 00 decl 0x0(%rip) # 1c8 <is_nmi> - -00000000000001c8 <is_nmi>: + 1c2: ff 0d 00 00 00 00 decl 0x0(%rip) # 1c8 <early_idt_handler+0xa8> 1c8: 48 83 c4 10 add $0x10,%rsp 1cc: 48 cf iretq
-- head_32.o.before.asm ++ head_32.o.after.asm 0000016c <early_idt_handler>: 16c: fc cld 16d: 83 3c 24 02 cmpl $0x2,(%esp) - 171: 74 73 je 1e6 <is_nmi> + 171: 74 73 je 1e6 <ex_entry+0xc> 173: 36 83 3d 00 00 00 00 cmpl $0x2,%ss:0x0 17a: 02 17b: 74 5a je 1d7 <hlt_loop> @@ -483,8 +483,6 @@ Disassembly of section .init.text: 1dd: 59 pop %ecx 1de: 58 pop %eax 1df: 36 ff 0d 00 00 00 00 decl %ss:0x0 - -000001e6 <is_nmi>: 1e6: 83 c4 08 add $0x8,%esp 1e9: cf iret 1ea: 66 90 xchg %ax,%ax
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431793079-11153-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
Revision tags: v4.1-rc3, v4.1-rc2, v4.1-rc1, v4.0, v4.0-rc7, v4.0-rc6, v4.0-rc5, v4.0-rc4, v4.0-rc3 |
|
#
3e1aa7cb |
| 06-Mar-2015 |
Denys Vlasenko <dvlasenk@redhat.com> |
x86/asm: Optimize unnecessarily wide TEST instructions
By the nature of the TEST operation, it is often possible to test a narrower part of the operand:
"testl $3, mem" -> "testb $3, mem",
x86/asm: Optimize unnecessarily wide TEST instructions
By the nature of the TEST operation, it is often possible to test a narrower part of the operand:
"testl $3, mem" -> "testb $3, mem", "testq $3, %rcx" -> "testb $3, %cl"
This results in shorter instructions, because the TEST instruction has no sign-entending byte-immediate forms unlike other ALU ops.
Note that this change does not create any LCP (Length-Changing Prefix) stalls, which happen when adding a 0x66 prefix, which happens when 16-bit immediates are used, which changes such TEST instructions:
[test_opcode] [modrm] [imm32]
to:
[0x66] [test_opcode] [modrm] [imm16]
where [imm16] has a *different length* now: 2 bytes instead of 4. This confuses the decoder and slows down execution.
REX prefixes were carefully designed to almost never hit this case: adding REX prefix does not change instruction length except MOVABS and MOV [addr],RAX instruction.
This patch does not add instructions which would use a 0x66 prefix, code changes in assembly are:
-48 f7 07 01 00 00 00 testq $0x1,(%rdi) +f6 07 01 testb $0x1,(%rdi) -48 f7 c1 01 00 00 00 test $0x1,%rcx +f6 c1 01 test $0x1,%cl -48 f7 c1 02 00 00 00 test $0x2,%rcx +f6 c1 02 test $0x2,%cl -41 f7 c2 01 00 00 00 test $0x1,%r10d +41 f6 c2 01 test $0x1,%r10b -48 f7 c1 04 00 00 00 test $0x4,%rcx +f6 c1 04 test $0x4,%cl -48 f7 c1 08 00 00 00 test $0x8,%rcx +f6 c1 08 test $0x8,%cl
Linus further notes:
"There are no stalls from using 8-bit instruction forms.
Now, changing from 64-bit or 32-bit 'test' instructions to 8-bit ones *could* cause problems if it ends up having forwarding issues, so that instead of just forwarding the result, you end up having to wait for it to be stable in the L1 cache (or possibly the register file). The forwarding from the store buffer is simplest and most reliable if the read is done at the exact same address and the exact same size as the write that gets forwarded.
But that's true only if:
(a) the write was very recent and is still in the write queue. I'm not sure that's the case here anyway.
(b) on at least most Intel microarchitectures, you have to test a different byte than the lowest one (so forwarding a 64-bit write to a 8-bit read ends up working fine, as long as the 8-bit read is of the low 8 bits of the written data).
A very similar issue *might* show up for registers too, not just memory writes, if you use 'testb' with a high-byte register (where instead of forwarding the value from the original producer it needs to go through the register file and then shifted). But it's mainly a problem for store buffers.
But afaik, the way Denys changed the test instructions, neither of the above issues should be true.
The real problem for store buffer forwarding tends to be "write 8 bits, read 32 bits". That can be really surprisingly expensive, because the read ends up having to wait until the write has hit the cacheline, and we might talk tens of cycles of latency here. But "write 32 bits, read the low 8 bits" *should* be fast on pretty much all x86 chips, afaik."
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1425675332-31576-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
Revision tags: v4.0-rc2, v4.0-rc1, v3.19, v3.19-rc7 |
|
#
5b171e82 |
| 27-Jan-2015 |
Alexander Kuleshov <kuleshovmail@gmail.com> |
x86/asm/boot: Fix path in comments
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Martin Mares <mj@ucw.cz> Link: http://lkml.kernel.org/r/1422382588-10367-1-git-send-email-kuleshovma
x86/asm/boot: Fix path in comments
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Martin Mares <mj@ucw.cz> Link: http://lkml.kernel.org/r/1422382588-10367-1-git-send-email-kuleshovmail@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
#
ef7f0d6a |
| 13-Feb-2015 |
Andrey Ryabinin <a.ryabinin@samsung.com> |
x86_64: add KASan support
This patch adds arch specific code for kernel address sanitizer.
16TB of virtual addressed used for shadow memory. It's located in range [ffffec0000000000 - fffffc0000000
x86_64: add KASan support
This patch adds arch specific code for kernel address sanitizer.
16TB of virtual addressed used for shadow memory. It's located in range [ffffec0000000000 - fffffc0000000000] between vmemmap and %esp fixup stacks.
At early stage we map whole shadow region with zero page. Latter, after pages mapped to direct mapping address range we unmap zero pages from corresponding shadow (see kasan_map_shadow()) and allocate and map a real shadow memory reusing vmemmap_populate() function.
Also replace __pa with __pa_nodebug before shadow initialized. __pa with CONFIG_DEBUG_VIRTUAL=y make external function call (__phys_addr) __phys_addr is instrumented, so __asan_load could be called before shadow area initialized.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Jim Davis <jim.epost@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v3.19-rc6, v3.19-rc5, v3.19-rc4, v3.19-rc3, v3.19-rc2, v3.19-rc1, v3.18, v3.18-rc7, v3.18-rc6, v3.18-rc5, v3.18-rc4, v3.18-rc3, v3.18-rc2, v3.18-rc1, v3.17, v3.17-rc7, v3.17-rc6, v3.17-rc5, v3.17-rc4, v3.17-rc3, v3.17-rc2, v3.17-rc1, v3.16, v3.16-rc7, v3.16-rc6, v3.16-rc5, v3.16-rc4, v3.16-rc3, v3.16-rc2, v3.16-rc1, v3.15, v3.15-rc8, v3.15-rc7, v3.15-rc6, v3.15-rc5, v3.15-rc4, v3.15-rc3, v3.15-rc2, v3.15-rc1, v3.14, v3.14-rc8, v3.14-rc7, v3.14-rc6 |
|
#
b01d4e68 |
| 07-Mar-2014 |
Linus Torvalds <torvalds@linux-foundation.org> |
x86: fix compile error due to X86_TRAP_NMI use in asm files
It's an enum, not a #define, you can't use it in asm files.
Introduced in commit 5fa10196bdb5 ("x86: Ignore NMIs that come in during earl
x86: fix compile error due to X86_TRAP_NMI use in asm files
It's an enum, not a #define, you can't use it in asm files.
Introduced in commit 5fa10196bdb5 ("x86: Ignore NMIs that come in during early boot"), and sadly I didn't compile-test things like I should have before pushing out.
My weak excuse is that the x86 tree generally doesn't introduce stupid things like this (and the ARM pull afterwards doesn't cause me to do a compile-test either, since I don't cross-compile).
Cc: Don Zickus <dzickus@redhat.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
5fa10196 |
| 07-Mar-2014 |
H. Peter Anvin <hpa@linux.intel.com> |
x86: Ignore NMIs that come in during early boot
Don Zickus reports:
A customer generated an external NMI using their iLO to test kdump worked. Unfortunately, the machine hung. Disabling the nmi_w
x86: Ignore NMIs that come in during early boot
Don Zickus reports:
A customer generated an external NMI using their iLO to test kdump worked. Unfortunately, the machine hung. Disabling the nmi_watchdog made things work.
I speculated the external NMI fired, caused the machine to panic (as expected) and the perf NMI from the watchdog came in and was latched. My guess was this somehow caused the hang.
----
It appears that the latched NMI stays latched until the early page table generation on 64 bits, which causes exceptions to happen which end in IRET, which re-enable NMI. Therefore, ignore NMIs that come in during early execution, until we have proper exception handling.
Reported-and-tested-by: Don Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/r/1394221143-29713-1-git-send-email-dzickus@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> # v3.5+, older with some backport effort
show more ...
|
Revision tags: v3.14-rc5, v3.14-rc4, v3.14-rc3, v3.14-rc2, v3.14-rc1, v3.13, v3.13-rc8, v3.13-rc7, v3.13-rc6, v3.13-rc5, v3.13-rc4, v3.13-rc3, v3.13-rc2, v3.13-rc1, v3.12, v3.12-rc7, v3.12-rc6, v3.12-rc5, v3.12-rc4, v3.12-rc3, v3.12-rc2, v3.12-rc1, v3.11, v3.11-rc7, v3.11-rc6, v3.11-rc5, v3.11-rc4, v3.11-rc3, v3.11-rc2 |
|
#
4df05f36 |
| 16-Jul-2013 |
Kees Cook <keescook@chromium.org> |
x86: Make sure IDT is page aligned
Since the IDT is referenced from a fixmap, make sure it is page aligned. Merge with 32-bit one, since it was already aligned to deal with F00F bug. Since bss is cl
x86: Make sure IDT is page aligned
Since the IDT is referenced from a fixmap, make sure it is page aligned. Merge with 32-bit one, since it was already aligned to deal with F00F bug. Since bss is cleared before IDT setup, it can live there. This also moves the other *_idt_table variables into common locations.
This avoids the risk of the IDT ever being moved in the bss and having the mapping be offset, resulting in calling incorrect handlers. In the current upstream kernel this is not a manifested bug, but heavily patched kernels (such as those using the PaX patch series) did encounter this bug.
The tables other than idt_table technically do not need to be page aligned, at least not at the current time, but using a common declaration avoids mistakes. On 64 bits the table is exactly one page long, anyway.
Signed-off-by: Kees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/20130716183441.GA14232@www.outflux.net Reported-by: PaX Team <pageexec@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
show more ...
|
Revision tags: v3.11-rc1, v3.10, v3.10-rc7 |
|
#
cf910e83 |
| 20-Jun-2013 |
Seiji Aguchi <seiji.aguchi@hds.com> |
x86, trace: Add irq vector tracepoints
[Purpose of this patch]
As Vaibhav explained in the thread below, tracepoints for irq vectors are useful.
http://www.spinics.net/lists/mm-commits/msg85707.ht
x86, trace: Add irq vector tracepoints
[Purpose of this patch]
As Vaibhav explained in the thread below, tracepoints for irq vectors are useful.
http://www.spinics.net/lists/mm-commits/msg85707.html
<snip> The current interrupt traces from irq_handler_entry and irq_handler_exit provide when an interrupt is handled. They provide good data about when the system has switched to kernel space and how it affects the currently running processes.
There are some IRQ vectors which trigger the system into kernel space, which are not handled in generic IRQ handlers. Tracing such events gives us the information about IRQ interaction with other system events.
The trace also tells where the system is spending its time. We want to know which cores are handling interrupts and how they are affecting other processes in the system. Also, the trace provides information about when the cores are idle and which interrupts are changing that state. <snip>
On the other hand, my usecase is tracing just local timer event and getting a value of instruction pointer.
I suggested to add an argument local timer event to get instruction pointer before. But there is another way to get it with external module like systemtap. So, I don't need to add any argument to irq vector tracepoints now.
[Patch Description]
Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events. But there is an above use case to trace specific irq_vector rather than tracing all events. In this case, we are concerned about overhead due to unwanted events.
So, add following tracepoints instead of introducing irq_vector_entry/exit. so that we can enable them independently. - local_timer_vector - reschedule_vector - call_function_vector - call_function_single_vector - irq_work_entry_vector - error_apic_vector - thermal_apic_vector - threshold_apic_vector - spurious_apic_vector - x86_platform_ipi_vector
Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty makes a zero when tracepoints are disabled. Detailed explanations are as follows. - Create trace irq handlers with entering_irq()/exiting_irq(). - Create a new IDT, trace_idt_table, at boot time by adding a logic to _set_gate(). It is just a copy of original idt table. - Register the new handlers for tracpoints to the new IDT by introducing macros to alloc_intr_gate() called at registering time of irq_vector handlers. - Add checking, whether irq vector tracing is on/off, into load_current_idt(). This has to be done below debug checking for these reasons. - Switching to debug IDT may be kicked while tracing is enabled. - On the other hands, switching to trace IDT is kicked only when debugging is disabled.
In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being used for other purposes.
Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>
show more ...
|
#
629f4f9d |
| 20-Jun-2013 |
Seiji Aguchi <seiji.aguchi@hds.com> |
x86: Rename variables for debugging
Rename variables for debugging to describe meaning of them precisely.
Also, introduce a generic way to switch IDT by checking a current state, debug on/off.
Sig
x86: Rename variables for debugging
Rename variables for debugging to describe meaning of them precisely.
Also, introduce a generic way to switch IDT by checking a current state, debug on/off.
Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C323A8.7050905@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>
show more ...
|
Revision tags: v3.10-rc6, v3.10-rc5, v3.10-rc4, v3.10-rc3, v3.10-rc2 |
|
#
e9d0626e |
| 14-May-2013 |
Zhang Yanfei <zhangyanfei@cn.fujitsu.com> |
x86-64, init: Fix a possible wraparound bug in switchover in head_64.S
In head_64.S, a switchover has been used to handle kernel crossing 1G, 512G boundaries.
And commit 8170e6bed465b4b0c7687f93e99
x86-64, init: Fix a possible wraparound bug in switchover in head_64.S
In head_64.S, a switchover has been used to handle kernel crossing 1G, 512G boundaries.
And commit 8170e6bed465b4b0c7687f93e9948aca4358a33b x86, 64bit: Use a #PF handler to materialize early mappings on demand said: During the switchover in head_64.S, before #PF handler is available, we use three pages to handle kernel crossing 1G, 512G boundaries with sharing page by playing games with page aliasing: the same page is mapped twice in the higher-level tables with appropriate wraparound.
But from the switchover code, when we set up the PUD table: 114 addq $4096, %rdx 115 movq %rdi, %rax 116 shrq $PUD_SHIFT, %rax 117 andl $(PTRS_PER_PUD-1), %eax 118 movq %rdx, (4096+0)(%rbx,%rax,8) 119 movq %rdx, (4096+8)(%rbx,%rax,8)
It seems line 119 has a potential bug there. For example, if the kernel is loaded at physical address 511G+1008M, that is 000000000 111111111 111111000 000000000000000000000 and the kernel _end is 512G+2M, that is 000000001 000000000 000000001 000000000000000000000 So in this example, when using the 2nd page to setup PUD (line 114~119), rax is 511. In line 118, we put rdx which is the address of the PMD page (the 3rd page) into entry 511 of the PUD table. But in line 119, the entry we calculate from (4096+8)(%rbx,%rax,8) has exceeded the PUD page. IMO, the entry in line 119 should be wraparound into entry 0 of the PUD table.
The patch fixes the bug.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Link: http://lkml.kernel.org/r/5191DE5A.3020302@cn.fujitsu.com Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: <stable@vger.kernel.org> v3.9 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
show more ...
|
Revision tags: v3.10-rc1 |
|
#
78d77df7 |
| 02-May-2013 |
H. Peter Anvin <hpa@linux.intel.com> |
x86-64, init: Do not set NX bits on non-NX capable hardware
During early init, we would incorrectly set the NX bit even if the NX feature was not supported. Instead, only set this bit if NX is actu
x86-64, init: Do not set NX bits on non-NX capable hardware
During early init, we would incorrectly set the NX bit even if the NX feature was not supported. Instead, only set this bit if NX is actually available and enabled. We already do very early detection of the NX bit to enable it in EFER, this simply extends this detection to the early page table mask.
Reported-by: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1367476850.5660.2.camel@nexus Cc: <stable@vger.kernel.org> v3.9
show more ...
|
Revision tags: v3.9, v3.9-rc8, v3.9-rc7, v3.9-rc6, v3.9-rc5, v3.9-rc4, v3.9-rc3, v3.9-rc2, v3.9-rc1 |
|
#
1256276c |
| 25-Feb-2013 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
x86, doc: Fix incorrect comment about 64-bit code segment descriptors
The AMD64 Architecture Programmer's Manual Volume 2, on page 89 mentions: "If the processor is running in 64-bit mode (L=1), the
x86, doc: Fix incorrect comment about 64-bit code segment descriptors
The AMD64 Architecture Programmer's Manual Volume 2, on page 89 mentions: "If the processor is running in 64-bit mode (L=1), the only valid setting of the D bit is 0." This matches with what the code does.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1361825650-14031-4-git-send-email-konrad.wilk@oracle.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
show more ...
|
#
ac630dd9 |
| 22-Feb-2013 |
Linus Torvalds <torvalds@linux-foundation.org> |
x86-64: don't set the early IDT to point directly to 'early_idt_handler'
The code requires the use of the proper per-exception-vector stub functions (set up as the early_idt_handlers[] array - note
x86-64: don't set the early IDT to point directly to 'early_idt_handler'
The code requires the use of the proper per-exception-vector stub functions (set up as the early_idt_handlers[] array - note the 's') that make sure to set up the error vector number. This is true regardless of whether CONFIG_EARLY_PRINTK is set or not.
Why? The stack offset for the comparison of __KERNEL_CS won't be right otherwise, nor will the new check (from commit 8170e6bed465: "x86, 64bit: Use a #PF handler to materialize early mappings on demand") for the page fault exception vector.
Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|