.. SPDX-License-Identifier: GPL-2.0

===============================
Kernel level exception handling
===============================

Commentary by Joerg Pommnitz <joerg@raleigh.ibm.com>

When a process runs in kernel mode, it often has to access user
mode memory whose address has been passed by an untrusted program.
To protect itself the kernel has to verify this address.

In older versions of Linux this was done with the
int verify_area(int type, const void * addr, unsigned long size)
function (which has since been replaced by access_ok()).

This function verified that the memory area starting at address
'addr' and of size 'size' was accessible for the operation specified
in type (read or write). To do this, verify_area() had to look up the
virtual memory area (vma) that contained the address addr. In the
normal case (correctly working program), this test was successful.
It only failed for a few buggy programs. In some kernel profiling
tests, this normally unneeded verification used up a considerable
amount of time.

To overcome this situation, Linus decided to let the virtual memory
hardware present in every Linux-capable CPU handle this test.

How does this work?

Whenever the kernel tries to access an address that is currently not
accessible, the CPU generates a page fault exception and calls the
page fault handler::

    void exc_page_fault(struct pt_regs *regs, unsigned long error_code)

in arch/x86/mm/fault.c. The parameters on the stack are set up by
the low level assembly glue in arch/x86/entry/entry_32.S. The parameter
regs is a pointer to the saved registers on the stack, error_code
contains a reason code for the exception.

exc_page_fault() first obtains the inaccessible address from the CPU
control register CR2. If the address is within the virtual address
space of the process, the fault probably occurred because the page
was not swapped in, was write protected, or something similar. However,
we are interested in the other case: the address is not valid, there
is no vma that contains this address. In this case, the kernel jumps
to the bad_area label.

There it uses the address of the instruction that caused the exception
(i.e. regs->eip) to find an address where the execution can continue
(fixup). If this search is successful, the fault handler modifies the
return address (again regs->eip) and returns. The execution will
continue at the address in fixup.

Where does fixup point to?

Since we jump to the contents of fixup, fixup obviously points
to executable code. This code is hidden inside the user access macros.
I have picked the get_user() macro defined in arch/x86/include/asm/uaccess.h
as an example. The definition is somewhat hard to follow, so let's peek at
the code generated by the preprocessor and the compiler. I selected
the get_user() call in drivers/char/sysrq.c for a detailed examination.

The original code in sysrq.c line 587::

    get_user(c, buf);

The preprocessor output (edited to become somewhat readable)::

    (
      {
        long __gu_err = - 14 , __gu_val = 0;
        const __typeof__(*( ( buf ) )) *__gu_addr = ((buf));
        if (((((0 + current_set[0])->tss.segment) == 0x18 ) ||
          (((sizeof(*(buf))) <= 0xC0000000UL) &&
          ((unsigned long)(__gu_addr ) <= 0xC0000000UL - (sizeof(*(buf)))))))
          do {
            __gu_err = 0;
            switch ((sizeof(*(buf)))) {
              case 1:
                __asm__ __volatile__(
                  "1: mov" "b" " %2,%" "b" "1\n"
                  "2:\n"
                  ".section .fixup,\"ax\"\n"
                  "3: movl %3,%0\n"
                  " xor" "b" " %" "b" "1,%" "b" "1\n"
                  " jmp 2b\n"
                  ".section __ex_table,\"a\"\n"
                  " .align 4\n"
                  " .long 1b,3b\n"
                  ".text" : "=r"(__gu_err), "=q" (__gu_val): "m"((*(struct __large_struct *)
                                ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err )) ;
                  break;
              case 2:
                __asm__ __volatile__(
                  "1: mov" "w" " %2,%" "w" "1\n"
                  "2:\n"
                  ".section .fixup,\"ax\"\n"
                  "3: movl %3,%0\n"
                  " xor" "w" " %" "w" "1,%" "w" "1\n"
                  " jmp 2b\n"
                  ".section __ex_table,\"a\"\n"
                  " .align 4\n"
                  " .long 1b,3b\n"
                  ".text" : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *)
                                ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err ));
                  break;
              case 4:
                __asm__ __volatile__(
                  "1: mov" "l" " %2,%" "" "1\n"
                  "2:\n"
                  ".section .fixup,\"ax\"\n"
                  "3: movl %3,%0\n"
                  " xor" "l" " %" "" "1,%" "" "1\n"
                  " jmp 2b\n"
                  ".section __ex_table,\"a\"\n"
                  " .align 4\n" " .long 1b,3b\n"
                  ".text" : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *)
                                ( __gu_addr )) ), "i"(- 14 ), "0"(__gu_err));
                  break;
              default:
                (__gu_val) = __get_user_bad();
            }
          } while (0) ;
        ((c)) = (__typeof__(*((buf))))__gu_val;
        __gu_err;
      }
    );

WOW! Black GCC/assembly magic. This is impossible to follow, so let's
see what code gcc generates::

    >         xorl %edx,%edx
    >         movl current_set,%eax
    >         cmpl $24,788(%eax)
    >         je .L1424
    >         cmpl $-1073741825,64(%esp)
    >         ja .L1423
    > .L1424:
    >         movl %edx,%eax
    >         movl 64(%esp),%ebx
    > #APP
    > 1:      movb (%ebx),%dl                /* this is the actual user access */
    > 2:
    > .section .fixup,"ax"
    > 3:      movl $-14,%eax
    >         xorb %dl,%dl
    >         jmp 2b
    > .section __ex_table,"a"
    >         .align 4
    >         .long 1b,3b
    > .text
    > #NO_APP
    > .L1423:
    >         movzbl %dl,%esi

The optimizer does a good job and gives us something we can actually
understand. Can we? The actual user access is quite obvious. Thanks
to the unified address space we can just access the address in user
memory. But what does the .section stuff do?????

To understand this we have to look at the final kernel::

    > objdump --section-headers vmlinux
    >
    > vmlinux:     file format elf32-i386
    >
    > Sections:
    > Idx Name          Size      VMA       LMA       File off  Algn
    >   0 .text         00098f40  c0100000  c0100000  00001000  2**4
    >                   CONTENTS, ALLOC, LOAD, READONLY, CODE
    >   1 .fixup        000016bc  c0198f40  c0198f40  00099f40  2**0
    >                   CONTENTS, ALLOC, LOAD, READONLY, CODE
    >   2 .rodata       0000f127  c019a5fc  c019a5fc  0009b5fc  2**2
    >                   CONTENTS, ALLOC, LOAD, READONLY, DATA
    >   3 __ex_table    000015c0  c01a9724  c01a9724  000aa724  2**2
    >                   CONTENTS, ALLOC, LOAD, READONLY, DATA
    >   4 .data         0000ea58  c01abcf0  c01abcf0  000abcf0  2**4
    >                   CONTENTS, ALLOC, LOAD, DATA
    >   5 .bss          00018e21  c01ba748  c01ba748  000ba748  2**2
    >                   ALLOC
    >   6 .comment      00000ec4  00000000  00000000  000ba748  2**0
    >                   CONTENTS, READONLY
    >   7 .note         00001068  00000ec4  00000ec4  000bb60c  2**0
    >                   CONTENTS, READONLY

There are obviously 2 non standard ELF sections in the generated object
file.
But first we want to find out what happened to our code in the
final kernel executable::

    > objdump --disassemble --section=.text vmlinux
    >
    > c017e785 <do_con_write+c1> xorl   %edx,%edx
    > c017e787 <do_con_write+c3> movl   0xc01c7bec,%eax
    > c017e78c <do_con_write+c8> cmpl   $0x18,0x314(%eax)
    > c017e793 <do_con_write+cf> je     c017e79f <do_con_write+db>
    > c017e795 <do_con_write+d1> cmpl   $0xbfffffff,0x40(%esp,1)
    > c017e79d <do_con_write+d9> ja     c017e7a7 <do_con_write+e3>
    > c017e79f <do_con_write+db> movl   %edx,%eax
    > c017e7a1 <do_con_write+dd> movl   0x40(%esp,1),%ebx
    > c017e7a5 <do_con_write+e1> movb   (%ebx),%dl
    > c017e7a7 <do_con_write+e3> movzbl %dl,%esi

The whole user memory access is reduced to 10 x86 machine instructions.
The instructions bracketed in the .section directives are no longer
in the normal execution path.
They are located in a different section
of the executable file::

    > objdump --disassemble --section=.fixup vmlinux
    >
    > c0199ff5 <.fixup+10b5> movl   $0xfffffff2,%eax
    > c0199ffa <.fixup+10ba> xorb   %dl,%dl
    > c0199ffc <.fixup+10bc> jmp    c017e7a7 <do_con_write+e3>

And finally::

    > objdump --full-contents --section=__ex_table vmlinux
    >
    > c01aa7c4 93c017c0 e09f19c0 97c017c0 99c017c0  ................
    > c01aa7d4 f6c217c0 e99f19c0 a5e717c0 f59f19c0  ................
    > c01aa7e4 080a18c0 01a019c0 0a0a18c0 04a019c0  ................

or in human readable byte order::

    > c01aa7c4 c017c093 c0199fe0 c017c097 c017c099  ................
    > c01aa7d4 c017c2f6 c0199fe9 c017e7a5 c0199ff5  ................
                                 ^^^^^^^^^^^^^^^^^
                                 this is the interesting part!
    > c01aa7e4 c0180a08 c019a001 c0180a0a c019a004  ................

What happened? The assembly directives::

    .section .fixup,"ax"
    .section __ex_table,"a"

told the assembler to move the following code to the specified
sections in the ELF object file.
So the instructions::

    3:      movl $-14,%eax
            xorb %dl,%dl
            jmp 2b

ended up in the .fixup section of the object file and the addresses::

    .long 1b,3b

ended up in the __ex_table section of the object file. 1b and 3b
are local labels. The local label 1b (1b stands for next label 1
backward) is the address of the instruction that might fault, i.e.
in our case the address of the label 1 is c017e7a5:
the original assembly code: > 1: movb (%ebx),%dl
and linked in vmlinux : > c017e7a5 <do_con_write+e1> movb (%ebx),%dl

The local label 3 (backwards again) is the address of the code to handle
the fault, in our case the actual value is c0199ff5:
the original assembly code: > 3: movl $-14,%eax
and linked in vmlinux : > c0199ff5 <.fixup+10b5> movl $0xfffffff2,%eax

If the fixup was able to handle the exception, control flow may be returned
to the instruction after the one that triggered the fault, i.e. local label 2b.
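As a rough userspace model of what such (1b, 3b) pairs are for, the sketch
below stores a few of the pairs from the __ex_table dump above in an array and
looks up the fixup address for a faulting instruction address. The names
struct abs_entry, ex_table_model and lookup_fixup() are made up for this
illustration and are not kernel identifiers; the real kernel lookup is
search_exception_tables(), and the simple linear scan here is only an
assumption made to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one old-style (absolute address) table entry. */
struct abs_entry {
    uint32_t insn;   /* address of the instruction that may fault (label 1) */
    uint32_t fixup;  /* address of the recovery code (label 3) */
};

/* Pairs copied from the byte-swapped __ex_table dump in the text. */
static const struct abs_entry ex_table_model[] = {
    { 0xc017c2f6u, 0xc0199fe9u },
    { 0xc017e7a5u, 0xc0199ff5u },   /* our movb (%ebx),%dl */
    { 0xc0180a08u, 0xc019a001u },
    { 0xc0180a0au, 0xc019a004u },
};

/*
 * Return the fixup address registered for fault_ip, or 0 if the
 * faulting instruction has no entry (the kernel would oops then).
 */
static uint32_t lookup_fixup(uint32_t fault_ip)
{
    size_t n = sizeof(ex_table_model) / sizeof(ex_table_model[0]);

    for (size_t i = 0; i < n; i++) {
        if (ex_table_model[i].insn == fault_ip)
            return ex_table_model[i].fixup;
    }
    return 0;
}
```

Calling lookup_fixup(0xc017e7a5) on this model yields 0xc0199ff5, the address
of the code in .fixup, while an address without an entry yields 0.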

The assembly code::

    > .section __ex_table,"a"
    >         .align 4
    >         .long 1b,3b

becomes the value pair::

    > c01aa7d4 c017c2f6 c0199fe9 c017e7a5 c0199ff5  ................
                                 ^this is ^this is
                                 1b       3b

c017e7a5,c0199ff5 in the exception table of the kernel.

So, what actually happens if a fault from kernel mode with no suitable
vma occurs?

#. access to invalid address::

      > c017e7a5 <do_con_write+e1> movb (%ebx),%dl

#. MMU generates exception
#. CPU calls exc_page_fault()
#. exc_page_fault() calls do_user_addr_fault()
#. do_user_addr_fault() calls kernelmode_fixup_or_oops()
#. kernelmode_fixup_or_oops() calls fixup_exception() (regs->eip == c017e7a5);
#. fixup_exception() calls search_exception_tables()
#. search_exception_tables() looks up the address c017e7a5 in the
   exception table (i.e. the contents of the ELF section __ex_table)
   and returns the address of the associated fault handling code c0199ff5.
#. fixup_exception() modifies its own return address to point to the fault
   handling code and returns.
#. execution continues in the fault handling code.
#. a) EAX becomes -EFAULT (== -14)
   b) DL becomes zero (the value we "read" from user space)
   c) execution continues at local label 2 (address of the
      instruction immediately after the faulting user access).

Steps 11a to 11c in a certain way emulate the faulting instruction.

That's it, mostly. If you look at our example, you might ask why
we set EAX to -EFAULT in the exception handler code. Well, the
get_user() macro actually returns a value: 0, if the user access was
successful, -EFAULT on failure. Our original code did not test this
return value, however the inline assembly code in get_user() tries to
return -EFAULT. GCC selected EAX to return this value.

NOTE:
Due to the way that the exception table is built and needs to be ordered,
only use exceptions for code in the .text section. Any other section
will cause the exception table to not be sorted correctly, and the
exceptions will fail.

Things changed when 64-bit support was added to x86 Linux. Rather than
double the size of the exception table by expanding the two entries
from 32 bits to 64 bits, a clever trick was used to store addresses
as relative offsets from the table itself.
The assembly code changed
from::

        .long 1b,3b
  to:
        .long (from) - .
        .long (to) - .

and the C-code that uses these values converts back to absolute addresses
like this::

    static inline unsigned long
    ex_insn_addr(const struct exception_table_entry *x)
    {
        return (unsigned long)&x->insn + x->insn;
    }

In v4.6 the exception table entry was expanded with a new field "handler".
This is also 32 bits wide and contains a third relative function
pointer which points to one of:

1) ``int ex_handler_default(const struct exception_table_entry *fixup)``
     This is the legacy case that just jumps to the fixup code.

2) ``int ex_handler_fault(const struct exception_table_entry *fixup)``
     This case provides the fault number of the trap that occurred at
     entry->insn. It is used to distinguish page faults from machine
     checks.

More functions can easily be added.

CONFIG_BUILDTIME_TABLE_SORT allows the __ex_table section to be sorted post
link of the kernel image, via a host utility scripts/sorttable. It will set the
symbol main_extable_sort_needed to 0, avoiding sorting the __ex_table section
at boot time.
With the exception table sorted, at runtime when an exception
occurs we can quickly look up the __ex_table entry via binary search.

This is not just a boot time optimization: some architectures require this
table to be sorted in order to handle exceptions relatively early in the boot
process. For example, i386 makes use of this form of exception handling before
paging support is even enabled!
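To make the relative-offset encoding and the binary search a little more
concrete, here is a small userspace sketch. It is not kernel code: struct
rel_entry, fake_text, rel_table, to_rel(), build_table() and find_fixup() are
all hypothetical names invented for this example, and it assumes that the two
static arrays lie within 2 GiB of each other so that an offset fits into 32
bits (which should hold for objects in the same executable). Only
rel_insn_addr() mirrors the arithmetic of ex_insn_addr() shown earlier.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of a relative exception table entry. */
struct rel_entry {
    int32_t insn;   /* target address minus the address of this field */
    int32_t fixup;  /* target address minus the address of this field */
};

/* Stand-in for kernel text: only the addresses of these bytes matter. */
static unsigned char fake_text[64];

static struct rel_entry rel_table[3];

/* Encode a target as an offset relative to the field that stores it. */
static int32_t to_rel(const int32_t *field, const void *target)
{
    return (int32_t)((intptr_t)target - (intptr_t)field);
}

/* Same arithmetic as ex_insn_addr(): field address plus stored offset. */
static uintptr_t rel_insn_addr(const struct rel_entry *x)
{
    return (uintptr_t)&x->insn + (uintptr_t)(intptr_t)x->insn;
}

static uintptr_t rel_fixup_addr(const struct rel_entry *x)
{
    return (uintptr_t)&x->fixup + (uintptr_t)(intptr_t)x->fixup;
}

/* Fill the table in ascending order of instruction address. */
static void build_table(void)
{
    static const size_t insn_off[3]  = { 4, 8, 20 };
    static const size_t fixup_off[3] = { 40, 44, 48 };

    for (size_t i = 0; i < 3; i++) {
        rel_table[i].insn  = to_rel(&rel_table[i].insn,
                                    &fake_text[insn_off[i]]);
        rel_table[i].fixup = to_rel(&rel_table[i].fixup,
                                    &fake_text[fixup_off[i]]);
    }
}

/* Binary search over the sorted table; returns 0 if nothing matches. */
static uintptr_t find_fixup(uintptr_t fault_addr)
{
    size_t lo = 0, hi = 3;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        uintptr_t addr = rel_insn_addr(&rel_table[mid]);

        if (addr == fault_addr)
            return rel_fixup_addr(&rel_table[mid]);
        if (addr < fault_addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}
```

After build_table(), find_fixup((uintptr_t)&fake_text[8]) returns the address
of fake_text[44], and an address with no entry returns 0. Note that the sketch
fills the table already sorted; moving a relative entry to a different slot
changes the address of its fields, so a real sort has to adjust the stored
offsets as it goes.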