===================
Reliable Stacktrace
===================

This document outlines basic information about reliable stacktracing.

.. Table of Contents:

.. contents:: :local:

1. Introduction
===============

The kernel livepatch consistency model relies on accurately identifying which
functions may have live state and therefore may not be safe to patch. One way
to identify which functions are live is to use a stacktrace.

Existing stacktrace code may not always give an accurate picture of all
functions with live state, and best-effort approaches which can be helpful for
debugging are unsound for livepatching. Livepatching depends on architectures
to provide a *reliable* stacktrace which ensures it never omits any live
functions from a trace.


2. Requirements
===============

Architectures must implement one of the reliable stacktrace functions.
Architectures using CONFIG_ARCH_STACKWALK must implement
'arch_stack_walk_reliable', and other architectures must implement
'save_stack_trace_tsk_reliable'.

Principally, the reliable stacktrace function must ensure that either:

* The trace includes all functions that the task may be returned to, and the
  return code is zero to indicate that the trace is reliable.

* The return code is non-zero to indicate that the trace is not reliable.

.. note::
   In some cases it is legitimate to omit specific functions from the trace,
   but all other functions must be reported. These cases are described in
   further detail below.

Secondly, the reliable stacktrace function must be robust to cases where
the stack or other unwind state is corrupt or otherwise unreliable. The
function should attempt to detect such cases and return a non-zero error
code, and should not get stuck in an infinite loop or access memory in
an unsafe way.  Specific cases are described in further detail below.
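
As a minimal sketch of both requirements, the toy C walker below follows a
chain of frame records on a downward-growing stack and passes each return
address to a callback. It assumes a hypothetical frame-pointer ABI in which
each record holds a pointer to the caller's record and a return address, and
that the caller already knows which record the chain should end at;
'struct frame_record', 'toy_walk_reliable()' and the other names are
hypothetical, and the callback type only loosely mirrors the kernel's
'stack_trace_consume_fn'. Any anomaly yields -EINVAL, so a truncated trace is
never reported as reliable:

.. code-block:: c

   #include <errno.h>
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical frame record pushed by a frame-pointer based ABI. */
   struct frame_record {
           const struct frame_record *fp;  /* caller's frame record */
           unsigned long lr;               /* return address into the caller */
   };

   /* Hypothetical callback, loosely modelled on stack_trace_consume_fn. */
   typedef bool (*toy_consume_fn)(void *cookie, unsigned long addr);

   /*
    * Walk from 'fp' to the expected 'final' record on the stack [low, high).
    * Returns 0 only if every return address was reported; any anomaly
    * returns -EINVAL instead of silently truncating the trace.
    */
   int toy_walk_reliable(const struct frame_record *fp,
                         const struct frame_record *final,
                         uintptr_t low, uintptr_t high,
                         toy_consume_fn consume, void *cookie)
   {
           uintptr_t prev = 0;

           while (fp != final) {
                   uintptr_t addr = (uintptr_t)fp;

                   /* The whole record must lie on the stack being walked. */
                   if (addr < low || addr >= high || high - addr < sizeof(*fp))
                           return -EINVAL;
                   /* A misaligned record indicates corruption. */
                   if (addr % _Alignof(struct frame_record))
                           return -EINVAL;
                   /* Callers live at higher addresses: require progress. */
                   if (addr <= prev)
                           return -EINVAL;
                   prev = addr;

                   /* A trace cut short by the consumer is not reliable. */
                   if (!consume(cookie, fp->lr))
                           return -EINVAL;
                   fp = fp->fp;
           }
           return 0;
   }

   /* Hypothetical consumer: store entries until the buffer is full. */
   struct toy_trace {
           unsigned long entries[8];
           size_t nr;
   };

   bool toy_trace_consume(void *cookie, unsigned long addr)
   {
           struct toy_trace *t = cookie;

           if (t->nr >= sizeof(t->entries) / sizeof(t->entries[0]))
                   return false;
           t->entries[t->nr++] = addr;
           return true;
   }

The forward-progress check is what bounds the walk: a corrupt record that
points back at an earlier frame is rejected on the next iteration instead of
causing an infinite loop.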


3. Compile-time analysis
========================

To ensure that kernel code can be correctly unwound in all cases,
architectures may need to verify that code has been compiled in a manner
expected by the unwinder. For example, an unwinder may expect that
functions manipulate the stack pointer in a limited way, or that all
functions use specific prologue and epilogue sequences. Architectures
with such requirements should verify the kernel compilation using
objtool.

In some cases, an unwinder may require metadata to correctly unwind.
Where necessary, this metadata should be generated at build time using
objtool.


4. Considerations
=================

The unwinding process varies across architectures, their respective procedure
call standards, and kernel configurations. This section describes common
details that architectures should consider.

4.1 Identifying successful termination
--------------------------------------

Unwinding may terminate early for a number of reasons, including:

* Stack or frame pointer corruption.

* Missing unwind support for an uncommon scenario, or a bug in the unwinder.

* Dynamically generated code (e.g. eBPF) or foreign code (e.g. EFI runtime
  services) not following the conventions expected by the unwinder.

To ensure that this does not result in functions being omitted from the trace,
even if not caught by other checks, it is strongly recommended that
architectures verify that a stacktrace ends at an expected location, e.g.

* Within a specific function that is an entry point to the kernel.

* At a specific location on a stack expected for a kernel entry point.

* On a specific stack expected for a kernel entry point (e.g. if the
  architecture has separate task and IRQ stacks).

4.2 Identifying unwindable code
-------------------------------

Unwinding typically relies on code following specific conventions (e.g.
manipulating a frame pointer), but some code may not follow these conventions
and may require special handling in the unwinder, e.g.

* Exception vectors and entry assembly.

* Procedure Linkage Table (PLT) entries and veneer functions.

* Trampoline assembly (e.g. ftrace, kprobes).

* Dynamically generated code (e.g. eBPF, optprobe trampolines).

* Foreign code (e.g. EFI runtime services).

To ensure that such cases do not result in functions being omitted from a
trace, it is strongly recommended that architectures positively identify code
which is known to be reliable to unwind from, and reject unwinding from all
other code.

Kernel code, including modules and eBPF, can be distinguished from foreign
code using '__kernel_text_address()'. Checking for this also helps to detect
stack corruption.

There are several ways an architecture may identify kernel code which is deemed
unreliable to unwind from, e.g.

* Placing such code into special linker sections, and rejecting unwinding from
  any code in these sections.

* Identifying specific portions of code using bounds information.
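
A minimal sketch of positive identification follows, assuming both kernel text
and the regions deemed unreliable (for example, the bounds of such special
sections) can be described as simple address ranges. The text-range check
stands in for '__kernel_text_address()'; the types and function names are
hypothetical:

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>

   /* Hypothetical half-open address range [start, end). */
   struct toy_range {
           unsigned long start;
           unsigned long end;
   };

   /* Hypothetical description of the kernel's code layout. */
   struct toy_text_layout {
           const struct toy_range *text;           /* kernel, module, eBPF */
           size_t nr_text;
           const struct toy_range *noreliable;     /* e.g. special sections */
           size_t nr_noreliable;
   };

   static bool toy_in_any(const struct toy_range *r, size_t nr,
                          unsigned long addr)
   {
           for (size_t i = 0; i < nr; i++) {
                   if (addr >= r[i].start && addr < r[i].end)
                           return true;
           }
           return false;
   }

   /*
    * Positive identification: trust an address only if it is kernel text
    * (the role __kernel_text_address() plays in the kernel) and it is not
    * inside any region known to be unreliable to unwind from.
    */
   bool toy_reliable_unwind_addr(const struct toy_text_layout *l,
                                 unsigned long addr)
   {
           /* Foreign code, or a corrupt value read from the stack. */
           if (!toy_in_any(l->text, l->nr_text, addr))
                   return false;
           return !toy_in_any(l->noreliable, l->nr_noreliable, addr);
   }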

4.3 Unwinding across interrupts and exceptions
----------------------------------------------

At function call boundaries the stack and other unwind state are expected to be
consistent and suitable for reliable unwinding, but this may not be the case
part-way through a function. For example, during a function prologue or
epilogue a frame pointer may be transiently invalid, or during the function
body the return address may be held in an arbitrary general purpose register.
For some architectures this may change at runtime as a result of dynamic
instrumentation.

If an interrupt or other exception is taken while the stack or other unwind
state is inconsistent, it may not be possible to reliably unwind, and it may
not be possible to identify whether such unwinding will be reliable.
See below for examples.

Architectures which cannot identify when it is reliable to unwind such cases
(or where it is never reliable) must reject unwinding across exception
boundaries. Note that it may be reliable to unwind across certain
exceptions (e.g. IRQ) but unreliable to unwind across other exceptions
(e.g. NMI).

Architectures which can identify when it is reliable to unwind such cases (or
have no such cases) should attempt to unwind across exception boundaries, as
doing so avoids unnecessarily stalling livepatch consistency checks and
permits livepatch transitions to complete more quickly.
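
The policy above might be sketched as follows, assuming the unwinder records
the kind of each boundary it steps across. The boundary kinds and the
per-architecture policy flags are hypothetical; a real architecture would
derive this information from its own exception entry code:

.. code-block:: c

   #include <errno.h>
   #include <stdbool.h>
   #include <stddef.h>

   /* Hypothetical kinds of boundary a toy unwinder may step across. */
   enum toy_boundary {
           TOY_BOUNDARY_CALL,      /* ordinary function call */
           TOY_BOUNDARY_IRQ,       /* exception: interrupt */
           TOY_BOUNDARY_NMI,       /* exception: NMI */
   };

   /*
    * Hypothetical per-architecture policy. An architecture that cannot
    * tell whether the interrupted context was consistent leaves both
    * flags false; one that can may allow some kinds of exception.
    */
   struct toy_policy {
           bool unwind_across_irq;
           bool unwind_across_nmi;
   };

   /* Returns 0 if every boundary may be crossed, -EINVAL otherwise. */
   int toy_check_boundaries(const enum toy_boundary *b, size_t nr,
                            const struct toy_policy *policy)
   {
           for (size_t i = 0; i < nr; i++) {
                   switch (b[i]) {
                   case TOY_BOUNDARY_CALL:
                           break;
                   case TOY_BOUNDARY_IRQ:
                           if (!policy->unwind_across_irq)
                                   return -EINVAL;
                           break;
                   case TOY_BOUNDARY_NMI:
                           if (!policy->unwind_across_nmi)
                                   return -EINVAL;
                           break;
                   default:
                           /* Unknown boundary type: never trust it. */
                           return -EINVAL;
                   }
           }
           return 0;
   }

An architecture that can only vouch for IRQ entry would set
'unwind_across_irq' alone, so a trace crossing an NMI boundary is still
reported as unreliable.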

4.4 Rewriting of return addresses
---------------------------------

Some trampolines temporarily modify the return address of a function in order
to intercept when that function returns with a return trampoline, e.g.

* An ftrace trampoline may modify the return address so that function graph
  tracing can intercept returns.

* A kprobes (or optprobes) trampoline may modify the return address so that
  kretprobes can intercept returns.

When this happens, the original return address will not be in its usual
location. For trampolines which are not subject to live patching, where an
unwinder can reliably determine the original return address and no unwind state
is altered by the trampoline, the unwinder may report the original return
address in place of the trampoline and report this as reliable. Otherwise, an
unwinder must report these cases as unreliable.

Special care is required when identifying the original return address, as this
information is not in a consistent location for the duration of the entry
trampoline or return trampoline. For example, consider the x86_64
'return_to_handler' return trampoline:

.. code-block:: none

   SYM_CODE_START(return_to_handler)
           UNWIND_HINT_UNDEFINED
           subq  $24, %rsp

           /* Save the return values */
           movq %rax, (%rsp)
           movq %rdx, 8(%rsp)
           movq %rbp, %rdi

           call ftrace_return_to_handler

           movq %rax, %rdi
           movq 8(%rsp), %rdx
           movq (%rsp), %rax
           addq $24, %rsp
           JMP_NOSPEC rdi
   SYM_CODE_END(return_to_handler)

While the traced function runs, its return address on the stack points to
the start of return_to_handler, and the original return address is stored in
the task's cur_ret_stack. During this time the unwinder can find the return
address using ftrace_graph_ret_addr().

When the traced function returns to return_to_handler, there is no longer a
return address on the stack, though the original return address is still stored
in the task's cur_ret_stack. Within ftrace_return_to_handler(), the original
return address is removed from cur_ret_stack and may be held transiently in
arbitrary locations chosen by the compiler before being returned in rax. The
return_to_handler trampoline moves this into rdi before jumping to it.

Architectures might not always be able to unwind such sequences, for example
when ftrace_return_to_handler() has removed the address from cur_ret_stack and
the location of the return address cannot be reliably determined.

It is recommended that architectures unwind cases where return_to_handler has
not yet been returned to, but architectures are not required to unwind from the
middle of return_to_handler and can report this as unreliable. Architectures
are not required to unwind from other trampolines which modify the return
address.
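
As a minimal sketch of the recommended behaviour, the toy C code below
post-processes a trace of return addresses ordered innermost first. It assumes
a simplified model in which the saved original return addresses form a plain
array and the return trampoline occupies a known address range; the names are
hypothetical, and the substitution only stands in for what
ftrace_graph_ret_addr() does in the kernel:

.. code-block:: c

   #include <errno.h>
   #include <stddef.h>

   /*
    * Hypothetical model of function graph tracing state. ret_stack[] holds
    * the original return addresses (cur_ret_stack in the text above),
    * pushed outermost first, so the innermost traced call is on top.
    */
   struct toy_graph_state {
           const unsigned long *ret_stack;
           size_t depth;
           unsigned long tramp_start;      /* start of the return trampoline */
           unsigned long tramp_end;        /* end of the return trampoline */
   };

   /*
    * Fix up a trace of return addresses ordered innermost first. 'pc' is
    * the address the task was stopped at. On failure the contents of
    * entries[] are unspecified and must not be used.
    */
   int toy_fixup_graph_trace(const struct toy_graph_state *g, unsigned long pc,
                             unsigned long *entries, size_t nr)
   {
           size_t idx = g->depth;

           /* Inside the trampoline the address may be in flight: give up. */
           if (pc >= g->tramp_start && pc < g->tramp_end)
                   return -EINVAL;

           for (size_t i = 0; i < nr; i++) {
                   if (entries[i] != g->tramp_start)
                           continue;
                   /* Each rewritten return consumes the next saved address. */
                   if (idx == 0)
                           return -EINVAL;
                   entries[i] = g->ret_stack[--idx];
           }
           return 0;
   }

If the task was stopped anywhere inside the trampoline, the sketch reports the
trace as unreliable rather than guessing where the original return address
currently lives, which the text above permits.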

4.5 Obscuring of return addresses
---------------------------------

Some trampolines do not rewrite the return address in order to intercept
returns, but do transiently clobber the return address or other unwind state.

For example, the x86_64 implementation of optprobes patches the probed function
with a JMP instruction which targets the associated optprobe trampoline. When
the probe is hit, the CPU will branch to the optprobe trampoline, and the
address of the probed function is not held in any register or on the stack.

Similarly, the arm64 implementation of DYNAMIC_FTRACE_WITH_REGS patches traced
functions with the following:

.. code-block:: none

   MOV X9, X30
   BL <trampoline>

The MOV saves the link register (X30) into X9 to preserve the return address
before the BL clobbers the link register and branches to the trampoline. At the
start of the trampoline, the traced function's return address is in X9 rather
than in the link register, where it would usually be.

Architectures must ensure that unwinders either reliably unwind such cases or
report the unwinding as unreliable.
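
One way to meet this requirement is to describe, for each trampoline region,
where the return address can be found. The sketch below assumes a hypothetical
table in which only the first instructions of an arm64-style ftrace trampoline
are described as holding the traced function's return address in X9, and
everything else (including any optprobe-style trampoline) is conservatively
marked as unknown; the register snapshot and all names are hypothetical:

.. code-block:: c

   #include <errno.h>
   #include <stddef.h>

   /* Hypothetical subset of the registers captured at an exception. */
   struct toy_regs {
           unsigned long pc;
           unsigned long lr;       /* X30: now points into the traced function */
           unsigned long x9;
   };

   /* Hypothetical: where a trampoline region keeps the return address. */
   enum toy_ra_loc {
           TOY_RA_IN_X9,           /* saved there by 'MOV X9, X30' */
           TOY_RA_UNKNOWN,         /* obscured: unwinding is unreliable */
   };

   struct toy_tramp_region {
           unsigned long start;
           unsigned long end;      /* exclusive */
           enum toy_ra_loc loc;
   };

   /*
    * Returns 0 and sets *ra to the traced function's return address if
    * 'pc' is in a region where it is known, -EINVAL if it is obscured, or
    * -ENOENT if 'pc' is not in any trampoline (ordinary rules apply).
    */
   int toy_tramp_return_addr(const struct toy_tramp_region *regions,
                             size_t nr, const struct toy_regs *regs,
                             unsigned long *ra)
   {
           for (size_t i = 0; i < nr; i++) {
                   const struct toy_tramp_region *r = &regions[i];

                   if (regs->pc < r->start || regs->pc >= r->end)
                           continue;
                   if (r->loc != TOY_RA_IN_X9)
                           return -EINVAL;
                   *ra = regs->x9;
                   return 0;
           }
           return -ENOENT;
   }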

4.6 Link register unreliability
-------------------------------

On some other architectures, 'call' instructions place the return address into a
link register, and 'return' instructions consume the return address from the
link register without modifying the register. On these architectures software
must save the return address to the stack prior to making a function call. Over
the duration of a function call, the return address may be held in the link
register alone, on the stack alone, or in both locations.

Unwinders typically assume the link register is always live, but this
assumption can lead to unreliable stack traces. For example, consider the
following arm64 assembly for a simple function:

.. code-block:: none

   function:
           STP X29, X30, [SP, -16]!
           MOV X29, SP
           BL <other_function>
           LDP X29, X30, [SP], #16
           RET

At entry to the function, the link register (X30) points to the caller, and the
frame pointer (X29) points to the caller's frame including the caller's return
address. The first two instructions create a new stackframe and update the
frame pointer, and at this point the link register and the frame pointer both
describe this function's return address. A trace at this point may describe
this function twice, and if the function return is being traced, the unwinder
may consume two entries from the fgraph return stack rather than one entry.

The BL invokes 'other_function' with the link register pointing to this
function's LDP and the frame pointer pointing to this function's stackframe.
When 'other_function' returns, the link register is left pointing at the LDP
(the instruction following the BL), and so a trace at this point could result
in 'function' appearing twice in the backtrace.
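
To make the duplication concrete, the toy unwinder below deliberately makes the
flawed assumption described above: it always reports the LR as the interrupted
function's return address and then follows the frame records. The structures
are hypothetical simplifications of the arm64 state:

.. code-block:: c

   #include <stddef.h>

   /* Hypothetical frame record: the saved X29 and X30 pushed by STP. */
   struct frame_record {
           const struct frame_record *fp;
           unsigned long lr;
   };

   /* Hypothetical snapshot of an interrupted context. */
   struct toy_state {
           unsigned long pc;
           unsigned long lr;
           const struct frame_record *fp;
   };

   /*
    * Deliberately naive: always reports LR as the interrupted function's
    * return address, then follows the frame records. Writes at most 'max'
    * entries and returns the number written.
    */
   size_t toy_naive_unwind(const struct toy_state *s, unsigned long *out,
                           size_t max)
   {
           size_t n = 0;

           if (n < max)
                   out[n++] = s->pc;
           if (n < max)
                   out[n++] = s->lr;       /* the flawed assumption */
           for (const struct frame_record *fp = s->fp; fp && n < max; fp = fp->fp)
                   out[n++] = fp->lr;
           return n;
   }

Given a snapshot taken just after 'MOV X29, SP', where the LR and the newly
pushed frame record both hold the return address into the caller, this
unwinder reports that address twice.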

Similarly, a function may deliberately clobber the LR, e.g.

.. code-block:: none

   caller:
           STP X29, X30, [SP, -16]!
           MOV X29, SP
           ADR LR, <callee>
           BLR LR
           LDP X29, X30, [SP], #16
           RET

The ADR places the address of 'callee' into the LR before the BLR branches to
this address. If a trace is made immediately after the ADR, 'callee' will
appear to be the parent of 'caller', rather than the child.

Due to cases such as the above, it may only be possible to reliably consume a
link register value at a function call boundary. Architectures where this is
the case must reject unwinding across exception boundaries unless they can
reliably identify when the LR or stack value should be used (e.g. using
metadata generated by objtool).
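
A minimal sketch of such a metadata lookup is shown below. It assumes a
hypothetical table, sorted by start address, recording for each range of PCs
whether the return address is in the LR, in the frame record on the stack, or
unknown; this is not the format objtool actually emits. In line with section
4.2, any PC that is not positively described is treated as unreliable:

.. code-block:: c

   #include <errno.h>
   #include <stdlib.h>

   /* Hypothetical: where the return address lives at a given PC. */
   enum toy_ra_where {
           TOY_RA_LR,              /* only the link register holds it */
           TOY_RA_STACK,           /* the frame record on the stack holds it */
           TOY_RA_UNKNOWN,         /* e.g. the LR was deliberately clobbered */
   };

   /* Hypothetical build-time metadata entry covering [start, end). */
   struct toy_ra_entry {
           unsigned long start;
           unsigned long end;
           enum toy_ra_where where;
   };

   static int toy_ra_cmp(const void *key, const void *elem)
   {
           unsigned long pc = *(const unsigned long *)key;
           const struct toy_ra_entry *e = elem;

           if (pc < e->start)
                   return -1;
           if (pc >= e->end)
                   return 1;
           return 0;
   }

   /*
    * Look up the interrupted PC in a table sorted by start address with
    * non-overlapping ranges. Anything not positively described is
    * treated as unreliable.
    */
   int toy_ra_lookup(const struct toy_ra_entry *table, size_t nr,
                     unsigned long pc, enum toy_ra_where *where)
   {
           const struct toy_ra_entry *e;

           e = bsearch(&pc, table, nr, sizeof(*table), toy_ra_cmp);
           if (!e || e->where == TOY_RA_UNKNOWN)
                   return -EINVAL;
           *where = e->where;
           return 0;
   }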