.. SPDX-License-Identifier: GPL-2.0

.. include:: <isonum.txt>

===============================
Bus lock detection and handling
===============================

:Copyright: |copy| 2021 Intel Corporation
:Authors: - Fenghua Yu <fenghua.yu@intel.com>
          - Tony Luck <tony.luck@intel.com>

Problem
=======

A split lock is any atomic operation whose operand crosses two cache lines.
Since the operand spans two cache lines and the operation must be atomic,
the system locks the bus while the CPU accesses the two cache lines.
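
As an illustration of the definition above, the following userspace sketch
checks whether an operand of a given size would straddle two cache lines.
The helper name ``crosses_cache_line`` and the 64-byte line size are
assumptions made for this example; the actual line size depends on the CPU.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Assumed cache line size in bytes; query the CPU for the real value. */
   #define CACHE_LINE_SIZE 64u

   /*
    * Hypothetical helper: return true if the operand [addr, addr + size)
    * starts in one cache line and ends in another.
    */
   static bool crosses_cache_line(uintptr_t addr, size_t size)
   {
       uintptr_t first, last;

       if (size == 0)
           return false;

       first = addr / CACHE_LINE_SIZE;
       last = (addr + size - 1) / CACHE_LINE_SIZE;
       return first != last;
   }

Under these assumptions, a 4-byte operand at offset 62 of a line occupies
bytes 62 to 65 and therefore spans two lines, while the same operand at
offset 60 does not. An atomic operation on the former would be a split lock.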

A bus lock is acquired through either a split locked access to writeback (WB)
memory or any locked access to non-WB memory. This is typically thousands of
cycles slower than an atomic operation within a cache line. It also disrupts
performance on other cores, which cannot access memory while the bus is
locked, and can severely degrade the whole system.

Detection
=========

Intel processors may support either or both of the following hardware
mechanisms to detect split locks and bus locks.

#AC exception for split lock detection
--------------------------------------

Beginning with the Tremont Atom CPU, an attempted split lock operation may
raise an Alignment Check (#AC) exception.

#DB exception for bus lock detection
------------------------------------

Some CPUs can notify the kernel with a #DB trap after a user instruction
that acquires a bus lock has executed. This allows the kernel to terminate
the application or to enforce throttling.

Software handling
=================

The kernel #AC and #DB handlers handle bus locks according to the kernel
parameter "split_lock_detect". The options are summarized below:

+------------------+----------------------------+-----------------------+
|split_lock_detect=|#AC for split lock          |#DB for bus lock       |
+------------------+----------------------------+-----------------------+
|off               |Do nothing                  |Do nothing             |
+------------------+----------------------------+-----------------------+
|warn              |Kernel OOPs.                |Warn once per task and |
|(default)         |Warn once per task, add a   |continue to run.       |
|                  |delay, add synchronization  |                       |
|                  |to prevent more than one    |                       |
|                  |core from executing a       |                       |
|                  |split lock in parallel.     |                       |
|                  |sysctl split_lock_mitigate  |                       |
|                  |can be used to avoid the    |                       |
|                  |delay and synchronization.  |                       |
|                  |When both features are      |                       |
|                  |supported, warn in #AC.     |                       |
+------------------+----------------------------+-----------------------+
|fatal             |Kernel OOPs.                |Send SIGBUS to user.   |
|                  |Send SIGBUS to user.        |                       |
|                  |When both features are      |                       |
|                  |supported, fatal in #AC.    |                       |
+------------------+----------------------------+-----------------------+
|ratelimit:N       |Do nothing                  |Limit bus lock rate to |
|(0 < N <= 1000)   |                            |N bus locks per second |
|                  |                            |system wide and warn on|
|                  |                            |bus locks.             |
+------------------+----------------------------+-----------------------+
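
The option values in the table can be summarized by a small validation
routine. The following userspace sketch is not the kernel's own parser; the
names ``enum sld_mode`` and ``parse_split_lock_detect`` are hypothetical and
only mirror the rows of the table, including the ``0 < N <= 1000`` bound for
``ratelimit:N``.

.. code-block:: c

   #include <errno.h>
   #include <stdlib.h>
   #include <string.h>

   /* Hypothetical names that mirror the rows of the table above. */
   enum sld_mode { SLD_OFF, SLD_WARN, SLD_FATAL, SLD_RATELIMIT };

   /*
    * Parse the value of "split_lock_detect=". Returns 0 on success and -1
    * on an unknown or out-of-range value. *rate is set only for ratelimit.
    */
   static int parse_split_lock_detect(const char *val, enum sld_mode *mode,
                                      long *rate)
   {
       static const char prefix[] = "ratelimit:";
       char *end;
       long n;

       if (!strcmp(val, "off")) {
           *mode = SLD_OFF;
           return 0;
       }
       if (!strcmp(val, "warn")) {
           *mode = SLD_WARN;
           return 0;
       }
       if (!strcmp(val, "fatal")) {
           *mode = SLD_FATAL;
           return 0;
       }
       if (strncmp(val, prefix, sizeof(prefix) - 1))
           return -1;

       val += sizeof(prefix) - 1;
       errno = 0;
       n = strtol(val, &end, 10);
       /* Reject empty numbers, trailing junk and values outside (0, 1000]. */
       if (errno || end == val || *end != '\0' || n <= 0 || n > 1000)
           return -1;

       *mode = SLD_RATELIMIT;
       *rate = n;
       return 0;
   }

On the kernel command line the parameter would appear as, for example,
``split_lock_detect=ratelimit:100``.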

Usages
======

Bus lock detection and handling is useful in several areas:

It is critical for designers of consolidated real-time systems. These
systems run hard real-time code on some cores and "untrusted" user
processes on other cores. The hard real-time code cannot tolerate any bus
lock from the untrusted processes that hurts real-time performance. To
date, designers have been unable to deploy these solutions because they
have had no way to prevent "untrusted" user code from generating split
locks and bus locks, which block the hard real-time code from accessing
memory while the bus is locked.

It is also useful for general computing to prevent guests or user
applications from slowing down the overall system by executing instructions
that take a bus lock.

Guidance
========

off
---

Disable checking for split lock and bus lock. This option can be useful if
there are legacy applications that trigger these events at a low rate so
that mitigation is not needed.

warn
----

A warning is emitted when a bus lock is detected, which makes it possible to
identify the offending application. This is the default behavior.

fatal
-----

In this case, the bus lock is not tolerated and the process is killed.

ratelimit
---------

A system-wide bus lock rate limit N is specified, where 0 < N <= 1000. This
allows a rate of up to N bus locks per second. When the rate is exceeded,
any task caught by the bus lock #DB exception is throttled by enforced
sleeps until the rate drops below the limit again.

This is an effective mitigation in cases where a minimal impact can be
tolerated, but an eventual Denial of Service attack has to be prevented. It
makes it possible to identify the offending processes and analyze whether
they are malicious or just badly written.

Selecting a rate limit of 1000 allows the bus to be locked for up to about
seven million cycles each second (assuming 7000 cycles for each bus
lock). On a 2 GHz processor that would be about a 0.35% system slowdown.
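
The arithmetic behind this estimate can be written out as a short sketch.
The function name ``bus_lock_slowdown_percent`` is hypothetical, and the
7000-cycle cost per bus lock is the same assumption as in the text above.

.. code-block:: c

   /*
    * Hypothetical helper: worst-case share of CPU cycles, in percent,
    * during which the bus is locked, given a rate limit (bus locks per
    * second), an assumed cost per bus lock in cycles, and the clock
    * frequency in Hz.
    */
   static double bus_lock_slowdown_percent(double locks_per_sec,
                                           double cycles_per_lock,
                                           double cpu_hz)
   {
       return 100.0 * locks_per_sec * cycles_per_lock / cpu_hz;
   }

With the values above, ``bus_lock_slowdown_percent(1000, 7000, 2e9)``
works out to about 0.35: 1000 * 7000 = 7,000,000 locked cycles out of
2,000,000,000 cycles per second.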