=========================================================
Cluster-wide Power-up/power-down race avoidance algorithm
=========================================================

This file documents the algorithm which is used to coordinate CPU and
cluster setup and teardown operations and to manage hardware coherency
controls safely.

The section "Rationale" explains what the algorithm is for and why it is
needed.  "Basic model" explains general concepts using a simplified view
of the system.  The other sections explain the actual details of the
algorithm in use.


Rationale
---------

In a system containing multiple CPUs, it is desirable to have the
ability to turn off individual CPUs when the system is idle, reducing
power consumption and thermal dissipation.

In a system containing multiple clusters of CPUs, it is also desirable
to have the ability to turn off entire clusters.

Turning entire clusters off and on is a risky business, because it
involves performing potentially destructive operations affecting a group
of independently running CPUs, while the OS continues to run.  This
means that we need some coordination in order to ensure that critical
cluster-level operations are only performed when it is truly safe to do
so.

Simple locking may not be sufficient to solve this problem, because
mechanisms like Linux spinlocks may rely on coherency mechanisms which
are not immediately enabled when a cluster powers up.  Since enabling or
disabling those mechanisms may itself be a non-atomic operation (such as
writing some hardware registers and invalidating large caches), other
methods of coordination are required in order to guarantee safe
power-down and power-up at the cluster level.

The mechanism presented in this document is a protocol, based on
coherent memory, for performing the needed coordination.  It aims to be
as lightweight as possible, while providing the required safety
properties.


Basic model
-----------

Each cluster and CPU is assigned a state, as follows:

	- DOWN
	- COMING_UP
	- UP
	- GOING_DOWN

::

	    +---------> UP ----------+
	    |                        v

	COMING_UP                GOING_DOWN

	    ^                        |
	    +--------- DOWN <--------+


DOWN:
	The CPU or cluster is not coherent, and is either powered off or
	suspended, or is ready to be powered off or suspended.

COMING_UP:
	The CPU or cluster has committed to moving to the UP state.
	It may be part way through the process of initialisation and
	enabling coherency.

UP:
	The CPU or cluster is active and coherent at the hardware
	level.  A CPU in this state is not necessarily being used
	actively by the kernel.

GOING_DOWN:
	The CPU or cluster has committed to moving to the DOWN
	state.  It may be part way through the process of teardown and
	coherency exit.

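As an illustration only, the basic model above can be written down as a
four-entry cycle.  The C sketch below is not taken from the kernel; the
names basic_state, basic_next() and basic_coherency_guaranteed() are
invented here for the example.

```c
/* The four states of the basic model (names as in the text above). */
enum basic_state { DOWN, COMING_UP, UP, GOING_DOWN };

/* Return the only state that can follow 's' in the basic model. */
enum basic_state basic_next(enum basic_state s)
{
	switch (s) {
	case DOWN:       return COMING_UP;
	case COMING_UP:  return UP;
	case UP:         return GOING_DOWN;
	case GOING_DOWN: return DOWN;
	}
	return DOWN;	/* not reached for valid input */
}

/*
 * Only UP is defined as coherent.  COMING_UP and GOING_DOWN may be
 * part way through enabling or exiting coherency, so nothing can be
 * assumed about them.
 */
int basic_coherency_guaranteed(enum basic_state s)
{
	return s == UP;
}
```

Applying basic_next() four times to any state returns to that state,
which is the cycle drawn in the diagram.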

Each CPU has one of these states assigned to it at any point in time.
The CPU states are described in the "CPU state" section, below.

Each cluster is also assigned a state, but it is necessary to split the
state value into two parts (the "cluster" state and "inbound" state) and
to introduce additional states in order to avoid races between different
CPUs in the cluster simultaneously modifying the state.  The cluster-
level states are described in the "Cluster state" section.

To help distinguish the CPU states from cluster states in this
discussion, the state names are given a `CPU_` prefix for the CPU states,
and a `CLUSTER_` or `INBOUND_` prefix for the cluster states.


CPU state
---------

In this algorithm, each individual core in a multi-core processor is
referred to as a "CPU".  CPUs are assumed to be single-threaded:
therefore, a CPU can only be doing one thing at a single point in time.

This means that CPUs fit the basic model closely.

The algorithm defines the following states for each CPU in the system:

	- CPU_DOWN
	- CPU_COMING_UP
	- CPU_UP
	- CPU_GOING_DOWN

::

	 cluster setup and
	CPU setup complete          policy decision
	      +-----------> CPU_UP ------------+
	      |                                v

	CPU_COMING_UP                   CPU_GOING_DOWN

	      ^                                |
	      +----------- CPU_DOWN <----------+
	 policy decision           CPU teardown complete
	or hardware event


The definitions of the four states correspond closely to the states of
the basic model.

Transitions between states occur as follows.

A trigger event (spontaneous) means that the CPU can transition to the
next state as a result of making local progress only, with no
requirement for any external event to happen.


CPU_DOWN:
	A CPU reaches the CPU_DOWN state when it is ready for
	power-down.  On reaching this state, the CPU will typically
	power itself down or suspend itself, via a WFI instruction or a
	firmware call.

	Next state:
		CPU_COMING_UP
	Conditions:
		none

	Trigger events:
		a) an explicit hardware power-up operation, resulting
		   from a policy decision on another CPU;

		b) a hardware event, such as an interrupt.


CPU_COMING_UP:
	A CPU cannot start participating in hardware coherency until the
	cluster is set up and coherent.  If the cluster is not ready,
	then the CPU will wait in the CPU_COMING_UP state until the
	cluster has been set up.

	Next state:
		CPU_UP
	Conditions:
		The CPU's parent cluster must be in CLUSTER_UP.
	Trigger events:
		Transition of the parent cluster to CLUSTER_UP.

	Refer to the "Cluster state" section for a description of the
	CLUSTER_UP state.


CPU_UP:
	When a CPU reaches the CPU_UP state, it is safe for the CPU to
	start participating in local coherency.

	This is done by jumping to the kernel's CPU resume code.

	Note that the definition of this state is slightly different
	from the basic model definition: CPU_UP does not mean that the
	CPU is coherent yet, but it does mean that it is safe to resume
	the kernel.  The kernel handles the rest of the resume
	procedure, so the remaining steps are not visible as part of the
	race avoidance algorithm.

	The CPU remains in this state until an explicit policy decision
	is made to shut down or suspend the CPU.

	Next state:
		CPU_GOING_DOWN
	Conditions:
		none
	Trigger events:
		explicit policy decision


CPU_GOING_DOWN:
	While in this state, the CPU exits coherency, including any
	operations required to achieve this (such as cleaning data
	caches).

	Next state:
		CPU_DOWN
	Conditions:
		local CPU teardown complete
	Trigger events:
		(spontaneous)

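The four CPU transitions and their conditions can be summarised as a
single step function.  This is a minimal sketch for illustration, not
the kernel's code: struct cpu_inputs and cpu_step() are hypothetical
names, and the four flags simply stand for the trigger events and
conditions listed above.

```c
#include <stdbool.h>

enum cpu_state { CPU_DOWN, CPU_COMING_UP, CPU_UP, CPU_GOING_DOWN };

/* Hypothetical summary of what a CPU can observe at one instant. */
struct cpu_inputs {
	bool wakeup;		/* power-up operation or hardware event */
	bool cluster_up;	/* parent cluster is in CLUSTER_UP */
	bool policy_down;	/* policy decision to shut down or suspend */
	bool teardown_done;	/* local CPU teardown has completed */
};

/* Return the state the CPU is in after considering 'in' once. */
enum cpu_state cpu_step(enum cpu_state s, const struct cpu_inputs *in)
{
	switch (s) {
	case CPU_DOWN:
		return in->wakeup ? CPU_COMING_UP : CPU_DOWN;
	case CPU_COMING_UP:
		/* Wait here until the cluster has been set up. */
		return in->cluster_up ? CPU_UP : CPU_COMING_UP;
	case CPU_UP:
		return in->policy_down ? CPU_GOING_DOWN : CPU_UP;
	case CPU_GOING_DOWN:
		/* Spontaneous: depends on local progress only. */
		return in->teardown_done ? CPU_DOWN : CPU_GOING_DOWN;
	}
	return s;	/* not reached for valid input */
}
```

Note that CPU_COMING_UP is the only CPU state whose exit depends on the
state of something other than the CPU itself.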

Cluster state
-------------

A cluster is a group of connected CPUs with some common resources.
Because a cluster contains multiple CPUs, it can be doing multiple
things at the same time.  This has some implications.  In particular, a
CPU can start up while another CPU is tearing the cluster down.

In this discussion, the "outbound side" is the view of the cluster state
as seen by a CPU tearing the cluster down.  The "inbound side" is the
view of the cluster state as seen by a CPU setting the cluster up.

In order to enable safe coordination in such situations, it is important
that a CPU which is setting up the cluster can advertise its state
independently of the CPU which is tearing down the cluster.  For this
reason, the cluster state is split into two parts:

	"cluster" state: The global state of the cluster; or the state
	on the outbound side:

		- CLUSTER_DOWN
		- CLUSTER_UP
		- CLUSTER_GOING_DOWN

	"inbound" state: The state of the cluster on the inbound side.

		- INBOUND_NOT_COMING_UP
		- INBOUND_COMING_UP


	The different pairings of these states result in six possible
	states for the cluster as a whole::

	                            CLUSTER_UP
	          +==========> INBOUND_NOT_COMING_UP -------------+
	          #                                               |
	                                                          |
	     CLUSTER_UP     <----+                                |
	  INBOUND_COMING_UP      |                                v

	          ^             CLUSTER_GOING_DOWN       CLUSTER_GOING_DOWN
	          #              INBOUND_COMING_UP <=== INBOUND_NOT_COMING_UP

	    CLUSTER_DOWN         |                                |
	  INBOUND_COMING_UP <----+                                |
	                                                          |
	          ^                                               |
	          +===========     CLUSTER_DOWN      <------------+
	                       INBOUND_NOT_COMING_UP

	Transitions -----> can only be made by the outbound CPU, and
	only involve changes to the "cluster" state.

	Transitions ===##> can only be made by the inbound CPU, and only
	involve changes to the "inbound" state, except where there is no
	further transition possible on the outbound side (i.e., the
	outbound CPU has put the cluster into the CLUSTER_DOWN state).

	The race avoidance algorithm does not provide a way to determine
	which exact CPUs within the cluster play these roles.  This must
	be decided in advance by some other means.  Refer to the section
	"Last man and first man selection" for more explanation.

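The ownership rule for the two kinds of arrow can be expressed as a
small predicate.  The sketch below is illustrative only; struct
cluster_pair, enum side and side_may_make() are names invented here.
It checks only which half of the pair a given side is allowed to
change, not whether the particular step appears in the diagram.

```c
#include <stdbool.h>

enum cluster_state { CLUSTER_DOWN, CLUSTER_UP, CLUSTER_GOING_DOWN };
enum inbound_state { INBOUND_NOT_COMING_UP, INBOUND_COMING_UP };
enum side { OUTBOUND, INBOUND };

/* The two independently written halves of the cluster state. */
struct cluster_pair {
	enum cluster_state cluster;
	enum inbound_state inbound;
};

/* May side 'who' be the one to move the cluster from 'from' to 'to'? */
bool side_may_make(enum side who, struct cluster_pair from,
		   struct cluster_pair to)
{
	bool cluster_changed = from.cluster != to.cluster;
	bool inbound_changed = from.inbound != to.inbound;

	/* The outbound CPU only ever writes the "cluster" half. */
	if (who == OUTBOUND)
		return cluster_changed && !inbound_changed;

	/* The inbound CPU normally writes only the "inbound" half... */
	if (inbound_changed && !cluster_changed)
		return true;

	/* ...except once the outbound side has reached CLUSTER_DOWN. */
	return cluster_changed && !inbound_changed &&
	       from.cluster == CLUSTER_DOWN;
}
```

Keeping the two halves separately writable is what lets each side
advertise its progress without overwriting the other side's update.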

	CLUSTER_DOWN/INBOUND_NOT_COMING_UP is the only state where the
	cluster can actually be powered down.

	The parallelism of the inbound and outbound CPUs is observed by
	the existence of two different paths from CLUSTER_GOING_DOWN/
	INBOUND_NOT_COMING_UP (corresponding to GOING_DOWN in the basic
	model) to CLUSTER_DOWN/INBOUND_COMING_UP (corresponding to
	COMING_UP in the basic model).  The second path avoids cluster
	teardown completely.

	CLUSTER_UP/INBOUND_COMING_UP is equivalent to UP in the basic
	model.  The final transition to CLUSTER_UP/INBOUND_NOT_COMING_UP
	is trivial and merely resets the state machine ready for the
	next cycle.

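The correspondence with the basic model can be sketched as a lookup.
The function names and the BASIC_* enumerators below are invented for
this example.  The text states three of the mappings directly; treating
CLUSTER_UP/INBOUND_NOT_COMING_UP as UP and
CLUSTER_DOWN/INBOUND_NOT_COMING_UP as DOWN is an assumption based on
reading the diagram, and CLUSTER_GOING_DOWN/INBOUND_COMING_UP is given
no counterpart because none is stated.

```c
#include <stdbool.h>

enum cluster_state { CLUSTER_DOWN, CLUSTER_UP, CLUSTER_GOING_DOWN };
enum inbound_state { INBOUND_NOT_COMING_UP, INBOUND_COMING_UP };

enum basic_state {
	BASIC_DOWN, BASIC_COMING_UP, BASIC_UP, BASIC_GOING_DOWN,
	BASIC_NONE	/* no counterpart in the basic model */
};

/* Power may only be removed in CLUSTER_DOWN/INBOUND_NOT_COMING_UP. */
bool cluster_may_power_off(enum cluster_state c, enum inbound_state i)
{
	return c == CLUSTER_DOWN && i == INBOUND_NOT_COMING_UP;
}

/* Map a paired cluster state onto the basic model (partly assumed). */
enum basic_state cluster_basic_equivalent(enum cluster_state c,
					  enum inbound_state i)
{
	if (c == CLUSTER_UP)
		return BASIC_UP;	/* either inbound value */
	if (c == CLUSTER_GOING_DOWN)
		return i == INBOUND_NOT_COMING_UP ? BASIC_GOING_DOWN
						  : BASIC_NONE;
	/* c == CLUSTER_DOWN */
	return i == INBOUND_COMING_UP ? BASIC_COMING_UP : BASIC_DOWN;
}
```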
	Details of the allowable transitions follow.

	The next state in each case is notated

		<cluster state>/<inbound state> (<transitioner>)

	where the <transitioner> is the side on which the transition
	can occur; either the inbound or the outbound side.


CLUSTER_DOWN/INBOUND_NOT_COMING_UP:
	Next state:
		CLUSTER_DOWN/INBOUND_COMING_UP (inbound)
	Conditions:
		none

	Trigger events:
		a) an explicit hardware power-up operation, resulting
		   from a policy decision on another CPU;

		b) a hardware event, such as an interrupt.


CLUSTER_DOWN/INBOUND_COMING_UP:

	In this state, an inbound CPU sets up the cluster, including
	enabling of hardware coherency at the cluster level and any
	other operations (such as cache invalidation) which are required
	in order to achieve this.

	The purpose of this state is to do sufficient cluster-level
	setup to enable other CPUs in the cluster to enter coherency
	safely.

	Next state:
		CLUSTER_UP/INBOUND_COMING_UP (inbound)
	Conditions:
		cluster-level setup and hardware coherency complete
	Trigger events:
		(spontaneous)


CLUSTER_UP/INBOUND_COMING_UP:

	Cluster-level setup is complete and hardware coherency is
	enabled for the cluster.  Other CPUs in the cluster can safely
	enter coherency.

	This is a transient state, leading immediately to
	CLUSTER_UP/INBOUND_NOT_COMING_UP.  All other CPUs on the cluster
	should treat these two states as equivalent.

	Next state:
		CLUSTER_UP/INBOUND_NOT_COMING_UP (inbound)
	Conditions:
		none
	Trigger events:
		(spontaneous)


CLUSTER_UP/INBOUND_NOT_COMING_UP:

	Cluster-level setup is complete and hardware coherency is
	enabled for the cluster.  Other CPUs in the cluster can safely
	enter coherency.

	The cluster will remain in this state until a policy decision is
	made to power the cluster down.

	Next state:
		CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP (outbound)
	Conditions:
		none
	Trigger events:
		policy decision to power down the cluster


CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP:

	An outbound CPU is tearing the cluster down.  The selected CPU
	must wait in this state until all CPUs in the cluster are in the
	CPU_DOWN state.

	When all CPUs are in the CPU_DOWN state, the cluster can be torn
	down, for example by cleaning data caches and exiting
	cluster-level coherency.

	To avoid wasteful unnecessary teardown operations, the outbound
	CPU should check the inbound cluster state for asynchronous
	transitions to INBOUND_COMING_UP.  Alternatively, individual
	CPUs can be checked for entry into CPU_COMING_UP or CPU_UP.


	Next states:

	CLUSTER_DOWN/INBOUND_NOT_COMING_UP (outbound)
		Conditions:
			cluster torn down and ready to power off
		Trigger events:
			(spontaneous)

	CLUSTER_GOING_DOWN/INBOUND_COMING_UP (inbound)
		Conditions:
			none

		Trigger events:
			a) an explicit hardware power-up operation,
			   resulting from a policy decision on another
			   CPU;

			b) a hardware event, such as an interrupt.

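The waiting and checking described for
CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP could look roughly like the
poll below.  This is a sketch under assumptions, not the kernel's
implementation: struct cluster_view, outbound_poll() and
NR_CPUS_PER_CLUSTER are hypothetical, the example performs both of the
checks that the text offers as alternatives, and it assumes the
outbound CPU skips itself when scanning the CPU states.

```c
#include <stdbool.h>

enum inbound_state { INBOUND_NOT_COMING_UP, INBOUND_COMING_UP };
enum cpu_state { CPU_DOWN, CPU_COMING_UP, CPU_UP, CPU_GOING_DOWN };

#define NR_CPUS_PER_CLUSTER 4	/* hypothetical cluster size */

/* Snapshot of the state the outbound CPU can observe. */
struct cluster_view {
	enum inbound_state inbound;
	enum cpu_state cpu[NR_CPUS_PER_CLUSTER];
};

enum teardown_decision { TEARDOWN_WAIT, TEARDOWN_PROCEED, TEARDOWN_ABORT };

/* One polling step made by outbound CPU 'self'. */
enum teardown_decision outbound_poll(const struct cluster_view *v,
				     unsigned int self)
{
	bool others_down = true;
	unsigned int i;

	/* An inbound CPU has appeared: teardown would be wasted work. */
	if (v->inbound == INBOUND_COMING_UP)
		return TEARDOWN_ABORT;

	for (i = 0; i < NR_CPUS_PER_CLUSTER; i++) {
		if (i == self)
			continue;	/* assumption: ignore ourselves */
		if (v->cpu[i] == CPU_COMING_UP || v->cpu[i] == CPU_UP)
			return TEARDOWN_ABORT;
		if (v->cpu[i] != CPU_DOWN)
			others_down = false;	/* still going down */
	}

	return others_down ? TEARDOWN_PROCEED : TEARDOWN_WAIT;
}
```

A caller would repeat the poll while it returns TEARDOWN_WAIT, and
begin cluster-level teardown only on TEARDOWN_PROCEED.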

CLUSTER_GOING_DOWN/INBOUND_COMING_UP:

	The cluster is (or was) being torn down, but another CPU has
	come online in the meantime and is trying to set up the cluster
	again.

	If the outbound CPU observes this state, it has two choices:

		a) back out of teardown, restoring the cluster to the
		   CLUSTER_UP state;

		b) finish tearing the cluster down and put the cluster
		   in the CLUSTER_DOWN state; the inbound CPU will
		   set up the cluster again from there.

	Choice (a) permits the removal of some latency by avoiding
	unnecessary teardown and setup operations in situations where
	the cluster is not really going to be powered down.


	Next states:

	CLUSTER_UP/INBOUND_COMING_UP (outbound)
		Conditions:
			cluster-level setup and hardware
			coherency complete

		Trigger events:
			(spontaneous)

	CLUSTER_DOWN/INBOUND_COMING_UP (outbound)
		Conditions:
			cluster torn down and ready to power off

		Trigger events:
			(spontaneous)

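The per-state descriptions above amount to eight allowed transitions.
As a compact cross-check, they can be collected into a table; the
struct transition type and cluster_transition_allowed() are
illustrative names, not kernel interfaces.

```c
#include <stddef.h>

enum cluster_state { CLUSTER_DOWN, CLUSTER_UP, CLUSTER_GOING_DOWN };
enum inbound_state { INBOUND_NOT_COMING_UP, INBOUND_COMING_UP };
enum side { OUTBOUND, INBOUND };

struct transition {
	enum cluster_state from_c;
	enum inbound_state from_i;
	enum cluster_state to_c;
	enum inbound_state to_i;
	enum side by;		/* the <transitioner> */
};

/* One row per "Next state" entry in the text above. */
static const struct transition cluster_transitions[] = {
	{ CLUSTER_DOWN, INBOUND_NOT_COMING_UP,
	  CLUSTER_DOWN, INBOUND_COMING_UP, INBOUND },
	{ CLUSTER_DOWN, INBOUND_COMING_UP,
	  CLUSTER_UP, INBOUND_COMING_UP, INBOUND },
	{ CLUSTER_UP, INBOUND_COMING_UP,
	  CLUSTER_UP, INBOUND_NOT_COMING_UP, INBOUND },
	{ CLUSTER_UP, INBOUND_NOT_COMING_UP,
	  CLUSTER_GOING_DOWN, INBOUND_NOT_COMING_UP, OUTBOUND },
	{ CLUSTER_GOING_DOWN, INBOUND_NOT_COMING_UP,
	  CLUSTER_DOWN, INBOUND_NOT_COMING_UP, OUTBOUND },
	{ CLUSTER_GOING_DOWN, INBOUND_NOT_COMING_UP,
	  CLUSTER_GOING_DOWN, INBOUND_COMING_UP, INBOUND },
	{ CLUSTER_GOING_DOWN, INBOUND_COMING_UP,
	  CLUSTER_UP, INBOUND_COMING_UP, OUTBOUND },
	{ CLUSTER_GOING_DOWN, INBOUND_COMING_UP,
	  CLUSTER_DOWN, INBOUND_COMING_UP, OUTBOUND },
};

/* Return 1 if the given side may make the given transition, else 0. */
int cluster_transition_allowed(enum cluster_state from_c,
			       enum inbound_state from_i,
			       enum cluster_state to_c,
			       enum inbound_state to_i, enum side by)
{
	size_t n = sizeof(cluster_transitions) / sizeof(cluster_transitions[0]);
	size_t k;

	for (k = 0; k < n; k++) {
		const struct transition *t = &cluster_transitions[k];

		if (t->from_c == from_c && t->from_i == from_i &&
		    t->to_c == to_c && t->to_i == to_i && t->by == by)
			return 1;
	}
	return 0;
}
```

The last two rows correspond to choices (a) and (b) open to the
outbound CPU in CLUSTER_GOING_DOWN/INBOUND_COMING_UP.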

Last man and First man selection
--------------------------------

The CPU which performs cluster tear-down operations on the outbound side
is commonly referred to as the "last man".

The CPU which performs cluster setup on the inbound side is commonly
referred to as the "first man".

The race avoidance algorithm documented above does not provide a
mechanism to choose which CPUs should play these roles.


Last man:

When shutting down the cluster, all the CPUs involved are initially
executing Linux and hence coherent.  Therefore, ordinary spinlocks can
be used to select a last man safely, before the CPUs become
non-coherent.

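One way to picture last man selection is a lock-protected count of the
CPUs that have not yet started going down.  The sketch below is an
assumption about how such a selection might be arranged, not a
description of any particular platform's code.  A C11 atomic_flag
stands in for a kernel spinlock, and struct cluster_count and both
functions are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct cluster_count {
	atomic_flag lock;	/* toy stand-in for a kernel spinlock */
	unsigned int cpus_up;	/* CPUs that have not begun going down */
};

void cluster_count_init(struct cluster_count *cc, unsigned int cpus_up)
{
	atomic_flag_clear(&cc->lock);
	cc->cpus_up = cpus_up;
}

/*
 * Called by a CPU that is about to go down, while it is still
 * coherent.  Returns true if the caller is the last man.
 */
bool cpu_leaves_and_is_last_man(struct cluster_count *cc)
{
	bool last;

	/* Spin until the lock is ours; safe only while coherent. */
	while (atomic_flag_test_and_set_explicit(&cc->lock,
						 memory_order_acquire))
		;
	cc->cpus_up--;
	last = (cc->cpus_up == 0);
	atomic_flag_clear_explicit(&cc->lock, memory_order_release);

	return last;
}
```

The decision is made before any CPU exits coherency, which is why an
ordinary lock is sufficient on this side.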

First man:

Because CPUs may power up asynchronously in response to external wake-up
events, a dynamic mechanism is needed to make sure that only one CPU
attempts to play the first man role and do the cluster-level
initialisation: any other CPUs must wait for this to complete before
proceeding.

Cluster-level initialisation may involve actions such as configuring
coherency controls in the bus fabric.

The current implementation in mcpm_head.S uses a separate mutual exclusion
mechanism to do this arbitration.  This mechanism is documented in
detail in vlocks.txt.

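The shape of the arbitration, with exactly one winner and every other
CPU waiting, can be shown with a compare-and-swap claim.  This is
deliberately not the vlock mechanism used by mcpm_head.S: an ordinary
atomic operation is a stand-in here, and cannot be assumed to work on
real hardware before coherency has been enabled.  The names
first_man_slot, first_man_reset() and try_become_first_man() are
invented for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NO_FIRST_MAN (-1)

/* Records which CPU, if any, has claimed the first man role. */
struct first_man_slot {
	atomic_int owner;
};

void first_man_reset(struct first_man_slot *s)
{
	atomic_init(&s->owner, NO_FIRST_MAN);
}

/*
 * Try to claim the first man role for 'cpu'.  Exactly one caller
 * gets true; the others must wait for cluster setup to complete.
 */
bool try_become_first_man(struct first_man_slot *s, int cpu)
{
	int expected = NO_FIRST_MAN;

	return atomic_compare_exchange_strong(&s->owner, &expected, cpu);
}
```

A CPU that loses the claim would then wait in CPU_COMING_UP until the
cluster reaches CLUSTER_UP, as described in the "CPU state" section.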

Features and Limitations
------------------------

Implementation:

	The current ARM-based implementation is split between
	arch/arm/common/mcpm_head.S (low-level inbound CPU operations) and
	arch/arm/common/mcpm_entry.c (everything else):

	__mcpm_cpu_going_down() signals the transition of a CPU to the
	CPU_GOING_DOWN state.

	__mcpm_cpu_down() signals the transition of a CPU to the CPU_DOWN
	state.

	A CPU transitions to CPU_COMING_UP and then to CPU_UP via the
	low-level power-up code in mcpm_head.S.  This could
	involve CPU-specific setup code, but in the current
	implementation it does not.

	__mcpm_outbound_enter_critical() and __mcpm_outbound_leave_critical()
	handle transitions from CLUSTER_UP to CLUSTER_GOING_DOWN
	and from there to CLUSTER_DOWN or back to CLUSTER_UP (in
	the case of an aborted cluster power-down).

	These functions are more complex than the __mcpm_cpu_*()
	functions due to the extra inter-CPU coordination which
	is needed for safe transitions at the cluster level.

	A cluster transitions from CLUSTER_DOWN back to CLUSTER_UP via
	the low-level power-up code in mcpm_head.S.  This
	typically involves platform-specific setup code,
	provided by the platform-specific power_up_setup
	function registered via mcpm_sync_init.

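To show how the functions listed under "Implementation" might fit
together on the power-down path, the sketch below uses stubs that only
record the order in which they are called.  The stub names, their
parameter lists, the trace buffer and sketch_power_down() are all
assumptions made for this example; consult mcpm_entry.c for the real
interfaces.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

enum { CLUSTER_DOWN, CLUSTER_UP, CLUSTER_GOING_DOWN };	/* placeholders */

static char trace[256];	/* order of calls, for inspection */
static bool stub_abort;	/* make the enter_critical stub report an abort */

static void note(const char *s)
{
	size_t used = strlen(trace);

	snprintf(trace + used, sizeof(trace) - used, "%s;", s);
}

/* Stubs standing in for the __mcpm_*() helpers (signatures assumed). */
static void stub_cpu_going_down(unsigned int cpu, unsigned int cluster)
{
	(void)cpu; (void)cluster;
	note("cpu_going_down");
}

static void stub_cpu_down(unsigned int cpu, unsigned int cluster)
{
	(void)cpu; (void)cluster;
	note("cpu_down");
}

static bool stub_outbound_enter_critical(unsigned int cpu, unsigned int cluster)
{
	(void)cpu; (void)cluster;
	note("enter_critical");
	return !stub_abort;	/* false: power-down aborted, cluster stays up */
}

static void stub_outbound_leave_critical(unsigned int cluster, int state)
{
	(void)cluster;
	note(state == CLUSTER_DOWN ? "leave_critical:DOWN" : "leave_critical:UP");
}

/* Hypothetical power-down path for one CPU. */
void sketch_power_down(unsigned int cpu, unsigned int cluster, bool last_man)
{
	stub_cpu_going_down(cpu, cluster);

	if (last_man && stub_outbound_enter_critical(cpu, cluster)) {
		note("cluster_teardown");	/* caches, cluster coherency */
		stub_outbound_leave_critical(cluster, CLUSTER_DOWN);
	} else {
		note("cpu_teardown");		/* CPU-level teardown only */
	}

	stub_cpu_down(cpu, cluster);
}
```

In this arrangement a CPU that is not the last man, or whose cluster
power-down is aborted, performs only its own teardown before signalling
CPU_DOWN.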
Deep topologies:

	As currently described and implemented, the algorithm does not
	support CPU topologies involving more than two levels (i.e.,
	clusters of clusters are not supported).  The algorithm could be
	extended by replicating the cluster-level states for the
	additional topological levels, and modifying the transition
	rules for the intermediate (non-outermost) cluster levels.


Colophon
--------

Originally created and documented by Dave Martin for Linaro Limited, in
collaboration with Nicolas Pitre and Achin Gupta.

Copyright (C) 2012-2013  Linaro Limited
Distributed under the terms of Version 2 of the GNU General Public
License, as defined in linux/COPYING.