================
CPU Idle Cooling
================

Situation:
----------

Under certain circumstances, a SoC can reach a critical temperature
limit and be unable to stabilize its temperature around the control
temperature. When the SoC has to stabilize the temperature, the kernel
can act on a cooling device to mitigate the dissipated power. When the
critical temperature is reached, a decision must be taken to reduce
the temperature, which in turn impacts performance.

Another situation is when the silicon temperature continues to
increase even after the dynamic leakage is reduced to its minimum by
clock gating the component. This runaway phenomenon can continue due
to the static leakage. The only solution is to power down the
component, thus dropping both the dynamic and static leakage, which
allows the component to cool down.

Last but not least, the system can ask for a specific power budget,
but because of the Operating Performance Point (OPP) density, we can
only choose an OPP with a power budget lower than the requested one
and under-utilize the CPU, thus losing performance. In other words,
one OPP under-utilizes the CPU with a power below the requested power
budget, and the next OPP exceeds the power budget. An intermediate OPP
could have been used if it existed.

Solutions:
----------

If we can remove the static and the dynamic leakage for a specific
duration in a controlled period, the SoC temperature will
decrease. By acting on the idle state duration or the idle cycle
injection period, we can mitigate the temperature by modulating the
power budget.

The OPP density has a great influence on the control precision of
cpufreq. However, OPP density varies widely between vendors, and some
have large power gaps between OPPs, which results in a loss of
performance during thermal control and a loss of power in other
scenarios.

At a specific OPP, we can assume that injecting idle cycles on all
CPUs belonging to the same cluster, with a duration greater than the
cluster idle state target residency, drops the static and the dynamic
leakage for this period (modulo the energy needed to enter this
state). So the sustainable power with idle cycles has a linear
relation with the OPP’s sustainable power and can be computed with a
coefficient similar to::

    Power(IdleCycle) = Coef x Power(OPP)

Idle Injection:
---------------

The base concept of the idle injection is to force the CPU to go to an
idle state for a specified time each control cycle; it provides
another way to control CPU power and heat in addition to
cpufreq. Ideally, if all CPUs belonging to the same cluster inject
their idle cycles synchronously, the cluster can reach its power-down
state with minimum power consumption and reduce the static leakage
to almost zero. However, these injected idle cycles add extra
latency, as the CPUs have to wake up from a deep sleep state.

We use a fixed duration of idle injection that gives an acceptable
performance penalty and a fixed latency. Mitigation can be increased
or decreased by modulating the duty cycle of the idle injection.

::

     ^
     |
     |
     |-------                         -------
     |_______|_______________________|_______|___________

      <------>
       idle   <---------------------->
                      running

      <------------------------------>
               duty cycle 25%


The implementation of the cooling device bases the number of states on
the duty cycle percentage. When no mitigation is happening, the cooling
device state is zero, meaning the duty cycle is 0%.

When the mitigation begins, depending on the governor's policy, a
starting state is selected. With a fixed idle duration and the duty
cycle (aka the cooling device state), the running duration can be
computed.
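
As an illustration of that computation, here is a minimal C sketch. It
assumes the duty cycle is the idle share of one injection period,
idle / (idle + running), as drawn in the 25% diagram, and it works in
microseconds. The helper name `idle_cooling_running_us()` is
hypothetical and is not the kernel's cooling device API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): given the fixed idle injection
 * duration and the duty cycle in percent (the cooling device state),
 * return the running duration of one injection period, in microseconds.
 * Assumes duty cycle = idle / (idle + running).
 */
uint64_t idle_cooling_running_us(uint64_t idle_us, unsigned int duty_pct)
{
	/* 0% means no injection at all; 100% would leave no running time. */
	assert(duty_pct > 0 && duty_pct < 100);

	/* period = idle * 100 / duty, so running = period - idle. */
	return idle_us * (100 - duty_pct) / duty_pct;
}
```

With a 10 ms (10000 us) idle injection, a 25% duty cycle yields
30000 us of running time, 50% yields 10000 us (idle and running are
equal), and 33% yields 20303 us after integer truncation.
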

The governor changes the cooling device state, and thus the duty
cycle, and this variation modulates the cooling effect.

::

     ^
     |
     |
     |-------                 -------
     |_______|_______________|_______|___________

      <------>
       idle   <-------------->
                  running

      <---------------------->
           duty cycle 33%


     ^
     |
     |
     |-------         -------
     |_______|_______|_______|___________

      <------>
       idle   <------>
              running

      <-------------->
       duty cycle 50%

The idle injection duration value must comply with the following
constraints:

- It is less than or equal to the latency we tolerate when the
  mitigation begins. It is platform dependent and depends on the
  user experience and the reactivity vs. performance trade-off we
  want. This value should be specified.

- It is greater than the target residency of the idle state we want
  to enter for thermal mitigation; otherwise we end up consuming more
  energy.

Power considerations
--------------------

When we reach the thermal trip point, we have to sustain a specified
power for a specific temperature, but at this time we consume::

    Power = Capacitance x Voltage^2 x Frequency x Utilisation

... which is more than the sustainable power (or there is something
wrong in the system setup). The ‘Capacitance’ and ‘Utilisation’ are
fixed values, and the ‘Voltage’ and the ‘Frequency’ are fixed
artificially because we don’t want to change the OPP. We can group the
‘Capacitance’ and the ‘Utilisation’ into a single term, the ‘Dynamic
Power Coefficient (Cdyn)’. Simplifying the above, we have::

    Pdyn = Cdyn x Voltage^2 x Frequency

The power allocator governor will request a power reduction in order
to reach the sustainable power defined in the device tree.
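
The simplified Pdyn formula above can be evaluated with integer
arithmetic, as this short C sketch shows. The units are an assumption:
Cdyn in uW/MHz/V^2 (the unit documented for the
`dynamic-power-coefficient` device tree property), voltage in mV and
frequency in MHz, with the result in mW. `pdyn_mw()` is a hypothetical
helper name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: Pdyn = Cdyn x Voltage^2 x Frequency, returned in mW.
 * Assumed units: cdyn in uW/MHz/V^2, voltage in mV, frequency in MHz.
 *
 * cdyn * MHz * V^2 yields uW. With V = mV / 1000, that is
 * cdyn * MHz * mV^2 / 10^6 uW, and dividing by 1000 more converts uW
 * to mW, hence the single division by 10^9 (done last to limit truncation).
 */
uint64_t pdyn_mw(uint64_t cdyn, uint64_t voltage_mv, uint64_t freq_mhz)
{
	/* A valid OPP has a non-zero voltage and frequency. */
	assert(voltage_mv > 0 && freq_mhz > 0);

	return cdyn * voltage_mv * voltage_mv * freq_mhz / 1000000000ULL;
}
```

For example, Cdyn = 100 uW/MHz/V^2 at 1000 mV and 2000 MHz gives
200 mW; comparing this value with the target power tells whether idle
injection has any power to remove at that OPP.
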
So, with the idle injection mechanism, we want an average power
(Ptarget) resulting from running at full power on a specific OPP for
some amount of time and being idle for another amount of time. That
can be put in an equation::

    P(opp)target = ((Trunning x P(opp)running) + (Tidle x P(opp)idle)) /
                    (Trunning + Tidle)

Assuming the power consumed while idle, P(opp)idle, is negligible,
solving for Tidle gives::

    Tidle = Trunning x ((P(opp)running / P(opp)target) - 1)

At this point, if we know the running period for the CPU, that gives
us the idle injection we need. Alternatively, if we have the idle
injection duration, we can compute the running duration with::

    Trunning = Tidle / ((P(opp)running / P(opp)target) - 1)

Practically, if the running power is less than the targeted power, we
end up with a negative time value, so the equation only applies to a
power reduction: the OPP must be high enough for the running power to
be greater than the targeted power.

However, in this demonstration we ignore three aspects:

 * The static leakage is not defined here. We could introduce it in
   the equation, but we assume it is zero most of the time, as it is
   difficult to get the values from the SoC vendors.

 * The idle state wake-up latency (or entry + exit latency) is not
   taken into account; it must be added to the equation in order to
   rigorously compute the idle injection.

 * The injected idle duration must be greater than the idle state
   target residency; otherwise we end up consuming more energy and
   potentially inverting the mitigation effect.

So the final equation is::

    Trunning = (Tidle - Twakeup) x
               (P(opp)target / ((P(opp)dyn + P(opp)static) - P(opp)target))
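
The final equation, together with the constraints listed above, can be
sketched in C as follows. This is an illustration under assumptions
rather than the kernel implementation: times are in microseconds,
powers in milliwatts, the helper name `idle_cooling_trunning_us()` is
hypothetical, and returning -1.0 is an arbitrary convention for "the
formula does not apply".

```c
#include <assert.h>

/*
 * Hypothetical helper implementing:
 *
 *   Trunning = (Tidle - Twakeup) x
 *              (P(opp)target / ((P(opp)dyn + P(opp)static) - P(opp)target))
 *
 * Assumed units: times in microseconds, powers in milliwatts.
 * Returns -1.0 when the formula does not apply.
 */
double idle_cooling_trunning_us(double tidle_us, double twakeup_us,
				double residency_us, double pdyn_mw,
				double pstatic_mw, double ptarget_mw)
{
	double prunning_mw = pdyn_mw + pstatic_mw;

	assert(ptarget_mw > 0.0);

	/*
	 * The idle duration must exceed the target residency (otherwise more
	 * energy is consumed) and the wakeup latency (no effective idle time).
	 */
	if (tidle_us <= residency_us || tidle_us <= twakeup_us)
		return -1.0;

	/* Running power at or below the target: nothing to reduce. */
	if (prunning_mw <= ptarget_mw)
		return -1.0;

	return (tidle_us - twakeup_us) * (ptarget_mw / (prunning_mw - ptarget_mw));
}
```

Floating point keeps the sketch close to the formula; code running in
the kernel would use integer arithmetic instead, since floating point
is generally avoided there. With a 10 ms idle injection, a 2 ms wakeup
latency, P(opp)dyn + P(opp)static = 2000 mW and a 400 mW target, the
sketch yields 2000 us of running time per period.
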