===========================================================
Clock sources, Clock events, sched_clock() and delay timers
===========================================================

This document tries to briefly explain some basic kernel timekeeping
abstractions. It partly pertains to the drivers usually found in
drivers/clocksource in the kernel tree, but the code may be spread out
across the kernel.

If you grep through the kernel source you will find a number of
architecture-specific implementations of clock sources and clock events,
several likewise architecture-specific overrides of the sched_clock()
function, and some delay timers.

To provide timekeeping for your platform, the clock source provides
the basic timeline, whereas clock events fire interrupts at certain points
on this timeline, providing facilities such as high-resolution timers.
sched_clock() is used for scheduling and timestamping, and delay timers
provide an accurate delay source using hardware counters.


Clock sources
-------------

The purpose of the clock source is to provide a timeline for the system that
tells you where you are in time.
For example, issuing the command 'date' on
a Linux system will eventually read the clock source to determine exactly
what time it is.

Typically the clock source is a monotonic, atomic counter which will provide
n bits which count from 0 to (2^n)-1 and then wrap around to 0 and start over.
It will ideally NEVER stop ticking as long as the system is running. It
may stop during system suspend.

The clock source shall have as high resolution as possible, and the frequency
shall be as stable and correct as possible as compared to a real-world wall
clock. It should not move unpredictably back and forth in time or miss a few
cycles here and there.

It must be immune to the kind of effects that occur in hardware where, e.g.,
the counter register is read in two phases on the bus: the lowest 16 bits
first and the higher 16 bits in a second bus cycle, with the counter bits
potentially being updated in between, leading to the risk of very strange
values from the counter.

When the wall-clock accuracy of the clock source isn't satisfactory, there
are various quirks and layers in the timekeeping code for e.g.
synchronizing
the user-visible time to RTC clocks in the system or against networked time
servers using NTP, but all they do basically is update an offset against
the clock source, which provides the fundamental timeline for the system.
These measures do not affect the clock source per se; they only adapt the
system to its shortcomings.

The clock source struct shall provide means to translate the provided counter
into a nanosecond value as an unsigned long long (unsigned 64 bit) number.
Since this operation may be invoked very often, doing this in a strict
mathematical sense is not desirable: instead the number is taken as close as
possible to a nanosecond value using only the arithmetic operations
multiply and shift, so in clocksource_cyc2ns() you find:

  ns ~= (clocksource * mult) >> shift

You will find a number of helper functions in the clock source code intended
to aid in providing these mult and shift values, such as
clocksource_khz2mult() and clocksource_hz2mult(), which help determine the
mult factor from a fixed shift, and clocksource_register_hz() and
clocksource_register_khz(), which help assign both shift and mult
factors using the frequency of the clock source as the only input.
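
The multiply-and-shift conversion described above can be sketched in plain
user-space C. This is only an illustration under stated assumptions, not the
kernel code: hz_to_mult() and cyc_to_ns() are hypothetical names chosen for
this example, and the sketch ignores the overflow limits that the real
helpers have to respect for large cycle counts.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: for a fixed shift, pick mult so that
 * (cycles * mult) >> shift approximates nanoseconds.
 * Adding hz / 2 before dividing rounds to the nearest integer.
 */
static uint32_t hz_to_mult(uint32_t hz, uint32_t shift)
{
	uint64_t tmp;

	assert(hz != 0 && shift < 32);
	tmp = NSEC_PER_SEC << shift;
	tmp += hz / 2;
	return (uint32_t)(tmp / hz);
}

/* Hypothetical helper: ns ~= (cycles * mult) >> shift */
static uint64_t cyc_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}
```

With these assumptions, a 1 MHz counter and a shift of 20 give a mult of
1048576000, so 1000 cycles convert to 1000000 ns using one multiplication
and one shift, with no division on the hot path.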

For really simple clock sources accessed from a single I/O memory location
there is nowadays even clocksource_mmio_init(), which will take a memory
location, bit width, a parameter telling whether the counter in the
register counts up or down, and the timer clock rate, and then conjure all
necessary parameters.

Since a 32-bit counter at, say, 100 MHz will wrap around to zero after some 43
seconds, the code handling the clock source will have to compensate for this.
That is the reason why the clock source struct also contains a 'mask'
member telling how many bits of the source are valid. This way the timekeeping
code knows when the counter will wrap around and can insert the necessary
compensation code on both sides of the wrap point so that the system timeline
remains monotonic.


Clock events
------------

Clock events are the conceptual reverse of clock sources: they take a
desired time specification value and calculate the values to poke into
hardware timer registers.

Clock events are orthogonal to clock sources. The same hardware
and register range may be used for the clock event, but it is essentially
a different thing.
The hardware driving clock events has to be able to
fire interrupts, so as to trigger events on the system timeline. On an SMP
system, it is ideal (and customary) to have one such event driving timer per
CPU core, so that each core can trigger events independently of any other
core.

You will notice that the clock event device code is based on the same basic
idea about translating counters to nanoseconds using mult and shift
arithmetic, and you find the same family of helper functions again for
assigning these values. The clock event driver does not need a 'mask'
attribute however: the system will not try to plan events beyond the time
horizon of the clock event.


sched_clock()
-------------

In addition to the clock sources and clock events there is a special weak
function in the kernel called sched_clock(). This function shall return the
number of nanoseconds since the system was started. An architecture may or
may not provide an implementation of sched_clock() on its own. If a local
implementation is not provided, the system jiffy counter will be used as
sched_clock().

As the name suggests, sched_clock() is used for scheduling the system,
determining the absolute timeslice for a certain process in the CFS scheduler
for example. It is also used for printk timestamps when you have selected to
include time information in printk for things like bootcharts.

Compared to clock sources, sched_clock() has to be very fast: it is called
much more often, especially by the scheduler. If you have to trade off speed
against accuracy relative to the clock source, you may sacrifice accuracy
for speed in sched_clock(). It does, however, require some of the same basic
characteristics as the clock source, i.e. it should be monotonic.

The sched_clock() function may wrap only on unsigned long long boundaries,
i.e. after 64 bits. Since this is a nanosecond value this will mean it wraps
after circa 585 years. (For most practical systems this means "never".)

If an architecture does not provide its own implementation of this function,
it will fall back to using jiffies, limiting its resolution to one jiffy,
i.e. 1/HZ seconds, on that architecture. This will affect scheduling accuracy
and will likely show up in system benchmarks.
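
The two figures above, the wrap time of a 64-bit nanosecond value and the
coarse resolution of a jiffies-based fallback, can be checked with a little
arithmetic. The following user-space C sketch uses hypothetical function
names and assumes 365-day years; it is only meant to make the numbers
concrete, not to reproduce the kernel's fallback code.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Resolution in nanoseconds of a clock that only advances once per jiffy. */
static uint64_t jiffy_resolution_ns(unsigned int hz)
{
	assert(hz != 0);
	return NSEC_PER_SEC / hz;
}

/* A jiffies-based nanosecond value moves in steps of one jiffy period. */
static uint64_t jiffies_to_ns(uint64_t jiffies, unsigned int hz)
{
	return jiffies * jiffy_resolution_ns(hz);
}

/* Whole 365-day years before a 64-bit nanosecond counter wraps. */
static uint64_t ns64_wrap_years(void)
{
	const uint64_t ns_per_year = 365ULL * 24 * 3600 * NSEC_PER_SEC;

	return UINT64_MAX / ns_per_year;
}
```

Under these assumptions a 64-bit nanosecond value lasts 584 whole years, in
line with the "circa 585 years" quoted above, while a jiffies fallback at
HZ=250 can only resolve time in steps of 4 ms.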

The clock driving sched_clock() may stop or reset to zero during system
suspend/sleep. This does not matter to the function it serves of scheduling
events on the system. However it may result in interesting timestamps in
printk().

The sched_clock() function should be callable in any context, IRQ- and
NMI-safe and return a sane value in any context.

Some architectures may have a limited set of time sources and lack a nice
counter to derive a 64-bit nanosecond value, so for example on the ARM
architecture, special helper functions have been created to provide a
sched_clock() nanosecond base from a 16- or 32-bit counter. Sometimes the
same counter that is also used as clock source is used for this purpose.

On SMP systems, it is crucial for performance that sched_clock() can be called
independently on each CPU without any synchronization performance hits.
Some hardware (such as the x86 TSC) will cause the sched_clock() function to
drift between the CPUs on the system. The kernel can work around this by
enabling the CONFIG_HAVE_UNSTABLE_SCHED_CLOCK option. This is another aspect
that makes sched_clock() different from the ordinary clock source.
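
To illustrate how a 64-bit nanosecond base can be derived from a narrow
wrapping counter, here is a simplified user-space C sketch. It is an
assumption-laden illustration and not the kernel's helper code: the struct
and function names are hypothetical, there is no locking or NMI safety, and
it only stays correct if ext_update() is called at least once per wrap
period of the counter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state for extending a narrow counter to 64-bit ns. */
struct ns_extender {
	uint64_t epoch_ns;   /* nanoseconds accumulated at the last update */
	uint32_t epoch_cyc;  /* raw counter value at the last update */
	uint32_t mask;       /* valid counter bits, e.g. 0xffff for 16 bits */
	uint32_t mult;       /* ns ~= (cycles * mult) >> shift */
	uint32_t shift;
};

static void ext_init(struct ns_extender *e, unsigned int bits,
		     uint32_t mult, uint32_t shift)
{
	assert(bits >= 1 && bits <= 32 && shift < 32);
	e->epoch_ns = 0;
	e->epoch_cyc = 0;
	e->mask = (bits == 32) ? UINT32_MAX : ((1u << bits) - 1);
	e->mult = mult;
	e->shift = shift;
}

/* Masking the difference makes the delta correct across one wrap. */
static uint64_t ext_read_ns(const struct ns_extender *e, uint32_t cyc)
{
	uint64_t delta = (cyc - e->epoch_cyc) & e->mask;

	return e->epoch_ns + ((delta * e->mult) >> e->shift);
}

/* Move the epoch forward; must run at least once per wrap period. */
static void ext_update(struct ns_extender *e, uint32_t cyc)
{
	e->epoch_ns = ext_read_ns(e, cyc);
	e->epoch_cyc = cyc & e->mask;
}
```

The masked subtraction is the same trick that the 'mask' member enables for
clock sources: as long as less than one full wrap has elapsed since the last
update, the delta is correct even when the raw counter has wrapped to zero.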


Delay timers (some architectures only)
--------------------------------------

On systems with variable CPU frequency, the various kernel delay() functions
will sometimes behave strangely. Basically these delays usually use a hard
loop to delay a certain number of jiffy fractions using an "lpj" (loops per
jiffy) value, calibrated on boot.

Let's hope that your system is running at maximum frequency when this value
is calibrated: as a consequence, when the frequency is geared down to half the
full frequency, any delay() will be twice as long. Usually this does not
hurt, as you're commonly requesting that amount of delay *or more*. But
basically the semantics are quite unpredictable on such systems.

Enter timer-based delays. Using these, a timer read may be used instead of
a hard-coded loop for providing the desired delay.

This is done by declaring a struct delay_timer and assigning the appropriate
function pointers and rate settings for this delay timer.

This is available on some architectures like OpenRISC or ARM.
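
The idea of a timer-based delay can be sketched in user-space C as follows.
This is a hedged illustration only: struct sketch_delay_timer and the helper
names are hypothetical and merely modeled on the description above (a read
function pointer plus a rate), and fake_read() stands in for a real hardware
counter so that the example is self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a delay timer: a counter read and its rate. */
struct sketch_delay_timer {
	uint32_t (*read_current_timer)(void); /* free-running counter */
	uint32_t freq;                        /* counter rate in Hz */
};

/* Round up so the delay is never shorter than requested. */
static uint32_t usecs_to_cycles(const struct sketch_delay_timer *t,
				uint32_t usecs)
{
	return (uint32_t)(((uint64_t)usecs * t->freq + 999999) / 1000000);
}

/* Spin until enough timer cycles have passed, independent of CPU speed. */
static void timer_udelay(const struct sketch_delay_timer *t, uint32_t usecs)
{
	uint32_t start, cycles;

	assert(t && t->read_current_timer && t->freq);
	cycles = usecs_to_cycles(t, usecs);
	start = t->read_current_timer();
	/* Unsigned subtraction keeps the comparison valid across a wrap. */
	while ((uint32_t)(t->read_current_timer() - start) < cycles)
		; /* busy-wait */
}

/* Fake counter for demonstration: advances one cycle per read. */
static uint32_t fake_now;

static uint32_t fake_read(void)
{
	return fake_now++;
}
```

Because the loop compares elapsed timer cycles rather than counting loop
iterations, the length of the delay no longer depends on the CPU frequency
at calibration time, only on the rate of the timer.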