Searched hist:"3252 a646" (Results 1 – 2 of 2) sorted by relevance
/openbmc/linux/drivers/clocksource/ |
H A D | exynos_mct.c | 3252a646 Fri Jul 04 16:43:26 CDT 2014 Doug Anderson <dianders@chromium.org> clocksource: exynos_mct: Only use 32-bits where possible
The MCT has a nice 64-bit counter. That means that we _can_ register as a 64-bit clocksource and sched_clock. ...but that doesn't mean we should.
The 64-bit counter is read by reading two 32-bit registers. That means reading needs to be something like: - Read upper half - Read lower half - Read upper half and confirm that it hasn't changed.
That wouldn't be terrible, but: - THe MCT isn't very fast to access (hundreds of nanoseconds). - The clocksource is queried _all the time_.
In total system profiles of real workloads on ChromeOS, we've seen exynos_frc_read() taking 2% or more of CPU time even after optimizing the 3 reads above to 2 (see below).
The MCT is clocked at ~24MHz on all known systems. That means that the 32-bit half of the counter rolls over every ~178 seconds. This inspired an optimization in ChromeOS to cache the upper half between calls, moving 3 reads to 2. ...but we can do better! Having a 32-bit timer that flips every 178 seconds is more than sufficient for Linux. Let's just use the lower half of the MCT.
Times on 5420 to do 1000000 gettimeofday() calls from userspace: * Original code: 1323852 us * ChromeOS cache upper half: 1173084 us * ChromeOS + ldmia to optimize: 1045674 us * Use lower 32-bit only (this code): 1014429 us
As you can see, the time used doesn't increase linearly with the number of reads and we can make 64-bit work almost as fast as 32-bit with a bit of assembly code. But since there's no real gain for 64-bit, let's go with the simplest and fastest implementation.
Note: with this change roughly half the time for gettimeofday() is spent in exynos_frc_read(). The rest is timer / system call overhead.
Also note: this patch disables the use of the MCT on ARM64 systems until we've sorted out how to make "cycles_t" always 32-bit. Really ARM64 systems should be using arch timers anyway.
Signed-off-by: Doug Anderson <dianders@chromium.org> Acked-by Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> 3252a646 Fri Jul 04 16:43:26 CDT 2014 Doug Anderson <dianders@chromium.org> clocksource: exynos_mct: Only use 32-bits where possible The MCT has a nice 64-bit counter. That means that we _can_ register as a 64-bit clocksource and sched_clock. ...but that doesn't mean we should. The 64-bit counter is read by reading two 32-bit registers. That means reading needs to be something like: - Read upper half - Read lower half - Read upper half and confirm that it hasn't changed. That wouldn't be terrible, but: - THe MCT isn't very fast to access (hundreds of nanoseconds). - The clocksource is queried _all the time_. In total system profiles of real workloads on ChromeOS, we've seen exynos_frc_read() taking 2% or more of CPU time even after optimizing the 3 reads above to 2 (see below). The MCT is clocked at ~24MHz on all known systems. That means that the 32-bit half of the counter rolls over every ~178 seconds. This inspired an optimization in ChromeOS to cache the upper half between calls, moving 3 reads to 2. ...but we can do better! Having a 32-bit timer that flips every 178 seconds is more than sufficient for Linux. Let's just use the lower half of the MCT. Times on 5420 to do 1000000 gettimeofday() calls from userspace: * Original code: 1323852 us * ChromeOS cache upper half: 1173084 us * ChromeOS + ldmia to optimize: 1045674 us * Use lower 32-bit only (this code): 1014429 us As you can see, the time used doesn't increase linearly with the number of reads and we can make 64-bit work almost as fast as 32-bit with a bit of assembly code. But since there's no real gain for 64-bit, let's go with the simplest and fastest implementation. Note: with this change roughly half the time for gettimeofday() is spent in exynos_frc_read(). The rest is timer / system call overhead. Also note: this patch disables the use of the MCT on ARM64 systems until we've sorted out how to make "cycles_t" always 32-bit. Really ARM64 systems should be using arch timers anyway. Signed-off-by: Doug Anderson <dianders@chromium.org> Acked-by Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
H A D | Kconfig | 3252a646 Fri Jul 04 16:43:26 CDT 2014 Doug Anderson <dianders@chromium.org> clocksource: exynos_mct: Only use 32-bits where possible
The MCT has a nice 64-bit counter. That means that we _can_ register as a 64-bit clocksource and sched_clock. ...but that doesn't mean we should.
The 64-bit counter is read by reading two 32-bit registers. That means reading needs to be something like: - Read upper half - Read lower half - Read upper half and confirm that it hasn't changed.
That wouldn't be terrible, but: - THe MCT isn't very fast to access (hundreds of nanoseconds). - The clocksource is queried _all the time_.
In total system profiles of real workloads on ChromeOS, we've seen exynos_frc_read() taking 2% or more of CPU time even after optimizing the 3 reads above to 2 (see below).
The MCT is clocked at ~24MHz on all known systems. That means that the 32-bit half of the counter rolls over every ~178 seconds. This inspired an optimization in ChromeOS to cache the upper half between calls, moving 3 reads to 2. ...but we can do better! Having a 32-bit timer that flips every 178 seconds is more than sufficient for Linux. Let's just use the lower half of the MCT.
Times on 5420 to do 1000000 gettimeofday() calls from userspace: * Original code: 1323852 us * ChromeOS cache upper half: 1173084 us * ChromeOS + ldmia to optimize: 1045674 us * Use lower 32-bit only (this code): 1014429 us
As you can see, the time used doesn't increase linearly with the number of reads and we can make 64-bit work almost as fast as 32-bit with a bit of assembly code. But since there's no real gain for 64-bit, let's go with the simplest and fastest implementation.
Note: with this change roughly half the time for gettimeofday() is spent in exynos_frc_read(). The rest is timer / system call overhead.
Also note: this patch disables the use of the MCT on ARM64 systems until we've sorted out how to make "cycles_t" always 32-bit. Really ARM64 systems should be using arch timers anyway.
Signed-off-by: Doug Anderson <dianders@chromium.org> Acked-by Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> 3252a646 Fri Jul 04 16:43:26 CDT 2014 Doug Anderson <dianders@chromium.org> clocksource: exynos_mct: Only use 32-bits where possible The MCT has a nice 64-bit counter. That means that we _can_ register as a 64-bit clocksource and sched_clock. ...but that doesn't mean we should. The 64-bit counter is read by reading two 32-bit registers. That means reading needs to be something like: - Read upper half - Read lower half - Read upper half and confirm that it hasn't changed. That wouldn't be terrible, but: - THe MCT isn't very fast to access (hundreds of nanoseconds). - The clocksource is queried _all the time_. In total system profiles of real workloads on ChromeOS, we've seen exynos_frc_read() taking 2% or more of CPU time even after optimizing the 3 reads above to 2 (see below). The MCT is clocked at ~24MHz on all known systems. That means that the 32-bit half of the counter rolls over every ~178 seconds. This inspired an optimization in ChromeOS to cache the upper half between calls, moving 3 reads to 2. ...but we can do better! Having a 32-bit timer that flips every 178 seconds is more than sufficient for Linux. Let's just use the lower half of the MCT. Times on 5420 to do 1000000 gettimeofday() calls from userspace: * Original code: 1323852 us * ChromeOS cache upper half: 1173084 us * ChromeOS + ldmia to optimize: 1045674 us * Use lower 32-bit only (this code): 1014429 us As you can see, the time used doesn't increase linearly with the number of reads and we can make 64-bit work almost as fast as 32-bit with a bit of assembly code. But since there's no real gain for 64-bit, let's go with the simplest and fastest implementation. Note: with this change roughly half the time for gettimeofday() is spent in exynos_frc_read(). The rest is timer / system call overhead. Also note: this patch disables the use of the MCT on ARM64 systems until we've sorted out how to make "cycles_t" always 32-bit. Really ARM64 systems should be using arch timers anyway. Signed-off-by: Doug Anderson <dianders@chromium.org> Acked-by Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|