#
9522a2ff |
| 27-Apr-2016 |
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> |
cpufreq: intel_pstate: Enforce _PPC limits
Use ACPI _PPC notification to limit max P state driver will request. ACPI _PPC change notification is sent by BIOS to limit max P state in several cases: -
cpufreq: intel_pstate: Enforce _PPC limits
Use ACPI _PPC notification to limit max P state driver will request. ACPI _PPC change notification is sent by BIOS to limit max P state in several cases: - Reduce impact of platform thermal condition - When Config TDP feature is used, a changed _PPC is sent to follow TDP change - Remote node managers in server want to control platform power via baseboard management controller (BMC)
This change registers with ACPI processor performance lib so that _PPC changes are notified to cpufreq core, which in turns will result in call to .setpolicy() callback. Also the way _PSS table identifies a turbo frequency is not compatible to max turbo frequency in intel_pstate, so the very first entry in _PSS needs to be adjusted.
This feature can be turned on by using kernel parameters: intel_pstate=support_acpi_ppc
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Minor cleanups ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
1becf035 |
| 22-Apr-2016 |
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> |
cpufreq: intel_pstate: Fix processing for turbo activation ratio
When the config TDP level is not nominal (level = 0), the MSR values for reading level 1 and level 2 ratios contain power in low 14 b
cpufreq: intel_pstate: Fix processing for turbo activation ratio
When the config TDP level is not nominal (level = 0), the MSR values for reading level 1 and level 2 ratios contain power in low 14 bits and actual ratio bits are at bits [23:16]. The current processing for level 1 and level 2 is wrong as there is no shift done to get actual ratio.
Fixes: 6a35fc2d6c22 (cpufreq: intel_pstate: get P1 from TAR when available) Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 4.4+ <stable@vger.kernel.org> # 4.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
bdcaa23f |
| 22-Apr-2016 |
Philippe Longepe <philippe.longepe@linux.intel.com> |
cpufreq: intel_pstate: Use average P-State instead of current P-State
The result returned by pid_calc() is subtracted from current_pstate (which is the P-State requested during the last period) in o
cpufreq: intel_pstate: Use average P-State instead of current P-State
The result returned by pid_calc() is subtracted from current_pstate (which is the P-State requested during the last period) in order to obtain the target P-State for the current iteration.
However, current_pstate may not reflect the real current P-State of the CPU. In particular, that P-State may be higher because of the frequency sharing per module.
The theory is: - The load is the percentage of time spent in C0 and is related to the average P-State during the same period. - The last requested P-State can be completely different than the average P-State (because of frequency sharing or throttling). - The P-State shift computed by the pid_calc is based on the load computed at average P-State, so the shift must be relative to this average P-State.
Using the average P-State instead of current P-State improves power without significant performance penalty in cases when a task migrates from one core to other core sharing frequency and voltage.
Performance and power comparison with this patch on Cherry Trail platform using Android:
Benchmark ?Perf ?Power FishTank 10.45% 3.1% SmartBench-Gaming -0.1% -10.4% SmartBench-Productivity -0.8% -10.4% CandyCrush n/a -17.4% AngryBirds n/a -5.9% videoPlayback n/a -13.9% audioPlayback n/a -4.9% IcyRocks-20-50 0.0% -38.4% iozone RR -0.16% -1.3% iozone RW 0.74% -1.3%
Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
ffb81056 |
| 09-Apr-2016 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
intel_pstate: Avoid getting stuck in high P-states when idle
Jörg Otte reports that commit a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with utilization update callbacks) caused the CPUs in h
intel_pstate: Avoid getting stuck in high P-states when idle
Jörg Otte reports that commit a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with utilization update callbacks) caused the CPUs in his Haswell-based system to stay in the very high frequency region even if the system is completely idle.
That turns out to be an existing problem in the intel_pstate driver's P-state selection algorithm for Core processors. Namely, all decisions made by that algorithm are based on the average frequency of the CPU between sampling events and on the P-state requested on the last invocation, so it may get stuck at a very hight frequency even if the utilization of the CPU is very low (in fact, it may get stuck in a inadequate P-state regardless of the CPU utilization). The only way to kick it out of that limbo is a sufficiently long idle period (3 times longer than the prescribed sampling interval), but if that doesn't happen often enough (eg. due to a timing change like after the above commit), the P-state of the CPU may be inadequate pretty much all the time.
To address the most egregious manifestations of that issue, reset the core_busy value used to determine the next P-state to request if the utilization of the CPU, determined with the help of the MPERF feedback register and the TSC, is below 1%.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=115771 Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
4836df17 |
| 05-Apr-2016 |
Joe Perches <joe@perches.com> |
intel_pstate: Use pr_fmt
Prefix the output using the more common kernel style.
Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked
intel_pstate: Use pr_fmt
Prefix the output using the more common kernel style.
Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Rebase ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
22590efb |
| 08-Apr-2016 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
intel_pstate: Avoid pointless FRAC_BITS shifts under div_fp()
There are multiple places in intel_pstate where int_tofp() is applied to both arguments of div_fp(), but this is pointless, because int_
intel_pstate: Avoid pointless FRAC_BITS shifts under div_fp()
There are multiple places in intel_pstate where int_tofp() is applied to both arguments of div_fp(), but this is pointless, because int_tofp() simply shifts its argument to the left by FRAC_BITS which mathematically is equivalent to multuplication by 2^FRAC_BITS, so if this is done to both arguments of a division, the extra factors will cancel each other during that operation anyway.
Drop the pointless int_tofp() applied to div_fp() arguments throughout the driver.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
13ad7701 |
| 03-Apr-2016 |
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> |
cpufreq: intel_pstate: Documenation for structures
No code change. Only added kernel doc style comments for structures.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Sign
cpufreq: intel_pstate: Documenation for structures
No code change. Only added kernel doc style comments for structures.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
30a39153 |
| 03-Apr-2016 |
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> |
cpufreq: intel_pstate: fix inconsistency in setting policy limits
When user sets performance policy using cpufreq interface, it is possible that because of policy->max limits, the actual performance
cpufreq: intel_pstate: fix inconsistency in setting policy limits
When user sets performance policy using cpufreq interface, it is possible that because of policy->max limits, the actual performance is still limited. But the current implementation will silently switch the policy to powersave and start using powersave limits. If user modifies any limits using intel_pstate sysfs, this is actually changing powersave limits.
The current implementation tracks limits under powersave and performance policy using two different variables. When policy->max is less than policy->cpuinfo.max_freq, only powersave limit variable is used.
This fix causes the performance limits variable to be used always when the policy is performance.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
0bed612b |
| 01-Apr-2016 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
cpufreq: sched: Helpers to add and remove update_util hooks
Replace the single helper for adding and removing cpufreq utilization update hooks, cpufreq_set_update_util_data(), with a pair of helpers
cpufreq: sched: Helpers to add and remove update_util hooks
Replace the single helper for adding and removing cpufreq utilization update hooks, cpufreq_set_update_util_data(), with a pair of helpers, cpufreq_add_update_util_hook() and cpufreq_remove_update_util_hook(), and modify the users of cpufreq_set_update_util_data() accordingly.
With the new helpers, the code using them doesn't need to worry about the internals of struct update_util_data and in particular it doesn't need to worry about populating the func field in it properly upfront.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
show more ...
|
#
febce40f |
| 01-Apr-2016 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
intel_pstate: Avoid extra invocation of intel_pstate_sample()
The initialization of intel_pstate for a given CPU involves populating the fields of its struct cpudata that represent the previous samp
intel_pstate: Avoid extra invocation of intel_pstate_sample()
The initialization of intel_pstate for a given CPU involves populating the fields of its struct cpudata that represent the previous sample, but currently that is done in a problematic way.
Namely, intel_pstate_init_cpu() makes an extra call to intel_pstate_sample() so it reads the current register values that will be used to populate the "previous sample" record during the next invocation of intel_pstate_sample(). However, after commit a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with utilization update callbacks) that doesn't work for last_sample_time, because the time value is passed to intel_pstate_sample() as an argument now. Passing 0 to it from intel_pstate_init_cpu() is problematic, because that causes cpu->last_sample_time == 0 to be visible in get_target_pstate_use_performance() (and hence the extra cpu->last_sample_time > 0 check in there) and effectively allows the first invocation of intel_pstate_sample() from intel_pstate_update_util() to happen immediately after the initialization which may lead to a significant "turn on" effect in the governor algorithm.
To mitigate that issue, rework the initialization to avoid the extra intel_pstate_sample() call from intel_pstate_init_cpu(). Instead, make intel_pstate_sample() return false if it has been called with cpu->sample.time equal to zero, which will make intel_pstate_update_util() skip the sample in that case, and reset cpu->sample.time from intel_pstate_set_update_util_hook() to make the algorithm start properly every time the hook is set.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
bb6ab52f |
| 31-Mar-2016 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
intel_pstate: Do not set utilization update hook too early
The utilization update hook in the intel_pstate driver is set too early, as it only should be set after the policy has been fully initializ
intel_pstate: Do not set utilization update hook too early
The utilization update hook in the intel_pstate driver is set too early, as it only should be set after the policy has been fully initialized by the core. That may cause intel_pstate_update_util() to use incorrect data and put the CPUs into incorrect P-states as a result.
To prevent that from happening, make intel_pstate_set_policy() set the utilization update hook instead of intel_pstate_init_cpu() so intel_pstate_update_util() only runs when all things have been initialized as appropriate.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
fdfdb2b1 |
| 18-Mar-2016 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts
After commit a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with utilization update callbacks) wrmsrl_on_cpu() cannot be calle
intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts
After commit a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with utilization update callbacks) wrmsrl_on_cpu() cannot be called in the intel_pstate_adjust_busy_pstate() path as that is executed with disabled interrupts. However, atom_set_pstate() called from there via intel_pstate_set_pstate() uses wrmsrl_on_cpu() to update the IA32_PERF_CTL MSR which triggers the WARN_ON_ONCE() in smp_call_function_single().
The reason why wrmsrl_on_cpu() is used by atom_set_pstate() is because intel_pstate_set_pstate() calling it is also invoked during the initialization and cleanup of the driver and in those cases it is not guaranteed to be run on the CPU that is being updated. However, in the case when intel_pstate_set_pstate() is called by intel_pstate_adjust_busy_pstate(), wrmsrl() can be used to update the register safely. Moreover, intel_pstate_set_pstate() already contains code that only is executed if the function is called by intel_pstate_adjust_busy_pstate() and there is a special argument passed to it because of that.
To fix the problem at hand, rearrange the code taking the above observations into account.
First, replace the ->set() callback in struct pstate_funcs with a ->get_val() one that will return the value to be written to the IA32_PERF_CTL MSR without updating the register.
Second, split intel_pstate_set_pstate() into two functions, intel_pstate_update_pstate() to be called by intel_pstate_adjust_busy_pstate() that will contain all of the intel_pstate_set_pstate() code which only needs to be executed in that case and will use wrmsrl() to update the MSR (after obtaining the value to write to it from the ->get_val() callback), and intel_pstate_set_min_pstate() to be invoked during the initialization and cleanup that will set the P-state to the minimum one and will update the MSR using wrmsrl_on_cpu().
Finally, move the code shared between intel_pstate_update_pstate() and intel_pstate_set_min_pstate() to a new static inline function intel_pstate_record_pstate() and make them both call it.
Of course, that unifies the handling of the IA32_PERF_CTL MSR writes between Atom and Core.
Fixes: a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with utilization update callbacks) Reported-and-tested-by: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
4fec7ad5 |
| 10-Mar-2016 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
intel_pstate: Do not skip samples partially
If the current value of MPERF or the current value of TSC is the same as the previous one, respectively, intel_pstate_sample() bails out early and skips t
intel_pstate: Do not skip samples partially
If the current value of MPERF or the current value of TSC is the same as the previous one, respectively, intel_pstate_sample() bails out early and skips the sample.
However, intel_pstate_adjust_busy_pstate() is still called in that case which is not correct, so modify intel_pstate_sample() to return a bool value indicating whether or not the sample has been taken and use it to decide whether or not to call intel_pstate_adjust_busy_pstate().
While at it, remove redundant parentheses from the MPERF/TSC check in intel_pstate_sample().
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
show more ...
|
#
8fa520af |
| 06-Mar-2016 |
Philippe Longepe <philippe.longepe@linux.intel.com> |
intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
Use a helper function to compute the average pstate and call it only where it is needed (only when tracing or in intel_pstate_get)
intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
Use a helper function to compute the average pstate and call it only where it is needed (only when tracing or in intel_pstate_get).
Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
7349ec04 |
| 06-Mar-2016 |
Philippe Longepe <philippe.longepe@linux.intel.com> |
intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
The cpu_load algorithm doesn't need to invoke intel_pstate_calc_busy(), so move that call from intel_pstate_sampl
intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
The cpu_load algorithm doesn't need to invoke intel_pstate_calc_busy(), so move that call from intel_pstate_sample() to get_target_pstate_use_performance().
Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
a158bed5 |
| 06-Mar-2016 |
Philippe Longepe <philippe.longepe@linux.intel.com> |
intel_pstate: Optimize calculation for max/min_perf_adj
mul_fp(int_tofp(A), B) expands to: ((A << FRAC_BITS) * B) >> FRAC_BITS, so the same result can be obtained via simple multiplication A * B. A
intel_pstate: Optimize calculation for max/min_perf_adj
mul_fp(int_tofp(A), B) expands to: ((A << FRAC_BITS) * B) >> FRAC_BITS, so the same result can be obtained via simple multiplication A * B. Apply this observation to max_perf * limits->max_perf and max_perf * limits->min_perf in intel_pstate_get_min_max()."
Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
b54a0dfd |
| 08-Mar-2016 |
Philippe Longepe <philippe.longepe@linux.intel.com> |
intel_pstate: Remove extra conversions in pid calculation
pid->setpoint and pid->deadband can be initialized in fixed point, so we can avoid the int_tofp in pid_calc.
Signed-off-by: Philippe Longep
intel_pstate: Remove extra conversions in pid calculation
pid->setpoint and pid->deadband can be initialized in fixed point, so we can avoid the int_tofp in pid_calc.
Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
08f511fd |
| 03-Mar-2016 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
cpufreq: Reduce cpufreq_update_util() overhead a bit
Use the observation that cpufreq_update_util() is only called by the scheduler with rq->lock held, so the callers of cpufreq_set_update_util_data
cpufreq: Reduce cpufreq_update_util() overhead a bit
Use the observation that cpufreq_update_util() is only called by the scheduler with rq->lock held, so the callers of cpufreq_set_update_util_data() can use synchronize_sched() instead of synchronize_rcu() to wait for cpufreq_update_util() to complete. Moreover, if they are updated to do that, rcu_read_(un)lock() calls in cpufreq_update_util() might be replaced with rcu_read_(un)lock_sched(), respectively, but those aren't really necessary, because the scheduler calls that function from RCU-sched read-side critical sections already.
In addition to that, if cpufreq_set_update_util_data() checks the func field in the struct update_util_data before setting the per-CPU pointer to it, the data->func check may be dropped from cpufreq_update_util() as well.
Make the above changes to reduce the overhead from cpufreq_update_util() in the scheduler paths invoking it and to make the cleanup after removing its callbacks less heavy-weight somewhat.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
show more ...
|
#
a4675fbc |
| 04-Feb-2016 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
cpufreq: intel_pstate: Replace timers with utilization update callbacks
Instead of using a per-CPU deferrable timer for utilization sampling and P-states adjustments, register a utilization update c
cpufreq: intel_pstate: Replace timers with utilization update callbacks
Instead of using a per-CPU deferrable timer for utilization sampling and P-states adjustments, register a utilization update callback that will be invoked from the scheduler on utilization changes.
The sampling rate is still the same as what was used for the deferrable timers, so the functional impact of this patch should not be significant.
Based on an earlier patch from Srinivas Pandruvada.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
show more ...
|
#
f05c9665 |
| 25-Feb-2016 |
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> |
cpufreq: intel_pstate: disable HWP notifications
Disable HWP Interrupt notification before enabling HWP. Since we don't have HWP interrupt handling for possible performance interrupts, there is not
cpufreq: intel_pstate: disable HWP notifications
Disable HWP Interrupt notification before enabling HWP. Since we don't have HWP interrupt handling for possible performance interrupts, there is not much use of enabling HWP interrupts.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
7791e4aa |
| 25-Feb-2016 |
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> |
cpufreq: intel_pstate: Enable HWP by default
If the processor supports HWP, enable it by default without checking for the cpu model. This will allow to enable HWP in all supported processors without
cpufreq: intel_pstate: Enable HWP by default
If the processor supports HWP, enable it by default without checking for the cpu model. This will allow to enable HWP in all supported processors without driver change.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
41cfd64c |
| 21-Feb-2016 |
Viresh Kumar <viresh.kumar@linaro.org> |
intel_pstate: Update frequencies of policy->cpus only from ->set_policy()
The intel-pstate driver is using intel_pstate_hwp_set() from two separate paths, i.e. ->set_policy() callback and sysfs upda
intel_pstate: Update frequencies of policy->cpus only from ->set_policy()
The intel-pstate driver is using intel_pstate_hwp_set() from two separate paths, i.e. ->set_policy() callback and sysfs update path for the files present in /sys/devices/system/cpu/intel_pstate/ directory.
While an update to the sysfs path applies to all the CPUs being managed by the driver (which essentially means all the online CPUs), the update via the ->set_policy() callback applies to a smaller group of CPUs managed by the policy for which ->set_policy() is called.
And so, intel_pstate_hwp_set() should update frequencies of only the CPUs that are part of policy->cpus mask, while it is called from ->set_policy() callback.
In order to do that, add a parameter (cpumask) to intel_pstate_hwp_set() and apply the frequency changes only to the concerned CPUs.
For ->set_policy() path, we are only concerned about policy->cpus, and so policy->rwsem lock taken by the core prior to calling ->set_policy() is enough to take care of any races. The larger lock acquired by get_online_cpus() is required only for the updates to sysfs files.
Add another routine, intel_pstate_hwp_set_online_cpus(), and call it from the sysfs update paths.
This also fixes a lockdep reported recently, where policy->rwsem and get_online_cpus() could have been acquired in any order causing an ABBA deadlock. The sequence of events leading to that was:
intel_pstate_init(...) ...cpufreq_online(...) down_write(&policy->rwsem); // Locks policy->rwsem ... cpufreq_init_policy(policy); ...intel_pstate_hwp_set(); get_online_cpus(); // Temporarily locks cpu_hotplug.lock ... up_write(&policy->rwsem);
pm_suspend(...) ...disable_nonboot_cpus() _cpu_down() cpu_hotplug_begin(); // Locks cpu_hotplug.lock __cpu_notify(CPU_DOWN_PREPARE, ...); ...cpufreq_offline_prepare(); down_write(&policy->rwsem); // Locks policy->rwsem
Reported-and-tested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
88b7b7c0 |
| 08-Dec-2015 |
Prarit Bhargava <prarit@redhat.com> |
cpufreq: intel_pstate: Minor cleanup for FRAC_BITS
785ee27 ("cpufreq: intel_pstate: Fix limits->max_perf rounding error") hardcodes the value of FRAC_BITS. This patch fixes that minor issue.
Fixes
cpufreq: intel_pstate: Minor cleanup for FRAC_BITS
785ee27 ("cpufreq: intel_pstate: Fix limits->max_perf rounding error") hardcodes the value of FRAC_BITS. This patch fixes that minor issue.
Fixes: 785ee2788141 (cpufreq: intel_pstate: Fix limits->max_perf rounding error) Signed-off-by: Prarit Bhargava <prarit@redhat.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
63d1d656 |
| 04-Dec-2015 |
Philippe Longepe <philippe.longepe@intel.com> |
cpufreq: intel_pstate: Account for IO wait time
In cases where we have many IOs, the global load becomes low and the load algorithm will decrease the requested P-State. Because of that, the IOs over
cpufreq: intel_pstate: Account for IO wait time
In cases where we have many IOs, the global load becomes low and the load algorithm will decrease the requested P-State. Because of that, the IOs overheads will increase and impact the IO performances.
To improve IO bound work, we can count the io-wait time as busy time in calculating CPU busy.
This change uses get_cpu_iowait_time_us() to obtain the IO wait time value and converts time into number of cycles spent waiting on IO at the TSC rate. At the moment, this trick is only used for Atom.
Signed-off-by: Philippe Longepe <philippe.longepe@intel.com> Signed-off-by: Stephane Gasparini <stephane.gasparini@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
e70eed2b |
| 04-Dec-2015 |
Philippe Longepe <philippe.longepe@intel.com> |
cpufreq: intel_pstate: Account for non C0 time
The current function to calculate cpu utilization uses the average P-state ratio (APerf/Mperf) scaled by the ratio of the current P-state to the max av
cpufreq: intel_pstate: Account for non C0 time
The current function to calculate cpu utilization uses the average P-state ratio (APerf/Mperf) scaled by the ratio of the current P-state to the max available non-turbo one. This leads to an overestimation of utilization which causes higher-performance P-states to be selected more often and that leads to increased energy consumption.
This is a problem for low-power systems, so it is better to use a different utilization calculation algorithm for them.
Namely, the Percent Busy value (or load) can be estimated as the ratio of the MPERF counter that runs at a constant rate only during active periods (C0) to the time stamp counter (TSC) that also runs (at the same rate) during idle. That is:
Percent Busy = 100 * (delta_mperf / delta_tsc)
Use this algorithm for platforms with SoCs based on the Airmont and Silvermont Atom cores.
Signed-off-by: Philippe Longepe <philippe.longepe@intel.com> Signed-off-by: Stephane Gasparini <stephane.gasparini@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|