#
bc95a454 |
| 19-Jul-2016 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
intel_pstate: Update cpu_frequency tracepoint every time
Currently, intel_pstate only updates the cpu_frequency tracepoint if the new P-state to set is different from the current one, but that cause
intel_pstate: Update cpu_frequency tracepoint every time
Currently, intel_pstate only updates the cpu_frequency tracepoint if the new P-state to set is different from the current one, but that causes powertop to report 100% idle on an 100% loaded system sometimes.
Prevent that from happening by updating the cpu_frequency tracepoint every time intel_pstate_update_pstate() is called.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>-
show more ...
|
#
2630abc2 |
| 18-Jul-2016 |
Carsten Emde <C.Emde@osadl.org> |
cpufreq: intel_pstate: clean remnant struct element
When I was working with the Intel P state driver I came across a remnant struct element that is no longer needed after the function intel_pstate_c
cpufreq: intel_pstate: clean remnant struct element
When I was working with the Intel P state driver I came across a remnant struct element that is no longer needed after the function intel_pstate_calc_freq() was retired.
Signed-off-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
5fc8f707 |
| 08-Jul-2016 |
Jan Kiszka <jan.kiszka@siemens.com> |
intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate()
If MSR_CONFIG_TDP_CONTROL is locked, we currently try to address some MSR 0x80000648 or so. Mask out the relevant level bits 0
intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate()
If MSR_CONFIG_TDP_CONTROL is locked, we currently try to address some MSR 0x80000648 or so. Mask out the relevant level bits 0 and 1.
Found while running over the Jailhouse hypervisor which became upset about this strange MSR index.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 4.4+ <stable@vger.kernel.org> # 4.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
100cf6f2 |
| 06-Jul-2016 |
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> |
cpufreq: intel_pstate: Replace MSR_NHM_TURBO_RATIO_LIMIT
Replace MSR_NHM_TURBO_RATIO_LIMIT with MSR_TURBO_RATIO_LIMIT.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked
cpufreq: intel_pstate: Replace MSR_NHM_TURBO_RATIO_LIMIT
Replace MSR_NHM_TURBO_RATIO_LIMIT with MSR_TURBO_RATIO_LIMIT.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
4a7cb7a9 |
| 27-Jun-2016 |
Jisheng Zhang <jszhang@marvell.com> |
intel_pstate: Declare pid_params/pstate_funcs/hwp_active __read_mostly
pid_params is written once by copy_pid_params() during initialization, and thereafter is mostly read by hot path intel_pstate_u
intel_pstate: Declare pid_params/pstate_funcs/hwp_active __read_mostly
pid_params is written once by copy_pid_params() during initialization, and thereafter is mostly read by hot path intel_pstate_update_util(). The read of pid_params gets more after commit a4675fbc4a7a ("cpufreq: intel_pstate: Replace timers with utilization update callbacks")
pstate_funcs is written once by copy_cpu_funcs() during initialization, and thereafter is mostly read by hot path intel_pstate_update_util()
hwp_active is written to once during initialization and thereafter is mostly read by hot path intel_pstate_update_util().
The fact that they are mostly read and not written to makes them candidates for __read_mostly declarations.
Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
29327c84 |
| 27-Jun-2016 |
Jisheng Zhang <jszhang@marvell.com> |
intel_pstate: add __init/__initdata marker to some functions/variables
These functions/variables are not needed after booting, so mark them as __init or __initdata.
Signed-off-by: Jisheng Zhang <js
intel_pstate: add __init/__initdata marker to some functions/variables
These functions/variables are not needed after booting, so mark them as __init or __initdata.
Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
eed43609 |
| 27-Jun-2016 |
Jisheng Zhang <jszhang@marvell.com> |
intel_pstate: Fix incorrect placement of __initdata
__initdata should be placed between the variable name and equal sign (if there is) for the variable to be placed in the intended section.
Signed-
intel_pstate: Fix incorrect placement of __initdata
__initdata should be placed between the variable name and equal sign (if there is) for the variable to be placed in the intended section.
Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
5ab666e0 |
| 27-Jun-2016 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
intel_pstate: Do not clear utilization update hooks on policy changes
intel_pstate_set_policy() is invoked by the cpufreq core during driver initialization, on changes of policy attributes (minimim
intel_pstate: Do not clear utilization update hooks on policy changes
intel_pstate_set_policy() is invoked by the cpufreq core during driver initialization, on changes of policy attributes (minimim and maximum frequency, for example) via sysfs and via CPU notifications from the platform firmware. On some platforms the latter may occur relatively often.
Commit bb6ab52f2bef (intel_pstate: Do not set utilization update hook too early) made intel_pstate_set_policy() clear the CPU's utilization update hook before updating the policy attributes for it (and set the hook again after doind that), but that involves invoking synchronize_sched() and adds overhead to the CPU notifications mentioned above and to the sched-RCU handling in general.
That extra overhead is arguably not necessary, because updating policy attributes when the CPU's utilization update hook is active should not lead to any adverse effects, so drop the clearing of the hook from intel_pstate_set_policy() and make it check if the hook has been set already when attempting to set it.
Fixes: bb6ab52f2bef (intel_pstate: Do not set utilization update hook too early) Reported-by: Jisheng Zhang <jszhang@marvell.com> Tested-by: Jisheng Zhang <jszhang@marvell.com> Tested-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
b00345d1 |
| 15-Jun-2016 |
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> |
cpufreq: intel_pstate: Adjust _PSS[0] freqeuency if needed
The maximum turbo P-State used by the intel_pstate driver may be limited by ACPI _PSS table entry 0. After commit 9522a2ff9cde (cpufreq: i
cpufreq: intel_pstate: Adjust _PSS[0] freqeuency if needed
The maximum turbo P-State used by the intel_pstate driver may be limited by ACPI _PSS table entry 0. After commit 9522a2ff9cde (cpufreq: intel_pstate: Enforce _PPC limits), the maximum performance on servers will be capped by the _PSS table entry 0 by default.
Even though that is formally correct, it may lead to preformance regressions in some cases. Namely, if the _PSS table entry 0 is not the maximum turbo P-State, performance measured after commit 9522a2ff9cde will not match the performance measured before that commit on the same system.
For this reason, modify the code to always use the maximum turbo frequency as the one that corresponds to _PSS table entry 0 if turbo is enabled in the BIOS. This way, the performance levels from before commit 9522a2ff9cde will be restored on the affected systems.
Fixes: 9522a2ff9cde (cpufreq: intel_pstate: Enforce _PPC limits) Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw : Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
41bad47f |
| 09-Jun-2016 |
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> |
cpufreq: intel_pstate: Broxton support
Add Broxton CPU model number.
Broxton requires core_params to get performance limits via MSRs, but it is an Atom platform, which requires more power optimized
cpufreq: intel_pstate: Broxton support
Add Broxton CPU model number.
Broxton requires core_params to get performance limits via MSRs, but it is an Atom platform, which requires more power optimized algorithm.
So the P state selection will use similar algorithm as other Atom platforms.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
5b20c944 |
| 02-Jun-2016 |
Dave Hansen <dave.hansen@linux.intel.com> |
x86/cpufreq: Use Intel family name macros for the intel_pstate cpufreq driver
Another straightforward replacement of magic numbers.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by
x86/cpufreq: Use Intel family name macros for the intel_pstate cpufreq driver
Another straightforward replacement of magic numbers.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: jacob.jun.pan@intel.com Cc: linux-pm@vger.kernel.org Link: http://lkml.kernel.org/r/20160603001945.0F5D02AA@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
#
983e600e |
| 07-Jun-2016 |
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> |
cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo
When turbo is disabled, the ->set_policy() interface is broken.
For example, when turbo is disabled and cpuinfo.max = 2900000 (full
cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo
When turbo is disabled, the ->set_policy() interface is broken.
For example, when turbo is disabled and cpuinfo.max = 2900000 (full max turbo frequency), setting the limits results in frequency less than the requested one: Set 1000000 KHz results in 0700000 KHz Set 1500000 KHz results in 1100000 KHz Set 2000000 KHz results in 1500000 KHz
This is because the limits->max_perf fraction is calculated using the max turbo frequency as the reference, but when the max P-State is capped in intel_pstate_get_min_max(), the reference is not the max turbo P-State. This results in reducing max P-State.
One option is to always use max turbo as reference for calculating limits. But this will not be correct. By definition the intel_pstate sysfs limits, shows percentage of available performance. So when BIOS has disabled turbo, the available performance is max non turbo. So the max_perf_pct should still show 100%.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw : Subject & changelog, rewrite in fewer lines of code ] Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
2c2c1af4 |
| 07-Jun-2016 |
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> |
cpufreq: intel_pstate: Fix code ordering in intel_pstate_set_policy()
The limits->max_perf is rounded_up but immediately overwritten by another assignment to limits->max_perf.
Move that operation t
cpufreq: intel_pstate: Fix code ordering in intel_pstate_set_policy()
The limits->max_perf is rounded_up but immediately overwritten by another assignment to limits->max_perf.
Move that operation to the correct location.
While here also added a pr_debug() call in ->set_policy to aid in debugging.
Fixes: 785ee2788141 (cpufreq: intel_pstate: Fix limits->max_perf rounding error) Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw : Subject & changelog ] Cc: 4.4+ <stable@vger.kernel.org> # 4.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
6cacd115 |
| 30-May-2016 |
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> |
cpufreq: intel_pstate: Downgrade print level for _PPC
Downgrade pr_info to pr_debug for the "_PPC limits will be enforced" message.
In server systems with many cores this message is annoying.
Sign
cpufreq: intel_pstate: Downgrade print level for _PPC
Downgrade pr_info to pr_debug for the "_PPC limits will be enforced" message.
In server systems with many cores this message is annoying.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
c749c64f |
| 11-May-2016 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
intel_pstate: Simplify conditional in intel_pstate_set_policy()
One of the if () statements in intel_pstate_set_policy() causes another if () to be evaluated if the condition is true and it doesn't
intel_pstate: Simplify conditional in intel_pstate_set_policy()
One of the if () statements in intel_pstate_set_policy() causes another if () to be evaluated if the condition is true and it doesn't do anything else, so merge the two if () statements into one.
No functional changes.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
show more ...
|
#
1aa7a6e2 |
| 11-May-2016 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
intel_pstate: Clean up get_target_pstate_use_performance()
The comments and the core_busy variable name in get_target_pstate_use_performance() are totally confusing, so modify them to reflect what's
intel_pstate: Clean up get_target_pstate_use_performance()
The comments and the core_busy variable name in get_target_pstate_use_performance() are totally confusing, so modify them to reflect what's going on.
The results of the computations should be the same as before.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
8edb0a6e |
| 11-May-2016 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
intel_pstate: Use sample.core_avg_perf in get_avg_pstate()
Notice that get_avg_pstate() can use sample.core_avg_perf instead of carrying the same division again, so make it do that.
Signed-off-by:
intel_pstate: Use sample.core_avg_perf in get_avg_pstate()
Notice that get_avg_pstate() can use sample.core_avg_perf instead of carrying the same division again, so make it do that.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
a1c9787d |
| 11-May-2016 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
intel_pstate: Clarify average performance computation
The core_pct_busy field of struct sample actually contains the average performace during the last sampling period (in percent) and not the utili
intel_pstate: Clarify average performance computation
The core_pct_busy field of struct sample actually contains the average performace during the last sampling period (in percent) and not the utilization of the core as suggested by its name which is confusing.
For this reason, change the name of that field to core_avg_perf and rename the function that computes its value accordingly.
Also notice that storing this value as percentage requires a costly integer multiplication to be carried out in a hot path, so instead store it as an "extended fixed point" value with more fraction bits and update the code using it accordingly (it is better to change the name of the field along with its meaning in one go than to make those two changes separately, as that would likely lead to more confusion).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
4578ee7e |
| 11-May-2016 |
Chen Yu <yu.c.chen@intel.com> |
intel_pstate: Avoid unnecessary synchronize_sched() during initialization
Currently, in intel_pstate_clear_update_util_hook(), after clearing the utilization update hook, we leverage synchronize_sch
intel_pstate: Avoid unnecessary synchronize_sched() during initialization
Currently, in intel_pstate_clear_update_util_hook(), after clearing the utilization update hook, we leverage synchronize_sched() to deal with synchronization, which is a little bit time-costly because synchronize_sched() has to wait for all the CPUs to go through a grace period.
Actually, the synchronize_sched() is not necessary if the utilization update hook has not been set for the given CPU yet, so make the driver check if that's the case and avoid the synchronize_sched() call then.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=116371 Tested-by: Tian Ye <yex.tian@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> [ rjw : Rebase ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
f96fd0c8 |
| 06-May-2016 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
intel_pstate: Clean up intel_pstate_get()
intel_pstate_get() contains a local variable that's initialized but never used and it can be written in fewer lines of code, so clean it up.
Signed-off-by:
intel_pstate: Clean up intel_pstate_get()
intel_pstate_get() contains a local variable that's initialized but never used and it can be written in fewer lines of code, so clean it up.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
show more ...
|
#
e59a8f7f |
| 04-May-2016 |
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> |
cpufreq: intel_pstate: Ignore _PPC processing under HWP
When HWP (hardware P states) feature is active, the ACPI _PSS and _PPC is not used. So ignore processing for _PPC limits.
Signed-off-by: Srin
cpufreq: intel_pstate: Ignore _PPC processing under HWP
When HWP (hardware P states) feature is active, the ACPI _PSS and _PPC is not used. So ignore processing for _PPC limits.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
6d45b719 |
| 04-May-2016 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
intel_pstate: Fix intel_pstate_get()
After commit 8fa520af5081 "intel_pstate: Remove freq calculation from intel_pstate_calc_busy()" intel_pstate_get() calls get_avg_frequency() to compute the avera
intel_pstate: Fix intel_pstate_get()
After commit 8fa520af5081 "intel_pstate: Remove freq calculation from intel_pstate_calc_busy()" intel_pstate_get() calls get_avg_frequency() to compute the average frequency, which is problematic for two reasons.
First, intel_pstate_get() may be invoked before the driver reads the CPU feedback registers for the first time and if that happens, get_avg_frequency() will attempt to divide by zero.
Second, the get_avg_frequency() call in intel_pstate_get() is racy with respect to intel_pstate_sample() and it may end up returning completely meaningless values for this reason.
Moreover, after commit 7349ec0470b6 "intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()" sample.core_pct_busy is never computed on Atom, but it is used in intel_pstate_adjust_busy_pstate() in that case too.
To address those problems notice that if sample.core_pct_busy was used in the average frequency computation carried out by get_avg_frequency(), both the divide by zero problem and the race with respect to intel_pstate_sample() would be avoided.
Accordingly, move the invocation of intel_pstate_calc_busy() from get_target_pstate_use_performance() to intel_pstate_update_util(), which also will take care of the uninitialized sample.core_pct_busy on Atom, and modify get_avg_frequency() to use sample.core_pct_busy as per the above.
Reported-by: kernel test robot <ying.huang@linux.intel.com> Link: http://marc.info/?l=linux-kernel&m=146226437623173&w=4 Fixes: 8fa520af5081 "intel_pstate: Remove freq calculation from intel_pstate_calc_busy()" Fixes: 7349ec0470b6 "intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()" Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
ba41e1bc |
| 01-May-2016 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
cpufreq: intel_pstate: Fix HWP on boot CPU after system resume
Commit 41cfd64cf49fc "Update frequencies of policy->cpus only from ->set_policy()" changed the way the intel_pstate driver's ->set_poli
cpufreq: intel_pstate: Fix HWP on boot CPU after system resume
Commit 41cfd64cf49fc "Update frequencies of policy->cpus only from ->set_policy()" changed the way the intel_pstate driver's ->set_policy callback updates the HWP (hardware-managed P-states) settings. A side effect of it is that if those settings are modified on the boot CPU during system suspend and wakeup, they will never be restored during subsequent system resume.
To address this problem, allow cpufreq drivers that don't provide ->target or ->target_index callbacks to use ->suspend and ->resume callbacks and add a ->resume callback to intel_pstate to restore the HWP settings on the CPUs that belong to the given policy.
Fixes: 41cfd64cf49fc "Update frequencies of policy->cpus only from ->set_policy()" Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
show more ...
|
#
2b3ec765 |
| 27-Apr-2016 |
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> |
cpufreq: intel_pstate: Enable PPC enforcement for servers
For platforms which are controlled via remove node manager, enable _PPC by default. These platforms are mostly categorized as enterprise ser
cpufreq: intel_pstate: Enable PPC enforcement for servers
For platforms which are controlled via remove node manager, enable _PPC by default. These platforms are mostly categorized as enterprise server or performance servers. These platforms needs to go through some certifications tests, which tests control via _PPC. The relative risk of enabling by default is low as this is is less likely that these systems have broken _PSS table.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
3be9200d |
| 27-Apr-2016 |
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> |
cpufreq: intel_pstate: Adjust policy->max
When policy->max is changed via _PPC or sysfs and is more than the max non turbo frequency, it does not really change resulting performance in some processo
cpufreq: intel_pstate: Adjust policy->max
When policy->max is changed via _PPC or sysfs and is more than the max non turbo frequency, it does not really change resulting performance in some processors. When policy->max results in a P-State ratio more than the turbo activation ratio, then processor can choose any P-State up to max turbo. So the user or _PPC setting has no value, but this can cause undesirable side effects like: - Showing reduced max percentage in Intel P-State sysfs - It can cause reduced max performance under certain boundary conditions: The requested max scaling frequency either via _PPC or via cpufreq-sysfs, will be converted into a fixed floating point max percent scale. In majority of the cases this will result in correct max. But not 100% of the time. If the _PPC is requested at a point where the calculation lead to a lower max, this can result in a lower P-State then expected and it will impact performance. Example of this condition using a Broadwell laptop with config TDP.
ACPI _PSS table from a Broadwell laptop 2301000 2300000 2200000 2000000 1900000 1800000 1700000 1500000 1400000 1300000 1100000 1000000 900000 800000 600000 500000
The actual results by disabling config TDP so that we can get what is requested on or below 2300000Khz.
scaling_max_freq Max Requested P-State Resultant scaling max ---------------------------------------- ---------------------- 2400000 18 2900000 (max turbo) 2300000 17 2300000 (max physical non turbo) 2200000 15 2100000 2100000 15 2100000 2000000 13 1900000 1900000 13 1900000 1800000 12 1800000 1700000 11 1700000 1600000 10 1600000 1500000 f 1500000 1400000 e 1400000 1300000 d 1300000 1200000 c 1200000 1100000 a 1000000 1000000 a 1000000 900000 9 900000 800000 8 800000 700000 7 700000 600000 6 600000 500000 5 500000 ------------------------------------------------------------------
Now set the config TDP level 1 ratio as 0x0b (equivalent to 1100000KHz) in BIOS (not every system will let you adjust this). The turbo activation ratio will be set to one less than that, which will be 0x0a (So any request above 1000000KHz should result in turbo region assuming no thermal limits). Here _PPC will request max to 1100000KHz (which basically should still result in turbo as this is more than the turbo activation ratio up to max allowable turbo frequency), but actual calculation resulted in a max ceiling P-State which is 0x0a. So under any load condition, this driver will not request turbo P-States. This will be a huge performance hit.
When config TDP feature is ON, if the _PPC points to a frequency above turbo activation ratio, the performance can still reach max turbo. In this case we don't need to treat this as the reduced frequency in set_policy callback.
In this change when config TDP is active (by checking if the physical max non turbo ratio is more than the current max non turbo ratio), any request above current max non turbo is treated as full performance.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw : Minor cleanups ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|