History log of /openbmc/linux/drivers/cpufreq/intel_pstate.c (Results 226 – 250 of 397)
Revision Date Author Comments
# bc95a454 19-Jul-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

intel_pstate: Update cpu_frequency tracepoint every time

Currently, intel_pstate only updates the cpu_frequency tracepoint
if the new P-state to set is different from the current one, but
that sometimes causes powertop to report 100% idle on a 100% loaded
system.

Prevent that from happening by updating the cpu_frequency tracepoint
every time intel_pstate_update_pstate() is called.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>


# 2630abc2 18-Jul-2016 Carsten Emde <C.Emde@osadl.org>

cpufreq: intel_pstate: clean remnant struct element

When I was working with the Intel P state driver I came across a
remnant struct element that is no longer needed after the function
intel_pstate_calc_freq() was retired.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5fc8f707 08-Jul-2016 Jan Kiszka <jan.kiszka@siemens.com>

intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate()

If MSR_CONFIG_TDP_CONTROL is locked, we currently try to address some
MSR 0x80000648 or so. Mask out the relevant level bits 0 and 1.

Found while running over the Jailhouse hypervisor which became upset
about this strange MSR index.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
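
The addressing problem described in this commit can be modelled in a few lines of Python. This is only a sketch, not the driver's C code. The MSR addresses and the position of the lock bit are assumptions chosen to be consistent with the 0x80000648 index quoted in the message, and the function names are hypothetical.

```python
# Assumed MSR addresses; 0x648 matches the low bits of the bad index
# 0x80000648 quoted in the commit message.
MSR_CONFIG_TDP_NOMINAL = 0x648
MSR_CONFIG_TDP_CONTROL = 0x64B

TDP_LEVEL_MASK = 0x3          # bits 0 and 1 select the config TDP level
TDP_CONTROL_LOCK = 1 << 31    # assumed position of the lock bit


def tdp_msr_unmasked(tdp_ctrl: int) -> int:
    """Model of the old behaviour: the raw control value is the offset."""
    return MSR_CONFIG_TDP_NOMINAL + tdp_ctrl


def tdp_msr_masked(tdp_ctrl: int) -> int:
    """Model of the fix: only the two level bits contribute to the offset."""
    return MSR_CONFIG_TDP_NOMINAL + (tdp_ctrl & TDP_LEVEL_MASK)


if __name__ == "__main__":
    locked_level_0 = TDP_CONTROL_LOCK | 0
    print(hex(tdp_msr_unmasked(locked_level_0)))  # bogus index
    print(hex(tdp_msr_masked(locked_level_0)))    # valid index
```

With the lock bit set, the unmasked variant produces an index far outside the config TDP range, which is the kind of access a hypervisor may reject.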


# 100cf6f2 06-Jul-2016 Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

cpufreq: intel_pstate: Replace MSR_NHM_TURBO_RATIO_LIMIT

Replace MSR_NHM_TURBO_RATIO_LIMIT with MSR_TURBO_RATIO_LIMIT.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 4a7cb7a9 27-Jun-2016 Jisheng Zhang <jszhang@marvell.com>

intel_pstate: Declare pid_params/pstate_funcs/hwp_active __read_mostly

pid_params is written once by copy_pid_params() during initialization,
and thereafter is mostly read by the hot path intel_pstate_update_util().
Reads of pid_params became more frequent after commit a4675fbc4a7a
("cpufreq: intel_pstate: Replace timers with utilization update
callbacks").

pstate_funcs is written once by copy_cpu_funcs() during initialization,
and thereafter is mostly read by the hot path intel_pstate_update_util().

hwp_active is written once during initialization and thereafter is
mostly read by the hot path intel_pstate_update_util().

The fact that they are mostly read and not written to makes them
candidates for __read_mostly declarations.

Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 29327c84 27-Jun-2016 Jisheng Zhang <jszhang@marvell.com>

intel_pstate: add __init/__initdata marker to some functions/variables

These functions/variables are not needed after booting, so mark them
as __init or __initdata.

Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# eed43609 27-Jun-2016 Jisheng Zhang <jszhang@marvell.com>

intel_pstate: Fix incorrect placement of __initdata

__initdata should be placed between the variable name and the equal sign
(if there is one) for the variable to be placed in the intended section.

Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5ab666e0 27-Jun-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

intel_pstate: Do not clear utilization update hooks on policy changes

intel_pstate_set_policy() is invoked by the cpufreq core during
driver initialization, on changes of policy attributes (minimum and
maximum frequency, for example) via sysfs and via CPU notifications
from the platform firmware. On some platforms the latter may occur
relatively often.

Commit bb6ab52f2bef (intel_pstate: Do not set utilization update hook
too early) made intel_pstate_set_policy() clear the CPU's utilization
update hook before updating the policy attributes for it (and set the
hook again after doing that), but that involves invoking
synchronize_sched() and adds overhead to the CPU notifications
mentioned above and to the sched-RCU handling in general.

That extra overhead is arguably not necessary, because updating
policy attributes when the CPU's utilization update hook is active
should not lead to any adverse effects, so drop the clearing of
the hook from intel_pstate_set_policy() and make it check if
the hook has been set already when attempting to set it.

Fixes: bb6ab52f2bef (intel_pstate: Do not set utilization update hook too early)
Reported-by: Jisheng Zhang <jszhang@marvell.com>
Tested-by: Jisheng Zhang <jszhang@marvell.com>
Tested-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# b00345d1 15-Jun-2016 Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

cpufreq: intel_pstate: Adjust _PSS[0] frequency if needed

The maximum turbo P-State used by the intel_pstate driver may be
limited by ACPI _PSS table entry 0. After commit 9522a2ff9cde
(cpufreq: intel_pstate: Enforce _PPC limits), the maximum performance
on servers will be capped by the _PSS table entry 0 by default.

Even though that is formally correct, it may lead to performance
regressions in some cases. Namely, if the _PSS table entry 0 is
not the maximum turbo P-State, performance measured after commit
9522a2ff9cde will not match the performance measured before that
commit on the same system.

For this reason, modify the code to always use the maximum turbo
frequency as the one that corresponds to _PSS table entry 0 if turbo
is enabled in the BIOS. This way, the performance levels from
before commit 9522a2ff9cde will be restored on the affected systems.

Fixes: 9522a2ff9cde (cpufreq: intel_pstate: Enforce _PPC limits)
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw : Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 41bad47f 09-Jun-2016 Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

cpufreq: intel_pstate: Broxton support

Add Broxton CPU model number.

Broxton requires core_params to get performance limits via MSRs, but
it is an Atom platform, which requires a more power-optimized algorithm.

So the P-state selection will use an algorithm similar to that of other
Atom platforms.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5b20c944 02-Jun-2016 Dave Hansen <dave.hansen@linux.intel.com>

x86/cpufreq: Use Intel family name macros for the intel_pstate cpufreq driver

Another straightforward replacement of magic numbers.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: jacob.jun.pan@intel.com
Cc: linux-pm@vger.kernel.org
Link: http://lkml.kernel.org/r/20160603001945.0F5D02AA@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 983e600e 07-Jun-2016 Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo

When turbo is disabled, the ->set_policy() interface is broken.

For example, when turbo is disabled and cpuinfo.max = 2900000 (full
max turbo frequency), setting the limits results in a frequency lower
than the requested one:
  Set 1000000 kHz results in  700000 kHz
  Set 1500000 kHz results in 1100000 kHz
  Set 2000000 kHz results in 1500000 kHz

This is because the limits->max_perf fraction is calculated using
the max turbo frequency as the reference, but when the max P-State is
capped in intel_pstate_get_min_max(), the reference is not the max
turbo P-State. As a result, the max P-State is reduced.

One option is to always use max turbo as the reference for calculating
limits, but that would not be correct. By definition, the intel_pstate
sysfs limits show the percentage of available performance. So when the
BIOS has disabled turbo, the available performance is max non-turbo,
and max_perf_pct should still show 100%.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw : Subject & changelog, rewrite in fewer lines of code ]
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
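
The three results listed in this commit message can be reproduced with simple integer arithmetic. The sketch below is a model in Python rather than the driver's fixed-point C code. It assumes a max non-turbo ratio of 23 and 100000 kHz per ratio step; neither value is stated in the commit message, and they were picked only because they reproduce the quoted numbers. The function name is hypothetical.

```python
KHZ_PER_RATIO = 100_000       # assumption: kHz per P-state ratio step
MAX_TURBO_KHZ = 2_900_000     # cpuinfo.max quoted in the commit message
MAX_NON_TURBO_RATIO = 23      # assumption: not stated in the commit message


def resulting_khz(requested_khz: int, reference_khz: int) -> int:
    """Scale the capped max ratio by requested/reference, rounding down."""
    ratio = (requested_khz * MAX_NON_TURBO_RATIO) // reference_khz
    return ratio * KHZ_PER_RATIO


if __name__ == "__main__":
    non_turbo_khz = MAX_NON_TURBO_RATIO * KHZ_PER_RATIO
    for requested in (1_000_000, 1_500_000, 2_000_000):
        # Wrong reference: max turbo, although the cap is max non-turbo.
        broken = resulting_khz(requested, MAX_TURBO_KHZ)
        # Consistent reference: max non-turbo when turbo is disabled.
        consistent = resulting_khz(requested, non_turbo_khz)
        print(requested, broken, consistent)
```

Under these assumptions the mismatched reference yields 700000, 1100000 and 1500000 kHz, while using max non-turbo as the reference returns the requested frequency in each case.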


# 2c2c1af4 07-Jun-2016 Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

cpufreq: intel_pstate: Fix code ordering in intel_pstate_set_policy()

limits->max_perf is rounded up, but immediately overwritten by
another assignment to limits->max_perf.

Move that operation to the correct location.

While at it, also add a pr_debug() call in ->set_policy to aid in
debugging.

Fixes: 785ee2788141 (cpufreq: intel_pstate: Fix limits->max_perf rounding error)
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw : Subject & changelog ]
Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6cacd115 30-May-2016 Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

cpufreq: intel_pstate: Downgrade print level for _PPC

Downgrade pr_info to pr_debug for the "_PPC limits will be enforced"
message.

In server systems with many cores this message is annoying.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# c749c64f 11-May-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

intel_pstate: Simplify conditional in intel_pstate_set_policy()

One of the if () statements in intel_pstate_set_policy() causes
another if () to be evaluated if the condition is true and it
doesn't do anything else, so merge the two if () statements into
one.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>


# 1aa7a6e2 11-May-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

intel_pstate: Clean up get_target_pstate_use_performance()

The comments and the core_busy variable name in
get_target_pstate_use_performance() are totally confusing,
so modify them to reflect what's going on.

The results of the computations should be the same as before.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 8edb0a6e 11-May-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

intel_pstate: Use sample.core_avg_perf in get_avg_pstate()

Notice that get_avg_pstate() can use sample.core_avg_perf instead of
carrying out the same division again, so make it do that.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# a1c9787d 11-May-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

intel_pstate: Clarify average performance computation

The core_pct_busy field of struct sample actually contains the
average performance during the last sampling period (in percent)
and not the utilization of the core as suggested by its name,
which is confusing.

For this reason, change the name of that field to core_avg_perf
and rename the function that computes its value accordingly.

Also notice that storing this value as percentage requires a costly
integer multiplication to be carried out in a hot path, so instead
store it as an "extended fixed point" value with more fraction bits
and update the code using it accordingly (it is better to change the
name of the field along with its meaning in one go than to make those
two changes separately, as that would likely lead to more
confusion).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
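
The difference between storing a percentage and storing an "extended fixed point" ratio can be illustrated with a small Python model. The bit widths (8 fraction bits plus 6 extra bits), the helper names and the sample counter values below are assumptions for illustration only; the commit message does not give them.

```python
FRAC_BITS = 8                          # assumed regular fraction width
EXT_BITS = 6                           # assumed extra fraction bits
EXT_FRAC_BITS = FRAC_BITS + EXT_BITS   # 14 fraction bits in total


def div_ext_fp(x: int, y: int) -> int:
    """Return x / y as an integer with EXT_FRAC_BITS fraction bits."""
    return (x << EXT_FRAC_BITS) // y


def mul_ext_fp(x: int, y: int) -> int:
    """Multiply integer x by extended fixed-point y, dropping the fraction."""
    return (x * y) >> EXT_FRAC_BITS


if __name__ == "__main__":
    # Hypothetical feedback counter deltas for one sampling period.
    delivered, reference = 1500, 2000
    max_ratio = 23

    # Percent representation: needs a multiplication by 100 per sample.
    pct = delivered * 100 // reference
    print(max_ratio * pct // 100)

    # Extended fixed point: a shift and a division, no multiply by 100.
    avg_perf = div_ext_fp(delivered, reference)
    print(avg_perf, mul_ext_fp(max_ratio, avg_perf))
```

In this model the ratio 0.75 is stored as 12288 (0.75 * 2^14), and both representations give an average ratio of 17 for a maximum ratio of 23.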


# 4578ee7e 11-May-2016 Chen Yu <yu.c.chen@intel.com>

intel_pstate: Avoid unnecessary synchronize_sched() during initialization

Currently, in intel_pstate_clear_update_util_hook(), after
clearing the utilization update hook, we leverage
synchronize_sched() to deal with synchronization, which
is a little bit time-costly because synchronize_sched()
has to wait for all the CPUs to go through a grace period.

Actually, the synchronize_sched() is not necessary if the utilization
update hook has not been set for the given CPU yet, so make the driver
check if that's the case and avoid the synchronize_sched() call then.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=116371
Tested-by: Tian Ye <yex.tian@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
[ rjw : Rebase ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
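
The guard described in this commit amounts to remembering whether the hook was ever installed. The Python sketch below is a toy model, not kernel code: the flag name update_util_set, the stats counter and the synchronize_sched() stand-in are all assumptions made for illustration.

```python
stats = {"grace_periods": 0}   # counts calls to the stand-in below


def synchronize_sched() -> None:
    """Stand-in for the kernel primitive, which waits for a grace period."""
    stats["grace_periods"] += 1


class CpuData:
    """Per-CPU driver state; update_util_set is a hypothetical flag name."""

    def __init__(self) -> None:
        self.update_util_set = False


def set_update_util_hook(cpu: CpuData) -> None:
    """Install the hook and remember that it has been installed."""
    cpu.update_util_set = True


def clear_update_util_hook(cpu: CpuData) -> None:
    """Remove the hook, waiting only if it was installed before."""
    if not cpu.update_util_set:
        return                 # never installed, so nothing to wait for
    cpu.update_util_set = False
    synchronize_sched()


if __name__ == "__main__":
    cpu = CpuData()
    clear_update_util_hook(cpu)      # initialization path: no wait
    set_update_util_hook(cpu)
    clear_update_util_hook(cpu)      # hook was set: one wait
    print(stats["grace_periods"])
```

Clearing a hook that was never set returns immediately, so the expensive wait is only paid when there may be a concurrent user of the hook.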


# f96fd0c8 06-May-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

intel_pstate: Clean up intel_pstate_get()

intel_pstate_get() contains a local variable that's initialized but
never used and it can be written in fewer lines of code, so clean
it up.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>


# e59a8f7f 04-May-2016 Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

cpufreq: intel_pstate: Ignore _PPC processing under HWP

When the HWP (hardware P-states) feature is active, ACPI _PSS and _PPC
are not used, so skip processing of _PPC limits.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6d45b719 04-May-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

intel_pstate: Fix intel_pstate_get()

After commit 8fa520af5081 "intel_pstate: Remove freq calculation from
intel_pstate_calc_busy()" intel_pstate_get() calls get_avg_frequency()
to compute the average frequency, which is problematic for two reasons.

First, intel_pstate_get() may be invoked before the driver reads the
CPU feedback registers for the first time and if that happens,
get_avg_frequency() will attempt to divide by zero.

Second, the get_avg_frequency() call in intel_pstate_get() is racy
with respect to intel_pstate_sample() and it may end up returning
completely meaningless values for this reason.

Moreover, after commit 7349ec0470b6 "intel_pstate: Move
intel_pstate_calc_busy() into get_target_pstate_use_performance()"
sample.core_pct_busy is never computed on Atom, but it is used in
intel_pstate_adjust_busy_pstate() in that case too.

To address those problems notice that if sample.core_pct_busy
was used in the average frequency computation carried out by
get_avg_frequency(), both the divide by zero problem and the
race with respect to intel_pstate_sample() would be avoided.

Accordingly, move the invocation of intel_pstate_calc_busy() from
get_target_pstate_use_performance() to intel_pstate_update_util(),
which also will take care of the uninitialized sample.core_pct_busy
on Atom, and modify get_avg_frequency() to use sample.core_pct_busy
as per the above.

Reported-by: kernel test robot <ying.huang@linux.intel.com>
Link: http://marc.info/?l=linux-kernel&m=146226437623173&w=4
Fixes: 8fa520af5081 "intel_pstate: Remove freq calculation from intel_pstate_calc_busy()"
Fixes: 7349ec0470b6 "intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()"
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ba41e1bc 01-May-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Fix HWP on boot CPU after system resume

Commit 41cfd64cf49fc "Update frequencies of policy->cpus only from
->set_policy()" changed the way the intel_pstate driver's ->set_policy
callback updates the HWP (hardware-managed P-states) settings.
A side effect of it is that if those settings are modified on the
boot CPU during system suspend and wakeup, they will never be
restored during subsequent system resume.

To address this problem, allow cpufreq drivers that don't provide
->target or ->target_index callbacks to use ->suspend and ->resume
callbacks and add a ->resume callback to intel_pstate to restore
the HWP settings on the CPUs that belong to the given policy.

Fixes: 41cfd64cf49fc "Update frequencies of policy->cpus only from ->set_policy()"
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


# 2b3ec765 27-Apr-2016 Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

cpufreq: intel_pstate: Enable PPC enforcement for servers

For platforms which are controlled via a remote node manager, enable _PPC
by default. These platforms are mostly categorized as enterprise servers
or performance servers. They need to go through certification tests that
exercise control via _PPC. The relative risk of enabling this by default
is low, as these systems are less likely to have a broken _PSS table.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3be9200d 27-Apr-2016 Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

cpufreq: intel_pstate: Adjust policy->max

When policy->max is changed via _PPC or sysfs and is more than the max
non-turbo frequency, it does not really change the resulting performance
on some processors. When policy->max results in a P-State ratio above the
turbo activation ratio, the processor can choose any P-State up to max
turbo. So the user or _PPC setting has no effect, but it can cause
undesirable side effects:
- A reduced max percentage is shown in the Intel P-State sysfs.
- Max performance can be reduced under certain boundary conditions:
  the requested max scaling frequency, whether set via _PPC or via
  cpufreq sysfs, is converted into a fixed-point max percent scale. In
  the majority of cases this results in the correct max, but not always.
  If the _PPC request lands at a point where the calculation leads to a
  lower max, the result is a lower P-State than expected, which hurts
  performance.

An example of this condition, using a Broadwell laptop with config TDP:

ACPI _PSS table from a Broadwell laptop
2301000 2300000 2200000 2000000 1900000 1800000 1700000 1500000 1400000
1300000 1100000 1000000 900000 800000 600000 500000

The actual results with config TDP disabled, so that we get what is
requested at or below 2300000 kHz:

scaling_max_freq   Max Requested P-State   Resultant scaling max
------------------------------------------------------------------
2400000            18                      2900000 (max turbo)
2300000            17                      2300000 (max physical non turbo)
2200000            15                      2100000
2100000            15                      2100000
2000000            13                      1900000
1900000            13                      1900000
1800000            12                      1800000
1700000            11                      1700000
1600000            10                      1600000
1500000            f                       1500000
1400000            e                       1400000
1300000            d                       1300000
1200000            c                       1200000
1100000            a                       1000000
1000000            a                       1000000
900000             9                       900000
800000             8                       800000
700000             7                       700000
600000             6                       600000
500000             5                       500000
------------------------------------------------------------------

Now set the config TDP level 1 ratio to 0x0b (equivalent to 1100000 kHz)
in the BIOS (not every system allows this to be adjusted). The turbo
activation ratio will be set to one less than that, 0x0a, so any request
above 1000000 kHz should land in the turbo region, assuming no thermal
limits.

Here _PPC requests a max of 1100000 kHz. That should still result in
turbo, up to the max allowable turbo frequency, because it is above the
turbo activation ratio, but the actual calculation yields a max ceiling
P-State of 0x0a. So under any load, the driver will not request turbo
P-States, which is a huge performance hit.

When config TDP feature is ON, if the _PPC points to a frequency above
turbo activation ratio, the performance can still reach max turbo. In this
case we don't need to treat this as the reduced frequency in set_policy
callback.

In this change when config TDP is active (by checking if the physical max
non turbo ratio is more than the current max non turbo ratio), any request
above current max non turbo is treated as full performance.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw : Minor cleanups ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
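
The rule stated in the last paragraph of this commit message can be written down as a short Python model. This is a sketch of the rule as described, not the driver's code. The function name, parameter names and the numbers in the demo are hypothetical.

```python
def adjusted_policy_max(policy_max_khz: int, cpuinfo_max_khz: int,
                        max_pstate: int, max_pstate_physical: int,
                        scaling_khz: int) -> int:
    """Return the policy max after applying the config TDP rule.

    Config TDP is treated as active when the physical max non-turbo
    ratio exceeds the current max non-turbo ratio. In that case any
    request above the current max non-turbo frequency is treated as a
    request for full performance.
    """
    config_tdp_active = max_pstate_physical > max_pstate
    current_max_non_turbo_khz = max_pstate * scaling_khz
    if config_tdp_active and policy_max_khz > current_max_non_turbo_khz:
        return cpuinfo_max_khz
    return policy_max_khz


if __name__ == "__main__":
    # Hypothetical system: physical ratio 23, config TDP ratio 11,
    # 100000 kHz per ratio step, 2900000 kHz max turbo.
    for request in (1_000_000, 1_200_000):
        print(request,
              adjusted_policy_max(request, 2_900_000, 11, 23, 100_000))
```

Requests at or below the current max non-turbo frequency pass through unchanged, so only requests that already land in the turbo region are promoted to the full maximum.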
