History log of /openbmc/linux/drivers/cpufreq/intel_pstate.c (Results 276 – 300 of 397)
Revision Date Author Comments
# 157386b6 04-Dec-2015 Philippe Longepe <philippe.longepe@intel.com>

cpufreq: intel_pstate: Configurable algorithm to get target pstate

Target systems using different cpus have different power and performance
requirements. They may use different algorithms to get the

cpufreq: intel_pstate: Configurable algorithm to get target pstate

Target systems using different cpus have different power and performance
requirements. They may use different algorithms to get the next P-state
based on their power or performance preference.

For example, power-constrained systems may not want to use
high-performance P-states as aggressively as a full-size desktop or a
server platform. A server platform may want to run close to the max to
achieve better performance, while laptop-like systems may prefer
sacrificing performance for longer battery lifes.

For the above reasons, modify intel_pstate to allow the target P-state
selection algorithm to be depend on the CPU ID.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Philippe Longepe <philippe.longepe@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# 584ee3dc 18-Nov-2015 Alexandra Yates <alexandra.yates@linux.intel.com>

intel_pstate: Fix "performance" mode behavior with HWP enabled

If hardware-driven P-state selection (HWP) is enabled, the
"performance" mode of intel_pstate should only allow the processor
to use th

intel_pstate: Fix "performance" mode behavior with HWP enabled

If hardware-driven P-state selection (HWP) is enabled, the
"performance" mode of intel_pstate should only allow the processor
to use the highest-performance P-state available. That is not
the case currently, so make it actually happen.

Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Alexandra Yates <alexandra.yates@linux.intel.com>
[ rjw: Subject and changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# 785ee278 20-Nov-2015 Prarit Bhargava <prarit@redhat.com>

cpufreq: intel_pstate: Fix limits->max_perf rounding error

A rounding error was found in the calculation of limits->max_perf
in intel_pstate_set_policy(), which is used to calculate the max and min

cpufreq: intel_pstate: Fix limits->max_perf rounding error

A rounding error was found in the calculation of limits->max_perf
in intel_pstate_set_policy(), which is used to calculate the max and min
pstate values in intel_pstate_get_min_max(). In that code,
limits->max_perf is truncated to 2 hex digits such that, for example,
0x169 was incorrectly calculated to 0x16 instead of 0x17. This resulted in
the pstate being set one level too low. This patch rounds the value of
limits->max_perf up instead of down so that the correct max pstate can
be reached.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# 8478f539 20-Nov-2015 Prarit Bhargava <prarit@redhat.com>

cpufreq: intel_pstate: Fix limits->max_policy_pct rounding error

I have a Intel (6,63) processor with a "marketing" frequency (from
/proc/cpuinfo) of 2100MHz, and a max turbo frequency of 2600MHz.

cpufreq: intel_pstate: Fix limits->max_policy_pct rounding error

I have a Intel (6,63) processor with a "marketing" frequency (from
/proc/cpuinfo) of 2100MHz, and a max turbo frequency of 2600MHz. I
can execute

cpupower frequency-set -g powersave --min 1200MHz --max 2100MHz

and the max_freq_pct is set to 80. When adding load to the system I noticed
that the cpu frequency only reached 2000MHZ and not 2100MHz as expected.

This is because limits->max_policy_pct is calculated as 2100 * 100 /2600 = 80.7
and is rounded down to 80 when it should be rounded up to 81. This patch
adds a DIV_ROUND_UP() which will return the correct value.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# 1421df63 09-Nov-2015 Philippe Longepe <philippe.longepe@linux.intel.com>

cpufreq: intel_pstate: Add separate support for Airmont cores

There are two flavors of Atom cores to be supported by intel_pstate,
Silvermont and Airmont, so make the driver distinguish between them

cpufreq: intel_pstate: Add separate support for Airmont cores

There are two flavors of Atom cores to be supported by intel_pstate,
Silvermont and Airmont, so make the driver distinguish between them by
adding separate frequency tables.

Separate the CPU defaults params for each of them and match the CPU IDs
against them as appropriate.

Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com>
Signed-off-by: Stephane Gasparini <stephane.gasparini@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject and changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# 938d21a2 09-Nov-2015 Philippe Longepe <philippe.longepe@linux.intel.com>

cpufreq: intel_pstate: Replace BYT with ATOM

Rename symbol and function names starting with "BYT" or "byt" to
start with "ATOM" or "atom", respectively, so as to make it clear
that they may apply to

cpufreq: intel_pstate: Replace BYT with ATOM

Rename symbol and function names starting with "BYT" or "byt" to
start with "ATOM" or "atom", respectively, so as to make it clear
that they may apply to Atom in general and not just to Baytrail
(the goal is to support several Atoms architectures eventually).

This should not lead to any functional changes.

Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com>
Signed-off-by: Stephane Gasparini <stephane.gasparini@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw : Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# 6ee11e41 18-Nov-2015 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Revert "cpufreq: intel_pstate: Use ACPI perf configuration"

Revert commit 37afb0003242 (cpufreq: intel_pstate: Use ACPI perf
configuration) that is reported to cause a regression to happen
on a syst

Revert "cpufreq: intel_pstate: Use ACPI perf configuration"

Revert commit 37afb0003242 (cpufreq: intel_pstate: Use ACPI perf
configuration) that is reported to cause a regression to happen
on a system where invalid data are returned by the ACPI _PSS object.

Since that commit makes assumptions regarding the _PSS output
correctness that may turn out to be overly optimistic in general,
there is a concern that it may introduce regression on more
systems, so it's better to revert it now and we'll revisit the
underlying issue in the next cycle with a more robust solution.

Conflicts:
drivers/cpufreq/intel_pstate.c

Fixes: 37afb0003242 (cpufreq: intel_pstate: Use ACPI perf configuration)
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# 799281a3 18-Nov-2015 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Revert "cpufreq: intel_pstate: Avoid calculation for max/min"

Revert commit 4ef451487019 (cpufreq: intel_pstate: Avoid calculation for
max/min) as it depends on commit 37afb0003242 (cpufreq: intel_p

Revert "cpufreq: intel_pstate: Avoid calculation for max/min"

Revert commit 4ef451487019 (cpufreq: intel_pstate: Avoid calculation for
max/min) as it depends on commit 37afb0003242 (cpufreq: intel_pstate: Use
ACPI perf configuration) that causes problems to happen and needs to be
reverted.

Conflicts:
drivers/cpufreq/intel_pstate.c

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# 539342f6 22-Oct-2015 Prarit Bhargava <prarit@redhat.com>

intel_pstate: decrease number of "HWP enabled" messages

When booting an HWP enabled system the kernel displays one "HWP enabled"
message for each cpu. The messages are superfluous since HWP is glob

intel_pstate: decrease number of "HWP enabled" messages

When booting an HWP enabled system the kernel displays one "HWP enabled"
message for each cpu. The messages are superfluous since HWP is globally
enabled across all CPUs. This patch also adds an informational message
when HWP is disabled via intel_pstate=no_hwp.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# 51443fbf 15-Oct-2015 Prarit Bhargava <prarit@redhat.com>

cpufreq: intel_pstate: Fix intel_pstate powersave min_perf_pct value

On systems that initialize the intel_pstate driver with the performance
governor, and then switch to the powersave governor will

cpufreq: intel_pstate: Fix intel_pstate powersave min_perf_pct value

On systems that initialize the intel_pstate driver with the performance
governor, and then switch to the powersave governor will not transition to
lower cpu frequencies until /sys/devices/system/cpu/intel_pstate/min_perf_pct
is set to a low value.

The behavior of governor switching changed after commit a04759924e25
("[cpufreq] intel_pstate: honor user space min_perf_pct override on
resume"). The commit introduced tracking of performance percentage
changes via sysfs in order to restore userspace changes during
suspend/resume. The problem occurs because the global values of the newly
introduced max_sysfs_pct and min_sysfs_pct are not lowered on the governor
change and this causes the powersave governor to inherit the performance
governor's settings.

A simple change would have been to reset max_sysfs_pct to 100 and
min_sysfs_pct to 0 on a governor change, which fixes the problem with
governor switching. However, since we cannot break userspace[1] the fix
is now to give each governor its own limits storage area so that governor
specific changes are tracked.

I successfully tested this by booting with both the performance governor
and the powersave governor by default, and switching between the two
governors (while monitoring /sys/devices/system/cpu/intel_pstate/ values,
and looking at the output of cpupower frequency-info). Suspend/Resume
testing was performed by Doug Smythies.

[1] Systems which suspend/resume using the unmaintained pm-utils package
will always transition to the performance governor before the suspend and
after the resume. This means a system using the powersave governor will
go from powersave to performance, then suspend/resume, performance to
powersave. The simple change during governor changes would have been
overwritten when the governor changed before and after the suspend/resume.
I have submitted https://bugzilla.redhat.com/show_bug.cgi?id=1271225
against Fedora to remove the 94cpufreq file that causes the problem. It
should be noted that pm-utils is obsoleted with newer versions of systemd.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# 8e601a9f 15-Oct-2015 Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

cpufreq: intel_pstate: Fix divide by zero on Knights Landing (KNL)

This is a workaround for KNL platform, where in some cases MPERF counter
will not have updated value before next read of MSR_IA32_M

cpufreq: intel_pstate: Fix divide by zero on Knights Landing (KNL)

This is a workaround for KNL platform, where in some cases MPERF counter
will not have updated value before next read of MSR_IA32_MPERF. In this
case divide by zero will occur. This change ignores current sample for
busy calculation in this case.

Fixes: b34ef932d79a (intel_pstate: Knights Landing support)
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: 4.1+ <stable@vger.kernel.org> # 4.1+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# 4ef45148 14-Oct-2015 Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

cpufreq: intel_pstate: Avoid calculation for max/min

When requested from cpufreq to set policy, look into _pss and get
control values, instead of using max/min perf calculations. These
calculation m

cpufreq: intel_pstate: Avoid calculation for max/min

When requested from cpufreq to set policy, look into _pss and get
control values, instead of using max/min perf calculations. These
calculation misses next control state in boundary conditions.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# 37afb000 14-Oct-2015 Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

cpufreq: intel_pstate: Use ACPI perf configuration

Use ACPI _PSS to limit the Intel P State turbo, max and min ratios.
This driver uses acpi processor perf lib calls to register performance.
The fol

cpufreq: intel_pstate: Use ACPI perf configuration

Use ACPI _PSS to limit the Intel P State turbo, max and min ratios.
This driver uses acpi processor perf lib calls to register performance.
The following logic is used to adjust Intel P state driver limits:
- If there is no turbo entry in _PSS, then disable Intel P state turbo
and limit to non turbo max
- If the non turbo max ratio is more than _PSS max non turbo value, then
set the max non turbo ratio to _PSS non turbo max
- If the min ratio is less than _PSS min then change the min ratio
matching _PSS min
- Scale the _PSS turbo frequency to max turbo frequency based on control
value.
This feature can be disabled by using kernel parameters:
intel_pstate=no_acpi

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# 3bcc6fa9 14-Oct-2015 Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

cpufreq: intel-pstate: Use separate max pstate for scaling

Systems with configurable TDP have multiple max non turbo p state. Intel
P state uses max non turbo P state for scaling. But using the real

cpufreq: intel-pstate: Use separate max pstate for scaling

Systems with configurable TDP have multiple max non turbo p state. Intel
P state uses max non turbo P state for scaling. But using the real max
non turbo p state causes underestimation of next P state. So using
the physical max non turbo P state as before for scaling.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# 6a35fc2d 14-Oct-2015 Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

cpufreq: intel_pstate: get P1 from TAR when available

After Ivybridge, the max non turbo ratio obtained from platform info msr
is not always guaranteed P1 on client platforms. The max non turbo
acti

cpufreq: intel_pstate: get P1 from TAR when available

After Ivybridge, the max non turbo ratio obtained from platform info msr
is not always guaranteed P1 on client platforms. The max non turbo
activation ratio (TAR), determines the max for the current level of TDP.
The ratio in platform info is physical max. The TAR MSR can be locked,
so updating this value is not possible on all platforms.
This change gets this ratio from MSR TURBO_ACTIVATION_RATIO if
available,
but also do some sanity checking to make sure that this value is
correct.
The sanity check involves reading the TDP ratio for the current tdp
control value when platform has configurable TDP present and matching
TAC
with this.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# 74da56ce 09-Sep-2015 Kristen Carlson Accardi <kristen@linux.intel.com>

intel_pstate: fix PCT_TO_HWP macro

PCT_TO_HWP does not take the actual range of pstates exported
by HWP_CAPABILITIES in account, and is broken on most platforms.
Remove the macro and set the min and

intel_pstate: fix PCT_TO_HWP macro

PCT_TO_HWP does not take the actual range of pstates exported
by HWP_CAPABILITIES in account, and is broken on most platforms.
Remove the macro and set the min and max pstate for hwp by
determining the range and adjusting by the min and max percent
limits values.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# 43717aad 09-Sep-2015 Chen Yu <yu.c.chen@intel.com>

intel_pstate: Fix user input of min/max to legal policy region

In current code, max_perf_pct might be smaller than min_perf_pct
by improper user input:

$ grep . /sys/devices/system/cpu/intel_pstate

intel_pstate: Fix user input of min/max to legal policy region

In current code, max_perf_pct might be smaller than min_perf_pct
by improper user input:

$ grep . /sys/devices/system/cpu/intel_pstate/m*_perf_pct
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:100

$ echo 80 > /sys/devices/system/cpu/intel_pstate/max_perf_pct

$ grep . /sys/devices/system/cpu/intel_pstate/m*_perf_pct
/sys/devices/system/cpu/intel_pstate/max_perf_pct:80
/sys/devices/system/cpu/intel_pstate/min_perf_pct:100

Fix this problem by 2 steps:
1. Normalize the user input to [min_policy, max_policy].
2. Make sure max_perf_pct>=min_perf_pct, suggested by Seiichi Ikarashi.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# 5aecc3c8 04-Aug-2015 Ethan Zhao <ethan.zhao@oracle.com>

intel_pstate: append more Oracle OEM table id to vendor bypass list

Append more Oracle X86 servers that have their own power management,

SUN FIRE X4275 M3
SUN FIRE X4170 M3
and
SUN FIRE X6-2

Signe

intel_pstate: append more Oracle OEM table id to vendor bypass list

Append more Oracle X86 servers that have their own power management,

SUN FIRE X4275 M3
SUN FIRE X4170 M3
and
SUN FIRE X6-2

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# 1c939123 05-Aug-2015 Kristen Carlson Accardi <kristen@linux.intel.com>

intel_pstate: Add SKY-S support

Whitelist the SKL-S processor

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 144c8e17 29-Jul-2015 Chen Yu <yu.c.chen@intel.com>

intel_pstate: Fix possible overflow complained by Coverity

Coverity scanning performed on intel_pstate.c shows possible
overflow when doing left shifting:
val = pstate << 8;
since pstate is of type

intel_pstate: Fix possible overflow complained by Coverity

Coverity scanning performed on intel_pstate.c shows possible
overflow when doing left shifting:
val = pstate << 8;
since pstate is of type integer, while val is of u64, left shifting
pstate might lead to potential loss of upper bits. Say, if pstate equals
0x4000 0000, after pstate << 8 we will get zero assigned to val.
Although pstate will not likely be that big, this patch cast the left
operand to u64 before performing the left shift, to avoid complaining
from Coverity.

Reported-by: Coquard, Christophe <christophe.coquard@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# 69cefc27 21-Jul-2015 Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>

intel_pstate: Add get_scaling cpu_defaults param to Knights Landing

Scaling for Knights Landing is same as the default scaling (100000).
When Knigts Landing support was added to the pstate driver, t

intel_pstate: Add get_scaling cpu_defaults param to Knights Landing

Scaling for Knights Landing is same as the default scaling (100000).
When Knigts Landing support was added to the pstate driver, this
parameter was omitted resulting in a kernel panic during boot.

Fixes: b34ef932d79a (intel_pstate: Knights Landing support)
Reported-by: Yasuaki Ishimatsu <yishimat@redhat.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: 4.1+ <stable@vger.kernel.org> # 4.1+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# ba88d433 14-Jul-2015 Kristen Carlson Accardi <kristen@linux.intel.com>

intel_pstate: enable HWP per CPU

HWP previously was only enabled at driver load time, on the boot
CPU, however, HWP must be enabled per package. Move the code to
enable HWP to the cpufreq driver ini

intel_pstate: enable HWP per CPU

HWP previously was only enabled at driver load time, on the boot
CPU, however, HWP must be enabled per package. Move the code to
enable HWP to the cpufreq driver init path so that it will be
called per CPU.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Tested-by: David Zhuang <david.zhuang@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# 4ea1636b 25-Jun-2015 Andy Lutomirski <luto@kernel.org>

x86/asm/tsc: Rename native_read_tsc() to rdtsc()

Now that there is no paravirt TSC, the "native" is
inappropriate. The function does RDTSC, so give it the obvious
name: rdtsc().

Suggested-by: Boris

x86/asm/tsc: Rename native_read_tsc() to rdtsc()

Now that there is no paravirt TSC, the "native" is
inappropriate. The function does RDTSC, so give it the obvious
name: rdtsc().

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/fd43e16281991f096c1e4d21574d9e1402c62d39.1434501121.git.luto@kernel.org
[ Ported it to v4.2-rc1. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

show more ...


# 7180dddf 15-Jun-2015 Prarit Bhargava <prarit@redhat.com>

intel_pstate: Fix overflow in busy_scaled due to long delay

The kernel may delay interrupts for a long time which can result in timers
being delayed. If this occurs the intel_pstate driver will cras

intel_pstate: Fix overflow in busy_scaled due to long delay

The kernel may delay interrupts for a long time which can result in timers
being delayed. If this occurs the intel_pstate driver will crash with a
divide by zero error:

divide error: 0000 [#1] SMP
Modules linked in: btrfs zlib_deflate raid6_pq xor msdos ext4 mbcache jbd2 binfmt_misc arc4 md4 nls_utf8 cifs dns_resolver tcp_lp bnep bluetooth rfkill fuse dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ftp ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables intel_powerclamp coretemp vfat fat kvm_intel iTCO_wdt iTCO_vendor_support ipmi_devintf sr_mod kvm crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel cdc_ether lrw usbnet cdrom mii gf128mul glue_helper ablk_helper cryptd lpc_ich mfd_core pcspkr sb_edac edac_core ipmi_si ipmi_msghandler ioatdma wmi shpchp acpi_pad nfsd auth_rpcgss nfs_acl lockd uinput dm_multipath sunrpc xfs libcrc32c usb_storage sd_mod crc_t10dif crct10dif_common ixgbe mgag200 syscopyarea sysfillrect sysimgblt mdio drm_kms_helper ttm igb drm ptp pps_core dca i2c_algo_bit megaraid_sas i2c_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 113 PID: 0 Comm: swapper/113 Tainted: G W -------------- 3.10.0-229.1.2.el7.x86_64 #1
Hardware name: IBM x3950 X6 -[3837AC2]-/00FN827, BIOS -[A8E112BUS-1.00]- 08/27/2014
task: ffff880fe8abe660 ti: ffff880fe8ae4000 task.ti: ffff880fe8ae4000
RIP: 0010:[<ffffffff814a9279>] [<ffffffff814a9279>] intel_pstate_timer_func+0x179/0x3d0
RSP: 0018:ffff883fff4e3db8 EFLAGS: 00010206
RAX: 0000000027100000 RBX: ffff883fe6965100 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000010 RDI: 000000002e53632d
RBP: ffff883fff4e3e20 R08: 000e6f69a5a125c0 R09: ffff883fe84ec001
R10: 0000000000000002 R11: 0000000000000005 R12: 00000000000049f5
R13: 0000000000271000 R14: 00000000000049f5 R15: 0000000000000246
FS: 0000000000000000(0000) GS:ffff883fff4e0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7668601000 CR3: 000000000190a000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff883fff4e3e58 ffffffff81099dc1 0000000000000086 0000000000000071
ffff883fff4f3680 0000000000000071 fbdc8a965e33afee ffffffff810b69dd
ffff883fe84ec000 ffff883fe6965108 0000000000000100 ffffffff814a9100
Call Trace:
<IRQ>

[<ffffffff81099dc1>] ? run_posix_cpu_timers+0x51/0x840
[<ffffffff810b69dd>] ? trigger_load_balance+0x5d/0x200
[<ffffffff814a9100>] ? pid_param_set+0x130/0x130
[<ffffffff8107df56>] call_timer_fn+0x36/0x110
[<ffffffff814a9100>] ? pid_param_set+0x130/0x130
[<ffffffff8107fdcf>] run_timer_softirq+0x21f/0x320
[<ffffffff81077b2f>] __do_softirq+0xef/0x280
[<ffffffff816156dc>] call_softirq+0x1c/0x30
[<ffffffff81015d95>] do_softirq+0x65/0xa0
[<ffffffff81077ec5>] irq_exit+0x115/0x120
[<ffffffff81616355>] smp_apic_timer_interrupt+0x45/0x60
[<ffffffff81614a1d>] apic_timer_interrupt+0x6d/0x80
<EOI>

[<ffffffff814a9c32>] ? cpuidle_enter_state+0x52/0xc0
[<ffffffff814a9c28>] ? cpuidle_enter_state+0x48/0xc0
[<ffffffff814a9d65>] cpuidle_idle_call+0xc5/0x200
[<ffffffff8101d14e>] arch_cpu_idle+0xe/0x30
[<ffffffff810c67c1>] cpu_startup_entry+0xf1/0x290
[<ffffffff8104228a>] start_secondary+0x1ba/0x230
Code: 42 0f 00 45 89 e6 48 01 c2 43 8d 44 6d 00 39 d0 73 26 49 c1 e5 08 89 d2 4d 63 f4 49 63 c5 48 c1 e2 08 48 c1 e0 08 48 63 ca 48 99 <48> f7 f9 48 98 4c 0f af f0 49 c1 ee 08 8b 43 78 c1 e0 08 44 29
RIP [<ffffffff814a9279>] intel_pstate_timer_func+0x179/0x3d0
RSP <ffff883fff4e3db8>

The kernel values for cpudata for CPU 113 were:

struct cpudata {
cpu = 113,
timer = {
entry = {
next = 0x0,
prev = 0xdead000000200200
},
expires = 8357799745,
base = 0xffff883fe84ec001,
function = 0xffffffff814a9100 <intel_pstate_timer_func>,
data = 18446612406765768960,
<snip>
i_gain = 0,
d_gain = 0,
deadband = 0,
last_err = 22489
},
last_sample_time = {
tv64 = 4063132438017305
},
prev_aperf = 287326796397463,
prev_mperf = 251427432090198,
sample = {
core_pct_busy = 23081,
aperf = 2937407,
mperf = 3257884,
freq = 2524484,
time = {
tv64 = 4063149215234118
}
}
}

which results in the time between samples = last_sample_time - sample.time
= 4063149215234118 - 4063132438017305 = 16777216813 which is 16.777 seconds.

The duration between reads of the APERF and MPERF registers overflowed a s32
sized integer in intel_pstate_get_scaled_busy()'s call to div_fp(). The result
is that int_tofp(duration_us) == 0, and the kernel attempts to divide by 0.

While the kernel shouldn't be delaying for a long time, it can and does
happen and the intel_pstate driver should not panic in this situation. This
patch changes the div_fp() function to use div64_s64() to allow for "long"
division. This will avoid the overflow condition on long delays.

[v2]: use div64_s64() in div_fp()

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# 6c1e4591 01-Jun-2015 Doug Smythies <doug.smythies@gmail.com>

intel_pstate: Force setting target pstate when required

During initialization and exit it is possible that the target pstate
might not actually be set. Furthermore, the result can be that the
driver

intel_pstate: Force setting target pstate when required

During initialization and exit it is possible that the target pstate
might not actually be set. Furthermore, the result can be that the
driver and the processor are out of synch and, under some conditions,
the driver might never send the processor the proper target pstate.

This patch adds a bypass or do_checks flag to the call to
intel_pstate_set_pstate. If bypass, then specifically bypass clamp
checks and the do not send if it is the same as last time check. If
do_checks, then, and as before, do the current policy clamp checks,
and do not do actual send if the new target is the same as the old.

Signed-off-by: Doug Smythies <dsmythies@telus.net>
Reported-by: Marien Zwart <marien.zwart@gmail.com>
Reported-by: Alex Lochmann <alexander.lochmann@tu-dortmund.de>
Reported-by: Piotr Ko?aczkowski <pkolaczk@gmail.com>
Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
Tested-by: Marien Zwart <marien.zwart@gmail.com>
Tested-by: Doug Smythies <dsmythies@telus.net>
[ rjw: Dropped pointless symbol definitions, rebased ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


1...<<111213141516