#
157386b6 |
| 04-Dec-2015 |
Philippe Longepe <philippe.longepe@intel.com> |
cpufreq: intel_pstate: Configurable algorithm to get target pstate
Target systems using different cpus have different power and performance requirements. They may use different algorithms to get the
cpufreq: intel_pstate: Configurable algorithm to get target pstate
Target systems using different cpus have different power and performance requirements. They may use different algorithms to get the next P-state based on their power or performance preference.
For example, power-constrained systems may not want to use high-performance P-states as aggressively as a full-size desktop or a server platform. A server platform may want to run close to the max to achieve better performance, while laptop-like systems may prefer sacrificing performance for longer battery lifes.
For the above reasons, modify intel_pstate to allow the target P-state selection algorithm to be depend on the CPU ID.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Philippe Longepe <philippe.longepe@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
584ee3dc |
| 18-Nov-2015 |
Alexandra Yates <alexandra.yates@linux.intel.com> |
intel_pstate: Fix "performance" mode behavior with HWP enabled
If hardware-driven P-state selection (HWP) is enabled, the "performance" mode of intel_pstate should only allow the processor to use th
intel_pstate: Fix "performance" mode behavior with HWP enabled
If hardware-driven P-state selection (HWP) is enabled, the "performance" mode of intel_pstate should only allow the processor to use the highest-performance P-state available. That is not the case currently, so make it actually happen.
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Alexandra Yates <alexandra.yates@linux.intel.com> [ rjw: Subject and changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
785ee278 |
| 20-Nov-2015 |
Prarit Bhargava <prarit@redhat.com> |
cpufreq: intel_pstate: Fix limits->max_perf rounding error
A rounding error was found in the calculation of limits->max_perf in intel_pstate_set_policy(), which is used to calculate the max and min
cpufreq: intel_pstate: Fix limits->max_perf rounding error
A rounding error was found in the calculation of limits->max_perf in intel_pstate_set_policy(), which is used to calculate the max and min pstate values in intel_pstate_get_min_max(). In that code, limits->max_perf is truncated to 2 hex digits such that, for example, 0x169 was incorrectly calculated to 0x16 instead of 0x17. This resulted in the pstate being set one level too low. This patch rounds the value of limits->max_perf up instead of down so that the correct max pstate can be reached.
Signed-off-by: Prarit Bhargava <prarit@redhat.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
8478f539 |
| 20-Nov-2015 |
Prarit Bhargava <prarit@redhat.com> |
cpufreq: intel_pstate: Fix limits->max_policy_pct rounding error
I have a Intel (6,63) processor with a "marketing" frequency (from /proc/cpuinfo) of 2100MHz, and a max turbo frequency of 2600MHz.
cpufreq: intel_pstate: Fix limits->max_policy_pct rounding error
I have a Intel (6,63) processor with a "marketing" frequency (from /proc/cpuinfo) of 2100MHz, and a max turbo frequency of 2600MHz. I can execute
cpupower frequency-set -g powersave --min 1200MHz --max 2100MHz
and the max_freq_pct is set to 80. When adding load to the system I noticed that the cpu frequency only reached 2000MHZ and not 2100MHz as expected.
This is because limits->max_policy_pct is calculated as 2100 * 100 /2600 = 80.7 and is rounded down to 80 when it should be rounded up to 81. This patch adds a DIV_ROUND_UP() which will return the correct value.
Signed-off-by: Prarit Bhargava <prarit@redhat.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
1421df63 |
| 09-Nov-2015 |
Philippe Longepe <philippe.longepe@linux.intel.com> |
cpufreq: intel_pstate: Add separate support for Airmont cores
There are two flavors of Atom cores to be supported by intel_pstate, Silvermont and Airmont, so make the driver distinguish between them
cpufreq: intel_pstate: Add separate support for Airmont cores
There are two flavors of Atom cores to be supported by intel_pstate, Silvermont and Airmont, so make the driver distinguish between them by adding separate frequency tables.
Separate the CPU defaults params for each of them and match the CPU IDs against them as appropriate.
Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> Signed-off-by: Stephane Gasparini <stephane.gasparini@linux.intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Subject and changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
938d21a2 |
| 09-Nov-2015 |
Philippe Longepe <philippe.longepe@linux.intel.com> |
cpufreq: intel_pstate: Replace BYT with ATOM
Rename symbol and function names starting with "BYT" or "byt" to start with "ATOM" or "atom", respectively, so as to make it clear that they may apply to
cpufreq: intel_pstate: Replace BYT with ATOM
Rename symbol and function names starting with "BYT" or "byt" to start with "ATOM" or "atom", respectively, so as to make it clear that they may apply to Atom in general and not just to Baytrail (the goal is to support several Atoms architectures eventually).
This should not lead to any functional changes.
Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> Signed-off-by: Stephane Gasparini <stephane.gasparini@linux.intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw : Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
6ee11e41 |
| 18-Nov-2015 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
Revert "cpufreq: intel_pstate: Use ACPI perf configuration"
Revert commit 37afb0003242 (cpufreq: intel_pstate: Use ACPI perf configuration) that is reported to cause a regression to happen on a syst
Revert "cpufreq: intel_pstate: Use ACPI perf configuration"
Revert commit 37afb0003242 (cpufreq: intel_pstate: Use ACPI perf configuration) that is reported to cause a regression to happen on a system where invalid data are returned by the ACPI _PSS object.
Since that commit makes assumptions regarding the _PSS output correctness that may turn out to be overly optimistic in general, there is a concern that it may introduce regression on more systems, so it's better to revert it now and we'll revisit the underlying issue in the next cycle with a more robust solution.
Conflicts: drivers/cpufreq/intel_pstate.c
Fixes: 37afb0003242 (cpufreq: intel_pstate: Use ACPI perf configuration) Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
799281a3 |
| 18-Nov-2015 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
Revert "cpufreq: intel_pstate: Avoid calculation for max/min"
Revert commit 4ef451487019 (cpufreq: intel_pstate: Avoid calculation for max/min) as it depends on commit 37afb0003242 (cpufreq: intel_p
Revert "cpufreq: intel_pstate: Avoid calculation for max/min"
Revert commit 4ef451487019 (cpufreq: intel_pstate: Avoid calculation for max/min) as it depends on commit 37afb0003242 (cpufreq: intel_pstate: Use ACPI perf configuration) that causes problems to happen and needs to be reverted.
Conflicts: drivers/cpufreq/intel_pstate.c
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
539342f6 |
| 22-Oct-2015 |
Prarit Bhargava <prarit@redhat.com> |
intel_pstate: decrease number of "HWP enabled" messages
When booting an HWP enabled system the kernel displays one "HWP enabled" message for each cpu. The messages are superfluous since HWP is glob
intel_pstate: decrease number of "HWP enabled" messages
When booting an HWP enabled system the kernel displays one "HWP enabled" message for each cpu. The messages are superfluous since HWP is globally enabled across all CPUs. This patch also adds an informational message when HWP is disabled via intel_pstate=no_hwp.
Signed-off-by: Prarit Bhargava <prarit@redhat.com> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
51443fbf |
| 15-Oct-2015 |
Prarit Bhargava <prarit@redhat.com> |
cpufreq: intel_pstate: Fix intel_pstate powersave min_perf_pct value
On systems that initialize the intel_pstate driver with the performance governor, and then switch to the powersave governor will
cpufreq: intel_pstate: Fix intel_pstate powersave min_perf_pct value
On systems that initialize the intel_pstate driver with the performance governor, and then switch to the powersave governor will not transition to lower cpu frequencies until /sys/devices/system/cpu/intel_pstate/min_perf_pct is set to a low value.
The behavior of governor switching changed after commit a04759924e25 ("[cpufreq] intel_pstate: honor user space min_perf_pct override on resume"). The commit introduced tracking of performance percentage changes via sysfs in order to restore userspace changes during suspend/resume. The problem occurs because the global values of the newly introduced max_sysfs_pct and min_sysfs_pct are not lowered on the governor change and this causes the powersave governor to inherit the performance governor's settings.
A simple change would have been to reset max_sysfs_pct to 100 and min_sysfs_pct to 0 on a governor change, which fixes the problem with governor switching. However, since we cannot break userspace[1] the fix is now to give each governor its own limits storage area so that governor specific changes are tracked.
I successfully tested this by booting with both the performance governor and the powersave governor by default, and switching between the two governors (while monitoring /sys/devices/system/cpu/intel_pstate/ values, and looking at the output of cpupower frequency-info). Suspend/Resume testing was performed by Doug Smythies.
[1] Systems which suspend/resume using the unmaintained pm-utils package will always transition to the performance governor before the suspend and after the resume. This means a system using the powersave governor will go from powersave to performance, then suspend/resume, performance to powersave. The simple change during governor changes would have been overwritten when the governor changed before and after the suspend/resume. I have submitted https://bugzilla.redhat.com/show_bug.cgi?id=1271225 against Fedora to remove the 94cpufreq file that causes the problem. It should be noted that pm-utils is obsoleted with newer versions of systemd.
Signed-off-by: Prarit Bhargava <prarit@redhat.com> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
8e601a9f |
| 15-Oct-2015 |
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> |
cpufreq: intel_pstate: Fix divide by zero on Knights Landing (KNL)
This is a workaround for KNL platform, where in some cases MPERF counter will not have updated value before next read of MSR_IA32_M
cpufreq: intel_pstate: Fix divide by zero on Knights Landing (KNL)
This is a workaround for KNL platform, where in some cases MPERF counter will not have updated value before next read of MSR_IA32_MPERF. In this case divide by zero will occur. This change ignores current sample for busy calculation in this case.
Fixes: b34ef932d79a (intel_pstate: Knights Landing support) Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com> Cc: 4.1+ <stable@vger.kernel.org> # 4.1+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
4ef45148 |
| 14-Oct-2015 |
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> |
cpufreq: intel_pstate: Avoid calculation for max/min
When requested from cpufreq to set policy, look into _pss and get control values, instead of using max/min perf calculations. These calculation m
cpufreq: intel_pstate: Avoid calculation for max/min
When requested from cpufreq to set policy, look into _pss and get control values, instead of using max/min perf calculations. These calculation misses next control state in boundary conditions.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
37afb000 |
| 14-Oct-2015 |
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> |
cpufreq: intel_pstate: Use ACPI perf configuration
Use ACPI _PSS to limit the Intel P State turbo, max and min ratios. This driver uses acpi processor perf lib calls to register performance. The fol
cpufreq: intel_pstate: Use ACPI perf configuration
Use ACPI _PSS to limit the Intel P State turbo, max and min ratios. This driver uses acpi processor perf lib calls to register performance. The following logic is used to adjust Intel P state driver limits: - If there is no turbo entry in _PSS, then disable Intel P state turbo and limit to non turbo max - If the non turbo max ratio is more than _PSS max non turbo value, then set the max non turbo ratio to _PSS non turbo max - If the min ratio is less than _PSS min then change the min ratio matching _PSS min - Scale the _PSS turbo frequency to max turbo frequency based on control value. This feature can be disabled by using kernel parameters: intel_pstate=no_acpi
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
3bcc6fa9 |
| 14-Oct-2015 |
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> |
cpufreq: intel-pstate: Use separate max pstate for scaling
Systems with configurable TDP have multiple max non turbo p state. Intel P state uses max non turbo P state for scaling. But using the real
cpufreq: intel-pstate: Use separate max pstate for scaling
Systems with configurable TDP have multiple max non turbo p state. Intel P state uses max non turbo P state for scaling. But using the real max non turbo p state causes underestimation of next P state. So using the physical max non turbo P state as before for scaling.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
6a35fc2d |
| 14-Oct-2015 |
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> |
cpufreq: intel_pstate: get P1 from TAR when available
After Ivybridge, the max non turbo ratio obtained from platform info msr is not always guaranteed P1 on client platforms. The max non turbo acti
cpufreq: intel_pstate: get P1 from TAR when available
After Ivybridge, the max non turbo ratio obtained from platform info msr is not always guaranteed P1 on client platforms. The max non turbo activation ratio (TAR), determines the max for the current level of TDP. The ratio in platform info is physical max. The TAR MSR can be locked, so updating this value is not possible on all platforms. This change gets this ratio from MSR TURBO_ACTIVATION_RATIO if available, but also do some sanity checking to make sure that this value is correct. The sanity check involves reading the TDP ratio for the current tdp control value when platform has configurable TDP present and matching TAC with this.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
74da56ce |
| 09-Sep-2015 |
Kristen Carlson Accardi <kristen@linux.intel.com> |
intel_pstate: fix PCT_TO_HWP macro
PCT_TO_HWP does not take the actual range of pstates exported by HWP_CAPABILITIES in account, and is broken on most platforms. Remove the macro and set the min and
intel_pstate: fix PCT_TO_HWP macro
PCT_TO_HWP does not take the actual range of pstates exported by HWP_CAPABILITIES in account, and is broken on most platforms. Remove the macro and set the min and max pstate for hwp by determining the range and adjusting by the min and max percent limits values.
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
43717aad |
| 09-Sep-2015 |
Chen Yu <yu.c.chen@intel.com> |
intel_pstate: Fix user input of min/max to legal policy region
In current code, max_perf_pct might be smaller than min_perf_pct by improper user input:
$ grep . /sys/devices/system/cpu/intel_pstate
intel_pstate: Fix user input of min/max to legal policy region
In current code, max_perf_pct might be smaller than min_perf_pct by improper user input:
$ grep . /sys/devices/system/cpu/intel_pstate/m*_perf_pct /sys/devices/system/cpu/intel_pstate/max_perf_pct:100 /sys/devices/system/cpu/intel_pstate/min_perf_pct:100
$ echo 80 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
$ grep . /sys/devices/system/cpu/intel_pstate/m*_perf_pct /sys/devices/system/cpu/intel_pstate/max_perf_pct:80 /sys/devices/system/cpu/intel_pstate/min_perf_pct:100
Fix this problem by 2 steps: 1. Normalize the user input to [min_policy, max_policy]. 2. Make sure max_perf_pct>=min_perf_pct, suggested by Seiichi Ikarashi.
Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
5aecc3c8 |
| 04-Aug-2015 |
Ethan Zhao <ethan.zhao@oracle.com> |
intel_pstate: append more Oracle OEM table id to vendor bypass list
Append more Oracle X86 servers that have their own power management,
SUN FIRE X4275 M3 SUN FIRE X4170 M3 and SUN FIRE X6-2
Signe
intel_pstate: append more Oracle OEM table id to vendor bypass list
Append more Oracle X86 servers that have their own power management,
SUN FIRE X4275 M3 SUN FIRE X4170 M3 and SUN FIRE X6-2
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
1c939123 |
| 05-Aug-2015 |
Kristen Carlson Accardi <kristen@linux.intel.com> |
intel_pstate: Add SKY-S support
Whitelist the SKL-S processor
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
144c8e17 |
| 29-Jul-2015 |
Chen Yu <yu.c.chen@intel.com> |
intel_pstate: Fix possible overflow complained by Coverity
Coverity scanning performed on intel_pstate.c shows possible overflow when doing left shifting: val = pstate << 8; since pstate is of type
intel_pstate: Fix possible overflow complained by Coverity
Coverity scanning performed on intel_pstate.c shows possible overflow when doing left shifting: val = pstate << 8; since pstate is of type integer, while val is of u64, left shifting pstate might lead to potential loss of upper bits. Say, if pstate equals 0x4000 0000, after pstate << 8 we will get zero assigned to val. Although pstate will not likely be that big, this patch cast the left operand to u64 before performing the left shift, to avoid complaining from Coverity.
Reported-by: Coquard, Christophe <christophe.coquard@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
69cefc27 |
| 21-Jul-2015 |
Lukasz Anaczkowski <lukasz.anaczkowski@intel.com> |
intel_pstate: Add get_scaling cpu_defaults param to Knights Landing
Scaling for Knights Landing is same as the default scaling (100000). When Knigts Landing support was added to the pstate driver, t
intel_pstate: Add get_scaling cpu_defaults param to Knights Landing
Scaling for Knights Landing is same as the default scaling (100000). When Knigts Landing support was added to the pstate driver, this parameter was omitted resulting in a kernel panic during boot.
Fixes: b34ef932d79a (intel_pstate: Knights Landing support) Reported-by: Yasuaki Ishimatsu <yishimat@redhat.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com> Cc: 4.1+ <stable@vger.kernel.org> # 4.1+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
ba88d433 |
| 14-Jul-2015 |
Kristen Carlson Accardi <kristen@linux.intel.com> |
intel_pstate: enable HWP per CPU
HWP previously was only enabled at driver load time, on the boot CPU, however, HWP must be enabled per package. Move the code to enable HWP to the cpufreq driver ini
intel_pstate: enable HWP per CPU
HWP previously was only enabled at driver load time, on the boot CPU, however, HWP must be enabled per package. Move the code to enable HWP to the cpufreq driver init path so that it will be called per CPU.
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Tested-by: David Zhuang <david.zhuang@oracle.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
4ea1636b |
| 25-Jun-2015 |
Andy Lutomirski <luto@kernel.org> |
x86/asm/tsc: Rename native_read_tsc() to rdtsc()
Now that there is no paravirt TSC, the "native" is inappropriate. The function does RDTSC, so give it the obvious name: rdtsc().
Suggested-by: Boris
x86/asm/tsc: Rename native_read_tsc() to rdtsc()
Now that there is no paravirt TSC, the "native" is inappropriate. The function does RDTSC, so give it the obvious name: rdtsc().
Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/fd43e16281991f096c1e4d21574d9e1402c62d39.1434501121.git.luto@kernel.org [ Ported it to v4.2-rc1. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
#
7180dddf |
| 15-Jun-2015 |
Prarit Bhargava <prarit@redhat.com> |
intel_pstate: Fix overflow in busy_scaled due to long delay
The kernel may delay interrupts for a long time which can result in timers being delayed. If this occurs the intel_pstate driver will cras
intel_pstate: Fix overflow in busy_scaled due to long delay
The kernel may delay interrupts for a long time which can result in timers being delayed. If this occurs the intel_pstate driver will crash with a divide by zero error:
divide error: 0000 [#1] SMP Modules linked in: btrfs zlib_deflate raid6_pq xor msdos ext4 mbcache jbd2 binfmt_misc arc4 md4 nls_utf8 cifs dns_resolver tcp_lp bnep bluetooth rfkill fuse dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ftp ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables intel_powerclamp coretemp vfat fat kvm_intel iTCO_wdt iTCO_vendor_support ipmi_devintf sr_mod kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel cdc_ether lrw usbnet cdrom mii gf128mul glue_helper ablk_helper cryptd lpc_ich mfd_core pcspkr sb_edac edac_core ipmi_si ipmi_msghandler ioatdma wmi shpchp acpi_pad nfsd auth_rpcgss nfs_acl lockd uinput dm_multipath sunrpc xfs libcrc32c usb_storage sd_mod crc_t10dif crct10dif_common ixgbe mgag200 syscopyarea sysfillrect sysimgblt mdio drm_kms_helper ttm igb drm ptp pps_core dca i2c_algo_bit megaraid_sas i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU: 113 PID: 0 Comm: swapper/113 Tainted: G W -------------- 3.10.0-229.1.2.el7.x86_64 #1 Hardware name: IBM x3950 X6 -[3837AC2]-/00FN827, BIOS -[A8E112BUS-1.00]- 08/27/2014 task: ffff880fe8abe660 ti: ffff880fe8ae4000 task.ti: ffff880fe8ae4000 RIP: 0010:[<ffffffff814a9279>] [<ffffffff814a9279>] intel_pstate_timer_func+0x179/0x3d0 RSP: 0018:ffff883fff4e3db8 EFLAGS: 00010206 RAX: 0000000027100000 RBX: ffff883fe6965100 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000010 RDI: 000000002e53632d RBP: ffff883fff4e3e20 R08: 000e6f69a5a125c0 R09: ffff883fe84ec001 R10: 0000000000000002 R11: 0000000000000005 R12: 00000000000049f5 R13: 0000000000271000 R14: 00000000000049f5 R15: 0000000000000246 FS: 0000000000000000(0000) GS:ffff883fff4e0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7668601000 CR3: 000000000190a000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff883fff4e3e58 ffffffff81099dc1 0000000000000086 0000000000000071 ffff883fff4f3680 0000000000000071 fbdc8a965e33afee ffffffff810b69dd ffff883fe84ec000 ffff883fe6965108 0000000000000100 ffffffff814a9100 Call Trace: <IRQ>
[<ffffffff81099dc1>] ? run_posix_cpu_timers+0x51/0x840 [<ffffffff810b69dd>] ? trigger_load_balance+0x5d/0x200 [<ffffffff814a9100>] ? pid_param_set+0x130/0x130 [<ffffffff8107df56>] call_timer_fn+0x36/0x110 [<ffffffff814a9100>] ? pid_param_set+0x130/0x130 [<ffffffff8107fdcf>] run_timer_softirq+0x21f/0x320 [<ffffffff81077b2f>] __do_softirq+0xef/0x280 [<ffffffff816156dc>] call_softirq+0x1c/0x30 [<ffffffff81015d95>] do_softirq+0x65/0xa0 [<ffffffff81077ec5>] irq_exit+0x115/0x120 [<ffffffff81616355>] smp_apic_timer_interrupt+0x45/0x60 [<ffffffff81614a1d>] apic_timer_interrupt+0x6d/0x80 <EOI>
[<ffffffff814a9c32>] ? cpuidle_enter_state+0x52/0xc0 [<ffffffff814a9c28>] ? cpuidle_enter_state+0x48/0xc0 [<ffffffff814a9d65>] cpuidle_idle_call+0xc5/0x200 [<ffffffff8101d14e>] arch_cpu_idle+0xe/0x30 [<ffffffff810c67c1>] cpu_startup_entry+0xf1/0x290 [<ffffffff8104228a>] start_secondary+0x1ba/0x230 Code: 42 0f 00 45 89 e6 48 01 c2 43 8d 44 6d 00 39 d0 73 26 49 c1 e5 08 89 d2 4d 63 f4 49 63 c5 48 c1 e2 08 48 c1 e0 08 48 63 ca 48 99 <48> f7 f9 48 98 4c 0f af f0 49 c1 ee 08 8b 43 78 c1 e0 08 44 29 RIP [<ffffffff814a9279>] intel_pstate_timer_func+0x179/0x3d0 RSP <ffff883fff4e3db8>
The kernel values for cpudata for CPU 113 were:
struct cpudata { cpu = 113, timer = { entry = { next = 0x0, prev = 0xdead000000200200 }, expires = 8357799745, base = 0xffff883fe84ec001, function = 0xffffffff814a9100 <intel_pstate_timer_func>, data = 18446612406765768960, <snip> i_gain = 0, d_gain = 0, deadband = 0, last_err = 22489 }, last_sample_time = { tv64 = 4063132438017305 }, prev_aperf = 287326796397463, prev_mperf = 251427432090198, sample = { core_pct_busy = 23081, aperf = 2937407, mperf = 3257884, freq = 2524484, time = { tv64 = 4063149215234118 } } }
which results in the time between samples = last_sample_time - sample.time = 4063149215234118 - 4063132438017305 = 16777216813 which is 16.777 seconds.
The duration between reads of the APERF and MPERF registers overflowed a s32 sized integer in intel_pstate_get_scaled_busy()'s call to div_fp(). The result is that int_tofp(duration_us) == 0, and the kernel attempts to divide by 0.
While the kernel shouldn't be delaying for a long time, it can and does happen and the intel_pstate driver should not panic in this situation. This patch changes the div_fp() function to use div64_s64() to allow for "long" division. This will avoid the overflow condition on long delays.
[v2]: use div64_s64() in div_fp()
Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
6c1e4591 |
| 01-Jun-2015 |
Doug Smythies <doug.smythies@gmail.com> |
intel_pstate: Force setting target pstate when required
During initialization and exit it is possible that the target pstate might not actually be set. Furthermore, the result can be that the driver
intel_pstate: Force setting target pstate when required
During initialization and exit it is possible that the target pstate might not actually be set. Furthermore, the result can be that the driver and the processor are out of synch and, under some conditions, the driver might never send the processor the proper target pstate.
This patch adds a bypass or do_checks flag to the call to intel_pstate_set_pstate. If bypass, then specifically bypass clamp checks and the do not send if it is the same as last time check. If do_checks, then, and as before, do the current policy clamp checks, and do not do actual send if the new target is the same as the old.
Signed-off-by: Doug Smythies <dsmythies@telus.net> Reported-by: Marien Zwart <marien.zwart@gmail.com> Reported-by: Alex Lochmann <alexander.lochmann@tu-dortmund.de> Reported-by: Piotr Ko?aczkowski <pkolaczk@gmail.com> Reported-by: Clemens Eisserer <linuxhippy@gmail.com> Tested-by: Marien Zwart <marien.zwart@gmail.com> Tested-by: Doug Smythies <dsmythies@telus.net> [ rjw: Dropped pointless symbol definitions, rebased ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|