History log of /openbmc/linux/arch/x86/kernel/cpu/mce/core.c (Results 1 – 25 of 544)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.6.67, v6.6.66, v6.6.65, v6.6.64, v6.6.63, v6.6.62, v6.6.61, v6.6.60, v6.6.59, v6.6.58, v6.6.57, v6.6.56, v6.6.55, v6.6.54, v6.6.53, v6.6.52, v6.6.51, v6.6.50, v6.6.49, v6.6.48, v6.6.47, v6.6.46, v6.6.45, v6.6.44, v6.6.43, v6.6.42, v6.6.41, v6.6.40, v6.6.39, v6.6.38, v6.6.37, v6.6.36, v6.6.35, v6.6.34, v6.6.33, v6.6.32, v6.6.31, v6.6.30, v6.6.29, v6.6.28, v6.6.27
# 86aa961b 10-Apr-2024 Andrew Jeffery <andrew@codeconstruct.com.au>

Merge tag 'v6.6.26' into dev-6.6

This is the 6.6.26 stable release


Revision tags: v6.6.26, v6.6.25, v6.6.24, v6.6.23
# 5a02df3e 13-Mar-2024 Borislav Petkov (AMD) <bp@alien8.de>

x86/mce: Make sure to grab mce_sysfs_mutex in set_bank()

commit 3ddf944b32f88741c303f0b21459dbb3872b8bc5 upstream.

Modifying a MCA bank's MCA_CTL bits which control which error types to
be reported

x86/mce: Make sure to grab mce_sysfs_mutex in set_bank()

commit 3ddf944b32f88741c303f0b21459dbb3872b8bc5 upstream.

Modifying a MCA bank's MCA_CTL bits which control which error types to
be reported is done over

/sys/devices/system/machinecheck/
├── machinecheck0
│   ├── bank0
│   ├── bank1
│   ├── bank10
│   ├── bank11
...

sysfs nodes by writing the new bit mask of events to enable.

When the write is accepted, the kernel deletes all current timers and
reinits all banks.

Doing that in parallel can lead to initializing a timer which is already
armed and in the timer wheel, i.e., in use already:

ODEBUG: init active (active state 0) object: ffff888063a28000 object
type: timer_list hint: mce_timer_fn+0x0/0x240 arch/x86/kernel/cpu/mce/core.c:2642
WARNING: CPU: 0 PID: 8120 at lib/debugobjects.c:514
debug_print_object+0x1a0/0x2a0 lib/debugobjects.c:514

Fix that by grabbing the sysfs mutex as the rest of the MCA sysfs code
does.

Reported by: Yue Sun <samsun1006219@gmail.com>
Reported by: xingwei lee <xrivendell7@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/CAEkJfYNiENwQY8yV1LYJ9LjJs%2Bx_-PqMv98gKig55=2vbzffRw@mail.gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

show more ...


# 87832e93 10-Feb-2024 Andrew Jeffery <andrew@codeconstruct.com.au>

Merge tag 'v6.6.16' into dev-6.6

This is the 6.6.16 stable release


Revision tags: v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6
# 04c6948d 25-Oct-2023 Zhiquan Li <zhiquan1.li@intel.com>

x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel

[ Upstream commit 9f3b130048bfa2e44a8cfb1b616f826d9d5d8188 ]

Memory errors don't happen very often, especially fatal ones

x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel

[ Upstream commit 9f3b130048bfa2e44a8cfb1b616f826d9d5d8188 ]

Memory errors don't happen very often, especially fatal ones. However,
in large-scale scenarios such as data centers, that probability
increases with the amount of machines present.

When a fatal machine check happens, mce_panic() is called based on the
severity grading of that error. The page containing the error is not
marked as poison.

However, when kexec is enabled, tools like makedumpfile understand when
pages are marked as poison and do not touch them so as not to cause
a fatal machine check exception again while dumping the previous
kernel's memory.

Therefore, mark the page containing the error as poisoned so that the
kexec'ed kernel can avoid accessing the page.

[ bp: Rewrite commit message and comment. ]

Co-developed-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Link: https://lore.kernel.org/r/20231014051754.3759099-1-zhiquan1.li@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3
# c900529f 12-Sep-2023 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-fixes into drm-misc-fixes

Forwarding to v6.6-rc1.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


Revision tags: v6.5.2, v6.1.51, v6.5.1
# 1ac731c5 30-Aug-2023 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 6.6 merge window.


Revision tags: v6.1.50
# 28c59d94 28-Aug-2023 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'ras_core_for_v6.6_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 RAS updates from Borislav Petkov:

- Add a quirk for AMD Zen machines where Instruction Fetch uni

Merge tag 'ras_core_for_v6.6_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 RAS updates from Borislav Petkov:

- Add a quirk for AMD Zen machines where Instruction Fetch unit poison
consumption MCEs are not delivered synchronously but still within the
same context, which can lead to erroneously increased error severity
and unneeded kernel panics

- Do not log errors caught by polling shared MCA banks as they
materialize as duplicated error records otherwise

* tag 'ras_core_for_v6.6_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/MCE: Always save CS register on AMD Zen IF Poison errors
x86/mce: Prevent duplicate error records

show more ...


Revision tags: v6.5, v6.1.49, v6.1.48, v6.1.46
# 4240e2eb 14-Aug-2023 Yazen Ghannam <yazen.ghannam@amd.com>

x86/MCE: Always save CS register on AMD Zen IF Poison errors

The Instruction Fetch (IF) units on current AMD Zen-based systems do not
guarantee a synchronous #MC is delivered for poison consumption

x86/MCE: Always save CS register on AMD Zen IF Poison errors

The Instruction Fetch (IF) units on current AMD Zen-based systems do not
guarantee a synchronous #MC is delivered for poison consumption errors.
Therefore, MCG_STATUS[EIPV|RIPV] will not be set. However, the
microarchitecture does guarantee that the exception is delivered within
the same context. In other words, the exact rIP is not known, but the
context is known to not have changed.

There is no architecturally-defined method to determine this behavior.

The Code Segment (CS) register is always valid on such IF unit poison
errors regardless of the value of MCG_STATUS[EIPV|RIPV].

Add a quirk to save the CS register for poison consumption from the IF
unit banks.

This is needed to properly determine the context of the error.
Otherwise, the severity grading function will assume the context is
IN_KERNEL due to the m->cs value being 0 (the initialized value). This
leads to unnecessary kernel panics on data poison errors due to the
kernel believing the poison consumption occurred in kernel context.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230814200853.29258-1-yazen.ghannam@amd.com

show more ...


Revision tags: v6.1.45, v6.1.44
# 2612e3bb 07-Aug-2023 Rodrigo Vivi <rodrigo.vivi@intel.com>

Merge drm/drm-next into drm-intel-next

Catching-up with drm-next and drm-intel-gt-next.
It will unblock a code refactor around the platform
definitions (names vs acronyms).

Signed-off-by: Rodrigo V

Merge drm/drm-next into drm-intel-next

Catching-up with drm-next and drm-intel-gt-next.
It will unblock a code refactor around the platform
definitions (names vs acronyms).

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

show more ...


# 9f771739 07-Aug-2023 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Merge drm/drm-next into drm-intel-gt-next

Need to pull in b3e4aae612ec ("drm/i915/hdcp: Modify hdcp_gsc_message msg sending mechanism") as
a dependency for https://patchwork.freedesktop.org/series/1

Merge drm/drm-next into drm-intel-gt-next

Need to pull in b3e4aae612ec ("drm/i915/hdcp: Modify hdcp_gsc_message msg sending mechanism") as
a dependency for https://patchwork.freedesktop.org/series/121735/

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

show more ...


Revision tags: v6.1.43, v6.1.42, v6.1.41
# 61b73694 24-Jul-2023 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-next into drm-misc-next

Backmerging to get v6.5-rc2.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


Revision tags: v6.1.40, v6.1.39
# c3629dd7 19-Jul-2023 Borislav Petkov (AMD) <bp@alien8.de>

x86/mce: Prevent duplicate error records

A legitimate use case of the MCA infrastructure is to have the firmware
log all uncorrectable errors and also, have the OS see all correctable
errors.

The u

x86/mce: Prevent duplicate error records

A legitimate use case of the MCA infrastructure is to have the firmware
log all uncorrectable errors and also, have the OS see all correctable
errors.

The uncorrectable, UCNA errors are usually configured to be reported
through an SMI. CMCI, which is the correctable error reporting
interrupt, uses SMI too and having both enabled, leads to unnecessary
overhead.

So what ends up happening is, people disable CMCI in the wild and leave
on only the UCNA SMI.

When CMCI is disabled, the MCA infrastructure resorts to polling the MCA
banks. If a MCA MSR is shared between the logical threads, one error
ends up getting logged multiple times as the polling runs on every
logical thread.

Therefore, introduce locking on the Intel side of the polling routine to
prevent such duplicate error records from appearing.

Based on a patch by Aristeu Rozanski <aris@ruivo.org>.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@ruivo.org>
Link: https://lore.kernel.org/r/20230515143225.GC4090740@cathedrallabs.org

show more ...


# 50501936 17-Jul-2023 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'v6.4' into next

Sync up with mainline to bring in updates to shared infrastructure.


# 0791faeb 17-Jul-2023 Mark Brown <broonie@kernel.org>

ASoC: Merge v6.5-rc2

Get a similar baseline to my other branches, and fixes for people using
the branch.


# 2f98e686 11-Jul-2023 Maxime Ripard <mripard@kernel.org>

Merge v6.5-rc1 into drm-misc-fixes

Boris needs 6.5-rc1 in drm-misc-fixes to prevent a conflict.

Signed-off-by: Maxime Ripard <mripard@kernel.org>


Revision tags: v6.1.38, v6.1.37
# 44f10dbe 30-Jun-2023 Andrew Morton <akpm@linux-foundation.org>

Merge branch 'master' into mm-hotfixes-stable


Revision tags: v6.1.36
# bc6cb4d5 27-Jun-2023 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:

- Introduce cmpxchg128() -- aka. the demise of cmpxchg_double()

Merge tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:

- Introduce cmpxchg128() -- aka. the demise of cmpxchg_double()

The cmpxchg128() family of functions is basically & functionally the
same as cmpxchg_double(), but with a saner interface.

Instead of a 6-parameter horror that forced u128 - u64/u64-halves
layout details on the interface and exposed users to complexity,
fragility & bugs, use a natural 3-parameter interface with u128
types.

- Restructure the generated atomic headers, and add kerneldoc comments
for all of the generic atomic{,64,_long}_t operations.

The generated definitions are much cleaner now, and come with
documentation.

- Implement lock_set_cmp_fn() on lockdep, for defining an ordering when
taking multiple locks of the same type.

This gets rid of one use of lockdep_set_novalidate_class() in the
bcache code.

- Fix raw_cpu_generic_try_cmpxchg() bug due to an unintended variable
shadowing generating garbage code on Clang on certain ARM builds.

* tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
locking/atomic: scripts: fix ${atomic}_dec_if_positive() kerneldoc
percpu: Fix self-assignment of __old in raw_cpu_generic_try_cmpxchg()
locking/atomic: treewide: delete arch_atomic_*() kerneldoc
locking/atomic: docs: Add atomic operations to the driver basic API documentation
locking/atomic: scripts: generate kerneldoc comments
docs: scripts: kernel-doc: accept bitwise negation like ~@var
locking/atomic: scripts: simplify raw_atomic*() definitions
locking/atomic: scripts: simplify raw_atomic_long*() definitions
locking/atomic: scripts: split pfx/name/sfx/order
locking/atomic: scripts: restructure fallback ifdeffery
locking/atomic: scripts: build raw_atomic_long*() directly
locking/atomic: treewide: use raw_atomic*_<op>()
locking/atomic: scripts: add trivial raw_atomic*_<op>()
locking/atomic: scripts: factor out order template generation
locking/atomic: scripts: remove leftover "${mult}"
locking/atomic: scripts: remove bogus order parameter
locking/atomic: xtensa: add preprocessor symbols
locking/atomic: x86: add preprocessor symbols
locking/atomic: sparc: add preprocessor symbols
locking/atomic: sh: add preprocessor symbols
...

show more ...


# aa35a483 26-Jun-2023 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'ras_core_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS updates from Borislav Petkov:

- Add initial support for RAS hardware found on AMD server GPUs (MI200

Merge tag 'ras_core_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS updates from Borislav Petkov:

- Add initial support for RAS hardware found on AMD server GPUs (MI200).

Those GPUs and CPUs are connected together through the coherent
fabric and the GPU memory controllers report errors through x86's MCA
so EDAC needs to support them. The amd64_edac driver supports now HBM
(High Bandwidth Memory) and thus such heterogeneous memory controller
systems

- Other small cleanups and improvements

* tag 'ras_core_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
EDAC/amd64: Cache and use GPU node map
EDAC/amd64: Add support for AMD heterogeneous Family 19h Model 30h-3Fh
EDAC/amd64: Document heterogeneous system enumeration
x86/MCE/AMD, EDAC/mce_amd: Decode UMC_V2 ECC errors
x86/amd_nb: Re-sort and re-indent PCI defines
x86/amd_nb: Add MI200 PCI IDs
ras/debugfs: Fix error checking for debugfs_create_dir()
x86/MCE: Check a hw error's address to determine proper recovery action

show more ...


Revision tags: v6.4, v6.1.35, v6.1.34
# 03c60192 12-Jun-2023 Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

Merge branch 'drm-next' of git://anongit.freedesktop.org/drm/drm into msm-next-lumag-base

Merge the drm-next tree to pick up the DRM DSC helpers (merged via
drm-intel-next tree). MSM DSC v1.2 patche

Merge branch 'drm-next' of git://anongit.freedesktop.org/drm/drm into msm-next-lumag-base

Merge the drm-next tree to pick up the DRM DSC helpers (merged via
drm-intel-next tree). MSM DSC v1.2 patches depend on these helpers.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

show more ...


Revision tags: v6.1.33
# 5c680050 06-Jun-2023 Miquel Raynal <miquel.raynal@bootlin.com>

Merge tag 'v6.4-rc4' into wpan-next/staging

Linux 6.4-rc4


Revision tags: v6.1.32
# 0f613bfa 05-Jun-2023 Mark Rutland <mark.rutland@arm.com>

locking/atomic: treewide: use raw_atomic*_<op>()

Now that we have raw_atomic*_<op>() definitions, there's no need to use
arch_atomic*_<op>() definitions outside of the low-level atomic
definitions.

locking/atomic: treewide: use raw_atomic*_<op>()

Now that we have raw_atomic*_<op>() definitions, there's no need to use
arch_atomic*_<op>() definitions outside of the low-level atomic
definitions.

Move treewide users of arch_atomic*_<op>() over to the equivalent
raw_atomic*_<op>().

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-19-mark.rutland@arm.com

show more ...


Revision tags: v6.1.31, v6.1.30
# 9c3a985f 17-May-2023 Rodrigo Vivi <rodrigo.vivi@intel.com>

Merge drm/drm-next into drm-intel-next

Backmerge to get some hwmon dependencies.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


Revision tags: v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14
# e40879b6 07-Jan-2021 Yazen Ghannam <yazen.ghannam@amd.com>

x86/MCE: Check a hw error's address to determine proper recovery action

Make sure that machine check errors with a usable address are properly
marked as poison.

This is needed for errors that occur

x86/MCE: Check a hw error's address to determine proper recovery action

Make sure that machine check errors with a usable address are properly
marked as poison.

This is needed for errors that occur on memory which have
MCG_STATUS[RIPV] clear - i.e., the interrupted process cannot be
restarted reliably. One example is data poison consumption through the
instruction fetch units on AMD Zen-based systems.

The MF_MUST_KILL flag is passed to memory_failure() when
MCG_STATUS[RIPV] is not set. So the associated process will still be
killed. What this does, practically, is get rid of one more check to
kill_current_task with the eventual goal to remove it completely.

Also, make the handling identical to what is done on the notifier path
(uc_decode_notifier() does that address usability check too).

The scenario described above occurs when hardware can precisely identify
the address of poisoned memory, but execution cannot reliably continue
for the interrupted hardware thread.

[ bp: Massage commit message. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20230322005131.174499-1-tony.luck@intel.com

show more ...


# 9a87ffc9 01-May-2023 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 6.4 merge window.


# cdc780f0 26-Apr-2023 Jiri Kosina <jkosina@suse.cz>

Merge branch 'for-6.4/amd-sfh' into for-linus

- assorted functional fixes for amd-sfh driver (Basavaraj Natikar)


12345678910>>...22