History log of /openbmc/linux/drivers/net/ethernet/mellanox/mlx5/core/main.c (Results 1 – 25 of 1182)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.6.35, v6.6.34, v6.6.33
# 6ccada6f 03-Jun-2024 Shay Drory <shayd@nvidia.com>

net/mlx5: Always stop health timer during driver removal

[ Upstream commit c8b3f38d2dae0397944814d691a419c451f9906f ]

Currently, if teardown_hca fails to execute during driver removal, mlx5
does no

net/mlx5: Always stop health timer during driver removal

[ Upstream commit c8b3f38d2dae0397944814d691a419c451f9906f ]

Currently, if teardown_hca fails to execute during driver removal, mlx5
does not stop the health timer. Afterwards, mlx5 continue with driver
teardown. This may lead to a UAF bug, which results in page fault
Oops[1], since the health timer invokes after resources were freed.

Hence, stop the health monitor even if teardown_hca fails.

[1]
mlx5_core 0000:18:00.0: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
mlx5_core 0000:18:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
mlx5_core 0000:18:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
mlx5_core 0000:18:00.0: E-Switch: cleanup
mlx5_core 0000:18:00.0: wait_func:1155:(pid 1967079): TEARDOWN_HCA(0x103) timeout. Will cause a leak of a command resource
mlx5_core 0000:18:00.0: mlx5_function_close:1288:(pid 1967079): tear_down_hca failed, skip cleanup
BUG: unable to handle page fault for address: ffffa26487064230
PGD 100c00067 P4D 100c00067 PUD 100e5a067 PMD 105ed7067 PTE 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G OE ------- --- 6.7.0-68.fc38.x86_64 #1
Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0013.121520200651 12/15/2020
RIP: 0010:ioread32be+0x34/0x60
RSP: 0018:ffffa26480003e58 EFLAGS: 00010292
RAX: ffffa26487064200 RBX: ffff9042d08161a0 RCX: ffff904c108222c0
RDX: 000000010bbf1b80 RSI: ffffffffc055ddb0 RDI: ffffa26487064230
RBP: ffff9042d08161a0 R08: 0000000000000022 R09: ffff904c108222e8
R10: 0000000000000004 R11: 0000000000000441 R12: ffffffffc055ddb0
R13: ffffa26487064200 R14: ffffa26480003f00 R15: ffff904c108222c0
FS: 0000000000000000(0000) GS:ffff904c10800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffa26487064230 CR3: 00000002c4420006 CR4: 00000000007706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
? __die+0x23/0x70
? page_fault_oops+0x171/0x4e0
? exc_page_fault+0x175/0x180
? asm_exc_page_fault+0x26/0x30
? __pfx_poll_health+0x10/0x10 [mlx5_core]
? __pfx_poll_health+0x10/0x10 [mlx5_core]
? ioread32be+0x34/0x60
mlx5_health_check_fatal_sensors+0x20/0x100 [mlx5_core]
? __pfx_poll_health+0x10/0x10 [mlx5_core]
poll_health+0x42/0x230 [mlx5_core]
? __next_timer_interrupt+0xbc/0x110
? __pfx_poll_health+0x10/0x10 [mlx5_core]
call_timer_fn+0x21/0x130
? __pfx_poll_health+0x10/0x10 [mlx5_core]
__run_timers+0x222/0x2c0
run_timer_softirq+0x1d/0x40
__do_softirq+0xc9/0x2c8
__irq_exit_rcu+0xa6/0xc0
sysvec_apic_timer_interrupt+0x72/0x90
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20
RIP: 0010:cpuidle_enter_state+0xcc/0x440
? cpuidle_enter_state+0xbd/0x440
cpuidle_enter+0x2d/0x40
do_idle+0x20d/0x270
cpu_startup_entry+0x2a/0x30
rest_init+0xd0/0xd0
arch_call_rest_init+0xe/0x30
start_kernel+0x709/0xa90
x86_64_start_reservations+0x18/0x30
x86_64_start_kernel+0x96/0xa0
secondary_startup_64_no_verify+0x18f/0x19b
---[ end trace 0000000000000000 ]---

Fixes: 9b98d395b85d ("net/mlx5: Start health poll at earlier stage of driver load")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.35, v6.6.34, v6.6.33
# 6ccada6f 03-Jun-2024 Shay Drory <shayd@nvidia.com>

net/mlx5: Always stop health timer during driver removal

[ Upstream commit c8b3f38d2dae0397944814d691a419c451f9906f ]

Currently, if teardown_hca fails to execute during driver removal, mlx5
does no

net/mlx5: Always stop health timer during driver removal

[ Upstream commit c8b3f38d2dae0397944814d691a419c451f9906f ]

Currently, if teardown_hca fails to execute during driver removal, mlx5
does not stop the health timer. Afterwards, mlx5 continue with driver
teardown. This may lead to a UAF bug, which results in page fault
Oops[1], since the health timer invokes after resources were freed.

Hence, stop the health monitor even if teardown_hca fails.

[1]
mlx5_core 0000:18:00.0: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
mlx5_core 0000:18:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
mlx5_core 0000:18:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
mlx5_core 0000:18:00.0: E-Switch: cleanup
mlx5_core 0000:18:00.0: wait_func:1155:(pid 1967079): TEARDOWN_HCA(0x103) timeout. Will cause a leak of a command resource
mlx5_core 0000:18:00.0: mlx5_function_close:1288:(pid 1967079): tear_down_hca failed, skip cleanup
BUG: unable to handle page fault for address: ffffa26487064230
PGD 100c00067 P4D 100c00067 PUD 100e5a067 PMD 105ed7067 PTE 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G OE ------- --- 6.7.0-68.fc38.x86_64 #1
Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0013.121520200651 12/15/2020
RIP: 0010:ioread32be+0x34/0x60
RSP: 0018:ffffa26480003e58 EFLAGS: 00010292
RAX: ffffa26487064200 RBX: ffff9042d08161a0 RCX: ffff904c108222c0
RDX: 000000010bbf1b80 RSI: ffffffffc055ddb0 RDI: ffffa26487064230
RBP: ffff9042d08161a0 R08: 0000000000000022 R09: ffff904c108222e8
R10: 0000000000000004 R11: 0000000000000441 R12: ffffffffc055ddb0
R13: ffffa26487064200 R14: ffffa26480003f00 R15: ffff904c108222c0
FS: 0000000000000000(0000) GS:ffff904c10800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffa26487064230 CR3: 00000002c4420006 CR4: 00000000007706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
? __die+0x23/0x70
? page_fault_oops+0x171/0x4e0
? exc_page_fault+0x175/0x180
? asm_exc_page_fault+0x26/0x30
? __pfx_poll_health+0x10/0x10 [mlx5_core]
? __pfx_poll_health+0x10/0x10 [mlx5_core]
? ioread32be+0x34/0x60
mlx5_health_check_fatal_sensors+0x20/0x100 [mlx5_core]
? __pfx_poll_health+0x10/0x10 [mlx5_core]
poll_health+0x42/0x230 [mlx5_core]
? __next_timer_interrupt+0xbc/0x110
? __pfx_poll_health+0x10/0x10 [mlx5_core]
call_timer_fn+0x21/0x130
? __pfx_poll_health+0x10/0x10 [mlx5_core]
__run_timers+0x222/0x2c0
run_timer_softirq+0x1d/0x40
__do_softirq+0xc9/0x2c8
__irq_exit_rcu+0xa6/0xc0
sysvec_apic_timer_interrupt+0x72/0x90
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20
RIP: 0010:cpuidle_enter_state+0xcc/0x440
? cpuidle_enter_state+0xbd/0x440
cpuidle_enter+0x2d/0x40
do_idle+0x20d/0x270
cpu_startup_entry+0x2a/0x30
rest_init+0xd0/0xd0
arch_call_rest_init+0xe/0x30
start_kernel+0x709/0xa90
x86_64_start_reservations+0x18/0x30
x86_64_start_kernel+0x96/0xa0
secondary_startup_64_no_verify+0x18f/0x19b
---[ end trace 0000000000000000 ]---

Fixes: 9b98d395b85d ("net/mlx5: Start health poll at earlier stage of driver load")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.32, v6.6.31, v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26
# 8c91c608 09-Apr-2024 Shay Drory <shayd@nvidia.com>

net/mlx5: Register devlink first under devlink lock

[ Upstream commit c6e77aa9dd82bc18a89bf49418f8f7e961cfccc8 ]

In case device is having a non fatal FW error during probe, the
driver will report t

net/mlx5: Register devlink first under devlink lock

[ Upstream commit c6e77aa9dd82bc18a89bf49418f8f7e961cfccc8 ]

In case device is having a non fatal FW error during probe, the
driver will report the error to user via devlink. This will trigger
a WARN_ON, since mlx5 is calling devlink_register() last.
In order to avoid the WARN_ON[1], change mlx5 to invoke devl_register()
first under devlink lock.

[1]
WARNING: CPU: 5 PID: 227 at net/devlink/health.c:483 devlink_recover_notify.constprop.0+0xb8/0xc0
CPU: 5 PID: 227 Comm: kworker/u16:3 Not tainted 6.4.0-rc5_for_upstream_min_debug_2023_06_12_12_38 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_health0000:08:00.0 mlx5_fw_reporter_err_work [mlx5_core]
RIP: 0010:devlink_recover_notify.constprop.0+0xb8/0xc0
Call Trace:
<TASK>
? __warn+0x79/0x120
? devlink_recover_notify.constprop.0+0xb8/0xc0
? report_bug+0x17c/0x190
? handle_bug+0x3c/0x60
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? devlink_recover_notify.constprop.0+0xb8/0xc0
devlink_health_report+0x4a/0x1c0
mlx5_fw_reporter_err_work+0xa4/0xd0 [mlx5_core]
process_one_work+0x1bb/0x3c0
? process_one_work+0x3c0/0x3c0
worker_thread+0x4d/0x3c0
? process_one_work+0x3c0/0x3c0
kthread+0xc6/0xf0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>

Fixes: cf530217408e ("devlink: Notify users when objects are accessible")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.32, v6.6.31, v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26
# 8c91c608 09-Apr-2024 Shay Drory <shayd@nvidia.com>

net/mlx5: Register devlink first under devlink lock

[ Upstream commit c6e77aa9dd82bc18a89bf49418f8f7e961cfccc8 ]

In case device is having a non fatal FW error during probe, the
driver will report t

net/mlx5: Register devlink first under devlink lock

[ Upstream commit c6e77aa9dd82bc18a89bf49418f8f7e961cfccc8 ]

In case device is having a non fatal FW error during probe, the
driver will report the error to user via devlink. This will trigger
a WARN_ON, since mlx5 is calling devlink_register() last.
In order to avoid the WARN_ON[1], change mlx5 to invoke devl_register()
first under devlink lock.

[1]
WARNING: CPU: 5 PID: 227 at net/devlink/health.c:483 devlink_recover_notify.constprop.0+0xb8/0xc0
CPU: 5 PID: 227 Comm: kworker/u16:3 Not tainted 6.4.0-rc5_for_upstream_min_debug_2023_06_12_12_38 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_health0000:08:00.0 mlx5_fw_reporter_err_work [mlx5_core]
RIP: 0010:devlink_recover_notify.constprop.0+0xb8/0xc0
Call Trace:
<TASK>
? __warn+0x79/0x120
? devlink_recover_notify.constprop.0+0xb8/0xc0
? report_bug+0x17c/0x190
? handle_bug+0x3c/0x60
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? devlink_recover_notify.constprop.0+0xb8/0xc0
devlink_health_report+0x4a/0x1c0
mlx5_fw_reporter_err_work+0xa4/0xd0 [mlx5_core]
process_one_work+0x1bb/0x3c0
? process_one_work+0x3c0/0x3c0
worker_thread+0x4d/0x3c0
? process_one_work+0x3c0/0x3c0
kthread+0xc6/0xf0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>

Fixes: cf530217408e ("devlink: Notify users when objects are accessible")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.32, v6.6.31, v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26
# 8c91c608 09-Apr-2024 Shay Drory <shayd@nvidia.com>

net/mlx5: Register devlink first under devlink lock

[ Upstream commit c6e77aa9dd82bc18a89bf49418f8f7e961cfccc8 ]

In case device is having a non fatal FW error during probe, the
driver will report t

net/mlx5: Register devlink first under devlink lock

[ Upstream commit c6e77aa9dd82bc18a89bf49418f8f7e961cfccc8 ]

In case device is having a non fatal FW error during probe, the
driver will report the error to user via devlink. This will trigger
a WARN_ON, since mlx5 is calling devlink_register() last.
In order to avoid the WARN_ON[1], change mlx5 to invoke devl_register()
first under devlink lock.

[1]
WARNING: CPU: 5 PID: 227 at net/devlink/health.c:483 devlink_recover_notify.constprop.0+0xb8/0xc0
CPU: 5 PID: 227 Comm: kworker/u16:3 Not tainted 6.4.0-rc5_for_upstream_min_debug_2023_06_12_12_38 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_health0000:08:00.0 mlx5_fw_reporter_err_work [mlx5_core]
RIP: 0010:devlink_recover_notify.constprop.0+0xb8/0xc0
Call Trace:
<TASK>
? __warn+0x79/0x120
? devlink_recover_notify.constprop.0+0xb8/0xc0
? report_bug+0x17c/0x190
? handle_bug+0x3c/0x60
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? devlink_recover_notify.constprop.0+0xb8/0xc0
devlink_health_report+0x4a/0x1c0
mlx5_fw_reporter_err_work+0xa4/0xd0 [mlx5_core]
process_one_work+0x1bb/0x3c0
? process_one_work+0x3c0/0x3c0
worker_thread+0x4d/0x3c0
? process_one_work+0x3c0/0x3c0
kthread+0xc6/0xf0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>

Fixes: cf530217408e ("devlink: Notify users when objects are accessible")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.32, v6.6.31, v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26
# 8c91c608 09-Apr-2024 Shay Drory <shayd@nvidia.com>

net/mlx5: Register devlink first under devlink lock

[ Upstream commit c6e77aa9dd82bc18a89bf49418f8f7e961cfccc8 ]

In case device is having a non fatal FW error during probe, the
driver will report t

net/mlx5: Register devlink first under devlink lock

[ Upstream commit c6e77aa9dd82bc18a89bf49418f8f7e961cfccc8 ]

In case device is having a non fatal FW error during probe, the
driver will report the error to user via devlink. This will trigger
a WARN_ON, since mlx5 is calling devlink_register() last.
In order to avoid the WARN_ON[1], change mlx5 to invoke devl_register()
first under devlink lock.

[1]
WARNING: CPU: 5 PID: 227 at net/devlink/health.c:483 devlink_recover_notify.constprop.0+0xb8/0xc0
CPU: 5 PID: 227 Comm: kworker/u16:3 Not tainted 6.4.0-rc5_for_upstream_min_debug_2023_06_12_12_38 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_health0000:08:00.0 mlx5_fw_reporter_err_work [mlx5_core]
RIP: 0010:devlink_recover_notify.constprop.0+0xb8/0xc0
Call Trace:
<TASK>
? __warn+0x79/0x120
? devlink_recover_notify.constprop.0+0xb8/0xc0
? report_bug+0x17c/0x190
? handle_bug+0x3c/0x60
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? devlink_recover_notify.constprop.0+0xb8/0xc0
devlink_health_report+0x4a/0x1c0
mlx5_fw_reporter_err_work+0xa4/0xd0 [mlx5_core]
process_one_work+0x1bb/0x3c0
? process_one_work+0x3c0/0x3c0
worker_thread+0x4d/0x3c0
? process_one_work+0x3c0/0x3c0
kthread+0xc6/0xf0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>

Fixes: cf530217408e ("devlink: Notify users when objects are accessible")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.32, v6.6.31, v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26
# 8c91c608 09-Apr-2024 Shay Drory <shayd@nvidia.com>

net/mlx5: Register devlink first under devlink lock

[ Upstream commit c6e77aa9dd82bc18a89bf49418f8f7e961cfccc8 ]

In case device is having a non fatal FW error during probe, the
driver will report t

net/mlx5: Register devlink first under devlink lock

[ Upstream commit c6e77aa9dd82bc18a89bf49418f8f7e961cfccc8 ]

In case device is having a non fatal FW error during probe, the
driver will report the error to user via devlink. This will trigger
a WARN_ON, since mlx5 is calling devlink_register() last.
In order to avoid the WARN_ON[1], change mlx5 to invoke devl_register()
first under devlink lock.

[1]
WARNING: CPU: 5 PID: 227 at net/devlink/health.c:483 devlink_recover_notify.constprop.0+0xb8/0xc0
CPU: 5 PID: 227 Comm: kworker/u16:3 Not tainted 6.4.0-rc5_for_upstream_min_debug_2023_06_12_12_38 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_health0000:08:00.0 mlx5_fw_reporter_err_work [mlx5_core]
RIP: 0010:devlink_recover_notify.constprop.0+0xb8/0xc0
Call Trace:
<TASK>
? __warn+0x79/0x120
? devlink_recover_notify.constprop.0+0xb8/0xc0
? report_bug+0x17c/0x190
? handle_bug+0x3c/0x60
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? devlink_recover_notify.constprop.0+0xb8/0xc0
devlink_health_report+0x4a/0x1c0
mlx5_fw_reporter_err_work+0xa4/0xd0 [mlx5_core]
process_one_work+0x1bb/0x3c0
? process_one_work+0x3c0/0x3c0
worker_thread+0x4d/0x3c0
? process_one_work+0x3c0/0x3c0
kthread+0xc6/0xf0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>

Fixes: cf530217408e ("devlink: Notify users when objects are accessible")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.32, v6.6.31, v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26
# 8c91c608 09-Apr-2024 Shay Drory <shayd@nvidia.com>

net/mlx5: Register devlink first under devlink lock

[ Upstream commit c6e77aa9dd82bc18a89bf49418f8f7e961cfccc8 ]

In case device is having a non fatal FW error during probe, the
driver will report t

net/mlx5: Register devlink first under devlink lock

[ Upstream commit c6e77aa9dd82bc18a89bf49418f8f7e961cfccc8 ]

In case device is having a non fatal FW error during probe, the
driver will report the error to user via devlink. This will trigger
a WARN_ON, since mlx5 is calling devlink_register() last.
In order to avoid the WARN_ON[1], change mlx5 to invoke devl_register()
first under devlink lock.

[1]
WARNING: CPU: 5 PID: 227 at net/devlink/health.c:483 devlink_recover_notify.constprop.0+0xb8/0xc0
CPU: 5 PID: 227 Comm: kworker/u16:3 Not tainted 6.4.0-rc5_for_upstream_min_debug_2023_06_12_12_38 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_health0000:08:00.0 mlx5_fw_reporter_err_work [mlx5_core]
RIP: 0010:devlink_recover_notify.constprop.0+0xb8/0xc0
Call Trace:
<TASK>
? __warn+0x79/0x120
? devlink_recover_notify.constprop.0+0xb8/0xc0
? report_bug+0x17c/0x190
? handle_bug+0x3c/0x60
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? devlink_recover_notify.constprop.0+0xb8/0xc0
devlink_health_report+0x4a/0x1c0
mlx5_fw_reporter_err_work+0xa4/0xd0 [mlx5_core]
process_one_work+0x1bb/0x3c0
? process_one_work+0x3c0/0x3c0
worker_thread+0x4d/0x3c0
? process_one_work+0x3c0/0x3c0
kthread+0xc6/0xf0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>

Fixes: cf530217408e ("devlink: Notify users when objects are accessible")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.32, v6.6.31, v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26
# 8c91c608 09-Apr-2024 Shay Drory <shayd@nvidia.com>

net/mlx5: Register devlink first under devlink lock

[ Upstream commit c6e77aa9dd82bc18a89bf49418f8f7e961cfccc8 ]

In case device is having a non fatal FW error during probe, the
driver will report t

net/mlx5: Register devlink first under devlink lock

[ Upstream commit c6e77aa9dd82bc18a89bf49418f8f7e961cfccc8 ]

In case device is having a non fatal FW error during probe, the
driver will report the error to user via devlink. This will trigger
a WARN_ON, since mlx5 is calling devlink_register() last.
In order to avoid the WARN_ON[1], change mlx5 to invoke devl_register()
first under devlink lock.

[1]
WARNING: CPU: 5 PID: 227 at net/devlink/health.c:483 devlink_recover_notify.constprop.0+0xb8/0xc0
CPU: 5 PID: 227 Comm: kworker/u16:3 Not tainted 6.4.0-rc5_for_upstream_min_debug_2023_06_12_12_38 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_health0000:08:00.0 mlx5_fw_reporter_err_work [mlx5_core]
RIP: 0010:devlink_recover_notify.constprop.0+0xb8/0xc0
Call Trace:
<TASK>
? __warn+0x79/0x120
? devlink_recover_notify.constprop.0+0xb8/0xc0
? report_bug+0x17c/0x190
? handle_bug+0x3c/0x60
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? devlink_recover_notify.constprop.0+0xb8/0xc0
devlink_health_report+0x4a/0x1c0
mlx5_fw_reporter_err_work+0xa4/0xd0 [mlx5_core]
process_one_work+0x1bb/0x3c0
? process_one_work+0x3c0/0x3c0
worker_thread+0x4d/0x3c0
? process_one_work+0x3c0/0x3c0
kthread+0xc6/0xf0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>

Fixes: cf530217408e ("devlink: Notify users when objects are accessible")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.32, v6.6.31, v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26
# 8c91c608 09-Apr-2024 Shay Drory <shayd@nvidia.com>

net/mlx5: Register devlink first under devlink lock

[ Upstream commit c6e77aa9dd82bc18a89bf49418f8f7e961cfccc8 ]

In case device is having a non fatal FW error during probe, the
driver will report t

net/mlx5: Register devlink first under devlink lock

[ Upstream commit c6e77aa9dd82bc18a89bf49418f8f7e961cfccc8 ]

In case device is having a non fatal FW error during probe, the
driver will report the error to user via devlink. This will trigger
a WARN_ON, since mlx5 is calling devlink_register() last.
In order to avoid the WARN_ON[1], change mlx5 to invoke devl_register()
first under devlink lock.

[1]
WARNING: CPU: 5 PID: 227 at net/devlink/health.c:483 devlink_recover_notify.constprop.0+0xb8/0xc0
CPU: 5 PID: 227 Comm: kworker/u16:3 Not tainted 6.4.0-rc5_for_upstream_min_debug_2023_06_12_12_38 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_health0000:08:00.0 mlx5_fw_reporter_err_work [mlx5_core]
RIP: 0010:devlink_recover_notify.constprop.0+0xb8/0xc0
Call Trace:
<TASK>
? __warn+0x79/0x120
? devlink_recover_notify.constprop.0+0xb8/0xc0
? report_bug+0x17c/0x190
? handle_bug+0x3c/0x60
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? devlink_recover_notify.constprop.0+0xb8/0xc0
devlink_health_report+0x4a/0x1c0
mlx5_fw_reporter_err_work+0xa4/0xd0 [mlx5_core]
process_one_work+0x1bb/0x3c0
? process_one_work+0x3c0/0x3c0
worker_thread+0x4d/0x3c0
? process_one_work+0x3c0/0x3c0
kthread+0xc6/0xf0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>

Fixes: cf530217408e ("devlink: Notify users when objects are accessible")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.32, v6.6.31, v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26
# 8c91c608 09-Apr-2024 Shay Drory <shayd@nvidia.com>

net/mlx5: Register devlink first under devlink lock

[ Upstream commit c6e77aa9dd82bc18a89bf49418f8f7e961cfccc8 ]

In case device is having a non fatal FW error during probe, the
driver will report t

net/mlx5: Register devlink first under devlink lock

[ Upstream commit c6e77aa9dd82bc18a89bf49418f8f7e961cfccc8 ]

In case device is having a non fatal FW error during probe, the
driver will report the error to user via devlink. This will trigger
a WARN_ON, since mlx5 is calling devlink_register() last.
In order to avoid the WARN_ON[1], change mlx5 to invoke devl_register()
first under devlink lock.

[1]
WARNING: CPU: 5 PID: 227 at net/devlink/health.c:483 devlink_recover_notify.constprop.0+0xb8/0xc0
CPU: 5 PID: 227 Comm: kworker/u16:3 Not tainted 6.4.0-rc5_for_upstream_min_debug_2023_06_12_12_38 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_health0000:08:00.0 mlx5_fw_reporter_err_work [mlx5_core]
RIP: 0010:devlink_recover_notify.constprop.0+0xb8/0xc0
Call Trace:
<TASK>
? __warn+0x79/0x120
? devlink_recover_notify.constprop.0+0xb8/0xc0
? report_bug+0x17c/0x190
? handle_bug+0x3c/0x60
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? devlink_recover_notify.constprop.0+0xb8/0xc0
devlink_health_report+0x4a/0x1c0
mlx5_fw_reporter_err_work+0xa4/0xd0 [mlx5_core]
process_one_work+0x1bb/0x3c0
? process_one_work+0x3c0/0x3c0
worker_thread+0x4d/0x3c0
? process_one_work+0x3c0/0x3c0
kthread+0xc6/0xf0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>

Fixes: cf530217408e ("devlink: Notify users when objects are accessible")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.32, v6.6.31, v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26
# 8c91c608 09-Apr-2024 Shay Drory <shayd@nvidia.com>

net/mlx5: Register devlink first under devlink lock

[ Upstream commit c6e77aa9dd82bc18a89bf49418f8f7e961cfccc8 ]

In case device is having a non fatal FW error during probe, the
driver will report t

net/mlx5: Register devlink first under devlink lock

[ Upstream commit c6e77aa9dd82bc18a89bf49418f8f7e961cfccc8 ]

In case device is having a non fatal FW error during probe, the
driver will report the error to user via devlink. This will trigger
a WARN_ON, since mlx5 is calling devlink_register() last.
In order to avoid the WARN_ON[1], change mlx5 to invoke devl_register()
first under devlink lock.

[1]
WARNING: CPU: 5 PID: 227 at net/devlink/health.c:483 devlink_recover_notify.constprop.0+0xb8/0xc0
CPU: 5 PID: 227 Comm: kworker/u16:3 Not tainted 6.4.0-rc5_for_upstream_min_debug_2023_06_12_12_38 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_health0000:08:00.0 mlx5_fw_reporter_err_work [mlx5_core]
RIP: 0010:devlink_recover_notify.constprop.0+0xb8/0xc0
Call Trace:
<TASK>
? __warn+0x79/0x120
? devlink_recover_notify.constprop.0+0xb8/0xc0
? report_bug+0x17c/0x190
? handle_bug+0x3c/0x60
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? devlink_recover_notify.constprop.0+0xb8/0xc0
devlink_health_report+0x4a/0x1c0
mlx5_fw_reporter_err_work+0xa4/0xd0 [mlx5_core]
process_one_work+0x1bb/0x3c0
? process_one_work+0x3c0/0x3c0
worker_thread+0x4d/0x3c0
? process_one_work+0x3c0/0x3c0
kthread+0xc6/0xf0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>

Fixes: cf530217408e ("devlink: Notify users when objects are accessible")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5
# fdd350fe 21-Sep-2023 Patrisious Haddad <phaddad@nvidia.com>

RDMA/mlx5: Send events from IB driver about device affiliation state

[ Upstream commit 0d293714ac32650bfb669ceadf7cc2fad8161401 ]

Send blocking events from IB driver whenever the device is done bei

RDMA/mlx5: Send events from IB driver about device affiliation state

[ Upstream commit 0d293714ac32650bfb669ceadf7cc2fad8161401 ]

Send blocking events from IB driver whenever the device is done being
affiliated or if it is removed from an affiliation.

This is useful since now the EN driver can register to those event and
know when a device is affiliated or not.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/a7491c3e483cfd8d962f5f75b9a25f253043384a.1695296682.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Stable-dep-of: 762a55a54eec ("net/mlx5e: Disable IPsec offload support if not FW steering")
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39
# a41cb591 11-Jul-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: Remove unused MAX HCA capabilities

Each device cap has two modes: MAX and CUR. The driver maintains a
cache of both modes of the capabilities. For most device caps, the MAX
cap mode is nev

net/mlx5: Remove unused MAX HCA capabilities

Each device cap has two modes: MAX and CUR. The driver maintains a
cache of both modes of the capabilities. For most device caps, the MAX
cap mode is never used.

Hence, remove all driver queries of the MAX mode of the said caps as
well as their helper MACROs.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

show more ...


Revision tags: v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16
# 0b4eb603 02-Jan-2022 Shay Drory <shayd@nvidia.com>

net/mlx5: Remove unused CAPs

mlx5 driver queries the device for VECTOR_CALC and SHAMPO caps, but
there isn't any user who requires them.
As well as, MLX5_MCAM_REGS_0x9080_0x90FF is queried but not u

net/mlx5: Remove unused CAPs

mlx5 driver queries the device for VECTOR_CALC and SHAMPO caps, but
there isn't any user who requires them.
As well as, MLX5_MCAM_REGS_0x9080_0x90FF is queried but not used.

Thus, drop all usages and definitions of the mentioned caps above.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

show more ...


# 1f507e80 07-Aug-2023 Adham Faris <afaris@nvidia.com>

net/mlx5: Expose NIC temperature via hardware monitoring kernel API

Expose NIC temperature by implementing hwmon kernel API, which turns
current thermal zone kernel API to redundant.

For each one o

net/mlx5: Expose NIC temperature via hardware monitoring kernel API

Expose NIC temperature by implementing hwmon kernel API, which turns
current thermal zone kernel API to redundant.

For each one of the supported and exposed thermal diode sensors, expose
the following attributes:
1) Input temperature.
2) Highest temperature.
3) Temperature label:
Depends on the firmware capability, if firmware doesn't support
sensors naming, the fallback naming convention would be: "sensorX",
where X is the HW spec (MTMP register) sensor index.
4) Temperature critical max value:
refers to the high threshold of Warning Event. Will be exposed as
`tempY_crit` hwmon attribute (RO attribute). For example for
ConnectX5 HCA's this temperature value will be 105 Celsius, 10
degrees lower than the HW shutdown temperature).
5) Temperature reset history: resets highest temperature.

For example, for dualport ConnectX5 NIC with a single IC thermal diode
sensor will have 2 hwmon directories (one for each PCI function)
under "/sys/class/hwmon/hwmon[X,Y]".

Listing one of the directories above (hwmonX/Y) generates the
corresponding output below:

$ grep -H -d skip . /sys/class/hwmon/hwmon0/*

Output
=======================================================================
/sys/class/hwmon/hwmon0/name:mlx5
/sys/class/hwmon/hwmon0/temp1_crit:105000
/sys/class/hwmon/hwmon0/temp1_highest:48000
/sys/class/hwmon/hwmon0/temp1_input:46000
/sys/class/hwmon/hwmon0/temp1_label:asic
grep: /sys/class/hwmon/hwmon0/temp1_reset_history: Permission denied

In addition, displaying the sensors data via lm_sensors generates the
corresponding output below:

$ sensors

Output
=======================================================================
mlx5-pci-0800
Adapter: PCI adapter
asic: +46.0°C (crit = +105.0°C, highest = +48.0°C)

mlx5-pci-0801
Adapter: PCI adapter
asic: +46.0°C (crit = +105.0°C, highest = +48.0°C)

CC: Jean Delvare <jdelvare@suse.com>
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230807180507.22984-3-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# aab8e1a2 23-Jul-2023 Moshe Shemesh <moshe@nvidia.com>

net/mlx5: Reload auxiliary devices in pci error handlers

Handling pci errors should fully teardown and load back auxiliary
devices, same as done through mlx5 health recovery flow.

Fixes: 72ed5d5624

net/mlx5: Reload auxiliary devices in pci error handlers

Handling pci errors should fully teardown and load back auxiliary
devices, same as done through mlx5 health recovery flow.

Fixes: 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

show more ...


# 06cd555f 18-Jan-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: split mlx5_cmd_init() to probe and reload routines

There is no need to destroy and allocate cmd SW structs during reload,
this is time consuming for no reason.
Hence, split mlx5_cmd_init()

net/mlx5: split mlx5_cmd_init() to probe and reload routines

There is no need to destroy and allocate cmd SW structs during reload,
this is time consuming for no reason.
Hence, split mlx5_cmd_init() to probe and reload routines.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

show more ...


# 88d162b4 04-May-2023 Roi Dayan <roid@nvidia.com>

net/mlx5: Devcom, Infrastructure changes

Update devcom infrastructure to be more generic, without
depending on max supported ports definition or a device guid,
and also more encapsulated so callers

net/mlx5: Devcom, Infrastructure changes

Update devcom infrastructure to be more generic, without
depending on max supported ports definition or a device guid,
and also more encapsulated so callers don't need to pass
the register devcom component id per event call.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

show more ...


# 53d737df 25-Jun-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: Unregister devlink params in case interface is down

Currently, in case an interface is down, mlx5 driver doesn't
unregister its devlink params, which leads to this WARN[1].
Fix it by unreg

net/mlx5: Unregister devlink params in case interface is down

Currently, in case an interface is down, mlx5 driver doesn't
unregister its devlink params, which leads to this WARN[1].
Fix it by unregistering devlink params in that case as well.

[1]
[ 295.244769 ] WARNING: CPU: 15 PID: 1 at net/core/devlink.c:9042 devlink_free+0x174/0x1fc
[ 295.488379 ] CPU: 15 PID: 1 Comm: shutdown Tainted: G S OE 5.15.0-1017.19.3.g0677e61-bluefield #g0677e61
[ 295.509330 ] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.2.0.12761 Jun 6 2023
[ 295.543096 ] pc : devlink_free+0x174/0x1fc
[ 295.551104 ] lr : mlx5_devlink_free+0x18/0x2c [mlx5_core]
[ 295.561816 ] sp : ffff80000809b850
[ 295.711155 ] Call trace:
[ 295.716030 ] devlink_free+0x174/0x1fc
[ 295.723346 ] mlx5_devlink_free+0x18/0x2c [mlx5_core]
[ 295.733351 ] mlx5_sf_dev_remove+0x98/0xb0 [mlx5_core]
[ 295.743534 ] auxiliary_bus_remove+0x2c/0x50
[ 295.751893 ] __device_release_driver+0x19c/0x280
[ 295.761120 ] device_release_driver+0x34/0x50
[ 295.769649 ] bus_remove_device+0xdc/0x170
[ 295.777656 ] device_del+0x17c/0x3a4
[ 295.784620 ] mlx5_sf_dev_remove+0x28/0xf0 [mlx5_core]
[ 295.794800 ] mlx5_sf_dev_table_destroy+0x98/0x110 [mlx5_core]
[ 295.806375 ] mlx5_unload+0x34/0xd0 [mlx5_core]
[ 295.815339 ] mlx5_unload_one+0x70/0xe4 [mlx5_core]
[ 295.824998 ] shutdown+0xb0/0xd8 [mlx5_core]
[ 295.833439 ] pci_device_shutdown+0x3c/0xa0
[ 295.841651 ] device_shutdown+0x170/0x340
[ 295.849486 ] __do_sys_reboot+0x1f4/0x2a0
[ 295.857322 ] __arm64_sys_reboot+0x2c/0x40
[ 295.865329 ] invoke_syscall+0x78/0x100
[ 295.872817 ] el0_svc_common.constprop.0+0x54/0x184
[ 295.882392 ] do_el0_svc+0x30/0xac
[ 295.889008 ] el0_svc+0x48/0x160
[ 295.895278 ] el0t_64_sync_handler+0xa4/0x130
[ 295.903807 ] el0t_64_sync+0x1a4/0x1a8
[ 295.911120 ] ---[ end trace 4f1d2381d00d9dce ]---

Fixes: fe578cbb2f05 ("net/mlx5: Move devlink registration before mlx5_load")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

show more ...


# 7a9770f1 17-May-2023 Moshe Shemesh <moshe@nvidia.com>

net/mlx5: Handle sync reset unload event

Added a new event handler to firmware sync reset, which is used to
support firmware sync reset flow on smart NIC. Adding this new stage to
the flow enables t

net/mlx5: Handle sync reset unload event

Added a new event handler to firmware sync reset, which is used to
support firmware sync reset flow on smart NIC. Adding this new stage to
the flow enables the firmware to ensure host PFs unload before ECPFs
unload, to avoid race of PFs recovery.

If firmware sends sync_reset_unload event to driver the driver should
unload and close all HW resources of the function. Once the driver
finishes unloading part, it can't get any more events from firmware as
event queues are closed, so it polls the reset state field to know when
to continue to next stage of the sync reset flow.

Added capability bit for supporting sync_reset_unload event.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

show more ...


# e71383fb 03-May-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: Light probe local SFs

In case user wants to configure the SFs, for example: to use only vdpa
functionality, he needs to fully probe a SF, configure what he wants,
and afterward reload the

net/mlx5: Light probe local SFs

In case user wants to configure the SFs, for example: to use only vdpa
functionality, he needs to fully probe a SF, configure what he wants,
and afterward reload the SF.

In order to save the time of the reload, local SFs will probe without
any auxiliary sub-device, so that the SFs can be configured prior to
its full probe.

The defaults of the enable_* devlink params of these SFs are set to
false.

Usage example:
Create SF:
$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
$ devlink port function set pci/0000:08:00.0/32768 \
hw_addr 00:00:00:00:00:11 state active

Enable ETH auxiliary device:
$ devlink dev param set auxiliary/mlx5_core.sf.1 \
name enable_eth value true cmode driverinit

Now, in order to fully probe the SF, use devlink reload:
$ devlink dev reload auxiliary/mlx5_core.sf.1

At this point the user have SF devlink instance with auxiliary device
for the Ethernet functionality only.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

show more ...


# 2059cf51 03-May-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: Split function_setup() to enable and open functions

mlx5_cmd_init_hca() is taking ~0.2 seconds. In case of a user who
desire to disable some of the SF aux devices, and with large scale-1K

net/mlx5: Split function_setup() to enable and open functions

mlx5_cmd_init_hca() is taking ~0.2 seconds. In case of a user who
desire to disable some of the SF aux devices, and with large scale-1K
SFs for example, this user will waste more than 3 minutes on
mlx5_cmd_init_hca() which isn't needed at this stage.

Downstream patch will change SFs which are probe over the E-switch,
local SFs, to be probed without any aux dev. In order to support this,
split function_setup() to avoid executing mlx5_cmd_init_hca().

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

show more ...


# 6d98f314 07-Mar-2023 Daniel Jurgens <danielj@nvidia.com>

net/mlx5: Update SRIOV enable/disable to handle EC/VFs

Previously on the embedded CPU platform SRIOV was never enabled/disabled
via mlx5_core_sriov_configure. Host VF updates are provided by an even

net/mlx5: Update SRIOV enable/disable to handle EC/VFs

Previously on the embedded CPU platform SRIOV was never enabled/disabled
via mlx5_core_sriov_configure. Host VF updates are provided by an event
handler. Now in the disable flow it must be known if this is a disable
due to driver unload or SRIOV detach, or if the user updated the number
of VFs. If due to change in the number of VFs only wait for the pages of
ECVFs.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

show more ...


# bbfa4b58 28-Apr-2023 Moshe Shemesh <moshe@nvidia.com>

net/mlx5: Read embedded cpu after init bit cleared

During driver load it reads embedded_cpu bit from initialization
segment, but the initialization segment is readable only after
initialization bit

net/mlx5: Read embedded cpu after init bit cleared

During driver load it reads embedded_cpu bit from initialization
segment, but the initialization segment is readable only after
initialization bit is cleared.

Move the call to mlx5_read_embedded_cpu() right after initialization bit
cleared.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Fixes: 591905ba9679 ("net/mlx5: Introduce Mellanox SmartNIC and modify page management logic")
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

show more ...


12345678910>>...48