#
276d197e |
| 06-Aug-2019 |
Aya Levin <ayal@mellanox.com> |
net/mlx5e: Fix error flow of CQE recovery on tx reporter

The CQE recovery function begins with a test-and-set of the recovery bit.
Add an error flow that ensures this bit is cleared when leaving the
recovery function, so that further recoveries can take place. This allows
removing the clearing of the recovery bit on SQ activate.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
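The test-and-set pattern described in this commit message can be sketched in user-space C as follows. This is a minimal illustration under stated assumptions, not the driver's code: `sketch_sq`, `sketch_recover`, and the `hw_fail` switch are hypothetical names, and a C11 `atomic_flag` stands in for the recovery bit.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the SQ; not the driver's structure. */
struct sketch_sq {
    atomic_flag recovering; /* plays the role of the recovery bit */
    bool hw_fail;           /* forces the simulated failure path */
};

static int sketch_recover(struct sketch_sq *sq)
{
    int err = 0;

    /* Test-and-set: if a recovery is already running, back off. */
    if (atomic_flag_test_and_set(&sq->recovering))
        return 0;

    if (sq->hw_fail) {
        err = -EIO; /* simulated failure of a recovery step */
        goto out;
    }

    /* ... the remaining recovery steps would go here ... */

out:
    /* Single exit point: the bit is cleared on success and on failure. */
    atomic_flag_clear(&sq->recovering);
    return err;
}
```

Because the bit is cleared on the failure path as well, a second call after a failed attempt runs the recovery again instead of returning early.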
|
#
d9a2fcf5 |
| 07-Aug-2019 |
Aya Levin <ayal@mellanox.com> |
net/mlx5e: Fix false negative indication on tx reporter CQE recovery

Remove the wrong error return value when the SQ is not in error state.
CQE recovery on the TX reporter queries the SQ state. If the SQ is not in
error state, it is in either ready or reset state. Ready is a good state
that doesn't require recovery, and reset is a temporary state that ends in
ready state. With this patch, CQE recovery in this scenario is successful.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
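The state check described above might look roughly like this in isolation. The enum values, the function names, and the call counter are all hypothetical and exist only for this sketch. The real driver queries the state from hardware.

```c
/* Hypothetical SQ states; names and values are assumptions. */
enum sketch_sq_state { SKETCH_SQ_RESET, SKETCH_SQ_READY, SKETCH_SQ_ERROR };

/* Counts how often the (stubbed) recovery work actually ran. */
static int sketch_recover_calls;

/* Stub for the real recovery work; always succeeds in this sketch. */
static int sketch_do_recover(void)
{
    sketch_recover_calls++;
    return 0;
}

static int sketch_cqe_recover(enum sketch_sq_state state)
{
    /*
     * Ready needs no recovery, and reset settles into ready on its own,
     * so both are reported as success rather than as an error.
     */
    if (state != SKETCH_SQ_ERROR)
        return 0;

    return sketch_do_recover();
}
```

Only the error state reaches the recovery work. The other two states return success without touching the queue.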
|
#
c9e6c720 |
| 24-Jun-2019 |
Aya Levin <ayal@mellanox.com> |
net/mlx5e: TX reporter cleanup

Remove redundant include files.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
baf6dfdb |
| 24-Jun-2019 |
Aya Levin <ayal@mellanox.com> |
net/mlx5e: Set tx reporter only on successful creation

When failing to create the tx reporter, don't set the reporter's pointer.
Creating a reporter is not mandatory for driver load, so avoid leaving a
garbage/error pointer.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
7f7cc235 |
| 03-Jul-2019 |
Aya Levin <ayal@mellanox.com> |
net/mlx5e: Fix mlx5e_tx_reporter_create return value

Return an error when failing to create a reporter in devlink. Since
NET_DEVLINK is mandatory for MLX5_CORE in Kconfig, the returned pointer
can't be NULL and can only hold an error in the bad path.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
99d31cbd |
| 30-Jun-2019 |
Aya Levin <ayal@mellanox.com> |
net/mlx5e: Fix error flow in tx reporter diagnose

Fix tx reporter's diagnose callback. Propagate error when failing to
gather diagnostics information or failing to print diagnostic data per
queue.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
Revision tags: v5.1.14, v5.1.13, v5.1.12, v5.1.11 |
|
#
39825350 |
| 17-Jun-2019 |
Aya Levin <ayal@mellanox.com> |
net/mlx5e: Fix return value from timeout recover function

Fix timeout recover function to return a meaningful return value. When an
interrupt was not sent by the FW, return IO error instead of 'true'.

Fixes: c7981bea48fb ("net/mlx5e: Fix return status of TX reporter timeout recover")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
Revision tags: v5.1.10, v5.1.9, v5.1.8, v5.1.7, v5.1.6, v5.1.5, v5.1.4, v5.1.3, v5.1.2, v5.1.1, v5.0.14, v5.1, v5.0.13, v5.0.12, v5.0.11, v5.0.10, v5.0.9, v5.0.8, v5.0.7, v5.0.6 |
|
#
484c1ada |
| 28-Mar-2019 |
Eran Ben Elisha <eranbe@mellanox.com> |
net/mlx5e: Use fail-safe channels reopen in tx reporter recover

When requested to recover from error, the tx reporter might open new
channels and close the existing ones. Use safe channels switch flow in
order to guarantee opened channels at the end of the recover flow.

For this purpose, define mlx5e_safe_reopen_channels function and use it
within those flows.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
192fba79 |
| 28-Mar-2019 |
Eran Ben Elisha <eranbe@mellanox.com> |
net/mlx5e: Skip un-needed tx recover if interface state is down

Skip the recover operation if the interface is in down state, as TX
objects are not open.

This fixes a bug where the recover flow re-opened TX objects which were
not opened before, leading to a possible memory leak at driver unload.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
Revision tags: v5.0.5, v5.0.4, v5.0.3, v4.19.29, v5.0.2, v4.19.28, v5.0.1, v4.19.27, v5.0, v4.19.26 |
|
#
6bdbc1cb |
| 24-Feb-2019 |
Eran Ben Elisha <eranbe@mellanox.com> |
net/mlx5e: Declare mlx5e_tx_reporter_recover_from_ctx as static

Function mlx5e_tx_reporter_recover_from_ctx is only used within mlx5e tx
reporter, move it to be statically declared in en/reporter_tx.c.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
Revision tags: v4.19.25 |
|
#
2e5b0534 |
| 20-Feb-2019 |
Eran Ben Elisha <eranbe@mellanox.com> |
net/mlx5e: Fix mlx5e_tx_reporter_create return value

If the reporter is ERR_PTR or NULL, an error code shall be returned. In
all other cases it shall return success. Fix that.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c7981bea |
| 20-Feb-2019 |
Eran Ben Elisha <eranbe@mellanox.com> |
net/mlx5e: Fix return status of TX reporter timeout recover

In case of lost interrupt recover, we shall return success. Fix that.

Fixes: 7d91126b1aea ("net/mlx5e: Add tx timeout support for mlx5e tx reporter")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reported-by: Maria Pasechnik <mariap@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
Revision tags: v4.19.24 |
|
#
2c493ae0 |
| 19-Feb-2019 |
Eran Ben Elisha <eranbe@mellanox.com> |
net/mlx5e: Re-add support for TX timeout when TX reporter is not valid

When the TX reporter was introduced, it took ownership over TX timeout
error handling. This introduced a regression in case the TX reporter is
not valid (NET_DEVLINK is not set, or devlink_health_reporter_create
failure).

Fix mlx5e_tx_reporter_timeout function so it can be called at all times.
In addition, remove a warning print that indicates that a TX timeout won't
be handled in case of no valid TX reporter.

Fixes: 7d91126b1aea ("net/mlx5e: Add tx timeout support for mlx5e tx reporter")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
772ac5e2 |
| 19-Feb-2019 |
Eran Ben Elisha <eranbe@mellanox.com> |
net/mlx5e: Fix warn print in case of TX reporter creation failure

Print warning message in case of TX reporter creation failure, only if the
return value is ERR_PTR type. NULL pointer return indicates that
NET_DEVLINK is not set, and the warning print can be skipped.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
Revision tags: v4.19.23, v4.19.22, v4.19.21 |
|
#
7d91126b |
| 07-Feb-2019 |
Eran Ben Elisha <eranbe@mellanox.com> |
net/mlx5e: Add tx timeout support for mlx5e tx reporter

With this patch, ndo_tx_timeout callback will be redirected to the tx
reporter in order to detect a tx timeout error and report it to the
devlink health. (The watchdog detects tx timeouts, but the driver verifies
the issue still exists before launching any recover method.)

In addition, recover from tx timeout in case of lost interrupt was added
to the tx reporter recover method. The tx timeout recover from lost
interrupt is not a new feature in the driver; this patch reorganizes the
functionality and moves it to the tx reporter recovery flow.

tx timeout example (with auto_recover set to false; if set to true, the
manual recover and diagnose sections are irrelevant):

$cat /sys/kernel/debug/tracing/trace
...
devlink_health_report: bus_name=pci dev_name=0000:00:09.0
  driver_name=mlx5_core reporter_name=tx: TX timeout on queue: 0,
  SQ: 0x8a, CQ: 0x35, SQ Cons: 0x2 SQ Prod: 0x2,
  usecs since last trans: 14912000

$devlink health show
pci/0000:00:09.0:
  name tx
    state healthy #err 1 #recover 0 last_dump_ts N/A
    parameters:
      grace_period 500 auto_recover false

$devlink health diagnose pci/0000:00:09.0 reporter tx -j -p
{
    "SQs": [ {
            "sqn": 138,
            "HW state": 1,
            "stopped": true
        },{
            "sqn": 142,
            "HW state": 1,
            "stopped": false
        } ]
}

$devlink health diagnose pci/0000:00:09.0 reporter tx
SQs:
  sqn: 138 HW state: 1 stopped: true
  sqn: 142 HW state: 1 stopped: false

$devlink health recover pci/0000:00:09 reporter tx
$devlink health show
pci/0000:00:09.0:
  name tx
    state healthy #err 1 #recover 1 last_dump_ts N/A
    parameters:
      grace_period 500 auto_recover false

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
de8650a8 |
| 07-Feb-2019 |
Eran Ben Elisha <eranbe@mellanox.com> |
net/mlx5e: Add tx reporter support

Add mlx5e tx reporter to devlink health reporters. This reporter will be
responsible for diagnosing, reporting and recovering of tx errors.

This patch declares the TX reporter operations and creates it using the
devlink health API. Currently, this reporter supports reporting and
recovering from send error CQE only. In addition, it adds diagnose
information for the open SQs.

For a local SQ recover (due to driver error report), in case of SQ recover
failure, the recover operation will be considered as a failure.
For a full tx recover, an attempt to close and open the channels will be
done. If this one passed successfully, it will be considered as a
successful recover.

The SQ recover from error CQE flow is not a new feature in the driver;
this patch reorganizes the functions and adapts them for the devlink
health API. For this purpose, move code from en_main.c to a new file named
reporter_tx.c.

Diagnose output:

$devlink health diagnose pci/0000:00:09.0 reporter tx -j -p
{
    "SQs": [ {
            "sqn": 138,
            "HW state": 1,
            "stopped": false
        },{
            "sqn": 142,
            "HW state": 1,
            "stopped": false
        } ]
}

$devlink health diagnose pci/0000:00:09.0 reporter tx
SQs:
  sqn: 138 HW state: 1 stopped: false
  sqn: 142 HW state: 1 stopped: false

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
30e5c2c6 |
| 25-Jan-2019 |
David S. Miller <davem@davemloft.net> |
net: Revert devlink health changes.

This reverts the devlink health changes from 1/17/2019. Jiri wants things
to be designed differently, and it was agreed that the easiest way to do
this is to start from the beginning again.

Commits reverted:

cb5ccfbe73b389470e1dc11061bb185ef4bc9aec
880ee82f0313453ec5a6cb122866ac057263066b
c7af343b4e33578b7de91786a3f639c8cfa0d97b
ff253fedab961b22117a73ab808fcfa9e6852b50
6f9d56132eb6d2603d4273cfc65bed914ec47acb
fcd852c69d776c0f46c8f79e8e431e5cc6ddc7b7
8a66704a13d9713593342e29b4f0c19762f5746b
12bd0dcefe88782ac1c9fff632958dd1b71d27e5
aba25279c10094c5c97d09c3491ca86d00b4ad5e
ce019faa70f81555fa17ebc1d5a03651f2e7e15a
b8c45a033acc607201588f7665ba84207e5149e0

And the follow-on build fix:

33a0efa4baecd689da9474ce0e8b673eb6931c60

Signed-off-by: David S. Miller <davem@davemloft.net>
|
Revision tags: v4.19.20, v4.19.19, v4.19.18, v4.19.17 |
|
#
ce019faa |
| 17-Jan-2019 |
Eran Ben Elisha <eranbe@mellanox.com> |
net/mlx5e: Add TX timeout support for mlx5e TX reporter

With this patch, ndo_tx_timeout callback will be redirected to the TX
reporter in order to detect a TX timeout error and report it to the
devlink health. (The watchdog detects TX timeouts, but the driver verifies
the issue still exists before launching any recover method.)

In addition, recover from TX timeout in case of lost interrupt was added
to the TX reporter recover method. The TX timeout recover from lost
interrupt is not a new feature in the driver; this patch reorganizes the
functionality and moves it to the TX reporter recovery flow.

TX timeout example (with auto_recover set to false; if set to true, the
manual recover and diagnose sections are irrelevant):

$cat /sys/kernel/debug/tracing/trace
...
devlink_health_report: bus_name=pci dev_name=0000:00:09.0
  driver_name=mlx5_core reporter_name=TX: TX timeout on queue: 0,
  SQ: 0xd8a, CQ: 0x406, SQ Cons: 0x2 SQ Prod: 0x2,
  usecs since last trans: 13972000

$devlink health diagnose pci/0000:00:09 reporter TX
SQ 0xd8a: HW state: 1, stopped: 1
SQ 0xe44: HW state: 1, stopped: 0
SQ 0xeb4: HW state: 1, stopped: 0
SQ 0xf1f: HW state: 1, stopped: 0
SQ 0xf80: HW state: 1, stopped: 0
SQ 0xfe5: HW state: 1, stopped: 0

$devlink health recover pci/0000:00:09 reporter TX
$devlink health show
pci/0000:00:09.0:
  name TX
    state healthy #err 1 #recover 1 last_dump_ts N/A dump_available false
    attributes:
      grace_period 500 auto_recover false

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aba25279 |
| 17-Jan-2019 |
Eran Ben Elisha <eranbe@mellanox.com> |
net/mlx5e: Add TX reporter support

Add mlx5e tx reporter to devlink health reporters. This reporter will be
responsible for diagnosing, reporting and recovering of TX errors.

This patch declares the TX reporter operations and allocates it using the
devlink health API. Currently, this reporter supports reporting and
recovering from send error CQE only. In addition, it adds diagnose
information for the open SQs.

For a local SQ recover (due to driver error report), in case of SQ recover
failure, the recover operation will be considered as a failure.
For a full TX recover, an attempt to close and open the channels will be
done. If this one passed successfully, it will be considered as a
successful recover.

The SQ recover from error CQE flow is not a new feature in the driver;
this patch reorganizes the functions and adapts them for the devlink
health API. For this purpose, move code from en_main.c to a new file named
reporter_tx.c.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|