History log of /openbmc/linux/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c (Results 76 – 94 of 94)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 276d197e 06-Aug-2019 Aya Levin <ayal@mellanox.com>

net/mlx5e: Fix error flow of CQE recovery on tx reporter

CQE recovery function begins with test and set of recovery bit. Add an
error flow which ensures clearing of this bit when leaving

net/mlx5e: Fix error flow of CQE recovery on tx reporter

CQE recovery function begins with test and set of recovery bit. Add an
error flow which ensures clearing of this bit when leaving the recovery
function, to allow further recoveries to take place. This allows removal
of clearing recovery bit on sq activate.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

show more ...


# d9a2fcf5 07-Aug-2019 Aya Levin <ayal@mellanox.com>

net/mlx5e: Fix false negative indication on tx reporter CQE recovery

Remove wrong error return value when SQ is not in error state.
CQE recovery on TX reporter queries the sq state. If t

net/mlx5e: Fix false negative indication on tx reporter CQE recovery

Remove wrong error return value when SQ is not in error state.
CQE recovery on TX reporter queries the sq state. If the sq is not in
error state, the sq is either in ready or reset state. Ready state is
good state which doesn't require recovery and reset state is a temporal
state which ends in ready state. With this patch, CQE recovery in this
scenario is successful.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

show more ...


# c9e6c720 24-Jun-2019 Aya Levin <ayal@mellanox.com>

net/mlx5e: TX reporter cleanup

Remove redundant include files.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pi

net/mlx5e: TX reporter cleanup

Remove redundant include files.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

show more ...


# baf6dfdb 24-Jun-2019 Aya Levin <ayal@mellanox.com>

net/mlx5e: Set tx reporter only on successful creation

When failing to create tx reporter, don't set the reporter's pointer.
Creating a reporter is not mandatory for driver load, avoid

net/mlx5e: Set tx reporter only on successful creation

When failing to create tx reporter, don't set the reporter's pointer.
Creating a reporter is not mandatory for driver load, avoid
garbage/error pointer.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

show more ...


# 7f7cc235 03-Jul-2019 Aya Levin <ayal@mellanox.com>

net/mlx5e: Fix mlx5e_tx_reporter_create return value

Return error when failing to create a reporter in devlink. Since
NET_DEVLINK mandatory to MLX5_CORE in Kconfig, returned pointer

net/mlx5e: Fix mlx5e_tx_reporter_create return value

Return error when failing to create a reporter in devlink. Since
NET_DEVLINK mandatory to MLX5_CORE in Kconfig, returned pointer
can't be NULL and can only hold an error in bad path.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

show more ...


# 99d31cbd 30-Jun-2019 Aya Levin <ayal@mellanox.com>

net/mlx5e: Fix error flow in tx reporter diagnose

Fix tx reporter's diagnose callback. Propagate error when failing to
gather diagnostics information or failing to print diagnostic data

net/mlx5e: Fix error flow in tx reporter diagnose

Fix tx reporter's diagnose callback. Propagate error when failing to
gather diagnostics information or failing to print diagnostic data per
queue.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

show more ...


Revision tags: v5.1.14, v5.1.13, v5.1.12, v5.1.11
# 39825350 17-Jun-2019 Aya Levin <ayal@mellanox.com>

net/mlx5e: Fix return value from timeout recover function

Fix timeout recover function to return a meaningful return value.
When an interrupt was not sent by the FW, return IO error inst

net/mlx5e: Fix return value from timeout recover function

Fix timeout recover function to return a meaningful return value.
When an interrupt was not sent by the FW, return IO error instead of
'true'.

Fixes: c7981bea48fb ("net/mlx5e: Fix return status of TX reporter timeout recover")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

show more ...


Revision tags: v5.1.10, v5.1.9, v5.1.8, v5.1.7, v5.1.6, v5.1.5, v5.1.4, v5.1.3, v5.1.2, v5.1.1, v5.0.14, v5.1, v5.0.13, v5.0.12, v5.0.11, v5.0.10, v5.0.9, v5.0.8, v5.0.7, v5.0.6
# 484c1ada 28-Mar-2019 Eran Ben Elisha <eranbe@mellanox.com>

net/mlx5e: Use fail-safe channels reopen in tx reporter recover

When requested to recover from error, the tx reporter might open new
channels and close the existing ones. Use safe channe

net/mlx5e: Use fail-safe channels reopen in tx reporter recover

When requested to recover from error, the tx reporter might open new
channels and close the existing ones. Use safe channels switch flow in
order to guarantee opened channels at the end of the recover flow.
For this purpose, define mlx5e_safe_reopen_channels function and use it
within those flows.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

show more ...


# 192fba79 28-Mar-2019 Eran Ben Elisha <eranbe@mellanox.com>

net/mlx5e: Skip un-needed tx recover if interface state is down

Skip recover operation if interface is in down state as TX objects are
not open. This fixes a bug were the recover flow re

net/mlx5e: Skip un-needed tx recover if interface state is down

Skip recover operation if interface is in down state as TX objects are
not open. This fixes a bug were the recover flow re-opened TX objects
which were not opened before, leading to a possible memory leak at
driver unload.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

show more ...


Revision tags: v5.0.5, v5.0.4, v5.0.3, v4.19.29, v5.0.2, v4.19.28, v5.0.1, v4.19.27, v5.0, v4.19.26
# 6bdbc1cb 24-Feb-2019 Eran Ben Elisha <eranbe@mellanox.com>

net/mlx5e: Declare mlx5e_tx_reporter_recover_from_ctx as static

Function mlx5e_tx_reporter_recover_from_ctx is only used within mlx5e tx
reporter, move it to be statically declared in en

net/mlx5e: Declare mlx5e_tx_reporter_recover_from_ctx as static

Function mlx5e_tx_reporter_recover_from_ctx is only used within mlx5e tx
reporter, move it to be statically declared in en/reporter_tx.c.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

show more ...


Revision tags: v4.19.25
# 2e5b0534 20-Feb-2019 Eran Ben Elisha <eranbe@mellanox.com>

net/mlx5e: Fix mlx5e_tx_reporter_create return value

If reporter is ERR_PTR or NULL, error code shall be returned. At all other
cases it shall return success. Fix that.

Fixes: d

net/mlx5e: Fix mlx5e_tx_reporter_create return value

If reporter is ERR_PTR or NULL, error code shall be returned. At all other
cases it shall return success. Fix that.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

show more ...


# c7981bea 20-Feb-2019 Eran Ben Elisha <eranbe@mellanox.com>

net/mlx5e: Fix return status of TX reporter timeout recover

In case of lost interrupt recover, we shall return success. Fix that.

Fixes: 7d91126b1aea ("net/mlx5e: Add tx timeout sup

net/mlx5e: Fix return status of TX reporter timeout recover

In case of lost interrupt recover, we shall return success. Fix that.

Fixes: 7d91126b1aea ("net/mlx5e: Add tx timeout support for mlx5e tx reporter")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reported-by: Maria Pasechnik <mariap@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

show more ...


Revision tags: v4.19.24
# 2c493ae0 19-Feb-2019 Eran Ben Elisha <eranbe@mellanox.com>

net/mlx5e: Re-add support for TX timeout when TX reporter is not valid

When TX reporter was introduced, it took ownership over TX timeout error
handling. this introduced a regression in

net/mlx5e: Re-add support for TX timeout when TX reporter is not valid

When TX reporter was introduced, it took ownership over TX timeout error
handling. this introduced a regression in case TX reporter is not valid
(NET_DEVLINK is not set, or devlink_health_reporter_create failure).

Fix mlx5e_tx_reporter_timeout function so it can be called at all times.

In addition, remove a warning print that indicates that a TX timeout won't
be handled in case of no valid TX reporter.

Fixes: 7d91126b1aea ("net/mlx5e: Add tx timeout support for mlx5e tx reporter")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

show more ...


# 772ac5e2 19-Feb-2019 Eran Ben Elisha <eranbe@mellanox.com>

net/mlx5e: Fix warn print in case of TX reporter creation failure

Print warning message in case of TX reporter creation failure, only if the
return value is ERR_PTR type. NULL pointer re

net/mlx5e: Fix warn print in case of TX reporter creation failure

Print warning message in case of TX reporter creation failure, only if the
return value is ERR_PTR type. NULL pointer return indicates that
NET_DEVLINK is not set, and the warning print can be skipped.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

show more ...


Revision tags: v4.19.23, v4.19.22, v4.19.21
# 7d91126b 07-Feb-2019 Eran Ben Elisha <eranbe@mellanox.com>

net/mlx5e: Add tx timeout support for mlx5e tx reporter

With this patch, ndo_tx_timeout callback will be redirected to the tx
reporter in order to detect a tx timeout error and report it

net/mlx5e: Add tx timeout support for mlx5e tx reporter

With this patch, ndo_tx_timeout callback will be redirected to the tx
reporter in order to detect a tx timeout error and report it to the
devlink health. (The watchdog detects tx timeouts, but the driver verify
the issue still exists before launching any recover method).

In addition, recover from tx timeout in case of lost interrupt was added
to the tx reporter recover method. The tx timeout recover from lost
interrupt is not a new feature in the driver, this patch re-organize the
functionality and move it to the tx reporter recovery flow.

tx timeout example:
(with auto_recover set to false, if set to true, the manual recover and
diagnose sections are irrelevant)

$cat /sys/kernel/debug/tracing/trace
...
devlink_health_report: bus_name=pci dev_name=0000:00:09.0
driver_name=mlx5_core reporter_name=tx: TX timeout on queue: 0, SQ: 0x8a,
CQ: 0x35, SQ Cons: 0x2 SQ Prod: 0x2, usecs since last trans: 14912000

$devlink health show
pci/0000:00:09.0:
name tx
state healthy #err 1 #recover 0 last_dump_ts N/A
parameters:
grace_period 500 auto_recover false

$devlink health diagnose pci/0000:00:09.0 reporter tx -j -p
{
"SQs": [ {
"sqn": 138,
"HW state": 1,
"stopped": true
},{
"sqn": 142,
"HW state": 1,
"stopped": false
} ]
}

$devlink health diagnose pci/0000:00:09.0 reporter tx
SQs:
sqn: 138 HW state: 1 stopped: true
sqn: 142 HW state: 1 stopped: false

$devlink health recover pci/0000:00:09 reporter tx
$devlink health show
pci/0000:00:09.0:
name tx
state healthy #err 1 #recover 1 last_dump_ts N/A
parameters:
grace_period 500 auto_recover false

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# de8650a8 07-Feb-2019 Eran Ben Elisha <eranbe@mellanox.com>

net/mlx5e: Add tx reporter support

Add mlx5e tx reporter to devlink health reporters. This reporter will be
responsible for diagnosing, reporting and recovering of tx errors.
This pa

net/mlx5e: Add tx reporter support

Add mlx5e tx reporter to devlink health reporters. This reporter will be
responsible for diagnosing, reporting and recovering of tx errors.
This patch declares the TX reporter operations and creates it using the
devlink health API. Currently, this reporter supports reporting and
recovering from send error CQE only. In addition, it adds diagnose
information for the open SQs.

For a local SQ recover (due to driver error report), in case of SQ recover
failure, the recover operation will be considered as a failure.
For a full tx recover, an attempt to close and open the channels will be
done. If this one passed successfully, it will be considered as a
successful recover.

The SQ recover from error CQE flow is not a new feature in the driver,
this patch re-organize the functions and adapt them for the devlink
health API. For this purpose, move code from en_main.c to a new file
named reporter_tx.c.

Diagnose output:
$devlink health diagnose pci/0000:00:09.0 reporter tx -j -p
{
"SQs": [ {
"sqn": 138,
"HW state": 1,
"stopped": false
},{
"sqn": 142,
"HW state": 1,
"stopped": false
} ]
}

$devlink health diagnose pci/0000:00:09.0 reporter tx
SQs:
sqn: 138 HW state: 1 stopped: false
sqn: 142 HW state: 1 stopped: false

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# 30e5c2c6 25-Jan-2019 David S. Miller <davem@davemloft.net>

net: Revert devlink health changes.

This reverts the devlink health changes from 9/17/2019,
Jiri wants things to be designed differently and it was
agreed that the easiest way to do

net: Revert devlink health changes.

This reverts the devlink health changes from 9/17/2019,
Jiri wants things to be designed differently and it was
agreed that the easiest way to do this is start from the
beginning again.

Commits reverted:

cb5ccfbe73b389470e1dc11061bb185ef4bc9aec
880ee82f0313453ec5a6cb122866ac057263066b
c7af343b4e33578b7de91786a3f639c8cfa0d97b
ff253fedab961b22117a73ab808fcfa9e6852b50
6f9d56132eb6d2603d4273cfc65bed914ec47acb
fcd852c69d776c0f46c8f79e8e431e5cc6ddc7b7
8a66704a13d9713593342e29b4f0c19762f5746b
12bd0dcefe88782ac1c9fff632958dd1b71d27e5
aba25279c10094c5c97d09c3491ca86d00b4ad5e
ce019faa70f81555fa17ebc1d5a03651f2e7e15a
b8c45a033acc607201588f7665ba84207e5149e0

And the follow-on build fix:

o33a0efa4baecd689da9474ce0e8b673eb6931c60

Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


Revision tags: v4.19.20, v4.19.19, v4.19.18, v4.19.17
# ce019faa 17-Jan-2019 Eran Ben Elisha <eranbe@mellanox.com>

net/mlx5e: Add TX timeout support for mlx5e TX reporter

With this patch, ndo_tx_timeout callback will be redirected to the TX
reporter in order to detect a TX timeout error and report it

net/mlx5e: Add TX timeout support for mlx5e TX reporter

With this patch, ndo_tx_timeout callback will be redirected to the TX
reporter in order to detect a TX timeout error and report it to the
devlink health. (The watchdog detects TX timeouts, but the driver verify
the issue still exists before launching any recover method).

In addition, recover from TX timeout in case of lost interrupt was added
to the TX reporter recover method. The TX timeout recover from lost
interrupt is not a new feature in the driver, this patch re-organize the
functionality and move it to the TX reporter recovery flow.

TX timeout example:
(with auto_recover set to false, if set to true, the manual recover and
diagnose sections are irrelevant)

$cat /sys/kernel/debug/tracing/trace
...
devlink_health_report: bus_name=pci dev_name=0000:00:09.0
driver_name=mlx5_core reporter_name=TX: TX timeout on queue: 0, SQ: 0xd8a, CQ:
0x406, SQ Cons: 0x2 SQ Prod: 0x2, usecs since last trans: 13972000

$devlink health diagnose pci/0000:00:09 reporter TX
SQ 0xd8a: HW state: 1, stopped: 1
SQ 0xe44: HW state: 1, stopped: 0
SQ 0xeb4: HW state: 1, stopped: 0
SQ 0xf1f: HW state: 1, stopped: 0
SQ 0xf80: HW state: 1, stopped: 0
SQ 0xfe5: HW state: 1, stopped: 0

$devlink health recover pci/0000:00:09 reporter TX
$devlink health show
pci/0000:00:09.0:
name TX state healthy #err 1 #recover 1 last_dump_ts N/A dump_available false
attributes:
grace_period 500 auto_recover false

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# aba25279 17-Jan-2019 Eran Ben Elisha <eranbe@mellanox.com>

net/mlx5e: Add TX reporter support

Add mlx5e tx reporter to devlink health reporters. This reporter will be
responsible for diagnosing, reporting and recovering of TX errors.
This pa

net/mlx5e: Add TX reporter support

Add mlx5e tx reporter to devlink health reporters. This reporter will be
responsible for diagnosing, reporting and recovering of TX errors.
This patch declares the TX reporter operations and allocate it using the
devlink health API. Currently, this reporter supports reporting and
recovering from send error CQE only. In addition, it adds diagnose
information for the open SQs.

For a local SQ recover (due to driver error report), in case of SQ recover
failure, the recover operation will be considered as a failure.
For a full TX recover, an attempt to close and open the channels will be
done. If this one passed successfully, it will be considered as a
successful recover.

The SQ recover from error CQE flow is not a new feature in the driver,
this patch re-organize the functions and adapt them for the devlink
health API. For this purpose, move code from en_main.c to a new file
named reporter_tx.c.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


1234