03be3489 | 24-Aug-2023 |
farah kassabri <fkassabri@habana.ai> |
accel/habanalabs: fix bug in timestamp interrupt handling
[ Upstream commit 0165994c215f321e2d055368f89b424756e340eb ]
There is a potential race between a user thread seeking to re-use a timestamp record with a new interrupt id and the interrupt handler, which may still be processing that record and be about to free it. Suppose the driver sets the record's in_use to 0 and only then fills the free_node information. The new registration thread may then detect the record as free to use and change the cq buff address, which causes the free_node to get the wrong buffer address to put the refcount to.
Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
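The ordering issue described above can be illustrated with a minimal userspace sketch. It assumes C11 atomics stand in for whatever synchronization the driver actually uses; the struct names, field names, and both helper functions are hypothetical and are not taken from the driver source. The point shown is only the order of operations: capture the buffer address into the free node first, and mark the record free last.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a timestamp record; not the driver's struct. */
struct ts_record {
	uint64_t cq_buf_addr; /* buffer whose refcount must be dropped later */
	atomic_int in_use;    /* 1 while owned by a registration, 0 when free */
};

/* Hypothetical stand-in for the free_node information. */
struct free_node_info {
	uint64_t buf_addr;
};

/* Interrupt-handling side: fill the free node BEFORE releasing the record. */
static void release_record(struct ts_record *rec, struct free_node_info *node)
{
	node->buf_addr = rec->cq_buf_addr;
	/* Release ordering: the copy above is visible before in_use reads 0. */
	atomic_store_explicit(&rec->in_use, 0, memory_order_release);
}

/* Registration side: claim the record only if it is free, then re-use it. */
static int try_reuse_record(struct ts_record *rec, uint64_t new_addr)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong_explicit(&rec->in_use, &expected, 1,
						     memory_order_acquire,
						     memory_order_relaxed))
		return 0; /* still being handled; caller must not touch it */

	rec->cq_buf_addr = new_addr;
	return 1;
}
```

With this ordering, a registration thread that observes in_use as 0 can no longer change the address before the free node has captured the old one.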
|
db5ba2c1 | 09-Aug-2023 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: export dma-buf only if size/offset multiples of PAGE_SIZE
[ Upstream commit 0b75cb5b240fddf181c284d415ee77ef61b418d6 ]
It is currently allowed for a user to export a dma-buf with a size and offset that are not multiples of PAGE_SIZE. The exported memory is mapped for the importer device, where it is rounded to PAGE_SIZE, so more memory is actually exported than the user intended. To make the user aware of this, accept only sizes and offsets that are multiples of PAGE_SIZE.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
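The check described above amounts to a pair of modulo tests. The following is a minimal sketch, not the driver's code: the helper name is hypothetical, and the page size is assumed to be 4 KiB here, whereas the kernel's PAGE_SIZE depends on the architecture and configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for illustration only: 4 KiB pages. */
#define EXAMPLE_PAGE_SIZE 4096ULL

/*
 * Hypothetical helper: returns true only when both the size and the offset
 * are multiples of the page size, so nothing beyond the requested range
 * would be exposed by page rounding on the importer side.
 */
static bool export_args_page_aligned(uint64_t size, uint64_t offset)
{
	return (size % EXAMPLE_PAGE_SIZE) == 0 &&
	       (offset % EXAMPLE_PAGE_SIZE) == 0;
}
```

A caller would reject the export request when this returns false instead of silently rounding.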
|
e6f49e96 | 22-May-2023 |
Dani Liberman <dliberman@habana.ai> |
accel/habanalabs: refactor error info reset
Move the error info reset code to a single function for future use from other places in the driver.
Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
fac91dd5 | 21-May-2023 |
Ofir Bitton <obitton@habana.ai> |
accel/habanalabs: add event queue extra validation
In order to increase reliability of the event queue interface, we apply to Gaudi2 the same mechanism we have in Gaudi1. The extra validation is basically checking that the received event index matches the expected index.
Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
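The index check described above can be sketched as follows. This is an illustrative model only: the struct, its field, and the function are hypothetical, and index wrap-around is simplified to plain unsigned overflow rather than whatever masking the hardware interface defines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical consumer-side state; not the driver's real structure. */
struct eq_state {
	uint32_t expected_index; /* index the next valid entry must carry */
};

/*
 * Accept an event queue entry only if its index matches the expected one.
 * On a match, advance the expected index; on a mismatch, leave the state
 * untouched so the caller can report the inconsistency.
 */
static bool eq_validate_entry(struct eq_state *eq, uint32_t received_index)
{
	if (received_index != eq->expected_index)
		return false;

	eq->expected_index++;
	return true;
}
```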
|
19aa21b9 | 22-May-2023 |
Ofir Bitton <obitton@habana.ai> |
accel/habanalabs: unsecure TSB_CFG_MTRR regs
In order to utilize Engine Barrier padding, user must have access to this register set.
Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
ff5c7025 | 18-May-2023 |
Oded Gabbay <ogabbay@kernel.org> |
accel/habanalabs: move ioctl error print to debug level
We don't want to allow users to spam the kernel log and sending ioctls with bad opcodes is a sure way to do it.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Ofir Bitton <obitton@habana.ai>
|
8a20b381 | 17-May-2023 |
Ofir Bitton <obitton@habana.ai> |
accel/habanalabs: fix bug of not fetching addr_dec info
addr_dec info should always be fetched, regardless of cause value.
Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
56921023 | 22-May-2023 |
Oded Gabbay <ogabbay@kernel.org> |
accel/habanalabs: remove sim code
There were a few places where simulator only code got into the upstream. Remove those places that can confuse other developers.
Fixes: 2a0a839b6a28 ("habanalabs: extend fatal messages to contain PCI info") Cc: Moti Haimovski <mhaimovski@habana.ai> Cc: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
5d658d0c | 08-May-2023 |
Dani Liberman <dliberman@habana.ai> |
accel/habanalabs: mask part of hmmu page fault captured address
When receiving page fault from hmmu, the captured address is scrambled both by HW and by driver. The driver part is unscrambled but the HW part isn't getting unscrambled. To avoid declaring wrong address, the HW scrambled part will be masked.
Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
7e63f317 | 10-May-2023 |
Koby Elbaz <kelbaz@habana.ai> |
accel/habanalabs: update state when loading boot fit
Any FW component we load must be followed by a corresponding state update. However, it seems that so far we skipped doing so for the bootfit case, so fix that.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
6092cedf | 10-May-2023 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: print qman data on error only for lower qman
By default, the upper QMANs are not used, and instead engines ARCs access the lower QMANs directly. Errors for upper QMANs are therefore not expected, and the debug print of the PQ entries is not needed.
Modify the QMAN debug data print on errors to include only information for the lower QMAN.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
54381ee8 | 10-May-2023 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: use lower QM in QM errors handling
The QMAN GLBL_ERR_STS_4 register has indications for errors also in the lower CQ and the ARC CQ, and not just for errors in the lower CP. Modify the relevant define/struct and the related print to use "lower QM" instead of "lower CP".
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
dcc8fa88 | 08-May-2023 |
Dani Liberman <dliberman@habana.ai> |
accel/habanalabs: use binning info when handling razwi
When receiving a sei interrupt from a tpc or decoder, we need to check the binning mask: if the engine is binned, the razwi info won't be in the router of the binned engine but in the router of the substitute engine.
Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
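The selection logic described above can be sketched as a single bit test. Everything here is hypothetical: the function name, the representation of the binning mask as one bit per engine index, and passing the substitute engine as a parameter are assumptions made for illustration, and engine indices are assumed to be below 64.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pick the engine whose router should be read for
 * razwi info. If the reporting engine's bit is set in the binning mask,
 * the info is expected in the substitute engine's router instead.
 * Assumes engine < 64.
 */
static unsigned int razwi_router_engine(unsigned int engine,
					uint64_t binning_mask,
					unsigned int substitute_engine)
{
	if (binning_mask & (1ULL << engine))
		return substitute_engine;

	return engine;
}
```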
|
583f12a8 | 09-May-2023 |
Ofir Bitton <obitton@habana.ai> |
accel/habanalabs: remove support for mmu disable
As mmu disable mode is only used for bring-up stages, let's remove this option and all code related to it.
Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
b2d61fec | 02-May-2023 |
Koby Elbaz <kelbaz@habana.ai> |
accel/habanalabs: upon DMA errors, use FW-extracted error cause
Initially, the driver read the error cause data directly from the ASIC. However, the FW now clears it before the driver can read it. Therefore, we should use the error cause data that is extracted by the FW.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
adda800c | 03-May-2023 |
Oded Gabbay <ogabbay@kernel.org> |
accel/habanalabs: print max timeout value on CS stuck
If a workload got stuck, we print an error to the kernel log about it. Add to that print the configured max timeout value, as that value is not fixed between ASICs and in addition it can be configured using a kernel module parameter.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Ofir Bitton <obitton@habana.ai>
|