#
7f6f26d7 |
| 16-Apr-2023 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-next into drm-misc-next
Backmerging drm-next to sync with msm tree. Resolves a conflict between aperture-helper changes and msm's use of those interfaces.
Signed-off-by: Thomas Zimmer
Merge drm/drm-next into drm-misc-next
Backmerging drm-next to sync with msm tree. Resolves a conflict between aperture-helper changes and msm's use of those interfaces.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
show more ...
|
#
ea68a3e9 |
| 11-Apr-2023 |
Joonas Lahtinen <joonas.lahtinen@linux.intel.com> |
Merge drm/drm-next into drm-intel-gt-next
Need to pull in commit from drm-next (earlier in drm-intel-next):
1eca0778f4b3 ("drm/i915: add struct i915_dsm to wrap dsm members together")
In order to
Merge drm/drm-next into drm-intel-gt-next
Need to pull in commit from drm-next (earlier in drm-intel-next):
1eca0778f4b3 ("drm/i915: add struct i915_dsm to wrap dsm members together")
In order to merge following patch to drm-intel-gt-next:
https://patchwork.freedesktop.org/patch/530942/?series=114925&rev=6
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
show more ...
|
#
838ac90d |
| 11-Apr-2023 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
Merge tag 'drm-habanalabs-next-2023-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into drm-next
This tag contains additional habanalabs driver changes for v6.4:
- uAPI cha
Merge tag 'drm-habanalabs-next-2023-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into drm-next
This tag contains additional habanalabs driver changes for v6.4:
- uAPI changes: - Add a definition of a new Gaudi2 server type. This is used by userspace to know what is the connectivity between the accelerators inside the server
- New features and improvements: - speedup h/w queues test in Gaudi2 to reduce device initialization times.
- Firmware related fixes: - Fixes to the handshake protocol during f/w initialization. - Sync f/w events interrupt in hard reset to avoid warning message. - Improvements to extraction of the firmware version.
- Misc bug fixes and code cleanups. Notable fixes are: - Multiple fixes for interrupt handling in Gaudi2. - Unmap mapped memory in case TLB invalidation fails.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Oded Gabbay <ogabbay@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230410124637.GA2441888@ogabbay-vm-u20.habana-labs.com
show more ...
|
#
802f25b6 |
| 21-Mar-2023 |
Tal Cohen <talcohen@habana.ai> |
accel/habanalabs: sync f/w events interrupt in hard reset
Receiving events from FW, while the device is in hard reset, causes a warning message in Driver log. The message may point to a problem in t
accel/habanalabs: sync f/w events interrupt in hard reset
Receiving events from FW, while the device is in hard reset, causes a warning message in Driver log. The message may point to a problem in the Driver or FW. But It also can appear as a result of events that have been sent from FW just before the hard reset. In order to avoid receiving events from FW while the device is in reset and is already in 'disabled' mode, sync the f/w events interrupt right before setting the device to 'disabled'.
Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
#
3a8d7c3a |
| 22-Mar-2023 |
Tal Cohen <talcohen@habana.ai> |
accel/habanalabs: send disable pci when compute ctx is active
Fix an issue in hard reset flow in which the driver didn't send a disable pci message if there was an active compute context. In hard re
accel/habanalabs: send disable pci when compute ctx is active
Fix an issue in hard reset flow in which the driver didn't send a disable pci message if there was an active compute context. In hard reset, disable pci message should be sent no matter if a compute context exists or not.
Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
show more ...
|
#
a855f710 |
| 21-Mar-2023 |
Tal Cohen <talcohen@habana.ai> |
accel/habanalabs: remove duplicated disable pci msg
The disable pci message is sent in reset device. It informs the FW not to raise more EQs. The Driver may ignore received EQs, when the device is i
accel/habanalabs: remove duplicated disable pci msg
The disable pci message is sent in reset device. It informs the FW not to raise more EQs. The Driver may ignore received EQs, when the device is in disabled mode. The duplication happens when hard reset is scheduled during compute reset and also performs 'escalate_reset_flow'.
Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
show more ...
|
#
248ed9e2 |
| 23-Mar-2023 |
Cai Huoqing <cai.huoqing@linux.dev> |
accel/habanalabs: Remove redundant pci_clear_master
Remove pci_clear_master to simplify the code, the bus-mastering is also cleared in do_pci_disable_device, like this: ./drivers/pci/pci.c:2197 stat
accel/habanalabs: Remove redundant pci_clear_master
Remove pci_clear_master to simplify the code, the bus-mastering is also cleared in do_pci_disable_device, like this: ./drivers/pci/pci.c:2197 static void do_pci_disable_device(struct pci_dev *dev) { u16 pci_command;
pci_read_config_word(dev, PCI_COMMAND, &pci_command); if (pci_command & PCI_COMMAND_MASTER) { pci_command &= ~PCI_COMMAND_MASTER; pci_write_config_word(dev, PCI_COMMAND, pci_command); }
pcibios_disable_device(dev); }. And dev->is_busmaster is set to 0 in pci_disable_device.
Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
#
8ba264f4 |
| 30-Mar-2023 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
Merge remote-tracking branch 'drm/drm-next' into drm-misc-next
Backmerge to get rc4.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
|
#
cecdd52a |
| 28-Mar-2023 |
Rodrigo Vivi <rodrigo.vivi@intel.com> |
Merge drm/drm-next into drm-intel-next
Catch up with 6.3-rc cycle...
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
d36d68fd |
| 21-Mar-2023 |
Dave Airlie <airlied@redhat.com> |
Merge tag 'drm-habanalabs-next-2023-03-20' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into drm-next
This tag contains habanalabs driver and accel changes for v6.4:
- uAPI chan
Merge tag 'drm-habanalabs-next-2023-03-20' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into drm-next
This tag contains habanalabs driver and accel changes for v6.4:
- uAPI changes:
- Add opcodes to the CS ioctl to allow user to stall/resume specific engines inside Gaudi2. This is to allow the user to perform power testing/measurements when training different topologies.
- Expose in the INFO ioctl the amount of device memory that the driver and f/w reserve for themselves.
- Expose in the INFO ioctl a bit-mask of the available rotator engines in Gaudi2. This is to align with other engines that are already exposed.
- Expose in the INFO ioctl the register's address of the f/w that should be used to trigger interrupts from within the user's code running in the compute engines.
- Add a critical-event bit in the eventfd bitmask so the user will know the event that was received was critical, and a reset will now occur
- Expose in the INFO ioctl two new opcodes to fetch information on h/w and f/w events. The events recorded are the events that were reported in the eventfd.
- New features and improvements:
- Add a dedicated interrupt ID in MSI-X in the device to the notification of an unexpected user-related event in Gaudi2. Handle it in the driver by reporting this event.
- Allow the user to fetch the device memory current usage even when the device is undergoing compute-reset (a reset type that only clears the compute engines).
- Enable graceful reset mechanism for compute-reset. This will give the user a few seconds before the device is reset. For example, the user can, during that time, perform certain device operations (dump data for debug) or close the device in an orderly fashion.
- Align the decoder with the rest of the engines in regard to notification to the user about interrupts and in regard to performing graceful reset when needed (instead of immediate reset).
- Add support for assert interrupt from the TPC engine.
- Get the reset type that is necessary to perform per event from the auto-generated irq_map array.
- Print the specific reason why a device is still in use when notifying to the user about it (after the user closed the device's FD).
- Move to threaded IRQ when handling interrupts of workload completions.
- Firmware related fixes:
- Fix RAZWI event handler to match newest f/w version.
- Read error cause register in dma core events because the f/w doesn't do that.
- Increase maximum time to wait for completion of Gaudi2 reset due to f/w bug.
- Align to the latest firmware specs.
- Enforce the release order of the compute device and dma-buf. i.e increment the device file refcount for any dma-buf that was exported for that device. This will make sure the compute device release function won't be called until the user closes all the FDs of the relevant dma-bufs. Without this change, closing the device's FD before/without closing the dma-buf's FD would always lead to hard-reset of the device.
- Fix a link in the drm documentation to correctly point to the accel section.
- Compilation warnings cleanups
- Misc bug fixes and code cleanups
Signed-off-by: Dave Airlie <airlied@redhat.com>
# -----BEGIN PGP SIGNATURE----- # # iQEzBAABCgAdFiEE7TEboABC71LctBLFZR1NuKta54AFAmQYfcAACgkQZR1NuKta # 54DB4Af/SuiHZkVXwr+yHPv9El726rz9ZQD7mQtzNmehWGonwAvz15yqocNMUSbF # JbqE/vrZjvbXrP1Uv5UrlRVdnFHSPV18VnHU4BMS/WOm19SsR6vZ0QOXOoa6/AUb # w+kF3D//DbFI4/mTGfpH5/pzwu51ti8aVktosPFlHIa8iI8CB4/4IV+ivQ8UW4oK # HyDRkIvHdRmER7vGOfhwhsr4zdqSlJBYrv3C3Z1dkSYBPW/5ICbiM1UlKycwdYKI # cajQBSdUQwUCWnI+i8RmSy3kjNO6OE4XRUvTv89F2bQeyK/1rJLG2m2xZR/Ml/o5 # 7Cgvbn0hWZyeqe7OObYiBlSOBSehCA== # =wclm # -----END PGP SIGNATURE----- # gpg: Signature made Tue 21 Mar 2023 01:37:36 AEST # gpg: using RSA key ED311BA00042EF52DCB412C5651D4DB8AB5AE780 # gpg: Can't check signature: No public key From: Oded Gabbay <ogabbay@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230320154026.GA766126@ogabbay-vm-u20.habana-labs.com
show more ...
|
#
e752ab11 |
| 20-Mar-2023 |
Rob Clark <robdclark@chromium.org> |
Merge remote-tracking branch 'drm/drm-next' into msm-next
Merge drm-next into msm-next to pick up external clk and PM dependencies for improved a6xx GPU reset sequence.
Signed-off-by: Rob Clark <ro
Merge remote-tracking branch 'drm/drm-next' into msm-next
Merge drm-next into msm-next to pick up external clk and PM dependencies for improved a6xx GPU reset sequence.
Signed-off-by: Rob Clark <robdclark@chromium.org>
show more ...
|
#
d26a3a6c |
| 17-Mar-2023 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v6.3-rc2' into next
Merge with mainline to get of_property_present() and other newer APIs.
|
#
2e8e9a89 |
| 01-Mar-2023 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: postpone mem_mgr IDR destruction to hpriv_release()
The memory manager IDR is currently destroyed when user releases the file descriptor. However, at this point the user context mi
accel/habanalabs: postpone mem_mgr IDR destruction to hpriv_release()
The memory manager IDR is currently destroyed when user releases the file descriptor. However, at this point the user context might be still held, and memory buffers might be still in use. Later on, calls to release those buffers will fail due to not finding their handles in the IDR, leading to a memory leak. To avoid this leak, split the IDR destruction from the memory manager fini, and postpone it to hpriv_release() when there is no user context and no buffers are used.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
#
28fbc058 |
| 17-Feb-2023 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: use scnprintf() in print_device_in_use_info()
compose_device_in_use_info() was added to handle the snprintf() return value in a single place. However, the buffer size in print_devi
accel/habanalabs: use scnprintf() in print_device_in_use_info()
compose_device_in_use_info() was added to handle the snprintf() return value in a single place. However, the buffer size in print_device_in_use_info() is set such that it would be enough for the max possible print, so compose_device_in_use_info() is not really needed. Moreover, scnprintf() can be used instead of snprintf(), to save the check if the return value larger than the given size.
Cc: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
show more ...
|
#
86b74d84 |
| 15-Feb-2023 |
Dafna Hirschfeld <dhirschfeld@habana.ai> |
accel/habanalabs: assert return value of hw_fini
Since hw_fini return error code for failure indication, we should check its return value. Currently it might only fail upon soft-reset from hl_device
accel/habanalabs: assert return value of hw_fini
Since hw_fini return error code for failure indication, we should check its return value. Currently it might only fail upon soft-reset from hl_device_reset. Later patch will add hw_fini failure in case of polling timeout in hard-reset.
Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
#
efbd36b2 |
| 19-Feb-2023 |
Sagiv Ozeri <sozeri@habana.ai> |
accel/habanalabs: add device id to all threads names
Compute driver threads names will start with hlX-*, when X is the device id. This will help distinguish them from the NIC thread names.
Signed-o
accel/habanalabs: add device id to all threads names
Compute driver threads names will start with hlX-*, when X is the device id. This will help distinguish them from the NIC thread names.
Signed-off-by: Sagiv Ozeri <sozeri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
#
a8c14f53 |
| 12-Feb-2023 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: improve readability of engines idle mask print
Remove leading zeroes when printing the idle mask to make it clearer.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ode
accel/habanalabs: improve readability of engines idle mask print
Remove leading zeroes when printing the idle mask to make it clearer.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
show more ...
|
#
3a621af6 |
| 13-Feb-2023 |
Tom Rix <trix@redhat.com> |
accel/habanalabs: set hl_capture_*_err storage-class-specifier to static
smatch reports drivers/accel/habanalabs/common/device.c:2619:6: warning: symbol 'hl_capture_hw_err' was not declared. Shoul
accel/habanalabs: set hl_capture_*_err storage-class-specifier to static
smatch reports drivers/accel/habanalabs/common/device.c:2619:6: warning: symbol 'hl_capture_hw_err' was not declared. Should it be static? drivers/accel/habanalabs/common/device.c:2641:6: warning: symbol 'hl_capture_fw_err' was not declared. Should it be static?
both are only used in device.c, so they should be static
Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
#
4a2e9d11 |
| 01-Feb-2023 |
Dafna Hirschfeld <dhirschfeld@habana.ai> |
accel/habanalabs: don't trace cpu accessible dma alloc/free
The cpu accessible dma allocations use the gen_pool api which actually does not allocate new memory from the system but manages memory alr
accel/habanalabs: don't trace cpu accessible dma alloc/free
The cpu accessible dma allocations use the gen_pool api which actually does not allocate new memory from the system but manages memory already allocated before. When tracing this together with real dma allocation/free it cause confusing logs like a '0' dma address and a cpu address appearing twice etc.
Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
show more ...
|
#
1d0f9ad7 |
| 08-Feb-2023 |
Dafna Hirschfeld <dhirschfeld@habana.ai> |
accel/habanalabs: in hl_device_reset small refactor for readabilty
in the out_err flow, combine the two cases of soft-reset since they have mostly common code. In addition unlock reset_info.lock aft
accel/habanalabs: in hl_device_reset small refactor for readabilty
in the out_err flow, combine the two cases of soft-reset since they have mostly common code. In addition unlock reset_info.lock after touching reset count.
Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
show more ...
|
#
39ab4da9 |
| 08-Feb-2023 |
Dafna Hirschfeld <dhirschfeld@habana.ai> |
accel/habanalabs: in hl_device_reset remove 'hard_instead_of_soft'
Because this field is only used for debug print, we can do more precise debug directly instead.
Signed-off-by: Dafna Hirschfeld <d
accel/habanalabs: in hl_device_reset remove 'hard_instead_of_soft'
Because this field is only used for debug print, we can do more precise debug directly instead.
Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
show more ...
|
#
7810c524 |
| 08-Feb-2023 |
Dafna Hirschfeld <dhirschfeld@habana.ai> |
accel/habanalabs: tiny refactor of hl_device_reset for readability
Align assignment of reset_upon_device_release to the convention used in this function.
Signed-off-by: Dafna Hirschfeld <dhirschfel
accel/habanalabs: tiny refactor of hl_device_reset for readability
Align assignment of reset_upon_device_release to the convention used in this function.
Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
show more ...
|
#
18d13584 |
| 25-Jan-2023 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: enable graceful reset mechanism for compute-reset
The graceful reset mechanism is currently enabled only for reset requests that will end up with hard-reset. In future, reset reque
accel/habanalabs: enable graceful reset mechanism for compute-reset
The graceful reset mechanism is currently enabled only for reset requests that will end up with hard-reset. In future, reset requests due to errors in some device engines, are going to be modified to request compute-reset, as the much longer hard-reset is not really needed there. To allow it, enable graceful reset also for compute-reset, and reset after user releases the device won't be escalated to hard-reset in those cases. If watchdog expires and user didn't release the device, hard-reset will be initiated in any case.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
show more ...
|
#
57479adb |
| 29-Jan-2023 |
Koby Elbaz <kelbaz@habana.ai> |
accel/habanalabs: disable PCI when escalating compute to hard-reset
In case a compute reset has failed or a request for a hard reset has just arrived, then we escalate current reset procedure from c
accel/habanalabs: disable PCI when escalating compute to hard-reset
In case a compute reset has failed or a request for a hard reset has just arrived, then we escalate current reset procedure from compute to hard-reset. In such a case, the FW should be aware of the updated error cause, and if LKD is the one who performs the reset (rather than the FW), then we ask the FW to disable PCI access.
We would also like to have relevant debug info and therefore we print the currently escalating reset type.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
show more ...
|
#
313e9f63 |
| 10-Jan-2023 |
Moti Haimovski <mhaimovski@habana.ai> |
accel/habanalabs: add critical-event bit in notifier
Enhance the existing user notifications by adding a HW and FW critical event bits to be used when a HW or FW event occur that requires both SW ab
accel/habanalabs: add critical-event bit in notifier
Enhance the existing user notifications by adding a HW and FW critical event bits to be used when a HW or FW event occur that requires both SW abort and hard-resetting the chip.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
show more ...
|