#
d43bce6e |
| 18-Jan-2023 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: add info when FD released while device still in use
When user closes the device file descriptor, it is checked whether the device is still in use, and a message is printed if it is
accel/habanalabs: add info when FD released while device still in use
When user closes the device file descriptor, it is checked whether the device is still in use, and a message is printed if it is. To make this message more informative, add to this print also the reason due to which the device is considered as in use. The possible reasons which are checked for now are active CS and exported dma-buf.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
show more ...
|
#
323adae9 |
| 22-Jan-2023 |
Oded Gabbay <ogabbay@kernel.org> |
accel/habanalabs: save class in hdev
It is more concise than to pass it to device init. Once we will add the accel class, then we won't need to change the function signatures.
Signed-off-by: Oded G
accel/habanalabs: save class in hdev
It is more concise than to pass it to device init. Once we will add the accel class, then we won't need to change the function signatures.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
show more ...
|
#
89859a89 |
| 22-Jan-2023 |
Oded Gabbay <ogabbay@kernel.org> |
accel/habanalabs: split cdev creation to separate function
Move the cdev creation code from the main hdev init function to a separate function. This will make the code more readable once we add the
accel/habanalabs: split cdev creation to separate function
Move the cdev creation code from the main hdev init function to a separate function. This will make the code more readable once we add the accel registration code (instead/in addition to legacy cdev).
Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
show more ...
|
#
b3c9a041 |
| 13-Mar-2023 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-fixes into drm-misc-fixes
Backmerging to get latest upstream.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
#
b8fa3e38 |
| 10-Mar-2023 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
Merge remote-tracking branch 'acme/perf-tools' into perf-tools-next
To pick up perf-tools fixes just merged upstream.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
a5c95ca1 |
| 22-Feb-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'drm-next-2023-02-23' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie: "There are a bunch of changes all over in the usual places.
Highlights:
- habanala
Merge tag 'drm-next-2023-02-23' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie: "There are a bunch of changes all over in the usual places.
Highlights:
- habanalabs moves from misc to accel
- first accel driver for Intel VPU (Versatile Processing Unit) inference engine
- dropped all the ancient legacy DRI1 drivers. I think it's been at least 10 years since anyone has heard about these.
- Intel DG2 updates and prelim Meteorlake enablement
- etnaviv adds support for Versilicon NPU device (a GPU like engine with inference accelerators)
Detailed summary:
Removals: - remove legacy dri1 drivers: i810, mga, r128, savage, sis, tdfx, via
New driver: - intel VPU accelerator driver - habanalabs comes via drm tree now
drm/core: - use drm_dbg_ helpers in several places - Document defaults for CRTC backgrounds - Document use of drm_minor
edid: - improve mode parsing and refactoring
connector: - support analog TV mode property
media: - add some common formats
udmabuf: - add vmap/vunmap methods
fourcc: - add XRGB1555 and RGB565 formats - document open source user waiver
firmware: - fix color-format selection for system framebuffer
format-helper: - Add conversion from XRGB8888 to various sysfb formats - Make XRGB8888 the only driver-emulated legacy format - Add conversion from XRGB8888 to XBGR8888 and ABGR8888
fb-helper: - fix preferred depth and bpp values across drivers - Avoid blank consoles from selecting an incorrect color format
probe-helper: - Enable/disable HPD on connectors
scheduler: - Fix lockup in drm_sched_entity_kill() - Deprecate drm_sched_resubmit_jobs()
bridge: - remove unused functions - implement i2c probe_new in various drivers - ite-it6505: Locking fixes, Cache EDID data - ite-it66121: Support IT6610 chip - lontium-tl9611: Fix HDMI on DragonBoard 845c - parade-ps8640: Use atomic bridge functions - Support i.MX93 LDB plus DT bindings
debugfs: - add per device helpers and convert drivers
displayport: - mst fixes - add DP adaptive sync DPCD definitions
fbdev: - always pick 32bpp as default - remove some unused code
simpledrm: - support system memory framebuffers
panel: - add orientation quirks for Lenovo Yoga Tab 3 X90F and DynaBook K50 - Use ktime_get_boottime() to measure power-down delay - Fix auto-suspend delay - Visionox VTDR6130 AMOLED DSI - Support Himax HX8394 - Convert many drivers to common generic DSI write-sequence helper - AUO A030JTN01
ttm: - drop bo wait wrapper - fix MIPS build
habanalabs: - moved driver to accel subsystem - gaudi2 decoder error improvement - more trace events - Gaudi2 abrupt reset by firmware support - add uAPI to flush memory transactions - add uAPI to pass through userspace reqs to fw - remove dma-buf export by handle
amdgpu: - add new INFO queries for peak and min sclk/mclk for profile modes - Add PCIe info to the INFO IOCTL - secure display support for multiple displays - DML optimizations - DCN 3.2 updates - PSR updates - DP 2.1 updates - SR-IOV RAS updates - VCN RAS support - SMU 13.x updates - Switch 1 element arrays to flexible arrays - Add RAS support for DF 4.3 - Stack size improvements - S0ix rework - Allow 0 as a vram limit on APUs - Handle profiling modes for SMU13.x - Fix possible segfault in failure case - Rework FW requests to happen in early_init for all IPs so that we don't lose the sbios console if FW is missing - Fix power reporting on certain firmwares for CZN/RN - Allow S0ix without BIOS support - Enable freesync over PCon - Re-enable the AGP aperture on GMC 11.x
amdkfd: - Error handling fixes - PASID fixes - Fix for cleared VRAM BOs - Fix cleanup if GPUVM creation fails - Memory accounting fix - Use resource_size rather than open codeing it - GC11 mGPU fix
radeon: - Switch 1 element arrays to flexible arrays - Fix memory leak on shutdown - move to new logging
i915: - Meteorlake display/OA/GSC fw/workarounds enabling - DP MST DSC support - Gamma/degamma readout support for the state checker - Enable SDP split support for DP 2.0 - Add probe blocking support to i915.force_probe parameter - Enable Xe HP 4tile support - Avoid display direct calls to uncore - Fix HuC delayed load memory leaks - Add DG2 workarounds Wa_18018764978 and Wa_18019271663 - Improve suspend / resume times with VT-d scanout workaround active - Fix DG2 visual corruption on small BAR systems by not forgetting to copy CCS aux state - Fix TLB invalidation for Gen12.50 video and compute engines - Enable HF-EEODB by switching HDMI, DP and LVDS to use struct drm_edid - Start using unversioned DMC firmware paths for new platforms - ELD refactor: Stop using hardware buffer, precompute ELD - lots of display code refactoring
nouveau: - drop legacy ioctl support - replace 0-sized array
msm: - dpu/dsi/mdss: Support for SM8350, SM8450 SM8550 and SC8280XP platform - Added bindings for SM8150 - dpu: Partial support for DSC on SM8150 and SM8250 - dpu: Fixed color transformation matrix being lost on suspend/resume - dp: Support SDM845 and SC8280XP platforms - dp: Support for limiting DP link rate via DT property - dsi: Validate display modes according to the DSI OPP table - dsi: DSI PHY support for the SM6375 platform - Add MSM_SUBMIT_BO_NO_IMPLICI - a2xx: Support to load legacy firmware - a6xx: GPU devcore dump updates for a650/a660 - GPU devfreq tuning and fixes - Turn 8960 HDMI PHY into clock provider, - Make 8960 HDMI PHY use PXO clock from DT
etnaviv: - experimental versilicon NPU support - report GPU load via fdinfo format - MMU fault message improvements
tegra: - rework syncpoint interrupt
mediatek: - DSI timing fix - fix config deps
ast: - various fixes
exynos: - restore bridge chain order fixes
gud: - convert to shadow plane buffers - perform flushing synchronously during atomic update - Use new debugfs helpers
arm/hdlcd: - Use new debugfs helper
ili9486: - Support 16-bit pixel data
imx: - Split off IPUv3 driver
mipi-dbi: - convert to DRM shadow-plane helpers - rsp driver changes - Support separate I/O-voltage supply
mxsfb: - Depend on ARCH_MXS or ARCH_MXC
sun4i: - convert to new TV mode property
vc4: - convert to new TV mode property - kunit tests - Support RGB565 and RGB666 formats - convert dsi driver to bridge - Various HVS an CRTC fixes
v3d: - Do not opencode drm_gem_object_lookup()
virtio: - improve tracing
vkms: - support small cursors in IGT tests - Fix SEGFAULT from incorrect GEM-buffer mapping
rcar-du: - fixes and improvements"
* tag 'drm-next-2023-02-23' of git://anongit.freedesktop.org/drm/drm: (1455 commits) msm/fbdev: fix unused variable warning with clang. drm/fb-helper: Remove drm_fb_helper_unprepare() from drm_fb_helper_fini() dma-buf: make kobj_type structure constant drm/shmem-helper: Fix locking for drm_gem_shmem_get_pages_sgt() drm/amd/display: disable SubVP + DRR to prevent underflow drm/amd/display: Fail atomic_check early on normalize_zpos error drm/amd/pm: avoid unaligned access warnings drm/amd/display: avoid unaligned access warnings drm/amd/display: Remove duplicate/repeating expressions drm/amd/display: Remove duplicate/repeating expression drm/amd/display: Make variables declaration inside ifdef guard drm/amd/display: Fix excess arguments on kernel-doc drm/amd/display: Add previously missing includes drm/amd/amdgpu: Add function prototypes to headers drm/amd/display: Add function prototypes to headers drm/amd/display: Turn global functions into static drm/amd/display: remove unused _calculate_degamma_curve function drm/amd/display: remove unused func declaration from resource headers drm/amd/display: unset initial value for tf since it's never used drm/amd/display: camel case cleanup in color_gamma file ...
show more ...
|
#
df5bf3b9 |
| 31-Jan-2023 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-next into drm-misc-next
Backmerging to get v6.2-rc6.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
#
729b3c15 |
| 29-Jan-2023 |
Dave Airlie <airlied@redhat.com> |
Merge tag 'drm-habanalabs-next-2023-01-26' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into drm-next
This tag contains habanalabs driver and accel changes for v6.3:
- Moved the
Merge tag 'drm-habanalabs-next-2023-01-26' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into drm-next
This tag contains habanalabs driver and accel changes for v6.3:
- Moved the driver to the accel subsystem. Currently only the files were moved (including the uapi file which was also renamed). This doesn't include registering to the accel subsystem. This will probably be only in the next kernel version.
- In case of decoder error (axi error) in Gaudi2, we can now find the exact IP that initiated the erroneous transaction and print the details for better debug.
- Add more trace events. We now can trace mmio transactions and communication with the preboot firmware.
- Add to Gaudi2 support for abrupt reset that is done by the firmware. This was support so far only for Gaudi1.
- Add uAPI to flush memory transactions (to the device memory). This is needed by the communications library in case of doing p2p with a host NIC which access our HBM directly through the PCI BAR.
- Add uAPI to pass-through a request from user-space to firmware and get the result back to user-space. This will allow the driver code to avoid the need to add new packet (in the communication channel with the firmware) for every new request type.
- Remove the option to export dma-buf by memory allocation handle in our uAPI. This was planned for Gaudi2 but was never used. Instead, we will do export by memory address (same as Gaudi1). In addition, we added the option to specify an offset to the address. This is needed in Gaudi2 because there the user allocates the entire HBM in one allocation, but would like to export only small part of it.
- Multiple bug fixes, refactors and small optimizations.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Oded Gabbay <ogabbay@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230126213317.GA1520525@ogabbay-vm-u20.habana-labs.com
show more ...
|
#
44155bb6 |
| 17-Jan-2023 |
Tomer Tayar <ttayar@habana.ai> |
habanalabs: clear in_compute_reset when escalating to hard reset
If resetting device upon release while the release watchdog work is scheduled, the compute reset is replaced with hard reset. In this
habanalabs: clear in_compute_reset when escalating to hard reset
If resetting device upon release while the release watchdog work is scheduled, the compute reset is replaced with hard reset. In this case, need to clear the in_compute_reset indication in the device reset information structure.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
#
0c93eb09 |
| 17-Jan-2023 |
Tomer Tayar <ttayar@habana.ai> |
habanalabs: run error handling if scrub_device_mem fails after reset
If device memory scrubbing from hl_device_reset() fails, we return with an error code but not perform error handling code.
Signe
habanalabs: run error handling if scrub_device_mem fails after reset
If device memory scrubbing from hl_device_reset() fails, we return with an error code but not perform error handling code.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
#
a6685b57 |
| 11-Jan-2023 |
Koby Elbaz <kelbaz@habana.ai> |
habanalabs: block soft-reset on an unusable device
A device with status malfunction indicates that it can't be used. In such a case we do not support certain reset types, e.g., all kinds of soft-res
habanalabs: block soft-reset on an unusable device
A device with status malfunction indicates that it can't be used. In such a case we do not support certain reset types, e.g., all kinds of soft-resets (compute reset, inference soft-reset), and reset upon device release.
A hard-reset is the only way that an unusable device can change its status. All other reset procedures can't put the device in a reset procedure, which might ultimately cause the device to change its status, unintentionally, to become operational again.
Such a scenario has recently occurred, when a user requested a hard-reset while another heavy user workload was ongoing (reset request is queued). Since the workload couldn't finish within reset's timeout limits, the reset has failed and set a device status malfunction. Eventually, when the user released the FD, an unsuccessful soft-reset occurred, hence followed by an additional hard-reset that changed the ASICs status back to be operational.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
#
2a0a839b |
| 29-Dec-2022 |
Moti Haimovski <mhaimovski@habana.ai> |
habanalabs: extend fatal messages to contain PCI info
This commit attaches the PCI device address to driver fatal messages in order to ease debugging in multi-device setups.
Signed-off-by: Moti Hai
habanalabs: extend fatal messages to contain PCI info
This commit attaches the PCI device address to driver fatal messages in order to ease debugging in multi-device setups.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
Revision tags: v6.0.11 |
|
#
54fcb384 |
| 30-Nov-2022 |
Ohad Sharabi <osharabi@habana.ai> |
habanalabs: trace LBW reads/writes
Add traces to LBW reads/writes. This may be handy when debugging configuration failure or events when tracking configuration flow.
Signed-off-by: Ohad Sharabi <os
habanalabs: trace LBW reads/writes
Add traces to LBW reads/writes. This may be handy when debugging configuration failure or events when tracking configuration flow.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
#
571d1a72 |
| 23-Dec-2022 |
Koby Elbaz <kelbaz@habana.ai> |
habanalabs: protect access to dynamic mem 'user_mappings'
When HL_INFO_USER_MAPPINGS IOCTL is called, we copy_to_user from a dynamically allocated memory - 'user_mappings'. Since freeing/allocating
habanalabs: protect access to dynamic mem 'user_mappings'
When HL_INFO_USER_MAPPINGS IOCTL is called, we copy_to_user from a dynamically allocated memory - 'user_mappings'. Since freeing/allocating it happens in runtime (upon a page fault), it not unlikely to access it even before being initially allocated (i.e., accessing a NULL pointer).
The solution is to simply mark the spot when the err info has been collected, and that way to know whether err info (either page fault or RAZWI) is available to be read.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
#
78baccbd |
| 25-Dec-2022 |
Koby Elbaz <kelbaz@habana.ai> |
habanalabs: refactor razwi/page-fault information structures
This refactor makes the code clearer and the new variables' names better describe their roles.
Signed-off-by: Koby Elbaz <kelbaz@habana.
habanalabs: refactor razwi/page-fault information structures
This refactor makes the code clearer and the new variables' names better describe their roles.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
#
e2a079a2 |
| 06-Dec-2022 |
Tomer Tayar <ttayar@habana.ai> |
habanalabs: verify that kernel CB is destroyed only once
Remove the distinction between user CB and kernel CB, and verify for both that they are not destroyed more than once.
As kernel CB might be
habanalabs: verify that kernel CB is destroyed only once
Remove the distinction between user CB and kernel CB, and verify for both that they are not destroyed more than once.
As kernel CB might be taken from the pre-allocated CB pool, so we need to clear the handle destroyed indication when returning a CB to the pool.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
#
e65e175b |
| 26-Dec-2022 |
Oded Gabbay <ogabbay@kernel.org> |
habanalabs: move driver to accel subsystem
Now that we have a subsystem for compute accelerators, move the habanalabs driver to it.
This patch only moves the files and fixes the Makefiles. Future p
habanalabs: move driver to accel subsystem
Now that we have a subsystem for compute accelerators, move the habanalabs driver to it.
This patch only moves the files and fixes the Makefiles. Future patches will change the existing code to register to the accel subsystem and expose the accel device char files instead of the habanalabs device char files.
Update the MAINTAINERS file to reflect this change.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|