History log of /openbmc/linux/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c (Results 26 – 50 of 101)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10, v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51
# 7ce29357 06-Jul-2020 James Zhu <James.Zhu@amd.com>

drm/amdgpu/nbio: add aldebaran support

Aldebaran has a new mmBIF_MMSCH1_DOORBELL_RANGE setting.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Al

drm/amdgpu/nbio: add aldebaran support

Aldebaran has a new mmBIF_MMSCH1_DOORBELL_RANGE setting.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43
# f8a98f16 25-May-2020 Hawking Zhang <Hawking.Zhang@amd.com>

drm/amdgpu: fix incorrect EP_STRAP reg offset for aldebaran

mmRCC_DEV0_EPF0_STRAP0 offset in aldebaran is changed
from arcturus

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Kev

drm/amdgpu: fix incorrect EP_STRAP reg offset for aldebaran

mmRCC_DEV0_EPF0_STRAP0 offset in aldebaran is changed
from arcturus

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.4.42, v5.4.41, v5.4.40, v5.4.39, v5.4.38, v5.4.37, v5.4.36, v5.4.35, v5.4.34, v5.4.33, v5.4.32, v5.4.31, v5.4.30, v5.4.29, v5.6, v5.4.28, v5.4.27, v5.4.26, v5.4.25, v5.4.24, v5.4.23, v5.4.22, v5.4.21, v5.4.20, v5.4.19, v5.4.18, v5.4.17, v5.4.16, v5.5, v5.4.15, v5.4.14, v5.4.13, v5.4.12, v5.4.11, v5.4.10, v5.4.9, v5.4.8, v5.4.7, v5.4.6, v5.4.5, v5.4.4, v5.4.3, v5.3.15, v5.4.2, v5.4.1, v5.3.14, v5.4, v5.3.13, v5.3.12
# 759eb38e 18-Nov-2019 Le Ma <le.ma@amd.com>

drm/amdgpu: correct mmBIF_SDMA4_DOORBELL_RANGE address for aldebaran

On aldebaran, mmBIF_SDMA4_DOORBELL_RANGE isn't right next to
mmBIF_SDMA3_DOORBELL_RANGE.

Signed-off-by: Le Ma <le.ma@amd.com>
Re

drm/amdgpu: correct mmBIF_SDMA4_DOORBELL_RANGE address for aldebaran

On aldebaran, mmBIF_SDMA4_DOORBELL_RANGE isn't right next to
mmBIF_SDMA3_DOORBELL_RANGE.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 9ca0674a 28-Dec-2020 Likun Gao <Likun.Gao@amd.com>

drm/amdgpu: remove redundant logic related HDP

Remove hdp_flush function from amdgpu_nbio struct as it have been unified
into hdp struct.
Remove the include about hdp register which was not used.
V2

drm/amdgpu: remove redundant logic related HDP

Remove hdp_flush function from amdgpu_nbio struct as it have been unified
into hdp struct.
Remove the include about hdp register which was not used.
V2: Remove hdp golden setting which is unnecessary.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# f75e94d8 04-Aug-2020 Guchun Chen <guchun.chen@amd.com>

drm/amdgpu: bypass querying ras error count registers

Once ras recovery is issued by ras sync flood interrupt or
ras controller interrupt, add this guard to bypass or execute
ras error count registe

drm/amdgpu: bypass querying ras error count registers

Once ras recovery is issued by ras sync flood interrupt or
ras controller interrupt, add this guard to bypass or execute
ras error count register harvest of all IPs.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 6952e99c 10-Apr-2020 Guchun Chen <guchun.chen@amd.com>

drm/amdgpu: refine ras related message print

Prefix ras related kernel message logging with PCI
device info by replacing DRM_INFO/WARN/ERROR with
dev_info/warn/err. This can clearly tell user about

drm/amdgpu: refine ras related message print

Prefix ras related kernel message logging with PCI
device info by replacing DRM_INFO/WARN/ERROR with
dev_info/warn/err. This can clearly tell user about
GPU device information where ras is. And add some
other ras message printing to make it more clear
and friendly as well.

Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# b635ae87 18-Mar-2020 Alex Sierra <alex.sierra@amd.com>

drm/amdgpu: ih doorbell size of range changed for nbio v7.4

[Why]
nbio v7.4 size of ih doorbell range is 64 bit. This requires 2 DWords per register.

[How]
Change ih doorbell size from 2 to 4. This

drm/amdgpu: ih doorbell size of range changed for nbio v7.4

[Why]
nbio v7.4 size of ih doorbell range is 64 bit. This requires 2 DWords per register.

[How]
Change ih doorbell size from 2 to 4. This means two Dwords per ring.
Current configuration uses two ih rings.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 3aa0115d 04-Mar-2020 Monk Liu <Monk.Liu@amd.com>

drm/amdgpu: cleanup all virtualization detection routine

we need to move virt detection much earlier because:
1) HW team confirms us that RCC_IOV_FUNC_IDENTIFIER will always
be at DE5 (dw) mmio offs

drm/amdgpu: cleanup all virtualization detection routine

we need to move virt detection much earlier because:
1) HW team confirms us that RCC_IOV_FUNC_IDENTIFIER will always
be at DE5 (dw) mmio offset from vega10, this way there is no
need to implement detect_hw_virt() routine in each nbio/chip file.
for VI SRIOV chip (tonga & fiji), the BIF_IOV_FUNC_IDENTIFIER is at
0x1503

2) we need to acknowledged we are SRIOV VF before we do IP discovery because
the IP discovery content will be updated by host everytime after it recieved
a new coming "REQ_GPU_INIT_DATA" request from guest (there will be patches
for this new handshake soon).

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 3cd4f618 13-Feb-2020 Guchun Chen <guchun.chen@amd.com>

drm/amdgpu: record non-zero error counter info in NBIO before resetting GPU

When NBIO's RAS error happens, before trigging GPU reset, it's needed
to record error counter information, which can corre

drm/amdgpu: record non-zero error counter info in NBIO before resetting GPU

When NBIO's RAS error happens, before trigging GPU reset, it's needed
to record error counter information, which can correct the error counter
value missed issue when reading from debugfs.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 8831fa6e 24-Dec-2019 Guchun Chen <guchun.chen@amd.com>

drm/amdgpu: simplify function return logic

Former return logic is redundant.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexand

drm/amdgpu: simplify function return logic

Former return logic is redundant.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 61934624 13-Dec-2019 Guchun Chen <guchun.chen@amd.com>

drm/amdgpu: drop useless BACO arg in amdgpu_ras_reset_gpu

BACO reset mode strategy is determined by latter func when
calling amdgpu_ras_reset_gpu. So not to confuse audience, drop
it.

Signed-off-by

drm/amdgpu: drop useless BACO arg in amdgpu_ras_reset_gpu

BACO reset mode strategy is determined by latter func when
calling amdgpu_ras_reset_gpu. So not to confuse audience, drop
it.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 5c39d600 22-Nov-2019 Le Ma <le.ma@amd.com>

drm/amdgpu: clear uncorrectable parity error status bit

This should be cleared during every nbif uncorrectable error cleanup work.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <H

drm/amdgpu: clear uncorrectable parity error status bit

This should be cleared during every nbif uncorrectable error cleanup work.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 28f87950 22-Nov-2019 Le Ma <le.ma@amd.com>

drm/amdgpu: clear ras controller status registers when interrupt occurs

To fix issue that ras controller interrupt cannot be triggered anymore after
one time nbif uncorrectable error. And error coun

drm/amdgpu: clear ras controller status registers when interrupt occurs

To fix issue that ras controller interrupt cannot be triggered anymore after
one time nbif uncorrectable error. And error count is stored in nbif ras object
for query.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.3.11, v5.3.10, v5.3.9, v5.3.8
# 4a2d9356 21-Oct-2019 Le Ma <Le.Ma@amd.com>

drm/amdgpu: remove ras global recovery handling from ras_controller_int handler

v2: add notification when ras controller interrupt generates

Signed-off-by: Le Ma <Le.Ma@amd.com>
Reviewed-by: Hawkin

drm/amdgpu: remove ras global recovery handling from ras_controller_int handler

v2: add notification when ras controller interrupt generates

Signed-off-by: Le Ma <Le.Ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.3.7, v5.3.6
# 956f6705 11-Oct-2019 Le Ma <le.ma@amd.com>

drm/amdgpu/soc15: disable doorbell interrupt as part of BACO entry sequence

Workaround to make RAS recovery work in BACO reset.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Alex Deucher <alexa

drm/amdgpu/soc15: disable doorbell interrupt as part of BACO entry sequence

Workaround to make RAS recovery work in BACO reset.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.3.5, v5.3.4, v5.3.3, v5.3.2, v5.3.1, v5.3
# 3e103fc3 10-Sep-2019 Kent Russell <kent.russell@amd.com>

Revert "drm/amdgpu/nbio7.4: add hw bug workaround for vega20"

This reverts commit e01f2d41895102d824c6b8f5e011dd5e286d5e8b.

VG20 did not require this workaround, as the fix is in the VBIOS.
Leave V

Revert "drm/amdgpu/nbio7.4: add hw bug workaround for vega20"

This reverts commit e01f2d41895102d824c6b8f5e011dd5e286d5e8b.

VG20 did not require this workaround, as the fix is in the VBIOS.
Leave VG10/12 workaround as some older shipped cards do not have the
VBIOS fix in place, and the kernel workaround is required in those
situations

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 1a3f2e8c 10-Sep-2019 Guchun Chen <guchun.chen@amd.com>

drm/amdgpu: implement ras query function for pcie bif

ras error query funtionality implementation

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Review

drm/amdgpu: implement ras query function for pcie bif

ras error query funtionality implementation

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.2.14, v5.3-rc8, v5.2.13, v5.2.12
# 52652ef2 04-Sep-2019 Guchun Chen <guchun.chen@amd.com>

drm/amdgpu: add ras error query count interface for nbio

Add the interface query_ras_error_count for nbio.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.co

drm/amdgpu: add ras error query count interface for nbio

Add the interface query_ras_error_count for nbio.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 1bd252c5 10-Sep-2019 Guchun Chen <guchun.chen@amd.com>

drm/amdgpu: remove duplicated header file include

amdgpu_ras.h is already included.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex De

drm/amdgpu: remove duplicated header file include

amdgpu_ras.h is already included.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 1c70d3d9 02-Sep-2019 Hawking Zhang <Hawking.Zhang@amd.com>

drm/amdgpu/nbio: switch to amdgpu_nbio_ras_late_init helper function

amdgpu_nbio_ras_late_init is used to init nbio specfic
ras debugfs/sysfs node and nbio specific interrupt handler.
It can be shar

drm/amdgpu/nbio: switch to amdgpu_nbio_ras_late_init helper function

amdgpu_nbio_ras_late_init is used to init nbio specfic
ras debugfs/sysfs node and nbio specific interrupt handler.
It can be shared among nbio generations

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# d094aea3 02-Sep-2019 Hawking Zhang <Hawking.Zhang@amd.com>

drm/amdgpu: set ip specific ras interface pointer to NULL after free it

to prevent access to dangling pointers

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@

drm/amdgpu: set ip specific ras interface pointer to NULL after free it

to prevent access to dangling pointers

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 7c6e68c7 13-Sep-2019 Andrey Grodzovsky <andrey.grodzovsky@amd.com>

drm/amdgpu: Avoid HW GPU reset for RAS.

Problem:
Under certain conditions, when some IP bocks take a RAS error,
we can get into a situation where a GPU reset is not possible
due to issues in RAS in

drm/amdgpu: Avoid HW GPU reset for RAS.

Problem:
Under certain conditions, when some IP bocks take a RAS error,
we can get into a situation where a GPU reset is not possible
due to issues in RAS in SMU/PSP.

Temporary fix until proper solution in PSP/SMU is ready:
When uncorrectable error happens the DF will unconditionally
broadcast error event packets to all its clients/slave upon
receiving fatal error event and freeze all its outbound queues,
err_event_athub interrupt will be triggered.
In such case and we use this interrupt
to issue GPU reset. THe GPU reset code is modified for such case to avoid HW
reset, only stops schedulers, deatches all in progress and not yet scheduled
job's fences, set error code on them and signals.
Also reject any new incoming job submissions from user space.
All this is done to notify the applications of the problem.

v2:
Extract amdgpu_amdkfd_pre/post_reset from amdgpu_device_lock/unlock_adev
Move amdgpu_job_stop_all_jobs_on_sched to amdgpu_job.c
Remove print param from amdgpu_ras_query_error_count

v3:
Update based on prevoius bug fixing patch to properly call amdgpu_amdkfd_pre_reset
for other XGMI hive memebers.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 9ad1dc29 29-Aug-2019 Hawking Zhang <Hawking.Zhang@amd.com>

drm/amdgpu: add ras_late_init callback function for nbio v7_4 (v3)

ras_late_init callback function will be used to do common ras
init in late init phase.

v2: call ras_late_fini to do cleanup when f

drm/amdgpu: add ras_late_init callback function for nbio v7_4 (v3)

ras_late_init callback function will be used to do common ras
init in late init phase.

v2: call ras_late_fini to do cleanup when fails to enable interrupt

v3: rename sysfs/debugfs node name to pcie_bif_xxx

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.2.11, v5.2.10, v5.2.9, v5.2.8, v5.2.7, v5.2.6, v5.2.5, v5.2.4, v5.2.3, v5.2.2, v5.2.1, v5.2, v5.1.16, v5.1.15, v5.1.14, v5.1.13, v5.1.12, v5.1.11, v5.1.10, v5.1.9, v5.1.8
# 4e644fff 05-Jun-2019 Hawking Zhang <Hawking.Zhang@amd.com>

drm/amdgpu: add ras_controller and err_event_athub interrupt support

Ras controller interrupt and Ras err event athub interrupt are two dedicated
interrupts for RAS support.

Signed-off-by: Hawking

drm/amdgpu: add ras_controller and err_event_athub interrupt support

Ras controller interrupt and Ras err event athub interrupt are two dedicated
interrupts for RAS support.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.1.7, v5.1.6
# 4241863a 29-May-2019 Hawking Zhang <Hawking.Zhang@amd.com>

drm/amdgpu/nbio: add functions to query ras specific interrupt status

ras_controller_interrupt and err_event_interrupt are ras specific interrupts.
add functions to check their status and ack them i

drm/amdgpu/nbio: add functions to query ras specific interrupt status

ras_controller_interrupt and err_event_interrupt are ras specific interrupts.
add functions to check their status and ack them if they are generated. both
funcitons should only be invoked in ISR when BIF ring is disabled or even not
initialized.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


12345