"0cdea445" (history) in projects: openbmc

Searched hist:"0 cdea445" (Results 1 – 2 of 2) sorted by relevance

/openbmc/linux/include/drm/
H A D	drm_mm.h	0cdea445 Mon May 04 10:40:35 CDT 2020 Nirmoy Das <nirmoy.das@amd.com> drm/mm: optimize rb_hole_addr rbtree search Userspace can severely fragment rb_hole_addr rbtree by manipulating alignment while allocating buffers. Fragmented rb_hole_addr rbtree would result in large delays while allocating buffer object for a userspace application. It takes long time to find suitable hole because if we fail to find a suitable hole in the first attempt then we look for neighbouring nodes using rb_prev()/rb_next(). Traversing rbtree using rb_prev()/rb_next() can take really long time if the tree is fragmented. This patch improves searches in fragmented rb_hole_addr rbtree by modifying it to an augmented rbtree which will store an extra field in drm_mm_node, subtree_max_hole. Each drm_mm_node now stores maximum hole size for its subtree in drm_mm_node->subtree_max_hole. Using drm_mm_node->subtree_max_hole, it is possible to eliminate a complete subtree if that subtree is unable to serve a request hence reducing number of rb_prev()/rb_next() used. With this patch applied, 1 million bo allocs on amdgpu took ~8 sec, compared to 50k bo allocs which took 28 sec without it. partial test code: int test_fragmentation(void) { int i = 0; uint32_t minor_version; uint32_t major_version; struct amdgpu_bo_alloc_request request = {}; amdgpu_bo_handle vram_handle[MAX_ALLOC] = {}; amdgpu_device_handle device_handle; request.alloc_size = 4096; request.phys_alignment = 8192; request.preferred_heap = AMDGPU_GEM_DOMAIN_VRAM; int fd = open("/dev/dri/card0", O_RDWR \| O_CLOEXEC); amdgpu_device_initialize(fd, &major_version, &minor_version, &device_handle); for (i = 0; i < MAX_ALLOC; i++) { amdgpu_bo_alloc(device_handle, &request, &vram_handle[i]); } for (i = 0; i < MAX_ALLOC; i++) amdgpu_bo_free(vram_handle[i]); return 0; } v2: Use RB_DECLARE_CALLBACKS_MAX to maintain subtree_max_hole v3: insert_hole_addr() should be static a function fix return value of next_hole_high_addr()/next_hole_low_addr() Reported-by: kbuild test robot <lkp@intel.com> v4: Fix commit message. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/364341/ Signed-off-by: Christian König <christian.koenig@amd.com> 0cdea445 Mon May 04 10:40:35 CDT 2020 Nirmoy Das <nirmoy.das@amd.com> drm/mm: optimize rb_hole_addr rbtree search Userspace can severely fragment rb_hole_addr rbtree by manipulating alignment while allocating buffers. Fragmented rb_hole_addr rbtree would result in large delays while allocating buffer object for a userspace application. It takes long time to find suitable hole because if we fail to find a suitable hole in the first attempt then we look for neighbouring nodes using rb_prev()/rb_next(). Traversing rbtree using rb_prev()/rb_next() can take really long time if the tree is fragmented. This patch improves searches in fragmented rb_hole_addr rbtree by modifying it to an augmented rbtree which will store an extra field in drm_mm_node, subtree_max_hole. Each drm_mm_node now stores maximum hole size for its subtree in drm_mm_node->subtree_max_hole. Using drm_mm_node->subtree_max_hole, it is possible to eliminate a complete subtree if that subtree is unable to serve a request hence reducing number of rb_prev()/rb_next() used. With this patch applied, 1 million bo allocs on amdgpu took ~8 sec, compared to 50k bo allocs which took 28 sec without it. partial test code: int test_fragmentation(void) { int i = 0; uint32_t minor_version; uint32_t major_version; struct amdgpu_bo_alloc_request request = {}; amdgpu_bo_handle vram_handle[MAX_ALLOC] = {}; amdgpu_device_handle device_handle; request.alloc_size = 4096; request.phys_alignment = 8192; request.preferred_heap = AMDGPU_GEM_DOMAIN_VRAM; int fd = open("/dev/dri/card0", O_RDWR \| O_CLOEXEC); amdgpu_device_initialize(fd, &major_version, &minor_version, &device_handle); for (i = 0; i < MAX_ALLOC; i++) { amdgpu_bo_alloc(device_handle, &request, &vram_handle[i]); } for (i = 0; i < MAX_ALLOC; i++) amdgpu_bo_free(vram_handle[i]); return 0; } v2: Use RB_DECLARE_CALLBACKS_MAX to maintain subtree_max_hole v3: insert_hole_addr() should be static a function fix return value of next_hole_high_addr()/next_hole_low_addr() Reported-by: kbuild test robot <lkp@intel.com> v4: Fix commit message. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/364341/ Signed-off-by: Christian König <christian.koenig@amd.com>
/openbmc/linux/drivers/gpu/drm/
H A D	drm_mm.c	0cdea445 Mon May 04 10:40:35 CDT 2020 Nirmoy Das <nirmoy.das@amd.com> drm/mm: optimize rb_hole_addr rbtree search Userspace can severely fragment rb_hole_addr rbtree by manipulating alignment while allocating buffers. Fragmented rb_hole_addr rbtree would result in large delays while allocating buffer object for a userspace application. It takes long time to find suitable hole because if we fail to find a suitable hole in the first attempt then we look for neighbouring nodes using rb_prev()/rb_next(). Traversing rbtree using rb_prev()/rb_next() can take really long time if the tree is fragmented. This patch improves searches in fragmented rb_hole_addr rbtree by modifying it to an augmented rbtree which will store an extra field in drm_mm_node, subtree_max_hole. Each drm_mm_node now stores maximum hole size for its subtree in drm_mm_node->subtree_max_hole. Using drm_mm_node->subtree_max_hole, it is possible to eliminate a complete subtree if that subtree is unable to serve a request hence reducing number of rb_prev()/rb_next() used. With this patch applied, 1 million bo allocs on amdgpu took ~8 sec, compared to 50k bo allocs which took 28 sec without it. partial test code: int test_fragmentation(void) { int i = 0; uint32_t minor_version; uint32_t major_version; struct amdgpu_bo_alloc_request request = {}; amdgpu_bo_handle vram_handle[MAX_ALLOC] = {}; amdgpu_device_handle device_handle; request.alloc_size = 4096; request.phys_alignment = 8192; request.preferred_heap = AMDGPU_GEM_DOMAIN_VRAM; int fd = open("/dev/dri/card0", O_RDWR \| O_CLOEXEC); amdgpu_device_initialize(fd, &major_version, &minor_version, &device_handle); for (i = 0; i < MAX_ALLOC; i++) { amdgpu_bo_alloc(device_handle, &request, &vram_handle[i]); } for (i = 0; i < MAX_ALLOC; i++) amdgpu_bo_free(vram_handle[i]); return 0; } v2: Use RB_DECLARE_CALLBACKS_MAX to maintain subtree_max_hole v3: insert_hole_addr() should be static a function fix return value of next_hole_high_addr()/next_hole_low_addr() Reported-by: kbuild test robot <lkp@intel.com> v4: Fix commit message. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/364341/ Signed-off-by: Christian König <christian.koenig@amd.com> 0cdea445 Mon May 04 10:40:35 CDT 2020 Nirmoy Das <nirmoy.das@amd.com> drm/mm: optimize rb_hole_addr rbtree search Userspace can severely fragment rb_hole_addr rbtree by manipulating alignment while allocating buffers. Fragmented rb_hole_addr rbtree would result in large delays while allocating buffer object for a userspace application. It takes long time to find suitable hole because if we fail to find a suitable hole in the first attempt then we look for neighbouring nodes using rb_prev()/rb_next(). Traversing rbtree using rb_prev()/rb_next() can take really long time if the tree is fragmented. This patch improves searches in fragmented rb_hole_addr rbtree by modifying it to an augmented rbtree which will store an extra field in drm_mm_node, subtree_max_hole. Each drm_mm_node now stores maximum hole size for its subtree in drm_mm_node->subtree_max_hole. Using drm_mm_node->subtree_max_hole, it is possible to eliminate a complete subtree if that subtree is unable to serve a request hence reducing number of rb_prev()/rb_next() used. With this patch applied, 1 million bo allocs on amdgpu took ~8 sec, compared to 50k bo allocs which took 28 sec without it. partial test code: int test_fragmentation(void) { int i = 0; uint32_t minor_version; uint32_t major_version; struct amdgpu_bo_alloc_request request = {}; amdgpu_bo_handle vram_handle[MAX_ALLOC] = {}; amdgpu_device_handle device_handle; request.alloc_size = 4096; request.phys_alignment = 8192; request.preferred_heap = AMDGPU_GEM_DOMAIN_VRAM; int fd = open("/dev/dri/card0", O_RDWR \| O_CLOEXEC); amdgpu_device_initialize(fd, &major_version, &minor_version, &device_handle); for (i = 0; i < MAX_ALLOC; i++) { amdgpu_bo_alloc(device_handle, &request, &vram_handle[i]); } for (i = 0; i < MAX_ALLOC; i++) amdgpu_bo_free(vram_handle[i]); return 0; } v2: Use RB_DECLARE_CALLBACKS_MAX to maintain subtree_max_hole v3: insert_hole_addr() should be static a function fix return value of next_hole_high_addr()/next_hole_low_addr() Reported-by: kbuild test robot <lkp@intel.com> v4: Fix commit message. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/364341/ Signed-off-by: Christian König <christian.koenig@amd.com>

Project(s)

Full Search
Definition
Symbol
File Path
History
Type