1979855d3SDonald Hunter.. SPDX-License-Identifier: GPL-2.0-only 2979855d3SDonald Hunter.. Copyright (C) 2022 Red Hat, Inc. 3*1a986518SJoe Stringer.. Copyright (C) 2022-2023 Isovalent, Inc. 4979855d3SDonald Hunter 5979855d3SDonald Hunter=============================================== 6979855d3SDonald HunterBPF_MAP_TYPE_HASH, with PERCPU and LRU Variants 7979855d3SDonald Hunter=============================================== 8979855d3SDonald Hunter 9979855d3SDonald Hunter.. note:: 10979855d3SDonald Hunter - ``BPF_MAP_TYPE_HASH`` was introduced in kernel version 3.19 11979855d3SDonald Hunter - ``BPF_MAP_TYPE_PERCPU_HASH`` was introduced in version 4.6 12979855d3SDonald Hunter - Both ``BPF_MAP_TYPE_LRU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH`` 13979855d3SDonald Hunter were introduced in version 4.10 14979855d3SDonald Hunter 15979855d3SDonald Hunter``BPF_MAP_TYPE_HASH`` and ``BPF_MAP_TYPE_PERCPU_HASH`` provide general 16979855d3SDonald Hunterpurpose hash map storage. Both the key and the value can be structs, 17979855d3SDonald Hunterallowing for composite keys and values. 18979855d3SDonald Hunter 19979855d3SDonald HunterThe kernel is responsible for allocating and freeing key/value pairs, up 20979855d3SDonald Hunterto the max_entries limit that you specify. Hash maps use pre-allocation 21979855d3SDonald Hunterof hash table elements by default. The ``BPF_F_NO_PREALLOC`` flag can be 22979855d3SDonald Hunterused to disable pre-allocation when it is too memory expensive. 23979855d3SDonald Hunter 24979855d3SDonald Hunter``BPF_MAP_TYPE_PERCPU_HASH`` provides a separate value slot per 25979855d3SDonald HunterCPU. The per-cpu values are stored internally in an array. 26979855d3SDonald Hunter 27979855d3SDonald HunterThe ``BPF_MAP_TYPE_LRU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH`` 28979855d3SDonald Huntervariants add LRU semantics to their respective hash tables. An LRU hash 29979855d3SDonald Hunterwill automatically evict the least recently used entries when the hash 30979855d3SDonald Huntertable reaches capacity. An LRU hash maintains an internal LRU list that 31979855d3SDonald Hunteris used to select elements for eviction. This internal LRU list is 32979855d3SDonald Huntershared across CPUs but it is possible to request a per CPU LRU list with 33af0335d2SJoe Stringerthe ``BPF_F_NO_COMMON_LRU`` flag when calling ``bpf_map_create``. The 34af0335d2SJoe Stringerfollowing table outlines the properties of LRU maps depending on the a 35af0335d2SJoe Stringermap type and the flags used to create the map. 36af0335d2SJoe Stringer 37af0335d2SJoe Stringer======================== ========================= ================================ 38af0335d2SJoe StringerFlag ``BPF_MAP_TYPE_LRU_HASH`` ``BPF_MAP_TYPE_LRU_PERCPU_HASH`` 39af0335d2SJoe Stringer======================== ========================= ================================ 40af0335d2SJoe Stringer**BPF_F_NO_COMMON_LRU** Per-CPU LRU, global map Per-CPU LRU, per-cpu map 41af0335d2SJoe Stringer**!BPF_F_NO_COMMON_LRU** Global LRU, global map Global LRU, per-cpu map 42af0335d2SJoe Stringer======================== ========================= ================================ 43979855d3SDonald Hunter 44979855d3SDonald HunterUsage 45979855d3SDonald Hunter===== 46979855d3SDonald Hunter 47539886a3SDonald HunterKernel BPF 48539886a3SDonald Hunter---------- 49539886a3SDonald Hunter 50539886a3SDonald Hunterbpf_map_update_elem() 51539886a3SDonald Hunter~~~~~~~~~~~~~~~~~~~~~ 52539886a3SDonald Hunter 53539886a3SDonald Hunter.. code-block:: c 54539886a3SDonald Hunter 55979855d3SDonald Hunter long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags) 56979855d3SDonald Hunter 57979855d3SDonald HunterHash entries can be added or updated using the ``bpf_map_update_elem()`` 58979855d3SDonald Hunterhelper. This helper replaces existing elements atomically. The ``flags`` 59979855d3SDonald Hunterparameter can be used to control the update behaviour: 60979855d3SDonald Hunter 61979855d3SDonald Hunter- ``BPF_ANY`` will create a new element or update an existing element 62979855d3SDonald Hunter- ``BPF_NOEXIST`` will create a new element only if one did not already 63979855d3SDonald Hunter exist 64979855d3SDonald Hunter- ``BPF_EXIST`` will update an existing element 65979855d3SDonald Hunter 66979855d3SDonald Hunter``bpf_map_update_elem()`` returns 0 on success, or negative error in 67979855d3SDonald Huntercase of failure. 68979855d3SDonald Hunter 69539886a3SDonald Hunterbpf_map_lookup_elem() 70539886a3SDonald Hunter~~~~~~~~~~~~~~~~~~~~~ 71539886a3SDonald Hunter 72539886a3SDonald Hunter.. code-block:: c 73539886a3SDonald Hunter 74979855d3SDonald Hunter void *bpf_map_lookup_elem(struct bpf_map *map, const void *key) 75979855d3SDonald Hunter 76979855d3SDonald HunterHash entries can be retrieved using the ``bpf_map_lookup_elem()`` 77979855d3SDonald Hunterhelper. This helper returns a pointer to the value associated with 78979855d3SDonald Hunter``key``, or ``NULL`` if no entry was found. 79979855d3SDonald Hunter 80539886a3SDonald Hunterbpf_map_delete_elem() 81539886a3SDonald Hunter~~~~~~~~~~~~~~~~~~~~~ 82539886a3SDonald Hunter 83539886a3SDonald Hunter.. code-block:: c 84539886a3SDonald Hunter 85979855d3SDonald Hunter long bpf_map_delete_elem(struct bpf_map *map, const void *key) 86979855d3SDonald Hunter 87979855d3SDonald HunterHash entries can be deleted using the ``bpf_map_delete_elem()`` 88979855d3SDonald Hunterhelper. This helper will return 0 on success, or negative error in case 89979855d3SDonald Hunterof failure. 90979855d3SDonald Hunter 91979855d3SDonald HunterPer CPU Hashes 92979855d3SDonald Hunter-------------- 93979855d3SDonald Hunter 94979855d3SDonald HunterFor ``BPF_MAP_TYPE_PERCPU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH`` 95979855d3SDonald Hunterthe ``bpf_map_update_elem()`` and ``bpf_map_lookup_elem()`` helpers 96979855d3SDonald Hunterautomatically access the hash slot for the current CPU. 97979855d3SDonald Hunter 98539886a3SDonald Hunterbpf_map_lookup_percpu_elem() 99539886a3SDonald Hunter~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 100539886a3SDonald Hunter 101539886a3SDonald Hunter.. code-block:: c 102539886a3SDonald Hunter 103979855d3SDonald Hunter void *bpf_map_lookup_percpu_elem(struct bpf_map *map, const void *key, u32 cpu) 104979855d3SDonald Hunter 105979855d3SDonald HunterThe ``bpf_map_lookup_percpu_elem()`` helper can be used to lookup the 106979855d3SDonald Huntervalue in the hash slot for a specific CPU. Returns value associated with 107979855d3SDonald Hunter``key`` on ``cpu`` , or ``NULL`` if no entry was found or ``cpu`` is 108979855d3SDonald Hunterinvalid. 109979855d3SDonald Hunter 110979855d3SDonald HunterConcurrency 111979855d3SDonald Hunter----------- 112979855d3SDonald Hunter 113979855d3SDonald HunterValues stored in ``BPF_MAP_TYPE_HASH`` can be accessed concurrently by 114979855d3SDonald Hunterprograms running on different CPUs. Since Kernel version 5.1, the BPF 115979855d3SDonald Hunterinfrastructure provides ``struct bpf_spin_lock`` to synchronise access. 116979855d3SDonald HunterSee ``tools/testing/selftests/bpf/progs/test_spin_lock.c``. 117979855d3SDonald Hunter 118979855d3SDonald HunterUserspace 119979855d3SDonald Hunter--------- 120979855d3SDonald Hunter 121539886a3SDonald Hunterbpf_map_get_next_key() 122539886a3SDonald Hunter~~~~~~~~~~~~~~~~~~~~~~ 123539886a3SDonald Hunter 124539886a3SDonald Hunter.. code-block:: c 125539886a3SDonald Hunter 126979855d3SDonald Hunter int bpf_map_get_next_key(int fd, const void *cur_key, void *next_key) 127979855d3SDonald Hunter 128979855d3SDonald HunterIn userspace, it is possible to iterate through the keys of a hash using 129979855d3SDonald Hunterlibbpf's ``bpf_map_get_next_key()`` function. The first key can be fetched by 130979855d3SDonald Huntercalling ``bpf_map_get_next_key()`` with ``cur_key`` set to 131979855d3SDonald Hunter``NULL``. Subsequent calls will fetch the next key that follows the 132979855d3SDonald Huntercurrent key. ``bpf_map_get_next_key()`` returns 0 on success, -ENOENT if 133979855d3SDonald Huntercur_key is the last key in the hash, or negative error in case of 134979855d3SDonald Hunterfailure. 135979855d3SDonald Hunter 136979855d3SDonald HunterNote that if ``cur_key`` gets deleted then ``bpf_map_get_next_key()`` 137979855d3SDonald Hunterwill instead return the *first* key in the hash table which is 138979855d3SDonald Hunterundesirable. It is recommended to use batched lookup if there is going 139979855d3SDonald Hunterto be key deletion intermixed with ``bpf_map_get_next_key()``. 140979855d3SDonald Hunter 141979855d3SDonald HunterExamples 142979855d3SDonald Hunter======== 143979855d3SDonald Hunter 144979855d3SDonald HunterPlease see the ``tools/testing/selftests/bpf`` directory for functional 145979855d3SDonald Hunterexamples. The code snippets below demonstrates API usage. 146979855d3SDonald Hunter 147979855d3SDonald HunterThis example shows how to declare an LRU Hash with a struct key and a 148979855d3SDonald Hunterstruct value. 149979855d3SDonald Hunter 150979855d3SDonald Hunter.. code-block:: c 151979855d3SDonald Hunter 152979855d3SDonald Hunter #include <linux/bpf.h> 153979855d3SDonald Hunter #include <bpf/bpf_helpers.h> 154979855d3SDonald Hunter 155979855d3SDonald Hunter struct key { 156979855d3SDonald Hunter __u32 srcip; 157979855d3SDonald Hunter }; 158979855d3SDonald Hunter 159979855d3SDonald Hunter struct value { 160979855d3SDonald Hunter __u64 packets; 161979855d3SDonald Hunter __u64 bytes; 162979855d3SDonald Hunter }; 163979855d3SDonald Hunter 164979855d3SDonald Hunter struct { 165979855d3SDonald Hunter __uint(type, BPF_MAP_TYPE_LRU_HASH); 166979855d3SDonald Hunter __uint(max_entries, 32); 167979855d3SDonald Hunter __type(key, struct key); 168979855d3SDonald Hunter __type(value, struct value); 169979855d3SDonald Hunter } packet_stats SEC(".maps"); 170979855d3SDonald Hunter 171979855d3SDonald HunterThis example shows how to create or update hash values using atomic 172979855d3SDonald Hunterinstructions: 173979855d3SDonald Hunter 174979855d3SDonald Hunter.. code-block:: c 175979855d3SDonald Hunter 176979855d3SDonald Hunter static void update_stats(__u32 srcip, int bytes) 177979855d3SDonald Hunter { 178979855d3SDonald Hunter struct key key = { 179979855d3SDonald Hunter .srcip = srcip, 180979855d3SDonald Hunter }; 181979855d3SDonald Hunter struct value *value = bpf_map_lookup_elem(&packet_stats, &key); 182979855d3SDonald Hunter 183979855d3SDonald Hunter if (value) { 184979855d3SDonald Hunter __sync_fetch_and_add(&value->packets, 1); 185979855d3SDonald Hunter __sync_fetch_and_add(&value->bytes, bytes); 186979855d3SDonald Hunter } else { 187979855d3SDonald Hunter struct value newval = { 1, bytes }; 188979855d3SDonald Hunter 189979855d3SDonald Hunter bpf_map_update_elem(&packet_stats, &key, &newval, BPF_NOEXIST); 190979855d3SDonald Hunter } 191979855d3SDonald Hunter } 192979855d3SDonald Hunter 193979855d3SDonald HunterUserspace walking the map elements from the map declared above: 194979855d3SDonald Hunter 195979855d3SDonald Hunter.. code-block:: c 196979855d3SDonald Hunter 197979855d3SDonald Hunter #include <bpf/libbpf.h> 198979855d3SDonald Hunter #include <bpf/bpf.h> 199979855d3SDonald Hunter 200979855d3SDonald Hunter static void walk_hash_elements(int map_fd) 201979855d3SDonald Hunter { 202979855d3SDonald Hunter struct key *cur_key = NULL; 203979855d3SDonald Hunter struct key next_key; 204979855d3SDonald Hunter struct value value; 205979855d3SDonald Hunter int err; 206979855d3SDonald Hunter 207979855d3SDonald Hunter for (;;) { 208979855d3SDonald Hunter err = bpf_map_get_next_key(map_fd, cur_key, &next_key); 209979855d3SDonald Hunter if (err) 210979855d3SDonald Hunter break; 211979855d3SDonald Hunter 212979855d3SDonald Hunter bpf_map_lookup_elem(map_fd, &next_key, &value); 213979855d3SDonald Hunter 214979855d3SDonald Hunter // Use key and value here 215979855d3SDonald Hunter 216979855d3SDonald Hunter cur_key = &next_key; 217979855d3SDonald Hunter } 218979855d3SDonald Hunter } 219*1a986518SJoe Stringer 220*1a986518SJoe StringerInternals 221*1a986518SJoe Stringer========= 222*1a986518SJoe Stringer 223*1a986518SJoe StringerThis section of the document is targeted at Linux developers and describes 224*1a986518SJoe Stringeraspects of the map implementations that are not considered stable ABI. The 225*1a986518SJoe Stringerfollowing details are subject to change in future versions of the kernel. 226*1a986518SJoe Stringer 227*1a986518SJoe Stringer``BPF_MAP_TYPE_LRU_HASH`` and variants 228*1a986518SJoe Stringer-------------------------------------- 229*1a986518SJoe Stringer 230*1a986518SJoe StringerUpdating elements in LRU maps may trigger eviction behaviour when the capacity 231*1a986518SJoe Stringerof the map is reached. There are various steps that the update algorithm 232*1a986518SJoe Stringerattempts in order to enforce the LRU property which have increasing impacts on 233*1a986518SJoe Stringerother CPUs involved in the following operation attempts: 234*1a986518SJoe Stringer 235*1a986518SJoe Stringer- Attempt to use CPU-local state to batch operations 236*1a986518SJoe Stringer- Attempt to fetch free nodes from global lists 237*1a986518SJoe Stringer- Attempt to pull any node from a global list and remove it from the hashmap 238*1a986518SJoe Stringer- Attempt to pull any node from any CPU's list and remove it from the hashmap 239*1a986518SJoe Stringer 240*1a986518SJoe StringerThis algorithm is described visually in the following diagram. See the 241*1a986518SJoe Stringerdescription in commit 3a08c2fd7634 ("bpf: LRU List") for a full explanation of 242*1a986518SJoe Stringerthe corresponding operations: 243*1a986518SJoe Stringer 244*1a986518SJoe Stringer.. kernel-figure:: map_lru_hash_update.dot 245*1a986518SJoe Stringer :alt: Diagram outlining the LRU eviction steps taken during map update. 246*1a986518SJoe Stringer 247*1a986518SJoe Stringer LRU hash eviction during map update for ``BPF_MAP_TYPE_LRU_HASH`` and 248*1a986518SJoe Stringer variants. See the dot file source for kernel function name code references. 249*1a986518SJoe Stringer 250*1a986518SJoe StringerMap updates start from the oval in the top right "begin ``bpf_map_update()``" 251*1a986518SJoe Stringerand progress through the graph towards the bottom where the result may be 252*1a986518SJoe Stringereither a successful update or a failure with various error codes. The key in 253*1a986518SJoe Stringerthe top right provides indicators for which locks may be involved in specific 254*1a986518SJoe Stringeroperations. This is intended as a visual hint for reasoning about how map 255*1a986518SJoe Stringercontention may impact update operations, though the map type and flags may 256*1a986518SJoe Stringerimpact the actual contention on those locks, based on the logic described in 257*1a986518SJoe Stringerthe table above. For instance, if the map is created with type 258*1a986518SJoe Stringer``BPF_MAP_TYPE_LRU_PERCPU_HASH`` and flags ``BPF_F_NO_COMMON_LRU`` then all map 259*1a986518SJoe Stringerproperties would be per-cpu. 260