1.. SPDX-License-Identifier: GPL-2.0-only
2.. Copyright (C) 2022 Red Hat, Inc.
3.. Copyright (C) 2022-2023 Isovalent, Inc.
4
5===============================================
6BPF_MAP_TYPE_HASH, with PERCPU and LRU Variants
7===============================================
8
9.. note::
10   - ``BPF_MAP_TYPE_HASH`` was introduced in kernel version 3.19
11   - ``BPF_MAP_TYPE_PERCPU_HASH`` was introduced in version 4.6
12   - Both ``BPF_MAP_TYPE_LRU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
13     were introduced in version 4.10
14
15``BPF_MAP_TYPE_HASH`` and ``BPF_MAP_TYPE_PERCPU_HASH`` provide general
16purpose hash map storage. Both the key and the value can be structs,
17allowing for composite keys and values.
18
19The kernel is responsible for allocating and freeing key/value pairs, up
20to the max_entries limit that you specify. Hash maps use pre-allocation
21of hash table elements by default. The ``BPF_F_NO_PREALLOC`` flag can be
22used to disable pre-allocation when it is too memory expensive.
23
24``BPF_MAP_TYPE_PERCPU_HASH`` provides a separate value slot per
25CPU. The per-cpu values are stored internally in an array.
26
27The ``BPF_MAP_TYPE_LRU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
28variants add LRU semantics to their respective hash tables. An LRU hash
29will automatically evict the least recently used entries when the hash
30table reaches capacity. An LRU hash maintains an internal LRU list that
31is used to select elements for eviction. This internal LRU list is
32shared across CPUs but it is possible to request a per CPU LRU list with
33the ``BPF_F_NO_COMMON_LRU`` flag when calling ``bpf_map_create``.  The
34following table outlines the properties of LRU maps depending on the a
35map type and the flags used to create the map.
36
37======================== ========================= ================================
38Flag                     ``BPF_MAP_TYPE_LRU_HASH`` ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
39======================== ========================= ================================
40**BPF_F_NO_COMMON_LRU**  Per-CPU LRU, global map   Per-CPU LRU, per-cpu map
41**!BPF_F_NO_COMMON_LRU** Global LRU, global map    Global LRU, per-cpu map
42======================== ========================= ================================
43
44Usage
45=====
46
47Kernel BPF
48----------
49
50bpf_map_update_elem()
51~~~~~~~~~~~~~~~~~~~~~
52
53.. code-block:: c
54
55   long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
56
57Hash entries can be added or updated using the ``bpf_map_update_elem()``
58helper. This helper replaces existing elements atomically. The ``flags``
59parameter can be used to control the update behaviour:
60
61- ``BPF_ANY`` will create a new element or update an existing element
62- ``BPF_NOEXIST`` will create a new element only if one did not already
63  exist
64- ``BPF_EXIST`` will update an existing element
65
66``bpf_map_update_elem()`` returns 0 on success, or negative error in
67case of failure.
68
69bpf_map_lookup_elem()
70~~~~~~~~~~~~~~~~~~~~~
71
72.. code-block:: c
73
74   void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
75
76Hash entries can be retrieved using the ``bpf_map_lookup_elem()``
77helper. This helper returns a pointer to the value associated with
78``key``, or ``NULL`` if no entry was found.
79
80bpf_map_delete_elem()
81~~~~~~~~~~~~~~~~~~~~~
82
83.. code-block:: c
84
85   long bpf_map_delete_elem(struct bpf_map *map, const void *key)
86
87Hash entries can be deleted using the ``bpf_map_delete_elem()``
88helper. This helper will return 0 on success, or negative error in case
89of failure.
90
91Per CPU Hashes
92--------------
93
94For ``BPF_MAP_TYPE_PERCPU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
95the ``bpf_map_update_elem()`` and ``bpf_map_lookup_elem()`` helpers
96automatically access the hash slot for the current CPU.
97
98bpf_map_lookup_percpu_elem()
99~~~~~~~~~~~~~~~~~~~~~~~~~~~~
100
101.. code-block:: c
102
103   void *bpf_map_lookup_percpu_elem(struct bpf_map *map, const void *key, u32 cpu)
104
105The ``bpf_map_lookup_percpu_elem()`` helper can be used to lookup the
106value in the hash slot for a specific CPU. Returns value associated with
107``key`` on ``cpu`` , or ``NULL`` if no entry was found or ``cpu`` is
108invalid.
109
110Concurrency
111-----------
112
113Values stored in ``BPF_MAP_TYPE_HASH`` can be accessed concurrently by
114programs running on different CPUs.  Since Kernel version 5.1, the BPF
115infrastructure provides ``struct bpf_spin_lock`` to synchronise access.
116See ``tools/testing/selftests/bpf/progs/test_spin_lock.c``.
117
118Userspace
119---------
120
121bpf_map_get_next_key()
122~~~~~~~~~~~~~~~~~~~~~~
123
124.. code-block:: c
125
126   int bpf_map_get_next_key(int fd, const void *cur_key, void *next_key)
127
128In userspace, it is possible to iterate through the keys of a hash using
129libbpf's ``bpf_map_get_next_key()`` function. The first key can be fetched by
130calling ``bpf_map_get_next_key()`` with ``cur_key`` set to
131``NULL``. Subsequent calls will fetch the next key that follows the
132current key. ``bpf_map_get_next_key()`` returns 0 on success, -ENOENT if
133cur_key is the last key in the hash, or negative error in case of
134failure.
135
136Note that if ``cur_key`` gets deleted then ``bpf_map_get_next_key()``
137will instead return the *first* key in the hash table which is
138undesirable. It is recommended to use batched lookup if there is going
139to be key deletion intermixed with ``bpf_map_get_next_key()``.
140
141Examples
142========
143
144Please see the ``tools/testing/selftests/bpf`` directory for functional
145examples.  The code snippets below demonstrates API usage.
146
147This example shows how to declare an LRU Hash with a struct key and a
148struct value.
149
150.. code-block:: c
151
152    #include <linux/bpf.h>
153    #include <bpf/bpf_helpers.h>
154
155    struct key {
156        __u32 srcip;
157    };
158
159    struct value {
160        __u64 packets;
161        __u64 bytes;
162    };
163
164    struct {
165            __uint(type, BPF_MAP_TYPE_LRU_HASH);
166            __uint(max_entries, 32);
167            __type(key, struct key);
168            __type(value, struct value);
169    } packet_stats SEC(".maps");
170
171This example shows how to create or update hash values using atomic
172instructions:
173
174.. code-block:: c
175
176    static void update_stats(__u32 srcip, int bytes)
177    {
178            struct key key = {
179                    .srcip = srcip,
180            };
181            struct value *value = bpf_map_lookup_elem(&packet_stats, &key);
182
183            if (value) {
184                    __sync_fetch_and_add(&value->packets, 1);
185                    __sync_fetch_and_add(&value->bytes, bytes);
186            } else {
187                    struct value newval = { 1, bytes };
188
189                    bpf_map_update_elem(&packet_stats, &key, &newval, BPF_NOEXIST);
190            }
191    }
192
193Userspace walking the map elements from the map declared above:
194
195.. code-block:: c
196
197    #include <bpf/libbpf.h>
198    #include <bpf/bpf.h>
199
200    static void walk_hash_elements(int map_fd)
201    {
202            struct key *cur_key = NULL;
203            struct key next_key;
204            struct value value;
205            int err;
206
207            for (;;) {
208                    err = bpf_map_get_next_key(map_fd, cur_key, &next_key);
209                    if (err)
210                            break;
211
212                    bpf_map_lookup_elem(map_fd, &next_key, &value);
213
214                    // Use key and value here
215
216                    cur_key = &next_key;
217            }
218    }
219
220Internals
221=========
222
223This section of the document is targeted at Linux developers and describes
224aspects of the map implementations that are not considered stable ABI. The
225following details are subject to change in future versions of the kernel.
226
227``BPF_MAP_TYPE_LRU_HASH`` and variants
228--------------------------------------
229
230Updating elements in LRU maps may trigger eviction behaviour when the capacity
231of the map is reached. There are various steps that the update algorithm
232attempts in order to enforce the LRU property which have increasing impacts on
233other CPUs involved in the following operation attempts:
234
235- Attempt to use CPU-local state to batch operations
236- Attempt to fetch free nodes from global lists
237- Attempt to pull any node from a global list and remove it from the hashmap
238- Attempt to pull any node from any CPU's list and remove it from the hashmap
239
240This algorithm is described visually in the following diagram. See the
241description in commit 3a08c2fd7634 ("bpf: LRU List") for a full explanation of
242the corresponding operations:
243
244.. kernel-figure::  map_lru_hash_update.dot
245   :alt:    Diagram outlining the LRU eviction steps taken during map update.
246
247   LRU hash eviction during map update for ``BPF_MAP_TYPE_LRU_HASH`` and
248   variants. See the dot file source for kernel function name code references.
249
250Map updates start from the oval in the top right "begin ``bpf_map_update()``"
251and progress through the graph towards the bottom where the result may be
252either a successful update or a failure with various error codes. The key in
253the top right provides indicators for which locks may be involved in specific
254operations. This is intended as a visual hint for reasoning about how map
255contention may impact update operations, though the map type and flags may
256impact the actual contention on those locks, based on the logic described in
257the table above. For instance, if the map is created with type
258``BPF_MAP_TYPE_LRU_PERCPU_HASH`` and flags ``BPF_F_NO_COMMON_LRU`` then all map
259properties would be per-cpu.
260