xref: /openbmc/linux/Documentation/bpf/map_hash.rst (revision 2612e3bbc0386368a850140a6c9b990cd496a5ec)
1979855d3SDonald Hunter.. SPDX-License-Identifier: GPL-2.0-only
2979855d3SDonald Hunter.. Copyright (C) 2022 Red Hat, Inc.
3*1a986518SJoe Stringer.. Copyright (C) 2022-2023 Isovalent, Inc.
4979855d3SDonald Hunter
5979855d3SDonald Hunter===============================================
6979855d3SDonald HunterBPF_MAP_TYPE_HASH, with PERCPU and LRU Variants
7979855d3SDonald Hunter===============================================
8979855d3SDonald Hunter
9979855d3SDonald Hunter.. note::
10979855d3SDonald Hunter   - ``BPF_MAP_TYPE_HASH`` was introduced in kernel version 3.19
11979855d3SDonald Hunter   - ``BPF_MAP_TYPE_PERCPU_HASH`` was introduced in version 4.6
12979855d3SDonald Hunter   - Both ``BPF_MAP_TYPE_LRU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
13979855d3SDonald Hunter     were introduced in version 4.10
14979855d3SDonald Hunter
15979855d3SDonald Hunter``BPF_MAP_TYPE_HASH`` and ``BPF_MAP_TYPE_PERCPU_HASH`` provide general
16979855d3SDonald Hunterpurpose hash map storage. Both the key and the value can be structs,
17979855d3SDonald Hunterallowing for composite keys and values.
18979855d3SDonald Hunter
19979855d3SDonald HunterThe kernel is responsible for allocating and freeing key/value pairs, up
20979855d3SDonald Hunterto the max_entries limit that you specify. Hash maps use pre-allocation
21979855d3SDonald Hunterof hash table elements by default. The ``BPF_F_NO_PREALLOC`` flag can be
22979855d3SDonald Hunterused to disable pre-allocation when it is too memory expensive.
23979855d3SDonald Hunter
24979855d3SDonald Hunter``BPF_MAP_TYPE_PERCPU_HASH`` provides a separate value slot per
25979855d3SDonald HunterCPU. The per-cpu values are stored internally in an array.
26979855d3SDonald Hunter
27979855d3SDonald HunterThe ``BPF_MAP_TYPE_LRU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
28979855d3SDonald Huntervariants add LRU semantics to their respective hash tables. An LRU hash
29979855d3SDonald Hunterwill automatically evict the least recently used entries when the hash
30979855d3SDonald Huntertable reaches capacity. An LRU hash maintains an internal LRU list that
31979855d3SDonald Hunteris used to select elements for eviction. This internal LRU list is
32979855d3SDonald Huntershared across CPUs but it is possible to request a per CPU LRU list with
33af0335d2SJoe Stringerthe ``BPF_F_NO_COMMON_LRU`` flag when calling ``bpf_map_create``.  The
34af0335d2SJoe Stringerfollowing table outlines the properties of LRU maps depending on the a
35af0335d2SJoe Stringermap type and the flags used to create the map.
36af0335d2SJoe Stringer
37af0335d2SJoe Stringer======================== ========================= ================================
38af0335d2SJoe StringerFlag                     ``BPF_MAP_TYPE_LRU_HASH`` ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
39af0335d2SJoe Stringer======================== ========================= ================================
40af0335d2SJoe Stringer**BPF_F_NO_COMMON_LRU**  Per-CPU LRU, global map   Per-CPU LRU, per-cpu map
41af0335d2SJoe Stringer**!BPF_F_NO_COMMON_LRU** Global LRU, global map    Global LRU, per-cpu map
42af0335d2SJoe Stringer======================== ========================= ================================
43979855d3SDonald Hunter
44979855d3SDonald HunterUsage
45979855d3SDonald Hunter=====
46979855d3SDonald Hunter
47539886a3SDonald HunterKernel BPF
48539886a3SDonald Hunter----------
49539886a3SDonald Hunter
50539886a3SDonald Hunterbpf_map_update_elem()
51539886a3SDonald Hunter~~~~~~~~~~~~~~~~~~~~~
52539886a3SDonald Hunter
53539886a3SDonald Hunter.. code-block:: c
54539886a3SDonald Hunter
55979855d3SDonald Hunter   long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
56979855d3SDonald Hunter
57979855d3SDonald HunterHash entries can be added or updated using the ``bpf_map_update_elem()``
58979855d3SDonald Hunterhelper. This helper replaces existing elements atomically. The ``flags``
59979855d3SDonald Hunterparameter can be used to control the update behaviour:
60979855d3SDonald Hunter
61979855d3SDonald Hunter- ``BPF_ANY`` will create a new element or update an existing element
62979855d3SDonald Hunter- ``BPF_NOEXIST`` will create a new element only if one did not already
63979855d3SDonald Hunter  exist
64979855d3SDonald Hunter- ``BPF_EXIST`` will update an existing element
65979855d3SDonald Hunter
66979855d3SDonald Hunter``bpf_map_update_elem()`` returns 0 on success, or negative error in
67979855d3SDonald Huntercase of failure.
68979855d3SDonald Hunter
69539886a3SDonald Hunterbpf_map_lookup_elem()
70539886a3SDonald Hunter~~~~~~~~~~~~~~~~~~~~~
71539886a3SDonald Hunter
72539886a3SDonald Hunter.. code-block:: c
73539886a3SDonald Hunter
74979855d3SDonald Hunter   void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
75979855d3SDonald Hunter
76979855d3SDonald HunterHash entries can be retrieved using the ``bpf_map_lookup_elem()``
77979855d3SDonald Hunterhelper. This helper returns a pointer to the value associated with
78979855d3SDonald Hunter``key``, or ``NULL`` if no entry was found.
79979855d3SDonald Hunter
80539886a3SDonald Hunterbpf_map_delete_elem()
81539886a3SDonald Hunter~~~~~~~~~~~~~~~~~~~~~
82539886a3SDonald Hunter
83539886a3SDonald Hunter.. code-block:: c
84539886a3SDonald Hunter
85979855d3SDonald Hunter   long bpf_map_delete_elem(struct bpf_map *map, const void *key)
86979855d3SDonald Hunter
87979855d3SDonald HunterHash entries can be deleted using the ``bpf_map_delete_elem()``
88979855d3SDonald Hunterhelper. This helper will return 0 on success, or negative error in case
89979855d3SDonald Hunterof failure.
90979855d3SDonald Hunter
91979855d3SDonald HunterPer CPU Hashes
92979855d3SDonald Hunter--------------
93979855d3SDonald Hunter
94979855d3SDonald HunterFor ``BPF_MAP_TYPE_PERCPU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
95979855d3SDonald Hunterthe ``bpf_map_update_elem()`` and ``bpf_map_lookup_elem()`` helpers
96979855d3SDonald Hunterautomatically access the hash slot for the current CPU.
97979855d3SDonald Hunter
98539886a3SDonald Hunterbpf_map_lookup_percpu_elem()
99539886a3SDonald Hunter~~~~~~~~~~~~~~~~~~~~~~~~~~~~
100539886a3SDonald Hunter
101539886a3SDonald Hunter.. code-block:: c
102539886a3SDonald Hunter
103979855d3SDonald Hunter   void *bpf_map_lookup_percpu_elem(struct bpf_map *map, const void *key, u32 cpu)
104979855d3SDonald Hunter
105979855d3SDonald HunterThe ``bpf_map_lookup_percpu_elem()`` helper can be used to lookup the
106979855d3SDonald Huntervalue in the hash slot for a specific CPU. Returns value associated with
107979855d3SDonald Hunter``key`` on ``cpu`` , or ``NULL`` if no entry was found or ``cpu`` is
108979855d3SDonald Hunterinvalid.
109979855d3SDonald Hunter
110979855d3SDonald HunterConcurrency
111979855d3SDonald Hunter-----------
112979855d3SDonald Hunter
113979855d3SDonald HunterValues stored in ``BPF_MAP_TYPE_HASH`` can be accessed concurrently by
114979855d3SDonald Hunterprograms running on different CPUs.  Since Kernel version 5.1, the BPF
115979855d3SDonald Hunterinfrastructure provides ``struct bpf_spin_lock`` to synchronise access.
116979855d3SDonald HunterSee ``tools/testing/selftests/bpf/progs/test_spin_lock.c``.
117979855d3SDonald Hunter
118979855d3SDonald HunterUserspace
119979855d3SDonald Hunter---------
120979855d3SDonald Hunter
121539886a3SDonald Hunterbpf_map_get_next_key()
122539886a3SDonald Hunter~~~~~~~~~~~~~~~~~~~~~~
123539886a3SDonald Hunter
124539886a3SDonald Hunter.. code-block:: c
125539886a3SDonald Hunter
126979855d3SDonald Hunter   int bpf_map_get_next_key(int fd, const void *cur_key, void *next_key)
127979855d3SDonald Hunter
128979855d3SDonald HunterIn userspace, it is possible to iterate through the keys of a hash using
129979855d3SDonald Hunterlibbpf's ``bpf_map_get_next_key()`` function. The first key can be fetched by
130979855d3SDonald Huntercalling ``bpf_map_get_next_key()`` with ``cur_key`` set to
131979855d3SDonald Hunter``NULL``. Subsequent calls will fetch the next key that follows the
132979855d3SDonald Huntercurrent key. ``bpf_map_get_next_key()`` returns 0 on success, -ENOENT if
133979855d3SDonald Huntercur_key is the last key in the hash, or negative error in case of
134979855d3SDonald Hunterfailure.
135979855d3SDonald Hunter
136979855d3SDonald HunterNote that if ``cur_key`` gets deleted then ``bpf_map_get_next_key()``
137979855d3SDonald Hunterwill instead return the *first* key in the hash table which is
138979855d3SDonald Hunterundesirable. It is recommended to use batched lookup if there is going
139979855d3SDonald Hunterto be key deletion intermixed with ``bpf_map_get_next_key()``.
140979855d3SDonald Hunter
141979855d3SDonald HunterExamples
142979855d3SDonald Hunter========
143979855d3SDonald Hunter
144979855d3SDonald HunterPlease see the ``tools/testing/selftests/bpf`` directory for functional
145979855d3SDonald Hunterexamples.  The code snippets below demonstrates API usage.
146979855d3SDonald Hunter
147979855d3SDonald HunterThis example shows how to declare an LRU Hash with a struct key and a
148979855d3SDonald Hunterstruct value.
149979855d3SDonald Hunter
150979855d3SDonald Hunter.. code-block:: c
151979855d3SDonald Hunter
152979855d3SDonald Hunter    #include <linux/bpf.h>
153979855d3SDonald Hunter    #include <bpf/bpf_helpers.h>
154979855d3SDonald Hunter
155979855d3SDonald Hunter    struct key {
156979855d3SDonald Hunter        __u32 srcip;
157979855d3SDonald Hunter    };
158979855d3SDonald Hunter
159979855d3SDonald Hunter    struct value {
160979855d3SDonald Hunter        __u64 packets;
161979855d3SDonald Hunter        __u64 bytes;
162979855d3SDonald Hunter    };
163979855d3SDonald Hunter
164979855d3SDonald Hunter    struct {
165979855d3SDonald Hunter            __uint(type, BPF_MAP_TYPE_LRU_HASH);
166979855d3SDonald Hunter            __uint(max_entries, 32);
167979855d3SDonald Hunter            __type(key, struct key);
168979855d3SDonald Hunter            __type(value, struct value);
169979855d3SDonald Hunter    } packet_stats SEC(".maps");
170979855d3SDonald Hunter
171979855d3SDonald HunterThis example shows how to create or update hash values using atomic
172979855d3SDonald Hunterinstructions:
173979855d3SDonald Hunter
174979855d3SDonald Hunter.. code-block:: c
175979855d3SDonald Hunter
176979855d3SDonald Hunter    static void update_stats(__u32 srcip, int bytes)
177979855d3SDonald Hunter    {
178979855d3SDonald Hunter            struct key key = {
179979855d3SDonald Hunter                    .srcip = srcip,
180979855d3SDonald Hunter            };
181979855d3SDonald Hunter            struct value *value = bpf_map_lookup_elem(&packet_stats, &key);
182979855d3SDonald Hunter
183979855d3SDonald Hunter            if (value) {
184979855d3SDonald Hunter                    __sync_fetch_and_add(&value->packets, 1);
185979855d3SDonald Hunter                    __sync_fetch_and_add(&value->bytes, bytes);
186979855d3SDonald Hunter            } else {
187979855d3SDonald Hunter                    struct value newval = { 1, bytes };
188979855d3SDonald Hunter
189979855d3SDonald Hunter                    bpf_map_update_elem(&packet_stats, &key, &newval, BPF_NOEXIST);
190979855d3SDonald Hunter            }
191979855d3SDonald Hunter    }
192979855d3SDonald Hunter
193979855d3SDonald HunterUserspace walking the map elements from the map declared above:
194979855d3SDonald Hunter
195979855d3SDonald Hunter.. code-block:: c
196979855d3SDonald Hunter
197979855d3SDonald Hunter    #include <bpf/libbpf.h>
198979855d3SDonald Hunter    #include <bpf/bpf.h>
199979855d3SDonald Hunter
200979855d3SDonald Hunter    static void walk_hash_elements(int map_fd)
201979855d3SDonald Hunter    {
202979855d3SDonald Hunter            struct key *cur_key = NULL;
203979855d3SDonald Hunter            struct key next_key;
204979855d3SDonald Hunter            struct value value;
205979855d3SDonald Hunter            int err;
206979855d3SDonald Hunter
207979855d3SDonald Hunter            for (;;) {
208979855d3SDonald Hunter                    err = bpf_map_get_next_key(map_fd, cur_key, &next_key);
209979855d3SDonald Hunter                    if (err)
210979855d3SDonald Hunter                            break;
211979855d3SDonald Hunter
212979855d3SDonald Hunter                    bpf_map_lookup_elem(map_fd, &next_key, &value);
213979855d3SDonald Hunter
214979855d3SDonald Hunter                    // Use key and value here
215979855d3SDonald Hunter
216979855d3SDonald Hunter                    cur_key = &next_key;
217979855d3SDonald Hunter            }
218979855d3SDonald Hunter    }
219*1a986518SJoe Stringer
220*1a986518SJoe StringerInternals
221*1a986518SJoe Stringer=========
222*1a986518SJoe Stringer
223*1a986518SJoe StringerThis section of the document is targeted at Linux developers and describes
224*1a986518SJoe Stringeraspects of the map implementations that are not considered stable ABI. The
225*1a986518SJoe Stringerfollowing details are subject to change in future versions of the kernel.
226*1a986518SJoe Stringer
227*1a986518SJoe Stringer``BPF_MAP_TYPE_LRU_HASH`` and variants
228*1a986518SJoe Stringer--------------------------------------
229*1a986518SJoe Stringer
230*1a986518SJoe StringerUpdating elements in LRU maps may trigger eviction behaviour when the capacity
231*1a986518SJoe Stringerof the map is reached. There are various steps that the update algorithm
232*1a986518SJoe Stringerattempts in order to enforce the LRU property which have increasing impacts on
233*1a986518SJoe Stringerother CPUs involved in the following operation attempts:
234*1a986518SJoe Stringer
235*1a986518SJoe Stringer- Attempt to use CPU-local state to batch operations
236*1a986518SJoe Stringer- Attempt to fetch free nodes from global lists
237*1a986518SJoe Stringer- Attempt to pull any node from a global list and remove it from the hashmap
238*1a986518SJoe Stringer- Attempt to pull any node from any CPU's list and remove it from the hashmap
239*1a986518SJoe Stringer
240*1a986518SJoe StringerThis algorithm is described visually in the following diagram. See the
241*1a986518SJoe Stringerdescription in commit 3a08c2fd7634 ("bpf: LRU List") for a full explanation of
242*1a986518SJoe Stringerthe corresponding operations:
243*1a986518SJoe Stringer
244*1a986518SJoe Stringer.. kernel-figure::  map_lru_hash_update.dot
245*1a986518SJoe Stringer   :alt:    Diagram outlining the LRU eviction steps taken during map update.
246*1a986518SJoe Stringer
247*1a986518SJoe Stringer   LRU hash eviction during map update for ``BPF_MAP_TYPE_LRU_HASH`` and
248*1a986518SJoe Stringer   variants. See the dot file source for kernel function name code references.
249*1a986518SJoe Stringer
250*1a986518SJoe StringerMap updates start from the oval in the top right "begin ``bpf_map_update()``"
251*1a986518SJoe Stringerand progress through the graph towards the bottom where the result may be
252*1a986518SJoe Stringereither a successful update or a failure with various error codes. The key in
253*1a986518SJoe Stringerthe top right provides indicators for which locks may be involved in specific
254*1a986518SJoe Stringeroperations. This is intended as a visual hint for reasoning about how map
255*1a986518SJoe Stringercontention may impact update operations, though the map type and flags may
256*1a986518SJoe Stringerimpact the actual contention on those locks, based on the logic described in
257*1a986518SJoe Stringerthe table above. For instance, if the map is created with type
258*1a986518SJoe Stringer``BPF_MAP_TYPE_LRU_PERCPU_HASH`` and flags ``BPF_F_NO_COMMON_LRU`` then all map
259*1a986518SJoe Stringerproperties would be per-cpu.
260