xref: /openbmc/linux/Documentation/networking/page_pool.rst (revision 35b1b1fd96388d5e3cf179bf36bd8a4153baf4a3)
.. SPDX-License-Identifier: GPL-2.0

=============
Page Pool API
=============

The page_pool allocator is optimized for the XDP mode that uses one frame
per page, but it can fall back on the regular page allocator APIs.

Basic use involves replacing alloc_pages() calls with the
page_pool_alloc_pages() call.  Drivers should use page_pool_dev_alloc_pages()
in place of dev_alloc_pages().

The API keeps track of in-flight pages, in order to let API users know
when it is safe to free a page_pool object.  Thus, API users
must call page_pool_put_page() to free the page, or attach
the page to a page_pool-aware object, such as an skb marked with
skb_mark_for_recycle().

API users must call page_pool_put_page() exactly once on a page: it
will either recycle the page or, in case of refcnt > 1, it will
release the DMA mapping and the in-flight state accounting.

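As a sketch of this cycle (assuming a pool already created with
page_pool_create() and ``PP_FLAG_DMA_MAP`` set, and a hypothetical Rx
descriptor ``desc`` for illustration):

.. code-block:: c

    /* Allocate, use, and release one page; ``pool`` and ``desc``
     * are assumed to exist in the surrounding driver code.
     */
    struct page *page;
    dma_addr_t dma;

    page = page_pool_dev_alloc_pages(pool);
    if (!page)
        return -ENOMEM;

    /* with PP_FLAG_DMA_MAP the pool has already mapped the page */
    dma = page_pool_get_dma_addr(page);
    desc->addr = cpu_to_le64(dma);

    /* ... later, once the buffer is no longer in use ... */
    page_pool_put_full_page(pool, page, false);
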
Architecture overview
=====================

.. code-block:: none

    +------------------+
    |       Driver     |
    +------------------+
            ^
            |
            |
            |
            v
    +--------------------------------------------+
    |                request memory              |
    +--------------------------------------------+
        ^                                  ^
        |                                  |
        | Pool empty                       | Pool has entries
        |                                  |
        v                                  v
    +-----------------------+     +------------------------+
    | alloc (and map) pages |     |  get page from cache   |
    +-----------------------+     +------------------------+
                                    ^                    ^
                                    |                    |
                                    | cache available    | No entries, refill
                                    |                    | from ptr-ring
                                    |                    |
                                    v                    v
                          +-----------------+     +------------------+
                          |   Fast cache    |     |  ptr-ring cache  |
                          +-----------------+     +------------------+

API interface
=============
The number of pools created **must** match the number of hardware queues
unless hardware restrictions make that impossible. Doing otherwise would defeat
the purpose of page pool, which is to allocate pages quickly from a cache
without locking. This lockless guarantee naturally comes from running under a
NAPI softirq.  The protection doesn't strictly have to be NAPI; any guarantee
that allocating a page will cause no race conditions is enough.

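A per-queue setup might therefore look like the following sketch, where
``priv``, ``num_rx_queues`` and ``rxq[i]`` are hypothetical driver
structures used only for illustration:

.. code-block:: c

    /* One pool per hardware Rx queue (illustrative driver layout) */
    for (i = 0; i < priv->num_rx_queues; i++) {
        struct page_pool_params pp_params = { 0 };

        pp_params.pool_size = DESC_NUM;
        pp_params.nid = NUMA_NO_NODE;
        pp_params.dev = priv->dev;
        pp_params.dma_dir = DMA_FROM_DEVICE;
        pp_params.flags = PP_FLAG_DMA_MAP;

        priv->rxq[i].page_pool = page_pool_create(&pp_params);
        if (IS_ERR(priv->rxq[i].page_pool))
            return PTR_ERR(priv->rxq[i].page_pool);
    }
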
.. kernel-doc:: net/core/page_pool.c
   :identifiers: page_pool_create

.. kernel-doc:: include/net/page_pool.h
   :identifiers: struct page_pool_params

.. kernel-doc:: include/net/page_pool.h
   :identifiers: page_pool_put_page page_pool_put_full_page
		 page_pool_recycle_direct page_pool_dev_alloc_pages
		 page_pool_get_dma_addr page_pool_get_dma_dir

.. kernel-doc:: net/core/page_pool.c
   :identifiers: page_pool_put_page_bulk page_pool_get_stats

DMA sync
--------
The driver is always responsible for syncing the pages for the CPU.
Drivers may choose to take care of syncing for the device as well
or set the ``PP_FLAG_DMA_SYNC_DEV`` flag to request that pages
allocated from the page pool are already synced for the device.

If ``PP_FLAG_DMA_SYNC_DEV`` is set, the driver must inform the core what portion
of the buffer has to be synced. This allows the core to avoid syncing the entire
page when the driver knows that the device only accessed a portion of the page.

Most drivers will reserve headroom in front of the frame. This part
of the buffer is not touched by the device, so to avoid syncing
it drivers can set the ``offset`` field in struct page_pool_params
appropriately.

For pages recycled on the XDP xmit and skb paths the page pool will
use the ``max_len`` member of struct page_pool_params to decide how
much of the page needs to be synced (starting at ``offset``).
When directly freeing pages in the driver (page_pool_put_page())
the ``dma_sync_size`` argument specifies how much of the buffer needs
to be synced.

If in doubt, set ``offset`` to 0, ``max_len`` to ``PAGE_SIZE`` and
pass -1 as ``dma_sync_size``. That combination of arguments is always
correct.

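For example, a driver that reserves ``XDP_PACKET_HEADROOM`` in front of
the frame might configure the sync parameters as in this sketch
(``pkt_len`` is a hypothetical variable holding the received length):

.. code-block:: c

    /* The device never writes into the headroom, so syncing starts
     * at ``offset`` and covers at most ``max_len`` bytes.
     */
    pp_params.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
    pp_params.offset = XDP_PACKET_HEADROOM;
    pp_params.max_len = PAGE_SIZE - XDP_PACKET_HEADROOM;

    /* When freeing directly, sync only the bytes the device could
     * have written; passing -1 instead is always safe.
     */
    page_pool_put_page(pool, page, pkt_len, true);
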
Note that the syncing parameters are for the entire page.
This is important to remember when using fragments (``PP_FLAG_PAGE_FRAG``),
where allocated buffers may be smaller than a full page.
Unless the driver author really understands page pool internals
it's recommended to always use ``offset = 0`` and ``max_len = PAGE_SIZE``
with fragmented page pools.

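A fragmented pool hands out sub-page buffers; as a sketch (the 2048-byte
fragment size is an arbitrary example):

.. code-block:: c

    /* With PP_FLAG_PAGE_FRAG set on the pool, several buffers can
     * share one page. page_pool_dev_alloc_frag() returns the page
     * and fills in the fragment's offset within it.
     */
    unsigned int offset;
    struct page *page;

    page = page_pool_dev_alloc_frag(pool, &offset, 2048);
    if (!page)
        return -ENOMEM;

    /* the buffer starts at page_address(page) + offset */
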
Stats API and structures
------------------------
If the kernel is configured with ``CONFIG_PAGE_POOL_STATS=y``, the
page_pool_get_stats() API and the structures described below are available.
It takes a pointer to a ``struct page_pool`` and a pointer to a struct
page_pool_stats allocated by the caller.

The API will fill in the provided struct page_pool_stats with
statistics about the page_pool.

.. kernel-doc:: include/net/page_pool.h
   :identifiers: struct page_pool_recycle_stats
		 struct page_pool_alloc_stats
		 struct page_pool_stats

Coding examples
===============

Registration
------------

.. code-block:: c

    /* Page pool registration */
    struct page_pool_params pp_params = { 0 };
    struct xdp_rxq_info xdp_rxq;
    int err;

    pp_params.order = 0;
    /* internal DMA mapping in page_pool */
    pp_params.flags = PP_FLAG_DMA_MAP;
    pp_params.pool_size = DESC_NUM;
    pp_params.nid = NUMA_NO_NODE;
    pp_params.dev = priv->dev;
    pp_params.napi = napi; /* only if locking is tied to NAPI */
    pp_params.dma_dir = xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;
    page_pool = page_pool_create(&pp_params);

    err = xdp_rxq_info_reg(&xdp_rxq, ndev, 0);
    if (err)
        goto err_out;

    err = xdp_rxq_info_reg_mem_model(&xdp_rxq, MEM_TYPE_PAGE_POOL, page_pool);
    if (err)
        goto err_out;

NAPI poller
-----------

.. code-block:: c

    /* NAPI Rx poller */
    enum dma_data_direction dma_dir;

    dma_dir = page_pool_get_dma_dir(dring->page_pool);
    while (done < budget) {
        if (some error)
            page_pool_recycle_direct(page_pool, page);
        if (packet_is_xdp) {
            if (xdp_verdict == XDP_DROP)
                page_pool_recycle_direct(page_pool, page);
        } else if (packet_is_skb) {
            skb_mark_for_recycle(skb);
            new_page = page_pool_dev_alloc_pages(page_pool);
        }
    }

Stats
-----

.. code-block:: c

	#ifdef CONFIG_PAGE_POOL_STATS
	/* retrieve stats */
	struct page_pool_stats stats = { 0 };
	if (page_pool_get_stats(page_pool, &stats)) {
		/* perhaps the driver reports statistics with ethtool */
		ethtool_print_allocation_stats(&stats.alloc_stats);
		ethtool_print_recycle_stats(&stats.recycle_stats);
	}
	#endif

Driver unload
-------------

.. code-block:: c

    /* Driver unload */
    page_pool_put_full_page(page_pool, page, false);
    xdp_rxq_info_unreg(&xdp_rxq);
206