========
zsmalloc
========

This allocator is designed for use with zram. Thus, the allocator is
supposed to work well under low memory conditions. In particular, it
never attempts higher-order page allocations, which are very likely to
fail under memory pressure. On the other hand, if we just used single
(0-order) pages, it would suffer from very high fragmentation:
any object of size PAGE_SIZE/2 or larger would occupy an entire page.
This was one of the major issues with its predecessor (xvmalloc).

To overcome these issues, zsmalloc allocates a bunch of 0-order pages
and links them together using various 'struct page' fields. These linked
pages act as a single higher-order page, i.e. an object can span 0-order
page boundaries. The code refers to these linked pages as a single entity
called zspage.

For simplicity, zsmalloc can only allocate objects of size up to PAGE_SIZE
since this satisfies the requirements of all its current users (in the
worst case, a page is incompressible and is thus stored "as-is", i.e. in
uncompressed form). For allocation requests larger than this size, failure
is returned (see zs_malloc()).

Additionally, zs_malloc() does not return a dereferenceable pointer.
Instead, it returns an opaque handle (unsigned long) which encodes the actual
location of the allocated object. The reason for this indirection is that
zsmalloc does not keep zspages permanently mapped since that would cause
issues on 32-bit systems where the VA region for kernel space mappings
is very small. So, before using the allocated memory, the object has to
be mapped using zs_map_object() to get a usable pointer and subsequently
unmapped using zs_unmap_object().

stat
====

With CONFIG_ZSMALLOC_STAT, we can see zsmalloc internal information via
``/sys/kernel/debug/zsmalloc/<user name>``. Here is a sample of stat output::

 # cat /sys/kernel/debug/zsmalloc/zram0/classes

 class  size       10%       20%       30%       40%       50%       60%       70%       80%       90%       99%      100% obj_allocated   obj_used pages_used pages_per_zspage freeable
    ...
    ...
    30   512         0        12         4         1         0         1         0         0         1         0       414          3464       3346        433                1       14
    31   528         2         7         2         2         1         0         1         0         0         2       117          4154       3793        536                4       44
    32   544         6         3         4         1         2         1         0         0         0         1       260          4170       3965        556                2       26
    ...
    ...


class
	index
size
	the object size the zspage stores
10%
	the number of zspages with usage ratio less than 10% (see below)
20%
	the number of zspages with usage ratio between 10% and 20%
30%
	the number of zspages with usage ratio between 20% and 30%
40%
	the number of zspages with usage ratio between 30% and 40%
50%
	the number of zspages with usage ratio between 40% and 50%
60%
	the number of zspages with usage ratio between 50% and 60%
70%
	the number of zspages with usage ratio between 60% and 70%
80%
	the number of zspages with usage ratio between 70% and 80%
90%
	the number of zspages with usage ratio between 80% and 90%
99%
	the number of zspages with usage ratio between 90% and 99%
100%
	the number of zspages with usage ratio 100%
obj_allocated
	the number of objects allocated
obj_used
	the number of objects allocated to the user
pages_used
	the number of pages allocated for the class
pages_per_zspage
	the number of 0-order pages that make up a zspage
freeable
	the approximate number of pages that class compaction can free

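The column descriptions above imply a few consistency relations between the
fields of a row. The following Python sketch checks them against the three
sample rows shown earlier. The helper name ``check_row()`` and the 4096-byte
page size are assumptions made for illustration; neither is part of zsmalloc
or its debugfs interface, and reading obj_allocated as "object slots that
exist in the class" is an interpretation that happens to match the sample.

```python
PAGE_SIZE = 4096  # assumption: page size is not stated in the sample output

# The three sample rows from the stat output above (whitespace condensed).
SAMPLE_ROWS = """
30   512  0 12 4 1 0 1 0 0 1 0 414  3464 3346 433 1 14
31   528  2  7 2 2 1 0 1 0 0 2 117  4154 3793 536 4 44
32   544  6  3 4 1 2 1 0 0 0 1 260  4170 3965 556 2 26
"""


def check_row(line):
    """Check one row's invariants; return (zspages, objs_per_zspage)."""
    fields = [int(f) for f in line.split()]
    size = fields[1]
    fullness = fields[2:13]  # the eleven 10% ... 100% columns
    obj_allocated, obj_used, pages_used, pages_per_zspage = fields[13:17]

    zspages = pages_used // pages_per_zspage
    objs_per_zspage = pages_per_zspage * PAGE_SIZE // size

    # Every zspage is counted in exactly one fullness column.
    assert sum(fullness) == zspages
    # obj_allocated matches the number of object slots in those zspages.
    assert obj_allocated == zspages * objs_per_zspage
    # Slots handed out to the user cannot exceed the slots that exist.
    assert obj_used <= obj_allocated
    return zspages, objs_per_zspage


for row in SAMPLE_ROWS.strip().splitlines():
    print(row.split()[0], check_row(row))
```

For class 31, for example, 536 pages in chains of 4 give 134 zspages, which
is also the sum of the eleven fullness columns.
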
Each zspage maintains an inuse counter, which keeps track of the number of
objects stored in the zspage.  The inuse counter determines the zspage's
"fullness group", which is calculated as the ratio of the "inuse" objects to
the total number of objects the zspage can hold (objs_per_zspage). The
closer the inuse counter is to objs_per_zspage, the better.
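
The mapping from the inuse counter to a stat column can be sketched in Python
as follows. This is a simplified model, not the kernel code: the function name
``fullness_column()`` is hypothetical, and the handling of exact boundaries
(for example, a ratio of exactly 10% landing in the "20%" column) and of empty
zspages is an assumption.

```python
def fullness_column(inuse, objs_per_zspage):
    """Map a zspage's inuse counter to the stat column it is counted in."""
    if inuse <= 0:
        return None  # assumption: empty zspages are not listed in any column
    if inuse >= objs_per_zspage:
        return "100%"
    ratio = 100 * inuse // objs_per_zspage  # integer percentage, 0..99
    if ratio >= 90:
        return "99%"
    # Round up to the next multiple of ten: 0..9 -> "10%", 10..19 -> "20%", ...
    return f"{(ratio // 10 + 1) * 10}%"


print(fullness_column(1, 127))    # a single object out of 127 slots
print(fullness_column(127, 127))  # a completely full zspage
```

Note that integer division makes one object out of 127 a ratio of 0, which
still has to be counted in the lowest column rather than treated as empty.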

Internals
=========

zsmalloc has 255 size classes, each of which can hold a number of zspages.
Each zspage can contain up to ZSMALLOC_CHAIN_SIZE physical (0-order) pages.
The optimal zspage chain size for each size class is calculated during the
creation of the zsmalloc pool (see calculate_zspage_chain_size()).

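The relation between a class index and its object size can be read off the
sample tables in this document (class 30 is 512 bytes, class 100 is 1632
bytes, class 254 is 4096 bytes). A minimal Python sketch of that relation is
shown here. The constant values (4096-byte pages, a 32-byte minimum and a
16-byte step) are assumptions inferred from those tables, and the function
names are hypothetical.

```python
PAGE_SIZE = 4096          # assumption
ZS_MIN_ALLOC_SIZE = 32    # assumption: size of class 0
ZS_SIZE_CLASS_DELTA = 16  # assumption: step between adjacent classes
ZS_SIZE_CLASSES = 255     # from the text above


def class_size(index):
    """Object size stored by the class with the given index."""
    return min(ZS_MIN_ALLOC_SIZE + index * ZS_SIZE_CLASS_DELTA, PAGE_SIZE)


def class_index(size):
    """Smallest class that fits `size` bytes, before any class merging."""
    size = max(size, ZS_MIN_ALLOC_SIZE)
    # Ceiling division without floating point.
    return -(-(size - ZS_MIN_ALLOC_SIZE) // ZS_SIZE_CLASS_DELTA)


print(class_size(30), class_size(100), class_size(ZS_SIZE_CLASSES - 1))
print(class_index(1568))
```

The index returned by ``class_index()`` is only the nominal class. Because of
class merging, the class that actually serves a request may be a larger one.
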
As an optimization, zsmalloc merges size classes that have similar
characteristics in terms of the number of pages per zspage and the number
of objects that each zspage can store.

For instance, consider the following size classes::

  class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable
  ...
     94  1536        0    ....       0             0          0          0                3        0
    100  1632        0    ....       0             0          0          0                2        0
  ...


Size classes #95-99 are merged with size class #100. This means that when we
need to store an object of size, say, 1568 bytes, we end up using size class
#100 instead of size class #96. Size class #100 is meant for objects of size
1632 bytes, so each object of size 1568 bytes wastes 1632-1568=64 bytes.

Size class #100 consists of zspages with 2 physical pages each, which can
hold a total of 5 objects. If we need to store 13 objects of size 1568, we
end up allocating three zspages, or 6 physical pages.

However, if we take a closer look at size class #96 (which is meant for
objects of size 1568 bytes) and trace `calculate_zspage_chain_size()`, we
find that the optimal zspage configuration for this class is a chain
of 5 physical pages::

    pages per zspage      wasted bytes     used%
           1                  960           76
           2                  352           95
           3                 1312           89
           4                  704           95
           5                   96           99

This means that a class #96 configuration with 5 physical pages can store 13
objects of size 1568 in a single zspage, using a total of 5 physical pages.
This is more efficient than the class #100 configuration, which would use 6
physical pages to store the same number of objects.

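The table and the 6-versus-5 page comparison above can be reproduced with a
short Python sketch. It models the chain size selection as "pick the first
chain length with the fewest wasted bytes", which is an assumption about what
`calculate_zspage_chain_size()` does rather than a copy of it. The function
names and the 4096-byte page size are likewise assumptions.

```python
PAGE_SIZE = 4096  # assumption


def chain_table(class_size, max_pages):
    """Rows of (pages per zspage, wasted bytes, used%) for one class size."""
    rows = []
    for pages in range(1, max_pages + 1):
        total = pages * PAGE_SIZE
        wasted = total % class_size  # bytes left after the last whole object
        used_pct = 100 * (total - wasted) // total
        rows.append((pages, wasted, used_pct))
    return rows


def best_chain(class_size, max_pages):
    """First chain length with the least wasted bytes."""
    return min(chain_table(class_size, max_pages), key=lambda row: row[1])[0]


def pages_needed(nr_objects, class_size, pages_per_zspage):
    """Physical pages required to store nr_objects in whole zspages."""
    objs_per_zspage = pages_per_zspage * PAGE_SIZE // class_size
    zspages = -(-nr_objects // objs_per_zspage)  # ceiling division
    return zspages * pages_per_zspage


for row in chain_table(1568, 5):
    print(row)
print(pages_needed(13, 1632, 2), pages_needed(13, 1568, 5))
```

With a limit of 4 pages, the same model picks a chain of 2 pages for 1568-byte
objects, which is why class #96 ends up merged with class #100 in that
configuration.
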
As the zspage chain size for class #96 increases, its key characteristics
such as pages per zspage and objects per zspage also change. This leads to
fewer class mergers, resulting in a more compact grouping of classes, which
reduces memory wastage.

Let's take a closer look at the bottom of `/sys/kernel/debug/zsmalloc/zramX/classes`::

  class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable

  ...
    202  3264         0   ..         0             0          0          0                4        0
    254  4096         0   ..         0             0          0          0                1        0
  ...

Size class #202 stores objects of size 3264 bytes and has a maximum of 4 pages
per zspage. Any object larger than 3264 bytes is considered huge and belongs
to size class #254, which stores each object in its own physical page (objects
in huge classes do not share pages).

Increasing the zspage chain size also results in a higher watermark
for the huge size class and fewer huge classes overall. This allows for more
efficient storage of large objects.

For a zspage chain size of 8, the huge class watermark becomes 3632 bytes::

  class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable

  ...
    202  3264         0   ..         0             0          0          0                4        0
    211  3408         0   ..         0             0          0          0                5        0
    217  3504         0   ..         0             0          0          0                6        0
    222  3584         0   ..         0             0          0          0                7        0
    225  3632         0   ..         0             0          0          0                8        0
    254  4096         0   ..         0             0          0          0                1        0
  ...

For a zspage chain size of 16, the huge class watermark becomes 3840 bytes::

  class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable

  ...
    202  3264         0   ..         0             0          0          0                4        0
    206  3328         0   ..         0             0          0          0               13        0
    207  3344         0   ..         0             0          0          0                9        0
    208  3360         0   ..         0             0          0          0               14        0
    211  3408         0   ..         0             0          0          0                5        0
    212  3424         0   ..         0             0          0          0               16        0
    214  3456         0   ..         0             0          0          0               11        0
    217  3504         0   ..         0             0          0          0                6        0
    219  3536         0   ..         0             0          0          0               13        0
    222  3584         0   ..         0             0          0          0                7        0
    223  3600         0   ..         0             0          0          0               15        0
    225  3632         0   ..         0             0          0          0                8        0
    228  3680         0   ..         0             0          0          0                9        0
    230  3712         0   ..         0             0          0          0               10        0
    232  3744         0   ..         0             0          0          0               11        0
    234  3776         0   ..         0             0          0          0               12        0
    235  3792         0   ..         0             0          0          0               13        0
    236  3808         0   ..         0             0          0          0               14        0
    238  3840         0   ..         0             0          0          0               15        0
    254  4096         0   ..         0             0          0          0                1        0
  ...

Overall, the combined effect of the zspage chain size on the zsmalloc pool
configuration::

  pages per zspage   number of size classes (clusters)   huge size class watermark
         4                        69                               3264
         5                        86                               3408
         6                        93                               3504
         7                       112                               3584
         8                       123                               3632
         9                       140                               3680
        10                       143                               3712
        11                       159                               3744
        12                       164                               3776
        13                       180                               3792
        14                       183                               3808
        15                       188                               3840
        16                       191                               3840

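The way class merging and the huge class watermark follow from the chain size
limit can be modelled in a few lines of Python. This is a sketch under stated
assumptions, not the kernel implementation: it assumes 4096-byte pages, class
sizes of 32 + 16 * index bytes, a "least wasted bytes" chain selection, and
merging of adjacent classes that share the same pages per zspage and objects
per zspage. All function names are hypothetical, and the sketch has not been
validated against a running kernel.

```python
PAGE_SIZE = 4096          # assumption
ZS_MIN_ALLOC_SIZE = 32    # assumption
ZS_SIZE_CLASS_DELTA = 16  # assumption
ZS_SIZE_CLASSES = 255


def zspage_shape(size, max_chain):
    """(pages_per_zspage, objs_per_zspage) with the least wasted bytes."""
    pages = min(range(1, max_chain + 1), key=lambda n: (n * PAGE_SIZE) % size)
    return pages, pages * PAGE_SIZE // size


def build_clusters(max_chain):
    """Map every class index to the index of the class it is merged into."""
    merged_into = {}
    head, head_shape = None, None
    for index in range(ZS_SIZE_CLASSES - 1, -1, -1):  # largest class first
        size = min(ZS_MIN_ALLOC_SIZE + index * ZS_SIZE_CLASS_DELTA, PAGE_SIZE)
        shape = zspage_shape(size, max_chain)
        if shape != head_shape:  # different shape: this class starts a cluster
            head, head_shape = index, shape
        merged_into[index] = head
    return merged_into


def huge_watermark(max_chain):
    """Largest class size whose zspage still holds more than one object."""
    sizes = range(ZS_MIN_ALLOC_SIZE, PAGE_SIZE + 1, ZS_SIZE_CLASS_DELTA)
    return max(s for s in sizes if zspage_shape(s, max_chain)[1] > 1)


for limit in (4, 8, 16):
    clusters = build_clusters(limit)
    print(limit, len(set(clusters.values())), huge_watermark(limit))
```

Under this model, a limit of 4 pages maps classes #95-99 onto class #100 and
yields a watermark of 3264 bytes, in line with the examples above. The
printed cluster counts are meant to be compared against the table; any
difference would point at a gap between the model and the kernel logic.
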

A synthetic test
----------------

zram used as storage for build artifacts (Linux kernel compilation).

* `CONFIG_ZSMALLOC_CHAIN_SIZE=4`

  zsmalloc classes stats::

    class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable

    ...
    Total              13   ..        51        413836     412973     159955                         3

  zram mm_stat::

    1691783168 628083717 655175680        0 655175680       60        0    34048    34049


* `CONFIG_ZSMALLOC_CHAIN_SIZE=8`

  zsmalloc classes stats::

    class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable

    ...
    Total              18   ..        87        414852     412978     156666                         0

  zram mm_stat::

    1691803648 627793930 641703936        0 641703936       60        0    33591    33591

Using larger zspage chains may result in using fewer physical pages. In the
example above, the number of physical pages used decreased from 159955 to
156666, and the maximum zsmalloc pool memory usage went down from
655175680 to 641703936 bytes.

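The size of the saving can be worked out directly from the two mm_stat lines.
The Python sketch below does that. The field names follow the order given in
the zram documentation for mm_stat; the 4096-byte page size and the helper
name ``parse_mm_stat()`` are assumptions made for illustration.

```python
from collections import namedtuple

PAGE_SIZE = 4096  # assumption

# Field order as described in the zram documentation for mm_stat.
MmStat = namedtuple("MmStat", [
    "orig_data_size", "compr_data_size", "mem_used_total", "mem_limit",
    "mem_used_max", "same_pages", "pages_compacted", "huge_pages",
    "huge_pages_since",
])


def parse_mm_stat(line):
    """Parse one whitespace-separated mm_stat line into named fields."""
    return MmStat(*(int(field) for field in line.split()))


chain4 = parse_mm_stat(
    "1691783168 628083717 655175680 0 655175680 60 0 34048 34049")
chain8 = parse_mm_stat(
    "1691803648 627793930 641703936 0 641703936 60 0 33591 33591")

saved_bytes = chain4.mem_used_max - chain8.mem_used_max
saved_pages = saved_bytes // PAGE_SIZE
saved_pct = 100 * saved_bytes / chain4.mem_used_max
print(f"{saved_bytes} bytes ({saved_pages} pages, {saved_pct:.2f}%) saved")
```

The byte difference corresponds to the same 3289 pages as the difference
between the two pages_used totals, a reduction of roughly 2% for this
workload.
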
However, this advantage may be offset by the potential for increased system
memory pressure (as some zspages have larger chain sizes) in cases where there
is heavy internal fragmentation and pool compaction is unable to relocate
objects and release zspages. In these cases, it is recommended to decrease
the limit on the size of the zspage chains (as specified by the
CONFIG_ZSMALLOC_CHAIN_SIZE option).