.. _zsmalloc:

========
zsmalloc
========

This allocator is designed for use with zram. Thus, the allocator is
supposed to work well under low memory conditions. In particular, it
never attempts higher-order page allocations, which are very likely to
fail under memory pressure. On the other hand, if it used only single
(0-order) pages, it would suffer from very high fragmentation --
any object of size PAGE_SIZE/2 or larger would occupy an entire page.
This was one of the major issues with its predecessor (xvmalloc).

To overcome these issues, zsmalloc allocates a bunch of 0-order pages
and links them together using various 'struct page' fields. These linked
pages act as a single higher-order page, i.e. an object can span 0-order
page boundaries. The code refers to these linked pages as a single entity
called zspage.
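As a rough illustration of how an object can straddle a page boundary inside
a zspage, the following Python sketch locates an object by its index. It
assumes 4 KiB pages and that objects are packed back to back from the start
of the zspage; ``locate_object`` and its return layout are hypothetical names
used only for this example, not kernel interfaces.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def locate_object(obj_idx, class_size, page_size=PAGE_SIZE):
    """Return (page index, offset in page, spans_boundary) for an object.

    Hypothetical model: objects are laid out contiguously from the start
    of the zspage, which is treated as one linear region.
    """
    start = obj_idx * class_size
    page_idx, offset = divmod(start, page_size)
    # The object crosses into the next page if it does not fit in the
    # remainder of the page it starts in.
    spans = offset + class_size > page_size
    return page_idx, offset, spans


if __name__ == "__main__":
    # A 2-page zspage holding 1632-byte objects (5 objects fit).
    for idx in range(5):
        print(idx, locate_object(idx, 1632))
```

In this model the third object (index 2) starts in the first page and ends in
the second one, which is only possible because the pages are linked together.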

For simplicity, zsmalloc can only allocate objects of size up to PAGE_SIZE
since this satisfies the requirements of all its current users (in the
worst case, a page is incompressible and is thus stored "as-is", i.e. in
uncompressed form). For allocation requests larger than this size, failure
is returned (see zs_malloc()).

Additionally, zs_malloc() does not return a dereferenceable pointer.
Instead, it returns an opaque handle (unsigned long) which encodes the actual
location of the allocated object. The reason for this indirection is that
zsmalloc does not keep zspages permanently mapped since that would cause
issues on 32-bit systems where the VA region for kernel space mappings
is very small. So, before using the allocated memory, the object has to
be mapped using zs_map_object() to get a usable pointer and subsequently
unmapped using zs_unmap_object().

stat
====

With CONFIG_ZSMALLOC_STAT, we can see zsmalloc internal information via
``/sys/kernel/debug/zsmalloc/<user name>``. Here is a sample of stat output::

 # cat /sys/kernel/debug/zsmalloc/zram0/classes

 class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage
    ...
    ...
     9   176           0            1           186        129          8                4
    10   192           1            0          2880       2872        135                3
    11   208           0            1           819        795         42                2
    12   224           0            1           219        159         12                4
    ...
    ...


class
	index
size
	object size zspage stores
almost_empty
	the number of ZS_ALMOST_EMPTY zspages (see below)
almost_full
	the number of ZS_ALMOST_FULL zspages (see below)
obj_allocated
	the number of objects allocated
obj_used
	the number of objects allocated to the user
pages_used
	the number of pages allocated for the class
pages_per_zspage
	the number of 0-order pages to make a zspage
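The columns are related to each other. Assuming 4 KiB pages, the sample rows
are consistent with pages_used being the number of zspages in the class
multiplied by pages_per_zspage. The sketch below checks that relation against
the sample; the relation is inferred from the numbers shown here, and the
helper name ``expected_pages_used`` is made up for this example.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def expected_pages_used(size, obj_allocated, pages_per_zspage):
    """Pages a class should occupy if obj_allocated counts the object
    slots of whole zspages (inferred from the sample, not from the code)."""
    objs_per_zspage = pages_per_zspage * PAGE_SIZE // size
    zspages = obj_allocated // objs_per_zspage
    return zspages * pages_per_zspage


# (size, obj_allocated, pages_used, pages_per_zspage) from the sample output
SAMPLE = [
    (176, 186, 8, 4),
    (192, 2880, 135, 3),
    (208, 819, 42, 2),
    (224, 219, 12, 4),
]

if __name__ == "__main__":
    for size, allocated, pages_used, chain in SAMPLE:
        print(size, expected_pages_used(size, allocated, chain), pages_used)
```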

We assign a zspage to ZS_ALMOST_EMPTY fullness group when n <= N / f, where

* n = number of allocated objects
* N = total number of objects zspage can store
* f = fullness_threshold_frac (i.e., 4 at the moment)

Similarly, we assign zspage to:

* ZS_ALMOST_FULL  when n > N / f
* ZS_EMPTY        when n == 0
* ZS_FULL         when n == N
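The rules above can be sketched as a small classifier. This is a minimal
model of the conditions as written in this document, not the kernel
implementation; it assumes that the ZS_EMPTY and ZS_FULL checks take
precedence over the threshold comparison, and ``fullness_group`` is a
hypothetical name.

```python
def fullness_group(n, N, f=4):
    """Classify a zspage by the rules listed above.

    n: allocated objects, N: capacity of the zspage,
    f: fullness_threshold_frac (4 according to the text).
    """
    if not 0 <= n <= N:
        raise ValueError("n must be between 0 and N")
    # Assumption: the exact empty/full cases are checked first.
    if n == 0:
        return "ZS_EMPTY"
    if n == N:
        return "ZS_FULL"
    # n <= N / f, written without division to stay in integers.
    if n * f <= N:
        return "ZS_ALMOST_EMPTY"
    return "ZS_ALMOST_FULL"


if __name__ == "__main__":
    for used in (0, 1, 16, 17, 63, 64):
        print(used, fullness_group(used, 64))
```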


Internals
=========

zsmalloc has 255 size classes, each of which can hold a number of zspages.
Each zspage can contain up to ZSMALLOC_CHAIN_SIZE physical (0-order) pages.
The optimal zspage chain size for each size class is calculated during the
creation of the zsmalloc pool (see calculate_zspage_chain_size()).

As an optimization, zsmalloc merges size classes that have similar
characteristics in terms of the number of pages per zspage and the number
of objects that each zspage can store.

For instance, consider the following size classes::

  class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
  ...
     94  1536           0            0             0          0          0                3        0
    100  1632           0            0             0          0          0                2        0
  ...


Size classes #95-99 are merged with size class #100. This means that when we
need to store an object of size, say, 1568 bytes, we end up using size class
#100 instead of size class #96. Size class #100 is meant for objects of size
1632 bytes, so each object of size 1568 bytes wastes 1632-1568=64 bytes.
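To see why these classes end up merged, the sketch below models the two
characteristics that are compared: pages per zspage and objects per zspage.
It is a simplified model, not the kernel code. It assumes 4 KiB pages, a chain
limit of 4 pages, and a class size of ``32 + 16 * index`` bytes, a formula
inferred from the tables in this document; all function names are
hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages
MIN_SIZE = 32     # assumption: smallest class size, inferred from the tables
DELTA = 16        # assumption: step between classes, inferred from the tables


def class_size(index):
    return MIN_SIZE + index * DELTA


def chain_size(size, max_pages=4):
    """Pick the pages-per-zspage value that leaves the fewest unused bytes."""
    best_pages, best_waste = 1, None
    for pages in range(1, max_pages + 1):
        waste = (pages * PAGE_SIZE) % size
        if best_waste is None or waste < best_waste:
            best_pages, best_waste = pages, waste
    return best_pages


def characteristics(index, max_pages=4):
    """Return (pages per zspage, objects per zspage) for a size class."""
    size = class_size(index)
    pages = chain_size(size, max_pages)
    return pages, pages * PAGE_SIZE // size


if __name__ == "__main__":
    for index in range(94, 101):
        print(index, class_size(index), characteristics(index))
```

Under these assumptions, classes #95 to #100 all come out as 2 pages and
5 objects per zspage, while class #94 uses 3 pages and holds 8 objects, so
it stays separate.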

Size class #100 consists of zspages with 2 physical pages each, which can
hold a total of 5 objects. If we need to store 13 objects of size 1568, we
end up allocating three zspages, or 6 physical pages.

However, if we take a closer look at size class #96 (which is meant for
objects of size 1568 bytes) and trace `calculate_zspage_chain_size()`, we
find that the optimal zspage configuration for this class is a chain
of 5 physical pages::

    pages per zspage      wasted bytes     used%
           1                  960           76
           2                  352           95
           3                 1312           89
           4                  704           95
           5                   96           99

This means that a class #96 configuration with 5 physical pages can store 13
objects of size 1568 in a single zspage, using a total of 5 physical pages.
This is more efficient than the class #100 configuration, which would use 6
physical pages to store the same number of objects.
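The arithmetic behind this comparison can be reproduced with a short sketch.
It assumes 4 KiB pages; ``waste_table`` and ``pages_needed`` are illustrative
helpers, not kernel functions.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def waste_table(size, max_pages):
    """Rows of (pages per zspage, wasted bytes, used %) for one object size."""
    rows = []
    for pages in range(1, max_pages + 1):
        total = pages * PAGE_SIZE
        wasted = total % size
        rows.append((pages, wasted, (total - wasted) * 100 // total))
    return rows


def pages_needed(count, slot_size, pages_per_zspage):
    """Physical pages required to store `count` objects in whole zspages."""
    objs_per_zspage = pages_per_zspage * PAGE_SIZE // slot_size
    zspages = -(-count // objs_per_zspage)  # ceiling division
    return zspages * pages_per_zspage


if __name__ == "__main__":
    for row in waste_table(1568, 5):
        print(row)
    # 13 objects stored in class #100 slots (1632 bytes, 2-page zspages)
    print(pages_needed(13, 1632, 2))
    # 13 objects stored in class #96 slots (1568 bytes, 5-page zspages)
    print(pages_needed(13, 1568, 5))
```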

As the zspage chain size for class #96 increases, its key characteristics
such as pages per zspage and objects per zspage also change. This leads to
fewer class mergers, resulting in a more compact grouping of classes, which
reduces memory wastage.

Let's take a closer look at the bottom of `/sys/kernel/debug/zsmalloc/zramX/classes`::

  class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
  ...
    202  3264           0            0             0          0          0                4        0
    254  4096           0            0             0          0          0                1        0
  ...

Size class #202 stores objects of size 3264 bytes and has a maximum of 4 pages
per zspage. Any object larger than 3264 bytes is considered huge and belongs
to size class #254, which stores each object in its own physical page (objects
in huge classes do not share pages).

Increasing the size of the chain of zspages also results in a higher watermark
for the huge size class and fewer huge classes overall. This allows for more
efficient storage of large objects.

For zspage chain size of 8, huge class watermark becomes 3632 bytes::

  class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
  ...
    202  3264           0            0             0          0          0                4        0
    211  3408           0            0             0          0          0                5        0
    217  3504           0            0             0          0          0                6        0
    222  3584           0            0             0          0          0                7        0
    225  3632           0            0             0          0          0                8        0
    254  4096           0            0             0          0          0                1        0
  ...

For zspage chain size of 16, huge class watermark becomes 3840 bytes::

  class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
  ...
    202  3264           0            0             0          0          0                4        0
    206  3328           0            0             0          0          0               13        0
    207  3344           0            0             0          0          0                9        0
    208  3360           0            0             0          0          0               14        0
    211  3408           0            0             0          0          0                5        0
    212  3424           0            0             0          0          0               16        0
    214  3456           0            0             0          0          0               11        0
    217  3504           0            0             0          0          0                6        0
    219  3536           0            0             0          0          0               13        0
    222  3584           0            0             0          0          0                7        0
    223  3600           0            0             0          0          0               15        0
    225  3632           0            0             0          0          0                8        0
    228  3680           0            0             0          0          0                9        0
    230  3712           0            0             0          0          0               10        0
    232  3744           0            0             0          0          0               11        0
    234  3776           0            0             0          0          0               12        0
    235  3792           0            0             0          0          0               13        0
    236  3808           0            0             0          0          0               14        0
    238  3840           0            0             0          0          0               15        0
    254  4096           0            0             0          0          0                1        0
  ...

Overall the combined zspage chain size effect on zsmalloc pool configuration::

  pages per zspage   number of size classes (clusters)   huge size class watermark
         4                        69                               3264
         5                        86                               3408
         6                        93                               3504
         7                       112                               3584
         8                       123                               3632
         9                       140                               3680
        10                       143                               3712
        11                       159                               3744
        12                       164                               3776
        13                       180                               3792
        14                       183                               3808
        15                       188                               3840
        16                       191                               3840
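The watermark column can be approximated with a small model. The sketch
below assumes 4 KiB pages, 255 classes of size ``32 + 16 * index`` bytes (a
formula inferred from the tables in this document), and a chain size chosen
to minimize unused bytes. It treats a class as huge when its zspage is a
single page holding a single object. It is a simplified model with
hypothetical function names, not the kernel code, and it does not attempt to
reproduce the cluster counts.

```python
PAGE_SIZE = 4096   # assumption: 4 KiB pages
MIN_SIZE = 32      # assumption: smallest class size, inferred from the tables
DELTA = 16         # assumption: step between classes, inferred from the tables
NUM_CLASSES = 255  # number of size classes stated in the text


def chain_size(size, max_pages):
    """Pick the pages-per-zspage value that leaves the fewest unused bytes."""
    best_pages, best_waste = 1, None
    for pages in range(1, max_pages + 1):
        waste = (pages * PAGE_SIZE) % size
        if best_waste is None or waste < best_waste:
            best_pages, best_waste = pages, waste
    return best_pages


def huge_watermark(max_pages):
    """Largest class size that is not huge under this model."""
    watermark = 0
    for index in range(NUM_CLASSES):
        size = MIN_SIZE + index * DELTA
        pages = chain_size(size, max_pages)
        objs = pages * PAGE_SIZE // size
        if not (pages == 1 and objs == 1):
            watermark = max(watermark, size)
    return watermark


if __name__ == "__main__":
    for limit in range(4, 17):
        print(limit, huge_watermark(limit))
```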


A synthetic test
----------------

zram as a build artifacts storage (Linux kernel compilation).

* `CONFIG_ZSMALLOC_CHAIN_SIZE=4`

  zsmalloc classes stats::

    class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
    ...
    Total                13           51        413836     412973     159955                         3

  zram mm_stat::

    1691783168 628083717 655175680        0 655175680       60        0    34048    34049


* `CONFIG_ZSMALLOC_CHAIN_SIZE=8`

  zsmalloc classes stats::

    class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
    ...
    Total                18           87        414852     412978     156666                         0

  zram mm_stat::

    1691803648 627793930 641703936        0 641703936       60        0    33591    33591

Using larger zspage chains may result in using fewer physical pages, as seen
in the example where the number of physical pages used decreased from 159955
to 156666. At the same time, the maximum zsmalloc pool memory usage went down
from 655175680 to 641703936 bytes.
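The two sets of figures agree with each other, as the following arithmetic
shows. It only uses the numbers quoted above and assumes 4 KiB pages.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

pages_chain4, pages_chain8 = 159955, 156666      # pages_used totals above
pool_chain4, pool_chain8 = 655175680, 641703936  # pool sizes in bytes above

saved_pages = pages_chain4 - pages_chain8
saved_bytes = pool_chain4 - pool_chain8
saved_percent = 100 * saved_pages / pages_chain4

if __name__ == "__main__":
    print(saved_pages, saved_bytes, round(saved_percent, 2))
    # The page difference accounts for the whole byte difference.
    print(saved_pages * PAGE_SIZE == saved_bytes)
```

In this test the larger chain size saves 3289 pages, or roughly 2% of the
pool.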

However, this advantage may be offset by the potential for increased system
memory pressure (as some zspages have larger chain sizes) in cases where there
is heavy internal fragmentation and zspool compaction is unable to relocate
objects and release zspages. In these cases, it is recommended to decrease
the limit on the size of the zspage chains (as specified by the
CONFIG_ZSMALLOC_CHAIN_SIZE option).