========
zsmalloc
========

This allocator is designed for use with zram. Thus, the allocator is
supposed to work well under low memory conditions. In particular, it
never attempts higher-order page allocations, which are very likely to
fail under memory pressure. On the other hand, if we just used single
(0-order) pages, the allocator would suffer from very high fragmentation --
any object of size PAGE_SIZE/2 or larger would occupy an entire page.
This was one of the major issues with its predecessor (xvmalloc).

To overcome these issues, zsmalloc allocates a bunch of 0-order pages
and links them together using various 'struct page' fields. These linked
pages act as a single higher-order page, i.e. an object can span 0-order
page boundaries. The code refers to these linked pages as a single entity
called zspage.
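
As a rough illustration of how an object can straddle a 0-order page
boundary inside a zspage, the sketch below computes where the object with a
given index starts. It assumes 4 KiB pages and objects packed back to back
from the start of the zspage. ``locate_object()`` and ``struct obj_location``
are hypothetical names used only for this illustration; they are not
zsmalloc symbols.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Where an object starts inside a zspage, and whether it crosses a page. */
struct obj_location {
	unsigned int first_page; /* index of the 0-order page holding the first byte */
	unsigned int offset;     /* byte offset within that page */
	bool spans_pages;        /* true if the object continues into the next page */
};

/* Assumes objects are laid out contiguously from the start of the zspage. */
static struct obj_location locate_object(unsigned int obj_idx,
					 unsigned int obj_size)
{
	struct obj_location loc;
	unsigned int start;

	assert(obj_size > 0 && obj_size <= PAGE_SIZE);

	start = obj_idx * obj_size;
	loc.first_page = start / PAGE_SIZE;
	loc.offset = start % PAGE_SIZE;
	loc.spans_pages = loc.offset + obj_size > PAGE_SIZE;
	return loc;
}
```

With 1632-byte objects, the object at index 2 starts at offset 3264 of the
first page and continues into the second page, while the object at index 3
starts at offset 800 of the second page and fits entirely within it.
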

For simplicity, zsmalloc can only allocate objects of size up to PAGE_SIZE
since this satisfies the requirements of all its current users (in the
worst case, a page is incompressible and is thus stored "as-is", i.e. in
uncompressed form). For allocation requests larger than this size, failure
is returned (see zs_malloc()).

Additionally, zs_malloc() does not return a dereferenceable pointer.
Instead, it returns an opaque handle (unsigned long) which encodes the actual
location of the allocated object. The reason for this indirection is that
zsmalloc does not keep zspages permanently mapped since that would cause
issues on 32-bit systems where the VA region for kernel space mappings
is very small. So, before using the allocated memory, the object has to
be mapped using zs_map_object() to get a usable pointer and subsequently
unmapped using zs_unmap_object().

stat
====

With CONFIG_ZSMALLOC_STAT, we can see zsmalloc internal information via
``/sys/kernel/debug/zsmalloc/<user name>``. Here is a sample of stat output::

 # cat /sys/kernel/debug/zsmalloc/zram0/classes

 class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage
    ...
    ...
     9   176           0            1           186        129          8                4
    10   192           1            0          2880       2872        135                3
    11   208           0            1           819        795         42                2
    12   224           0            1           219        159         12                4
    ...
    ...


class
	index
size
	object size the zspage stores
almost_empty
	the number of ZS_ALMOST_EMPTY zspages (see below)
almost_full
	the number of ZS_ALMOST_FULL zspages (see below)
obj_allocated
	the number of objects allocated
obj_used
	the number of objects allocated to the user
pages_used
	the number of pages allocated for the class
pages_per_zspage
	the number of 0-order pages to make a zspage
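
The sample rows above are consistent with obj_allocated being the number of
zspages in the class multiplied by the number of objects that fit in one
zspage. The following userspace sketch reproduces that arithmetic. It
assumes 4 KiB pages, and the helper names are hypothetical rather than
taken from the zsmalloc source.

```c
#include <assert.h>

#define PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Object slots available in one zspage of the given class. */
static unsigned int objs_per_zspage(unsigned int size,
				    unsigned int pages_per_zspage)
{
	assert(size > 0);
	return pages_per_zspage * PAGE_SIZE / size;
}

/* obj_allocated as implied by pages_used and pages_per_zspage. */
static unsigned int expected_obj_allocated(unsigned int size,
					   unsigned int pages_used,
					   unsigned int pages_per_zspage)
{
	unsigned int nr_zspages;

	assert(pages_per_zspage > 0);
	assert(pages_used % pages_per_zspage == 0);

	nr_zspages = pages_used / pages_per_zspage;
	return nr_zspages * objs_per_zspage(size, pages_per_zspage);
}
```

For class 10 in the sample (192-byte objects, 3 pages per zspage, 135 pages
used), this gives 45 zspages of 64 objects each, i.e. 2880 objects, which
matches the obj_allocated column.
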

We assign a zspage to the ZS_ALMOST_EMPTY fullness group when n <= N / f, where

* n = number of allocated objects
* N = total number of objects zspage can store
* f = fullness_threshold_frac (i.e., 4 at the moment)

Similarly, we assign zspage to:

* ZS_ALMOST_FULL  when n > N / f
* ZS_EMPTY        when n == 0
* ZS_FULL         when n == N
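
The rules above can be sketched as a small classification function. This is
an illustration that follows the thresholds exactly as written in this
section; the enum and ``classify_zspage()`` are hypothetical stand-ins, and
the in-kernel check may differ in detail. The exact cases (empty and full)
are tested first, since otherwise n == 0 would also satisfy n <= N / f.

```c
#include <assert.h>

/* Hypothetical mirror of the fullness groups named in the text. */
enum fullness_group {
	ZS_EMPTY,
	ZS_ALMOST_EMPTY,
	ZS_ALMOST_FULL,
	ZS_FULL,
};

#define FULLNESS_THRESHOLD_FRAC 4u /* f in the text above */

/* inuse corresponds to n and capacity to N in the text above. */
static enum fullness_group classify_zspage(unsigned int inuse,
					   unsigned int capacity)
{
	assert(capacity > 0 && inuse <= capacity);

	if (inuse == 0)
		return ZS_EMPTY;
	if (inuse == capacity)
		return ZS_FULL;
	if (inuse <= capacity / FULLNESS_THRESHOLD_FRAC)
		return ZS_ALMOST_EMPTY;
	return ZS_ALMOST_FULL;
}
```

For a zspage that can store 64 objects, 16 allocated objects still count as
ZS_ALMOST_EMPTY, while 17 allocated objects put the zspage in ZS_ALMOST_FULL.
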


Internals
=========

zsmalloc has 255 size classes, each of which can hold a number of zspages.
Each zspage can contain up to ZSMALLOC_CHAIN_SIZE physical (0-order) pages.
The optimal zspage chain size for each size class is calculated during the
creation of the zsmalloc pool (see calculate_zspage_chain_size()).
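
The class sizes shown in the sample outputs on this page (for example,
class 9 stores 176-byte objects and class 254 stores 4096-byte objects) fit
a simple linear rule: 32 bytes for class 0, plus 16 bytes per class index.
The sketch below encodes that rule. The two constants are assumptions
inferred from those outputs for a 4 KiB page configuration, and the helper
names are hypothetical. The index returned is the one an object would get
before any merging of classes.

```c
#include <assert.h>

#define PAGE_SIZE      4096u /* assumption: 4 KiB pages */
#define MIN_ALLOC_SIZE 32u   /* assumption: size of class 0, inferred from the samples */
#define CLASS_DELTA    16u   /* assumption: step between classes, inferred from the samples */

/* Object size stored by the class with the given index. */
static unsigned int class_size(unsigned int idx)
{
	return MIN_ALLOC_SIZE + idx * CLASS_DELTA;
}

/* Smallest class index whose object size can hold a request of this size. */
static unsigned int class_index(unsigned int size)
{
	assert(size <= PAGE_SIZE);

	if (size <= MIN_ALLOC_SIZE)
		return 0;
	return (size - MIN_ALLOC_SIZE + CLASS_DELTA - 1) / CLASS_DELTA;
}
```

Under these assumptions a 1568-byte object maps to class 96, and a
1569-byte object is rounded up to class 97 (1584 bytes).
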

As an optimization, zsmalloc merges size classes that have similar
characteristics in terms of the number of pages per zspage and the number
of objects that each zspage can store.

For instance, consider the following size classes::

  class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
  ...
     94  1536           0            0             0          0          0                3        0
    100  1632           0            0             0          0          0                2        0
  ...


Size classes #95-99 are merged with size class #100. This means that when we
need to store an object of size, say, 1568 bytes, we end up using size class
#100 instead of size class #96. Size class #100 is meant for objects of size
1632 bytes, so each object of size 1568 bytes wastes 1632-1568=64 bytes.

Size class #100 consists of zspages with 2 physical pages each, which can
hold a total of 5 objects. If we need to store 13 objects of size 1568, we
end up allocating three zspages, or 6 physical pages.

However, if we take a closer look at size class #96 (which is meant for
objects of size 1568 bytes) and trace ``calculate_zspage_chain_size()``, we
find that the optimal zspage configuration for this class is a chain
of 5 physical pages::

    pages per zspage      wasted bytes     used%
           1                  960           76
           2                  352           95
           3                 1312           89
           4                  704           95
           5                   96           99

This means that a class #96 configuration with 5 physical pages can store 13
objects of size 1568 in a single zspage, using a total of 5 physical pages.
This is more efficient than the class #100 configuration, which would use 6
physical pages to store the same number of objects.
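
The figures in this comparison can be reproduced with a few lines of
userspace C. The sketch below is a simplified model, assuming 4 KiB pages:
``best_chain_size()`` picks the shortest chain that leaves the fewest bytes
unused, which is the idea behind ``calculate_zspage_chain_size()``, although
the in-kernel function may differ in detail. The other helper names are
hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Bytes left over at the end of a zspage made of the given number of pages. */
static unsigned int wasted_bytes(unsigned int pages, unsigned int size)
{
	assert(size > 0);
	return pages * PAGE_SIZE % size;
}

/* Share of the zspage occupied by objects, in percent, rounded down. */
static unsigned int used_percent(unsigned int pages, unsigned int size)
{
	unsigned int total = pages * PAGE_SIZE;

	return (total - wasted_bytes(pages, size)) * 100 / total;
}

/* Shortest chain, up to max_pages, that leaves the fewest bytes unused. */
static unsigned int best_chain_size(unsigned int size, unsigned int max_pages)
{
	unsigned int pages, best = 1, min_waste = UINT_MAX;

	for (pages = 1; pages <= max_pages; pages++) {
		unsigned int waste = wasted_bytes(pages, size);

		if (waste < min_waste) {
			min_waste = waste;
			best = pages;
		}
	}
	return best;
}

/* Physical pages needed to store nr_objs objects of the given class. */
static unsigned int pages_needed(unsigned int nr_objs, unsigned int size,
				 unsigned int pages_per_zspage)
{
	unsigned int per_zspage = pages_per_zspage * PAGE_SIZE / size;
	unsigned int nr_zspages = (nr_objs + per_zspage - 1) / per_zspage;

	return nr_zspages * pages_per_zspage;
}
```

Storing 13 objects in class #100 (1632 bytes, 2 pages per zspage) needs 6
pages in this model, whereas class #96 (1568 bytes) with a 5-page chain
needs 5.
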

As the zspage chain size for class #96 increases, its key characteristics
such as pages per zspage and objects per zspage also change. This leads to
fewer class mergers, resulting in a more compact grouping of classes, which
reduces memory wastage.

Let's take a closer look at the bottom of ``/sys/kernel/debug/zsmalloc/zramX/classes``::

  class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
  ...
    202  3264           0            0             0          0          0                4        0
    254  4096           0            0             0          0          0                1        0
  ...

Size class #202 stores objects of size 3264 bytes and has a maximum of 4 pages
per zspage. Any object larger than 3264 bytes is considered huge and belongs
to size class #254, which stores each object in its own physical page (objects
in huge classes do not share pages).

Increasing the size of the chain of zspages also results in a higher watermark
for the huge size class and fewer huge classes overall. This allows for more
efficient storage of large objects.

With a zspage chain size of 8, the huge class watermark becomes 3632 bytes::

  class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
  ...
    202  3264           0            0             0          0          0                4        0
    211  3408           0            0             0          0          0                5        0
    217  3504           0            0             0          0          0                6        0
    222  3584           0            0             0          0          0                7        0
    225  3632           0            0             0          0          0                8        0
    254  4096           0            0             0          0          0                1        0
  ...

With a zspage chain size of 16, the huge class watermark becomes 3840 bytes::

  class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
  ...
    202  3264           0            0             0          0          0                4        0
    206  3328           0            0             0          0          0               13        0
    207  3344           0            0             0          0          0                9        0
    208  3360           0            0             0          0          0               14        0
    211  3408           0            0             0          0          0                5        0
    212  3424           0            0             0          0          0               16        0
    214  3456           0            0             0          0          0               11        0
    217  3504           0            0             0          0          0                6        0
    219  3536           0            0             0          0          0               13        0
    222  3584           0            0             0          0          0                7        0
    223  3600           0            0             0          0          0               15        0
    225  3632           0            0             0          0          0                8        0
    228  3680           0            0             0          0          0                9        0
    230  3712           0            0             0          0          0               10        0
    232  3744           0            0             0          0          0               11        0
    234  3776           0            0             0          0          0               12        0
    235  3792           0            0             0          0          0               13        0
    236  3808           0            0             0          0          0               14        0
    238  3840           0            0             0          0          0               15        0
    254  4096           0            0             0          0          0                1        0
  ...

Overall, the effect of the zspage chain size on zsmalloc pool configuration::

  pages per zspage   number of size classes (clusters)   huge size class watermark
         4                        69                               3264
         5                        86                               3408
         6                        93                               3504
         7                       112                               3584
         8                       123                               3632
         9                       140                               3680
        10                       143                               3712
        11                       159                               3744
        12                       164                               3776
        13                       180                               3792
        14                       183                               3808
        15                       188                               3840
        16                       191                               3840
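
The huge size class watermark for a given chain limit can be estimated with
a simplified userspace model. The sketch below assumes 4 KiB pages and class
sizes running from 32 to 4096 bytes in 16-byte steps (values inferred from
the outputs on this page), and it treats a class as huge when its best
zspage is a single page holding a single object. All names are
hypothetical, and the in-kernel logic may differ in detail.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define PAGE_SIZE      4096u /* assumption: 4 KiB pages */
#define MIN_ALLOC_SIZE 32u   /* assumption: smallest class size */
#define CLASS_DELTA    16u   /* assumption: step between class sizes */

/* Shortest chain, up to max_pages, that leaves the fewest bytes unused. */
static unsigned int best_chain_size(unsigned int size, unsigned int max_pages)
{
	unsigned int pages, best = 1, min_waste = UINT_MAX;

	assert(size > 0 && max_pages > 0);

	for (pages = 1; pages <= max_pages; pages++) {
		unsigned int waste = pages * PAGE_SIZE % size;

		if (waste < min_waste) {
			min_waste = waste;
			best = pages;
		}
	}
	return best;
}

/* Huge here means: one page per zspage and one object per zspage. */
static bool is_huge_class(unsigned int size, unsigned int max_pages)
{
	return best_chain_size(size, max_pages) == 1 && PAGE_SIZE / size == 1;
}

/* Largest class size that is not huge for the given chain limit. */
static unsigned int huge_watermark(unsigned int max_pages)
{
	unsigned int size, watermark = 0;

	for (size = MIN_ALLOC_SIZE; size <= PAGE_SIZE; size += CLASS_DELTA)
		if (!is_huge_class(size, max_pages))
			watermark = size;
	return watermark;
}
```

In this model, chain limits of 4, 8 and 16 pages give watermarks of 3264,
3632 and 3840 bytes respectively.
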


A synthetic test
----------------

zram as storage for build artifacts (Linux kernel compilation).

* ``CONFIG_ZSMALLOC_CHAIN_SIZE=4``

  zsmalloc classes stats::

    class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
    ...
    Total                13           51        413836     412973     159955                         3

  zram mm_stat::

    1691783168 628083717 655175680        0 655175680       60        0    34048    34049


* ``CONFIG_ZSMALLOC_CHAIN_SIZE=8``

  zsmalloc classes stats::

    class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
    ...
    Total                18           87        414852     412978     156666                         0

  zram mm_stat::

    1691803648 627793930 641703936        0 641703936       60        0    33591    33591

Using larger zspage chains may result in using fewer physical pages, as seen
in the example, where the number of physical pages used decreased from 159955
to 156666. At the same time, maximum zsmalloc pool memory usage went down from
655175680 to 641703936 bytes.
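
As a quick cross-check of these numbers, the page counts can be converted
to bytes and compared. The sketch below assumes 4 KiB pages; the helper
names are hypothetical.

```c
#include <assert.h>

#define PAGE_SIZE 4096ULL /* assumption: 4 KiB pages */

/* Pool footprint in bytes for a given number of 0-order pages. */
static unsigned long long pool_bytes(unsigned long long pages_used)
{
	return pages_used * PAGE_SIZE;
}

/* Relative saving in hundredths of a percent, rounded down. */
static unsigned long long saving_basis_points(unsigned long long before,
					      unsigned long long after)
{
	assert(before > 0 && after <= before);
	return (before - after) * 10000 / before;
}
```

With 4 KiB pages, 159955 and 156666 pages correspond to 655175680 and
641703936 bytes, matching the mm_stat output above. The difference is 3289
pages, or roughly 2% of the pool.
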

However, this advantage may be offset by the potential for increased system
memory pressure (as some zspages have larger chain sizes) in cases where there
is heavy internal fragmentation and zspool compaction is unable to relocate
objects and release zspages. In these cases, it is recommended to decrease
the limit on the size of the zspage chains (as specified by the
CONFIG_ZSMALLOC_CHAIN_SIZE option).