.. SPDX-License-Identifier: GPL-2.0

======
Design
======


Overall Architecture
====================

The DAMON subsystem consists of three layers:

- Operations Set: Implements fundamental operations for DAMON that depend on
  the given monitoring target address space and the available set of
  software/hardware primitives,
- Core: Implements core logic including monitoring overhead/accuracy control
  and access-aware system operations on top of the operations set layer, and
- Modules: Implements kernel modules for various purposes that provide
  interfaces for the user space, on top of the core layer.


Configurable Operations Set
---------------------------

For data access monitoring and additional low-level work, DAMON needs a set of
implementations for specific operations that are dependent on and optimized for
the given target address space.  On the other hand, the accuracy and overhead
tradeoff mechanism, which is the core logic of DAMON, is in the pure logic
space.  DAMON separates the two parts into different layers, namely the DAMON
Operations Set layer and the DAMON Core Logic layer, respectively.  It further
defines the interface between the layers to allow various operations sets to be
configured with the core logic.

Due to this design, users can extend DAMON for any address space by configuring
the core logic to use the appropriate operations set.  If no appropriate set is
available, users can implement one on their own.
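
The separation can be illustrated with a minimal Python sketch.  It assumes a
hypothetical ``OperationsSet`` interface with two methods that mirror the two
operations described in the Operations Set Layer section; all names here are
illustrative and are not the kernel API.  The core function only uses the
interface, so plugging in a different operations set requires no change to it::

    from abc import ABC, abstractmethod


    class OperationsSet(ABC):
        """Hypothetical interface between the core and an operations set."""

        @abstractmethod
        def target_ranges(self):
            """Return the (start, end) address ranges to monitor."""

        @abstractmethod
        def accessed(self, addr):
            """Return True if addr was accessed since the last check."""


    class FakePhysicalOps(OperationsSet):
        """Toy operations set: a user-given range, accesses from a set."""

        def __init__(self, start, end, hot_addrs):
            self.start, self.end = start, end
            self.hot_addrs = set(hot_addrs)

        def target_ranges(self):
            return [(self.start, self.end)]

        def accessed(self, addr):
            return addr in self.hot_addrs


    def core_sample(ops, page_size=4096):
        """Address-space agnostic core logic: uses only the interface."""
        counts = {}
        for start, end in ops.target_ranges():
            for addr in range(start, end, page_size):
                counts[addr] = int(ops.accessed(addr))
        return counts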

For example, operations sets for physical memory, virtual memory, swap space,
the address spaces of specific processes, NUMA nodes, files, and backing memory
devices could be supported.  Also, if some architectures or devices support
special optimized access check primitives, those can easily be configured.


Programmable Modules
--------------------

The core layer of DAMON is implemented as a framework, and exposes its
application programming interface to all kernel space components such as
subsystems and modules.  For common use cases of DAMON, the DAMON subsystem
provides kernel modules that are built on top of the core layer using the API,
and which user space end users can easily use.


Operations Set Layer
====================

The monitoring operations are defined in two parts:

1. Identification of the monitoring target address range for the address space.
2. Access check of a specific address range in the target space.

DAMON currently provides the implementations of the operations for the physical
and virtual address spaces.  The two subsections below describe how those work.


VMA-based Target Address Range Construction
-------------------------------------------

This is only for the virtual address space monitoring operations
implementation.  The implementation for the physical address space simply asks
users to manually set the monitoring target address ranges.

Only small parts of the super-huge virtual address space of the processes are
mapped to the physical memory and accessed.  Thus, tracking the unmapped
address regions is just wasteful.  However, because DAMON can deal with some
level of noise using the adaptive regions adjustment mechanism, tracking every
mapping is not strictly required, and could even incur a high overhead in some
cases.  That said, excessively large unmapped areas inside the monitoring
target should be removed so that the adaptive mechanism does not waste time on
them.

For this reason, this implementation converts the complex mappings to three
distinct regions that cover every mapped area of the address space.  The two
gaps between the three regions are the two biggest unmapped areas in the given
address space.  In most cases, the two biggest unmapped areas are the gap
between the heap and the uppermost mmap()-ed region, and the gap between the
lowermost mmap()-ed region and the stack.  Because these gaps are
exceptionally huge in usual address spaces, excluding these will be sufficient
to make a reasonable trade-off.  The layout below shows this in detail::

    <heap>
    <BIG UNMAPPED REGION 1>
    <uppermost mmap()-ed region>
    (small mmap()-ed regions and munmap()-ed regions)
    <lowermost mmap()-ed region>
    <BIG UNMAPPED REGION 2>
    <stack>
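
A minimal Python sketch of the conversion, assuming the mappings are already
available as a sorted list of non-overlapping ``(start, end)`` tuples (the
kernel walks the process's VMAs instead).  ``three_regions()`` is a
hypothetical helper written for illustration::

    def three_regions(mappings):
        """Convert sorted, non-overlapping (start, end) mappings into three
        regions by cutting out the two largest gaps between them."""
        if len(mappings) < 3:
            raise ValueError("need at least three mappings")
        # Each gap as (gap size, index of the mapping right after the gap).
        gaps = [(mappings[i][0] - mappings[i - 1][1], i)
                for i in range(1, len(mappings))]
        # Take the two biggest gaps, then order the cut points by address.
        cut1, cut2 = sorted(i for _, i in sorted(gaps, reverse=True)[:2])
        return [(mappings[0][0], mappings[cut1 - 1][1]),
                (mappings[cut1][0], mappings[cut2 - 1][1]),
                (mappings[cut2][0], mappings[-1][1])]

For a heap at ``[0x1000, 0x2000)``, two small mmap()-ed regions near
``0x100000``, and a stack at ``[0x800000, 0x801000)``, the result is the heap,
one region covering both mmap()-ed regions, and the stack.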


PTE Accessed-bit Based Access Check
-----------------------------------

Both of the implementations for physical and virtual address spaces use the PTE
Accessed bit for basic access checks.  The only difference is how they find the
relevant PTE Accessed bit(s) for an address.  While the implementation for the
virtual address space walks the page table of the target task of the address,
the implementation for the physical address space walks every page table having
a mapping to the address.  In this way, the implementations find and clear the
bit(s) for the next sampling target address and check whether the bit(s) are
set again after one sampling period.  This could disturb other kernel
subsystems using the Accessed bits, namely Idle page tracking and the reclaim
logic.  DAMON does nothing to avoid disturbing Idle page tracking, so handling
the interference is the responsibility of sysadmins.  However, it solves the
conflict with the reclaim logic using ``PG_idle`` and ``PG_young`` page flags,
as Idle page tracking does.
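
The clear-then-check cycle can be modeled with a toy Python sketch.
``FakePte``, ``prepare_access_check()``, and ``check_access()`` are
hypothetical stand-ins; real PTE manipulation happens in the kernel, and the
``PG_idle``/``PG_young`` handling is not modeled::

    class FakePte:
        """Stand-in for a page table entry; only the Accessed bit is kept."""

        def __init__(self):
            self.accessed = False


    def prepare_access_check(ptes):
        # Clear the Accessed bit of every PTE mapping the sampled address:
        # one PTE for the virtual address space case, possibly several for
        # the physical address space case (e.g., a shared page).
        for pte in ptes:
            pte.accessed = False


    def check_access(ptes):
        # One sampling interval later, the address counts as accessed if
        # the bit was set again in any of the mapping PTEs.
        return any(pte.accessed for pte in ptes)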


Core Logics
===========


Monitoring
----------

The four sections below describe each of the DAMON core mechanisms and the five
monitoring attributes, ``sampling interval``, ``aggregation interval``,
``update interval``, ``minimum number of regions``, and ``maximum number of
regions``.


Access Frequency Monitoring
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The output of DAMON shows how frequently each page is accessed during a given
duration.  The resolution of the access frequency is controlled by setting
``sampling interval`` and ``aggregation interval``.  In detail, DAMON checks
access to each page once per ``sampling interval`` and aggregates the results.
In other words, it counts the number of accesses to each page.  After each
``aggregation interval`` passes, DAMON calls the callback functions that users
previously registered so that users can read the aggregated results, and then
clears the results.  This can be described by the simple pseudo-code below::

    while monitoring_on:
        # Check each page once per sampling interval.
        for page in monitoring_target:
            if accessed(page):
                nr_accesses[page] += 1
        # Report and reset the counts once per aggregation interval.
        if time() - last_aggregation >= aggregation_interval:
            for callback in user_registered_callbacks:
                callback(monitoring_target, nr_accesses)
            for page in monitoring_target:
                nr_accesses[page] = 0
            last_aggregation = time()
        sleep(sampling_interval)

The monitoring overhead of this mechanism grows without bound as the size of
the target workload grows.
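
To make the overhead claim concrete, the pseudo-code can be turned into a
runnable Python simulation.  It assumes integer time units (e.g.,
microseconds), an ``aggregation_interval`` that is a multiple of
``sampling_interval``, and a caller-supplied ``accessed(page, now)`` function;
simulated time advances instead of sleeping.  ``monitor_pages()`` is a
hypothetical name::

    def monitor_pages(pages, accessed, callbacks, sampling_interval,
                      aggregation_interval, duration):
        """Simulate per-page access frequency monitoring.

        Time advances by sampling_interval per loop instead of sleeping.
        Returns the total number of access checks performed.
        """
        nr_accesses = {page: 0 for page in pages}
        nr_checks = 0
        now = 0
        while now < duration:
            # One access check per page per sampling interval.
            for page in pages:
                nr_checks += 1
                if accessed(page, now):
                    nr_accesses[page] += 1
            now += sampling_interval
            # Report a snapshot and reset once per aggregation interval.
            if now % aggregation_interval == 0:
                for callback in callbacks:
                    callback(dict(nr_accesses))
                for page in pages:
                    nr_accesses[page] = 0
        return nr_checks

With four pages and 40 sampling intervals, the sketch performs 160 access
checks; doubling the number of pages doubles that count.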


Region Based Sampling
~~~~~~~~~~~~~~~~~~~~~

To avoid the unbounded increase of the overhead, DAMON groups adjacent pages
that are assumed to have the same access frequencies into a region.  As long as
the assumption (pages in a region have the same access frequencies) holds, only
one page in the region needs to be checked.  Thus, for each ``sampling
interval``, DAMON randomly picks one page in each region, waits for one
``sampling interval``, checks whether the page is accessed meanwhile, and
increases the access frequency of the region if so.  Therefore, the monitoring
overhead is controllable by setting the number of regions.  DAMON allows users
to set the minimum and the maximum number of regions for the trade-off.
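
A minimal Python sketch of one sampling step, assuming page-aligned regions and
an ``accessed(addr)`` callback standing in for the operations set's access
check.  The ``Region`` class and both helpers are hypothetical.  The number of
access checks per ``sampling interval`` equals the number of regions,
regardless of their sizes::

    import random


    class Region:
        def __init__(self, start, end):
            self.start, self.end = start, end
            self.nr_accesses = 0
            self.sampling_addr = None


    def prepare_samples(regions, page_size=4096, rng=random):
        # Pick one random page per region as that region's sample.
        for r in regions:
            nr_pages = (r.end - r.start) // page_size
            r.sampling_addr = r.start + rng.randrange(nr_pages) * page_size


    def check_samples(regions, accessed):
        # Exactly one access check per region, however large it is.
        for r in regions:
            if accessed(r.sampling_addr):
                r.nr_accesses += 1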

This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed.


Adaptive Regions Adjustment
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Even if the initial monitoring target regions are somehow well constructed to
fulfill the assumption (pages in the same region have similar access
frequencies), the data access pattern can change dynamically.  This will result
in low monitoring quality.  To keep the assumption as much as possible, DAMON
adaptively merges and splits each region based on its access frequency.

For each ``aggregation interval``, it compares the access frequencies of
adjacent regions and merges those if the frequency difference is small.  Then,
after it reports and clears the aggregated access frequency of each region, it
splits each region into two or three regions if the total number of regions
will not exceed the user-specified maximum number of regions after the split.

In this way, DAMON provides its best-effort quality and minimal overhead while
keeping the bounds users set for their trade-off.
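
The merge and split steps can be sketched in Python as follows.  This is a
simplification under stated assumptions: regions are ``(start, end,
nr_accesses)`` tuples sorted by address, the merged frequency is a size-weighted
average, and each region is split into exactly two (rather than two or three)
at a random page boundary.  The function names and the ``threshold`` parameter
are hypothetical::

    import random


    def merge_regions(regions, threshold):
        """Merge adjacent regions whose nr_accesses differ by <= threshold."""
        if not regions:
            return []
        merged = [regions[0]]
        for start, end, nr in regions[1:]:
            pstart, pend, pnr = merged[-1]
            if pend == start and abs(pnr - nr) <= threshold:
                # A size-weighted average keeps the frequency meaningful.
                psize, size = pend - pstart, end - start
                avg = (pnr * psize + nr * size) // (psize + size)
                merged[-1] = (pstart, end, avg)
            else:
                merged.append((start, end, nr))
        return merged


    def split_regions(regions, max_nr_regions, page_size=4096, rng=random):
        """Split each region in two, only if the total stays in bounds."""
        if len(regions) * 2 > max_nr_regions:
            return list(regions)
        result = []
        for start, end, nr in regions:
            nr_pages = (end - start) // page_size
            if nr_pages < 2:
                result.append((start, end, nr))
                continue
            cut = start + rng.randrange(1, nr_pages) * page_size
            result += [(start, cut, nr), (cut, end, nr)]
        return result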


Dynamic Target Space Updates Handling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The monitoring target address range could change dynamically.  For example,
virtual memory could be dynamically mapped and unmapped.  Physical memory could
be hot-plugged.

As the changes could be quite frequent in some cases, DAMON allows the
monitoring operations to check dynamic changes including memory mapping changes
and apply them to monitoring operations-related data structures such as the
abstracted monitoring target memory area only once per user-specified time
interval (``update interval``).
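
A minimal Python sketch of applying new target ranges to existing regions, and
of gating that work by ``update interval``.  The policy shown (drop regions
outside every range, clip partial overlaps, stretch the edge regions to the
range boundaries, and start an uncovered range with a fresh region) is an
assumption for illustration, as are the function names::

    def apply_new_ranges(regions, ranges):
        """Fit sorted (start, end, nr_accesses) regions to new ranges."""
        result = []
        for rstart, rend in ranges:
            # Keep the overlapping parts of existing regions.
            inside = [(max(s, rstart), min(e, rend), nr)
                      for s, e, nr in regions if s < rend and e > rstart]
            if not inside:
                inside = [(rstart, rend, 0)]
            else:
                # Stretch the edge regions to the range boundaries.
                s, e, nr = inside[0]
                inside[0] = (rstart, e, nr)
                s, e, nr = inside[-1]
                inside[-1] = (s, rend, nr)
            result += inside
        return result


    def maybe_update(now, last_update, update_interval, regions, get_ranges):
        """Run the (possibly costly) range lookup once per update_interval."""
        if now - last_update < update_interval:
            return regions, last_update
        return apply_new_ranges(regions, get_ranges()), now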


Operation Schemes
-----------------

One common purpose of data access monitoring is access-aware system efficiency
optimizations.  For example,

    paging out memory regions that are not accessed for more than two minutes

or

    using THP for memory regions that are larger than 2 MiB and showing a high
    access frequency for more than one minute.

One straightforward approach for such schemes would be profile-guided
optimizations.  That is, getting data access monitoring results of the
workloads or the system using DAMON, finding memory regions of special
characteristics by profiling the monitoring results, and making system
operation changes for the regions.  The changes could be made by modifying or
providing advice to the software (the application and/or the kernel), or
reconfiguring the hardware.  Both offline and online approaches could be
available.

Among those, providing advice to the kernel at runtime would be flexible and
effective, and therefore widely used.  However, implementing such schemes
could impose unnecessary redundancy and inefficiency.  The profiling could be
redundant if the type of interest is common.  Exchanging the information
including monitoring results and operation advice between kernel and user
spaces could be inefficient.

To allow users to reduce such redundancy and inefficiencies by offloading the
work, DAMON provides a feature called Data Access Monitoring-based Operation
Schemes (DAMOS).  It lets users specify their desired schemes at a high
level.  Given such specifications, DAMON starts monitoring, finds regions having
the access pattern of interest, and applies the user-desired operation actions
to the regions as soon as they are found.


Operation Action
~~~~~~~~~~~~~~~~

The management action that the users desire to apply to the regions of their
interest.  For example, paging out, prioritizing for the next reclamation victim
selection, advising ``khugepaged`` to collapse or split, or doing nothing but
collecting statistics of the regions.

The list of supported actions is defined in DAMOS, but the implementation of
each action is in the DAMON operations set layer because the implementation
normally depends on the monitoring target address space.  For example, the code
for paging specific virtual address ranges out would be different from that for
physical address ranges.  Also, operations set implementations are not required
to support all actions in the list.  Hence, the availability of a specific
DAMOS action depends on which operations set is selected to be used together.

Applying an action to a region is considered to change the region's
characteristics.  Hence, DAMOS resets the age of regions when an action is
applied to them.
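
A minimal Python sketch of the two points above: the availability check
against the selected operations set, and the age reset after applying an
action.  The ``SUPPORTED_ACTIONS`` table is hypothetical and is not the
kernel's actual support matrix; the action handlers are supplied by the caller::

    # Hypothetical support table; NOT the kernel's real support matrix.
    SUPPORTED_ACTIONS = {
        "vaddr": {"pageout", "hugepage", "nohugepage", "stat"},
        "paddr": {"pageout", "lru_prio", "stat"},
    }


    class Region:
        def __init__(self, start, end, age=0):
            self.start, self.end, self.age = start, end, age


    def apply_action(ops_name, action, region, handlers):
        """Apply a DAMOS action via the selected operations set."""
        if action not in SUPPORTED_ACTIONS.get(ops_name, set()):
            raise NotImplementedError(f"{ops_name} does not support {action}")
        # The operations set provides the address-space specific work.
        handlers[action](region)
        # Applying the action changes the region's characteristics,
        # so its age restarts from zero.
        region.age = 0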


Target Access Pattern
~~~~~~~~~~~~~~~~~~~~~

The access pattern of the scheme's interest.  The patterns are constructed with
the properties that DAMON's monitoring results provide, specifically the size,
the access frequency, and the age.  Users can describe their access pattern of
interest by setting minimum and maximum values of the three properties.  If a
region's three properties are in the ranges, DAMOS classifies it as one of the
regions that the scheme is interested in.
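
The matching rule can be sketched in Python as below.  The ``AccessPattern``
and ``Region`` classes are hypothetical, and ``age`` is assumed to be counted in
aggregation intervals::

    from dataclasses import dataclass


    @dataclass
    class Region:
        start: int
        end: int
        nr_accesses: int
        age: int


    @dataclass
    class AccessPattern:
        min_sz: int
        max_sz: int
        min_nr_accesses: int
        max_nr_accesses: int
        min_age: int
        max_age: int

        def matches(self, r):
            # All three properties must fall within their [min, max] ranges.
            return (self.min_sz <= r.end - r.start <= self.max_sz
                    and self.min_nr_accesses <= r.nr_accesses
                    <= self.max_nr_accesses
                    and self.min_age <= r.age <= self.max_age)

For example, assuming a 100 ms ``aggregation interval``, the "not accessed for
more than two minutes" scheme above corresponds roughly to an ``nr_accesses``
range of ``[0, 0]`` and a minimum ``age`` of 1200.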


Quotas
~~~~~~

DAMOS upper-bound overhead control feature.  DAMOS could incur high overhead if
the target access pattern is not properly tuned.  For example, if a huge memory
region having the access pattern of interest is found, applying the scheme's
action to all pages of the huge region could consume unacceptably large system
resources.  Preventing such issues by tuning the access pattern could be
challenging, especially if the access patterns of the workloads are highly
dynamic.

To mitigate that situation, DAMOS provides an upper-bound overhead control
feature called quotas.  It lets users specify an upper limit of time that DAMOS
can use for applying the action, and/or a maximum number of bytes of memory
regions that the action can be applied to, within a user-specified time
duration.
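
A size quota can be sketched in Python as a per-window byte budget.
``SizeQuota`` is a hypothetical class, the time unit of ``now`` and
``reset_interval`` is left to the caller, and the time quota is not modeled::

    class SizeQuota:
        """Hypothetical size quota: at most sz_limit bytes per window."""

        def __init__(self, sz_limit, reset_interval):
            self.sz_limit = sz_limit
            self.reset_interval = reset_interval
            self.charged = 0
            self.window_start = 0

        def charge(self, now, region_size):
            """Return how many bytes of the region may be processed now."""
            if now - self.window_start >= self.reset_interval:
                # A new time window starts; forget the previous charges.
                self.window_start = now
                self.charged = 0
            allowed = min(region_size, self.sz_limit - self.charged)
            self.charged += allowed
            return allowed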


Prioritization
^^^^^^^^^^^^^^

A mechanism for making a good decision under the quotas.  When the action
cannot be applied to all regions of interest due to the quotas, DAMOS
prioritizes regions and applies the action only to regions having high enough
priorities so that it will not exceed the quotas.

The prioritization mechanism should be different for each action.  For example,
rarely accessed (colder) memory regions would be prioritized for page-out
scheme action.  In contrast, the colder regions would be deprioritized for huge
page collapse scheme action.  Hence, the prioritization mechanisms for each
action are implemented in each DAMON operations set, together with the actions.

Though the implementation is up to the DAMON operations set, it would be common
to calculate the priority using the access pattern properties of the regions.
Some users would want the mechanisms to be personalized for their specific
case.  For example, some users would want the mechanism to weigh the recency
(``age``) more than the access frequency (``nr_accesses``).  DAMOS allows users
to specify the weight of each access pattern property and passes the
information to the underlying mechanism.  Nevertheless, how and even whether
the weight will be respected is up to the underlying prioritization mechanism
implementation.
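
A minimal Python sketch of weighted prioritization for a page-out-like action,
followed by selection under a size quota.  The scoring formula, the 0 to 100
score range, and the function names are all assumptions for illustration; a
real operations set is free to compute priorities differently.  The selection
simply stops at the first region that no longer fits, which acts as a priority
threshold::

    def cold_priority(region, weights, max_nr_accesses, max_age):
        """Hypothetical page-out priority in [0, 100]: larger, less
        frequently accessed, and older regions score higher."""
        w_sz, w_freq, w_age = weights
        # 1 point per MiB, capped at 100.
        size_score = min(100, region["size"] >> 20)
        # Fewer accesses means colder, hence a higher score.
        freq_score = 100 - min(100, region["nr_accesses"] * 100
                               // max_nr_accesses)
        # Regions whose pattern has lasted longer get a higher score.
        age_score = min(100, region["age"] * 100 // max_age)
        total = (w_sz + w_freq + w_age) or 1
        return (w_sz * size_score + w_freq * freq_score
                + w_age * age_score) // total


    def select_under_quota(regions, score, sz_quota):
        """Pick regions in descending priority until the quota is hit."""
        chosen, used = [], 0
        for r in sorted(regions, key=score, reverse=True):
            if used + r["size"] > sz_quota:
                break
            chosen.append(r)
            used += r["size"]
        return chosen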


Watermarks
~~~~~~~~~~

Conditional DAMOS (de)activation automation.  Users might want DAMOS to run
only under certain situations.  For example, when a sufficient amount of free
memory is guaranteed, running a scheme for proactive reclamation would only
consume unnecessary system resources.  To avoid such consumption, the user would
need to manually monitor some metrics such as free memory ratio, and turn
DAMON/DAMOS on or off.

DAMOS allows users to offload such work using three watermarks.  It allows the
users to configure the metric of their interest, and three watermark values,
namely high, middle, and low.  If the value of the metric rises above the high
watermark or falls below the low watermark, the scheme is deactivated.  If the
metric falls below the middle watermark but stays above the low watermark, the
scheme is activated.  If all schemes are deactivated by the watermarks, the
monitoring is also deactivated.  In this case, the DAMON worker thread only
periodically checks the watermarks and therefore incurs nearly zero overhead.
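
The activation rules can be sketched as a small Python state machine.
``Watermarks`` is a hypothetical class; the metric is assumed to be something
like the free memory ratio in permille, and between the middle and high
watermarks the current state is kept::

    class Watermarks:
        """Hypothetical DAMOS watermarks state machine."""

        def __init__(self, high, mid, low):
            if not high >= mid >= low:
                raise ValueError("need high >= mid >= low")
            self.high, self.mid, self.low = high, mid, low
            self.active = False

        def update(self, metric):
            """Feed a new metric value; return whether the scheme is active."""
            if metric > self.high or metric < self.low:
                # Outside [low, high]: deactivate.
                self.active = False
            elif metric < self.mid:
                # Between low and mid: activate.
                self.active = True
            # Between mid and high: keep the current state.
            return self.active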


Filters
~~~~~~~

Non-access pattern-based target memory regions filtering.  If users run
self-written programs or have good profiling tools, they could know more than
the kernel, such as future access patterns or some special requirements for
specific types of memory.  For example, some users may know that only
anonymous pages can impact their program's performance.  They can also have a
list of latency-critical processes.

To let users optimize DAMOS schemes with such special knowledge, DAMOS provides
a feature called DAMOS filters.  The feature allows users to set an arbitrary
number of filters for each scheme.  Each filter specifies the type of target
memory, and whether it should exclude the memory of the type (filter-out), or
all except the memory of the type (filter-in).

As of this writing, anonymous page type and memory cgroup type are supported by
the feature.  Some filter target types can require additional arguments.  For
example, the memory cgroup filter type asks users to specify the file path of
the memory cgroup for the filter.  Hence, users can apply specific schemes to
only anonymous pages, non-anonymous pages, pages of specific cgroups, all pages
excluding those of specific cgroups, and any combination of those.
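
A minimal Python sketch of how filters could decide whether a page remains a
target.  The ``Page`` and ``Filter`` classes, the ``"anon"``/``"memcg"`` kind
strings, and the rule that a page is dropped if any filter excludes it are
assumptions for illustration::

    from dataclasses import dataclass


    @dataclass
    class Page:
        anon: bool
        memcg: str


    @dataclass
    class Filter:
        kind: str             # "anon" or "memcg" (hypothetical names)
        filter_out: bool      # True: exclude matching pages (filter-out);
                              # False: exclude non-matching pages (filter-in)
        memcg_path: str = ""  # only used by the "memcg" kind

        def type_matches(self, page):
            if self.kind == "anon":
                return page.anon
            if self.kind == "memcg":
                return page.memcg == self.memcg_path
            raise ValueError(f"unknown filter kind: {self.kind}")

        def excludes(self, page):
            return self.type_matches(page) == self.filter_out


    def is_target(page, filters):
        """A page stays a target only if no filter excludes it."""
        return not any(f.excludes(page) for f in filters)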


Application Programming Interface
---------------------------------

The programming interface for kernel space data access-aware applications.
DAMON is a framework, so it does nothing by itself.  Instead, it only helps
other kernel components such as subsystems and modules build their data
access-aware applications using DAMON's core features.  For this, DAMON exposes
all its features to other kernel components via its application programming
interface, namely ``include/linux/damon.h``.  Please refer to the API
:doc:`document </mm/damon/api>` for details of the interface.


Modules
=======

Because the core of DAMON is a framework for kernel components, it doesn't
provide any direct interface for the user space.  Such interfaces should be
implemented by each kernel component that uses the DAMON API, instead.  The
DAMON subsystem itself implements such DAMON API user modules, which are
supposed to be used for general purpose DAMON control and special purpose data
access-aware system operations, and provides stable application binary
interfaces (ABI) for the user space.  User space programs can build efficient
data access-aware applications using the interfaces.


General Purpose User Interface Modules
--------------------------------------

DAMON modules that provide user space ABIs for general purpose DAMON usage at
runtime.

DAMON user interface modules, namely 'DAMON sysfs interface' and 'DAMON debugfs
interface' are DAMON API user kernel modules that provide ABIs to the
user space.  Please note that DAMON debugfs interface is currently deprecated.

Like many other ABIs, the modules create files on sysfs and debugfs, and allow
users to specify their requests to and get the answers from DAMON by writing to
and reading from the files.  In response to such I/O, DAMON user interface
modules control DAMON and retrieve the results as the user requested via the
DAMON API, and return the results to the user space.
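
For example, a user space program could start monitoring a process by writing
to the sysfs files in order.  The Python sketch below assumes the sysfs layout
used by the quick start in the usage document
(``/sys/kernel/mm/damon/admin/kdamonds/...``); verify the file names against
that document for your kernel version.  The helper names are hypothetical, and
writing to the real files requires root privileges::

    import os


    def sysfs_write(path, value):
        # Equivalent to `echo value > path`.
        with open(path, "w") as f:
            f.write(f"{value}\n")


    def start_vaddr_monitoring(pid, admin="/sys/kernel/mm/damon/admin"):
        """Monitor the virtual address space of `pid` with one kdamond."""
        kdamonds = os.path.join(admin, "kdamonds")
        sysfs_write(os.path.join(kdamonds, "nr_kdamonds"), 1)
        contexts = os.path.join(kdamonds, "0", "contexts")
        sysfs_write(os.path.join(contexts, "nr_contexts"), 1)
        ctx = os.path.join(contexts, "0")
        sysfs_write(os.path.join(ctx, "operations"), "vaddr")
        sysfs_write(os.path.join(ctx, "targets", "nr_targets"), 1)
        sysfs_write(os.path.join(ctx, "targets", "0", "pid_target"), pid)
        sysfs_write(os.path.join(kdamonds, "0", "state"), "on")

The ``admin`` parameter exists only so the sketch can be pointed at a scratch
directory for testing.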

The ABIs are designed to be used for user space application development,
rather than for direct manual use by humans.  Human users are recommended to
use user space tools built on top of the ABIs instead.  One such
Python-written user space tool is available at GitHub
(https://github.com/awslabs/damo), PyPI
(https://pypistats.org/packages/damo), and Fedora
(https://packages.fedoraproject.org/pkgs/python-damo/damo/).

Please refer to the ABI :doc:`document </admin-guide/mm/damon/usage>` for
details of the interfaces.


Special-Purpose Access-aware Kernel Modules
-------------------------------------------

DAMON modules that provide user space ABIs for specific-purpose DAMON usage.

DAMON sysfs/debugfs user interfaces are for full control of all DAMON features
at runtime.  For each special-purpose system-wide data access-aware system
operation such as proactive reclamation or LRU lists balancing, the interfaces
could be simplified by removing knobs that are unnecessary for the specific
purpose, and extended for boot-time and even compile-time control.  Default
values of DAMON control parameters for the usage would also need to be
optimized for the purpose.

To support such cases, yet more DAMON API user kernel modules that provide
simpler and more optimized user space interfaces are available.  Currently, two
modules for proactive reclamation and LRU lists manipulation are provided.  For
more detail, please read the usage documents for those
(:doc:`/admin-guide/mm/damon/reclaim` and
:doc:`/admin-guide/mm/damon/lru_sort`).