==================
HugeTLB Controller
==================

The HugeTLB controller can be used by first mounting the cgroup filesystem
with the hugetlb option::

  # mount -t cgroup -o hugetlb none /sys/fs/cgroup

With the above step, the initial (parent) HugeTLB group becomes visible at
/sys/fs/cgroup. At boot, this group includes all the tasks in the system.
/sys/fs/cgroup/tasks lists the tasks in this cgroup.

New groups can be created under the parent group /sys/fs/cgroup::

  # cd /sys/fs/cgroup
  # mkdir g1
  # echo $$ > g1/tasks

The above steps create a new group g1 and move the current shell
process (bash) into it.

Brief summary of control files::

  hugetlb.<hugepagesize>.rsvd.limit_in_bytes      # set/show limit of "hugepagesize" hugetlb reservations
  hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes  # show max "hugepagesize" hugetlb reservations and no-reserve faults
  hugetlb.<hugepagesize>.rsvd.usage_in_bytes      # show current reservations and no-reserve faults for "hugepagesize" hugetlb
  hugetlb.<hugepagesize>.rsvd.failcnt             # show the number of allocation failures due to HugeTLB reservation limit
  hugetlb.<hugepagesize>.limit_in_bytes           # set/show limit of "hugepagesize" hugetlb faults
  hugetlb.<hugepagesize>.max_usage_in_bytes       # show max "hugepagesize" hugetlb usage recorded
  hugetlb.<hugepagesize>.usage_in_bytes           # show current usage for "hugepagesize" hugetlb
  hugetlb.<hugepagesize>.failcnt                  # show the number of allocation failures due to HugeTLB usage limit

For a system supporting three hugepage sizes (64KB, 32MB and 1GB), the control
files include::

  hugetlb.1GB.limit_in_bytes
  hugetlb.1GB.max_usage_in_bytes
  hugetlb.1GB.usage_in_bytes
  hugetlb.1GB.failcnt
  hugetlb.1GB.rsvd.limit_in_bytes
  hugetlb.1GB.rsvd.max_usage_in_bytes
  hugetlb.1GB.rsvd.usage_in_bytes
  hugetlb.1GB.rsvd.failcnt
  hugetlb.64KB.limit_in_bytes
  hugetlb.64KB.max_usage_in_bytes
  hugetlb.64KB.usage_in_bytes
  hugetlb.64KB.failcnt
  hugetlb.64KB.rsvd.limit_in_bytes
  hugetlb.64KB.rsvd.max_usage_in_bytes
  hugetlb.64KB.rsvd.usage_in_bytes
  hugetlb.64KB.rsvd.failcnt
  hugetlb.32MB.limit_in_bytes
  hugetlb.32MB.max_usage_in_bytes
  hugetlb.32MB.usage_in_bytes
  hugetlb.32MB.failcnt
  hugetlb.32MB.rsvd.limit_in_bytes
  hugetlb.32MB.rsvd.max_usage_in_bytes
  hugetlb.32MB.rsvd.usage_in_bytes
  hugetlb.32MB.rsvd.failcnt


1. Page fault accounting

hugetlb.<hugepagesize>.limit_in_bytes
hugetlb.<hugepagesize>.max_usage_in_bytes
hugetlb.<hugepagesize>.usage_in_bytes
hugetlb.<hugepagesize>.failcnt

The HugeTLB controller allows users to limit HugeTLB usage (page faults) per
control group and enforces the limit at page fault time. Since HugeTLB does
not support page reclaim, enforcing the limit at page fault time means that
the application receives a SIGBUS signal if it tries to fault in HugeTLB pages
beyond its limit. Therefore, the application needs to know exactly how many
HugeTLB pages it will use beforehand, and the sysadmin needs to make sure that
enough pages are available on the machine for all users, to avoid processes
getting SIGBUS.


2. Reservation accounting

hugetlb.<hugepagesize>.rsvd.limit_in_bytes
hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes
hugetlb.<hugepagesize>.rsvd.usage_in_bytes
hugetlb.<hugepagesize>.rsvd.failcnt

The HugeTLB controller allows users to limit HugeTLB reservations per control
group and enforces the limit at reservation time, as well as when faulting in
HugeTLB memory for which no reservation exists. Since reservation limits are
enforced at reservation time (on mmap or shmget), they never cause the
application to receive a SIGBUS signal for memory that was reserved beforehand.
For MAP_NORESERVE allocations, the reservation limit behaves like the fault
limit: it is enforced at fault time, and the application receives a SIGBUS if
it exceeds its limit.
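
Because the reservation is charged inside mmap() itself, an application can
detect a reservation-limit failure as an error return and react to it. The
following is a minimal C sketch, assuming Linux with glibc; ``struct buffer``,
``buffer_alloc()`` and ``buffer_free()`` are hypothetical names. It requests a
private anonymous mapping with MAP_HUGETLB (no MAP_NORESERVE, so the pages are
reserved up front) and, if mmap() fails (typically with ENOMEM, for example
because the rsvd limit or the free huge page pool would be exceeded), falls
back to regular pages. The 4MB size assumes a 2MB default huge page size. This
only guards against the reservation limit; a fault limit set through
hugetlb.<hugepagesize>.limit_in_bytes can still deliver SIGBUS when the pages
are touched.

```c
#define _GNU_SOURCE /* for MAP_ANONYMOUS and MAP_HUGETLB in glibc */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical handle recording how a buffer was obtained. */
struct buffer {
	void *addr;
	size_t len;
	bool huge; /* true if backed by HugeTLB pages */
};

/*
 * Try a HugeTLB mapping first. Without MAP_NORESERVE, the huge pages are
 * reserved during mmap(), so a reservation that cannot be made (rsvd limit
 * or empty pool) shows up here as MAP_FAILED instead of a later SIGBUS.
 * On failure, fall back to ordinary anonymous memory.
 */
static int buffer_alloc(struct buffer *b, size_t len)
{
	assert(b != NULL && len > 0);

	b->addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (b->addr != MAP_FAILED) {
		b->len = len;
		b->huge = true;
		return 0;
	}
	fprintf(stderr, "HugeTLB mmap failed: %s; using regular pages\n",
		strerror(errno));

	b->addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (b->addr == MAP_FAILED) {
		b->addr = NULL;
		return -1;
	}
	b->len = len;
	b->huge = false;
	return 0;
}

/* Release the mapping, whichever kind it is. */
static void buffer_free(struct buffer *b)
{
	if (b->addr != NULL)
		munmap(b->addr, b->len);
	b->addr = NULL;
}
```

Falling back at allocation time is the kind of alternative that page fault
accounting alone cannot offer, since there the failure only surfaces later as
a signal.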

Reservation limits are superior to the page fault limits described above,
since they never cause the application to receive a SIGBUS signal for memory
that was reserved beforehand. This makes it easier to fall back to
alternatives such as non-HugeTLB memory. With page fault accounting, it is
very hard to keep processes from getting SIGBUS, because the sysadmin needs to
know precisely the HugeTLB usage of every task in the system and make sure
there are enough pages to satisfy all requests. On overcommitted systems,
avoiding SIGBUS with page fault accounting is practically impossible.


3. Caveats with shared memory

For shared HugeTLB memory, both HugeTLB reservations and page faults are
charged to the first task that causes the memory to be reserved or faulted,
and all subsequent uses of this reserved or faulted memory are not charged.

Shared HugeTLB memory is only uncharged when it is unreserved or deallocated.
This usually happens when the HugeTLB file is deleted, not when the task that
caused the reservation or fault exits.


4. Caveats with HugeTLB cgroup offline

When a HugeTLB cgroup goes offline with some reservations or faults still
charged to it, the behavior is as follows:

- The fault charges are moved to the parent HugeTLB cgroup (reparented).
- The reservation charges remain on the offline HugeTLB cgroup.

This means that if a HugeTLB cgroup is offlined while HugeTLB reservations are
still charged to it, that cgroup persists as a zombie until all HugeTLB
reservations are uncharged. HugeTLB reservations behave this way to match
the memory controller, whose cgroups also persist as zombies until all charged
memory is uncharged.
Also, tracking HugeTLB reservations is somewhat more complex than tracking
HugeTLB faults, so reparenting reservations at offline time is significantly
harder.
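
The offline rules can be summarized with a toy model. The C sketch below is
not the kernel's hugetlb_cgroup implementation: ``struct hcg`` and the
``hcg_*()`` functions are hypothetical, and the model tracks a single huge page
size with plain byte counters. Offlining moves fault charges to the parent,
leaves reservation charges in place, and the cgroup counts as a zombie until
its reservation usage drops to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one page size's charges in a cgroup. */
struct hcg {
	struct hcg *parent;        /* NULL for the root in this model */
	unsigned long fault_usage; /* bytes charged by page faults */
	unsigned long rsvd_usage;  /* bytes charged by reservations */
	bool online;
};

/* Offline: reparent fault charges, keep reservation charges here. */
static void hcg_offline(struct hcg *cg)
{
	if (cg->parent != NULL) {
		cg->parent->fault_usage += cg->fault_usage;
		cg->fault_usage = 0;
	}
	cg->online = false;
}

/* Uncharge a reservation, e.g. when the HugeTLB file is deleted. */
static void hcg_uncharge_rsvd(struct hcg *cg, unsigned long bytes)
{
	assert(bytes <= cg->rsvd_usage);
	cg->rsvd_usage -= bytes;
}

/* An offline cgroup lingers as a zombie while reservations remain. */
static bool hcg_is_zombie(const struct hcg *cg)
{
	return !cg->online && cg->rsvd_usage > 0;
}
```

For example, offlining a child with 4MB of fault charges and 2MB of
reservations leaves the 4MB on the parent, while the child stays a zombie
until the 2MB are uncharged.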