==================
HugeTLB Controller
==================

The HugeTLB controller can be created by first mounting the cgroup filesystem::

  # mount -t cgroup -o hugetlb none /sys/fs/cgroup

With the above step, the initial or the parent HugeTLB group becomes
visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.

New groups can be created under the parent group /sys/fs/cgroup::

  # cd /sys/fs/cgroup
  # mkdir g1
  # echo $$ > g1/tasks

The above steps create a new group g1 and move the current shell
process (bash) into it.
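As a minimal sketch, membership in a group can be confirmed by looking for the
PID in that group's tasks file. The helper name ``in_hugetlb_group`` is
hypothetical and not part of any kernel tooling; it takes a group directory and
a PID.

```shell
# Hypothetical helper: succeed if the PID in $2 is listed in the tasks
# file of the cgroup directory in $1 (for example /sys/fs/cgroup/g1).
in_hugetlb_group() {
    # -x matches whole lines only, so PID 234 does not match 2345.
    grep -qxF -- "$2" "$1/tasks"
}
```

For example, ``in_hugetlb_group /sys/fs/cgroup/g1 $$ && echo "shell is in g1"``
prints the message only if the current shell was moved successfully.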

Brief summary of control files::

 hugetlb.<hugepagesize>.rsvd.limit_in_bytes            # set/show limit of "hugepagesize" hugetlb reservations
 hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes        # show max "hugepagesize" hugetlb reservations and no-reserve faults
 hugetlb.<hugepagesize>.rsvd.usage_in_bytes            # show current reservations and no-reserve faults for "hugepagesize" hugetlb
 hugetlb.<hugepagesize>.rsvd.failcnt                   # show the number of allocation failures due to the HugeTLB reservation limit
 hugetlb.<hugepagesize>.limit_in_bytes                 # set/show limit of "hugepagesize" hugetlb faults
 hugetlb.<hugepagesize>.max_usage_in_bytes             # show max "hugepagesize" hugetlb usage recorded
 hugetlb.<hugepagesize>.usage_in_bytes                 # show current usage for "hugepagesize" hugetlb
 hugetlb.<hugepagesize>.failcnt                        # show the number of allocation failures due to the HugeTLB usage limit
 hugetlb.<hugepagesize>.numa_stat                      # show the numa information of the hugetlb memory charged to this cgroup

For a system supporting three hugepage sizes (64k, 32M and 1G), the control
files include::

  hugetlb.1GB.limit_in_bytes
  hugetlb.1GB.max_usage_in_bytes
  hugetlb.1GB.numa_stat
  hugetlb.1GB.usage_in_bytes
  hugetlb.1GB.failcnt
  hugetlb.1GB.rsvd.limit_in_bytes
  hugetlb.1GB.rsvd.max_usage_in_bytes
  hugetlb.1GB.rsvd.usage_in_bytes
  hugetlb.1GB.rsvd.failcnt
  hugetlb.64KB.limit_in_bytes
  hugetlb.64KB.max_usage_in_bytes
  hugetlb.64KB.numa_stat
  hugetlb.64KB.usage_in_bytes
  hugetlb.64KB.failcnt
  hugetlb.64KB.rsvd.limit_in_bytes
  hugetlb.64KB.rsvd.max_usage_in_bytes
  hugetlb.64KB.rsvd.usage_in_bytes
  hugetlb.64KB.rsvd.failcnt
  hugetlb.32MB.limit_in_bytes
  hugetlb.32MB.max_usage_in_bytes
  hugetlb.32MB.numa_stat
  hugetlb.32MB.usage_in_bytes
  hugetlb.32MB.failcnt
  hugetlb.32MB.rsvd.limit_in_bytes
  hugetlb.32MB.rsvd.max_usage_in_bytes
  hugetlb.32MB.rsvd.usage_in_bytes
  hugetlb.32MB.rsvd.failcnt
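Since the hugepage size is encoded in the file names, a script can discover
which sizes a group exposes without hard-coding them. The following is a
sketch only; ``hugetlb_sizes`` is a hypothetical helper name, and it relies on
nothing more than the naming pattern listed above.

```shell
# Hypothetical helper: print the <hugepagesize> part (for example 64KB,
# 32MB, 1GB) of every hugetlb.<hugepagesize>.limit_in_bytes file found
# in the cgroup directory given as $1, one size per line.
hugetlb_sizes() {
    for f in "$1"/hugetlb.*.limit_in_bytes; do
        [ -e "$f" ] || continue              # the glob matched nothing
        name=${f##*/}                        # strip the directory part
        case $name in
            hugetlb.*.rsvd.limit_in_bytes)   # skip the reservation files
                continue ;;
        esac
        name=${name#hugetlb.}                # drop the "hugetlb." prefix
        printf '%s\n' "${name%.limit_in_bytes}"
    done
}
```

Running ``hugetlb_sizes /sys/fs/cgroup/g1`` on the example system above would
be expected to list the three sizes 1GB, 32MB and 64KB.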


1. Page fault accounting

hugetlb.<hugepagesize>.limit_in_bytes
hugetlb.<hugepagesize>.max_usage_in_bytes
hugetlb.<hugepagesize>.usage_in_bytes
hugetlb.<hugepagesize>.failcnt

The HugeTLB controller allows users to limit the HugeTLB usage (page fault) per
control group and enforces the limit during page fault. Since HugeTLB
doesn't support page reclaim, enforcing the limit at page fault time implies
that the application will get a SIGBUS signal if it tries to fault in HugeTLB
pages beyond its limit. Therefore the application needs to know exactly how many
HugeTLB pages it uses beforehand, and the sysadmin needs to make sure that
there are enough available on the machine for all the users to avoid processes
getting SIGBUS.
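Because the limit is expressed in bytes while applications usually reason in
pages, a limit of N pages is N times the page size. The sketch below does that
arithmetic; ``set_hugetlb_fault_limit`` is a hypothetical helper, and it
assumes the size names end in KB, MB or GB as in the file names listed
earlier.

```shell
# Hypothetical helper: limit the cgroup directory in $1 to $3 huge pages
# of the size named in $2 (for example 64KB, 32MB or 1GB).
set_hugetlb_fault_limit() {
    num=${2%??}                              # numeric part, e.g. 32 from 32MB
    case $2 in
        *KB) unit=1024 ;;
        *MB) unit=$((1024 * 1024)) ;;
        *GB) unit=$((1024 * 1024 * 1024)) ;;
        *)   return 1 ;;                     # unknown size suffix
    esac
    # The control file takes the limit in bytes.
    echo $((num * unit * $3)) > "$1/hugetlb.$2.limit_in_bytes"
}
```

For instance, ``set_hugetlb_fault_limit /sys/fs/cgroup/g1 32MB 4`` writes
134217728 to ``hugetlb.32MB.limit_in_bytes``. After running the workload,
reading ``hugetlb.32MB.failcnt`` in the same directory shows whether any
allocation failed because of that limit.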


2. Reservation accounting

hugetlb.<hugepagesize>.rsvd.limit_in_bytes
hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes
hugetlb.<hugepagesize>.rsvd.usage_in_bytes
hugetlb.<hugepagesize>.rsvd.failcnt

The HugeTLB controller allows users to limit the HugeTLB reservations per control
group and enforces the controller limit at reservation time and at the fault of
HugeTLB memory for which no reservation exists. Since reservation limits are
enforced at reservation time (on mmap or shmget), reservation limits never cause
the application to get a SIGBUS signal if the memory was reserved beforehand. For
MAP_NORESERVE allocations, the reservation limit behaves the same as the fault
limit, enforcing memory usage at fault time and causing the application to
receive a SIGBUS if it crosses its limit.

Reservation limits are superior to the page fault limits described above, since
reservation limits are enforced at reservation time (on mmap or shmget), and
never cause the application to get a SIGBUS signal if the memory was reserved
beforehand. This allows for easier fallback to alternatives such as
non-HugeTLB memory. In the case of page fault accounting, it's very
hard to avoid processes getting SIGBUS since the sysadmin needs to know precisely
the HugeTLB usage of all the tasks in the system and make sure there are enough
pages to satisfy all requests. Avoiding tasks getting SIGBUS on overcommitted
systems is practically impossible with page fault accounting.
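One way to observe reservation accounting from the outside is to compare the
reservation limit with the current reservation usage. The sketch below is
illustrative only: ``hugetlb_rsvd_headroom`` is a hypothetical helper, and the
value it prints is a snapshot that can change before a later mmap or shmget
call, so it is no substitute for handling the failure of that call.

```shell
# Hypothetical helper: print how many bytes of the hugepage size named
# in $2 can still be reserved in the cgroup directory given as $1.
hugetlb_rsvd_headroom() {
    limit=$(cat "$1/hugetlb.$2.rsvd.limit_in_bytes") || return 1
    usage=$(cat "$1/hugetlb.$2.rsvd.usage_in_bytes") || return 1
    echo $((limit - usage))
}
```

A call such as ``hugetlb_rsvd_headroom /sys/fs/cgroup/g1 32MB`` prints the
difference in bytes; ``hugetlb.32MB.rsvd.failcnt`` records how often the
reservation limit was actually hit.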


3. Caveats with shared memory

For shared HugeTLB memory, both HugeTLB reservation and page faults are charged
to the first task that causes the memory to be reserved or faulted, and all
subsequent uses of this reserved or faulted memory are done without charging.

Shared HugeTLB memory is only uncharged when it is unreserved or deallocated.
This is usually when the HugeTLB file is deleted, and not when the task that
caused the reservation or fault has exited.


4. Caveats with HugeTLB cgroup offline

When a HugeTLB cgroup goes offline with some reservations or faults still
charged to it, the behavior is as follows:

- The fault charges are charged to the parent HugeTLB cgroup (reparented),
- the reservation charges remain on the offline HugeTLB cgroup.

This means that if a HugeTLB cgroup gets offlined while there are still HugeTLB
reservations charged to it, that cgroup persists as a zombie until all HugeTLB
reservations are uncharged. HugeTLB reservations behave in this manner to match
the memory controller, whose cgroups also persist as zombies until all charged
memory is uncharged. Also, the tracking of HugeTLB reservations is a bit more
complex compared to the tracking of HugeTLB faults, so it is significantly
harder to reparent reservations at offline time.
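Before removing a group, it can be useful to check whether any reservations are
still charged to it, since those are what keep the offline cgroup around. This
is a sketch under that assumption; ``hugetlb_has_rsvd_charges`` is a
hypothetical helper that only reads the ``rsvd.usage_in_bytes`` files described
above.

```shell
# Hypothetical helper: succeed if any hugetlb.<hugepagesize>.rsvd.usage_in_bytes
# file in the cgroup directory given as $1 reports a nonzero value.
hugetlb_has_rsvd_charges() {
    for f in "$1"/hugetlb.*.rsvd.usage_in_bytes; do
        [ -e "$f" ] || continue              # the glob matched nothing
        if [ "$(cat "$f")" -ne 0 ]; then
            return 0                         # reservations still charged
        fi
    done
    return 1                                 # nothing charged
}
```

For example, ``hugetlb_has_rsvd_charges /sys/fs/cgroup/g1 && echo "g1 still
holds reservations"`` warns that removing g1 now would leave it behind as a
zombie until those reservations are uncharged.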