Very rough guide to CGroups v2 and how it can be used to limit resource usage.
To be used if more convenient mechanisms like resource limits in systemd are not available or do not work.
Prerequisites: Linux Kernel with CGroups version 2 (roughly post 2015), system configured for CGroups version 2 unified hierarchy
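A quick way to verify that the unified (v2) hierarchy is in use is to check the filesystem type of /sys/fs/cgroup - it should report cgroup2fs:
# stat -fc %T /sys/fs/cgroup
cgroup2fs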
To get a high-level idea, read https://andrestc.com/post/cgroups-io/ . Beware - it is now obsolete, and following the commands in it exactly will likely not work. See below.
All the gory details are here: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html
Setup - TL;DR version
- Find cgroup mount point:
# mount | grep 'type cgroup2'
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
- Create a new cgroup:
# mkdir /sys/fs/cgroup/mycgroup
# cd /sys/fs/cgroup/mycgroup
- Check that IO, memory etc. controllers are enabled here:
# cat cgroup.controllers
cpuset cpu io memory pids
If not, see https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#controlling-controllers .
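Controllers get enabled for child cgroups by writing into the parent's cgroup.subtree_control. A minimal sketch, run against the cgroup root (assuming these controllers are listed in the root's cgroup.controllers):
# echo "+cpu +io +memory +pids" > /sys/fs/cgroup/cgroup.subtree_control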
- Assign a process to the new cgroup. It can be a shell; the cgroup assignment will be inherited by its children:
# echo PID > cgroup.procs
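For example, to move the current shell (and thus everything started from it later) into the cgroup:
# echo $$ > cgroup.procs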
Memory limits
Details: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory
TL;DR: exceeding the memory.high limit causes pages to be reclaimed. If the process attempts to exceed the memory.max limit, it will be OOM-killed.
Example: 100 MB as the "expected" upper bound, 200 MB as the hard limit (104857600 and 209715200 bytes respectively).
# echo 104857600 > memory.high
# echo 209715200 > memory.max
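Whether the limits have been hit can be checked in memory.events - the high counter grows when memory.high reclaim kicks in, oom_kill when a process was killed for exceeding memory.max:
# cat memory.events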
This counts all memory, including sockets, disk cache, etc. See the memory.stat file for a breakdown into the different types.
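For a quick look at the main consumers, a sketch pulling the anon, file and sock totals out of memory.stat:
# grep -E '^(anon|file|sock) ' memory.stat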
I/O limits
Details: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#io
- Beware, the HW I/O limit by itself might not be sufficient because of the in-memory cache. Set the memory.high limit first so the disk cache has a defined size limit. Details: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#writeback
- Beware, this does not work on bare tmpfs. (I did not try to use a file on tmpfs to back a loopback device ...)
- Pick a test directory and determine the backing device node for it.
# mount | grep /mnt/experiment
/dev/mapper/darp7-experiment on /mnt/experiment type ext4 (rw,noatime)
# ls -l /dev/mapper/darp7-experiment
lrwxrwxrwx 1 root root 7 09-21 12:38 /dev/mapper/darp7-experiment -> ../dm-4
# cat /proc/partitions
major minor #blocks name
...
254 4 325058560 dm-4
Here the directory /mnt/experiment maps to device node 254:4.
- Set a limit of 10 write IO operations per second for the backing device:
# echo "254:4 wiops=10" > io.max
IO limit keys:
Key | Description
--- | ---
rbps | Max read bytes per second
wbps | Max write bytes per second
riops | Max read IO operations per second
wiops | Max write IO operations per second
You can check the current limits by reading io.max (the output below shows a 1 MB/s write limit):
# cat io.max
254:4 rbps=max wbps=1048576 riops=max wiops=max
- Test the limit (memory.high is set low first so the page cache cannot absorb the writes and mask the I/O limit):
# echo 5000000 > memory.high
# dd status=progress if=/dev/zero of=/mnt/experiment/zero bs=1k count=1M
18887680 bytes (19 MB, 18 MiB) copied, 20 s, 936 kB/s
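Reads served from the page cache are not throttled, so for testing a read limit (rbps/riops, which would have to be set in io.max first) it helps to bypass the cache with direct I/O:
# dd status=progress if=/mnt/experiment/zero of=/dev/null bs=1M iflag=direct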
Tricks & good to know
- Beware of cgroup.subtree_control: https://unix.stackexchange.com/questions/680167/ebusy-when-trying-to-add-process-to-cgroup-v2 , https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#no-internal-process-constraint (possibly not applicable anymore)
- Caution while selecting the device node for I/O limits; partitions might not work: https://stackoverflow.com/questions/28830754/restrict-io-usage-using-cgroups
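Cleanup: a cgroup directory can only be removed when no processes are left in it, so move them back to the parent first. A sketch:
# echo PID > /sys/fs/cgroup/cgroup.procs
# rmdir /sys/fs/cgroup/mycgroup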