.. SPDX-License-Identifier: GPL-2.0

#########
UML HowTo
#########

.. contents:: :local:

************
Introduction
************

Welcome to User Mode Linux.

User Mode Linux is the first Open Source virtualization platform (first
release date 1999) and the second virtualization platform for the x86 PC.

How is UML Different from a VM using Virtualization package X?
==============================================================

We have come to assume that virtualization also means some level of
hardware emulation. In fact, it does not. As long as a virtualization
package provides the OS with devices which the OS can recognize and
has a driver for, the devices do not need to emulate real hardware.
Most OSes today have built-in support for a number of "fake"
devices used only under virtualization.
User Mode Linux takes this concept to the ultimate extreme - there
is not a single real device in sight. It is 100% artificial or, to
use the correct term, 100% paravirtual. All UML devices are abstract
concepts which map onto something provided by the host - files, sockets,
pipes, etc.

The other major difference between UML and various virtualization
packages is that the UML kernel and the UML programs operate in
distinctly different ways.
The UML kernel is just a process running on Linux - same as any other
program. It can be run by an unprivileged user and it does not require
any special CPU features.
The UML userspace, however, is a bit different. The Linux kernel on the
host machine assists UML in intercepting everything the program running
on a UML instance is trying to do and making the UML kernel handle all
of its requests.
This is different from other virtualization packages, which do not
distinguish between the guest kernel and guest programs.
This difference results in a number of advantages and disadvantages of
UML over, for example, QEMU, which we will cover later in this document.


Why Would I Want User Mode Linux?
=================================

* If the User Mode Linux kernel crashes, your host kernel is still fine. It
  is not accelerated in any way (vhost, KVM, etc.) and it is not trying to
  access any devices directly. It is, in fact, a process like any other.

* You can run a usermode kernel as a non-root user (you may need to
  arrange appropriate permissions for some devices).

* You can run a very small VM with a minimal footprint for a specific
  task (for example, 32M or less).

* You can get extremely high performance for anything which is a "kernel
  specific task" such as forwarding, firewalling, etc., while still being
  isolated from the host kernel.

* You can play with kernel concepts without breaking things.

* You are not bound by "emulating" hardware, so you can try weird and
  wonderful concepts which are very difficult to support when emulating
  real hardware, such as time travel and making your system clock
  dependent on what UML does (very useful for things like tests).

* It's fun.

Why not to run UML
==================

* The syscall interception technique used by UML makes it inherently
  slower for any userspace applications. While it can do kernel tasks
  on par with most other virtualization packages, its userspace is
  **slow**. The root cause is that UML has a very high cost of creating
  new processes and threads (something most Unix/Linux applications
  take for granted).

* UML is strictly uniprocessor at present. If you want to run an
  application which needs many CPUs to function, it is clearly the
  wrong choice.

***********************
Building a UML instance
***********************

There is no UML installer in any distribution.
While you can use
off-the-shelf install media to install into a blank VM using a virtualization
package, there is no UML equivalent. You have to use appropriate tools on
your host to build a viable filesystem image.

This is extremely easy on Debian - you can do it using debootstrap. It is
also easy on OpenWrt - the build process can build UML images. For all other
distros, YMMV.

Creating an image
=================

Create a sparse raw disk image::

   # dd if=/dev/zero of=disk_image_name bs=1 count=1 seek=16G

This will create a 16G disk image. The host OS will initially allocate only
one block and will allocate more blocks as UML writes to the image. As of
kernel version 4.19, UML fully supports TRIM (as usually used by flash
drives). Using TRIM inside the UML image, by specifying discard as a mount
option or by running ``tune2fs -o discard /dev/ubdXX``, will request UML to
return any unused blocks to the OS.

Create a filesystem on the disk image and mount it::

   # mkfs.ext4 ./disk_image_name && mount ./disk_image_name /mnt

This example uses ext4; any other filesystem such as ext3, btrfs, xfs,
jfs, etc. will work too.

Create a minimal OS installation on the mounted filesystem::

   # debootstrap buster /mnt http://deb.debian.org/debian

debootstrap does not set up the root password, fstab, hostname or
anything related to networking. It is up to the user to do that.

Set the root password. The easiest way to do that is to chroot into the
mounted image::

   # chroot /mnt
   # passwd
   # exit

Edit key system files
=====================

UML block devices are called ubds. The fstab created by debootstrap
will be empty and it needs an entry for the root file system::

   /dev/ubda   /   ext4    discard,errors=remount-ro  0       1

The image hostname will be set to the same as the host on which you
are creating the image.
It is a good idea to change that (on Debian, in ``/etc/hostname``) to avoid
"Oh, bummer, I rebooted the wrong machine".

UML supports two classes of network devices. The older uml_net devices,
called ethX, are scheduled for obsoletion. The newer vector IO devices,
called vecX (vec0, vec1, etc.), are significantly faster and support some
standard virtual network encapsulations such as Ethernet over GRE and
Ethernet over L2TPv3.

Depending on which one is in use, ``/etc/network/interfaces`` will
need entries like::

   # legacy UML network devices
   auto eth0
   iface eth0 inet dhcp

   # vector UML network devices
   auto vec0
   iface vec0 inet dhcp

We now have a UML image which is nearly ready to run; all we need is a
UML kernel and modules for it.

Most distributions have a UML package. Even if you intend to use your own
kernel, testing the image with a stock one is always a good start. These
packages come with a set of modules which should be copied to the target
filesystem. The location is distribution dependent. For Debian these
reside under ``/usr/lib/uml/modules``. Recursively copy the contents of
this directory to the mounted UML filesystem::

   # cp -rax /usr/lib/uml/modules/. /mnt/lib/modules

If you have compiled your own kernel, you need to use the usual "install
modules to a location" procedure by running::

   # make ARCH=um INSTALL_MOD_PATH=/mnt modules_install

At this point the image is ready to be brought up.

*************************
Setting Up UML Networking
*************************

UML networking is designed to emulate an Ethernet connection. This
connection may be either point-to-point (similar to a connection
between machines using a back-to-back cable) or a connection to a
switch.
UML supports a wide variety of means to build these connections to the
local machine, remote machine(s), and local and remote UML and other VM
instances.


+-----------+--------+------------------------------------+------------+
| Transport | Type   | Capabilities                       | Throughput |
+===========+========+====================================+============+
| tap       | vector | checksum, tso                      | > 8Gbit    |
+-----------+--------+------------------------------------+------------+
| hybrid    | vector | checksum, tso, multipacket rx      | > 6Gbit    |
+-----------+--------+------------------------------------+------------+
| raw       | vector | checksum, tso, multipacket rx, tx  | > 6Gbit    |
+-----------+--------+------------------------------------+------------+
| EoGRE     | vector | multipacket rx, tx                 | > 3Gbit    |
+-----------+--------+------------------------------------+------------+
| EoL2TPv3  | vector | multipacket rx, tx                 | > 3Gbit    |
+-----------+--------+------------------------------------+------------+
| bess      | vector | multipacket rx, tx                 | > 3Gbit    |
+-----------+--------+------------------------------------+------------+
| fd        | vector | dependent on fd type               | varies     |
+-----------+--------+------------------------------------+------------+
| tuntap    | legacy | none                               | ~ 500Mbit  |
+-----------+--------+------------------------------------+------------+
| daemon    | legacy | none                               | ~ 450Mbit  |
+-----------+--------+------------------------------------+------------+
| socket    | legacy | none                               | ~ 450Mbit  |
+-----------+--------+------------------------------------+------------+
| pcap      | legacy | rx only                            | ~ 450Mbit  |
+-----------+--------+------------------------------------+------------+
| ethertap  | legacy | obsolete                           | ~ 500Mbit  |
+-----------+--------+------------------------------------+------------+
| vde       | legacy | obsolete                           | ~ 500Mbit  |
+-----------+--------+------------------------------------+------------+

* All transports which have TSO and checksum offloads can deliver speeds
  approaching 10 Gbit on TCP streams.

* All transports which have multi-packet rx and/or tx can deliver packet
  rates of up to 1 Mpps (million packets per second) or more.

* All legacy transports are generally limited to ~600-700 Mbit and
  0.05 Mpps.

* GRE and L2TPv3 allow connections to the local machine, remote
  machines, remote network devices and remote UML instances.

* Socket allows connections only between UML instances.

* Daemon and bess require running a local switch. This switch may be
  connected to the host as well.


Network configuration privileges
================================

The majority of the supported networking modes need ``root`` privileges.
For example, in the legacy tuntap networking mode, users were required
to be part of the group associated with the tunnel device.

For newer network drivers like the vector transports, ``root`` privilege
is required to issue an ioctl to set up the tun interface and/or use
raw sockets where needed.

The same can be achieved by granting the UML binary a particular
capability instead of running UML as root. For the vector transports,
the capability ``CAP_NET_ADMIN`` or ``CAP_NET_RAW`` can be added to the
UML binary. From then on, UML can be run with normal user privileges,
along with full networking.

For example::

   # setcap cap_net_raw,cap_net_admin+ep linux

Configuring vector transports
===============================

All vector transports support a similar syntax. If X is the interface
number, as in vec0, vec1, vec2, etc., the general syntax for options is::

   vecX:transport="Transport Name",option=value,option=value,...,option=value

Common options
--------------

These options are common to all transports:

* ``depth=int`` - sets the queue depth for vector IO. This is the
  number of packets UML will attempt to read or write in a single
  system call. The default number is 64 and is generally sufficient
  for most applications that need throughput in the 2-4 Gbit range.
  Higher speeds may require larger values.

* ``mac=XX:XX:XX:XX:XX:XX`` - sets the interface MAC address value.

* ``gro=[0,1]`` - sets GRO off or on, enabling receive/transmit offloads.
  The effect of this option depends on the host side support in the transport
  which is being configured. In most cases it will enable TCP segmentation and
  RX/TX checksumming offloads. The setting must be identical on the host side
  and the UML side. The UML kernel will produce warnings if it is not.
  For example, GRO is enabled by default on local machine interfaces
  (e.g. veth pairs, bridge, etc.), so it should be enabled in UML in the
  corresponding UML transports (raw, tap, hybrid) in order for networking to
  operate correctly.

* ``mtu=int`` - sets the interface MTU.

* ``headroom=int`` - adjusts the default headroom (32 bytes) reserved
  in case a packet needs to be re-encapsulated, for instance into VXLAN.

* ``vec=0`` - disables multipacket IO and falls back to packet-at-a-time
  mode.

Shared Options
--------------

* ``ifname=str`` - transports which bind to a local network interface
  have a shared option: the name of the interface to bind to.

* ``src, dst, src_port, dst_port`` - all transports which use sockets
  with a notion of source and destination and/or source port and
  destination port use these to specify them.

* ``v6=[0,1]`` - specifies whether a v6 connection is desired, for all
  transports which operate over IP. Additionally, for transports that
  operate somewhat differently over v4 and v6 (for example EoL2TPv3),
  it sets the correct mode of operation. In the absence of this option,
  the socket type is determined by what the src and dst arguments
  resolve/parse to.

tap transport
-------------

Example::

   vecX:transport=tap,ifname=tap0,depth=128,gro=1

This will connect vecX to tap0 on the host. tap0 must already exist (for
example, created using tunctl) and be UP.

tap0 can be configured as a point-to-point interface and given an IP
address so that UML can talk to the host. Alternatively, it is possible
to connect UML to a tap interface which is connected to a bridge.

While tap relies on the vector infrastructure, it is not a true vector
transport at this point, because Linux does not support multi-packet
IO on tap file descriptors for normal userspace apps like UML. This
is a privilege which is offered only to something which can hook up
to it at kernel level via specialized interfaces like vhost-net. A
vhost-net-like helper for UML is planned at some point in the future.

Privileges required: the tap transport requires either:

* the tap interface to exist, be persistent and be owned by the UML
  user (created using tunctl, for example ``tunctl -u uml-user -t tap0``)

* the binary to have the ``CAP_NET_ADMIN`` capability

hybrid transport
----------------

Example::

   vecX:transport=hybrid,ifname=tap0,depth=128,gro=1

This is an experimental/demo transport which couples tap for transmit
and a raw socket for receive. The raw socket allows multi-packet
The raw socket allows multi-packet 361receive resulting in significantly higher packet rates than normal tap 362 363Privileges required: hybrid requires ``CAP_NET_RAW`` capability by 364the UML user as well as the requirements for the tap transport. 365 366raw socket transport 367-------------------- 368 369Example:: 370 371 vecX:transport=raw,ifname=p-veth0,depth=128,gro=1 372 373 374This transport uses vector IO on raw sockets. While you can bind to any 375interface including a physical one, the most common use it to bind to 376the "peer" side of a veth pair with the other side configured on the 377host. 378 379Example host configuration for Debian: 380 381**/etc/network/interfaces**:: 382 383 auto veth0 384 iface veth0 inet static 385 address 192.168.4.1 386 netmask 255.255.255.252 387 broadcast 192.168.4.3 388 pre-up ip link add veth0 type veth peer name p-veth0 && \ 389 ifconfig p-veth0 up 390 391UML can now bind to p-veth0 like this:: 392 393 vec0:transport=raw,ifname=p-veth0,depth=128,gro=1 394 395 396If the UML guest is configured with 192.168.4.2 and netmask 255.255.255.0 397it can talk to the host on 192.168.4.1 398 399The raw transport also provides some support for offloading some of the 400filtering to the host. The two options to control it are: 401 402* ``bpffile=str`` filename of raw bpf code to be loaded as a socket filter 403 404* ``bpfflash=int`` 0/1 allow loading of bpf from inside User Mode Linux. 405 This option allows the use of the ethtool load firmware command to 406 load bpf code. 407 408In either case the bpf code is loaded into the host kernel. While this is 409presently limited to legacy bpf syntax (not ebpf), it is still a security 410risk. It is not recommended to allow this unless the User Mode Linux 411instance is considered trusted. 412 413Privileges required: raw socket transport requires `CAP_NET_RAW` 414capability. 

GRE socket transport
--------------------

Example::

   vecX:transport=gre,src=$src_host,dst=$dst_host

This will configure an Ethernet over ``GRE`` (aka ``GRETAP`` or
``GREIRB``) tunnel which will connect the UML instance to a ``GRE``
endpoint at host $dst_host. ``GRE`` supports the following additional
options:

* ``rx_key=int`` - GRE 32-bit integer key for rx packets; if set,
  ``tx_key`` must be set too

* ``tx_key=int`` - GRE 32-bit integer key for tx packets; if set,
  ``rx_key`` must be set too

* ``sequence=[0,1]`` - enable GRE sequence

* ``pin_sequence=[0,1]`` - pretend that the sequence is always reset
  on each packet (needed to interoperate with some really broken
  implementations)

* ``v6=[0,1]`` - force IPv4 or IPv6 sockets respectively

* GRE checksum is not presently supported

GRE has a number of caveats:

* You can use only one GRE connection per IP address. There is no way to
  multiplex connections as each GRE tunnel is terminated directly on
  the UML instance.

* The key is not really a security feature. While it was intended as such,
  its "security" is laughable. It is, however, a useful feature to
  ensure that the tunnel is not misconfigured.

An example configuration for a Linux host with a local address of
192.168.128.1 to connect to a UML instance at 192.168.129.1:

**/etc/network/interfaces**::

   auto gt0
   iface gt0 inet static
       address 10.0.0.1
       netmask 255.255.255.0
       broadcast 10.0.0.255
       mtu 1500
       pre-up ip link add gt0 type gretap local 192.168.128.1 \
           remote 192.168.129.1 || true
       down ip link del gt0 || true

Additionally, GRE has been tested against a variety of network equipment.

Privileges required: GRE requires ``CAP_NET_RAW``.

l2tpv3 socket transport
-----------------------

*Warning*. L2TPv3 has a "bug".
It is the "bug" known as "has more
options than GNU ls". While it has some advantages, there are usually
easier (and less verbose) ways to connect a UML instance to something.
For example, most devices which support L2TPv3 also support GRE.

Example::

   vec0:transport=l2tpv3,udp=1,src=$src_host,dst=$dst_host,srcport=$src_port,dstport=$dst_port,depth=128,rx_session=0xffffffff,tx_session=0xffff

This will configure an Ethernet over L2TPv3 fixed tunnel which will
connect the UML instance to an L2TPv3 endpoint at host $dst_host using
the L2TPv3 UDP flavour and UDP destination port $dst_port.

L2TPv3 always requires the following additional options:

* ``rx_session=int`` - L2TPv3 32-bit integer session for rx packets

* ``tx_session=int`` - L2TPv3 32-bit integer session for tx packets

As the tunnel is fixed, these are not negotiated and must be
preconfigured on both ends.

Additionally, L2TPv3 supports the following optional parameters:

* ``rx_cookie=int`` - L2TPv3 32-bit integer cookie for rx packets - same
  functionality as the GRE key, more to prevent misconfiguration than to
  provide actual security

* ``tx_cookie=int`` - L2TPv3 32-bit integer cookie for tx packets

* ``cookie64=[0,1]`` - use 64-bit cookies instead of 32-bit

* ``counter=[0,1]`` - enable L2TPv3 counter

* ``pin_counter=[0,1]`` - pretend that the counter is always reset on
  each packet (needed to interoperate with some really broken
  implementations)

* ``v6=[0,1]`` - force v6 sockets

* ``udp=[0,1]`` - use the raw sockets (0) or UDP (1) version of the protocol

L2TPv3 has a number of caveats:

* You can use only one connection per IP address in raw mode. There is
  no way to multiplex connections as each L2TPv3 tunnel is terminated
  directly on the UML instance. UDP mode can use different ports for
  this purpose.
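
Because ``rx_session`` and ``tx_session`` are mandatory and are never negotiated, leaving one of them out is an easy mistake to make. A minimal POSIX shell sketch of a pre-flight check follows. ``check_l2tpv3`` is a hypothetical helper. It only looks for the two option names after the ``vecX:`` prefix and does not validate their values or any other part of the syntax. The addresses and ports in the usage line are example values.

```shell
# check_l2tpv3: hypothetical pre-flight check for a UML L2TPv3 argument.
# Returns 0 if both mandatory session options are present, 1 otherwise.
check_l2tpv3() {
    spec=$1
    # Drop the leading "vecX:" so only the comma-separated options remain
    opts=${spec#*:}
    for required in rx_session tx_session; do
        # Surround with commas so the first and last options match too
        case ",${opts}," in
            *",${required}="*) ;;   # option present
            *)
                printf 'missing %s in %s\n' "$required" "$spec" >&2
                return 1
                ;;
        esac
    done
    return 0
}

# Prints "ok" because both session options are present
check_l2tpv3 "vec0:transport=l2tpv3,udp=1,src=192.0.2.1,dst=192.0.2.2,srcport=1706,dstport=1707,rx_session=0xffffffff,tx_session=0xffff" && echo "ok"
```

Running such a check before starting ``linux`` catches the omission on the command line. Without it, the tunnel simply fails to come up.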

Here is an example of how to configure a Linux host to connect to UML
via L2TPv3:

**/etc/network/interfaces**::

   auto l2tp1
   iface l2tp1 inet static
       address 192.168.126.1
       netmask 255.255.255.0
       broadcast 192.168.126.255
       mtu 1500
       pre-up ip l2tp add tunnel remote 127.0.0.1 \
           local 127.0.0.1 encap udp tunnel_id 2 \
           peer_tunnel_id 2 udp_sport 1706 udp_dport 1707 && \
           ip l2tp add session name l2tp1 tunnel_id 2 \
           session_id 0xffffffff peer_session_id 0xffffffff
       down ip l2tp del session tunnel_id 2 session_id 0xffffffff && \
           ip l2tp del tunnel tunnel_id 2

Privileges required: L2TPv3 requires ``CAP_NET_RAW`` for raw IP mode and
no special privileges for the UDP mode.

BESS socket transport
---------------------

BESS is a high-performance modular network switch.

https://github.com/NetSys/bess

It has support for a simple sequential packet socket mode which, in
more recent versions, uses vector IO for high performance.

Example::

   vecX:transport=bess,src=$unix_src,dst=$unix_dst

This will configure a BESS transport using the unix_src Unix domain
socket address as source and the unix_dst socket address as destination.

For BESS configuration and how to allocate a BESS Unix domain socket port,
please see the BESS documentation:

https://github.com/NetSys/bess/wiki/Built-In-Modules-and-Ports

The BESS transport does not require any special privileges.

Configuring Legacy transports
=============================

Legacy transports are now considered obsolete. Please use the vector
versions.

***********
Running UML
***********

This section assumes that either the user-mode-linux package from the
distribution or a custom-built kernel has been installed on the host.

Either of these adds an executable called ``linux`` to the system. This is
the UML kernel.
It can be run just like any other executable.
It will take most normal Linux kernel arguments as command line
arguments. Additionally, it will need some UML-specific arguments
in order to do something useful.

Arguments
=========

Mandatory Arguments
--------------------

* ``mem=int[K,M,G]`` - amount of memory, in bytes by default. It will
  also accept K, M or G suffixes.

* ``ubdX[s,d,c,t]=`` - virtual disk specification. This is not really
  mandatory, but it is needed in nearly all cases so that we can
  specify a root file system.
  The simplest possible image specification is the name of the image
  file for the filesystem (created using one of the methods described
  in `Creating an image`_).

  * UBD devices support copy on write (COW). The changes are kept in
    a separate file which can be discarded, allowing a rollback to the
    original pristine image. If COW is desired, the UBD image is
    specified as: ``cow_file,master_image``.
    Example: ``ubd0=Filesystem.cow,Filesystem.img``

  * UBD devices can be set to use synchronous IO. Any writes are
    immediately flushed to disk. This is done by adding ``s`` after
    the ``ubdX`` specification.

  * UBD performs some heuristics on devices specified as a single
    filename to make sure that a COW file has not been specified as
    the image. To turn them off, use the ``d`` flag after ``ubdX``.

  * UBD supports TRIM - asking the host OS to reclaim any unused
    blocks in the image. To turn it off, specify the ``t`` flag after
    ``ubdX``.

* ``root=`` - root device, most likely ``/dev/ubda`` (the first UBD
  device, which holds the Linux filesystem image)

Important Optional Arguments
----------------------------

If UML is run as ``linux`` with no extra arguments, it will try to start an
xterm for every console configured inside the image (up to 6 in most
Linux distributions). Each console is started inside an
xterm.
This makes it nice and easy to use UML on a host with a GUI. It is,
however, the wrong approach if UML is to be used as a testing harness or run
in a text-only environment.

In order to change this behaviour, we need to map a console to something
other than the default xterm, wiring it to one of the supported "line"
channels.

Example which will divert console number 1 to stdin/stdout::

   con1=fd:0,fd:1

UML supports a wide variety of serial line channels which are specified using
the following syntax::

   conX=channel_type:options[,channel_type:options]

If the channel specification contains two parts separated by a comma, the
first one is input and the second one is output.

* The null channel - discard all input or output. Example: ``con=null`` will
  set all consoles to null by default.

* The fd channel - use file descriptor numbers for input/output. Example:
  ``con1=fd:0,fd:1``.

* The port channel - listen on a TCP port number. Example: ``con1=port:4321``

* The pty and pts channels - use system pty/pts.

* The tty channel - bind to an existing system tty. Example:
  ``con1=tty:/dev/tty8`` will make UML use the host's 8th console (usually
  unused).

* The xterm channel - this is the default - bring up an xterm on this channel
  and direct IO to it. Note that in order for xterm to work, the host must
  have the UML distribution package installed. This usually contains the
  port-helper and other utilities needed for UML to communicate with the xterm.
  Alternatively, these need to be compiled and installed from source.

All options applicable to consoles also apply to UML serial lines, which are
presented as ttyS inside UML.

Starting UML
============

We can now run UML.
::

   # linux mem=2048M umid=TEST \
       ubd0=Filesystem.img \
       vec0:transport=tap,ifname=tap0,depth=128,gro=1 \
       root=/dev/ubda con=null con0=null,fd:2 con1=fd:0,fd:1

This will run an instance with 2048M of RAM and try to use the image file
called ``Filesystem.img`` as root. It will connect to the host using tap0.
All consoles except ``con0`` and ``con1`` will be disabled. Console 0
discards input and writes its output to standard error, while console 1
uses standard input/output, making it appear in the same terminal in which
UML was started.

Logging in
============

If you have not set up a password when generating the image, you will have to
shut down the UML instance, mount the image, chroot into it and set it, as
described in `Creating an image`_. If the password is already set,
you can just log in.

The UML Management Console
============================

In addition to managing the image from "the inside" using normal sysadmin tools,
it is possible to perform a number of low-level operations using the UML
management console. The UML management console is a low-level interface to the
kernel on a running UML instance, somewhat like the i386 SysRq interface. Since
there is a full-blown operating system under UML, there is much greater
flexibility possible than with the SysRq mechanism.

There are a number of things you can do with the mconsole interface:

* get the kernel version
* add and remove devices
* halt or reboot the machine
* send SysRq commands
* pause and resume the UML
* inspect processes running inside UML
* inspect UML internal /proc state

You need the mconsole client (``uml_mconsole``), which is a part of the UML
tools package available in most Linux distributions.

You also need ``CONFIG_MCONSOLE`` (under 'General Setup') enabled in the UML
kernel.
When you boot UML, you'll see a line like::

   mconsole initialized on /home/jdike/.uml/umlNJ32yL/mconsole

If you specify a unique machine id on the UML command line, e.g.
``umid=debian``, you'll see this::

   mconsole initialized on /home/jdike/.uml/debian/mconsole

That file is the socket that uml_mconsole will use to communicate with
UML. Run it with either the umid or the full path as its argument::

   # uml_mconsole debian

or::

   # uml_mconsole /home/jdike/.uml/debian/mconsole

You'll get a prompt, at which you can run one of these commands:

* version
* help
* halt
* reboot
* config
* remove
* sysrq
* cad
* stop
* go
* proc
* stack

version
-------

This command takes no arguments. It prints the UML version::

   (mconsole) version
   OK Linux OpenWrt 4.14.106 #0 Tue Mar 19 08:19:41 2019 x86_64

There are a couple of actual uses for this. It's a simple no-op which
can be used to check that a UML is running. It's also a way of
sending a device interrupt to the UML, since the UML mconsole is treated
internally as a UML device.

help
----

This command takes no arguments. It prints a short help screen with the
supported mconsole commands.

halt and reboot
---------------

These commands take no arguments. They shut the machine down immediately, with
no syncing of disks and no clean shutdown of userspace. So, they are
pretty close to crashing the machine::

   (mconsole) halt
   OK

config
------

"config" adds a new device to the virtual machine. This is supported
by most UML device drivers. It takes one argument, which is the
device to add, with the same syntax as the kernel command line::

   (mconsole) config ubd3=/home/jdike/incoming/roots/root_fs_debian22

remove
------

"remove" deletes a device from the system.
Its argument is just the
name of the device to be removed. The device must be idle in whatever
sense the driver considers necessary. In the case of the ubd driver,
the removed block device must not be mounted, swapped on, or otherwise
open, and in the case of the network driver, the device must be down::

   (mconsole) remove ubd3

sysrq
-----

This command takes one argument, which is a single letter. It calls the
generic kernel's SysRq driver, which does whatever is called for by
that argument. See the SysRq documentation in
Documentation/admin-guide/sysrq.rst in your favorite kernel tree to
see what letters are valid and what they do.

cad
---

This invokes the ``Ctrl-Alt-Del`` action in the running image. What exactly
this ends up doing is up to init, systemd, etc. Normally, it reboots the
machine.

stop
----

This puts the UML in a loop reading mconsole requests until a 'go'
mconsole command is received. This is very useful as a
debugging/snapshotting tool.

go
--

This resumes a UML after being paused by a 'stop' command. Note that
when the UML has resumed, TCP connections may have timed out and if
the UML is paused for a long period of time, crond might go a little
crazy, running all the jobs it didn't do earlier.

proc
----

This takes one argument - the name of a file in /proc which is printed
to the mconsole standard output.

stack
-----

This takes one argument - the PID of a process. Its stack is
printed to standard output.

*******************
Advanced UML Topics
*******************

Sharing Filesystems between Virtual Machines
============================================

Don't attempt to share filesystems simply by booting two UMLs from the
same file. That's the same thing as booting two physical machines
from a shared disk. It will result in filesystem corruption.
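
Instead, each instance can be given its own COW file layered over the shared image, using the ``ubdX=cow_file,master_image`` form covered under the ``ubdX`` argument and in the next subsection. The POSIX shell sketch below is a dry run that only prints the command lines for review. ``print_cow_cmds``, the instance names and ``mem=256M`` are illustrative assumptions. Each instance also gets a distinct ``umid``, so each has its own mconsole socket.

```shell
# print_cow_cmds: hypothetical dry-run helper. Prints one UML command line
# per instance; each instance gets a private COW file (NAME.cow) layered
# over the same master image and its own umid.
# Usage: print_cow_cmds MASTER_IMAGE NAME [NAME ...]
print_cow_cmds() {
    master=$1
    shift
    for vm in "$@"; do
        printf 'linux mem=256M umid=%s ubd0=%s.cow,%s root=/dev/ubda\n' \
            "$vm" "$vm" "$master"
    done
}

# Prints:
# linux mem=256M umid=vm1 ubd0=vm1.cow,root_fs_debian_22 root=/dev/ubda
# linux mem=256M umid=vm2 ubd0=vm2.cow,root_fs_debian_22 root=/dev/ubda
print_cow_cmds root_fs_debian_22 vm1 vm2
```

Printing rather than executing keeps the sketch safe to run. Network and console arguments would be added per instance in the same way.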

Using layered block devices
---------------------------

The way to share a filesystem between two virtual machines is to use
the copy-on-write (COW) layering capability of the ubd block driver.
Any changed blocks are stored in the private COW file, while reads come
from either device: the private one if the requested block is valid in
it, the shared one if not. Using this scheme, the majority of data
which is unchanged is shared between an arbitrary number of virtual
machines, each of which has a much smaller file containing the changes
that it has made. With a large number of UMLs booting from a large root
filesystem, this leads to a huge disk space saving.

Sharing file system data will also help performance, since the host will
be able to cache the shared data using a much smaller amount of memory,
so UML disk requests will be served from the host's memory rather than
its disks. There is a major caveat in doing this on multisocket NUMA
machines. On such hardware, running many UML instances with a shared
master image and COW changes may cause issues like NMIs from excessive
inter-socket traffic.

If you are running UML on high-end hardware like this, make sure to
bind UML to a set of logical CPUs residing on the same socket using the
``taskset`` command, or have a look at the "Tuning UML" section.

To add a copy-on-write layer to an existing block device file, simply
add the name of the COW file to the appropriate ubd switch::

    ubd0=root_fs_cow,root_fs_debian_22

where ``root_fs_cow`` is the private COW file and ``root_fs_debian_22`` is
the existing shared filesystem. The COW file need not exist. If it
doesn't, the driver will create and initialize it.

Disk Usage
----------

UML has TRIM support which will release any unused space in its disk
image files to the underlying OS.
The apparent file size reported by ``ls -l`` does not shrink when space
is released, so use ``ls -ls`` or ``du`` to check how much space an image
actually occupies.

COW validity
------------

Any changes to the master image will invalidate all COW files. If this
happens, UML will *NOT* automatically delete any of the COW files and
will refuse to boot. In this case the only solution is to either
restore the old image (including its last modified timestamp) or remove
all COW files, which will then be recreated. Any changes stored in
the COW files will be lost.

Cows can moo - uml_moo: Merging a COW file with its backing file
-----------------------------------------------------------------

Depending on how you use UML and COW devices, it may be advisable to
merge the changes in the COW file into the backing file every once in
a while.

The utility that does this is uml_moo. Its usage is::

    uml_moo COW_file new_backing_file

There's no need to specify the backing file since that information is
already in the COW file header. If you're paranoid, boot the new
merged file, and if you're happy with it, move it over the old backing
file.

``uml_moo`` creates a new backing file by default as a safety measure.
It also has a destructive merge option which will merge the COW file
directly into its current backing file. This is really only usable
when the backing file has only one COW file associated with it. If
there are multiple COWs associated with a backing file, a destructive
(``-d``) merge of one of them will invalidate all of the others. However,
it is convenient if you're short of disk space, and it should also be
noticeably faster than a non-destructive merge.

``uml_moo`` is installed with the UML distribution packages and is
available as a part of the UML utilities.
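
The block-selection rule of the COW layer, and what a non-destructive
merge does with it, can be modelled in a few lines of C. This is a
simplified in-memory sketch under stated assumptions: fixed-size blocks,
one validity bit per block, and hypothetical names (``struct cow_dev``,
``cow_read()``, ``cow_write()``, ``cow_merge()``). It does not reproduce
the real ubd COW file format or header.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define BLOCK_SIZE 512  /* assumed block size for this model */
    #define NBLOCKS    64   /* assumed device size in blocks; blk < NBLOCKS */

    /* Hypothetical in-memory model of a COW layer over a shared image. */
    struct cow_dev {
        const uint8_t *backing;            /* shared image, never written */
        uint8_t cow[NBLOCKS][BLOCK_SIZE];  /* private copies of changed blocks */
        uint8_t bitmap[NBLOCKS / 8];       /* 1 bit per block: valid in COW? */
    };

    static bool cow_valid(const struct cow_dev *d, size_t blk)
    {
        return (d->bitmap[blk / 8] >> (blk % 8)) & 1u;
    }

    /* Read from the COW copy if the block is valid there, else from the backing image. */
    void cow_read(const struct cow_dev *d, size_t blk, uint8_t *buf)
    {
        const uint8_t *src = cow_valid(d, blk) ? d->cow[blk]
                                               : d->backing + blk * BLOCK_SIZE;
        memcpy(buf, src, BLOCK_SIZE);
    }

    /* Writes only touch the private COW copy and mark the block valid. */
    void cow_write(struct cow_dev *d, size_t blk, const uint8_t *buf)
    {
        memcpy(d->cow[blk], buf, BLOCK_SIZE);
        d->bitmap[blk / 8] |= (uint8_t)(1u << (blk % 8));
    }

    /*
     * Non-destructive merge in the spirit of uml_moo: build a new image
     * from the backing image plus every block valid in the COW layer,
     * leaving the original backing image untouched.
     */
    void cow_merge(const struct cow_dev *d, uint8_t *new_image)
    {
        for (size_t blk = 0; blk < NBLOCKS; blk++)
            cow_read(d, blk, new_image + blk * BLOCK_SIZE);
    }

Because unchanged blocks are always read from the backing image, any
modification of that image changes what every COW user sees, which is
why changing the master image invalidates all of its COW files.
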

Host file access
================

If you want to access files on the host machine from inside UML, you
can treat it as a separate machine and either NFS-mount directories
from the host or copy files into the virtual machine with scp.
However, since UML is running on the host, it can access those
files just like any other process and make them available inside the
virtual machine without the need to use the network.
This is possible with the hostfs virtual filesystem. With it, you
can mount a host directory into the UML filesystem and access the
files contained in it just as you would on the host.

*SECURITY WARNING*

Unless it is restricted by a parameter on the UML command line, hostfs
allows the image to mount any part of the host filesystem and write to
it. Always confine hostfs to a specific "harmless" directory (for example
``/var/tmp``) when running UML. This is especially important if UML is
being run as root.

Using hostfs
------------

To begin with, make sure that hostfs is available inside the virtual
machine with::

    # cat /proc/filesystems

``hostfs`` should be listed. If it's not, either rebuild the kernel
with hostfs configured into it, or build hostfs as a module, make it
available inside the virtual machine, and insmod it.

Now all you need to do is run mount. The command::

    # mount none /mnt/host -t hostfs

mounts the host's ``/`` on the virtual machine's ``/mnt/host``.
If you don't want to mount the host root directory, you can
specify a subdirectory to mount with the ``-o`` switch to mount::

    # mount none /mnt/home -t hostfs -o /home

This mounts the host's ``/home`` on the virtual machine's ``/mnt/home``.

hostfs as the root filesystem
-----------------------------

It's possible to boot from a directory hierarchy on the host using
hostfs rather than using the standard filesystem in a file.
To start, you need that hierarchy. The easiest way is to loop-mount
an existing root_fs file::

    # mount root_fs uml_root_dir -o loop

You need to change the filesystem type of ``/`` in ``etc/fstab`` to
``hostfs``, so that line looks like this::

    /dev/ubd/0 / hostfs defaults 1 1

Then you need to change the owner of all the files in that directory
that are owned by root to your own user (``jdike`` in the example
below). This worked for me::

    # find . -uid 0 -exec chown jdike {} \;

Next, make sure that your UML kernel has hostfs compiled in, not as a
module. Then run UML with the boot device pointing at that directory::

    ubd0=/path/to/uml/root/directory

UML should then boot as it does normally.

Hostfs Caveats
--------------

Hostfs does not keep track of changes made to files on the host
(outside UML). As a result, if a file is changed without UML's
knowledge, UML will not know about it and its own in-memory cache of
the file may be stale. While it is possible to fix this, it is not
something which is being worked on at present.

Tuning UML
==========

UML at present is strictly uniprocessor. It will, however, spin up a
number of threads to handle various functions.

The UBD driver, SIGIO and the MMU emulation each do that. If the system
is idle, these threads will be migrated to other processors on an SMP
host. This, unfortunately, will usually result in LOWER performance
because of all of the cache/memory synchronization traffic between cores.
As a result, UML will usually benefit from being pinned to a single CPU,
especially on a large system. This can result in performance differences
of a factor of 5 or more on some benchmarks.

Similarly, on large multi-node NUMA systems UML will benefit if all of
its memory is allocated from the same NUMA node it will run on. The
OS will *NOT* do that by default.
In order to do that, the sysadmin
needs to create a suitable tmpfs ramdisk bound to a particular node
and use it as the source for UML RAM allocation by pointing one of the
``TMPDIR``, ``TMP`` or ``TEMP`` environment variables at it. UML looks
at the values of these variables for that. If that fails, it will
look for shmfs mounted under ``/dev/shm``. If everything else fails, it
uses ``/tmp/`` regardless of the filesystem type used for it. For
example, where ``X`` is the NUMA node and ``CPU`` is a logical CPU on
that node::

    mount -t tmpfs -o mpol=bind:X none /mnt/tmpfs-nodeX
    TEMP=/mnt/tmpfs-nodeX taskset -c CPU linux options options options...

*******************************************
Contributing to UML and Developing with UML
*******************************************

UML is an excellent platform to develop new Linux kernel concepts:
filesystems, devices, virtualization, etc. It provides unrivalled
opportunities to create and test them without being constrained to
emulating specific hardware.

For example, want to see how Linux will work with 4096 "proper" network
devices?

Not an issue with UML. At the same time, this is something which
is difficult with other virtualization packages: they are
constrained by the number of devices allowed on the hardware bus
they are trying to emulate (for example, the limited number of device
slots on an emulated PCI bus in QEMU).

If you have something to contribute, such as a patch, a bugfix or a
new feature, please send it to ``linux-um@lists.infradead.org``.

Please follow all standard Linux patch guidelines, such as Cc-ing the
relevant maintainers and running ``./scripts/checkpatch.pl`` on your
patch. For more details see ``Documentation/process/submitting-patches.rst``.

Note: the list does not accept HTML or attachments; all emails must
be formatted as plain text.

Developing always goes hand in hand with debugging.
First of all, you can always run UML under gdb; the "Kernel debugging"
section below describes how to do that. That, however, is not the only
way to debug a Linux kernel. Quite often, adding tracing statements
and/or using UML-specific approaches such as ptracing the UML kernel
process is significantly more informative.

Tracing UML
===========

When running, UML consists of a main kernel thread and a number of
helper threads. The ones of interest for tracing are NOT the ones
that are already ptraced by UML as a part of its MMU emulation.

These are usually the first three threads visible in a ps display.
The one with the lowest PID number and using the most CPU is usually the
kernel thread. The other threads are the disk (ubd) device helper
thread and the sigio helper thread. Running strace on the kernel
thread usually results in the following picture::

    host$ strace -p 16566
    --- SIGIO {si_signo=SIGIO, si_code=POLL_IN, si_band=65} ---
    epoll_wait(4, [{EPOLLIN, {u32=3721159424, u64=3721159424}}], 64, 0) = 1
    epoll_wait(4, [], 64, 0) = 0
    rt_sigreturn({mask=[PIPE]}) = 16967
    ptrace(PTRACE_GETREGS, 16967, NULL, 0xd5f34f38) = 0
    ptrace(PTRACE_GETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=832}]) = 0
    ptrace(PTRACE_GETSIGINFO, 16967, NULL, {si_signo=SIGTRAP, si_code=0x85, si_pid=16967, si_uid=0}) = 0
    ptrace(PTRACE_SETREGS, 16967, NULL, 0xd5f34f38) = 0
    ptrace(PTRACE_SETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=2696}]) = 0
    ptrace(PTRACE_SYSEMU, 16967, NULL, 0) = 0
    --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=16967, si_uid=0, si_status=SIGTRAP, si_utime=65, si_stime=89} ---
    wait4(16967, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP | 0x80}], WSTOPPED|__WALL, NULL) = 16967
    ptrace(PTRACE_GETREGS, 16967, NULL, 0xd5f34f38) = 0
    ptrace(PTRACE_GETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010,
        iov_len=832}]) = 0
    ptrace(PTRACE_GETSIGINFO, 16967, NULL, {si_signo=SIGTRAP, si_code=0x85, si_pid=16967, si_uid=0}) = 0
    timer_settime(0, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=2830912}}, NULL) = 0
    getpid() = 16566
    clock_nanosleep(CLOCK_MONOTONIC, 0, {tv_sec=1, tv_nsec=0}, NULL) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
    --- SIGALRM {si_signo=SIGALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=1631716592, ptr=0x614204f0}} ---
    rt_sigreturn({mask=[PIPE]}) = -1 EINTR (Interrupted system call)

This is a typical picture from a mostly idle UML instance:

* The UML interrupt controller uses epoll; this is UML waiting for IO
  interrupts::

      epoll_wait(4, [{EPOLLIN, {u32=3721159424, u64=3721159424}}], 64, 0) = 1

* The sequence of ptrace calls is part of the MMU emulation and of
  running the UML userspace.
* ``timer_settime`` is part of the UML high-resolution timer subsystem,
  mapping timer requests from inside UML onto the host high-resolution
  timers.
* ``clock_nanosleep`` is UML going into idle (similar to the way a PC
  will execute an ACPI idle).

As you can see, UML will generate quite a bit of output even when idle.
The output can be very informative when observing IO. It shows the
actual IO calls, their arguments and return values.

Kernel debugging
================

You can run UML under gdb now, though it will not necessarily agree to
be started under it. If you are trying to track a runtime bug, it is
much better to attach gdb to a running UML instance and let UML run.

Assuming the same PID number as in the previous example, this would be::

    # gdb -p 16566

This will STOP the UML instance, so you must enter ``cont`` at the gdb
command line to let it continue. It may be a good idea to make
this into a gdb script and pass it to gdb as an argument.

Developing Device Drivers
=========================

Nearly all UML drivers are monolithic. While it is possible to build a
UML driver as a kernel module, that limits the possible functionality
to in-kernel code that is not UML-specific. The reason for this is that
in order to really leverage UML, one needs to write a piece of
userspace code which maps driver concepts onto actual userspace host
calls.

This forms the so-called "user" portion of the driver. While it can
reuse a lot of kernel concepts, it is generally just another piece of
userspace code. This portion needs some matching "kernel" code which
resides inside the UML image and which implements the Linux kernel part.

*Note: There are very few limitations in the way "kernel" and "user" interact.*

UML does not have a strictly defined kernel-to-host API. It does not
try to emulate a specific architecture or bus. UML's "kernel" and
"user" can share memory and code and interact as needed to implement
whatever design the software developer has in mind. The only
limitations are purely technical. Because many functions and
variables have the same names in the kernel and in the host C library,
the developer should be careful about which headers and libraries they
are referring to.

As a result, a lot of userspace code consists of simple wrappers.
For example, ``os_close_file()`` is just a wrapper around ``close()``
which ensures that the userspace ``close()`` does not clash
with similarly named functions in the kernel part.

Security Considerations
-----------------------

Drivers or any new functionality should default to not
accepting arbitrary filenames, BPF code or other parameters
which can affect the host from inside the UML instance.
For example, specifying the socket used for IPC communication
between a driver and the host at the UML command line is OK
security-wise.
Allowing it as a loadable module parameter isn't.

If such functionality is desirable for a particular application
(e.g. loading BPF "firmware" for raw socket network transports),
it should be off by default and should be explicitly turned on
as a command line parameter at startup.

Even with this in mind, the level of isolation between UML
and the host is relatively weak. If the UML userspace is
allowed to load arbitrary kernel drivers, an attacker can
use this to break out of UML. Thus, if UML is used in
a production application, it is recommended that all modules
are loaded at boot and kernel module loading is disabled
afterwards.
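
The wrapper pattern described under "Developing Device Drivers" keeps
host C library calls in the "user" half of a driver. The sketch below is
illustrative only: ``os_close_file_sketch()`` and ``os_write_all_sketch()``
are hypothetical names, not the real UML functions, and reporting
failures as negative errno values is an assumption of this sketch.

.. code-block:: c

    #include <errno.h>
    #include <stddef.h>
    #include <unistd.h>

    /*
     * Hypothetical "user"-side wrappers (illustrative names only).
     * This file is compiled against host libc headers; the "kernel"
     * side calls these functions instead of close() and write(), so
     * libc and kernel symbols never meet in the same source file.
     * Failures are returned as -errno, so callers never read libc's
     * errno variable themselves.
     */
    int os_close_file_sketch(int fd)
    {
        return close(fd) < 0 ? -errno : 0;
    }

    /* Write the whole buffer, retrying on short writes and EINTR. */
    long os_write_all_sketch(int fd, const void *buf, size_t len)
    {
        const char *p = buf;
        size_t done = 0;

        while (done < len) {
            ssize_t n = write(fd, p + done, len - done);
            if (n < 0) {
                if (errno == EINTR)
                    continue;
                return -errno;
            }
            done += (size_t)n;
        }
        return (long)done;
    }

Keeping the retry loop and the errno translation on the "user" side means
the "kernel" side only ever deals with plain integer results.
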