Hotplug in Docker

We would like to achieve two goals here:

  • Be able to plug (and unplug) devices into our applications running inside Docker containers

  • Achieve full isolation: we don’t want to overlap devices between different containers

Hotplug demo gif
Figure 1. A working example: connecting a joypad to the client triggers the right detection in RetroArch running in the server.

Mount virtual devices

Wolf creates virtual devices on-demand based on the packets that are received from Moonlight clients over the control stream.
When a new virtual device is created, we have to "mount" it inside the right running Docker container. Running the containers with the privileged flag would expose every device to all of them, so, in order to achieve isolation, that is not an option.

Normally, Docker fixes the set of devices available to a container at creation time. Fortunately, it also supports a more permissive rule that grants access to a wider range of devices based on their major:minor numbers. By adding --device-cgroup-rule when the container is created, we can later call mknod [1] inside the container to create the device node on demand, triggered from the outside.

Example Docker command issued by Wolf
docker exec -it <App_Container> sh -c "mknod /dev/input/<device name> c <major> <minor>"
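To make the two pieces concrete, here is a minimal sketch of how a container might be started with a matching cgroup rule and how the device numbers could be looked up on the host. The image name `my/app-image`, the container name `my-container`, and the helpers `dev_numbers` and `mknod_cmd` are hypothetical; the rule assumes input devices, which use character major 13 on Linux. Note that mknod takes the major and minor numbers as two separate arguments. The Docker commands are only printed, not executed.

```shell
#!/bin/sh
# Print "<major> <minor>" in decimal for a device node.
# Assumes GNU stat, whose %t and %T formats print the numbers in hex.
dev_numbers() {
    major_hex=$(stat -c '%t' "$1") || return 1
    minor_hex=$(stat -c '%T' "$1") || return 1
    echo "$((0x$major_hex)) $((0x$minor_hex))"
}

# Print the mknod command that would be run inside the container.
# usage: mknod_cmd <container> <device node on the host>
mknod_cmd() {
    set -- "$1" "$2" $(dev_numbers "$2")
    echo "docker exec $1 mknod $2 c $3 $4"
}

# The rule allows read, write and mknod (rwm) on any character device
# with major 13; the image name is a placeholder.
echo "docker run -d --cap-add MKNOD --device-cgroup-rule 'c 13:* rwm' my/app-image"

# /dev/null stands in for a real /dev/input/... node here.
mknod_cmd my-container /dev/null
```

In a real setup the second argument would be the virtual device node that was just created on the host.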

If a tree falls in the forest, does it make a sound?

We’ve mounted the right device into the right container; still, no application is picking it up [2]. We have to trigger some kind of "event" to advertise that a new device has arrived. It turns out that this kind of event is generated by udev.

An example udev event when you plug a joypad
UDEV  [3588.199301] add      /devices/pci0000:00/0000:00:01.3/0000:02:00.0/usb1/1-1/1-1:1.3/0003:054C:0CE6.0007/input/input20/js0 (input)
ACTION=add
SUBSYSTEM=input
DEVNAME=/dev/input/js0
ID_BUS=usb
ID_MODEL=Wireless_Controller
ID_SERIAL=Sony_Interactive_Entertainment_Wireless_Controller
ID_VENDOR=Sony_Interactive_Entertainment
ID_VENDOR_ENC=Sony\x20Interactive\x20Entertainment
ID_VENDOR_ID=054c
ID_REVISION=0100
ID_TYPE=hid
ID_USB_VENDOR_ID=054c
ID_USB_REVISION=0100
ID_USB_TYPE=hid
ID_USB_DRIVER=usbhid
MAJOR=13
MINOR=0

Use udev from the host

The first approach is to just pass the udev socket (and db files, more on this later) from the host. Sure, every application will get the event that a new device has been plugged, so this kind of defeats isolation; but since we are mounting only in one container, this should still work, right?

Unfortunately, there’s another issue: when Wolf creates a new virtual device, its creation automatically triggers a udev event that is "broadcast" to every listening application. If this happens before we are able to mknod the device, the application will react to the event, try to open a device node that doesn’t exist yet inside its container, and fail.

We want to control exactly when these events are sent, so that the flow is as follows:

Figure 2. The order here matters, the udev events must be sent after the mount

Generating udev events

Since we have no control over udev and we can’t mount a device before it’s created, we have to replicate the events that udev generates and broadcast them in the same way to all listening applications.
First, let’s take a step back: how do applications communicate with udev?

Udev internals

Though udev runs in userspace, it is highly entangled with the Linux kernel. The first entry that recognizes device insertion/deletion events is surely the Linux kernel. While there were no mechanisms for the Linux kernel to push notifications to userspace processes (with ioctl() the kernel can only provide responses for the corresponding requests from userspace processes), netlink IPC mechanism emerged and currently it is available for the kernel to send a notification first.
— https://insujang.github.io/2018-11-27/udev-device-manager-for-the-linux-kernel-in-userspace/

Normally udevd does the following steps:

  • Listens for kernel events via the netlink socket (GROUP_KERNEL)

  • When it receives a new device event, it’ll run all the rules that are defined

  • Send back the "augmented" message via another netlink socket (GROUP_UDEV)

The two different groups are crucial: whilst every user application can listen to GROUP_KERNEL events, only the kernel can be the origin of those messages. GROUP_UDEV, being a user space group, can instead be "impersonated" by any user space application [3].

Since this will run in Docker, there’s an additional mechanism that we have to keep in mind: network namespaces. Since udevd communicates via netlink, a container that doesn’t run with --network=host will not receive those events; to achieve full isolation, we’ll have to run our custom udev sender inside the container’s network namespace.
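As a quick way to check which processes would see the same netlink broadcasts, the sketch below compares network namespace identifiers through /proc. The helper names `netns_of` and `same_netns` are hypothetical. It assumes a Linux procfs, and reading the entry of a process owned by another user (for example a container's main process) typically needs root.

```shell
#!/bin/sh
# Print the network namespace identifier of a process,
# in the form "net:[<inode number>]".
netns_of() {
    readlink "/proc/$1/ns/net"
}

# Succeed only when both processes are in the same network namespace.
same_netns() {
    a=$(netns_of "$1") || return 1
    b=$(netns_of "$2") || return 1
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# "self" resolves to the calling process. On the host, a container's PID
# can be obtained with: docker inspect --format '{{.State.Pid}}' <container>
netns_of self
```

If `same_netns self <container pid>` fails on the host, events sent from the host side will not reach listeners inside that container, which is the situation described above.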

Faking udev

There are 3 main components that need to be "faked" in order to trick programs that use libudev into accepting our fake events:

  • create a file at /run/udev/control; as far as I understand, this is not used for anything other than detecting that udev is present.

  • send a valid NETLINK_KOBJECT_UEVENT event via netlink using GROUP_UDEV

  • generate the appropriate DB entries in /run/udev/data/

    • these are just plain text files where the filename is just c<major>:<minor> and the content is roughly the same as the event message payload
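A minimal sketch of generating such a DB entry follows. The function name `write_db_entry` is hypothetical, and the `E:` prefix on each property line is an assumption based on how entries written by the real udevd look; the properties used in the demo are only illustrative. The demo writes to a scratch directory rather than the real /run/udev/data/.

```shell
#!/bin/sh
# Write a udev-style DB entry for a character device.
# usage: write_db_entry <data dir> <major> <minor> KEY=VALUE...
write_db_entry() {
    dir=$1 major=$2 minor=$3
    shift 3
    mkdir -p "$dir" || return 1
    # One property per line, each prefixed with "E:" (assumed format).
    for prop in "$@"; do
        printf 'E:%s\n' "$prop"
    done > "$dir/c$major:$minor"
}

# Demo: create the entry for device 13:0 in a temporary directory.
demo_dir=$(mktemp -d)
write_db_entry "$demo_dir" 13 0 ID_INPUT=1 ID_INPUT_JOYSTICK=1
cat "$demo_dir/c13:0"
```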

The result of all this is our little CLI utility called fake-udev which we’ll install into our containers and call with our custom-generated events after mknod.

echo -ne "ACTION=add\0DEVNAME=input/bomb\0DEVPATH=/devices/bomb\0SEQNUM=1234\0SUBSYSTEM=input\0" | base64 | sudo fake-udev

# `udevadm monitor` should print something like:
UDEV  [3931.403835] add      /devices/bomb  (input)
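The payload in the command above is a list of KEY=VALUE pairs, each terminated by a NUL byte, then base64-encoded. The sketch below builds such a payload from arguments and decodes it again for inspection. The helpers `make_event` and `show_event` are hypothetical and not part of fake-udev; `base64 -d` assumes a GNU-compatible base64.

```shell
#!/bin/sh
# Build a base64-encoded, NUL-separated event payload from KEY=VALUE args.
make_event() {
    for prop in "$@"; do
        printf '%s\0' "$prop"
    done | base64
}

# Decode a payload back into one KEY=VALUE per line, for inspection.
show_event() {
    printf '%s' "$1" | base64 -d | tr '\0' '\n'
}

payload=$(make_event ACTION=add DEVNAME=input/bomb DEVPATH=/devices/bomb SEQNUM=1234 SUBSYSTEM=input)
show_event "$payload"
```

The output of `make_event` could then be piped to fake-udev in place of the `echo ... | base64` part of the command above.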

Putting it all together

Diagram

These steps will finally achieve proper hotplug detection by the applications that are running inside a Docker container without exposing any udev event/file from the host filesystem.

Luckily, reversing the steps is enough to also correctly unplug devices.
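The ordering can be summarised with a small dry-run sketch that only prints the steps. The function names and arguments (container name, device name, major, minor) are hypothetical, and the exact position of the DB entry step relative to the others is an assumption; the one constraint taken from the text is that the event is sent after the mknod.

```shell
#!/bin/sh
# Dry run: print the plug steps in order instead of executing them.
# usage: plug_steps <container> <device name> <major> <minor>
plug_steps() {
    echo "docker exec $1 mknod /dev/input/$2 c $3 $4"
    echo "write /run/udev/data/c$3:$4 in $1"
    echo "send ACTION=add for $2 via fake-udev in $1"
}

# The same steps in reverse order, with the opposite actions.
unplug_steps() {
    echo "send ACTION=remove for $2 via fake-udev in $1"
    echo "remove /run/udev/data/c$3:$4 in $1"
    echo "docker exec $1 rm /dev/input/$2"
}

plug_steps app js0 13 0
unplug_steps app js0 13 0
```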

References

First off, a huge thanks goes to John McDonough for all the help in figuring most of this stuff out and for leading the way with his prototype JohnCMcDonough/virtual-gamepad.


1. This obviously also requires the MKNOD capability to be enabled (--cap-add MKNOD)
2. Some applications might react to the new device if they use inotify; unfortunately, this is not the default behaviour in most apps/games
3. Given enough permissions, that’s why our fake udev runs as root