Reverse engineering: "Decompiling" a Docker image into a Dockerfile

Reverse engineer the Docker image by studying its internal structure.

TL;DR

In this article, we will reverse engineer a Docker image by looking at how an image stores its data, how to use tools to inspect every aspect of the image, and how to use Python's Docker API to build a tool like Dedockify that creates a Dockerfile.

Introduction

As public Docker registries such as Docker Hub and TreeScale become more popular, it is increasingly common for administrators and developers to download Docker images from unknown sources. In most cases, convenience outweighs the perceived risk. Normally, when a Docker image is published, its Dockerfile is made available directly in the listing, in a git repository, or through a related link. Sometimes, however, no Dockerfile is provided. Even when one is, there is no guarantee that the pre-built image was actually built from it. Such images are a black box to us, and we cannot be sure they are safe to use.

Maybe security vulnerabilities are not your concern. You may just want to update an image you rely on, such as nginx, so that it runs on the latest version of Ubuntu. Or you may want to release a more optimized image because another distribution's compiler is better suited to generating the binaries.

Whatever the reason, we need a way to turn an image back into a Dockerfile. A Docker image is not a black box: most of the information needed to rebuild the Dockerfile can be retrieved. By looking inside the image and inspecting its internal structure, we will be able to reconstruct a Dockerfile from an arbitrary prebuilt image.

In this article, we'll show how to rebuild a Dockerfile from an image using two tools: the aforementioned Dedockify, a Python script, and Dive, a Docker image browsing tool. The basic flow used is as follows.

Use Dive

Dive demo

To quickly understand how images are composed, we will use Dive to learn some advanced Docker concepts that may be unfamiliar to us. The Dive tool can inspect each layer of the Docker image.

Let's create a simple Dockerfile for testing.

Paste this code snippet directly into the command line of the Linux host with Docker installed:

mkdir $HOME/test1
cd $HOME/test1
cat > Dockerfile << EOF ; touch testfile1 testfile2 testfile3
FROM scratch
COPY testfile1 /
COPY testfile2 /
COPY testfile3 /
EOF

After pasting the above and pressing Enter, we have a new Dockerfile along with three zero-byte test files in the same directory.

$ ls
Dockerfile  testfile1  testfile2  testfile3

Now, let's build an image using this Dockerfile and tag it as example1.

docker build . -t example1

Building the example1 image produces the following output:

Sending build context to Docker daemon  3.584kB
Step 1/4 : FROM scratch
 --->
Step 2/4 : COPY testfile1 /
 ---> a9cc49948e40
Step 3/4 : COPY testfile2 /
 ---> 84acff3a5554
Step 4/4 : COPY testfile3 /
 ---> 374e0127c1bc
Successfully built 374e0127c1bc
Successfully tagged example1:latest

Now the example1 image we just built is complete:

$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
example1            latest              374e0127c1bc        31 seconds ago      0B

Since there is no executable file, the image will not run. We only use this as a simplified example of how to view storage layers in a Docker image.

We can see from the size of the image that there is no base image here. Instead of a base image we used scratch, which tells Docker to start from a zero-byte blank image. We then modified the blank image by copying in three zero-byte test files and tagged the result as example1.

Now, let's use Dive to view this new image.

docker run --rm -it \
    -v /var/run/docker.sock:/var/run/docker.sock \
    wagoodman/dive:latest example1

Executing the above command will automatically pull the wagoodman/dive image from Docker Hub and generate Dive output.

Unable to find image 'wagoodman/dive:latest' locally
latest: Pulling from wagoodman/dive
89d9c30c1d48: Pull complete
5ac8ae86f99b: Pull complete
f10575f61141: Pull complete
Digest: sha256:2d3be9e9362ecdcb04bf3afdd402a785b877e3bcca3d2fc6e10a83d99ce0955f
Status: Downloaded newer image for wagoodman/dive:latest
Image Source: docker://example-image
Fetching image... (this can take a while for large images)
Analyzing image...
Building cache...

Use the up and down keys to move through the image's three layers in the list, and find the three files in the directory tree displayed on the right.

We can see that the content on the right changes as each layer is selected. When each file is copied to a blank Docker scratch image, it is stored as a new layer.
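To look at the same layering without Dive, one option is to export the image with docker save and read the archive directly. The following is a minimal sketch, assuming the archive contains a manifest.json whose Layers entry lists the layer tarballs oldest first. The helper name list_layer_files is hypothetical and is not part of Dive or Dedockify.

```python
import io
import json
import tarfile


def list_layer_files(archive_path):
    """Return [(layer_name, [member paths])] for a `docker save` archive."""
    result = []
    with tarfile.open(archive_path) as image_tar:
        # manifest.json names the layer tarballs in build order (assumption:
        # the archive holds a single image, so we read the first entry).
        manifest = json.load(image_tar.extractfile("manifest.json"))
        for layer_name in manifest[0]["Layers"]:
            # Each layer is itself a tar archive nested inside the outer one.
            layer_bytes = image_tar.extractfile(layer_name).read()
            with tarfile.open(fileobj=io.BytesIO(layer_bytes)) as layer_tar:
                result.append((layer_name, layer_tar.getnames()))
    return result
```

After running docker save example1 -o example1.tar, calling list_layer_files("example1.tar") would be expected to return one entry per COPY instruction, each listing the single test file added by that layer.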

We can also see the command used to generate each layer, along with the hashes of the source and updated files.

If we look at the Command: section, we should see the following:

#(nop) COPY file:e3c862873fa89cbf2870e2afb7f411d5367d37a4aea01f2620f7314d3370edcc in /
#(nop) COPY file:2a949ad55eee33f6191c82c4554fe83e069d84e9d9d8802f5584c34e79e5622c in /
#(nop) COPY file:aa717ff85b39d3ed034eed42bc1186230cfca081010d9dde956468decdf8bf20 in /

Each command provides the original command in the Dockerfile used to build the image. However, the original filename is lost. It appears that the only way to recover this information is to observe changes to the target file system, or to infer from other details. More on that later.

Docker History

In addition to third-party tools like dive, a tool we can use at our fingertips is docker history. If we use the docker history command on the example1 image, we can view the entries we used in the Dockerfile to create the image.

docker history example1

Running it should yield the following results:

IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
374e0127c1bc        25 minutes ago      /bin/sh -c #(nop) COPY file:aa717ff85b39d3ed…   0B
84acff3a5554        25 minutes ago      /bin/sh -c #(nop) COPY file:2a949ad55eee33f6…   0B
a9cc49948e40        25 minutes ago      /bin/sh -c #(nop) COPY file:e3c862873fa89cbf…   0B

Everything in the CREATED BY column is truncated. These are the Dockerfile instructions as passed through the Bourne shell. This information is useful for recreating our Dockerfile, and although it is truncated here, we can see it in full by using the --no-trunc option:

$ docker history example1 --no-trunc
IMAGE                                                                     CREATED             CREATED BY                                                                                           SIZE                COMMENT
sha256:374e0127c1bc51bca9330c01a9956be163850162f3c9f3be0340bb142bc57d81   29 minutes ago      /bin/sh -c #(nop) COPY file:aa717ff85b39d3ed034eed42bc1186230cfca081010d9dde956468decdf8bf20 in /    0B
sha256:84acff3a5554aea9a3a98549286347dd466d46db6aa7c2e13bb77f0012490cef   29 minutes ago      /bin/sh -c #(nop) COPY file:2a949ad55eee33f6191c82c4554fe83e069d84e9d9d8802f5584c34e79e5622c in /    0B
sha256:a9cc49948e40d15166b06dab42ea0e388f9905dfdddee7092f9f291d481467fc   29 minutes ago      /bin/sh -c #(nop) COPY file:e3c862873fa89cbf2870e2afb7f411d5367d37a4aea01f2620f7314d3370edcc in /    0B

While this contains some useful information, parsing it from the command line can be a bit challenging. We could also use docker inspect. In this article, however, we will focus on using the Docker Engine API with Python.

Using the Python Docker Engine API

Docker has released a Python library for the Docker Engine API, allowing Docker to be managed from Python. In the example below, we retrieve information similar to that of docker history by running the following Python 3 code:

#!/usr/bin/python3

import docker

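# Connect to the local Docker daemon through its Unix socket.
cli = docker.APIClient(base_url='unix://var/run/docker.sock')
# Print the image history as a list of dicts, newest layer first.
print(cli.history('example1'))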

The output is as follows:

[{'Comment': '', 'Created': 1583008507, 'CreatedBy': '/bin/sh -c #(nop) COPY file:aa717ff85b39d3ed034eed42bc1186230cfca081010d9dde956468decdf8bf20 in / ', 'Id': 'sha256:374e0127c1bc51bca9330c01a9956be163850162f3c9f3be0340bb142bc57d81', 'Size': 0, 'Tags': ['example:latest']}, {'Comment': '', 'Created': 1583008507, 'CreatedBy': '/bin/sh -c #(nop) COPY file:2a949ad55eee33f6191c82c4554fe83e069d84e9d9d8802f5584c34e79e5622c in / ', 'Id': 'sha256:84acff3a5554aea9a3a98549286347dd466d46db6aa7c2e13bb77f0012490cef', 'Size': 0, 'Tags': None}, {'Comment': '', 'Created': 1583008507, 'CreatedBy': '/bin/sh -c #(nop) COPY file:e3c862873fa89cbf2870e2afb7f411d5367d37a4aea01f2620f7314d3370edcc in / ', 'Id': 'sha256:a9cc49948e40d15166b06dab42ea0e388f9905dfdddee7092f9f291d481467fc', 'Size': 0, 'Tags': None}]

From this output we can see that, to reconstruct the contents of the Dockerfile, we only need to parse the relevant data and reverse its order. As we saw before, however, the COPY instructions contain hashes. These hashes stand in for the names of the files that were copied in from outside the layer, and those names cannot be recovered directly. As we saw in Dive, we can infer them by looking at the changes made in each layer. Sometimes a file name can also be inferred because the original COPY instruction used it as the target. In other cases the file name may not matter, which lets us pick an arbitrary one. In still other cases, although this is harder to evaluate, we can infer file names from references elsewhere in the system, for example in supporting scripts or configuration files. Either way, searching for all changes between layers is the most reliable approach.
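The parse-and-reverse step can be sketched in a few lines. This is only an illustration, assuming the CreatedBy format shown in the output above (metadata-only steps marked with #(nop), everything else run through /bin/sh -c). The function names are hypothetical, and unlike Dedockify this sketch strips the /bin/sh -c prefix from RUN steps.

```python
NOP_MARKER = "#(nop) "
SHELL_PREFIX = "/bin/sh -c "


def created_by_to_instruction(created_by):
    """Turn one CreatedBy string into a Dockerfile-style instruction."""
    text = created_by.strip()
    if NOP_MARKER in text:
        # Metadata-only steps (COPY, ADD, CMD, ...) follow the "#(nop) " marker.
        return text.split(NOP_MARKER, 1)[1].strip()
    if text.startswith(SHELL_PREFIX):
        # Anything else was executed by the shell, so treat it as a RUN step.
        text = text[len(SHELL_PREFIX):]
    return "RUN " + text


def history_to_instructions(history):
    """History is newest first; a Dockerfile lists the oldest step first."""
    return [created_by_to_instruction(entry["CreatedBy"])
            for entry in reversed(history)]
```

Passing the list returned by cli.history('example1') to history_to_instructions would give the three COPY lines in their original order, still with hashes in place of the source file names.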

Dedockify

Let's go a few steps deeper. To better reverse engineer this image into a Dockerfile, we need to parse everything and reformat it into a readable form. To simplify our experiments, the following code is available from the Dedockify repository on GitHub. Thanks to LanikSJ for all the groundwork and coding.

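from sys import argv
import docker

class ImageNotFound(Exception):
    pass

class MainObj:
    def __init__(self):
        super(MainObj, self).__init__()
        self.commands = []
        self.cli = docker.APIClient(base_url='unix://var/run/docker.sock')
        # The last command line argument is the (possibly abbreviated) image ID.
        self._get_image(argv[-1])
        self.hist = self.cli.history(self.img['RepoTags'][0])
        self._parse_history()
        # History is listed newest first; a Dockerfile is written oldest first.
        self.commands.reverse()
        self._print_commands()

    def _print_commands(self):
        for i in self.commands:
            print(i)

    def _get_image(self, img_hash):
        # Find the local image whose ID contains the given hash fragment.
        images = self.cli.images()
        for i in images:
            if img_hash in i['Id']:
                self.img = i
                return
        raise ImageNotFound("Image {} not found\n".format(img_hash))

    def _insert_step(self, step):
        if "#(nop)" in step:
            # Metadata-only instructions (COPY, ADD, CMD, ...) follow "#(nop) ".
            to_add = step.split("#(nop) ")[1]
        else:
            # Everything else was executed in a shell, so emit it as RUN.
            to_add = ("RUN {}".format(step))
        # Break chained commands onto continuation lines for readability.
        to_add = to_add.replace("&&", "\\\n    &&")
        self.commands.append(to_add.strip(' '))

    def _parse_history(self, rec=False):
        first_tag = False
        actual_tag = False
        for i in self.hist:
            if i['Tags']:
                actual_tag = i['Tags'][0]
                # The first tagged entry is the image itself; a second one is
                # taken to be the base image, so parsing stops there.
                if first_tag and not rec:
                    break
                first_tag = True
            self._insert_step(i['CreatedBy'])
        if not rec:
            # Appended last because the list is reversed before printing.
            self.commands.append("FROM {}".format(actual_tag))

if __name__ == '__main__':
    MainObj()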

Generate initial Dockerfile

If you have followed along so far, you should have two images on your test host: wagoodman/dive and our custom example1 image.

$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
example1            latest              374e0127c1bc        42 minutes ago      0B
wagoodman/dive      latest              4d9ce0be7689        2 weeks ago         83.6MB

Running dedockify against the example1 image produces the following:

$ python3 dedockify.py 374e0127c1bc
FROM example1:latest
COPY file:e3c862873fa89cbf2870e2afb7f411d5367d37a4aea01f2620f7314d3370edcc in /
COPY file:2a949ad55eee33f6191c82c4554fe83e069d84e9d9d8802f5584c34e79e5622c in /
COPY file:aa717ff85b39d3ed034eed42bc1186230cfca081010d9dde956468decdf8bf20 in /

The information we extracted is almost identical to what we saw when we inspected the image with Dive. Note that the FROM instruction shows example1:latest instead of scratch. In this case, our code made an incorrect assumption about the base image.

For comparison, we do the same thing with the wagoodman/dive image.

$ python3 dedockify.py 4d9ce0be7689
FROM wagoodman/dive:latest
ADD file:fe1f09249227e2da2089afb4d07e16cbf832eeb804120074acd2b8192876cd28 in /
CMD ["/bin/sh"]
ARG DOCKER_CLI_VERSION=
RUN |1 DOCKER_CLI_VERSION=19.03.1 /bin/sh -c wget -O- https://download.docker.com/linux/static/stable/x86_64/docker-${DOCKER_CLI_VERSION}.tgz |     tar -xzf - docker/docker --strip-component=1 \
    &&     mv docker /usr/local/bin
COPY file:8385774b036879eb290175cc42a388877142f8abf1342382c4d0496b6a659034 in /usr/local/bin/
ENTRYPOINT ["/usr/local/bin/dive"]

This output shows more problems than example1 did. The ADD instruction immediately follows the FROM instruction, and once again our code has made the wrong assumption about the base image. We don't know what the ADD instruction adds, but we can reasonably conclude that we do not know what the base image is. The ADD instruction may have been used to extract a local tar file into the root directory, and this method can also be used to load another base image.
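One way to make the FROM guess less misleading is to apply a couple of heuristics to the history before emitting it. The sketch below is not part of Dedockify. It assumes history entries shaped like those returned by APIClient.history() (newest first, with Tags and CreatedBy keys), and the function name guess_base_image is hypothetical. It can only suggest a base, not prove one.

```python
def guess_base_image(history):
    """Return (base, reason) for a newest-first history list."""
    # A tag on any entry older than the newest one points at a parent image
    # that is known locally, which is the strongest hint available.
    for entry in history[1:]:
        if entry.get("Tags"):
            return entry["Tags"][0], "an older layer carries a local tag"
    oldest = history[-1]["CreatedBy"].strip()
    if "#(nop)" in oldest and "ADD file:" in oldest and oldest.endswith(" in /"):
        # A root filesystem unpacked into / from a tarball: the real base
        # image name is not recorded anywhere in the history.
        return "scratch", "root filesystem was ADDed from an unknown tarball"
    return "scratch", "no tagged parent and no root filesystem ADD"
```

With these rules, example1 would be reported as scratch, while the wagoodman/dive image would be reported as scratch together with a note that its root filesystem came from a tarball we cannot identify.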

Dedockify limitation testing

Let's experiment by creating a sample Dockerfile where we explicitly define the base image. Just like we did before, run the following code directly from the command line in an empty directory.

mkdir $HOME/test2
cd $HOME/test2
cat > Dockerfile << EOF ; touch testfile1 testfile2 testfile3
FROM ubuntu:latest
RUN mkdir testdir1
COPY testfile1 /testdir1
RUN mkdir testdir2
COPY testfile2 /testdir2
RUN mkdir testdir3
COPY testfile3 /testdir3
EOF

Then build the image and tag it as example2. This creates an image similar to the previous one, except that it uses ubuntu:latest as the base image instead of scratch.

$ docker build . -t example2
Sending build context to Docker daemon  3.584kB
Step 1/7 : FROM ubuntu:latest
 ---> 72300a873c2c
Step 2/7 : RUN mkdir testdir1
 ---> Using cache
 ---> 4110037ae26d
Step 3/7 : COPY testfile1 /testdir1
 ---> Using cache
 ---> e4adf6dc5677
Step 4/7 : RUN mkdir testdir2
 ---> Using cache
 ---> 22d301b39a57
Step 5/7 : COPY testfile2 /testdir2
 ---> Using cache
 ---> f60e5f378e13
Step 6/7 : RUN mkdir testdir3
 ---> Using cache
 ---> cec486378382
Step 7/7 : COPY testfile3 /testdir3
 ---> Using cache
 ---> 05651f084d67
Successfully built 05651f084d67
Successfully tagged example2:latest

Since we now have a slightly more complex Dockerfile to rebuild, and we also have the Dockerfile used to generate this image, we can make a comparison.

$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
example2            latest              05651f084d67        2 minutes ago       64.2MB
example1            latest              374e0127c1bc        1 hour ago          0B
ubuntu              latest              72300a873c2c        9 days ago          64.2MB
wagoodman/dive      latest              4d9ce0be7689        3 weeks ago         83.6MB

Run the dedockify script:

$ python3 dedockify.py 05651f084d67
FROM ubuntu:latest
RUN /bin/sh -c mkdir testdir1
COPY file:cc4f6e89a1bc3e3c361a1c6de5acc64d3bac297f0b99aa75af737981a19bc9d6 in /testdir1
RUN /bin/sh -c mkdir testdir2
COPY file:a04cdcdf5fd077a994fe5427a04f6b9a52288af02dad44bb1f8025ecf209b339 in /testdir2
RUN /bin/sh -c mkdir testdir3
COPY file:2ed8ccde7cd97bc95ca15f0ec24ec447484a8761fa901df6032742e8f1a2a191 in /testdir3

This matches the original Dockerfile very well. There is no ADD instruction this time, and the FROM instruction is correct. As long as our base image is defined in the original Dockerfile, and we avoid using scratch or ADD to create a base image from a tar file, we should be able to rebuild the Dockerfile fairly accurately. However, we still don't know the name of the original file that was copied.

Reconstructing an arbitrary Dockerfile

Now, let us try to reverse engineer a Docker image properly, using the tools we have discussed. The image we will use is derived from the examples above: we modified our previous Dockerfile to create example3. With the addition of a small binary executable, the image can actually run. The source code can be found in Dedockify's GitHub repository. Since this image is so small, we don't need to build or pull it. We can show off our command line skills by copy-pasting the entire image into our Docker environment with the snippet below :)

uudecode << EOF | zcat | docker load
begin-base64 600 -
H4sICMicXV4AA2V4YW1wbGUzLnRhcgDtXVtvG8cVVnp56UN/QJ/YDQokgETN
zJkrgTykjgsbDSzDURMkshDM5YzFhiJVkkpiCELzH/pP+tYfkf/UsxRNXdxI
spe7lqv5IJF7PTM7Z87MmY9nZxgL2qG1DkN2nkXmtTecQwYrMMfsBHgXgFuV
eXLCI5c26sxdNHQsie2Nm8GYZEYp+l7g6vdim4MWBrgyBjaY4MbIjZ66hezG
OJ7N/ZSyMp1M5tddd9P5qw/3noA11f+XD5998XjnybVpcMa0lNfoH67oHxiI
jV4nhXjP9c/7701WC1pAY/v/+2wyvimNG+zfgLpi/0KbYv+d4KQapmpQNa0G
1WZ15Kc4npMsr7JUjGGMyLUzPEXJc1RCWqFkTtoEL5TgEaLSKlmOiXHnUSlK
jYsQSFacop9jnTHuDNtinP52GRss/r6pL5iM5344xum3tJWHL6rBSfVoMpuP
/SHSXXTFZ5NDuuB8/28znJ5tfTqf+3jwxTwNx9Ug+9EMLxybHM9fP4jT6erg
7vzlanvnCMeX5Sz2dsYRV0cejr+vBuPj0WizenCYXm0+PvQvlhn7cjI6PsTZ
qzNfTabfDccvPhsuc/twPJ++PJoM66I9u2Jn/Ofj4Wgl6nMfcLS8/XSzmtBm
NRqOj3+sTm+h/8b2P/IvcdqvbeiX07je/uXr/h9oZor9dwF/dHQbF74R3sz/
F1RfuKbLi//fAWr993846C/+J0f/aCONRR9/g/4vbXNQShb7LygoKGgTTDkP
1tiEUvkgJajgnTZRA0pAplFqZ1m02hqXjc4+aaVTztx4pzywfvPxH7X1V/0/
oZgu7X8XOKn8NB4M5xjnx9N6ROIPk5ZnI6y7P67aq55+uvvok+3j2XR7NIl+
tD0Lw/Hgwv5q9/zEYuNslz6q/f85MJsd0CBVD4SHEDhGkM4LDIqHQN4JjU5s
5NqiJguRAr3nKpgICoRhyLNiCgHQOLxhfLdN7teVMd7e4uD2AY5Gkzpv14/2
VuPgegyvDA3aOATr5cJkJUs0tMsRacytE6MsGZnp/piCEMaiijHmJJOIihvN
bh5WX0zh/7kqkA5od3t2QI+yFenjw4/Gk6OPe7Wqnuw++/rpzuMnu7295xdU
9bzar29/X6rPyenpRZZFMMG2GGwxsStgoPhAQF8KrZ1grqZb0iR+R5Xie5zO
hpPxgpbpM+hrOnUwnM0nU1LY3sm1AnnfOXCgDDPfnDM834aX9XOclXZvK/aW
Jf3VzrO/fvb4WW97jjPS95RXp5vXyxd9Qd0MSCnsLeQ/2Hn6dS8PRzgAIbLL
TiBEJ9Ej8hRZEmCURW6CcTaD8+h18BhtdsrJIFDGmBigS6w3HPfqTNbCeO8W
2SQjVBK4lW9RDOIW8hUZv+PG8XdWDOI2xWA4mYGkKvYWxQC3kO+YdMYxod5Z
McDNxQB9Zpkg38iJNymG2uxvFi005+RJwRsVQAAIkmURmaF+gwUXvAZjApis
tWaSJyWFoAaKOS1DFJrKR0omQSSNVvO6ABaNz20e/mITc0MOe9c1vJsVHh7N
X3674CKrwXx6jKf7l6jQzar233Ld8lXzl0d1E724eFY3bsOcvx2mWd14Lttt
riUXQXJAZFJn5MLriMHJTA6yT44zHZjHQJ2qZM5watCpBeeevFdMdANJXUkC
7ZMxjCq7TI6cbW+zUkEpqj/RMrBeWInCUuFqj3QZTxays9RrBNT+XBJDh2Qw
wQkfqXcAqjZK5RA5SNIJzywKSTKTMGiYjkoknwJjzgVg1vBwLilYbbwhFzsz
LYTTwqNgRiMEZhMYK7zztZ9gmHUZap8/0WOROCXRenUhT1Q8IlhltBbOihhR
ZhOtoucgwVRolD3DkqprS6bHZzlbmxMNOBJGRYVxLsmDcSGZJBkTjkdlEtW3
IKj4nA0oUJHDYgP5MZQ3qjHKZCrDqJSylvwZhAtPF5PRMfJEt4I0kjvms/Qq
U4VVnp6O6jD3PJpaujKcLqJqaNF5TlrR5lxS5jQ2IfVRv22MAhbInjAaiNIp
zkSGCAacVmAj0IPWDVoKPFCjQMXi0VX7p7eh4N8phI70kElxeg7vhJIKZYpa
qOy9J9Vpck3Qkx2I2qOMIUrSQ46GKrzkqHRb8R+cF/63CzTWfyvxH8LI8vtP
JyjxH/cbje1/DfEfWvDX4j8YL/bfBZbxH02rQYnZ6DBmY51obP/txH8oUX7/
7QSv+LU2g0DezP/nVF+EBij+fxdY6b/FIJC6PN4s/kOqWv/F/gsKCgrag2Jg
EIxFl3RyIWQB2bjkDIiYFeTIUuLJy6SNNUG4wKPl3GeXEgSfWuL/BNTxv6X/
bx+N9d/O+18gVen/u0Dh/+43Gtt/K+9/gWDF/jvBkv9rWg0uvf/lBSZQEqIi
YUoB50E7iU4JFD4ESbcok5Kq6SWbIWjFpbFSCOmUYaxwiR1yiY3tvxX+T3JR
3v/sBK8Cy+4O/yfO+L/y/lcnWOn/rvF/hf8vKCgoaBXGQgwYvEKWwNmY0euo
As/cks/mFGMgBI+BfD7jaJtrHaMnhx1ZMAF9a/M/6dL/d4HG+m9r/qfS/3eC
wv/dbzS2/7bmfyr23wmW/F/TanCR/7PKBZAyeI7SuGDqd0Y9ahVYZj6nJFFq
pZS2UieBDAC05HRxNBKE06nwfx3yf43tv6X5n3iJ/+0Er96ovHP8X/H/O8FK
/3eN/yu//xUUFBS0iqbOekv8nzSs9P9doLH+23n/V4ky/1cnKPzf/UZj+2+H
/wNd7L8TLPm/NXB2K/5vDbGEhf/riv9rbP/t8H/Ayvt/naDE/xX9L/S/mrxv
/Wnc7P9f1P+C/5OyxP8WFBQUtIqmizW1Ff9X+v9u0Fj/Lc3/Z4r/3wkK/3e/
0dj+W4r/K/P/dIMl/7eGNRtX/N8aYgkL/9cV/9fY/lua/0+xYv9doMT/Ff0v
9L9atWL9abwV/1fi/wsKCgpaRdPJetqK/6vX/y39f/torP+W4v94+f2vExT+
736jsf23FP+niv13glf8X/M5+1b8X1lL5H3i/5rafzv8n9Bl/a9OcPfW/1jy
f8X/7wQr/a/WMF5/Gm/F/xX+v6CgoKBVJPK8tUkQIUjIXCXMBoJj0oPO3Btu
jJc86SyRWaFFDOhQ8QxKJ8uya2n9D6bK+h+doLH+25n/j/My/0cnKPzf/UZj
+2/O/wmQr63/a0r8TzdY8n9Nq8FF/o+xoB1a6zBk51lkXpMADhmswByzE+Bd
AG5V5skJj1zaSKm4aOhYEpf4P8EE22KwxcSugIHiAwF9KbR2grlLRGCdsKrf
XuUQrJe25hYlS5SRHDFpoxNDACOzDxhToCpmUcUYc5JJRMWNZjfzihdTuGsc
41719NPdR59sH8+m26NJ9KPtWRiOBxf2V7vnJxYbZ7v0Ue0vScq9qt7dnh3Q
o2xF+vjwo/Hk6ONezUc+2X329dOdx092e3vPq21/dLR9gKPR5Hm1X9/+itic
HXih9EB4CIFjBOm8wKB4CIxMnRkbubaokRoGgd5zFUwEBcIw5FkxRbpC4/AG
frRO/gpHunchT3WGfoEtPTk93azSJH5Hqv4ep7PhZLxgm/sM+vqsItx9Wnmd
Kr/MS9919S3Jbj+NB8M5xvnxtM61P0y6tst3uJ72+4bG/X9z/l+9Pv4DXd7/
6Qa1sbWdxhuP/znXqsT/doJVY9tiGrX9X6t/Y67w/9SPlPF/J/jnw8//8sFq
78Fvf7Xx640/0Hf9/xs68h9OG//+/c8//am38cHGv37+6VFdWTZ7P0ymo/TH
3727jBcUFBQUNMKhHw8zzub9W/B4b4sb/H8O0lzyBemIEuX9n06wd1I9WFId
FVPOgzU2oVQ+SAkqeKdN1IASaKiPUrsFr2aNy0Znn8hNTzlz453ywBZ1iIbf
z/BosutfzOrRPP7oD49GCIORryNM6iH95/WIcXGyaaTg+eCTUl1fJBsJW9+y
uCRsfXPskbD1LdhRl9na3v4lYU1530vC1sdLVPun+8VP/SVMyVZnw/lkOsRZ
W2nc9PsvB7ja/pNBlPa/C5ysWuia61420mv4Pej0tNhcQUFBwV3GfwHMszUX
AMIAAA==
====
EOF

Running it directly from the command line will load a new image example3:latest.

$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
example3            latest              059a3878de45        5 minutes ago       63B

Now, let's try to rebuild the Dockerfile.

$ python3 dedockify.py 059a3878de45
FROM example3:latest
WORKDIR /testdir1
COPY file:322f9f92e3c94eaee1dc0d23758e17b798f39aea6baec8f9594b2e4ccd03e9d0 in testfile1
WORKDIR /testdir2
COPY file:322f9f92e3c94eaee1dc0d23758e17b798f39aea6baec8f9594b2e4ccd03e9d0 in testfile2
WORKDIR /testdir3
COPY file:322f9f92e3c94eaee1dc0d23758e17b798f39aea6baec8f9594b2e4ccd03e9d0 in testfile3
WORKDIR /app
COPY file:b33b40f2c07ced0b9ba6377b37f666041d542205e0964bc26dc0440432d6e861 in hello
ENTRYPOINT ["/app/hello"]

This gives us a baseline Dockerfile. Since example3:latest is the name of this very image, we can assume from the context that it was built from scratch. Now we need to find out which files were copied to /testdir1, /testdir2, /testdir3 and /app. Let's open the image in Dive and see how to recover the missing data.

docker run --rm -it \
    -v /var/run/docker.sock:/var/run/docker.sock \
    wagoodman/dive:latest example3:latest

If you scroll down to the last layer, you will see all the missing data populated in the directory tree on the right. Zero-byte files named testfile1, testfile2, and testfile3 were copied into their respective directories. In the last layer, a 63-byte file called hello was copied to the /app directory.

Let's restore these files! Since files cannot be copied directly from the image, we need to create a container first.

$ docker run -td --name example3 example3:latest
6fdca182a128df7a76e618931c85a67e14a73adc69ad23782bc9a5dc29420a27

Now, let's copy the files we need from the container to the host, using the paths and filenames we recovered from Dive.

mkdir $HOME/test3
cd $HOME/test3
docker cp example3:/testdir1/testfile1 .
docker cp example3:/testdir2/testfile2 .
docker cp example3:/testdir3/testfile3 .
docker cp example3:/app/hello .
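The same extraction could be scripted with the Docker SDK for Python. The sketch below assumes that a container which has been created but never started is enough for retrieving files, and it uses the low-level create_container, get_archive and remove_container calls. The helper names unpack_archive_stream and copy_from_image are hypothetical.

```python
import io
import os
import tarfile


def unpack_archive_stream(chunks, dest_dir):
    """Write the regular files from a tar stream (iterable of bytes) to dest_dir."""
    buffer = io.BytesIO(b"".join(chunks))
    written = []
    with tarfile.open(fileobj=buffer) as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            # Keep only the base name so archive paths cannot escape dest_dir.
            target = os.path.join(dest_dir, os.path.basename(member.name))
            with open(target, "wb") as out:
                out.write(archive.extractfile(member).read())
            # Preserve permission bits, e.g. the executable bit on a binary.
            os.chmod(target, member.mode)
            written.append(target)
    return written


def copy_from_image(image, path, dest_dir="."):
    """Create (but never start) a container from image and copy path out of it."""
    import docker  # third-party Docker SDK for Python, imported lazily

    cli = docker.APIClient(base_url="unix://var/run/docker.sock")
    container = cli.create_container(image)
    try:
        # get_archive returns a (stream of tar chunks, stat dict) tuple.
        chunks, _stat = cli.get_archive(container["Id"], path)
        return unpack_archive_stream(chunks, dest_dir)
    finally:
        cli.remove_container(container["Id"])
```

For example, copy_from_image("example3:latest", "/app/hello") would be the rough equivalent of the last docker cp command above, without leaving a container behind.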

We may have to check if our container is still running first.

$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
6fdca182a128        example3:latest     "/app/hello"        2 minutes ago       Up 2 minutes                            wizardly_lamport

If the container isn't running for some reason, that's okay. We can verify its status to see if it has stopped.

$ docker container ls -a

We can also view the running log.

$ docker logs 6fdca182a128
Hello, world!

It appears to be running a program that prints Hello, world!. In fact, the Hello, world! program is not designed to keep running. In Docker 19.03.6 there may be a bug that prevents the program from terminating gracefully. This is acceptable for now. The container can be running or stopped; the application does not need to keep running for us to recover the data we need. A container in any state is enough, as long as it was created from the image we are extracting data from.

We can verify its behavior by running the recovered executable, which should print the following:

$ ./hello
Hello, world!

Using the Dockerfile we generated earlier, we can update it to include all the new details. This includes updating the FROM directive to start from scratch, as well as all the filenames we discovered while exploring with Dive.

FROM scratch
WORKDIR /testdir1
COPY testfile1 .
WORKDIR /testdir2
COPY testfile2 .
WORKDIR /testdir3
COPY testfile3 .
WORKDIR /app
COPY hello .
ENTRYPOINT ["/app/hello"]
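Part of this manual rewrite could be automated for the case where the original COPY instruction used the file name as its target. The sketch below is a heuristic only. It assumes a destination without a trailing slash is a file path, which is wrong whenever the destination is an existing directory (as with /testdir1 in example2), and it leaves every other instruction untouched. The name rewrite_copy is hypothetical.

```python
import posixpath
import re

# Matches the hashed form shown above, e.g. "COPY file:<hex digest> in <dest>".
COPY_PATTERN = re.compile(r"^COPY file:[0-9a-f]+ in (\S+)\s*$")


def rewrite_copy(instruction):
    """Replace a hashed COPY source with a file name guessed from the target."""
    match = COPY_PATTERN.match(instruction)
    if not match:
        return instruction
    dest = match.group(1)
    if dest.endswith("/"):
        # The target is a directory, so the file name is not recorded here
        # and has to be recovered by inspecting the layer instead.
        return instruction
    # Assumption: the last path component of the target is the file name.
    return "COPY {} {}".format(posixpath.basename(dest), dest)
```

Applied to the dedockify output for example3, this would turn the hashed line targeting testfile1 into COPY testfile1 testfile1, which is equivalent to the hand-written COPY testfile1 . once the recovered file sits next to the Dockerfile.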

Again, with all the files gathered in one directory, we can build from our reverse engineered Dockerfile.

Let's build an image first.

$ docker build . -t example3:recovered
Sending build context to Docker daemon 4.608kB
Step 1/10 : FROM scratch
 --->
Step 2/10 : WORKDIR /testdir1
 ---> Running in 5e8e47505ca6
Removing intermediate container 5e8e47505ca6
 ---> d30a2f002626
Step 3/10 : COPY testfile1 .
 ---> 4ac46077a588
Step 4/10 : WORKDIR /testdir2
 ---> Running in 8c48189da985
Removing intermediate container 8c48189da985
 ---> 7c7d90bc2219
Step 5/10 : COPY testfile2 .
 ---> 5b40d33100e1
Step 6/10 : WORKDIR /testdir3
 ---> Running in 4ccd634a04db
Removing intermediate container 4ccd634a04db
 ---> f89fdda8f059
Step 7/10 : COPY testfile3 .
 ---> 9542f614200d
Step 8/10 : WORKDIR /app
 ---> Running in 7614b0fdba42
Removing intermediate container 7614b0fdba42
 ---> 6d686935a791
Step 9/10 : COPY hello .
 ---> cd4baca758dd
Step 10/10 : ENTRYPOINT ["/app/hello"]
 ---> Running in 28a1ca58b27f
Removing intermediate container 28a1ca58b27f
 ---> 35dfd9240a2e
Successfully built 35dfd9240a2e
Successfully tagged example3:recovered

Then let's run the image:

$ docker run --name recovered -dt example3:recovered
0f696bf500267a996339b522cf584e010434103fe82497df2c1fa58a9c548f20
$ docker logs recovered
Hello, world!

For further verification, let's check the image using Dive again.

docker run --rm -it \
    -v /var/run/docker.sock:/var/run/docker.sock \
    wagoodman/dive:latest example3:recovered

This image shows the same files as the original image. Comparing the two images side by side, they both match exactly. Both show the same file size. Both function exactly the same.

Below is the original Dockerfile used to generate the example3 image.

FROM alpine:3.9.2
RUN apk add --no-cache nasm
WORKDIR /app
COPY hello.s /app/hello.s
RUN touch testfile && nasm -f bin -o hello hello.s && chmod +x hello
FROM scratch
WORKDIR /testdir1
COPY --from=0 /app/testfile testfile1
WORKDIR /testdir2
COPY --from=0 /app/testfile testfile2
WORKDIR /testdir3
COPY --from=0 /app/testfile testfile3
WORKDIR /app
COPY --from=0 /app/hello hello
ENTRYPOINT ["/app/hello"]

We can see that, although we could not reconstruct the original Dockerfile perfectly, we came reasonably close. A Dockerfile that uses multiple build stages like this one cannot be fully recovered, because the information about the earlier stages simply does not exist in the final image. Our only option is to reconstruct a Dockerfile for the image we actually have. If we had images from the earlier build stages, we could reconstruct a Dockerfile for each stage, but in this case we only have the image from the final stage. Even so, we managed to produce a useful Dockerfile from the Docker image.

Postscript

By using an approach similar to Dive's, we should be able to update Dedockify's source code to automatically analyze each layer and recover all useful file information. The program could also be updated to automatically extract files from containers and store them locally, while making the appropriate updates to the Dockerfile. Finally, it could be updated to infer whether the base layer is scratch or another base image. With some additional changes to the syntax of the restored Dockerfile, Dedockify has the potential to fully automate reverse engineering a Docker image into a functional Dockerfile in most cases.

Origin blog.csdn.net/jeansboy/article/details/131726919