relates to moby/moby#32507, moby/buildkit#442
Doing some silly experimenting with RUN --mount:
# syntax=docker/dockerfile:1

FROM alpine AS stage1
RUN mkdir -p /stage1-files
RUN echo "testing stage 1" > /stage1-files/s1-file

FROM alpine AS stage2
RUN mkdir -p /stage2-files
RUN echo "testing stage 2" > /stage2-files/s2-file

# The "utility" stage mounts "stage1" (readonly), and "stage2" (readwrite),
# processes files from "stage1", and writes the result to "stage2".
#
# The idea here is to have "utility" build-stages that have tools installed
# to manipulate other stages, without those tools ending up in the stage (layer)
# itself
FROM alpine AS utility
RUN --mount=from=stage1,dst=/stage1 --mount=from=stage2,dst=/stage2,readwrite cp -r /stage1/stage1-files /stage2/

# this "utility" stage processes files from the "stage1" stage, mounting it
# read-write to make modifications
FROM alpine AS utility2
RUN --mount=from=stage1,dst=/stage1,readwrite touch /stage1/stage1-files/s1-file2

# this of course works
FROM alpine AS utility3
RUN mkdir /utility3-files
RUN --mount=from=stage1,dst=/stage1 cp -r /stage1/stage1-files /utility3-files/

# this doesn't work: this still gives the original, unmodified layer from stage1
FROM stage1 AS attempt1
RUN apk add --no-cache tree
CMD tree /stage1-files

# this doesn't work for stage1 and 2: --mount still gives the original,
# unmodified layers from those stages
FROM alpine AS attempt2
RUN apk add --no-cache tree
RUN mkdir -p /results/stage1-result /results/stage2-result /results/utility3-result
RUN --mount=from=stage1,dst=/stage1 cp -r /stage1/stage1-files /results/stage1-result
RUN --mount=from=stage2,dst=/stage2 cp -r /stage2/stage2-files /results/stage2-result
RUN --mount=from=utility3,dst=/utility3 cp -r /utility3/utility3-files /results/utility3-result
CMD tree /results
Building works (no errors):
docker build --no-cache -t bla .
[+] Building 4.7s (18/18) FINISHED
=> local://dockerfile (Dockerfile) 0.0s
=> => transferring dockerfile: 1.88kB 0.0s
=> local://context (.dockerignore) 0.0s
=> => transferring context: 02B 0.0s
=> docker-image://docker.io/tonistiigi/dockerfile:runmount20180618@sha256:576332cea88216b4bf20c56046fabb150c675be4a504440da11970bea501281b 0.0s
=> => resolve docker.io/tonistiigi/dockerfile:runmount20180618@sha256:576332cea88216b4bf20c56046fabb150c675be4a504440da11970bea501281b 0.0s
=> => sha256:576332cea88216b4bf20c56046fabb150c675be4a504440da11970bea501281b 528B / 528B 0.0s
=> => sha256:d0fbaded5db6066249af00e1c83c06c976dc9ba74bfca3d5efee1c7856253aa3 1.58kB / 1.58kB 0.0s
=> local://dockerfile (Dockerfile) 0.0s
=> local://context (.dockerignore) 0.0s
=> CACHED docker-image://docker.io/library/alpine:latest 0.0s
=> /bin/sh -c mkdir -p /stage2-files 0.4s
=> /bin/sh -c mkdir -p /stage1-files 0.5s
=> /bin/sh -c mkdir /utility3-files 0.6s
=> /bin/sh -c apk add --no-cache tree 1.0s
=> /bin/sh -c echo "testing stage 2" > /stage2-files/s2-file 0.5s
=> /bin/sh -c echo "testing stage 1" > /stage1-files/s1-file 0.4s
=> /bin/sh -c mkdir -p /results/stage1-result /results/stage2-result /results/utility3-result 0.5s
=> /bin/sh -c cp -r /stage1/stage1-files /utility3-files/ 0.5s
=> /bin/sh -c cp -r /stage1/stage1-files /results/stage1-result 0.4s
=> /bin/sh -c cp -r /stage2/stage2-files /results/stage2-result 0.4s
=> /bin/sh -c cp -r /utility3/utility3-files /results/utility3-result 0.5s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:3ef6a42407019d37b5d13c9be1685e3ce2fabdf4c4f3d3fa294fc64e7836ded7 0.0s
=> => naming to docker.io/library/bla 0.0s
Running the resulting image produces:
docker run --rm bla
/results
├── stage1-result
│   └── stage1-files
│       └── s1-file
├── stage2-result
│   └── stage2-files
│       └── s2-file
└── utility3-result
    └── utility3-files
        └── stage1-files
            └── s1-file

7 directories, 3 files
It's obvious from the above that I don't have a clue what/how the readwrite option is used (in combination with from=<stage|image>).
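Reading the docs, data written through an rw bind mount is apparently discarded when the RUN step finishes, which would at least explain the unmodified layers above. For reference, the same utility stage spelled out in the current docker/dockerfile:1 syntax would look roughly like this (untested here):

# equivalent of the "utility" stage with the docker/dockerfile:1 frontend;
# per the docs, writes through the rw bind mount are discarded once the
# step finishes, so the source stage itself stays unmodified
FROM alpine AS utility
RUN --mount=type=bind,from=stage1,target=/stage1 \
    --mount=type=bind,from=stage2,target=/stage2,rw \
    cp -r /stage1/stage1-files /stage2/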
Hi @thaJeztah thanks for providing those examples. I think I would need such functionality in my project too, but I am still unsure if this is what I am looking for. I'd be glad if you could help! So consider the following:
My image is meant to be a game server for CounterStrike:GO. There is a tool called steamcmd that automatically downloads and installs the game files and server files. Unfortunately, when doing a full rebuild, the build process pulls ~18 GB of data, which takes a lot of time. But normally steamcmd just pulls delta updates based on the previous game files, and it even validates the output.
There is a (not very Docker-like) workaround for this: install the game data on our host server and use it as a source for our Dockerfile. However, since we're using Docker, we also need a "Docker solution" for this, or the whole Docker paradigm wouldn't make any sense.
So far I have considered the following methods:
1) Using multi-stage builds. Unfortunately it's not really the solution: the base image would basically need to be "kept out of date" to be of any benefit and avoid pulling 18 GB of files every time. Secondly, if it is kept out of date for too long, the delta would get bigger and bigger, also slowing down the build. So either way this solution sucks.
2) The second alternative that came to mind is using "rolling release containers" (my own terminology, since I don't know what else to call this), which basically hold all the game server data inside a dedicated volume and keep it constantly up to date. This volume would basically be used as a "cache" for building the new image (roughly as sketched below); the current content of the volume would simply need to be copied during the build. Considering it is a dedicated Docker container just for this one task, we can be quite sure the data is valid. In the rare case that something goes wrong, one can still restart the container and refill the volume from scratch.
One downside is that the data needs to be copied, which takes time. The other downside is that volumes cannot simply be mounted during build time, so it would require a more complex mechanism than that. Lastly, using volumes as a "cache" for building images seems to be discouraged by the Docker developers -- I guess there are reasons for that, but currently I would probably be forced to use this method.
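To make (2) a bit more concrete, it would be something along these lines (untested, all names are placeholders):

# dedicated "updater" container keeps the game data in a named volume up to date
docker volume create csgo-data
docker run -d --name csgo-updater -v csgo-data:/data my-steamcmd-updater
# at build time the current volume content has to be copied into the build
# context first, because the volume itself can't be mounted during the build
docker run --rm -v csgo-data:/data -v "$PWD/csgo:/out" alpine cp -a /data/. /out/
docker build -t csgo-server .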
To be honest, both approaches seem like rather overcomplicated workarounds for something that can be achieved in a simpler fashion.
BUT NOW I think that with those new BuildKit features I might be able to create such a dedicated cache location! It would always be reasonably recent (contrary to the base images in the multi-stage approach) and also not have any of the downsides of the second solution (ready to be modified/updated; no need to copy files over; built-in native solution, etc.).
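Roughly what I have in mind, as an untested sketch (the base image name, paths, and the app id are placeholders for my real setup):

# syntax=docker/dockerfile:1
# "my-steamcmd-base" is a placeholder for my real base image with steamcmd installed
FROM my-steamcmd-base AS gameserver
# The cache mount keeps the downloaded game files between builds, so steamcmd
# only has to pull the delta instead of the full ~18 GB on every rebuild.
# The final cp is what actually bakes the current files into the image layer.
RUN --mount=type=cache,target=/steam-cache \
    steamcmd +force_install_dir /steam-cache \
             +login anonymous +app_update <appid> validate +quit \
 && mkdir -p /opt/csgo-server \
 && cp -a /steam-cache/. /opt/csgo-server/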
Am I on the right track here or am I totally wrong?
My current Dockerfile is here if you wanna take a look; the game data gets pulled in line #27. Currently it uses none of the above methods, as I am still looking for the best solution.