[go: up one dir, main page]

0% found this document useful (0 votes)
22 views6 pages

Building Docker Images - Adoc

This document provides best practices for building Docker images efficiently, focusing on speed and minimizing image size. Key topics include understanding Docker image layers, the importance of layer order, and techniques such as using a .dockerignore file and directory caching to optimize builds. It emphasizes the need for reproducibility in builds and offers practical examples to illustrate these concepts.

Uploaded by

Suresh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as TXT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
22 views6 pages

Building Docker Images - Adoc

This document provides best practices for building Docker images efficiently, focusing on speed and minimizing image size. Key topics include understanding Docker image layers, the importance of layer order, and techniques such as using a .dockerignore file and directory caching to optimize builds. It emphasizes the need for reproducibility in builds and offers practical examples to illustrate these concepts.

Uploaded by

Suresh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as TXT, PDF, TXT or read online on Scribd
You are on page 1/ 6

= Building Docker Images - Best Practices

Marco Behler
2024-06-28
:page-layout: layout-guides
:page-image: "/images/guides/undraw_takeout_boxes_ap54.png"
:page-description: Tips & Tricks to build Docker images in the fastest amount of
time and with the smallest possible size.
:page-published: true
:page-tags: ["docker", "docker images", "docker tips"]
:page-commento_id: /guides/docker-images

== What are we trying to understand?

Whenever you're building Docker images, say, you want to bake your Java/Node/Python
application into one, you'll be confronted with the following two questions:

* How can I make the `_docker build_` command run as fast as possible?
* How can I make sure that the resulting Docker image is as small as possible?

You will want to continue reading for answers to these questions.

== Docker Image Layers 101

Take a look at the following `_Dockerfile_`.

[source,dockerfile]
----
FROM eclipse-temurin:17-jdk
ARG JAR_FILE=build/libs/*.jar
COPY ${JAR_FILE} app.jar
ENTRYPOINT ["java","-jar","/app.jar"]
----

By running `_docker build -t myapp ._` on this Dockerfile, you will get (one)
Docker image, which will be based on a Java 17 (Eclipse-Temurin) image, as well as
contain and run our Java application (the app.jar file).

What might not immediately be obvious, is that _every single line_ from your Docker
line, will result in the creation of _one_ Docker image *layer* - every image
consists of several such layers.

You can confirm this by running e.g.:

[source,console]
----
docker image history myapp
----

Which will return the image layers on new lines:

[source,console]
----
IMAGE CREATED CREATED BY
SIZE COMMENT
3ca5a60826f0 8 minutes ago ENTRYPOINT ["java" "-jar" "/app.jar"] 0B
buildkit.dockerfile.v0
<missing> 8 minutes ago COPY build/libs/*.jar app.jar # buildkit
19.7MB buildkit.dockerfile.v0
<missing> 8 minutes ago ARG JAR_FILE=build/libs/*.jar 0B
buildkit.dockerfile.v0
... (other layers from the base image left out)
----
There is a layer for our `_ENTRYPOINT_` line, one for `_COPY_` and one for `_ARG_`.

The layer containing our `_app.jar_` file (`_COPY_`) is roughly 20MB large, with 0B
metadata layers for the `_ENTRYPOINT_` and `_ARG_` lines.

Now, what do we do with this information?

== Your layers can easily bloat

Imagine you want to install a package through your package manager, and for that,
you want to run `_apt update_`, which updates the package manager's index.

[source,dockerfile]
----
FROM eclipse-temurin:17-jdk
RUN apt update -y
ARG JAR_FILE=build/libs/*.jar
COPY ${JAR_FILE} app.jar
ENTRYPOINT ["java","-jar","/app.jar"]
----

Let's have a look at the resulting layers (`_docker image history myapp_`) and
focus on the very last line (`_RUN /bin/sh -c_...`):

[source,console]
----
IMAGE CREATED CREATED BY SIZE
COMMENT
c14a18a04751 8 seconds ago ENTRYPOINT ["java" "-jar" "/app.jar"] 0B
buildkit.dockerfile.v0
<missing> 8 seconds ago COPY build/libs/*.jar app.jar # buildkit
19.7MB buildkit.dockerfile.v0
<missing> 8 seconds ago ARG JAR_FILE=build/libs/*.jar 0B
buildkit.dockerfile.v0
<missing> 8 seconds ago RUN /bin/sh -c apt update -y # buildkit
45.7MB buildkit.dockerfile.v0
----

Wooha! Running `_apt-update_` has added a new layer with a whooping 45.7MB to our
resulting Docker image. Now every time you push or pull your image, you'll need to
transfer those additional megabytes.

== Layers are additive

Let's continue with the example above and add a couple more run commands, to
install the latest mysql package.

[source,dockerfile]
----
FROM eclipse-temurin:17-jdk
RUN apt update -y
RUN apt install mysql -y
RUN rm -rf /var/lib/apt/lists/*
ARG JAR_FILE=build/libs/*.jar
COPY ${JAR_FILE} app.jar
ENTRYPOINT ["java","-jar","/app.jar"]
----

In addition, we're removing the apt index cache (the 45.7MB from above) with the
`_rm -rf /var/lib/apt/lists/*_` command. Let's see what our image history now looks
like:

[source,console]
----
59f82a5b4c5a 6 seconds ago ENTRYPOINT ["java" "-jar" "/app.jar"] 0B
buildkit.dockerfile.v0
<missing> 6 seconds ago COPY build/libs/*.jar app.jar # buildkit
19.7MB buildkit.dockerfile.v0
<missing> 6 seconds ago ARG JAR_FILE=build/libs/*.jar 0B
buildkit.dockerfile.v0
<missing> 6 seconds ago RUN /bin/sh -c rm -rf /var/lib/apt/lists/* #… 0B
buildkit.dockerfile.v0
<missing> 7 seconds ago RUN /bin/sh -c apt install -y mysql-server #…
605MB buildkit.dockerfile.v0
<missing> 8 minutes ago RUN /bin/sh -c apt update -y # buildkit
45.7MB buildkit.dockerfile.v0
----

Waah, what's that? Even though we deleted the apt cache files, the 45.7MB layer is
still there (in addition to the 605MB MySQL layer, btw).

That's because layers are strictly _additive / immutable_. You can surely delete
those files from your current layer, but the older/previous layers will still
contain them.

How can you get around this? A simple workaround would be to run all three `_RUN_`
commands on a single line (== a single resulting layer)

[source,dockerfile]
----
FROM eclipse-temurin:17-jdk
RUN apt update -y && \
apt install -y mysql-server && \
rm -rf /var/lib/apt/lists/*
ARG JAR_FILE=build/libs/*.jar
COPY ${JAR_FILE} app.jar
ENTRYPOINT ["java","-jar","/app.jar"]
----

Let's look at the image's history now:

[source,console]
----
IMAGE CREATED CREATED BY
SIZE COMMENT
4b8c0f7f895a 14 seconds ago ENTRYPOINT ["java" "-jar" "/app.jar"] 0B
buildkit.dockerfile.v0
<missing> 14 seconds ago COPY build/libs/*.jar app.jar # buildkit
19.7MB buildkit.dockerfile.v0
<missing> 14 seconds ago ARG JAR_FILE=build/libs/*.jar 0B
buildkit.dockerfile.v0
<missing> 14 seconds ago RUN /bin/sh -c apt update -y && apt ins…
605MB buildkit.dockerfile.v0
----

Ha! We at least saved the 45.7MB for now. What else is wrong with this, though?

== Make it reproducible

You ideally want your builds to be reproducible (who would have thought). By
running `_apt update_` and then installing whatever latest package there is in the
repo, you effectively break that reproducibility, because package versions might
change between builds.

The gist:

* Install only specific versions of whatever you are trying to install


* _Avoid_ (package-manager-of-your-choice)'ing in your Dockerfiles for your
application in the first place - instead, build a new base image and use that in
your Dockerfile's `_FROM_`. This will also be a lot faster!

== Layer order matters

You'll want to make sure to put layers that change a lot towards the bottom of your
`_Dockerfile_`, whereas more stable layers should be ordered on top.

Why? Because when building images, you'll need to rebuild _every_ layer starting
from the layer(s) that changed between builds.

A practical example: Imagine that you want to package an `_index.html_` file into
your image, which changes _a lot_, i.e. more often than anything else.

[source,Dockerfile]
----
FROM eclipse-temurin:17-jdk
COPY index.html index.html
RUN apt update -y && \
apt install -y mysql-server && \
rm -rf /var/lib/apt/lists/*
ARG JAR_FILE=build/libs/*.jar
COPY ${JAR_FILE} app.jar
ENTRYPOINT ["java","-jar","/app.jar"]
----

You can see the `_COPY index.html index.html_` line added almost at the top of the
`_Dockerfile_`. Now, *every time* the index.html file changes, you'll need to
rebuild all subsequent layers, i.e. the `_RUN apt-update, ARG & COPY app.jar`
layers - a huge time sink. On my machine, all of the above takes roughly 17 seconds
to finish.

If, however, you re-order the statement towards the bottom, Docker can re-use all
previous layers, as they haven't changed.

[source,Dockerfile]
----
FROM eclipse-temurin:17-jdk
RUN apt update -y && \
apt install -y mysql-server && \
rm -rf /var/lib/apt/lists/*
ARG JAR_FILE=build/libs/*.jar
COPY ${JAR_FILE} app.jar
COPY index.html index.html
ENTRYPOINT ["java","-jar","/app.jar"]
----

Now a new `_docker build_` only takes, 0.5 seconds (on my machine), much much
better!

Here are the golden layering rules:

* Files that rarely change or are time/network-intensive (e.g. installing new


software) -> Top
* Files that change often (e.g. source code) -> Very Low
* ENV, CMD, etc -> Bottom

== When does Docker re-build layers?

Docker doesn't always rebuild all image layers, whenever you run `_docker build_`.
There is a specific set of rules,on when and how Docker will cache your layers and
you can read about them in the
https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#leverage-
build-cache[official documentation].

The gist is, whenever you run Docker build, Docker will:

* Either check the commands in the Dockerfile for changes (e.g. did you change
`_RUN blah_` to `_RUN doh_`).
* Did any of the involved files (or rather their checksums), in the case of `_ADD_`
or `_COPY_`, change?

== .dockerignore

When you run `_docker build -t <tag> ._`, the `_._`, your current directory, will
actually be your so-called `_build context_`. Meaning all the files inside your
current directory will be tar'ed up and sent to your local or remote Docker daemon
to perform the build.

If you want to make sure that some directories never make it to your build daemon,
thus keeping things snappy and small, you can create a `_.dockerignore_` file,
which has a similar syntax to `_.gitignore_`.

In general, you should put any files/directories that are not relevant to your
build here (e.g. your `_.git folder_`), which is especially important when using
commands like `_COPY . /somewhere_`, because then your entire project will end up
in the resulting image.

An npm example: You might want to run e.g. `_npm install_` during build time and
let it download its dependencies, instead of (slowly) copying your `_node_modules_`
folder in, so that would also make a good candidate for the dockerignore file.
However, if you do that, here's another trick you'd want to know about: directory
caching.

== Directory Caching

Say you run `_npm install_`, `_pip install_` `_gradlew build_` etc. to build your
image. This will lead to dependencies being downloaded and a new image layer being
created. Now, if that image layer has to be rebuilt, _all_ dependencies will be re-
downloaded on the next build, because there won't be a `_.npm_`, `_.cache_` or
`_.gradle_` folder available with the already downloaded dependencies.
But you can change that! Let's take `_pip_` as an example and change the following
line:

[source,Dockerfile]
----
FROM ...
RUN pip install -r requirements.txt
CMD ...
----

to:

[source,Dockerfile]
----
RUN --mount=type=cache,target=/root/.cache pip install -r requirements.txt
----

This will tell Docker to mount a caching layer/folder (`_/root/.cache_`) into the
container during build time - in this case, the folder that pip caches its
dependencies in, for the root user. The trick is: this folder will not end up in
the resulting image, but/and will be available to pip in all subsequent builds -
and you'll get a nice speed up!

The same goes for NPM, Gradle, or any other package manager out there. Just make
sure to specify the correct target folder.

== What are multistage builds?

Coming Soon.

== Fin

This article should have given you a good grasp of Docker image fundamentals. If
you have any questions or other comments, please post them in the comment section
below.

== Video

If you'd like to see this article as a video instead, have a look here:

mb_youtube::JcGwgNMZc_E[]

== Acknowledgments & References

Thanks to Maarten Balliauw, Andreas Eisele for comments/corrections/discussion.

You might also like