= Building Docker Images - Best Practices

Marco Behler
2024-06-28
:page-layout: layout-guides
:page-image: "/images/guides/undraw_takeout_boxes_ap54.png"
:page-description: Tips & tricks to build Docker images in the least amount of time and with the smallest possible size.
:page-published: true
:page-tags: ["docker", "docker images", "docker tips"]
:page-commento_id: /guides/docker-images

== What are we trying to understand?

Whenever you're building Docker images, say, you want to bake your Java/Node/Python
application into one, you'll be confronted with the following two questions:

* How can I make the `_docker build_` command run as fast as possible?
* How can I make sure that the resulting Docker image is as small as possible?

You will want to continue reading for answers to these questions.

== Docker Image Layers 101

Take a look at the following `_Dockerfile_`.

[source,dockerfile]
----
FROM eclipse-temurin:17-jdk
ARG JAR_FILE=build/libs/*.jar
COPY ${JAR_FILE} app.jar
ENTRYPOINT ["java","-jar","/app.jar"]
----

By running `_docker build -t myapp ._` on this Dockerfile, you will get one Docker
image. It is based on a Java 17 (Eclipse Temurin) image, and it contains and runs
our Java application (the app.jar file).

What might not be immediately obvious is that _every single instruction_ in your
Dockerfile results in the creation of _one_ Docker image *layer* - every image
consists of several such layers.

You can confirm this by running e.g.:

[source,console]
----
docker image history myapp
----

Which will return the image layers on new lines:

[source,console]
----
IMAGE          CREATED         CREATED BY                                 SIZE     COMMENT
3ca5a60826f0   8 minutes ago   ENTRYPOINT ["java" "-jar" "/app.jar"]      0B       buildkit.dockerfile.v0
<missing>      8 minutes ago   COPY build/libs/*.jar app.jar # buildkit   19.7MB   buildkit.dockerfile.v0
<missing>      8 minutes ago   ARG JAR_FILE=build/libs/*.jar              0B       buildkit.dockerfile.v0
... (other layers from the base image left out)
----

There is a layer for our `_ENTRYPOINT_` line, one for `_COPY_` and one for `_ARG_`.

The layer containing our `_app.jar_` file (`_COPY_`) is roughly 20MB large, with 0B
metadata layers for the `_ENTRYPOINT_` and `_ARG_` lines.

Now, what do we do with this information?

== Your layers can easily bloat

Imagine you want to install a package through your package manager, and for that,
you want to run `_apt update_`, which updates the package manager's index.

[source,dockerfile]
----
FROM eclipse-temurin:17-jdk
RUN apt update -y
ARG JAR_FILE=build/libs/*.jar
COPY ${JAR_FILE} app.jar
ENTRYPOINT ["java","-jar","/app.jar"]
----

Let's have a look at the resulting layers (`_docker image history myapp_`) and
focus on the very last line (`_RUN /bin/sh -c_...`):

[source,console]
----
IMAGE          CREATED         CREATED BY                                 SIZE     COMMENT
c14a18a04751   8 seconds ago   ENTRYPOINT ["java" "-jar" "/app.jar"]      0B       buildkit.dockerfile.v0
<missing>      8 seconds ago   COPY build/libs/*.jar app.jar # buildkit   19.7MB   buildkit.dockerfile.v0
<missing>      8 seconds ago   ARG JAR_FILE=build/libs/*.jar              0B       buildkit.dockerfile.v0
<missing>      8 seconds ago   RUN /bin/sh -c apt update -y # buildkit    45.7MB   buildkit.dockerfile.v0
----

Wooha! Running `_apt update_` has added a new layer with a whopping 45.7MB to our
resulting Docker image. Now every time you push or pull your image, you'll need to
transfer those additional megabytes.

== Layers are additive

Let's continue with the example above and add a couple more run commands, to
install the latest mysql package.

[source,dockerfile]
----
FROM eclipse-temurin:17-jdk
RUN apt update -y
RUN apt install -y mysql-server
RUN rm -rf /var/lib/apt/lists/*
ARG JAR_FILE=build/libs/*.jar
COPY ${JAR_FILE} app.jar
ENTRYPOINT ["java","-jar","/app.jar"]
----

In addition, we're removing the apt index cache (the 45.7MB from above) with the
`_rm -rf /var/lib/apt/lists/*_` command. Let's see what our image history now looks
like:

[source,console]
----
59f82a5b4c5a   6 seconds ago   ENTRYPOINT ["java" "-jar" "/app.jar"]           0B       buildkit.dockerfile.v0
<missing>      6 seconds ago   COPY build/libs/*.jar app.jar # buildkit        19.7MB   buildkit.dockerfile.v0
<missing>      6 seconds ago   ARG JAR_FILE=build/libs/*.jar                   0B       buildkit.dockerfile.v0
<missing>      6 seconds ago   RUN /bin/sh -c rm -rf /var/lib/apt/lists/* #…   0B       buildkit.dockerfile.v0
<missing>      7 seconds ago   RUN /bin/sh -c apt install -y mysql-server #…   605MB    buildkit.dockerfile.v0
<missing>      8 minutes ago   RUN /bin/sh -c apt update -y # buildkit         45.7MB   buildkit.dockerfile.v0
----

Waah, what's that? Even though we deleted the apt cache files, the 45.7MB layer is
still there (in addition to the 605MB MySQL layer, btw).

That's because layers are strictly _additive / immutable_. You can surely delete
those files from your current layer, but the older/previous layers will still
contain them.

How can you get around this? A simple workaround is to combine all three commands
into a single `_RUN_` instruction (== a single resulting layer):

[source,dockerfile]
----
FROM eclipse-temurin:17-jdk
RUN apt update -y && \
    apt install -y mysql-server && \
    rm -rf /var/lib/apt/lists/*
ARG JAR_FILE=build/libs/*.jar
COPY ${JAR_FILE} app.jar
ENTRYPOINT ["java","-jar","/app.jar"]
----

Let's look at the image's history now:

[source,console]
----
IMAGE          CREATED          CREATED BY                                 SIZE     COMMENT
4b8c0f7f895a   14 seconds ago   ENTRYPOINT ["java" "-jar" "/app.jar"]      0B       buildkit.dockerfile.v0
<missing>      14 seconds ago   COPY build/libs/*.jar app.jar # buildkit   19.7MB   buildkit.dockerfile.v0
<missing>      14 seconds ago   ARG JAR_FILE=build/libs/*.jar              0B       buildkit.dockerfile.v0
<missing>      14 seconds ago   RUN /bin/sh -c apt update -y && apt ins…   605MB    buildkit.dockerfile.v0
----

Ha! We at least saved the 45.7MB for now. What else is wrong with this, though?

== Make it reproducible

You ideally want your builds to be reproducible (who would have thought). By
running `_apt update_` and then installing whatever latest package there is in the
repo, you effectively break that reproducibility, because package versions might
change between builds.

The gist:

* Install only specific versions of whatever you are trying to install

* _Avoid_ running a package manager in your application's Dockerfile in the first
place - instead, build a new base image with everything installed and use that in
your Dockerfile's `_FROM_`. This will also be a lot faster!
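
To make the first point concrete, here is a minimal sketch of a pinned install. The build argument `_MYSQL_SERVER_VERSION_` is a hypothetical name introduced for this example, and it deliberately has no default: you would look up the exact version string that your base image's repositories offer and pass it in at build time.

[source,dockerfile]
----
FROM eclipse-temurin:17-jdk
# Hypothetical build argument (not part of the original example).
# No default on purpose: with an empty value the install step fails
# instead of silently picking "whatever is latest".
ARG MYSQL_SERVER_VERSION
RUN apt update -y && \
    apt install -y mysql-server=${MYSQL_SERVER_VERSION} && \
    rm -rf /var/lib/apt/lists/*
ARG JAR_FILE=build/libs/*.jar
COPY ${JAR_FILE} app.jar
ENTRYPOINT ["java","-jar","/app.jar"]
----

You would then build with `_docker build --build-arg MYSQL_SERVER_VERSION=<version> -t myapp ._`. Note that this pins only the named package, not its dependencies, and that repositories may drop old versions over time, which is one more reason to prefer a pre-built base image.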

== Layer order matters

You'll want to make sure to put layers that change a lot towards the bottom of your
`_Dockerfile_`, whereas more stable layers should be ordered on top.

Why? Because when building images, Docker has to rebuild _every_ layer starting
from the first layer that changed between builds, even if the layers after it are
unchanged.

A practical example: Imagine that you want to package an `_index.html_` file into
your image, which changes _a lot_, i.e. more often than anything else.

[source,Dockerfile]
----
FROM eclipse-temurin:17-jdk
COPY index.html index.html
RUN apt update -y && \
    apt install -y mysql-server && \
    rm -rf /var/lib/apt/lists/*
ARG JAR_FILE=build/libs/*.jar
COPY ${JAR_FILE} app.jar
ENTRYPOINT ["java","-jar","/app.jar"]
----

You can see the `_COPY index.html index.html_` line added almost at the top of the
`_Dockerfile_`. Now, *every time* the index.html file changes, you'll need to
rebuild all subsequent layers, i.e. the `_RUN apt update_`, `_ARG_` and
`_COPY app.jar_` layers - a huge time sink. On my machine, all of the above takes
roughly 17 seconds to finish.

If, however, you re-order the statement towards the bottom, Docker can re-use all
previous layers, as they haven't changed.

[source,Dockerfile]
----
FROM eclipse-temurin:17-jdk
RUN apt update -y && \
    apt install -y mysql-server && \
    rm -rf /var/lib/apt/lists/*
ARG JAR_FILE=build/libs/*.jar
COPY ${JAR_FILE} app.jar
COPY index.html index.html
ENTRYPOINT ["java","-jar","/app.jar"]
----

Now a new `_docker build_` takes only 0.5 seconds (on my machine), much, much
better!

Here are the golden layering rules:

* Files that rarely change or are time/network-intensive (e.g. installing new
software) -> Top
* Files that change often (e.g. source code) -> Very Low
* ENV, CMD, etc -> Bottom
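
As a rough sketch of these rules applied to a Python application: the base image tag `_python:3.12-slim_`, the `_/app_` working directory and the `_main.py_` entry point are assumptions made for this example, not something the rules prescribe.

[source,dockerfile]
----
FROM python:3.12-slim
WORKDIR /app
# Top: rarely changes and is network-intensive.
# Copy only the dependency list first, so the install layer stays cached
# for as long as requirements.txt itself is unchanged.
COPY requirements.txt .
RUN pip install -r requirements.txt
# Very low: the source code, which changes on almost every build.
COPY . .
# Bottom: metadata (main.py is a hypothetical entry point).
CMD ["python", "main.py"]
----

With this order, editing a source file only invalidates the `_COPY . ._` layer and what follows it, and the slow `_pip install_` layer is re-used.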

== When does Docker re-build layers?

Docker doesn't always rebuild all image layers whenever you run `_docker build_`.
There is a specific set of rules on when and how Docker will cache your layers, and
you can read about them in the
https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#leverage-build-cache[official documentation].

The gist is, whenever you run `_docker build_`, Docker will check for each layer:

* Did the command in the Dockerfile change (e.g. did you change `_RUN blah_` to
`_RUN doh_`)?
* In the case of `_ADD_` or `_COPY_`: did any of the involved files (or rather
their checksums) change?

If the answer to either question is yes, that layer and all layers after it are
rebuilt.
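
Applied to the running example, the two checks look roughly like this. The comments are a simplified reading of the caching rules, not an exhaustive description.

[source,dockerfile]
----
FROM eclipse-temurin:17-jdk
# RUN: Docker compares the instruction text with the cached one.
# It does not check what apt would download today, so this layer is
# re-used until you edit the line (or an earlier layer changes).
RUN apt update -y && \
    apt install -y mysql-server && \
    rm -rf /var/lib/apt/lists/*
ARG JAR_FILE=build/libs/*.jar
# COPY: Docker additionally checksums the files matched by the source path.
# A rebuilt jar invalidates this layer and every layer after it.
COPY ${JAR_FILE} app.jar
# Same rule: a changed index.html invalidates this layer and the ones below.
COPY index.html index.html
ENTRYPOINT ["java","-jar","/app.jar"]
----

If you ever need to bypass the cache entirely, e.g. to force a fresh `_apt update_`, you can run `_docker build --no-cache -t myapp ._`.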

== .dockerignore

When you run `_docker build -t <tag> ._`, the `_._` (your current directory)
becomes your so-called `_build context_`. This means all the files inside your
current directory will be tar'ed up and sent to your local or remote Docker daemon
to perform the build.

If you want to make sure that some directories never make it to your build daemon,
thus keeping things snappy and small, you can create a `_.dockerignore_` file,
which has a similar syntax to `_.gitignore_`.

In general, you should list any files/directories that are not relevant to your
build here (e.g. your `_.git_` folder). This is especially important when using
commands like `_COPY . /somewhere_`, because otherwise your entire project will end
up in the resulting image.

An npm example: You might want to run e.g. `_npm install_` during build time and
let it download its dependencies, instead of (slowly) copying your `_node_modules_`
folder in, so that would also make a good candidate for the dockerignore file.
However, if you do that, here's another trick you'd want to know about: directory
caching.
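
To make the npm example concrete, here is a small sketch. It assumes a `_.dockerignore_` file next to the Dockerfile with the two entries shown in the comment. The `_node:20_` base image, the `_package-lock.json_` file and the `_server.js_` entry point are hypothetical choices for this example.

[source,dockerfile]
----
# Assumption: a .dockerignore file next to this Dockerfile contains these lines:
#   .git
#   node_modules
FROM node:20
WORKDIR /app
COPY package.json package-lock.json ./
# Dependencies are downloaded inside the image at build time.
RUN npm install
# Without node_modules in .dockerignore, this COPY would also bring in the
# host's node_modules folder, on top of the freshly installed one.
COPY . .
CMD ["node", "server.js"]
----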

== Directory Caching

Say you run `_npm install_`, `_pip install_`, `_gradlew build_` etc. to build your
image. This will lead to dependencies being downloaded and a new image layer being
created. Now, if that image layer has to be rebuilt, _all_ dependencies will be
re-downloaded on the next build, because there won't be a `_.npm_`, `_.cache_` or
`_.gradle_` folder available with the already downloaded dependencies.

But you can change that! Let's take `_pip_` as an example and change the following
line:

[source,Dockerfile]
----
FROM ...
RUN pip install -r requirements.txt
CMD ...
----

to:

[source,Dockerfile]
----
RUN --mount=type=cache,target=/root/.cache pip install -r requirements.txt
----

This tells Docker to mount a cache folder (`_/root/.cache_`) into the container
during build time - in this case, the folder that pip caches its downloads in for
the root user. The trick is: this folder will not end up in the resulting image,
but it will be available to pip in all subsequent builds - and you'll get a nice
speed-up!

The same goes for NPM, Gradle, or any other package manager out there. Just make
sure to specify the correct target folder.
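
For npm, a sketch under the same assumptions (the build runs as root, so npm's default cache folder is `_/root/.npm_`) could look like this. The `_node:20_` image and the `_server.js_` entry point are hypothetical. Cache mounts need BuildKit, which is the builder behind the `_buildkit.dockerfile.v0_` entries in the history output earlier.

[source,dockerfile]
----
# syntax=docker/dockerfile:1
FROM node:20
WORKDIR /app
COPY package.json package-lock.json ./
# /root/.npm is mounted only while this RUN step executes.
# It survives between builds but is not part of the image.
RUN --mount=type=cache,target=/root/.npm npm install
COPY . .
CMD ["node", "server.js"]
----

If your image runs the install as a different user, or your package manager is configured with a custom cache location, the target path has to be adjusted accordingly.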

== What are multistage builds?

Coming Soon.

== Fin

This article should have given you a good grasp of Docker image fundamentals. If
you have any questions or other comments, please post them in the comment section
below.

== Video

If you'd like to see this article as a video instead, have a look here:

mb_youtube::JcGwgNMZc_E[]

== Acknowledgments & References

Thanks to Maarten Balliauw, Andreas Eisele for comments/corrections/discussion.
