Fig. 7.1 The difference between a container (left) and a virtual machine (right)
other. The second is through the virtual machine image. However, a virtual machine image is large, and replicating and downloading it is time-consuming.
Container technology was proposed to solve these problems. Borrowing from the shipping container in traditional transportation, the idea is to package an application together with the dependencies it needs to run, i.e., to package any application and its dependencies into a lightweight, portable, self-contained container. A virtualization technique isolates the different processes running on the host, so that containers do not affect each other or the host operating system, and an application runs the same way anywhere. A container created and tested on a developer's own laptop can run on virtual machines, physical servers, or public cloud hosts in production without any modification.
1. Container and virtual machine
When it comes to containers, you have to compare them to virtual machines
because both provide encapsulation and isolation for your application.
Traditional virtualization technologies, such as VMware, KVM, and Xen, aim to create complete virtual machines: to run an application, the virtual machine must contain an entire operating system in addition to the application itself and its dependencies.
Containers consist of the app itself and the resources the app depends on, such as the libraries or other applications it requires. Containers run in the host operating system's user space and are isolated from the operating system's other processes, which is significantly different from virtual machines. Figure 7.1 shows the difference between a container and a virtual machine.
As Fig. 7.1 shows, because all containers share the host operating system, a container is much smaller than a virtual machine. In addition, booting a container does not require starting an entire operating system, so containers deploy and start faster, cost less, and migrate more easily.
2. The evolution of containers
Container technology dates back to the chroot command in the UNIX operat-
ing system in 1979, originally intended to facilitate switching root directories,
providing isolation of file system resources for each process, which is also the
origin of the idea of operating system virtualization.
FreeBSD Jails, released in 2000, was based on the chroot command, which BSD absorbed and improved. In addition to file system isolation, FreeBSD Jails added the isolation of user and network resources; each Jail can be assigned a separate IP address for relatively independent software installation and configuration.
Linux VServer was released in 2001. It continues the idea of FreeBSD Jails, isolating resources such as the file system, CPU time, network addresses, and memory on an operating system; each partition is called a Security Context, and the internal virtualization system is called a VPS.
In 2004, Sun released Solaris Containers as a feature of Solaris 10. It combines system resource control with the binary isolation provided by Zones, where a Zone acts as a fully isolated virtual server within the operating system instance.
In 2005, SWsoft released OpenVZ. OpenVZ is very similar to Solaris Containers, providing virtualization, isolation, resource management, and checkpointing through a patched Linux kernel. OpenVZ marked the point at which kernel-level virtualization entered the mainstream, and the relevant technologies were subsequently added to the kernel.
In 2006, Google released Process Containers. Process Containers recorded and isolated the resources of each process (including CPU, memory, hard disk I/O, network, etc.). It was renamed Control Groups in 2007 and merged into Linux kernel 2.6.24.
In 2008, the first fairly complete container technology, Linux Containers (LXC), appeared, implemented on the basis of the Cgroups and Linux Namespaces mechanisms that had been added to the kernel. LXC runs on any vanilla Linux kernel without requiring patches.
In 2011, Cloud Foundry released Warden. Unlike LXC, Warden can work on any operating system, runs as a daemon, and provides an API for managing containers.
In 2013, Google open-sourced its container technology stack, lmctfy, a project started to provide high-performance, high-resource-utilization, near-zero-cost virtualization through containers. The monitoring tool cAdvisor used in Kubernetes originated from the lmctfy project. In 2015, Google donated the core technology of lmctfy to libcontainer.
Docker was born in 2013. It was originally an internal project of dotCloud, a PaaS company that later renamed itself Docker. Like Warden, Docker initially used LXC and later replaced it with libcontainer. Unlike other container technologies, Docker built a complete ecosystem around containers, including a container image standard, container registries, REST APIs, a CLI, the container cluster management tool Docker Swarm, and more.
(2) Config files: record the hierarchical information of the file system (the hash value of each layer, as well as historical information) and some information required at container runtime (such as environment variables, the working directory, command parameters, and the mount list), specifying the configuration of the image on a particular platform and system. This is close to what we see when running docker inspect <image_id>.
(3) Manifest file: the index of the image's config file; the manifest file holds much information about the image on the current platform.
(4) Index file: an optional file that points to the manifest files for different platforms. This file allows one image to be used across platforms; each platform has a different manifest file, and the index is used to look them up.
4. Container scenarios
The birth of container technology solved the problem of how to implement the PaaS layer, while technologies such as OpenStack and CloudStack are primarily used to solve problems at the IaaS layer. So what are the main scenarios in which container technology is used? At present, there are several mainstream applications.
(1) Containerized traditional applications
Containers not only improve the security and portability of existing applications but also save money. Every enterprise environment has an older set of applications that serve customers or automate business processes. Even large monolithic applications benefit from container isolation through enhanced security and portability and reduced costs. Once containerized, these applications can scale out additional services or transition to a microservices architecture.
(2) Continuous integration and continuous deployment
Docker accelerates application pipeline automation and application deployment. Data suggest that using Docker can increase delivery speed by more than 13 times. Modern development processes are fast, continuous, and automated, with the ultimate goal of developing more reliable software. With continuous integration (CI) and continuous deployment (CD), IT teams can integrate new code every time a developer checks it in and tests it successfully. As the foundation of the DevOps approach, CI/CD creates a real-time feedback loop that continuously delivers small iterative changes, accelerating change and improving quality. CI environments are typically fully automated: a git push command triggers the tests; when they succeed, a new image is automatically built and pushed to the Docker image registry; and with subsequent automation and scripting, a container from the new image can be deployed to a preview environment for further testing.
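As a minimal sketch of such a pipeline stage (the image name, registry, and test entry point are hypothetical), the build-and-push step triggered after a successful test run might look like this:

    #!/bin/sh
    # Hypothetical CI stage: build, test, and publish an image after a git push.
    set -e
    IMAGE=registry.example.com/myteam/myapp        # assumed registry/repository
    TAG=$(git rev-parse --short HEAD)              # tag the image with the commit hash
    docker build -t "$IMAGE:$TAG" .                # build from the repository's Dockerfile
    docker run --rm "$IMAGE:$TAG" ./run-tests.sh   # assumed test entry point inside the image
    docker push "$IMAGE:$TAG"                      # publish for the preview environment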
(3) Microservices
Use microservices to accelerate application architecture modernization. Application architecture is moving from monolithic code bases developed with the waterfall model toward independently developed and deployed microservices.
Fig. 7.2 Download the hello-world image from the official Docker warehouse
The new image does not need to be built from scratch: it is built directly on the Debian base image, then emacs and apache2 are installed, and finally bash is set to run when the container starts. The construction process of the new image is shown in Fig. 7.6.
As you can see, the new image is generated from a layer-by-layer overlay
of the base image. For each software installed, a layer is added to the existing
image. The most significant benefit of Docker's hierarchy is that resources can
be shared.
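A Dockerfile matching the build described above might look like this sketch (the image tag is an assumption; each RUN instruction adds one layer to the base image):

    # Hypothetical reconstruction of the build in Fig. 7.6
    cat > Dockerfile <<'EOF'
    FROM debian
    RUN apt-get update && apt-get install -y emacs
    RUN apt-get install -y apache2
    CMD ["/bin/bash"]
    EOF
    docker build -t debian-with-emacs-apache .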
At this point, someone might ask: if multiple containers share a base image and one container modifies its contents, will the contents of the other containers be modified too? The answer is no; the modification is limited to that single container. This is known as the copy-on-write (COW) characteristic of containers. When a container starts, a new writable layer is added on top of the image. This layer is called the container layer, and the layers beneath it are called the image layers. All changes to the container, whether additions, deletions, or modifications, occur only in the container layer; only the container layer is writable, and all image layers below it are read-only. The container layer thus holds the changing parts and never modifies the image itself.
2. Image construction
For Docker users, the best case is not having to create an image yourself. Commonly used databases, middleware, software, and so on have ready-made official Docker images or images created by other people and organizations, which we can use directly with a little configuration. The benefit of using ready-made images is not only saving the work of making images yourself, but more
Fig. 7.7 Building a new image through the docker commit command
importantly, drawing on the experience of those who came before, especially for the official images, because Docker's engineers know best how to run software in containers.
Of course, in some cases we have to build the image ourselves, for example:
① No ready-made image can be found, for example, for software developed in-house.
② Specific functions need to be added to the image; for example, the official image does not provide SSH.
Docker provides two ways to build an image: the docker commit command and the Dockerfile build file.
The docker commit command is the most intuitive way to build a new image,
and its process consists of three steps. The following is an example of installing
Vim in the Ubuntu base image and saving it as a new image to illustrate how to
build a new image through the docker commit command, as shown in Fig. 7.7.
① Run the container. The function of the -it parameter is to enter the container
in interactive mode and open the terminal.
② Install Vim. First confirm that Vim is not installed, and then execute the
installation command.
③ Save it as a new image. In a new window, you can use the docker ps command to view the containers running in the current environment. silly_goldberg is the name Docker randomly assigned to our new container. Execute the docker commit command to save the container as a new image and name it ubuntu-with-vim.
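A minimal sketch of these three steps on the command line (the container name is assumed to be the randomly generated one above):

    # 1. Run an interactive Ubuntu container
    docker run -it ubuntu
    # 2. Inside the container: confirm vim is absent, then install it
    #    (run these at the container's shell prompt)
    #      vim                                   -> "command not found"
    #      apt-get update && apt-get install -y vim
    # 3. In another window on the host: save the container as a new image
    docker ps                                    # find the container's name
    docker commit silly_goldberg ubuntu-with-vim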
The above steps demonstrate how to build a new image with the docker commit command. However, because manual creation is error-prone, inefficient, and hard to reproduce, and for security reasons, this is not the method officially recommended by Docker.
The Dockerfile is the other way to build an image. It is a text file that records all the steps of building an image. We again take the ubuntu-with-vim image above as an example to illustrate building a new image this way.
To build a new image with a Dockerfile, you first need to create the Dockerfile, whose content is shown in Fig. 7.8.
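Since the figure is not reproduced here, a plausible sketch of such a Dockerfile and the subsequent build command is as follows (the two instructions match the two build steps described below):

    # Hypothetical content of /root/Dockerfile
    cat > /root/Dockerfile <<'EOF'
    FROM ubuntu
    RUN apt-get update && apt-get install -y vim
    EOF
    cd /root
    docker build -t ubuntu-with-vim-dockerfile .   # "." makes the current directory the build context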
① The current directory is /root.
② The Dockerfile is ready.
③ Run the docker build command. It names the new image ubuntu-with-vim-dockerfile, and the "." at the end of the command indicates that the build context is the current directory. Docker looks for the Dockerfile in the build context by default; the location of the Dockerfile can also be specified with the -f parameter.
④ From this step, the real construction of the image begins. First, Docker sends all files in the build context to the Docker daemon; the build context provides the files and directories needed to build the image.
The ADD, COPY, and other instructions in a Dockerfile can add files from the build context to the image. In this example, the build context is the current directory /root, so all files and subdirectories in this directory are sent to the Docker daemon. Therefore, be careful with the build context: do not put extra files in it, and in particular do not use / or /usr as the build context; otherwise, the build process will be quite slow or may even fail.
⑤ Step 1: Execute FROM and use Ubuntu as the base image. The Ubuntu
image ID is f753707788c5.
⑥ Step 2: Execute RUN to install Vim; the specific steps are ⑦, ⑧, and ⑨.
⑦ Start the temporary container with ID 9f4d4166f7e3 and install Vim in the
container via apt-get.
⑧ After the installation succeeds, save the container as an image with the ID 350a89798937. Under the hood, this step uses a mechanism similar to docker commit.
⑨ Delete the temporary container with ID 9f4d4166f7e3.
⑩ The image is successfully built.
In addition, it should be specially pointed out that Docker caches the image layers of existing images when building an image. When building a new image, if an image layer already exists, it is used directly without being re-created. This is called the caching feature of Docker images.
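A quick way to observe this, as a sketch (output wording varies by Docker version): rebuild the same Dockerfile and the repeated steps report that the cache was used; the --no-cache option disables this behavior.

    docker build -t ubuntu-with-vim-dockerfile .              # second build: layers report "Using cache"
    docker build --no-cache -t ubuntu-with-vim-dockerfile .   # forces every layer to be rebuilt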
3. Image management and distribution
We have learned to build our own images; next we discuss how to use an image on multiple Docker hosts. There are several methods:
① Use the same Dockerfile to build images on other hosts.
② Push the image to a registry, such as Docker Hub, from which other hosts can download and use it directly.
③ Build a private registry for local hosts to use.
The first method rebuilds the image from the Dockerfile described above. The following focuses on how to distribute images through a public or private registry.
Regardless of the method used to save and distribute images, you must first name the image. When we executed the docker build command, we already gave the image a name, for example docker build -t ubuntu-with-vim, where ubuntu-with-vim is the name of the image.
The most straightforward way to save and distribute images is to use Docker
Hub. Docker Hub is a public registry maintained by Docker. Users can save their
images in the free repository of Docker Hub. If you don't want others to access
your image, you can also buy a private repository. In addition to Docker Hub,
quay.io is another public registry that provides similar services to Docker Hub.
The following describes how to use Docker Hub to access the image.
① First, you need to register an account on Docker Hub.
② Use the command docker login -u xx to log in on the Docker host, where xx is the username; after entering the password, you are logged in.
③ Modify the image's repository name to match your Docker Hub account. To distinguish images with the same name from different users, Docker Hub requires the image's repository name to include the username; the complete format is [username]/[xxx]:[tag]. We rename the image with the docker tag command.
Official images maintained by Docker itself have no username part, for example httpd.
④ Upload the image to Docker Hub via docker push. Docker uploads the image layer by layer. If the image is based on an official image whose layers are already on Docker Hub, very little data is actually uploaded; similarly, if our image is built on a base image, only the newly added layers are uploaded. To upload all images in the same repository, just omit the tag part, for example docker push cloudman6/httpd.
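Putting steps ①-④ together as a sketch (the account name cloudman6 is taken from the example above; the tag is an assumption):

    docker login -u cloudman6                                       # log in to Docker Hub
    docker tag ubuntu-with-vim cloudman6/ubuntu-with-vim:latest     # add the username prefix
    docker push cloudman6/ubuntu-with-vim:latest                    # upload the image layers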
Although Docker Hub is very convenient, it has some limitations: it requires an Internet connection, and download and upload speeds are slow. Anyone can access images uploaded to Docker Hub; a private repository can prevent this, but it is not free. For security reasons, many organizations also do not allow images to be placed on an external network.
The solution is to build a local registry. Docker has open-sourced the registry, and there is an official registry image on Docker Hub. Next, we run our own registry in Docker.
(1) Start the registry container
The image we start is registry:2, as shown in Fig. 7.9.
-d: Start the container in the background.
-p: Map port 5000 of the container to port 5000 of the host; 5000 is the registry service port. Port mapping is discussed in detail in Sect. 7.1.3.
-v: Map the container's /var/lib/registry directory to the host's /myregistry to store image data. The use of -v is discussed in detail in Sect. 7.1.4.
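Judging from the parameters explained above, the command in Fig. 7.9 is presumably along these lines:

    docker run -d -p 5000:5000 -v /myregistry:/var/lib/registry registry:2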
Use the docker tag command to rename the image so that it conforms to the registry's naming, as shown in Fig. 7.10.
We added the hostname and port of the host running the registry in front of the image name.
The complete format of the repository name is [registry-host]:[port]/[username]/xxx.
Only images on Docker Hub can omit [registry-host]:[port].
(2) Upload the image via docker push
Upload the image to the registry via docker push, as shown in Fig. 7.11.
Now the image can be downloaded from the local registry through docker pull, as shown in Fig. 7.12.
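As a sketch (the hostname registry-host follows the naming format above; host and image names are assumptions):

    docker tag ubuntu-with-vim registry-host:5000/cloudman6/ubuntu-with-vim
    docker push registry-host:5000/cloudman6/ubuntu-with-vim
    docker pull registry-host:5000/cloudman6/ubuntu-with-vim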
This is a closed network. Applications that require high security and do not need networking can use the None network.
(2) Host network
Containers connected to the Host network share the Docker Host network stack, and the container's network configuration is the same as the host's. You can specify the Host network with --network host, as shown in Fig. 7.15.
You can see all the host's network cards in the container, and even the hostname is the host's. The biggest advantage of using the Docker Host network directly is performance; if a container has high requirements for network transmission efficiency, the Host network is a good choice. The cost is reduced flexibility: for example, port conflicts must be considered, since ports already used on the Docker Host can no longer be used.
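As a quick sketch, the host's interfaces are visible from inside such a container:

    docker run --rm --network host busybox ip addr   # lists the host's network cards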
(3) Bridge network
When Docker is installed, a Linux bridge named "docker0" is created. If you do not specify --network, newly created containers are attached to docker0 by default, as shown in Fig. 7.16.
In addition to the three automatically created networks None, Host, and Bridge, users can create user-defined networks according to business needs. Docker provides three user-defined network drivers: Bridge, Overlay, and Macvlan. Overlay and Macvlan are used to create cross-host networks; we do not discuss them here.
From the previous example we can conclude that, for two containers to communicate, they must have network cards on the same network. Once this condition is met, the containers can interact through their IP addresses. Specifically, specify the network with --network when creating the container, or add an existing container to a network with docker network connect.
Although accessing a container by IP address satisfies communication needs, it is not flexible enough: the IP address may not be known before the application is deployed, and specifying it after deployment is troublesome. This problem can be solved by the DNS service that comes with Docker.
Starting from Docker 1.10, the Docker daemon has implemented an embedded DNS service that allows containers to communicate directly by "container name." The method is straightforward: just use --name to name the container at startup. Start two containers, bbox1 and bbox2, as shown below:
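A sketch of the commands (using a user-defined network because, as noted below, the embedded DNS only works there; busybox is a common choice for such demos):

    docker network create my_net
    docker run -d --name bbox1 --network my_net busybox sleep 3600
    docker run -d --name bbox2 --network my_net busybox sleep 3600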
Then bbox2 can ping bbox1 directly by container name, as shown in Fig. 7.17.
There is a limitation when using the Docker DNS service: it can only be used on user-defined networks. In other words, containers on the default Bridge network cannot use DNS.
Joined containers are another way to achieve communication between containers. A joined container is special: it lets two or more containers share a network stack, network card, and configuration information. Joined containers can communicate directly through 127.0.0.1.
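A sketch of a joined container: the second container reuses the first one's network stack via --network container:<name>.

    docker run -d --name web httpd
    docker run -it --rm --network container:web busybox
    # inside busybox: wget -qO- 127.0.0.1 reaches the httpd in the "web" container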
3. Communication between the container and the outside world
Docker provides two kinds of data storage resources for containers: the Storage Driver (which manages the image layers and the container layer) and data volumes.
We have learned that a Docker image has a hierarchical structure: a writable container layer on top and several read-only image layers below, and the container's data is placed in these layers. The most important characteristic of this layering is copy-on-write (COW).
The hierarchical structure makes the creation, sharing, and distribution of images and containers very efficient, and all of this is due to the Storage Driver, which stacks the multiple layers of data and presents them to the user as a single, merged, unified view. Docker supports a variety of Storage Drivers, including AUFS, Device Mapper, Btrfs, VFS, and ZFS. They can all implement the hierarchical structure, while each has its own characteristics.
When Docker is installed, a default Storage Driver is selected according to the current system configuration. The default Storage Driver has better stability, because it has been rigorously tested on that release. Run the docker info command to view the default Storage Driver.
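For example:

    docker info | grep "Storage Driver"   # e.g., "Storage Driver: overlay2" on many systems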
For some containers, placing data directly in the layers maintained by the Storage Driver is a good choice, for example stateless applications. Stateless means the container has no data that needs to be persisted and can be rebuilt from the image at any time. However, this method is not suitable for the other type of application, which needs to persist data: when the container starts, it needs to load existing data, and when the container is destroyed, the generated data should be retained. In other words, these containers are stateful. They require Docker's other data storage resource: the data volume.
A data volume is essentially a directory or file in the Docker Host file system that can be mounted directly into the container's file system. It has the following characteristics.
• Data volumes are directories or files, not unformatted disks or block devices.
• The container can read/write the data in it.
• The data in the data volume can be stored permanently, even if the container using
it is destroyed.
In terms of specific use, Docker provides two types of data volumes: Bind Mount and Docker Managed Volume.
1. Bind Mount
A Bind Mount mounts existing directories or files on the host into the container, as shown in Fig. 7.20.
Mount one into the httpd container through -v, as shown in Fig. 7.21.
Bind Mount lets the host share data with the container, which is very convenient in management: even if the container is destroyed, the Bind Mount is still there. In addition, a Bind Mount can specify the data's read/write permission, which is readable and writable by default.
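A sketch with the httpd container (the host path is an assumption; appending :ro makes the mount read-only):

    docker run -d -p 80:80 -v /host/htdocs:/usr/local/apache2/htdocs httpd
    docker run -d -v /host/htdocs:/usr/local/apache2/htdocs:ro httpd   # read-only mount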
Bind Mount has many application scenarios. For example, we can mount the source code directory into the container and modify the code on the host to see the application's behavior in real time, or put a MySQL container's data in a Bind Mount so that the host can conveniently back up and migrate the data locally.
The use of Bind Mount is intuitive, efficient, and easy to understand, but it has a shortcoming: a Bind Mount must specify a concrete path on the host file system, which limits the container's portability. When the container is migrated to another host that does not have the data to be mounted, or holds it at a different path, the operation will fail. The more portable approach is the Docker Managed Volume.
2. Docker Managed Volume
The biggest difference between a Docker Managed Volume and a Bind Mount is that you do not need to specify the mount source; just specify the mount point. Again we take the httpd container as an example, as shown in Fig. 7.22.
We use -v to tell Docker that a data volume is needed, mounted at /usr/local/apache2/htdocs.
Whenever a container mounts a Docker Managed Volume, Docker generates a directory under /var/lib/docker/volumes; this directory is the mount source.
To summarize, the creation process of a Docker Managed Volume is as follows.
① When the container starts, it tells Docker that it needs a data volume to store data and where to mount it.
② Docker generates a random directory under /var/lib/docker/volumes as the mount source.
③ If the specified mount point already contains data, the data is copied to the mount source.
④ The Docker Managed Volume is mounted at the specified directory.
In addition to using the docker inspect command to view volumes, we can also use the docker volume command.
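For example (the volume name is a placeholder):

    docker volume ls                      # list Docker Managed Volumes
    docker volume inspect <volume_name>   # shows the mount source under /var/lib/docker/volumes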
Next, we discuss sharing data, which is a key feature of volumes. We will look at how volumes share data between containers and hosts, and between containers.
(1) Sharing data between the container and the host
Both types of data volume can share data between the container and the host, but in different ways. For a Bind Mount this is very direct: mount the shared directory into the container. A Docker Managed Volume is more troublesome: since the volume is a directory on the host generated by Docker when the container starts, the shared data needs to be copied into the volume. Use the docker cp command to copy data between the container and the host; of course, the Linux cp command can also be used directly.
(2) Sharing data between containers
One method is to put the shared data in a Bind Mount and mount it into multiple containers. Another method is to use a Volume Container, a container that exists specifically to provide volumes for other containers; the volumes it provides can be Bind Mounts or Docker Managed Volumes. Next, we create a Volume Container, as shown in Fig. 7.23.
We name the container vc_data (vc is an abbreviation for Volume Container). Note that the docker create command is used here because the Volume Container's role is only to provide data; it does not need to be running. The container mounts two volumes:
① a Bind Mount, used to store the Web server's static files;
② a Docker Managed Volume, used to store some useful tools (it is empty now; this is just an example).
Other containers can use the vc_data Volume Container's volumes through --volumes-from.
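A sketch of creating and consuming the Volume Container (the host path and the tools directory are assumptions):

    docker create --name vc_data \
        -v /host/htdocs:/usr/local/apache2/htdocs \
        -v /usr/local/tools \
        busybox
    docker run -d --volumes-from vc_data httpd   # inherits both volumes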
Take the network card as an example: each container believes it has an independent network card, even if there is only one physical network card on the host. This approach is excellent; it makes the container look more like an independent computer.
The technology by which Linux implements this is the Namespace. A Namespace manages a globally unique resource on the host and can make each container feel that it alone is using that resource. In other words, Namespaces realize the isolation of resources between containers.
Linux uses the following Namespaces: Mount, UTS, IPC, PID, Network, and User. The Mount Namespace makes the container appear to have the entire file system. The UTS Namespace gives the container its own hostname. The IPC Namespace gives containers their own shared memory and semaphores for inter-process communication. The PID Namespace gives the container its own independent set of process IDs. The Network Namespace gives the container its own network card, IP address, and routing resources. The User Namespace gives the container the authority to manage its own users.
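These primitives can be observed without Docker. As a sketch, the unshare tool from util-linux starts a shell in a new UTS Namespace, where changing the hostname does not affect the host:

    sudo unshare --uts sh -c 'hostname container-demo; hostname'   # prints container-demo
    hostname                                                       # the host's name is unchanged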
The popularity and standardization of Docker technology activated the tepid PaaS market, and various kinds of Micro-PaaS followed; Kubernetes is one of the most representative. Kubernetes is Google's open source container cluster management system. It is built on Docker technology and provides a complete set of functions for containerized applications, such as resource scheduling, deployment and operation, service discovery, and scaling.
It opened the two main features SIG-API Machinery and SIG-Node for beta testing and continued to enhance storage functions.
A one-time task: the Pod is destroyed after the task completes, and the container is not restarted. Tasks can also be run on a schedule.
(10) Namespace
Namespace is a fundamental concept in the Kubernetes system. It is used in many cases to implement resource isolation for multi-tenancy. Namespace "distributes" the resource objects within a cluster into different Namespaces, forming logically grouped projects, teams, or user groups, so that different groups can be managed separately while sharing the resources of the entire cluster. After a Kubernetes cluster starts, a Namespace named "default" is created, which can be viewed through kubectl.
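For example (the dev Namespace is an assumption):

    kubectl get namespaces            # lists "default" among others
    kubectl create namespace dev      # create a new Namespace
    kubectl get pods --namespace dev  # operate within that Namespace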
The resource objects mentioned above are the core components of the Kubernetes system; together they constitute the framework and computing model of the system. By combining them flexibly, users can quickly and easily configure, create, and manage container clusters. In addition, the Kubernetes system contains many resource objects that assist with configuration, such as LimitRange and ResourceQuota; for objects used internally by the system, such as Binding and Event, please refer to the Kubernetes API documentation.
3. Service release
A Service's virtual IP address belongs to the internal network virtualized by Kubernetes, and the external network cannot reach it. However, some Services need to be exposed externally, such as the Web front end. This requires adding a layer of network forwarding, i.e., forwarding from the external network to the internal network. Kubernetes provides NodePort Service, LoadBalancer Service, and Ingress to publish a Service.
(1) NodePort Service
A NodePort Service is a Service of type NodePort. In addition to assigning the Service an internal virtual IP address, Kubernetes exposes a port, the NodePort, on every Node. The external network can access the Service through [NodeIP]:[NodePort].
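A minimal NodePort Service manifest might look like the following sketch (the names, ports, and selector label are assumptions):

    cat <<'EOF' | kubectl apply -f -
    apiVersion: v1
    kind: Service
    metadata:
      name: web-frontend
    spec:
      type: NodePort
      selector:
        app: web          # matches Pods labeled app=web
      ports:
      - port: 80          # the Service's internal port
        targetPort: 80    # the container's port
        nodePort: 30080   # exposed on every Node
    EOF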
Kubernetes departs from Docker's default network model and forms its own, which is closer to the traditional network model, so applications can migrate smoothly from non-container environments to Kubernetes.
1. Communication between containers
In this case, communication is relatively simple, because the containers inside a Pod share the network space, so a container can reach the other containers directly through localhost. In this way, all containers in a Pod are interoperable, and externally the Pod can be regarded as a complete network unit, as shown in Fig. 7.26.
When Kubernetes starts a container, it first starts a Pause container, which implements the communication function between containers. Each Pod runs this special container, called Pause; the other containers are business containers. The business containers share the Pause container's network stack and Volume mounts, so communication and data exchange between them are more efficient. In design, we can take full advantage of this feature by putting a group of closely related service processes into the same Pod.
2. Communication between Pods
The Kubernetes network model is a flat network plane in which a Pod, as a network unit, is at the same level as a Kubernetes Node. We consider a minimal Kubernetes network topology, as shown in Fig. 7.27, in which the following conditions are met.
① Inter-Pod communication: Pod2 and Pod3 (same host) and Pod1 and Pod3 (cross-host) can communicate.
② Node-to-Pod communication: Node1 can communicate with Pod2/Pod3 (same host) and Pod1 (cross-host).
The first question is how to ensure that the IP address of a Pod is globally unique. The method is straightforward: because the Docker bridge assigns the Pod's IP address, the Docker bridges on different Kubernetes Nodes can be configured with different IP network segments.
In addition, Pods/containers on the same Kubernetes Node can communicate natively, but how do Pods/containers on different Kubernetes Nodes communicate? This requires enhancing Docker: create an overlay network in the container cluster to connect all the nodes. Currently, overlay networks can be created through third-party network plug-ins, such as Flannel and OVS.
(1) Use Flannel to create a Kubernetes overlay network
Flannel is an overlay network tool designed and developed by the CoreOS team. It creates an overlay network in the cluster, assigns a subnet to each host, and encapsulates the communication messages between containers in a tunnel protocol to achieve cross-host communication between containers. We now use Flannel to connect two Kubernetes Nodes, as shown in Fig. 7.28.
remotely. OVS is a key SDN technology that can flexibly create virtual networks that meet various needs, including overlay networks.
Next, we use OVS to connect two Kubernetes Nodes. To ensure that container IP addresses do not conflict, the network segments of the Docker bridges on the Kubernetes Nodes must be planned.
2. Storage system
In the design and implementation of Docker, the container's data is temporary.
That is, when the container is destroyed, the data in it will be lost. If you need to
persist data, you need to use the Docker data volume to mount files or directories
on the host to the container.
In the Kubernetes system, data is likewise lost when a Pod is rebuilt, so Kubernetes also provides data volumes to persist Pod data. The Kubernetes data volume is an extension of the Docker data volume; it is at the Pod level and can be used to implement file sharing between the containers in a Pod.
The Kubernetes data volume adapts to various storage systems and provides rich, powerful functions. Kubernetes offers multiple types of data volumes, which by function are divided into three categories: local data volumes, network data volumes, and information data volumes.
(1) Local data volume
Two types of data volumes in Kubernetes act only on the local file system; we call them local data volumes. The data in a local data volume exists on only one machine, so when the Pod is migrated, the data is lost; this does not meet real data persistence requirements. However, local data volumes have other uses, such as file sharing between the containers in a Pod, or sharing the host's file system.
① EmptyDir
EmptyDir is an empty directory, newly created when the Pod is created. If a Pod is configured with an EmptyDir data volume, the volume exists for the life of the Pod. When the Pod is assigned to a Node, the EmptyDir data volume is created on that Node and mounted into the Pod's containers. As long as the Pod exists, the EmptyDir data volume exists (deleting a container does not cause the EmptyDir data volume to lose data). However, when the Pod's life cycle ends (the Pod is deleted), the EmptyDir data volume is deleted with it, forever.
An EmptyDir data volume is very suitable for file sharing between the containers in a Pod. The Pod design provides a good container combination model in which each container performs its own duty and interacts with the others through shared file directories. For example, a dedicated log collection container can be paired with the business container in each Pod to collect and aggregate its logs, as sketched below.
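The following is a minimal sketch of that pattern (image names, paths, and commands are assumptions):

    cat <<'EOF' | kubectl apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: web-with-log-collector
    spec:
      volumes:
      - name: logs
        emptyDir: {}        # created with the Pod, deleted with the Pod
      containers:
      - name: business
        image: httpd
        volumeMounts:
        - name: logs
          mountPath: /usr/local/apache2/logs   # the business container writes logs here
      - name: log-collector
        image: busybox
        command: ["sh", "-c", "tail -F /logs/access_log"]   # reads the shared log file
        volumeMounts:
        - name: logs
          mountPath: /logs
    EOF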
② HostPath
The HostPath data volume mounts the file system of the container's host into the Pod. If a Pod needs to use files on the host, it can use a HostPath data volume.
(2) Network data volume
Kubernetes provides many types of data volumes that integrate third-party storage systems, including some prevalent distributed file systems and the storage services provided on IaaS platforms. These storage systems are distributed and share file systems across the network, so we call them network data volumes.
Network data volumes can meet the persistence requirements of data. When a Pod is configured to use a network data volume, each time the Pod is created the remote file directory of the storage system is mounted into the container, and the data in the data volume is stored permanently. Even if the Pod is deleted, only the mounted data volume is removed; the data itself remains in the storage system, and when a new Pod is created, the same data volume is mounted again.
① NFS
NFS (Network File System) allows computers on a network to share resources via TCP/IP. In NFS applications, the local NFS client can transparently read and write files located on the remote NFS server, just as if they were local files.
② iSCSI
iSCSI was researched and developed by IBM. It allows the SCSI command set, designed for hardware devices, to run above the IP layer, i.e., to run the SCSI protocol over IP networks, so that it can be routed over, for example, high-speed Gigabit Ethernet. iSCSI is a storage technology that combines the existing SCSI interface with Ethernet technology, enabling servers to exchange data with storage devices over IP networks.
③ GlusterFS
GlusterFS is the core of a scale-out storage solution. It is an open source distributed file system with strong horizontal scalability; through expansion it can support PB-level storage capacity and serve thousands of clients. GlusterFS uses TCP/IP or InfiniBand RDMA networks to aggregate physically distributed storage resources and manages data under a single global namespace. GlusterFS is based on a stackable user-space design and provides excellent performance for a variety of data workloads.
④ RBD
Ceph is an open source distributed network storage system and, at the same time, a file system. Ceph's design goals are excellent performance, reliability, and scalability. Ceph is based on reliable, scalable, distributed object storage, manages metadata through a distributed cluster, and supports POSIX interfaces. RBD (RADOS Block Device) is a Linux block device driver that provides a shared network block device for interacting with Ceph. RBD stripes and replicates data across the Ceph object storage cluster to provide reliable, scalable access to block devices.
(3) Information data volume
Some data volumes in Kubernetes are mainly used to pass configuration information to containers; we call them information data volumes. For example, Secret and the Downward API both save Pod information in the form of files and mount them into the container as data volumes, and the container obtains the corresponding information by reading the files. In terms of functional design this deviates a bit from the original intent of data volumes, which is to persist data or share files; future versions may restructure this part and place the functions provided by information data volumes somewhere more appropriate.
① Secret
Kubernetes provides Secret to handle sensitive data, such as passwords, tokens, and keys. Compared with configuring sensitive data directly in a Pod definition or an image, Secret provides a more secure mechanism and prevents data leakage.
The creation of a Secret is independent of the Pod, and it is mounted into the Pod as a data volume. The Secret's data is saved in the form of files, and the container obtains the required data by reading those files.
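As a sketch (the Secret name, key, and mount path are assumptions):

    # Create the Secret, independently of any Pod
    kubectl create secret generic db-password --from-literal=password=S3cr3t
    # Mount it into a Pod as a data volume
    cat <<'EOF' | kubectl apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: secret-demo
    spec:
      volumes:
      - name: creds
        secret:
          secretName: db-password
      containers:
      - name: app
        image: busybox
        command: ["sh", "-c", "cat /etc/creds/password && sleep 3600"]
        volumeMounts:
        - name: creds
          mountPath: /etc/creds   # the value appears as the file /etc/creds/password
    EOF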
② Downward API
The Downward API can pass Pod information to the container through environment variables. It can also pass values through data volumes: the Pod information is mounted into the container as files via the data volume, and the container obtains the information by reading those files. Currently the Pod name, Pod Namespace, Pod Labels, and Pod Annotations are supported.
③ Git Repo
Kubernetes supports downloading a Git repository into a Pod. This is currently implemented through the gitRepo data volume: when a Pod configures a gitRepo data volume, the Git repository is downloaded into the Pod's data volume and then mounted into the container.
(4) Storage resource management
Understanding each storage system is a complicated matter, especially for ordinary users, who sometimes do not care about the various storage implementations and only want to store data safely and reliably. For this, Kubernetes provides the Persistent Volume and Persistent Volume Claim mechanisms, a storage consumption model. A Persistent Volume is a data volume configured and created by the system administrator; it represents a specific type of storage plug-in implementation, which can be NFS, iSCSI, etc. Ordinary users, through a Persistent Volume Claim, can request and obtain a suitable Persistent Volume without having to perceive the back-end storage implementation.
The relationship between a Persistent Volume Claim and a Persistent Volume is similar to that between a Pod and a Node: a Pod consumes the resources of a Node, and a Persistent Volume Claim consumes the resources of a Persistent Volume. Persistent Volumes and Persistent Volume Claims are related to each other and have complete life cycle management.
(1) Preparation
The system administrator plans and creates a series of Persistent Volumes.
After the Persistent Volume is successfully created, it is available.
(2) Binding
The user creates a Persistent Volume Claim to declare a storage request, including the storage size and access mode. After the Persistent Volume Claim is created, it is in the waiting state. When Kubernetes finds that a new Persistent Volume Claim has been created, it looks for a Persistent Volume according to the claim's conditions. When a matching Persistent Volume is found, the Persistent Volume Claim is bound to it, and both are then in the bound state.
Kubernetes selects only Persistent Volumes in the available state and adopts a minimum-satisfaction strategy. When no Persistent Volume meets the demand, the Persistent Volume Claim remains in the waiting state. For example, suppose two Persistent Volumes are available, one with a capacity of 50GB and one with a capacity of 60GB. A Persistent Volume Claim requesting 40GB will be bound to the 50GB Persistent Volume, while a Persistent Volume Claim requesting 100GB remains in the waiting state until a Persistent Volume of at least 100GB appears (one may be created or recycled).
(3) Use
When a Pod is created using a Persistent Volume Claim, Kubernetes queries the Persistent Volume bound to the claim, calls the real storage implementation, and then mounts it as the Pod's data volume.
(4) Release
When the user deletes a Persistent Volume Claim bound to a Persistent Volume, the Persistent Volume enters the released state. At this time, the Persistent Volume may still retain the Persistent Volume Claim's data, so it is not yet available and needs to be recycled.
(5) Recycling
The released Persistent Volume needs to be recycled before it can be used
again. The recycling strategy can be manual processing or automatic cleaning
by Kubernetes. If the cleaning fails, the Persistent Volume will be in a failed
state.
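A minimal sketch of the model: an NFS-backed Persistent Volume and a claim that binds to it (the server address, path, and sizes are assumptions):

    cat <<'EOF' | kubectl apply -f -
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv-50g
    spec:
      capacity:
        storage: 50Gi
      accessModes: ["ReadWriteOnce"]
      nfs:
        server: 192.168.1.100   # assumed NFS server
        path: /exports/data
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: claim-40g
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 40Gi         # binds to pv-50g under minimum satisfaction
    EOF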
To realize effective scheduling and allocation of resources while improving resource utilization, Kubernetes uses QoS to manage the quality of service of Pods according to different service-quality expectations. For a Pod, the quality of service is reflected in two specific indicators: CPU and memory. When memory on a node becomes scarce, Kubernetes reacts according to the QoS categories set in advance.
1. QoS Classification
QoS is mainly divided into three categories: Guaranteed, Burstable and Best-
Effort, with priority from high to low.
(1) Guaranteed
All containers in the Pod must set limits for all resources, with consistent parameters. If a container also sets requests, then requests must be set for all containers and their values must equal the limits. The QoS of such a Pod is the Guaranteed level.
Note: if a container sets only limits and not requests, the value of requests defaults to the value of limits.
Guaranteed example: both requests and limits are set, and their values are equal, as shown in Fig. 7.32.
(2) Burstable
If the requests and limits of any container in the Pod are not equal, the QoS of the Pod is the Burstable level.
Burstable example: limits are set for different resources of the containers foo and bar (memory for foo, CPU for bar), as shown in Fig. 7.33.
(3) Best-Effort
If neither requests nor limits are set for any resource, the QoS of the Pod is the Best-Effort level.
Best-Effort example: neither container foo nor container bar sets requests or limits, as shown in Fig. 7.34.
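As a sketch of the Guaranteed case (the names and sizes are assumptions; with requests equal to limits, the Pod is classified as Guaranteed):

    cat <<'EOF' | kubectl apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: qos-guaranteed
    spec:
      containers:
      - name: foo
        image: busybox
        command: ["sleep", "3600"]
        resources:
          requests:
            cpu: "500m"
            memory: 256Mi
          limits:
            cpu: "500m"      # equal to requests, so the Pod is Guaranteed
            memory: 256Mi
    EOF
    kubectl get pod qos-guaranteed -o jsonpath='{.status.qosClass}'   # prints Guaranteed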
resources. In addition, containers share the same system kernel, so when multiple containers use the same kernel, memory is used more efficiently.
Although containers and virtual machines are entirely different virtualization technologies, their resource requirements and models are similar. Containers, like virtual machines, require memory, CPU, hard disk space, and network bandwidth. The host system can treat the virtual machine or the container as a whole and allocate and manage the resources that whole needs. Of course, the virtual machine provides the security of a dedicated operating system and firmer logical boundaries, while the container is relatively loose on resource boundaries, which brings flexibility as well as uncertainty.
Kubernetes is a container cluster management platform. It needs to count the overall platform's resource usage, allocate resources to containers reasonably, and ensure that enough resources are available throughout the container life cycle. Furthermore, if resources are issued exclusively, a resource that has been distributed to one container is not distributed to another. It is very wasteful for idle containers to occupy resources (such as CPU) that they do not use, so Kubernetes must consider how to improve resource utilization under the premise of priority and fairness.
2. Resource requests and resource limits
Computing resources are required for Pod or container operation, mainly
including the following two.
• CPU: The unit is Core.
• Memory: The unit is Byte.
When creating a Pod, you can specify the resource request and resource limit of each container. The resource request is the minimum amount of resources required by the container, and the resource limit is the upper bound that the container cannot exceed. Their relationship must be:
0 ≤ request ≤ limit ≤ infinity
When scheduling a Pod, Kubernetes checks whether a Node has enough resources to satisfy the Pod's resource requests; Nodes that do not are excluded. Resource requests ensure that a Pod has enough resources to run, while resource limits prevent one Pod from using resources without restriction and causing other Pods to crash. This matters especially in public cloud scenarios, where malicious software may preempt resources to attack the platform.
Docker containers use Linux Cgroups to implement resource limits, and the
docker run command provides parameters to limit CPU and memory.
(1) --memory
The docker run command sets the memory quota available to a container through the --memory parameter, and the cgroup limits the container's memory usage; once the quota is exceeded, the container is terminated. For a Docker container in Kubernetes, the value of --memory is the value of resources.limits.memory; for example, if resources.limits.memory = 512MB, the value of --memory is 512×1024×1024.
(2) --cpu-shares
The docker run command sets the available CPU quota for a container
through the --cpu-shares parameter. It is important to note that this is a relative weight and has nothing to do with the actual processing speed. Each new container has 1024 CPU shares by default; taken alone, this value means nothing. However, if you start two containers that both want to use 100% of the CPU, the CPU time is divided evenly between them, because they have the same number of CPU shares. If we set one container's CPU shares to 512, then compared with a 1024-share container it will get 1/3 of the CPU time, but this does not mean it can use only 1/3 of the CPU: if the other container (the one with 1024 shares) is idle, the first one is allowed to use 100% of the CPU. For CPU, it is difficult to state exactly how much is allocated to which container; it depends on the actual operating conditions.
For a Docker container in Kubernetes, the value of --cpu-shares is derived from resources.requests.cpu or resources.limits.cpu multiplied by 1024. If resources.requests.cpu is specified, --cpu-shares equals resources.requests.cpu multiplied by 1024; if resources.requests.cpu is not specified but resources.limits.cpu is, --cpu-shares equals resources.limits.cpu multiplied by 1024; if neither is specified, --cpu-shares takes the minimum value.
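For illustration, the equivalent docker run flags look like this (the values are assumptions):

    # 512 MB memory quota: the container is terminated if it exceeds it.
    # 512 CPU shares: half the weight of a default (1024-share) container.
    docker run -d --memory 512m --cpu-shares 512 httpd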
LimitRange includes two types of configurations, Container and Pod. The
configurations, including constraints and default values, are shown in
Tables 7.1 and 7.2.
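Since the tables are not reproduced here, the following sketch shows the general shape of a LimitRange with Container- and Pod-type constraints and defaults (all values are assumptions):

    cat <<'EOF' | kubectl apply -f -
    apiVersion: v1
    kind: LimitRange
    metadata:
      name: example-limits
    spec:
      limits:
      - type: Container
        max:
          cpu: "2"
          memory: 1Gi
        min:
          cpu: 100m
          memory: 64Mi
        default:           # limits applied when a container sets none
          cpu: 500m
          memory: 256Mi
        defaultRequest:    # requests applied when a container sets none
          cpu: 200m
          memory: 128Mi
      - type: Pod
        max:
          cpu: "4"
          memory: 2Gi
    EOF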
Kubernetes is a multi-tenant architecture. When multiple tenants or teams share a Kubernetes system, the system administrator needs to prevent tenants from hogging resources and must define resource allocation strategies. Kubernetes provides the API object ResourceQuota to implement resource quotas. A ResourceQuota can not only act on CPU and memory but also limit the number of Pods created. The compute resource quotas and object quotas supported by ResourceQuota are shown in Tables 7.3 and 7.4.
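A sketch of a ResourceQuota for one tenant's Namespace (the names and values are assumptions):

    cat <<'EOF' | kubectl apply -f -
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: team-a-quota
      namespace: team-a         # assumed tenant Namespace
    spec:
      hard:
        requests.cpu: "10"      # total CPU requests allowed in the Namespace
        requests.memory: 20Gi
        limits.cpu: "20"
        limits.memory: 40Gi
        pods: "50"              # cap on the number of Pods
    EOF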
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-
NonCommercial-NoDerivatives 4.0 International License (http://creativecommons.org/licenses/by-
nc-nd/4.0/), which permits any noncommercial use, sharing, distribution and reproduction in any
medium or format, as long as you give appropriate credit to the original author(s) and the source,
provide a link to the Creative Commons license and indicate if you modified the licensed material.
You do not have permission under this license to share adapted material derived from this chapter or
parts of it.
The images or other third party material in this chapter are included in the chapter's Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter's Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.