Part III Automation and Orchestration
Chapter 9 Automation
Introduction
• Previous parts of the text discuss the motivation for cloud computing, the physical infrastructure, and key
virtualizations that cloud systems use.
• This part of the book introduces the topics of automation and orchestration.
• This chapter examines aspects of automation and explains why the need cloud systems have for automated
support mechanisms has given rise to so many automation systems.
Groups That Use Automation
• Automation focuses on making tasks easier and faster for humans to perform. Cloud automation
mechanisms have been developed for three categories of users:
o Individual customers
o Large cloud customers
o Cloud providers
• Individual customers.
o Individual subscribers often use SaaS apps, such as a document editing system that allows a set of users
to share and edit documents cooperatively.
o To make such services convenient, providers typically offer access through a web browser or a dedicated
app that the user downloads.
o The provider may create a virtual machine or container, allocate storage, configure network access, and
launch web server software. When an individual uses a point-and-click interface to access a service,
the interface must be backed by underlying automated systems that handle many chores on behalf of the
individual.
• Large cloud customers.
o Unlike a typical individual, an enterprise company or other organization that moves to a public cloud
needs tools to control and manage computing.
o Two types of automated tools are available for large cloud customers. One type, available from the
provider or a third party, allows a customer to download and run the tools to deploy and manage apps.
The other type consists of tools offered by a provider that allow large customers to configure, deploy,
and manage apps and services without downloading software.
o The next chapters explain examples, including Kubernetes, which automates deployment and operation
of a service built with containers, and Hadoop, which automates MapReduce computations.
• Cloud providers.
o Cloud providers have devised some of the most sophisticated and advanced automation tools and use
them to manage cloud data centers.
o In addition to building tools to configure, monitor, and manage the underlying cloud infrastructure, a
provider also creates tools that handle requests from cloud customers automatically.
The Need For Automation In A Data Center
• Operating a data center is much more complex than operating IT facilities for a single organization. Four aspects
of a cloud data center stand out.
• Extreme scale - A cloud provider must accommodate thousands of tenants, including individuals and major
enterprise customers.
• Diverse services - Because a cloud data center provider allows each customer to choose the software and
services to run, the provider must be able to run software for thousands of services at the same time.
• Constant change - A data center provider must handle dynamically changing requirements, and the required
response time depends on the change. Some changes, such as a tenant request for a new VM or the deployment of
a container, must be handled right away.
• Human error - Many problems in a data center can be traced to human error.
An Example Deployment
• The list below gives example steps a provider takes when deploying a single VM.
o Step 1 - Choose a server on which to run the VM
o Step 2 - Configure the hypervisor on the server to run the VM
o Step 3 - Assign an IP address to the VM
o Step 4 - Configure the network to forward packets for the VM, which may involve configuring the
tenant’s virtual network (e.g., VXLAN)
o Step 5 - Choose a remote disk server and allocate storage for the VM, according to the tenant’s
specification
o Step 6 - Configure the hypervisor to send requests from the VM to the storage server
• Because even a trivial procedure, such as deploying a VM, requires multiple configuration steps, manual
operation does not suffice — operating an entire data center efficiently requires automation.
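The six steps above can be sketched as a single automated procedure. The following Python sketch is only an illustration under stated assumptions: the Server class, the deploy_vm function, the "most free CPUs" and "most free space" selection policies, and all sample values are hypothetical, and each configuration step is modeled as simple bookkeeping rather than a call to a real hypervisor, network, or storage system.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    free_cpus: int
    vms: list = field(default_factory=list)


def deploy_vm(vm_name, cpus, disk_gb, servers, subnet, used_ips, storage_free_gb):
    """Perform the six steps in order; return a record of what was configured."""
    # Step 1: choose a server (hypothetical policy: the one with the most free CPUs).
    candidates = [s for s in servers if s.free_cpus >= cpus]
    if not candidates:
        raise RuntimeError("no server can host the VM")
    server = max(candidates, key=lambda s: s.free_cpus)

    # Step 2: configure the hypervisor (modeled as a bookkeeping update).
    server.free_cpus -= cpus
    server.vms.append(vm_name)

    # Step 3: assign the first unused address on the tenant's subnet.
    ip = next((h for h in subnet.hosts() if h not in used_ips), None)
    if ip is None:
        raise RuntimeError("subnet exhausted")
    used_ips.add(ip)

    # Step 4: configure forwarding (modeled as a route entry: address -> server).
    route = (str(ip), server.name)

    # Step 5: choose the storage server with the most free space and allocate the disk.
    disk_server = max(storage_free_gb, key=storage_free_gb.get)
    if storage_free_gb[disk_server] < disk_gb:
        raise RuntimeError("no storage server has enough space")
    storage_free_gb[disk_server] -= disk_gb

    # Step 6: record which storage server the hypervisor sends the VM's requests to.
    return {"vm": vm_name, "server": server.name, "ip": str(ip),
            "route": route, "storage": disk_server}


servers = [Server("s1", free_cpus=4), Server("s2", free_cpus=16)]
storage = {"disk1": 500, "disk2": 100}
record = deploy_vm("vm-a", cpus=2, disk_gb=50, servers=servers,
                   subnet=ipaddress.ip_network("10.1.0.0/29"),
                   used_ips=set(), storage_free_gb=storage)
print(record)
```

A production system would also need to undo earlier steps when a later step fails; the sketch omits rollback to keep the sequence of steps visible.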
What Can Be Automated?
• Almost all operational tasks in a data center can be automated. A few examples of items that can be
automated follow:
o Creation and deployment of new virtual resources
The creation of new virtual machines and containers; new virtual storage facilities, including virtual
disk images (SAN) and initial contents of virtual file systems (NAS); new virtual networks, including
VLANs, IP subnets, IP forwarding, IP multicast, and extended VLANs (VXLAN).
o Workload monitoring and accounting
Measurement of the load on servers, storage facilities, and networks; tracking each tenant’s resource use
and computing charges; identification of hot spots; long-term trends, including capacity assessment and
predictions of when additional physical facilities will be needed.
o Optimizations
Optimizations for both initial deployments and subsequent changes; the initial placement of VMs and
containers to handle balancing the load across physical servers; minimization of the latency between
applications and storage, and minimization of network traffic; VM migration, including migration to
increase performance or to minimize power consumption.
o Safety and recovery
Scheduled backups of each tenant’s data; server monitoring; monitoring of network equipment and fast
re-routing around failed switches or links; monitoring of storage equipment, including detecting failures of
redundant power supplies and redundant disks (RAID systems); automated restart of VMs and containers;
auditing and compliance enforcement.
o Software update and upgrade
Keeping apps and operating system images updated to the latest versions; upgrading to new releases and
versions of software as specified by a tenant; providing facilities a tenant can use to update their private
software and deploy new versions; aid in achieving continuous deployment of a tenant’s apps.
o Administration of security policies
Deploying network security across the data center in accordance with the provider’s policies, including
firewall facilities; protecting each tenant’s data and computation; facilities for the management of secrets
and encryption keys.
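As one illustration of the workload monitoring and accounting item above, the following Python sketch tracks each tenant's resource use, computes charges, and identifies hot spots. It is a minimal sketch under stated assumptions: the sample measurements, the price per CPU-hour, and the hot-spot threshold are all made up for illustration.

```python
from collections import defaultdict

# (tenant, server, cpu_hours) measurements; all values are invented.
samples = [
    ("tenant-a", "s1", 10.0),
    ("tenant-b", "s1", 30.0),
    ("tenant-a", "s2", 5.0),
]
RATE_PER_CPU_HOUR = 0.05    # hypothetical price
HOT_SPOT_THRESHOLD = 35.0   # hypothetical limit, in CPU-hours per server


def summarize(measurements):
    """Return (charges per tenant, sorted list of servers that are hot spots)."""
    usage_by_tenant = defaultdict(float)
    load_by_server = defaultdict(float)
    for tenant, server, cpu_hours in measurements:
        usage_by_tenant[tenant] += cpu_hours
        load_by_server[server] += cpu_hours
    charges = {t: round(u * RATE_PER_CPU_HOUR, 2) for t, u in usage_by_tenant.items()}
    hot = sorted(s for s, load in load_by_server.items() if load > HOT_SPOT_THRESHOLD)
    return charges, hot


charges, hot_spots = summarize(samples)
print(charges, hot_spots)
```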
Levels Of Automation
• The basic five-level model below can help explain the extent to which automation can be applied.
o Level 0 - No automation (manual operation)
o Level 1 - Automated preparation and configuration - automation of offline tasks that are performed
before installation occurs.
o Level 2 - Automated monitoring and measurement - monitoring both physical and virtual resources of a
data center and making measurements available to human operators. Monitoring often focuses on
performance, and includes the load on each server, the traffic on each link in the network, and the
performance of storage systems.
o Level 3 - Automated analysis of trends and prediction - enhances level 2 monitoring with analytic
capabilities: measurements are collected over a long period, and software analyzes changes and
trends. From a data center owner’s point of view, the key advantage of level 3 analysis lies in the ability
to predict needs, allowing the data center owner to plan ahead rather than waiting for a crisis to occur.
o Level 4 - Automated identification of root causes - uses data gathered from monitoring along with
knowledge of both the data center infrastructure and layers of virtualization that have been added to
deduce the cause of problems.
o Level 5 - Automated remediation of problems - extends the ideas of a level 4 system by adding
automated problem solving.
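The kind of prediction a level 3 system makes can be illustrated with a small calculation. The following Python sketch fits a straight line to storage measurements by ordinary least squares and estimates how many days remain before capacity is reached. It is a sketch under stated assumptions: the function names and sample data are hypothetical, and a real system would likely use richer models than a linear trend.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(days, used_tb, capacity_tb):
    """Estimate days remaining after the last measurement, or None if usage is not growing."""
    slope, intercept = fit_line(days, used_tb)
    if slope <= 0:
        return None
    full_day = (capacity_tb - intercept) / slope
    return max(0.0, full_day - days[-1])


# Invented measurements: storage in use (TB) on five consecutive days.
days = [0, 1, 2, 3, 4]
used_tb = [40, 42, 44, 46, 48]
remaining = days_until_full(days, used_tb, capacity_tb=100)
print(remaining)
```

With these invented measurements, usage grows by 2 TB per day from 40 TB, so the fitted line reaches 100 TB on day 30, which is 26 days after the last measurement.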
AIops: Using Machine Learning And Artificial Intelligence
• Higher levels of automation require sophisticated software systems. For example, Levels 3 and above may use
machine learning (ML) software. The top levels may use additional forms of Artificial Intelligence (AI).
• Industry uses the term AIops (Artificial Intelligence operations) to describe an automation system that uses AI
and can operate a data center without human intervention.
A Plethora Of Automation Tools
• So many automation tools exist because operating a data center is an extremely complex task, and
it is easiest to automate each small part of the task independently.
• Management complexity
o Data center operations encompass a broad set of facilities and services, both physical and virtual, and
must manage a wide variety of computation, networking, and storage mechanisms in the presence of
continuous change. An operator may have multiple goals, some of which may conflict with each other.
Examples include:
▪ Avoiding hot spots.
▪ Minimizing network traffic.
▪ Keeping each VM near the storage server the VM will use.
▪ Placing active VMs on a subset of servers, making it possible to reduce power costs by powering
down some servers.
• Automating each small task independently
o Although it may be impossible to build an automation that optimizes all goals, it is possible to design
small tools that each help automate one small task.
o Such tools can be especially useful if they relieve humans from tasks that involve tedious details.
Human error is a source of many problems, and a tool is less prone to making errors.
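The conflict among goals can be made concrete with a small placement example. The following Python sketch places the same VMs under two hypothetical greedy policies: "spread" (avoid hot spots by choosing the least-loaded server) and "pack" (concentrate VMs so idle servers can be powered down). The function, policy names, and loads are assumptions made for illustration.

```python
def place(vm_loads, capacities, policy):
    """Greedy placement; returns the resulting load on each server."""
    load = {server: 0 for server in capacities}
    for vm, need in vm_loads.items():
        fits = [s for s in capacities if load[s] + need <= capacities[s]]
        if not fits:
            raise RuntimeError(f"no server can host {vm}")
        if policy == "spread":
            target = min(fits, key=lambda s: load[s])   # least-loaded server
        elif policy == "pack":
            target = max(fits, key=lambda s: load[s])   # most-loaded server that still fits
        else:
            raise ValueError(f"unknown policy: {policy}")
        load[target] += need
    return load


capacities = {"s1": 10, "s2": 10, "s3": 10}
vm_loads = {"a": 3, "b": 3, "c": 3}

spread = place(vm_loads, capacities, "spread")
pack = place(vm_loads, capacities, "pack")
print(spread)
print(pack)
```

Spreading leaves every server lightly loaded but keeps all three powered on; packing allows two servers to be powered down but creates the heaviest single load. Neither placement satisfies both goals, which is why a single tool that optimizes everything is hard to build.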
Automation Of Manual Data Center Practices
• Many tools are designed to automate tasks that humans had been performing manually. To understand the
tools, one must know how humans operated data centers. Below is an example workflow for manual
configuration.
o Step 1 - A tenant signs a contract for a new VM and a new work order (i.e., a “ticket”) is created
o Step 2 - A human from the group that handles VMs configures a new VM and passes the
ticket on
o Step 3 - A human from the group that handles networking configures the network and
passes the ticket on
o Step 4 - A human from the group that handles storage configures a SAN server and passes
the ticket on
o Step 5 - The tenant is notified that the VM is ready
• Because automation tools evolved to help human operators who each have limited expertise, each tool tends to
focus on one aspect of data center operations.
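The manual workflow above can be modeled as a ticket passed through a sequence of single-purpose handlers, which mirrors how separate tools came to automate separate steps. The following Python sketch is illustrative only: the Ticket class, the handler names, and the history strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    tenant: str
    request: str
    history: list = field(default_factory=list)


# Each function stands in for one group (or one tool) that handles one aspect.
def vm_group(ticket):
    ticket.history.append("VM configured")


def network_group(ticket):
    ticket.history.append("network configured")


def storage_group(ticket):
    ticket.history.append("SAN server configured")


WORKFLOW = [vm_group, network_group, storage_group]


def process(ticket):
    """Pass the ticket through each group in order, then notify the tenant."""
    for handle in WORKFLOW:
        handle(ticket)
    ticket.history.append("tenant notified")
    return ticket


done = process(Ticket("tenant-a", "new VM"))
print(done.history)
```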
Zero Touch Provisioning And Infrastructure As Code
• One particular use of automation has emerged as necessary at large scale: automated configuration.
• A cloud provider must configure servers, networks, storage, databases, and software systems continuously.
• Each vendor creates its own specialized configuration language, and a data center contains hardware and
software from many vendors. An automation tool can therefore allow humans to specify items in a
vendor-independent language and then pass the appropriate commands to the hardware or software system being
configured. The operator does not need to interact directly with the system being configured.
• Zero Touch Provisioning (ZTP), also called Infrastructure as Code (IaC), refers to a process where a data center
operator creates a specification and uses software to read the specification and configure underlying systems.
Two approaches have been used: push and pull.
o The push version follows the traditional pattern of installing a configuration: a tool reads a specification
and performs the commands needed to configure the underlying system.
o The pull version requires an entity to initiate configuration. For example, when a new software system
starts, the system can be designed to pull configuration information from a server.
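The ideas above (a vendor-independent specification, translation into vendor-specific commands, and the push and pull approaches) can be sketched together. The following Python sketch is an illustration under stated assumptions: the specification fields, the two "vendors", the Device and ConfigServer classes, and the command syntax are all invented and do not correspond to any real product or tool.

```python
# Vendor-independent specification; the field names are hypothetical.
SPEC = {"hostname": "edge-1", "vlan": 42}


# Hypothetical translators. The command syntax is invented for illustration
# and does not correspond to any real vendor's configuration language.
def to_vendor_a(spec):
    return [f"set hostname {spec['hostname']}", f"set vlan {spec['vlan']}"]


def to_vendor_b(spec):
    return [f"hostname={spec['hostname']}", f"vlan.id={spec['vlan']}"]


TRANSLATORS = {"vendor_a": to_vendor_a, "vendor_b": to_vendor_b}


class Device:
    def __init__(self, name, vendor):
        self.name = name
        self.vendor = vendor
        self.applied = []            # commands the device has received

    def apply(self, commands):
        self.applied.extend(commands)


def push(spec, devices):
    """Push: the tool initiates, translating the spec and sending commands to each device."""
    for device in devices:
        device.apply(TRANSLATORS[device.vendor](spec))


class ConfigServer:
    """Holds the specification and answers requests from devices."""
    def __init__(self, spec):
        self.spec = spec

    def fetch(self, vendor):
        return TRANSLATORS[vendor](self.spec)


def on_startup(device, server):
    """Pull: the device initiates, fetching its configuration when it starts."""
    device.apply(server.fetch(device.vendor))


switch = Device("switch-1", "vendor_a")
router = Device("router-1", "vendor_b")
push(SPEC, [switch])                       # push version
on_startup(router, ConfigServer(SPEC))     # pull version
print(switch.applied)
print(router.applied)
```

In both versions the operator edits only SPEC; the difference lies in which side initiates the transfer.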
Declarative, Imperative, And Intent-Based Specifications
The specifications used with automated tools can take many forms and two aspects have become important:
• Declarative vs. imperative
• Intent-based vs. detailed
• Declarative vs. imperative
o An imperative specification states an action to be performed. For example, an imperative
specification to assign an IP address to a VM might have the form:
Assign IP address 192.168.1.17 to the VM’s Ethernet interface
o Imperative specifications follow the paradigm of early binding by specifying operations for the
underlying system and values to be used. The result can be misleading and ambiguous.
o A declarative specification focuses on the result rather than the steps used to achieve it. For example,
a declarative specification for IP address assignment might have the form:
Main IP address: 192.168.1.17
• Intent-based
o The term intent-based characterizes a specification that allows a human to state the desired outcome without
giving details about how to achieve the desired outcome or specific values to be used. For example,
an intent-based specification for IP address assignment might state:
Each VM receives a unique IP address on the tenant’s IP subnet
o Intent-based specifications offer generality and flexibility. Because they do not dictate steps to be taken,
intent-based specifications allow many possible implementations to be used.
o Using a declarative, intent-based configuration specification can help eliminate ambiguity and
increase both generality and flexibility. An intent-based specification gives tools freedom to
choose an implementation that produces the desired outcome.
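The three styles of specification can be contrasted using the IP address example above. The following Python sketch uses the standard ipaddress module; the VM class and the functions assign_ip, reconcile, and satisfy_intent are hypothetical names chosen for illustration, and the allocation strategy in satisfy_intent is only one of many implementations the intent would permit.

```python
import ipaddress


class VM:
    def __init__(self, name):
        self.name = name
        self.ip = None


# Imperative: the caller states the action and the value.
def assign_ip(vm, address):
    vm.ip = ipaddress.ip_address(address)


# Declarative: the caller states the desired result; the tool acts only if needed.
def reconcile(vm, desired):
    """Return True if a change was made, False if the VM already matched."""
    want = ipaddress.ip_address(desired["main_ip"])
    if vm.ip == want:
        return False
    vm.ip = want
    return True


# Intent-based: "each VM receives a unique IP address on the tenant's IP subnet".
# The tool, not the human, chooses the specific values.
def satisfy_intent(vms, subnet):
    taken = set()
    needs_address = []
    for vm in vms:
        if vm.ip is not None and vm.ip in subnet and vm.ip not in taken:
            taken.add(vm.ip)               # already satisfies the intent
        else:
            needs_address.append(vm)
    free = (h for h in subnet.hosts() if h not in taken)
    for vm in needs_address:
        vm.ip = next(free)                 # raises StopIteration if the subnet is exhausted
        taken.add(vm.ip)


web = VM("web")
assign_ip(web, "192.168.1.17")                           # imperative
changed = reconcile(web, {"main_ip": "192.168.1.17"})    # declarative: nothing to do

vms = [VM("a"), VM("b"), web]
satisfy_intent(vms, ipaddress.ip_network("192.168.1.0/24"))
print(changed, [(vm.name, str(vm.ip)) for vm in vms])
```

The intent-based call leaves the address of web unchanged because it already satisfies the intent, and it selects addresses for the other two VMs without the human naming any values.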
The Evolution Of Automation Tools
• Kubernetes, described in the next chapter, provides a large set of tools that can be used to manage
containerized software deployments. Various additional tools and technologies have been created to control
network communication among containers.
• By default, Kubernetes assigns a unique IP address to each group of containers (called a pod). Doing so means
network forwarding can be arranged to allow containers in a group to communicate and run microservices, even
if the containers run on multiple servers.
• Docker software takes the approach of using a virtual layer 2 bridge to allow containers to communicate.
• Other tools are available that can configure an overlay network for containers such that each host has a separate
IP subnet and each container on a host has a unique address.
• Other tools are available to provide secure network communication for containers and microservices.
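The overlay arrangement mentioned above, in which each host has a separate IP subnet and each container on a host has a unique address, can be sketched with the standard ipaddress module. The sketch below is illustrative only: the plan_overlay function, the overlay prefix, and the host and container names are assumptions, and the sketch computes an address plan without configuring any real network.

```python
import ipaddress
from itertools import islice


def plan_overlay(overlay_cidr, containers_by_host, host_prefix=24):
    """Give each host its own subnet and each container a unique address in it."""
    overlay = ipaddress.ip_network(overlay_cidr)
    subnets = list(overlay.subnets(new_prefix=host_prefix))
    if len(containers_by_host) > len(subnets):
        raise ValueError("overlay is too small for the number of hosts")
    plan = {}
    for (host, containers), subnet in zip(containers_by_host.items(), subnets):
        addresses = list(islice(subnet.hosts(), len(containers)))
        if len(addresses) < len(containers):
            raise ValueError(f"subnet {subnet} is too small for host {host}")
        plan[host] = {
            "subnet": str(subnet),
            "containers": {c: str(a) for c, a in zip(containers, addresses)},
        }
    return plan


plan = plan_overlay("10.244.0.0/16", {"node1": ["web", "cache"], "node2": ["db"]})
print(plan)
```

Because the per-host subnets do not overlap, a container's address identifies its host, which keeps forwarding between hosts simple.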
Summary
• The diversity of services, large scale, and constant change in a cloud data center mandate the use of automated
tools for the configuration and operation of hardware and software.
• Almost all operational tasks can be automated.
• Several models have been proposed to help identify levels of automation.
• Operating a data center is complex, and a provider may have multiple goals, some of which conflict.
• The first step toward automated configuration allows a human to specify configuration in a vendor-independent
form, and then uses a tool to read the specification and configure the underlying hardware and software systems
accordingly.
• An imperative specification follows a paradigm of early binding by specifying the operations and values to be
used for configuration. A declarative specification can help avoid ambiguities.
• An intent-based specification, which allows a human to specify the desired outcome without specifying how to
achieve the outcome, increases flexibility and generality.