ECA 5 - 15 Course Guide VB
COPYRIGHT
Copyright 2020 Nutanix, Inc.
Nutanix, Inc.
1740 Technology Drive, Suite 150
San Jose, CA 95110
All rights reserved. This product is protected by U.S. and international copyright and intellectual
property laws. Nutanix and the Nutanix logo are registered trademarks of Nutanix, Inc. in the
United States and/or other jurisdictions. All other brand and product names mentioned herein
are for identification purposes only and may be trademarks of their respective holders.
License
The provision of this software to you does not grant any licenses or other rights under any
Microsoft patents with respect to anything other than the file server implementation portion of
the binaries for this software, including no licenses or any other rights in any hardware or any
devices or software that are used to communicate with or in connection with this software.
Conventions
• variable_value – The action depends on a value that is unique to your environment.
• ncli> command – The commands are executed in the Nutanix nCLI.
Version B
Last modified: July 17, 2020
Contents
Copyright...................................................................................................................2
License.................................................................................................................................................................. 2
Conventions........................................................................................................................................................ 2
Version.................................................................................................................................................................. 2
Module 1: Introduction......................................................................................... 9
Making Computing Invisible: A History of Cloud Computing........................................................9
What is Server Virtualization?................................................................................................................... 11
Traditional Three-Tier Architecture........................................................................................................ 12
Nodes, Blocks, and Clusters.......................................................................................................................12
Acropolis............................................................................................................................................................ 13
Prism Overview............................................................................................................................................... 13
Cloud Computing (Public, Private, and Hybrid Clouds)................................................................. 15
Why Hybrid Cloud?....................................................................................................................................... 17
Reducing Friction in the Hybrid Cloud..................................................................................... 17
State in Data and Applications: Heirlooms vs Handkerchiefs.......................................... 19
Networks: Pipes of Friction.......................................................................................................... 20
Reducing Friction Through Automation................................................................................. 20
Nutanix Zero Trust Architecture..................................................................................................21
Governance and Compliance....................................................................................................... 23
Capital Expenditures (Capex) vs. Operating Expenditures (Opex)..............................24
Predictable vs. Unpredictable Workloads...............................................................................25
One Platform. Any App. Any Location.................................................................................... 27
Other Resources............................................................................................................................................ 28
Additional Lab Source: Test Drive.............................................................................................28
Nutanix University.............................................................................................................................29
Nutanix Certification.................................................................................................................................... 29
Labs..................................................................................................................................................................... 29
allssh and hostssh.............................................................................................................................39
aCLI Example...................................................................................................................................... 39
PowerShell Cmdlets..................................................................................................................................... 39
PowerShell Cmdlets Examples.................................................................................................... 39
PowerShell Cmdlets (Partial List).............................................................................................. 40
REST API.......................................................................................................................................................... 40
Labs.................................................................................................................................................................... 40
Module 4: Networking.......................................................................................54
Overview........................................................................................................................................................... 54
Default Network Configuration............................................................................................................... 54
Default Network Configuration (cont.)................................................................................................ 55
Open vSwitch (OVS)....................................................................................................................................55
Bridges............................................................................................................................................................... 56
Ports.................................................................................................................................................................... 56
Bonds..................................................................................................................................................................56
Bond Modes.........................................................................................................................................57
Virtual Local Area Networks (VLANs)................................................................................................. 60
IP Address Management (IPAM)..............................................................................................................61
Network Segmentation............................................................................................................................... 62
Configuring Network Segmentation for an Existing RDMA Cluster............................. 63
Network Segmentation During Cluster Expansion..............................................................63
Network Segmentation During an AOS Upgrade................................................................63
Reconfiguring the Backplane Network.................................................................................... 63
Disabling Network Segmentation.............................................................................................. 64
Unsupported Network Segmentation Configurations....................................................... 64
AHV Host Networking.................................................................................................................................64
Recommended Network Configuration................................................................................... 64
AHV Networking Terminology Comparison...........................................................................67
Labs..................................................................................................................................................................... 67
Data Storage Representation...................................................................................................................98
Storage Components.......................................................................................................................98
Understanding Snapshots and Clones................................................................................................. 99
Clones.....................................................................................................................................................99
Shadow Clones.................................................................................................................................. 99
Snapshotting Disks.........................................................................................................................100
Capacity Optimization - Deduplication............................................................................................... 101
Deduplication Process................................................................................................................... 102
Deduplication Techniques............................................................................................................103
Capacity Optimization - Compression................................................................................................103
Compression Process.................................................................................................................... 104
Compression Technique Comparison..................................................................................... 104
Workloads and Dedup/Compression......................................................................................105
Deduplication and Compression Best Practices.................................................................105
Replication Factor.......................................................................................................................................106
Erasure Coding Basics...............................................................................................................................106
EC-X Compared to Traditional RAID...................................................................................... 107
EC-X Process.....................................................................................................................................108
Erasure Coding in Operation......................................................................................................109
Replication Factor 3 with Erasure Coding: 6-Node........................................................... 110
Replication Factor 2 with Erasure Coding: 4-Node.............................................................111
Labs......................................................................................................................................................................111
CVM Unavailability...........................................................................................................................128
Node Unavailability.........................................................................................................................130
Drive Unavailability......................................................................................................................... 130
Boot Drive (DOM) Unavailability................................................................................................131
Network Link Unavailability.........................................................................................................133
Redundancy Factor (Fault Tolerance)................................................................................................ 133
Block Fault Tolerant Data Placement................................................................................................. 135
Rack Fault Tolerance................................................................................................................................. 136
VM High Availability in Acropolis......................................................................................................... 136
High Availability............................................................................................................................... 138
Affinity and Anti-Affinity Rules for AHV............................................................................................138
Limitations of Affinity Rules........................................................................................................139
Labs....................................................................................................................................................................139
Operations Deep Dive.................................................................................................................... 171
Applications Deep Dive.................................................................................................................172
Labs....................................................................................................................................................................172
Module 1: Introduction
Companies also demand flexibility, choice, agility, and cost efficiency from these enabling
technologies to ensure that business capabilities can change with changing demand, market,
and business mission.
In the past decade, the rise of cloud services fulfilled the demand for more IT and business
agility, bringing new applications and services online almost overnight. Yet this ability created
secondary issues of system and data sprawl, governance and compliance stratification, and it
ultimately cost businesses more than traditional datacenter models since it lacked mature cost
controls.
So, for these reasons and more, businesses realized that certain workloads and data sets were
more suitable for their datacenters, while others required the web-scale architecture that
reached a global audience, without friction or impedance.
This is how the hybrid cloud model was born. It blends control, flexibility, security, scalability,
and cost effectiveness, serving the needs of both business and customers alike.
But to really understand the business drivers that led to the hybrid cloud, we need to briefly
discuss where all of this began. And everything began with the mainframe.
Mainframe Computing
Starting with mainframe computing, users had the ability to create massive, monolithic code
structures that lived on largely siloed and custom-built equipment. The processing power and
centralized design of this system made it very costly and inflexible – maintaining each system
required specialized training and careful coordination to ensure minimal business disruptions.
In the event of a failure, a secondary mainframe was required, and restoring from backup tapes could take days to complete. Applications had to be custom written for each platform, which was both time consuming and expensive.
Unix Servers
The creation of Unix operating systems saw a standardization of hardware and software into more focused and manageable systems. This homogeneity enabled computer operators to standardize their skill sets and maintain systems similarly across any business, or even across multiple enterprises. Unix system hardware was still specialized by vendor, though (IA64, SPARC, and others), as were the different Unix operating systems. Applications were developed, but still could not be ported between disparate vendors, creating lock-in and requiring customized skill sets for computer operators.
Intel x86
Enter the Intel x86 platform: a commoditized set of hardware that could be delivered rapidly and cheaply, and that standardized the way hardware systems are created today. With the underlying hardware architecture streamlined, operating systems on x86 systems could be managed more easily than their counterparts. Parts and whole systems became interchangeable, OS images could be ported easily between systems, and applications could be rapidly developed and migrated to new systems.
The advancement of Intel x86 systems also saw hardware innovation outpace software demand: multicore systems with abundant memory and storage would sit idle, or underutilized at times, due to the static nature of the system size. x86 servers suffered from their own success and required further innovation at the software level to unlock the next advance: virtualization.
Virtualized x86
Intel x86 software virtualization abstracts an operating system from its underlying hardware,
allowing any x86 operating system to run simultaneously with other x86 operating systems on
the same bare metal server. This allowed for even more flexibility, cost savings, and efficiency,
as well as portability – now, applications shipped preinstalled within virtual machine “files” or
images. These virtualized systems maximized the density of operating systems to hardware,
cutting costs in the datacenter as well as enabling newer programmatic ways to rapidly deploy
new workloads.
Virtualized systems still required the overhead of maintenance and a specialized skill set to
operate and maintain, and quite often businesses suffered from the operational complexity
of maintaining hundreds or even thousands of virtual machines at scale. Upgrades, updates,
and system maintenance still required careful coordination and planning, and often disrupted
business operations. This model was positively changed again when containers were
introduced.
Containers
Containers are prepackaged images of software, using fractions of the compute and storage
capacity of virtual machines, that can be instantaneously deployed upon any container runtime
via automation and orchestration. These tiny compute units allowed developers to rapidly
test and deploy code in a matter of minutes instead of days – checking in software changes
to a repository, and enabling an automated software build and test cycle that could be simply
monitored and not managed heavily. Containers also enabled applications to be subdivided
into smaller “micro-services”, where an entire application need not reside in the same instance
or operating system, but only a fraction that could service a known business demand. This
capability combined with the “pay for what you use” operational model of the cloud allowed
businesses to truly only pay for what services they needed, when they needed them.
Serverless
With serverless computing, developers deploy code or functions directly, and the cloud provider provisions, scales, and manages all of the underlying infrastructure on demand. At every step of the way, computing power was enhanced, made more efficient, and drew applications closer to their desired operating environment: the right performance, fulfilling the right customer demand, at exactly the right cost and scale.
However, serverless computing isn't the only option for performance, cost, and scale. There is, as we'll see later in this module, an inherent risk in surrendering complete hardware control to a third party and simply consuming resources as a service. An organization losing control of its data is hardly an ideal scenario, so there's a clear need for an on-prem solution that provides the same benefits that serverless computing offers, but without the security and governance risks.
Hardware virtualization involves virtual machines (VMs), which take the place of a “real”
computer with a “real” operating system.
One of the main reasons businesses use virtualization technology is server virtualization, which uses a hypervisor to “duplicate” the hardware underneath. In a non-virtualized environment, the operating system (OS) works directly with the underlying hardware. When virtualized, the OS still runs as if it were on that hardware, letting companies enjoy much of the same performance they expect, without dedicating hardware to each workload. Though physical and virtualized performance aren't always equal, virtualization works well and is often preferable, because most guest operating systems don't need complete access to the hardware. As a result, businesses can enjoy better flexibility and control and eliminate any dependency on a single piece of hardware.
Because of its success with server virtualization, virtualization has spread to other areas of the
datacenter, including applications, networks, data, and desktops.
Put simply, a virtualization solution streamlines your enterprise datacenter. It abstracts away the complexity of deploying and administering a virtualized solution, while providing the flexibility needed in the modern datacenter.
Virtualization Terminology
1. Host machine: the physical server whose hardware resources (CPU, memory, storage, and networking) the hypervisor manages.
2. Hypervisor: creates a virtual version of a once-physical system and manages multiple guest VMs simultaneously. Apps and OSs are abstracted away from the hardware, and each VM is presented with virtualized hardware.
3. Guest (virtual) machine: the virtual machine (VM) itself. VMs run their own OS; interaction with the physical hardware is done through para-virtualized drivers.
Legacy infrastructure—with separate storage, storage networks, and servers—is not well
suited to meet the growing demands of enterprise applications or the fast pace of modern
business. The silos created by traditional infrastructure have become a barrier to change and
progress, adding complexity to every step from ordering to deployment to management. New
business initiatives require buy-in from multiple teams, and your organization needs to predict its IT infrastructure needs 3 to 5 years in advance. As most IT teams know, this is almost impossible to
get right. In addition, vendor lock-in and increasing licensing costs are stretching budgets to the
breaking point.
A node is an x86 server with compute and storage resources. A single Nutanix cluster can have
an unlimited number of nodes. Different hardware platforms are available to address varying
workload needs for compute and storage.
A block is a chassis that holds one to four nodes, and contains power, cooling, and the
backplane for the nodes. The number of nodes and drives depends on the hardware chosen for
the solution.
Acropolis
Acropolis is the foundation for a platform that starts with hyperconverged infrastructure
then adds built-in virtualization, storage services, virtual networking, and cross-hypervisor
application mobility.
For the complete list of features, see the Software Options page on the Nutanix website.
• AHV
• Distributed Storage Fabric (DSF)
• App Mobility Fabric
AHV is the hypervisor, while DSF and the App Mobility Fabric are functional layers in the Controller VM (CVM).
Note: Acropolis also refers to the base software running on each node in the
cluster.
Prism Overview
Prism is the management plane: a unified management interface that generates actionable insights for optimizing virtualization, and that provides infrastructure management and everyday operations.
Prism gives Nutanix administrators an easy way to manage and operate their end-to-end virtual
environments. Prism includes two software components: Prism Element and Prism Central.
Prism Element
Prism Element provides a graphical user interface to manage most activities in a Nutanix
cluster.
Some of the major tasks you can perform using Prism Element include:
Prism Central
Prism Central provides multicluster management through a single web console and runs as a separate VM.
Note: We will cover both Prism Element and Prism Central in separate lessons
within this course.
Prism Interface
Prism is an end-to-end management solution for any virtualized datacenter, with additional
functionality for AHV clusters, and streamlines common hypervisor and VM tasks.
The information in Prism focuses on common operational tasks grouped into four areas:
• Infrastructure management
• Operational insight
• Capacity planning
• Performance monitoring
Prism provides one-click infrastructure management for virtual environments and is hypervisor
agnostic. With AHV installed, Prism and aCLI (Acropolis Command Line Interface) provide more
VM and networking options and functionality.
The arrival of cloud computing to enterprise IT brought much more than new business value
and end-user utility. It also added a great deal of confusion. An entirely new set of terms
was created to describe the many varieties of virtual data storage and transmission. First,
you learned about private clouds, or cloud environments that were created to only support
workloads from a specific organization. Private cloud infrastructure like this is usually created
utilizing resources within a company’s own on-prem datacenter. Then as time progressed, you
learned about clouds that are publicly accessed and consumed (public clouds). This means
that all hardware-based networking, storage, and compute resources are owned and managed
by a third-party provider like Amazon Web Services (AWS), Microsoft Azure, or Google Cloud
Platform (GCP). Though workloads are partitioned for data security, these resources are shared
by the customers of a particular public cloud provider.
With two types of clouds to account for now, you would naturally need terminology to describe
the transmission of applications and data between public and private clouds. This architecture
is known as a hybrid cloud. An encrypted highway of sorts, hybrid cloud allows operators to
perform a single task leveraging two separate cloud resources. Most hybrid cloud environments combine the resources of two separate clouds. These could be two private clouds, two public clouds, or a mix of both. If you were to visualize a Venn diagram, with an on-prem private cloud assigned to the left and a cloud-hosted private cloud to the right, a hybrid cloud would entail the sum of both parts. The overlapping space in the middle represents the encrypted layer.
This middle ground between clouds provides a vital bridge for data transmission. It allows
organizations to leverage cloud capabilities without compromising productivity or security.
Hybrid cloud is a natural fit for several types of organizations:
• Businesses that are managing resources privately in both on-prem and cloud-hosted environments.
• Companies who are migrating from a complete on-prem solution to a configuration that
incorporates some usage of public cloud capacity.
• Organizations that are moving back to a private, on-prem datacenter from being primarily
cloud-based.
Hybrid cloud infrastructure provides notable flexibility for organizations. You enjoy the secure
access of on-prem resources while also having the rapid scale and elasticity of the public cloud.
And, encrypted data sharing enables industries that manage hypersensitive information such
as public sector entities, law offices, financial service institutions, and healthcare providers
to consume cloud services. Organizations from these industries can store and share data as
needed with external partners while still adhering to regulatory compliance guidelines such as
HIPAA, ISO, PCI-DSS, CIS, NIST, and SOC 2.
Flexibility
Industry survey findings also make it clear that enterprise IT teams highly value having the flexibility to
choose the optimum IT infrastructure for each of their business applications on a dynamic
basis, with 61% of respondents saying that application mobility across clouds and cloud types
is “essential.” Cherry-picking infrastructure in this way to match the right resources to each
workload as needs change results in a growing mixture of on- and off-prem cloud resources,
a.k.a. the hybrid cloud.
Security
The proverbial “cloud” is no longer the simple notion it once was. There was a time when IT
made a fairly straightforward decision whether to run an application in its on-prem datacenter
or in the public cloud. However, with the growth of additional cloud options, such as managed
on premises private cloud services, decision-making has become much more nuanced. Instead
of facing a binary cloud-or-no-cloud situation, IT departments today more often are deciding
on which cloud(s) to use, often on an application-by-application basis.
There is a pressing need for a single platform that can span private, distributed, and public
clouds so that operators can manage their traditional and modern applications using a
consistent cloud platform.
Using the same platform on both clouds, a hybrid model dramatically reduces the operational
complexity of extending, bursting, or migrating your applications and data between clouds.
Operators can conveniently use the same skill sets, tools, and practices used on-prem to
manage applications running in public clouds such as AWS. Nutanix Clusters integrates with
public cloud accounts like AWS, so you can run applications within existing Virtual Private
Clouds (or VPCs), eliminating network complexity, and improving performance.
Maintain cloud optionality with portable software that can move with your applications and
data across clouds. With a consistent consumption model that spans private and public clouds,
you can confidently plan your long-term hybrid and multicloud strategy, maximizing the
benefits of each environment.
• Capacity Bursting: Rapidly add incremental capacity for Dev/Test or seasonal demands.
• On-demand Geo Expansion: Easily expand into regions beyond your current physical
presence.
In a hybrid world, where cost efficiency is of utmost importance, you must differentiate the
items that must exist and persist, and those that need only be leveraged when called upon.
This concept is called “having state” – where a database may be required to be online 24x7 and
replicated across multiple geographies to ensure data availability and accessibility, while the
application front-end used to access these data may only need to service a given set of users
for their localized working hours.
In this example, the data can be considered ever present, managed, and stateful. These data
are groomed, protected, and correlated with other data sets. The application front-end may be
ephemeral, having only the number of instances required to service the customer’s needs, and
some spare capacity to fulfill potential incoming sessions as needed.
If these application instances ebb and flow with user demand, they can reduce or increase cost
as needed – much like adjusting the flow of water through a faucet, you only use what you need
at any given time. Applications in this type of configuration are not useful for what they are, so
much as what they do.
In that respect, they are much like handkerchiefs – users can use them for their needs, and then
throw them away when they are done. This optimizes cloud resource utilization, minimizes the
impact of complex environments and security footprints at scale, and ultimately preserves cost
efficiency for the business while still providing the agility desired.
The hybrid cloud model presents new challenges with access to these data as well: can you
access the data you need, in the timeframe you require, via the platform of your choice? To this end, data locality is a differentiated advantage, providing faster access and lower turnaround time in delivering requested resources to end users. If data locality reduces friction between users and their services, then the way that data is transmitted can strengthen or weaken that advantage by increasing or decreasing performance and access speed. The more direct a path a set of data requests can take through the cloud, the faster the response times services can maintain,
while simultaneously limiting the number of points of failure any given network has in the data
path. Ultimately this reduces friction in the cloud and provides agility for both businesses and
customers alike.
By automating repeatable, predictable, manual tasks, and by inserting a self-service portal through which service consumers can easily satisfy their demands without a chain of human hand-offs, we reduce the task time for IT workers from days per person to potentially zero input. This allows IT workers to streamline their processes even further, and shift their work onto more
proactive tasks that build on top of this model. Each iteration that allows the business to
consume IT services in a streamlined fashion builds more value and realizes greater innovation
cycles for IT as a whole.
This organization and automation further drive home the desired goal of “Write Once, Read
Many” – where expected and repeatable work can be automated and delivered at the speed
of business without the need for reactive means or interrupting the flow of IT innovation.
Innovation thereby becomes the default state for your IT department, rapidly growing the
capabilities of the business and further creating a differentiation between a business and their
competitors.
At each of these layers Nutanix helps assert a Zero Trust Architecture to ensure that only the
right user, on the right device and network, has access to the right applications and data.
Identity
Authentication
Prism Central supports user authentication. There are three authentication options:
• Local user authentication: Users can authenticate if they have a local Prism Central account.
• Active Directory authentication: Users can authenticate using their Active Directory (or
OpenLDAP) credentials when Active Directory support is enabled for Prism Central.
• SAML authentication: Users can authenticate through a qualified identity provider when SAML support is enabled for Prism Central. The Security Assertion Markup Language (SAML) is an open standard for exchanging authentication and authorization data between two parties, such as ADFS as the identity provider (IdP) and Prism Central as the service provider.
Authorization: RBAC
Prism Central supports role-based access control (RBAC) that you can configure to provide
customized access permissions for users based on their assigned roles. The roles dashboard
allows you to view information about all defined roles and the users and groups assigned to
those roles.
• Configuring authentication confers default user permissions that vary depending on the
type of authentication (full permissions from a directory service or no permissions from an
identity provider). You can configure role maps to customize these user permissions.
• You can refine access permissions even further by assigning roles to individual users or
groups that apply to a specified set of entities.
• With RBAC, user roles do not depend on project membership. You can use RBAC and log in to Prism Central even without a project membership.
Data-at-rest encryption can be delivered through self-encrypting drives (SED) that are factory-
installed in Nutanix hardware. This provides strong data protection by encrypting user and
application data for FIPS 140-2 Level 2 compliance. For SED drives, key management servers
are accessed via an interface using the industry-standard Key Management Interface Protocol
(KMIP) instead of storing the keys in the cluster. Nutanix also provides the option to use a native data-at-rest encryption feature that does not require specialized hardware such as self-encrypting drives (SEDs).
Hardware encryption leveraging SEDs and external key management software is costly to configure and maintain, as well as complex to operate. Nutanix recognized the need to reduce this cost and complexity by enabling software-based storage encryption, introduced in AOS 5.5. The introduction of native key management also allowed the key management function to be integrated into the Nutanix cluster itself. This further reduced cost and complexity, and
this software-defined solution helps companies further drive simplicity and automation in their
security domains.
Nutanix solutions support SAML integration and optional two-factor authentication for system
administrators in environments requiring additional layers of security. When implemented,
administrator logins require a combination of a client certificate and username and password.
Capex
Capex refers to any large purchase a business makes for a project or initiative, normally paid up-front for goods or services to be delivered. Capex costs for IT are normally allocated
for purchases of software and hardware, or one-time costs of service provided for large
projects. An easy way to think of capex costs would be to think of a bottle of water: When you
are thirsty, you simply purchase a bottle of water and drink it. The water consumed can be
considered the utility of the asset, or usefulness of a product (as you are now no longer thirsty)
and the bottle itself is also an asset that can either be repurposed (recycled, refilled with more
water, or used for an art project). In this example, we fully own the bottle, the water inside, and
the utility or value of that water (we are no longer thirsty).
Businesses have traditionally leveraged capex cost models to purchase assets as there are
two major financial benefits: tax write-offs and financial reporting measures. Capex costs are
sometimes seen as a necessary business expense, and as such are written off on taxes on a
depreciation schedule. This means that a purchase of one million dollars ($1M) can be used to
reduce corporate taxes either in one fiscal year, or spread out over the lifetime of the asset
itself (known as a depreciation schedule). Capex purchases also usually result in the gain of
an asset or multiple assets, that can grow the company’s balance sheet while spreading out
the depreciation of that asset’s cost over multiple years. This approach can ultimately show
an increase in valuation of a company and reflect directly in stock price, and from a business
perspective this increase is clearly desirable.
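As a simple, hypothetical illustration of a depreciation schedule: if that $1M purchase were depreciated straight-line over five years, the business would record an expense of $200,000 per year ($1,000,000 ÷ 5) for each of those five years, rather than taking a single $1M deduction in the year of purchase.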
There are also added responsibilities that come with the capex cost model: assets are now
owned, and require upkeep and maintenance; large capital costs in the enterprise need to
be planned for at least a year (or multiple years) in advance, and budgeted by management
to ensure responsible spending; capital costs may not be accurately forecasted and become
a drag on corporate finances (inappropriate sizing of hardware/software; lack of training
incorporated in projects; unknown or undocumented software costs and renewals, etc.). Cost
controls for capital expenditures are normally budgeted for yearly, and follow a schedule of
approvals at various levels in the management chain, giving tight control to corporate costs at
various stages.
Opex
• Regular payments made at repeatable intervals (daily, weekly, monthly, etc.) for
subscriptions and services that assist with the normal daily operation of the business.
• Fully tax deductible at the time of purchase.
In contrast to capex, opex are regular payments made at repeatable intervals (daily, weekly,
monthly, etc.) for subscriptions and services that assist with the normal daily operation of
the business. Normal operational expenses include: payroll for employees, utility expenses
such as electrical costs and cellular provider usage, and SaaS subscription fees. In our water
example, imagine that opex means that you would “pay by the drink” and not own the bottle.
This simplifies purchasing for many businesses, being able to “buy the value” without owning
and managing the asset.
Public cloud expenses fall under this category of operational expenses, as they are a “pay
as you go” model, and none of the physical assets in the cloud are owned by the subscriber.
The financial advantage here is that any subscriptions purchased via opex funds are fully tax
deductible at the time of purchase and may ultimately mean more real-time margin for the
company, equating to hopefully more financial gains for shareholders.
One downside to opex is that there are no assets to add to the corporate balance sheet, so
the additional value is in the utility of a service, and not in the purchase of the actual item
used. Another danger is that the variable nature of cloud usage equates to an unknown monthly expense. As opposed to capex payments, which are up-front costs to the business and budgeted for yearly, opex costs follow a utility model – paid for after usage – which can put a strain on finances if not forecasted appropriately. Cloud sprawl, and hence cost sprawl, are
quite common when adopting a new cloud model of operation and can be detrimental to the
business if not monitored carefully.
Ultimately there are benefits and drawbacks to both financial expense models, and every
business has a certain combination of both capex and opex in use. Knowing the different
options and how to optimize them will ultimately make IT departments successful in delivering
business value while effectively managing costs at scale.
Every business has applications that it relies upon for normal operations: database servers that store and retrieve valuable data; web servers for eCommerce sites or even simply a company's online presence; and massively complex and intricate ERP systems that manage
the flow from order to delivery, along with AI workloads used to formulate and model predictive algorithms. This diversity of application usage and need requires hardware and software capacity to service it and, to make things more interesting, the usage of any given application can vary wildly depending on the business's needs.
Predictable workloads can be defined as the hardware and software that supports any given
application, which in turn can be expected to operate in an observable and understood fashion.
We can “predict” how an application is going to be used, by simply knowing what business
needs that application serves. One good example of this predictability could be an email server.
An IT department would know how many users they have across the company, set limits on
attachment sizes sent per email, and know how many additional users they would have per
month/quarter/year to plan for growth. Usage by individual users will vary, but ultimately the
total expected performance needs and usage of an email server can be sized and predicted
successfully.
Unpredictable workloads can be viewed as the opposite of predictable workloads: any set
of hardware and software that support a business application that cannot be quantified
or understood to perform in an expected way. A good example of one of these workloads
might be an eCommerce site. Businesses regularly run marketing and sales campaigns to
drive potential customers to their websites to purchase products and services. While we can
(hopefully) expect customers to purchase our offerings, it is unknown how many users may be
attracted to which various campaigns, and hence visit the eCommerce site to buy something.
This lack of visibility is complicated when we look at the seasonality of workloads: Holidays can
see a sharp, drastic uptrend in purchasing from users, while the rest of the year we may have
more expectable results. Similarly, various events may see large inflows of user demand: the
World Cup drives billions of pageviews for FIFA every 4 years, while the remaining 3 years may see much less public demand on its IT infrastructure.
Before cloud computing, both predictable and unpredictable workloads would be hosted on
the same platforms in the datacenter. Businesses would capitalize costs, own assets to perform
the work desired, and service customer needs. The problem with this operating model is that
businesses would suffer from the ownership of an asset, while not reaping any benefit from it,
if that asset was purchased to service unpredictable workloads. The opposite can also be said:
when lacking the appropriate capacity to service customer demand, businesses would suffer
the loss of revenue by losing access to those customers who were unable to purchase goods or
services due to the increased demand.
How do we solve for both predictable and unpredictable workloads while still optimizing
cost and meeting customer demand? By combining the best of both worlds, the hybrid cloud
approach enables predictable workloads to attain efficiency and scale while maximizing the value of capital expense, and simultaneously enables unpredictable workloads to achieve the dynamic web-scale they require to meet the seasonality of user demand while obtaining the unique cost benefits of “pay as you go”. The elastic nature of the cloud perfectly meets unpredictable
workload needs, while the steady-state nature of predictable workloads can be perfectly suited
for datacenter consumption.
The image above encapsulates a broad idea that lies at the heart of Nutanix.
At the top are workloads. Workloads are the reason the underlying infrastructure exists and
is necessary. They are how a business runs, how it grows, and how it shapes its present and
future.
At the bottom are the choices that you have when you consider deploying and running your
workloads on Nutanix. Choosing Nutanix is meant to be liberating, rather than restrictive.
Nutanix supports several leading hardware platforms, so you can run Nutanix software on your
choice of hardware.
And the freedom that comes with choosing hardware platforms extends to the public cloud as
well. If you have workloads on AWS, Azure, or GCP, Nutanix integrates neatly and tightly with
them, so you can continue to benefit from the public cloud when necessary while leveraging the
strengths of your private cloud – in a true, hybrid model.
And in the middle, between the underlying infrastructure and the workloads is the Nutanix
Cloud Platform – the products that power this tremendous freedom. When you choose to
modernize your infrastructure with Nutanix HCI, the only way forward is up – to better security,
to simplified storage, to automated operations, and fully integrated enterprise-grade backup
and DR.
Each product represents a key component of the hybrid cloud. AOS, AHV, and Prism are the
foundation. Every other product can be layered on top and integrates with this foundation to
give you a fully featured enterprise-class hybrid cloud solution.
Other Resources
https://www.nutanix.com/resources
Nutanix also maintains a list of resources including whitepapers, solution briefs, ebooks, and
other support material.
http://www.nutanixbible.com
The Nutanix Bible has become a valuable reference point for those who want to learn
about hyperconvergence and web-scale principles or dig deep into Nutanix and hypervisor
architectures. The book explains these technologies in a way that is understandable to IT
generalists without compromising the technical veracity.
https://next.nutanix.com
Nutanix also has a strong community of peers and professionals, the .NEXT community. Access
the community via the direct link shown here or from the Documentation menu in the Support
Portal. The community is a great place to get answers, learn about the latest topics, and lend
your expertise to your peers.
https://www.nutanix.com/support-services/training-certification/
An excellent place to learn and grow your expertise is with Nutanix training and certification.
Learn about other classes and get certified with us.
Nutanix Test Drive is a tool provided at no cost that offers guided tours of multiple Nutanix products and features. You can access it directly at https://www.nutanix.com/test-drive-hyperconverged-infrastructure or through the My Nutanix portal.
Nutanix University
Nutanix University has many resources, including online training and an “Ask the Experts” section.
Nutanix Certification
Nutanix technical certifications are designed to recognize the skills and knowledge you've
acquired to successfully deploy, manage, optimize, and scale your Enterprise Cloud. Earning
these certifications validates your proven abilities and aptitude to guide your organization along
the next phase of your Enterprise Cloud journey.
Visit our website for more information about our certification portfolio.
Labs
1. TCO/ROI video (self-paced)
Module 2: Managing the Nutanix Cluster
Overview
Within this module, you will learn how to manage your Nutanix cluster using various tools. After
completing this module, you will know how to:
• Describe tools such as Prism Central, PowerShell, and the REST API.
Note: Within this module, we'll discuss various ways to manage your Nutanix cluster. First, we'll start with the Prism Element GUI, then talk about command line interfaces. Finally, we'll provide an overview of common tools such as PowerShell cmdlets and the REST API.
Cluster Example
• As part of the cluster creation process, all storage hardware (SSDs, HDDs, and NVMe) is
presented as a single storage pool.
When Nutanix is installed on a server, a Controller VM (CVM) is deployed. Every CVM has dedicated memory and reserved CPUs, allowing it to perform the various services required by the cluster.
The Nutanix cluster has a distributed architecture, which means that each node in the cluster
shares in the management of cluster resources and responsibilities. Within each node, there are
software components (aka AOS Services) that perform specific tasks during cluster operation.
All components run on multiple nodes in the cluster and depend on connectivity between their
peers that also run the component. Most components also depend on other components for
information.
Zookeeper
Zookeeper runs on either three or five nodes, depending on the redundancy factor (number of
data block copies) applied to the cluster. Zookeeper uses multiple nodes to prevent stale data
from being returned to other components. An odd number provides a method for breaking ties
if two nodes have different information.
Of these nodes, Zookeeper elects one node as the leader. The leader receives all requests for
information and confers with the two follower nodes. If the leader stops responding, a new
leader is elected automatically.
Zookeeper has no dependencies, meaning that it can start without any other cluster
components running.
Zeus
Zeus is an interface to access the information stored within Zookeeper and is the Nutanix
library that all other components use to access the cluster configuration.
A key element of a distributed system is a method for all nodes to store and update the
cluster's configuration. This configuration includes details about the physical components in the
cluster, such as hosts and disks, and logical components, like storage containers.
Medusa
Distributed systems that store data for other systems (for example, a hypervisor that hosts
virtual machines) must have a way to keep track of where that data is. In the case of a Nutanix
cluster, it is also important to track where the replicas of that data are stored.
Medusa is a Nutanix abstraction layer that sits in front of the database that holds metadata.
The database is distributed across all nodes in the cluster, using a modified form of Apache
Cassandra.
Cassandra
Cassandra is a distributed, high-performance, scalable database that stores all metadata about
the guest VM data stored in a Nutanix datastore.
Cassandra runs on all nodes of the cluster. The Cassandra monitor (Level-2) periodically sends a heartbeat to the daemon that includes information about the load, schema, and health of all the nodes in the ring. The Cassandra L2 monitor depends on Zeus/Zookeeper for this information.
Stargate
A distributed system that presents storage to other systems (such as a hypervisor) needs a
unified component for receiving and processing data that it receives. The Nutanix cluster has a
software component called Stargate that manages this responsibility.
All read and write requests are sent across an internal vSwitch to the Stargate process running
on that node.
Stargate depends on Medusa to gather metadata and Zeus to gather cluster configuration data.
From the perspective of the hypervisor, Stargate is the main point of contact for the Nutanix
cluster.
Note: If Stargate cannot reach Medusa, the log files include an HTTP timeout. Zeus
communication issues can include a Zookeeper timeout.
Curator
A Curator master node periodically scans the metadata database and identifies cleanup and
optimization tasks that Stargate should perform. Curator shares analyzed metadata across
other Curator nodes.
Curator depends on Zeus to learn which nodes are available, and Medusa to gather metadata.
Based on that analysis, it sends commands to Stargate.
Prism Features
Infrastructure Management
Performance Monitoring
• Customizable dashboards
• Single-click query
Prism supports the following web browsers:
• Firefox
• Chrome
• Safari
• Microsoft Edge
Enabling Pulse
Pulse is enabled by default and monitors cluster health and proactively notifies customer
support if a problem is detected.
Controller VMs communicate with ESXi hosts and IPMI interfaces throughout the cluster to
gather health information.
Warnings and errors are also displayed in Prism Element, where administrators can analyze the
data and create reports.
Pulse collects and shares only basic, system-level diagnostic information, such as:
• System alerts
• Configuration information
Pulse does not collect:
• Guest VMs
• User data
• Metadata
• Administrator credentials
• Identification data
• Private information
There are two commonly used command line interfaces (CLIs):
• aCLI – Manage the Acropolis portion of the Nutanix environment: hosts, networks, snapshots,
and VMs.
• nCLI – Manage the rest of the cluster configuration, such as storage pools, storage containers,
and protection domains.
The Acropolis 5.15 Command Reference on the Support Portal contains nCLI, aCLI and CVM
commands.
From Prism Element, download the nCLI installer to a local machine. This requires Java Runtime
Environment (JRE) version 5.0 or higher.
The PATH environment variable should point to the nCLI folder as well as the JRE bin folder.
Once downloaded and installed, go to a bash shell or command prompt and point ncli to the
cluster virtual IP address or to any Controller VM in the cluster.
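For example, assuming the nCLI is installed locally and the cluster virtual IP address is 10.0.0.50 (a
placeholder value), you can start a remote session as follows:
ncli -s 10.0.0.50 -u 'admin' -p 'admin_password'
The -s option specifies the cluster or CVM IP address, -u the user name, and -p the password. If you
are already logged on to a Controller VM, typing ncli with no options starts a local session.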
Command Format
nCLI commands must match the following format:
ncli> entity action parameter1=value parameter2=value ...
• You can replace entity with any Nutanix entity, such as cluster or disk.
• You can replace action with any valid action for the preceding entity. Each entity has a
unique set of actions, but a common action across all entities is list. For example, you can
type the following command to request a list of all storage pools in the cluster.
ncli> storagepool list
Some actions require parameters at the end of the command. For example, when creating
an NFS datastore, you need to provide both the name of the datastore as it appears to the
hypervisor and the name of the source storage container.
ncli> datastore create name="NTNX-NFS" ctr-name="nfs-ctr"
You can list parameter-value pairs in any order, as long as they are preceded by a valid entity
and action.
Note: To avoid syntax errors, surround all string values with double-quotes,
as demonstrated in the preceding example. This is particularly important when
specifying parameters that accept a list of values.
The nCLI provides assistance on all entities and actions. By typing help at the command line,
you can request additional information at one of three levels of detail.
• help provides a list of all entities.
• <entity> help provides a list of all actions and parameters associated with the entity, as well
as which parameters are required, and which are optional.
• <entity> action help provides a list of all parameters associated with the action, as well as a
description of each parameter.
The nCLI provides additional details at each level. To control the scope of the nCLI help output,
add the detailed parameter, which can be set to either true or false.
For example, type the following command to request a detailed list of all actions and
parameters for the cluster entity.
ncli> cluster help detailed=true
You can also type the following command if you prefer to see a list of parameters for the
cluster edit-params action without descriptions.
ncli> cluster edit-params help detailed=false
hostssh – Executes a command on every hypervisor host in the cluster, in the same way that allssh targets every CVM.
aCLI Example
• Create a new virtual network for VMs: net.create vlan.100 ip_config=10.1.1.1/24
Note: Use extreme caution when executing allssh commands. The allssh command
executes an SSH command on all CVMs in the cluster.
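As an illustration only (output varies by cluster), the following commands run a single, harmless
command on every CVM and on every hypervisor host, respectively:
nutanix@CVM$ allssh "uptime"
nutanix@CVM$ hostssh "uptime"
Both commands iterate over the entire cluster, so a mistyped destructive command is repeated on
every node, which is why the note above urges caution.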
PowerShell Cmdlets
PowerShell Cmdlets Examples
• Connect to a cluster: Connect-NutanixCluster -Server <Cluster IP> -UserName <Prism User>
-Password <Password>
• Get information about the cluster you are connected to: Get-NutanixCluster
• Get information about ALL of the clusters you are connected to by specifying a CVM IP for
each cluster: Get-NutanixCluster -Server cvm_ip_addr
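As a brief sketch (cmdlet and parameter names such as Get-NTNXVM and -AcceptInvalidSSLCerts
come from the Nutanix Cmdlets snap-in and may differ between versions), a short session might
look like this:
Connect-NutanixCluster -Server 10.0.0.50 -UserName admin -Password 'admin_password' -AcceptInvalidSSLCerts
Get-NTNXVM | Select-Object vmName, powerState
Get-NTNXContainer
The first command opens the session; the remaining commands query the VMs and storage
containers on the connected cluster.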
REST API
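As a minimal sketch (assuming the default Prism port 9440 and the v2.0 API), you could list the
VMs on a cluster with a single HTTPS call:
curl -k -u admin:'admin_password' https://cluster_ip:9440/api/nutanix/v2.0/vms/
The same endpoints can be explored interactively from the REST API Explorer that is built into
Prism.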
Labs
1. Connecting to Prism Element
5. Using nCLI
Module
3
SECURING THE NUTANIX CLUSTER
Overview
After completing this module, you will be able to:
Security Overview
Our security development life cycle integrates security into every step of product development,
rather than applying it as an afterthought. It is a foundational part of product design. The
pervasive culture and processes built around security harden the Enterprise Cloud OS and
eliminate zero-day vulnerabilities.
For example, research and development teams work together to fully understand all the code
in the product, whether it is produced in-house or inherited from dependencies. We schedule
product updates to handle known common vulnerabilities and exposures (CVEs) for minor
release cycles, and backport all dependencies to their latest release versions in major release
cycles. This approach significantly reduces zero-day risks without slowing down product
evolution.
Efficient one-click operations and self-healing security models enable automation to maintain
security in an always-on hyperconverged solution. And finally, since this is about more than just
our platform, Nutanix also delivers validated joint solutions with security-focused vendors.
• Performs fully automated testing during development, and times all security-related code
modifications during minor releases to minimize risk.
• Assesses and mitigates customer risk from code changes by using threat modeling.
Note: Download the Information Security with Nutanix Tech Note for more
information on this topic.
Two-Factor Authentication
Logons require a combination of a client certificate and a username and password.
Administrators can use local accounts or Active Directory (AD).
Cluster Lockdown
You can restrict access to a Nutanix cluster. SSH sessions can be restricted through
nonrepudiated keys.
You can disable remote logon with a password. You can completely lock down SSH access
by disabling remote logon and deleting all keys except for the interCVM and CVM to host
communication keys.
Nutanix nodes are authenticated by a cluster-local key management server (KMS); self-encrypting drives (SEDs) are not required.
Once deployed, STIGs lock down IT environments and reduce security vulnerabilities in
infrastructure.
Traditionally, using STIGs to secure an environment is a manual process that is highly time-
consuming and prone to operator error. Because of this, only the most security-conscious IT
shops follow the required process.
Nutanix has created custom STIGs that are based on the guidelines outlined by DISA to keep
the Enterprise Cloud Platform within compliance and reduce attack surfaces.
Nutanix includes STIGs that collectively check over 800 security entities covering storage,
virtualization, and management:
• AHV
• AOS
• Prism
• Web server
To make the STIGs usable by all organizations, Nutanix provides the STIGs in machine-readable
XCCDF.xml format and PDF. This enables organizations to use tools that can read STIGs and
automatically validate the security baseline of a deployment, reducing the accreditation time
required to stay within compliance from months to days.
Nutanix leverages SaltStack and SCMA to self-heal any deviation from the security baseline
configuration of the operating system and hypervisor to remain in compliance. If any
component is found to be non-compliant, it is automatically set back to the supported
security settings without any intervention. To achieve this objective, the Nutanix Controller VM
conforms to RHEL 7 (Linux 7) STIG as published by DISA. Additionally, Nutanix maintains its
own STIG for the Acropolis Hypervisor (AHV).
• Monitors over 800 security entities covering storage, virtualization, and management
The SCMA framework ensures that services are constantly inspected for variance to the
security policy.
With SCMA, you can schedule the STIG checks to run hourly, daily, weekly, or monthly. These
checks run at the lowest system priority within the virtual storage controller, ensuring that
security checks do not interfere with platform performance.
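As a hedged example (the exact entity and parameter names should be verified against the AOS
security documentation for your release), the SCMA schedule can be viewed and changed from
nCLI:
ncli> cluster get-cvm-security-config
ncli> cluster edit-cvm-security-params schedule=daily
The first command shows the current CVM security settings, and the second changes how often
SCMA enforces the security baseline.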
Enterprise key management (with KMS): a consolidated, central key management server (KMS)
that provides service to multiple cryptographic clusters.
Nutanix provides a software-only option for data-at-rest security with the Ultimate license. This
does not require the use of self-encrypting drives.
DARE Implementation
1. Install SEDs for all data drives in a cluster. The drives are FIPS 140-2 Level 2 validated and
use FIPS 140-2 validated cryptographic modules.
2. When you enable data protection for the cluster, the Controller VM must provide the proper
key to access data on a SED.
3. Keys are stored in a key management server that is outside the cluster, and the Controller
VM communicates with the key management server using the Key Management
Interoperability Protocol (KMIP) to upload and retrieve drive keys.
4. When a node experiences a full power off and power on (and cluster protection is enabled),
the Controller VM retrieves the drive keys from the key management server and uses them
to unlock the drives.
The Nutanix controller in each node then adds the PINs (aka KEK, key encryption key) to the
key management device.
Once the PIN is set on an SED, you need the PIN to unlock the device (lose the PIN, lose data).
You can reset the PIN using the SecureErase primitive to “unsecure” the disk/partition, but all
existing data is lost in this case.
ESXi and NTNX boot partitions remain unencrypted. SEDs support encrypting individual disk
partitions selectively using the “BAND” feature (a range of blocks).
Configuring Authentication
Changing Passwords
It is possible to change four different sets of passwords in a Nutanix cluster: user, CVM, IPMI, and
the hypervisor host.
How often you change these passwords depends on your company's IT security policies. Most
companies enforce password changes on a schedule via security guidelines, but the intervals
are usually company specific.
Nutanix provides administrators with password complexity controls, such as requiring uppercase
and lowercase letters, symbols, and numbers, and enforcing change frequency and minimum password length. After you
have successfully changed a password, the new password is synchronized across all Controller
VMs and interfaces (Prism web console, nCLI, and SSH).
By default, the admin user password does not expire and can be changed at any time. If you
do change the admin password, you will also need to update any applications and scripts that
use the admin credentials for authentication. For authentication purposes, Nutanix recommends
that you create a user with an admin role, instead of using the admin account.
Note: For more information on this topic, please see the Nutanix Support Portal >
Common Criteria Guidance Reference > User Identity and Authentication.
You can change user passwords, including for the default admin user, in the web console or
nCLI. Changing the password through either interface changes it for both.
Using the web console: Log on to the web console as the user whose password is to be
changed and select Change Password from the user icon pull-down list of the main menu.
Note: For more information about changing properties of the current users, see the
Web Console Guide.
Remember to:
• Replace username with the name of the user whose password is to be changed.
Note: If you change the password of the admin user from the default, you must
specify the password every time you start an nCLI session from a remote system.
A password is not required if you are starting an nCLI session from a Controller VM
where you are already logged on.
Perform these steps on any one Controller VM in the cluster to change the password of
the nutanix user. After you have successfully changed the password, the new password is
synchronized across all Controller VMs in the cluster. During the sync, you will see a task
appear in the Recent Tasks section of Prism and will be notified when the password sync task is
complete.
3. Respond to the prompts, providing the current and new nutanix user password.
Changing password for nutanix.
Old Password:
New password:
Retype new password:
Password changed.
This procedure helps prevent the BMC password from being retrievable on port 49152.
Although it is not required for the administrative user to have the same password on all hosts,
doing so makes cluster management much easier. If you do select a different password for one
or more hosts, make sure to note the password for each host.
Note: The maximum allowed length of the IPMI password is 19 characters, except
on ESXi hosts, where the maximum length is 15 characters.
Note: Do not use the following special characters in the IPMI password: & ; ` ' \ " |
* ? ~ < > ^ ( ) [ ] { } $ \n \r
Note: It is also possible to change the IPMI password for ESXi, Hyper-V, and AHV
if you do not know the current password but have root access to the host. For
instructions on how to do this, please see the relevant section of the NX and SX
Series Hardware Administration Guide on the Nutanix support portal.
3. Respond to the prompts, providing the current and new root password.
Changing password for root.
New password:
Retype new password:
Password changed.
Prism Central supports role-based access control (RBAC) that you can configure to provide
customized access permissions for users based on their assigned roles. The roles dashboard
allows you to view information about all defined roles and the users and groups assigned to
those roles.
• Configuring authentication confers default user permissions that vary depending on the
type of authentication (full permissions from a directory service or no permissions from an
identity provider). You can configure role maps to customize these user permissions.
• You can refine access permissions even further by assigning roles to individual users or
groups that apply to a specified set of entities.
Note: Defining custom roles and assigning roles are supported on AHV only.
Nutanix RBAC enables this by providing fine-grained controls when creating custom roles in
Prism Central. As an example, it is possible to create a VM Admin role with the ability to view
VMs, limited permission to modify CPU, memory, and power state, and no other administrative
privileges.
• Clearly understand the specific set of tasks a user will need to perform their job
• Identify permissions that map to those tasks and assign them accordingly
• Document and verify your custom roles to ensure that the correct privileges have been
assigned
Built-in Roles
The following built-in roles are defined by default. You can see a more detailed list of
permissions for any of the built-in roles through the details view for that role. The Project
Admin, Developer, Consumer, and Operator roles are available when assigning roles in a project.
Role: Project Admin
Privileges: You can specify a role for a user when you assign a user to a project, so individual users or groups can have different roles in the same project.
Custom Roles
If the built-in roles are not sufficient for your needs, you can create one or more custom roles.
After creation, these roles can also be modified if necessary.
You can create a custom role from the Roles dashboard, with the following parameters:
• Name
• Description
• Permissions for VMs, blueprints, apps, marketplace items, and reports management
A custom role can also be modified or deleted from the Roles dashboard. When updating a
role, you will be able to modify the same parameters that are available when creating a custom
role. To delete a role, select the Delete option from the Actions menu and provide confirmation
when prompted.
• SAML-authorized users are not assigned any permissions by default; they must be explicitly
assigned.
Note: To configure user authentication, please see the Prism Web Console Guide >
Security Management > Configuring Authentication section.
You can refine the authentication process by assigning a role (with associated permissions) to
users or groups. To assign roles:
You can edit a role map entry, which will present you with the same fields that are available when
creating a role map. Make your desired changes and save to update the entry.
You can also delete a role map entry, by clicking the delete icon and then providing
confirmation when prompted.
Nutanix supports SSL certificate-based authentication for console access. AOS includes a self-
signed SSL certificate by default to enable secure communication with a cluster. AOS allows
you to replace the default certificate through the web console Prism user interface.
For more information, see the Nutanix Controller VM Security Operations Guide and
the Certificate Authority sections of the Common Criteria Guidance Reference on the Support
Portal.
Labs
1. Adding a user
Module
4
NETWORKING
Overview
After completing this module, you will be able to:
Connections from the server to the physical switch use 10 GbE or higher interfaces. You can
establish connections between the switches with 40 GbE or faster direct links, or through a
leaf-spine network topology (not shown). The IPMI management interface of the Nutanix node
also connects to the out-of-band management network, which may connect to the production
network, but it is not mandatory. Each node always has a single connection to the management
network, but we have omitted this element from further images in this document for clarity and
simplicity.
Review the Leaf Spine section of the Physical Networking Guide for more information on leaf-
spine topology.
Each AHV server maintains an OVS instance, and all OVS instances combine to form a single
logical switch. Constructs called bridges manage the switch instances residing on the AHV
hosts. Use the following commands to configure OVS with bridges, bonds, and VLAN tags. For
example:
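(A sketch only; run from any CVM, which reaches the local AHV host over the internal 192.168.5.1
address used elsewhere in this module.) You can inspect the current bridge and bond configuration
before making any changes:
nutanix@CVM$ manage_ovs show_uplinks
nutanix@CVM$ ssh root@192.168.5.1 "ovs-vsctl show"
The manage_ovs command summarizes the bridges, bonds, and member interfaces on the local
host, while ovs-vsctl show prints the raw OVS configuration.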
Bridges
Bridges act as virtual switches to manage traffic between physical and virtual network
interfaces. The default AHV configuration includes an OVS bridge called br0 and a native Linux
bridge called virbr0 (the names could vary between AHV/AOS versions and depending on
what configuration changes were done on the nodes, but in this training we will use br0 and
virbr0 by default). The virbr0 Linux bridge carries management and storage communication
between the CVM and AHV host. All other storage, host, and VM network traffic flows through
the br0 OVS bridge. The AHV host, VMs, and physical interfaces use "ports" for connectivity to
the bridge.
Ports
Ports are logical constructs created in a bridge that represent connectivity to the virtual switch.
Nutanix uses several port types, including internal, tap, VXLAN, and bond.
• An internal port with the same name as the default bridge (br0) provides access for the AHV
host.
• Bonded ports provide NIC teaming for the physical interfaces of the AHV host.
Bonds
Bonded ports aggregate the physical interfaces on the AHV host. By default, the system
creates a bond named br0-up in bridge br0 containing all physical interfaces. Changes to
the default bond br0-up using manage_ovs commands can rename it to bond0. Remember,
bond names on your system might differ from the diagram below. Nutanix recommends using
the name br0-up to quickly identify this interface as the bridge br0 uplink. Using this naming
scheme, you can also easily distinguish uplinks for additional bridges from each other.
OVS bonds allow for several load-balancing modes, including active-backup, balance-slb, and
balance-tcp. Active-backup mode is enabled by default. Nutanix recommends this mode for
ease of use.
The following diagram illustrates the networking configuration of a single host immediately
after imaging. The best practice is to use only the 10 GB NICs and to disconnect the 1 GB NICs if
you do not need them.
Only utilize NICs of the same speed within the same bond.
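For example (a sketch; verify the bridge and bond names on your own cluster first), you can restrict
the default bond to only the 10 GbE interfaces with manage_ovs from any CVM:
nutanix@CVM$ manage_ovs --bridge_name br0 --bond_name br0-up --interfaces 10g update_uplinks
The 10g keyword selects all 10 GbE adapters, which keeps same-speed NICs together in the bond
as recommended above.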
Bond Modes
• active-backup (default)
• balance-slb
• balance-tcp (used with LACP link aggregation)
Active-Backup
With the active-backup bond mode, one interface in the bond carries traffic and other
interfaces in the bond are used only when the active link fails. Active-backup is the simplest
bond mode, easily allowing connections to multiple upstream switches without any additional
switch configuration. The active-backup bond mode requires no special hardware and you can
use different physical switches for redundancy.
The tradeoff is that traffic from all VMs uses only a single active link within the bond at
one time. All backup links remain unused until the active link fails. In a system with dual
10 GB adapters, the maximum throughput of all VMs running on a Nutanix node with this
configuration is 10 Gbps or the speed of a single link.
This mode only offers failover ability (no traffic load balancing). If the active link goes down,
a backup or passive link activates to provide continued connectivity. AHV transmits all traffic,
including traffic from the CVM and VMs, across the active link. All traffic shares 10 Gbps of
network bandwidth.
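If a bond has been changed to another mode, you can return it to the default active-backup mode
with a command of the following form (a sketch, using the same CVM-to-host SSH pattern shown
later in this module):
nutanix@CVM$ ssh root@192.168.5.1 "ovs-vsctl set port br0-up bond_mode=active-backup"
No upstream switch configuration is required for this mode.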
Balance-SLB
To take advantage of the bandwidth provided by multiple upstream switch links, you can use
the balance-slb bond mode. The balance-slb bond mode in OVS takes advantage of all links in
a bond and uses measured traffic load to rebalance VM traffic from highly used to less used
interfaces. When the configurable bond-rebalance interval expires, OVS uses the measured
load for each interface and the load for each source MAC hash to spread traffic evenly among
links in the bond. Traffic from some source MAC hashes may move to a less active link to more
evenly balance bond member utilization.
Perfectly even balancing may not always be possible, depending on the number of source
MAC hashes and their stream sizes. Each individual VM NIC uses only a single bond member
interface at a time, but a hashing algorithm distributes multiple VM NICs’ multiple source MAC
addresses across bond member interfaces. As a result, it is possible for a Nutanix AHV node
with two 10 GB interfaces to use up to 20 Gbps of network throughput. Individual VM NICs have
a maximum throughput of 10 Gbps, the speed of a single physical interface. A VM with multiple
NICs could still have more bandwidth than the speed of a single physical interface, but there is
no guarantee that the different VM NICs will land on different physical interfaces.
The default rebalance interval is 10 seconds, but Nutanix recommends setting this interval to
30 seconds to avoid excessive movement of source MAC address hashes between upstream
switches. Nutanix has tested this configuration using two separate upstream switches with
AHV. If the upstream switches are interconnected physically or virtually, and both uplinks allow
the same VLANs, no additional configuration, such as link aggregation, is necessary.
Note: Do not use link aggregation technologies such as LACP with balance-slb.
The balance-slb algorithm assumes that upstream switch links are independent L2
interfaces. It handles broadcast, unknown unicast, and multicast (BUM) traffic by selectively
listening for this traffic on only a single active adapter in the bond.
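To switch the bond to balance-slb and apply the recommended 30-second rebalance interval, you
can run commands of the following form from any CVM (a sketch; the interval is expressed in
milliseconds):
nutanix@CVM$ ssh root@192.168.5.1 "ovs-vsctl set port br0-up bond_mode=balance-slb"
nutanix@CVM$ ssh root@192.168.5.1 "ovs-vsctl set port br0-up other_config:bond-rebalance-interval=30000"
Repeat the commands on every host in the cluster so that the configuration stays consistent.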
Taking full advantage of bandwidth provided by multiple links to upstream switches, from
a single VM, requires dynamically negotiated link aggregation and load balancing using
balance-tcp. Nutanix recommends dynamic link aggregation with LACP instead of static link
aggregation due to improved failure detection and recovery.
Note: Ensure that you have appropriately configured the upstream switches before
enabling LACP. On the switch, link aggregation is commonly referred to as port
channel or LAG, depending on the switch vendor. Using multiple upstream switches
may require additional configuration such as MLAG or vPC. Configure switches to
fall back to active-backup mode in case LACP negotiation fails (sometimes called
fallback or no suspend-individual). This setting assists with node imaging and initial
configuration where LACP may not yet be available.
With link aggregation negotiated by LACP, multiple links to separate physical switches appear
as a single layer-2 (L2) link. A traffic-hashing algorithm such as balance-tcp can split traffic
between multiple links in an active-active fashion. Because the uplinks appear as a single L2
link, the algorithm can balance traffic among bond members without any regard for switch
MAC address tables. Nutanix recommends using balance-tcp when using LACP and link
aggregation, because each TCP stream from a single VM can potentially use a different uplink in
this configuration.
With link aggregation, LACP, and balance-tcp, a single guest VM with multiple TCP streams
could use up to 20 Gbps of bandwidth in an AHV node with two 10 GB adapters.
Configure link aggregation with LACP and balance-tcp using the commands below on all
Nutanix CVMs in the cluster.
Note: You must configure upstream switches for link aggregation with LACP
before configuring the AHV host from the CVM. Upstream LACP settings, such as
timers, should match the AHV host settings for configuration consistency. See KB
3263 for more information on LACP configuration.
If upstream LACP negotiation fails, the default AHV host configuration disables the bond, thus
blocking all traffic. The following command allows fallback to active-backup bond mode in the
AHV host in the event of LACP negotiation failure:
nutanix@CVM$ ssh root@192.168.5.1 "ovs-vsctl set port br0-up other_config:lacp-fallback-ab=true"
In the AHV host and on most switches, the default OVS LACP timer configuration is slow,
or 30 seconds. This value — which is independent of the switch timer setting — determines
how frequently the AHV host requests LACPDUs from the connected physical switch. The
fast setting (1 second) requests LACPDUs from the connected physical switch every second,
helping to detect interface failures more quickly. Failure to receive three LACPDUs — in other
words, after 3 seconds with the fast setting — shuts down the link within the bond. Nutanix
recommends setting lacp-time to fast to decrease the time it takes to detect link failure from 90
seconds to 3 seconds. Only use the slower lacp-time setting if the physical switch requires it for
interoperability.
nutanix@CVM$ ssh root@192.168.5.1 "ovs-vsctl set port br0-up other_config:lacp-time=fast"
Next, enable LACP negotiation and set the hash algorithm to balance-tcp.
nutanix@CVM$ ssh root@192.168.5.1 "ovs-vsctl set port br0-up lacp=active"
nutanix@CVM$ ssh root@192.168.5.1 "ovs-vsctl set port br0-up bond_mode=balance-tcp"
Confirm the LACP negotiation with the upstream switch or switches using ovs-appctl, looking
for the word "negotiated" in the status lines.
nutanix@CVM$ ssh root@192.168.5.1 "ovs-appctl bond/show br0-up"
nutanix@CVM$ ssh root@192.168.5.1 "ovs-appctl lacp/show br0-up"
AHV supports two different ways to provide VM connectivity: managed and unmanaged
networks.
With unmanaged networks, VMs get a direct connection to their VLAN of choice. Each virtual
network in AHV maps to a single VLAN and bridge. All VLANs allowed on the physical switch
port to the AHV host are available to the CVM and guest VMs. You can create and manage
virtual networks, without any additional AHV host configuration, using:
• Prism Element
• aCLI
• REST API
Acropolis binds each virtual network it creates to a single VLAN. During VM creation, you can
create a virtual NIC and associate it with a network and VLAN. Or, you can provision multiple
virtual NICs each with a single VLAN or network.
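As a simple sketch using placeholder names, an unmanaged network and a VM NIC on that
network can be created from aCLI:
<acropolis> net.create vlan.100 vlan=100
<acropolis> vm.nic_create myVM network=vlan.100
The first command binds the virtual network vlan.100 to VLAN 100; the second attaches a NIC on
that network to an existing VM named myVM.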
A managed network is a VLAN plus IP Address Management (IPAM). IPAM is the cluster
capability to function like a DHCP server, to assign an IP address to a VM that sits on the
managed network.
Administrators can configure each virtual network with a specific IP subnet, associated domain
settings, and group of IP address pools available for assignment.
• The Acropolis Master acts as an internal DHCP server for all managed networks.
• The OVS is responsible for encapsulating DHCP requests from the VMs in VXLAN and
forwarding them to the Acropolis Master.
The Acropolis Master runs the CVM administrative process to track device IP addresses. This
creates associations between the interface’s MAC addresses, IP addresses and defined pool of
IP addresses for the AOS DHCP server.
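As a sketch with placeholder addresses, a managed network with an IP address pool can be created
entirely from aCLI:
<acropolis> net.create vlan.200 vlan=200 ip_config=10.2.2.1/24
<acropolis> net.add_dhcp_pool vlan.200 start=10.2.2.50 end=10.2.2.100
The ip_config value defines the default gateway and prefix for the subnet, and the DHCP pool
defines the range of addresses that Acropolis IPAM hands out to VMs on that network.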
Network Segmentation
Network segmentation is designed to isolate backplane (storage and CVM) traffic from other traffic.
It separates storage traffic from routable management traffic for security purposes and creates
separate virtual networks for each traffic type.
You can segment the network on a Nutanix cluster in the following ways:
• During cluster creation, using the Foundation imaging process.
• On an existing cluster, using the Prism web console.
View this Tech TopX video to learn more about network segmentation. You can also read the
Securing Traffic Through Network Segmentation section of the Nutanix Security Guide on the
Support Portal for more information on securing traffic through network segmentation.
For more information about segmenting the network when creating a cluster, see the Field
Installation Guide on the Support Portal.
You can segment the network on an existing cluster by using the Prism web console. The
network segmentation process:
• Creates a separate network for backplane communications on the existing default virtual
switch.
• Configures the eth2 interfaces that AHV creates on the CVMs during upgrade.
From the specified subnet, AHV assigns IP addresses to each new interface. Each node requires
two IP addresses. For new backplane networks, you must specify a nonroutable subnet. The
interfaces on the backplane network are automatically assigned IP addresses from this subnet,
so reserve the entire subnet for the backplane network alone.
If you plan to specify a VLAN for the backplane network, configure the VLAN on the physical
switch ports to which the nodes connect. If you specify the optional VLAN ID, AHV places the
newly created interfaces on the VLAN. Nutanix highly recommends a separate VLAN for the
backplane network to achieve true segmentation.
• Creates a separate network for RDMA communications on the existing default virtual switch.
From the specified subnet, AHV assigns IP addresses (two per node) to each new interface. For
new RDMA networks, you must specify a nonroutable subnet. AHV automatically assigns the
interfaces on the backplane network IP addresses from this subnet, so reserve the entire subnet
for the backplane network alone.
If you plan to specify a VLAN for the RDMA network, configure the VLAN on the physical
switch ports to which the nodes connect. If you specify the optional VLAN ID, AHV places the
newly created interfaces on the VLAN. Nutanix highly recommends a separate VLAN for the
RDMA network to achieve true segmentation.
When you change the subnet, any IP addresses assigned to the interfaces on the backplane
network change, and the procedure therefore involves stopping the cluster. For information
about how to reconfigure the network, see the Reconfiguring the Backplane Network section
of the Nutanix Security Guide on the Support Portal.
Note: Do not delete the eth2 interface that AHV creates on the CVMs, even if you
are not using the network segmentation feature.
Note: At the end of this procedure, the cluster stops and restarts, even if only
changing the VLAN ID, and therefore involves cluster downtime. Shut down all user
VMs and CVMs before reconfiguring the network backplane.
At the end of this procedure, the cluster stops and restarts, and therefore involves cluster
downtime. Shut down all user VMs and CVMs before disabling network segmentation.
Network segmentation is not supported in the following configurations:
• Clusters on which the eth2 interface on one or more CVMs has a manually assigned IP
address.
• ESXi clusters where the CVM connects to a VMware distributed virtual switch.
• Clusters that have two (or more) vSwitches or bridges for CVM traffic isolation. The CVM
management network (eth0) and the CVM backplane network (eth2) must reside on a single
vSwitch or bridge. Do not create these CVM networks on separate vSwitches or bridges.
• Optionally changing the IP address, netmask, and default gateway that AHV specified for the
hosts during the imaging process.
This Tech TopX video walks through AHV networking concepts, including both CLI and Prism
examples.
OVS: Do not modify the OpenFlow tables that are associated with the default OVS bridge br0.
VLANs: Add the Controller VM and the AHV host to the same VLAN. By default, AHV assigns the Controller VM and the hypervisor to VLAN 0, which effectively places them on the native VLAN configured on the upstream physical switch. Do not add other devices, such as guest VMs, to the same VLAN as the CVM and hypervisor. Isolate guest VMs on one or more separate VLANs.
OVS bonded port (bond0): Aggregate the host 10 GbE interfaces to an OVS bond on br0. Trunk these interfaces on the physical switch. By default, the 10 GbE interfaces in the OVS bond operate in the recommended active-backup mode.
1 GbE and 10 GbE interfaces (physical host): If you want to use the 10 GbE interfaces for guest VM traffic, make sure that the guest VMs do not use the VLAN over which the Controller VM and hypervisor communicate. If you want to use the 1 GbE interfaces for guest VM connectivity, follow the hypervisor manufacturer's switch port and networking configuration guidelines.
IPMI port on the hypervisor host: Do not trunk switch ports that connect to the IPMI interface. Configure the switch ports as access ports for management simplicity.
Upstream physical switch: Nutanix does not recommend the use of Fabric Extenders (FEX) or similar technologies for production use cases. While initial, low-load implementations might run smoothly with such technologies, poor performance, VM lockups, and other issues might occur as implementations scale upward. Avoid using shared buffers for the 10 GbE ports. Use a dedicated buffer for each port. Add all the nodes that belong to a given cluster to the same Layer-2 network segment.
Controller VM: Do not remove the Controller VM from either the OVS bridge br0 or the native Linux bridge virbr0.
Labs
1. Creating an Unmanaged Network
Module
5
VIRTUAL MACHINE MANAGEMENT
Overview
After completing this module, you will be able to:
Images that are imported to Prism Element reside in and can be managed from Prism
Element. If connected to Prism Central, you can migrate your images over to Prism Central for
centralized management. This will not remove your images from Prism Element, but will allow
management only in Prism Central. So, for example, if you want to update a migrated image, it
can only be done from Prism Central, not from Prism Element.
Registration with Prism Central is also useful if you have multiple Prism Element clusters
managed by a single instance of Prism Central. In this scenario, if you upload an image to a local
Prism Element instance, for example, this is what happens:
• The image is available locally on that Prism Element instance. (Assuming it has not been
migrated to Prism Central.)
• When you create a VM using that image, the image is copied to other Prism Element
clusters, is made active, and is then available for use on all Prism Element clusters managed
by that instance of Prism Central.
Overview
• RAW
The QCOW2 format decouples the physical storage layer from the virtual layer by adding a
mapping between the logical and physical blocks.
Post-Import Actions
After you import an image, you must clone a VM from the image that you have imported and
then delete the imported image.
For more information on how to create a VM from the imported image, see the Prism Web
Console Guide on the Support Portal.
Uploading Images
There are two ways to upload an image to the Image Service:
• Via Prism
• Via aCLI
2. Upload an image and specify the required parameters, including the name, the image type,
the container on which it will be stored, and the image source for upload.
After Prism completes the upload, the image will appear in a list of available images for use
during VM creation.
To create an image (testimage) from an image located on an NFS server, you can use the following
command:
<acropolis> image.create testimage source_url=nfs://nfs_server_path/path_to_image
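Once the image exists, it can be listed and attached to a VM as a clone (a sketch; the VM name is a
placeholder):
<acropolis> image.list
<acropolis> vm.disk_create myVM clone_from_image=testimage
The vm.disk_create command creates a new virtual disk for the VM by cloning the uploaded image.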
You can use the Prism web console to create virtual machines (VMs) on a Nutanix cluster. If
you have administrative access to a cluster, you can create a VM with Prism by completing a
form that requires a name, compute, storage, and network specifications. If you have already
uploaded the required image files to the image service, you can create either Windows or Linux
VMs during the VM creation process.
Prism also has self-service capabilities that enable administrators or project members with the
required permissions to create VMs. In this scenario, users will select from a list of pre-defined
templates for VMs and disk images to create their VM.
Finally, VMs can be updated after creation, cloned, or deleted as required. When updating a VM,
you can change compute details (vCPUs, cores per vCPU, memory), storage details (disk types
and capacity), as well as other parameters that were specified during the VM creation process.
Creating a VM in AHV
1. In Prism Central, navigate to VM dashboard, click the List tab, and click Create VM.
2. In the Cluster Selection window, select the target cluster for your VM and click OK.
3. In the Create VM dialog box, update the following information as required for your VM:
• Name
• Description (optional)
• Timezone
• vCPUs
• Memory
• Network interface
- NIC
- Network address/prefix
• VM host affinity
4. After all fields have been updated and verified, click Save to create the VM.
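The same result can be achieved from the command line; as a sketch with placeholder values, the
following aCLI commands create, connect, and power on a small VM:
<acropolis> vm.create myVM num_vcpus=2 memory=4G
<acropolis> vm.disk_create myVM create_size=20G container=default-container
<acropolis> vm.nic_create myVM network=vlan.100
<acropolis> vm.on myVM
Each command maps to one section of the Create VM dialog box: compute, storage, and
networking, followed by the initial power-on.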
When creating a VM, you can also provide a user data file for Linux VMs, or an answer file for
Windows VMs, for unattended provisioning. There are three ways to do this:
• If the file has been uploaded to a storage container on a cluster, click ADSF path and enter
the path to the file.
• If the file is available on your local computer, click Upload a File, click Choose File, and then
upload the file.
• If you want to create the file or paste the contents directly, click Type or paste script and
then use the text box that is provided.
You can also copy or move files to a location on the VM for Linux VMs, or to a location in the
ISO file for Windows VMs, during initialization. To do this, you need to specify the source file
ADSF path and the destination path in the VM. To add other files or directories, repeat this
process as necessary.
1. In Prism Central, navigate to VM dashboard, click the List tab, and click Create VM.
2. Select source images for the VM, including the VM template and disk images.
3. Update the following information as required for your VM:
• VM name
• Target project
• Disks
• Network
4. After all the fields have been updated and verified, click Save to create the VM.
Managing a VM
Whether you have created a VM with administrative permissions or as a self-service
administrator, three options are available to you when managing VMs:
To update a VM
To delete a VM
To clone a VM
2. The Clone VM dialog box includes the same fields as the Create VM dialog box. However,
all fields will be populated with information based on the VM that you are cloning. You can
either:
• Change the information in some of the fields as desired, and then click Save.
Other operations that are possible for a VM via one-click operations in Prism Central are:
• Launch console
• Power on/off
• Take Snapshot
• Quarantine/Unquarantine
Note: For more information on each of these topics, please see the Prism
Central Guide > Virtual Infrastructure (Cluster) Administration > VM Management
documents on the Nutanix Support Portal.
SLES 11 SP3/SP4, 12
See the AHV Guest OS Compatibility Matrix on the Support Portal for the current list of
supported guest VMs in AHV.
Nutanix VirtIO
Nutanix VirtIO is a collection of drivers for paravirtual devices that enhance the stability and
performance of virtual machines on AHV.
Nutanix Guest Tools (NGT) is an in-guest agent framework that enables advanced VM
management functionality through the Nutanix Platform.
• Nutanix Guest Agent (NGA) service. Communicates with the Nutanix Controller VM.
• File Level Restore CLI. Performs self-service file-level recovery from the VM snapshots. For
more information about self-service restore, see the Acropolis Advanced Setup Guide.
• Nutanix VM mobility drivers. Provides drivers for VM migration between ESXi and AHV,
in-place hypervisor conversion, and cross-hypervisor disaster recovery (CH-DR) features.
For more information about cross-hypervisor disaster recovery, see the Cross-Hypervisor
Disaster Recovery section of the Data Protection and Disaster Recovery guide on the
Support Portal.
• VSS requestor and hardware provider for Windows VMs. Enables application-consistent
snapshots of AHV or ESXi Windows VMs. For more information about Nutanix VSS-based
snapshots for the Windows VMs, see the Application-Consistent Snapshot Guidelines on
the Support Portal.
• You must configure the cluster virtual IP address on the Nutanix cluster. If the virtual IP
address of the cluster changes, it will impact all the NGT instances that are running in your
cluster. For more information, see the Impact of Changing Virtual IP Address of the Cluster
section of the Prism Web Console Guide on the Support Portal.
• VMs must have at least one empty IDE CD-ROM slot to attach the ISO.
• The hypervisor should be ESXi 5.1 or later, or AHV 20160215 or later version.
• You should connect the VMs to a network that you can access by using the virtual IP
address of the cluster.
• For Windows Server Edition VMs, ensure that Microsoft VSS service is enabled before
starting the NGT installation.
• When you connect a VM to a volume group (VG), NGT captures the IQN of the VM and stores the information.
If you change the VM IQN before the NGT refresh cycle occurs and you take a snapshot
of the VM, the NGT will not be able to provide auto restore capability because the
snapshot operation will not be able to capture the VM-VG connection. As a workaround,
you can manually restart the Nutanix Guest Agent service to update NGT: on a Linux VM, run
the sudo service ngt_guest_agent restart command; on a Windows VM, restart the service from
the Services tab.
Note: See the supported operating system information for the specific NGT
features to verify if an operating system is supported for a specific NGT feature.
Windows
• You must install the SHA-2 code signing support update before installing NGT.
Apply the security update in KB3033929 to enable SHA-2 code signing support on the
Windows OS. If the installation of the security update in KB3033929 fails, apply one of the
following rollups:
• For Windows Server Edition VMs, ensure that Microsoft VSS Services is enabled before
starting the NGT installation.
Linux
Versions: CentOS 6.5 and 7.0, Red Hat Enterprise Linux (RHEL) 6.5 and 7.0, Oracle Linux 6.5
and 7.0, SUSE Linux Enterprise Server (SLES) 11 SP4 and 12, Ubuntu 14.04 or later
• The SLES operating system is only supported for the application consistent snapshots with
VSS feature. The SLES operating system is not supported for the cross-hypervisor disaster
recovery feature.
Customizing a VM
Sysprep
Sysprep is a utility that prepares a Windows installation for duplication (imaging) across
multiple systems. Sysprep is most often used to generalize a Windows installation.
During generalization, Sysprep removes system-specific information and settings such as the
security identifier (SID) and leaves installed applications untouched.
You can capture an image of the generalized installation and use the image with an answer
file to customize the installation of Windows on other systems. The answer file contains the
information that Sysprep needs to complete an unattended installation.
2. Select a VM to clone, click Launch Console, and log in with Administrator credentials.
3. Configure Sysprep with a system cleanup. Specify whether or not to generalize the
installation, then choose to shut down the VM.
Cloud-Init
On non-Windows VMs, Cloud-config files, special scripts designed to be run by the Cloud-Init
process, are generally used for initial configuration on the very first boot of a server. The cloud-
config format implements a declarative syntax for many common configuration items and
also allows you to specify arbitrary commands for anything that falls outside of the predefined
declarative capabilities. This lets the file act like a configuration file for common tasks, while
maintaining the flexibility of a script for more complex functionality.
You must pre-install the utility in the operating system image used to create VMs.
Cloud-Init runs early in the boot process and configures the operating system on the basis of
data that you provide. You can use Cloud-Init to automate tasks such as:
• Installing packages
• Copying files
• Bootstrapping other configuration management tools such as Chef, Puppet, and Salt
Hosts read and write data in shared Nutanix datastores as if they were connected to a SAN.
From the perspective of a hypervisor host, the only difference is the improved performance
that results from data not traveling across a traditional SAN. VM data is stored locally and
replicated on other nodes for protection against hardware failure.
When a guest VM submits a write request through the hypervisor, that request is sent to
the Controller VM on the host. To provide a rapid response to the guest VM, this data is first
stored on the metadata drive within a subset of storage called the oplog. This cache is rapidly
distributed across the 10GbE network to other metadata drives in the cluster.
Oplog data is periodically transferred to persistent storage within the cluster. Data is written
locally for performance and replicated on multiple nodes for high availability.
When the guest VM sends a read request through the hypervisor, the Controller VM reads
from the local copy first. If the host does not contain a local copy, then the Controller VM reads
across the network from a host that does contain a copy. As remote data is accessed, the
remote data is migrated to storage devices on the current host so that future read requests can
be local.
Live Migration
The Nutanix Enterprise Cloud Computing Platform fully supports live migration of VMs, whether
initiated manually or through an automatic process. All hosts within the cluster have visibility
into shared Nutanix datastores through the Controller VMs. Guest VM data is written locally and
is also replicated on other nodes for high availability.
If you migrate a VM to another host, future read requests are sent to a local copy of the data
(if it exists). Otherwise, the request is sent across the network to a host that does contain the
requested data. As remote data is accessed, the remote data is migrated to storage devices on
the current host, so that future read requests can be local.
The Nutanix cluster automatically selects the optimal path between a hypervisor host and its
guest VM data. The Controller VM has multiple redundant paths available, which makes the
cluster more resilient to failures.
Flash mode is configured when you update the VM configuration. In addition to modifying the
configuration, you can attach a volume group to the VM and enable flash mode on the VM. If
you attach a volume group to a VM that is part of a protection domain, the VM is not protected
automatically. Add the VM to the same consistency group manually.
To enable flash mode on the VM, click the Enable Flash Mode check box.
After you enable this feature on the VM, the status is updated in the VM table view. To view the
status of individual virtual disks (disks that are flashed to the SSD), go to the Virtual Disks tab in
the VM table view.
You can disable the flash mode feature for individual virtual disks. To update the flash mode
for individual virtual disks, click the update disk icon in the Disks pane and deselect the Enable
Flash Mode check box.
Labs
1. Uploading an image
Module
6
HEALTH MONITORING AND ALERTS
Overview
After completing this module, you will be able to:
Health Monitoring
Nutanix provides a range of status checks to monitor the health of a cluster.
• Summary health status information for VMs, hosts, and disks displays on
the Home dashboard.
• In-depth health status information for VMs, hosts, and disks is available through
the Health dashboard.
You can:
• Run Nutanix Cluster Check (NCC) health checks directly from Prism.
Note: If the Cluster Health service status is down for more than 15 minutes, an
alert email is sent by the AOS cluster to configured addresses and to Nutanix
support (if selected). In this case, no alert is generated in the Prism web
console. The email is sent once every 24 hours. You can run the NCC check
cluster_services_down_check to see the service status.
Nutanix Cluster Check (NCC) is a framework of scripts that can help diagnose cluster health.
You can run checks from the Prism Web Console or the CVM command line. NCC actions are
grouped into plugins and modules. Plugins are objects that run the diagnostic commands.
Modules are logical groups of plugins that can be run as a set. You can run individual or multiple
simultaneous health checks from either the Prism Web Console or the command line. When run
from the CVM command line, NCC generates a log file with the output of diagnostic commands
selected by the user. A similar log file is generated when the web console is used, but it is less
easy to read than the one generated by the command line.
NCC can be run as long as the individual nodes are up, regardless of cluster state. The scripts
run standard commands against the cluster or nodes based on the type of information being
retrieved.
Note: Some plug-ins run nCLI commands and might require the user to input the
nCLI password. The password is logged in plain text.
If you change the default password of the admin user, you must specify the password every
time you start an nCLI session from a remote system.
If you are logged onto a CVM, a password is not required for nCLI. Comprehensive
documentation of NCC is available in the Nutanix AOS 5.10 Command Reference on the
Support Portal.
• Before and after activities such as adding, removing, reconfiguring, or upgrading nodes
NCC Syntax
The general syntax of NCC is:
$ ncc <ncc-flags> <module> <sub-module> [...] <plugin> <plugin-flags>
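For example (a sketch; the available module and plug-in names depend on your NCC version), you
can run every health check or only the checks in a single module:
nutanix@CVM$ ncc health_checks run_all
nutanix@CVM$ ncc health_checks network_checks run_all
An individual plug-in, such as the cluster_services_down_check mentioned earlier, is run by naming
its full module path in the same way.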
NCC Output
The Type column in the output of $ ncc distinguishes between modules (M) and plug-ins (P).
$ cluster status
After installing the cluster, you need to verify that it is set up correctly using NCC:
$ ncc health_checks run_all
NCC checks for common misconfiguration problems and verifies settings are correct. An
example of a common CVM misconfiguration problem is using 1GbE NICs instead of 10GbE
NICs.
Usage Examples
Note: The flags override the default configurations of the NCC modules and
plug-ins. Do not run these flags unless your cluster configuration requires these
modifications.
ncc_installer_filename.sh
$ ncc health_checks run_all
Running Checks
In addition to running various cluster health checks, it is also possible to run NCC health checks
in parallel.
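As an example (the --parallel flag is an assumption to verify against the NCC guide for your
release), you can control how many checks run at the same time:
nutanix@CVM$ ncc health_checks run_all --parallel=4
Running more checks in parallel shortens the overall run time at the cost of additional load on the
CVMs.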
NCC allows you to set the frequency of cluster checks and to email the results of these checks.
By default, this feature is disabled. If you enable and configure this feature, NCC will:
• Run the checks periodically according to a frequency you have set
• Email the results of the checks to users who you have also configured to receive alert emails.
This feature uses the settings and infrastructure of the Alert Email Configuration feature,
which can send alert information automatically to Nutanix customer support and others.
• Run as configured even if you have upgraded your cluster's AOS or NCC version after
configuring this feature.
Note: For step-by-step instructions on how to configure check frequency and email
notifications, see the Nutanix Cluster Check 2.x Guide.
Health Dashboard
The Health dashboard displays dynamically updated health information about VMs, hosts, and
disks in the cluster. To view the Health dashboard, select Health from the drop-down menu on
the left of the main menu.
The left column displays tabs for each entity type (VMs, hosts, disks, storage pools, storage
containers, cluster services, and [when configured] protection domains and remote sites). Each
tab displays the entity total for the cluster (such as the total number of disks) and the number
in each health state. Clicking a tab expands the displayed information (see the following
section).
The middle column displays more detailed information about whatever is selected in the left
column.
The right column displays a summary of all the health checks. You also have the option to view
individual checks from the Checks button (success, warning, failure, or disabled).
• The Summary tab provides a summarized view of all the health checks according to check
status and check type.
• The Checks tab provides information about individual checks. Hovering the cursor over an
entry displays more information about that health check. You can filter checks by clicking
the appropriate field type and clicking Apply.
• The Actions tab provides you with options to manage checks, run checks, and collect logs.
Cluster health checks cover a range of entities including AOS, hypervisor, and hardware
components. A set of checks are enabled by default, but you can run, disable, or reconfigure
any of the checks at any time to suit your specific needs.
To configure health checks, from the Actions menu on the Health dashboard, click Manage
Checks.
The displayed screen lists all checks that can be run on the cluster, divided into categories
including CVM, Cluster, Data Protection, File Server, Host, and so on. Sub-categories include
CPU, disk, and hardware for CVMs; Network, Protection Domains, and Remote Sites for
Clusters; CPU and disk for hosts; and so on.
Selecting a check from the left pane will allow you to:
• View a history of all entities evaluated by this check, displayed in the middle of the screen.
• View causes and resolutions, as well as supporting reference articles on the Nutanix
Knowledge Base.
Set NCC Frequency allows you to configure the run schedule for Nutanix Cluster Checks and
view e-mail notification settings. The available frequencies are:
• Every 4 hours
• Every day
• Every week
You can set the day of the week and the start time for the checks where appropriate.
A report is sent to all e-mail recipients shown. You can configure e-mail notifications using
the Alert Email Configuration menu option.
Collecting Logs
Logs for your Nutanix cluster and its various components can be collected directly from the
Prism web console. Logs can be collected for Controller VMs, file server, hardware, alerts,
hypervisor, and for the system. The most common scenarios in which you will need to collect
logs are when troubleshooting an issue, or when you need to provide information for a Nutanix
Support case.
1. On the Health dashboard, click Actions on the right pane and select Log Collector.
2. Select the period for which you want to collect logs, either by choosing a duration in hours
or by setting a custom date range.
After you run Log Collector and the task completes, the bundle will be available to download.
Analysis Dashboard
The Analysis dashboard allows you to create charts that can dynamically monitor a variety of
performance measures.
Chart definitions
The pane on the left lists the charts that can be run. No charts are provided by default, but you
can create any number of charts. A chart defines the metrics to monitor.
Chart monitors
When a chart definition is checked, the monitor appears in the middle pane. An Alerts monitor
always displays first. The remaining displayed monitors are determined by which charts are
checked in the left pane. You can customize the display by selecting a time interval from the
Range drop-down (above the charts) and then refining the monitored period by moving the
time interval end points to the desired length.
Alerts
Any alerts that occur during the interval specified by the timeline in the middle pane display in
the pane on the right.
1. On the Analysis dashboard, click New and select either a Metric chart or an Entity chart.
• For Metric charts, select the metric you want to monitor, the entity type, and then a list of
entities.
• For Entity charts, select the entity type, then the specific entity and all the metrics you want
to monitor on that entity.
Alerts Dashboard
The Alerts dashboard displays alert and event messages.
Alerts View
Two viewing modes are available: Alerts and Events. The Alerts view, shown above, lists
all alert messages and can be sorted by source entity, impact type, severity, resolution,
acknowledgement, and time of creation.
Informational: An "informational" alert highlights a condition to be aware of, for example, a reminder that the support tunnel is enabled.
Resolved (values: user and time, or No): Indicates whether a user has set the alert as resolved. Resolving an error means you set that error as fixed. (The alert may return if the condition is scanned again at a future point.) If you do not want to be notified about the condition again, turn off the alert for this condition.
Acknowledged (values: user and time, or No): Indicates whether the alert has been acknowledged. Acknowledging an alert means you recognize the error exists (no more reminders for this condition), but the alert status remains.
Create Time (value: time and date): Displays the date and time when the alert occurred.
Alert email notifications are enabled by default. This feature sends alert messages automatically
to Nutanix customer support through customer-opened ports 80 or 8443. To automatically
receive email notification alerts, ensure that nos-alerts and nos-asup recipients are added to the
accepted domain of your SMTP server. To customize who should receive the alert e-mails (or to
disable e-mail notification), use the Alert Email Configuration settings, which define:
• The rules that govern when and to whom emails will be sent
Events View
The Event messages view displays a list of event messages. Event messages describe cluster
actions such as adding a storage pool or taking a snapshot. This view is read-only and you do
not need to take any action like acknowledging or resolving generated events.
To filter the list, click the filter icon on the right of the screen. This displays a pane (on the
right) for selecting filter values. Check the box for each value to include in the filter. You can
include multiple values. The values are for event type (Behavioral Anomaly, System Action, User
Action) and time range (Last 1 hour, Last 24 hours, Last week, From XXX to XXX). You can also
specify a cluster. The selected values appear in the filter field above the events list. You can do
the following in the current filters field:
• Save the filter list by clicking the star icon. You can save a maximum of 20 filter lists per
entity type.
• Use a saved filter list by selecting it from the drop-down list.
Create Time (value: time and date): Displays the date and time when the event occurred.
Labs
1. Creating a performance chart
3. Managing alerts
Module
7
DISTRIBUTED STORAGE FABRIC
Overview
After completing this module, you will be able to:
The Distributed Storage Fabric (DSF) is a scalable distributed storage system which exposes
NFS/SMB file storage as well as iSCSI block storage with no single points of failure. The
distributed storage fabric stores user data (VM disk/files) across storage tiers (SSDs, Hard
Disks, Cloud) on multiple nodes. The DSF also supports instant snapshots, clones of VM disks
and other advanced features such as deduplication, compression and erasure coding.
The DSF logically divides user VM data into extents which are 1MB in size. These extents may
be compressed, erasure coded, deduplicated, snapshotted or left untransformed. Extents can
also move around; new or recently accessed extents stay on faster storage (SSD) while colder
extents move to HDD. The DSF utilizes a “least recently used” algorithm to determine what data
can be declared “cold” and migrated to HDD. Additionally, the DSF attempts to maintain data
locality for VM data – so that one copy of each vDisk’s data is available locally from the CVM on
the host where the VM is running.
DSF presents SSDs and HDDs as a storage pool and provides cluster-wide storage services:
• Snapshots
• Clones
• HA/DR
• Deduplication
• Compression
• Erasure coding
The Controller VMs (CVMs) running on each node combine to form an interconnected
network within the cluster, where every node in the cluster has access to data from shared
SSD, HDD, and cloud resources. The CVMs allow for cluster-wide operations on VM-centric
software-defined services: snapshots, clones, high availability, disaster recovery, deduplication,
compression, erasure coding, storage optimization, and so on.
Hypervisors (AHV, ESXi, Hyper-V) and the DSF communicate using the industry-standard
protocols NFS, iSCSI, and SMB3.
The extent store is the persistent bulk storage of DSF and spans SSD and HDD and is extensible
to facilitate additional devices/tiers. Data entering the extent store is either drained from the
OpLog or is sequential in nature and has bypassed the OpLog directly.
Nutanix ILM will determine tier placement dynamically based upon I/O patterns and will move
data between tiers.
The OpLog
The OpLog is similar to a filesystem journal and is used to service bursts of random write
operations, coalesce them, and then sequentially drain that data to the extent store. For each
write OP, the data is written to the local OpLog and synchronously replicated to the OpLogs of
one or more other CVMs before the write is acknowledged, for data availability purposes. The
number of copies is determined by the replication factor (RF 2 or 3) of the container.
All CVMs participate in OpLog replication. Individual replica location is dynamically chosen based
upon load. The OpLog is stored on the SSD tier on the CVM to provide extremely fast write I/O
performance. OpLog storage is distributed across the SSD devices attached to each CVM.
For sequential workloads, the OpLog is bypassed and the writes go directly to the extent store.
If data is currently sitting in the OpLog and has not yet been drained, all read requests for that
data are fulfilled directly from the OpLog until it has been drained, after which reads are served
by the extent store/unified cache.
For containers where fingerprinting (aka Dedupe) has been enabled, all write I/Os will be
fingerprinted using a hashing scheme allowing them to be deduplicated based upon fingerprint
in the unified cache.
Going through the hypervisor, DSF sends write operations to the CVM on the local host, where
they are written to either the OpLog or the extent store. In addition to the local copy, an additional
write operation is then distributed across the 10 GbE network to other nodes in the cluster.
Going through the hypervisor, read operations are sent to the local CVM which returns data
from a local copy. If no local copy is present, the local CVM retrieves the data from a remote
CVM that contains a copy.
The file system automatically tiers data across different types of storage devices using
intelligent data placement algorithms. These algorithms make sure that the most frequently
used data is available in memory or in flash for the fastest possible performance.
You can create, migrate, and manage VMs within Nutanix datastores as you would with any
other storage solution.
AHV Storage
Prism or aCLI is used to configure all AHV storage.
• Storage Pool
A storage pool is a group of physical storage devices for the cluster including PCIe
SSD, SSD, and HDD devices. The storage pool spans multiple nodes and scales as the
cluster expands. A storage device can only be a member of a single storage pool. Nutanix
recommends creating a single storage pool containing all disks within the cluster.
• Storage Container
A storage container is a subset of available storage within a storage pool. Storage containers
enable an administrator to apply rules or transformations such as compression to a data set.
They hold the virtual disks (vDisks) used by virtual machines. Selecting a storage pool for a
new storage container defines the physical disks where the vDisks are stored.
• Volume Group
Each volume group contains a UUID, a name, and an iSCSI target name. Each disk in the volume
group also has a UUID and a LUN number that specifies ordering within the volume group.
You can include volume groups in protection domains configured for asynchronous data
replication (Async DR) either exclusively or with VMs.
Volume groups cannot be included in a protection domain configured for Metro Availability,
in a protected VStore, or in a consistency group for which application consistent
snapshotting is enabled.
• vDisk
A vDisk is a subset of available storage within a storage container that provides storage
to virtual machines. A vDisk is any file over 512 KB on DSF, including VMDKs and VM disks.
vDisks are broken up into extents, which are grouped and stored on physical disk as an
extent group.
• Datastore
A datastore is a hypervisor construct that provides a logical container for files necessary for VM
operations. In the context of the DSF, each container on a cluster is a datastore.
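As a rough nCLI sketch of creating and listing a storage container (the container and storage pool names are illustrative, and the exact parameters should be verified in the AOS Command Reference):
ncli> container create name="ctr1" sp-name="default-storage-pool"
ncli> container ls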
Clones
As mentioned in the introduction to this module, a vDisk is composed of extents, which
are logically contiguous chunks of data. Extents are stored within extent groups, which are
physically contiguous data stored as files on the storage devices. When a snapshot or clone is
taken, the base vDisk is marked immutable and another vDisk is created where new data will be
written.
At creation, both vDisks have the same block map, which is a metadata mapping of the vDisk to
its corresponding extents. Unlike traditional approaches which require traversal of the snapshot
chain to locate vDisk data (which can add read latency), each vDisk has its own block map.
This eliminates any of the overhead normally seen by large snapshot chain depths and allows
multiple snapshots to be taken without any performance impact.
Shadow Clones
A clone is a duplicate of a vDisk, which can then be modified.
A shadow clone, on the other hand, is a cache of a vDisk on all the nodes in the cluster. When a
vDisk is read by multiple VMs (such as the base image for a VDI clone pool), the cluster creates
shadow clones of the vDisk. Shadow clones are enabled by default.
Snapshotting Disks
Snapshots for a VM are crash consistent, which means that the VMDK on-disk images are
consistent with a single point in time. That is, the snapshot represents the on-disk data as if the
VM crashed. The snapshots are not, however, application consistent, meaning that application
data is not quiesced at the time the snapshot is taken.
For a breakdown of the differences in snapshots for different hypervisors and operating
systems, with different statuses of NGT, see the following table.
NGT status for Linux VMs: Installed and Active, with pre_freeze and post_thaw scripts present.
• ESXi: Nutanix script-based VSS snapshots
• AHV: Nutanix script-based VSS snapshots
Deduplication Process
The Elastic Deduplication Engine is a software-based feature of DSF that allows for data
deduplication in the capacity (Extent Store) and performance (Unified Cache) tiers. Incoming
data is fingerprinted during ingest using a SHA-1 hash at a 16 K granularity. This fingerprint is
then stored persistently as part of the written block’s metadata.
Contrary to traditional approaches, which utilize background scans requiring the data to be
reread, Nutanix creates the fingerprint inline on ingest. For data being deduplicated in the
capacity tier, the data does not need to be scanned or reread – matching fingerprints are
detected and duplicate copies can be removed.
Block-level deduplication looks within a file and saves unique iterations of each block. All the
blocks are broken into chunks. Each chunk of data is processed using an SHA-1 hash algorithm.
This process generates a unique number for each piece: a fingerprint.
The fingerprint is then compared with the index of existing fingerprints. If it is already in
the index, the piece of data is considered a duplicate and does not need to be stored again.
Otherwise, the new hash number is added to the index and the new data is stored.
If you update a file, only the changed data is saved, even if only a few bytes of the document
or presentation have changed. The changes do not constitute an entirely new file. This behavior
makes block deduplication (compared with file deduplication) far more efficient.
However, block deduplication takes more processing power and uses a much larger index to
track the individual pieces.
To reduce metadata overhead, fingerprint reference counts (refcounts) are monitored during
the deduplication process. Fingerprints with low refcounts will be discarded. Full extents are
preferred for capacity tier deduplication in order to minimize fragmentation.
When used in the appropriate situation, deduplication makes the effective size of the
performance tier larger so that more active data can be stored. This matters because the
performance of guest VMs suffers when active data can no longer fit in the performance tier.
Deduplication Techniques
Inline deduplication is useful for applications with large common working sets.
Post-process deduplication is useful for virtual desktops (VDI) with full clones; it reduces
redundant data in the capacity tier, increasing the effective storage capacity of a cluster.
For sequential data that is written and compressed inline, the RF copy of the data is
compressed before transmission, further increasing performance since it is sending less data
across the network.
Inline compression also pairs well with erasure coding. Compression algorithms work, for
example, by using a dictionary to represent a string of bits with a smaller string, or by inserting
a reference or pointer to a string of 0s and 1s that the program has already seen.
Text compression can be as simple as removing all unneeded characters, inserting a single
repeat character to indicate a string of repeated characters, and substituting a smaller bit
string for a frequently occurring bit string. Data compression can often reduce a text file to
50 percent of its original size, or significantly less.
Compression Process
Inline compression condenses sequential streams of data or large I/O sizes (>64K) when
written to the Extent Store (SSD + HDD). This includes data draining from oplog as well as
sequential data skipping it.
Offline compression initially writes the data in an uncompressed state and then leverages the
Curator framework to compress the data cluster-wide. When inline compression is enabled
but the I/O operations are random in nature, the data is written uncompressed in the oplog,
coalesced, and then compressed in memory before being written to the Extent Store.
Nutanix leverages LZ4 for initial data compression, which provides a very good blend between
compression and performance. For cold data, Nutanix uses LZ4HC to provide an improved
compression ratio.
Data compression tends to be more effective than deduplication in reducing the size of unique
information, such as images, audio, videos, databases, and executable files.
Workloads that frequently update data (for example, virtualized applications for power users,
such as CAD) are not good candidates for compression.
Deduplication is most effective in environments that have a high degree of redundant data,
such as virtual desktop infrastructure or storage backup systems.
• User data (file server, user data, vDisk): Post-process compression with a 4-6 hour delay
• VDI (VMware View, Citrix XenDesktop): VCAI snapshots, linked clones, or full clones with inline dedup (not container compression)
Note: Nutanix does not recommend turning on deduplication for VAAI (vStorage
APIs for Array Integration) clone or linked clone environments.
Replication Factor
When you configure the DSF with a replication factor of 2 or 3, the Nutanix cluster maintains
two or three exact copies of the same data on different nodes to ensure data availability. The
actual logical capacity available depends on the replication factor you choose. When you
use replication factor 2 (also called fault tolerance 1), you have approximately 50 percent
capacity available. When you use replication factor 3 (also called fault tolerance 2), you have
approximately 33 percent capacity available.
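As a simple illustration of these percentages (the raw capacity figure is hypothetical): in a cluster with 60 TB of raw storage, replication factor 2 yields roughly 30 TB of usable logical capacity, while replication factor 3 yields roughly 20 TB.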
Before EC-X starts, the data must be write-cold (in other words, there has been no write access
to the data for seven days). The amount of space you can save when using EC-X varies based
on several factors.
EC-X works at the extent group layer, meaning it uses 1 or 4 MB data sets when performing its
calculations. By default, the cluster automatically uses extent groups that belong to the same
virtual disk (vDisk), as this method makes it easier to perform garbage cleanup, but the cluster
can use extent groups from different vDisks if necessary. vDisks on the DSF are made of virtual
blocks (vBlocks), which are 1 MB chunks of virtual address space. Each vDisk in the system is
owned by a Nutanix CVM that typically runs on the same Nutanix node as the VM the vDisk
belongs to.
Traditional RAID
• Slow rebuilds
• Hardware-defined
Erasure Coding
EC-X increases effective or usable capacity on a cluster. The savings from enabling EC-X are in
addition to the savings from deduplication and compression.
EC-X Process
Erasure coding is performed post-process and leverages the Curator MapReduce framework
for task distribution.
Since this is a post-process framework, the traditional write I/O path is unaffected. In this
scenario, primary copies of both RF2 and RF3 data are local and replicas are distributed on the
remaining cluster nodes.
When Curator runs a full scan, it finds eligible extent groups for encoding. Eligible extent
groups must be "write-cold", meaning they have not been overwritten for a defined amount of
time. For regular vdisks, this time period is 7 days. For snapshot vdisks, it is 1 day.
After erasure coding finds the eligible candidates, Chronos will distribute and throttle the
encoding tasks.
Pros
• Increases usable capacity of raw storage.
Cons
EC-X Workloads
Recommended workloads for erasure coding (workloads not requiring high I/O):
- Backups
- Archives
- File servers
- Log servers
• VDI is not capacity-intensive thanks to intelligent cloning (so EC-X advantages are minimal).
Once the data becomes cold, the erasure code engine computes double-parity for the data
copies by taking all the data copies (‘d’) and performing an exclusive OR operation to create
one or more parity blocks. With the two parity blocks in place, the 2nd and 3rd copies are
removed.
You end up with 12 (the original three copies) + 2 (parity) - 8 (removal of the second and third
copies) = 6 blocks, which is a storage savings of 50%.
EC-X makes a strip from existing data to create parity. The strip width depends on the number
of nodes in the Nutanix cluster and the data replication factor configured for the Nutanix
container.
EC-X tries to delete the copy of the extent group that is not local to the CVM. For example,
if VM1 runs on node1 and has egroup1 on node1 and node2, the DSF keeps the egroup1 copy
on node1 after the EC-X operation. EC-X places the parity extent group on a different node
(not node1) and does not compress the parity bit, even if compression is enabled at the DSF
container level. In a hybrid system, the DSF places the parity bit on HDD if possible.
In a 6-node cluster configured with redundancy factor 2, erasure coding uses a stripe size of 5:
4 nodes for data and 1 node for parity. The sixth node in the cluster ensures that if a node fails,
a node is available for rebuild. With a stripe of 4 data to 1 parity, the overhead is 25%. Without
erasure coding, the overhead is 100%.
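As a worked example of those overhead figures: protecting 4 data blocks with replication factor 2 requires 8 blocks in total (100% overhead), whereas a 4/1 erasure coded strip requires only 5 blocks (4 data + 1 parity, or 25% overhead), saving 3 of every 8 blocks.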
Erasure coding stripe size adapts to the size of the cluster, starting with the minimum of 4
nodes and growing to a maximum stripe width of 5 nodes.
Best Practices:
1. A cluster must have at least four nodes populated with each storage tier (SSD/HDD)
represented to enable erasure coding.
2. Avoid strips greater than (4, 1) because capacity savings provide diminishing returns and the
larger strip size increases the cost of rebuild.
3. Erasure coding effectiveness (data reduction savings) might be reduced on workloads that
have many overwrites outside of the erasure coding window, which by default is 7 days.
5. Erasure coding is an asynchronous process, so space savings might not appear for some
time.
6. Multiple node removal operations can break the erasure coded strip. If it is necessary to
remove multiple nodes from a cluster that uses erasure coding, turn off erasure coding
before removing the nodes.
7. If erasure coding is enabled on any storage container, a minimum of four blocks for RF2 or
six blocks for RF3 is required to maintain block awareness.
8. Erasure coding mandates that data and parity strips must be placed on separate failure
domains (node) and that there is an additional node available for recovery. For example, a
strip size of (4, 1) requires you to have at least six nodes.
Labs
1. Creating a container with compression enabled
Module
8
MIGRATING WORKLOADS TO AHV
Objectives
When you have completed this module, you will be able to describe how to migrate workloads
using Nutanix Move.
Nutanix Move
Nutanix Move is a freely distributed application that supports migration from a non-Nutanix
source to a Nutanix target with minimal downtime. Supported migrations include:
• Migration of Amazon Elastic Block Store (EBS)-backed EC2 instances running on AWS.
- When you are migrating from ESXi to AHV, Nutanix Move directly communicates with
vCenter through the Management Server and the Source Agent. The Source Agent
collects information about the VM being migrated (guest VM) from the VMware library.
Note: Adding a single AWS account as a source with multiple IAM users is not
supported.
• Changed Block Tracking (CBT) driver: a driver running on the source VMs to be migrated to
facilitate efficient transfer of data from the source to the target. Move deploys the driver as
part of the source VM preparation and removes it during post migration cleanup.
In case of migration from AWS to AHV, NTNX-MOVE-AGENT runs on AWS as an EC2 instance
to establish connection between AWS and Nutanix Move. Nutanix Move takes snapshots of
the EBS volumes of the VMs for the actual transfer of data for the VM being migrated (guest
VM). The CBT driver computes the list of blocks that have changed to optimally transfer only
changed blocks of data on the disk. The data path connection between NTNX-MOVE-AGENT
and Nutanix Move is used to transfer data from AWS to the target Nutanix Cluster.
After the migration of the VM from the source to the target, Nutanix Move deletes all EBS
volume snapshots taken by it.
Note: Nutanix Move does not store other copies of the data.
Note: For AWS, the migration takes place in a powered-on state. For ESXi, the
power state is retained.
• Schedule migration.
• Schedule data-seeding for the virtual machines in advance and cut over to a new AHV
cluster.
• Manage VM migrations between multiple clusters from a single management interface.
• Migrate all AHV certified OSs (see the Supported Guest VM Types for AHV section of the
AHV Admin Guide on the Support Portal).
Compatibility Matrix
Unsupported Features
• IPv6
• Windows VMs installed with any antivirus software. Antivirus software prevents the
installation of the VirtIO drivers.
To get started with Nutanix Move, first download the Nutanix Move appliance and deploy it on
a target cluster. If you are migrating to multiple AHV clusters, you can deploy Nutanix Move on
any one of the target clusters. Once the installation has completed, configure the Move
environment and build a Migration Plan using the Move interface.
Labs
1. Preparing a VM for Migration
2. Deploying a Move VM
3. Configuring Move
Module
9
FILES AND VOLUMES
Overview
After completing this module, you will be able to:
Nutanix Volumes
Nutanix Volumes is a native scale-out block storage solution that enables enterprise
applications running on external servers to leverage the benefits of the hyperconverged
Nutanix architecture, accessing the Nutanix DSF via the iSCSI protocol.
Nutanix Volumes offers a solution for workloads that may not be a fit for running on virtual
infrastructure but still need highly available and scalable storage: for example, workloads
requiring locally installed peripheral adapters, workloads with high socket-count compute
demands, or workloads with licensing constraints.
Nutanix Volumes enables you to create a shared infrastructure providing block-level iSCSI
storage for physical servers without compromising availability, scalability, or performance. In
addition, you can leverage efficient backup and recovery techniques, dynamic load-balancing,
LUN resizing, and simplified cloning of production databases. You can use Nutanix Volumes to
export Nutanix storage for use with applications like Oracle databases including Oracle RAC,
Microsoft SQL Server, and IBM Db2 running outside of the Nutanix cluster.
Every CVM in a Nutanix cluster can participate in presenting storage, allowing individual
applications to scale out for high performance. You can dynamically add or remove Nutanix
nodes, and by extension CVMs, from a Nutanix cluster.
Nutanix manages storage allocation and assignment for Volumes through a construct called a
volume group (VG). A VG is a collection of “volumes,” more commonly referred to as virtual
disks (vDisks). Volumes presents these vDisks to both VMs and physical servers, which we refer
to as “hosts” unless otherwise specified.
vDisks represent logical “slices” of the ADSF’s container, which are then presented to the hosts
via the iSCSI protocol. vDisks inherit the properties (replication factor, compression, erasure
coding, and so on) of the container on which you create them. By default, these vDisks are
thinly provisioned. Because Nutanix uses iSCSI as the protocol for presenting VG storage, hosts
obtain access based on their iSCSI Qualified Name (IQN). The system uses IQNs as a whitelist
and attaches them to a VG to permit access by a given host. You can use IP addresses as an
alternative to IQNs for VG attachment. Once a host has access to a VG, Volumes discovers the
VG as one or more iSCSI targets. Upon connecting to the iSCSI targets, the host discovers the
vDisks as SCSI disk devices. The figure above shows these relationships.
• Disks as first-class entities - execution contexts are ephemeral and data is critical.
See Converting Volume Groups and Updating Clients to use Volumes for more information.
iSCSI Qualified Name (IQN) is one of the naming conventions used by iSCSI to identify initiators
and targets. IQN is documented in RFC 3720 and can be up to 255 characters long. An IQN
takes the form iqn.yyyy-mm.<naming-authority>:<unique_character_string>, where:
• yyyy-mm: The year and month the naming authority was established.
• naming-authority: Usually a reverse syntax of the Internet domain name of the naming authority.
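For example, a Windows initiator typically presents an IQN such as iqn.1991-05.com.microsoft:sqlserver01, where 1991-05 and com.microsoft identify the naming authority and the trailing string identifies the individual host (the hostname shown is hypothetical).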
CHAP (Challenge-Handshake Authentication Protocol) verifies identity through the use of an
incrementally changing identifier and a variable challenge-value. CHAP requires that both the
client and server know the plaintext of the secret, although it is never sent over the network.
Mutual CHAP authentication. With this level of security, the target and the initiator authenticate
each other. CHAP sets a separate secret for each target and for each initiator.
• The hosts HostA and HostB have their iSCSI initiators configured to communicate with the
iSCSI target (data services IP).
Before we get to configuration, we need to configure the data services IP that will act as our
central discovery/login portal.
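As a sketch (the address is a placeholder, and the parameter name should be confirmed in the AOS Command Reference for your release), the data services IP can be set from nCLI:
ncli> cluster edit-params external-data-services-ip-address=10.10.10.100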
Volume groups (VGs) work with ESXi, Hyper-V, and AHV for iSCSI connectivity. AHV also
supports attaching VGs directly to VMs. In this case, the VM discovers the vDisks associated
with the VG over the virtual SCSI controller.
You can use VGs with traditional hypervisor vDisks. For example, some VMs in a Nutanix cluster
may leverage .vmdk or .vhdx based storage on Network File System (NFS) or Server Message
Block (SMB), while other hosts leverage VGs as their primary storage.
VMs utilizing VGs have, at a minimum, their boot and operating system drives presented as
hypervisor vDisks. You can manage VGs from Prism or from a preferred CLI such as aCLI, nCLI,
or PowerShell. Within Prism, the Storage page lets you create and monitor VGs.
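As a rough aCLI sketch of the volume group workflow described above (the names, size, and initiator IQN are illustrative; verify the command set in the AHV Admin Guide for your release):
acli vg.create vg1
acli vg.disk_create vg1 container=ctr1 create_size=100G
acli vg.attach_external vg1 iqn.1991-05.com.microsoft:sqlserver01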
Nutanix Volumes presents a volume group and its vDisks as iSCSI targets and assigns IQNs.
Initiators or hosts have their IQNs attached to a volume group to gain access.
Multiple hosts can share the vDisks associated with a VG for the purposes of shared storage
clustering. A common scenario for using shared storage is in Windows Server failover
clustering. You must explicitly mark the VG for sharing to allow more than one external initiator
or VM to attach.
In some cases, Volumes needs to present a volume group to multiple VMs or bare metal servers
for features like clustering. The graphic shows how an administrator can present the same
volume group to multiple servers.
Volume groups are connected via the iSCSI Qualified Name (IQN), which follows the format
iqn.yyyy-mm.<naming-authority>:<unique_character_string>
Note: Allowing multiple systems to concurrently access this volume group can
cause serious problems.
Instead of configuring host iSCSI client sessions to connect directly to CVMs, Volumes uses
an external data services IP address. This data services IP acts as a discovery portal and initial
connection point. The data services address is owned by one CVM at a time. If the owner goes
offline, the address moves between CVMs, thus ensuring that it’s always available.
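From a Linux host using the standard open-iscsi tools, discovery and login against the data services IP look roughly like the following (the address is a placeholder):
$ iscsiadm -m discovery -t sendtargets -p 10.10.10.100:3260
$ iscsiadm -m node -l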
Once the affined Stargate is healthy for 2 or more minutes, the system quiesces and closes the
session, forcing another logon back to the affined Stargate.
Hosts read and write data in shared Nutanix datastores as if they were connected to a SAN.
Therefore, from the perspective of a hypervisor host, the only difference is the improved
performance that results from data not traveling across a network.
When a guest VM submits a write request through the hypervisor, that request is sent
to the Controller VM on the host. To provide a rapid response to the guest VM, Volumes first
stores this data on the metadata drive, within a subset of storage called the oplog. This cache is
rapidly distributed across the 10 GbE network to other metadata drives in the cluster. Volumes
periodically transfers oplog data to persistent storage within the cluster.
Volumes writes data locally for performance and replicates it on multiple nodes for High
Availability.
When the guest VM sends a read request through the hypervisor, the Controller VM reads from
the local copy first, if present. If the host does not contain a local copy, then the Controller VM
reads across the network from a host that does contain a copy. As Volumes accesses remote
data, the remote data is migrated to storage devices on the current host so that future read
requests can be local.
Labs
1. Deploying Windows or Linux VMs
Nutanix Files
Nutanix Files allows users to leverage the Nutanix platform as a highly available file server.
Files is a software-defined, scale-out file storage solution that provides a repository for
unstructured data, such as
• home directories
• user profiles
• departmental shares
• application logs
• backups
• archives
Flexible and responsive to workload requirements, Files is a fully integrated, core component of
the Nutanix Enterprise Cloud.
Unlike standalone NAS appliances, Files consolidates VM and file storage, eliminating the need
to create an infrastructure silo. Administrators can manage Files with Nutanix Prism, just like
VM services, unifying and simplifying management. Integration with Active Directory enables
support for quotas and access-based enumeration, as well as self-service restores with the
Windows previous versions feature. All administration of share permissions, users, and groups
is done using the traditional Windows MMC for file management. Nutanix Files also supports
file server cloning, which lets you back up Files off-site and run antivirus scans and machine
learning without affecting production.
Files is fully integrated into Microsoft Active Directory (AD) and DNS. This allows all the secure
and established authentication and authorization capabilities of AD to be leveraged.
Files is a scale-out approach that provides SMB and NFS file services to clients. Nutanix Files
server instances contain a set of VMs (called FSVMs). Files requires at least three FSVMs
running on three nodes to satisfy a quorum for High Availability.
Easy to Implement: Deploy in minutes, update non-disruptively with a single click, and manage
all storage from a single pane of glass.
Flexible: Scale-up or scale-out flexibly on the hardware of your choice and enjoy cloud-like
pay-as-you-grow consumption.
Intelligent: Know your data, who’s using it, and how—and then drive automated management
and control.
Nutanix Files consists of the following constructs, just like any file server:
• File server: High level namespace. Each file server has a set of file services VMs (FSVM)
deployed.
• Share: A file share is a folder that can be accessed by machines over a network. Access
to these shares is controlled by special Windows permissions called NTACLs, which are
typically set by the administrator. By default, domain administrators have full access and
domain users have read-only access to the home share. General purpose shares grant full
access to both domain administrators and domain users.
• Folder: Folders for storing files. Files shares folders across FSVMs.
The graphic above shows a high-level overview of File Services Virtual Machine (FSVM)
storage. Each FSVM leverages the Acropolis Volumes API for data storage. Files accesses the
API using in-guest iSCSI. This allows any FSVM to connect to any iSCSI target in the event of an
FSVM failure.
Load balancing occurs on two levels. First, a client can connect to any one of the FSVMs and
users can add FSVMs as needed. Second, on the storage side, Nutanix Files can redistribute
volume groups to different FSVMs for better load balancing across nodes. The following
situations prompt load balancing:
1. When removing an FSVM from the cluster, Files automatically load balances all its volume
groups across the remaining FSVMs.
2. During normal operation, the distribution of top-level directories becomes poorly balanced
due to changing client usage patterns or suboptimal initial placement.
3. When increased user demand necessitates adding a new FSVM, its volume groups are
initially empty and may require rebalancing.
Features
• Security descriptors
• Data streams
• OpLocks
• ESXi
• Many-to-one replication
Networking
Nutanix Files uses an external network and a storage network. IP addresses for each are
assigned from user-defined VLAN and IP address ranges.
• Storage network: The storage network enables communication between the file server VMs
and the Controller VM.
• Client-side network: The external network enables communication between the SMB clients
to the FSVMs. This allows Windows clients to access the Nutanix Files shares. Files also uses
the external network to communicate to the Active Directory and domain name servers.
High Availability
Nutanix Files provides two levels of High Availability:
To provide for path availability, Files leverages DM-MPIO within the FSVM, which has the active
path set to the local CVM by default.
CVM Failure
If a CVM goes offline because of failure or planned maintenance, Files disconnects any active
sessions against that CVM, triggering the iSCSI client to log on again. The new logon occurs
through the external data services IP, which redirects the session to a healthy CVM. When the
failed CVM returns to operation, the iSCSI session fails back. In the case of a failback, the FSVM's
iSCSI session is logged off and redirected to the appropriate CVM.
Node Failure
When a physical node fails completely, Files uses leadership elections and the local CVM to
recover. The FSVM sends heartbeats to its local CVM once per second, indicating its state and
that it’s alive. The CVM keeps track of this information and can act during a failover. During a
node failure, an FSVM on that host can migrate to another host. Any loss of service from that FSVM
then follows the FSVM failure scenario described below until the FSVM is restored on a new host.
FSVM Failure
When an FSVM goes down, the CVM unlocks the files from the downed FSVM and releases the
external address from eth1. The downed FSVM’s resources then appear on a running FSVM. The
internal Zookeeper instances store this information so that they can send it to other FSVMs if
necessary.
When an FSVM is unavailable, the remaining FSVMs volunteer for ownership of the shares
and exports that were associated with the failed FSVM. The FSVM that takes ownership of the
volume group informs the CVM that the volume group reservation has changed. If the FSVM
that attempts to take control of the volume group is already the leader for a different volume group
that it has volunteered for, it relinquishes leadership for the new volume group immediately.
This arrangement ensures distribution of volume groups, even if multiple FSVMs fail.
The Nutanix Files Zookeeper instance tracks the original FSVM’s ownership using the storage
IP address (eth0), which does not float from node to node. Because FSVM-1’s client IP address
from eth1 is now on FSVM-2, client connections persist. The volume group and its shares and
exports are reregistered and locked to FSVM-2 until FSVM-1 can recover and a grace period has
elapsed.
When FSVM-1 comes back up and finds that its shares and exports are locked, it assumes that
an HA event has occurred. After the grace period expires, FSVM-1 regains control of the volume
group through the CVM.
Module
10
MANAGING FAILURES
Overview
Data Resiliency describes the number and types of failures a cluster can withstand, as determined
by features such as redundancy factor and block or rack awareness.
After completing this module, you will be able to:
Scenarios
Component unavailability is an inevitable part of any datacenter lifecycle. The Nutanix
architecture was designed to address failures using various forms of hardware and software
redundancy.
A cluster can tolerate single failures of a variety of components while still running guest VMs
and responding to commands via the management console—all typically without a performance
penalty.
CVM Unavailability
A Nutanix node is a physical host with a Controller VM (CVM). Either component can fail
without impacting the rest of the cluster.
The Nutanix cluster monitors the status of CVMs in the cluster. If any Stargate process fails to
respond two or more times in a 30-second period, another CVM redirects hypervisor I/O on the
related host to another CVM. Read and write operations occur over the 10 GbE network until
the failed Stargate comes back online.
To prevent constant switching between Stargates, the data path is not restored until the
original Stargate has been stable for 30 seconds.
During the switching process, the host with a failed CVM may report that the shared storage
is unavailable. Guest VM IO may pause until the storage path is restored. Although the primary
copy of the guest VM data is unavailable because it is stored on disks mapped to the failed
CVM, the replicas of that data are still accessible.
As soon as the redirection takes place, VMs resume read and write I/O. Performance may
decrease slightly because the I/O is traveling across the network rather than across an internal
bus. Because all traffic goes across the 10 GbE network, most workloads do not diminish in a
way that is perceivable to users.
A second CVM failure has the same impact on the VMs on the other host, which means there
will be two hosts sending I/O requests across the network. More important is the additional risk
to guest VM data. With two CVMs unavailable, there are now two sets of physical disks that are
inaccessible. In a cluster with replication factor 2, there is now a chance that some VM data
extents are missing completely, at least until one of the failed CVMs resumes operation.
VM impact
• HA event: None
In the event of a CVM failure, the I/O operation is forwarded to another CVM in the cluster.
ESXi and Hyper-V handle this via a process called CVM autopathing, which leverages a Python
program called HA.py (like “happy”). HA.py modifies the routing table on the host to forward
traffic that is going to the internal CVM address (192.168.5.2) to the external IP of another CVM.
This enables the datastore to remain online - just the CVM responsible for serving the I/O
operations is remote. Once the local CVM comes back up and is stable the route is removed,
and the local CVM takes over all new I/O operations.
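Conceptually, the injected route behaves like the following ESXi command, which points traffic destined for the internal CVM address at a healthy CVM's external IP (the gateway address is hypothetical, and this is an illustration of the effect rather than the exact mechanism HA.py uses):
esxcfg-route -a 192.168.5.2/32 10.20.30.41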
AHV leverages iSCSI multipathing, where the primary path is the local CVM and the two other
paths are remote. In the event where the primary path fails, one of the other paths becomes
active. Similar to autopathing with ESXi and Hyper-V, when the local CVM comes back online it
takes over as the primary path.
In the event where the node remains down for a prolonged period (for example, 30 minutes),
the CVM is removed from the metadata ring. It is joined back into the ring after it has been up
and stable for a period of time.
Node Unavailability
The built-in data redundancy in a Nutanix cluster supports High Availability (HA) provided by
the hypervisor. If a node fails, all HA-protected VMs can be automatically restarted on other
nodes in the cluster.
Curator and Stargate respond to two issues that arise from the host failure:
• When the guest VM begins reading across the network, Stargate begins migrating those
extents to the new host. This improves performance for the guest VM.
• Curator responds to the host and CVM being down by instructing Stargate to create new
replicas of the missing vDisk data.
Users who are accessing HA-protected VMs will notice that their VM is unavailable while it is
restarting on the new host. Without HA, the VM needs to be manually restarted.
Depending on the cluster workload, a second host failure could leave the remaining hosts with
insufficient processing power to restart the VMs from the second host. Even in lightly loaded
clusters, the larger concern is additional risk to guest VM data. For example, if a second host/
CVM fails before the cluster heals and its physical disks are inaccessible, some VM data will be
unavailable.
Remember, with replication factor 2 (RF2, set at the storage container level) there are two copies
of all data. If two nodes go offline simultaneously, it is possible to lose both the primary and
replica copies of some data. If this is unacceptable, implement replication factor 3 at the storage
container level, or redundancy factor 3, which applies to the full cluster.
Drive Unavailability
Drives in a Nutanix node store four primary types of data:
• Storage metadata
• Oplog
• Persistent data (hot and cold tiers)
• CVM boot files
Cold-tier persistent data is stored on the hard-disk (HDD) drives of the node. Storage metadata,
oplog, hot-tier persistent data, and CVM boot files are kept in the serial AT attachment solid
state drive (SATA-SSD) in drive bay one. SSDs in a dual-SSD system are used for storage
metadata, oplog, and hot-tier persistent data, according to the replication factor of the system. CVM
boot and operating system files are stored on the first two SSD devices in a RAID-1 (mirrored)
configuration. In all-flash nodes, data of all types is stored in the SATA-SSDs.
When a boot DOM (SATA DOM for NX hardware) fails, the node will continue to operate
normally as long as the hypervisor or CVM does not reboot. After a DOM failure, the hypervisor
or CVM on that node will no longer be able to boot as their boot files reside on the DOM.
Note: The CVM restarts if a boot drive fails or if you remove a boot drive without
marking the drive for removal and the data has not successfully migrated.
Cassandra uses up to 4 SSDs to store the database that provides read and write access to
cluster metadata.
When a metadata drive fails, the local Cassandra process will no longer be able to access its
share of the database and will begin a persistent cycle of restarts until its data is available. If
Cassandra cannot restart, the Stargate process on that CVM will crash as well. Failure of both
processes results in automatic IO redirection.
During the switching process, the host with the failed SSD may report that the shared storage is
unavailable. Guest VM IO on this host will pause until the storage path is restored.
After redirection occurs, VMs can resume read and write I/O. Performance may decrease
slightly, because the I/O is traveling across the network rather than across the internal network.
Because all traffic goes across the 10 GbE network, most workloads will not diminish in a way
that is perceivable to users.
Multiple drive failures in a single selected domain (node, block, or rack) are also tolerated.
If Cassandra remains in a failed state for more than thirty minutes, the surviving Cassandra
nodes detach the failed node from the Cassandra database so that the unavailable metadata
can be replicated to the remaining cluster nodes. The process of healing the database takes
about 30-40 minutes.
If the Cassandra process restarts and remains running for five minutes, the procedure to
detach the node is canceled. If the process resumes and is stable after the healing procedure
is complete, the node will be automatically added back to the ring. A node can be manually
added to the database using the nCLI command:
ncli> host enable-metadata-store id=cvm_id
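The CVM ID for this command can be found by listing the hosts first; host list is standard nCLI, though the output columns vary by AOS version:
ncli> host list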
Each node contributes its local storage devices to the cluster storage pool. Cold-tier data is
stored in HDDs, while hot-tier data is stored in SSDs for faster performance. Data is replicated
across the cluster, so a single data drive failure does not result in data loss. Nodes containing
only SSD drives only have a hot tier.
When a data drive (HDD/SSD) fails, the cluster receives an alert from the host and immediately
begins working to create a second replica of any guest VM data that was stored on the drive.
In a cluster with replication factor 2, losing a second drive in a different domain (node, block,
or rack) before the cluster heals can result in the loss of both replicas of some VM data. Although a
single drive failure does not have the same impact as a host failure, it is important to replace the
failed drive as soon as possible.
The physical network adapters on each host are grouped together on the external network.
Unavailability of a network link is tolerated with no impact to users if multiple ports are
connected to the network.
The Nutanix platform does not leverage any backplane for internode communication. It relies on
a standard 10 GbE network.
All storage I/O for VMs running on a Nutanix node is handled by the hypervisor on a dedicated
private network. The I/O request is handled by the hypervisor, which then forwards the request
to the private IP on the local CVM. The CVM then performs the remote replication with other
Nutanix nodes using its external IP over the public 10 GbE network.
In most cases, read requests are serviced by the local node and are not routed to the 10 GbE
network. This means that the only traffic routed to the public 10 GbE network is data replication
traffic and VM network I/O. Cluster-wide tasks, disk balancing for example, generate I/O on the
10 GbE network.
Each Nutanix node is configured at the factory to use one 10 GbE port as the primary pathway
for vSwitch0. Other 10 GbE ports are configured in standby mode. Guest VM performance does
not decrease in this configuration. If a 10 GbE port is not configured as the failover path, then
traffic fails over to a 1 GbE port. This failover reduces the throughput of storage traffic and
decreases the write performance for guest VMs on the host with the failed link. Other hosts may
experience a slight decrease as well, but only on writes to extents that are stored on the host
with the link failure. Nutanix networking best practices recommend removing 1 GbE ports from
each host’s network configuration.
If both 10 GbE links are down, then the host will fail over to a 1 GbE port if it is configured as
a standby interface. This failover reduces the throughput of storage traffic and decreases the
write performance for guest VMs on the host with the failed link. Other hosts may experience
a slight decrease as well, but only on writes to extents that are stored on the host with the link
failure.
Redundancy factor 3 (RF3) is a configurable option that allows a Nutanix cluster to withstand
the failure of two nodes or drives in different blocks. This is configured by navigating to the
gear button in Prism Element, then Redundancy State. From the drop-down menu on this page,
you can modify the redundancy factor configuration.
Note: If a cluster is set to RF2 it can be converted to RF3 if sufficient nodes are
present. Increasing the cluster RF level consumes 66% of the cluster’s storage vs
50% for RF2.
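You can also check the current and desired redundancy factor from the nCLI. The following is a sketch only; the command name is an assumption that should be confirmed against the nCLI reference for your AOS version:
ncli> cluster get-redundancy-state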
In the event of a Metadata drive failure: If any metadata drive fails on a host, the Controller
VM restarts. Once the Cassandra process restarts, the missing metadata is retrieved from the
other Controller VMs and sharded across the remaining metadata drives. When the faulty drive
recovers or is replaced, metadata is stored on that drive again. Performance may decrease
slightly for user VMs on the host due to the reboot and the fact that some I/O is traveling
across the network. However, most workloads should not diminish in a way that is perceivable
to users (other than during the reboot).
RF3 features
• At least one copy of all guest VM data plus the oplog is available if two nodes fail.
• The cluster maintains five copies of metadata and five copies of configuration data.
RF3 requirements
• For guest VMs to tolerate the simultaneous failure of two nodes or drives in different blocks,
the data must be stored on storage containers with RF3.
• The CVM must be configured with enough memory to support RF3.
Stargate is responsible for placing data across blocks, and Curator makes data placement
requests to Stargate to maintain block fault tolerance.
New and existing clusters can reach a block fault tolerant state. New clusters can be block fault
tolerant immediately after being created if the configuration supports it. Existing clusters that
were not previously block fault tolerant can be made tolerant by reconfiguring the cluster in a
manner that supports block fault tolerance.
New data in a block fault tolerant cluster is placed to maintain block fault tolerance. Existing
data that was not in a block fault tolerant state is scanned by Curator and moved into a block
fault tolerant placement.
Depending on the volume of data that needs to be relocated, it might take Curator several
scans over a period of hours to distribute data across the blocks.
Block fault tolerant data placement is on a best effort basis but is not guaranteed. Conditions
such as high disk usage between blocks may prevent the cluster from placing guest VM
redundant copy data on other blocks.
Redundant copies of guest VM data are written to nodes in blocks other than the block that
contains the node where the VM is running. The cluster keeps two copies of each write stored
in the oplog.
The Nutanix Medusa component uses Cassandra to store metadata. Cassandra uses a ring-
like structure where data is copied to peers within the ring to ensure data consistency and
availability. The cluster keeps at least three redundant copies of the metadata, at least half of
which must be available to ensure consistency.
With block fault tolerance, the Cassandra peers are distributed among the blocks to ensure that
no two peers are on the same block. In the event of a block failure, at least two copies of the
metadata are present in the cluster.
• Network partition, where one of the racks becomes inaccessible from the other racks
When rack fault tolerance is enabled, guest VMs can continue to run despite the failure of one
rack (RF2) or two racks (RF3). The redundant copies of guest VM data and metadata exist on
other racks when one rack fails.
With replication factor 3 and a minimum of 5 nodes, 5 blocks, and 5 racks*, the cluster provides
data resiliency against the simultaneous failure of 2 nodes, 2 blocks, 2 racks, or 2 disks. These
values illustrate the level of data resiliency (simultaneous failure) provided for a given combination
of replication factor, minimum number of nodes, minimum number of blocks, and minimum
number of racks.
* Erasure Coding with Rack Awareness - Erasure coding is supported on a rack-aware cluster.
You can enable erasure coding on new containers in rack-aware clusters provided these
minimums are met.
Note: Rack Fault Tolerance is supported for AHV and ESXi only.
HA can ensure sufficient cluster resources are available to accommodate the migration of VMs
in case of node failure.
The Acropolis Master tracks node health by monitoring connections on all cluster nodes. When
a node becomes unavailable, Acropolis Master restarts all the VMs that were running on that
node on another node in the same cluster.
The Acropolis Master detects failures due to VM network isolation, which is signaled by a failure
to respond to heartbeats.
HA Configuration Options
• Reserved segments. On each node, some memory is reserved for failover of virtual machines
from a failed node. The Acropolis service calculates the amount of memory to reserve across
the cluster based on the virtual machine memory configuration. AHV marks all nodes as
schedulable, with resources available for running VMs.
• Best effort (not recommended). No node or memory reservations are made in the cluster. In
case of a failure, virtual machines are moved to other nodes based on the resources and
memory available on those nodes. This is not the preferred method: if no resources are
available on the cluster, some virtual machines may not be powered on.
• Reserved host (only available via aCLI and not recommended). A full node is reserved for
HA of VMs in case of a node failure. During normal operation of the cluster, virtual machines
cannot be run, powered on, or migrated to the reserved node. This mode only works if all the
nodes in the cluster have the same amount of memory.
High Availability
The built-in data redundancy in a Nutanix cluster supports high availability provided by the
hypervisor. If a node fails, all HA-protected VMs can be automatically restarted on other
nodes in the cluster. Virtualization management VM high availability may implement admission
control to ensure that, in case of node failure, the rest of the cluster has enough resources to
accommodate the VMs. The hypervisor management system selects a new host for the VMs
that may or may not contain a copy of the VM data.
If the data is stored on a node other than the VM’s new host, then read requests are sent across
the network. As remote data is accessed, the remote data is migrated to storage devices on the
current host so that future read requests can be local. Write requests are sent to local storage
and replicated on a different host. During this interaction, the Nutanix software also creates new
copies of preexisting data to protect against future node or disk failures.
VM Anti-Affinity policy
This policy prevents virtual machines from running on the same node. The policy forces VMs to
run on separate nodes so that application availability is not affected by node failure. This policy
does not prevent the Acropolis Dynamic Scheduling (ADS) feature from taking necessary action
in case of resource constraints.
Note: Currently, you can only define VM-VM anti-affinity policy by using aCLI. For
more information, see Configuring VM-VM Anti-Affinity Policy.
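A minimal aCLI sketch of defining a VM-VM anti-affinity group is shown below. The group and VM names are placeholders; confirm the exact syntax in the AHV Administration Guide for your release:
nutanix@cvm$ acli vm_group.create app-group
nutanix@cvm$ acli vm_group.add_vms app-group vm_list=vm1,vm2
nutanix@cvm$ acli vm_group.antiaffinity_set app-group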
Note: Anti-Affinity policy is applied during the initial placement of VMs (when a VM
is powered on). Anti-Affinity policy can be overridden by manually migrating a VM
to the same host as its opposing VM, when a host is put in maintenance mode, or
during an HA event. ADS will attempt to resolve any anti-affinity violations when they
are detected.
The VM-host affinity policy controls the placement of VMs. Use this policy to specify that
a selected VM can only run on the members of the affinity host list. This policy checks and
enforces where a VM can be hosted when you power on or migrate the VM.
Note: If you apply a VM-host affinity policy, it limits Acropolis HA and Acropolis
Dynamic Scheduling (ADS) in such a way that a virtual machine cannot be powered
on or migrated to a host that does not conform to the requirements of the affinity
policy, because this policy is mandatorily enforced.
Note: Select at least two hosts when creating a host affinity list to protect against
downtime in the case of a node failure. This configuration is always enforced; VMs
will not be moved from the hosts specified here, even in the case of an HA event.
Watch the following video to learn more about Nutanix affinity rules: https://youtu.be/
rfHR93RFuuU.
• You cannot remove the VM-host affinity for a powered on VM from Prism. You can use
the vm.affinity_unset vm_list aCLI command to perform this operation.
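Conversely, a hedged aCLI sketch of setting VM-host affinity is shown below. The VM name and host addresses are placeholders; verify the exact parameters for your AOS version:
nutanix@cvm$ acli vm.affinity_set vm1 host_list=host-ip-1,host-ip-2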
Labs
1. Failing a Node - VM High Availability
Module
11
DATA PROTECTION
Overview
After completing this module, you will be able to:
Disaster Recovery (DR) is an area of failover planning that aims to protect an organization
from the effects of significant negative events. DR allows an organization to maintain or quickly
resume mission-critical functions following a disaster.
• RPO is the maximum tolerated period of time for which data can be lost due to a disruption
without exceeding the allowable threshold.
• RPO designates the variable amount of data that will be lost or will have to be re-entered
during network downtime.
Example: If the data snapshot interval and the RPO is 180 minutes, and the outage lasts only
2 hours, you’re still within the parameters that allow for recovery and business processes to
proceed given the volume of data lost during the disruption.
How much time does it take to recover after notification of business process disruption?
• RTO is therefore the duration of time and a service level within which a business process
must be restored after a disaster in order to avoid unacceptable consequences associated
with a break in continuity.
• RTO designates the amount of “real time” that can pass before the disruption begins to
seriously and unacceptably impede the flow of normal business operations.
Local Replication
Remote Replication
• Time Stream and Cloud: High RPO and RTO (hours) should be used for minor incidents.
• Synchronous and asynchronous: (near)-zero RPO and RTO should be used for major
incidents.
Time Stream
A time stream is a set of snapshots that are stored on the same cluster as the source VM or
volume group. Time stream is configured as an async protection domain without a remote site.
The Time Stream feature in Nutanix Acropolis gives you the ability to:
When a snapshot of a VM is initially taken on the Nutanix Enterprise Cloud Platform, the system
creates a read only, zero-space clone of the metadata (index to data) and makes the underlying
VM data immutable or read only. No VM data or virtual disks are copied or moved. The system
creates a read-only copy of the VM that can be accessed like its active counterpart.
Nutanix snapshots take only a few seconds to create, eliminating application and VM backup
windows.
Nutanix Guest Tools (NGT) is a software bundle that can be installed on a guest virtual machine
(Microsoft Windows or Linux). It is a software-based in-guest agent framework that enables
advanced VM management functionality through the Nutanix platform.
The solution is composed of the NGT installer which is installed on the VMs and the Guest Tools
Framework which is used for coordination between the agent and Nutanix platform.
• Nutanix Guest Agent (NGA) service. Communicates with the Nutanix Controller VM.
• File Level Restore CLI. Performs self-service file-level recovery from the VM snapshots.
• Nutanix VM Mobility Drivers. Provides drivers that facilitate VM migration between ESXi
and AHV, in-place hypervisor conversion, and cross-hypervisor disaster recovery (CHDR)
features.
• VSS requestor and hardware provider for Windows VMs. Enables application-consistent
snapshots of AHV or ESXi Windows VMs.
• Guest Tools Service: Gateway between the Acropolis and Nutanix services and the Guest
Agent. Distributed across CVMs within the cluster with an elected NGT Master which runs on
the current Prism Leader (hosting cluster vIP)
• Guest Agent: Agent and associated services deployed in the VM's OS as part of the NGT
installation process. Handles any local functions (e.g. VSS, Self-service Restore (SSR), etc.)
and interacts with the Guest Tools Service.
The Guest Agent Service communicates with Guest Tools Service via the Nutanix Cluster IP
using SSL. For deployments where the Nutanix cluster components and UVMs are on a different
network (hopefully all), ensure that the following are possible:
• Create a firewall rule (and associated NAT) from UVM network(s) allowing communication
with the Cluster IP on port 2074 (preferred)
The Guest Tools Service acts as a Certificate Authority (CA) and is responsible for generating
certificate pairs for each NGT enabled UVM. This certificate is embedded into the ISO which is
configured for the UVM and used as part of the NGT deployment process. These certificates are
installed inside the UVM as part of the installation process.
Protection Domains
Concepts
Terminology
Protection Domain
Protection Domain (PD) is a defined group of entities (VMs, files and Volume Groups) that are
always backed up locally and optionally replicated to one or more remote sites.
An async DR protection domain supports backup snapshots for VMs and volume groups. A
metro availability protection domain operates at the storage container level.
A protection domain can use one of two replication engines depending on the replication
frequency that is defined when the protection domain is created. For 1 to 15 minute RPO,
NearSync will be used for replication. For 60 minutes and above, async DR will be used.
Metro Availability Protection Domain
Active local storage container linked to a standby container at a remote site. Local and remote
containers will have the same name. Containers defined in a Metro Availability Protection
Domain are synchronously replicated to a remote container of the same name.
Consistency Group
Schedule
A schedule is a PD property that specifies snapshot intervals and snapshot retention. Retention
can be set differently for local and remote snapshots.
Snapshot
Read-only copy of the data and state of a VM, file or Volume Group at a specific point in time.
• Because restoring a VM does not allow for VMX editing, VM characteristics such as MAC
addresses may be in conflict with other VMs in the cluster
• You cannot make snapshots of entire file systems (beyond the scope of a VM) or containers
• You cannot include Volume Groups (Nutanix Volumes) in a protection domain configured for
Metro Availability
• Keep consistency groups as small as possible, typically at the application level. Note that
when using application consistent snapshots, it is not possible to include more than one VM
in a consistency group.
• Do not deactivate and then delete a protection domain that contains VMs
• If you want to enable deduplication on a container with protected VMs that are replicated to
a remote site, wait to enable deduplication until:
- Both sites are upgraded to a version that supports capacity tier deduplication.
• Active: Manages volume groups and live VMs. Makes, replicates, and expires snapshots.
Note: For a list of guidelines when configuring async DR protection domains, please
see the Async DR Protection Domain Configuration section of the Prism Web
Console Guide on the Nutanix Support Portal.
After a protection domain is replicated to at least one remote site, you can carry out a planned
migration of the contained entities by failing over the protection domain. You can also trigger
failover in the event of a site disaster.
Failover and failback events re-create the VMs and volume groups at the other site, but the
volume groups are detached from the iSCSI initiators to which they were attached before the
event. After the failover or failback event, you must manually reattach the volume groups to
iSCSI initiators and rediscover the iSCSI targets from the VMs.
Disaster recovery configurations which are created with Prism Element use protection domains
and optional third-party integrations to protect VMs, and they replicate data between on-
premises Nutanix clusters. Protection domains provide limited flexibility in terms of supporting
operations such as VM boot order and require you to perform manual tasks to protect new VMs
as an application scales up.
You can use Leap between two physical data centers or between a physical data center and Xi
Cloud Services. Leap works with pairs of physically isolated locations called availability zones.
One availability zone serves as the primary location for an application while a paired availability
zone serves as the recovery location. While the primary availability zone is an on-premises
Prism Central instance, the recovery availability zone can be either on-premises or in Xi Cloud
Services.
Configuration tasks and disaster recovery workflows are largely the same regardless of whether
you choose Xi Cloud Services or an on-premises deployment for recovery.
Availability Zone
An availability zone is a location to which you can replicate the data that you want to protect.
It is represented by a Prism Central instance to which a Nutanix cluster is registered. To ensure
availability, availability zones must be physically isolated from each other.
• Xi Cloud Services. If you choose to replicate data to Xi Cloud Services, the on-premises
Prism Central instance is paired with a Xi Cloud Services account, and data is replicated to Xi
Cloud Services.
• Physical Datacenter. If you choose to back up data to a physical datacenter, you must
provide the details of a Prism Central instance running in a datacenter that you own and that
is physically isolated from the primary availability zone.
Availability zones in Xi Cloud Services are physically isolated from each other to ensure that a
disaster at one location does not affect another location. If you choose to pair with a physical
datacenter, the responsibility of ensuring that the paired locations are physically isolated lies
with you.
Primary Availability Zone
The availability zone that is primarily meant to host the VMs you want to protect.
Recovery Availability Zone
The availability zone that is paired with the primary availability zone, for recovery purposes.
This can be a physical datacenter or Xi Cloud Services.
License Requirements
For disaster recovery between on-premises clusters and Xi Cloud Services, it is sufficient to use
the AOS Starter license on the on-premises clusters.
For disaster recovery between on-premises clusters, the license requirement depends on the
Leap features that you want to use. For information about the features that are available with
an AOS license, see Software Options.
• On-premises Nutanix clusters and the Prism Central instance with which they are registered
must be running AOS 5.10 or later.
• The on-premises clusters must be running the version of AHV that is bundled with the
supported version of AOS.
• On-Premises clusters registered with the Prism Central instance must have an external IP
address.
• The cluster on which the Prism Central instance is hosted must meet the following
requirements:
- The cluster must have an iSCSI data services IP address configured on it.
- The cluster must also have sufficient memory to support a hot add of memory to all
Prism Central nodes when you enable Leap. A small Prism Central instance (4 vCPUs, 16
GB memory) requires a hot add of 4 GB and a large Prism Central VM (8 vCPUs, 32 GB
memory) requires a hot add of 8 GB. If you have enabled Nutanix Flow, an additional 1 GB
must be hot-added to each Prism Central instance.
• A single-node Prism Central instance must have a minimum of 8 vCPUs and 32 GB memory.
• Each node in a scaled-out Prism Central instance must have a minimum of 4 vCPUs and 16
GB memory.
• The Prism Central VM must not be on the same network as the protected user VMs. If
present on the user VM network, the Prism Central VM becomes inaccessible when the route
to the network is removed following failover.
• Do not uninstall the Nutanix VM Mobility drivers from the VMs; VMs become unusable after
migration if the drivers are removed.
Networking Requirements
Static IP address preservation refers to maintaining the same IP address in the destination. The
considerations to achieve this are as follows:
• The VMs must have Nutanix Guest Tools (NGT) installed on them.
• For an unplanned failover, if the snapshot used for restoration does not have an empty CD-
ROM slot, the static IP address is not configured on that VM.
• For a planned failover, if the latest state of the VM does not have an empty CD-ROM slot, the
static IP address is not configured on that VM after the failover.
• Linux VMs must have the NetworkManager command-line tool (nmcli) installed on them. The
version of nmcli must be 0.9.10.0 or later (see the verification example after this list).
• If you select a non-IPAM network in a VPC in Xi Cloud Services, the gateway IP address and
prefix fields are not auto-populated, and you must manually specify these values.
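The nmcli version requirement above can be checked from inside a Linux guest VM, for example:
$ nmcli --version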
Requirements for Static IP Address Mapping between Source and Target Virtual Networks
If you want to map static IP addresses between source and target virtual networks, make sure that the following requirements are met:
• Make sure that a free CD-ROM is available on each VM. The CD-ROM is required for
mounting NGT at the remote site after failover.
• Make sure that the guest VMs can reach the Controller VM from both availability zones.
You must design the virtual subnets that you plan to use for disaster recovery at the recovery
availability zone so that they can accommodate the VMs.
• Make sure that any virtual network intended for use as a recovery virtual network meets the
following requirements:
- The network prefix is the same as that of the source virtual network. For example, if the
source network address is 192.0.2.0/24, the network prefix of the recovery virtual network
must also be 24.
- The gateway IP address offset is the same as that in the source network. For example,
if the gateway IP address in the source virtual network 192.0.2.0/24 is 192.0.2.10, the last
octet of the gateway IP address in the recovery virtual network must also be 10.
• If you want to specify a single cluster as a target for recovering VMs from multiple source
clusters, make sure that the number of virtual networks on the target cluster is equal to the
sum of the number of virtual networks on the individual source clusters. For example, if there
are two source clusters, with one cluster having m networks and the other cluster having n
networks, make sure that the target cluster has m + n networks. Such a design ensures that
all migrated VMs can be attached to a network.
• It is possible to test failover and failback between physical clusters. To perform test runs
without affecting production, prepare test networks at both the source and destination sites.
Then, when testing, attach your test VMs to these networks.
• After you migrate VMs to Xi Cloud Services, make sure that the router in your data center
stops advertising the subnet in which the VMs were hosted.
Remote and branch offices often require different approaches to IT infrastructure from capex,
power, and space perspectives, as well as opex constraints and the skills required on-site to
manage and maintain them.
The Nutanix Enterprise Cloud is a powerful converged compute and storage system that offers
one-click simplicity and high availability for remote and branch offices. This makes deploying
and operating remote and branch offices as easy as deploying to the public cloud, but with
control and security on your own terms. Picking the right solution always involves trade-offs.
While a remote site is not your datacenter, uptime is nonetheless a crucial concern. Financial
constraints and physical layout also affect what counts as the best architecture for your
environment.
Three-node (or more) clusters are the gold standard for ROBO deployments. They provide data
protection by always committing two copies of your data, keeping data safe during failures, and
automatically rebuilding data within 60 seconds of a node failure.
Nutanix recommends designing three-node clusters with enough capacity to recover from the
failure of a single node. For sites with high availability requirements or which are difficult to
visit, additional capacity above the n+1 node counts is recommended.
Three-node clusters can scale up to eight nodes with 1 Gbps networking, and up to any scale
when using 10 Gbps and higher networking.
Two-node clusters offer reliability for smaller sites while also being cost effective. A Witness
VM is required for two-node clusters only and is used only for failure scenarios to coordinate
rebuilding data and automatic upgrades. You can deploy the witness offsite up to 500 ms away
for ROBO. Multiple clusters can use the same witness for two-node configurations. Nutanix
supports two-node clusters with ESXi and AHV only.
One-node clusters are recommended for low availability requirements coupled with strong
overall management for multiple sites. Note that a one-node cluster provides resiliency against
the loss of a single hard drive. Nutanix supports one-node clusters with ESXi and AHV only.
Hypervisor
The three main considerations for choosing the right hypervisor for your ROBO environment
are supportability, operations, and licensing costs.
With Nutanix Acropolis, VM placement and data placement occurs automatically. Nutanix
also hardens systems by default to meet security requirements and provides the automation
necessary to maintain that security. Nutanix supplies STIGs (Security Technical Implementation
Guides) in machine-readable form for both AHV and the storage controller.
For environments that do not want to switch hypervisors in the main datacenter, Nutanix offers
cross-hypervisor disaster recovery to replicate VMs from AHV to ESXi or ESXi to AHV. In the
event of a disaster, administrators can restore their AHV VM to ESXi for quick recovery or
replicate the VM back to the remote site with easy workflows.
Prism Central also provides network visualization, allowing you to troubleshoot basic networking
issues right from the same dashboard. With the scale-out capabilities added to the control plane,
it is possible to centrally manage as many as 25,000 VMs or more.
Prism Element
Prism Element is a management interface native to the platform for every Nutanix cluster
deployed. Because Prism Element manages only the cluster it is part of, each Nutanix cluster
in a deployment has a unique Prism Element instance for management. Multiple clusters are
managed via Prism Central.
Prism Central
• Small environments: For fewer than 2,500 VMs, size Prism Central to 4 vCPUs, 12 GB of
memory, and 500 GB of storage.
• Large environments: For up to 12,000 VMs, size Prism Central to 8 vCPUs, 32 GB of RAM,
and 2,500 GB of storage.
• If installing on Hyper-V, use the SCVMM library on the same cluster to enable fast copy. Fast
copy improves the deployment time.
Each node registered to and managed by Prism Pro requires you to apply a Prism Pro license
through the Prism Central web console. For example, if you have registered and are managing
10 Nutanix nodes (regardless of the individual node or cluster license level), you need to apply
10 Prism Pro licenses through the Prism Central web console.
Nutanix offers an integrated solution for local on-site backups and replication for central
backup and disaster recovery. The powerful Nutanix Time Stream capability allows unlimited
VM snapshots to be created on a local cluster for faster RPO and RTO and rapidly restore state
when required. Using Prism, administrators can schedule local snapshots and replication tasks
and control retention policies on an individual snapshot basis. An intuitive snapshot browser
allows administrators to quickly see local and remote snapshots and restore or retrieve a saved
snapshot or a specific VM within a snapshot with a single click. Snapshots are differential and
de-duplicated, hence backup and recovery is automatically optimized, allowing DR and remote
backups to be completed efficiently, for different environments.
• Backup – Provides local snapshot/restore at the ROBO site as well as remote snapshot/
restore to the main data center.
• Disaster Recovery – Provides snapshot replication to the main data center with automatic
failover in the event of an outage.
There are several requirements when setting up a Witness VM. The minimum requirements are:
• 2 vCPUs
• 6 GB of memory
• 25 GB of storage
The Witness VM must reside in a separate failure domain. This means the witness and all
two-node clusters must have independent power and network connections. We recommend
locating the witness VM in a third physical site with dedicated network connections to all sites
to avoid single points of failure.
Communication with the witness happens over port TCP 9440. This port must be open for the
CVMs on any two-node clusters using the witness.
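A simple way to confirm that TCP port 9440 is reachable from a CVM is a generic connection test such as the following. The Witness IP is a placeholder, and this is not a Nutanix-specific tool, just a basic reachability check:
nutanix@cvm$ curl -kv https://witness_vm_ip:9440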
Network latency between each two-node cluster and the Witness VM must be less than 500 ms
for ROBO.
The Witness VM may reside on any supported hypervisor and run on Nutanix or non-Nutanix
hardware. You can register multiple two-node clusters to a single Witness VM.
Node Failure
When a node goes down, the live node sends a leadership request to the Witness VM and goes
into single-node mode. In this mode RF2 is still retained at the disk level, meaning data is copied
to two disks. (Normally, RF2 is maintained at the node level, meaning data is copied to each
node.)
If one of the two metadata SSDs fails while in single-node mode, the cluster (node) goes into
read-only mode until a new SSD is picked for metadata service. When the node that was down
is back up and stable again, the system automatically returns to the previous state (RF2 at the
node level). No user intervention is necessary during this transition.
When the network connection between the nodes fails, both nodes send a leadership request
to the Witness VM. Whichever node gets the leadership lock stays active and goes into single-
node mode. All operations and services on the other node are shut down, and the node goes
into a waiting state. When the connection is re-established, the same recovery process as in the
node failure scenario begins.
When the network connection between a single node (Node A in this example) and the Witness
fails, an alert is generated that Node A is not able to reach the Witness. The cluster is otherwise
unaffected, and no administrator intervention is required.
Witness VM Failure
When the Witness goes down (or the network connections to both nodes and the Witness fail),
an alert is generated but the cluster is otherwise unaffected. When connection to the Witness
is re-established, the Witness process resumes automatically. No administrator intervention is
required.
If the Witness VM goes down permanently (unrecoverable), follow the steps for configuring a
new Witness through the Configure Witness option of the Prism web console as described in
the Configuring a Witness (two-node cluster) topic on the Nutanix Support Portal.
When a complete network failure occurs (no connections between the nodes or the Witness),
the cluster becomes unavailable. Manual intervention is needed to fix the network. While
the network is down (or when a node fails and the other node does not have access to the
Witness), you have the option to manually elect a leader and run in single-node mode. To
manually elect a leader, do the following:
1. Log in using SSH to the Controller VM for the node to be set as the leader and enter the
following command:
nutanix@cvm$ cluster set_two_node_cluster_leader
Run this command on just the node you want to elect as the leader. If both nodes are
operational, do not run it on the other node.
2. Remove (unconfigure) the current Witness and reconfigure with a new (accessible) Witness
when one is available as described in the Configuring a Witness (two-node cluster) topic on
the Nutanix Support Portal.
Seeding
When dealing with a remote site that has a limited network connection back to the main
datacenter, it may be necessary to seed data to overcome network speed deficits. You may
also need to seed data if systems were imaged with Foundation at a main site and shipped to a
remote site without data, but that data is required at a later date.
Seeding involves using a separate device to ship the data to the remote location. Instead of
replication taking weeks or months, depending on the amount of data you need to protect, you
can copy the data locally to a separate Nutanix node and then ship it to your remote site.
Nutanix checks the snapshot metadata before sending the device to prevent unnecessary
duplication. Nutanix can apply its native data protection to a seed cluster by placing VMs in a
protection domain and replicating them to a seed cluster. A protection domain is a collection
of VMs that have a similar recovery point objective (RPO). You must ensure, however, that the
seeding snapshot doesn’t expire before you can copy the data to the final destination.
Note: For more information, please see the ROBO Deployment and Operations
Guide on the Nutanix Support Portal.
During this procedure, the administrator stores a snapshot of the VMs on the seed cluster while
it’s installed in the ROBO site, then physically ships it to the main datacenter.
2. Create a protection domain called PD1 on the ROBO cluster for the VMs and volume
groups.
3. Create an out-of-band snapshot (S1) for the protection domain on ROBO with no
expiration.
4. Create an empty protection domain called PD1 (same name used in step 2) on the seed
cluster.
6. Create remote sites on the ROBO cluster and the seed cluster.
7. Retrieve snapshot S1 from the ROBO cluster to the seed cluster (using Prism on the seed
cluster).
10. Create remote sites on the ROBO cluster and on the datacenter main cluster (DC1).
13. Retrieve S1 from the seed cluster to DC1 (using Prism on DC1). Prism generates an alert
here; although it appears to be a full data replication, only metadata is transferred from the
seed cluster.
15. Set up a replication schedule for PD1 on the ROBO cluster in Prism.
16. Once the first scheduled replication finishes, you can delete snapshot S1 to reclaim space.
Labs
1. Creating protection domains and local VM restore
5. Performing VM migration
6. Migrating back to primary
Module
12
PRISM CENTRAL
Overview
In the Managing a Nutanix Cluster module, you learned how to use Prism Element to configure a
cluster and set up Pulse and alerts. In this module you'll learn how to:
• Describe Prism Central.
Prism Central allows you to monitor and manage all Nutanix clusters from a single GUI:
• Central dashboard for clusters, VMs, hosts, disks, and storage with drill-down for detailed
information.
• Multi-Cluster analytics
• Multi-Cluster alerts summary with drill-down for possible causes and corrective actions.
First, you must deploy an instance of Prism Central into your environment.
Once you have Prism Central deployed, you need to connect all of your clusters to Prism
Central.
Deployment Methods
You can deploy a Prism Central VM using the "1-click" method. This method employs the Prism
web console from a cluster of your choice and creates the Prism Central VM in that cluster.
The "1-click" method is the easiest method to deploy Prism Central in most cases. However, you
cannot use this method when:
• The target cluster runs Hyper-V or Citrix Hypervisor (or mixed hypervisors).
• You are deploying from a cluster that does not have Internet access (also known as a dark site).
If you have never logged into Prism Central as the user admin, you need to log on and change
the password before attempting to register a cluster with Prism Central.
Open ports 9440 and 80 (both directions) between the Prism Central VM, all Controller VMs,
and the cluster virtual IP address in each registered cluster.
A cluster can register with just one Prism Central instance at a time. To register with a different
Prism Central instance, first unregister the cluster.
Note: See KB 4944 for additional details if you have enabled Prism Self Service,
Calm, or other special features in Prism Central.
Customizable Dashboards
The custom dashboard feature allows you to build a dashboard based on a collection of fixed
and customizable widgets. You can arrange the widgets on the screen to create exactly the
view into the environment that works best for you. A dashboard’s contents can range from a
single widget to a screen full of widgets.
Prism Pro comes with a default dashboard offering a view of capacity, health, performance, and
alerts that should be ideal for most users and a good starting point for others. The customizable
widgets allow you to display top lists, alerts, and analytics.
Note: Prism Pro allows you to create dashboards using fixed and customizable
widgets.
Scheduled Reporting
Reports can provide information to the organization that is useful at all levels, from operations
to leadership. A few common good use cases include:
• Inventory: Produces a list of physical clusters, nodes, VMs, or other entities within an
environment
The reporting feature within Prism Pro allows you to create both scheduled and as-needed
reports. Prism Pro includes a set of customizable predefined reports, or you can create new
reports using a built-in WYSIWYG (what you see is what you get) editor. In the editor, simply
select data points and arrange them in the desired layout to create your report. The ability to
group within reports can help you get a global view of a given data point or allow you to look at
entities by cluster. Once you have created reports, they can be run either on an as-needed basis
or by setting them to run on a schedule. Configure each report to retain a certain number of
copies before the system deletes the oldest versions. To access reports, choose the report, then
select the version you wish to view. You can either view the report within Prism or via email, if
you have configured the report to send copies to a recipient list.
Dynamic Monitoring
The system learns the behavior of each VM and establishes a dynamic threshold as a
performance baseline for each resource assigned to that VM.
Dynamic monitoring uses VM behavioral learning powered by the Nutanix Machine Learning
Engine (X-Fit) technology to build on VM-level resource monitoring. Each resource chart
represents the baseline as a blue shaded range. If a given data point for a VM strays outside the
baseline range (higher or lower), the system detects an anomaly and generates an alert. The
anomaly appears on the performance charts for easy reference and follow-up.
If the data point’s anomalous results persist over time, the system learns the new VM behavior
and adjusts the baseline for that resource. With behavioral learning, performance reporting
helps you better understand your workloads and have early knowledge of issues that traditional
static threshold monitoring would not otherwise discover.
Dynamic monitoring is available for both VMs and physical hosts and encompasses multiple
data points within CPU, memory, storage, and networking.
Capacity Runway
Capacity planning focuses on the consumption of three resource categories within a Nutanix
cluster: storage capacity, CPU, and memory.
Capacity results appear as a chart that shows the historical consumption for the metric along
with the estimated capacity runway. The capacity runway is the number of days remaining
before the resource item is fully consumed. The Nutanix X-Fit algorithms perform capacity
calculations based on historical data. Prism Pro initially uses 90 days of historical data from
each Prism Element instance, then continues to collect additional data to use in calculations.
Prism Pro retains capacity data points longer than Prism Element, allowing organizations to
study a larger data sample.
The X-Fit method considers resources consumed and the rate at which the system consumes
additional amounts in the calculations for runway days remaining. Storage calculations factor
the amounts of live usage, system usage, reserved capacity, and snapshot capacity into runway
calculations. Storage capacity runway is aware of containers, so it can calculate capacity when
multiple containers that are growing at different rates consume a single storage pool. Container
awareness allows X-Fit to create more accurate runway estimates.
Note:
The Capacity Runway tab allows you to view a summary of the resource runway
information for the registered clusters and access detailed runway information
about each cluster. Capacity runway calculations include data from live usage,
system usage, reserved capacity, and snapshot capacity.
Creating a Scenario
Anticipating future resource needs can be a challenging task. To address this task, Prism
Central provides an option to create "what if" scenarios that assess the resource requirements
for possible future workloads. This allows you to evaluate questions like
• If I need a new database server in a month, does the cluster have sufficient resources to
handle that increased load?
• If I create a new cluster for a given set of workloads, what kind of cluster do I need?
You can create various "what if" scenarios to answer these and other questions. The answers
are derived by applying industry standard consumption patterns to the hypothetical workloads
and current consumption patterns for existing workloads.
The VM efficiency features in Prism Pro recommend VMs within the environment that are
candidates for reclaiming unused resources that you can then return to the cluster.
Candidate types:
• Overprovisioned
• Inactive
• Constrained
• Bully
• Constrained: A constrained VM is one that does not have enough resources for the demand
and can lead to performance bottlenecks. A VM is considered constrained when it exhibits
one or more of the following baseline values, based on the past 30 days:
- CPU usage > 90% (moderate) or > 95% (severe)
- CPU ready time > 5% (moderate) or > 10% (severe)
- Memory usage > 90% (moderate) or > 95% (severe)
- Memory swap rate > 0 Kbps (no moderate value)
• Bully: A bully VM is one that consumes too many resources and causes other VMs to starve.
A VM is considered a bully when it exhibits one or more of the following conditions for over
an hour: CPU ready time > 5%, memory swap rate > 0 Kbps, host I/O Stargate CPU usage >
85%.
The lists of candidates show the total amount of CPU and memory configured versus peak
amounts of CPU and memory used for each VM. The overprovisioned and inactive categories
provide a high-level summary of potential resources that can be reclaimed from each VM.
Prism Pro calculates the number, type, and configuration of nodes recommended for scaling to
provide the days of capacity requested.
You can model adding new workloads to a cluster and how those new workloads may affect
your capacity.
Capacity Planning
The Capacity Runway tab can help you understand how many days of resources you have left.
For example, it can show how expanding an existing workload or adding new workloads to a
cluster may affect resources.
When you can’t reclaim enough resources, or when organizations need to scale the overall
environment, the capacity planning function can make node-based recommendations. These
node recommendations use the X-Fit data to account for consumption rates and growth and
meet the target runway period. Setting the runway period to 180 days causes Prism Pro to
calculate the number, type, and configuration of nodes recommended to provide the 180 days
of capacity requested.
As part of the capacity planning portion of Prism Pro, you can model adding new workloads to
a cluster and how those new workloads may affect your capacity. The Nutanix Enterprise Cloud
uses data from X-Fit and workload models that have been carefully curated over time through
our Sizer application to inform capacity planning. The add workload function allows you to add
various applications for capacity planning.
• SQL Server: Size database workload based on different workload sizes and database types
• VMs: Enables growth modeling by specifying a generic VM size to model or by selecting
existing VMs on a cluster to model.
- This is helpful when planning to scale a specific application already running on the cluster
• VDI: Provides options to select broker technology, provisioning method, user type, and
number of users
• Splunk: Size based on daily index size, hot and cold retention times, and number of search
users
• XenApp: Similar to VDI; size server-based computing with data points for broker types,
server OS, provisioning type, and concurrent user numbers
• Percentage: Allows modeling that increases or decreases capacity demand for the cluster
The figure below captures an example of this part of the modeling process.
This centralized upgrade approach provides a single point from which you can monitor status
and alerts as well as initiate upgrades. Currently, multiple cluster upgrades are only available
for AOS software. One-click upgrades of the hypervisor and firmware are still conducted at the
cluster level.
Although there are - and will be in the future - more Test Drive options to choose from, our
focus is on those sessions that are related to Prism Central and Calm.
All shorter labs should be performed individually at first, using the default Guided Tour option.
Run through them at your own pace to become more familiar with the topic and the Test Drive
interface which is always available to you through http://www.nutanix.com.
Find Inefficiencies lab: Main menu > My Operations > Find Inefficiencies - Guided Tour [Prism
Central > VM Efficiency]
Plan for the Future lab: Main menu > My Operations > Plan for the Future - Guided Tour [Prism
Central > Runway+Scenarios]
Automated Response lab: Main menu > My Operations > Automated Response - Guided Tour
[Prism Central > Playbooks]
Deploy Applications lab: Main menu > My Applications > Deploy Applications - Guided Tour
[Prism Central > Calm Setup + Blueprint]
Create App Blueprints lab: Main menu > My Applications > Create App Blueprints - Guided
Tour [Prism Central > Calm Blueprint edits + 30 secs video]
Empower Users lab: Main menu > My Applications > Empower Users - Guided Tour [Prism
Central > Calm Blueprint Marketplace]
60-minute lab using Prism Central (Pro license) to check for VM efficiency and anomalies, plan
your cluster’s future capacity needs, check the cluster’s “Runway”, and learn about automating
operations using Nutanix X-Play.
60-minute lab using Prism Pro to build and deploy a Calm blueprint.
Labs
1. Deploying Prism Central
Module
13
MONITORING THE NUTANIX CLUSTER
Overview
After completing this module, you will be able to:
Nutanix Portal
• Nutanix Technical Support can monitor clusters and provide assistance when problems
occur.
• The Nutanix Support Portal is available for support assistance, software downloads, and
documentation.
• Nutanix supports a REST API, which allows you to request information or run administration
scripts for a Nutanix cluster.
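As a hedged illustration of the REST API, the following request retrieves basic cluster information using the commonly documented v2.0 endpoint. The cluster virtual IP and credentials are placeholders, and endpoint paths can vary by API version:
$ curl -k -u admin 'https://cluster_vip:9440/PrismGateway/services/rest/v2.0/cluster'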
Pulse
Pulse provides diagnostic system data to the Nutanix Support team to deliver proactive,
context-aware support for Nutanix solutions.
The Nutanix cluster automatically and unobtrusively collects this information with no effect on
system performance.
Pulse shares only basic system-level information necessary for monitoring the health and status
of a Nutanix cluster. Information includes:
• System alerts
When Pulse is enabled, it sends a message once every 24 hours to a Nutanix Support server by
default.
Pulse also collects the most important system-level statistics and configuration information
more frequently to automatically detect issues and help make troubleshooting easier. With this
information, Nutanix Support can apply advanced analytics to optimize your implementation
and to address potential problems.
Note: Pulse sends messages through ports 80/8443/443. If this is not allowed,
Pulse sends messages through your mail server. The Zeus leader IP address must
also be open in the firewall.
Pulse is enabled by default. You can enable or disable Pulse at any time.
Logs
Logs are generated by cluster components; a FATAL log is generated as a result of a component failure in a cluster. Log entries use one of the following severity levels:
• INFO
• WARNING
• ERROR
• FATAL
Entries within a log use the following format:
• [IWEF] identifies whether the log entry is information, a warning, an error, or fatal
• threadid file:line
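As an illustration only, a log entry following this format might look like the line below. The file name, line number, and message are hypothetical, but the leading F, the date/time stamp, the thread ID, and the file:line fields match the format described above:
F0820 15:28:23.123456 3135 stargate.cc:1234] Check failed: ...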
You can also generate a FATAL log on a process for testing. To do this, run the following
command in the CVM:
curl http://<svm ip>:<component port>/h/exit?abort=1
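For example, to generate a FATAL on the Stargate process of a CVM (Stargate commonly listens on port 2009; treat the port as an assumption to verify for your release, and run this only on a lab cluster):
nutanix@cvm$ curl http://cvm_ip:2009/h/exit?abort=1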
For practice, you can use this FATAL log to understand how to correlate it with an INFO file to
get more information. There are two ways to correlate a FATAL log with an INFO log:
• Search for the timestamp of the FATAL event in the corresponding INFO files.
3. Open the INFO file with vi and go to the bottom of the file (Shift+G).
4. Analyze the log entries immediately before the FATAL event, especially any errors or
warnings.
• If a process is repeatedly failing, it might be faster to do a long listing of the INFO files and
select the one immediately preceding the current one. The current one would be the one
referenced by the symbolic link.
$ ls *stargate*FATAL*
$ tail stargate.NUTANIX-CVM03.nutanix.log.FATAL.20120510-152823
$ grep F0820 stargate.NUTANIX-CVM03.nutanix.log.INFO.20120510-152823
stargate.ERROR
stargate.INFO
stargate.ntnx-16sm32070038-b-cvm.nutanix.log.ERROR.20190505-142229.18195
stargate.ntnx-16sm32070038-b-cvm.nutanix.log.INFO.20190927-204653.18195.gz
stargate.ntnx-16sm32070038-b-cvm.nutanix.log.WARNING.20190505-142229.18195
stargate.out
stargate.out.20190505-142228
stargate.WARNING
vip_service_stargate.out
vip_service_stargate.out.20190505-142302
Linux Tools
ls
This command returns a list of all files in the current directory, which is useful when you want to
see how many log files exist.
Include a subset of the filename that you are looking for to narrow the search. For example: $ ls
*stargate*FATAL*
cat
This command reads data from files and outputs their content. It is the simplest way to display
the contents of a file at the command line.
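For example, to print an entire FATAL log to the terminal:
$ cat stargate.NUTANIX-CVM03.nutanix.log.FATAL.20120510-152823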
tail
This command returns the last 10 lines that were written to the file, which is useful when
investigating issues that have happened recently or are still happening.
To change the number of lines, add the -n flag. For example: $ tail -n 20 stargate.NUTANIX-
CVM03.nutanix.log.FATAL.20120510-152823.3135
grep
This command returns lines in the file that match a search string, which is useful if you
are looking for a failure that occurred on a particular day. For example: $ grep F0820
stargate.NUTANIX-CVM03.nutanix.log.FATAL.20120510-152823.3135
Nutanix provides a variety of support services and materials through the Support portal. To
access the Nutanix support portal from Prism Central:
1. Select Support Portal from the user icon pull-down list of the main menu. The login screen
for the Nutanix support portal appears in a new tab or window.
2. Enter your support account user name and password. The Nutanix support portal home
page appears.
3. Select the desired service from the screen options. The options available to you are:
• Click one of the icons (Documentation, Open Case, View Cases, Downloads) in the middle of the page.
Note: Some options have restricted access and are not available to all users.
Labs
1. Using Nutanix Cluster Check (NCC) Health Checks
2. Collecting logs for support
Module 14: Cluster Management and Expansion
Overview
After completing this module, you will be able to:
• Expand a cluster.
• Explain license management.
Exercise caution whenever connecting directly to a CVM as the risk of causing cluster issues is
increased. This is because if you make an error when entering a container name or VM name,
you are not typically prompted to confirm your action – the command simply executes. In
addition, commands are executed with elevated privileges, similar to root, requiring attention
when making such changes.
While Nutanix cluster upgrades are non-disruptive and allow the cluster to run while nodes
upgrade in the background, there are situations in which some downtime may be necessary.
Certain maintenance operations and tasks such as hardware relocation would require a cluster
shutdown.
Before shutting down a node, shut down all the guest VMs running on the node or move them to other nodes in the cluster. Verify the data resiliency status of the cluster. The recommendation for any RF level is to shut down only one node at a time, even for RF3. If a cluster needs to have more than one node shut down, shut down the entire cluster. The cluster status command, executed from the CLI on a Controller VM, shows the current status of all cluster processes.
Note: This topic shows the process for AHV. Consult the appropriate admin manual
for other hypervisors.
1. Verify the cluster status with the cluster status command on a CVM.
2. If there are issues with the cluster, you can run the NCC checks with the command ncc health_checks run_all.
5. Shut down the cluster with the command cluster stop. Use the cluster status command to see the current status of all cluster processes.
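Taken together, and assuming the cluster is healthy and all guest VMs have already been shut down, the command sequence on a CVM is simply:
nutanix@cvm$ cluster status
nutanix@cvm$ ncc health_checks run_all
nutanix@cvm$ cluster stop
nutanix@cvm$ cluster status
The final cluster status run confirms that services report as stopped before you power off the nodes.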
Starting a Node
1. If the node is turned off, turn it on (otherwise, go to the next step).
3. Find the name of the CVM by executing the following on the host: virsh list --all | grep
CVM
4. Examining the output from the previous command, if the CVM is OFF, start it from the
prompt on the host: virsh start cvm_name
5. If the node is in maintenance mode, log on to the CVM over SSH and take it out of
maintenance mode: acli host.exit_maintenance_mode AHV-hypervisor-IP-address
7. Confirm that cluster services are running on the CVM (make sure to replace cvm_ip_addr accordingly): ncli cluster status | grep -A 15 cvm_ip_addr
a. Alternatively, you can use the following command to check if any services are down in the
cluster: cluster status | grep -v UP
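For example, assuming a CVM named NTNX-CVM03 with the IP address 10.1.64.60 and an AHV host with the IP address 10.1.64.50 (all placeholder values), the sequence looks roughly like this, with the virsh commands run on the AHV host and the remaining commands run on a CVM:
virsh list --all | grep CVM
virsh start NTNX-CVM03
acli host.exit_maintenance_mode 10.1.64.50
ncli cluster status | grep -A 15 10.1.64.60
cluster status | grep -v UP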
Starting a Cluster
1. Log on to any CVM in the cluster with SSH.
2. Start all cluster services by running the cluster start command.
Once the process begins, you will see a list of all the services that need to be started on each CVM:
If the cluster starts properly, output similar to the following is displayed for each node in the
cluster at the end of the command execution:
CVM: 10.1.64.60 Up
Zeus UP [5362, 5391, 5392, 10848, 10977, 10992]
Scavenger UP [6174, 6215, 6216, 6217]
SSLTerminator UP [7705, 7742, 7743, 7744]
SecureFileSync UP [7710, 7761, 7762, 7763]
Medusa UP [8029, 8073, 8074, 8176, 8221]
DynamicRingChanger UP [8324, 8366, 8367, 8426]
Pithos UP [8328, 8399, 8400, 8418]
Hera UP [8347, 8408, 8409, 8410]
Stargate UP [8742, 8771, 8772, 9037, 9045]
InsightsDB UP [8774, 8805, 8806, 8939]
InsightsDataTransfer UP [8785, 8840, 8841, 8886, 8888, 8889, 8890]
Ergon UP [8814, 8862, 8863, 8864]
Cerebro UP [8850, 8914, 8915, 9288]
Chronos UP [8870, 8975, 8976, 9031]
Curator UP [8885, 8931, 8932, 9243]
Prism UP [3545, 3572, 3573, 3627, 4004, 4076]
CIM UP [8990, 9042, 9043, 9084]
AlertManager UP [9017, 9081, 9082, 9324]
Arithmos UP [9055, 9217, 9218, 9353]
Catalog UP [9110, 9178, 9179, 9180]
Acropolis UP [9201, 9321, 9322, 9323]
Atlas UP [9221, 9316, 9317, 9318]
Uhura UP [9390, 9447, 9448, 9449]
Snmp UP [9418, 9513, 9514, 9516]
SysStatCollector UP [9451, 9510, 9511, 9518]
Tunnel UP [9480, 9543, 9544]
ClusterHealth UP [9521, 9619, 9620, 9947, 9976, 9977,
10301]
Janus UP [9532, 9624, 9625]
NutanixGuestTools UP [9572, 9650, 9651, 9674]
MinervaCVM UP [10174, 10200, 10201, 10202, 10371]
ClusterConfig UP [10205, 10233, 10234, 10236]
APLOSEngine UP [10231, 10261, 10262, 10263]
APLOS UP [10343, 10368, 10369, 10370, 10502, 10503]
Lazan UP [10377, 10402, 10403, 10404]
Orion UP [10409, 10449, 10450, 10474]
Delphi UP [10418, 10466, 10467, 10468]
After you have verified that the cluster is up and running and there are no services down, you
can start guest VMs.
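On AHV, guest VMs can be started from Prism or from a CVM with aCLI. For example (the VM name below is a placeholder):
nutanix@cvm$ acli vm.on MyAppVM01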
Hardware components, such as nodes and disks, can be removed from a cluster or reconfigured in other ways when conditions warrant it. However, node removal is typically a lengthy and I/O-intensive operation. Nutanix recommends removing a node only when it needs to be removed permanently from a cluster. Node removal is not recommended for troubleshooting scenarios.
1. Navigate to the Settings section of Prism and select Data at Rest Encryption.
3. Enter the required credentials in the Certificate Signing Request Information section.
4. In the Key Management Server section, add a new key management server.
6. Return to the Key Management Server section, upload all node certificates.
Note: If an SED drive or node is not removed as recommended, the drive or node will be locked.
• You need to reclaim licenses before you remove a host from a cluster.
• Removing a host takes some time because data on that host must be migrated to other
hosts before it can be removed from the cluster. You can monitor progress through the
dashboard messages.
• Removing a host automatically removes all the disks in that host from the storage containers
and the storage pool(s).
• Only one host can be removed at a time. If you want to remove multiple hosts, you must
wait until the first host is removed completely before attempting to remove the next host.
• After a node is removed, it goes into an unconfigured state. You can add such a node back
into the cluster through the expand cluster workflow, which we will discuss in the next topic
of this chapter.
Expanding a Cluster
The ability to dynamically scale the Acropolis cluster is core to its functionality. To scale an Acropolis cluster, install the new nodes in the rack and power them on. After the nodes are powered on, if they contain a factory-installed image of AHV and the CVM, the cluster should discover the new nodes using the IPv6 Neighbor Discovery protocol.
Note: Nodes that are installed with AHV and CVM, but not associated with a
cluster, are also discoverable. Factory install of AHV and CVM may not be possible
for nodes shipped in some regions of the world.
Multiple nodes can be discovered and added to the cluster concurrently if AHV and the CVM were imaged in the factory before the nodes were shipped. Some pre-work is necessary for nodes that do not meet these criteria. Additionally, nodes that are already part of a cluster are not listed as options for cluster expansion.
The process for expanding a cluster depends on the hypervisor type, version of AOS, and data-
at-rest encryption status.
Configuration: Same hypervisor and AOS version
Description: The node is added to the cluster without re-imaging it.

Configuration: Same hypervisor, but the node is running a lower AOS version than the cluster
Description: Upgrade the node to the AOS version of the cluster by running the following command from a CVM in the cluster:
nutanix@cvm$ /home/nutanix/cluster/bin/cluster -u new_node_cvm_ip_address upgrade_node
After the upgrade is complete, you can add the node to the cluster without re-imaging it. Alternately, if the AOS version on the node is higher than the cluster, you must either upgrade the cluster to that version or re-image the node.

Configuration: Same AOS version, but the hypervisor version is different
Description: You are provided with the option to re-image the node before adding it. Re-imaging is appropriate in many such cases, but in some cases it may not be necessary, such as for a minor version difference. Depending on the hypervisor, installation binaries (for example, an ISO) might need to be provided.

Configuration: Expanding a cluster when the ESXi cluster is configured with DVS (Distributed vSwitch) for CVM external communication
Description: To expand an ESXi cluster configured with DVS for Controller VM external communication, ensure that you do the following:
• Expand the DVS with the new node.
• Make sure both the host and the CVM are configured with DVS.
• Make sure that host-to-CVM and CVM-to-CVM communications are working.
Managing Licenses
Nutanix provides automatic and manually applied licenses to ensure access to the variety of
features available. These features will enable you to administer your environment based on your
current and future needs. You can use the default feature set of AOS, upgrade to an advanced
feature set, update your license for a longer term or reassign existing licenses to nodes or
clusters as needed.
Each Nutanix NX Series node or block is delivered with a default Starter license which does not
expire. You are not required to register this license on your Nutanix Customer Portal account.
These licenses are automatically applied when a cluster is created, even when a cluster has
been destroyed and re-created. In these cases, Starter licenses do not need to be reclaimed.
Software-only platforms qualified by Nutanix (for example, the Cisco UCS M5 C-Series Rack Server) might require a manually applied Starter license. Depending on the license level you purchase, you can apply it using the Prism Element or Prism Central web console.
• Pro and Ultimate licenses have expiration dates. License notification alerts in Prism start 60
days before expiration.
• Upgrade your license type if you require continued access to Pro or Ultimate features.
• An administrator must install a license after creating a cluster for Pro and Ultimate licensing.
• Ensure consistent licensing for all nodes in a cluster. Nodes with different licensing default to the minimum feature set.
For example, if two nodes in the cluster have Pro licenses and two nodes in the same cluster have Ultimate licenses, all nodes will effectively have Pro licenses and access to that feature set only.
Attempts to access Ultimate features in this case result in a warning in the web console. If you are using a Prism Pro trial license, the warning shows the expiration date and the number of days left in the trial period. The trial period is 60 days.
• You may see a "Licensing Status: In Process" alert message in the web console or log files.
• Generating a Cluster Summary File through the Prism web console, nCLI commands
(generate-cluster-info) or PowerShell commands (get-NTNXClusterLicenseInfo and get-
NTNXClusterLicenseInfoFile) initiates the cluster licensing process.
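For example, generating the cluster summary information from the nCLI might look like the following. This is a sketch that assumes the operation cited above is exposed under the license entity; confirm the exact entity name and parameters in the nCLI reference for your AOS version.
ncli> license generate-cluster-info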
AOS Licenses
Starter licenses are installed by default on each Nutanix node and block. They never expire and they do not require registration on your assigned Nutanix customer portal account.
Pro and Ultimate licenses are downloaded as a license file from the Nutanix Support Portal and
applied to your cluster using Prism.
With Pro or Ultimate licensing, or after upgrading to a Pro or Ultimate license, adding nodes or clusters to your environment requires you to generate a new license file for download and installation.
Note: For more information about the different features that are available with
Acropolis Starter, Pro, and Ultimate, please see: https://www.nutanix.com/
products/software-options
Prism Licenses
The Prism Pro license is available on a per-node basis, with options to purchase on a 1, 2, 3, 4, or
5-year term. A trial version of Prism Pro is included with every edition of AOS.
Add-on Licenses
Individual features known as add-ons can be added to your existing Prism license feature set.
When Nutanix makes add-ons available, you can add them to your existing Starter or Pro
license. For example, you can purchase Nutanix Files for your existing Pro licenses.
You need to purchase and apply one add-on license for each node in the cluster with a Pro
license. For example, if your current Pro-licensed cluster consists of four nodes, you need to
purchase four add-on licenses, then apply them to your cluster.
All nodes in your cluster need to be at the same license level (four Pro licenses and four add-
on licenses). You cannot buy one add-on license, apply it to one node and have three nodes
without add-on licenses.
Add-ons that are available with one- to five-year subscription terms are Nutanix Era, Nutanix Flow, Nutanix Files, and Nutanix Files Pro. Nutanix Calm is available in 25-VM subscription license packs.
Before attempting to install an upgraded or add-on license, ensure that you have created a
cluster and have logged into the web console to ensure the Starter license has been applied.
The Portal Connection feature simplifies licensing by integrating the licensing workflow into
a single interface in the web console. Once you configure this feature, you can perform most
licensing tasks from Prism without needing to explicitly log on to the Nutanix Support Portal.
Note: This feature is disabled by default. If you want to enable Portal Connection,
please see the Nutanix Licensing Guide on the Nutanix Support Portal.
Portal Connection communicates with the Nutanix Support Portal to detect changes or updates
to your cluster license status. When you open Licensing from the web console, the screen
displays 1-click action buttons to enable you to manage your licenses without leaving Prism.
Add: Add an add-on license. This button appears if add-on features are available for licensing.
Rebalance: Ensure your available licenses are applied to each node in your cluster.
Note: For more information on managing licenses with the Portal Connection feature, including examples of upgrades, renewals, and removals, please see the Nutanix Licensing Guide on the Nutanix Support Portal.
This is the default method of managing licenses, since the Portal Connection feature is disabled by default. This method is a 3-step process, in which you:
1. Generate a cluster summary file in the web console and upload it to the Nutanix support portal.
2. Generate and download a license file from the Nutanix support portal.
3. Apply the downloaded license file to the cluster through the web console.
1. From an internet-connected cluster, click the gear icon in the web console and open
Licensing.
3. Click Generate to create and save a cluster summary file to your local machine. The cluster summary file is saved to your browser download directory or a directory that you specify.
Note: To begin this process, you must have first generated a cluster summary file in
the web console.
2. Click Support Portal, log on to the Nutanix support portal, and click My Products > Licenses.
3. Click License a New Cluster. The Manage Licenses dialog box displays.
4. Click Choose File. Browse to the Cluster Summary File you just downloaded, select it, and
click Next. The portal automatically assigns a license, based on the information contained in
the Cluster Summary File.
5. Generate and apply the downloaded license file to the cluster. Click Generate to download the license file created for the cluster to your browser download folder or a directory that you specify.
Note: To begin this process, you must have first generated and downloaded a
license file from the Nutanix Support Portal.
1. In the Prism web console, click the upload link in the Manage Licenses dialog box.
2. Browse to the license file you downloaded, select it, and click Save.
Note: The 3-step process described here applies to Prism Element, Prism Central, and add-on licenses. For specific instructions related to each of these three license types, please see the relevant section of the Nutanix Licensing Guide on the Nutanix Support Portal.
Since a dark site cluster is not connected to the internet, the Portal Connection feature cannot be used from the cluster itself. However, some steps in the licensing process require the use of a system connected to the internet. The three-step process for licensing a dark site cluster is as follows:
1. Open Licensing from the gear icon in the web console for the connected cluster.
3. Click Show Info and copy the cluster information needed to generate a license file. This
page displays the information that you need to enter at the support portal on an internet-
connected system. Copy this information to complete this licensing procedure.
For example, one of the fields displayed is Flash TiB: the number of flash TiBs, which is used with capacity-based licensing.
1. Get your cluster information from the web console. Complete the following steps on a machine connected to the internet.
2. Navigate to the Cluster Usage section of the Nutanix Support Portal to manage your
licenses.
3. Select the option for Dark Sites and then select the required license information, including
class, license version, and AOS version.
4. If necessary, enter capacity and block details. (Ensure that there are no typing errors.)
5. Select your licenses for Acropolis and then license your add-ons individually.
6. Check the summary, make sure all details are correct, and then download the license file.
7. Apply the downloaded license file to your dark site cluster to complete the process.
Reclaiming a license returns it to your inventory so that you can reapply it to other nodes in a cluster. You will need to reclaim licenses when modifying license assignments, when removing nodes from a cluster, or before you destroy a cluster.
As with license management, licenses can be reclaimed both with and without the use of the Portal Connection feature. Both procedures are described below. For more information, including detailed step-by-step procedures, please see the Nutanix Licensing Guide on the Nutanix Support Portal.
Reclaim licenses to return them to your inventory when you remove one or more nodes from
a cluster. If you move nodes from one cluster to another, first reclaim the licenses, move the
nodes, then re-apply the licenses. Otherwise, if you are removing a node and not moving it to
another cluster, use the Rebalance button.
You can reclaim licenses for nodes in your clusters in cases where you want to make
modifications or downgrade licenses. For example, applying an Ultimate license to all nodes
in a cluster where some nodes are currently licensed as Pro and some nodes are licensed as
Ultimate. You might also want to transition nodes from Ultimate to Pro licensing.
For example, to reclaim a Nutanix Files add-on license with Portal Connection:
a. Open Licensing from the gear icon in the Prism web console for the connected cluster.
b. The Licensing window shows that you have installed the Nutanix Files add-on.
c. Click Remove File Server to remove this add-on feature. Click Yes in the confirmation window.
Portal Connection places the cluster into standby mode while it removes the feature. After this operation is complete, the cluster license status is updated.
You will need to repeat this procedure for any other add-ons that you have installed.
You can now perform any additional tasks, such as destroying the cluster or re-applying
licenses.
There are two scenarios in which you will reclaim licenses without using Portal Connection: first, when destroying a cluster, and second, when removing nodes from a cluster. The procedure for both scenarios is largely the same. Differences are noted in the steps below, where applicable.
Points to Remember
• If you move a removed node to another cluster, licensing it requires an available license from your inventory.
• You must unlicense (reclaim) your cluster (other than Starter on Nutanix NX Series
platforms) when you plan to destroy a cluster. First unlicense (reclaim) the cluster, then
destroy the cluster.
Note: If you have destroyed the cluster and did not reclaim all existing licenses by
unlicensing the cluster, contact Nutanix Support to help reclaim the licenses.
• Return licenses to your inventory when you remove one or more nodes from a cluster. Also,
if you move nodes from one cluster to another, first reclaim the licenses, move the nodes,
then re-apply the licenses.
• You can reclaim licenses for nodes in your clusters in cases where you want to make
modifications or downgrade licenses. For example, applying an Ultimate license to all nodes
in a cluster where some nodes are currently licensed as Pro and some nodes are licensed as
Ultimate. You might also want to transition nodes from Ultimate to Pro licensing.
• You do not need to reclaim Starter licenses for Nutanix NX Series platforms. These licenses
are automatically applied whenever you create a cluster.
1. Generate a cluster summary file in the web console and upload it to the Nutanix Support
Portal.
2. In the Support Portal, unlicense the cluster and download the license file.
3. Apply the downloaded license file to your cluster to complete the license reclamation
process.
AOS
Each node in a cluster runs AOS. When upgrading a cluster, all nodes should be upgraded to
the same AOS version.
Nutanix provides a live upgrade mechanism that allows the cluster to run continuously while a
rolling upgrade of the nodes is started in the background. There is no downgrade option.
Hypervisor Software
Hypervisor upgrades are provided by vendors such as VMware and qualified by Nutanix. The upgrade process updates one node in a cluster at a time.
NCC
Foundation
BIOS and BMC Firmware
Nutanix provides updated BIOS and Baseboard Management Controller (BMC) firmware.
Nutanix rarely includes this firmware on the Nutanix Support Portal. Nutanix recommends that you open a case on the Support Portal to request the availability of updated firmware for your platform.
Disk Firmware
Nutanix provides a live upgrade mechanism for disk firmware. The upgrade process updates
one disk at a time on each node for the disk group you have selected to upgrade.
Once the upgrade is complete on the first node in the cluster, the process begins on the next
node. Update happens on one disk at a time until all drives in the cluster have been updated.
For AOS only, Nutanix offers two types of releases that cater to the needs of different customer
environments.
• Short Term Support (STS) releases have new features and provide a regular upgrade path
• Long Term Support (LTS) releases are maintained for longer periods of time and primarily
include bug fixes over that extended period
To understand whether you have an STS or LTS release, or which one is right for you, refer to the following summary:
STS (Short Term Support)
• Release cadence: quarterly.
• Maintenance: 3 months of maintenance, followed by an additional 3 months of support.
• Content: major new features and support for new hardware platforms; also contains bug fixes.
• Intended for: customers that are interested in adopting major new features and are able to perform upgrades multiple times a year.
• Example versions: 5.6.x, 5.8.x, 5.9.x, 5.11.x.
• Upgrade path: to the next supported STS release, or to the next supported LTS release.
LTS (Long Term Support)
• Maintained for longer periods of time and primarily receives bug fixes over that extended period.
Note: The upgrade path must always be to a later release. Downgrades are not supported.
• Check the status of your cluster to ensure everything is in a proper working state.
• Check the compatibility matrix for details of hypervisor and hardware support for different versions of AOS.
• Update LCM.
2. Run the Nutanix Cluster Check (NCC) health checks from any CVM in the cluster.
3. Download the available hypervisor software from the vendor and the metadata file (JSON)
from the Nutanix Support Portal. If you are upgrading AHV, you can download the binary
bundle from the Nutanix Support Portal.
4. Upload the software and metadata through Upgrade Software.
6. Only one node is upgraded at a time. Ensure that all the hypervisors in your cluster are running the same version (all ESXi hosts running the same version, all AHV hosts running the same version, and so on). The NCC check same_hypervisor_version_check returns a FAIL status if the hypervisor versions are different (an example invocation is shown after the note below).
Note: If the hypervisor versions differ, the Upgrade Software (1-click upgrade) workflow does not complete successfully.
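If you only want to verify hypervisor version consistency before upgrading, the check named above can be run on its own from a CVM. The hypervisor_checks module path shown here is an assumption; verify it against the NCC reference for your NCC version:
nutanix@cvm$ ncc health_checks hypervisor_checks same_hypervisor_version_check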
Upgrading AHV
To upgrade AHV through the Upgrade Software feature in the Prism web console, do the
following:
1. Ensure that you are running the latest version of NCC. Upgrade NCC if required.
2. Run NCC to ensure that there are no issues with the cluster.
3. In the web console, navigate to the Upgrade Software section of the Settings page and click
the Hypervisor tab.
4. If Available Compatible Versions shows a new version of AHV, click Upgrade, then click
Upgrade Now, and click Yes when prompted for confirmation.
Upgrading AOS
To upgrade AOS through the Upgrade Software feature in the Prism web console, do the
following:
1. Ensure that you are running the latest version of NCC. Upgrade NCC if required.
2. Run NCC to ensure that there are no issues with the cluster.
3. In the web console, navigate to the Upgrade Software section of the Settings page and
select the option to upgrade AOS.
4. Optionally, you can also run pre-upgrade installation checks before proceeding with the upgrade process.
5. If automatic downloads are enabled on your cluster, install the downloaded package. If
automatic downloads are not enabled, download the upgrade package and install it.
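While the rolling upgrade runs, you can follow its progress from any CVM. As a rough sketch, upgrade_status reports per-node AOS upgrade progress and host_upgrade_status does the same for hypervisor upgrades; output details vary by AOS version:
nutanix@cvm$ upgrade_status
nutanix@cvm$ host_upgrade_status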
LCM consists of a framework and a set of modules for inventory and update. LCM supports all Nutanix, Dell XC, Dell XC Core, and Lenovo HX platforms. LCM modules are independent of AOS. They contain libraries and images, as well as metadata and checksums for security. Currently, Nutanix supplies all modules.
The LCM framework is accessible through the Prism interface. It acts as a download manager
for LCM modules, validating and downloading module content. All communication between the
cluster and LCM modules goes through the LCM framework.
Accessing LCM
Whether you are accessing LCM from Prism Element or Prism Central, the steps to do so are
the same.
Note: In AOS 5.11 and later, LCM is available as a menu item from the Prism Home page, rather than the Settings page.
You can use LCM to display software and firmware versions of entities in a cluster. Inventory
information for a node is persistent for as long as the node remains in the chassis. When you
remove a node from a chassis, LCM will not retain inventory information for that node. When
you return the node to the chassis, you must perform inventory again to restore the inventory
information.
To perform inventory:
1. Open LCM.
2. To take an inventory, click Options and select Perform Inventory. If you do not have auto-update enabled and a new version of the LCM framework is available, LCM displays a warning before proceeding.
The LCM interface also provides:
• The Focus button, which lets you switch between a general display and a component-by-component display.
• Auto-inventory. To enable this feature, click Settings and select the Enable LCM Auto Inventory check box in the dialog box that appears.
Before performing updates, get the current status of your cluster to ensure everything is in proper working order.
LCM updates the cluster one node at a time: it brings a node down (if needed), performs updates, brings the node up, waits until it is fully functional, and then moves on to the next node. If LCM encounters a problem during an update, it waits until the problem has been resolved before moving on to the next node.
During an LCM update, there is never more than one node down at the same time even if the
cluster is RF3.
All LCM updates follow this general procedure:
1. If updates for the LCM framework are available, LCM auto-updates its own framework, then
continues with the operation.
2. After a self-update, LCM runs the series of pre-checks described in the Life Cycle Manager
Pre-Checks section of the Life Cycle Manager Guide on the Support Portal.
3. When the pre-checks are complete, LCM looks at the available component updates and
batches them according to dependencies. LCM batches updates in order to reduce or
eliminate the downtime of the individual nodes; when updates are batched, LCM only
performs the pre-update and post-update actions once. For example, on NX platforms, BIOS
updates depend on BMC updates, so LCM batches them so the BMC always updates before
the BIOS on each node.
4. Next, LCM chooses a node and performs any necessary pre-update actions.
5. Next, LCM performs the update. The update process and duration vary by component.
6. LCM performs any necessary post-update actions and brings the node back up.
7. When cluster data resiliency is back to normal, LCM moves to the next node.
2. Specify where LCM should look for updates, and then select the updates you want to
perform.
At a Dark Site
By default, LCM automatically fetches updates from a pre-configured URL. If you are managing
a Nutanix cluster at a site that cannot access the provided URL, you must configure LCM to
fetch updates locally, using the procedure described in the Life Cycle Manager Guide on the
Nutanix Support Portal.
Labs
1. Performing a one-click NCC upgrade