Pure Storage VDI Reference Architecture For VMware View
Audience
The target audience for this document includes storage and virtualization administrators, consulting
data center architects, field engineers, and desktop specialists who want to implement VMware View
based virtual desktops on the FlashArray. A working knowledge of VMware vSphere, VMware View,
server, storage, network, and data center design is helpful but is not a prerequisite for reading this
document.
• We then repeated the test using 1,000 persistent full-clone desktops, achieving the same View
Planner score and showing that users can confidently use any combination of linked clone or
full clone persistent desktops on the FlashArray – both perform the same.
• Throughout the testing the FlashArray delivered up to 50,000 IOPS and maintained latency
under 1.1ms, demonstrating the FlashArray’s consistent latency and ability to deliver the best
all-flash VDI end-user experience at all times. The FlashArray delivers a better desktop
experience for end-users than dedicated laptops with SSDs, and doesn’t risk the end-user
experience by relying on caching as hybrid flash/disk arrays do.
• In total throughout the testing we deployed more than 2,000 desktops, including both 1,000
linked clones and 1,000 persistent desktops (each of 31 GB disk size), together only consuming
about 1.1 TB of physical storage on the FlashArray. This massive data reduction (>20-to-1) is the
result of the high-performance inline data reduction (deduplication and compression) delivered
by the FlashArray, which enables using any combination of linked clones or persistent full-
clone desktops – both of which reduce to about the same amount of space on the array.
• As tested, the 11TB FlashArray FA-320 delivered best-in-class VDI performance at a cost of
$100/desktop for 2,000 desktops. Since the FlashArray was significantly under-utilized
throughout the testing on both a capacity and performance basis, the array could have
supported 1,000s more desktops, or a smaller array could have been used, either of which
would have reduced the $/desktop cost even further.
• Throughout the testing we performed common VDI administrator operations and found a
drastic reduction in time for recomposing desktops, cloning persistent desktops, (re)booting
desktops, and other day-to-day virtual desktop operations. Taken together these operational
savings deliver substantial efficiency gains for VDI administrators throughout the VDI day.
• The power footprint for the tested FA-320 FlashArray was 9 amps (110V), a fraction of that of any
mechanical disk storage array available in the marketplace. This configuration consumed
eight rack units (8 RU) of data center space.
• This reference architecture can be treated as a 1,000 desktop building block. Customers can
add more server and infrastructure components to scale the architecture out to 1,000s of
desktops. Based on the results, we believe a single FA-320 can support up to 5,000
desktops with any mix of linked clones and/or persistent desktops.
So if VDI comes with so many potential advantages, why has adoption of VDI been so slow? The
reality is that the path to the VDI promised land is a difficult one, and many organizations have
abandoned their VDI initiatives outright or stalled in partial stages of deployment. The reasons are
many, but most failed deployments boil down to three key issues:
• Too expensive: VDI is often positioned as a technology to reduce desktop cost, but in reality most
organizations find that they are unable to achieve the promised ROI due to infrastructure costs. In
particular, the server, networking, and storage infrastructure required often makes virtual desktops
dramatically more expensive than dedicated desktops/laptops.
• Poor end-user experience: if VDI isn’t implemented properly, the end result is slow or unavailable
desktops that can lead to user frustration and lost productivity.
• Too difficult to manage: VDI shifts the desktop administration burden from the end-users to IT
staff. While this affords many security and administrative benefits, it also means more work for
often burdened IT staff, especially if the VDI environment itself isn’t architected correctly.
More often than not, one of the chief contributors to all three of these failure modes is storage.
Traditional disk-based storage is optimized for high-capacity, modest performance, and read-heavy
workloads – the exact opposite of VDI which is write-heavy, very high performance, and low-capacity.
The result is that as performance lags, spindle after spindle of legacy disk storage has to be thrown at
VDI, causing a spike in infrastructure costs and a spike in management complexity.
In this reference architecture for virtual desktops we’re going to explore how a new, 100%-flash based
approach to VDI can help overcome the key VDI failure traps, and help deliver a VDI solution that both
end-users and IT administrators will love. We’ll start with a high-level overview of the Pure Storage
FlashArray, follow with the test infrastructure components that were put together for this work, and dive
into the details of each component. Finally, we’ll discuss the results of the VMware View Planner load
generator and the operational benefits of using the Pure Storage FlashArray for virtual desktop
deployment.
The FlashArray’s entire architecture was designed to reduce the cost of 100% flash storage: it
combines consumer-grade MLC flash memory with inline data reduction technologies
(deduplication, compression, thin provisioning) to drive the cost of 100% flash storage in line with or
under the cost of traditional enterprise disk storage. Data reduction technologies are particularly
effective in VDI environments, typically providing >5-to-1 reduction for stateless desktops and >10-to-1
reduction for stateful desktops.
It’s important to note that unlike some flash appliances, the FlashArray was designed with enterprise-
class scale and resiliency in mind. That means a true active/active controller architecture, online
capacity expansion, and online non-disruptive code upgrades. The FlashArray also employs a unique
form of RAID protection, called RAID-3D™, which is designed to protect against the three failure modes
of flash: device failure, bit errors, and performance variability.
Last but not least, the FlashArray is the simplest enterprise storage that you’ll ever use. We designed it
from the start to remove the layers of LUN, storage virtualization, RAID, and caching management
complexity common in traditional arrays, and have integrated management directly into
VMware vSphere’s Web Client, making management of a VDI environment seamless.
• Create a scalable building block that can be easily replicated at any customer site using a
customer’s chosen server and networking hardware.
• Create a design that is resilient, even in the face of failure of any component. For example, we
include best practices to enforce multiple paths to storage, multiple NICs for connectivity, and
high availability (HA) clustering including dynamic resource scheduling (DRS) on vSphere.
• Take advantage of inline data reduction and low latency of the Pure Storage FlashArray to
push the envelope on desktops-per-server density.
• Avoid tweaks to make the results look better than a normal out-of-box environment.
Solution Overview
Figure 1 shows a topological view of the test environment for our reference architecture. The VMware
View infrastructure components were placed on a dedicated host. We tested 1,000 linked clone
desktops, 1,000 full-clone persistent desktops, and various mixtures of the two. The infrastructure
virtual machines and desktops were all hosted on a single 11TB FlashArray FA-320 (although the
workload would have easily fit on the smallest 2.75TB FA-320 or FA-310 as well). VMware vSphere and
VMware View best practices were used, in addition to the stringent requirements mandated by the
View Planner guideline document [see reference 1].
• One 11TB Pure Storage FlashArray (FA-320) in HA configuration, including two controllers and
two disk shelves:
— Ten x 4 TB volumes were carved out of the Pure FlashArray to host 2,000 desktops
(1,000 linked clones + 1,000 persistent desktops)
— A separate 600 GB volume was used to hold all the infrastructure components
• Eight Intel Xeon X5690-based commodity servers, each with 192 GB of memory and running ESXi 5.1,
were used to host the desktops
• One dedicated server was used to host all of the infrastructure virtual machines:
— SQL server for both virtual center and View event database
Figure 1: Test Environment overview of VMware View deployment with infrastructure components,
ESX hosts and Pure Storage FlashArray volumes.
Figure 2: Detailed Reference Architecture Configuration
Figure 2 shows a detailed topology of the reference architecture configuration. A major goal of the
architecture is to build out a highly redundant and resilient infrastructure. Thus, we used powerful
servers with dual Fibre Channel ports connected redundantly to two SAN switches that were
connected to redundant FC target ports on the FlashArray. The servers were hosted in a vSphere HA
cluster and had redundant network connectivity.
Controllers: Two active/active controllers providing highly redundant SAS connectivity (24Gb) to the
two shelves; the controllers are interconnected for HA via two redundant InfiniBand connections (40Gb).
Shelves: Two flash memory shelves, each populated with 22 x 256 GB SSDs, for a total raw capacity of
11TB (10.3 TiB).
External Connectivity: Four 8Gb Fibre Channel ports or four 10Gb Ethernet ports per controller, for a
total of eight ports across the two controllers. As shown in Figure 2, only four Fibre Channel ports (two
FC ports from each controller) were used for this test.
Management Ports: Two redundant 1Gb Ethernet management ports per controller. Three management
IP addresses are required to configure the array: one for each controller management port and a third
virtual port IP address for seamless management access.
Power: Dual power supplies rated at 450W per controller and 200W per storage shelf, or approximately
9 amps of power for the system.
Space: The entire FA-320 system occupied eight rack units (8 RU) of space (2 RU for each controller
and 2 RU for each shelf).
There was no special tweaking or tuning done on the FlashArray; we do not recommend any special
tunable variables as the system is designed to perform out of the box.
A common question when provisioning storage is how many LUNs of what size should be created to
support the virtual desktop deployment. Because linked clone desktops take very little space, we
could have either put all the virtual desktops in one big LUN or spread them across several LUNs. The
FlashArray supports the VMware VAAI ATS primitive, which allows many VMDKs to be accessed
efficiently on a single LUN (note that in vSphere 5.x the maximum size of a LUN is 64 TB). VAAI ATS
eliminates the serialization of VMFS locks on the LUN, which used to severely limit VM scalability in
previous ESX versions. See Appendix A for more details on provisioning Pure Storage.
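To confirm that hardware acceleration (including ATS) is active on a FlashArray LUN, its VAAI status can be queried from the ESXi shell or vMA; the naa device identifier below is a placeholder, for example:
# query VAAI primitive status for one device (placeholder identifier)
esxcli storage core device vaai status get -d naa.624a9370xxxxxxxxxxxxxxxx
The ATS Status field in the output should show the primitive as supported for FlashArray volumes.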
Since we are advocating placing the OS image, user data, persona, and application data on the same
storage, we need to take into account the size of all of those drives when calculating the LUN size.
Consider a desktop with a 30 GB base image (including applications and application data) and 20 GB of
user data, or 50 GB nominal per desktop, and suppose we need to provision “d” desktops:
• One could distribute the “d” desktops across “n” LUNs, with d / n desktops on each LUN and a
nominal LUN size of (50 GB * d) / n, as in the worked example below.
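For example, under these assumptions, provisioning d = 1,000 desktops across n = 10 LUNs gives 100 desktops per LUN and a nominal LUN size of (50 GB * 1,000) / 10 = 5 TB per LUN. This is only the logical size vSphere sees; because the FlashArray thin provisions and reduces data inline, the physical flash actually consumed is far smaller (about 1.1 TB for the 2,000 desktops in our testing).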
Regardless of the data reduction, we need to create the LUN with the correct size so that vSphere
doesn’t run out of storage capacity. Figure 4 below shows the virtual desktop deployment on Pure
Storage.
Figure 4: OS image, applications, user data and application data hosted on Pure Storage
Figure 5: Data reduction of 1,000 Windows 7 linked clone desktops
Figure 6: Data reduction of 2,000 desktops: 1,000 linked clones plus 1,000 full clones
The persistent desktops had 1 GB of user data each for the different View Planner user customizations
(profiles/registry settings), so for 1,000 desktops the user data added up to 1.19 TB. In a real-world
scenario, the data reduction number is more on the order of 10-to-1, as the user data would differ more
than in our example. Note that the OS image doesn’t add to the physical space, as most of its blocks are
deduplicated.
Unlike with traditional storage arrays, we used a common LUN to store the OS image, user data,
application data, and persona; we don’t see any benefit in separating them on the FlashArray. Data
reduction is not done on a per-volume basis; it is done across the entire array, which is reflected in the
shared data in the capacity bar above.
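The same capacity and data reduction information shown in the GUI capacity bar can also be viewed from the array CLI; a minimal sketch, assuming the Purity CLI space reporting option:
purearray list --space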
Processor: 2 x Intel Xeon X5690 @ 3.47GHz (12 cores total, 24 logical CPUs)
HBA: Dual-port QLogic ISP2532-based 8Gb Fibre Channel PCIe card
BIOS: Intel Virtualization Technology, Intel AES-NI, and Intel VT-d features were enabled
SAN Configuration
Figure 2 shows the SAN switch connectivity with two Cisco MDS 9148 8Gb switches (48 ports each). The
key point to note is that there is no single point of failure in the configuration. The connectivity is highly
resilient against host initiator port or HBA failure, SAN switch failure, a controller port failure, or
even array controller failure. The zoning on the Cisco MDS follows best practices, i.e., single-initiator/
single-target zoning. The dual HBA port worldwide names (pWWNs) of all eight ESXi hosts were zoned to
see the four Pure Storage FlashArray target port worldwide names. The target ports were picked
such that on a given controller one port from each target QLogic adapter was connected to one
switch and the other QLogic adapter port was connected to the second switch (see Figure 2 for the wiring
details). This resulted in each ESXi 5.1 host seeing eight distinct paths to the Pure Storage FlashArray LUNs
(Figure 7 shows the vCenter datastore details). See Appendix B for a sample Cisco MDS zoning configuration.
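The resulting paths per LUN can be verified from the ESXi shell or vMA; the device identifier below is a placeholder, for example:
# list all paths for one FlashArray device (placeholder identifier)
esxcli storage core path list -d naa.624a9370xxxxxxxxxxxxxxxx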
Network Configuration
Figure 8 below illustrates the network design used for the desktop deployment. A virtual machine was
set up to run AD/DNS and DHCP services, and we used a private domain. Because a large number of
desktops were to be deployed, we set up a dedicated private VLAN (VLAN 131) to hand out
IP addresses to the virtual desktops as they were spun up. A separate VLAN (VLAN 124) was used for the
management network, including the ESXi hosts, on a single 48-port Cisco 3750 1Gb Ethernet switch.
Figure 8: Logical view of the reference architecture showing network configuration
We set all of the Pure Storage LUNs to the round robin path selection policy from vMA using the following CLI command:
for i in `esxcli storage nmp device list | grep PURE|awk '{print $8}'|sed 's/(//g'|sed
's/)//g'` ; do esxcli storage nmp device set -d $i --psp=VMW_PSP_RR ; done
For our tests, we also set the default PSP for VMW_SATP_ALUA to VMW_PSP_RR so that every Pure
Storage LUN subsequently configured would get the round robin policy. The following command accomplished that:
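esxcli storage nmp satp set --default-psp=VMW_PSP_RR --satp=VMW_SATP_ALUA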
Figure 9 shows a properly configured Pure Storage LUN with VMware Round Robin PSP.
The default of 128 ports on the virtual switch was changed to 248, as there was a potential to put more
desktops on a single host (a host reboot is required for this change). The MTU was left at 1500.
Figure 11: Virtual switch properties showing 248 ports
The same can be accomplished from a vMA appliance using the command line (a script was used to
configure these settings).
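A minimal sketch of the equivalent change, assuming the ESXi 5.1 esxcli syntax (run on each host, or from vMA with the --server option; the vSwitch name is a placeholder and a host reboot is still required):
# set the configured port count on the standard vSwitch (placeholder name)
esxcli network vswitch standard set --vswitch-name=vSwitch0 --ports=248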
The QLogic HBA maximum queue depth was increased from its default value to 64 on all hosts (see
VMware KB article 1267 for setting this value).
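A minimal sketch of that change on ESXi 5.x, assuming the qla2xxx driver module used by these QLogic HBAs (a host reboot is required for the new value to take effect):
# raise the QLogic HBA queue depth module parameter
esxcli system module parameters set -m qla2xxx -p ql2xmaxqdepth=64
# verify the value after reboot
esxcli system module parameters list -m qla2xxx | grep ql2xmaxqdepth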
No other tuning was done on the vSphere hosts.
We created a master Microsoft Windows 2008 R2 template with all the updates and cloned the
different infrastructure VMs from it. The SQL Server VM hosted the Microsoft SQL Server 2008 R2 database
instance for the vCenter database and the VMware View events logging database. The VMware® View
Planner is a product of VMware, Inc. and can be obtained via the VMware partner program website.
The newest version, View Planner 2.1, is a CentOS-based appliance available as an OVF that is deployed
in vCenter. The description of each infrastructure component is shown in Figure 13.
Figure 13: Infrastructure Virtual Machine component detailed description
The management infrastructure host used for the infrastructure VMs was provisioned with a 600 GB
LUN; the server configuration is shown in Table C below.
Table C: Management infrastructure host configuration details
The VMware View solution helps IT organizations automate desktop and application management,
reduce costs and increase data security through centralization of the desktop environment. This
centralization results in greater end-user freedom and increased control for IT organizations. By
encapsulating the operating systems, applications, and user data into isolated layers, IT organizations
can deliver a modern desktop and then provide dynamic, elastic desktop cloud services such as
applications, unified communications, and 3D graphics for real-world productivity and greater business
agility.
Unlike other desktop virtualization products, VMware View is built on, and tightly integrated with,
vSphere, the industry-leading virtualization platform, allowing customers to extend the value of
VMware infrastructure and its enterprise class features such as high availability, disaster recovery and
business continuity.
View 5 includes many enhancements to the end-user experience and IT control. Some of the more
notable features include:
• View Media Services for 3D Graphics—enable View desktops to run basic 3D applications such
as Aero, Office 2010 or those requiring OpenGL or DirectX—without specialized graphics cards
or client devices
• View Media Services for Integrated Unified Communications—integrate voice over IP (VoIP)
and the View desktop experience for the end user through an architecture that optimizes
performance for both the desktop and unified communications
• View Client for Android—enables end users with Android-based tablets to access View virtual
desktops
For additional details and features available in VMware View 5, see the release notes.
Typical VMware View 5 deployments consist of several common components (illustrated in Figure 14
below), which represent a typical architecture. It includes VMware View components as well as other
components commonly integrated with VMware View.
Figure 14: VMware View architecture overview
Figure 15: Automated desktop pool settings for Windows 7
Other changes to the View Connection Server configuration included increasing the number of concurrent
operations on the vCenter Server:
VMware View Administrator > View Configuration > Servers > vCenter Servers > Edit > Advanced
In order to do faster recomposes, we changed the View Composer settings to perform operations in batches of 100.
Table D: Windows 7 virtual desktop configuration summary
View Planner runs a set of application operations selected to be representative of real‐world user
applications, and reports data on the latencies of those operations. In our tests, we used this tool to
simulate a real world scenario, then accepted the resultant application latency as a metric to measure
end user experience.
View Planner has three run modes based on what is being tested: passive mode, remote
mode, and local mode. We did local mode testing with VMware View based desktops.
The View Planner appliance was made accessible to Pure Storage, a VMware partner, as part of the
Rapid Desktop program. The test bed was configured and the Windows 7 desktop base image was set up
in strict adherence to the View Planner installation and user guide document, version 2.1.
The following parameters were tweaked in the View Planner adminops.cfg file to boot more
machines at a time:
CONCURRENT_POWERONS_ONE_MINUTE=100
CONCURRENT_LOGONS_ONE_MINUTE=100
RESET_TIMER_PERIOD_IN_SECONDS=1800
POWERON_DESKTOPS=1
Figure 16: View Planner “Group A” Operations Latency
Figure 17 below shows the Pure Storage GUI dashboard; latency remained within 1 millisecond for the
entire duration of the tests. The maximum CPU utilization on the servers was 78% and the memory
used was 100%, as shown in Figure 18. We saw no ballooning and no swapping, as we had 192 GB of
memory on each virtual desktop server host.
Figure 18: vCenter performance data of a single host; CPU and memory utilization
Common operations like creating virtual machine clones from a template, Storage vMotion, VMFS
datastore creation, VM snapshots, and general infrastructure deployment are all tremendously
accelerated compared to mechanical disk.
Storage administrators have grown accustomed to a myriad of painstaking steps to provision storage, and
it is refreshing to see that those practices can be put to rest with our storage management approach. In
addition to the ease of storage capacity management and provisioning, we found several benefits that help
in the rapid deployment and adoption of virtual desktops. These benefits of the FlashArray for virtual
desktops are broadly classified into three sections, and we describe them in detail in the next subsection.
With the FlashArray, we were able to demonstrate a 1,000-desktop patch push-out in less than two
hours while sustaining 40K IOPS with half-millisecond latencies. See Figure 20 below for the
FlashArray GUI dashboard view while recomposing 1,000 desktops. We tuned View Composer to
perform more operations concurrently, as mentioned in the View configuration section.
Desktop administrators can recompose their desktops at any time of day, as IOPS and latency are
not a limiter on the FlashArray. They can even recompose a pool of desktops that is not in use while
other pools are actively in use. This not only makes pushing out patches more efficient, but also helps
keep the organization free from malware, viruses, worms, and other common threats that plague
desktops that lack timely updates. The efficiency of the organization improves many-fold, and
software applications are always up to date.
Figure 20: Dashboard view of recomposing 1,000 desktops in less than two hours
We simulated the worst-case scenario of powering on 1,000 virtual desktops and measured the
backend IOPS. Figure 21 below shows the Pure Storage GUI dashboard for this activity. We
sustained upwards of 55K IOPS and booted 1,000 virtual desktops in less than 10 minutes while
maintaining less than 1 msec latency. This is a powerful demonstration of how the FlashArray can
withstand heavy loads like boot storms and still deliver sub-millisecond latency.
Figure 21: Booting 1,000 VMs in less than 10 minutes with sustained IOPS up to 50K and < 1 msec
latency
In the next section we talk more about the different FlashArray configurations that can be procured
for your VDI deployment. The different form factors are designed to host a certain number of desktops,
and cost varies based on the FlashArray configuration deployed.
As shown in Figure 22 below, a pilot can be implemented on a single-controller system with half a drive
shelf. As the deployment moves out of the pilot phase, you can upgrade to a two-controller HA
system with half a drive shelf for 1,000 desktops. As your user data grows, additional shelves can be
added. Both controllers and shelves can be added without downtime.
If more desktops are needed, customers can expand to a full shelf to accommodate up to 2,000
desktops. For a 5,000 desktop deployment or larger, we recommend a fully-configured FA-320 with
two controllers and two drive shelves. The sizing guidelines below are approximations based upon
best practices; your actual desktop density may vary depending on how the desktops are configured,
whether or not user data is stored in the desktops or on the array, and a variety of other factors. Pure
Storage recommends a pilot deployment in your user community to fully understand space and
performance requirements.
Adding a new shelf to increase capacity is very straightforward and involves simply connecting SAS
cables from the controllers to the new shelf, which can be done while the array is online. The Pure Storage
FlashArray features stateless controllers, which means all the configuration information is stored on the
storage shelves instead of within the controllers themselves. In the event of a controller failure, one
can easily swap out the failed controller with a new one without reconfiguring SAN zoning, which
again can be done non-disruptively.
Now that Pure Storage has broken the price barrier for VDI on
100% flash storage, why risk your VDI deployment on disk?
Prior to that he was part of the storage ecosystem engineering team at VMware for
three years, and a lead engineer at VERITAS working on storage virtualization,
volume management and file system technologies for the prior eight years.
References
1. VMware View 5 Performance and Best Practices: http://www.vmware.com/files/pdf/view/VMware-View-Performance-Study-Best-Practices-Technical-White-Paper.pdf
4. VMware View Planner Installation and User Guide, Version 2.1, dated 10/24/2011
New hosts are created using step 2, and purehgroup setattr --addhostlist HOSTLIST HGROUP is used to
add the new hosts to the host group.
The figure below shows the Pure Host Group and LUN configuration.
The Pure Storage GUI can accomplish the same with a similar set of operations.
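As a minimal sketch of the end-to-end provisioning flow from the Purity CLI (the host, host group, and volume names, WWNs, and size below are placeholder values, not the ones used in this test bed):
purehost create --wwnlist 21:00:00:24:ff:xx:xx:xx,21:00:00:24:ff:yy:yy:yy esx-host-01
purehgroup create --hostlist esx-host-01 vdi-hgroup
purevol create --size 4t vdi-vol-01
purevol connect --hgroup vdi-hgroup vdi-vol-01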
Below is a sample single-initiator/single-target zoning session on the Cisco MDS.
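The VSAN number, zone and zoneset names, and pWWNs are placeholders rather than the values used in the test bed:
conf t
! placeholder zone: one ESXi initiator pWWN and one FlashArray target pWWN
zone name esx01_hba1_pure_ct0fc0 vsan 100
  member pwwn 21:00:00:24:ff:xx:xx:xx
  member pwwn 52:4a:93:7x:xx:xx:xx:xx
exit
zoneset name vdi_fabric_a vsan 100
  member esx01_hba1_pure_ct0fc0
exit
zoneset activate name vdi_fabric_a vsan 100
end
copy running-config startup-config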
T: 650-290-6088
F: 650-625-9667
Sales: sales@purestorage.com
Support: support@purestorage.com
Media: pr@purestorage.com
General: info@purestorage.com