Mellanox WinOF-2
Rev 1.45
SW version 1.45.50000
www.mellanox.com
NOTE:
THIS HARDWARE, SOFTWARE OR TEST SUITE PRODUCT (“PRODUCT(S)”) AND ITS RELATED DOCUMENTATION ARE PROVIDED BY MELLANOX TECHNOLOGIES “AS-IS” WITH ALL FAULTS OF ANY KIND AND SOLELY FOR THE PURPOSE OF AIDING THE CUSTOMER IN TESTING APPLICATIONS THAT USE THE PRODUCTS IN DESIGNATED SOLUTIONS. THE CUSTOMER'S MANUFACTURING TEST ENVIRONMENT HAS NOT MET THE STANDARDS SET BY MELLANOX TECHNOLOGIES TO FULLY QUALIFY THE PRODUCT(S) AND/OR THE SYSTEM USING IT. THEREFORE, MELLANOX TECHNOLOGIES CANNOT AND DOES NOT GUARANTEE OR WARRANT THAT THE PRODUCTS WILL OPERATE WITH THE HIGHEST QUALITY. ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT ARE DISCLAIMED. IN NO EVENT SHALL MELLANOX BE LIABLE TO CUSTOMER OR ANY THIRD PARTIES FOR ANY DIRECT, INDIRECT, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES OF ANY KIND (INCLUDING, BUT NOT LIMITED TO, PAYMENT FOR PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY FROM THE USE OF THE PRODUCT(S) AND RELATED DOCUMENTATION EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Mellanox Technologies
350 Oakmead Parkway Suite 100
Sunnyvale, CA 94085
U.S.A.
www.mellanox.com
Tel: (408) 970-3400
Fax: (408) 970-3403
Mellanox®, Mellanox logo, Accelio®, BridgeX®, CloudX logo, CompustorX®, Connect-IB®, ConnectX®, CoolBox®, CORE-Direct®, EZchip®, EZchip logo, EZappliance®, EZdesign®, EZdriver®, EZsystem®, GPUDirect®, InfiniHost®, InfiniScale®, Kotura®, Kotura logo, Mellanox Federal Systems®, Mellanox Open Ethernet®, Mellanox ScalableHPC®, Mellanox TuneX®, Mellanox Connect Accelerate Outperform logo, Mellanox Virtual Modular Switch®, MetroDX®, MetroX®, MLNX-OS®, NP-1c®, NP-2®, NP-3®, Open Ethernet logo, PhyX®, PSIPHY®, SwitchX®, Tilera®, Tilera logo, TestX®, TuneX®, The Generation of Open Ethernet logo, UFM®, Virtual Protocol Interconnect®, Voltaire® and Voltaire logo are registered trademarks of Mellanox Technologies, Ltd.
Table of Contents
Mellanox Technologies 3
Rev 1.45
3.3 Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.3.1 Hyper-V with VMQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.3.2 Network Virtualization using Generic Routing Encapsulation (NVGRE) . . . . . . 52
3.3.3 Single Root I/O Virtualization (SR-IOV) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3.4 Virtual Machine Multiple Queue (VMMQ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.3.5 Network Direct Kernel Provider Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.3.6 PacketDirect Provider Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.4 Configuration Using Registry Keys. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.4.1 Finding the Index Value of the Network Interface . . . . . . . . . . . . . . . . . . . . . . . . 77
3.4.2 Basic Registry Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.4.3 Offload Registry Keys. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.4.4 Performance Registry Keys. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.4.5 Ethernet Registry Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.4.6 Network Direct Interface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.4.7 Win-Linux nd_rping Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.5 Performance Tuning and Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.5.1 General Performance Optimization and Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.5.2 Application Specific Optimization and Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . 90
3.5.3 Tunable Performance Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
3.5.4 Adapter Proprietary Performance Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Chapter 4 Utilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.1 Fabric Performance Utilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.2 Management Utilities. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.2.1 mlx5cmd Utilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.3 Snapshot Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.3.1 Snapshot Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Chapter 5 Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.1 Installation Related Troubleshooting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.1.1 Installation Error Codes and Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.2 Ethernet Related Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.3 Performance Related Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.3.1 General Diagnostic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.4 Virtualization Related Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.5 Reported Driver Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.6 State Dumping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.7 Extracting WPP Traces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Appendix A NVGRE Configuration Scripts Examples . . . . . . . . . . . . . . . . . . .117
A.1 Adding NVGRE Configuration to Host 14 Example . . . . . . . . . . . . . . . . . . 117
A.2 Adding NVGRE Configuration to Host 15 Example . . . . . . . . . . . . . . . . . . 118
Appendix B Windows MPI (MS-MPI) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .120
B.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
B.1.1 System Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
B.2 Running MPI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
B.3 Directing MSMPI Traffic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
The document describes WinOF-2 Rev 1.45 features, performance, diagnostic tools, content and
configuration. Additionally, this document provides information on various performance tools
supplied with this version.
Intended Audience
This manual is intended for system administrators responsible for the installation, configuration,
management and maintenance of the software and hardware of Ethernet adapter cards. It is also
intended for application developers.
Documentation Conventions
Related Documents
Table 4 - Related Documents

Document: MFT User Manual
Description: Describes the set of firmware management tools for a single InfiniBand node. MFT can be used for:
• Generating a standard or customized Mellanox firmware image
• Querying for firmware information
• Burning a firmware image to a single InfiniBand node
• Enabling/changing card configuration to support SR-IOV

Document: WinOF-2 Release Notes
Description: For possible software issues, please refer to the WinOF-2 Release Notes.

Document: README file
Description: Includes basic installation instructions, a summary of main features and requirements.

Document: ConnectX®-4 Firmware Release Notes
Description: For possible firmware issues, please refer to the ConnectX®-4 Firmware Release Notes.
1 Introduction
This User Manual describes installation, configuration and operation of Mellanox WinOF-2
driver Rev 1.45 package.
Mellanox WinOF-2 is composed of several software modules that contain Ethernet drivers only (InfiniBand drivers are not supported yet). It supports 10, 25, 40, 50 or 100 Gb/s Ethernet network ports. The port speed is determined upon boot based on card capabilities and user settings.
The Mellanox WinOF-2 driver release introduces the following capabilities:
• Support for Single and Dual port Adapters
• Receive Side Scaling (RSS)
• Receive Side Coalescing (RSC)
• Hardware Tx/Rx checksum offload
• Large Send Offload (LSO)
• Adaptive interrupt moderation
• Support for MSI-X interrupts
• Network Direct Kernel (NDK) with support for SMBDirect
• Virtual Machine Queue (VMQ) for Hyper-V
• Hardware VLAN filtering
• RDMA over Converged Ethernet
• RoCE MAC Based (v1)
• RoCE IP Based (v1)
• RoCE over IP (v1.5)
• RoCE over UDP (v2)
• VXLAN
• NDKPI v2.0
• VMMQ
• PacketDirect Provider Interface (PDPI)
• NVGRE hardware encapsulation task offload
• Quality of Service (QoS)
• Support for global flow control and Priority Flow Control (PFC)
• Enhanced Transmission Selection (ETS)
• Single Root I/O Virtualization (SR-IOV)
2 Installation
2.1 Hardware and Software Requirements
Table 5 - Hardware and Software Requirements
Package    Description
Step 3. Download the .exe image according to the architecture of your machine (see Step 1). The name of the .exe file has the following format: MLNX_WinOF2-<version>_<arch>.exe.
Installing an incorrect .exe file is blocked. If you attempt to do so, an error message will be displayed. For example, if you try to install a 64-bit .exe on a 32-bit machine, the wizard will display the following (or a similar) error message:
Step 5. Read then accept the license agreement and click Next.
Step 7. The firmware upgrade screen will be displayed in the following cases:
• If the user has an OEM card. In this case, the firmware will not be updated.
• If the user has a standard Mellanox card with an older firmware version, the firmware will be updated accordingly. However, if the user has both an OEM card and a Mellanox card, only the Mellanox card will be updated.
Step 9. If the firmware upgrade option was checked in Step 7, you will be notified whether a firmware upgrade is required.
If no reboot options are specified, the installer restarts the computer whenever necessary without
displaying any prompt or warning to the user.
Use the /norestart or /forcerestart standard command-line options to control reboots.
Applications that hold the driver files (such as ND applications) will be closed during the unattended installation.
Step 4. Click Change and specify the location to which the files will be extracted.
Step 5. Click Install to extract to this folder, or click Change to extract to a different folder.
This location should be specified for the DriversPath property when injecting drivers into the Nano Server image:
New-NanoServerImage -MediaPath \\Path\To\Media\en_us -BasePath .\Base -TargetPath .\InjectingDrivers.vhdx -DriversPath C:\WS2016TP5_Drivers
Changes made to the Windows registry take effect immediately, and no backup is automatically made.
Do not edit the Windows registry unless you are confident about the changes.
mt4115_pciconf0 bus:dev.fn=24:00.0
b. Identify the desired device by its "bus:dev.fn" address.
3. Execute the following command with the appropriate device name:
mlxconfig -d mt4115_pciconf0 set LINK_TYPE_P1=2
4. Reboot the system.
For further information, please refer to the MFT User Manual.
Step 3. Select Internet Protocol Version 4 (TCP/IPv4) from the scroll list and click Properties.
Step 4. Select the “Use the following IP address:” radio button and enter the desired IP information.
tion when creating address vectors. The library and driver are modified to provide mapping
from GID to MAC addresses required by the hardware.
The proposed RoCEv2 packets use a well-known UDP destination port value that unequivo-
cally distinguishes the datagram. Similar to other protocols that use UDP encapsulation, the
UDP source port field is used to carry an opaque flow-identifier that allows network devices to
implement packet forwarding optimizations (e.g. ECMP) while staying agnostic to the specifics
of the protocol header format.
The UDP source port is calculated as follows: UDP.SrcPort = (SrcPort XOR DstPort) OR
0xC000, where SrcPort and DstPort are the ports used to establish the connection.
For example, in a Network Direct application, when connecting to a remote peer, the destina-
tion IP address and the destination port must be provided as they are used in the calculation
above. The source port provision is optional.
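As an illustration of the source-port derivation described above, the calculation can be expressed directly. This is a minimal Python sketch for clarity only; it is not part of the driver:

```python
def roce_v2_udp_src_port(src_port: int, dst_port: int) -> int:
    """Derive the opaque RoCEv2 UDP source port from the connection's
    source and destination ports: (SrcPort XOR DstPort) OR 0xC000."""
    # The OR with 0xC000 keeps the result in the high (ephemeral) port range.
    return (src_port ^ dst_port) | 0xC000

print(hex(roce_v2_udp_src_port(0x1234, 0x5678)))  # -> 0xc44c
```

Network devices can use this opaque value for ECMP-style flow distribution without parsing the RoCE payload.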
Furthermore, since this change exclusively affects the packet format on the wire, and since with RDMA semantics packets are generated and consumed below the API, applications can seamlessly operate over any form of RDMA service (including the routable version of RoCE, as shown in Figure 2, “RoCE and RoCE v2 Frame Format Differences”) in a completely transparent way1.
1. Standard RDMA APIs are IP based already for all existing RDMA technologies
[Figure: RDMA application software stack layered over the ND/NDK API]
The fabric must use the same protocol stack in order for nodes to communicate.
In earlier versions, the default RoCE mode was RoCE v1. Starting from v1.30, the default RoCE mode is RoCE v2.
Upgrading from an earlier version to version 1.30 or above preserves the old default value (RoCE v1).
The supported RoCE modes depend on the firmware installed. If the firmware does not sup-
port the needed mode, the fallback mode would be the maximum supported RoCE mode of
the installed NIC.
RoCE is enabled by default. Configuring or disabling the RoCE mode can be done via the
registry key.
• To update it for a specific adapter using the registry key, set the roce_mode as follows:
Step 1. Find the registry key index value of the adapter according to Section 3.4.1, “Finding the
Index Value of the Network Interface”, on page 77.
Step 2. Set the roce_mode in the following path:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\<IndexValue>
• To update it for all the devices using the registry key, set the roce_mode as follows:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\mlx5\Parameters\Roce
For changes to take effect, please restart the network adapter after changing this registry key.
Parameter Name: roce_mode
Parameter Type: DWORD
Description: Sets the RoCE mode. The possible RoCE modes are: RoCE MAC Based, RoCE v2, and No RoCE.
Allowed Values and Default:
• RoCE MAC Based = 0
• RoCE v2 = 2
• No RoCE = 4
• Default: RoCE v2
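For illustration only, the value mapping and the per-adapter key path above can be sketched in Python. The index value "0007" below is a hypothetical example; find the real one as described in Section 3.4.1:

```python
# DWORD values accepted by the roce_mode registry key (per the table above).
ROCE_MODES = {"RoCE MAC Based": 0, "RoCE v2": 2, "No RoCE": 4}

CLASS_GUID = "{4d36e972-e325-11ce-bfc1-08002be10318}"

def roce_mode_key_path(index_value: str) -> str:
    """Build the per-adapter registry path under which roce_mode is set.
    index_value is the adapter's index (e.g. the hypothetical '0007')."""
    return (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class"
            + "\\" + CLASS_GUID + "\\" + index_value)

print(roce_mode_key_path("0007"))
```

Remember that the adapter must be restarted for a change to this key to take effect.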
Network congestion occurs when the number of packets being transmitted through the network approaches the packet-handling capacity of the network. A congested network will suffer from throughput deterioration manifested by increasing time delays and high latency.
In lossy environments, this leads to packet loss. In lossless environments, it leads to “victim flows” (streams of data which are affected by the congestion caused by other data flows that pass through the same network).
Example:
The figure below demonstrates a victim flow scenario. In the absence of congestion control,
flow X'Y suffers from reduced bandwidth due to flow F'G, which experiences congestion.
node, the source node reacts by decreasing, and later on increasing, the Tx rates according to the feedback provided. The source node keeps increasing the Tx rates until the system reaches a steady state of non-congested flow with traffic at as high a rate as possible.
The RoCEv2 Congestion Management feature is composed of three points:
• The congestion point (CP) - detects congestion and marks packets using the ECN bits
• The notification point (NP) (receiving end node) - reacts to the ECN-marked packets by sending congestion notification packets (CNPs)
• The reaction point (RP) (transmitting end node) - reduces the transmission rate according to the received CNPs
These three components can be seen in the High-Level sequence diagram below:
For further details, please refer to the IBTA RoCEv2 specification, Annex A-17.
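The reaction-point behavior described above can be sketched as a toy rate-control loop. This is an illustrative AIMD-style model only, not the actual firmware algorithm; the halving factor and the additive step are arbitrary assumptions:

```python
def react_to_feedback(tx_rate: float, cnp_received: bool,
                      max_rate: float, step: float = 5.0) -> float:
    """Toy reaction-point (RP) behavior: cut the Tx rate when a
    congestion notification packet (CNP) arrives, otherwise probe
    upward toward the line rate. The real constants are
    firmware-defined; these numbers are illustrative."""
    if cnp_received:
        return tx_rate / 2          # multiplicative decrease on CNP
    return min(tx_rate + step, max_rate)  # additive increase otherwise

rate = 100.0
rate = react_to_feedback(rate, cnp_received=True, max_rate=100.0)   # back off
rate = react_to_feedback(rate, cnp_received=False, max_rate=100.0)  # probe up
print(rate)  # -> 55.0
```

The loop converges toward the steady, non-congested state the text describes: rates fall quickly under congestion and creep back up once CNPs stop arriving.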
DefaultUntaggedPriority    Default: 0    Valid values: 0-7
• To view the current default priority on the adapter, run the following command:
Mlx5Cmd.exe -QoSConfig -DefaultUntaggedPriority -Name <Network Adapter Name> -Get
• To set the default priority to a specific priority on the adapter, run the following command:
Mlx5Cmd.exe -QoSConfig -DefaultUntaggedPriority -Name <Network Adapter Name> -Set <Priority>
Changing the values of these parameters may strongly affect congestion control efficiency. Please make sure you fully understand the parameter usage, values, and expected results before changing the default values.
For information on the RCM counters, please refer to Section 3.5.4.1.5, “Mellanox WinOF-2
Congestion Control Counters”, on page 98.
3.1.5.1 Configuring a Network Interface to Work with VLAN in Windows Server 2012 and Above
In this procedure you DO NOT create a VLAN; rather, you use an existing VLAN ID.
After establishing the priorities of ND/NDK traffic, PFC must be enabled on those priorities.
Step 9. Disable Priority Flow Control (PFC) for all other priorities except for 3.
PS $ Disable-NetQosFlowControl 0,1,2,4,5,6,7
Step 10. Enable QoS on the relevant interface.
PS $ Enable-NetAdapterQos -InterfaceAlias "Ethernet 4"
Step 11. Enable PFC on priority 3.
PS $ Enable-NetQosFlowControl -Priority 3
To add the script to the local machine startup scripts:
Step 1. From the PowerShell invoke.
gpedit.msc
Step 2. In the pop-up window, under the 'Computer Configuration' section, perform the following:
1. Select Windows Settings
2. Select Scripts (Startup/Shutdown)
3. Double click Startup to open the Startup Properties
4. Move to “PowerShell Scripts” tab
5. Click Add
The script should include only the following commands:
PS $ Remove-NetQosTrafficClass
PS $ Remove-NetQosPolicy -Confirm:$False
PS $ set-NetQosDcbxSetting -Willing 0
PS $ New-NetQosPolicy "SMB" -Policystore Activestore -NetDirectPortMatchCondition
445 -PriorityValue8021Action 3
Data Center Bridging Exchange (DCBX) protocol is an LLDP based protocol which manages
and negotiates host and switch configuration. The WinOF-2 driver supports the following:
• PFC - Priority Flow Control
• ETS - Enhanced Transmission Selection
• Application priority
The protocol is widely used to assure a lossless path when running multiple protocols at the same time. DCBX is functional as part of the QoS configuration described in Section 3.1.6, “Configuring Quality of Service (QoS)”, on page 43. Users should make sure the willing bit on the host is enabled, using PowerShell if needed:
set-NetQosDcbxSetting -Willing 1
This is required to allow negotiating and accepting peer configurations. Willing bit is set to 1 by
default by the operating system.
The new settings can be queried by calling the following command in PowerShell:
Get-NetAdapterQos
Note: In the example below, the configuration was received from the switch.
The output would look like the following:
In a scenario where both peers are set to Willing, the adapter with a lower MAC address takes
the settings of the peer.
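The tie-break rule above can be sketched as follows. This is an illustrative helper with hypothetical MAC addresses, not driver logic:

```python
def dcbx_winning_peer(mac_a: str, mac_b: str) -> str:
    """When both peers have the Willing bit set, the adapter with the
    numerically lower MAC address adopts the settings of its peer,
    so the higher-MAC peer's configuration wins. Returns the MAC of
    the peer whose settings are applied."""
    def mac_value(mac: str) -> int:
        # Accept both ':' and '-' separated notations.
        return int(mac.replace(":", "").replace("-", ""), 16)
    return mac_a if mac_value(mac_a) > mac_value(mac_b) else mac_b

print(dcbx_winning_peer("00:02:c9:00:00:01", "00:02:c9:00:00:02"))
```

In other words, the adapter printed here is the one whose local configuration survives the negotiation.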
DCBX is disabled by default in the driver, and in some firmware versions as well.
To use DCBX:
1. Query and enable DCBX in the firmware.
a. Install WinMFT package and go to \Program Files\Mellanox\WinMFT
b. Get the list of devices, run "mst status".
1. The NETSTAT command confirms if the File Server is listening on the RDMA interfaces.
If there is no activity while you run the commands above, you might get an empty list due to session expiration and the absence of current connections.
For further details on how to configure the switches to be lossless, please refer to
https://community.mellanox.com
3.3 Virtualization
Network Virtualization using Generic Routing Encapsulation (NVGRE) offload is currently sup-
ported in Windows Server 2012 R2 with the latest updates for Microsoft.
Step 6. Add customer route on all Hyper-V hosts (same command on all Hyper-V hosts).
PS $ New-NetVirtualizationCustomerRoute -RoutingDomainID "{11111111-2222-3333-4444-000000005001}" -VirtualSubnetID <virtualsubnetID> -DestinationPrefix <VMInterfaceIPAddress/Mask> -NextHop "0.0.0.0" -Metric 255
Step 7. Configure the Provider Address and Route records on each Hyper-V Host using an appro-
priate interface name and IP address.
PS $ $NIC = Get-NetAdapter <EthInterfaceName>
PS $ New-NetVirtualizationProviderAddress -InterfaceIndex $NIC.InterfaceIndex -ProviderAddress <HypervisorInterfaceIPAddress> -PrefixLength 24
Step 6. For Hyper-V running Windows Server 2012 only, disable the network adapter binding to the ms_netwnv service:
PS $ Disable-NetAdapterBinding <EthInterfaceName> -ComponentID ms_netwnv
where <EthInterfaceName> is the physical NIC name.
3.3.3.1.1 System Requirements
• A server and BIOS with SR-IOV support. BIOS settings might need to be updated to enable virtualization support and SR-IOV support.
• Hypervisor OS: Windows Server 2012 R2
• Virtual Machine (VM) OS: Windows Server 2012 or above
• Mellanox ConnectX®-4 VPI Adapter Card family
• Mellanox WinOF-2 1.20 or higher
Please consult the BIOS vendor's website for the list of BIOS versions that support SR-IOV. Update the BIOS version if necessary.
Step 2. Follow BIOS vendor guidelines to enable SR-IOV according to BIOS User Manual.
For example:
a. Enable SR-IOV.
3.3.3.2.3 Verifying SR-IOV Support within the Host Operating System (SR-IOV Ethernet Only)
To verify that the system is properly configured for SR-IOV:
Step 1. Go to: Start-> Windows Powershell.
Note: If BIOS was updated according to BIOS vendor instructions and you see the mes-
sage displayed in the figure below, update the registry configuration as described in the
(Get-VmHost).IovSupportReasons message.
Step 3. Reboot
Step 4. Verify the system is configured correctly for SR-IOV as described in Steps 1/2.
Step 3. Connect the virtual hard disk in the New Virtual Machine Wizard.
Step 4. Go to: Connect Virtual Hard Disk -> Use an existing virtual hard disk.
Step 5. Select the location of the vhd file.
Configurations: Current
SRIOV_EN N/A
NUM_OF_VFS N/A
WOL_MAGIC_EN_P2 N/A
LINK_TYPE_P1 N/A
LINK_TYPE_P2 N/A
Step 4. Enable SR-IOV with 16 VFs.
> mlxconfig -d mt4115_pciconf0 s SRIOV_EN=1 NUM_OF_VFS=16
All servers are guaranteed to support 16 VFs. Increasing the number of VFs
can lead to exceeding the BIOS limit of MMIO available address space.
Example:
Device #1:
----------
To achieve best performance on an SR-IOV VF, please run the following PowerShell commands on the host:
• For 10GbE:
PS $ Set-VMNetworkAdapter -Name "Network Adapter" -VMName vm1 -IovQueuePairsRequested 4
• For 40GbE and above:
PS $ Set-VMNetworkAdapter -Name "Network Adapter" -VMName vm1 -IovQueuePairsRequested 8
3.3.4.2.2 On a VPort
To enable VMMQ on a VPort:
PS $ Set-VMNetworkAdapter -Name "Virtual Adapter Name" -VmmqEnabled $true
To disable VMMQ on a VPort:
PS $ Set-VMNetworkAdapter -Name "Virtual Adapter Name" -VmmqEnabled $false
Since VMMQ is an offload feature for vRSS, vRSS must be enabled prior to enabling VMMQ.
The number provided to this cmdlet is the requested number of queues per vPort. However, the OS might not fulfill the request due to resource constraints and other considerations.
Step 4. Add a Network Adapter to the VM in the Hyper-V Manager, and choose the VMSwitch just
created.
Step 5. Check the "Enable SR-IOV" option on the "Hardware Acceleration" page under the Network Adapter.
If you turn on the VM at this time, you should see the Mellanox ConnectX-4 Virtual Adapter under Network adapters in the VM's Device Manager.
For example:
xcopy /s c:\Windows \\11.0.0.5\c$\tmp
The port should be disabled after each reboot of the VM to allow traffic.
*JumboPacket 1514 The maximum size of a frame (or a packet) that can be sent over the wire. This is also known as the maximum transmission unit (MTU). The MTU may have a significant impact on the network's performance, as a large packet can cause high latency. However, it can also reduce CPU utilization and improve wire efficiency. The standard Ethernet frame size is 1514 bytes, but Mellanox drivers support a wide range of packet sizes.
The valid values are:
• Ethernet: 600 up to 9600
Note: All the devices across the network (switches and routers) should support the same frame size. Be aware that different network devices calculate the frame size differently. Some devices include the header information in the frame size, while others do not.
Mellanox adapters do not include Ethernet header information in the frame size (i.e., when setting *JumboPacket to 1500, the actual frame size is 1514).
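A minimal sketch of the frame-size note above, assuming the usual 14-byte Ethernet header (destination MAC, source MAC, EtherType); illustrative only:

```python
ETH_HEADER_BYTES = 14  # destination MAC (6) + source MAC (6) + EtherType (2)

def on_wire_frame_size(jumbo_packet: int) -> int:
    """Mellanox adapters exclude the Ethernet header from the
    *JumboPacket value, so the actual frame on the wire is 14 bytes
    larger (e.g. a setting of 1500 yields a 1514-byte frame)."""
    if not 600 <= jumbo_packet <= 9600:
        raise ValueError("*JumboPacket must be between 600 and 9600")
    return jumbo_packet + ETH_HEADER_BYTES

print(on_wire_frame_size(1500))  # -> 1514
```

This is why a device that includes the header in its frame-size accounting must be configured 14 bytes higher to match.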
*ReceiveBuffers 512 The number of packets each ring receives. This parameter affects the memory consumption and the performance. Increasing this value can enhance receive performance, but it also consumes more system memory.
If receive buffers are lacking (indicated by dropped or out-of-order received packets), increase the number of receive buffers.
The valid values are 256 up to 4096.
*TransmitBuffers 2048 The number of packets each ring sends. Increasing this value can
enhance transmission performance, but also consumes system
memory.
The valid values are 256 up to 4096.
LSOSize 64000 The maximum number of bytes that the TCP/IP stack can pass to an adapter in a single packet. This value affects the memory consumption and the NIC performance.
The valid values are MTU+1024 up to 64000.
Note: This registry key is not exposed to the user via the UI.
If LSOSize is smaller than MTU+1024, LSO will be disabled.
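The LSOSize constraint above can be expressed as a small check; an illustrative sketch, not driver code:

```python
def lso_enabled(lso_size: int, mtu: int) -> bool:
    """LSO remains active only when LSOSize is at least MTU + 1024
    (and the key itself is capped at 64000); below that threshold the
    driver disables large send offload."""
    return mtu + 1024 <= lso_size <= 64000

print(lso_enabled(64000, 1500))  # True: default LSOSize, standard MTU
print(lso_enabled(2000, 1500))   # False: 2000 < 1500 + 1024
```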
LSOMinSegment 2 The minimum number of segments that a large TCP packet must be divisible by, before the transport can offload it to a NIC for segmentation.
The valid values are 2 up to 32.
Note: This registry key is not exposed to the user via the UI.
LSOTcpOptions 1 Enables the miniport driver to segment a large TCP packet
whose TCP header contains TCP options.
The valid values are:
• 0: disable
• 1: enable
Note: This registry key is not exposed to the user via the UI.
LSOIpOptions 1 Enables the NIC to segment a large TCP packet whose IP header
contains IP options.
The valid values are:
• 0: disable
• 1: enable
Note: This registry key is not exposed to the user via the UI.
*InterruptModeration 1 Sets the rate at which the controller moderates or delays the generation of interrupts, making it possible to optimize network throughput and CPU utilization. When disabled, the system generates an interrupt as soon as a packet is received. In this mode, CPU utilization increases at higher data rates, because the system must handle a larger number of interrupts. However, latency decreases, since the packet is processed more quickly. When interrupt moderation is enabled, the system accumulates interrupts and sends a single interrupt rather than a series of interrupts.
The valid values are:
• 0: disable
• 1: enable
RxIntModeration 2 Sets the rate at which the controller moderates or delays the generation of interrupts, making it possible to optimize network throughput and CPU utilization. The default setting (Adaptive) adjusts the interrupt rates dynamically, depending on traffic type and network usage. Choosing a different setting may improve network and system performance in certain configurations.
The valid values are:
• 1: static
• 2: adaptive
The interrupt moderation count and time are configured dynamically, based on traffic types and rate.
TxIntModeration 2 Sets the rate at which the controller moderates or delays the generation of interrupts, making it possible to optimize network throughput and CPU utilization. The default setting (Adaptive) adjusts the interrupt rates dynamically, depending on traffic type and network usage. Choosing a different setting may improve network and system performance in certain configurations.
The valid values are:
• 1: static
• 2: adaptive
The interrupt moderation count and time are configured dynamically, based on traffic types and rate.
*RSS 1 Sets the driver to use Receive Side Scaling (RSS) mode to
improve the performance of handling incoming packets.
This mode allows the adapter port to utilize the multiple
CPUs in a multi-core system for receiving incoming packets
and steering them to their destination. RSS can significantly
improve the number of transactions per second, the number
of connections per second, and the network throughput.
This parameter can be set to one of two values:
• 1: enable (default)
Sets RSS Mode.
• 0: disable
The hardware is configured once to use the Toeplitz
hash function and the indirection table is never changed.
ThreadPoll 3000 The number of cycles that must pass without receiving any packet before the polling mechanism stops, when using the polling completion method for receiving. Afterwards, receiving a new packet generates an interrupt that reschedules the polling mechanism.
The valid values are 0 up to 200000.
*NumRSSQueues 8 The maximum number of RSS queues that the device should use.
RoceMaxFrameSize 1024 The maximum size of a frame (or a packet) that can be sent by the RoCE protocol, a.k.a. the Maximum Transmission Unit (MTU). Using a larger RoCE MTU improves performance; however, one must ensure that the entire system, including switches, supports the defined MTU. An Ethernet packet uses the general MTU value, whereas a RoCE packet uses the RoCE MTU.
The valid values are:
• 256
• 512
• 1024
• 2048
*PriorityVLANTag 3 (Packet Priority & VLAN Enabled) Enables sending and receiving IEEE 802.3ac tagged frames, which include:
• 802.1p QoS (Quality of Service) tags for priority-tagged packets.
• 802.1Q tags for VLANs.
When this feature is enabled, the Mellanox driver supports sending and receiving packets with VLAN and QoS tags.
*VMQ 1 Enables or disables support for the virtual machine queue (VMQ) feature of the network adapter.
The valid values are:
• 1: enable
• 0: disable
*VMQVlanFiltering 1 Specifies whether the device enables or disables the ability to filter
network packets by using the VLAN identifier in the media access
control (MAC) header.
The valid values are:
• 0: disable
• 1: enable
Letter Usage
-c Client side
-a Address
-p Port
Debug Extensions:
Letter Usage
-v Displays ping data to stdout every test cycle
-V Validates ping data every test cycle
-d Shows debug prints to stdout
-S Indicates ping data size - must be < (64*1024)
-C Indicates the number of ping cycles to perform
Example:
Linux server:
rping -v -s -a <IP address> -C 10
Windows client:
nd_rping -v -c -a <same IP as above> -C 10
All devices on the same physical network, or on the same logical network, must have
the same MTU.
• Receive Buffers
The number of receive buffers (default 512).
• Send Buffers
The number of send buffers (default 2048).
• Performance Options
Configures parameters that can improve adapter performance.
• Interrupt Moderation
Moderates or delays interrupt generation, thus optimizing network throughput and CPU utilization (default Enabled).
• When interrupt moderation is enabled, the system accumulates interrupts and sends a single interrupt rather than a series of interrupts. An interrupt is generated after receiving 5 packets or after 10ms from the first packet received. This improves performance and reduces CPU load; however, it increases latency.
• When interrupt moderation is disabled, the system generates an interrupt each time a packet is received or sent. In this mode, CPU utilization increases, as the system handles a larger number of interrupts. However, latency decreases, as each packet is handled faster.
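The trade-off between the two modes above can be sketched as a small simulation (illustrative only; the 5-packet and 10ms figures are the defaults described above, and `coalesce` is a hypothetical helper, not a driver API):

```python
def coalesce(arrivals_ms, max_frames=5, max_delay_ms=10):
    """Return the times at which a moderated NIC would raise an interrupt.

    An interrupt fires once max_frames packets have accumulated, or
    max_delay_ms after the first packet of a batch, whichever comes first.
    """
    interrupts = []
    batch = []  # arrival times of packets not yet signalled
    for t in arrivals_ms:
        # The delay timer of the pending batch may fire before this packet.
        if batch and t - batch[0] >= max_delay_ms:
            interrupts.append(batch[0] + max_delay_ms)
            batch = []
        batch.append(t)
        if len(batch) >= max_frames:
            interrupts.append(t)
            batch = []
    if batch:
        interrupts.append(batch[0] + max_delay_ms)
    return interrupts
```

With moderation disabled, a burst of ten back-to-back packets costs ten interrupts; with moderation it costs two, at the price of up to max_delay_ms extra latency for a lone packet.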
• Receive Side Scaling (RSS Mode)
Improves incoming packet processing performance. RSS enables the adapter port to utilize the multiple CPUs in a multi-core system for receiving incoming packets and steering them to their designated destination. RSS can significantly improve the number of transactions per second, the number of connections per second, and the network throughput.
This parameter can be set to one of the following values:
• Enabled (default): Sets RSS Mode.
• Disabled: The hardware is configured once to use the Toeplitz hash function, and the indirection table is never changed.
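The Toeplitz hash and indirection table mentioned above can be sketched as follows (an illustrative sketch, not the driver's implementation; the 40-byte key and 4-entry table in the comments are assumptions typical of RSS, not values taken from this manual). For every set bit of the input tuple, the hash XORs in the 32-bit window of the secret key starting at that bit position, and the low bits of the result index the indirection table:

```python
def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: for every set bit of `data`, XOR in the
    32-bit window of `key` that starts at that bit position."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result

def rss_queue(hash_value: int, indirection_table: list[int]) -> int:
    # The low bits of the hash select an indirection-table entry,
    # which names the CPU/queue that processes the packet.
    return indirection_table[hash_value & (len(indirection_table) - 1)]
```

With a typical 40-byte RSS key, the 12-byte IPv4 source/destination/port tuple fits comfortably inside the sliding window; rebalancing is normally done by rewriting the indirection table, not the key.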
Enables the adapter to compute IPv4 checksum upon transmit and/or receive instead of the CPU
(default Enabled).
• TCP/UDP Checksum Offload for IPv4 packets
Enables the adapter to compute TCP/UDP checksum over IPv4 packets upon transmit and/or
receive instead of the CPU (default Enabled).
• TCP/UDP Checksum Offload for IPv6 packets
Enables the adapter to compute TCP/UDP checksum over IPv6 packets upon transmit and/or
receive instead of the CPU (default Enabled).
• Large Send Offload (LSO)
Allows the TCP stack to build a TCP message up to 64KB long and send it in one call down the stack. The adapter then re-segments the message into multiple TCP packets for transmission on the wire, with each packet sized according to the MTU. This option offloads a large amount of kernel processing time from the host CPU to the adapter.
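The re-segmentation step can be sketched as follows (illustrative only; the 40-byte header_len assumes IPv4 and TCP headers without options, an assumption not stated in this manual):

```python
def lso_segment(message: bytes, mtu: int = 1500, header_len: int = 40) -> list[bytes]:
    # Each wire packet carries header_len bytes of IP+TCP headers,
    # leaving mtu - header_len bytes (the MSS) for payload.
    mss = mtu - header_len
    return [message[i:i + mss] for i in range(0, len(message), mss)]
```

Under these assumptions, a 64KB message at MTU 1500 yields 45 wire packets from a single call down the host stack.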
Bytes IN
Bytes Received Shows the number of bytes received by the adapter. The counted bytes
include framing characters.
Bytes Received/Sec Shows the rate at which bytes are received by the adapter. The counted
bytes include framing characters.
Packets Received Shows the number of packets received by ConnectX-4 and ConnectX-4
Lx network interface.
Packets Received/Sec Shows the rate at which packets are received by ConnectX-4 and ConnectX-4 Lx network interface.
Bytes Sent Shows the number of bytes sent by the adapter. The counted bytes include
framing characters.
Bytes Sent/Sec Shows the rate at which bytes are sent by the adapter. The counted bytes
include framing characters.
Packets Sent Shows the number of packets sent by ConnectX-4 and ConnectX-4 Lx
network interface.
Packets Sent/Sec Shows the rate at which packets are sent by ConnectX-4 and ConnectX-4
Lx network interface.
Bytes TOTAL
Bytes Total Shows the total of bytes handled by the adapter. The counted bytes
include framing characters.
Bytes Total/Sec Shows the total rate of bytes that are sent and received by the adapter. The
counted bytes include framing characters.
Packets Total Shows the total of packets handled by ConnectX-4 and ConnectX-4 Lx
network interface.
Packets Total/Sec Shows the rate at which packets are sent and received by ConnectX-4 and
ConnectX-4 Lx network interface.
Packets Outbound Errors (a) Shows the number of outbound packets that could not be transmitted because of errors found in the physical layer.
Packets Outbound Discarded (a) Shows the number of outbound packets chosen to be discarded in the physical layer, even though no errors had been detected to prevent transmission. One possible reason for discarding packets could be to free up some buffer space.
Packets Received Errors (a) Shows the number of inbound packets that contained errors in the physical layer, preventing them from being deliverable.
Packets Received with Frame Length Error Shows the number of inbound packets with a frame length error. Packets received with frame length error are a subset of packets received errors.
Packets Received with Symbol Error Shows the number of inbound packets that contained a symbol error or an invalid block. Packets received with symbol error are a subset of packets received errors.
Packets Received with Bad CRC Error Shows the number of inbound packets that failed the CRC check. Packets received with bad CRC error are a subset of packets received errors.
Packets Received Discarded (a) Shows the number of inbound packets that were chosen to be discarded in the physical layer, even though no errors had been detected to prevent their being deliverable. One possible reason for discarding such a packet could be a buffer overflow.
RSC Aborts Number of RSC abort events. That is, the number of exceptions other
than the IP datagram length being exceeded. This includes the cases
where a packet is not coalesced because of insufficient hardware
resources.
RSC Coalesced Events Number of RSC coalesced events. That is, the total number of packets
that were formed from coalescing packets.
RSC Average Packet Size The average size, in bytes, of received packets across all TCP connections.
(a) These error/discard counters relate to layer-2 issues, such as CRC, length, and type errors. An error/discard can also occur at the higher interface level; for example, a packet can be discarded for lack of a receive buffer. To see the sum of all error/discard packets, read the Windows Network-Interface counters. Note that for IPoIB, the Mellanox counters cover IB layer-2 issues only, and the Windows Network-Interface counters cover interface-level issues.
Bytes/Packets IN
Bytes Received/Sec Shows the rate at which bytes are received over each network VPort. The counted bytes include framing characters.
Bytes Received Unicast/Sec Shows the rate at which subnet-unicast bytes are delivered to a higher-layer protocol.
Bytes Received Broadcast/Sec Shows the rate at which subnet-broadcast bytes are delivered to a higher-layer protocol.
Bytes Received Multicast/Sec Shows the rate at which subnet-multicast bytes are delivered to a higher-layer protocol.
Packets Received Unicast/Sec Shows the rate at which subnet-unicast packets are delivered to a higher-layer protocol.
Packets Received Broadcast/Sec Shows the rate at which subnet-broadcast packets are delivered to a higher-layer protocol.
Packets Received Multicast/Sec Shows the rate at which subnet-multicast packets are delivered to a higher-layer protocol.
Bytes/Packets OUT
Bytes Sent/Sec Shows the rate at which bytes are sent over each network VPort. The counted bytes include framing characters.
Bytes Sent Unicast/Sec Shows the rate at which bytes are requested to be transmitted to subnet-unicast addresses by higher-level protocols. The rate includes the bytes that were discarded or not sent.
Bytes Sent Broadcast/Sec Shows the rate at which bytes are requested to be transmitted to subnet-broadcast addresses by higher-level protocols. The rate includes the bytes that were discarded or not sent.
Bytes Sent Multicast/Sec Shows the rate at which bytes are requested to be transmitted to subnet-multicast addresses by higher-level protocols. The rate includes the bytes that were discarded or not sent.
Packets Sent Unicast/Sec Shows the rate at which packets are requested to be transmitted to subnet-unicast addresses by higher-level protocols. The rate includes the packets that were discarded or not sent.
Packets Sent Broadcast/Sec Shows the rate at which packets are requested to be transmitted to subnet-broadcast addresses by higher-level protocols. The rate includes the packets that were discarded or not sent.
Packets Sent Multicast/Sec Shows the rate at which packets are requested to be transmitted to subnet-multicast addresses by higher-level protocols. The rate includes the packets that were discarded or not sent.
ERRORS, DISCARDED
Packets Outbound Discarded Shows the number of outbound packets that were discarded even though no errors had been detected to prevent transmission. One possible reason for discarding a packet could be to free up buffer space.
Packets Outbound Errors Shows the number of outbound packets that could not be transmitted because of errors.
Packets Received Discarded Shows the number of inbound packets that were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space.
Packets Received Errors Shows the number of inbound packets that contained errors preventing them from being deliverable to a higher-layer protocol.
Bytes/Packets IN
Bytes Received The number of bytes received that are covered by this priority.
The counted bytes include framing characters (modulo 2^64).
Bytes Received/Sec The number of bytes received per second that are covered by
this priority. The counted bytes include framing characters.
Packets Received The number of packets received that are covered by this priority
(modulo 2^64).
Packets Received/Sec The number of packets received per second that are covered by
this priority.
Bytes/Packets OUT
Bytes Sent The number of bytes sent that are covered by this priority. The
counted bytes include framing characters (modulo 2^64).
Bytes Sent/Sec The number of bytes sent per second that are covered by this
priority. The counted bytes include framing characters.
Packets Sent The number of packets sent that are covered by this priority
(modulo 2^64).
Packets Sent/Sec The number of packets sent per second that are covered by this
priority.
Bytes Total The total number of bytes that are covered by this priority. The
counted bytes include framing characters (modulo 2^64).
Bytes Total/Sec The total number of bytes per second that are covered by this
priority. The counted bytes include framing characters.
Packets Total The total number of packets that are covered by this priority
(modulo 2^64).
Packets Total/Sec The total number of packets per second that are covered by this
priority.
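Because these counters wrap modulo 2^64, a monitoring tool should compute increments with modular subtraction rather than plain subtraction. A sketch (the helper names are illustrative, not part of any Mellanox API):

```python
def counter_delta(new: int, old: int, width: int = 64) -> int:
    # Modular subtraction yields the true increment even across a
    # counter wrap, as long as fewer than 2**width events occurred
    # between the two samples.
    return (new - old) % (1 << width)

def rate_per_sec(new: int, old: int, interval_s: float) -> float:
    # Per-second rates are just the modular delta over the interval.
    return counter_delta(new, old) / interval_s
```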
PAUSE INDICATION
Sent Pause Frames The total number of pause frames sent from this priority to the
far-end port.
The untagged instance indicates the number of global pause
frames that were sent.
Sent Pause Duration The total duration, in microseconds, for which packet transmission was paused on this priority.
Received Pause Frames The number of pause frames received on this priority from the far-end port.
The untagged instance indicates the number of global pause frames that were received.
Received Pause Duration The total duration, in microseconds, for which the far-end port was requested to pause the transmission of packets.
RDMA Completion Queue Errors This counter is not supported, and is always set to zero.
RDMA Connection Errors The number of established connections with an error before a consumer
disconnected the connection.
RDMA Failed Connection Attempts The number of inbound and outbound RDMA connection attempts that failed.
RDMA Inbound Bytes/sec The number of bytes per second of all incoming RDMA traffic. This includes additional layer-two protocol overhead.
RDMA Inbound Frames/sec The number of layer-two frames per second that carry incoming RDMA traffic.
RDMA Outbound Bytes/sec The number of bytes per second of all outgoing RDMA traffic. This includes additional layer-two protocol overhead.
RDMA Outbound Frames/sec The number of layer-two frames per second that carry outgoing RDMA traffic.
Notification Point
Notification Point – CNPs Sent Successfully Number of congestion notification packets (CNPs)
successfully sent by the notification point.
Notification Point – RoCEv2 ECN Marked Packets Number of RoCEv2 packets that were marked as
congestion encountered.
Reaction Point
Reaction Point – Current Number of Flows Current number of Rate Limited Flows due to
RoCEv2 Congestion Control.
Reaction Point – Ignored CNP Packets Number of ignored congestion notification packets
(CNPs).
Reaction Point – Number of Flows over Time Number of rate-limited flows multiplied by the rate-limiting time.
Reaction Point – Successfully Handled CNP Packets Number of congestion notification packets (CNPs)
received and handled successfully.
Link State Change Events Number of link status updates received from HW.
Send Completions in Passive/Sec Number of send completion events handled in passive mode
per second.
Copied Send Packets Number of send packets that were copied in slow path.
Correct Checksum Packets In Slow Path Number of receive packets that required the driver to perform the checksum calculation, and resulted in success.
Bad Checksum Packets In Slow Path Number of receive packets that required the driver to perform the checksum calculation, and resulted in failure.
Undetermined Checksum Packets In Slow Path Number of receive packets with an undetermined checksum result.
4 Utilities
4.1 Fabric Performance Utilities
The performance utilities described in this chapter are intended to be used as performance micro-benchmarks. They support both InfiniBand and RoCE.
For further information on the following tools, please refer to each tool's help text by running it with the --help command-line parameter.
Utility Description
nd_send_bw This test is used for performance measurement of Send requests on Microsoft Windows operating systems. nd_send_bw is oriented toward Send with maximum throughput, and runs over Microsoft's NetworkDirect standard. The level of user customization is relatively high: users may choose to run with a customized message size, a customized number of iterations, or, alternatively, a customized test duration. nd_send_bw runs with all message sizes from 1B to 4MB (powers of 2), message inlining, and CQ moderation.
nd_send_lat This test is used for performance measurement of Send requests on Microsoft Windows operating systems. nd_send_lat is oriented toward Send with minimum latency, and runs over Microsoft's NetworkDirect standard. The level of user customization is relatively high: users may choose to run with a customized message size, a customized number of iterations, or, alternatively, a customized test duration. nd_send_lat runs with all message sizes from 1B to 4MB (powers of 2), message inlining, and CQ moderation.
The following InfiniBand performance tests are deprecated and might be removed in
future releases.
• The PCI information can be queried from the “General” properties tab under “Location”.
Example:
If the “Location” is “PCI Slot 3 (PCI bus 8, device 0, function 0)”, run the following command:
Mlx5cmd.exe -Mstdump -bdf 8.0.0
• The output will indicate the location of the dump files.
Example:
“Mstdump succeeded. Dump files for device at location 8.0.0 were created in c:\windows\temp
directory.”
It is highly recommended to add this report when you contact the support team.
Once the report is ready, the folder which contains the report will open automatically.
5 Troubleshooting
You may be able to easily resolve the issues described in this section. If a problem persists and
you are unable to resolve it, please contact your Mellanox representative or Mellanox Support at
support@mellanox.com.
Issue: The installation of WinOF-2 fails with the following error message: “This installation package is not supported by this processor type. Contact your product vendor”.
Cause: An incorrect driver version might have been installed, e.g., you are trying to install a 64-bit driver on a 32-bit machine (or vice versa).
Solution: Use the correct driver package according to the CPU architecture.
For additional details on Windows installer return codes, please refer to:
http://support.microsoft.com/kb/229683
Issue: Low performance.
Cause: Non-optimal system configuration.
Solution: See section “Performance Tuning and Counters” on page 89 to take advantage of Mellanox 10/40/56 GBit NIC performance.
Issue: The driver fails to start.
Cause: There might have been an RSS configuration mismatch between the TCP stack and the Mellanox adapter.
Solution:
1. Open the event log and look under "System" for the "mlx5" source.
2. If found, enable RSS by running: "netsh int tcp set global rss = enabled".
Alternatively, a less recommended option (as it will cause low performance): disable RSS on the adapter by running "netsh int tcp set global rss = no dynamic balancing".
Issue: The driver fails to start and a yellow sign appears near the "Mellanox ConnectX-4 Adapter <X>" in the Device Manager display (Code 10).
Cause: Look into the Event Viewer to view the error.
Solution:
• If the failure occurred due to an unsupported mode type, refer to Section 3.1.1, “Mode Configuration” for the solution.
• If the solution isn't mentioned in the event viewer, disable and re-enable "Mellanox ConnectX-4 Adapter <X>" from the Device Manager display. If the failure persists, please contact Mellanox support at support@mellanox.com.
Issue: No connectivity to a Fault Tolerance team while using network capture tools (e.g., Wireshark).
Cause: The network capture tool might have captured the network traffic of the non-active adapter in the team. This is not allowed, since the tool sets the packet filter to "promiscuous", thus causing traffic to be transferred on multiple interfaces.
Solution: Close the network capture tool on the physical adapter card, and set it on the team interface instead.
Issue: No Ethernet connectivity on 10Gb adapters after activating Performance Tuning (part of the installation).
Cause: A TcpWindowSize registry value might have been added.
Solution:
• Remove the value key under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpWindowSize, or
• Set its value to 0xFFFF.
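For example, the value can be removed from an elevated PowerShell prompt (a sketch; verify the key path and value name on your system before deleting):

```shell
Remove-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" -Name "TcpWindowSize"
```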
Issue: Packets are being lost.
Cause: The port MTU might have been set to a value higher than the maximum MTU supported by the switch.
Solution: Change the MTU according to the maximum MTU supported by the switch.
Issue: NVGRE changes done on a running VM are not propagated to the VM.
Cause: The configuration changes might not take effect until the OS is restarted.
Solution: Stop the VM, and only afterwards perform any NVGRE configuration changes on the VM connected to the virtual switch.
Issue: Low performance.
Cause: The OS profile might not be configured for maximum performance.
Solution:
1. Go to "Power Options" in the "Control Panel" and make sure "Maximum Performance" is set as the power scheme.
2. Reboot the machine.
• Flow control on the device Mellanox ConnectX-4 VPI Adapter <X> wasn't enabled.
Therefore, RoCE cannot function properly. To resolve this issue, please make sure that
flow control is configured on both the hosts and switches in your network. For more
details, please refer to the user manual.
• Mellanox ConnectX-4 VPI Adapter <X> device is configured with a MAC address designated as a multicast address: <Y>.
Please configure the registry value NetworkAddress with another address, then restart the
driver.
• The miniport driver initiates reset on device Mellanox ConnectX-4 VPI Adapter <X>.
• NDIS initiates reset on device Mellanox ConnectX-4 VPI Adapter <X>.
• Reset on device Mellanox ConnectX-4 VPI Adapter <X> has finished.
• Mellanox ConnectX-4 VPI Adapter <X> has got:
• vendor_id <Y>
• device_id <Z>
• subvendor_id <F>
• subsystem_id <L>
• HW revision <M>
• FW version <R>.<G>.<Q>
• port type <N>
• Mellanox ConnectX-4 VPI Adapter <X>: QUERY_HCA_CAP command fails with error
<Y>.
The adapter card is dysfunctional. This is most likely a FW problem.
Please burn the latest FW and restart the Mellanox ConnectX device.
• Mellanox ConnectX-4 VPI Adapter <X>: QUERY_ADAPTER command fails with error
<Y>.
The adapter card is dysfunctional. This is most likely a FW problem.
Please burn the latest FW and restart the Mellanox ConnectX device.
• Mellanox ConnectX-4 VPI Adapter <X>: The number of allocated MSI-X vectors is less
than recommended. This may decrease the network performance.
The number of requested MSI-X vectors is: <Y> while the number of allocated MSI-X
vectors is: <Z>.
• Mellanox ConnectX-4 VPI Adapter <X>: FW command fails. op 0x<Y>, status 0x<Z>,
errno <F>, syndrome 0x<L>.
• Too many IPs in use for RRoCE.
Mellanox ConnectX-4 VPI Adapter <X>: RRoCE supports only <Y> IPs per port.
Please reduce the number of IPs in use before adding new IPs.
• Mellanox ConnectX-4 VPI Adapter <X>: Driver startup fails because an insufficient number of Event Queues (EQs) is available.
(<Y> are required, <Z> are recommended, <M> are available)
• Mellanox ConnectX-4 VPI Adapter <X>: Execution of FW command fails. op 0x<Y>,
errno <Z>.
• Mellanox ConnectX-4 VPI Adapter <X>: Driver startup has failed due to unsupported
port type=<Y> configured on the device.
The driver supports Ethernet mode only; please refer to the Mellanox WinOF-2 User Manual for instructions on how to configure the correct mode.
• Mellanox ConnectX-4 VPI Adapter <X>: Driver startup fails because minimal driver
requirements are not supported by FW <Y>.<Z>.<L>.
FW reported:
• rss_ind_tbl_cap <Q>
• vlan_cap <M>
• max_rqs <F>
• max_sqs <N>
• max_tirs <O>
Please burn a firmware that supports the requirements and restart the Mellanox ConnectX device.
For additional information, please refer to Support information on http://mellanox.com.
• Mellanox ConnectX-4 VPI Adapter <X>: Driver startup fails because the maximum flow table size that is supported by FW <Y>.<Z>.<L> is too small (<K> entries).
Please burn a firmware that supports a greater flow table size and restart the Mellanox ConnectX device. For additional information, please refer to Support information on http://mellanox.com.
• Mellanox ConnectX-4 VPI Adapter <X>: Driver startup fails because the required receive WQE size is greater than the maximum WQE size supported by FW <Y>.<Z>.<M>.
(<F> are required, <O> are supported)
• Mellanox ConnectX-4 VPI Adapter <X>: Driver startup fails because the maximum WQE size that is supported by FW <Y>.<L>.<M> is too small (<K>).
Please burn a firmware that supports a greater WQE size and restart the Mellanox ConnectX device. For additional information, please refer to Support information on http://mellanox.com.
• Mellanox ConnectX-4 VPI Adapter <X>: CQ moderation is not supported by FW
<Y>.<Z>.<L>.
• Mellanox ConnectX-4 VPI Adapter <X>: CQ to EQ remap is not supported by FW
<Y>.<Z>.<L>.
• Mellanox ConnectX-4 VPI Adapter <X>: VPort counters are not supported by FW
<Y>.<Z>.<L>.
• Mellanox ConnectX-4 VPI Adapter <X>: LSO is not supported by FW <Y>.<Z>.<L>.
• Mellanox ConnectX-4 VPI Adapter <X>: Checksum offload is not supported by FW
<Y>.<Z>.<L>.
• NDIS initiated reset on device Mellanox ConnectX-4 VPI Adapter <X> has failed.
• Mellanox ConnectX-4 VPI Adapter <X>: mstdump %SystemRoot%\Temp\<Y>_<Z><L>_<M>_<F>_<O>.log was created after a fatal error.
• Mellanox ConnectX-4 VPI Adapter <X>: mstdump %SystemRoot%\Temp\<Y>_<Z><L>_<M>_<F>_<O>.log was created after an OID request.
• Mellanox ConnectX-4 VPI Adapter <X> Physical/Virtual function drivers compatibility
issue <Y>.
• Mellanox ConnectX-4 VPI Adapter <X> (module <Y>) detects that the link is down.
Cable is unplugged. Please connect the cable to continue working.
• Mellanox ConnectX-4 VPI Adapter <X> Setting QoS port default priority is not allowed
on a virtual device. This adapter will use the default priority <Y>.
• Mellanox ConnectX-4 VPI Adapter <X>: FW health report - ver <Y>, hw <Z>, callra
<A>, var[1] <B> synd <C>.
• Mellanox ConnectX-4 VPI Adapter <X>: Adapter failed to initialize due to FW initialization timeout.
• Mellanox ConnectX-4 VPI Adapter <X>: Setting QoS port default priority is not allowed
on a virtual device. This adapter will use the default priority <Y>.
• Mellanox ConnectX-4 VPI Adapter <X> failed to set port default priority to <Y>. This
adapter will use the default priority <Z>.
• Mellanox ConnectX-4 VPI Adapter <X>: ECN is not allowed on a virtual device.
• ECN was enabled for adapter Mellanox ConnectX-4 VPI Adapter <X> but FW
<Y>.<Z>.<W> does not support it. ECN congestion control will not be enabled for this
adapter. Please burn a newer firmware. For more details, please refer to the user manual
document.
• Mellanox ConnectX-4 VPI Adapter <X> failed to set ECN RP/NP congestion control
parameters. This adapter will use default ECN RP/NP congestion control values. Please
verify the ECN configuration and then restart the adapter.
• Mellanox ConnectX-4 VPI Adapter <X> failed to enable ECN RP/NP congestion control
for priority <Y>. This adapter will continue without ECN <Y> congestion control for this
priority. Please verify the ECN configuration and then restart the adapter.
• Mellanox ConnectX-4 VPI Adapter <X>: mstdump %SystemRoot%\Temp\<Y>.log was created after a timeout on TxQueue.
• Mellanox ConnectX-4 VPI Adapter <X>: mstdump %SystemRoot%\Temp\<Y>.log was created after a timeout on RxQueue.
• Mellanox ConnectX-4 VPI Adapter <X>: Ecn RP attributes:
• EcnClampTgtRate = <Y>
• EcnClampTgtRateAfterTimeInc = <Z>
• EcnRpgTimeReset = <E>
• EcnRpgByteReset = <L>
• EcnRpgThreshold = <M>
• EcnRpgAiRate = <N>
• EcnRpgHaiRate = <R>
• EcnAlphaToRateShift = <W>
• EcnRpgMinDecFac = <G>
• EcnRpgMinRate = <Q>
• EcnRateToSetOnFirstCnp = <F>
• EcnDceTcpG = <V>
• EcnDceTcpRtt = <O>
• EcnRateReduceMonitorPeriod = <K>
• EcnInitialAlphaValue = <J>
• Mellanox ConnectX-4 VPI Adapter <X>: Ecn NP attributes:
• EcnMinTimeBetweenCnps = <Y>
• EcnCnpDscp = <Z>
• EcnCnpPrioMode = <V>
• EcnCnp802pPrio = <W>
• Mellanox ConnectX-4 VPI Adapter <X>: FW health report - ver <Y>, hw <Z>, callra
<W>, var1 <K>, synd <K>.
• Mellanox ConnectX-4 VPI Adapter <X>: RDMA device initialization failure <Y>. This
adapter will continue running in Ethernet only mode.
• Mellanox ConnectX-4 VPI Adapter <X>: mstdump %SystemRoot%\Temp\<A>_<B>_<C>_<D>_<E>_<F>.log was created after a change of link state.
Event Type    Description    Provider    Tag    Default
where
Example:
Name: SingleFunc_4_0_0_p000_eth-down-1_eq_dump_0.log
The default number of sets of files for each event is 20. It can be changed by adding a DumpEventsNum DWORD32 parameter under HKLM\System\CurrentControlSet\Services\mlx4_bus\Parameters and setting it to another value.
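For example, to raise the limit to 50 sets (the value 50 is illustrative), run from an elevated command prompt:

```shell
reg add "HKLM\System\CurrentControlSet\Services\mlx4_bus\Parameters" /v DumpEventsNum /t REG_DWORD /d 50
```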
# Step 2. Configure a Subnet Locator and Route records on each Hyper-V Host (Host 1 and
Host 2) mtlae14 & mtlae15
New-NetVirtualizationLookupRecord -CustomerAddress 172.16.14.5 -ProviderAddress 192.168.20.114 -VirtualSubnetID 5001 -MACAddress "00155D720100" -Rule "TranslationMethodEncap"
New-NetVirtualizationLookupRecord -CustomerAddress 172.16.14.6 -ProviderAddress 192.168.20.114 -VirtualSubnetID 5001 -MACAddress "00155D720101" -Rule "TranslationMethodEncap"
New-NetVirtualizationLookupRecord -CustomerAddress 172.16.15.5 -ProviderAddress 192.168.20.115 -VirtualSubnetID 5001 -MACAddress "00155D730100" -Rule "TranslationMethodEncap"
New-NetVirtualizationLookupRecord -CustomerAddress 172.16.15.6 -ProviderAddress 192.168.20.115 -VirtualSubnetID 5001 -MACAddress "00155D730101" -Rule "TranslationMethodEncap"
# Add customer route
New-NetVirtualizationCustomerRoute -RoutingDomainID "{11111111-2222-3333-4444-000000005001}" -VirtualSubnetID "5001" -DestinationPrefix "172.16.0.0/16" -NextHop "0.0.0.0" -Metric 255
# Step 3. Configure the Provider Address and Route records on Hyper-V Host 1 (Host 1
Only) mtlae14
$NIC = Get-NetAdapter "Port1"
New-NetVirtualizationProviderAddress -InterfaceIndex $NIC.InterfaceIndex -ProviderAddress 192.168.20.114 -PrefixLength 24
New-NetVirtualizationProviderRoute -InterfaceIndex $NIC.InterfaceIndex -DestinationPrefix "0.0.0.0/0" -NextHop 192.168.20.1
# Step 5. Configure the Virtual Subnet ID on the Hyper-V Network Switch Ports for each
Virtual Machine on each Hyper-V Host (Host 1 and Host 2)
# Run the command below for each VM on the host it is running on, i.e. for VMs
# mtlae14-005 and mtlae14-006 on host 192.168.20.114, and for VMs mtlae15-005
# and mtlae15-006 on host 192.168.20.115
# mtlae14 only
Get-VMNetworkAdapter -VMName mtlae14-005 | where {$_.MacAddress -eq "00155D720100"} | Set-VMNetworkAdapter -VirtualSubnetID 5001
Get-VMNetworkAdapter -VMName mtlae14-006 | where {$_.MacAddress -eq "00155D720101"} | Set-VMNetworkAdapter -VirtualSubnetID 5001
# ------- The commands in Steps 2 - 4 are not persistent. It is suggested to create
# a script that runs after each OS reboot
# Step 2. Configure a Subnet Locator and Route records on each Hyper-V Host (Host 1 and
Host 2) mtlae14 & mtlae15
New-NetVirtualizationLookupRecord -CustomerAddress 172.16.14.5 -ProviderAddress 192.168.20.114 -VirtualSubnetID 5001 -MACAddress "00155D720100" -Rule "TranslationMethodEncap"
New-NetVirtualizationLookupRecord -CustomerAddress 172.16.14.6 -ProviderAddress 192.168.20.114 -VirtualSubnetID 5001 -MACAddress "00155D720101" -Rule "TranslationMethodEncap"
New-NetVirtualizationLookupRecord -CustomerAddress 172.16.15.5 -ProviderAddress 192.168.20.115 -VirtualSubnetID 5001 -MACAddress "00155D730100" -Rule "TranslationMethodEncap"
New-NetVirtualizationLookupRecord -CustomerAddress 172.16.15.6 -ProviderAddress 192.168.20.115 -VirtualSubnetID 5001 -MACAddress "00155D730101" -Rule "TranslationMethodEncap"
# Add customer route
New-NetVirtualizationCustomerRoute -RoutingDomainID "{11111111-2222-3333-4444-000000005001}" -VirtualSubnetID "5001" -DestinationPrefix "172.16.0.0/16" -NextHop "0.0.0.0" -Metric 255
# Step 4. Configure the Provider Address and Route records on Hyper-V Host 2 (Host 2 Only) - mtlae15
$NIC = Get-NetAdapter "Port1"
New-NetVirtualizationProviderAddress -InterfaceIndex $NIC.InterfaceIndex -ProviderAddress 192.168.20.115 -PrefixLength 24
New-NetVirtualizationProviderRoute -InterfaceIndex $NIC.InterfaceIndex -DestinationPrefix "0.0.0.0/0" -NextHop 192.168.20.1
# Step 5. Configure the Virtual Subnet ID on the Hyper-V Network Switch Ports for each Virtual Machine on each Hyper-V Host (Host 1 and Host 2)
# Run the commands below for each VM on the host it is running on, i.e. for VMs mtlae14-005 and mtlae14-006
# on host 192.168.20.114, and for VMs mtlae15-005 and mtlae15-006 on host 192.168.20.115
# mtlae15 only
Get-VMNetworkAdapter -VMName mtlae15-005 | where {$_.MacAddress -eq "00155D730100"} | Set-VMNetworkAdapter -VirtualSubnetID 5001
Get-VMNetworkAdapter -VMName mtlae15-006 | where {$_.MacAddress -eq "00155D730101"} | Set-VMNetworkAdapter -VirtualSubnetID 5001
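After running the scripts above, the network virtualization state can be checked on each host with the corresponding Get- cmdlets. A minimal verification sketch (the expected entries reflect the example addresses used above):

```powershell
# Verify the lookup records applied in Step 2.
# The output should list the four CA/PA mappings for Virtual Subnet 5001.
Get-NetVirtualizationLookupRecord | Format-Table CustomerAddress, ProviderAddress, VirtualSubnetID, MACAddress

# Verify the Provider Address and default route applied in Step 3/4 on this host.
Get-NetVirtualizationProviderAddress
Get-NetVirtualizationProviderRoute

# Verify the customer route for 172.16.0.0/16 in routing domain {11111111-2222-3333-4444-000000005001}.
Get-NetVirtualizationCustomerRoute
```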
Step 3. [Recommended] Direct all TCP/UDP traffic to a lossy priority by using the "IPProtocolMatchCondition" parameter.
TCP is used for the MPI control channel (smpd), while UDP is used for other services such as remote desktop.
Arista switches forward the PCP bits (i.e. the 802.1p priority within the VLAN tag) from ingress to egress, enabling any two end-nodes in the fabric to maintain the priority along the route.
In this case a packet leaves the sender with priority X and reaches the far end-node with the same priority X.
To force MSMPI to work over ND rather than over sockets, add the following to the mpiexec command:
-env MPICH_DISABLE_ND 0 -env MPICH_DISABLE_SOCK 1
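For example, a complete mpiexec invocation with these settings might look as follows; the rank count (-n 8) and the application name (myapp.exe) are placeholders for your own job:

```shell
mpiexec -n 8 -env MPICH_DISABLE_ND 0 -env MPICH_DISABLE_SOCK 1 myapp.exe
```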
• Create a Quality of Service (QoS) policy and tag each type of traffic with the relevant
priority.
In this example we use priority 1 for TCP/UDP and priority 3 for ND/NDK.
New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
New-NetQosPolicy "DEFAULT" -Default -PriorityValue8021Action 3
New-NetQosPolicy "TCP" -IPProtocolMatchCondition TCP -PriorityValue8021Action 1
New-NetQosPolicy "UDP" -IPProtocolMatchCondition UDP -PriorityValue8021Action 1
• Enable PFC on priority 3.
Enable-NetQosFlowControl 3
• Disable Priority Flow Control (PFC) for all priorities other than 3.
Disable-NetQosFlowControl 0,1,2,4,5,6,7
• Enable QoS on the relevant interface.
Enable-NetAdapterQos -Name
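Once applied, the resulting QoS state can be reviewed with the matching Get- cmdlets; a quick sanity check, assuming the policy names used in this example:

```powershell
# List the QoS policies; the SMB, DEFAULT, TCP and UDP policies created above should appear.
Get-NetQosPolicy

# Show per-priority flow control state; PFC should be Enabled only for priority 3.
Get-NetQosFlowControl

# Confirm QoS is enabled on the relevant adapter.
Get-NetAdapterQos
```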