This objective focuses purely on networking, whether physical or virtual. It mostly covers network redundancy and some design best practices.
There are different traffic types in a VMware environment:
- Management Traffic
- Virtual Machine Traffic
- vMotion Traffic
- IP Storage Traffic (iSCSI/NFS)
- Fault Tolerance Traffic
Let us start with management network redundancy. Since vSphere 5.x ships only ESXi, we will not spend time on the Service Console.
There are two ways to set up management network redundancy:
- Set up two separate vSwitches (each connected to an uplink) with a VMkernel port group (Management Traffic) on each vSwitch, configured with different IP subnets. This provides resiliency as well as some load balancing, especially when you are performing P2Vs, where everything is transferred over the management network. This setup requires some additional HA settings: provide a second “das.isolationaddress” entry (specify a second default gateway) and change “das.failuredetectiontime” to 30,000 ms.
- The second option is to set up a single vSwitch with only one management port group, connected to NICs (uplinks) that are set as Active-Passive. Ensure both vmnics are connected to separate physical switches so that a SPOF is avoided if a whole physical switch goes down. In this typical design, the default NIC teaming policies are used.
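The physical switch requirement in the second option can be expressed as a quick check. This is only a minimal sketch: the mapping of vmnic names to physical switch names is a hypothetical inventory you would maintain yourself, and nothing here queries a real host.

```python
def has_switch_redundancy(uplink_to_switch):
    """Return True when the uplinks land on at least two physical switches.

    uplink_to_switch is a hypothetical mapping such as
    {"vmnic0": "pswitch-a", "vmnic1": "pswitch-b"}.
    """
    distinct_switches = set(uplink_to_switch.values())
    return len(uplink_to_switch) >= 2 and len(distinct_switches) >= 2


# Two uplinks on two switches: losing one switch leaves one path.
redundant = has_switch_redundancy({"vmnic0": "pswitch-a", "vmnic1": "pswitch-b"})

# Two uplinks on the same switch: the switch is still a SPOF.
not_redundant = has_switch_redundancy({"vmnic0": "pswitch-a", "vmnic1": "pswitch-a"})
```

Two uplinks alone are not enough; the check only passes when they terminate on different physical switches.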
How many NICs a host needs has long been a big question. If the host has a limited number of network adapters, there is little room to dedicate one or more adapters to every traffic type. The decision can be driven entirely by the company’s security/compliance policies, namely whether multiple traffic types are allowed over one network adapter (this can be achieved with VLAN tagging, which vSwitches fully support).
Virtual Machine Traffic: Can be segregated by using VLANs. Most networks these days can trunk different VLANs on the same network port, so instead of having separate network cards for every network, you can trunk them all onto the same physical NIC.
IP Storage Traffic (NFS/iSCSI): This type of traffic should definitely be on separate network adapters and shouldn’t rely on a single pNIC. IP storage should also be on a non-routable subnet. For software iSCSI you can create multiple VMkernel port groups and bind each one to the software iSCSI initiator, so that you maintain redundancy as well as load balancing. You can also use jumbo frames, but ensure they are enabled end to end, from the ESXi host through to your physical switch layer. For iSCSI we can use CHAP authentication to provide a higher level of security.
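The "end to end" requirement for jumbo frames means that the smallest MTU anywhere on the path decides whether large frames pass. A minimal sketch of that check follows; the component names and the 9000-byte value are assumptions for illustration, so use whatever MTU your switch and array vendors actually specify.

```python
JUMBO_MTU = 9000  # assumed jumbo frame size; confirm the value for your hardware


def jumbo_frames_end_to_end(path_mtus, required_mtu=JUMBO_MTU):
    """Check a storage path given as (component, mtu) pairs.

    Returns (ok, offenders), where offenders lists every component
    whose MTU is below the required value.
    """
    offenders = [name for name, mtu in path_mtus if mtu < required_mtu]
    return (not offenders, offenders)


# Hypothetical path from the VMkernel port to the array port.
path = [
    ("vmk1", 9000),
    ("vSwitch1", 9000),
    ("physical switch port", 1500),
    ("storage array port", 9000),
]
ok, offenders = jumbo_frames_end_to_end(path)
```

In this example a single switch port left at 1500 is enough to fail the whole path.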
vMotion Traffic: This traffic type does require redundancy, so you can use one of the NICs from the management network, provided you put vMotion on a separate VLAN. To avoid man-in-the-middle attacks, this should also be on a non-routable subnet, as the migration happens over the network and is not encrypted at all. From vSphere 5.0 onwards we can leverage Multi-NIC vMotion; refer to Duncan Epping’s blog here. vSphere 5 allows you to perform 8 concurrent vMotions on a single host with 10GbE capabilities. For 1GbE, the limit is 4 concurrent vMotions.
Some best practices for vMotion:
- Leave 30% of CPU capacity unreserved when using resource pools at the host level, and 10% of CPU capacity when using resource pools under DRS
- Use 10Gb NIC for performance improvements while migrating very large virtual machines
- Do not place your swap files on local datastore as this will directly impact your vMotion performance, instead place them on shared storage
- When using MultiNIC vMotion, setup both the vmk adapters on the same vSwitch. In the vmknic properties, configure each vmknic to leverage a different vmnic as its active vmnic, with the rest marked as standby
- Use Network I/O Control if your network is constrained. It is also recommended to set limits on vMotion traffic, otherwise multiple vMotions from different hosts may chew up the shared bandwidth
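The Multi-NIC vMotion layout described in the list above (each vmknic with a different active vmnic, the rest as standby) can be sanity-checked with a short sketch. The dictionary layout and the vmk/vmnic names are hypothetical; this validates a configuration you describe by hand, it does not read one from a host.

```python
def validate_multinic_vmotion(vmk_teaming, vswitch_uplinks):
    """Return a list of problems found in a Multi-NIC vMotion layout.

    vmk_teaming is a hypothetical description such as
    {"vmk1": {"active": ["vmnic2"], "standby": ["vmnic3"]}, ...}.
    """
    problems = []
    active_owner = {}  # vmnic -> vmknic that already uses it as active
    for vmk, team in vmk_teaming.items():
        active = team.get("active", [])
        standby = team.get("standby", [])
        if len(active) != 1:
            problems.append(f"{vmk}: expected exactly one active uplink, found {len(active)}")
            continue
        nic = active[0]
        if nic in active_owner:
            problems.append(f"{vmk}: active uplink {nic} is already active for {active_owner[nic]}")
        else:
            active_owner[nic] = vmk
        # Every other uplink on the vSwitch should be standby for this vmknic.
        expected_standby = set(vswitch_uplinks) - {nic}
        if set(standby) != expected_standby:
            problems.append(f"{vmk}: standby uplinks should be {sorted(expected_standby)}")
    return problems


uplinks = ["vmnic2", "vmnic3"]
good = {
    "vmk1": {"active": ["vmnic2"], "standby": ["vmnic3"]},
    "vmk2": {"active": ["vmnic3"], "standby": ["vmnic2"]},
}
bad = {
    "vmk1": {"active": ["vmnic2"], "standby": ["vmnic3"]},
    "vmk2": {"active": ["vmnic2"], "standby": ["vmnic3"]},
}
good_problems = validate_multinic_vmotion(good, uplinks)
bad_problems = validate_multinic_vmotion(bad, uplinks)
```

The "bad" layout fails because both vmknics would send traffic over the same active vmnic, which defeats the purpose of Multi-NIC vMotion.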
Fault Tolerance Traffic: This type of traffic also requires redundancy, as all the FT logging happens through the network adapters. So if you lose the pNIC, you simply don’t have Fault Tolerance available for your virtual machines.
It is important to understand what NIC speed each traffic type requires. The management network should be on at least 1 Gb if you’re planning for P2Vs. vMotion traffic should also be on at least a 1 Gb NIC, otherwise VM migrations might take longer. With the enhancements in vSphere 4.1, the number of concurrent vMotions was raised from four to eight and the speed cap was raised to 8 Gbps on a 10 Gb NIC, so if you dedicate a 10 Gb NIC you may saturate the bandwidth with vMotion alone. With IP-based storage now being one of the mainstream storage platforms, 10 Gb NICs will provide maximum performance.
For NFS (with physical switches that support EtherChannel), if you need your ESX/ESXi hosts to access more than one storage controller from different pNICs, you need to set up multiple IP addresses on the storage controller and configure Link Aggregation Control Protocol (LACP) load balancing on your storage array controller. The vSwitch load balancing policy should be set to Route Based on IP Hash.
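To see why multiple IP addresses on the storage controller matter with Route Based on IP Hash, here is a simplified model of the uplink selection. It assumes the commonly documented behaviour, an XOR of the source and destination IP addresses taken modulo the number of uplinks; treat it as an illustration rather than the exact ESXi implementation. The IP addresses are made up.

```python
import ipaddress


def ip_hash_uplink(src_ip, dst_ip, uplink_count):
    """Pick an uplink index from a source/destination IP pair.

    Simplified model: XOR the two IPv4 addresses as integers,
    then take the result modulo the number of uplinks.
    """
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % uplink_count


# One VMkernel port talking to two IP addresses on the storage controller.
uplink_for_first_target = ip_hash_uplink("10.0.0.10", "10.0.0.20", 2)
uplink_for_second_target = ip_hash_uplink("10.0.0.10", "10.0.0.21", 2)
```

A single source/destination pair always hashes to the same uplink, so one NFS target IP address never uses more than one pNIC. With two target addresses, as above, the two sessions land on different uplinks in this model.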
Let us look at vSphere Distributed Switches (vDS):
A vDS spans multiple ESX/ESXi hosts at the datacenter level. vDSs are only available with the Enterprise Plus license. Some key features are:
- Can have traffic shaping for Inbound (Ingress) and Outbound (Egress) traffic
- Centrally managed by vCenter Server. This can be tricky because if vCenter goes down, the vDS is unmanageable. Think of vCenter being a VM connected to a VM port group on the vDS: it becomes a chicken-and-egg situation
- Supports PVLANs
- NIC teaming based on physical NIC load (Load Based Teaming, LBT, introduced in vSphere 4.1)
PVLANs: These divide the broadcast domain into several logical broadcast domains. PVLAN is an extension of the VLAN standard. Private in this case means that hosts in the same PVLAN can’t be seen by others, except those in a Promiscuous PVLAN. PVLANs are divided into:
Primary: The original VLAN that is being divided into smaller groups. All the secondary PVLANs exist only inside the primary VLAN
Secondary: The VLANs that exist inside the primary VLAN. Secondary PVLANs are further divided into:
Promiscuous: The switch port connects to a router, firewall or other common gateway device. This port can communicate with anything else connected to the primary or any secondary VLAN. In other words, it is a type of a port that is allowed to send and receive frames from any other port on the VLAN
Isolated: Any switch ports associated with an Isolated VLAN can reach the primary VLAN, but not any other Secondary VLAN. In addition, hosts associated with the same Isolated VLAN cannot reach each other. Only one Isolated VLAN is allowed in one Private VLAN domain
Community: The switch port connects to a regular host that resides on a community VLAN. This port communicates with promiscuous ports (P-Ports) and with ports on the same community VLAN
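The reachability rules for the three secondary PVLAN types can be summarised in a few lines. This is a minimal sketch of the rules as described above; the tuple representation and the VLAN IDs are made up for illustration.

```python
VALID_TYPES = {"promiscuous", "isolated", "community"}


def can_communicate(port_a, port_b):
    """Decide whether two ports in the same primary VLAN can talk.

    Each port is a (port_type, secondary_vlan_id) tuple, a hypothetical
    representation used only for this example.
    """
    type_a, vlan_a = port_a
    type_b, vlan_b = port_b
    if type_a not in VALID_TYPES or type_b not in VALID_TYPES:
        raise ValueError("unknown PVLAN port type")
    # A promiscuous port can reach everything in the primary VLAN.
    if "promiscuous" in (type_a, type_b):
        return True
    # An isolated port reaches only promiscuous ports, not even its peers.
    if "isolated" in (type_a, type_b):
        return False
    # Two community ports can talk only inside the same community VLAN.
    return vlan_a == vlan_b


gateway = ("promiscuous", 100)
isolated_vm1 = ("isolated", 101)
isolated_vm2 = ("isolated", 101)
web_vm1 = ("community", 102)
web_vm2 = ("community", 102)
db_vm = ("community", 103)

isolated_to_gateway = can_communicate(isolated_vm1, gateway)
isolated_to_isolated = can_communicate(isolated_vm1, isolated_vm2)
same_community = can_communicate(web_vm1, web_vm2)
different_community = can_communicate(web_vm1, db_vm)
```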
Network I/O Control: Network I/O Control is the traffic management capability which is only available on vDS. The NIOC concept revolves around resource pools that are similar in many ways to the ones existing for CPU and memory. I/O shares can be allocated to different traffic types, i.e. VM, vMotion, FT, IP Storage and Mgmt Traffic.
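As with CPU and memory shares, network shares only matter under contention: the traffic types that are actively sending split the uplink in proportion to their share values. A minimal sketch of that arithmetic follows, with made-up share values and a 10 Gb uplink expressed in Mbps.

```python
def allocate_bandwidth(link_mbps, shares, active):
    """Split link bandwidth among active traffic types by share value.

    shares maps a traffic type to its share value; active lists the
    traffic types currently competing for the uplink.
    """
    total = sum(shares[t] for t in active)
    return {t: link_mbps * shares[t] / total for t in active}


# Hypothetical share values, chosen only for illustration.
shares = {"vm": 100, "vmotion": 50, "ft": 50}

# All three traffic types contend for a 10 Gb uplink.
all_active = allocate_bandwidth(10000, shares, ["vm", "vmotion", "ft"])

# VM traffic is idle, so vMotion and FT split the capacity it is not using.
vm_idle = allocate_bandwidth(10000, shares, ["vmotion", "ft"])
```

The second case shows the flexibility of shares: bandwidth that one traffic type is not using is redistributed to the others, which a fixed limit would not allow.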
NetIOC Best Practices:
- When using bandwidth allocation, use “shares” instead of “limits,” as the former has greater flexibility for unused capacity redistribution
- Fault tolerance is a latency-sensitive traffic flow, so it is recommended to always set the corresponding resource-pool shares to a reasonably high relative value in the case of custom shares
- Use LBT as your vDS teaming policy while using NetIOC in order to maximize the networking capacity utilization
- Use the DV Port Group and Traffic Shaper features offered by the vDS to maximum effect when configuring the vDS. Configure each of the traffic flow types with a dedicated DV Port Group
Read the VMware KB Article 1004048 on Link Aggregation