DRS and Cluster Config Considerations for HCX Deployments

A vSphere DRS cluster provides automated workload balancing, power management, high availability, and mechanisms to manage how resources are made available to workloads. DRS configurations are well known and widely used in vSphere deployments. The default DRS settings work well for many situations, but it is worthwhile to optimize them when virtual machines provide critical infrastructure services.

In an ideal world, the HCX > vSphere integration would handle these optimizations, but that is not yet the case.

The goal of this post is to surface, for your consideration, the cluster settings that will help an HCX deployment hum.

The environment I used for this post is running vSphere 6.7.

In vSphere > Cluster > Configure > Services > vSphere DRS

Setting: Automation Level
Options: Fully Automated, Partially Automated, Manual
Recommendations / Comments: If DRS is configured as Fully Automated, the VM Override settings should be applied. If DRS is set to Partially Automated or Manual, the VM Override settings are not necessary.

Setting: Migration Threshold
Options: Conservative / Aggressive slider
Recommendations / Comments: If the primary objective for HCX is to evacuate a source cluster, but Fully Automated DRS is still desired, I would urge leaning towards a more conservative setting (if not the most conservative).

It’s a simple matter of the operations having a cost, the hosts having limits, and reducing points of contention to more quickly achieve the goal (assuming the goal is HCX-fueled migration of VMs to their intended destination).
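The Automation Level rule from the DRS settings above can be written down as a tiny check. This is only a sketch of the decision logic in this post, not a call into vSphere: `DrsAutomation` and `needs_vm_overrides` are hypothetical names of my own, and the string values are illustrative labels.

```python
from enum import Enum


class DrsAutomation(Enum):
    # Hypothetical enum for this sketch; the string values are illustrative labels.
    FULLY_AUTOMATED = "fullyAutomated"
    PARTIALLY_AUTOMATED = "partiallyAutomated"
    MANUAL = "manual"


def needs_vm_overrides(level: DrsAutomation) -> bool:
    """Return True when VM Overrides for the HCX appliances should be applied."""
    # Only Fully Automated DRS migrates running VMs on its own, so that is the
    # only level where the HCX service appliances need per-VM overrides.
    return level is DrsAutomation.FULLY_AUTOMATED


if __name__ == "__main__":
    for level in DrsAutomation:
        print(level.name, "-> apply VM Overrides:", needs_vm_overrides(level))
```

A check like this could sit at the front of whatever automation you use to audit clusters before an HCX deployment.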

In vSphere > Cluster > Configure > Services > vSphere Availability

Setting: vSphere Availability
Options: On / Off
Recommendations / Comments: HCX services rely on vSphere Availability. It should not be disabled when HCX is providing data path services.

Setting: Proactive HA
Options: On / Off, Automated / Manual
Recommendations / Comments: Proactive HA is available in vSphere 6.5+ environments. This is a good setting to mitigate an HCX-related outage. When partial degradation happens on a host containing HCX service appliances, DRS will proactively adjust the placement and reduce the risk of total degradation.

In vSphere > Cluster > Configure > Configuration

Setting: VMware EVC
Recommendations / Comments: HCX vMotion and Replication Assisted vMotion do not require EVC to be enabled for “forward migrations” (from clusters with older processors to clusters with newer generation processors). This is a core benefit of HCX migration. “Reverse migration” will also work for those forward-migrated virtual machines, as long as the virtual machine hasn’t been power cycled.

If EVC settings are used, the migration administrator should know the setting on both the source and the target cluster and understand how it may impact vMotion-based HCX migrations.

Setting: VM/Host Groups & Rules
Options: IX/WANOPT Affinity
Recommendations / Comments: When the WAN Optimization (HCX-WO) service is enabled, the IX hands packets off to the HCX-WO, and the HCX-WO returns them to the IX before they are transmitted to the peer IX receiver. You can create an affinity rule to optimize the data path for migration flows.

Create a group that contains the HCX-IX and the HCX-WAN-OPT appliances. Create a rule that keeps those virtual machines together.

Setting: VM Overrides
Options: DRS Automation Level
Recommendations / Comments: Set to Manual for HCX service appliances.

When DRS kicks in for re-balancing, it tends to grab the little VMs first. The small-footprint HCX service mesh appliances tend to be good candidates.

This setting is very useful in environments with dense VM-to-host ratios.

Setting: VM Overrides
Options: HA Restart Priority
Recommendations / Comments: As mentioned above, HCX relies on vSphere Availability, particularly the HCX-NE, which provides critical data path services for VM-to-VM traffic during Network Extension forwarding.

To optimize for resiliency, the VM Restart Priority should be set to Highest for HCX Network Extension virtual machines.
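To make the group, rule, and override recommendations above easier to review before clicking through the UI, here is a minimal sketch that turns a list of HCX appliances into a plan. Everything in it is an assumption for illustration: the `Appliance` type, the `build_override_plan` helper, the role labels ("IX", "WO", "NE"), and the appliance names are hypothetical, and the code does not talk to vCenter. It also assumes a single IX/WO pair per cluster.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Appliance:
    name: str  # hypothetical VM name
    role: str  # "IX", "WO" or "NE" (labels assumed for this sketch)


def build_override_plan(appliances: List[Appliance],
                        drs_fully_automated: bool = True) -> Dict[str, List[str]]:
    """Map the recommendations in this post onto a list of HCX appliances."""
    has_wan_opt = any(a.role == "WO" for a in appliances)
    return {
        # Keep IX and WAN-OPT together, but only when WAN Optimization is deployed.
        "keep_together": sorted(
            a.name for a in appliances if has_wan_opt and a.role in ("IX", "WO")
        ),
        # DRS override to Manual matters only when the cluster is Fully Automated.
        "drs_manual": sorted(a.name for a in appliances) if drs_fully_automated else [],
        # Network Extension appliances get the highest HA restart priority.
        "ha_restart_highest": sorted(a.name for a in appliances if a.role == "NE"),
    }


if __name__ == "__main__":
    mesh = [
        Appliance("mesh-IX-I1", "IX"),
        Appliance("mesh-WO-I1", "WO"),
        Appliance("mesh-NE-I1", "NE"),
    ]
    for setting, vms in build_override_plan(mesh).items():
        print(f"{setting}: {', '.join(vms) or '(none)'}")
```

The output is just three lists of VM names, which you can compare against what is actually configured under VM/Host Groups, VM/Host Rules, and VM Overrides on the cluster.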
