What’s New in HCX 4.1 Part II – Seed Checkpoint for Bulk Migration

In What’s New in HCX 4.1 Part 2, we will look into the new Seed Checkpoint option for Bulk Migration.

Last week in the What’s New in HCX 4.1 Part 1 post we explored Mobility Optimized Networking, which is now Generally Available for all HCX deployments.

Read the official HCX 4.1 release note here.

Read the HCX 4.1 launch blog here.

About HCX Bulk Migration

Bulk Migration in Action (another blog post)
Understanding HCX Bulk Migration (in the Guide)

HCX Bulk migration is the workhorse in the HCX migration world.

  • This migration type is resilient to less favorable environmental conditions (much more so than its hot migration counterpart).
  • It makes migration possible when moving from a newer CPU family to an older CPU family (it happens).
  • It makes migration possible when traversing CPU vendors.
  • It may be the ideal choice when a live migration is not a hard requirement (when maintenance windows can be secured from the application owners for controlled/parallel switchovers).
  • It enables some level of guest customization
  • It tolerates VM configuration changes during the migration (to some degree).

The Bulk migration option uses Host Based Replication. This is not a technology that will saturate available resources for a single VM migration (i.e. it is a good choice for migration reliability, but not a high-speed migration type).

With Bulk migration, when the transfer phase involves Terabytes of VM disk data, this phase may last several days until a full synchronization completes.

After a full synchronization, we transition to a delta synchronization (continuous replication of only changes to the virtual machine).
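
To put "several days" in perspective, here is a rough back-of-the-envelope estimate. It is only a sketch: the 10 TB disk size and the 400 Mbps sustained per-VM rate are made-up inputs for illustration, not HCX figures, and real transfers also depend on data change rate and link conditions.

```python
def full_sync_hours(disk_tb: float, throughput_mbps: float) -> float:
    """Hours to copy disk_tb (decimal terabytes) at a sustained rate in megabits/s."""
    bits = disk_tb * 1e12 * 8                   # terabytes -> bits
    seconds = bits / (throughput_mbps * 1e6)    # megabits/s -> bits/s
    return seconds / 3600


# Hypothetical inputs: one 10 TB VM at a sustained 400 Mbps.
hours = full_sync_hours(10, 400)
print(f"{hours:.1f} hours (~{hours / 24:.1f} days)")  # 55.6 hours (~2.3 days)
```

At that assumed rate the initial copy of a single large VM already takes more than two days.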

Bulk Migration Non-Completion without Seed Checkpoint

While Bulk migration is generally highly resilient, a failure can still happen due to an unexpected change in site-to-site conditions that the process cannot handle. A user can also manually cancel a migration in progress.

The non-completion behavior we have today is an operational rollback with data clean-up:

Bulk Migration Cancelled or Failed
Operation is marked Cancelled or Failed. VM Data is Cleaned Up

With the existing rollback/cleanup approach, the source virtual machine continues to function undisrupted at the source, and the migrated data is removed from the destination datastore.

While there is no impact to the original virtual machine, migration project timelines can be heavily impacted by the lost progress on transferring Terabytes of VM data. The negative cascading effect is severe for users that have multi-phased migrations planned out.

Bulk Migration Non-Completion with Seed Checkpoint

This is where the new Seed Checkpoint feature comes into play! Seed Checkpoint gives the user a new mode for handling non-completion.

Seed Checkpoint protects migration progress, and reduces risk to planning efforts due to unexpected failures.

This mode replaces the rollback/cleanup actions with progress breadcrumbs. Two basic things change when the feature is selected during the migration:

  1. A migration failure or cancellation invokes a process in which replica disks are labeled as “not in use”. HCX knows of these disks for subsequent operations.
  2. A migration operation where Seed Checkpoint is selected and existing data is available will result in the remapping of the existing transfer data. HCX will reconcile the available data and resume the replication from that point.

Bulk Migration Cancelled or Failed
Operation is Marked Cancelled or Failed; Seed Checkpoint is Recorded
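
The difference between the two non-completion modes can be sketched as a toy model. This is not HCX code or an HCX API: `Destination`, `end_without_completion` and `start_bulk_migration` are hypothetical names invented for illustration, and the model ignores the reconciliation of changes made to the VM since the failed attempt.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Destination:
    """Toy stand-in for the destination datastore; not an HCX object."""
    # VM name -> GB of replica data kept and labeled "not in use"
    seeds: Dict[str, float] = field(default_factory=dict)


def end_without_completion(dest: Destination, vm: str,
                           transferred_gb: float, seed_checkpoint: bool) -> None:
    """Handle a failed or cancelled Bulk migration."""
    if seed_checkpoint:
        dest.seeds[vm] = transferred_gb   # keep the replica disks as a seed
    else:
        dest.seeds.pop(vm, None)          # rollback: replica data is cleaned up


def start_bulk_migration(dest: Destination, vm: str,
                         total_gb: float, seed_checkpoint: bool) -> float:
    """Return the GB still left to transfer when a new Bulk migration starts."""
    # Existing seed data is remapped only when Seed Checkpoint is selected.
    already = dest.seeds.pop(vm, 0.0) if seed_checkpoint else 0.0
    return max(total_gb - already, 0.0)


dest = Destination()
end_without_completion(dest, "vm-01", transferred_gb=3000.0, seed_checkpoint=True)
print(start_bulk_migration(dest, "vm-01", total_gb=5000.0, seed_checkpoint=True))  # 2000.0
```

In the sketch, the retry only has 2000 GB left to move instead of the full 5000 GB, which is the lost progress the rollback/cleanup mode would have had to transfer again.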

How to Enable the Feature

For the initial launch of this feature, Seed Checkpoint appears as a checkbox option at the individual virtual machine level during a Bulk Migration.

Some Additional Considerations

Software Requirements

Seed Checkpoint requires HCX 4.1 in every component of a service mesh. As with any new functionality, the feature will not work with a partial upgrade.
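
A quick pre-check along these lines can catch a partial upgrade before a migration wave is planned. This is a minimal sketch, not an HCX tool: the component names and the version inventory are hypothetical and entered by hand, and reading "requires HCX 4.1" as "version 4.1 or later" is an assumption.

```python
from typing import Dict, List, Tuple

REQUIRED = (4, 1)  # assumption: "requires HCX 4.1" means 4.1 or later


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '4.1.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def components_below_required(versions: Dict[str, str]) -> List[str]:
    """Return the names of components whose major.minor is older than REQUIRED."""
    return sorted(
        name for name, version in versions.items()
        if parse_version(version)[:2] < REQUIRED
    )


# Hypothetical inventory of one service mesh, filled in manually.
mesh = {
    "source-manager": "4.1.0",
    "destination-manager": "4.1.0",
    "interconnect-appliance": "4.0.3",
}
print(components_below_required(mesh))  # ['interconnect-appliance']
```

Any name in the returned list is a component that would still need upgrading before Seed Checkpoint can be used on that service mesh.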

Seed Checkpoint is not enabled by default during its initial launch. Why?

This is a great new feature and we envision it being the default mode very soon. However, there are some existing scenarios which may result in orphaned data (which can be non-negligible, depending on the VM data). Two scenarios that come to mind:

1 – A failed/cancelled HCX Bulk Migration attempt followed by a different migration type selection (HCX vMotion and HCX RAV migrations will not use the seed checkpoint data).

In this scenario, the subsequent attempt is not a Bulk migration, so the user has to return to the failed/cancelled Bulk migration operation to clean up the seed data. The HCX interface has some validations in place to make it known to the user that seed checkpoint data exists.

2 – A failed/cancelled migration is archived by the user (this removes the migration’s clean-up option).

In 4.1, we would like the use of the feature to be very intentional.

What about Seed Checkpoint for Replication Assisted vMotion?

Seed Checkpoint for Bulk Migration is the beginning of a wider effort for related technologies. While I cannot make any guarantees at all about the future, I can say I’m excited for the next round of advancements in this area. Stay tuned for more.


Gabe

P.S. Happy Father’s Day to any dads (and moms who are functional dads)
