Wednesday, August 14, 2019

HCX Overview



HCX is the Swiss Army knife of workload mobility. It abstracts away the boundaries of the underlying infrastructure and focuses on the workloads. An HCX vMotion, for example, requires no direct connectivity to ESXi hosts in either direction, unlike a vSphere vMotion. All HCX vMotion traffic is managed through the HCX vMotion Proxy at each location. The HCX vMotion Proxy resembles an ESXi host within the vCenter Server inventory. It’s deployed at the data center level by default, and no intervention is necessary. One thing to mention is that the HCX vMotion Proxy gets added to the vCenter Server host count by default. The HCX team is aware of this and will be changing it in the future, but it has no impact on your vSphere licensing.


Another boundary HCX removes is the vSphere version: it supports versions from vSphere 5.0 up to the most current release, vSphere 6.7 Update 1. This provides flexibility in moving workloads across vSphere versions, on-premises locations, and vSphere SSO domains. For on-premises to on-premises migrations, an NSX Hybrid Connect license is required per HCX site pairing. We will cover site pairing in the configuration blog post. Migrating workloads from on-premises to VMware Cloud on AWS (VMC) does not require a separate HCX license. When you deploy a VMC SDDC, HCX is included as an add-on and is enabled by default. From the VMC add-ons tab, all that is required is to open Hybrid Cloud Extension and then deploy HCX. The deployment of HCX is completely automated within the VMC SDDC.



Before you can start migrating workloads, network connectivity between the source and destination needs to be in place. The good news is that it’s all built into the product. HCX has WAN optimization, deduplication, and compression to increase efficiency while decreasing the time it takes to perform migrations. The minimum network bandwidth required to migrate workloads with HCX is 100 Mbps. HCX can use your internet connection as well as Direct Connect. The established network tunnel is secured using Suite B encryption. On-premises workloads that are migrated with no downtime need to reside on a vSphere Distributed Switch (VDS). HCX also supports one third-party switch, the Cisco Nexus 1000v. Cold and bulk migrations are currently the only two HCX migration types that support a vSphere Standard Switch, but both imply downtime for the workload (the HCX team is working on adding vSphere Standard Switch support for other migration types). To minimize migration downtime, HCX has a single-click option to extend on-premises networks (L2 stretch) to other on-premises sites or VMware Cloud on AWS. Once the workloads have been migrated, there is also an option to migrate the extended network, if you choose. Other built-in functionality includes:
  • Native scheduler for migrations
  • Per-VM EVC
  • Upgrade VM Tools / Compatibility (hardware)
  • Retain MAC address
  • Remove snapshots
  • Force unmount ISO images
  • Bi-directional migration support
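
To put the 100 Mbps minimum in perspective, here is a rough back-of-the-envelope sketch of how long a migration might take at a given link speed. The function name and the `efficiency` parameter are my own placeholders, not anything HCX exposes, and the estimate ignores WAN optimization, deduplication, and compression, so real transfers may finish sooner.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Estimate hours needed to move data_gb (decimal gigabytes) over a link.

    efficiency is a hypothetical multiplier for usable throughput
    (1.0 means the full line rate is available to the migration).
    """
    if data_gb < 0 or link_mbps <= 0 or efficiency <= 0:
        raise ValueError("data_gb must be >= 0; link_mbps and efficiency must be > 0")
    bits_to_move = data_gb * 8 * 1000**3            # GB -> bits
    bits_per_second = link_mbps * 1_000_000 * efficiency
    return bits_to_move / bits_per_second / 3600    # seconds -> hours


if __name__ == "__main__":
    # 500 GB over the 100 Mbps minimum link
    print(f"{transfer_hours(500, 100):.1f} hours")
```

At the minimum bandwidth, 500 GB works out to roughly 11 hours of raw transfer time, which is a good argument for scheduling bulk migrations rather than running them ad hoc.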


HCX provides enhanced functionality on top of the built-in vSphere VM mobility options. Customers can now use HCX to migrate workloads seamlessly from on-premises to other paired on-premises sites (multisite) and VMware Cloud on AWS. Workload mobility can also help with hardware refreshes as well as upgrades from unsupported vSphere 5.x versions. The next post will cover the different migration options available within HCX, followed by how to set up and configure the product.

I will share the update information shortly. I hope this has been informative and thank you for reading!

Monday, August 5, 2019

vSAN Space Efficiency Features

vSAN includes space efficiency features such as:
  • Deduplication
  • Compression
  • Erasure Coding
These features are built directly into vSAN and reduce the total cost of ownership (TCO) of storage. Let’s go into each one in a little more depth to learn how we’re saving money and storage while increasing performance at the same time.

Deduplication & Compression

Enabling deduplication and compression can reduce the amount of physical storage consumed by as much as 7x. For example, let’s say you have 20 Windows Server 2012 R2 VMs, each with its own purpose (AD, Exchange, App, Web, DB, etc.). Without deduplication and compression, we would store the same operating system data 20 times over.

Environments with redundant data such as similar operating systems typically benefit the most. Likewise, compression offers more favorable results with data that compresses well like text, bitmap, and program files. Data that is already compressed such as certain graphics formats and video files, as well as files that are encrypted, will yield little or no reduction in storage consumption from compression. Deduplication and compression results will vary based on the types of data stored in an all flash vSAN environment.
Note: Deduplication and compression are enabled together as a single cluster-wide setting, which is disabled by default and can be enabled using a drop-down menu in the vSphere Web Client.
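
To make the 20-VM example more concrete, here is a toy sketch of block-level deduplication. It is only an illustration of the idea, not vSAN’s implementation: the 4 KB block size, the SHA-256 hash, and the fake "OS image" are all assumptions I picked for the demo.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this illustration


def dedup_ratio(disks: list[bytes], block_size: int = BLOCK_SIZE) -> float:
    """Return (logical blocks) / (unique blocks) across all disks."""
    unique_hashes = set()
    total_blocks = 0
    for disk in disks:
        for offset in range(0, len(disk), block_size):
            block = disk[offset:offset + block_size]
            unique_hashes.add(hashlib.sha256(block).digest())
            total_blocks += 1
    if total_blocks == 0:
        raise ValueError("no data to deduplicate")
    return total_blocks / len(unique_hashes)


# 20 VMs share an identical 8-block "OS image"; each adds 2 blocks of its own data.
os_image = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))
vms = [
    os_image + bytes([100 + n]) * BLOCK_SIZE + bytes([200 + n]) * BLOCK_SIZE
    for n in range(20)
]

if __name__ == "__main__":
    print(f"dedup ratio: {dedup_ratio(vms):.2f}x")
```

In this made-up data set, 200 logical blocks collapse to 48 unique ones. The more the VMs have in common, the higher the ratio climbs, which is why environments with similar operating systems benefit the most.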

RAID 5/6 Erasure Coding

RAID-5/6 erasure coding is a space efficiency feature optimized for all flash configurations. Erasure coding provides the same levels of redundancy as mirroring, but with a reduced capacity requirement. In general, erasure coding is a method of taking data, breaking it into multiple pieces and spreading it across multiple devices, while adding parity data so it may be recreated in the event one of the pieces is corrupted or lost.


Unlike deduplication and compression, which offer variable levels of space efficiency, erasure coding guarantees capacity reduction over a mirroring data protection method at the same failure tolerance level. As an example, let’s consider a 100GB virtual disk. Surviving one disk or host failure requires 2 copies of data at 2x the capacity, i.e., 200GB. If RAID-5 erasure coding is used to protect the object, the 100GB virtual disk will consume 133GB of raw capacity, a 33% reduction in consumed capacity versus RAID-1 mirroring.

RAID-5 erasure coding requires a minimum of four hosts. Let’s look at a simple example of a 100GB virtual disk. When a policy containing a RAID-5 erasure coding rule is assigned to this object, three data components and one parity component are created. To survive the loss of a disk or host (FTT=1), these components are distributed across four hosts in the cluster.

RAID-6 erasure coding requires a minimum of six hosts. Using our previous example of a 100GB virtual disk, the RAID-6 erasure coding rule creates four data components and two parity components. This configuration can survive the loss of two disks or hosts simultaneously (FTT=2). While erasure coding provides significant capacity savings over mirroring, understand that erasure coding requires additional processing overhead. This is common with any storage platform. Erasure coding is only supported in all flash vSAN configurations. Therefore, the performance impact is negligible in most cases due to the inherent performance of flash devices.
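
The capacity math above is easy to check with a few lines of Python. This is just a sketch of the arithmetic: the data and parity counts for RAID-5 and RAID-6 come from the text, while the RAID-1 FTT=2 row (three full copies) is added by me for comparison. The names are placeholders.

```python
# scheme -> (data parts, redundancy parts)
SCHEMES = {
    "RAID-1 (FTT=1)": (1, 1),  # one copy plus one mirror
    "RAID-1 (FTT=2)": (1, 2),  # three full copies (added for comparison)
    "RAID-5 (FTT=1)": (3, 1),  # three data + one parity component
    "RAID-6 (FTT=2)": (4, 2),  # four data + two parity components
}


def raw_capacity_gb(virtual_disk_gb: float, scheme: str) -> float:
    """Raw capacity consumed by a virtual disk under the given scheme."""
    data, redundancy = SCHEMES[scheme]
    return virtual_disk_gb * (data + redundancy) / data


if __name__ == "__main__":
    for name in SCHEMES:
        print(f"{name}: {raw_capacity_gb(100, name):.0f} GB raw for a 100 GB disk")
```

For a 100GB disk, RAID-5 comes out at about 133GB versus 200GB for mirroring at FTT=1, and RAID-6 at 150GB versus 300GB for mirroring at FTT=2.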

I will share the update information shortly. I hope this has been informative and thank you for reading!

Understand How vSAN Protects Data

vSAN protects data in several different ways. We will discuss these in brief.

Storage Policy-Based Management

Storage Policy-Based Management (SPBM) from VMware enables precise control of storage services. Like other storage solutions, vSAN provides services such as availability levels, capacity consumption, and stripe widths for performance. A storage policy contains one or more rules that define service levels.
 
Storage policies are created and managed using the vSphere Web Client. Policies can be assigned to virtual machines and individual objects such as a virtual disk. Storage policies are easily changed or reassigned if application requirements change. These modifications are performed with no downtime and without the need to migrate virtual machines from one datastore to another. SPBM makes it possible to assign and modify service levels with precision on a per-virtual machine basis.
 
Failures to Tolerate (FTT)

FTT defines how many failures an object can tolerate before it becomes unavailable.

Fault Domains: “Fault domain” is a term that comes up often in availability discussions. In IT, a fault domain usually refers to a group of servers, storage, and/or networking components that would be impacted collectively by an outage. A common example of this is a server rack. If a top-of-rack switch or the power distribution unit for a server rack were to fail, it would take all the servers in that rack offline even though the server hardware is functioning properly. That server rack is considered a fault domain.
 
Each host in a vSAN cluster is an implicit fault domain. vSAN automatically distributes components of a vSAN object across fault domains in a cluster based on the Number of Failures to Tolerate rule in the assigned storage policy. The following diagram shows a simple example of component distribution across hosts (fault domains). The two larger components are mirrored copies of the object and the smaller component represents the witness component.
 


To mitigate this risk, place the servers in a vSAN cluster across server racks and configure a fault domain for each rack in the vSAN UI. This instructs vSAN to distribute components across server racks to eliminate the risk of a rack failure taking multiple objects offline. This feature is commonly referred to as “Rack Awareness”. The diagram below shows component placement when three servers in each rack are configured as separate vSAN fault domains.
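
The rack-aware placement described above can be sketched in a few lines. This is a simplified illustration, not vSAN’s actual placement algorithm: it assumes an FTT=1 mirrored object (two replicas plus a witness), and the rack and host names are made up.

```python
def place_components(components: list[str],
                     hosts_by_rack: dict[str, list[str]]) -> dict[str, tuple[str, str]]:
    """Assign each component to a host in a different rack (fault domain)."""
    racks = sorted(hosts_by_rack)
    if len(components) > len(racks):
        raise ValueError("not enough fault domains for this policy")
    placement = {}
    for component, rack in zip(components, racks):
        # Simplification: always pick the first host in the rack.
        placement[component] = (rack, hosts_by_rack[rack][0])
    return placement


if __name__ == "__main__":
    racks = {
        "rack-a": ["esx-01", "esx-02", "esx-03"],
        "rack-b": ["esx-04", "esx-05", "esx-06"],
        "rack-c": ["esx-07", "esx-08", "esx-09"],
    }
    for component, (rack, host) in place_components(
            ["replica-1", "replica-2", "witness"], racks).items():
        print(f"{component} -> {host} in {rack}")
```

Because no two components of the object share a rack, losing any single rack still leaves a majority of the object’s components available.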

 

Disk Group

A disk group is a unit of physical storage capacity on a host and a group of physical devices that provide performance and capacity to the vSAN cluster. On each ESXi host that contributes its local devices to a vSAN cluster, devices are organized into disk groups.

Each disk group must have one flash cache device and one or multiple capacity devices. The devices used for caching cannot be shared across disk groups, and cannot be used for other purposes. A single caching device must be dedicated to a single disk group. In hybrid clusters, flash devices are used for the cache layer and magnetic disks are used for the storage capacity layer.

Consumed Capacity

Consumed capacity is the amount of physical capacity consumed by one or more virtual machines at any point. Many factors determine consumed capacity, including the consumed size of your VMDKs, protection replicas, and so on. When calculating for cache sizing, do not consider the capacity used for protection replicas.


Object-Based Storage

vSAN stores and manages data in the form of flexible data containers called objects. An object is a logical volume that has its data and metadata distributed across the cluster. For example, every VMDK is an object, as is every snapshot. When you provision a virtual machine on a vSAN datastore, vSAN creates a set of objects comprised of multiple components for each virtual disk. It also creates the VM home namespace, which is a container object that stores all metadata files of your virtual machine. Based on the assigned virtual machine storage policy, vSAN provisions and manages each object individually, which might also involve creating a RAID configuration for every object.

When vSAN creates an object for a virtual disk and determines how to distribute the object in the cluster, it considers the following factors:
  • vSAN verifies that the virtual disk requirements are applied according to the specified virtual machine storage policy settings.
  • vSAN verifies that the correct cluster resources are used at the time of provisioning. For example, based on the protection policy, vSAN determines how many replicas to create. The performance policy determines the amount of flash read cache allocated for each replica and how many stripes to create for each replica and where to place them in the cluster.
  • vSAN continually monitors and reports the policy compliance status of the virtual disk. If you find any noncompliant policy status, you must troubleshoot and resolve the underlying problem.

vSAN Datastore

After you enable vSAN on a cluster, a single vSAN datastore is created. It appears as another type of datastore in the list of datastores that might be available, including Virtual Volume, VMFS, and NFS. A single vSAN datastore can provide different service levels for each virtual machine or each virtual disk. In vCenter Server, storage characteristics of the vSAN datastore appear as a set of capabilities. You can reference these capabilities when defining a storage policy for virtual machines. When you later deploy virtual machines, vSAN uses this policy to place virtual machines in the optimal manner based on the requirements of each virtual machine.

Objects and Components

Each object is composed of a set of components, determined by capabilities that are in use in the VM Storage Policy. For example, with Primary level of failures to tolerate set to 1, vSAN ensures that the protection components, such as replicas and witnesses, are placed on separate hosts in the vSAN cluster, where each replica is an object component. In addition, in the same policy, if the Number of disk stripes per object is configured to two or more, vSAN also stripes the object across multiple capacity devices and each stripe is considered a component of the specified object. When needed, vSAN might also break large objects into multiple components.

Virtual Machine Compliance Status: Compliant and Noncompliant

A virtual machine is considered noncompliant when one or more of its objects fail to meet the requirements of its assigned storage policy. For example, the status might become noncompliant when one of the mirror copies is inaccessible. If your virtual machines are in compliance with the requirements defined in the storage policy, the status of your virtual machines is compliant. From the Physical Disk Placement tab on the Virtual Disks page, you can verify the virtual machine object compliance status.


Component State: Degraded and Absent States

vSAN acknowledges the following failure states for components:
  • Degraded. A component is Degraded when vSAN detects a permanent component failure and determines that the failed component cannot recover to its original working state. As a result, vSAN starts to rebuild the degraded components immediately. This state might occur when a component is on a failed device.
  • Absent. A component is Absent when vSAN detects a temporary component failure where components, including all its data, might recover and return vSAN to its original state. This state might occur when you are restarting hosts or if you unplug a device from a vSAN host. vSAN starts to rebuild the components in absent status after waiting for 60 minutes.
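
The difference between the two states comes down to when the rebuild kicks off. Here is a small sketch of that rule, using the 60-minute wait mentioned above. The function and constant names are my own, not a vSAN API.

```python
from datetime import datetime, timedelta

ABSENT_REBUILD_DELAY = timedelta(minutes=60)  # wait time for Absent components, per the text


def rebuild_start(state: str, detected_at: datetime) -> datetime:
    """Return when a rebuild would begin for a failed component."""
    state = state.lower()
    if state == "degraded":
        return detected_at                         # permanent failure: rebuild immediately
    if state == "absent":
        return detected_at + ABSENT_REBUILD_DELAY  # temporary failure: wait first
    raise ValueError(f"unknown component state: {state!r}")


if __name__ == "__main__":
    failure_time = datetime(2019, 8, 5, 12, 0)
    for state in ("Degraded", "Absent"):
        print(f"{state}: rebuild starts at {rebuild_start(state, failure_time):%H:%M}")
```

The wait for Absent components avoids rebuilding data for a host that is simply rebooting and will be back shortly.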

Object State: Healthy and Unhealthy

Depending on the type and number of failures in the cluster, an object might be in one of the following states:
  • Healthy. When at least one full RAID 1 mirror is available, or the minimum required number of data segments are available, the object is considered healthy.
  • Unhealthy. An object is considered unhealthy when no full mirror is available or the minimum required number of data segments are unavailable for RAID 5 or RAID 6 objects. If fewer than 50 percent of an object's votes are available, the object is unhealthy. Multiple failures in the cluster can cause objects to become unhealthy. When the operational status of an object is considered unhealthy, it impacts the availability of the associated VM.

Witness

A witness is a component that contains only metadata and does not contain any actual application data. It serves as a tiebreaker when a decision must be made regarding the availability of the surviving datastore components, after a potential failure. A witness consumes approximately 2 MB of space for metadata on the vSAN datastore when using on-disk format 1.0, and 4 MB for on-disk format for version 2.0 and later.

vSAN 6.0 and later maintains a quorum by using an asymmetrical voting system where each component might have more than one vote to decide the availability of objects. Greater than 50 percent of the votes that make up a VM’s storage object must be accessible at all times for the object to be considered available. When 50 percent or fewer votes are accessible to all hosts, the object is no longer accessible to the vSAN datastore. Inaccessible objects can impact the availability of the associated VM.
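
The quorum rule is simple enough to express directly. This sketch only encodes the "greater than 50 percent" rule from the paragraph above. The one-vote-per-component example is an assumption for illustration, since vSAN may assign more than one vote to a component.

```python
def object_accessible(votes_accessible: int, votes_total: int) -> bool:
    """True when strictly more than 50 percent of an object's votes are accessible."""
    if votes_total <= 0 or not 0 <= votes_accessible <= votes_total:
        raise ValueError("votes must satisfy 0 <= accessible <= total and total > 0")
    return 2 * votes_accessible > votes_total


if __name__ == "__main__":
    # Assumed layout: FTT=1 mirror = 2 replicas + 1 witness, one vote each.
    print(object_accessible(2, 3))  # one component lost
    print(object_accessible(1, 3))  # two components lost
    print(object_accessible(2, 4))  # exactly 50 percent is not enough
```

With two replicas alone, losing one would leave exactly 50 percent of the votes, which is why the witness is needed as a tiebreaker.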

I hope this has been informative and thank you for reading!
