Sunday, February 18, 2024

Evacuate ESXi host without DRS

One of the biggest draws to vSphere Enterprise Plus licensing is the Distributed Resource Scheduler feature. DRS allows for recommendations and automated actions to help balance virtual machine workloads across hosts, as well as affinity rules to keep VMs on or off of specific hosts. 

One of the more common functions is the ability to automatically migrate virtual machines off hosts when they are placed in maintenance mode to perform firmware or hardware upgrades. I set out to create a script that would do this for me on a vSphere Standard license. That script can be found here: https://github.com/ThisGuyFuchs/Evacuate-ESXi-Host-without-DRS

The script is pretty straightforward:

# Connect to vCenter Server
Connect-VIServer -Server "Your-vCenter-Server" -User Your-Username -Password Your-Password

Replace "Your-vCenter-Server" with the IP address or FQDN of your vCenter, as well as the administrator account (mine for example is administrator@vsphere.local) and the password for that account. You can remove -Password if you want it to prompt for it instead.

# Specify the ESXi host to evacuate
$esxiHost = "ESXi-Host-Name"

Replace "ESXi-Host-Name" with the IP address or the FQDN of the host you wish to evacuate.

From there, the script will generate a list of VMs, regardless of power state, and then migrate those VMs to any powered on host in the cluster. Once the script finishes, you are free to put the host into maintenance mode manually, or you can add this step to the script with:

# Put the ESXi host into maintenance mode
Set-VMHost -VMHost $esxiHost -State Maintenance -Confirm:$false
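
For reference, the migration portion of the script boils down to something like this. This is a simplified sketch rather than the exact code from the repo, and it reuses the $esxiHost variable from above:

# Gather every VM registered on the host, regardless of power state
$vmHost = Get-VMHost -Name $esxiHost
$vms = Get-VM -Location $vmHost

# Candidate targets: the other connected, powered-on hosts in the same cluster
$cluster = Get-Cluster -VMHost $vmHost
$targets = @(Get-VMHost -Location $cluster | Where-Object {
    $_.Name -ne $vmHost.Name -and $_.ConnectionState -eq "Connected" -and $_.PowerState -eq "PoweredOn"
})

# Spread the VMs across the target hosts (vMotion for powered-on VMs, relocation for powered-off)
$i = 0
foreach ($vm in $vms) {
    Move-VM -VM $vm -Destination $targets[$i % $targets.Count] -Confirm:$false
    $i++
}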

Keep in mind that this will migrate ALL virtual machines, whether they are powered on or off. While this isn't a true replacement for DRS, I find it useful for facilitating firmware updates and adding hardware to hosts when needed. Hopefully this provides some additional value to vSphere Standard license holders.

Friday, December 15, 2023

ESXi on Arm: NVMe working on Orange Pi 5 Plus

Recently, a few awesome things have happened. First, the VMware Flings page returned after a short absence. Then, the ESXi on Arm team released an update that brings NVMe support to the Raspberry Pi CM4. This update got me thinking about the issues I had trying to get NVMe working on the Orange Pi 5 Plus. I checked on the edk2-rk3588 project to see if there were any updates, and there were a few that addressed PCI-e. So I flashed the 0.9.1 firmware to my eMMC module, updated ESXi on Arm to 1.15, and... no dice.


It was at this point that I decided to actually start reading about the problem. It turns out the issue stems from how the RK3588 chip handles MSI. Erratum 3588001 goes into detail, but the short version is that for the edk2-rk3588 project it was easier to disable MSI than to try to fix it. That is what causes the behavior I was seeing: ESXi loads its modules, but hangs at a certain point and fails to boot.


After a bit more research, I found that there's a rather easy workaround: kernel options! William Lam has an awesome list of advanced kernel options, and lo and behold, there is an option to disable MSI.


DISCLAIMER: I have no idea what the implications of disabling MSI are in a VMware environment. I offer no warranty. Any change to advanced settings in ESXi has a non-zero chance of wreaking havoc. I would not put any data I care about on the device we're about to configure!

With that out of the way...

I rebooted the Orange Pi, hit Shift+O during boot, and typed:

disableMSI=TRUE

Then I hit Enter and let ESXi continue to boot. It got through module load without hanging, and I was rewarded with a storage device in the vSphere Client:


I created a datastore, made a VM, and installed Ubuntu, all without issue. Great! So now that it works, all we need to do is make the change persistent. To do so, let's check the status of the disableMSI kernel option:

[root@localhost:~] esxcli system settings kernel list -o "disableMSI"
Name        Type  Configured  Runtime  Default  Description
----------  ----  ----------  -------  -------  -----------
disableMSI  Bool  FALSE       TRUE     FALSE    Disable use of MSI/MSI-X

This is what we'd expect to see; Runtime is TRUE, but Configured is FALSE. Let's fix that with:

esxcli system settings kernel set -s "disableMSI" -v "TRUE"

And then double-check with the list command from before:

[root@localhost:~] esxcli system settings kernel list -o "disableMSI"
Name        Type  Configured  Runtime  Default  Description
----------  ----  ----------  -------  -------  -----------
disableMSI  Bool  TRUE        TRUE     FALSE    Disable use of MSI/MSI-X

Now, the kernel option we set should persist through reboots. Good luck and happy homelabbing!


Monday, November 20, 2023

ESXi on Lenovo ThinkCentre M75q Gen 2 part 2 - Adding supported network cards

To follow up on my previous blog post, I've made a few changes to the M75q Gen 2. It can run vSphere 7.0 and 8.0, but it can't use the onboard NIC, which is a Realtek 8168 with no supported driver. With the recent removal of the VMware Flings webpage (an archive still exists), the move to supported network adapters is becoming more prudent. While 10GbE networking is nice, I'd like to expand to a quad-port 2.5GbE card if possible. Failing that, I could use the onboard A+E M.2 slot for a supported gigabit card.

I've revised the 3D print to a more open-air design than the previous "box" approach. This should allow for better airflow, at the expense of being able to stack units on top of each other.


Tested and working

As previously mentioned, the M.2 to PCI-e adapter provides a PCI-e 3.0 x4 slot, with some limitations. The AQC107-based SYBA network card works without issue, establishing a full 10GbE connection. This card is natively supported as of ESXi 7.0 Update 2.
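
If you want to confirm that a card like this has been claimed by the native driver, a quick check from the ESXi shell looks like this (the vmnic number is just an example; yours may differ):

# List physical NICs along with driver, link state, and speed
esxcli network nic list

# Show driver and firmware details for a specific uplink
esxcli network nic get -n vmnic1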


Tested, limited or uncertain capability

I tested the ZimaBoard 4x 2.5GbE i225 network card in the same slot. Only three of the four ports worked, and the fourth port showed link lights even when a cable wasn't connected. I suspected the issue was with the card itself, as its power consumption should be similar to the 10GbE card's, and testing the card in a proper PCI-e slot yielded the same result. DOA parts happen, but I haven't bothered purchasing another to test. If I work up the nerve to try again, I'll order another and post an update, or try a different brand altogether.

I then tested a Lenovo OEM Intel X550-T2, a PCI-e 3.0 x4 card with dual 10GbE ports. It isn't detected during POST, but it works in other systems, which leads me to two possible conclusions:

  1. The power draw of the card is too high, which could be worked around with a different adapter that supplies more power (such as an M.2 to OCuLink converter with an external PSU).
  2. The card has OEM pins that prevent it from being detected during POST, which could be worked around with a tape mod. I'm less inclined to believe this is the case, as it's a Lenovo card going into a Lenovo system, and it works in non-OEM machines without issue.
As of right now, the card is being used in another system. I might test the OCuLink converter at a later time. The benefit of OCuLink would be better compatibility with GPUs, as well as PCI-e 4.0 support on Ryzen 5000 series processors. The downside is that I'd need an external PSU for that purpose.

Tested, not working

The last card I tested was a sketchy-looking M.2 A+E to gigabit network adapter. This was a best-effort attempt, as I have several machines that could use this card. Unfortunately, the M75q Gen 2 did not detect the card at all. If you're going to test this in a compact system, I'd suggest getting an A+E extension cable, as the card is wider than most WiFi adapters and may not fit. If/when the Orange Pi 5 Plus supports PCI-e on the Arm fling, I plan on testing this adapter with the extension cable for that purpose.


Overall, I like this Ryzen-based, 8-core, 64GB RAM system for ESXi 6.7, as I can still use the community Realtek driver with it. Nothing in my homelab is essential, but if you want to learn the capabilities of the latest versions of ESXi, I would suggest against trying to hack a machine into running them. The USB network card fling can provide gigabit to 2.5GbE connectivity with some success, but for general reliability, stick to Intel-based onboard networking with real PCI-e slots or Thunderbolt capabilities.


Monday, October 30, 2023

Three changes you should be prepared for when upgrading from 6.x

This post outlines some of the impactful changes that come with newer versions of vSphere. Sometimes these changes aren't evident until they're pushed to prod; other times they creep up and cause problems later. As updates are released, I'll publish more topics. For now, these are the main things you should be aware of when upgrading from an ESXi 6.x environment.

The SD card thing
Starting with vSphere 7.0, SD cards and USB drives are deprecated as boot devices, and they struggle with the new partitioning scheme. Unlike 6.x and earlier versions, the boot device is now also used for the OSData partition (including logs), which causes issues when those devices aren't rated for frequent writes.

The solution is to use a local boot disk with good write endurance, or to boot from SAN.


vVol minimum size change
In vSphere 8.0 and earlier, the minimum size for Virtual Volumes is 4GB. Starting with 8.0 Update 1, the minimum has been raised to 255GB. vVols that existed prior to the upgrade may disappear, or other errors may occur if you try to create a vVol smaller than 255GB.

The solution is to grow existing vVols to 255GB or larger and create new vVols at that size or above. A workaround exists that allows the smaller vVols to keep working, using the following command:
esxcli system settings advanced set -o /VVOL/vvolUseVMFS6AndLargeConfigVVols -i 0
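
If you apply the workaround, you can double-check that it took with the matching list command; to my understanding, the Int Value should read 0 afterward:

# Confirm the vVol override is in place
esxcli system settings advanced list -o /VVOL/vvolUseVMFS6AndLargeConfigVVols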


CPU Support
An unpleasant surprise awaited me when upgrading to vSphere 8.0 Update 2. The workhorse servers in my homelab are all first-generation Xeon Scalable (Skylake), and they have been put on notice that they may not be supported in future releases of vSphere. Granted, the processors are 5 years old, but Naples-based AMD EPYC chips are 6 years old and do not carry this warning, and Broadwell processors (E5-26xx v4) are still supported as well.

The workaround is to append a boot option when loading ESXi, although this is a band-aid that probably shouldn't be used in production, as it may cause instability. Press Shift+O before the modules load and append the following: allowLegacyCPU=true
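
To avoid retyping that at every boot, the same option appears to be exposed through the esxcli kernel settings interface, so something like the following should make it persistent. I haven't validated this on every build, so treat it as an unsupported, lab-only tweak:

# Persist the legacy CPU override (unsupported; lab use only)
esxcli system settings kernel set -s allowLegacyCPU -v TRUE

# Verify that Configured now shows TRUE
esxcli system settings kernel list -o allowLegacyCPU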

Tuesday, September 5, 2023

Orange Pi 5 Plus ESXi on ARM Fling 1.14 update

With ESXi on Arm Fling 1.14 and the latest commits of the edk2-rk3588 firmware, the Orange Pi 5 Plus can now use some USB NICs and make use of all of its USB ports within an ESXi environment.





I can hot-add USB NICs as well. It runs into the same problem I had with USB NICs on the ThinkCentre M75q Gen 2, in that the uplink needs to be re-assigned to the management vSwitch on boot (see the commands after the parts list below). Commenting the line out of the firmware source no longer seems necessary, either; you can instead enter system setup and disable PCI-e 3.0 in the menu. My BOM now consists of:

Orange Pi 5 Plus
3D printed case
Noctua NF-A4x20 PWM
USB fan adapter (USB 2.0 port)
UGREEN USB hub + network adapter Model 60544 (Management network + additional USB ports)
Cable Matters USB NIC Model 202013 (dedicated VM traffic)
Sabrent USB to SATA adapter Model EC-SSHD
Crucial MX500 1TB SATA SSD
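
As for re-assigning the uplink mentioned above, this is roughly what I run from the ESXi shell after a reboot. The vusb0 and vSwitch0 names are what I see on my setup, so adjust them to whatever esxcli reports on yours; the usual trick to make it stick is to drop the uplink add line into /etc/rc.local.d/local.sh, since USB NICs come up late in the boot process:

# Check how the USB NIC enumerated (vusb0 on my Orange Pi; yours may differ)
esxcli network nic list

# Re-attach the USB uplink to the management vSwitch
esxcli network vswitch standard uplink add -u vusb0 -v vSwitch0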

This leaves the option of adding more USB 3.0 devices if needed, thanks to the UGREEN hub. The onboard Realtek 8125s aren't much use, as there is no driver available, but having dual gigabit NICs makes this much more useful for running some basic web servers.




I was able to move my Jira instance off of my Raspberry Pi 4 and onto an Ubuntu 22.04 VM successfully, which opens the door to backups not only at the application level, but at the full-VM level as well. The only remaining caveat is that the cores run at their minimum speed (800 MHz), but that suits me, since I want to keep power consumption as low as possible. It'll be worthwhile to have a virtual platform where the power-sipping Orange Pi 5 Plus can stay on without having to power up a rack to deliver some basic web applications. Other ideas would be to install Tautulli for my Plex server, or Wiki.js for lab documentation.

Thursday, July 20, 2023

Booting the Orange Pi 5 Plus into ESXi on ARM

Disclaimer: Orange Pi products are not officially supported for ESXi on Arm. Even if you do get it to work, functionality will be limited at the time of this writing. I would not recommend purchasing an Orange Pi for the express purpose of running ESXi on Arm. The folks working on the edk2-rk3588 UEFI project are doing amazing work, and I can't wait to see what comes next.

With that out of the way...






Booting and installing the ESXi on ARM Fling is possible on the Orange Pi 5 Plus. There are some pretty sizable caveats in doing so:

  • On-board NICs are Realtek 8125, which do not have a compatible driver
  • The only usable network adapter is the Realtek 8153 USB, which is capped at 100Mbps
  • M.2 NVMe slot is not supported - having a device in the M.2 slot will cause drivers to hang when trying to boot
  • A modified, custom-built edk2-rk3588 UEFI firmware image is necessary
  • Hardware from the BOM needs to be plugged into specific ports in order to work
  • Once installed, ESXi only recognizes the lone USB-C port, so a hub is necessary
  • As of this writing, the USB NIC Fling driver does not work with the ESXi on ARM Fling (no Fling-ception allowed)
This blog post will serve as a guide to getting this installed; the bulk of the work is modifying the firmware so it behaves properly. My BOM is as follows:

  • Orange Pi 5 Plus with 3D printed case and 40x40mm fan
  • 16GB USB 3.0 thumb drive for ESXi installer (connected to the top USB 3.0 port)
  • USB-C to USB A adapter
  • Cable Matters 4 port USB hub, connected to the USB-C to USB A adapter 
  • Cable Matters USB Network Adapter model 202013 (connected to hub)
  • USB keyboard (connected to hub)
  • USB 3.0 to SATA adapter with 1TB SSD as an install target/ESXi boot (connected to hub)
  • 8GB Micro SD card for the edk2-rk3588 UEFI firmware (you can use eMMC if your model includes it; this guide covers the SD card route)
The first step is to build out the UEFI firmware. The folks working on this project have been hard at work, and have successfully built firmware that allows things like the WoR Project (Windows on ARM) to gain some functionality. The base release works pretty well for Windows, but for ESXi, we'll have to build a custom version of it.

To do so, we'll need to use a Linux environment. I chose Ubuntu 23.04. The install instructions on the github page linked above include almost all of the packages needed. For my version of Ubuntu, I installed the following:

sudo apt install git gcc g++ build-essential gcc-aarch64-linux-gnu iasl python3-pyelftools uuid-dev device-tree-compiler

Then, clone the repository and change directory:

git clone https://github.com/edk2-porting/edk2-rk35xx.git --recursive
cd edk2-rk35xx

Prior to building the firmware, we need to modify one of the files to disable PCI-e 3.0. If this step is not followed, the installation media will purple screen during boot. Use vim to modify this file:

sudo vi edk2-rockchip/Platform/OrangePi/OrangePi5Plus/AcpiTables/AcpiTables.inf

Once you're in the file, comment the following line:

$(RK_COMMON_ACPI_DIR)/Pcie2x1l0.asl
Confirm that the line looks like this, then write changes:

# $(RK_COMMON_ACPI_DIR)/Pcie2x1l0.asl
Now we can build it (change the release number to match the latest release on the GitHub page; as of this writing, 0.7.1 is the latest):

sudo ./build.sh --device orangepi-5plus --release 0.7.1
After a few minutes, the script should finish and create a file named "RK3588_NOR_FLASH.img" in the edk2-rk35xx directory. You can then either use dd to copy the .img file onto an SD card, or SCP the file to a Windows machine and use the Raspberry Pi Imager tool to write it instead.
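
If you go the dd route, it's the standard pattern below. The /dev/sdX target is a placeholder for your SD card device; double-check it with lsblk first, because writing to the wrong disk is destructive:

# Write the firmware image to the SD card and flush it before removal
sudo dd if=RK3588_NOR_FLASH.img of=/dev/sdX bs=4M status=progress conv=fsync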

With the freshly imaged SD card inserted into the Orange Pi, we can now plug everything else into the machine. Remember, the device order is as follows:

  • USB drive containing ESXi installer in the top USB 3.0 port
  • USB hub plugged into the USB-C port
  • Keyboard, SSD and Realtek 8153 adapter plugged into the hub
If all goes well, ESXi should boot without issue and will see the network adapter and SSD. Install ESXi as you normally would. I made the mistake of thinking I could move the USB SSD to the USB 3.0 ports after installing - ESXi will still boot that way, but any additional capacity used for the datastore will be inaccessible and ESXi will be read-only. You must keep the SSD and network adapter on the hub.

The promise of an 8-core, 16GB RAM, Arm-based ESXi machine looks great. I'm hoping that in the not-too-distant future we'll see more improvements, such as use of the additional USB ports and the NVMe slot. A workaround for the lack of networking would be to pass another USB network adapter through to a VM, which could act as a DHCP server and present gigabit connectivity to other guests, but this would only work if the other USB ports were recognized.

Time will tell what the future holds for the Orange Pi 5 Plus, but to have UEFI functioning at this level is fantastic, and I can't say enough good things about the people working on this project. Also, a special thanks to the ESXi on Arm team, who have made this endeavor possible.

Thursday, May 11, 2023

HCIBench analysis part 2: vSAN OSA vs. vSAN ESA with all Optane drives

In my previous post, I showed what a huge difference proper caching drives can make with vSAN OSA, replacing my read-intensive disks with Intel Optane. One question remains, however: since Optane drives are high-performance, mixed-use disks, how would they get along with a vSAN ESA deployment? To find out, I reconfigured my vSAN cluster by deleting the witness VM, deploying a vSAN ESA witness, and loading each server with 5 Intel Optane disks. Let's see how it got along.

Caveats: 

  • ESA enables compression-only mode automatically, and it cannot be disabled (OSA was tested without dedupe or compression enabled)
  • ESA best practices call for a minimum of three nodes. It can absolutely work with two nodes, but it would see its true performance benefits in a right-sized cluster


100% read, 4K random


An interesting result, as vSAN OSA with 2x Optane for caching and 2 read-intensive capacity disks actually managed 24K higher IOPS on this test. There are a few likely reasons why this would happen:

  1. The benchmark likely kept everything in the cache tier for the duration of the run
  2. As mentioned previously, ESA is running compression while OSA is not. I plan on re-running this benchmark with compression-only mode enabled on the OSA configuration.
  3. ESA works best at the recommended configuration (3+ nodes).

70% read, 4K random


Where Optane improved write performance in the OSA build, ESA with 5 Optane disks per node improves on it further: 53K more IOPS, higher throughput, and generally better latency across the board.


50% read/write, 8K random


We get a decent bump in performance compared to the OSA build. Read latency remains low, and write latency stays about the same.


100% write, 256KB sequential


This is perhaps the biggest difference across all the tests, and it highlights the key advantage of vSAN ESA, especially for a 2-node cluster. Where some of the other tests showed percentage bumps in performance, ESA with these Optane disks managed over twice the throughput of vSAN OSA; 5.69 GB/s works out to 45.52 Gb/s across the network cards. Latency also improved dramatically.

Overall, the 2-node vSAN ESA configuration with 5 Intel Optane disks per server performs about as expected, generally outperforming OSA with capacity disks. While OSA went blow for blow with ESA in places, it should be reiterated that deduplication and compression were disabled on OSA; enabling them tends to trade IOPS for capacity. One thing I've omitted here is usable capacity, for which I'd defer to the vSAN calculators for final numbers, but keep in mind: vSAN OSA put up its numbers with a datastore of roughly 13TB, whereas the 280GB Intel Optane disks yield a capacity just north of 2TB.

So what can we take away from this? Is OSA dead? Not by a long shot. In a real-world, 2-node ROBO scenario, several questions should be asked:

  • What is the workload?
  • How much performance do you need?
  • Are mixed use drives going to be able to meet capacity demands?
  • What's the best bang for the buck hardware to meet workload requirements?
For a workload that favors capacity and is read-heavy, I would lean toward vSAN OSA. For performance, ESA should absolutely be a consideration. And if you have the chance to add more nodes, the capacity and performance improvements tend to favor ESA.
