Friday, December 15, 2023

ESXi on Arm: NVMe working on Orange Pi 5 Plus

Recently, a few awesome things have happened. First, the VMware Flings page returned after a short absence. Then, the ESXi on Arm team released an update that brings NVMe support to the Raspberry Pi CM4. This update got me thinking about the issues I had trying to get NVMe working on the Orange Pi 5 Plus. I checked on the edk2-rk3588 project to see if there were any updates, and there were a few that addressed PCIe. So I flashed my eMMC module with version 0.9.1, updated ESXi on Arm to 1.15 and... no dice.


It was at this point that I decided to start actually reading about the problem. Turns out, the issue comes down to how the RK3588 chip handles MSI (Message Signaled Interrupts). Erratum 3588001 goes into detail, but the point is that for the edk2-rk3588 project, it was easier to disable MSI than it was to try to fix it. This is what causes the problem that I was experiencing: ESXi will load modules, but will hang at a certain point and fail to boot.


After a bit more research, I found that there's a rather easy workaround: kernel options! William Lam has an awesome list of advanced kernel options and, lo and behold, there is an option to disable MSI.


DISCLAIMER: I have no idea what the implications of disabling MSI are in a VMware environment. I offer no warranty. Any change to advanced settings in ESXi has a non-zero chance of wreaking havoc. I would not put any data I care about on the device we're about to configure!

With that out of the way...

I rebooted the Orange Pi and hit Shift+O during boot to type:

disableMSI=TRUE

Then I hit Enter and let ESXi continue to boot. It got through module load without hanging, and I was rewarded with a storage device in the vSphere client:


I created a datastore, made a VM and installed Ubuntu, all without issue. Great! So now that it works, all we need to do is make the change persistent. To do so, let's check the status of the disableMSI kernel option:

[root@localhost:~] esxcli system settings kernel list -o "disableMSI"
Name        Type  Configured  Runtime  Default  Description
----------  ----  ----------  -------  -------  -----------
disableMSI  Bool  FALSE       TRUE     FALSE    Disable use of MSI/MSI-X

This is what we'd expect to see; runtime is TRUE, but configured is FALSE. Let's fix that with:

esxcli system settings kernel set -s "disableMSI" -v "TRUE"

And then double check with the list command from before:

[root@localhost:~] esxcli system settings kernel list -o "disableMSI"
Name        Type  Configured  Runtime  Default  Description
----------  ----  ----------  -------  -------  -----------
disableMSI  Bool  TRUE        TRUE     FALSE    Disable use of MSI/MSI-X
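
If you'd rather not eyeball the table every time, the comparison can be scripted. What follows is only a rough sketch: check_kernel_option is a made-up helper name, not part of esxcli. It assumes the column order shown in the output above (Name, Type, Configured, Runtime) and that awk is available in the shell you run it from.

```shell
# Hypothetical helper (not part of esxcli): reads the table printed by
#   esxcli system settings kernel list -o "<option>"
# on stdin and compares the Configured and Runtime columns for that option.
# Assumes the column order Name, Type, Configured, Runtime, as shown above.
# Returns 0 if the two values match, 1 if they differ, 2 if the option
# does not appear in the input.
check_kernel_option() {
    awk -v opt="$1" '
        $1 == opt {
            found = 1
            if ($3 == $4) {
                printf "%s: configured=%s runtime=%s (in sync)\n", opt, $3, $4
            } else {
                printf "%s: configured=%s runtime=%s (differs)\n", opt, $3, $4
                status = 1
            }
        }
        END {
            if (!found) {
                printf "%s: option not found in input\n", opt
                exit 2
            }
            exit status
        }
    '
}
```

On the host, you would pipe the list command from above into it: esxcli system settings kernel list -o "disableMSI" | check_kernel_option disableMSI. A non-zero exit status means the configured value does not match what is currently running, which is exactly the state we were in right after booting with Shift+O.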

Now, the kernel option we set should persist through reboots. Good luck and happy homelabbing!


Evacuate ESXi host without DRS

One of the biggest draws to vSphere Enterprise Plus licensing is the Distributed Resource Scheduler feature. DRS allows for recommendations ...