A series of unfortunate events occurred shortly after posting the previous blog post:
- DIMM H1 decided to fail
- Replacement was ordered
- Post office lost the replacement
I am the warranty.
Hello, internet! Long time no see, how you been? It's been a pretty interesting year so far, and the homelab has not been spared from the chaos; hardware failures and upgrades have caused my projects to come to a standstill. Fortunately, I've made headway by consolidating some of the hardware into a project I've been trying to get online for some time now.
I hit a stroke of luck by winning an AMD-based Supermicro motherboard off of my favorite auction site, which came with a processor and some memory for about $300. The large number of PCI-e lanes opens up a number of expansion options in a standard mid-tower case. In this blog, I'm going to discuss consolidating all ten of the Intel Optane disks that I received last year into one compute node, and detail the process of getting the latest versions of ESXi and vCenter Server installed.
My BOM:
- H11SSL-i motherboard - each PCI-e slot configured for x4x4 or x4x4x4x4 bifurcation
- AMD EPYC 7551
- 128GB (8x16GB) DDR4-2133 RAM
- 10x Intel Optane 280GB NVMe SSDs (vSAN pool)
- 5x 10Gtek PCI-e x8 to 2x U.2 NVMe adapters
- Solidigm P41 Plus 2TB M.2 NVMe SSD (boot disk)
- Corsair RM1000x PSU
- Silverstone CS380 8-bay mid-tower case
- Noctua NH-U9 TR4-SP3 heatsink
This system will consume the same Optane drives that I used in my Supermicro BigTwin SuperServer, which consisted of two X11DPT-B boards, each with 2x Xeon Platinum 8160s and 768GB of RAM. The previous vSAN ESA build gave five of the Optane drives to each node, used a vSAN Witness VM on a third node, and relied on a 100GbE direct connect to share bandwidth.
Consolidating down to the tower will be considerably quieter and draw less power, while allowing us to benchmark all ten drives without networking overhead. Downsides include less processing power, far less memory, and a little more work to do under the hood to get it working. Unlike the two node cluster, this will have no redundancy.
I'm going to detail how to accomplish all of this without a vSphere license of any kind; this will utilize the 60 day trial license, and a copy of ESXi that was acquired through supported means. Some of the old tricks of standing up a vSAN node still work with ESA, and can be deployed without vCenter.
As with most of my blog posts, this is strictly for lab use - I would not suggest running a single-node vSAN cluster in production, nor would I suggest running a vSAN cluster without a proper vCenter Server. We will install vCenter in a later blog post.
The first step is to download the ESXi-Customizer-PS script. This can be found here: https://github.com/VFrontDe-Org/ESXi-Customizer-PS/tree/master
PowerCLI is required to use the script. Full documentation on the script can be found here: https://www.v-front.de/p/esxi-customizer-ps.html
Simply running the script without any options will seek out the VMware online depot and create an ISO based on the latest patch version. As of this writing, I can confirm that the script works to download ESXi 8.0 Update 3 build 24022510.
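If you don't already have PowerCLI, the whole thing takes only a couple of lines of PowerShell. This is a minimal sketch; the exact script filename depends on the release you downloaded:
# Install PowerCLI for the current user if it isn't already present
Install-Module -Name VMware.PowerCLI -Scope CurrentUser
# Run the script with no options: it queries the VMware Online Depot and builds
# an installable ISO of the latest ESXi patch level in the current directory
.\ESXi-Customizer-PS.ps1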
Install the OS to the boot disk, then reboot.
Once booted, clear any partitions that may be on the Optane disks.
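If you'd rather do this from the ESXi shell than from the Host Client, partedUtil can wipe each drive by writing a fresh GPT label. The device name below is just an example; substitute the t10 identifiers of your own Optane disks:
# Show the current partition table on one of the Optane devices (example device name)
partedUtil getptbl /vmfs/devices/disks/t10.NVMe____INTEL_SSDPE21D280GA_EXAMPLE
# Wipe it by writing an empty GPT label
partedUtil mklabel /vmfs/devices/disks/t10.NVMe____INTEL_SSDPE21D280GA_EXAMPLE gpt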
Prior to creating the vSAN cluster, we'll want to get a list of the disks that we want to use in the cluster. For my use case, I was able to run the command "esxcli storage core device list | grep t10" to list out all NVMe drives. I removed my 2TB boot disk from that output.
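In shell form, that looks something like the following. The second grep is just an example filter that assumes the boot disk's identifier contains the vendor name, so adjust the pattern to whatever your boot device actually reports:
# List NVMe devices by their t10 identifiers
esxcli storage core device list | grep t10
# Example: exclude the boot disk from the output by part of its identifier
esxcli storage core device list | grep t10 | grep -vi solidigm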
Since we're using a single node vSAN cluster, we can create a vSwitch with no uplinks for the purpose of vSAN networking:
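Something along these lines works from the ESXi shell; the vSwitch, port group, and VMkernel names are what I'd use, and the addressing is just an example:
# Standard vSwitch with no uplinks, plus a port group for vSAN traffic
esxcli network vswitch standard add -v vSwitch1
esxcli network vswitch standard portgroup add -p vSAN -v vSwitch1
# VMkernel interface on that port group with a static IP (example subnet)
esxcli network ip interface add -i vmk1 -p vSAN
esxcli network ip interface ipv4 set -i vmk1 -I 192.168.100.10 -N 255.255.255.0 -t static
# Tag the interface for vSAN traffic
esxcli vsan network ip add -i vmk1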
One of the biggest draws to vSphere Enterprise Plus licensing is the Distributed Resource Scheduler feature. DRS allows for recommendations and automated actions to help balance virtual machine workloads across hosts, as well as affinity rules to keep VMs on or off of specific hosts.
One of the more common functions is the ability to automatically migrate virtual machines off hosts when they are placed in maintenance mode to perform firmware or hardware upgrades. I set out to create a script that would do this for me on a vSphere Standard license. That script can be found here: https://github.com/ThisGuyFuchs/Evacuate-ESXi-Host-without-DRS
The script is pretty straightforward:
# Connect to vCenter Server
Connect-VIServer -Server "Your-vCenter-Server" -User Your-Username -Password Your-Password
Replace "Your-vCenter-Server" with the IP address or FQDN of your vCenter, as well as the administrator account (mine for example is administrator@vsphere.local) and the password for that account. You can remove -Password if you want it to prompt for it instead.
# Specify the ESXi host to evacuate
$esxiHost = "ESXi-Host-Name"
Replace "ESXi-Host-Name" with the IP address or the FQDN of the host you wish to evacuate.
From there, the script will generate a list of VMs, regardless of power state, and then migrate those VMs to any powered on host in the cluster. Once the script finishes, you are free to put the host into maintenance mode manually, or you can add this step to the script with:
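For context, the core of that logic looks roughly like the following - a simplified sketch rather than the exact code from the repository, reusing the $esxiHost variable from above:
# Look up the source host and the cluster it belongs to
$sourceHost = Get-VMHost -Name $esxiHost
$cluster = Get-Cluster -VMHost $sourceHost
# Candidate targets: other hosts in the cluster that are connected and powered on
$targetHosts = Get-VMHost -Location $cluster | Where-Object {
    $_.Name -ne $sourceHost.Name -and $_.ConnectionState -eq "Connected" -and $_.PowerState -eq "PoweredOn"
}
# Migrate every VM registered on the source host, powered on or off
Get-VM -Location $sourceHost | ForEach-Object {
    Move-VM -VM $_ -Destination ($targetHosts | Get-Random) -Confirm:$false
}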
# Put the ESXi host into maintenance mode
Set-VMHost -VMHost $esxiHost -State Maintenance -Confirm:$false
Keep in mind that this will migrate ALL virtual machines, whether they are powered on or off. While this isn't a true replacement for DRS, I find it useful for facilitating firmware updates and adding hardware to hosts when needed. Hopefully this provides some additional value to vSphere Standard license holders.
Recently, a few awesome things have happened. First, the VMware Flings page returned after a short absence. Then, the ESXi on Arm team released an update that brings NVMe support to the Raspberry Pi CM4. This update got me thinking about the issues I had trying to get NVMe working on the Orange Pi 5 Plus. I checked on the edk2-rk3588 project to see if there were any updates, and there were a few that addressed PCI-e. So I flashed my eMMC module with firmware 0.9.1, updated ESXi on Arm to 1.15, and... no dice.
It was at this point that I decided to start actually reading about the problem. It turns out the issue stems from how the RK3588 chip handles MSI (Message Signaled Interrupts). Erratum 3588001 goes into detail, but the gist is that for the edk2-rk3588 project, it was easier to disable MSI than to try to fix it. This is what causes the problem I was experiencing: ESXi will load modules, but will hang at a certain point and fail to boot.
After a bit more research, I found that there's a rather easy workaround: kernel options! William Lam has an awesome list of advanced kernel options, and lo and behold, there is an option to disable MSI.
DISCLAIMER: I have no idea what the implications of disabling MSI are in a VMware environment. I offer no warranty. Any change to advanced settings in ESXi has a non-zero chance of wreaking havoc. I would not put any data I care about on the device we're about to configure!
With that out of the way...
I rebooted the Orange Pi and hit shift+O during boot to type:
disableMSI=TRUE
Then I hit enter and let ESXi continue to boot. This time it made it through module load without hanging, and I was rewarded with a storage device showing up in the vSphere client.
Checking the kernel settings confirms that the boot option only applied to this boot (Runtime is TRUE, but Configured is still FALSE):
[root@localhost:~] esxcli system settings kernel list -o "disableMSI"
Name        Type  Configured  Runtime  Default  Description
----------  ----  ----------  -------  -------  ------------------------
disableMSI  Bool  FALSE       TRUE     FALSE    Disable use of MSI/MSI-X
To make the change stick, set the option via esxcli:
[root@localhost:~] esxcli system settings kernel set -s "disableMSI" -v "TRUE"
[root@localhost:~] esxcli system settings kernel list -o "disableMSI"
Name        Type  Configured  Runtime  Default  Description
----------  ----  ----------  -------  -------  ------------------------
disableMSI  Bool  TRUE        TRUE     FALSE    Disable use of MSI/MSI-X
Now, the kernel option we set should persist through reboots. Good luck and happy homelabbing!
To follow up on my previous blog post, I've made a few changes to the M75q Gen 2. It can run vSphere 7.0 and 8.0, but it can't use the onboard NIC, which is a Realtek 8168 that doesn't have a supported driver. With the recent removal of the VMware Flings webpage (an archive still exists), the move to supported network adapters is becoming more prudent. While 10GbE networking is nice, I'd like to expand to a quad-port 2.5GbE card if possible. If that doesn't pan out, I could perhaps use the onboard A+E M.2 slot for a supported gigabit card.
I've optimized the 3D print to allow for a more open air approach to the previous "box" design. This should allow for optimal airflow, at the expense of being able to stack boxes on top of each other.
Tested and working
As previously mentioned, using the M.2 to PCI-e adapter enables the use of a PCI-e 3.0 x4 slot, with some limitations. The AQC107-based SYBA network card works without issue, establishing a full 10GbE connection. This card is natively supported as of ESXi 7.0 Update 2.
Tested, limited or uncertain capability
I tested the ZimaBoard 4x 2.5GbE i225 network card in the same slot. I was only able to get three of the four ports to work; the fourth port showed link lights even when no cable was connected. I suspected the issue was with the card itself rather than the adapter, since its power consumption should be similar to the 10GbE card's, and testing the card in a proper PCI-e slot yielded the same result. DOA parts happen, but I haven't bothered purchasing another to test. If I work up the nerve to try again, I'll order another and post an update, or try a different brand altogether.
I then tested a Lenovo OEM Intel X550-T2. This is a PCI-e 3.0 x4 card with dual 10Gbe ports. It isn't detected during POST, but works in other systems, which leads me to draw two conclusions:
Tested, not working
The last card I tested was a sketchy-looking M.2 A+E to gigabit network adapter. This was a best-effort attempt, as I have several machines that could use this card. Unfortunately, the M75q Gen 2 did not detect the card at all. If you're going to test this in a compact system, I would suggest getting an A+E extension cable, as the card is wider than most WiFi adapters and may not fit. If/when the Orange Pi 5 Plus supports PCI-e for the ARM fling, I plan on testing this adapter with the extension cable for that purpose.
Overall, I like this Ryzen-based, 8-core, 64GB system as an ESXi 6.7 host, since I can still use the community Realtek driver with it. Nothing in my homelab is essential, but if you wish to learn the capabilities of the latest versions of ESXi, I would suggest against trying to hack a machine into doing so. The USB network card fling can be used for gigabit to 2.5GbE connectivity with some success, but for general reliability, stick to Intel-based onboard networking with real PCI-e slots or Thunderbolt capabilities.
With ESXi on Arm Fling 1.14 and the latest commits of the edk2-rk3588 firmware, the Orange Pi 5 Plus can now utilize some USB NICs and make use of all USB ports within an ESXi environment.