Monday, December 30, 2024

ESXi on ARM 2.1 - Broadcom 5719 and USB NICs working on vSphere 8.0

Let's sneak one more blog post in before the new year! Last week, ESXi on ARM version 2.1 was released, which reintroduced the USB NIC driver along with some other fixes. One item in the changelog caught my attention:

  • Fix ntg3 driver support (experimental)


This fix might address the purple screen issue I documented in my previous blog post. This coincides with the 0.12 release of the edk2-rk3588 firmware, so we'll load that as well. The updated BOM is as follows:

Orange Pi 5 Plus



With that out of the way, we can get to work on installing ESXi. For the initial install, I'm going to use one of the USB network adapters for the management traffic, as the BCM5719 isn't going to work right off the bat. Part of the reason is Rockchip erratum 3588001. While this adapter works in other computers and operating systems, for some reason the ntg3 driver does not automatically fall back to legacy interrupts when MSI isn't working. In my previous blog post, I found that disabling MSI at the driver level led to a purple screen, which in turn led to a conversation with Cyprien Laplace about the driver itself. The fixed driver appears to have resolved this issue, as I can now boot into ESXi and even use the 5719 network ports.

To disable MSI and use legacy interrupts at the driver level, I ran the following command and then rebooted the host:
    esxcli system module parameters set -m ntg3 -p intrMode=0
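
If you want to script this rather than type it by hand, here is a minimal Python sketch that builds the same `set` command plus the matching `esxcli system module parameters list` command, which can be used to check the value before rebooting. The helper name `module_param_commands` is made up for illustration, and the sketch only prints the commands; it does not run them.

```python
import shlex


def module_param_commands(module, params):
    """Return (set_cmd, list_cmd) argv lists for an ESXi kernel module.

    `module_param_commands` is a hypothetical helper, not part of any
    VMware tooling. It only builds the command lines.
    """
    # All parameters for a module are passed to -p as one
    # space-separated "name=value" string.
    param_string = " ".join(f"{name}={value}" for name, value in params.items())
    set_cmd = ["esxcli", "system", "module", "parameters", "set",
               "-m", module, "-p", param_string]
    list_cmd = ["esxcli", "system", "module", "parameters", "list",
                "-m", module]
    return set_cmd, list_cmd


set_cmd, list_cmd = module_param_commands("ntg3", {"intrMode": 0})
print(shlex.join(set_cmd))   # the command used in this post
print(shlex.join(list_cmd))  # shows the module's parameters and current values
```

Running the printed `list` command on the host after the `set` should show `intrMode` with a value of 0.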

On reboot, the network adapters now show up in ESXi and are usable, at least for management traffic. I haven't stress-tested the adapter, but it's encouraging to see the flexibility that ESXi on ARM 2.1 offers for experimental edge use cases. The progress of both the ESXi-ARM and edk2-rk3588 projects has been tremendous, and I'm looking forward to pushing some virtual machines onto this system once more :)

Friday, December 6, 2024

ESXi on ARM Fling 2.0 - Challenges with the vSphere 8.0 update

TL;DR - if you're using an Orange Pi 5 Plus or Raspberry Pi 4/5, you might want to stick with 1.15/7.0.


Apologies for the lack of updates. I've run into more issues trying to get VCF running on the embedded AMD EPYC build. With ESXi on ARM Fling 2.0 released, the base hypervisor has been upgraded to ESXi 8.0 Update 3b. I haven't had much luck with it, as the release does not include the USB NIC community driver that the 7.0/1.15 release had. This limits us to exactly one USB NIC, which will only work at 100Mb/s. The onboard network adapter on the Raspberry Pi 5 is a different model than the one on the Pi 4, and as such is not compatible with the uether driver.

Tested BOMs:

Orange Pi 5 Plus

Raspberry Pi 5

With the Orange Pi 5 Plus, the issue first appeared to be related to MSI-X. In previous blog posts, I had to disable MSI interrupts to get NVMe drives to work. The behavior I saw with Arm Fling 2.0 is that disabling MSI for a NIC simply disables the device. In UEFI release 0.9.1, there were some messages indicating that MSI-X interrupts were still being allocated:
VMK_PCI: 599: 0000:01:00.0: allocated 3 MSIX interrupts

These messages were also accompanied by TX and RX hangs. While the link light remained up, I could not pull a DHCP address on any card that I tried. 
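
To check quickly which devices are still getting MSI-X interrupts, a small Python sketch like the one below can scan vmkernel log text for lines of the form shown above. It assumes every such line follows exactly that format; the function name `msix_allocations` and the regular expression are my own, not something from the Fling.

```python
import re

# Matches lines like:
#   VMK_PCI: 599: 0000:01:00.0: allocated 3 MSIX interrupts
# The pattern is an assumption based on that single log line.
MSIX_ALLOC = re.compile(
    r"VMK_PCI: \d+: "
    r"(?P<addr>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"allocated (?P<count>\d+) MSIX interrupts"
)


def msix_allocations(log_text):
    """Map PCI address -> number of MSI-X interrupts reported in log_text."""
    found = {}
    for line in log_text.splitlines():
        match = MSIX_ALLOC.search(line)
        if match:
            # A later line for the same device overwrites the earlier one.
            found[match.group("addr")] = int(match.group("count"))
    return found


sample = "VMK_PCI: 599: 0000:01:00.0: allocated 3 MSIX interrupts"
print(msix_allocations(sample))
```

An empty result means no matching allocation lines were found in the text, which is what I would expect once MSI-X is really off for a device.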

I'm not sure about everything that changed between 0.9.1 and 0.11.2, but when I tried again on the latest version, there were no messages indicating MSI-X interrupts, yet the behavior remained the same: link light, no DHCP, TX/RX hangs.

With the Broadcom 5719, the ntg3 module has an advanced parameter to enable legacy interrupts. I disabled MSI interrupts and enabled legacy interrupts in the driver, only for the host to purple screen on boot. None of the Intel modules have this option. I've documented all of these issues on the Broadcom forum, in this topic.

With the Raspberry Pi 5, I ran into similar issues, with messaging indicating "Failed to allocate MSI interrupts". There has been significant development for Linux on RPi5, but the UEFI build has stagnated, as the primary contributors have shifted their focus to RK3588 based SBCs.

I believe the path forward will be with the Orange Pi 5 Plus. The only thing left to test is an enterprise-grade i210 network adapter. I ordered a used Dell-based i210-T1, which should be here soon. This adapter has been proven to work with Ampere-based servers, and the hope is that something in the device firmware will prevent the TX/RX hang issues. If it doesn't resolve the problem, I'll have to hurry up and wait for the USB NIC fling to be re-introduced to the ARM fling. For now, I'm going to stay on the 7.0 release.
