Tuesday, September 5, 2023

Orange Pi 5 Plus ESXi on ARM Fling 1.14 update

With VMware Fling 1.14 and the latest commits of the edk2-rk3588 firmware, the Orange Pi 5 Plus can now utilize some USB NICs and make use of all USB ports within an ESXi environment.





I can hot add USB NICs as well. They run into the same problem I had with USB NICs on the ThinkCentre M75q Gen 2, in that the vmnic needs to be reassigned to the management vSwitch on boot (a boot-time workaround is sketched after the BOM below). Commenting the line out of the firmware source no longer seems necessary, either; you can enter system setup and disable PCI-e 3.0 in the menu instead. My BOM now consists of:

  • Orange Pi 5 Plus
  • 3D printed case
  • Noctua NF-A4x20 PWM
  • USB fan adapter (USB 2.0 port)
  • UGREEN USB hub + network adapter Model 60544 (management network + additional USB ports)
  • Cable Matters USB NIC Model 202013 (dedicated VM traffic)
  • Sabrent USB to SATA adapter Model EC-SSHD
  • Crucial MX500 1TB SATA SSD
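Since the USB vmnic tends to fall off the management vSwitch after a reboot, one workaround is to re-link it from /etc/rc.local.d/local.sh once the adapter comes up. This is only a sketch: vusb0 and vSwitch0 are assumptions, so check esxcfg-nics -l and esxcfg-vswitch -l on your host and adjust the names before using it.

# Wait up to ~2 minutes for the USB NIC to come up, then re-link it as an uplink.
# vusb0 and vSwitch0 are placeholders - verify with esxcfg-nics -l and esxcfg-vswitch -l.
count=0
while [ $count -lt 24 ] && ! esxcli network nic get -n vusb0 | grep -q "Up"; do
    sleep 5
    count=$((count + 1))
done
esxcfg-vswitch -L vusb0 vSwitch0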

This leaves the option of adding additional USB 3.0 devices if needed, thanks to the UGREEN hub. The onboard Realtek 8125s are not of much use since no driver is available, but having dual gigabit USB NICs makes this much more practical for running some basic web servers.




I was able to successfully port my Jira instance off of my Raspberry Pi 4 onto an Ubuntu 22.04 VM, which opens the door to not only application-level backups but full VM backups as well. The only remaining caveat is that the cores run at their minimum speed (800 MHz), but that suits me fine given that I want to keep power consumption as low as possible. It'll be worthwhile to have a virtual platform where the power-sipping Orange Pi 5 Plus can remain on, without having to power on a rack to deliver some basic web applications. Other ideas would be to install Tautulli for my Plex server, or Wiki.js for lab documentation.

Thursday, July 20, 2023

Booting the Orange Pi 5 Plus into ESXi on ARM

Disclaimer: Orange Pi products are not officially supported for ESXi on ARM. Even if you do get it to work, functionality will be limited at the time of this writing. I would not recommend purchasing an Orange Pi for the express purpose of running ESXi on ARM. The folks working on the edk2-rk3588 UEFI project are doing amazing work, and I can't wait to see what comes next.

With that out of the way...






Booting and installing the ESXi on ARM Fling is possible on the Orange Pi 5 Plus. There are some pretty sizable caveats in doing so:

  • On-board NICs are Realtek 8125, which do not have a compatible driver
  • The only usable network adapter is the Realtek 8153 USB, which is capped at 100Mbps
  • M.2 NVMe slot is not supported - having a device in the M.2 slot will cause drivers to hang when trying to boot
  • Modifying and building a custom version of the edk2-rk3588 UEFI firmware is necessary
  • The hardware in the BOM needs to be connected to specific ports in order to work
  • Once installed, ESXi only recognizes the lone USB-C port, so a hub is necessary
  • As of this writing, the USB NIC Fling driver does not work with the ESXi on ARM Fling (no Fling-ception allowed)
This blog post will serve as a guide to get this installed, the bulk of which is modifying the firmware to work properly. My BOM is as follows:

  • Orange Pi 5 Plus with 3D printed case and 40x40mm fan
  • 16GB USB 3.0 thumb drive for ESXi installer (connected to the top USB 3.0 port)
  • USB-C to USB A adapter
  • Cable Matters 4 port USB hub, connected to the USB-C to USB A adapter 
  • Cable Matters USB Network Adapter model 202013 (connected to hub)
  • USB keyboard (connected to hub)
  • USB 3.0 to SATA adapter with 1TB SSD as an install target/ESXi boot (connected to hub)
  • 8GB Micro SD card for the edk2-rk3588 UEFI firmware (you can use eMMC if your model comes with it; this guide will cover the SD card)
The first step is to build out the UEFI firmware. The folks working on this project have been hard at work, and have successfully built firmware that allows things like the WoR Project (Windows on ARM) to gain some functionality. The base release works pretty well for Windows, but for ESXi, we'll have to build a custom version of it.

To do so, we'll need a Linux environment. I chose Ubuntu 23.04. The install instructions on the GitHub page linked above include almost all of the packages needed. For my version of Ubuntu, I installed the following:

sudo apt install git gcc g++ build-essential gcc-aarch64-linux-gnu iasl python3-pyelftools uuid-dev device-tree-compiler

Then, clone the repository and change directory:

git clone https://github.com/edk2-porting/edk2-rk35xx.git --recursive
cd edk2-rk35xx

Prior to building the firmware, we need to modify one of the files to disable PCI-e 3.0. If this step is not followed, the installation media will purple screen during boot. Use vim to modify this file:

vi edk2-rockchip/Platform/OrangePi/OrangePi5Plus/AcpiTables/AcpiTables.inf

Once you're in the file, comment the following line:

$(RK_COMMON_ACPI_DIR)/Pcie2x1l0.asl
Confirm that the line looks like this, then write changes:

# $(RK_COMMON_ACPI_DIR)/Pcie2x1l0.asl
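If you'd rather not open vi, a sed one-liner can comment the entry out instead. This is just a convenience sketch run from the edk2-rk35xx directory, and it assumes the Pcie2x1l0.asl entry appears only once; verify the result matches the commented line above.

# Prefix the Pcie2x1l0.asl entry with '# ' (same effect as the manual edit above)
sed -i 's|^.*Pcie2x1l0\.asl|# &|' edk2-rockchip/Platform/OrangePi/OrangePi5Plus/AcpiTables/AcpiTables.inf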
Now we can build it (change the release number to match the latest release found on the GitHub page; as of this writing, 0.7.1 is the latest):

sudo ./build.sh --device orangepi-5plus --release 0.7.1
After a few minutes, the script should finish and create a file named "RK3588_NOR_FLASH.img" in the edk2-rk35xx directory. You can then either use dd to copy the .img file onto an SD card, or SCP the file to a Windows machine and use the Raspberry Pi Imager tool to write it instead.
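If you go the dd route, something like this should do it; /dev/sdX is a placeholder for your SD card device, so confirm it with lsblk first, because writing to the wrong device will wipe it.

# Replace /dev/sdX with your SD card device (check lsblk output first)
sudo dd if=RK3588_NOR_FLASH.img of=/dev/sdX bs=4M status=progress conv=fsync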

With the freshly imaged SD card inserted into the Orange Pi, we can now plug everything else into the machine. Remember, the device order is as follows:

  • USB drive containing ESXi installer in the top USB 3.0 port
  • USB hub plugged into the USB-C port
  • Keyboard, SSD and Realtek 8153 adapter plugged into the hub
If all goes well, ESXi should boot without issue and will see the network adapter and SSD. Install ESXi as you normally would. I made the mistake of thinking that I could move the USB SSD to the onboard USB 3.0 ports after installing - this will work, meaning ESXi will still boot, but any additional capacity used for the datastore will be inaccessible, and ESXi will be read-only. You must keep the SSD and network adapter on the hub.

The promise of an 8 core, 16GB RAM, ARM based ESXi machine looks great. I'm hoping that in the not too distant future, we can see more improvements, such as additional USB port and NVMe disk utilization. A workaround for the lack of networking would be to pass another USB network adapter to a VM, which could act as a DHCP server and present gigabit connectivity for other guests, but this would only work if the other USB ports were recognized. 

Time will tell what the future holds for the Orange Pi 5 Plus, but to have UEFI functioning at this level is fantastic, and I can't say enough good things about the people working on this project. Also, a special thanks to the ESXi on Arm team, who have made this endeavor possible.

Thursday, May 11, 2023

HCIBench analysis part 2: vSAN OSA vs. vSAN ESA with all Optane drives

In my previous post, I compared what a huge difference having proper caching drives can make with vSAN OSA, replacing my read-intensive disks with Intel Optane. One question remains, however: since Optane drives are high-performance, mixed-use devices, how would they fare in a vSAN ESA deployment? To find out, I reconfigured my vSAN cluster by deleting the witness VM, deploying a vSAN ESA witness, and loading each server with 5 Intel Optane disks. Let's see how it got along.

Caveats: 

  • ESA enables compression-only mode automatically, and it cannot be disabled (OSA was tested without deduplication or compression enabled)
  • ESA best practices call for a minimum of three nodes. It can absolutely work with two nodes, but would see its true performance benefits in a right-sized cluster


100% read, 4K random


An interesting result, as the vSAN OSA configuration with 2x Optane for caching and 2 read-intensive capacity disks actually managed 24K higher IOPS on this test. There are a few likely reasons for this:

  1. The benchmark likely kept everything in the hot tier throughout the run
  2. As mentioned previously, ESA is running compression while OSA is not. I plan on re-running this benchmark with compression-only mode enabled on the OSA configuration.
  3. ESA works best at the recommended configuration (3+ nodes).

70% read, 4K random


Where Optane improved write performance in the OSA model, ESA with 5 Optane disks per node improved it further: a gain of 53K IOPS, higher throughput, and generally better latency across the board.


50% read/write, 8K random


We get a decent bump in performance in comparison to the OSA build. Read latency remains low, write latency remains about the same.


100% write, 256KB sequential


This is perhaps the biggest difference among all the tests, and it highlights the key advantage of vSAN ESA, especially for a 2-node cluster. Where some of the other tests showed percentage bumps in performance, ESA with write-intensive disks managed over twice the throughput of vSAN OSA. 5.69GB/s represents 45.52Gb/s over the network cards. Latency also improved dramatically.

Overall, the 2-node vSAN ESA configuration with 5 Intel Optane disks per server performs generally as expected, comparatively outperforming the OSA build with capacity disks. While OSA went blow for blow with ESA, it should be reiterated that deduplication and compression were disabled on OSA; with them enabled, IOPS tends to drop in favor of capacity. One thing I've omitted from these comparisons is usable capacity, for which I would defer to the vSAN calculators, but keep in mind: vSAN OSA accomplished its numbers with a datastore size of ~13TB, whereas the 280GB Intel Optane disks granted a capacity just north of 2TB.

So what can we take away from this? Is OSA dead? Not by a long shot. In a real world, 2-node ROBO scenario, several questions should be asked:

  • What is the workload?
  • How much performance do you need?
  • Are mixed use drives going to be able to meet capacity demands?
  • What's the best bang for the buck hardware to meet workload requirements?
For capacity-focused, read-intensive workloads, I would lean toward vSAN OSA. For performance, ESA should absolutely be a consideration. And if you have the chance to add more nodes, the capacity and performance improvements tend to favor ESA.

Monday, May 8, 2023

HCIBench analysis part 1: OSA vs. OSA with Intel Optane

In my previous post, I shared my current vSAN setup details. In this post, we'll take a look at the performance of the original vSAN deployment and see how much of a difference it makes when we replace the sub-optimal cache disks with Optane for OSA, followed by a full Optane ESA redeployment.

As previously stated, the cache disks in my current vSAN OSA 2-node cluster are not optimized for caching: they are read-intensive disks, and they have mismatched capacities. When running vSAN in a production environment, it is best to adhere to the vSAN HCL and use the appropriate disks for cache and capacity. While my 3.84TB read-intensive drives will be great for capacity, I should be using a write-intensive cache disk. Fortunately, the Intel Optane 905s that were sent to me through the vExpert program and Intel should do the trick.

For benchmarking comparisons, I'm using HCIBench 2.8.1. This free utility is provided by VMware as a "Fling". It is an OVA template that, once deployed, automatically creates Linux-based VMs that run preconfigured Flexible I/O (FIO) benchmarks. Throughout the benchmarking process, I selected "Easy Run", which automatically deploys the number of VMs it deems appropriate for my hardware configuration.
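For context, the 70/30 4K random profile is roughly equivalent to an FIO job like the one below. This is only an illustrative approximation, not the exact parameter file HCIBench generates; the job size, queue depth, and runtime are assumptions.

# Approximation of a 70% read / 30% write, 4K random workload
fio --name=randrw-70-30 --ioengine=libaio --direct=1 --rw=randrw --rwmixread=70 \
    --bs=4k --iodepth=32 --numjobs=4 --size=10G --runtime=300 --time_based --group_reporting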

The first thing I did was select the default benchmarks and run each on the original vSAN OSA configuration. "Easy Run" determined it would be best to deploy 4 VMs, presumably because the cache wasn't up to par.

Of note, all OSA benchmarks were run with deduplication and compression disabled.  


Here's how it went:

100% read, 4KB random



This is decent, but expected considering all of the disks are read intensive. 


70% read, 4KB random



This is more of a realistic bench, with some write performance metrics. Read latency is pretty high.


50% read/write, 8KB random



This simulates database workloads with an equal mix of reads and writes. With the larger block size, performance improves a bit, but latency remains a concern.


100% write, 256KB sequential



The throughput bench, best for video and media. We come close to saturating the 10Gbe link between the two servers on this one. 


For the next tests, I'll remove the 1.2TB and 1.92TB cache disks and replace them with two of the Intel Optane 905s. These are 280GB each and have drastically better read and write performance. I could, in theory, use four cache disks, but vSAN prefers a 1:1 cache-to-capacity disk ratio. This was also the point where I swapped out the 10Gbe 82599 network card for a ConnectX-4 CX455 100Gbe card to ensure we don't run into a bandwidth bottleneck (although I doubt I'll be able to saturate 100Gb). We should see a measurable difference across the board in vSAN OSA performance.
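Swapping the cache disks in OSA means deleting and recreating the disk group on each node. The UI handles this fine, but for reference, a rough CLI sketch looks like the following; the naa identifiers are placeholders, so pull the real device IDs from the storage list output first.

esxcli vsan storage list                                        # note the current cache and capacity device IDs
esxcli vsan storage remove -s naa.OLD_CACHE                     # removes the disk group anchored to the old cache disk
esxcli vsan storage add -s naa.OPTANE -d naa.CAP1 -d naa.CAP2   # recreate the disk group with an Optane cache disk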

Of note, when selecting "Easy Run" on the following tests, HCIBench deployed 8 benchmark VMs.

100% read, 4KB random



Here we can see an increase of 200K IOPS. Despite the read-intensive nature of the original disks, the Optane disks are rated higher for reads and substantially higher for writes. I'm not sure why the results failed to capture the 95th percentile read latency, but I would expect it to be similar to the original OSA result.


70% read, 4KB random



Unsurprisingly, the write performance improved dramatically, as well as read latency. 188K IOPS at 70/30 for a 2 node cluster is impressive.


50% read/write, 8KB random



Once again, we see the difference in performance thanks to the Optane drives. 172K IOPS and much lower latency.


100% write, 256KB sequential



This was surprising for me. Just by replacing the cache disks, we can see a near 3.5x bandwidth increase. We see that it would've more than saturated the 10Gbe link, so switching to the 100Gbe card gave us a better idea of what to expect. Average write latency also improved considerably!

Overall, we can see a clear advantage in using the Intel Optane disks. It is critical to choose a cache disk that excels at write-intensive workloads, as "hot tier" data gets pushed to capacity over time. By contrast, ESA excels with uniform, mixed-use disks. We've seen that Optane has impressive write capabilities, but it also has great read performance. Optane can do it all, but what will the HCIBench numbers reflect? Stay tuned for my next blog post, where we'll compare vSAN OSA with 4 Optane cache/4 capacity disks to ESA with 10 Optane disks.

Tuesday, May 2, 2023

My current setup: vSAN OSA 2 node cluster

I'm excited to announce that I was selected for the vExpert Intel Optane giveaway. To prepare for the incoming drives, I've started a series of blog posts benchmarking my two-node vSAN cluster. These will establish a baseline of what to expect, both from following best practices for OSA cache disks and from the fundamental performance differences of ESA.

My BOM consists of:

Supermicro BigTwin 6029BT-DNC0R
2x X11DPT-B compute servers (OEM branded), each containing:
  • 2x Xeon Silver 4116
  • 768GB DDR4 (24x32GB)
  • Intel X550 RJ-45 10Gbe dual port SIOM network card
  • RSC riser to break up x16 slot into 2 x8
  • Intel 82599 SFP+ 10Gbe dual-port network card in slot 1 of the riser
  • 10Gtek PCI-e x8 to 2x U.2 NVMe drive adapter in slot 2 of the riser
  • Open PCI-e x16 low-profile slot
  • Backplane supports 2x U.2 NVMe with 4 SAS/SATA HDD/SSD per server
Each node contains a single NVMe cache disk along with two NVMe drives for capacity. The drive list is as follows:
  • Node 1: Intel DC P3500 1.2TB for cache, 2x SanDisk Skyhawk 3.84TB for capacity
  • Node 2: SanDisk Skyhawk 1.92TB for cache, 2x SanDisk Skyhawk 3.84TB for capacity
The cache disks are not recommended for two reasons: They are mismatched in size, and they are not write-intensive disks. Nevertheless, they are what I have on hand and should be enough to establish a baseline of vSAN OSA performance. 

The first port of the X550 will be used for management and VM traffic (green lines), and the second will be used for vSAN witness traffic (blue lines). I didn't have a switch capable of handling VLANs, so this will run over two "dumb" 8-port switches. The servers will be direct-connected over the 82599 network cards to pass vSAN storage traffic (red line).
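In a 2-node direct-connect setup like this, witness traffic has to be tagged on a vmkernel port that can reach the witness over the management-side network, so it doesn't try to ride the direct-connected vSAN link. A rough sketch, assuming vmk1 is the witness-facing interface on each host:

esxcli vsan network ip add -i vmk1 -T=witness   # tag vmk1 for witness traffic on each node
esxcli vsan network list                        # verify the traffic type assigned to each vmknic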



Once we have benchmarked the original build, the plan is to swap the cache disks with Optane drives for OSA, then use them all for ESA. With the 10Gtek cards, each server can hold a maximum of 6 NVMe disks. vSAN OSA prefers a 1:1 cache-to-capacity disk ratio, so we will test with 2x Optane and 2x capacity disks, followed by 5x Optane for ESA. To make room for the second NVMe adapter, I'm going to use a ConnectX-4 100Gbe adapter in the open x16 slot for testing; while it is overkill, I don't want there to be a bottleneck (and the original build won't likely saturate the 10Gbe link currently in place). The latest HCIBench utility as of this writing is 2.8.1 and will be utilized in "Easy Run" mode. Stay tuned!



Monday, March 20, 2023

Adding 10Gbe networking to the Lenovo ThinkCentre M75q Gen 2

This post has been a process, but I'm finally happy to report that 10Gbe works on the M75q Gen 2 with ESXi 7.0. There were several challenges that needed to be addressed:

  • No PCI-e slot
  • No expansion chassis
  • Only 1x SATA and 1x NVMe
  • Realtek NIC onboard

I wanted to use this system because of how compute-dense it is for the form factor: a <1L system that houses 8 Ryzen cores and 64GB of DDR4 makes a great power-sipping small box for the homelab, and adding 10Gbe networking would give it better storage options for VM consumption. I addressed how to overcome the Realtek NIC in a previous blog post by utilizing the USB NIC Fling; now we'll cover what it takes to add the 10Gbe card.

Our BOM is as follows:

I designed the case extender myself. It allows a single-slot PCI-e card to be installed, has a hole on the side so a screwdriver can affix an M3 screw to secure the card, and has a slot in the back to pass the USB-to-SATA cable through.




Once the card has been connected, ESXi can be booted and the card should be recognized:


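If you'd rather confirm from the ESXi shell than the host client, listing the physical NICs should show the new adapter alongside the USB NIC; names and ordering will vary by host.

esxcli network nic list   # the 10Gbe card should appear as an additional vmnic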
The only real issue I've run into so far is which cards are supported by this setup. I have an Intel X550 (Lenovo OEM) that physically fits the slot but wasn't detected on boot. I assume this is a power delivery limitation, as I doubt the USB adapter can provide enough juice. I would like to test the quad-port Intel i225 card provided by QNAP, and may do so in the future, as having supported NICs would make booting this much easier.

Tuesday, February 21, 2023

VMware Cloud Professional certification - thoughts and tips to pass

I recently passed the VCP-VMC 2023 exam, which was made possible by the free VMware course that let me check off the prerequisite for the certification. For those looking to take on the exam, I'll share what I can remember in terms of general concepts.

For starters, and as per usual with any VMware exam, start with the exam guide.

Like anything to do with cloud, it is network heavy. I'm not a networking engineer, but I have a long-since-lapsed CCENT certification. This exam is going to grill you on CIDR, subnets, and network overlaps, and it assumes you have a general knowledge of the OSI model. Focus as well on the different connection and VPN types for each cloud provider. While it primarily focuses on AWS, GCP and Azure questions were in there as well, so be sure to know each provider's supported minimums and maximums for management networking configuration.

Speaking of minimums and maximums, you'll want to read up on cluster sizes and hardware configurations. What are the specs of an i3.metal instance vs. an i3en.metal? What kind of nodes can you get with Azure and GCP? And how many can you throw into a cluster? All of these may appear on the exam.

Managed services, such as VMware Cloud on Dell EMC and AWS Outposts, should be studied as well. What physical requirements do these carry? What are the respective responsibilities?

HCX... woo boy. Several of these questions showed up, and did nothing but generate anxiety. Get to know HCX. Get to know the deployment models, and read up on how to troubleshoot different scenarios.

Containers, Kubernetes, and Tanzu all showed up on my exam. Know what TKG does, how to deploy it, what value Kubernetes brings to containers in general, and which Tanzu services perform which functions.

That's all I can remember as of this moment. I'm still kind of pumped from getting through it. The only feedback that I have is that I don't know if some of my answers were right as they may have changed after the exam was written. For instance, Google has updated their networking requirements as of November 2022, so I'm not sure if I got the question wrong by answering based on current requirements, or if I should've answered based on previous specs. Perhaps a higher-level question that isn't dependent on something that can change with relative frequency would be better.

I hope you found this helpful. Feel free to comment below or ping me on Twitter if you have any questions!
