Monday, August 19, 2024

Revisiting the vExpert Intel Optane drives - single node vSAN ESA the "hard" way

Hello, internet! Long time no see, how you been? It's been a pretty interesting year so far, and the homelab has not been spared from the chaos; hardware failures and upgrades have caused my projects to come to a standstill. Fortunately, I've made headway by consolidating some of the hardware into a project I've been trying to get online for some time now.

I hit a stroke of luck by winning an AMD based Supermicro motherboard off of my favorite auction site, which came with a processor and some memory for about $300. The large number of PCI-e lanes opens up a number of expansion options in a standard mid-tower case. In this blog, I'm going to discuss consolidating all ten of the Intel Optane disks that I received last year into one compute node, and detail the process of getting the latest versions of ESXi and vCenter Server installed.


My BOM:

H11SSL-i - each PCI-e slot configured for x4x4 or x4x4x4x4 bifurcation

AMD EPYC 7551

128GB (8x16GB) 2133 DDR4 RAM

10x Intel Optane 280GB NVMe SSDs (vSAN pool)

5x 10Gtek PCI-e x8 to 2x U.2 NVMe adapters

Solidigm P41 Plus 2TB M.2 NVMe SSD (boot disk)

Corsair RM1000x PSU

Silverstone CS380 8 bay mid-tower case

Noctua NH-U9 TR4-SP3 heatsink


This system will consume the same Optane drives that I used in my Supermicro BigTwin SuperServer, which consisted of two X11DPT-B boards, each containing 2x Xeon Platinum 8160s and 768GB of RAM. The previous vSAN ESA build gave five of the Optane drives to each node, ran a vSAN Witness VM on a third node, and used a 100GbE direct connection between the nodes for vSAN traffic.

Consolidating down to the tower will be considerably quieter and draw less power, while allowing us to benchmark all ten drives without networking overhead. Downsides include less processing power, far less memory, and a little more work to do under the hood to get it working. Unlike the two node cluster, this will have no redundancy.

I'm going to detail how to accomplish all of this without a vSphere license of any kind; this will utilize the 60 day trial license, and a copy of ESXi that was acquired through supported means. Some of the old tricks of standing up a vSAN node still work with ESA, and can be deployed without vCenter.


As most of my blog posts go, this is strictly for lab use - I would not suggest running a single node vSAN cluster in production, nor would I suggest running a vSAN cluster without a proper vCenter server. We will install vCenter in a later blog post.


The first step is to download the ESXi-Customizer-PS script. This can be found here: https://github.com/VFrontDe-Org/ESXi-Customizer-PS/tree/master

PowerCLI is required to use the script. Full documentation on the script can be found here: https://www.v-front.de/p/esxi-customizer-ps.html

Simply running the script without any options will seek out the VMware online depot and create an ISO based on the latest patch version. As of this writing, I can confirm that the script works to download ESXi 8.0 Update 3 build 24022510.

Install the OS to the boot disk, then reboot.

Once booted, clear any partitions that may be on the Optane disks.
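
One way to clear the disks from the ESXi shell is partedUtil. The following is a minimal sketch, not the exact procedure used for this build: it assumes the usual "partedUtil getptbl" layout (label type on line 1, geometry on line 2, then one line per partition starting with the partition number). The helper name delete_cmds and the device path are placeholders. It only prints the delete commands so they can be reviewed before anything is wiped.

```shell
#!/bin/sh
# Sketch: turn `partedUtil getptbl` output into `partedUtil delete` commands.
# Assumption: line 1 is the label type, line 2 is the geometry, and every
# following line describes one partition, with the partition number first.

# delete_cmds DISK_PATH   (reads getptbl output on stdin)
delete_cmds() {
    disk="$1"
    # Skip the two header lines; emit one delete command per partition.
    awk -v d="$disk" \
        'NR > 2 && $1 ~ /^[0-9]+$/ { printf "partedUtil delete \"%s\" %s\n", d, $1 }'
}

# Example on the host (prints commands only; nothing is deleted):
#   d=/vmfs/devices/disks/t10.longnumbergoeshere1
#   partedUtil getptbl "$d" | delete_cmds "$d"
```

Once the printed commands look right, they can be pasted back into the shell or piped to sh. Double-check that the boot disk is not in the list first.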


Prior to creating the vSAN cluster, we'll want to get a list of the disks that we want to use in the cluster. For my use case, I was able to run the command "esxcli storage core device list | grep t10" to list out all NVMe drives. I removed my 2TB boot disk from that output.
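
A plain "grep t10" also matches the Display Name and Devfs Path lines for each device, so a slightly tighter filter can save some manual cleanup. This is a sketch under assumptions: the function name list_pool_devices is made up for illustration, and BOOTDISK stands in for whatever string uniquely identifies your boot disk in its device ID.

```shell
#!/bin/sh
# Sketch: keep only the device identifier lines (they start at column 0 with
# "t10.") and drop any identifier matching the boot disk pattern.

# list_pool_devices BOOT_PATTERN   (reads `esxcli storage core device list` on stdin)
list_pool_devices() {
    boot_pattern="$1"
    grep '^t10\.' | grep -v -- "$boot_pattern"
}

# Example on the host (BOOTDISK is a placeholder, not a real model string):
#   esxcli storage core device list | list_pool_devices 'BOOTDISK' > /tmp/pool-devices.txt
```

The resulting file holds one device ID per line, which is a convenient shape for building the storage pool command later.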

Since we're using a single node vSAN cluster, we can create a vSwitch with no uplinks for the purpose of vSAN networking:

  • Create vSwitch
  • Add vmkernel port
  • From an SSH session, mark the vmkernel port for vSAN traffic: esxcli vsan network ip add -i vmk1
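
All three steps can also be done from the SSH session. The sketch below prints the esxcli commands rather than running them; the vSwitch name, port group name, IP address, and netmask are placeholders I picked for illustration, and only vmk1 comes from the step above.

```shell
#!/bin/sh
# Sketch: print the esxcli commands for an uplink-less vSAN vSwitch.
# Everything except VMK is a placeholder value; adjust before use.
VSWITCH="vSwitch-vSAN"
PORTGROUP="vSAN"
VMK="vmk1"
IP="192.168.100.10"
MASK="255.255.255.0"

vsan_network_cmds() {
    cat <<EOF
esxcli network vswitch standard add --vswitch-name=$VSWITCH
esxcli network vswitch standard portgroup add --portgroup-name=$PORTGROUP --vswitch-name=$VSWITCH
esxcli network ip interface add --interface-name=$VMK --portgroup-name=$PORTGROUP
esxcli network ip interface ipv4 set --interface-name=$VMK --ipv4=$IP --netmask=$MASK --type=static
esxcli vsan network ip add -i $VMK
EOF
}

# Review the output first, then execute with:  vsan_network_cmds | sh
```

No uplink is added to the vSwitch, which is fine here because vSAN traffic never leaves the host.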

Previously, we would use the command "esxcli vsan cluster new" to create a vSAN OSA cluster. With 8.0, we have more options:

[root@localhost:~] esxcli vsan cluster new --help
Usage: esxcli vsan cluster new [cmd options]

Description:
  new                   Create a vSAN cluster with current host joined. A random
                        sub-cluster UUID will be generated.

Cmd options:
  -c|--client-mode      vSAN client mode allows mount of vSAN datastore from the
                        server cluster without enabling vSAN.
  -s|--storage-mode     vSAN storage mode allows to create a vSAN Max cluster.
  -x|--vsanesa          vSAN ESA mode allows to create a vSAN ESA cluster.

By using the -x option, we can create the ESA cluster: esxcli vsan cluster new -x

We can then add disks to the storage pool with "esxcli vsan storagepool add", passing the -d option once for each device we listed previously.

For example, the command should read: "esxcli vsan storagepool add -d t10.longnumbergoeshere1 -d t10.longnumbergoeshere2", repeating for each device you wish to add.
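
With ten drives, typing that out by hand gets tedious. Here is a small sketch that builds the command from a list of device IDs, one per line on stdin. The function name build_storagepool_add is hypothetical, and the function only prints the command so it can be checked before running.

```shell
#!/bin/sh
# Sketch: build one `esxcli vsan storagepool add` command with a -d option
# per device ID read from stdin (one ID per line, blank lines ignored).
build_storagepool_add() {
    cmd="esxcli vsan storagepool add"
    count=0
    while IFS= read -r dev; do
        [ -n "$dev" ] || continue
        cmd="$cmd -d $dev"
        count=$((count + 1))
    done
    # Refuse to print a command with no devices in it.
    [ "$count" -gt 0 ] || return 1
    printf '%s\n' "$cmd"
}

# Example, assuming the device IDs were saved to a file beforehand:
#   build_storagepool_add < /tmp/pool-devices.txt
```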

This will take a few minutes, but once it is complete, we should have a vSAN datastore.


We won't be able to use this datastore quite yet; a little more housekeeping is in order. The default storage policy is normally managed through vCenter, which we don't have yet, so we need to set the host's default policy from the command line to tolerate 0 failures and force provisioning. To do so, run the following commands:

esxcli vsan policy setdefault -c cluster -p "((\"hostFailuresToTolerate\" i0) (\"forceProvisioning\" i1))"

esxcli vsan policy setdefault -c vdisk -p "((\"hostFailuresToTolerate\" i0) (\"forceProvisioning\" i1))"

esxcli vsan policy setdefault -c vmnamespace -p "((\"hostFailuresToTolerate\" i0) (\"forceProvisioning\" i1))"

esxcli vsan policy setdefault -c vmswap -p "((\"hostFailuresToTolerate\" i0) (\"forceProvisioning\" i1))"

esxcli vsan policy setdefault -c vmem -p "((\"hostFailuresToTolerate\" i0) (\"forceProvisioning\" i1))"
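
The five commands differ only in the policy class, so they can be generated with a loop. This is a sketch rather than anything official: policy_cmds is a name I made up, and it prints the commands with the policy string in single quotes to avoid the backslash escaping.

```shell
#!/bin/sh
# Sketch: print one `esxcli vsan policy setdefault` command per policy class.
# The policy string matches the commands above: 0 host failures tolerated,
# force provisioning enabled.
POLICY='(("hostFailuresToTolerate" i0) ("forceProvisioning" i1))'

policy_cmds() {
    for class in cluster vdisk vmnamespace vmswap vmem; do
        printf "esxcli vsan policy setdefault -c %s -p '%s'\n" "$class" "$POLICY"
    done
}

# Review the output, then execute with:  policy_cmds | sh
# Afterwards, `esxcli vsan policy getdefault` shows the active defaults.
```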

Once these commands have been run, we should be able to create virtual machines on the vSAN datastore. We are now ready for a vCenter install and HCIBench testing. This also primes a VMware Cloud Foundation single node lab deployment. For now, we'll call this entry done and cover HCIBench in the next one.
