Monday, March 20, 2023

Adding 10Gbe networking to the Lenovo ThinkCentre M75q Gen 2

 This post has been a process, but I'm finally happy to report that 10Gbe works on the M75q Gen 2 under ESXi 7.0. There were several challenges that needed to be addressed:

  • No PCI-e slot
  • No expansion chassis
  • Only 1x SATA and 1x NVMe
  • Realtek NIC onboard

The reason I wanted to use this system is how compute dense it is for the form factor - a <1L system that houses 8 Ryzen cores and 64GB of DDR4 makes a great power-sipping small box for the homelab, and adding 10Gbe networking would give it a much better storage option for VM consumption. I addressed how to overcome the Realtek NIC in a previous blog post by utilizing the USB fling; now we'll cover what it takes to add the 10Gbe card.

Our BOM is as follows:

I designed the case extender myself. It allows a single slot PCI-e card to be installed, has a hole on the side so a screwdriver can reach the M3 screw that secures the card, and includes a slot in the back to pass the USB to SATA cable through.




Once the card has been connected, ESXi can be booted and the card should be recognized:
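A quick way to double-check from the ESXi shell (or over SSH) is the standard NIC listing; nothing here is specific to this particular setup:

esxcli network nic list
# the 10Gbe card should show up as an additional vmnicX with its driver and link speed listed
# (any vusbX entries are USB NICs handled by the Fling driver)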


The only real issue that I've run into so far is which cards are supported by this setup. I have an Intel X550 (Lenovo OEM) that physically fits the slot but wasn't detected on boot. I assume this is a power delivery limitation, as I doubt the USB adapter can provide enough juice. I would like to test the quad port Intel i225 card from QNAP, and may do so in the future, as having natively supported NICs would make booting this much easier.

Tuesday, February 21, 2023

VMware Cloud Professional certification - thoughts and tips to pass

 I recently passed the VCP-VMC 2023 exam, which was made possible by the free VMware course that let me check off the prerequisite for the certification. For those looking to take on the exam, I'll share what I can remember in terms of general concepts.

For starters, and as per usual with any VMware exam, start with the exam guide.

Like anything to do with cloud, it is network heavy. I'm not a networking engineer, but I do have a long-since-lapsed CCENT certification. This exam is going to grill you on CIDR, subnets, and network overlaps, and it assumes that you have a general knowledge of the OSI model. Focus as well on the different connection and VPN types for each cloud provider. While the exam primarily focuses on AWS, GCP and Azure questions were in there as well, so be sure to know each provider's minimums and maximums for management networking configuration.

Speaking of minimums and maximums, you'll want to read up on cluster sizes and hardware configurations. What are the specs of an i3.metal instance vs. an i3en.metal? What kind of nodes can you get with Azure and GCP? And how many can you throw into a cluster? All of these may appear on the exam.

Managed services, such as VMware Cloud on Dell EMC and AWS Outposts, should be studied as well. What physical requirements do these carry? What are their responsibilities?

HCX... woo boy. Several of these questions showed up, and did nothing but generate anxiety. Get to know HCX. Get to know the deployment models, and read up on how to troubleshoot different scenarios.

Containers, Kubernetes and Tanzu all showed up on my exam. Know what TKG does, how to deploy it, what value Kubernetes brings to containers in general, and which Tanzu service provides which function.

That's all I can remember at the moment - I'm still kind of pumped from getting through it. The only feedback I have is that I don't know whether some of my answers were graded against requirements that have changed since the exam was written. For instance, Google updated their networking requirements as of November 2022, so I'm not sure if I got the question wrong by answering based on current requirements, or if I should have answered based on the previous specs. Perhaps a higher-level question that isn't dependent on something that changes with relative frequency would be better.

I hope you found this helpful. Feel free to comment below or ping me on Twitter if you have any questions!

Friday, February 17, 2023

vSphere 8.0 2023 homelab buyer's guide

The hardware market has started to recover, and with vSphere 8.0 introducing native support for the excellent Intel i226-V network card, some new price-to-performance contenders have arrived that should generally be able to run ESXi. This post will focus on several categories - mini PCs, second hand workstations and servers, and whitebox builds - that should meet the requirements of the updated HCL. Let's get started!


Mini PCs

NUC-like systems are a classic piece of the homelab. Historically, the tradeoff has been limited network connectivity and/or a lack of compute/memory density. Now, however, there are some exceptions to the rule.

Topton, a 6 year old shop on AliExpress, has an AMD Ryzen 5000 series based "router" which grants 6 to 8 cores, up to 64GB of RAM, 3x M.2 NVMe slots, and four i226-V based 2.5Gbe ports. With an entry point of $346 USD at the time of this writing, along with a claimed capability to ship VAT/tax free (not verified, YMMV), this looks like it could be the value king mini PC of 2023. Be wary of copycat shops that may offer similar specs at a steeper discount; make sure the store has been around for some time, as not every seller can be trusted. Intel based systems can also be found with similar specs (minus the generous core count, of course) for ~$200 USD, making for super cheap vSAN clusters.

Sadly, mini PCs still suffer from a lack of PCIe lanes, but I found a creative way around this... that's reserved for a follow-up blog.


Second hand workstations

With buying cycles slowing down in an uncertain economy, second hand hardware is getting harder and harder to come by. Workstations, however, are sometimes offered up on eBay at a deep discount. Most of the hardware in these is supported by ESXi, with the possible exception of the onboard network card. Fortunately, they have several PCI-E slots to make up for this, so adding a supported NIC isn't too much of a hassle.

The Dell Precision T7820 and T7920 can sometimes be found with Xeon Bronze or Silver processors in the sub-$500 USD range. Recently, I saw a 2x Silver system with 64GB of RAM listed for $350 USD. The HP equivalents, the Z6 G4 and Z8 G4, can be found at similar price points. Both of these *should* come with an Intel-based onboard NIC according to the drivers on their respective support pages. Be wary of barebones kits, as these systems do not have onboard graphics - a barebones system with no GPU will require a graphics card to function properly.

Looking ahead to Threadripper based workstations: the Dell Precision 7865 can pack up to 64 cores into a standard ATX tower form factor. Definitely not cheap, but exciting to see nonetheless!


Second hand servers

As of this writing, the same issue faced by workstations is impacting servers tenfold. Most hardware vendors are officially ending support for socket 2011-3 based servers, such as PowerEdge 13G, HPE ProLiant Gen9, and Cisco UCS M4 systems. These are also falling out of support with ESXi 8.0, as they were originally introduced in 2014. The next generation of each (14G, Gen10, M5) is difficult to find on eBay for a decent price. I'll post an update on this later in the year, as I expect more second hand hardware to drop in price closer to the EOL of the older generations.


White box builds

Building out hardware intended for gaming and enthusiast use comes with some caveats. Most gaming boards use a Realtek NIC for their onboard gigabit or 2.5Gbe networking, and these cards are not supported by ESXi as there is no compatible driver (a quick way to check what a given board has onboard is shown after the list below). Our options are:

  • Find a board with i226-V onboard (i225 for most gaming boards had many issues, regardless of revision)
  • Add a supported USB NIC with the Fling driver
  • Add a supported PCI-e NIC (and/or HBA if using a CPU with onboard graphics enabled)
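To find out what a board actually ships with before committing to one of the options above, boot any Linux live USB on it (or check an existing install) and pull the PCI vendor and device IDs, then search for them on the VMware compatibility guide. A minimal sketch - the IDs in the comment are just the well-known vendor prefixes, not a specific card:

lspci -nn | grep -i ethernet
# the [vendor:device] pair at the end of each line (8086:xxxx for Intel, 10ec:xxxx for Realtek)
# is what to search for on the VMware compatibility guide / HCL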
AM4 builds using the Ryzen 7 5700G, or Intel builds with onboard graphics, keep the PCI-e slots free for multiple supported network cards. Enthusiast boards, such as those based on the B550 chipset, have many of the BIOS options you'd expect to see on a server board. If you want to add KVM-like capabilities, you can also invest in a Raspberry Pi based PiKVM for remote out-of-band management, but it costs quite a bit for a quality of life feature.
SuperMicro also has some relatively cost effective hardware available second hand on eBay, such as the H11SSL motherboard at around $400 USD, which supports 8-32 core EPYC processors (or up to 64 cores on revision 2.0), although you still need to factor in the cost of a heatsink, CPU, and ECC memory.

Tuesday, January 3, 2023

Installing Ansible on CentOS for vSphere First Class Disks

I've been wanting to play with first class disks for some time now, and needed a means of doing so. While there are many ways to interact with the vSphere APIs, Ansible provides a means of automation with some key advantages (primarily that it is free). This post will get us started on installing Ansible in a CentOS 9 environment, installing the VMware community modules, and writing a playbook that creates a first class disk.

The first thing we'll need to do is install Ansible. With CentOS 9, we'll need to add a repository in order to install it.

sudo yum install epel-release
sudo yum update
sudo yum install ansible
Once installed, we'll need to install some additional components - the Python package manager (pip3), the Python SDK for the VMware API (pyVmomi), and the VMware Ansible collection:

sudo yum install python3-pip -y
pip3 install PyVmomi
ansible-galaxy collection install community.vmware
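Before moving on, a quick sanity check that everything landed doesn't hurt; none of these commands are specific to this environment:

ansible --version                                 # confirms Ansible is installed and shows which Python it uses
ansible-galaxy collection list | grep -i vmware   # the community.vmware collection should be listed
python3 -c "import pyVmomi"                       # silence means the SDK imported cleanly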
Once complete, we can work on updating our Ansible hosts file and write our first playbook. The hosts file for Ansible is located at /etc/ansible/hosts. Using the text editor of your choice, add the hostname or IP address of your vCenter server to the last line of the file.

To prepare the VMware environment, ensure that SSH is enabled on both the vCenter server and host(s) that you plan on running the playbook against. I took the extra step of starting an SSH session to each (root@vCenterIP and root@ESXiHostIP) to log the key thumbprint for each; while this may not be necessary, it's a force of habit for me.

Finally, it's time to write the First Class Disk playbook. The playbook will create a 1GB FCD based VMDK on the datastore of our choosing. This requires the vCenter username and password; while I suggest storing this information in a variable file (shorthand "vars") - there's a sketch of that after the breakdown below - for the purpose of this example I will use plain text. I used the command "vi fcd.yml" and wrote the following:

- name: FCD
  hosts: localhost
  become: false
  gather_facts: false
  collections:
    - community.vmware
  tasks:
    - name: create disk
      vmware_first_class_disk:
        hostname: '(vCenter IP address or FQDN)'
        username: 'administrator@vsphere.local'
        password: '(enter password here)'
        validate_certs: no
        datastore_name: 'Datastore1'
        disk_name: '1GBDisk'
        size: '1GB'
        state: present
      delegate_to: localhost

Let's break this down a bit:
Hostname: Use the vCenter IP address or FQDN.
Username: Typically administrator@vsphere.local but can be a domain account. Note that as we are carrying out a vSphere action, we do not want to use "root" here.
Password: The password for the above username.
validate_certs: Setting this to "no" skips SSL certificate validation. This was a workaround for an error that I received when first trying to run the playbook; there may be other ways around it, but adding this line seems to do the trick.
disk_name: I'm not entirely certain this variable works, but it is called out in the Ansible example.
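Circling back to the vars suggestion above: a cleaner pattern is to keep the credentials in an encrypted vars file and have the playbook reference variables such as "{{ vcenter_username }}" and "{{ vcenter_password }}" instead of plain text. A minimal sketch using ansible-vault, with hypothetical file and variable names:

ansible-vault create secrets.yml          # opens an editor for the encrypted file
#   in the editor, add YAML keys such as:
#   vcenter_username: administrator@vsphere.local
#   vcenter_password: (your password)
ansible-playbook fcd.yml -e @secrets.yml --ask-vault-pass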

Once this is written, we can execute the playbook with: ansible-playbook fcd.yml
This should yield the following result:

PLAY [FCD] *********************************************************************

TASK [create disk] *************************************************************
ok: [localhost]

PLAY RECAP *********************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
When we log into vCenter, a new folder labeled "fcd" should appear on the target datastore, and we can see the new VMDK inside it:
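If you'd rather verify from an ESXi host's shell than click around the UI, the folder can be listed directly (datastore name taken from the playbook above):

ls -lh /vmfs/volumes/Datastore1/fcd/
# the new first class disk's VMDK (and its -flat backing file) should be listed here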





Tuesday, June 21, 2022

Lenovo ThinkCentre M75q Gen 2 ESXi 7.0 U3 install guide

 The last year or two hasn't been kind to homelabbers' budgets. Supply chain issues have had an outsized impact not only on new parts and servers, but also on the second hand market, as businesses desperately snap up what they need. Fortunately, supply chain and global economic recovery are making both availability and pricing of some components reasonable again.

In a previous blog post, I touched on the options available for servers. Mini PCs and NUC-like systems offer a lot of value, especially with the Fling drivers that add Intel and Realtek USB NIC functionality. The Fling drivers allow USB network cards to be added to systems that have limited or no supported network adapters. Such is the case with the Lenovo ThinkCentre M75q Gen 2: it has only one network card onboard, the Realtek 8111 gigabit adapter. Realtek does not make a driver for ESXi, and while drivers have existed in the past, they were vmklinux and community based, meaning they do not work in ESXi 7.0. There are also a handful of new Intel network cards that do not have a supported driver in the ESXi image, but this is addressed by a Fling driver as well. The workaround is to add gigabit and 2.5Gbe USB network cards to the system and inject the Fling driver into the ESXi image prior to install.

Hardware wise, I have added a CableCreation USB 3.0 2.5Gbe LAN adapter for the storage network, as well as a TP-Link UE300 Gigabit adapter for management.


To do so, we will need to install PowerCLI. I'll include the install commands which were taken from this link:

  1. Open PowerShell as an administrator
  2. Run the following command: Install-Module VMware.PowerCLI -Scope CurrentUser
  3. Press "y" if prompted, then enter

The next steps are borrowed directly from the first objective of the VCAP exam. 
  1. Add an offline bundle to work with, in this case I'm using an offline bundle I created previously:
    • Add-EsxSoftwareDepot ESXi-7.0U3-USBNIC.zip
  2. List the profile(s) available within that offline bundle, and make note of it:
    • Get-EsxImageProfile (in this case, it listed ESXi-7.0U3-USBNIC as the profile)
  3. Add the fling driver:
    • Add-EsxSoftwareDepot ESXi703-VMKUSB-NIC-FLING-51233328-component-18902399.zip (this will change in the future, be sure the filename matches!)
  4. Clone the profile:
    • New-EsxImageProfile -CloneProfile "ESXi-7.0U3-USBNIC" -name "ESXi-7.0U3-injected" -Vendor "vshoestring"
  5. Add the Fling software package to the newly created profile:
    • Add-EsxSoftwarePackage -ImageProfile "ESXi-7.0U3-injected" -SoftwarePackage "vmkusb-nic-fling"
  6. Export to ISO:
    • Export-ESXImageProfile -ImageProfile "ESXi-7.0U3-injected" -ExportToIso -filepath ESXi-7.0U3-injected.iso

Note that if you have issues exporting to ISO, you might have to export it to an offline bundle first then repeat the process above with the new offline bundle. To export it to an offline bundle instead:

    • Export-ESXImageProfile -ImageProfile "ESXi-7.0U3-injected" -ExportToBundle -filepath ESXi-7.0U3-injected.zip
One caveat that I've run into with a system that only has Realtek USB NICs is that the install process will halt because the installer does not detect a supported network card. If you reboot, it will boot into ESXi just fine, but it will not have stored the configured password - use a blank password to configure the host.
Another issue is that on reboot, the vmnic configuration isn't maintained on the virtual switches. This is due to the USB driver loading out of order. The only fix I've found is to manually reconfigure the management network adapter, then remove the other vmnics from the virtual switches and re-add them.
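If manually re-adding the vmnics gets old, the Fling documentation describes persisting USB NIC bindings via /etc/rc.local.d/local.sh so the uplink is re-linked once the driver has loaded. A rough sketch of that idea, assuming a single USB NIC named vusb0 uplinked to vSwitch0 - adjust the names to your host and treat it as a starting point rather than a guaranteed fix:

# append above the final "exit 0" in /etc/rc.local.d/local.sh
count=0
while [ $count -lt 20 ] && [ "$(esxcli network nic get -n vusb0 | grep 'Link Status' | awk '{print $NF}')" != "Up" ]
do
    sleep 10
    count=$((count + 1))
done
esxcfg-vswitch -L vusb0 vSwitch0   # re-link the USB NIC to its vSwitch once it reports link up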

The benefit of the M75q Gen 2 is that its CPU performance is on par with some of the Intel Xeon processors currently in my lab. Granted, those are 6-8 years old at this point, but at a fraction of the power consumption, it's something I feel comfortable keeping powered on 24/7 without much consequence. My next blog post will compare power consumption benchmarks.

Thursday, September 23, 2021

Is the diskless server dead? Long live boot from SAN!

 Recently, VMware announced that future versions of ESXi will no longer support SD cards or USB devices as a standalone boot option. This comes on the heels of ESXi 7.0 U2C, which rectified issues where the new partitioning scheme's heavy I/O would cause SD cards and USB devices to fail quickly. Rather than backtracking on the partitioning changes, VMware has decided to end support for a media type that many diskless servers rely on, recommending instead redundant persistent flash devices, SD cards with a separate persistent device for the OSDATA partition, or boot from SAN/NAS. More detailed information can be found in the official KB: https://kb.vmware.com/s/article/85685

It makes sense that VMware is taking this route. vSphere has come a long way from the 4.x days of old. The hypervisor has changed drastically in terms of drivers, software packages and services that must meet the demand of the modern datacenter. Unfortunately, homelabbers may have difficulty making the move to said devices.

With that being said, the question remains: is the diskless server dead? Not quite. Today, we're going to cover how to set up option 3 from the KB mentioned above - boot from SAN. Boot from SAN simplifies the boot device conundrum: rather than adding redundant local storage (plus an additional controller and its cost) to every server, you can carve multiple boot LUNs out of the same mirrored or striped storage for multiple hosts to boot from.


Let's start with the requirements:

  • Storage that supports SAN/NAS (in this example, I'm going to use a TrueNAS iSCSI virtual machine, but bare metal would work just the same)
  • A server with a network adapter that supports iSCSI boot (not necessarily iSCSI hardware offload, just boot - in this example, a Broadcom 57810S as it is what I have on hand)
  • Recommended: a separate network switch or VLAN for boot from iSCSI (in this example, a separate physical switch is used)


Step 1: Configure the storage network

From the TrueNAS web interface, go to Network > Interfaces and select the configured network interface.







Edit the interface, disable DHCP, and enter an IP address of your choosing. DHCP can be used, but for the sake of this exercise we will be using a static configuration.






Step 2: Configure storage

After configuring network, we will need to add a storage pool. Go to Storage > Pools and click the "Add" button in the top right corner.





Follow the prompts and use the available disks to create a pool. If you're using bare metal hardware, the recommendation would be to use mirroring; two disks should be more than enough.





Step 3: Configure iSCSI

Once complete, we can move on to creating the iSCSI block shares. Select Sharing > Block Shares (iSCSI), and click the "Wizard" button in the upper right corner.





For the name, we will use "bfs". Multiple boot from SAN files can be configured here, so if you plan on booting multiple servers, feel free to enumerate them as needed. Under Device, select "Create New", drill down the folders and select "Boot from San", and set the size to 32 GiB.





Click next, and create a new portal. We will use the IP address configured previously.





Click next, and it will bring us to the initiator section. We can leave these fields blank, or, if you wish, you can specify the IQNs of the network adapters so that other NICs do not try to boot from this target.





The last page allows you to confirm the configuration; if all looks good, hit "Submit".

After doing this, be sure to enable the iSCSI service: go to "Services", set iSCSI to Running, and check the box to start it automatically.

Also make note of the "Associated Targets" section. This should read as LUN 0 - this will be important when configuring the adapter later on.
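Before heading over to the server, it's worth confirming the portal is actually listening on the iSCSI port (TCP 3260). From any machine on the storage network - the IP below is a placeholder for whatever address you assigned to TrueNAS:

nc -zv 10.0.1.10 3260
# a "succeeded" or "open" result means the target is reachable; anything else points back at the service or network config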



 


Step 4: Configure physical network adapter for boot from SAN

The shortcut to get into the preboot/OpROM environment where the network adapter's IP addresses are configured will vary by vendor. For Broadcom and QLogic it will be either CTRL-B or CTRL-S; some Intel cards use CTRL-S or CTRL-D. Consult the user manual of the card you're using to find out which shortcut applies.

From here, we can configure the adapter:

Boot protocol: iSCSI (options are typically None, PXE, iSCSI)

Initiator: The IP address you wish to assign to the server, we'll use 10.0.1.2

Target: Use the IP address and IQN of the TrueNAS server. Be sure to set LUN 0 if it isn't already, or match accordingly if it had to be changed. Under "name" or target, use the IQN from "Target Global Configuration" (defaults to iqn.2005-10.org.freenas.ctl) followed by :bfs. In my case, it reads as iqn.2005-10.org.freenas.ctl:bfs


Step 5: Install ESXi

Set the boot order as needed, and use your preferred software to extract the ESXi ISO installer onto a USB drive. Boot into the install environment, and when you go to select a device to install to, you should see the following:





Install to the iSCSI LUN and follow the prompts. Once complete, reboot and it should load into ESXi. Congratulations! You have successfully configured boot from SAN!
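Once the host is up, a couple of quick checks from the ESXi shell (or SSH) help confirm it really is running from the array; nothing below is specific to this particular build:

esxcli iscsi adapter list          # the Broadcom adapter should be listed as an iSCSI HBA (vmhbaXX)
esxcli storage core device list    # the 32 GiB TrueNAS LUN should appear, with "Is Local: false"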






Monday, May 17, 2021

HPE H240 - the new value king of HBAs

With the release of ESXi 7.0 came a farewell to the vmklinux driver stack. Device vendors focused on writing native drivers for current generation hardware, meaning that a large subset of network cards and storage controllers were left behind (a near-full list of deprecated devices can be found on vDan's blog: https://vdan.cz/deprecated-devices-supported-by-vmklinux-drivers-in-esxi-7-0/). One casualty in particular was the LSI 92xx series of HBAs, which utilized the mpt2sas driver. These are widely used for vSAN 6.x as well as for other storage server operating systems. While the 92xx series can still be used in BSD and Linux based systems, this leaves a gap for those who want to run vSAN 7.0. The LSI 93xx is readily available and uses the lsi_msgpt3 native driver, but typically runs in the $80-100+ range.


The new value king is... an unlikely candidate. It isn't vendor neutral, although in my testing it should work on most mainstream systems. It advertises 12Gb/s connectivity, but still uses the old style mini-SAS SFF-8087 connectors. The new value king for vSAN 7.0 usage is the HPE Smart HBA H240. At a price range of $22-30 (as of this writing) depending on the bracket needed, the H240 proves to be a pretty capable card. It supports RAID 0, 1 and 5, but I wouldn't recommend it for that use case as it doesn't have cache. What is critical about this card is that it has a native driver, which is supported in ESXi 7.0 and is part of the standard ESXi image.


The major concern I had was whether this card would work in a non-HPE system. My homelab is made up of unorthodox and whiteboxed machines. The Cisco UCS C220 M4 is the only complete server that I have - the 2 node vSAN cluster I ran on 6.x consisted of a Dell Precision 7810 and a SuperMicro X10 motherboard in an E-ATX case. Introducing the card to both systems went without issue - all drives are detected and the card defaulted to drive passthrough mode (non-RAID). One caveat is that I am using directly cabled drives - the only backplane I have to test with is in the Cisco, and it doesn't appear to support hot swap. The other issue I've found is that you cannot set the controller as a boot device, although I didn't purchase it for that purpose. If you're looking for those capabilities, I would suggest sticking with HPE ProLiant servers, or finding a cheaper LSI 93xx series controller.
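For anyone wanting to confirm the card was claimed by its native driver after dropping it into a non-HPE box, a quick look from the ESXi shell will tell you; the driver name I'd expect for the H240 is nhpsa, but verify against the card's HCL entry:

esxcli storage core adapter list                           # the H240 should appear with its native driver in the Driver column
esxcli storage core device list | grep -i 'Display Name'   # each directly cabled drive should show up here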


For my use case, the HPE H240 was a drop-in replacement that brought my old 2 node vSAN cluster onto 7.0 without much drama. The H8SCM micro ATX server remains on 6.7, but is more than capable of running the 7.0 witness appliance. Here are a few shots of the environment post-HBA swap:



