Wednesday, September 24, 2025

My journey with homelab inferencing part 1 - objectives, planning and cost

Building a machine that can handle a local LLM isn't too difficult, but it can be cost-prohibitive. My journey to build a machine capable of running distilled, decent-sized models was motivated by wanting a completely offline voice assistant for Home Assistant - an open-source platform for smart home management. My requirements for this project were as follows:

  • No subscriptions
  • Full ownership
  • Minimal cloud service connectivity
  • Keep cost down as much as possible
Since I already had most of the components to build the machine, the main thing to consider was picking a graphics card. The sweet spot in terms of performance per dollar is the NVIDIA RTX 3090 (Ti). Both the Ti and non-Ti versions of the card come with 24GB of vRAM, which is enough to fit most ~32B models once quantized. There are GPUs with more vRAM onboard, as well as multi-GPU capable systems, but the cost ramps up significantly when those are in play. Because this is a very specific use case, I don't need a lot of context - I just need the model to be "smart" enough to carry out basic Home Assistant actions when asked.
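
As a rough sanity check on that claim, here's a back-of-the-envelope sketch of my own (assuming 4-bit quantization and ~20% headroom for the KV cache and runtime overhead - ballpark figures, not exact math) showing how much vRAM a model of a given size needs:

    # Rough vRAM estimate: quantized weights plus ~20% headroom for the
    # KV cache and runtime overhead. Ballpark numbers, not exact.
    def estimate_vram_gb(params_billion, bits_per_weight=4, overhead=1.2):
        weights_gb = params_billion * bits_per_weight / 8  # 8 bits = 1 byte
        return weights_gb * overhead

    for size in (7, 14, 32, 70):
        print(f"{size}B @ 4-bit: ~{estimate_vram_gb(size):.0f} GB")
    # 7B ~4 GB, 14B ~8 GB, 32B ~19 GB, 70B ~42 GB - so a quantized 32B model
    # just fits in 24GB, while 70B needs more than a single 3090.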

As of this writing, RTX 3090s can be obtained for around $800 used on eBay. Let's draw up some pros and cons compared to subscribing to LLM-as-a-Service (LLMaaS) offerings.

The pros are:
  • Full ownership and customization of hardware
    • No subscriptions, pay once and it's yours
  • Use whichever model you wish... 
    • ...as long as it fits in vRAM
    • Can add more hardware to run larger models
  • No internet connection required
  • Hardware can be repurposed for other projects (gaming, Folding@home, etc.)
The cons are:
  • Steep upfront cost (LLMaaS such as ChatGPT Plus is $20/month at the time of this writing)
  • Need to decide on OS and deployment model
    • LLMaaS is turn-key and ready at a moment's notice
  • vRAM limits the size of models that can be used 
    • LLMaaS gives you access to full-size models with RAG customization
    • More GPUs can be added, but power draw and cost increase significantly
  • Risk of hardware obsolescence (the RTX 3090 is two generations old and may not be able to run future models)
    • LLMaaS provides the latest models and additional features, such as advanced image generation
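To put the upfront cost in perspective, here's a quick sketch of the break-even math against those numbers (a simple estimate of my own; electricity is left as an adjustable knob, since the box's average draw and local rates vary widely):

    # Break-even estimate: one-time GPU cost vs. a monthly LLMaaS subscription.
    # Electricity is optional here since average draw and local rates vary.
    GPU_COST = 800            # USD, used RTX 3090
    SUBSCRIPTION = 20.0       # USD per month, e.g. ChatGPT Plus
    POWER_COST = 0.0          # USD per month of electricity; set for your setup

    breakeven_months = GPU_COST / (SUBSCRIPTION - POWER_COST)
    print(f"Break-even after ~{breakeven_months:.0f} months")  # ~40 months, GPU only
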
At the end of the day, you need to determine where you want your flexibility: in LLM capabilities, or in the hardware side of the stack? At $800 for the GPU alone, you're looking at roughly a 40-month break-even point compared to LLMaaS - longer once the rest of the hardware is counted - and with less context and fewer capabilities. Again, for my use case this is fine, but for a general-purpose chatbot you might prefer the monthly subscription. In my next post, I plan to cover the next stage of the local LLM build: OS install and containerized deployment.

