Just got a DGX Spark set up today and started configuring it for local LLM inference. The plan is to run it as a local API backend for an application I'm building (education/analytics use case, trying to keep everything local/private). I've mostly been working with cloud GPUs up to now, so this is my first time running something like this fully on-prem. I have a few questions and would appreciate any insights from people running similar setups.
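For context on what "local API backend" might look like in practice, here's a minimal client sketch. It assumes an OpenAI-compatible inference server (e.g. vLLM or Ollama) listening locally; the URL, port, and model name are placeholders, not anything from the post:

```python
import json
from urllib import request

# Hypothetical local endpoint and model; adjust for your server.
API_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "llama-3.1-8b-instruct"  # placeholder model name

def build_payload(prompt: str) -> dict:
    """Construct an OpenAI-style chat-completion request body."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }

def ask(prompt: str) -> str:
    """POST the prompt to the local server and return the reply text.
    Requires a running OpenAI-compatible server at API_URL."""
    body = json.dumps(build_payload(prompt)).encode()
    req = request.Request(
        API_URL, data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Show the request body; no network call made here.
    print(json.dumps(build_payload("Summarize this lesson."), indent=2))
```

Because everything goes through localhost, no prompt data leaves the machine, which fits the local/private requirement described above.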