LocalLLaMA

You can do CUDA inference on an Apple Silicon Mac with PCI passthrough

I have been working on a project to adapt QEMU on macOS so that it can pass a GPU through to a Linux VM. I wrote this post to walk through some of the interesting challenges involved, along with benchmarks. The post focuses a lot on gamin…