PSA: llama-swap released a new grouping feature, matrix, that lets you fine-tune which models can run together
Previously, a model could only belong to a single group. Now you can create whatever groups you want: one for big models that should run on their own, one for STT + a bigger model, one for RAG use cases, etc. It'll intelligently unload models…
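
For anyone trying to picture what overlapping groups buy you, here's a rough Python sketch of the idea. To be clear, this is not llama-swap's code or its config format: the group names, model names, and the `plan_swap` helper are all made up for illustration, and the "pick the group that overlaps most with what's already loaded" rule is only an assumption about one reasonable way to decide what to unload.

```python
from typing import Dict, Set, Tuple

# Hypothetical groups for illustration only (not llama-swap's config syntax).
# Note that "medium-model" appears in two groups, which is the whole point.
GROUPS: Dict[str, Set[str]] = {
    "big_solo": {"big-model"},
    "voice": {"stt-model", "medium-model"},
    "rag": {"embedder", "reranker", "medium-model"},
}


def plan_swap(
    groups: Dict[str, Set[str]], loaded: Set[str], requested: str
) -> Tuple[Set[str], Set[str]]:
    """Return (to_unload, to_keep) for a request to run `requested`.

    Assumed rule: among the groups that contain the requested model,
    choose the one sharing the most members with the currently loaded
    models, then unload anything loaded that is not in that group.
    """
    candidates = [members for members in groups.values() if requested in members]
    if not candidates:
        raise KeyError(f"{requested!r} is not in any group")

    # Prefer the group that lets the most already-loaded models stay resident.
    best = max(candidates, key=lambda members: len(members & loaded))

    to_unload = loaded - best
    to_keep = loaded & best
    return to_unload, to_keep


if __name__ == "__main__":
    # STT is already running and a request comes in for the medium model:
    # both live in "voice", so nothing needs to be unloaded.
    print(plan_swap(GROUPS, {"stt-model"}, "medium-model"))

    # A request for the big model evicts everything else, since it runs alone.
    print(plan_swap(GROUPS, {"stt-model", "medium-model"}, "big-model"))
```

With one group per model, the second case is all you could express. Letting a model sit in several groups is what makes the first case possible.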