Llama-server: is it bleeding to CPU/RAM?

Is there an easy way to tell whether llama-server is running part of a model on CPU/RAM, rather than entirely on GPU/VRAM?
(I don't think the standard verbose output, which has gotten shorter, says anything about this, but I may be missing something.)

submitted by /u/jopereira
