LLM inference with Rust

Hi all, I have been vibe playing with Candle to run inference with Qwen 3.5 4B Q4 GGUFs on CPU only. The speed I get is a mind-blowing 3.5 to 6 tok/s with some optimizations. Does anyone have any tips or tricks to gain more tok/s?