cs.AI, cs.CL, cs.LG

Doing More With Less: Revisiting the Effectiveness of LLM Pruning for Test-Time Scaling

arXiv:2604.25098v1 Announce Type: cross
Abstract: While current Large Language Models (LLMs) exhibit remarkable reasoning capabilities through test-time compute scaling (TTS), their massive parameter counts and high inference costs have motivated the …