Doing More With Less: Revisiting the Effectiveness of LLM Pruning for Test-Time Scaling
arXiv:2604.25098v1 Announce Type: cross
Abstract: While current Large Language Models (LLMs) exhibit remarkable reasoning capabilities through test-time compute scaling (TTS), their massive parameter counts and high inference costs have motivated the …