Decouple before Integration: Test-time Synthesis of SFT and RLVR Task Vectors
arXiv:2605.00610v1 Announce Type: new
Abstract: Supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR) are two fundamental paradigms for LLM post-training, each with a different strength: SFT expands knowledge breadth, while RLVR enhances reasoning depth. Yet integrat…