Gradient Compression Beyond Low-Rank: Wavelet Subspaces Compact Optimizer States
arXiv:2501.07237v4 Announce Type: replace
Abstract: Large language models (LLMs) have shown impressive performance across a range of natural language processing tasks. However, their vast number of parameters introduces significant memory challenges d…