cs.LG

TAH-QUANT: Effective Activation Quantization in Pipeline Parallelism over Slow Network

arXiv:2506.01352v2 Announce Type: replace
Abstract: Decentralized training of large language models offers the opportunity to pool computational resources across geographically distributed participants, but is often bottlenecked by network communication…
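The abstract above is truncated and does not describe TAH-QUANT itself, but the title's core idea — compressing inter-stage activations to ease the network bottleneck in pipeline parallelism — can be illustrated with a generic sketch. The snippet below shows plain symmetric per-tensor int8 activation quantization (a standard baseline, not the paper's method): a pipeline stage quantizes its output activations before sending them over the slow link, and the next stage dequantizes on receipt, cutting traffic 4x versus fp32.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127] codes.

    Returns the int8 codes and the float scale needed to invert the mapping.
    """
    scale = float(np.max(np.abs(x))) / 127.0
    if scale == 0.0:  # all-zero tensor: avoid division by zero
        scale = 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an fp32 approximation of the original activations."""
    return q.astype(np.float32) * scale

# Activations produced by one pipeline stage (hypothetical shape).
acts = np.random.randn(4, 16).astype(np.float32)

# Sender side: quantize before the network transfer (4x fewer bytes).
codes, scale = quantize_int8(acts)

# Receiver side: dequantize before running the next stage.
recovered = dequantize_int8(codes, scale)
```

With symmetric rounding, the per-element reconstruction error is bounded by half the scale, which is why this kind of scheme preserves training quality far better than naive truncation.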