cs.CL, cs.DC

Folding Tensor and Sequence Parallelism for Memory-Efficient Transformer Training & Inference

arXiv:2604.26294v1
Abstract: We present tensor and sequence parallelism (TSP), a parallel execution strategy that folds tensor parallelism and sequence parallelism onto a single device axis. In conventional multi-dimensional paralle…