Folding Tensor and Sequence Parallelism for Memory-Efficient Transformer Training & Inference
arXiv:2604.26294v1 (Announce Type: new)
Abstract: We present tensor and sequence parallelism (TSP), a parallel execution strategy that folds tensor parallelism and sequence parallelism onto a single device axis. In conventional multi-dimensional paralle…