Large Transformer Model Inference Optimization
[Updated on 2023-01-24: add a small section on Distillation.]
Large transformer models are mainstream nowadays, creating SoTA results for a variety of tasks. They are powerful but very expensive to train and use. The extremely high inference cost, in b…