cs.AI, cs.CL, cs.LG

A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio

arXiv:2409.06624v4 Announce Type: replace
Abstract: Large Language Models (LLMs) often need Continual Pre-Training (CPT) to acquire unfamiliar language skills or adapt to new domains. The huge training cost of CPT often demands a careful choice of …