Team-Based Self-Play With Dual Adaptive Weighting for Fine-Tuning LLMs
arXiv:2605.09922v1 Announce Type: cross
Abstract: While recent self-training approaches have reduced reliance on human-labeled data for aligning LLMs, they still face critical limitations: (i) sensitivity to synthetic data quality, leading to instabil…