Evaluating whether AI models would sabotage AI safety research
arXiv:2604.24618v1
Abstract: We evaluate the propensity of frontier models to sabotage or refuse to assist with safety research when deployed as AI research agents within a frontier AI company. We apply two complementary evaluations…