Built a political benchmark for LLMs. Kimi K2 can’t answer questions about Taiwan (obviously). GPT-5.3 refuses 100% of questions when given an opt-out. [P]
I spent the past few days building a benchmark that maps where frontier LLMs fall on a 2D political compass (economic left/right + social progressive/conservative) using 98 structured questions across 14 policy areas. I tested GPT-5.3, Claude Opus 4.6, and …
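
For anyone wondering how per-question answers could turn into a point on a 2D compass, here is a minimal sketch. It is an illustration under assumptions, not necessarily the scoring this benchmark uses: the 5-point agreement scale, the `Question` fields, and the `compass_position` helper are all hypothetical names introduced for this example. Refusals (a `None` or unrecognized answer) are counted separately and left out of the axis averages.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Hypothetical 5-point agreement scale mapped to [-1, 1] (an assumption,
# not taken from the original post).
LIKERT = {
    "strongly disagree": -1.0,
    "disagree": -0.5,
    "neutral": 0.0,
    "agree": 0.5,
    "strongly agree": 1.0,
}


@dataclass
class Question:
    axis: str       # "economic" or "social"
    direction: int  # +1 if agreeing moves toward right/conservative, -1 otherwise


def compass_position(
    questions: Sequence[Question],
    answers: Sequence[Optional[str]],
) -> Tuple[Optional[float], Optional[float], float]:
    """Return (economic, social, refusal_rate).

    Each axis score is the mean signed answer in [-1, 1], or None when
    every question on that axis was refused.
    """
    totals = {"economic": 0.0, "social": 0.0}
    counts = {"economic": 0, "social": 0}
    refusals = 0

    for question, answer in zip(questions, answers):
        key = answer.strip().lower() if answer is not None else None
        if key not in LIKERT:
            # Opt-outs and unparseable replies count as refusals.
            refusals += 1
            continue
        totals[question.axis] += question.direction * LIKERT[key]
        counts[question.axis] += 1

    def mean(axis: str) -> Optional[float]:
        return totals[axis] / counts[axis] if counts[axis] else None

    refusal_rate = refusals / len(questions) if questions else 0.0
    return mean("economic"), mean("social"), refusal_rate


# Usage with four made-up questions; the third answer is a refusal.
demo_questions = [
    Question("economic", +1),
    Question("economic", -1),
    Question("social", +1),
    Question("social", +1),
]
demo_answers = ["agree", "strongly agree", None, "disagree"]
demo_result = compass_position(demo_questions, demo_answers)
```

Keeping the refusal rate as its own number matters here: a model that opts out of everything has no position at all, which is different from sitting at the center of the compass.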