Frame In, Frame Out: Measuring Framing Bias in LLM-Generated News Summaries
arXiv:2505.05406v2 Announce Type: replace
Abstract: News headlines and summaries shape how events are interpreted through selective emphasis and omission, a phenomenon known as framing. Large language models (LLMs) are now routinely used to generate such content, yet existing evaluation frameworks largely overlook this dimension. We introduce Frame In, Frame Out (FIFO), the first large-scale benchmark for measuring framing bias in LLM-generated news summaries, grounded in the widely used XSum dataset. FIFO combines 15,499 jury-annotated examples with 320 expert-labeled instances ($\kappa = 0.61$) used to validate and calibrate the model-based annotations. Using FIFO, we analyze framing behavior across 27 summarization models and find that LLMs systematically exhibit higher framing rates than human journalists, with strong variation across topics and training regimes, including elevated framing in scientific and public-health summaries. Our results establish framing as a missing yet consequential dimension of summarization quality.