CXRMate-2: Structured Multimodal Temporal Embeddings and Tractable Reinforcement Learning for Clinically Acceptable Chest X-ray Radiology Report Generation
arXiv:2604.18967v2
Abstract: Chest X-ray (CXR) radiology report generation (RRG) models have shown rapid progress on automated metrics, yet their clinical utility remains uncertain due to limited qualitative evaluation by radiologists. We present CXRMate-2, a state-of-the-art CXR RRG model that makes reinforcement learning (RL) tractable through structured multimodal temporal embeddings and high-resolution visual feature compression. Together, these provide efficient, unified conditioning of an LLM decoder on visual, textual, and temporal context from a study and its prior. This in turn enables group relative policy optimisation (GRPO), in which a proposed reward function is used to improve semantic alignment with radiologist reports. Across the MIMIC-CXR, CheXpert Plus, and ReXgradient datasets, CXRMate-2 achieves statistically significant improvements over strong benchmarks, including gains of 11.2% and 24.4% in GREEN and RadGraph-XL, respectively, on MIMIC-CXR relative to MedGemma 1.5 (4B).
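To illustrate the GRPO step mentioned above, the following is a minimal sketch of the group-relative advantage computation as it appears in the standard GRPO formulation: several reports are sampled for the same study, each is scored by a reward function, and rewards are normalised within the group. The function name, the `eps` constant, and the reward values are hypothetical, and the sketch does not reproduce the reward function proposed for CXRMate-2.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise rewards within one group of reports sampled for the same study.

    Standard GRPO formulation (assumed here): subtract the group mean and
    divide by the group standard deviation. `eps` guards against a zero
    standard deviation when all rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward scores for four reports sampled for a single study.
rewards = [0.2, 0.5, 0.5, 0.8]
advantages = group_relative_advantages(rewards)

# Reports scoring above the group mean receive a positive advantage,
# those below receive a negative one, and the advantages sum to about zero.
```

Because the baseline is the group mean rather than a learned value function, no separate critic network is needed, which is one reason this style of RL is attractive when each sampled report is expensive to generate.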
To directly compare CXRMate-2 against radiologist reporting, we conduct a blinded, randomised, retrospective qualitative evaluation. Three consultant radiologists compare generated and radiologist reports across 120 studies from the MIMIC-CXR test set. Generated reports were deemed acceptable (defined as preferred over, or rated equal to, radiologist reports) in 45% of ratings, with no statistically significant difference in preference rates for seven of the eight analysed findings. Preferences for radiologist reports were driven primarily by higher recall, while generated reports were consistently preferred for readability.
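The acceptability criterion above can be made concrete with a small sketch: a rating counts as acceptable when the generated report is either preferred or rated equal to the radiologist report. The label strings and the example ratings below are hypothetical and are not data from the study.

```python
from collections import Counter


def acceptability_rate(ratings):
    """Fraction of ratings where the generated report is preferred or rated equal.

    `ratings` is a list of labels, one per (radiologist, study) comparison.
    The labels "generated", "equal", and "radiologist" are assumed names.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    counts = Counter(ratings)
    acceptable = counts["generated"] + counts["equal"]
    return acceptable / len(ratings)


# Hypothetical ratings for illustration only.
ratings = ["generated", "equal", "radiologist", "radiologist"]
rate = acceptability_rate(ratings)
```

Under this definition, ties count towards acceptability, so the rate is an upper bound on the share of ratings in which the generated report was strictly preferred.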
Together, these results define a clear pathway to clinically acceptable CXR RRG. Improving recall and the detection of subtle findings represents the primary remaining barrier to non-inferiority with radiologist reporting, positioning CXR RRG for prospective evaluation in assistive, radiologist-led workflows.