AI models confidently describe images they never saw, and benchmarks fail to catch it

Multimodal AI models like GPT-5, Gemini 3 Pro, and Claude Opus 4.5 generate detailed image descriptions and medical diagnoses even when no image is provided. A Stanford study shows that common benchmarks obscure the problem.