Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes?
By Hugging Face - Blog / March 5, 2024