Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes?
