Masked Next-Scale Prediction for Self-supervised Scene Text Recognition
arXiv:2605.14885v1 Announce Type: new
Abstract: Scene Text Recognition requires modeling visual structures that evolve from coarse layouts to fine-grained character strokes. Training such models relies on large amounts of annotated data. Recent self-s…