Author name: /u/PositiveInformal9512

[D] VIT16 – Should I use all or only final attention MHA to generate attention heatmap?

/u/PositiveInformal9512 / February 10, 2026

Hello, I'm currently extracting attention heatmaps from pretrained ViT16 models (which I then fine-tune) to see which regions of the image the model used to make its prediction. Many research papers and sources suggest that I should only ex…

MachineLearning

[D] Vision Transformer (ViT) – How do I deal with variable size images?

/u/PositiveInformal9512 / January 21, 2026

Hi, I'm currently building a ViT following the research paper (An Image is Worth 16×16 Words). I was wondering what the best solution is for dealing with variable-size images when training the model for classification? One solution I can think of is…