[D] ViT16 – Should I use all MHA layers or only the final one to generate an attention heatmap?
Hello, I'm currently extracting attention heatmaps from pretrained ViT16 models (which I then fine-tune) to see which regions of the image the model used to make its prediction. Many research papers and sources suggest that I should only ex…
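
For concreteness, the two options in the title could be sketched roughly as follows. This is only an illustration under stated assumptions: the attention weights are assumed to be already extracted as one array per layer with shape (heads, tokens, tokens), the CLS token is assumed to sit at index 0, and the 12-layer, 12-head, 197-token setup (a 14x14 patch grid plus CLS) is a guess at a typical ViT-B/16 at 224x224 input. The function names are hypothetical, the data is random stand-in data, and the all-layers variant uses attention rollout, which is one common way to combine layers and not necessarily what any particular paper recommends.

```python
import numpy as np


def last_layer_heatmap(attentions, grid):
    """CLS-to-patch attention from the final layer only, averaged over heads."""
    last = attentions[-1].mean(axis=0)       # (tokens, tokens)
    return last[0, 1:].reshape(grid, grid)   # CLS row, CLS column dropped


def rollout_heatmap(attentions, grid):
    """Attention rollout: combine every layer by multiplying attention matrices."""
    n_tokens = attentions[0].shape[-1]
    rollout = np.eye(n_tokens)
    for layer_attn in attentions:
        a = layer_attn.mean(axis=0)              # average over heads
        a = a + np.eye(n_tokens)                 # add identity for the residual path
        a = a / a.sum(axis=-1, keepdims=True)    # re-normalise each row to sum to 1
        rollout = a @ rollout                    # accumulate from first layer upward
    return rollout[0, 1:].reshape(grid, grid)


# Random stand-in for real attention weights (assumption: 12 layers, 12 heads,
# 197 tokens). A row-wise softmax makes each row a valid attention distribution.
rng = np.random.default_rng(0)
logits = rng.normal(size=(12, 12, 197, 197))
weights = np.exp(logits)
weights = weights / weights.sum(axis=-1, keepdims=True)
attentions = list(weights)                       # one (heads, tokens, tokens) array per layer

final_only = last_layer_heatmap(attentions, grid=14)
all_layers = rollout_heatmap(attentions, grid=14)
print(final_only.shape, all_layers.shape)
```

With a real model, `attentions` would be replaced by the per-layer attention weights captured from the fine-tuned network, and either heatmap would then be upsampled to the input resolution before overlaying it on the image.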