Attention With Actual Numbers
I spent some time trying to understand attention mechanisms and kept running into the same problem: tutorials would show the architecture diagram, briefly touch on the math, and move on. So I worked through the numbers myself. This is what I came away…