new MoE from ai2, EMO
new MoE release from ai2 – EMO, 1B active / 14B total params, trained on 1T tokens. the interesting thing is document-level routing: experts cluster around domains like health, news, etc. instead of surface patterns. models: https://huggingface.co/collections/al…
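for anyone unfamiliar with how MoE routing works, here's a toy sketch of top-k expert routing in plain python. this is illustrative only, not the released model's actual router: a real router scores experts with a learned projection, and here the scores come in as a precomputed logits list (with document-level routing you'd compute them once per document instead of per token).

```python
import math

def top_k_route(logits, k=2):
    # toy MoE router: pick the k highest-scoring experts,
    # then softmax-renormalize their weights so they sum to 1.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)                    # subtract max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    z = sum(exps)
    return top, [e / z for e in exps]                  # expert ids, mixture weights

# e.g. 4 experts, route to the top 2
experts, weights = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
print(experts, weights)  # [1, 3] [0.62..., 0.37...]
```

the model's output is then the weighted sum of just those k experts' outputs, which is why only a fraction of the total params (here 1B of 14B) is active per forward pass.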