Forgive my ignorance, but how is a 27B model better than a 397B one?

Is Qwen just incredibly good at dense models and not so good at MoE?

I get that a dense model generally beats an MoE of the same total size, but a 27B dense model beating a 397B MoE just doesn't sit right with me.

What are those additional experts even doing then?

submitted by /u/No_Conversation9561