Is there a limit on the number of active parameters in an MoE model?

Hi. We recently got MoE models as big as 1T and 1.6T total parameters. Until now, I expected the ratio of total to active parameters to be around 10 to 1, as we saw with smaller, "actually local" models.

However, these new huge models have a much smaller number of active parameters for their size (~40B?). It makes me wonder.

Is there a new architecture at play here? Or is it that there is no point in increasing the active parameter count beyond a certain number? Will we never see, for example, a 2T/A200B MoE model? Is there a "cap" in MoE models beyond which adding active parameters doesn't improve the quality of results?
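
To make the ratios concrete, here is a quick sketch of the arithmetic. It uses only the rough figures from this post: the ~40B active count is my guess rather than a confirmed spec, the 2T/A200B model is hypothetical, and the helper name `total_to_active_ratio` is made up for illustration.

```python
def total_to_active_ratio(total_params: float, active_params: float) -> float:
    """Return how many total parameters exist per active parameter."""
    if active_params <= 0:
        raise ValueError("active_params must be positive")
    return total_params / active_params


B = 1e9   # one billion parameters
T = 1e12  # one trillion parameters

# (total, active) pairs; active counts are approximate guesses from the post.
models = {
    "1T / ~A40B": (1.0 * T, 40 * B),
    "1.6T / ~A40B": (1.6 * T, 40 * B),
    "hypothetical 2T / A200B": (2.0 * T, 200 * B),
}

ratios = {
    name: total_to_active_ratio(total, active)
    for name, (total, active) in models.items()
}

for name, ratio in ratios.items():
    print(f"{name}: about {ratio:.0f} to 1")
```

If the ~40B guess is right, the new models sit at roughly 25 to 1 and 40 to 1, while the hypothetical 2T/A200B model would be back at the 10 to 1 I was used to.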

Thanks

submitted by /u/ihatebeinganonymous