/u/imstilllearningthis

could refusal layers be masking dialect-conditioned safety failures in MoE models [d]

/u/imstilllearningthis / May 18, 2026

I set out to test whether prompts coded as AAVE (African American Vernacular English) cause MoE language models to route, deliberate, and respond differently from semantically matched AE (Academic English) prompts in safety-sensitive situations, especiall…
