Author name: /u/retarded_770

Going from 3B/7B dense to Nemotron 3 Nano (hybrid Mamba-MoE) for multi-task reasoning — what changes in the fine-tuning playbook? [D]

/u/retarded_770 / April 26, 2026

Following up on something I posted a few days back about fine-tuning for multi-task reasoning. Read a lot since then, and I've moved past the dense 3B vs 7B question — landing on Nemotron 3 Nano (the 30B-A3B hybrid Mamba-Attention-MoE NVIDIA releas…

MachineLearning

First time fine-tuning, need a sanity check — 3B or 7B for multi-task reasoning? [D]

/u/retarded_770 / April 23, 2026

Ok so this is my first post here, been lurking for a while. I’m about to start my first fine-tuning project and I don’t want to commit to the wrong direction so figured I’d ask. Background on me: I’m not from an ML background, self-taught, been working…