How do you experiment with a (very) large model architecture? [D]
I'm trying to reproduce a paper (a very particular kind of diffusion model), and their training regime is incredibly compute-heavy. In general, how are quick experiments performed to validate hypotheses when the models are large and compute is expensive…