Trained a 3B Llama with GRPO on a chemistry RL environment. Drug-like molecules in 6 hours of GPU time. [P]

Trained a 3B Llama with GRPO on a chemistry RL environment. Drug-like molecules in 6 hours of GPU time. [P]

What if AI could design a new drug in 30 seconds?

That's the future I'm betting on. Maybe 5 years out. Maybe 15. But it's coming.

This week I built a tiny piece of it. A reinforcement learning environment where an AI agent designs drug molecules atom by atom. Add a fragment. Swap an atom. Build a scaffold. The environment scores it on real chemistry: Lipinski rules, drug-likeness, synthesis difficulty, target protein binding.

Trained Llama-3.2-3B with GRPO. Six hours on a single A10G.

Image 1: a molecule the trained model designed. QED 0.94. Same drug-likeness range as FDA-approved oral medications.

Image 2: the model's chemistry sense evolving across 150 training steps. You can almost see it figuring out what "drug-like" means.

Six hours of GPU time produced something a medicinal chemist would actually look at. What happens at 600 hours? 6000?

Genuinely curious what people here think. How far are we from "AI proposes 10,000 candidates, a chemist picks 5" being the routine drug discovery workflow?

https://preview.redd.it/33xmgygf9oyg1.png?width=1430&format=png&auto=webp&s=1ee257081ed592678604faab6df8ed8572508775

https://preview.redd.it/j5ci9ygf9oyg1.png?width=1754&format=png&auto=webp&s=6eef89f1c5635a56951a012d84db0b86606baacf

submitted by /u/anshumanatrey
[link] [comments]

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top