How Small Can You Go? Testing DeepSeek-R1’s RL Technique on Tiny Financial Models

I trained 1.5B, 3B, and 7B models to answer questions about earnings reports using GRPO, the same reinforcement learning behind…
