Alibaba’s Qwen team makes AI models think deeper with new algorithm

By Jonathan Kemper / April 5, 2026

Reinforcement learning hits a wall with reasoning models because every token gets the same reward. A new algorithm from Alibaba's Qwen team fixes this by weighting each step by how much it shapes what comes next, doubling the length of the models' thought processes.

The article Alibaba's Qwen team makes AI models think deeper with new algorithm appeared first on The Decoder.
